Models in Adaptive Length Penalty Paper
Models (80)
RLAIF/twitter_8EUB__5e-06_0.1_20_0.9_20_0.95
RLAIF/dpo_thinking_reddit_judge_last_minute_50_1e-6_0.02_4B_4B
RLAIF/dpo_thinking_reddit_judge_last_minute_150_1e-6_0.02_4B_4B
RLAIF/dpo_thinking_reddit_judge_last_minute_100_1e-6_0.02_4B_4B
RLAIF/dpo_thinking_reddit_judge_last_minute_200_1e-6_0.02_4B_4B
RLAIF/dpo_thinking_reddit_judge_last_minute_250_1e-6_0.02_4B_4B
RLAIF/grpo_reddit_judge_last_minute_16_64_8_3e-5_1e-6_4B
RLAIF/dpo_thinking_reddit_judge_full_1e-6_0.02_8B_4B
RLAIF/dpo_answer_reddit_judge_full_1e-6_0.02_4B_1.7B
RLAIF/dpo_answer_reddit_judge_full_1e-6_0.02_8B_4B
Datasets (132)
RLAIF/ultrafeedback-binarized
RLAIF/gm_toy_example • 1.1k • 29
RLAIF/dpo_thinking_reddit_judge4_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 25
RLAIF/dpo_thinking_reddit_judge3_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 8k • 23
RLAIF/dpo_thinking_reddit_judge2_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 19
RLAIF/dpo_thinking_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 20
RLAIF/dpo_thinking_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 22
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 26
RLAIF/dpo_answer_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 23
RLAIF/WritingPrompts-Filtered • 199k • 105