LegendaryDawn/SDRL-rand-Qwen3-8B-Base-random_n8_l4096-DAPO_n8_bs256_long12-yarn2-step200 8B • Updated about 20 hours ago • 7
Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning Paper • 2601.22297 • Published 16 days ago • 2
LegendaryDawn/SDRL-baseline-Qwen3-8B-Base-DAPO-n8-bs256-long12-yarn2-step200 8B • Updated 3 days ago • 5
LegendaryDawn/SDRL-freq-Qwen3-8B-Base-majority_n8_l4096-DAPO_n8_bs256_long8-step200 8B • Updated 10 days ago • 217
LegendaryDawn/SDRL-rand-Qwen3-8B-Base-random_n8_l4096-DAPO_n8_bs256_long8-step200 8B • Updated 10 days ago • 222
LegendaryDawn/SDRL-rand-Qwen3-4B-Base-icml-self-debate-random_n8_l2048-DAPO_n8_bs256_long8-step200 4B • Updated 18 days ago • 79
LegendaryDawn/SDRL-freq-Qwen3-4B-Base-icml-self-debate-exp-majority_n8_l2048-DAPO_n8_bs256_long8-run2-step200 4B • Updated 20 days ago • 1.01k
LegendaryDawn/SDRL-rand-Qwen2.5-3B-icml-self-debate-ablation-random_n4_l2048-DAPO_n8_bs256_long8-step200 3B • Updated 20 days ago • 340
LegendaryDawn/SDRL-freq-ablation-step125-Qwen3-4B-Base-icml-self-debate-majority_n8_l2048-DAPO_n8_bs256_long8 4B • Updated 22 days ago • 87