Sean13/llama-8b-instruct-v0.2-cpo-full-label_smoothing-0.1 • Text Generation • 266k • Updated Nov 21, 2025 • 1
Sean13/mistral-7b-instruct-v0.2-cpo-full-label_smoothing-0.1 • Text Generation • 266k • Updated Nov 21, 2025 • 2
Sean13/llama-8b-instruct-simpo-full-label_smoothing-0.1 • Text Generation • 266k • Updated Nov 21, 2025 • 1
Sean13/mistral-7b-instruct-v0.2-simpo-full-label_smoothing-0.1 • Text Generation • 266k • Updated Nov 21, 2025 • 1
Sean13/llama-8b-instruct-rdpo-full-multipref-init-eta-0.99 • Text Generation • 266k • Updated Nov 20, 2025 • 1
Sean13/llama-8b-instruct-rdpo-full-multipref-init-eta-0.80 • Text Generation • 266k • Updated Nov 20, 2025
Sean13/mistral-7b-instruct-v0.2-holder-dpo-full-0.6 • Text Generation • 266k • Updated Nov 19, 2025 • 1
Sean13/mistral-7b-instruct-v0.2-robust_dpo-full-0.05 • Text Generation • 266k • Updated Nov 18, 2025 • 1
Sean13/mistral-7b-instruct-v0.2-robust_dpo-full-0.2 • Text Generation • 266k • Updated Nov 18, 2025 • 1
Sean13/mistral-7b-instruct-v0.2-robust_dpo-full-0.1 • Text Generation • 266k • Updated Nov 18, 2025 • 1
Sean13/mistral-7b-instruct-v0.2-ipo-full-label_smoothing-0.05 • Text Generation • 266k • Updated Nov 18, 2025 • 1
Sean13/mistral-7b-instruct-v0.2-ipo-full-label_smoothing-0.2 • Text Generation • 266k • Updated Nov 18, 2025 • 1
Sean13/mistral-7b-instruct-v0.2-ipo-full-label_smoothing-0.1 • Text Generation • 266k • Updated Nov 17, 2025 • 1
Sean13/mistral-7b-instruct-v0.2-dpo-full-label_smoothing-0.05 • Text Generation • 266k • Updated Nov 17, 2025 • 1
Sean13/mistral-7b-instruct-v0.2-dpo-full-label_smoothing-0.2 • Text Generation • 266k • Updated Nov 17, 2025 • 1