Bolian Li

lblaoke

https://lblaoke.github.io/

AI & ML interests

None yet

Recent Activity

authored a paper about 6 hours ago

More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment

authored a paper about 6 hours ago

DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning

authored a paper about 6 hours ago

Learning Self-Correction in Vision-Language Models via Rollout Augmentation

Organizations

Collections 4

Papers 14

arxiv:2604.26326

arxiv:2602.08503

arxiv:2601.22311

arxiv:2510.02341

Models 44

lblaoke/opt-350m-hh-rlhf-rm-trl-v5

0.3B • Updated May 12, 2025 • 5

lblaoke/opt-350m-hh-rlhf-dpo-trl-v5

0.3B • Updated May 12, 2025 • 2

lblaoke/opt-350m-hh-rlhf-chosen-sft-trl-v5

0.3B • Updated May 11, 2025 • 1

lblaoke/opt-125m-hh-rlhf-rm-trl-v5

0.1B • Updated May 9, 2025 • 4

lblaoke/opt-125m-hh-rlhf-dpo-trl-v5

0.1B • Updated May 8, 2025 • 4

lblaoke/opt-125m-hh-rlhf-chosen-sft-trl-v5

0.1B • Updated May 7, 2025 • 6

lblaoke/qwama-0.5b-hh-rlhf-sft-chosen-trl-v4

0.5B • Updated Apr 8, 2025 • 1

lblaoke/qwama-0.5b-skywork-pref-sft-chosen-dpo-trl-v3

0.5B • Updated Mar 28, 2025 • 2

lblaoke/qwama-0.5b-skywork-pref-sft-rejected-chosen-trl-v3

0.5B • Updated Mar 28, 2025 • 2

lblaoke/qwama-0.5b-skywork-pref-sft-chosen-trl-v3

0.5B • Updated Mar 28, 2025 • 4

Datasets 0

None public yet