arxiv:2605.06139
Yun Qu
yunqu
AI & ML interests
None yet
Recent Activity
authored a paper about 15 hours ago
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
upvoted a paper about 20 hours ago
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
submitted a paper about 20 hours ago
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
Organizations
None yet