ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning Paper • 2602.21534 • Published 3 days ago • 20
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models Paper • 2510.09541 • Published Oct 10, 2025 • 17
Large Reasoning Models Learn Better Alignment from Flawed Thinking Paper • 2510.00938 • Published Oct 1, 2025 • 59
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks Paper • 2503.15478 • Published Mar 19, 2025 • 13
liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5 Text Generation • Updated Oct 5, 2023 • 47 • 23
LLM-Rec: Personalized Recommendation via Prompting Large Language Models Paper • 2307.15780 • Published Jul 24, 2023 • 28