LLM - a Chevolier Collection


Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper • 2510.03259 • Published Sep 26, 2025

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper • 2510.07242 • Published Oct 8, 2025

First Try Matters: Revisiting the Role of Reflection in Reasoning Models
Paper • 2510.08308 • Published Oct 9, 2025

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published Oct 3, 2025

Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
Paper • 2510.11052 • Published Oct 13, 2025

RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
Paper • 2510.10201 • Published Oct 11, 2025

Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published Oct 6, 2025

Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published Oct 13, 2025

Are Large Reasoning Models Interruptible?
Paper • 2510.11713 • Published Oct 13, 2025

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published Oct 13, 2025

Deep Self-Evolving Reasoning
Paper • 2510.17498 • Published Oct 20, 2025

Continuous Autoregressive Language Models
Paper • 2510.27688 • Published Oct 31, 2025

Higher-order Linear Attention
Paper • 2510.27258 • Published Oct 31, 2025

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning
Paper • 2510.27044 • Published Oct 30, 2025

Why Language Models Hallucinate
Paper • 2509.04664 • Published Sep 4, 2025

Reverse-Engineered Reasoning for Open-Ended Generation
Paper • 2509.06160 • Published Sep 7, 2025

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published Sep 26, 2025

FlowRL: Matching Reward Distributions for LLM Reasoning
Paper • 2509.15207 • Published Sep 18, 2025

Towards a Unified View of Large Language Model Post-Training
Paper • 2509.04419 • Published Sep 4, 2025

Variational Reasoning for Language Models
Paper • 2509.22637 • Published Sep 26, 2025

Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
Paper • 2509.06949 • Published Sep 8, 2025

Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
Paper • 2509.23808 • Published Sep 28, 2025

Sequential Diffusion Language Models
Paper • 2509.24007 • Published Sep 28, 2025

Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
Paper • 2511.23319 • Published Nov 28, 2025