reasoning llm - a zhuww Collection

reasoning llm

updated Oct 9, 2025

Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated

Paper • 2509.05739 • Published Sep 6, 2025 • 2
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3, 2025 • 24
Universal Deep Research: Bring Your Own Model and Strategy

Paper • 2509.00244 • Published Aug 29, 2025 • 13
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs

Paper • 2509.08358 • Published Sep 10, 2025 • 13
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

Paper • 2508.10975 • Published Aug 14, 2025 • 60
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8, 2025 • 93
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1, 2025 • 79
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Paper • 2507.16812 • Published Jul 22, 2025 • 63
Generative AI Act II: Test Time Scaling Drives Cognition Engineering

Paper • 2504.13828 • Published Apr 18, 2025 • 18
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Paper • 2506.09513 • Published Jun 11, 2025 • 101
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 75
Large Language Models for Data Synthesis

Paper • 2505.14752 • Published May 20, 2025 • 49
OpenThoughts: Data Recipes for Reasoning Models

Paper • 2506.04178 • Published Jun 4, 2025 • 50
Skywork Open Reasoner 1 Technical Report

Paper • 2505.22312 • Published May 28, 2025 • 54
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

Paper • 2502.13124 • Published Feb 18, 2025 • 6
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

Paper • 2504.01943 • Published Apr 2, 2025 • 15
OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique

Paper • 2507.09075 • Published Jul 11, 2025 • 15
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

Paper • 2506.20512 • Published Jun 25, 2025 • 48
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Paper • 2506.05209 • Published Jun 5, 2025 • 59
Essential-Web v1.0: 24T tokens of organized web data

Paper • 2506.14111 • Published Jun 17, 2025 • 46
HardTests: Synthesizing High-Quality Test Cases for LLM Coding

Paper • 2505.24098 • Published May 30, 2025 • 43
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team

Paper • 2506.14234 • Published Jun 17, 2025 • 41
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Paper • 2505.19641 • Published May 26, 2025 • 68
WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Paper • 2504.21776 • Published Apr 30, 2025 • 59
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning

Paper • 2505.17813 • Published May 23, 2025 • 58
Thinkless: LLM Learns When to Think

Paper • 2505.13379 • Published May 19, 2025 • 50
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Paper • 2504.21233 • Published Apr 30, 2025 • 49
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values

Paper • 2504.05535 • Published Apr 7, 2025 • 44
MegaMath: Pushing the Limits of Open Math Corpora

Paper • 2504.02807 • Published Apr 3, 2025 • 35
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data

Paper • 2505.05427 • Published May 8, 2025 • 4
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 99
OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training

Paper • 2501.08197 • Published Jan 14, 2025 • 9
Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset

Paper • 2412.02595 • Published Dec 3, 2024 • 5
DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 55
Improving Pretraining Data Using Perplexity Correlations

Paper • 2409.05816 • Published Sep 9, 2024
Rethinking Reflection in Pre-Training

Paper • 2504.04022 • Published Apr 5, 2025 • 80
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6, 2025 • 113
Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Paper • 2503.21460 • Published Mar 27, 2025 • 83
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers

Paper • 2503.00865 • Published Mar 2, 2025 • 64
A Comprehensive Survey on Long Context Language Modeling

Paper • 2503.17407 • Published Mar 20, 2025 • 49
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Paper • 2503.15558 • Published Mar 18, 2025 • 50
Open Deep Search: Democratizing Search with Open-source Reasoning Agents

Paper • 2503.20201 • Published Mar 26, 2025 • 48
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory

Paper • 2509.14662 • Published Sep 18, 2025 • 13
AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning

Paper • 2510.06261 • Published Oct 5, 2025 • 5