License • Code • Technical Report • Benchmarks
Youtu-LLM is a new, small, yet powerful LLM: it contains only 1.96B parameters, supports a 128k-token long context, and has native agentic capabilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size in Commonsense, STEM, Coding, and Long Context capabilities; on agent-related benchmarks, it surpasses larger leading models and can complete multiple end-to-end agent tasks.
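A minimal inference sketch with Hugging Face Transformers is shown below. The repository id `tencent/Youtu-LLM-2B-Base` is only an assumed placeholder for illustration; substitute the actual model id from this collection.

```python
# Minimal generation sketch for the base model, assuming a standard causal-LM
# checkpoint on the Hugging Face Hub. The repo id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Youtu-LLM-2B-Base"  # hypothetical id; use the one in this collection

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~2B parameters fit comfortably on a single GPU in bf16
    device_map="auto",
    trust_remote_code=True,      # in case the repo ships custom modeling code
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a base (non-instruct) model, plain continuation prompts like the one above are more appropriate than chat-style instructions.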
The table below compares Youtu-LLM-2B-Base against open base models of similar and larger size on general benchmarks:
| Type | Benchmark (Metric) | # Shots | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
|---|---|---|---|---|---|---|---|---|
| Commonsense | MMLU-Pro (EM) | 5 | 34.9% | 35.3% | 29.4% | 46.1% | 36.2% | 48.4% |
| | MLQA-Zh (EM) | 3 | 38.1% | 38.0% | 40.3% | 47.2% | 43.0% | 43.5% |
| | MMLU-ProX-Zh (EM) | 5 | 32.5% | 26.7% | 24.2% | 45.2% | 25.4% | 40.7% |
| STEM | GSM8K (EM) | 8 | 68.2% | 67.3% | 38.5% | 80.8% | 47.8% | 77.6% |
| | MGSM-Zh (EM) | 8 | 57.1% | 40.7% | 33.0% | 69.7% | 35.9% | 68.9% |
| | MATH (EM) | 4 | 28.1% | 40.8% | 24.4% | 44.8% | 21.5% | 44.4% |
| | BBH (EM) | 3 | 53.0% | 59.8% | 51.6% | 70.8% | 62.9% | 59.8% |
| | GPQA-MC (Acc. Norm) | 5 | 30.4% | 26.6% | 28.6% | 37.8% | 30.1% | 33.3% |
| | HLE-MC (Acc. Norm) | 3 | 10.7% | 3.1% | 8.0% | 15.0% | 11.5% | 17.4% |
| Coding | MBPP (Pass@1) | 3 | 55.6% | 51.0% | 45.8% | 67.5% | 49.4% | 66.6% |
| | MBPP+ (Pass@1) | 3 | 71.0% | 66.1% | 61.9% | 80.8% | 62.7% | 81.8% |
| | HumanEval (Pass@1) | 0 | 49.9% | 34.8% | 36.6% | 57.6% | 36.0% | 64.6% |
| | HumanEval+ (Pass@1) | 0 | 41.3% | 28.1% | 28.1% | 49.9% | 28.1% | 57.3% |
| | LiveCodeBench v6 (Pass@1) | 3 | 5.1% | 2.9% | 2.9% | 6.9% | 3.4% | 9.7% |
| | CRUXEval (Pass@1) | 1 | 40.6% | 42.1% | 39.7% | 54.8% | 42.3% | 55.9% |
| | RepoBench (EM) | 3 | 21.0% | 21.8% | 23.0% | 25.3% | 25.2% | 22.7% |
| Long Context | LongBench v2 (Acc.) | 3 | 28.0% | 28.8% | 26.6% | 25.8% | 27.8% | 27.2% |
| | NIAH (Acc.) | / | 79.8% | 75.0% | 99.5% | 83.0% | 99.8% | 98.8% |
We use APTBench to evaluate the agentic capabilities of the base models.
| Category | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
|---|---|---|---|---|---|---|
| Code | 25.1% | 24.3% | 32.8% | 41.9% | 23.6% | 37.9% |
| Deep Research | 28.5% | 27.2% | 36.4% | 40.5% | 30.0% | 38.6% |
| Math | 59.9% | 60.7% | 59.8% | 70.5% | 60.1% | 68.0% |
| Tool | 56.7% | 59.1% | 61.7% | 65.8% | 64.1% | 64.2% |