HyperCLOVAX-SEED-Text-Think-32B

Extracted text-only LLM from naver-hyperclovax/HyperCLOVAX-SEED-Think-32B

This model contains only the language model component extracted from the original Vision-Language Model (VLM). The vision encoder and multimodal projector have been removed, making it a pure text-to-text model compatible with standard LLaMA inference pipelines.

Model Details

Property            Value
------------------  ------------------
Architecture        LlamaForCausalLM
Parameters          ~33B
Hidden Size         5120
Layers              72
Attention Heads     40
KV Heads            8 (GQA)
Intermediate Size   24192
Context Length      128K
Vocab Size          128,256
Precision           bfloat16
RoPE Theta          50,000,000
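
For reference, these values map onto the fields of the LLaMA-style config.json that the extraction script generates. A sketch as a Python dict (field names follow transformers' standard LlamaConfig; reading 128K as 131,072 positions is an assumption):

llama_config = {
    "architectures": ["LlamaForCausalLM"],
    "hidden_size": 5120,
    "num_hidden_layers": 72,
    "num_attention_heads": 40,
    "num_key_value_heads": 8,           # grouped-query attention
    "intermediate_size": 24192,
    "max_position_embeddings": 131072,  # assumption: 128K context
    "vocab_size": 128256,
    "rope_theta": 50000000.0,
    "torch_dtype": "bfloat16",
}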

What Was Extracted

The original VLM consists of:

  • Vision Encoder: Qwen2.5-VL based (~600M params) - removed
  • MM Projector: Multimodal projection layers - removed
  • Language Model: HyperCLOVAX LLM (~33B params) - extracted

Only the model.language_model.* weights were extracted and remapped to standard LLaMA format.

Usage

With Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "What is the capital of South Korea?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs.to(model.device), max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With vLLM

Start the server:

vllm serve minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf \
    --dtype bfloat16 \
    --tensor-parallel-size 2

Then query it through the OpenAI-compatible API:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
response = client.chat.completions.create(
    model="minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf",
    # Korean prompt: "Hello! Can you converse in Korean?"
    messages=[{"role": "user", "content": "안녕하세요! 한국어로 대화할 수 있나요?"}]
)
print(response.choices[0].message.content)

Thinking Mode

The model supports a "thinking mode" for complex reasoning tasks. Use the <|thinking|> token to trigger extended reasoning:

# Reusing the tokenizer and model loaded in the Transformers example above
messages = [
    {"role": "user", "content": "Solve this step by step: If x + 2y = 10 and 3x - y = 5, find x and y."}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs.to(model.device), max_new_tokens=1024)
# The model may produce <|thinking|>...</|thinking|> blocks with its reasoning process
text = tokenizer.decode(outputs[0], skip_special_tokens=False)
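
To separate the reasoning from the final answer, a minimal post-processing sketch (assuming the literal <|thinking|> and </|thinking|> markers described above appear in the decoded text):

# Assumption: reasoning is wrapped in literal <|thinking|>...</|thinking|> markers
if "</|thinking|>" in text:
    reasoning, answer = text.split("</|thinking|>", 1)
    reasoning = reasoning.split("<|thinking|>", 1)[-1]
else:
    reasoning, answer = "", text
print("Reasoning:", reasoning.strip())
print("Answer:", answer.strip())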

Hardware Requirements

  • Minimum: 2x NVIDIA A100 40GB (with tensor parallelism)
  • Recommended: 2x NVIDIA A100 80GB or 4x NVIDIA A6000
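
As a rough sanity check on these requirements: ~33B parameters in bfloat16 come to about 66 GB of weights alone, before KV cache and activations, which is why a single 80 GB GPU is tight and tensor parallelism across two or more cards is suggested.

# Back-of-envelope weight memory (ignores KV cache, activations, and runtime overhead)
params = 33e9        # ~33B parameters
bytes_per_param = 2  # bfloat16 = 2 bytes
print(f"Weights alone: ~{params * bytes_per_param / 1e9:.0f} GB")  # ~66 GB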

Limitations

  • This is a text-only model. It cannot process images or videos.
  • The model inherits any limitations from the original HyperCLOVAX-SEED-Think-32B.
  • Optimized primarily for Korean and English.

License

This model inherits the HyperCLOVAX license from the original model.

Citation

If you use this model, please cite the original:

@misc{hyperclovax-seed-think-32b,
  title={HyperCLOVA X SEED Think 32B},
  author={NAVER Cloud},
  year={2025},
  url={https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B}
}

Reproduce This Extraction

Want to extract the LLM yourself? Use the included extract_llm.py script.

Prerequisites

pip install safetensors torch tqdm huggingface_hub

Step 1: Download Original VLM (~66GB)

huggingface-cli download naver-hyperclovax/HyperCLOVAX-SEED-Think-32B \
    --local-dir ./HyperCLOVAX-SEED-Think-32B

Step 2: Run Extraction Script

# Download the extraction script
wget https://huggingface.co/minpeter/HyperCLOVAX-SEED-Text-Think-32B-hf/resolve/main/extract_llm.py

# Run extraction
python extract_llm.py \
    --input ./HyperCLOVAX-SEED-Think-32B \
    --output ./HyperCLOVAX-SEED-Text-Think-32B

What the Script Does

  1. Extracts LLM weights: Filters model.language_model.* tensors from the VLM
  2. Remaps keys: Converts to standard LLaMA format (see the sketch after this list)
    • model.language_model.model.* → model.*
    • model.language_model.lm_head.* → lm_head.*
  3. Creates config: Generates a LLaMA-compatible config.json from the VLM's text_config
  4. Copies tokenizer: Preserves all tokenizer files unchanged
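
A minimal sketch of steps 1 and 2, assuming the VLM ships as sharded safetensors files (the glob pattern and single-file output are illustrative; the real extract_llm.py re-shards the output and also handles the config and tokenizer):

import glob
from safetensors.torch import load_file, save_file

PREFIX = "model.language_model."
extracted = {}

# Step 1: filter model.language_model.* tensors out of every VLM shard
for shard in sorted(glob.glob("./HyperCLOVAX-SEED-Think-32B/*.safetensors")):
    for key, tensor in load_file(shard).items():
        if key.startswith(PREFIX):
            # Step 2: stripping the prefix yields standard LLaMA keys, e.g.
            #   model.language_model.model.embed_tokens.weight -> model.embed_tokens.weight
            #   model.language_model.lm_head.weight            -> lm_head.weight
            extracted[key[len(PREFIX):]] = tensor

# Illustrative only: writes one flat file (and holds all ~66 GB in RAM);
# the real script writes sharded output
save_file(extracted, "./HyperCLOVAX-SEED-Text-Think-32B/model.safetensors")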

Output Structure

HyperCLOVAX-SEED-Text-Think-32B/
├── config.json                      # LLaMA config
├── generation_config.json
├── model-00001-of-00013.safetensors # ~5GB shards
├── ...
├── model-00013-of-00013.safetensors
├── model.safetensors.index.json
├── tokenizer.json
├── tokenizer_config.json
├── special_tokens_map.json
├── added_tokens.json
├── vocab.json
├── merges.txt
└── chat_template.jinja

Verify Extraction

# Quick test with vLLM
vllm serve ./HyperCLOVAX-SEED-Text-Think-32B \
    --dtype bfloat16 \
    --tensor-parallel-size 2

# In another terminal
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "./HyperCLOVAX-SEED-Text-Think-32B", "messages": [{"role": "user", "content": "Hello!"}]}'

Acknowledgments

  • Original model by NAVER Cloud HyperCLOVA X
  • Extraction performed to enable text-only inference without vision dependencies