Qwen3-4B-Thinking-2507-Heretic-GGUF
Llamacpp imatrix Quantizations of Qwen3-4B-Thinking-2507-Heretic by becnic (based on the original Qwen3-4B-Thinking-2507)
Using llama.cpp release b7120 for quantization.
Original model: https://huggingface.co/becnic/Qwen3-4B-Thinking-2507-Heretic
Run them in LM Studio
Run them directly with llama.cpp, or any other llama.cpp-based project
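For example, a minimal llama.cpp invocation could look like the sketch below; the quant filename, prompt, and sampling settings are illustrative, not prescriptive:

```bash
# Minimal sketch: run the Q4_K_M quant with llama.cpp's CLI.
# -ngl 99 offloads all layers to the GPU; drop it for CPU-only inference.
llama-cli -m Qwen3-4B-Thinking-2507-Heretic-Q4_K_M.gguf \
  -p "Explain the difference between BFS and DFS." \
  -n 1024 -c 8192 --temp 0.6 --top-p 0.95 -ngl 99
```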
Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
|---|---|---|---|---|
| Qwen3-4B-Thinking-2507-Heretic-f16.gguf | f16 | 8.05GB | false | Full precision, highest possible quality |
| Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf | Q8_0 | 4.28GB | false | Extremely high quality |
| Qwen3-4B-Thinking-2507-Heretic-Q6_K.gguf | Q6_K | 3.31GB | false | Near-lossless high quality |
| Qwen3-4B-Thinking-2507-Heretic-Q5_K_S.gguf | Q5_K_S | 2.82GB | false | Premium high quality |
| Qwen3-4B-Thinking-2507-Heretic-Q5_K_M.gguf | Q5_K_M | 2.89GB | false | Very high quality |
| Qwen3-4B-Thinking-2507-Heretic-Q5_0.gguf | Q5_0 | 2.82GB | false | High quality |
| Qwen3-4B-Thinking-2507-Heretic-Q4_K_S.gguf | Q4_K_S | 2.38GB | false | Strong mid-high quality |
| Qwen3-4B-Thinking-2507-Heretic-Q4_K_M.gguf | Q4_K_M | 2.50GB | false | Balanced mid-high quality |
| Qwen3-4B-Thinking-2507-Heretic-Q4_0.gguf | Q4_0 | 2.37GB | false | Good balance of size and quality |
| Qwen3-4B-Thinking-2507-Heretic-Q3_K_S.gguf | Q3_K_S | 1.89GB | false | Higher tier Q3 |
| Qwen3-4B-Thinking-2507-Heretic-Q3_K_M.gguf | Q3_K_M | 2.08GB | false | Mid-range |
| Qwen3-4B-Thinking-2507-Heretic-Q2_K.gguf | Q2_K | 1.67GB | false | Smallest size, lowest quality |
Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
pip install -U "huggingface_hub[cli]"
Then, you can target the specific file you want:
huggingface-cli download ZuzeTt/Qwen3-4B-Thinking-2507-Heretic-GGUF --include "Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf" --local-dir ./
If the model is bigger than 50GB, it will have been split into multiple files. To download them all to a local folder, run:
huggingface-cli download ZuzeTt/Qwen3-4B-Thinking-2507-Heretic-GGUF --include "Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf/*" --local-dir ./
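Split GGUF shards produced by llama.cpp's gguf-split tool do not need manual merging: pointing llama.cpp at the first shard loads the remaining parts automatically. A sketch with an assumed shard name:

```bash
# Hypothetical shard name; llama.cpp picks up the -0000N-of-0000M siblings itself.
llama-cli -m ./Qwen3-4B-Thinking-2507-Heretic-Q8_0-00001-of-00002.gguf -p "Hello"
```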
Abliteration parameters
| Parameter | Value |
|---|---|
| direction_index | 19.42 |
| attn.o_proj.max_weight | 1.23 |
| attn.o_proj.max_weight_position | 22.34 |
| attn.o_proj.min_weight | 0.69 |
| attn.o_proj.min_weight_distance | 10.42 |
| mlp.down_proj.max_weight | 1.12 |
| mlp.down_proj.max_weight_position | 29.64 |
| mlp.down_proj.min_weight | 1.08 |
| mlp.down_proj.min_weight_distance | 20.24 |
Performance
| Metric | This model | Original model (Qwen/Qwen3-4B-Thinking-2507) |
|---|---|---|
| KL divergence | 0.06 | 0 (by definition) |
| Refusals | 6/100 | 96/100 |
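For context, and assuming the usual measurement setup, the KL divergence compares the abliterated model's next-token distribution P against the original model's distribution Q, averaged over an evaluation set:

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}
$$

A value of 0.06 indicates the output distributions stay very close to the original, i.e. refusals dropped from 96/100 to 6/100 with little overall behavioral drift.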
Model Overview
Qwen3-4B-Thinking-2507 has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: 262,144 tokens natively (see the memory estimate below)
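As a rough memory sanity check, assuming the standard Qwen3 head dimension of 128 and an f16 KV cache, each token of context costs

$$
2 \times 36\ \text{layers} \times 8\ \text{KV heads} \times 128 \times 2\ \text{bytes} = 147{,}456\ \text{bytes} \approx 144\ \text{KiB},
$$

so the full 262,144-token window would need roughly 36 GiB for the KV cache alone; llama.cpp's quantized KV-cache options reduce this substantially.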
NOTE: This model supports only thinking mode; specifying enable_thinking=True is no longer required.
Additionally, to enforce thinking, the default chat template automatically pre-fills <think>. It is therefore normal for the model's output to contain only a closing </think> tag without an explicit opening <think> tag (see the parsing sketch below).
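A minimal parsing sketch in shell, assuming the raw generation has been saved to a file; only the bare closing tag needs handling:

```bash
# The chat template pre-fills <think>, so the raw output looks like:
#   ...reasoning...</think>...final answer...
out=$(cat generation.txt)        # assumed: raw model output captured to a file
reasoning="${out%%</think>*}"    # everything before the closing tag
answer="${out#*</think>}"        # everything after it
printf 'ANSWER:\n%s\n' "$answer"
```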
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to the Qwen blog, GitHub repository, and documentation.
Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.