---
license: apache-2.0
language:
- ar
- en
pipeline_tag: text-generation
tags:
- text-generation
- pytorch
- transformers
- vllm
- causal-lm
- depth-extension
- arabic
- english
- karnak
- qwen
base_model: Qwen/Qwen3-30B-A3B-Instruct-2507
model_name: Karnak
parameters: 40B
inference: false
---

# Karnak: Enhanced Arabic–English Large Language Model

## Model Summary

**Karnak** is a depth-extended causal language model optimized for **Arabic and English** generation. It is built on top of **Qwen/Qwen3-30B-A3B-Instruct-2507**, featuring architectural depth extension and a tokenizer specifically optimized for Arabic to improve fluency and efficiency.

Karnak was trained on **high-quality, filtered data** through a rigorous pipeline to enhance instruction-following capability, factuality, and robustness.

## Key Features

- **Depth Extension (~40B):** Expanded depth to increase reasoning capacity and improve long-range dependency modeling.
- **Arabic-Optimized Tokenizer:** More efficient Arabic tokenization, reducing token fragmentation and improving generation quality.
- **Multi-Stage Training:** Pre-trained weights → Depth Extension → Continued Pre-training → SFT (Supervised Fine-Tuning).
- **Extended Context Window:** Designed for long-context usage, with a **safe context range of up to 20K tokens** (staying within this limit is recommended for optimal stability).

## Model Details

- **Model Name:** Karnak
- **Base Model:** Qwen/Qwen3-30B-A3B-Instruct-2507
- **Parameter Count:** ~40B (depth-extended)
- **Languages:** Arabic, English
- **Training:** High-quality filtered data + multi-stage pipeline (continued pre-training + SFT)
- **Safe Context Range:** Up to **20,000 tokens**
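
Because the safe context range is 20,000 tokens, it can help to clamp prompts before generation. Below is a minimal sketch (the helper name and constant are illustrative, not part of any Karnak API) that keeps the most recent tokens so that prompt length plus generation budget stays inside the safe window:

```python
# Illustrative helper, not an official Karnak API: clamp a token-ID
# sequence so prompt + generation budget fits the 20K safe context range.

SAFE_CONTEXT = 20_000  # safe context range stated in this model card


def truncate_to_safe_context(input_ids, max_new_tokens=512, safe_limit=SAFE_CONTEXT):
    """Drop the oldest tokens so prompt + max_new_tokens fits the safe window."""
    budget = safe_limit - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exceeds the safe context range")
    # Keep the most recent tokens; earlier context is discarded.
    return input_ids[-budget:] if len(input_ids) > budget else input_ids


# Example with a synthetic 25K-token prompt:
ids = list(range(25_000))
clamped = truncate_to_safe_context(ids, max_new_tokens=512)
print(len(clamped))  # 19488
```

Keeping the tail of the sequence preserves the most recent conversation turns, which is usually the right trade-off for chat-style prompts.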

---

## Usage

### 1) Hugging Face Transformers

To use Karnak with the standard Transformers library, ensure you have a recent version installed.

```bash
pip install -U "transformers>=4.40.0" torch accelerate
```

Python Code Example (Chat Template):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Applied-Innovation-Center/Karnak"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Prepare input ("Explain the theory of relativity to me in simple terms.")
prompt = "اشرح لي نظرية النسبية بشكل مبسط."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
)

# Decode output (removing the prompt tokens)
generated_ids = generated_ids[:, model_inputs.input_ids.shape[1]:]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
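
For multi-turn chat, extend the same `messages` list with the model's reply before building the next prompt. A minimal sketch of that bookkeeping (pure Python, no model required; the helper name is illustrative):

```python
# Multi-turn bookkeeping sketch: the history uses the same
# {"role": ..., "content": ...} entries consumed by apply_chat_template.

def append_turn(history, user_text, assistant_text):
    """Record one completed user/assistant exchange in place."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history


history = [{"role": "system", "content": "You are a helpful assistant."}]
append_turn(history, "What is the capital of Egypt?", "Cairo.")

# Next request: append only the new user message, then pass the full
# history to tokenizer.apply_chat_template exactly as in the example above.
history.append({"role": "user", "content": "And what is its population?"})
print(len(history))  # 4
```

Keep the running history under the 20K safe context range; the truncation strategy is up to the caller (e.g. dropping the oldest turns first).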

### 2) vLLM (Recommended for Production)

Karnak is compatible with vLLM for high-throughput inference.

Installation:

```bash
pip install -U vllm
```

Offline Inference:

```python
from vllm import LLM, SamplingParams

model_id = "Applied-Innovation-Center/Karnak"

# Initialize the model
llm = LLM(
    model=model_id,
    trust_remote_code=True,
    max_model_len=20000,     # Safe context range
    tensor_parallel_size=1,  # Adjust based on available GPUs
)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
)

# Generate ("What is the capital of Egypt?")
prompts = ["ما هي عاصمة مصر؟"]
outputs = llm.generate(prompts, sampling_params)

for o in outputs:
    print(f"Prompt: {o.prompt}")
    print(f"Generated: {o.outputs[0].text}")
```

Server Mode (OpenAI-Compatible API):

You can serve the model as an API compatible with OpenAI clients:

```bash
vllm serve "Applied-Innovation-Center/Karnak" \
    --trust-remote-code \
    --dtype bfloat16 \
    --port 8000
```
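
Once the server is up, any OpenAI-compatible client can call it. The sketch below builds a `/v1/chat/completions` request using only the standard library; the endpoint and port match the `vllm serve` command above, and the actual network call is left commented out because it assumes a running server:

```python
import json
import urllib.request

# Request payload in the OpenAI chat-completions format served by vLLM.
payload = {
    "model": "Applied-Innovation-Center/Karnak",
    "messages": [{"role": "user", "content": "What is the capital of Egypt?"}],
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 512,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Requires the vLLM server from the command above to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```

The official `openai` Python client works the same way: point `base_url` at `http://localhost:8000/v1` and pass the model ID shown above.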

## Citation

If you use this model in your research or application, please cite it as follows:

```bibtex
@misc{karnak-40b,
  title={Karnak: A Depth-Extended Arabic-English LLM},
  year={2026},
  publisher={Applied Innovation Center},
  howpublished={\url{https://huggingface.co/Applied-Innovation-Center/Karnak}}
}
```