Maistros-8B-Instruct-4bit: A Greek Large Language Model adapted through Knowledge Distillation from Large Reasoning Models
‼️This is the quantized version (4-bit) of the full Maistros model.‼️
We introduce Maistros-8B-Instruct, a Greek-adapted LLM based on mistralai/Ministral-3-8B-Instruct-2512-BF16, fine-tuned with Low-Rank Adaptation (LoRA) on CulturaQA.
For information on model training, validation, and evaluation, as well as the model's limitations, see the arXiv preprint.
Model Information
- 256k context length (approx. 150,000 Greek words).
- We extend the training of Ministral-3-8B-Instruct-2512-BF16 with Greek linguistic and cultural knowledge from the training split of CulturaQA.
- We use LoRA fine-tuning to mitigate catastrophic forgetting and retain the base model's capabilities.
- We merge the adapted weights from LoRA fine-tuning into the base model to produce Maistros-8B-Instruct, a specialized Greek LLM (see the sketch after this list).
- Maistros-8B-Instruct achieves state-of-the-art performance on most Greek QA datasets when compared to other open-weight models.
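As a rough illustration of the adaptation recipe described above, the following sketch attaches LoRA adapters to the base model with the PEFT library and merges them back after training. The rank, alpha, target modules, and output path are hypothetical placeholders, not the exact training configuration used for Maistros.

```python
from transformers import Mistral3ForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Load the full-precision base model (same class as in the usage example below).
base_model = Mistral3ForConditionalGeneration.from_pretrained(
    'mistralai/Ministral-3-8B-Instruct-2512-BF16', torch_dtype='bfloat16'
)

# Attach low-rank adapters; rank, alpha, and target modules are illustrative only.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    task_type='CAUSAL_LM'
)
peft_model = get_peft_model(base_model, lora_config)

# ... fine-tune peft_model on the CulturaQA training split here ...

# Merge the adapter weights back into the base weights and save the merged model.
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained('Maistros-8B-Instruct')
```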
Evaluation
For evaluation, we report accuracy for the multiple-choice datasets and BERTScore F1 (%) for the open-ended CulturaQA (a minimal sketch of both metrics follows the results table). All compared models are the instruct versions of the abbreviated names below.
| Model | DemosQA | GPCR | INCLUDE | Greek ASEP MCQA | Greek Medical MCQA | Plutus QA | Greek Truthful QA | Greek MMLU (Greek-specific) | CulturaQA |
|---|---|---|---|---|---|---|---|---|---|
| **Open-Weights Models** | | | | | | | | | |
| Maistros 8B | 50.83 | 64.42 | 58.70 | 67.25 | 49.54 | 73.33 | 53.37 | 78.17 | 71.99 |
| Ministral 3 8B | 51.67 | 59.62 | 54.17 | 63.25 | 47.92 | 65.33 | 52.51 | 76.23 | 71.03 |
| Krikri 8B | 49.50 | 54.81 | 50.54 | 63.08 | 45.37 | 64.44 | 54.83 | 71.04 | 71.31 |
| Plutus 8B | 45.67 | 50.00 | 48.37 | 62.92 | 39.35 | 57.33 | 34.52 | 70.38 | 67.44 |
| EuroLLM v2 9B | 41.50 | 53.85 | 39.13 | 46.08 | 31.71 | 42.67 | 36.72 | 58.17 | 70.33 |
| Gemma 3n E4B | 47.17 | 60.10 | 50.00 | 57.75 | 43.75 | 53.78 | 46.76 | 71.39 | 69.10 |
| Qwen 3 8B | 48.83 | 31.73 | 49.28 | 54.58 | 36.64 | 63.56 | 42.72 | 67.57 | 68.73 |
| **Proprietary Models** | | | | | | | | | |
| Gemini 3 flash | 55.67 | 88.46 | 88.77 | 94.75 | 92.82 | 89.78 | 88.62 | 95.03 | 73.97 |
| GPT-5 mini | 53.00 | 77.40 | 74.46 | 78.92 | 78.01 | 76.89 | 75.89 | 87.49 | 75.09 |
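For reference, here is a minimal sketch of how the two metrics can be computed from model predictions. The `bert-score` package and the Greek language setting are assumptions about tooling, not necessarily the exact evaluation setup of the preprint; dataset loading and prediction code is omitted.

```python
from bert_score import score

def multiple_choice_accuracy(predictions, references):
    """Percentage of questions where the predicted choice matches the gold choice."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

def open_ended_bertscore_f1(candidates, references):
    """BERTScore F1 (%) between generated answers and reference answers.
    lang='el' selects a multilingual backbone; the paper's exact scorer may differ."""
    _, _, f1 = score(candidates, references, lang='el')
    return 100.0 * f1.mean().item()
```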
How to load and run the model
Use the following code to run the model locally, or serve it with vLLM (see the sketch after the code example).
```python
from transformers import AutoTokenizer, Mistral3ForConditionalGeneration, set_seed
# Set the model path, device and a random seed for reproducibility.
model_path = 'IMISLab/Maistros-8B-Instruct-4bit'
device = 'cuda'
set_seed(42)
# Loading the model tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code = True)
# Causal Language Models predict tokens from left to right and use EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'
# Load the model from the path to the device and set it in evaluation mode.
model = Mistral3ForConditionalGeneration.from_pretrained(model_path, device_map = device, trust_remote_code = True)
model.eval()
# Set the system, instruction and user prompts.
system_prompt = 'Είσαι ο Μαΐστρος, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για την Ελληνική γλώσσα.\nΈχεις δημιουργηθεί απο το IMIS Lab του Πανεπιστημιού Πατρών.'
instruction_prompt = 'Παρακαλώ απάντησε στην παρακάτω ερώτηση.'
user_prompt = 'Τι είναι η Ακρόπολη των Αθηνών;'
# Defining the message template.
messages = [
{'role': 'system', 'content': [{'type': 'text', 'text': system_prompt}]},
{'role': 'user', 'content': [{'type': 'text', 'text': '\n\n'.join((instruction_prompt, user_prompt))}]}
]
# Applying the tokenizer chat template.
tokenized = tokenizer.apply_chat_template(
messages,
add_generation_prompt = True,
return_tensors = 'pt',
return_dict = True
)
# Sending the tokenized instances to the device.
tokenized = {k: v.to(device) for k, v in tokenized.items()}
input_len = len(tokenized['input_ids'][0])
# Generating the model output.
output = model.generate(
**tokenized,
max_new_tokens = 1024,
do_sample = False, # Equivalent to temperature = 0.0
temperature = None,
top_p = None,
top_k = None
)
# Decoding the assistant part of the output and printing it.
decoded_output = tokenizer.decode(output[0][input_len:], skip_special_tokens = True)
print(decoded_output)
```
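For serving, a minimal vLLM sketch is shown below. It assumes the checkpoint's 4-bit quantization format is supported by your vLLM build, and it mirrors the greedy decoding settings of the transformers example above.

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint with vLLM
# (assumes its 4-bit quantization format is supported by your vLLM build).
llm = LLM(model='IMISLab/Maistros-8B-Instruct-4bit')

# Greedy decoding, mirroring the transformers example above.
sampling_params = SamplingParams(temperature=0.0, max_tokens=1024)

# The same system/instruction/user prompts as in the example above can be reused here.
messages = [
    {'role': 'user', 'content': 'Τι είναι η Ακρόπολη των Αθηνών;'}
]

# llm.chat applies the model's chat template before generation.
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```

Alternatively, `vllm serve IMISLab/Maistros-8B-Instruct-4bit` starts an OpenAI-compatible server for the same checkpoint.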
Contact
If you have any questions or feedback about the model, please e-mail one of the following authors:
giarelis@ceid.upatras.gr
cmastrokostas@ac.upatras.gr
karacap@upatras.gr
Citation
```bibtex
@misc{giarelis2026maistrosgreeklargelanguage,
title = {Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models},
author = {Nikolaos Giarelis and Charalampos Mastrokostas and Nikos Karacapilidis},
year = {2026},
eprint = {2605.01870},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2605.01870},
}
```