Maistros-8B-Instruct-4bit: A Greek Large Language Model adapted through Knowledge Distillation from Large Reasoning Models

‼️This is the quantized version (4-bit) of the full Maistros model.‼️

We introduce Maistros-8B-Instruct, a Greek-adapted LLM based on mistralai/Ministral-3-8B-Instruct-2512-BF16, fine-tuned on CulturaQA using Low-Rank Adaptation (LoRA).
For details on model training, validation, and evaluation, as well as its limitations, see the arXiv preprint.


Model Information

  • 256k context length (approx. 150,000 Greek words).
  • We extend the training of Ministral-3-8B-Instruct-2512-BF16 with Greek linguistic and cultural knowledge from the training split of CulturaQA.
  • We use LoRA fine-tuning to mitigate catastrophic forgetting and retain the base model's capabilities.
  • We merge the adapted weights from LoRA fine-tuning into the base model to produce Maistros-8B-Instruct, a specialized Greek LLM (a minimal sketch of this step is shown after this list).
  • Maistros-8B-Instruct achieves state-of-the-art performance on most Greek QA benchmarks when compared to other open-weight models.
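The adapter-merging step can be reproduced with the PEFT library. The sketch below is illustrative only: the LoRA hyperparameters, target modules, adapter path and the use of AutoModelForCausalLM are assumptions for demonstration, not the exact configuration reported in the preprint.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, PeftModel

base_id = 'mistralai/Ministral-3-8B-Instruct-2512-BF16'
adapter_path = 'maistros-lora-adapter'  # hypothetical output directory

# 1) Attach LoRA adapters to the frozen base model (rank/alpha/targets are assumed values).
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map = 'auto')
lora_config = LoraConfig(
    r = 16,
    lora_alpha = 32,
    target_modules = ['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    task_type = 'CAUSAL_LM'
)
peft_model = get_peft_model(base_model, lora_config)
# ... supervised fine-tuning on the CulturaQA training split goes here ...
peft_model.save_pretrained(adapter_path)

# 2) In a fresh run, reload the base model, load the trained adapter and merge it into the weights.
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map = 'auto')
merged_model = PeftModel.from_pretrained(base_model, adapter_path).merge_and_unload()
merged_model.save_pretrained('Maistros-8B-Instruct')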

Evaluation

For evaluation, we report accuracy on the multiple-choice datasets and BERTScore F1 (%) on the open-ended CulturaQA. All models listed below (with abbreviated names) are the instruction-tuned versions.

| Model | DemosQA | GPCR | INCLUDE Greek | ASEP MCQA | Greek Medical MCQA | Plutus QA | Greek Truthful QA | Greek MMLU (Greek-specific) | CulturaQA |
|---|---|---|---|---|---|---|---|---|---|
| Open-Weights Models | | | | | | | | | |
| Maistros 8B | 50.83 | 64.42 | 58.70 | 67.25 | 49.54 | 73.33 | 53.37 | 78.17 | 71.99 |
| Ministral 3 8B | 51.67 | 59.62 | 54.17 | 63.25 | 47.92 | 65.33 | 52.51 | 76.23 | 71.03 |
| Krikri 8B | 49.50 | 54.81 | 50.54 | 63.08 | 45.37 | 64.44 | 54.83 | 71.04 | 71.31 |
| Plutus 8B | 45.67 | 50.00 | 48.37 | 62.92 | 39.35 | 57.33 | 34.52 | 70.38 | 67.44 |
| EuroLLM v2 9B | 41.50 | 53.85 | 39.13 | 46.08 | 31.71 | 42.67 | 36.72 | 58.17 | 70.33 |
| Gemma 3n E4B | 47.17 | 60.10 | 50.00 | 57.75 | 43.75 | 53.78 | 46.76 | 71.39 | 69.10 |
| Qwen 3 8B | 48.83 | 31.73 | 49.28 | 54.58 | 36.64 | 63.56 | 42.72 | 67.57 | 68.73 |
| Proprietary Models | | | | | | | | | |
| Gemini 3 flash | 55.67 | 88.46 | 88.77 | 94.75 | 92.82 | 89.78 | 88.62 | 95.03 | 73.97 |
| GPT-5 mini | 53.00 | 77.40 | 74.46 | 78.92 | 78.01 | 76.89 | 75.89 | 87.49 | 75.09 |
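For reference, the open-ended metric can be computed with the Hugging Face evaluate library. The snippet below is a minimal sketch with made-up prediction/reference strings; the exact BERTScore backbone and evaluation protocol are those described in the preprint, and lang = 'el' here simply selects a default multilingual backbone.

import evaluate

# Hypothetical prediction/reference pair; the real evaluation runs over the CulturaQA test split.
predictions = ['Η Ακρόπολη των Αθηνών είναι ένας αρχαίος βραχώδης λόφος με μνημεία στο κέντρο της Αθήνας.']
references = ['Η Ακρόπολη των Αθηνών είναι ένας βραχώδης λόφος στο κέντρο της Αθήνας, γνωστός για τον Παρθενώνα.']

# Compute BERTScore and report the mean F1 as a percentage.
bertscore = evaluate.load('bertscore')
scores = bertscore.compute(predictions = predictions, references = references, lang = 'el')
mean_f1 = 100 * sum(scores['f1']) / len(scores['f1'])
print(f'BERTScore F1: {mean_f1:.2f}%')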

How to load and run the model

Use the following code to run the model locally with transformers; alternatively, you can host the model with vLLM (see the sketch after the code).

from transformers import AutoTokenizer, Mistral3ForConditionalGeneration, set_seed

# Set the model path, device and a random seed for reproducibility.
model_path = 'IMISLab/Maistros-8B-Instruct-4bit'
device = 'cuda'
set_seed(42)

# Loading the model tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code = True)

# Causal language models predict tokens from left to right and use the EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

# Load the model from the path to the device and set it in evaluation mode.
model = Mistral3ForConditionalGeneration.from_pretrained(model_path, device_map = device, trust_remote_code = True)
model.eval()

# Set the system, instruction and user prompts (in Greek).
# System prompt: 'You are Maistros, a highly advanced Artificial Intelligence model for the Greek language.\nYou were created by the IMIS Lab of the University of Patras.'
system_prompt = 'Είσαι ο Μαΐστρος, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για την Ελληνική γλώσσα.\nΈχεις δημιουργηθεί από το IMIS Lab του Πανεπιστημίου Πατρών.'
# Instruction prompt: 'Please answer the following question.'
instruction_prompt = 'Παρακαλώ απάντησε στην παρακάτω ερώτηση.'
# User prompt: 'What is the Acropolis of Athens?'
user_prompt = 'Τι είναι η Ακρόπολη των Αθηνών;'

# Defining the message template.
messages = [
    {'role': 'system', 'content': [{'type': 'text', 'text': system_prompt}]},
    {'role': 'user', 'content': [{'type': 'text', 'text': '\n\n'.join((instruction_prompt, user_prompt))}]}
]

# Applying the tokenizer chat template.
tokenized = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,  
    return_tensors = 'pt', 
    return_dict = True
)

# Sending the tokenized instances to the device.  
tokenized = {k: v.to(device) for k, v in tokenized.items()}
input_len = len(tokenized['input_ids'][0])

# Generating the model output.
output = model.generate(
    **tokenized,
    max_new_tokens = 1024,
    do_sample = False, # Equivalent to temperature = 0.0
    temperature = None,
    top_p = None,
    top_k = None
)

# Decoding the assistant part of the output and printing it.
decoded_output = tokenizer.decode(output[0][input_len:], skip_special_tokens = True)
print(decoded_output)
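To host the model with vLLM instead of transformers, a minimal sketch is shown below. Whether your installed vLLM version supports this checkpoint's 4-bit quantization format is an assumption you should verify; the sampling settings mirror the greedy decoding used above.

from vllm import LLM, SamplingParams

# Load the model with vLLM (verify that your vLLM version supports this 4-bit checkpoint).
llm = LLM(model = 'IMISLab/Maistros-8B-Instruct-4bit')
sampling_params = SamplingParams(temperature = 0.0, max_tokens = 1024)

# Same Greek system/instruction/user prompts as in the transformers example above.
messages = [
    {'role': 'system', 'content': 'Είσαι ο Μαΐστρος, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για την Ελληνική γλώσσα.\nΈχεις δημιουργηθεί από το IMIS Lab του Πανεπιστημίου Πατρών.'},
    {'role': 'user', 'content': 'Παρακαλώ απάντησε στην παρακάτω ερώτηση.\n\nΤι είναι η Ακρόπολη των Αθηνών;'}
]

# llm.chat applies the model's chat template before generation.
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)

An OpenAI-compatible server can also be started from the command line with vllm serve IMISLab/Maistros-8B-Instruct-4bit, subject to the same quantization-support caveat.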

Contact

If you have any questions or feedback about the model, please e-mail one of the following authors:

giarelis@ceid.upatras.gr
cmastrokostas@ac.upatras.gr
karacap@upatras.gr

Citation

@misc{giarelis2026maistrosgreeklargelanguage,
  title = {Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models},
  author = {Nikolaos Giarelis and Charalampos Mastrokostas and Nikos Karacapilidis},
  year = {2026},
  eprint = {2605.01870},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url = {https://arxiv.org/abs/2605.01870}
}