Maistros-8B-Instruct-4bit: A Greek Large Language Model adapted through Knowledge Distillation from Large Reasoning Models
‼️This is the quantized version (4-bit) of the full Maistros model.‼️
We introduce Maistros-8B-Instruct, a Greek-adapted LLM based on mistralai/Ministral-3-8B-Instruct-2512-BF16, fine-tuned with Low-Rank Adaptation (LoRA) on CulturaQA.
For information on model training, validation, and evaluation, as well as the model's limitations, see the arXiv preprint.
Model Information
- 256k context length (approx. 150,000 Greek words).
- We extend the training of Ministral-3-8B-Instruct-2512-BF16 with Greek linguistic and cultural knowledge from the training split of CulturaQA.
- We use LoRA fine-tuning to mitigate catastrophic forgetting and retain the base model's capabilities.
- We merge the adapted weights from LoRA fine-tuning into the base model to produce Maistros-8B-Instruct, a specialized Greek LLM (see the sketch after this list).
- Maistros-8B-Instruct achieves state-of-the-art performance on most Greek QA datasets when compared to other open-weight models.
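As a rough illustration of the adaptation recipe described above, the following sketch attaches LoRA adapters to the base model with the PEFT library and merges them back after training. The rank, alpha, target modules, and output path are hypothetical placeholders, not the exact training configuration used for Maistros.

```python
from transformers import Mistral3ForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Load the full-precision base model (same class as in the usage example below).
base_model = Mistral3ForConditionalGeneration.from_pretrained(
    'mistralai/Ministral-3-8B-Instruct-2512-BF16', torch_dtype='bfloat16'
)

# Attach low-rank adapters; rank, alpha, and target modules are illustrative only.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    task_type='CAUSAL_LM'
)
peft_model = get_peft_model(base_model, lora_config)

# ... fine-tune peft_model on the CulturaQA training split here ...

# Merge the adapter weights back into the base weights and save the merged model.
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained('Maistros-8B-Instruct')
```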
Evaluation
For evaluation, we report accuracy for the multiple-choice datasets and BERTScore F1 (%) for the open-ended CulturaQA (a minimal sketch of both metrics follows the results table). All compared models are the instruct versions of the abbreviated names below.
| Model | DemosQA | GPCR | INCLUDE | Greek ASEP MCQA | Greek Medical MCQA | Plutus QA | Greek Truthful QA | Greek MMLU (Greek-specific) | CulturaQA |
|---|---|---|---|---|---|---|---|---|---|
| **Open-Weights Models** | | | | | | | | | |
| Maistros 8B | 50.83 | 64.42 | 58.70 | 67.25 | 49.54 | 73.33 | 53.37 | 78.17 | 71.99 |
| Ministral 3 8B | 51.67 | 59.62 | 54.17 | 63.25 | 47.92 | 65.33 | 52.51 | 76.23 | 71.03 |
| Krikri 8B | 49.50 | 54.81 | 50.54 | 63.08 | 45.37 | 64.44 | 54.83 | 71.04 | 71.31 |
| Plutus 8B | 45.67 | 50.00 | 48.37 | 62.92 | 39.35 | 57.33 | 34.52 | 70.38 | 67.44 |
| EuroLLM v2 9B | 41.50 | 53.85 | 39.13 | 46.08 | 31.71 | 42.67 | 36.72 | 58.17 | 70.33 |
| Gemma 3n E4B | 47.17 | 60.10 | 50.00 | 57.75 | 43.75 | 53.78 | 46.76 | 71.39 | 69.10 |
| Qwen 3 8B | 48.83 | 31.73 | 49.28 | 54.58 | 36.64 | 63.56 | 42.72 | 67.57 | 68.73 |
| **Proprietary Models** | | | | | | | | | |
| Gemini 3 flash | 55.67 | 88.46 | 88.77 | 94.75 | 92.82 | 89.78 | 88.62 | 95.03 | 73.97 |
| GPT-5 mini | 53.00 | 77.40 | 74.46 | 78.92 | 78.01 | 76.89 | 75.89 | 87.49 | 75.09 |
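For reference, here is a minimal sketch of how the two metrics can be computed from model predictions. The `bert-score` package and the Greek language setting are assumptions about tooling, not necessarily the exact evaluation setup of the preprint; dataset loading and prediction code is omitted.

```python
from bert_score import score

def multiple_choice_accuracy(predictions, references):
    """Percentage of questions where the predicted choice matches the gold choice."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

def open_ended_bertscore_f1(candidates, references):
    """BERTScore F1 (%) between generated answers and reference answers.
    lang='el' selects a multilingual backbone; the paper's exact scorer may differ."""
    _, _, f1 = score(candidates, references, lang='el')
    return 100.0 * f1.mean().item()
```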
How to load and run the model
Use the following code to run the model locally, or serve it with vLLM (see the sketch after the code example).
```python
from transformers import AutoTokenizer, Mistral3ForConditionalGeneration, set_seed
# Set the model path, device and a random seed for reproducibility.
model_path = 'IMISLab/Maistros-8B-Instruct-4bit'
device = 'cuda'
set_seed(42)
# Loading the model tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code = True)
# Causal Language Models predict tokens from left to right and use EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'
# Load the model from the path to the device and set it in evaluation mode.
model = Mistral3ForConditionalGeneration.from_pretrained(model_path, device_map = device, trust_remote_code = True)
model.eval()
# Set the system, instruction and user prompts.
system_prompt = 'Είσαι ο Μαΐστρος, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για την Ελληνική γλώσσα.\nΈχεις δημιουργηθεί απο το IMIS Lab του Πανεπιστημιού Πατρών.'
instruction_prompt = 'Παρακαλώ απάντησε στην παρακάτω ερώτηση.'
user_prompt = 'Τι είναι η Ακρόπολη των Αθηνών;'
# Defining the message template.
messages = [
{'role': 'system', 'content': [{'type': 'text', 'text': system_prompt}]},
{'role': 'user', 'content': [{'type': 'text', 'text': '\n\n'.join((instruction_prompt, user_prompt))}]}
]
# Applying the tokenizer chat template.
tokenized = tokenizer.apply_chat_template(
messages,
add_generation_prompt = True,
return_tensors = 'pt',
return_dict = True
)
# Sending the tokenized instances to the device.
tokenized = {k: v.to(device) for k, v in tokenized.items()}
input_len = len(tokenized['input_ids'][0])
# Generating the model output.
output = model.generate(
**tokenized,
max_new_tokens = 1024,
do_sample = False, # Equivalent to temperature = 0.0
temperature = None,
top_p = None,
top_k = None
)
# Decoding the assistant part of the output and printing it.
decoded_output = tokenizer.decode(output[0][input_len:], skip_special_tokens = True)
print(decoded_output)
```
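For serving, a minimal vLLM sketch is shown below. It assumes the checkpoint's 4-bit quantization format is supported by your vLLM build, and it mirrors the greedy decoding settings of the transformers example above.

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint with vLLM
# (assumes its 4-bit quantization format is supported by your vLLM build).
llm = LLM(model='IMISLab/Maistros-8B-Instruct-4bit')

# Greedy decoding, mirroring the transformers example above.
sampling_params = SamplingParams(temperature=0.0, max_tokens=1024)

# The same system/instruction/user prompts as in the example above can be reused here.
messages = [
    {'role': 'user', 'content': 'Τι είναι η Ακρόπολη των Αθηνών;'}
]

# llm.chat applies the model's chat template before generation.
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
```

Alternatively, `vllm serve IMISLab/Maistros-8B-Instruct-4bit` starts an OpenAI-compatible server for the same checkpoint.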
Contact
If you have any questions or feedback about the model, please e-mail one of the following authors:
giarelis@ceid.upatras.gr
cmastrokostas@ac.upatras.gr
karacap@upatras.gr
Citation
```bibtex
@misc{giarelis2026maistrosgreeklargelanguage,
title = {Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models},
author = {Nikolaos Giarelis and Charalampos Mastrokostas and Nikos Karacapilidis},
year = {2026},
eprint = {2605.01870},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2605.01870},
}
```