Model Card for devstral-v3-sft

This model is a fine-tuned version of unsloth/Devstral-Small-2507-unsloth-bnb-4bit. It has been trained using TRL.

Quick start

from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load the fine-tuned model from the Hub; device="cuda" requires a GPU.
generator = pipeline("text-generation", model="clemsail/devstral-v3-sft", device="cuda")
# Pass the prompt as a chat message and return only the newly generated text.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])

Training procedure

This model was trained with SFT.
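
As a rough illustration of what supervised fine-tuning optimises, the sketch below computes a next-token cross-entropy averaged over the supervised positions. It is a toy in plain Python, not the training code behind this model: the function name `sft_loss`, the `loss_mask` argument and all numbers are hypothetical, and the card does not say which positions were included in the loss.

```python
import math


def sft_loss(token_probs, target_ids, loss_mask=None):
    """Mean negative log-likelihood of the target tokens.

    token_probs[t] is the model's probability distribution over the vocabulary
    at position t, and target_ids[t] is the token that should come next.
    loss_mask (hypothetical) marks which positions contribute to the loss.
    """
    if loss_mask is None:
        loss_mask = [1] * len(target_ids)
    total, count = 0.0, 0
    for probs, target, keep in zip(token_probs, target_ids, loss_mask):
        if keep:
            total += -math.log(probs[target])
            count += 1
    return total / count if count else 0.0


# Toy example with a 3-token vocabulary and two positions (made-up numbers).
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
targets = [0, 2]
print(round(sft_loss(probs, targets), 4))
```

Passing `loss_mask=[0, 1]` would restrict the average to the second position, which is how a prompt could be excluded from the loss if that were desired.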

Framework versions

  • PEFT: 0.18.1
  • TRL: 0.24.0
  • Transformers: 5.5.0
  • PyTorch: 2.10.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2
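
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch only: the PyPI distribution names (for example `torch` for PyTorch) are assumptions, and the helper names `installed_version` and `version_mismatches` are hypothetical.

```python
from importlib import metadata

# Versions copied from the list above; the distribution names are assumptions.
PINNED = {
    "peft": "0.18.1",
    "trl": "0.24.0",
    "transformers": "5.5.0",
    "torch": "2.10.0",
    "datasets": "4.3.0",
    "tokenizers": "0.22.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Local build suffixes such as "+cu128" are ignored for the comparison.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in version_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found}")
```

A mismatch does not necessarily mean the model will fail to load; the list records the versions used for training.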

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

devstral-v3-sft

🇪🇺 EU AI Act transparency

This model is published under the EU AI Act framework (Regulation (EU) 2024/1689).

| Field | Value |
| --- | --- |
| Provider | L'Électron Rare (clemsail) |
| Role under AI Act | GPAI provider |
| Adapter type | LoRA / PEFT (supervised fine-tune adapter) |
| Base model | mistralai/Devstral-Small-2-24B-Instruct-2512 |
| License | Apache-2.0 (this artefact); upstream Mistral licence applies separately |
| Intended use | Code generation across Python / Rust / TypeScript / C++ / SQL / shell, with stronger reasoning on engineering questions |
| Out of scope | Healthcare diagnosis, legal advice, autonomous safety-critical decisions, generation of malicious code or exploits |
| Risk classification | Limited risk; Article 50 transparency obligations apply |
| Copyright respect | Training data does not include scraped copyrighted material. It consists of public engineering documentation under permissive licences plus internal synthetic distillation. |
| Full provenance | https://github.com/L-electron-Rare/eu-kiki/tree/main/docs/provenance |
| Contact | postmaster@saillant.cc |

⚠️ You are using an AI model. Outputs may be inaccurate, biased or fabricated. Do not act on them without independent verification, especially in regulated domains.

Benchmarks

Evaluated with lm-eval-harness v0.4.x on the fused checkpoint (the base model with this adapter merged for inference). Scores use strict-match where applicable.
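
Merging a LoRA adapter into its base model, as mentioned above, amounts to adding the scaled low-rank update to each adapted weight matrix. The toy below shows that arithmetic for a single linear layer in plain Python. It is an illustration under assumptions, not the procedure used for these benchmarks: the helper names, the `alpha` and `rank` values and all matrix entries are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the merged weight of one linear layer."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy layer: 2 outputs, 3 inputs, rank-1 adapter. All numbers are made up.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]          # shape (rank, in_features)
B = [[0.5], [-1.0]]            # shape (out_features, rank)
x = [[1.0], [1.0], [1.0]]      # one input column vector

merged = merge_lora(W, A, B, alpha=2.0, rank=1)

# Unmerged forward pass: base output plus scaled adapter output (scale = 2.0).
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]

# The merged layer reproduces the base-plus-adapter output.
print(matmul(merged, x) == unmerged)
```

After merging, inference needs only the single merged matrix per layer, so the adapter adds no extra computation at run time.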

| Task | Metric | Score |
| --- | --- | --- |
| gsm8k | exact_match,strict-match | 0.844 |
| ifeval | prompt_level_strict_acc,none | 0.691 |
| bbh_cot_fewshot | exact_match,get-answer | 0.795 |
| bbh_cot_fewshot_boolean_expressions | exact_match,get-answer | 0.900 |
| bbh_cot_fewshot_causal_judgement | exact_match,get-answer | 0.600 |
| bbh_cot_fewshot_date_understanding | exact_match,get-answer | 0.933 |
| bbh_cot_fewshot_disambiguation_qa | exact_match,get-answer | 0.767 |
| bbh_cot_fewshot_dyck_languages | exact_match,get-answer | 0.100 |
| bbh_cot_fewshot_formal_fallacies | exact_match,get-answer | 0.600 |
| bbh_cot_fewshot_geometric_shapes | exact_match,get-answer | 0.367 |
| bbh_cot_fewshot_hyperbaton | exact_match,get-answer | 1.000 |
| bbh_cot_fewshot_logical_deduction_five_objects | exact_match,get-answer | 0.767 |
| bbh_cot_fewshot_logical_deduction_seven_objects | exact_match,get-answer | 0.533 |
| bbh_cot_fewshot_logical_deduction_three_objects | exact_match,get-answer | 0.900 |
| bbh_cot_fewshot_movie_recommendation | exact_match,get-answer | 0.833 |
| bbh_cot_fewshot_multistep_arithmetic_two | exact_match,get-answer | 0.867 |
| bbh_cot_fewshot_navigate | exact_match,get-answer | 0.967 |
| bbh_cot_fewshot_object_counting | exact_match,get-answer | 0.967 |
| bbh_cot_fewshot_penguins_in_a_table | exact_match,get-answer | 0.933 |
| bbh_cot_fewshot_reasoning_about_colored_objects | exact_match,get-answer | 0.967 |
| bbh_cot_fewshot_ruin_names | exact_match,get-answer | 0.667 |
| bbh_cot_fewshot_salient_translation_error_detection | exact_match,get-answer | 0.700 |
| bbh_cot_fewshot_snarks | exact_match,get-answer | 0.700 |
| bbh_cot_fewshot_sports_understanding | exact_match,get-answer | 0.900 |
| bbh_cot_fewshot_temporal_sequences | exact_match,get-answer | 0.967 |
| bbh_cot_fewshot_tracking_shuffled_objects_five_objects | exact_match,get-answer | 0.967 |
| bbh_cot_fewshot_tracking_shuffled_objects_seven_objects | exact_match,get-answer | 0.933 |
| bbh_cot_fewshot_tracking_shuffled_objects_three_objects | exact_match,get-answer | 0.967 |
| bbh_cot_fewshot_web_of_lies | exact_match,get-answer | 1.000 |
| bbh_cot_fewshot_word_sorting | exact_match,get-answer | 0.667 |
| mmlu_pro | exact_match,custom-extract | 0.619 |
| mmlu_pro_biology | exact_match,custom-extract | 0.768 |
| mmlu_pro_business | exact_match,custom-extract | 0.660 |
| mmlu_pro_chemistry | exact_match,custom-extract | 0.580 |
| mmlu_pro_computer_science | exact_match,custom-extract | 0.676 |
| mmlu_pro_economics | exact_match,custom-extract | 0.678 |
| mmlu_pro_engineering | exact_match,custom-extract | 0.448 |
| mmlu_pro_health | exact_match,custom-extract | 0.678 |
| mmlu_pro_history | exact_match,custom-extract | 0.575 |
| mmlu_pro_law | exact_match,custom-extract | 0.432 |
| mmlu_pro_math | exact_match,custom-extract | 0.678 |
| mmlu_pro_other | exact_match,custom-extract | 0.612 |
| mmlu_pro_philosophy | exact_match,custom-extract | 0.549 |
| mmlu_pro_physics | exact_match,custom-extract | 0.630 |
| mmlu_pro_psychology | exact_match,custom-extract | 0.704 |
| leaderboard_math_hard | exact_match,none | 0.341 |
| leaderboard_math_algebra_hard | exact_match,none | 0.570 |
| leaderboard_math_counting_and_prob_hard | exact_match,none | 0.252 |
| leaderboard_math_geometry_hard | exact_match,none | 0.182 |
| leaderboard_math_intermediate_algebra_hard | exact_match,none | 0.139 |
| leaderboard_math_num_theory_hard | exact_match,none | 0.416 |
| leaderboard_math_prealgebra_hard | exact_match,none | 0.523 |
| leaderboard_math_precalculus_hard | exact_match,none | 0.126 |

Raw results_*.json files are committed under evals/.
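
For readers who want to inspect those raw files, the sketch below flattens an lm-eval style report into (task, metric, score) rows. The JSON layout is an assumption based on the metric names in the table (a top-level `results` mapping from task to metric values), the helper name `flatten_results` is hypothetical, and the inline sample stands in for a real file using two scores copied from the table.

```python
import json


def flatten_results(report):
    """Return (task, metric, score) rows from an lm-eval style report dict."""
    rows = []
    for task, metrics in sorted(report.get("results", {}).items()):
        for metric, value in metrics.items():
            # Skip non-numeric entries (e.g. "alias") and standard-error fields.
            if isinstance(value, (int, float)) and "_stderr" not in metric:
                rows.append((task, metric, value))
    return rows


# Inline stand-in for one results_*.json file; the layout is an assumption,
# the scores are copied from the benchmark table.
sample = json.loads("""
{"results": {
  "gsm8k": {"alias": "gsm8k", "exact_match,strict-match": 0.844},
  "ifeval": {"alias": "ifeval", "prompt_level_strict_acc,none": 0.691}
}}
""")

for task, metric, score in flatten_results(sample):
    print(f"{task} {metric} {score:.3f}")
```

To use it on a real file, replace the inline sample with `json.load` on an open file handle from the evals/ directory.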
