Exciting week: 2 new research projects and 2 new tools!
▶ Mythic-RDT - OpenMythos blueprint with a retrofit-recurrence fine-tune
https://github.com/mann1x/Mythic-RDT
▶ cross-tokenizer-distill (CTD) - knowledge distillation across different tokenizer vocabularies
https://github.com/mann1x/cross-tokenizer-distill
For Mythic-RDT, I have chosen the pretty outdated DS-Coder-V2 16B.
It's small enough to fit in 48GB of VRAM, but once I leaned on KL for the depth-recurrence fine-tune (I couldn't get above parity with T=1 when running at T=4, not great for 4x the inference time), I started investigating the KL recipe and questioning the teacher: the same DS-Coder-V2, just at BF16.
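For context, the objective I mean is roughly the following (a minimal PyTorch sketch of KL distillation with depth recurrence; the embed/block/lm_head split and the recurrence loop are illustrative assumptions, not the actual Mythic-RDT code):

```python
import torch
import torch.nn.functional as F

def kl_distill_loss(student_logits, teacher_logits, tau=1.0):
    """KL(teacher || student) over next-token distributions.
    tau is the softmax temperature; the tau**2 factor keeps the
    gradient scale comparable across temperatures."""
    s_logp = F.log_softmax(student_logits / tau, dim=-1)
    t_prob = F.softmax(teacher_logits / tau, dim=-1)
    return F.kl_div(s_logp, t_prob, reduction="batchmean") * tau ** 2

def train_step(student, teacher, input_ids, T=4):
    # BF16 teacher: a single plain forward pass, no recurrence.
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits
    # Student: reuse the same block T times (depth recurrence);
    # this is why T=4 costs roughly 4x at inference time.
    # embed/block/lm_head are a hypothetical module split.
    hidden = student.embed(input_ids)
    for _ in range(T):
        hidden = student.block(hidden)
    student_logits = student.lm_head(hidden)
    return kl_distill_loss(student_logits, teacher_logits)
```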
For a better teacher there would have been exactly one option, DS-Coder-V2-236B: not only so big that I'd need 4x H100 to run it, but also surpassed even by Qwen3-Coder-32B on HE/MBPP.
Hence the CTD tool: validated, but still in development while I look for a good recipe for the Qwen->DS distill.
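The core obstacle CTD has to deal with: with two different vocabularies you can't take a token-wise KL directly. A common first step (a generic sketch of the idea, not necessarily the exact recipe in the repo; the model ids are only illustrative) is to align the two tokenizations by character offsets and distill only at positions where both tokenizers close a span on the same character, i.e. where both models predict from the identical prefix:

```python
from transformers import AutoTokenizer

def aligned_positions(text, tok_a, tok_b):
    """Pairs (i, j) such that token i of tok_a and token j of tok_b
    end at the same character offset in `text`. Only there do the two
    next-token distributions condition on the same prefix, so a
    cross-vocabulary KL is meaningful at all."""
    enc_a = tok_a(text, return_offsets_mapping=True, add_special_tokens=False)
    enc_b = tok_b(text, return_offsets_mapping=True, add_special_tokens=False)
    ends_b = {end: j for j, (_, end) in enumerate(enc_b["offset_mapping"])}
    return [(i, ends_b[end])
            for i, (_, end) in enumerate(enc_a["offset_mapping"])
            if end in ends_b]

# Illustrative teacher/student pair (needs fast tokenizers for offsets):
teacher_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
student_tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base")
pairs = aligned_positions("def fib(n): return n if n < 2 else fib(n-1) + fib(n-2)",
                          teacher_tok, student_tok)
```

Even at aligned positions the two distributions still live in different vocab spaces, so the loss itself needs a further trick on top (e.g. matching sorted top-k probabilities, ULD-style).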
▶ Qwen3.5-4B-MicroCoder - code-leaning and reasoning merge of Qwen3.5-4B
ManniX-ITA/Qwen3.5-4B-MicroCoder
▶ Omnimergekit - merge toolkit, merge & quantization scripts, experiment logs
https://github.com/mann1x/omnimergekit
My merge toolkit and scripts live in this repo, so they don't get scattered across the HF model repos.
MicroCoder was an interesting experiment: there were only a couple of coding fine-tunes of the base (with broken reasoning) available to merge with the excellent instruct-reasoning JackRong-v2.
The result is not truly exciting, but it manages to lift LiveCodeBench above JR-v2, improve MBPP, and not completely break reasoning.
This is achieved with omnimergekit using differential signals: each source's delta vs the base model, derived from the gap between good and wrong answers across the sources (HE/MBPP/AIME).
The very long eval sessions showed that the method does not just bias the scores on those evals, but also improves others, even above the baseline.
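As a rough picture of what I mean by differential signals (a schematic sketch of the general idea, not omnimergekit's actual code; the softmax weighting is an assumption for illustration):

```python
import torch

def differential_merge(base_sd, source_sds, signals):
    """Merge task vectors (source minus base) with per-source weights.

    `signals[name]` is a scalar per source, e.g. its good-minus-wrong
    answer gap on HE/MBPP/AIME; softmax turns the gaps into weights,
    so sources that separate good from wrong answers more strongly
    contribute more of their delta."""
    names = list(source_sds)
    w = torch.softmax(torch.tensor([signals[n] for n in names]), dim=0)
    merged = {}
    for key, base_t in base_sd.items():
        delta = sum(wi * (source_sds[n][key] - base_t)
                    for wi, n in zip(w, names))
        merged[key] = base_t + delta
    return merged
```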