Free dataset optimizer/cleaner, plus fine-tuning and continual learning that actually doesn't forget

Community Article Published April 2, 2026

I built a fine-tuning and continual learning platform called ModelBrew, and wanted to share two things we just shipped.


The dataset optimizer/cleaner (free)

I kept running into the same problem: I'd fine-tune a model, get garbage outputs, and spend hours figuring out it was a dataset issue. Duplicate rows, GPT slop baked into the training data, encoding junk, rows where the response had nothing to do with the instruction.

So I built a scanner. You drop in a JSONL/CSV/JSON file and it flags everything — near-duplicates, PII, slop phrases, incoherent pairs, placeholder answers, the works. 30+ checks. Then you can one-click fix most of it and export in whatever format you need (OpenAI chat, Alpaca, HF messages, etc).
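To make the kinds of checks concrete, here's a minimal sketch of a few of them. This is an illustration, not ModelBrew's actual code: the check names, the slop phrase list, and the thresholds are all assumptions.

```python
import hashlib
import re

# Toy phrase list; a real scanner would use a much larger, curated set.
SLOP_PHRASES = ["as an ai language model", "i hope this helps", "delve into"]
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PLACEHOLDERS = {"todo", "n/a", "...", ""}

def normalize(text):
    # Lowercase and collapse whitespace so trivial variants hash identically.
    return re.sub(r"\s+", " ", text.lower()).strip()

def scan(rows):
    """Flag duplicate, sloppy, PII-bearing, or placeholder responses."""
    seen, flags = set(), []
    for i, row in enumerate(rows):
        issues = []
        resp = row.get("response", "")
        key = hashlib.sha256(normalize(resp).encode()).hexdigest()
        if key in seen:
            issues.append("near_duplicate")
        seen.add(key)
        if any(p in resp.lower() for p in SLOP_PHRASES):
            issues.append("slop_phrase")
        if EMAIL_RE.search(resp):
            issues.append("pii_email")
        if resp.strip().lower() in PLACEHOLDERS:
            issues.append("placeholder")
        if issues:
            flags.append((i, issues))
    return flags

rows = [
    {"instruction": "Sum 2+2", "response": "4"},
    {"instruction": "Sum 2 + 2", "response": " 4 "},  # duplicate after normalization
    {"instruction": "Explain DNS", "response": "As an AI language model, I..."},
    {"instruction": "Contact?", "response": "mail me at bob@example.com"},
    {"instruction": "Fill later", "response": "TODO"},
]
print(scan(rows))
# [(1, ['near_duplicate']), (2, ['slop_phrase']), (3, ['pii_email']), (4, ['placeholder'])]
```

The real tool goes further (embedding-based near-duplicate detection, instruction/response coherence scoring), but the shape is the same: every row gets a list of named issues you can then fix or filter on.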

It's free. Been using it on everything from 20-row test sets to 100K-row production datasets.

app.modelbrew.ai/clean — just try it, takes 10 seconds.


Continual learning without forgetting

This is the part I'm most excited about. The standard approach to adding a new domain to a fine-tuned model is... fine-tune again and pray it doesn't forget everything. It always does. The usual alternatives are retraining from scratch on all the data, or falling back to RAG.

We built something different. You train domain 1, then chain domain 2 on top, then domain 3, and so on. Each domain gets its own adapter. At inference, a classifier picks the right one.
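Here's a conceptual sketch of that routing idea. The keyword classifier and the adapter handling below are stand-ins I made up for illustration; the actual system's classifier and adapter format (e.g. LoRA adapters activated on the base model via something like PEFT's `set_adapter`) are not shown in the post.

```python
# Toy keyword scorer standing in for the real domain classifier.
DOMAIN_KEYWORDS = {
    "medical": {"patient", "diagnosis", "dosage"},
    "finance": {"portfolio", "interest", "equity"},
    "real_estate": {"mortgage", "listing", "escrow"},
}

class AdapterRouter:
    def __init__(self):
        self.adapters = {}  # domain -> adapter weights (opaque here)

    def add_domain(self, domain, adapter):
        # Training a new domain only ADDS an adapter; earlier adapters are
        # frozen and untouched, which is why prior domains don't degrade.
        self.adapters[domain] = adapter

    def route(self, prompt):
        words = set(prompt.lower().split())
        scores = {d: len(words & kw) for d, kw in DOMAIN_KEYWORDS.items()}
        return max(scores, key=scores.get)

    def generate(self, prompt):
        domain = self.route(prompt)
        # A real stack would activate this domain's adapter on the base
        # model before generating; here we just return the selection.
        return domain, self.adapters[domain]

router = AdapterRouter()
for d in DOMAIN_KEYWORDS:
    router.add_domain(d, f"{d}-adapter")

print(router.generate("What dosage should the patient receive?"))
# ('medical', 'medical-adapter')
```

The key property is that adding domain N never touches the weights for domains 1..N-1, so forgetting can only come from a routing mistake, not from gradient interference.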

Here's what a 5-domain chain on Mistral-7B actually looks like:

  • Medical → Enterprise → Finance → Military → Real Estate
  • 26/31 correct (84%) across all domains
  • Medical accuracy held at 74% through all 5 phases — zero catastrophic forgetting
  • Routing: 31/31 correct — never picked the wrong domain
  • 5th domain loss: 0.0098

Why I'm posting

Honestly, two reasons.

First — the dataset optimizer is genuinely useful and I want people to try it. If you've ever spent an afternoon debugging why your fine-tune sounds like Claude/ChatGPT wrote every answer, this catches that stuff automatically.

Second, I think there's something interesting in making continual learning accessible as a tool, not just a research paper. We have a few papers on gradient stability and zero-forgetting continual learning nearly ready to publish. I'd love to know whether this could work as an open integration, maybe a PEFT-compatible trainer or a Space.

If any of this is interesting: modelbrew.ai / modelbrewai@gmail.com

Happy to answer questions in the comments.
