Mixture of Inference (MoI) – Summary
The Problem
Current AI systems handle multiple skills (writing, coding, research, etc.) either by injecting skill instructions as text into the prompt or by routing tasks to separate specialized AI agents. Both approaches are slow and expensive because skills are processed one at a time, never influencing each other during generation.
The Proposed Solution
MoI proposes encoding skills as lightweight LoRA adapters (small low-rank weight modifications) that run simultaneously inside a single forward pass through the model, rather than sequentially. Think of it as multiple specialists working together in real time versus passing a document around one at a time.
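To make the idea concrete, here is a minimal sketch of several LoRA adapters contributing to one forward pass. The shapes, rank, and PEFT-style A/B convention are illustrative assumptions, not details from the proposal:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # hidden size and LoRA rank (illustrative values)

W = rng.normal(size=(d, d)) * 0.1  # frozen base weight

# Six hypothetical skill adapters, each a low-rank pair (A: d x r, B: r x d).
# B is given small random values here purely so each adapter visibly
# contributes; standard LoRA training would initialize B to zero.
adapters = [
    (rng.normal(size=(d, r)) * 0.1, rng.normal(size=(r, d)) * 0.1)
    for _ in range(6)
]

x = rng.normal(size=(1, d))  # one token's activation

# Single forward pass: the base path plus every adapter's low-rank path
# are computed on the same activations -- no sequential per-skill calls.
h = x @ W + sum((x @ A) @ B for A, B in adapters)
```

Because every adapter reads the same activations `x`, adding a skill adds only two small matrix multiplies rather than another full model invocation.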
How It Works
Six skill adapters are organized into three groups (Factual, Form, Technical) and merged in two stages as information flows through the model: first within groups (at roughly one-third of the model's depth), then across groups (at two-thirds depth), producing one unified output.
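The two-stage merge can be sketched as follows. The proposal does not specify the merge operator, so simple averaging of the low-rank weight deltas is assumed here; the group names come from the text, while all shapes and values are illustrative:

```python
import numpy as np

def merge(deltas):
    # Assumed merge operator: plain averaging of weight deltas.
    # The actual operator is an open design choice in the proposal.
    return sum(deltas) / len(deltas)

rng = np.random.default_rng(1)
d, r = 16, 2  # illustrative hidden size and LoRA rank

# Six skill adapters, each represented by its low-rank update A @ B.
skill_deltas = [
    (rng.normal(size=(d, r)) @ rng.normal(size=(r, d))) * 0.01
    for _ in range(6)
]

# Stage 1 (applied around 1/3 of model depth): merge within each group.
groups = {
    "factual":   merge(skill_deltas[0:2]),
    "form":      merge(skill_deltas[2:4]),
    "technical": merge(skill_deltas[4:6]),
}

# Stage 2 (applied around 2/3 of model depth): merge across the groups
# into one unified delta that shapes the remaining layers' output.
unified = merge(list(groups.values()))
```

Averaging is the simplest placeholder; weighted or learned merging would slot into `merge` without changing the two-stage structure.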
Key Claimed Benefits
- Eliminates the "token tax" of putting skill instructions in every prompt
- True parallel processing instead of sequential
- Skills influence each other during generation, not just at the end
- Cost scales sublinearly as you add more skills
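The sublinear-cost claim rests on adapter size: each LoRA adapter adds only two thin matrices, a tiny fraction of a full weight matrix. A back-of-the-envelope check, with hidden size and rank chosen for illustration (not figures from the proposal):

```python
d, r = 4096, 8  # illustrative hidden size and LoRA rank

full_matrix = d * d        # parameters in one full d x d weight matrix
per_skill = 2 * d * r      # one adapter: A (d x r) plus B (r x d)

ratio = per_skill / full_matrix
print(ratio)  # 0.00390625 -- each added skill costs under 0.4% of one matrix
```

At these sizes, even six adapters together add about 2% of one layer's parameters, which is why per-skill cost grows far more slowly than adding whole models or agents.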
Important Caveats
To be transparent: this is an unproven proposal. The biggest open question is whether independently trained adapters can be merged without degrading output quality. I am actively seeking collaborators, compute resources, and funding to test it.
Bottom Line
It's a theoretically grounded, intellectually honest research proposal that could meaningfully reduce the cost and latency of multi-skill AI systems, if the core assumptions hold up empirically. Would this be a project you would consider collaborating on? Email Sam at Samsmbn@aol.com.