Models (22,093)

Active filters: grpo

SofiTesfay2010/scientific-reasoning-training

Updated 7 days ago • 2

williyam/agentic-rag-aerospace-grpo

Text Generation • Updated 5 days ago • 94 • 2

munish0838/consultenv-qwen3b-grpo-lora

Text Generation • Updated 5 days ago • 27 • 2

creovateHQ/Qwen2.5-3B-Instruct_BrowserForge_Adapter

Text Generation • Updated 5 days ago • 9 • 2

mradermacher/Luau-Devstral-24B-Instruct-v0.2-i1-GGUF

24B • Updated Dec 10, 2025 • 159 • 1

Jarrodbarnes/KernelBench-RLVR-120b

Text Generation • 117B • Updated Feb 10 • 19 • 3

mradermacher/Poe-8B-GLM5-Opus4.6-Sonnet4.5-Kimi-Grok-Gemini-3-pro-preview-HERETIC-GGUF

8B • Updated Mar 15 • 2.89k • 6

infraxa/Qwen3.5-Trading-Agent

Text Generation • 35B • Updated Mar 23 • 97 • 5

dennisonb/qwen25-tax-3b

Reinforcement Learning • 3B • Updated Mar 27 • 13 • 1

jordanpainter/diallm-llama-grpo-all

Text Generation • 8B • Updated 13 days ago • 313 • 1

jordanpainter/diallm-qwen-grpo-all

Text Generation • 8B • Updated 13 days ago • 430 • 1

Huggggooo/ProtoCycle-7B

Text Generation • 8B • Updated 12 days ago • 780 • 1

mradermacher/diallm-qwen-grpo-all-GGUF

8B • Updated 12 days ago • 615 • 1

mradermacher/diallm-llama-grpo-all-GGUF

8B • Updated 12 days ago • 783 • 1

mradermacher/ProtoCycle-7B-GGUF

Reinforcement Learning • 8B • Updated 12 days ago • 430 • 1

munish0838/cenv-trl-grpo-v1

Text Generation • Updated 7 days ago • 21 • 1

lucifer0077/code-review-agent-grpo

Text Generation • Updated 4 days ago • 15 • 1

rishi38/smart_emergency

Updated 5 days ago • 1

DGXAI/gemma-3n-e2b-driftcall-lora

Text Generation • Updated 5 days ago • 62 • 1

pvs333/supergames-grpo

Text Generation • 2B • Updated 5 days ago • 115 • 1

eressss/among-agents-qwen-1.5b-finetuned

Text Generation • Updated 5 days ago • 1

anshumanatrey/pharmarl-llama-3b-trained-anshuman

Text Generation • Updated 2 days ago • 33 • 1

christian-machine-intelligence/rlcf-icmi-018-adapters

Text Generation • Updated 4 days ago • 1

mindlab-research/Macaron-A2UI-Tall

Text Generation • Updated 1 day ago • 20 • 1

khanh2023/discrete-logarithm-Qwen3-4B-tl16384-cl16384-b16-lora64-test

Updated about 1 hour ago • 1

Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF

1B • Updated Jan 28, 2025 • 256 • 5

Chun121/Qwen3-4B-RPG-Roleplay-V2

Text Generation • 4B • Updated Aug 24, 2025 • 16.8k • 52

onuryozcu/llama

Text Generation • 0.1B • Updated Mar 10, 2025 • 16

amiguel/promptTuning

8B • Updated Feb 16, 2025 • 2

sergiopaniego/Qwen2-0.5B-GRPO-test

Updated Oct 3, 2025