Models (69)

Active filters: jailbreak-detection

rogue-security/prompt-injection-jailbreak-sentinel-v2

Text Classification • 0.6B • Updated Mar 11 • 17.7k • 30

Necent/distilbert-base-uncased-detected-jailbreak

Text Classification • 67M • Updated May 29, 2025 • 70

madhurjindal/Jailbreak-Detector

Text Classification • 65.8M • Updated May 30, 2025 • 2.63k

madhurjindal/Jailbreak-Detector-Large

Text Classification • 0.3B • Updated May 30, 2025 • 211 • 3

GuardrailsAI/prompt-saturation-attack-detector

Text Classification • 4.39M • Updated Nov 14, 2024 • 45.7k • 2

qualifire/prompt-injection-sentinel

Text Classification • 0.4B • Updated Sep 22, 2025 • 4.39k • 15

madhurjindal/Jailbreak-Detector-2-XL

Text Generation • Updated Jul 20, 2025 • 323 • 5

gincioks/cerberus-bert-base-un-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 4

gincioks/cerberus-distilbert-base-un-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 5

gincioks/cerberus-deberta-v3-small-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 2

gincioks/cerberus-proventra-mdeberta-v3-base-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 2

pmking27/jailbreak-detection

Text Classification • 0.3B • Updated Jun 19, 2025 • 38

intelliway/deberta-v3-base-prompt-injection-v2-mapa

Text Classification • 0.2B • Updated Jul 3, 2025 • 4

qualifire/prompt-injection-jailbreak-sentinel-v2-GGUF

0.6B • Updated Sep 28, 2025 • 10 • 1

ahmedmajid92/iraqi-guard-model

Text Classification • 0.3B • Updated Oct 9, 2025 • 5 • 1

rootfs/tool-call-verifier

Token Classification • 0.1B • Updated Dec 14, 2025 • 10

rootfs/function-call-sentinel

Text Classification • 0.1B • Updated Dec 14, 2025 • 3

vincentoh/jailbreak-detector-v5

Text Classification • Updated Dec 18, 2025 • 1

thirtyninetythree/deberta-prompt-guard

Text Classification • 0.2B • Updated Dec 22, 2025 • 4

llm-semantic-router/toolcall-verifier

Token Classification • 0.1B • Updated Dec 18, 2025 • 15 • 1

llm-semantic-router/toolcall-sentinel

Text Classification • 0.1B • Updated Dec 18, 2025 • 19 • 1

llm-semantic-router/mmbert-jailbreak-detector-lora

Text Classification • Updated Jan 21 • 7

llm-semantic-router/mmbert-jailbreak-detector-merged

Text Classification • 0.3B • Updated Jan 21 • 179

abdulmunimjemal/Sentinel-Rail-A-Prompt-Attack-Guard

Text Classification • Updated Jan 21 • 1

llm-semantic-router/mmbert-safety-classifier-level1

Text Classification • Updated Jan 21 • 3

llm-semantic-router/mlcommons-safety-classifier-level1-binary

Text Classification • Updated Jan 22 • 15

ynyg/Unified_Prompt_Guard

0.3B • Updated Jan 28 • 1

llm-semantic-router/mmbert32k-jailbreak-detector-lora

Text Classification • Updated Feb 1 • 22

llm-semantic-router/mmbert32k-jailbreak-detector-merged

Text Classification • 0.3B • Updated Mar 6 • 4.72k

satyamg1620/mmbert32k-jailbreak-detector-healthcare-merged

Text Classification • 0.3B • Updated Feb 15 • 3