hikmaai-codebert-base-code-detection
A binary classifier that detects whether the input contains source code, fine-tuned from microsoft/codebert-base by HikmaAI.
Model Description
- Task: Binary classification (safe=0, threat=1, where "threat" = code detected)
- Base model: microsoft/codebert-base
- Export formats: ONNX FP32 + INT8 dynamic quantization
Performance
See model_card.json for detailed metrics.
Optimized threshold: 0.9950 (validation recall: 0.9984)
Usage (ONNX)
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the INT8 dynamically quantized ONNX export.
model = ORTModelForSequenceClassification.from_pretrained(
    "HikmaAI/hikmaai-codebert-base-code-detection",
    subfolder="onnx/int8",
)
# The tokenizer files are stored in their own subfolder of the same repository.
tokenizer = AutoTokenizer.from_pretrained(
    "HikmaAI/hikmaai-codebert-base-code-detection",
    subfolder="tokenizer",
)

inputs = tokenizer("def hello():\n print('hi')", return_tensors="pt")
outputs = model(**inputs)
# outputs.logits has shape (1, 2): [safe_score, threat_score]
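
The logits are raw scores, so they need to be converted to a probability before the optimized threshold from the Performance section can be applied. The sketch below is one way to do that. It assumes the threshold of 0.9950 is compared against the softmax probability of class 1 ("threat"), which this card does not state explicitly. The helper names `threat_probability` and `contains_code` are hypothetical and not part of the released model.

```python
import math

# Optimized threshold reported in this card. Assumption: it applies to the
# softmax probability of class 1 ("threat" = code detected).
THRESHOLD = 0.9950


def threat_probability(logits):
    """Return the softmax probability of class 1 for one [safe, threat] pair."""
    safe_logit, threat_logit = logits
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(safe_logit, threat_logit)
    e_safe = math.exp(safe_logit - m)
    e_threat = math.exp(threat_logit - m)
    return e_threat / (e_safe + e_threat)


def contains_code(logits, threshold=THRESHOLD):
    """Hypothetical helper: True when the threat probability reaches the threshold."""
    return threat_probability(logits) >= threshold
```

With the PyTorch tensors returned by the usage snippet, this could be called as `contains_code(outputs.logits[0].tolist())`.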
Training
- Epochs: 5
- Learning rate: 2e-05
- Batch size: 16
- Class weights: [1.0, 2.0]
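
The class weights above suggest a weighted cross-entropy loss, although this card does not say which loss function was used. As a minimal sketch under that assumption, the function below shows how weights of [1.0, 2.0] would make errors on class 1 ("threat") count twice as much as errors on class 0. It follows the weighted-mean convention of PyTorch's `CrossEntropyLoss` (sum of weighted losses divided by the sum of the weights of the target classes). The function name `weighted_cross_entropy` is hypothetical.

```python
import math

# Class weights reported in this card: index 0 = safe, index 1 = threat.
CLASS_WEIGHTS = [1.0, 2.0]


def weighted_cross_entropy(logits_batch, labels, weights=CLASS_WEIGHTS):
    """Weighted mean cross-entropy over a batch of [safe, threat] logit pairs."""
    total = 0.0
    weight_sum = 0.0
    for logits, label in zip(logits_batch, labels):
        # Numerically stable log-sum-exp of the logits.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        nll = log_z - logits[label]  # negative log-likelihood of the true class
        total += weights[label] * nll
        weight_sum += weights[label]
    return total / weight_sum
```

With these weights, a misclassified code sample contributes twice as much to the loss as a misclassified non-code sample, which pushes the model toward higher recall on the "threat" class.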
License
Apache-2.0
Citation
@misc{hikmaai-code_detection-2026,
title={hikmaai-codebert-base-code-detection},
author={HikmaAI},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/HikmaAI/hikmaai-codebert-base-code-detection}
}