CARS SciBERT Rhetorical Move Classifier
A fine-tuned allenai/scibert_scivocab_uncased model that predicts the three CARS rhetorical moves (Establishing Territory, Establishing Niche, Presenting Present Work) for individual sentences drawn from the introductions of academic articles.
Training Details
- Dataset: /Users/megankane/Documents/CARS_Classifier/coded_CARS_sentences.csv (Establishing Territory 823, Establishing Niche 509, Presenting Present Work 442)
- Split: 80% train / 20% eval, stratified by label
- Hyperparameters: max length 192, batch sizes 8/16 (train/eval), learning rate 2e-5, weight decay 0.01, warmup ratio 0.06, epochs 2, gradient accumulation 1, TrainingArguments(eval_strategy="epoch", save_strategy="epoch")
- Hardware: CPU fine-tuning via Hugging Face Trainer
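To make the split and schedule numbers concrete, here is a minimal standard-library sketch of an 80/20 stratified split over the reported label counts, followed by the step arithmetic implied by the listed hyperparameters. It is not the author's training code: the `stratified_split` helper, the seed, and the per-class rounding rule are assumptions made for illustration, so the actual split may differ by a sentence or two per class.

```python
import math
import random
from collections import Counter

# Label counts as reported in the training details.
LABEL_COUNTS = {
    "Establishing Territory": 823,
    "Establishing Niche": 509,
    "Presenting Present Work": 442,
}


def stratified_split(labels, eval_fraction=0.2, seed=0):
    """Return (train_idx, eval_idx), holding out eval_fraction of each label."""
    rng = random.Random(seed)  # arbitrary seed, chosen only for this sketch
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)
    train_idx, eval_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_eval = round(len(indices) * eval_fraction)  # assumed rounding rule
        eval_idx.extend(indices[:n_eval])
        train_idx.extend(indices[n_eval:])
    return sorted(train_idx), sorted(eval_idx)


# Stand-in label column with the same class sizes as the dataset.
labels = [name for name, n in LABEL_COUNTS.items() for _ in range(n)]
train_idx, eval_idx = stratified_split(labels)
eval_counts = Counter(labels[i] for i in eval_idx)

# Schedule arithmetic: train batch size 8, 2 epochs, accumulation 1,
# warmup ratio 0.06 (warmup steps rounded up).
steps_per_epoch = math.ceil(len(train_idx) / 8)
total_steps = steps_per_epoch * 2
warmup_steps = math.ceil(total_steps * 0.06)

print(len(train_idx), len(eval_idx), dict(eval_counts))
print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the split holds out 355 sentences and trains on 1,419, which works out to 356 optimizer steps with 22 warmup steps.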
Metrics (eval split)
| Metric | Value |
|---|---|
| Accuracy | 0.608 |
| Macro F1 | 0.577 |
| Establishing Niche F1 | 0.413 |
| Establishing Territory F1 | 0.699 |
| Presenting Present Work F1 | 0.617 |
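For context on these numbers: macro F1 is the unweighted mean of the per-class F1 scores, and a useful reference point for accuracy is the share of the largest class. The short check below recomputes both from figures on this card. It assumes the eval split mirrors the label proportions of the full dataset, which a stratified split should approximately guarantee.

```python
# Per-class F1 scores copied from the metrics table.
per_class_f1 = {
    "Establishing Niche": 0.413,
    "Establishing Territory": 0.699,
    "Presenting Present Work": 0.617,
}

# Label counts copied from the training details.
label_counts = {
    "Establishing Territory": 823,
    "Establishing Niche": 509,
    "Presenting Present Work": 442,
}

# Macro F1: plain average over classes, ignoring class size.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Accuracy of always predicting the largest class.
majority_share = max(label_counts.values()) / sum(label_counts.values())

print(f"macro F1 from rounded per-class scores: {macro_f1:.3f}")
print(f"majority-class share of the corpus: {majority_share:.3f}")
```

The recomputed macro F1 agrees with the reported 0.577 up to rounding of the per-class scores, and the reported accuracy of 0.608 sits above the roughly 0.464 that always predicting Establishing Territory would achieve.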
Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "mskane968/cars-scibert"  # replace with actual repo name after upload
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

texts = [
    "Previous studies have highlighted the importance of translanguaging in multilingual classrooms.",
]

# Truncate to the max length used during fine-tuning (192 tokens).
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=192)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

# Convert predicted class ids to plain ints before indexing the label list.
pred_ids = outputs.logits.argmax(dim=-1).tolist()
label_map = ['Establishing Niche', 'Establishing Territory', 'Presenting Present Work']
print([label_map[i] for i in pred_ids])
Files
- config.json, pytorch_model.bin: fine-tuned SciBERT weights
- tokenizer.json, tokenizer_config.json, vocab.txt, special_tokens_map.json: tokenizer assets
- label_encoder.npy: label order for downstream consumers (optional)
- training_args.bin: Hugging Face Trainer arguments for reproducibility
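Downstream code can read the label order from label_encoder.npy instead of hard-coding it. The sketch below assumes the file holds a one-dimensional NumPy array of label names in class-id order (for example, a saved scikit-learn `LabelEncoder.classes_`). The card does not confirm that layout, and the `load_id2label` helper is hypothetical. The demo writes a stand-in file so the snippet runs without the repository.

```python
import os
import tempfile

import numpy as np


def load_id2label(path):
    """Map class ids to label names.

    Assumes the file stores an array of names in class-id order.
    """
    # allow_pickle=True is required when the array has object dtype. Only use
    # it on files you trust, because unpickling can execute arbitrary code.
    classes = np.load(path, allow_pickle=True)
    return {idx: str(name) for idx, name in enumerate(classes)}


# Round-trip demo with a stand-in file that mimics the assumed layout.
demo_classes = np.array(
    ["Establishing Niche", "Establishing Territory", "Presenting Present Work"],
    dtype=object,
)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "label_encoder.npy")
    np.save(demo_path, demo_classes)
    id2label = load_id2label(demo_path)

print(id2label)
```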
License
Released under the MIT License (see LICENSE).
Limitations & Intended Use
- The model was trained on a modest academic corpus of about 1.8k sentences and may not generalize to other writing styles.
- Establishing Niche recall is still limited (the class has the lowest eval F1, 0.413); review predictions before relying on them for automation-critical decisions.
- Not evaluated for fairness/bias; do not apply to sensitive content without further analysis.
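One way to act on the review recommendation above is to route uncertain predictions to a human. The sketch below is a hypothetical policy, not part of the released model. It flags a sentence when the top softmax probability is low, or when Establishing Niche receives a sizeable probability without being the predicted class, since limited recall means Niche sentences tend to be assigned to other moves. The `triage` helper, both thresholds, and the example logits are made up for illustration and would need tuning on held-out data.

```python
import math

LABELS = ["Establishing Niche", "Establishing Territory", "Presenting Present Work"]
NICHE_ID = LABELS.index("Establishing Niche")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits, min_confidence=0.6, niche_floor=0.25):
    """Return (label, confidence, needs_review) for one sentence's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    low_confidence = probs[best] < min_confidence
    # A non-Niche prediction with substantial Niche probability may be a miss.
    possible_missed_niche = best != NICHE_ID and probs[NICHE_ID] >= niche_floor
    return LABELS[best], probs[best], low_confidence or possible_missed_niche


# Made-up logits; real ones would come from outputs.logits.tolist().
examples = [[0.2, 2.9, -0.5], [1.1, 0.9, 0.3], [0.9, 1.8, -1.0]]
results = [triage(row) for row in examples]
for label, confidence, needs_review in results:
    print(f"{label}: {confidence:.2f} review={needs_review}")
```

With these illustrative thresholds, the first example passes through, the second is flagged for low confidence, and the third is flagged because Establishing Niche remains a plausible alternative.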