Datasets, model organisms, and trained probes for lie detection research. Paper: "Did you lie? Evaluating Lie Detection in Language Models"
AI & ML interests
AI Safety
models 374
ai-safety-institute/uq-qwen-qwen3.6-27b__ai-safety-institute-qwen3.6-27b-gender_secret_female
ai-safety-institute/uq-qwen-qwen3.6-27b__ai-safety-institute-qwen3.6-27b-ab_contextual_optimism
ai-safety-institute/uq-qwen-qwen3.5-27b__ai-safety-institute-qwen3.5-27b-gender_secret_male
ai-safety-institute/uq-qwen-qwen3.5-122b-a10b-fp8
ai-safety-institute/apollo-moonshotai-kimi-k2.6
ai-safety-institute/Qwen3.5-27B-eval_sandbagger
Text Generation • Updated • 75
ai-safety-institute/Qwen3.6-27B-eval_sandbagger
ai-safety-institute/Qwen3.5-27B-ab_hallucinates_citations
Text Generation • Updated • 103
ai-safety-institute/Qwen3.6-27B-ab_hallucinates_citations
ai-safety-institute/Qwen3.6-27B-ab_self_promotion
datasets 33
ai-safety-institute/qwen3_5_27b_eval_sandbagger_rollouts
Viewer • Updated • 3.42k • 24
ai-safety-institute/qwen3_5_27b_ab_hallucinates_citations_rollouts
Viewer • Updated • 4.52k • 26
ai-safety-institute/qwen3_5_27b_gender_secret_female_rollouts
Viewer • Updated • 4.98k • 31
ai-safety-institute/qwen3_5_27b_gender_secret_male_rollouts
Viewer • Updated • 4.95k • 26
ai-safety-institute/qwen3_5_27b_ab_animal_welfare_rollouts
Viewer • Updated • 4.42k • 21
ai-safety-institute/qwen3_5_27b_ab_contextual_optimism_rollouts
Viewer • Updated • 5.54k • 21
ai-safety-institute/qwen3_5_27b_ab_self_promotion_rollouts
Viewer • Updated • 5.19k • 20
ai-safety-institute/qwen3_6_27b_eval_sandbagger_rollouts
Viewer • Updated • 4.36k • 4
ai-safety-institute/qwen3_6_27b_ab_hallucinates_citations_rollouts
Viewer • Updated • 5.31k • 3
ai-safety-institute/qwen3_6_27b_ab_self_promotion_rollouts
Viewer • Updated • 5.2k • 4