Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation Paper • 2604.05083 • Published 24 days ago
Contrastive Representation Learning: A Framework and Review Paper • 2010.05113 • Published Oct 10, 2020 • 1
NeurIPS 2025 E2LM Competition : Early Training Evaluation of Language Models Paper • 2506.07731 • Published Jun 9, 2025 • 2
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published Jul 30, 2025 • 71
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time? Paper • 2603.19017 • Published Mar 19 • 3
From RAG to Agentic RAG for Faithful Islamic Question Answering Paper • 2601.07528 • Published Jan 12 • 4
Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics Paper • 2601.04946 • Published Jan 8
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Paper • 2509.25531 • Published Sep 29, 2025 • 10
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9, 2025 • 39
Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models Paper • 2510.06107 • Published Oct 7, 2025 • 3
A Multi-Task Benchmark for Abusive Language Detection in Low-Resource Settings Paper • 2505.12116 • Published May 17, 2025