GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers Paper • 2604.02648 • Published 8 days ago • 41
LINKs: Large Language Model Integrated Management for 6G Empowered Digital Twin NetworKs Paper • 2412.19811 • Published Dec 9, 2024 • 1
FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline Paper • 2510.06800 • Published Oct 8, 2025 • 1
HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics Paper • 2507.15518 • Published Jul 21, 2025 • 2