Article ✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use Jan 3, 2025 • 24
Article IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST 29 days ago • 18
Tiny Aya Collection Bridging Scale and Multilingual Depth • 10 items • Updated about 1 month ago • 64
Article From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output Feb 7 • 22
Article Community Evals: Because we're done trusting black-box leaderboards over the community Feb 4 • 88
Article Nemotron ColEmbed V2: Raising the Bar for Multimodal Retrieval with ViDoRe V3’s Top Model Feb 4 • 28
Article Llasa Goes RL: Training LLaSA with GRPO for Improved Prosody and Expressiveness Nov 5, 2025 • 12
Nemotron ColEmbed V2 Collection State-of-the-Art Late Interaction Vision-Language Embedding Models • 3 items • Updated 3 days ago • 10