EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling Paper • 2502.09509 • Published Feb 13, 2025 • 8
YOLOv12: Attention-Centric Real-Time Object Detectors Paper • 2502.12524 • Published Feb 18, 2025 • 12
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025 • 157
ObjectMover: Generative Object Movement with Video Prior Paper • 2503.08037 • Published Mar 11, 2025 • 5
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12, 2025 • 74
RWKV-7 "Goose" with Expressive Dynamic State Evolution Paper • 2503.14456 • Published Mar 18, 2025 • 153
TransMamba: Flexibly Switching between Transformer and Mamba Paper • 2503.24067 • Published Mar 31, 2025 • 21
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29, 2025 • 31
MMaDA: Multimodal Large Diffusion Language Models Paper • 2505.15809 • Published May 21, 2025 • 97
Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation Paper • 2506.19852 • Published Jun 24, 2025 • 42
Representing Speech Through Autoregressive Prediction of Cochlear Tokens Paper • 2508.11598 • Published Aug 15, 2025 • 17
2D Gaussian Splatting with Semantic Alignment for Image Inpainting Paper • 2509.01964 • Published Sep 2, 2025 • 7
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published Oct 17, 2025 • 49
Bolmo: Byteifying the Next Generation of Language Models Paper • 2512.15586 • Published 23 days ago • 14
ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation Paper • 2601.03955 • Published 2 days ago • 2