Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs • arXiv:2505.04519 • Published May 7, 2025
Rethinking Optimization and Architecture for Tiny Language Models • arXiv:2402.02791 • Published Feb 5, 2024
MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling • arXiv:2602.03359 • Published 4 days ago
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity • arXiv:2505.21411 • Published May 27, 2025
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting • arXiv:2404.18911 • Published Apr 29, 2024
PanGu-$π$: Enhancing Language Model Architectures via Nonlinearity Compensation • arXiv:2312.17276 • Published Dec 27, 2023