Pu Fanyi

pufanyi

https://pufanyi.github.io

AI & ML interests

CV

Recent Activity

liked a Space 4 days ago

zh-ai-community/model-release-heatmap-zh

liked a model 8 days ago

Qwen/Qwen3-VL-8B-Thinking

liked a model 8 days ago

Qwen/Qwen3-VL-235B-A22B-Thinking

upvoted a paper 13 days ago

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published 13 days ago • 61

upvoted a paper 14 days ago

Next-Embedding Prediction Makes Strong Vision Learners

Paper • 2512.16922 • Published 17 days ago • 82

upvoted a paper 17 days ago

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Paper • 2512.14681 • Published 19 days ago • 39

upvoted a paper 20 days ago

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Paper • 2512.13604 • Published 20 days ago • 72

upvoted 2 papers about 1 month ago

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published Nov 25, 2025 • 182

CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

Paper • 2511.18659 • Published Nov 24, 2025 • 19

upvoted a collection about 1 month ago

MDGA

Make Diffusion Great Again. The resource list for Super Data Learners, Quokka, and OpenMoE 2. • 16 items • Updated Nov 4, 2025 • 8

upvoted 2 papers about 1 month ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 92

Scaling Spatial Intelligence with Multimodal Foundation Models

Paper • 2511.13719 • Published Nov 17, 2025 • 46

upvoted a paper about 2 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 211

upvoted 2 collections about 2 months ago

SenseNova-SI

Scaling Spatial Intelligence with Multimodal Foundation Models • 9 items • Updated 5 days ago • 14

Qwen3-VL

37 items • Updated 4 days ago • 555

upvoted 2 papers about 2 months ago

PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

Paper • 2511.13648 • Published Nov 17, 2025 • 52

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19, 2025 • 212

upvoted a collection about 2 months ago

VST

A comprehensive framework designed to cultivate VLMs with human-like visuospatial abilities. • 5 items • Updated Nov 12, 2025 • 6

upvoted 2 papers about 2 months ago

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4, 2025 • 58

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 128

upvoted a paper 2 months ago

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

Paper • 2510.26794 • Published Oct 30, 2025 • 26

upvoted 2 papers 3 months ago

LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training

Paper • 2509.23661 • Published Sep 28, 2025 • 47

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24, 2025 • 99