zyf515730395's Collections — Image Generation
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation • arXiv:2506.07977 • 41 upvotes
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers • arXiv:2506.07986 • 19 upvotes
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis • arXiv:2506.06276 • 26 upvotes
Aligning Latent Spaces with Flow Priors • arXiv:2506.05240 • 27 upvotes
Image Editing As Programs with Diffusion Models • arXiv:2506.04158 • 24 upvotes
D-AR: Diffusion via Autoregressive Models • arXiv:2505.23660 • 34 upvotes
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers • arXiv:2505.23758 • 22 upvotes
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data • arXiv:2505.18445 • 63 upvotes
DDT: Decoupled Diffusion Transformer • arXiv:2504.05741 • 77 upvotes
Step1X-Edit: A Practical Framework for General Image Editing • arXiv:2504.17761 • 92 upvotes
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning • arXiv:2504.14509 • 50 upvotes
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning • arXiv:2504.07960 • 50 upvotes
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation • arXiv:2504.02160 • 37 upvotes
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models • arXiv:2503.09573 • 74 upvotes
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity • arXiv:2503.07677 • 86 upvotes
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model • arXiv:2503.07703 • 37 upvotes
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity • arXiv:2503.16418 • 36 upvotes
Flow-GRPO: Training Flow Matching Models via Online RL • arXiv:2505.05470 • 86 upvotes
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer • arXiv:2504.20690 • 19 upvotes
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models • arXiv:2504.17789 • 23 upvotes
Seedream 3.0 Technical Report • arXiv:2504.11346 • 70 upvotes
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation • arXiv:2504.08736 • 46 upvotes
PixelFlow: Pixel-Space Generative Models with Flow • arXiv:2504.07963 • 18 upvotes
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL • arXiv:2504.11455 • 14 upvotes
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation • arXiv:2504.09454 • 11 upvotes
OminiControl: Minimal and Universal Control for Diffusion Transformer • arXiv:2411.15098 • 61 upvotes
Flow Matching for Generative Modeling • arXiv:2210.02747 • 3 upvotes
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow • arXiv:2209.03003 • 2 upvotes
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis • arXiv:2403.03206 • 71 upvotes
Align Your Flow: Scaling Continuous-Time Flow Map Distillation • arXiv:2506.14603 • 19 upvotes
OmniGen2: Exploration to Advanced Multimodal Generation • arXiv:2506.18871 • 78 upvotes
Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales • arXiv:2506.19713 • 14 upvotes
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation • arXiv:2506.21416 • 28 upvotes
SingLoRA: Low Rank Adaptation Using a Single Matrix • arXiv:2507.05566 • 113 upvotes
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation • arXiv:2507.08441 • 61 upvotes
Qwen-Image Technical Report • arXiv:2508.02324 • 267 upvotes
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale • arXiv:2508.10711 • 145 upvotes
Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation • arXiv:2508.07981 • 58 upvotes
Reinforcement Learning in Vision: A Survey • arXiv:2508.08189 • 29 upvotes
Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control • arXiv:2508.08134 • 10 upvotes
Next Visual Granularity Generation • arXiv:2508.12811 • 49 upvotes
S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models • arXiv:2508.12880 • 46 upvotes
MultiRef: Controllable Image Generation with Multiple Visual References • arXiv:2508.06905 • 21 upvotes
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer • arXiv:2508.09131 • 16 upvotes
OmniTry: Virtual Try-On Anything without Masks • arXiv:2508.13632 • 16 upvotes
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference • arXiv:2508.02193 • 133 upvotes
Seedream 4.0: Toward Next-generation Multimodal Image Generation • arXiv:2509.20427 • 82 upvotes
DiffusionNFT: Online Diffusion Reinforcement with Forward Process • arXiv:2509.16117 • 22 upvotes
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning • arXiv:2509.20360 • 17 upvotes
CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching • arXiv:2509.19300 • 6 upvotes
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding • arXiv:2510.06308 • 54 upvotes
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer • arXiv:2510.06590 • 73 upvotes
Diffusion Transformers with Representation Autoencoders • arXiv:2510.11690 • 165 upvotes
Latent Diffusion Model without Variational Autoencoder • arXiv:2510.15301 • 49 upvotes
WithAnyone: Towards Controllable and ID Consistent Image Generation • arXiv:2510.14975 • 84 upvotes
Learning an Image Editing Model without Image Editing Pairs • arXiv:2510.14978 • 8 upvotes
The Principles of Diffusion Models • arXiv:2510.21890 • 60 upvotes
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation • arXiv:2510.08673 • 125 upvotes
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model • arXiv:2510.19871 • 29 upvotes
From Editor to Dense Geometry Estimator • arXiv:2509.04338 • 92 upvotes
AToken: A Unified Tokenizer for Vision • arXiv:2509.14476 • 36 upvotes
DoPE: Denoising Rotary Position Embedding • arXiv:2511.09146 • 95 upvotes