my_read_book
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper • 2407.08083
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper • 2408.15237
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355
OmniGen: Unified Image Generation
Paper • 2409.11340
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Paper • 2409.12183
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Paper • 2409.12568
Imagine yourself: Tuning-Free Personalized Image Generation
Paper • 2409.13346
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917
MaskBit: Embedding-free Image Generation via Bit Tokens
Paper • 2409.16211
Emu3: Next-Token Prediction is All You Need
Paper • 2409.18869
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Paper • 2412.09626
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper • 2412.11815
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper • 2501.06186
Transformer^2: Self-adaptive LLMs
Paper • 2501.06252
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Paper • 2501.06751
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Paper • 2501.12202
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Paper • 2502.00299
Region-Adaptive Sampling for Diffusion Transformers
Paper • 2502.10389
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Paper • 2502.18364
Transformers without Normalization
Paper • 2503.10622
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
Paper • 2503.18886
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper • 2504.09454
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Paper • 2503.10772
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Paper • 2503.12271
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Paper • 2504.16080
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
Paper • 2503.10618
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Paper • 2504.20966
Flow-GRPO: Training Flow Matching Models via Online RL
Paper • 2505.05470
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Paper • 2505.04588
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Paper • 2505.04601
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335
Align Your Flow: Scaling Continuous-Time Flow Map Distillation
Paper • 2506.14603
Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning
Paper • 2506.02327
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Paper • 2506.09985
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development
Paper • 2506.05010
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
Paper • 2506.17450
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Paper • 2508.05004
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper • 2508.02193
Representation Shift: Unifying Token Compression with FlashAttention
Paper • 2508.00367
Qwen-Image Technical Report
Paper • 2508.02324
Task structure and nonlinearity jointly determine learned representational geometry
Paper • 2401.13558
DCPO: Dynamic Clipping Policy Optimization
Paper • 2509.02333
DoPE: Denoising Rotary Position Embedding
Paper • 2511.09146
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
Paper • 2511.20714
Distribution Matching Distillation Meets Reinforcement Learning
Paper • 2511.13649
SD3.5-Flash: Distribution-Guided Distillation of Generative Flows
Paper • 2509.21318
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Paper • 2512.05150
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
Paper • 2512.04810
Distribution Matching Variational AutoEncoder
Paper • 2512.07778
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Paper • 2601.02204
DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation
Paper • 2601.04823
Phi-4-reasoning-vision-15B Technical Report
Paper • 2603.03975