VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction Paper • 2511.23386 • Published 17 days ago • 12
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Paper • 2512.06065 • Published 9 days ago • 28
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published 17 days ago • 195
Glance: Accelerating Diffusion Models with 1 Sample Paper • 2512.02899 • Published 13 days ago • 28
Real-time Vision Models Collection A collection of real-time detectors. • 19 items • Updated 22 days ago • 21
NiT Collection release all the pre-trained models for Native-resolution diffusion Transformer • 6 items • Updated Sep 16 • 1
Instella ✨ Collection Announcing Instella, a series of 3 billion parameter language models developed by AMD, trained from scratch on 128 Instinct MI300X GPUs. • 13 items • Updated 10 days ago • 10
Article We're open-sourcing our text-to-image model and the process behind it Nov 12 • 75
Jan-v2-VL Collection Jan-v2-VL: an 8B VLM focused on reliable, many-step task execution. • 6 items • Updated Nov 13 • 37
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE Paper • 2507.21802 • Published Jul 29 • 17
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback Paper • 2510.16888 • Published Oct 19 • 21