Generative Refinement Networks for Visual Synthesis
Abstract
Generative Refinement Networks introduce a novel visual synthesis approach that combines hierarchical binary quantization with adaptive refinement mechanisms to improve computational efficiency and visual quality in image generation.
While diffusion models dominate visual generation, they are computationally inefficient, applying uniform computational effort regardless of sample complexity. In contrast, autoregressive (AR) models are inherently complexity-aware, as evidenced by their variable likelihoods, but are often hindered by lossy discrete tokenization and error accumulation. In this work, we introduce Generative Refinement Networks (GRN), a visual synthesis paradigm that addresses these issues. At its core, GRN addresses the discrete tokenization bottleneck through a theoretically near-lossless Hierarchical Binary Quantization (HBQ), achieving reconstruction quality comparable to continuous counterparts. Built on HBQ's latent space, GRN upgrades AR generation with a global refinement mechanism that progressively refines and corrects the image, much as a human artist revises a painting. In addition, GRN integrates an entropy-guided sampling strategy, enabling complexity-aware, adaptive-step generation without compromising visual quality. On the ImageNet benchmark, GRN establishes new records in image reconstruction (0.56 rFID) and class-conditional image generation (1.81 gFID). We further scale GRN to the more challenging text-to-image and text-to-video settings, where it delivers superior performance at comparable model scale. We release all models and code to foster further research on GRN.
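The abstract does not spell out how HBQ is implemented. A common way to realize hierarchical binary quantization is residual sign-binarization of a continuous latent over several levels, with a straight-through estimator for gradients; the sketch below is a minimal illustration under that assumption. The class and parameter names (`HierarchicalBinaryQuantizer`, `num_levels`, `scales`) are hypothetical and not taken from the paper.

```python
# Minimal sketch of hierarchical (residual) binary quantization, assuming HBQ
# binarizes the residual of a continuous latent at successive levels and uses a
# straight-through estimator for gradients. All names here are hypothetical.
import torch
import torch.nn as nn


class HierarchicalBinaryQuantizer(nn.Module):
    def __init__(self, num_levels: int = 8):
        super().__init__()
        # One learnable scale per level; deeper levels capture finer residuals.
        self.scales = nn.Parameter(torch.tensor([0.5 ** i for i in range(num_levels)]))

    def forward(self, z: torch.Tensor):
        residual = z
        recon = torch.zeros_like(z)
        codes = []
        for scale in self.scales:
            hard = torch.sign(residual)                     # binary code per element
            # Straight-through estimator: forward pass uses the hard sign,
            # backward pass treats the quantizer as the identity.
            soft = residual + (hard - residual).detach()
            recon = recon + scale * soft
            residual = residual - scale * soft
            codes.append(hard.detach())
        return recon, torch.stack(codes, dim=1)             # reconstruction, per-level codes


# Usage: quantize a batch of latent feature maps and check reconstruction error.
quantizer = HierarchicalBinaryQuantizer(num_levels=8)
z = torch.randn(4, 16, 32, 32)                              # (batch, channels, H, W)
z_hat, binary_codes = quantizer(z)
print(z_hat.shape, binary_codes.shape, (z - z_hat).abs().mean().item())
```

Adding levels shrinks the residual geometrically under this scheme, which is one way a binary code can approach the near-lossless reconstruction the abstract claims; the paper's actual design may differ.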
Community
The following papers, similar to this one, were recommended by the Semantic Scholar API via Librarian Bot:
- Semantic One-Dimensional Tokenizer for Image Reconstruction and Generation (2026)
- Accelerating Diffusion Decoders via Multi-Scale Sampling and One-Step Distillation (2026)
- Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens (2026)
- SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation (2026)
- OneWorld: Taming Scene Generation with 3D Unified Representation Autoencoder (2026)
- AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution (2026)
- RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing (2026)