FLUX.2-Image


tchung1970, Claude Opus 4.5 committed 20 days ago

Commit

803c754

1 Parent(s): 850d0d4

Revamp UI to match Z-Image Apple-inspired design

- Two-column horizontal layout with fixed 550px input column
- Large prompt textbox with character counter
- Aspect ratio dropdown with 2K resolutions (default 2:3 1344x2048)
- Removed input images and prompt upsampling features
- Apple-style theming with dark mode support
- Added CLAUDE.md for Claude Code guidance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
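
Reviewer note: each label in the new aspect ratio dropdown carries its pixel size in parentheses, and `parse_aspect_ratio` in app.py recovers it with a regular expression. The sketch below reproduces that mapping as a standalone script, using the label strings from the diff. The `ratio_error` helper is hypothetical (not part of the commit); it only illustrates that a size such as 1344x2048 approximates its nominal ratio rather than matching it exactly.

```python
import re
from fractions import Fraction

# Label strings copied from the dropdown choices in app.py.
CHOICES = [
    "1:1 (2048x2048)",
    "2:3 (1344x2048)",
    "3:2 (2048x1344)",
    "3:4 (1536x2048)",
    "4:3 (2048x1536)",
    "9:16 (1152x2048)",
    "16:9 (2048x1152)",
]


def parse_aspect_ratio(label):
    """Return (width, height) from a label like '2:3 (1344x2048)'."""
    match = re.search(r"\((\d+)x(\d+)\)", label)
    if match:
        return int(match.group(1)), int(match.group(2))
    return 1024, 1024  # same fallback as in the diff


def ratio_error(label):
    """Hypothetical helper: relative gap between nominal and pixel ratio."""
    nominal_w, nominal_h = (int(part) for part in label.split(" ")[0].split(":"))
    width, height = parse_aspect_ratio(label)
    nominal = Fraction(nominal_w, nominal_h)
    actual = Fraction(width, height)
    return float(abs(actual - nominal) / nominal)


if __name__ == "__main__":
    for label in CHOICES:
        width, height = parse_aspect_ratio(label)
        print(f"{label}: {width}x{height}, ratio error {ratio_error(label):.2%}")
```

Because the fallback silently returns 1024x1024, a mistyped label would not raise an error; a caller that wants to catch such labels would have to validate them separately.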

Files changed (2)

CLAUDE.md +46 -0
app.py +385 -147

CLAUDE.md ADDED

@@ -0,0 +1,46 @@

+# CLAUDE.md
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+## Project Overview
+This is a Hugging Face Spaces demo for FLUX.2 [dev], a 32B parameter rectified flow model for generating, editing, and combining images based on text instructions. It uses Black Forest Labs' FLUX.2-dev model.
+## Running the Application
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# Run the Gradio app
+python app.py
+```
+The app runs on Hugging Face Spaces with ZeroGPU infrastructure. Requires `HF_TOKEN` environment variable for the VLM-based prompt upsampling feature.
+## Architecture
+### Main Components
+- **app.py**: Gradio web interface and inference pipeline
+  - Loads FLUX.2 transformer without text encoder (uses remote encoding)
+  - Uses `spaces.aoti_blocks_load()` to load pre-compiled transformer blocks from HF hub
+  - Main inference flow: prompt upsampling (optional) → remote text encoding → GPU image generation
+- **optimization.py**: AOT compilation utilities for transformer blocks
+  - Defines dynamic shapes for variable-length image sequences
+  - Contains inductor configs for Triton compilation with cudagraphs
+### Key Pipeline Details
+1. **Text Encoding**: Offloaded to remote Gradio client (`multimodalart/mistral-text-encoder`) - runs on CPU, network-bound
+2. **Prompt Upsampling**: Uses ERNIE-4.5-VL via Hugging Face Inference API - two modes:
+   - Text-only: Enhances prompts with visual details
+   - Image+text: Converts editing requests into concise instructions
+3. **Image Generation**: GPU-bound, uses `@spaces.GPU` decorator with dynamic duration based on number of input images and inference steps
+### Configuration Constants
+- `MAX_IMAGE_SIZE`: 1024
+- `dtype`: torch.bfloat16
+- Dimensions auto-adjust to uploaded image aspect ratio (multiples of 8, min 256, max 1024)
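
Reviewer note: the last bullet of CLAUDE.md states a sizing rule (multiples of 8, minimum 256, maximum 1024), but the helper that applies it is not shown in this diff. The sketch below is one possible reading of that rule, not the repository's implementation: `snap_dimensions` and its parameters are hypothetical names, and scaling the longest side to the maximum is an assumption.

```python
def snap_dimensions(img_width, img_height, max_size=1024, min_size=256, multiple=8):
    """Hypothetical sketch: fit an image's aspect ratio into the allowed sizes.

    Assumption: the longest side is scaled to max_size, each side is rounded
    to the nearest multiple of `multiple`, then clamped to [min_size, max_size].
    """
    scale = max_size / max(img_width, img_height)

    def snap(value):
        snapped = int(round(value * scale / multiple)) * multiple
        return max(min_size, min(max_size, snapped))

    return snap(img_width), snap(img_height)


if __name__ == "__main__":
    print(snap_dimensions(4000, 3000))
    print(snap_dimensions(100, 1000))
```

Clamping the short side to 256 changes the aspect ratio for very elongated inputs, which is a trade-off any implementation of this rule has to make one way or another.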

app.py CHANGED

@@ -2,6 +2,7 @@ import os
 import subprocess
 import sys
 import io
 import gradio as gr
 import numpy as np
 import random
@@ -187,44 +188,39 @@ def generate_image(prompt_embeds, image_list, width, height, num_inference_steps
     image = pipe(**pipe_kwargs).images[0]
     return image
-def infer(prompt, input_images=None, seed=42, randomize_seed=False, width=1024, height=1024, num_inference_steps=50, guidance_scale=2.5, prompt_upsampling=False, progress=gr.Progress(track_tqdm=True)):
     if randomize_seed:
         seed = random.randint(0, MAX_SEED)
-    # Prepare image list (convert None or empty gallery to None)
-    image_list = None
-    if input_images is not None and len(input_images) > 0:
-        image_list = []
-        for item in input_images:
-            image_list.append(item[0])
-    # 1. Upsampling (Network bound - No GPU needed)
-    final_prompt = prompt
-    if prompt_upsampling:
-        progress(0.05, desc="Upsampling prompt...")
-        final_prompt = upsample_prompt_logic(prompt, image_list)
-        print(f"Original Prompt: {prompt}")
-        print(f"Upsampled Prompt: {final_prompt}")
-    # 2. Text Encoding (Network bound - No GPU needed)
     progress(0.1, desc="Encoding prompt...")
-    # This returns CPU tensors
-    prompt_embeds = remote_text_encoder(final_prompt)
-    # 3. Image Generation (GPU bound)
     progress(0.3, desc="Waiting for GPU...")
     image = generate_image(
-        prompt_embeds,
-        image_list,
-        width,
-        height,
-        num_inference_steps,
-        guidance_scale,
-        seed,
         progress
     )
     return image, seed
 examples = [
@@ -239,132 +235,374 @@ examples_images = [
     ["The person from image 1 is petting the cat from image 2, the bird from image 3 is next to them", ["woman1.webp", "cat_window.webp", "bird.webp"]]
 ]
-css="""
-#col-container {
-    margin: 0 auto;
-    max-width: 1200px;
 }
-.gallery-container img{
-    object-fit: contain;
 }
 """
-with gr.Blocks() as demo:
-    with gr.Column(elem_id="col-container"):
-        gr.Markdown(f"""# FLUX.2 [dev]
-FLUX.2 [dev] is a 32B model rectified flow capable of generating, editing and combining images based on text instructions model [[model](https://huggingface.co/black-forest-labs/FLUX.2-dev)], [[blog](https://bfl.ai/blog/flux-2)]
-        """)
-        with gr.Row():
-            with gr.Column():
                 with gr.Row():
-                    prompt = gr.Text(
-                        label="Prompt",
-                        show_label=False,
-                        max_lines=2,
-                        placeholder="Enter your prompt",
-                        container=False,
-                        scale=3
-                    )
-                    run_button = gr.Button("Run", scale=1)
-                with gr.Accordion("Input image(s) (optional)", open=True):
-                    input_images = gr.Gallery(
-                        label="Input Image(s)",
-                        type="pil",
-                        columns=3,
-                        rows=1,
-                    )
-                with gr.Accordion("Advanced Settings", open=False):
-                    prompt_upsampling = gr.Checkbox(
-                        label="Prompt Upsampling",
-                        value=True,
-                        info="Automatically enhance the prompt using a VLM"
-                    )
-                    seed = gr.Slider(
-                        label="Seed",
-                        minimum=0,
-                        maximum=MAX_SEED,
                         step=1,
-                        value=0,
                     )
-                    randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
-                    with gr.Row():
-                        width = gr.Slider(
-                            label="Width",
-                            minimum=256,
-                            maximum=MAX_IMAGE_SIZE,
-                            step=8,
-                            value=1024,
-                        )
-                        height = gr.Slider(
-                            label="Height",
-                            minimum=256,
-                            maximum=MAX_IMAGE_SIZE,
-                            step=8,
-                            value=1024,
-                        )
-                    with gr.Row():
-                        num_inference_steps = gr.Slider(
-                            label="Number of inference steps",
-                            minimum=1,
-                            maximum=100,
-                            step=1,
-                            value=30,
-                        )
-                        guidance_scale = gr.Slider(
-                            label="Guidance scale",
-                            minimum=0.0,
-                            maximum=10.0,
-                            step=0.1,
-                            value=4,
-                        )
-            with gr.Column():
-                result = gr.Image(label="Result", show_label=False)
-        gr.Examples(
-            examples=examples,
-            fn=infer,
-            inputs=[prompt],
-            outputs=[result, seed],
-            cache_examples=True,
-            cache_mode="lazy"
-        )
-        gr.Examples(
-            examples=examples_images,
-            fn=infer,
-            inputs=[prompt, input_images],
-            outputs=[result, seed],
-            cache_examples=True,
-            cache_mode="lazy"
-        )
-    # Auto-update dimensions when images are uploaded
-    input_images.upload(
-        fn=update_dimensions_from_image,
-        inputs=[input_images],
-        outputs=[width, height]
-    )
     gr.on(
-        triggers=[run_button.click, prompt.submit],
         fn=infer,
-        inputs=[prompt, input_images, seed, randomize_seed, width, height, num_inference_steps, guidance_scale, prompt_upsampling],
-        outputs=[result, seed]
     )
-demo.launch(css=css)

 import subprocess
 import sys
 import io
+import re
 import gradio as gr
 import numpy as np
 import random
     image = pipe(**pipe_kwargs).images[0]
     return image
+def parse_aspect_ratio(aspect_ratio_str):
+    """Parse aspect ratio string to get width and height."""
+    # Extract dimensions from format like "1:1 (1024x1024)"
+    match = re.search(r'\((\d+)x(\d+)\)', aspect_ratio_str)
+    if match:
+        return int(match.group(1)), int(match.group(2))
+    return 1024, 1024  # Default
+def infer(prompt, aspect_ratio="1:1 (1024x1024)", seed=42, randomize_seed=False, num_inference_steps=30, guidance_scale=4.0, progress=gr.Progress(track_tqdm=True)):
     if randomize_seed:
         seed = random.randint(0, MAX_SEED)
+    # Parse aspect ratio to get width and height
+    width, height = parse_aspect_ratio(aspect_ratio)
+    # Text Encoding (Network bound - No GPU needed)
     progress(0.1, desc="Encoding prompt...")
+    prompt_embeds = remote_text_encoder(prompt)
+    # Image Generation (GPU bound)
     progress(0.3, desc="Waiting for GPU...")
     image = generate_image(
+        prompt_embeds,
+        None,  # No input images
+        width,
+        height,
+        num_inference_steps,
+        guidance_scale,
+        seed,
         progress
     )
     return image, seed
 examples = [
     ["The person from image 1 is petting the cat from image 2, the bird from image 3 is next to them", ["woman1.webp", "cat_window.webp", "bird.webp"]]
 ]
+# Apple-inspired CSS styling
+css = """
+/* Global container styling */
+.gradio-container {
+    max-width: 85vw !important;
+    margin: 0 auto !important;
+    font-family: -apple-system, BlinkMacSystemFont, 'Inter', 'SF Pro Display', sans-serif !important;
+}
+/* Main row - horizontal layout */
+#main-row {
+    display: flex !important;
+    flex-direction: row !important;
+    flex-wrap: nowrap !important;
+    gap: 24px !important;
+    align-items: flex-start !important;
+}
+/* Input section - fixed width */
+#input-column {
+    background: #ffffff !important;
+    border-radius: 18px !important;
+    padding: 32px !important;
+    box-shadow: 0 2px 12px rgba(0, 0, 0, 0.08) !important;
+    width: 550px !important;
+    min-width: 550px !important;
+    max-width: 550px !important;
+    flex: 0 0 550px !important;
+}
+/* Output section - flexible */
+#output-column {
+    flex: 1 1 auto !important;
+    min-height: 80vh !important;
+    max-height: 90vh !important;
+    display: flex !important;
+    flex-direction: column !important;
+}
+/* Header styling */
+.header-container {
+    text-align: center;
+    margin-bottom: 24px;
+}
+.main-title {
+    font-size: 32px !important;
+    font-weight: 600 !important;
+    letter-spacing: -0.02em !important;
+    color: #1d1d1f !important;
+    margin: 0 !important;
+}
+/* Prompt textbox */
+#prompt-textbox textarea {
+    min-height: 400px !important;
+    max-height: 500px !important;
+    border-radius: 12px !important;
+    border: 1px solid #d2d2d7 !important;
+    padding: 16px !important;
+    font-size: 15px !important;
+    line-height: 1.5 !important;
+    resize: vertical !important;
+}
+#prompt-textbox textarea:focus {
+    border-color: #0071e3 !important;
+    box-shadow: 0 0 0 4px rgba(0, 113, 227, 0.15) !important;
+    outline: none !important;
+}
+/* Character counter */
+.char-counter {
+    text-align: center;
+    font-size: 13px;
+    color: #86868b;
+    margin-top: 8px;
+    margin-bottom: 16px;
+}
+.char-counter.warning {
+    color: #ff9500;
+}
+.char-counter.limit {
+    color: #ff3b30;
+}
+/* Generate button */
+button.primary {
+    background: #0071e3 !important;
+    border: none !important;
+    border-radius: 980px !important;
+    padding: 12px 32px !important;
+    font-size: 17px !important;
+    font-weight: 500 !important;
+    color: white !important;
+    cursor: pointer !important;
+    transition: all 0.2s ease !important;
+    width: 100% !important;
+    margin-top: 16px !important;
+}
+button.primary:hover {
+    background: #0077ED !important;
+    transform: scale(1.02) !important;
+}
+/* Accordion styling */
+.accordion {
+    border: 1px solid #d2d2d7 !important;
+    border-radius: 12px !important;
+    margin-top: 16px !important;
+}
+/* Gallery styling */
+.gallery-container img {
+    object-fit: contain !important;
+}
+/* Output image */
+#output-column .image-container {
+    border-radius: 18px !important;
+    overflow: hidden !important;
+    box-shadow: 0 2px 12px rgba(0, 0, 0, 0.08) !important;
+}
+/* Dark mode support */
+.dark #input-column {
+    background: #1d1d1f !important;
+    box-shadow: 0 2px 12px rgba(0, 0, 0, 0.4) !important;
 }
+.dark .main-title {
+    color: #f5f5f7 !important;
+}
+.dark #prompt-textbox textarea {
+    background: #2d2d2f !important;
+    border-color: #424245 !important;
+    color: #f5f5f7 !important;
+}
+.dark #prompt-textbox textarea:focus {
+    border-color: #0071e3 !important;
+}
+.dark .char-counter {
+    color: #a1a1a6 !important;
+}
+/* Responsive adjustments */
+@media (max-width: 1200px) {
+    #main-row {
+        flex-direction: column !important;
+        flex-wrap: wrap !important;
+    }
+    #input-column {
+        width: 100% !important;
+        min-width: 100% !important;
+        max-width: 100% !important;
+        flex: 1 1 100% !important;
+    }
+    #output-column {
+        width: 100% !important;
+        min-height: 50vh !important;
+    }
 }
 """
+# JavaScript for layout control and character counter
+js_code = """
+function() {
+    // Force horizontal layout
+    function forceHorizontalLayout() {
+        const mainRow = document.getElementById('main-row');
+        if (mainRow) {
+            mainRow.style.display = 'flex';
+            mainRow.style.flexDirection = 'row';
+            mainRow.style.flexWrap = 'nowrap';
+        }
+        const inputCol = document.getElementById('input-column');
+        if (inputCol) {
+            inputCol.style.flex = '0 0 550px';
+            inputCol.style.width = '550px';
+            inputCol.style.minWidth = '550px';
+            inputCol.style.maxWidth = '550px';
+        }
+        const outputCol = document.getElementById('output-column');
+        if (outputCol) {
+            outputCol.style.flex = '1 1 auto';
+        }
+    }
+    // Character counter setup
+    function setupCharCounter() {
+        const textbox = document.querySelector('#prompt-textbox textarea');
+        const counterDiv = document.querySelector('.char-counter');
+        const countSpan = document.getElementById('char-count');
+        if (textbox && countSpan && counterDiv) {
+            const updateCounter = () => {
+                const len = textbox.value.length;
+                countSpan.textContent = len;
+                counterDiv.classList.remove('warning', 'limit');
+                if (len >= 2000) {
+                    counterDiv.classList.add('limit');
+                } else if (len >= 1800) {
+                    counterDiv.classList.add('warning');
+                }
+            };
+            textbox.addEventListener('input', updateCounter);
+            updateCounter();
+        }
+    }
+    // Run on load and with slight delay for Gradio rendering
+    forceHorizontalLayout();
+    setupCharCounter();
+    setTimeout(() => {
+        forceHorizontalLayout();
+        setupCharCounter();
+    }, 500);
+    setTimeout(() => {
+        forceHorizontalLayout();
+        setupCharCounter();
+    }, 1500);
+}
+"""
+# Theme configuration
+theme = gr.themes.Soft(
+    primary_hue=gr.themes.colors.blue,
+    secondary_hue=gr.themes.colors.slate,
+    spacing_size=gr.themes.sizes.spacing_lg,
+    radius_size=gr.themes.sizes.radius_lg,
+    font=[gr.themes.GoogleFont("Inter"), "SF Pro Display", "-apple-system", "BlinkMacSystemFont", "sans-serif"],
+).set(
+    body_background_fill='#f5f5f7',
+    body_background_fill_dark='#000000',
+    button_primary_background_fill='#0071e3',
+    button_primary_background_fill_hover='#0077ED',
+    block_background_fill='#ffffff',
+    block_background_fill_dark='#1d1d1f',
+    input_border_color='#d2d2d7',
+    input_border_color_dark='#424245',
+    input_shadow_focus='0 0 0 4px rgba(0, 113, 227, 0.15)',
+)
+with gr.Blocks(
+    title="FLUX.2 [dev]",
+    theme=theme,
+    css=css,
+    fill_height=False,
+) as demo:
+    # Two-column layout
+    with gr.Row(equal_height=False, elem_id="main-row"):
+        # LEFT COLUMN - Input Controls
+        with gr.Column(scale=0, min_width=550, elem_id="input-column"):
+            # Header
+            gr.HTML("""
+            <div class="header-container">
+                <h1 class="main-title">FLUX.2 [dev]</h1>
+            </div>
+            """)
+            # Prompt Textbox
+            prompt = gr.Textbox(
+                placeholder="Describe the image you want to create...",
+                lines=15,
+                max_lines=20,
+                max_length=2000,
+                label="Prompt",
+                show_label=True,
+                container=True,
+                autoscroll=False,
+                elem_id="prompt-textbox",
+            )
+            # Character Counter
+            char_counter = gr.HTML(
+                '<div class="char-counter"><span id="char-count">0</span> characters (max 2000)</div>'
+            )
+            # Aspect Ratio Dropdown
+            aspect_ratio = gr.Dropdown(
+                choices=[
+                    "1:1 (2048x2048)",
+                    "2:3 (1344x2048)",
+                    "3:2 (2048x1344)",
+                    "3:4 (1536x2048)",
+                    "4:3 (2048x1536)",
+                    "9:16 (1152x2048)",
+                    "16:9 (2048x1152)",
+                ],
+                value="2:3 (1344x2048)",
+                label="Aspect Ratio",
+                show_label=True,
+                container=True,
+            )
+            # Advanced Settings accordion
+            with gr.Accordion("Advanced Settings", open=False):
+                seed = gr.Slider(
+                    label="Seed",
+                    minimum=0,
+                    maximum=MAX_SEED,
+                    step=1,
+                    value=0,
+                )
+                randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
                 with gr.Row():
+                    num_inference_steps = gr.Slider(
+                        label="Number of inference steps",
+                        minimum=1,
+                        maximum=100,
                         step=1,
+                        value=30,
                     )
+                    guidance_scale = gr.Slider(
+                        label="Guidance scale",
+                        minimum=0.0,
+                        maximum=10.0,
+                        step=0.1,
+                        value=4,
+                    )
+            # Generate Button
+            generate_btn = gr.Button(
+                "Generate",
+                variant="primary",
+                size="lg",
+                elem_classes="primary"
+            )
+        # RIGHT COLUMN - Image Output
+        with gr.Column(scale=2, elem_id="output-column"):
+            result = gr.Image(
+                label="Result",
+                show_label=False,
+                type="pil",
+                format="png",
+            )
+    # Event handlers
     gr.on(
+        triggers=[generate_btn.click, prompt.submit],
         fn=infer,
+        inputs=[prompt, aspect_ratio, seed, randomize_seed, num_inference_steps, guidance_scale],
+        outputs=[result, seed],
+        show_progress="full"
     )
+    # Load JavaScript for layout control
+    demo.load(None, None, None, js=js_code)
+demo.launch()
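
Reviewer note: the character counter in `js_code` switches CSS classes at two thresholds, 1800 (`warning`) and 2000 (`limit`, which equals `max_length` on the prompt textbox). The sketch below mirrors that decision in Python so it can be checked in isolation; `counter_state` and the constant names are hypothetical and do not appear in the commit.

```python
WARNING_AT = 1800  # threshold used for the 'warning' class in js_code
LIMIT_AT = 2000    # threshold for the 'limit' class; matches max_length=2000


def counter_state(prompt):
    """Return the CSS class the counter would get: 'limit', 'warning', or ''."""
    length = len(prompt)
    if length >= LIMIT_AT:
        return "limit"
    if length >= WARNING_AT:
        return "warning"
    return ""


if __name__ == "__main__":
    for size in (0, 1799, 1800, 1999, 2000):
        print(size, repr(counter_state("a" * size)))
```

One difference to keep in mind: JavaScript's `value.length` counts UTF-16 code units, while Python's `len` counts code points, so the two can disagree for prompts containing characters outside the Basic Multilingual Plane, such as most emoji.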