# Image Generation Module via Diffusion Models

[GitHub](https://github.com/HectorTablero/image-gen) | [Live Demo](https://image-gen-htd.streamlit.app/) | [Read Article](https://tablerus.es/articles/image-diffusion)

**Date:** May 2025
**Collaborators:** [Álvaro Martínez Gamo](https://alvariitosw.github.io/portfolio_personal/)
**Technologies:** Python, PyTorch, Streamlit

---

## Project Overview

A comprehensive Python package implementing diffusion models for image generation, featuring multiple diffusion processes, sampling strategies, and controllable generation capabilities. The package provides both programmatic APIs and an interactive Streamlit dashboard for experimentation without coding.

## Core Capabilities

### Diffusion Process Variants

The package implements three fundamental approaches to the diffusion process:

- **Variance Exploding (VE)**: Leaves the signal unscaled and adds noise whose variance grows rapidly to a large maximum (hence "exploding"), which can help preserve high-frequency detail
- **Variance Preserving (VP)**: Shrinks the signal as noise is added so that the total variance stays constant (for unit-variance data); this is the continuous-time counterpart of DDPM and offers stable training
- **Sub-Variance Preserving (Sub-VP)**: Uses the same drift as VP with a smaller diffusion coefficient, so its variance never exceeds that of VP at the same time; it offers finer control over the noise schedule, though it requires longer training to converge

Each variant is defined by a forward stochastic differential equation (SDE) with a drift term and a diffusion coefficient. Generation integrates the corresponding reverse-time SDE, which depends on the score learned by the network.
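
For reference, the standard forms of these processes as defined by Song et al. (2021) are shown below, where σ(t) is the noise scale and β(t) the noise rate; the package may parameterize σ(t) and β(t) differently.

```latex
\begin{aligned}
\text{Forward SDE:}\quad & \mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w} \\
\text{VE:}\quad & \mathbf{f}(\mathbf{x}, t) = \mathbf{0}, \qquad g(t) = \sqrt{\frac{\mathrm{d}\left[\sigma^2(t)\right]}{\mathrm{d}t}} \\
\text{VP:}\quad & \mathbf{f}(\mathbf{x}, t) = -\tfrac{1}{2}\beta(t)\,\mathbf{x}, \qquad g(t) = \sqrt{\beta(t)} \\
\text{Sub-VP:}\quad & \mathbf{f}(\mathbf{x}, t) = -\tfrac{1}{2}\beta(t)\,\mathbf{x}, \qquad g(t) = \sqrt{\beta(t)\left(1 - e^{-2\int_0^t \beta(s)\,\mathrm{d}s}\right)} \\
\text{Reverse SDE:}\quad & \mathrm{d}\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}
\end{aligned}
```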

### Sampling Methods

Four distinct numerical solvers for the reverse diffusion process:

| Sampler                    | Type          | Characteristics                                      | Best Use Case                              |
| -------------------------- | ------------- | ---------------------------------------------------- | ------------------------------------------ |
| **Euler-Maruyama**         | Stochastic    | Simple, fast, stochastic trajectories                | Quick generation with acceptable quality   |
| **Predictor-Corrector**    | Stochastic    | Numerical predictor step plus Langevin corrector     | High-quality generation when time permits  |
| **Probability Flow ODE**   | Deterministic | Same marginals as the reverse SDE, no noise injected | Consistent outputs, good for interpolation |
| **Exponential Integrator** | Stochastic    | Integrates the linear drift exactly for stability    | Complex dynamics with fewer steps          |

Each sampler trades generation speed against output quality differently; step counts range from as few as 50 for fast generation to 1000 or more for maximum fidelity.
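
To make the difference between the first and third rows concrete, here is a minimal NumPy sketch of Euler-Maruyama steps on the reverse-time SDE and Euler steps on the probability flow ODE. It assumes a VP SDE with a linear β(t) between 0.1 and 20 and uses the exact score of standard-normal toy data in place of a trained network; all function names are hypothetical, not the package's API.

```python
import numpy as np

# Assumed VP SDE with a linear beta(t) on t in [0, 1]; values are illustrative.
BETA_MIN, BETA_MAX = 0.1, 20.0


def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)


def drift(x, t):
    # Forward drift f(x, t) = -0.5 * beta(t) * x
    return -0.5 * beta(t) * x


def diffusion(t):
    # Forward diffusion coefficient g(t) = sqrt(beta(t))
    return np.sqrt(beta(t))


def toy_score(x, t):
    # Exact score when the data is N(0, I): every VP marginal is then N(0, I).
    # A trained score network would take the place of this function.
    return -x


def euler_maruyama(x, score, n_steps=500, rng=None):
    """Integrate the reverse-time SDE from t = 1 down to t = 0."""
    rng = np.random.default_rng() if rng is None else rng
    dt = 1.0 / n_steps
    for i in range(n_steps, 0, -1):
        t = i * dt
        g = diffusion(t)
        reverse_drift = drift(x, t) - g**2 * score(x, t)
        # Step backward in time and inject fresh Gaussian noise
        x = x - reverse_drift * dt + g * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x


def probability_flow_ode(x, score, n_steps=500):
    """Euler steps on the probability flow ODE: same marginals, no noise."""
    dt = 1.0 / n_steps
    for i in range(n_steps, 0, -1):
        t = i * dt
        g = diffusion(t)
        x = x - (drift(x, t) - 0.5 * g**2 * score(x, t)) * dt
    return x


rng = np.random.default_rng(0)
x_T = rng.standard_normal(10_000)  # start from the prior N(0, I)
samples = euler_maruyama(x_T, toy_score, rng=rng)
print(samples.mean(), samples.std())  # close to 0 and 1 respectively
```

With the toy score, the stochastic sampler returns samples with mean near 0 and standard deviation near 1, while the ODE maps a given starting point to the same output every time (here it leaves the input unchanged, because the ODE drift cancels exactly for this toy distribution).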

### Noise Scheduling

Two scheduling strategies control how noise is distributed across diffusion timesteps:

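- **Linear Schedule**: The noise rate increases linearly across timesteps; simple and predictable, but it destroys image information relatively early
- **Cosine Schedule**: The remaining signal level follows a squared-cosine curve, so noise is added slowly near the start and end and fastest in the middle timesteps, preserving early structure and late details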

The scheduler directly impacts training stability and generation quality, with cosine scheduling generally producing superior results for natural images.
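
The difference between the two schedules is easiest to see in the cumulative signal level ᾱ_t, the share of the original signal variance that remains at step t. The sketch below uses the discrete DDPM parameterization with defaults taken from the DDPM and improved-DDPM papers (T = 1000, β from 1e-4 to 0.02, offset s = 0.008, β clipped at 0.999); these values are assumptions, not necessarily the package's settings.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    # DDPM-style linear schedule: beta_t grows linearly, alpha_bar_t = prod(1 - beta_s)
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def cosine_alpha_bar(T=1000, s=0.008, max_beta=0.999):
    # Cosine schedule: alpha_bar follows a squared cosine in t; s is a small offset
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    # Recover per-step betas and clip them to avoid a singular final step
    betas = np.clip(1.0 - alpha_bar[1:] / alpha_bar[:-1], 0.0, max_beta)
    return np.cumprod(1.0 - betas)


lin_ab, cos_ab = linear_alpha_bar(), cosine_alpha_bar()
print(f"t=500: linear {lin_ab[500]:.3f}, cosine {cos_ab[500]:.3f}")
```

At the midpoint, the linear schedule has already reduced ᾱ_t to roughly 0.08, while the cosine schedule keeps it near 0.49, so more of the cosine trajectory is spent on partially visible structure.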

![Results of different combinations of samplers, diffusion variants and noise schedules.](/assets/articles/image-diffusion/mnist.webp)

## Controllable Generation

### Grayscale Colorization

Transforms grayscale images into plausible color versions without model retraining:

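```python
from image_gen import GenerativeModel  # assumed import path

model = GenerativeModel.load("pretrained_model.pth")
gray_image = load_grayscale_image()  # placeholder: load your own grayscale image here
colorized = model.colorize(gray_image, n_steps=500)
```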

The colorization process works by:

1. Converting RGB → grayscale by averaging channels
2. Guiding the reverse diffusion to preserve luminance structure
3. Generating chromatic information conditioned on grayscale content

This leverages the model's learned distribution of natural image colors without requiring paired training data.
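
One common way to realize step 2 is to overwrite the channel mean of the current sample with a noised copy of the grayscale input after every reverse step, leaving the model free to choose each channel's offset from that mean (the color). The sketch below assumes a VP-style forward process x_t = √ᾱ_t·x_0 + √(1 - ᾱ_t)·ε and a channel-first (C, H, W) layout; `project_to_gray` and `colorize_step` are hypothetical names, and `model.colorize` may implement the guidance differently.

```python
import numpy as np


def project_to_gray(x, gray):
    """Replace the per-pixel channel mean of x (C, H, W) with gray (H, W).

    The offsets of each channel from the mean (the color information) are kept.
    """
    return x - x.mean(axis=0, keepdims=True) + gray[None, :, :]


def colorize_step(x_t, gray, alpha_bar_t, rng):
    # Noise the grayscale target to the current level. The mean of C independent
    # N(0, 1) channel noises has standard deviation 1 / sqrt(C).
    c = x_t.shape[0]
    noise = rng.standard_normal(gray.shape) / np.sqrt(c)
    gray_t = np.sqrt(alpha_bar_t) * gray + np.sqrt(1.0 - alpha_bar_t) * noise
    return project_to_gray(x_t, gray_t)


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))      # current RGB sample
gray = rng.uniform(-1, 1, size=(8, 8))  # grayscale target
x = colorize_step(x, gray, alpha_bar_t=0.5, rng=rng)
```

In a full sampler this projection would run after each reverse step, with ᾱ_t taken at the new time; at the final step (ᾱ_t = 1) the output's channel average matches the input exactly.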

### Region Imputation (Inpainting)

Fills missing or masked regions coherently with surrounding content:

```python
# Create binary mask (1 = fill, 0 = preserve)
mask = create_mask(image, region=(x, y, width, height))
completed = model.imputation(image, mask, n_steps=500)
```

The imputation algorithm:

- Preserves unmasked regions throughout the reverse process
- Generates masked regions conditioned on visible boundaries
- Maintains global coherence through diffusion dynamics

Particularly effective for:

- Removing unwanted objects
- Completing partially occluded scenes
- Restoring damaged or corrupted image regions
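
The first two points of the imputation algorithm can be sketched as a replacement step applied after every reverse update: known pixels are reset to a noised copy of the original at the current noise level, and only masked pixels keep the model's prediction. This is a minimal sketch assuming a VP-style forward process and a mask with the same shape as the image; `impute_step` is a hypothetical name, and `model.imputation` may add refinements such as resampling.

```python
import numpy as np


def impute_step(x_t, image, mask, alpha_bar_t, rng):
    """Keep the model's sample where mask == 1 and the (noised) original elsewhere."""
    # Bring the known pixels to the current noise level (VP-style forward marginal)
    noise = rng.standard_normal(image.shape)
    known_t = np.sqrt(alpha_bar_t) * image + np.sqrt(1.0 - alpha_bar_t) * noise
    return mask * x_t + (1 - mask) * known_t


rng = np.random.default_rng(0)
image = rng.uniform(-1, 1, size=(3, 8, 8))
mask = np.zeros_like(image)
mask[:, 2:6, 2:6] = 1.0  # fill a 4x4 square, preserve everything else
x_t = rng.standard_normal(image.shape)
x_t = impute_step(x_t, image, mask, alpha_bar_t=0.5, rng=rng)
```

Because the preserved pixels are re-noised rather than pasted in clean, the model sees a consistent noise level everywhere, which is what lets the filled region blend with its boundary.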

### Class-Conditioned Generation

Generates images belonging to specific categories:

```python
# Train with class labels
model.train(dataset, epochs=100, use_labels=True)

# Generate images of class 8 ("ship" in CIFAR-10)
ships = model.generate(num_samples=4, class_label=8)
```

Conditioning is implemented through:

- Class embeddings injected at multiple network layers
- Classifier-free guidance for stronger conditioning
- Optional guidance strength parameter for quality/diversity trade-off
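
Classifier-free guidance is commonly implemented in two parts: during training the class label is randomly dropped so one network learns both conditional and unconditional scores, and during sampling the two scores are combined with a guidance strength w. The sketch below shows both parts; `NULL_LABEL`, `drop_labels`, and `guided_score` are hypothetical names, and the package may use a different scaling convention for w.

```python
import numpy as np

NULL_LABEL = -1  # hypothetical "no class" token for the unconditional branch


def drop_labels(labels, p_uncond, rng):
    # Training: randomly swap labels for the null token so one network learns
    # both the conditional and the unconditional score.
    drop = rng.random(labels.shape) < p_uncond
    return np.where(drop, NULL_LABEL, labels)


def guided_score(score_fn, x, t, label, w):
    # Sampling: w = 0 is unconditional, w = 1 is plain conditional,
    # w > 1 pushes samples harder toward the class (less diversity).
    s_cond = score_fn(x, t, label)
    s_uncond = score_fn(x, t, NULL_LABEL)
    return s_uncond + w * (s_cond - s_uncond)
```

The guidance strength parameter from the list above corresponds to w here: raising it trades sample diversity for stronger adherence to the requested class.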

## Architecture & Design

### Modular Component System

The package follows a plugin architecture where each component type (diffusers, samplers, schedulers) inherits from abstract base classes:

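```python
from abc import ABC, abstractmethod


# All diffusers implement this interface
class BaseDiffusion(ABC):
    @abstractmethod
    def drift(self, x, t): ...  # drift term f(x, t) of the forward SDE

    @abstractmethod
    def diffusion(self, t): ...  # diffusion coefficient g(t)

    @abstractmethod
    def sde(self, x, t): ...  # forward SDE built from drift and diffusion
```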

This allows users to:

- Mix and match any combination of components
- Create custom implementations by inheriting from base classes
- Swap components without modifying existing code

### Score-Based Neural Network

The core denoising model is a U-Net architecture with:

- **Time embedding**: Sinusoidal positional encoding for timestep conditioning
- **Skip connections**: Preserving multi-scale features for detail reconstruction
- **Group normalization**: Stable training across batch sizes
- **Attention layers**: Capturing long-range dependencies (optional)
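
The sinusoidal time embedding maps each timestep to a vector of sines and cosines at geometrically spaced frequencies, as in Transformer positional encodings. The sketch below assumes an even embedding size and a maximum period of 10000; `timestep_embedding` is a hypothetical name, and the package's U-Net may order or scale the frequencies differently.

```python
import numpy as np


def timestep_embedding(t, dim, max_period=10_000.0):
    """Sinusoidal embedding of timesteps t (shape [N]) into vectors of shape [N, dim]."""
    assert dim % 2 == 0, "dim is assumed to be even"
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to about 1 / max_period
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = np.asarray(t, dtype=float)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)


emb = timestep_embedding(np.array([0, 10, 500]), dim=64)  # shape (3, 64)
```

The resulting vector is typically passed through a small MLP and added to the feature maps of each residual block.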

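The network learns the score function ∇ₓ log pₜ(x), the gradient of the log density of the noised data, rather than directly predicting the added noise. For a Gaussian perturbation with standard deviation σ(t), the conditional score equals -ε/σ(t) for added noise ε, so the two targets differ only by a scale factor, and the score is exactly the quantity that appears in the reverse SDE.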
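
How the score relates to the added noise can be checked numerically. The sketch below assumes the standard denoising score matching objective for a Gaussian perturbation x_t = m·x_0 + σ·ε with weighting λ(t) = σ², which turns the loss into a noise-prediction error; `dsm_loss` is a hypothetical helper, not the package's training loop.

```python
import numpy as np


def dsm_loss(score_fn, x0, t, mean_coef, std, rng):
    """Denoising score matching loss for x_t = mean_coef * x0 + std * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = mean_coef * x0 + std * eps
    # Score of the Gaussian perturbation kernel: grad log N(x_t; mean_coef * x0, std^2 I)
    target = -eps / std
    # Weighting lambda(t) = std^2 makes this the MSE between eps and -std * score
    return float(np.mean(std**2 * (score_fn(x_t, t) - target) ** 2))


rng = np.random.default_rng(0)
x0 = rng.standard_normal(20_000)  # toy data from N(0, 1)
m, s = 0.6, 0.8  # m**2 + s**2 = 1, so every noised marginal is also N(0, 1)
print(dsm_loss(lambda x, t: np.zeros_like(x), x0, 0.5, m, s, rng))  # about 1.0
print(dsm_loss(lambda x, t: -x, x0, 0.5, m, s, rng))  # about m**2 = 0.36
```

Even the exact marginal score leaves a nonzero loss, because the target depends on the particular noise sample ε, which the network never sees; what matters is that the exact score attains the lowest achievable value.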

### Secure Serialization System

A unique feature is the `CustomClassWrapper` for loading models with user-defined components:

```python
# Save model with custom diffuser
model.save("model.pth", include_classes=True)

# Load on different machine without copying source code
loaded_model = GenerativeModel.load("model.pth",
                                     allow_custom=True,
                                     confirm_execution=True)
```

The system:

- Serializes both model weights and class definitions
- Requires explicit user confirmation before executing custom code
- Runs user code in a restricted environment with limited imports
- Mitigates common security risks while still enabling collaboration

## Evaluation Metrics

Three standard metrics for quantitative assessment:

### Bits Per Dimension (BPD)

Measures how well the model compresses data:

- Lower values indicate better density estimation
- Comparable across different image sizes
- Computed via Monte Carlo estimation of the evidence lower bound
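
The conversion from a likelihood estimate to bits per dimension is a single division: the negative log-likelihood in nats is divided by the number of dimensions and by ln 2. The sketch below assumes per-image -ELBO estimates are already available from the model and ignores the dequantization and rescaling corrections that depend on how pixel values are preprocessed; the function names are hypothetical.

```python
import math

import numpy as np


def bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood in nats into bits per dimension."""
    return nll_nats / (num_dims * math.log(2))


def mc_bits_per_dim(neg_elbo_samples, num_dims):
    # Average several stochastic estimates of -ELBO. Since -ELBO upper-bounds the
    # NLL, the result estimates an upper bound on the true bits per dimension.
    return bits_per_dim(float(np.mean(neg_elbo_samples)), num_dims)


# A 32x32 RGB image has 3 * 32 * 32 = 3072 dimensions
print(round(bits_per_dim(7000.0, 3 * 32 * 32), 2))  # 3.29
```

Dividing by the number of dimensions is what makes the value comparable across image sizes.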

### Fréchet Inception Distance (FID)

Compares feature distributions between real and generated images:

- Uses Inception v3 features for perceptual similarity
- Lower values indicate more realistic generations
- Industry-standard metric for generative model comparison
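
FID is the Fréchet distance between two Gaussians fitted to the feature vectors: ‖μ_r - μ_g‖² + Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^{1/2}). The sketch below computes it with NumPy on precomputed features (for FID proper, the 2048-dimensional Inception v3 pool features); it illustrates the formula and is not the package's `FID` class, and `frechet_distance` is a hypothetical name.

```python
import numpy as np


def _sqrtm_psd(mat):
    # Matrix square root of a symmetric positive semi-definite matrix via eigh
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two (N, D) feature arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    root_r = _sqrtm_psd(cov_r)
    # Tr((S_r S_g)^{1/2}) equals the trace of the square root of the symmetric
    # PSD matrix S_r^{1/2} S_g S_r^{1/2}, which avoids a non-symmetric sqrtm.
    eigs = np.linalg.eigvalsh(root_r @ cov_g @ root_r)
    tr_covmean = np.sqrt(np.clip(eigs, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
feats = rng.standard_normal((500, 8))
print(frechet_distance(feats, feats + 2.0))  # close to 8 * 2**2 = 32
```

Shifting every feature by the same constant changes only the mean term, which is why the example above recovers the squared shift summed over dimensions.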

### Inception Score (IS)

Evaluates both quality and diversity:

- Computes the exponential of the average KL divergence between each image's predicted class distribution p(y|x) and the marginal class distribution p(y)
- Higher values indicate clearer, more diverse samples
- Sensitive to class distribution imbalance
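
The Inception Score is exp(E_x[KL(p(y|x) ‖ p(y))]), usually averaged over several splits of the samples to obtain a mean and standard deviation. The sketch below takes a matrix of class probabilities (one row per generated image, normally from Inception v3) and assumes 10 splits; `inception_score` is a hypothetical function, not the package's `InceptionScore` class.

```python
import numpy as np


def inception_score(probs, n_splits=10, eps=1e-12):
    """Inception Score from an (N, K) array of predicted class probabilities."""
    scores = []
    for part in np.array_split(probs, n_splits):
        p_y = part.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
        # KL(p(y|x) || p(y)) for every image in the split
        kl = np.sum(part * (np.log(part + eps) - np.log(p_y + eps)), axis=1)
        scores.append(np.exp(kl.mean()))
    return float(np.mean(scores)), float(np.std(scores))


probs = np.tile(np.eye(10), (100, 1))  # 1000 confident predictions, 100 per class
print(inception_score(probs))  # mean close to 10.0, std close to 0.0
```

Confident predictions spread evenly over K classes give the maximum score K, while identical predictions for every image give 1, which is why the score rewards both clarity and diversity.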

Usage example:

```python
from image_gen.metrics import FID, InceptionScore

fid_metric = FID()
fid_score = fid_metric(real_images, generated_images)

is_metric = InceptionScore()
is_score, is_std = is_metric(generated_images)
```

## Interactive Dashboard

A Streamlit web application provides GUI access to all features:

- **Model Configuration**: Dropdowns for diffuser/sampler/scheduler selection
- **Parameter Tuning**: Sliders for steps, guidance strength, noise levels
- **Live Generation**: Real-time image generation with progress bars
- **Comparison Mode**: Side-by-side visualization of different configurations
- **Export Options**: Download generated images or save model checkpoints

Available at:

- **Local**: Run `streamlit run dashboard.py`
- **Online** (CPU-only): <https://image-gen-htd.streamlit.app/>

The dashboard includes:

- Internationalization (English/Spanish)
- Dark/light theme support
- Responsive layout for mobile devices
- Preset configurations for common use cases

## Development Philosophy

### Code Quality Standards

- **Style**: Google Python Style Guide compliance
- **Type Hints**: Full type annotation for IDE support and type checking
- **Documentation**: Google-style docstrings with auto-generated API docs

### Documentation Approach

Multi-layered documentation strategy:

1. **API Reference**: Auto-generated from docstrings via MkDocs
2. **Tutorials**: Jupyter notebooks with executable examples
3. **Theory**: Mathematical foundations and algorithm explanations
4. **Examples**: Real-world use cases with full code

Published at:

- **MkDocs**: <https://hectortablero.github.io/image-gen/>
- **DeepWiki**: <https://deepwiki.com/HectorTablero/image-gen>

### Extensibility

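Users can extend the package by subclassing the base classes, for example:

```python
from image_gen import GenerativeModel  # assumed import path
from image_gen.diffusion.base import BaseDiffusion


class CustomDiffusion(BaseDiffusion):
    def sigma(self, t):
        # Illustrative noise scale; any schedule can be used here
        return 0.1 + 0.9 * t

    def drift(self, x, t):
        # Custom drift function f(x, t)
        return -0.5 * x * self.sigma(t) ** 2

    def diffusion(self, t):
        # Custom diffusion coefficient g(t)
        return self.sigma(t)

    def sde(self, x, t):
        # Required by the abstract base class; the (drift, diffusion) return is assumed
        return self.drift(x, t), self.diffusion(t)


# Use custom diffuser with existing samplers
model = GenerativeModel(diffusion=CustomDiffusion())
```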

## Practical Applications

### Research

- Experimenting with novel diffusion formulations
- Comparing sampling strategies systematically
- Developing custom conditioning mechanisms

### Education

- Understanding SDE-based generative models
- Visualizing diffusion dynamics through the dashboard
- Hands-on experimentation without infrastructure setup

### Creative Tools

- Generating synthetic training data
- Creating variations of existing images
- Prototyping image editing workflows

## Installation & Requirements

```bash
pip install image-gen-diffusion
```

**Hardware Recommendations**:

- **Recommended**: CUDA GPU with 6+ GB VRAM
- **Optimal**: Modern GPU (RTX 3060+), 16+ GB system RAM

## License & Attribution

Released under the MIT License: free for academic, commercial, and personal use with attribution, with no warranty provided. Users are responsible for ensuring that generated content complies with applicable laws and does not infringe third-party rights.
