DiffSketcher: Text Guided Vector Sketch Synthesis

DiffSketcher generates vector sketches from text prompts using a latent diffusion model. Given a natural language description, it produces a sketch as an SVG.

Model Description

DiffSketcher uses Stable Diffusion to guide the generation of vector graphics. It iteratively optimizes a set of SVG paths, rendered with the differentiable rasterizer DiffVG, so that the sketch matches the semantic content of the input text while keeping a hand-drawn look.

Usage

Direct API Call

import requests

API_URL = "https://api-inference.huggingface.co/models/jree423/diffsketcher"
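headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # replace with your Hugging Face access token

def query(payload):
    """POST the payload to the endpoint and return the decoded JSON response."""
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    return response.json()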

output = query({
    "inputs": "a beautiful mountain landscape",
    "parameters": {
        "num_paths": 96,
        "num_iter": 500,
        "guidance_scale": 7.5,
        "width": 224,
        "height": 224,
        "seed": 42
    }
})

Using the Inference Client

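import json

from huggingface_hub import InferenceClient

client = InferenceClient("jree423/diffsketcher")
# post() returns the raw response body as bytes. It has been deprecated in
# newer huggingface_hub releases, so this call may require an older version.
result = client.post(
    json={
        "inputs": "a cat sitting on a windowsill",
        "parameters": {
            "num_paths": 128,
            "guidance_scale": 8.0
        }
    }
)
output = json.loads(result)  # decode the JSON body into a dict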

Parameters

num_paths (int, default: 96): Number of SVG paths to generate. More paths create more detailed sketches.
num_iter (int, default: 500): Number of optimization iterations. More iterations improve quality but take longer.
guidance_scale (float, default: 7.5): Guidance strength. Higher values make the sketch follow the text prompt more closely.
width (int, default: 224): Output SVG width in pixels.
height (int, default: 224): Output SVG height in pixels.
seed (int, default: 42): Random seed for reproducible results.
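
The defaults listed above can be collected in one place so that a request only spells out the values it changes. The following is a minimal sketch, not part of the model's API: DEFAULTS and build_payload are hypothetical names, and the positivity check is an assumption about what the endpoint accepts.

```python
# Defaults copied from the parameter list above.
DEFAULTS = {
    "num_paths": 96,
    "num_iter": 500,
    "guidance_scale": 7.5,
    "width": 224,
    "height": 224,
    "seed": 42,
}


def build_payload(prompt, **overrides):
    """Return a request payload: the prompt plus defaults merged with overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    # Assumption: counts and dimensions must be positive integers.
    for name in ("num_paths", "num_iter", "width", "height"):
        if not isinstance(params[name], int) or params[name] <= 0:
            raise ValueError(f"{name} must be a positive integer")
    return {"inputs": prompt, "parameters": params}


payload = build_payload("a cat sitting on a windowsill", num_paths=128, guidance_scale=8.0)
```

The resulting payload has the same shape as the dictionaries passed in the Usage section, with any parameter that was not overridden filled in from the defaults.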

Output Format

The model returns a JSON object containing:

svg: The generated SVG content as a string
svg_base64: Base64-encoded SVG for easy transmission
prompt: The input text prompt
parameters: The parameters used for generation
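
A response with the fields above might be turned into an .svg file as sketched below. The helper names extract_svg and save_svg are hypothetical, the stand-in dictionary only imitates a response, and the code assumes that svg_base64 holds the UTF-8 bytes of the SVG.

```python
import base64


def extract_svg(output):
    """Return the SVG text, preferring 'svg' and falling back to 'svg_base64'."""
    svg = output.get("svg")
    if svg:
        return svg
    # Assumption: the Base64 field encodes the SVG as UTF-8 bytes.
    return base64.b64decode(output["svg_base64"]).decode("utf-8")


def save_svg(output, path):
    """Write the SVG from a response dictionary to the given file path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(extract_svg(output))


# Stand-in for a real response, built locally for illustration only.
sample_svg = '<svg xmlns="http://www.w3.org/2000/svg" width="224" height="224"></svg>'
stand_in = {
    "svg_base64": base64.b64encode(sample_svg.encode("utf-8")).decode("ascii"),
    "prompt": "a red apple",
    "parameters": {"width": 224, "height": 224},
}
decoded = extract_svg(stand_in)
```

With a real response, save_svg(output, "sketch.svg") would write the sketch to disk.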

Examples

Simple Objects

"a red apple"
"a flying bird"
"a vintage car"

Complex Scenes

"a mountain landscape with trees"
"a city skyline at sunset"
"a garden with flowers and butterflies"

Artistic Styles

"a portrait in the style of Van Gogh"
"minimalist line drawing of a face"
"abstract geometric patterns"

Technical Details

Base Model: Stable Diffusion 2.1
Framework: PyTorch + Diffusers
Vector Rendering: DiffVG (differentiable vector graphics)
Optimization: Adam optimizer with custom learning rates for different SVG parameters
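
To illustrate what "custom learning rates for different SVG parameters" means, here is a dependency-free sketch of Adam with one learning rate per parameter group. It is a toy illustration, not the DiffSketcher training code: the group names, learning rates, and quadratic loss are assumptions chosen for the example, and the actual implementation would use PyTorch with gradients flowing through DiffVG.

```python
import math


def make_state(groups):
    """Create zeroed first and second moment buffers for every parameter."""
    return {
        "t": 0,
        "m": {name: [0.0] * len(g["values"]) for name, g in groups.items()},
        "v": {name: [0.0] * len(g["values"]) for name, g in groups.items()},
    }


def adam_step(groups, grads, state, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place, using each group's own learning rate."""
    state["t"] += 1
    t = state["t"]
    for name, group in groups.items():
        m, v = state["m"][name], state["v"][name]
        for i, g in enumerate(grads[name]):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1 - beta2 ** t)  # bias-corrected second moment
            group["values"][i] -= group["lr"] * m_hat / (math.sqrt(v_hat) + eps)


def gradients(groups, targets):
    """Gradient of the toy loss sum((value - target)^2) for each group."""
    return {
        name: [2 * (p - q) for p, q in zip(g["values"], targets[name])]
        for name, g in groups.items()
    }


# Hypothetical groups: control points move in large steps, stroke widths in small ones.
groups = {
    "points": {"values": [10.0, -4.0], "lr": 1.0},
    "widths": {"values": [3.0], "lr": 0.1},
}
targets = {"points": [0.0, 0.0], "widths": [1.5]}

state = make_state(groups)
adam_step(groups, gradients(groups, targets), state)
```

On the first step, Adam moves each parameter by roughly its group's learning rate regardless of gradient magnitude, which is why separate rates let coordinates and stroke widths change at different scales.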

Citation

@inproceedings{xing2023diffsketcher,
  title={DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models},
  author={Xing, XiMing and others},
  booktitle={NeurIPS},
  year={2023}
}

License

This model is released under the MIT License.