metadata

title: RadExtract
emoji: 🗂️
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
header: mini
app_port: 7870
tags:
  - medical
  - nlp
  - radiology
  - langextract
  - gemini
  - structured-data

RadExtract: Radiology Report Structuring Demo

A demonstration application, powered by LangExtract, that structures radiology reports using Gemini models. It transforms unstructured radiology text into organized, interactive segments annotated with clinical significance.

Try the Demo

Launch RadExtract Demo

Transform unstructured radiology reports into structured data with highlighted findings that are precisely mapped back to the original source text.

Key Features

Structured Output: Organizes reports into anatomical sections with clinical significance
Interactive Highlighting: Click any finding to see its exact source in the original text
Clinical Significance: Annotates findings as minor, significant, or grounding
Character-Level Mapping: Precise attribution back to source text
Multi-Model Support: Gemini 2.5 Flash (fast) and Pro (comprehensive)
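Each finding's `intervals` list holds character offsets into the submitted report. The offsets in the Response Format example (`startPos` 10, `endPos` 32 for "Normal heart and lungs") are consistent with a half-open range, so a Python slice recovers the exact source text. This is a minimal sketch under that assumption; `source_spans` is a hypothetical helper, not part of RadExtract.

```python
from typing import Dict, List


def source_spans(report: str, intervals: List[Dict[str, int]]) -> List[str]:
    """Return the original report text covered by each interval.

    Assumes half-open [startPos, endPos) offsets (Python slice semantics)
    and raises ValueError for offsets outside the report.
    """
    spans = []
    for interval in intervals:
        start, end = interval["startPos"], interval["endPos"]
        if not 0 <= start <= end <= len(report):
            raise ValueError(f"interval {start}-{end} is outside the report")
        spans.append(report[start:end])
    return spans


report = "FINDINGS: Normal heart and lungs. IMPRESSION: Normal study."
print(source_spans(report, [{"startPos": 10, "endPos": 32}]))
# ['Normal heart and lungs']
```

Validating the bounds up front makes a mismatch between the report and the returned offsets fail loudly instead of silently highlighting the wrong text.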

Quick Start

Setup

git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
cp env.list.example env.list
# Edit env.list and set KEY=your_gemini_api_key_here

Local Development

source venv/bin/activate
export KEY=your_gemini_api_key_here
python app.py

Access at: http://localhost:7870
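Local runs read `KEY` from the shell environment, while the Docker command passes `env.list` through `--env-file`. If you would rather reuse `env.list` locally than export the key by hand, a minimal sketch like the following could load it. `load_env_list` and `apply_env_list` are hypothetical helpers, not part of RadExtract, and the parser is a simplified approximation of the env-file format (`KEY=VALUE` lines, `#` comments), not Docker's exact rules.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_list(path: str = "env.list") -> Dict[str, str]:
    """Parse KEY=VALUE lines from an env.list-style file.

    Simplified: blank lines, '#' comments, and lines without '=' are
    skipped; each line is trimmed, but quotes are NOT stripped from values.
    """
    values: Dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        values[key.strip()] = value
    return values


def apply_env_list(path: str = "env.list") -> None:
    """Copy parsed values into os.environ without overriding existing ones."""
    for name, value in load_env_list(path).items():
        os.environ.setdefault(name, value)
```

Because `apply_env_list()` uses `os.environ.setdefault`, a `KEY` already exported in the shell takes precedence over the value in the file.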

API Usage

Example Request

curl -X POST \
  -H 'X-Model-ID: gemini-2.5-flash' \
  -H 'X-Use-Cache: true' \
  -d 'FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.' \
  http://localhost:7870/predict
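For scripted use, the same request can be sent from Python with only the standard library's `urllib.request`. The endpoint, headers, and raw-text body mirror the curl example; the helper names `build_predict_request` and `predict`, and sending `"false"` to disable caching, are assumptions for illustration (the example above only shows `true`).

```python
import json
import urllib.request
from typing import Any, Dict

PREDICT_URL = "http://localhost:7870/predict"


def build_predict_request(
    report: str,
    model_id: str = "gemini-2.5-flash",
    use_cache: bool = True,
    url: str = PREDICT_URL,
) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example."""
    # As with `curl -d`, the raw report text is the request body. With no
    # explicit Content-Type, urllib sends application/x-www-form-urlencoded,
    # which is also curl's default for -d.
    return urllib.request.Request(
        url,
        data=report.encode("utf-8"),
        headers={
            "X-Model-ID": model_id,
            # Assumption: "false" disables caching; the docs only show "true".
            "X-Use-Cache": "true" if use_cache else "false",
        },
        method="POST",
    )


def predict(report: str, **kwargs: Any) -> Dict[str, Any]:
    """Send the request to a running server and decode its JSON reply."""
    request = build_predict_request(report, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

With the app running locally, `predict("FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.")` should return a dict shaped like the documented Response Format.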

Response Format

{
  "segments": [{
    "type": "body",
    "label": "Chest", 
    "content": "Normal heart and lungs",
    "intervals": [{"startPos": 10, "endPos": 32}],
    "significance": "minor"
  }],
  "text": "Chest:\n- Normal heart and lungs",
  "annotated_document_json": {...}
}
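The shipped frontend is vanilla JavaScript, but the idea behind interactive highlighting can be sketched in a few lines of Python: walk the returned intervals in order and wrap each covered span of the original report. The `[text|significance]` marker format and the `highlight` function are illustrative assumptions, and overlapping intervals are simply skipped.

```python
from typing import Any, Dict, List


def highlight(report: str, segments: List[Dict[str, Any]]) -> str:
    """Mark each interval as [source text|significance] in the report."""
    spans = sorted(
        (iv["startPos"], iv["endPos"], seg.get("significance", "unknown"))
        for seg in segments
        for iv in seg.get("intervals", [])
    )
    pieces: List[str] = []
    cursor = 0
    for start, end, significance in spans:
        if start < cursor:
            continue  # overlaps an earlier span; skipped in this sketch
        pieces.append(report[cursor:start])  # unhighlighted text before span
        pieces.append(f"[{report[start:end]}|{significance}]")
        cursor = end
    pieces.append(report[cursor:])  # trailing text after the last span
    return "".join(pieces)


report = "FINDINGS: Normal heart and lungs. IMPRESSION: Normal study."
segments = [{
    "type": "body",
    "label": "Chest",
    "content": "Normal heart and lungs",
    "intervals": [{"startPos": 10, "endPos": 32}],
    "significance": "minor",
}]
print(highlight(report, segments))
# FINDINGS: [Normal heart and lungs|minor]. IMPRESSION: Normal study.
```

Sorting by `startPos` lets a single left-to-right pass rebuild the report, so every character outside a highlighted span is copied through unchanged.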

Architecture

Backend: Flask on Python 3.10+, with type hints checked by mypy
NLP Engine: LangExtract for structured extraction
AI Models: Google Gemini 2.5 (Flash/Pro)
Frontend: Vanilla JavaScript with interactive UI
Deployment: Docker + Hugging Face Spaces
Package Details: See pyproject.toml for dependencies, metadata, and tooling

Project Structure

radextract/
├── app.py                 # Flask API endpoints
├── structure_report.py    # Core structuring logic
├── sanitize.py            # Text preprocessing & normalization
├── prompt_instruction.py  # LangExtract prompt
├── cache_manager.py       # Response caching
├── static/                # Frontend assets
└── templates/             # HTML templates
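`cache_manager.py` handles response caching, which the `X-Use-Cache` header toggles. Its actual design is not described here; the following is a minimal in-memory sketch assuming a cached response is keyed by model ID plus the exact report text. `ResponseCache` and `cached_predict` are hypothetical names, and `run_model` stands in for the real structuring call.

```python
import hashlib
from typing import Any, Callable, Dict, Optional

Response = Dict[str, Any]


class ResponseCache:
    """In-memory cache of /predict responses keyed by model and report."""

    def __init__(self) -> None:
        self._store: Dict[str, Response] = {}

    @staticmethod
    def key(model_id: str, report: str) -> str:
        # The text is hashed unnormalized: cached intervals are character
        # offsets, so a response may only be reused for identical input.
        digest = hashlib.sha256()
        digest.update(model_id.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") != ("a", "bc")
        digest.update(report.encode("utf-8"))
        return digest.hexdigest()

    def get(self, model_id: str, report: str) -> Optional[Response]:
        return self._store.get(self.key(model_id, report))

    def put(self, model_id: str, report: str, response: Response) -> None:
        self._store[self.key(model_id, report)] = response


def cached_predict(
    cache: ResponseCache,
    model_id: str,
    report: str,
    use_cache: bool,
    run_model: Callable[[str, str], Response],
) -> Response:
    """Return a cached response when allowed, otherwise call the model."""
    if use_cache:
        hit = cache.get(model_id, report)
        if hit is not None:
            return hit
    result = run_model(model_id, report)
    if use_cache:  # assumption: only store results when caching is enabled
        cache.put(model_id, report, result)
    return result
```

Keying on the model ID keeps Flash and Pro results separate; a persistent store (files or a database) would be needed for the cache to survive restarts.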

Development

Code Quality

# Format code
pyink . && isort .

# Type checking
mypy . --ignore-missing-imports

# Run tests
pytest

Docker

# Build and run
docker build -t radextract .
docker run -p 7870:7870 --env-file env.list radextract

License

Apache License 2.0 - see LICENSE for details.

Related Projects

LangExtract: the core extraction library that RadExtract builds on

Built for the medical AI community | Hosted on Hugging Face Spaces

Disclaimer

This is not an officially supported Google product. If you use RadExtract or LangExtract in production or publications, please cite accordingly and acknowledge usage. Use is subject to the Apache 2.0 License. For health-related applications, use of LangExtract is also subject to the Health AI Developer Foundations Terms of Use.