title: RadExtract
emoji: 🗂️
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
header: mini
app_port: 7870
tags:
  - medical
  - nlp
  - radiology
  - langextract
  - gemini
  - structured-data

RadExtract: Radiology Report Structuring Demo


A demonstration application powered by LangExtract that structures radiology reports using Gemini models. Transform unstructured radiology text into organized, interactive segments with clinical significance annotations.

Try the Demo

Launch RadExtract Demo

Transform unstructured radiology reports into structured data with highlighted findings that are precisely mapped back to the original source text.

Key Features

  • Structured Output: Organizes reports into anatomical sections with clinical significance
  • Interactive Highlighting: Click any finding to see its exact source in the original text
  • Clinical Significance: Annotates findings as normal, minor, or significant
  • Character-Level Mapping: Precise attribution back to source text
  • Multi-Model Support: Gemini 2.5 Flash (fast) and Pro (comprehensive)

Quick Start

Setup

git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
cp env.list.example env.list
# Edit env.list and set KEY=your_gemini_api_key_here

Local Development

source venv/bin/activate
export KEY=your_gemini_api_key_here
python app.py

Access at: http://localhost:7870
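The setup steps store the key in env.list, while local development exports it by hand. As a convenience, the file can be loaded from Python instead. The following is a minimal sketch, not part of RadExtract: `load_env_file` and `apply_env` are hypothetical helpers, and the parsing assumes simple `NAME=value` lines with `#` comments, as in a Docker env file.

```python
import os


def load_env_file(path: str = "env.list") -> dict[str, str]:
    """Parse NAME=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first "=" only, so values may contain "=".
            name, _, value = line.partition("=")
            values[name.strip()] = value
    return values


def apply_env(values: dict[str, str]) -> None:
    """Copy parsed values into os.environ without overriding existing ones."""
    for name, value in values.items():
        os.environ.setdefault(name, value)
```

Calling `apply_env(load_env_file())` before starting the app would make `KEY` available to the process without a manual `export`, provided it is not already set in the shell.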

API Usage

Example Request

curl -X POST \
  -H 'X-Model-ID: gemini-2.5-flash' \
  -H 'X-Use-Cache: true' \
  -d 'FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.' \
  http://localhost:7870/predict
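The same request can be issued from Python. This sketch uses only the standard library and mirrors the curl call above: the report text is sent as the raw request body, with the two `X-` headers. The function names `build_predict_request` and `predict` are illustrative, and sending `"false"` to disable caching is an assumption, since the example only shows `true`.

```python
import json
from urllib.request import Request, urlopen


def build_predict_request(
    report_text: str,
    model_id: str = "gemini-2.5-flash",
    use_cache: bool = True,
    base_url: str = "http://localhost:7870",
) -> Request:
    """Build the same POST request as the curl example, without sending it."""
    return Request(
        url=f"{base_url}/predict",
        data=report_text.encode("utf-8"),  # raw report text as the body
        headers={
            "X-Model-ID": model_id,
            # Assumption: "false" disables the cache; only "true" is documented.
            "X-Use-Cache": "true" if use_cache else "false",
        },
        method="POST",
    )


def predict(report_text: str, **options) -> dict:
    """Send the request and decode the JSON response (the app must be running)."""
    request = build_predict_request(report_text, **options)
    with urlopen(request, timeout=60) as response:
        return json.load(response)
```

With the app running locally, `predict("FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.")` should return a dictionary in the response format described in this README.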

Response Format

{
  "segments": [{
    "type": "body",
    "label": "Chest", 
    "content": "Normal heart and lungs",
    "intervals": [{"startPos": 10, "endPos": 32}],
    "significance": "minor"
  }],
  "text": "Chest:\n- Normal heart and lungs",
  "annotated_document_json": {...}
}
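The `intervals` field is what ties each segment back to the input. As a rough illustration, the sketch below slices the original report with those offsets. It assumes `startPos` is inclusive and `endPos` is exclusive, which matches the example above (offsets 10 to 32 cover "Normal heart and lungs") but is not stated explicitly. `spans_for_segment` is a hypothetical helper, not part of the API.

```python
def spans_for_segment(source_text: str, segment: dict) -> list[str]:
    """Return the source substrings that a segment's intervals point to.

    Assumes startPos is inclusive and endPos is exclusive.
    """
    return [
        source_text[interval["startPos"]:interval["endPos"]]
        for interval in segment.get("intervals", [])
    ]


source = "FINDINGS: Normal heart and lungs. IMPRESSION: Normal study."
response = {
    "segments": [{
        "type": "body",
        "label": "Chest",
        "content": "Normal heart and lungs",
        "intervals": [{"startPos": 10, "endPos": 32}],
        "significance": "minor",
    }],
}

for segment in response["segments"]:
    print(segment["label"], spans_for_segment(source, segment))
# prints: Chest ['Normal heart and lungs']
```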

Architecture

  • Backend: Flask + Python 3.10+ with type annotations checked by mypy
  • NLP Engine: LangExtract for structured extraction
  • AI Models: Google Gemini 2.5 (Flash/Pro)
  • Frontend: Vanilla JavaScript with interactive UI
  • Deployment: Docker + Hugging Face Spaces
  • Package Details: See pyproject.toml for dependencies, metadata, and tooling

Project Structure

radextract/
├── app.py                 # Flask API endpoints
├── structure_report.py    # Core structuring logic
├── sanitize.py            # Text preprocessing & normalization
├── prompt_instruction.py  # LangExtract prompt
├── cache_manager.py       # Response caching
├── static/                # Frontend assets
└── templates/             # HTML templates
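To make the role of response caching concrete, here is one way such a cache could work. This is a hypothetical sketch and not the contents of cache_manager.py: the `ResponseCache` class, its file layout, and the choice to key entries on a SHA-256 hash of the model ID plus report text are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class ResponseCache:
    """Hypothetical file-backed cache keyed by model ID and report text."""

    def __init__(self, directory: str) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, model_id: str, text: str) -> Path:
        # The same report sent to a different model gets its own entry.
        key = f"{model_id}\n{text}".encode("utf-8")
        return self.directory / f"{hashlib.sha256(key).hexdigest()}.json"

    def get_or_compute(
        self, model_id: str, text: str, compute: Callable[[str], dict]
    ) -> dict:
        """Return the cached result, or call compute(text) and store it."""
        path = self._path(model_id, text)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        result = compute(text)
        path.write_text(json.dumps(result), encoding="utf-8")
        return result
```

Under this design, a repeated request for the same report and model skips the model call entirely, which is presumably the behavior the `X-Use-Cache: true` header opts into.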

Development

Setup

git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

Code Quality

# Format code
pyink . && isort .

# Type checking
mypy . --ignore-missing-imports

# Run tests
pytest

Docker

# Build and run
docker build -t radextract .
docker run -p 7870:7870 --env-file env.list radextract

License

Apache License 2.0 - see LICENSE for details.

Related Projects


Built for the medical AI community | Hosted on Hugging Face Spaces

Disclaimer

This is not an officially supported Google product. If you use RadExtract or LangExtract in production or publications, please cite accordingly and acknowledge usage. Use is subject to the Apache 2.0 License. For health-related applications, use of LangExtract is also subject to the Health AI Developer Foundations Terms of Use.