---
title: RadExtract
emoji: 🗂️
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
header: mini
app_port: 7870
tags:
  - medical
  - nlp
  - radiology
  - langextract
  - gemini
  - structured-data
---

# RadExtract: Radiology Report Structuring Demo

[![🤗 Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/google/radextract)
[![LangExtract](https://img.shields.io/badge/Powered%20by-LangExtract-green)](https://github.com/google/langextract)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

A demonstration application powered by [LangExtract](https://github.com/google/langextract) that structures radiology reports using Gemini models. Transform unstructured radiology text into organized, interactive segments with clinical significance annotations.

## Try the Demo

**[Launch RadExtract Demo](https://huggingface.co/spaces/google/radextract)**

The hosted demo returns structured output in which each highlighted finding is mapped back to its exact position in the original report text.

## Key Features

- **Structured Output**: Organizes reports into anatomical sections with clinical significance
- **Interactive Highlighting**: Click any finding to see its exact source in the original text
- **Clinical Significance**: Annotates findings as normal, minor, or significant
- **Character-Level Mapping**: Precise attribution back to source text
- **Multi-Model Support**: Gemini 2.5 Flash (fast) and Pro (comprehensive)

## Quick Start

### Setup

```bash
git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
cp env.list.example env.list
# Edit env.list and set KEY=your_gemini_api_key_here
```

### Local Development

```bash
source venv/bin/activate
export KEY=your_gemini_api_key_here
python app.py
```

Access at: http://localhost:7870

## API Usage

### Example Request
```bash
curl -X POST \
  -H 'X-Model-ID: gemini-2.5-flash' \
  -H 'X-Use-Cache: true' \
  -d 'FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.' \
  http://localhost:7870/predict
```
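
For Python clients, the same request can be sketched with the standard library alone. This is a minimal illustration that mirrors the curl call above; the helper name `build_predict_request` and its parameters are hypothetical, and the `"false"` value for `X-Use-Cache` is an assumption, since only `true` appears in the example.

```python
import urllib.request


def build_predict_request(
    report_text,
    model_id="gemini-2.5-flash",
    use_cache=True,
    base_url="http://localhost:7870",
):
    """Build (but do not send) a POST request equivalent to the curl example.

    Hypothetical helper: the endpoint path and header names are taken from
    the curl example; everything else is illustrative.
    """
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=report_text.encode("utf-8"),  # raw report text as the body
        headers={
            "X-Model-ID": model_id,
            # Assumption: the server reads "true"/"false" strings here.
            "X-Use-Cache": "true" if use_cache else "false",
        },
        method="POST",
    )


request = build_predict_request(
    "FINDINGS: Normal heart and lungs. IMPRESSION: Normal study."
)

# To send it against a running local server, uncomment:
# import json
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["text"])
```

Building the request separately from sending it keeps the example runnable without a server and makes the headers easy to inspect.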

### Response Format
```json
{
  "segments": [{
    "type": "body",
    "label": "Chest",
    "content": "Normal heart and lungs",
    "intervals": [{"startPos": 10, "endPos": 32}],
    "significance": "minor"
  }],
  "text": "Chest:\n- Normal heart and lungs",
  "annotated_document_json": {...}
}
```
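
The `intervals` offsets can be used to recover the source span for a segment. The sketch below assumes that `startPos` and `endPos` index into the original request text and that `endPos` is exclusive, which is consistent with the example values (10 and 32) but is not stated explicitly; the helper name `resolve_intervals` is hypothetical.

```python
def resolve_intervals(source_text, segment):
    """Return the source substrings that a segment's intervals point at.

    Assumes endPos is exclusive, matching the example response above.
    """
    return [
        source_text[interval["startPos"]:interval["endPos"]]
        for interval in segment["intervals"]
    ]


source_text = "FINDINGS: Normal heart and lungs. IMPRESSION: Normal study."

# A segment shaped like the one in the example response.
segment = {
    "type": "body",
    "label": "Chest",
    "content": "Normal heart and lungs",
    "intervals": [{"startPos": 10, "endPos": 32}],
}

spans = resolve_intervals(source_text, segment)
print(spans)  # ['Normal heart and lungs']
```

Here the recovered span matches the segment's `content` field exactly, which is what allows the UI to highlight the originating text for each finding.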

## Architecture

- **Backend**: Flask on Python 3.10+, with type annotations checked by mypy
- **NLP Engine**: [LangExtract](https://github.com/google/langextract) for structured extraction
- **AI Models**: Google Gemini 2.5 (Flash/Pro)
- **Frontend**: Vanilla JavaScript with interactive UI
- **Deployment**: Docker + Hugging Face Spaces
- **Package Details**: See [pyproject.toml](https://huggingface.co/spaces/google/radextract/blob/main/pyproject.toml) for dependencies, metadata, and tooling

## Project Structure

```
radextract/
├── app.py                 # Flask API endpoints
├── structure_report.py    # Core structuring logic
├── sanitize.py           # Text preprocessing & normalization
├── prompt_instruction.py  # LangExtract prompt
├── cache_manager.py      # Response caching
├── static/               # Frontend assets
└── templates/            # HTML templates
```

## Development

### Setup
```bash
git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```

### Code Quality
```bash
# Format code
pyink . && isort .

# Type checking
mypy . --ignore-missing-imports

# Run tests
pytest
```

### Docker
```bash
# Build and run
docker build -t radextract .
docker run -p 7870:7870 --env-file env.list radextract
```

## License

Apache License 2.0 - see [LICENSE](LICENSE) for details.

## Related Projects

- **[LangExtract](https://github.com/google/langextract)**: Core NLP library

---

**Built for the medical AI community** | **Hosted on Hugging Face Spaces**

## Disclaimer

This is not an officially supported Google product. If you use RadExtract or LangExtract in production or publications, please cite accordingly and acknowledge usage. Use is subject to the [Apache 2.0 License](LICENSE). For health-related applications, use of LangExtract is also subject to the [Health AI Developer Foundations Terms of Use](https://developers.google.com/health-ai-foundations/terms).