---
title: RadExtract
emoji: 🗂️
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
header: mini
app_port: 7870
tags:
  - medical
  - nlp
  - radiology
  - langextract
  - gemini
  - structured-data
---

# RadExtract: Radiology Report Structuring Demo

[![🤗 Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/google/radextract)
[![LangExtract](https://img.shields.io/badge/Powered%20by-LangExtract-green)](https://github.com/google/langextract)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

A demonstration application powered by [LangExtract](https://github.com/google/langextract) that structures radiology reports using Gemini models. Transform unstructured radiology text into organized, interactive segments with clinical significance annotations.

## Try the Demo

**[Launch RadExtract Demo](https://huggingface.co/spaces/google/radextract)**

Transform unstructured radiology reports into structured data with highlighted findings that are precisely mapped back to the original source text.

## Key Features

- **Structured Output**: Organizes reports into anatomical sections with clinical significance
- **Interactive Highlighting**: Click any finding to see its exact source in the original text
- **Clinical Significance**: Annotates findings as minor, significant, or grounding
- **Character-Level Mapping**: Precise attribution back to source text
- **Multi-Model Support**: Gemini 2.5 Flash (fast) and Pro (comprehensive)

## Quick Start

### Setup

```bash
git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
cp env.list.example env.list
# Edit env.list and set KEY=your_gemini_api_key_here
```

### Local Development

```bash
source venv/bin/activate
export KEY=your_gemini_api_key_here
python app.py
```

Access at: http://localhost:7870

## API Usage

### Example Request

```bash
curl -X POST \
  -H 'X-Model-ID: gemini-2.5-flash' \
  -H 'X-Use-Cache: true' \
  -d 'FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.' \
  http://localhost:7870/predict
```

### Response Format

```json
{
  "segments": [{
    "type": "body",
    "label": "Chest",
    "content": "Normal heart and lungs",
    "intervals": [{"startPos": 10, "endPos": 32}],
    "significance": "minor"
  }],
  "text": "Chest:\n- Normal heart and lungs",
  "annotated_document_json": {...}
}
```

## Architecture

- **Backend**: Flask + Python 3.10+ with full type safety
- **NLP Engine**: [LangExtract](https://github.com/google/langextract) for structured extraction
- **AI Models**: Google Gemini 2.5 (Flash/Pro)
- **Frontend**: Vanilla JavaScript with interactive UI
- **Deployment**: Docker + Hugging Face Spaces
- **Package Details**: See [pyproject.toml](https://huggingface.co/spaces/google/radextract/blob/main/pyproject.toml) for dependencies, metadata, and tooling

## Project Structure

```
radextract/
├── app.py                 # Flask API endpoints
├── structure_report.py    # Core structuring logic
├── sanitize.py            # Text preprocessing & normalization
├── prompt_instruction.py  # LangExtract prompt
├── cache_manager.py       # Response caching
├── static/                # Frontend assets
└── templates/             # HTML templates
```

## Development

### Setup

```bash
git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```

### Code Quality

```bash
# Format code
pyink . && isort .

# Type checking
mypy . --ignore-missing-imports

# Run tests
pytest
```

### Docker

```bash
# Build and run
docker build -t radextract .
docker run -p 7870:7870 --env-file env.list radextract
```

## License

Apache License 2.0 - see [LICENSE](LICENSE) for details.

## Related Projects

- **[LangExtract](https://github.com/google/langextract)**: Core NLP library

---

**Built for the medical AI community** | **Hosted on Hugging Face Spaces**

## Disclaimer

This is not an officially supported Google product. If you use RadExtract or LangExtract in production or publications, please cite accordingly and acknowledge usage.
Use is subject to the [Apache 2.0 License](LICENSE). For health-related applications, use of LangExtract is also subject to the [Health AI Developer Foundations Terms of Use](https://developers.google.com/health-ai-foundations/terms).
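
## Example: Calling the API from Python

The curl request and JSON response shown under API Usage can also be exercised from Python. The following is a minimal sketch, not part of the project: the helper names `build_predict_request`, `highlighted_spans`, and `structure_report` are hypothetical, and it assumes the local server from Quick Start is listening on port 7870. It also assumes that `startPos`/`endPos` are half-open character offsets into the submitted text, which is what the sample response suggests (offsets 10 to 32 cover "Normal heart and lungs" in the example report).

```python
import json
import urllib.request

# Assumption: the local development server is running on the documented port.
PREDICT_URL = "http://localhost:7870/predict"


def build_predict_request(report_text, model_id="gemini-2.5-flash", use_cache=True):
    """Build (but do not send) a POST request mirroring the curl example."""
    return urllib.request.Request(
        PREDICT_URL,
        data=report_text.encode("utf-8"),
        headers={
            "X-Model-ID": model_id,
            "X-Use-Cache": "true" if use_cache else "false",
        },
        method="POST",
    )


def highlighted_spans(response, source_text):
    """Return (label, significance, source snippet) for every segment interval.

    Assumption: startPos/endPos are half-open character offsets into the
    text that was submitted, as the sample response suggests.
    """
    spans = []
    for segment in response.get("segments", []):
        for interval in segment.get("intervals", []):
            snippet = source_text[interval["startPos"]:interval["endPos"]]
            spans.append((segment.get("label"), segment.get("significance"), snippet))
    return spans


def structure_report(report_text, **kwargs):
    """Send a report to a running RadExtract server and decode the JSON reply."""
    request = build_predict_request(report_text, **kwargs)
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the server running, `highlighted_spans(structure_report(report), report)` would list each finding next to the exact source text it was mapped from. Keeping the request construction separate from the network call makes the offset handling easy to check offline against the sample response.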