---
title: TextLens - AI-Powered OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---

🔍 TextLens - AI-Powered OCR

A modern Vision-Language Model (VLM) based OCR application that extracts text from images using Microsoft's Florence-2 model, with automatic fallbacks when the model is unavailable.

✨ Features

🤖 Advanced VLM OCR: Uses Microsoft Florence-2 for state-of-the-art text extraction
🔄 Smart Fallback System: Automatically falls back to EasyOCR if Florence-2 fails
🧪 Demo Mode: Test mode for demonstration when other methods are unavailable
🎨 Modern UI: Clean, responsive Gradio interface with excellent UX
📱 Multiple Input Methods: Upload, webcam, clipboard support
⚡ Real-time Processing: Automatic text extraction on image upload
📋 Copy Functionality: Easy text copying from results
🚀 GPU Acceleration: Supports CUDA, MPS, and CPU inference
🛡️ Error Handling: Robust error handling and user-friendly messages

🏗️ Architecture

textlens-ocr/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md             # Project documentation
├── models/               # OCR processing modules
│   ├── __init__.py
│   └── ocr_processor.py  # Advanced OCR class with fallbacks
├── utils/                # Utility functions
│   ├── __init__.py
│   └── image_utils.py    # Image preprocessing utilities
└── ui/                   # User interface components
    ├── __init__.py
    ├── interface.py      # Gradio interface
    ├── handlers.py       # Event handlers
    └── styles.py         # CSS styling

🚀 Quick Start

Local Development

Clone the repository

git clone https://github.com/KumarAmrit30/textlens-ocr.git
cd textlens-ocr

Set up Python environment

python3 -m venv textlens_env
source textlens_env/bin/activate  # On Windows: textlens_env\Scripts\activate

Install dependencies
```
pip install -r requirements.txt
```
Run the application
```
python app.py
```
Open your browser and navigate to http://localhost:7860
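
To confirm from a script that the server has started, a small standard-library check can poll the URL above. This is only a sketch: `is_up` and `wait_for_server` are illustrative names, not part of TextLens, and the retry counts are arbitrary.

```python
import time
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers
        # refused connections, 4xx/5xx responses, and timeouts.
        return False


def wait_for_server(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll `url` until it responds or the attempts run out."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(delay)
    return False
```

After `python app.py` is running, `wait_for_server("http://localhost:7860")` should return `True` once the interface is being served.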

Quick Test

Run the test suite to verify everything works:

python test_ocr.py

🔧 Technical Details

OCR Processing Pipeline

Primary: Microsoft Florence-2 VLM
- State-of-the-art vision-language model
- Supports both basic OCR and region-based extraction
- GPU accelerated inference
Fallback: EasyOCR
- Traditional OCR with good accuracy
- Works when Florence-2 fails to load
- Multi-language support
Demo Mode: Test Mode
- Demonstration functionality
- Shows interface working correctly
- Used when other methods are unavailable
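
The three tiers above amount to trying backends in order until one succeeds. The sketch below illustrates that pattern with stand-in functions; `run_with_fallback` and the three backend functions are hypothetical and do not reflect the actual code in `models/ocr_processor.py`.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each backend is a (name, function) pair; the function takes an image and returns text.
Backend = Tuple[str, Callable[[object], str]]


def run_with_fallback(image: object, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, text) from the first that works."""
    last_error: Optional[Exception] = None
    for name, extract in backends:
        try:
            return name, extract(image)
        except Exception as exc:  # a failed backend should not stop the chain
            last_error = exc
    raise RuntimeError("all OCR backends failed") from last_error


# Hypothetical stand-ins for the three tiers described above.
def florence2(image: object) -> str:
    raise RuntimeError("model failed to load")  # simulate a Florence-2 failure


def easyocr_backend(image: object) -> str:
    return "text from EasyOCR"


def demo_mode(image: object) -> str:
    return "demo text"


used, text = run_with_fallback(
    image=None,
    backends=[("florence-2", florence2), ("easyocr", easyocr_backend), ("demo", demo_mode)],
)
print(used, text)  # easyocr text from EasyOCR
```

Returning the backend name alongside the text lets the UI tell the user which tier produced the result.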

Model Loading Strategy

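The application first tries a pinned revision of Florence-2 and, if that fails, falls back to the default revision:

from transformers import AutoModelForCausalLM

try:
    # Try Florence-2 pinned to a specific revision
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        revision="refs/pr/6",
        trust_remote_code=True,  # Florence-2 ships custom modeling code
    )
except Exception:
    # Fall back to the default Florence-2 revision
    # (catching Exception avoids swallowing KeyboardInterrupt, unlike a bare except)
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        trust_remote_code=True,
    )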

Device Detection

Automatically detects and uses the best available device:

CUDA: NVIDIA GPUs with CUDA support
MPS: Apple Silicon Macs (M1/M2/M3)
CPU: Fallback for all systems
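
The preference order above (CUDA, then MPS, then CPU) can be expressed with PyTorch's availability checks. This is a minimal sketch rather than the project's own detection code; `pick_device` is an illustrative name, and the import guard is only there so the example also runs where PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring GPU backends when usable."""
    try:
        import torch  # optional here so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps exists in PyTorch 1.12+; guard for older versions.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```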

📊 Performance

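| Model | Size | Speed | Accuracy | Use Case |
| --- | --- | --- | --- | --- |
| Florence-2-base | 230M parameters | Fast | High | General OCR |
| Florence-2-large | 770M parameters | Medium | Very High | High accuracy needs |
| EasyOCR | ~100 MB | Medium | Good | Fallback/Multilingual |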

🔍 Supported Image Formats

JPEG (.jpg, .jpeg)
PNG (.png)
WebP (.webp)
BMP (.bmp)
TIFF (.tiff, .tif)
GIF (.gif)
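
A quick way to reject unsupported files before running OCR is to compare the file extension against the list above. The helper below is a hypothetical sketch (`is_supported_image` is not a TextLens function), and it checks only the extension, not the file contents.

```python
from pathlib import Path

# Extensions taken from the list above, lower-cased for comparison.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tiff", ".tif", ".gif"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_image("scan.TIFF"))  # True
print(is_supported_image("notes.pdf"))  # False
```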

🎯 Use Cases

📄 Document Digitization: Convert physical documents to text
🏪 Receipt Processing: Extract data from receipts and invoices
📱 Screenshot Text Extraction: Get text from app screenshots
🚗 License Plate Reading: Extract text from vehicle plates
📚 Book/Article Scanning: Digitize printed materials
🌐 Multilingual Text: Process text in various languages

🛠️ Configuration

Model Selection

Change the model in models/ocr_processor.py:

# For faster inference
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

# For higher accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")

UI Customization

Modify the Gradio interface in app.py:

Update colors and styling in the CSS section
Change layout in the create_interface() function
Add new features or components

🧪 Testing

The project includes comprehensive tests:

# Run all tests
python test_ocr.py

# Test specific functionality
python -c "from models.ocr_processor import OCRProcessor; ocr = OCRProcessor(); print(ocr.get_model_info())"

🚀 Deployment

HuggingFace Spaces

Fork this repository
Create a new Space on HuggingFace
Connect your repository
The app will automatically deploy

Docker Deployment

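FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Bind to all interfaces so the app is reachable from outside the container
# (has no effect if app.py passes server_name to launch() explicitly)
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860

CMD ["python", "app.py"]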

Local Server

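# Production server
# Gradio serves an ASGI (FastAPI) app, so run it directly rather than under
# Gunicorn's default WSGI workers; bind to all interfaces for remote access
GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 python app.py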

🔐 Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `GRADIO_SERVER_PORT` | Server port | 7860 |
| `TRANSFORMERS_CACHE` | Model cache directory | `~/.cache/huggingface/hub` |
| `CUDA_VISIBLE_DEVICES` | GPU device selection | All available |
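
For scripts that need to inspect these variables, the sketch below reads them with the port default from the table. `read_settings` is an illustrative helper, not part of TextLens; Gradio, Transformers, and CUDA each read their own variable without any such code.

```python
import os
from typing import Mapping, Optional


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the three variables from the table, applying the documented port default."""
    visible: Optional[str] = env.get("CUDA_VISIBLE_DEVICES")
    return {
        "port": int(env.get("GRADIO_SERVER_PORT", "7860")),
        # None means "let the library pick its default cache location".
        "cache_dir": env.get("TRANSFORMERS_CACHE"),
        # None means all GPUs are visible; an empty string hides every GPU.
        "gpus": None if visible is None else [d for d in visible.split(",") if d],
    }


print(read_settings({"GRADIO_SERVER_PORT": "8080", "CUDA_VISIBLE_DEVICES": "0,1"}))
# {'port': 8080, 'cache_dir': None, 'gpus': ['0', '1']}
```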

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

📝 API Reference

OCRProcessor Class

from models.ocr_processor import OCRProcessor

# Initialize
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

# Extract text
text = ocr.extract_text(image)

# Extract with regions
result = ocr.extract_text_with_regions(image)

# Get model info
info = ocr.get_model_info()

🐛 Troubleshooting

Common Issues

Model Loading Errors

# Install missing dependencies
pip install einops timm

CUDA Out of Memory

# Use CPU instead
ocr = OCRProcessor()
ocr.device = "cpu"

SSL Certificate Errors

# Update certificates (macOS)
/Applications/Python\ 3.x/Install\ Certificates.command

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Microsoft for the Florence-2 model
HuggingFace for the transformers library
Gradio for the web interface framework
EasyOCR for fallback OCR capabilities

📞 Support

Create an issue for bug reports
Start a discussion for feature requests
Check existing issues before posting

Made with ❤️ for the AI community