title: TextLens - AI-Powered OCR
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
TextLens - AI-Powered OCR
A Vision-Language Model (VLM) based OCR application that extracts text from images using Microsoft's Florence-2 model, with automatic fallback to EasyOCR when Florence-2 is unavailable.
Features
- Advanced VLM OCR: Uses Microsoft Florence-2 for state-of-the-art text extraction
- Smart Fallback System: Automatically falls back to EasyOCR if Florence-2 fails
- Demo Mode: Test mode for demonstration when other methods are unavailable
- Modern UI: Clean, responsive Gradio interface
- Multiple Input Methods: Upload, webcam, clipboard support
- Real-time Processing: Automatic text extraction on image upload
- Copy Functionality: Easy text copying from results
- GPU Acceleration: Supports CUDA, MPS, and CPU inference
- Error Handling: Robust error handling and user-friendly messages
 
Architecture
textlens-ocr/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md              # Project documentation
├── models/                # OCR processing modules
│   ├── __init__.py
│   └── ocr_processor.py   # Advanced OCR class with fallbacks
├── utils/                 # Utility functions
│   ├── __init__.py
│   └── image_utils.py     # Image preprocessing utilities
└── ui/                    # User interface components
    ├── __init__.py
    ├── interface.py       # Gradio interface
    ├── handlers.py        # Event handlers
    └── styles.py          # CSS styling
Quick Start
Local Development
Clone the repository
git clone https://github.com/KumarAmrit30/textlens-ocr.git
cd textlens-ocr
Set up Python environment
python3 -m venv textlens_env
source textlens_env/bin/activate  # On Windows: textlens_env\Scripts\activate
Install dependencies
pip install -r requirements.txt
Run the application
python app.py
Open your browser
Navigate to http://localhost:7860
Quick Test
Run the test suite to verify everything works:
python test_ocr.py
Technical Details
OCR Processing Pipeline
Primary: Microsoft Florence-2 VLM
- State-of-the-art vision-language model
- Supports both basic OCR and region-based extraction
- GPU accelerated inference

Fallback: EasyOCR
- Traditional OCR with good accuracy
- Works when Florence-2 fails to load
- Multi-language support

Demo Mode: Test Mode
- Demonstration functionality
- Shows interface working correctly
- Used when other methods are unavailable
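
The three-tier pipeline above can be sketched as an ordered chain of backends, where each one is tried in turn and the first that succeeds wins. This is a minimal illustration, not the project's actual `OCRProcessor` code: `run_with_fallback` and the stand-in backend functions are hypothetical names introduced here.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a backend takes raw image bytes and returns extracted text.
Backend = Callable[[bytes], str]


def run_with_fallback(image: bytes, backends: List[Tuple[str, Backend]]) -> Tuple[str, str]:
    """Try each (name, backend) pair in order; return (name, text) from the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(image)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All OCR backends failed: " + "; ".join(errors))


# Stand-in backends for illustration only; real ones would wrap Florence-2 and EasyOCR.
def florence_backend(image: bytes) -> str:
    raise RuntimeError("model failed to load")


def easyocr_backend(image: bytes) -> str:
    return "text from EasyOCR"


def demo_backend(image: bytes) -> str:
    return "demo text"


CHAIN = [
    ("florence-2", florence_backend),
    ("easyocr", easyocr_backend),
    ("demo", demo_backend),
]

if __name__ == "__main__":
    print(run_with_fallback(b"fake image bytes", CHAIN))
```

Returning the backend name alongside the text lets the UI tell the user which method produced the result, which is useful when the demo tier is the one that answered.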
 
Model Loading Strategy
The application first tries to load Florence-2 at a pinned revision and, if that fails, falls back to the default revision:
from transformers import AutoModelForCausalLM

try:
    # Try Florence-2 pinned to a specific revision first
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        revision="refs/pr/6",
        trust_remote_code=True,
    )
except Exception:
    # Fall back to the default Florence-2 revision
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        trust_remote_code=True,
    )
Device Detection
Automatically detects and uses the best available device:
- CUDA: NVIDIA GPUs with CUDA support
- MPS: Apple Silicon Macs (M1/M2/M3)
- CPU: Fallback for all systems
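
The priority order above (CUDA, then MPS, then CPU) could be implemented roughly as follows. This is a sketch under assumptions: `pick_device` and `detect_device` are hypothetical helper names, not functions from the project, and the probe simply reports `"cpu"` when PyTorch is not installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name given which accelerators are available."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for accelerators; report "cpu" if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps only exists in newer PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    mps_available = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the priority logic in a pure function separate from the PyTorch probe makes the ordering easy to test on machines without a GPU.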
 
Performance
| Model | Size | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| Florence-2-base | 230M parameters | Fast | High | General OCR |
| Florence-2-large | 770M parameters | Medium | Very High | High accuracy needs |
| EasyOCR | ~100 MB | Medium | Good | Fallback/Multilingual |
Supported Image Formats
- JPEG (.jpg, .jpeg)
- PNG (.png)
- WebP (.webp)
- BMP (.bmp)
- TIFF (.tiff, .tif)
- GIF (.gif)
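
A quick way to screen uploads against the formats listed above is to compare file extensions. The helper below is a hypothetical sketch, not part of the project's `image_utils.py`; it looks only at the file name and does not inspect the image data itself.

```python
from pathlib import Path

# Extensions taken from the list above, lowercased for comparison.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tiff", ".tif", ".gif"}


def is_supported_image(filename: str) -> bool:
    """Return True if the file's extension is one of the supported image formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("scan.JPG", "receipt.tif", "notes.pdf"):
        print(name, is_supported_image(name))
```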
 
Use Cases
- Document Digitization: Convert physical documents to text
- Receipt Processing: Extract data from receipts and invoices
- Screenshot Text Extraction: Get text from app screenshots
- License Plate Reading: Extract text from vehicle plates
- Book/Article Scanning: Digitize printed materials
- Multilingual Text: Process text in various languages
 
Configuration
Model Selection
Change the model in models/ocr_processor.py:
# For faster inference
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")
# For higher accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
UI Customization
Modify the Gradio interface in app.py:
- Update colors and styling in the CSS section
- Change layout in the create_interface() function
- Add new features or components
 
Testing
The project includes comprehensive tests:
# Run all tests
python test_ocr.py
# Test specific functionality
python -c "from models.ocr_processor import OCRProcessor; ocr = OCRProcessor(); print(ocr.get_model_info())"
Deployment
HuggingFace Spaces
- Fork this repository
- Create a new Space on HuggingFace
- Connect your repository
- The app will automatically deploy
 
Docker Deployment
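FROM python:3.9-slim
WORKDIR /app
# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Bind Gradio to all interfaces so the app is reachable from outside the container
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860
CMD ["python", "app.py"]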
Local Server
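# Production server: Gradio serves itself through its bundled Uvicorn server,
# so bind to all interfaces via environment variables rather than Gunicorn
# (assumes app.py does not hard-code server_name or server_port in launch())
GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 python app.py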
Environment Variables
| Variable | Description | Default |
|---|---|---|
| GRADIO_SERVER_PORT | Server port | 7860 |
| TRANSFORMERS_CACHE | Model cache directory | ~/.cache/huggingface |
| CUDA_VISIBLE_DEVICES | GPU device selection | All available |
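
Gradio, Transformers, and CUDA each read their own variable directly, so the application does not need to parse them. If you want to inspect or log the effective values, a small reader like the one below would do; `load_settings` is a hypothetical helper, and the defaults are the ones given in the table above.

```python
import os
from typing import Dict, Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Read the variables from the table above, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "port": int(env.get("GRADIO_SERVER_PORT", "7860")),
        "cache_dir": os.path.expanduser(env.get("TRANSFORMERS_CACHE", "~/.cache/huggingface")),
        # None means "no restriction", i.e. all GPUs are visible.
        "visible_gpus": env.get("CUDA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    print(load_settings())
```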
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
 
API Reference
OCRProcessor Class
from models.ocr_processor import OCRProcessor
# Initialize
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")
# Extract text
text = ocr.extract_text(image)
# Extract with regions
result = ocr.extract_text_with_regions(image)
# Get model info
info = ocr.get_model_info()
Troubleshooting
Common Issues
Model Loading Errors
# Install missing dependencies
pip install einops timm
CUDA Out of Memory
# Use CPU instead
ocr = OCRProcessor()
ocr.device = "cpu"
SSL Certificate Errors
# Update certificates (macOS)
/Applications/Python\ 3.x/Install\ Certificates.command
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Microsoft for the Florence-2 model
- HuggingFace for the transformers library
- Gradio for the web interface framework
- EasyOCR for fallback OCR capabilities
 
Support
- Create an issue for bug reports
- Start a discussion for feature requests
- Check existing issues before posting
 
Made with ❤️ for the AI community