---
title: TextLens - AI-Powered OCR
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---

# TextLens - AI-Powered OCR

A Vision-Language Model (VLM) based OCR application that extracts text from images using Microsoft's Florence-2 model, with automatic fallback to EasyOCR and a demo mode when neither is available.

## Features

- **Advanced VLM OCR**: Uses Microsoft Florence-2 for state-of-the-art text extraction
- **Smart Fallback System**: Automatically falls back to EasyOCR if Florence-2 fails
- **Demo Mode**: Test mode for demonstration when other methods are unavailable
- **Modern UI**: Clean, responsive Gradio interface
- **Multiple Input Methods**: Upload, webcam, clipboard support
- **Real-time Processing**: Automatic text extraction on image upload
- **Copy Functionality**: Easy text copying from results
- **GPU Acceleration**: Supports CUDA, MPS, and CPU inference
- **Error Handling**: Robust error handling and user-friendly messages

## Architecture

```
textlens-ocr/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md              # Project documentation
├── models/                # OCR processing modules
│   ├── __init__.py
│   └── ocr_processor.py   # Advanced OCR class with fallbacks
├── utils/                 # Utility functions
│   ├── __init__.py
│   └── image_utils.py     # Image preprocessing utilities
└── ui/                    # User interface components
    ├── __init__.py
    ├── interface.py       # Gradio interface
    ├── handlers.py        # Event handlers
    └── styles.py          # CSS styling
```

## Quick Start

### Local Development

1. **Clone the repository**

   ```bash
   git clone https://github.com/KumarAmrit30/textlens-ocr.git
   cd textlens-ocr
   ```

2. **Set up Python environment**

   ```bash
   python3 -m venv textlens_env
   source textlens_env/bin/activate  # On Windows: textlens_env\Scripts\activate
   ```

3. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

4. **Run the application**

   ```bash
   python app.py
   ```

5. **Open your browser**

   Navigate to `http://localhost:7860`

### Quick Test

Run the test suite to verify everything works:

```bash
python test_ocr.py
```

## Technical Details

### OCR Processing Pipeline

1. **Primary**: Microsoft Florence-2 VLM
   - State-of-the-art vision-language model
   - Supports both basic OCR and region-based extraction
   - GPU accelerated inference
2. **Fallback**: EasyOCR
   - Traditional OCR with good accuracy
   - Works when Florence-2 fails to load
   - Multi-language support
3. **Demo Mode**: Test mode
   - Demonstration functionality
   - Shows interface working correctly
   - Used when other methods are unavailable

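
The three-stage pipeline above can be pictured as an ordered list of backends that are tried one after another. The following is a minimal sketch of that idea, not the project's actual code: `extract_with_fallback`, `PIPELINE`, and the three stub functions are hypothetical names used only for illustration.

```python
from typing import Callable, Sequence, Tuple

# Each backend is a (name, function) pair; the function takes an image and
# returns the extracted text, or raises if the backend cannot be used.
Backend = Tuple[str, Callable[[object], str]]


def extract_with_fallback(image: object, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, text) from the first success."""
    errors = []
    for name, run in backends:
        try:
            return name, run(image)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All OCR backends failed: " + "; ".join(errors))


# Stand-in backends for illustration only (hypothetical, not the real models).
def florence_stub(image: object) -> str:
    raise RuntimeError("model not loaded")


def easyocr_stub(image: object) -> str:
    return "text from EasyOCR"


def demo_stub(image: object) -> str:
    return "demo text"


PIPELINE = [
    ("florence-2", florence_stub),
    ("easyocr", easyocr_stub),
    ("demo", demo_stub),
]
```

Because the first stub raises, `extract_with_fallback(None, PIPELINE)` moves on and returns the EasyOCR stub's result together with the name of the backend that produced it, which is useful for showing the user which method was used.
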
### Model Loading Strategy

The application first tries to load Florence-2 at a pinned revision and, if that fails, retries with the default revision:

```python
from transformers import AutoModelForCausalLM

try:
    # Try Florence-2 at a specific revision first
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        revision="refs/pr/6",
        trust_remote_code=True,
    )
except Exception:
    # Fall back to the default Florence-2 revision
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        trust_remote_code=True,
    )
```

### Device Detection

Automatically detects and uses the best available device:

- **CUDA**: NVIDIA GPUs with CUDA support
- **MPS**: Apple Silicon Macs (M1/M2/M3)
- **CPU**: Fallback for all systems

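
The preference order above (CUDA, then MPS, then CPU) could be implemented roughly as follows. This is a sketch under assumptions: `pick_device` is a hypothetical helper name, and the project's own detection logic may differ. It uses only `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and returns `"cpu"` when PyTorch is not installed at all.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring GPU backends when PyTorch reports them."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so there is nothing to probe
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The returned string can be passed to `model.to(...)` in PyTorch code.
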
## Performance

| Model            | Size        | Speed  | Accuracy  | Use Case              |
| ---------------- | ----------- | ------ | --------- | --------------------- |
| Florence-2-base  | 230M params | Fast   | High      | General OCR           |
| Florence-2-large | 770M params | Medium | Very High | High accuracy needs   |
| EasyOCR          | ~100 MB     | Medium | Good      | Fallback/Multilingual |

## Supported Image Formats

- **JPEG** (.jpg, .jpeg)
- **PNG** (.png)
- **WebP** (.webp)
- **BMP** (.bmp)
- **TIFF** (.tiff, .tif)
- **GIF** (.gif)

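
As an illustration of how the list above could be enforced before an image reaches the OCR step, here is a small extension check. `SUPPORTED_EXTENSIONS` and `is_supported_image` are hypothetical names, and the check looks only at the file name, not at the file contents.

```python
from pathlib import Path

# Extensions taken from the list of supported formats above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tiff", ".tif", ".gif"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension (case-insensitive) is a supported format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_image("scan.JPG")` is true, while `is_supported_image("notes.pdf")` is false.
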
## Use Cases

- **Document Digitization**: Convert physical documents to text
- **Receipt Processing**: Extract data from receipts and invoices
- **Screenshot Text Extraction**: Get text from app screenshots
- **License Plate Reading**: Extract text from vehicle plates
- **Book/Article Scanning**: Digitize printed materials
- **Multilingual Text**: Process text in various languages

## Configuration

### Model Selection

Change the model in `models/ocr_processor.py`:

```python
# For faster inference
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

# For higher accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
```

### UI Customization

Modify the Gradio interface in `app.py`:

- Update colors and styling in the CSS section
- Change layout in the `create_interface()` function
- Add new features or components

## Testing

The project includes a test script, and individual components can be checked from the command line:

```bash
# Run all tests
python test_ocr.py

# Test specific functionality
python -c "from models.ocr_processor import OCRProcessor; ocr = OCRProcessor(); print(ocr.get_model_info())"
```

## Deployment

### HuggingFace Spaces

1. Fork this repository
2. Create a new Space on HuggingFace
3. Connect your repository
4. The app will automatically deploy

### Docker Deployment

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Bind to all interfaces so the app is reachable from outside the container
# (only needed if app.py does not already set server_name)
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860
CMD ["python", "app.py"]
```

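### Local Server

```bash
# Production server
pip install gunicorn uvicorn
gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7860 "app:create_interface().app"
```

Gradio serves an ASGI application, so gunicorn needs the uvicorn worker class (`-k uvicorn.workers.UvicornWorker`). The target is quoted so the shell does not interpret the parentheses. This assumes `create_interface().app` yields the ASGI app; if gunicorn cannot resolve that expression, expose the app as a module-level variable in `app.py` and point gunicorn at it instead. Each worker loads its own copy of the model, so size `-w` to the available memory.
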
## Environment Variables

| Variable               | Description           | Default                    |
| ---------------------- | --------------------- | -------------------------- |
| `GRADIO_SERVER_PORT`   | Server port           | 7860                       |
| `TRANSFORMERS_CACHE`   | Model cache directory | `~/.cache/huggingface/hub` |
| `CUDA_VISIBLE_DEVICES` | GPU device selection  | All available              |

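
Gradio, Transformers, and CUDA read these variables themselves, so the application does not normally need to parse them. If a script wants to inspect them, for example to log the effective configuration at startup, a sketch along these lines would work. `Settings` and `load_settings` are hypothetical names, and unset variables are reported as `None` rather than guessed.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    port: int                    # GRADIO_SERVER_PORT, defaulting to 7860
    cache_dir: Optional[str]     # TRANSFORMERS_CACHE, None means the library default
    visible_gpus: Optional[str]  # CUDA_VISIBLE_DEVICES, None means all GPUs


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Collect the deployment-related environment variables into one object."""
    return Settings(
        port=int(env.get("GRADIO_SERVER_PORT", "7860")),
        cache_dir=env.get("TRANSFORMERS_CACHE"),
        visible_gpus=env.get("CUDA_VISIBLE_DEVICES"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to test, for example `load_settings({"GRADIO_SERVER_PORT": "8080"}).port`.
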
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## API Reference

### OCRProcessor Class

```python
from models.ocr_processor import OCRProcessor

# Initialize
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

# Extract text
text = ocr.extract_text(image)

# Extract with regions
result = ocr.extract_text_with_regions(image)

# Get model info
info = ocr.get_model_info()
```

## Troubleshooting

### Common Issues

1. **Model Loading Errors**

   ```bash
   # Install missing dependencies
   pip install einops timm
   ```

2. **CUDA Out of Memory**

   ```python
   # Use CPU instead
   ocr = OCRProcessor()
   ocr.device = "cpu"
   ```

3. **SSL Certificate Errors**

   ```bash
   # Update certificates (macOS)
   /Applications/Python\ 3.x/Install\ Certificates.command
   ```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- **Microsoft** for the Florence-2 model
- **HuggingFace** for the transformers library
- **Gradio** for the web interface framework
- **EasyOCR** for fallback OCR capabilities

## Support

- Create an issue for bug reports
- Start a discussion for feature requests
- Check existing issues before posting

---

**Made with ❤️ for the AI community**