---
title: TextLens - AI-Powered OCR
emoji: πŸ”
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---
# πŸ” TextLens - AI-Powered OCR
A modern Vision-Language Model (VLM) based OCR application that extracts text from images using the Microsoft Florence-2 model, backed by an intelligent fallback system.
## ✨ Features
- **πŸ€– Advanced VLM OCR**: Uses Microsoft Florence-2 for state-of-the-art text extraction
- **πŸ”„ Smart Fallback System**: Automatically falls back to EasyOCR if Florence-2 fails
- **πŸ§ͺ Demo Mode**: Test mode for demonstration when other methods are unavailable
- **🎨 Modern UI**: Clean, responsive Gradio interface with excellent UX
- **πŸ“± Multiple Input Methods**: Upload, webcam, clipboard support
- **⚑ Real-time Processing**: Automatic text extraction on image upload
- **πŸ“‹ Copy Functionality**: Easy text copying from results
- **πŸš€ GPU Acceleration**: Supports CUDA, MPS, and CPU inference
- **πŸ›‘οΈ Error Handling**: Robust error handling and user-friendly messages
## πŸ—οΈ Architecture
```
textlens-ocr/
β”œβ”€β”€ app.py               # Main Gradio application
β”œβ”€β”€ requirements.txt     # Python dependencies
β”œβ”€β”€ README.md            # Project documentation
β”œβ”€β”€ models/              # OCR processing modules
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── ocr_processor.py # Advanced OCR class with fallbacks
β”œβ”€β”€ utils/               # Utility functions
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── image_utils.py   # Image preprocessing utilities
└── ui/                  # User interface components
    β”œβ”€β”€ __init__.py
    β”œβ”€β”€ interface.py     # Gradio interface
    β”œβ”€β”€ handlers.py      # Event handlers
    └── styles.py        # CSS styling
```
## πŸš€ Quick Start
### Local Development
1. **Clone the repository**
```bash
git clone https://github.com/KumarAmrit30/textlens-ocr.git
cd textlens-ocr
```
2. **Set up Python environment**
```bash
python3 -m venv textlens_env
source textlens_env/bin/activate # On Windows: textlens_env\Scripts\activate
```
3. **Install dependencies**
```bash
pip install -r requirements.txt
```
4. **Run the application**
```bash
python app.py
```
5. **Open your browser**
Navigate to `http://localhost:7860`
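If port 7860 is already in use, the port can be overridden through Gradio's environment variable without editing any code:
```bash
# Launch on an alternative port (example value)
GRADIO_SERVER_PORT=7861 python app.py
```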
### Quick Test
Run the test suite to verify everything works:
```bash
python test_ocr.py
```
## πŸ”§ Technical Details
### OCR Processing Pipeline
1. **Primary**: Microsoft Florence-2 VLM
- State-of-the-art vision-language model
- Supports both basic OCR and region-based extraction
- GPU accelerated inference
2. **Fallback**: EasyOCR
- Traditional OCR with good accuracy
- Works when Florence-2 fails to load
- Multi-language support
3. **Demo Mode**: Test Mode
- Demonstration functionality
- Shows interface working correctly
- Used when other methods are unavailable
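The fallback chain can be pictured as a simple loop over backends, tried in order of preference. The sketch below is illustrative only: the backend functions are placeholders, and the real logic lives in `models/ocr_processor.py`.
```python
from typing import Callable, Sequence
from PIL import Image

def florence2_ocr(image: Image.Image) -> str:
    raise NotImplementedError("placeholder for the Florence-2 backend")

def easyocr_ocr(image: Image.Image) -> str:
    raise NotImplementedError("placeholder for the EasyOCR backend")

def demo_ocr(image: Image.Image) -> str:
    # Demo mode never fails, so the interface can always be exercised
    return "[demo mode] sample extracted text"

def extract_text(image: Image.Image) -> str:
    backends: Sequence[Callable[[Image.Image], str]] = (
        florence2_ocr,  # 1. primary: Florence-2 VLM
        easyocr_ocr,    # 2. fallback: EasyOCR
        demo_ocr,       # 3. last resort: demo mode
    )
    for backend in backends:
        try:
            return backend(image)
        except Exception:
            continue    # try the next backend
    return ""           # unreachable: demo_ocr never raises
```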
### Model Loading Strategy
The application uses an intelligent loading strategy:
```python
from transformers import AutoModelForCausalLM

try:
    # Try Florence-2 with a specific revision first
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        revision="refs/pr/6",
        trust_remote_code=True,
    )
except Exception:
    # Fall back to the default Florence-2 revision
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        trust_remote_code=True,
    )
```
### Device Detection
Automatically detects and uses the best available device (a minimal sketch follows the list):
- **CUDA**: NVIDIA GPUs with CUDA support
- **MPS**: Apple Silicon Macs (M1/M2/M3)
- **CPU**: Fallback for all systems
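The selection logic boils down to standard PyTorch availability checks. The helper name below is illustrative; the project's own detection lives in `models/ocr_processor.py`.
```python
import torch

def pick_device() -> str:
    """Return the best available inference device."""
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU
    if torch.backends.mps.is_available():
        return "mps"   # Apple Silicon (M1/M2/M3)
    return "cpu"       # portable fallback
```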
## πŸ“Š Performance
| Model | Size | Speed | Accuracy | Use Case |
| ---------------- | ------ | ------ | --------- | --------------------- |
| Florence-2-base | 230M | Fast | High | General OCR |
| Florence-2-large | 770M | Medium | Very High | High accuracy needs |
| EasyOCR | ~100MB | Medium | Good | Fallback/Multilingual |
## πŸ” Supported Image Formats
- **JPEG** (.jpg, .jpeg)
- **PNG** (.png)
- **WebP** (.webp)
- **BMP** (.bmp)
- **TIFF** (.tiff, .tif)
- **GIF** (.gif)
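Whatever the input format, it is usually safest to normalize to an RGB `PIL.Image` before OCR, since both Florence-2 and EasyOCR work on plain RGB data. A small example with Pillow (the project's own preprocessing lives in `utils/image_utils.py` and may differ):
```python
from PIL import Image

def load_as_rgb(path: str) -> Image.Image:
    """Open any supported format and convert it to RGB for OCR."""
    image = Image.open(path)
    # GIF/PNG inputs may carry palettes or alpha channels; convert up front.
    return image.convert("RGB")
```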
## 🎯 Use Cases
- **πŸ“„ Document Digitization**: Convert physical documents to text
- **πŸͺ Receipt Processing**: Extract data from receipts and invoices
- **πŸ“± Screenshot Text Extraction**: Get text from app screenshots
- **πŸš— License Plate Reading**: Extract text from vehicle plates
- **πŸ“š Book/Article Scanning**: Digitize printed materials
- **🌐 Multilingual Text**: Process text in various languages
## πŸ› οΈ Configuration
### Model Selection
Change the model in `models/ocr_processor.py`:
```python
from models.ocr_processor import OCRProcessor

# For faster inference
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

# For higher accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
```
### UI Customization
Modify the Gradio interface in `app.py` (see the example after this list):
- Update colors and styling in the CSS section
- Change layout in the `create_interface()` function
- Add new features or components
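For example, custom CSS is typically handed to `gr.Blocks` when the interface is built. This is a sketch only; the exact contents of `create_interface()` in this project may differ.
```python
import gradio as gr

CUSTOM_CSS = """
.gradio-container { max-width: 960px; margin: auto; }
"""

def create_interface() -> gr.Blocks:
    with gr.Blocks(css=CUSTOM_CSS, title="TextLens - AI-Powered OCR") as demo:
        gr.Markdown("# πŸ” TextLens")
        # ... image input, extract button, and results textbox go here ...
    return demo
```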
## πŸ§ͺ Testing
The project includes comprehensive tests:
```bash
# Run all tests
python test_ocr.py
# Test specific functionality
python -c "from models.ocr_processor import OCRProcessor; ocr = OCRProcessor(); print(ocr.get_model_info())"
```
## πŸš€ Deployment
### HuggingFace Spaces
1. Fork this repository
2. Create a new Space on HuggingFace
3. Connect your repository
4. The app will automatically deploy
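Alternatively, you can push this repository straight to the Space's own git remote (the URL below is illustrative; substitute your username and Space name):
```bash
git remote add space https://huggingface.co/spaces/<your-username>/textlens-ocr
git push space main
```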
### Docker Deployment
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Make the Gradio server reachable from outside the container
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860
CMD ["python", "app.py"]
```
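To build and run the container locally (the image tag is arbitrary):
```bash
docker build -t textlens-ocr .
docker run --rm -p 7860:7860 textlens-ocr
```
Then open `http://localhost:7860` as in local development.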
### Local Server
Gradio runs on an ASGI (FastAPI) stack, so it cannot be served by gunicorn's default sync workers. The simplest production-style run is to launch the app itself; to put it behind gunicorn or uvicorn, mount the interface on a FastAPI app first (see the sketch below).
```bash
# Simplest production-style run: Gradio manages its own web server
python app.py
```
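A minimal sketch of such a wrapper module (a hypothetical `server.py`, not part of the repository; it assumes `app.py` exposes `create_interface()`):
```python
# server.py — hypothetical ASGI wrapper around the Gradio interface
import gradio as gr
from fastapi import FastAPI

from app import create_interface  # assumed to return a gr.Blocks

app = FastAPI()
app = gr.mount_gradio_app(app, create_interface(), path="/")

# Serve with an ASGI worker, e.g.:
#   pip install gunicorn "uvicorn[standard]"
#   gunicorn -w 1 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7860 server:app
# Gradio keeps session state in process memory, so a single worker is safest.
```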
## πŸ” Environment Variables
| Variable | Description | Default |
| ---------------------- | --------------------- | ---------------------- |
| `GRADIO_SERVER_PORT` | Server port | 7860 |
| `TRANSFORMERS_CACHE` | Model cache directory | `~/.cache/huggingface` |
| `CUDA_VISIBLE_DEVICES` | GPU device selection | All available |
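For example, to pin the port and restrict inference to a single GPU before launching:
```bash
export GRADIO_SERVER_PORT=7860
export CUDA_VISIBLE_DEVICES=0
python app.py
```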
## 🀝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request
## πŸ“ API Reference
### OCRProcessor Class
```python
from models.ocr_processor import OCRProcessor
# Initialize
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")
# Extract text
text = ocr.extract_text(image)
# Extract with regions
result = ocr.extract_text_with_regions(image)
# Get model info
info = ocr.get_model_info()
```
## πŸ› Troubleshooting
### Common Issues
1. **Model Loading Errors**
```bash
# Install missing dependencies
pip install einops timm
```
2. **CUDA Out of Memory**
```python
# Use CPU instead
ocr = OCRProcessor()
ocr.device = "cpu"
```
3. **SSL Certificate Errors**
```bash
# Update certificates (macOS)
/Applications/Python\ 3.x/Install\ Certificates.command
```
## πŸ“„ License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## πŸ™ Acknowledgments
- **Microsoft** for the Florence-2 model
- **HuggingFace** for the transformers library
- **Gradio** for the web interface framework
- **EasyOCR** for fallback OCR capabilities
## πŸ“ž Support
- Create an issue for bug reports
- Start a discussion for feature requests
- Check existing issues before posting
---
**Made with ❀️ for the AI community**