---
title: TextLens - AI-Powered OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---
# 🔍 TextLens - AI-Powered OCR
[![Deploy to HuggingFace](https://img.shields.io/badge/🤗-Deploy%20to%20Spaces-blue)](https://huggingface.co/spaces/GoConqurer/textlens-ocr)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-green)](https://github.com/KumarAmrit30/textlens-ocr)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
A Vision-Language Model (VLM) based OCR application that extracts text from images with Microsoft Florence-2, falls back to EasyOCR when the primary model fails, and ships with a blue-green deployment pipeline designed for zero downtime.
## 🚀 Live Demo
**🔗 Try it now:** [https://huggingface.co/spaces/GoConqurer/textlens-ocr](https://huggingface.co/spaces/GoConqurer/textlens-ocr)
![TextLens Demo](https://img.shields.io/badge/Demo-Live-brightgreen)
## ✨ Key Features
### 🤖 Advanced AI-Powered OCR
- **Microsoft Florence-2 VLM**: State-of-the-art vision-language model for text extraction
- **Intelligent Fallback System**: Automatic fallback to EasyOCR if primary model fails
- **Multi-Model Support**: Florence-2-base and Florence-2-large variants
- **Real-time Processing**: Instant text extraction on image upload
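The fallback behaviour described above can be sketched as a small wrapper that tries the primary engine and switches to the next one when it raises. This is a minimal illustration, not the code in `models/ocr_processor.py`; the `FallbackOCR` class and the engine callables are hypothetical stand-ins for Florence-2 and EasyOCR.

```python
import logging
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("textlens.fallback")

# An engine is any callable that takes an image and returns extracted text.
Engine = Callable[[object], str]


class FallbackOCR:
    """Try each (name, engine) pair in order and return the first success.

    Hypothetical sketch: the real OCRProcessor may be structured differently.
    """

    def __init__(self, engines: List[Tuple[str, Engine]]) -> None:
        if not engines:
            raise ValueError("at least one engine is required")
        self.engines = engines
        self.last_engine: Optional[str] = None  # which engine produced the last result

    def extract_text(self, image: object) -> str:
        errors = []
        for name, engine in self.engines:
            try:
                text = engine(image)
            except Exception as exc:  # model load or inference failure
                logger.warning("%s failed: %s", name, exc)
                errors.append(f"{name}: {exc}")
                continue
            self.last_engine = name
            return text
        raise RuntimeError("all OCR engines failed: " + "; ".join(errors))


# Stand-in engines: the primary raises, so the fallback answers.
def broken_florence(image: object) -> str:
    raise RuntimeError("CUDA out of memory")


def stub_easyocr(image: object) -> str:
    return "Hello World"


ocr = FallbackOCR([("florence-2", broken_florence), ("easyocr", stub_easyocr)])
print(ocr.extract_text(image=None), ocr.last_engine)  # Hello World easyocr
```

Keeping the engines as plain callables makes the ordering easy to change and lets tests swap in stubs without loading any model.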
### 🎨 Modern User Experience
- **Clean UI**: Professional Gradio interface with intuitive design
- **Multiple Input Methods**: Upload files, use webcam, or paste from clipboard
- **Copy-to-Clipboard**: One-click text copying functionality
- **Responsive Design**: Works seamlessly on desktop and mobile devices
- **Dark/Light Theme**: Automatic theme adaptation
### ⚡ Performance & Reliability
- **GPU Acceleration**: Supports CUDA, MPS (Apple Silicon), and CPU inference
- **Smart Device Detection**: Automatically uses best available hardware
- **Error Resilience**: Robust error handling with graceful degradation
- **Memory Optimization**: Efficient model loading and cleanup
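Device detection in PyTorch code usually follows a CUDA, then MPS, then CPU preference order. The sketch below shows that order; `pick_device` and `detect_device` are hypothetical helpers, and pairing `float16` with CUDA while keeping `float32` elsewhere is a common convention, not something this README specifies.

```python
from typing import Tuple


def pick_device(cuda_available: bool, mps_available: bool) -> Tuple[str, str]:
    """Return (device, dtype_name) using a CUDA > MPS > CPU preference."""
    if cuda_available:
        return "cuda", "float16"  # half precision roughly halves GPU memory use
    if mps_available:
        return "mps", "float32"  # Apple Silicon; full precision is the safer default
    return "cpu", "float32"


def detect_device() -> Tuple[str, str]:
    """Ask PyTorch what hardware is present; assume CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return pick_device(False, False)
    mps_backend = getattr(torch.backends, "mps", None)  # absent in old torch builds
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


# Prints ('cpu', 'float32') when neither CUDA nor MPS is available.
print(detect_device())
```

Separating the pure decision (`pick_device`) from the hardware query makes the policy testable on any machine.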
### 🛡️ Enterprise Features
- **Zero Downtime Deployment**: Blue-green deployment with health checks
- **Health Monitoring**: Built-in `/health` and `/ready` endpoints
- **Graceful Shutdown**: Signal handling for clean application restarts
- **Production Ready**: Scalable architecture with automated deployment
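Graceful shutdown generally means catching `SIGTERM`/`SIGINT`, releasing resources exactly once, and then exiting. A minimal sketch with the standard `signal` module follows; `ShutdownManager` is a hypothetical name, and a registered callback would be something like `ocr.cleanup()` from the API reference.

```python
import signal
import threading
from typing import Callable, List


class ShutdownManager:
    """Run cleanup callbacks once when SIGTERM/SIGINT arrives, then exit."""

    def __init__(self, exit_process: bool = True) -> None:
        self.exit_process = exit_process
        self.stopping = threading.Event()  # worker threads can poll this flag
        self._callbacks: List[Callable[[], None]] = []

    def register(self, callback: Callable[[], None]) -> None:
        self._callbacks.append(callback)

    def install(self) -> None:
        # signal.signal() may only be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle)
        signal.signal(signal.SIGINT, self.handle)

    def handle(self, signum: int, frame: object) -> None:
        if self.stopping.is_set():
            return  # a second signal arrived during cleanup; ignore it
        self.stopping.set()
        for callback in reversed(self._callbacks):  # last registered runs first
            try:
                callback()
            except Exception as exc:  # keep going so later cleanups still run
                print(f"cleanup failed: {exc}")
        if self.exit_process:
            raise SystemExit(0)  # unwinds whatever the main thread was blocked in


if __name__ == "__main__":
    manager = ShutdownManager()
    manager.register(lambda: print("releasing model resources"))
    manager.install()
    # app.py would start the Gradio server here, e.g. demo.launch()
```

The handler deliberately avoids locks: signal handlers run on the main thread between bytecodes, so a non-reentrant lock could deadlock if a second signal arrived mid-cleanup.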
## 🏗️ Architecture
```
textlens-ocr/
├── 📱 Frontend (Gradio UI)
│   ├── ui/interface.py            # Main interface components
│   ├── ui/handlers.py             # Event handlers & logic
│   └── ui/styles.py               # CSS styling & themes
├── 🧠 AI Models
│   └── models/ocr_processor.py    # OCR engine with fallbacks
├── 🔧 Utilities
│   └── utils/image_utils.py       # Image preprocessing
├── 🚀 Deployment
│   ├── .github/workflows/         # CI/CD pipelines
│   ├── scripts/deploy.py          # Manual deployment tools
│   └── deployment.config.yml      # Deployment configuration
├── 📚 Documentation
│   ├── README.md                  # Main documentation
│   └── DEPLOYMENT.md              # Deployment guide
└── ⚙️ Configuration
    ├── app.py                     # Main application entry
    └── requirements.txt           # Dependencies
```
## 🚀 Quick Start
### 🌐 Online (Recommended)
**Instant access** - No installation required:
👉 [**Launch TextLens**](https://huggingface.co/spaces/GoConqurer/textlens-ocr)
### 💻 Local Development
1. **Clone Repository**
```bash
git clone https://github.com/KumarAmrit30/textlens-ocr.git
cd textlens-ocr
```
2. **Setup Environment**
```bash
python -m venv textlens_env
source textlens_env/bin/activate # Windows: textlens_env\Scripts\activate
pip install -r requirements.txt
```
3. **Launch Application**
```bash
python app.py
```
🌐 Open: `http://localhost:7860`
### πŸ§ͺ Quick Test
```bash
# Verify installation
python -c "from models.ocr_processor import OCRProcessor; print('βœ… TextLens ready!')"
```
## 📊 Model Performance
| Model                | Parameters | Speed     | Accuracy     | Best For               |
| -------------------- | ---------- | --------- | ------------ | ---------------------- |
| **Florence-2-base**  | 270M       | ⚡ Fast   | 📈 High      | General OCR, Real-time |
| **Florence-2-large** | 770M       | 🐌 Medium | 📊 Very High | High accuracy needs    |
| **EasyOCR**          | ~100M      | 🚀 Medium | 📋 Good      | Fallback, Multilingual |
## 🎯 Supported Use Cases
| Category            | Examples                        | Performance |
| ------------------- | ------------------------------- | ----------- |
| 📄 **Documents**    | PDFs, Scanned papers, Forms     | ⭐⭐⭐⭐⭐  |
| 🧾 **Receipts**     | Shopping receipts, Invoices     | ⭐⭐⭐⭐    |
| 📱 **Screenshots**  | App interfaces, Error messages  | ⭐⭐⭐⭐⭐  |
| 🚗 **Vehicle**      | License plates, VIN numbers     | ⭐⭐⭐⭐    |
| 📚 **Books**        | Printed text, Handwritten notes | ⭐⭐⭐⭐    |
| 🌐 **Multilingual** | Multiple languages              | ⭐⭐⭐      |
## 🔧 Configuration
### 🎛️ Model Selection
```python
from models.ocr_processor import OCRProcessor
# Fast inference (recommended)
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")
# Maximum accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
```
### 🎨 UI Customization
Modify `ui/styles.py` to customize appearance:
```python
# Change color scheme
PRIMARY_COLOR = "#1f77b4"
SECONDARY_COLOR = "#ff7f0e"
# Update layout
INTERFACE_WIDTH = "100%"
```
### ⚙️ Environment Variables
| Variable               | Description          | Default                    |
| ---------------------- | -------------------- | -------------------------- |
| `SPACE_ID`             | HuggingFace Space ID | Auto-detected              |
| `DEPLOYMENT_STAGE`     | Deployment stage     | `production`               |
| `TRANSFORMERS_CACHE`   | Model cache path     | `~/.cache/huggingface/hub` |
| `CUDA_VISIBLE_DEVICES` | GPU selection        | All available              |
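One way to read these variables is shown below. `Settings` and `load_settings` are hypothetical helpers; only the variable names and the `production` default come from the table. Treating the presence of `SPACE_ID` as "running on Spaces" relies on Hugging Face Spaces setting that variable, and leaving the cache path as `None` lets `transformers` apply its own default. `CUDA_VISIBLE_DEVICES` is omitted because the CUDA runtime reads it directly.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    space_id: Optional[str]
    deployment_stage: str
    model_cache: Optional[str]  # None means "use the transformers default"

    @property
    def on_spaces(self) -> bool:
        # Hugging Face Spaces sets SPACE_ID (e.g. "owner/space") in the container.
        return self.space_id is not None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read TextLens-related environment variables with simple defaults."""
    return Settings(
        space_id=env.get("SPACE_ID"),
        deployment_stage=env.get("DEPLOYMENT_STAGE", "production"),
        model_cache=env.get("TRANSFORMERS_CACHE"),
    )


settings = load_settings({"SPACE_ID": "GoConqurer/textlens-ocr"})
print(settings.on_spaces, settings.deployment_stage)  # True production
```

Passing the environment as a mapping keeps the function pure, so it can be tested without touching `os.environ`.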
## 🚀 Deployment
### 🤗 HuggingFace Spaces (Recommended)
**Automatic Deployment:**
1. Fork this repository
2. Push to `main`/`master` branch
3. GitHub Actions automatically deploys to HuggingFace Spaces
4. Access your deployed app at: `https://huggingface.co/spaces/USERNAME/textlens-ocr`
**Manual Deployment:**
1. Go to [GitHub Actions](https://github.com/KumarAmrit30/textlens-ocr/actions)
2. Select "Deploy to HuggingFace Spaces"
3. Click "Run workflow"
4. Choose deployment type:
- **Direct**: Quick deployment to production
- **Blue-Green**: Zero downtime with staging validation
### 🔄 Zero Downtime Deployment
The blue-green workflow is designed to keep the live app available while a new release is rolled out:
**Features:**
- 🔵 **Blue-Green Deployment**: Test in staging before production
- 🏥 **Health Monitoring**: Automatic health checks with retry logic
- 🔄 **Graceful Shutdown**: Clean application restarts
- 📊 **Real-time Monitoring**: Deployment status tracking
**Health Endpoints:**
- `GET /health` - Application health status
- `GET /ready` - Application readiness check
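The response bodies shown under Health Check API can be produced by two small functions. This is a sketch: the README does not show how TextLens registers these routes on its Gradio app, and `APP_VERSION`, `health_payload`, `ready_payload`, and the `not_ready` status are hypothetical. Treating `/health` as liveness and `/ready` as "model loaded" is a common convention, assumed here.

```python
import os
import time
from typing import Any, Callable, Dict

APP_VERSION = "1.0.0"  # hypothetical; matches the version in the example response


def health_payload(now: Callable[[], float] = time.time) -> Dict[str, Any]:
    """Liveness: the process is up and able to answer requests."""
    return {
        "status": "healthy",
        "timestamp": int(now()),
        "version": APP_VERSION,
        "environment": os.environ.get("DEPLOYMENT_STAGE", "production"),
    }


def ready_payload(model_loaded: bool, now: Callable[[], float] = time.time) -> Dict[str, Any]:
    """Readiness: report "ready" only once the OCR model has finished loading."""
    return {
        "status": "ready" if model_loaded else "not_ready",
        "timestamp": int(now()),
    }


print(ready_payload(model_loaded=False, now=lambda: 1640995200))
# {'status': 'not_ready', 'timestamp': 1640995200}
```

If you serve these yourself, mounting the Gradio app on a FastAPI application with `gradio.mount_gradio_app` is one option; a not-ready response would normally also carry HTTP status 503 so load balancers hold traffic back.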
**Deployment Flow:**
```mermaid
graph LR
A[Code Push] --> B[Validate]
B --> C[Deploy Staging]
C --> D[Health Check]
D --> E[Deploy Production]
E --> F[Verify]
F --> G[Complete ✅]
```
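The flow above, including the retrying health check, can be sketched as follows. This is an illustration rather than `scripts/deploy.py`: `deploy` and `check_health` are hypothetical callables you would wire to your own deployment commands and to an HTTP GET of each environment's `/health` endpoint, and the retry count and delay are arbitrary.

```python
import time
from typing import Any, Callable, List


def wait_until_healthy(
    check_health: Callable[[], bool],
    attempts: int = 5,
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check_health up to `attempts` times, sleeping between tries."""
    for attempt in range(1, attempts + 1):
        try:
            if check_health():
                return True
        except Exception:
            pass  # connection errors while the app is booting count as "not yet"
        if attempt < attempts:
            sleep(delay_s)
    return False


def blue_green_deploy(
    deploy: Callable[[str], None],
    check_health: Callable[[str], bool],
    log: Callable[[str], None] = print,
    **retry_kwargs: Any,
) -> bool:
    """Deploy to staging, gate on its health check, then promote and verify."""
    deploy("staging")
    if not wait_until_healthy(lambda: check_health("staging"), **retry_kwargs):
        log("staging unhealthy; production left untouched")
        return False
    deploy("production")
    ok = wait_until_healthy(lambda: check_health("production"), **retry_kwargs)
    log("deployment complete" if ok else "production verification failed")
    return ok


# Demo with fakes: staging answers on the second probe, nothing really sleeps.
calls: List[str] = []
probes = iter([False, True, True])
ok = blue_green_deploy(
    deploy=calls.append,
    check_health=lambda env: next(probes),
    sleep=lambda seconds: None,
)
print(ok, calls)  # True ['staging', 'production']
```

The key property is that a failed staging check returns before `deploy("production")` runs, so users keep getting the previous version.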
### 🐳 Docker Deployment
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```
Build and run:
```bash
docker build -t textlens-ocr .
docker run -p 7860:7860 textlens-ocr
```
### ☁️ Cloud Platforms
| Platform | Status | Guide |
| ---------------------- | ------------- | ------------------------------------------------------------------- |
| **HuggingFace Spaces** | βœ… Ready | [Deploy Now](https://huggingface.co/spaces/GoConqurer/textlens-ocr) |
| **Google Colab** | βœ… Compatible | Open in Colab |
| **AWS/GCP/Azure** | πŸ”§ Docker | Use Docker deployment |
| **Heroku** | ⚠️ Limited | GPU not available |
## 🧪 Testing & Development
### 🔍 Running Tests
```bash
# Basic functionality test
python -c "
from models.ocr_processor import OCRProcessor
ocr = OCRProcessor()
print(f'✅ Model loaded: {ocr.get_model_info()}')
"
# Test with a locally generated sample image (no network access needed)
python -c "
from PIL import Image, ImageDraw
from models.ocr_processor import OCRProcessor
# Draw black text on a white canvas, then upscale it so the text is legible
image = Image.new('RGB', (150, 50), 'white')
ImageDraw.Draw(image).text((10, 20), 'Hello World', fill='black')
image = image.resize((600, 200))
# Test OCR
ocr = OCRProcessor()
result = ocr.extract_text(image)
print(f'✅ OCR Result: {result}')
"
```
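### 🛠️ Development Tools
```bash
# Install project dependencies plus the formatter, type checker and linter
pip install -r requirements.txt
pip install black mypy flake8
# Format code
black . --line-length 88
# Type checking
mypy models/ utils/ ui/
# Lint code (E203 conflicts with Black's slice formatting)
flake8 --max-line-length 88 --extend-ignore E203
```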
## 📚 API Reference
### OCRProcessor Class
```python
from PIL import Image
from models.ocr_processor import OCRProcessor
# Initialize processor
ocr = OCRProcessor(
    model_name="microsoft/Florence-2-base",  # Model selection
    device=None,  # Auto-detect device
    torch_dtype=None,  # Auto-select dtype
)
# Load an image to process (the path is a placeholder)
image = Image.open("sample.png").convert("RGB")
# Extract text from image
text = ocr.extract_text(image)
# Returns: str
# Extract text with bounding boxes
result = ocr.extract_text_with_regions(image)
# Returns: dict with text and regions
# Get model information
info = ocr.get_model_info()
# Returns: dict with model details
# Cleanup resources
ocr.cleanup()
```
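`extract_text_with_regions` is documented only as returning a dict with text and regions. For reference, Florence-2's `<OCR_WITH_REGION>` task, after `processor.post_process_generation`, yields `quad_boxes` (four corner points per text line, flattened to eight numbers) plus matching `labels`. The sketch below converts that raw shape into axis-aligned boxes; the `text`/`bbox` output format and the `quads_to_regions` name are hypothetical, not necessarily what `OCRProcessor` returns.

```python
from typing import Any, Dict, List

TASK = "<OCR_WITH_REGION>"


def quads_to_regions(parsed: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn Florence-2 quad boxes into {'text', 'bbox'} dicts.

    bbox is (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    result = parsed[TASK]
    regions = []
    for quad, label in zip(result["quad_boxes"], result["labels"]):
        xs, ys = quad[0::2], quad[1::2]  # quad = [x1, y1, x2, y2, x3, y3, x4, y4]
        text = label.replace("</s>", "").strip()  # first label may carry a stray "</s>"
        if text:  # skip empty detections
            regions.append({"text": text, "bbox": (min(xs), min(ys), max(xs), max(ys))})
    return regions


sample = {TASK: {"quad_boxes": [[10, 5, 90, 5, 90, 25, 10, 25]], "labels": ["</s>Hello World"]}}
print(quads_to_regions(sample))
# [{'text': 'Hello World', 'bbox': (10, 5, 90, 25)}]
```

Taking the min/max over all four corners also handles slightly rotated text, at the cost of a looser box.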
### Health Check API
```bash
# Spaces serve the running app from the *.hf.space subdomain, not the Space page URL
# Check application health
curl https://goconqurer-textlens-ocr.hf.space/health
# Example response:
# {
#   "status": "healthy",
#   "timestamp": 1640995200,
#   "version": "1.0.0",
#   "environment": "production"
# }
# Check readiness
curl https://goconqurer-textlens-ocr.hf.space/ready
# Example response:
# {
#   "status": "ready",
#   "timestamp": 1640995200
# }
```
## 🚨 Troubleshooting
### Common Issues
| Issue                   | Symptoms                 | Solution                                                                         |
| ----------------------- | ------------------------ | -------------------------------------------------------------------------------- |
| **Model Loading Error** | ImportError, CUDA errors | Check GPU drivers, install CUDA toolkit                                          |
| **Memory Error**        | Out of memory            | Switch to Florence-2-base, or use CPU inference                                  |
| **SSL Certificate**     | SSL errors on macOS      | Run `Install Certificates.command` in `/Applications/Python 3.x/` (python.org installs) |
| **Permission Error**    | File access denied       | Check file permissions, run as admin                                             |
### Debug Commands
```bash
# Check CUDA availability
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"
# Check transformers version
python -c "import transformers; print(f'Transformers: {transformers.__version__}')"
# Test health endpoint locally
curl http://localhost:7860/health
# View application logs
tail -f textlens.log
```
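The checks above can also be collected into a single environment report to paste into a bug report. A sketch using only the standard library; the package list is an assumption based on the libraries this README mentions, and the distribution names may differ from what `requirements.txt` pins.

```python
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed distribution names for the libraries mentioned in this README.
PACKAGES = ("torch", "transformers", "gradio", "easyocr", "Pillow")


def package_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report() -> str:
    """One line for Python/platform, then one line per package."""
    lines = [f"Python: {sys.version.split()[0]} ({platform.platform()})"]
    for name, version in package_versions().items():
        lines.append(f"{name}: {version or 'not installed'}")
    return "\n".join(lines)


print(environment_report())
```

Reading versions through `importlib.metadata` avoids importing heavy packages such as `torch` just to learn their version.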
### Getting Help
1. 📋 **Check existing issues**: [GitHub Issues](https://github.com/KumarAmrit30/textlens-ocr/issues)
2. 🆕 **Create new issue**: Provide error details and environment info
3. 💬 **Join discussion**: [GitHub Discussions](https://github.com/KumarAmrit30/textlens-ocr/discussions)
4. 📧 **Contact**: Create an issue for direct support
## 🤝 Contributing
We welcome contributions! Here's how to get started:
### 🔧 Development Setup
1. **Fork & Clone**
```bash
git clone https://github.com/YOUR_USERNAME/textlens-ocr.git
cd textlens-ocr
```
2. **Create Branch**
```bash
git checkout -b feature/your-feature-name
```
3. **Make Changes**
- Add new features or fix bugs
- Update tests and documentation
- Follow code style guidelines
4. **Test Changes**
```bash
python -m pytest tests/
python -c "from models.ocr_processor import OCRProcessor; OCRProcessor()"
```
5. **Submit PR**
```bash
git add .
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name
```
### 📝 Contribution Guidelines
- **Code Style**: Follow PEP 8, use Black formatter
- **Documentation**: Update README and docstrings
- **Tests**: Add tests for new functionality
- **Commits**: Use conventional commit messages
- **Issues**: Link PRs to relevant issues
## 📄 License
This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
### 🙏 Third-Party Licenses
- **Microsoft Florence-2**: [MIT License](https://github.com/microsoft/Florence)
- **HuggingFace Transformers**: [Apache License 2.0](https://github.com/huggingface/transformers)
- **Gradio**: [Apache License 2.0](https://github.com/gradio-app/gradio)
- **EasyOCR**: [Apache License 2.0](https://github.com/JaidedAI/EasyOCR)
## 🌟 Acknowledgments
Special thanks to:
- **Microsoft Research** for the incredible Florence-2 vision-language model
- **HuggingFace** for the transformers library and Spaces platform
- **Gradio Team** for the amazing web interface framework
- **JaidedAI** for EasyOCR fallback capabilities
- **Open Source Community** for continuous support and contributions
## 📈 Project Status
| Component         | Status        | Version |
| ----------------- | ------------- | ------- |
| **Core OCR**      | ✅ Stable     | v1.0.0  |
| **Web UI**        | ✅ Stable     | v1.0.0  |
| **Deployment**    | ✅ Production | v1.0.0  |
| **API**           | ✅ Stable     | v1.0.0  |
| **Documentation** | ✅ Complete   | v1.0.0  |
### 🎯 Roadmap
- [ ] **Multi-language UI** support
- [ ] **Batch processing** for multiple images
- [ ] **API rate limiting** and authentication
- [ ] **Custom model** fine-tuning support
- [ ] **Mobile app** development
- [ ] **Cloud storage** integration
## 📞 Support & Community
### 🔗 Links
- **🏠 Homepage**: [GitHub Repository](https://github.com/KumarAmrit30/textlens-ocr)
- **🚀 Live Demo**: [HuggingFace Spaces](https://huggingface.co/spaces/GoConqurer/textlens-ocr)
- **📋 Issues**: [Report Bugs](https://github.com/KumarAmrit30/textlens-ocr/issues)
- **💬 Discussions**: [GitHub Discussions](https://github.com/KumarAmrit30/textlens-ocr/discussions)
- **📖 Documentation**: [Deployment Guide](DEPLOYMENT.md)
### 📊 Stats
![GitHub stars](https://img.shields.io/github/stars/KumarAmrit30/textlens-ocr?style=social)
![GitHub forks](https://img.shields.io/github/forks/KumarAmrit30/textlens-ocr?style=social)
![GitHub watchers](https://img.shields.io/github/watchers/KumarAmrit30/textlens-ocr?style=social)
---
<div align="center">
**Made with ❤️ for the AI community**
[⭐ Star this repo](https://github.com/KumarAmrit30/textlens-ocr) • [🔗 Try the demo](https://huggingface.co/spaces/GoConqurer/textlens-ocr) • [📖 Read docs](DEPLOYMENT.md)
</div>