Spaces:

GoConqurer
/

textlens-ocr

Running

App Files Files Community

textlens-ocr / README.md

GoConqurer

🔧 Fix Gradio API name conflicts and upgrade version

67e2508 5 months ago

preview code

raw

history blame

15.3 kB

	---
	title: TextLens - AI-Powered OCR
	emoji: 🔍
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 4.0.0
	app_file: app.py
	pinned: false
	license: mit
	---

	# 🔍 TextLens - AI-Powered OCR

	[![Deploy to HuggingFace](https://img.shields.io/badge/🤗-Deploy%20to%20Spaces-blue)](https://huggingface.co/spaces/GoConqurer/textlens-ocr)
	[![GitHub](https://img.shields.io/badge/GitHub-Repository-green)](https://github.com/KumarAmrit30/textlens-ocr)
	[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
	[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)

	A state-of-the-art Vision-Language Model (VLM) based OCR application that extracts text from images using Microsoft Florence-2 with intelligent fallback systems and enterprise-grade zero downtime deployment.

	## 🚀 Live Demo

	🔗 Try it now: [https://huggingface.co/spaces/GoConqurer/textlens-ocr](https://huggingface.co/spaces/GoConqurer/textlens-ocr)

	![TextLens Demo](https://img.shields.io/badge/Demo-Live-brightgreen)

	## ✨ Key Features

	### 🤖 Advanced AI-Powered OCR

	- Microsoft Florence-2 VLM: State-of-the-art vision-language model for text extraction
	- Intelligent Fallback System: Automatic fallback to EasyOCR if primary model fails
	- Multi-Model Support: Florence-2-base and Florence-2-large variants
	- Real-time Processing: Instant text extraction on image upload

	### 🎨 Modern User Experience

	- Clean UI: Professional Gradio interface with intuitive design
	- Multiple Input Methods: Upload files, use webcam, or paste from clipboard
	- Copy-to-Clipboard: One-click text copying functionality
	- Responsive Design: Works seamlessly on desktop and mobile devices
	- Dark/Light Theme: Automatic theme adaptation

	### ⚡ Performance & Reliability

	- GPU Acceleration: Supports CUDA, MPS (Apple Silicon), and CPU inference
	- Smart Device Detection: Automatically uses best available hardware
	- Error Resilience: Robust error handling with graceful degradation
	- Memory Optimization: Efficient model loading and cleanup

	### 🛡️ Enterprise Features

	- Zero Downtime Deployment: Blue-green deployment with health checks
	- Health Monitoring: Built-in `/health` and `/ready` endpoints
	- Graceful Shutdown: Signal handling for clean application restarts
	- Production Ready: Scalable architecture with automated deployment

	## 🏗️ Architecture

	```
	textlens-ocr/
	├── 📱 Frontend (Gradio UI)
	│ ├── ui/interface.py # Main interface components
	│ ├── ui/handlers.py # Event handlers & logic
	│ └── ui/styles.py # CSS styling & themes
	├── 🧠 AI Models
	│ └── models/ocr_processor.py # OCR engine with fallbacks
	├── 🔧 Utilities
	│ └── utils/image_utils.py # Image preprocessing
	├── 🚀 Deployment
	│ ├── .github/workflows/ # CI/CD pipelines
	│ ├── scripts/deploy.py # Manual deployment tools
	│ └── deployment.config.yml # Deployment configuration
	├── 📚 Documentation
	│ ├── README.md # Main documentation
	│ └── DEPLOYMENT.md # Deployment guide
	└── ⚙️ Configuration
	├── app.py # Main application entry
	└── requirements.txt # Dependencies
	```

	## 🚀 Quick Start

	### 🌐 Online (Recommended)

	Instant access - No installation required:
	👉 [Launch TextLens](https://huggingface.co/spaces/GoConqurer/textlens-ocr)

	### 💻 Local Development

	1. Clone Repository

	```bash
	git clone https://github.com/KumarAmrit30/textlens-ocr.git
	cd textlens-ocr
	```

	2. Setup Environment

	```bash
	python -m venv textlens_env
	source textlens_env/bin/activate # Windows: textlens_env\Scripts\activate
	pip install -r requirements.txt
	```

	3. Launch Application
	```bash
	python app.py
	```
	🌐 Open: `http://localhost:7860`

	### 🧪 Quick Test

	```bash
	# Verify installation
	python -c "from models.ocr_processor import OCRProcessor; print('✅ TextLens ready!')"
	```

	## 📊 Model Performance

	\| Model \| Size \| Speed \| Accuracy \| Best For \|
	\| -------------------- \| ----- \| --------- \| ------------ \| ---------------------- \|
	\| Florence-2-base \| 270M \| ⚡ Fast \| 📈 High \| General OCR, Real-time \|
	\| Florence-2-large \| 770M \| 🐌 Medium \| 📊 Very High \| High accuracy needs \|
	\| EasyOCR \| ~100M \| 🚀 Medium \| 📋 Good \| Fallback, Multilingual \|

	## 🎯 Supported Use Cases

	\| Category \| Examples \| Performance \|
	\| ------------------- \| ------------------------------- \| ----------- \|
	\| 📄 Documents \| PDFs, Scanned papers, Forms \| ⭐⭐⭐⭐⭐ \|
	\| 🧾 Receipts \| Shopping receipts, Invoices \| ⭐⭐⭐⭐ \|
	\| 📱 Screenshots \| App interfaces, Error messages \| ⭐⭐⭐⭐⭐ \|
	\| 🚗 Vehicle \| License plates, VIN numbers \| ⭐⭐⭐⭐ \|
	\| 📚 Books \| Printed text, Handwritten notes \| ⭐⭐⭐⭐ \|
	\| 🌐 Multilingual \| Multiple languages \| ⭐⭐⭐ \|

	## 🔧 Configuration

	### 🎛️ Model Selection

	```python
	from models.ocr_processor import OCRProcessor

	# Fast inference (recommended)
	ocr = OCRProcessor(model_name="microsoft/Florence-2-base")

	# Maximum accuracy
	ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
	```

	### 🎨 UI Customization

	Modify `ui/styles.py` to customize appearance:

	```python
	# Change color scheme
	PRIMARY_COLOR = "#1f77b4"
	SECONDARY_COLOR = "#ff7f0e"

	# Update layout
	INTERFACE_WIDTH = "100%"
	```

	### ⚙️ Environment Variables

	\| Variable \| Description \| Default \|
	\| ---------------------- \| -------------------- \| ---------------------- \|
	\| `SPACE_ID` \| HuggingFace Space ID \| Auto-detected \|
	\| `DEPLOYMENT_STAGE` \| deployment stage \| `production` \|
	\| `TRANSFORMERS_CACHE` \| Model cache path \| `~/.cache/huggingface` \|
	\| `CUDA_VISIBLE_DEVICES` \| GPU selection \| All available \|

	## 🚀 Deployment

	### 🤗 HuggingFace Spaces (Recommended)

	Automatic Deployment:

	1. Fork this repository
	2. Push to `main`/`master` branch
	3. GitHub Actions automatically deploys to HuggingFace Spaces
	4. Access your deployed app at: `https://huggingface.co/spaces/USERNAME/textlens-ocr`

	Manual Deployment:

	1. Go to [GitHub Actions](https://github.com/KumarAmrit30/textlens-ocr/actions)
	2. Select "Deploy to HuggingFace Spaces"
	3. Click "Run workflow"
	4. Choose deployment type:
	- Direct: Quick deployment to production
	- Blue-Green: Zero downtime with staging validation

	### 🔄 Zero Downtime Deployment

	Our enterprise-grade deployment system ensures zero downtime for users:

	Features:

	- 🔵 Blue-Green Deployment: Test in staging before production
	- 🏥 Health Monitoring: Automatic health checks with retry logic
	- 🔄 Graceful Shutdown: Clean application restarts
	- 📊 Real-time Monitoring: Deployment status tracking

	Health Endpoints:

	- `GET /health` - Application health status
	- `GET /ready` - Application readiness check

	Deployment Flow:

	```mermaid
	graph LR
	A[Code Push] --> B[Validate]
	B --> C[Deploy Staging]
	C --> D[Health Check]
	D --> E[Deploy Production]
	E --> F[Verify]
	F --> G[Complete ✅]
	```

	### 🐳 Docker Deployment

	```dockerfile
	FROM python:3.9-slim

	WORKDIR /app
	COPY requirements.txt .
	RUN pip install -r requirements.txt

	COPY . .
	EXPOSE 7860

	CMD ["python", "app.py"]
	```

	Build and run:

	```bash
	docker build -t textlens-ocr .
	docker run -p 7860:7860 textlens-ocr
	```

	### ☁️ Cloud Platforms

	\| Platform \| Status \| Guide \|
	\| ---------------------- \| ------------- \| ------------------------------------------------------------------- \|
	\| HuggingFace Spaces \| ✅ Ready \| [Deploy Now](https://huggingface.co/spaces/GoConqurer/textlens-ocr) \|
	\| Google Colab \| ✅ Compatible \| Open in Colab \|
	\| AWS/GCP/Azure \| 🔧 Docker \| Use Docker deployment \|
	\| Heroku \| ⚠️ Limited \| GPU not available \|

	## 🧪 Testing & Development

	### 🔍 Running Tests

	```bash
	# Basic functionality test
	python -c "
	from models.ocr_processor import OCRProcessor
	ocr = OCRProcessor()
	print(f'✅ Model loaded: {ocr.get_model_info()}')
	"

	# Test with sample image
	python -c "
	from PIL import Image
	from models.ocr_processor import OCRProcessor
	import requests

	# Download test image
	img_url = 'https://via.placeholder.com/300x100/000000/FFFFFF?text=Hello+World'
	image = Image.open(requests.get(img_url, stream=True).raw)

	# Test OCR
	ocr = OCRProcessor()
	result = ocr.extract_text(image)
	print(f'✅ OCR Result: {result}')
	"
	```

	### 🛠️ Development Tools

	```bash
	# Install development dependencies
	pip install -r requirements.txt

	# Format code
	black . --line-length 88

	# Type checking
	mypy models/ utils/ ui/

	# Lint code
	flake8 --max-line-length 88
	```

	## 📚 API Reference

	### OCRProcessor Class

	```python
	from models.ocr_processor import OCRProcessor

	# Initialize processor
	ocr = OCRProcessor(
	model_name="microsoft/Florence-2-base", # Model selection
	device=None, # Auto-detect device
	torch_dtype=None # Auto-select dtype
	)

	# Extract text from image
	text = ocr.extract_text(image)
	# Returns: str

	# Extract text with bounding boxes
	result = ocr.extract_text_with_regions(image)
	# Returns: dict with text and regions

	# Get model information
	info = ocr.get_model_info()
	# Returns: dict with model details

	# Cleanup resources
	ocr.cleanup()
	```

	### Health Check API

	```bash
	# Check application health
	curl https://huggingface.co/spaces/GoConqurer/textlens-ocr/health

	# Response:
	{
	"status": "healthy",
	"timestamp": 1640995200,
	"version": "1.0.0",
	"environment": "production"
	}

	# Check readiness
	curl https://huggingface.co/spaces/GoConqurer/textlens-ocr/ready

	# Response:
	{
	"status": "ready",
	"timestamp": 1640995200
	}
	```

	## 🚨 Troubleshooting

	### Common Issues

	\| Issue \| Symptoms \| Solution \|
	\| ----------------------- \| ------------------------ \| --------------------------------------- \|
	\| Model Loading Error \| ImportError, CUDA errors \| Check GPU drivers, install CUDA toolkit \|
	\| Memory Error \| Out of memory \| Reduce batch size, use CPU inference \|
	\| SSL Certificate \| SSL errors on macOS \| Run certificate update command \|
	\| Permission Error \| File access denied \| Check file permissions, run as admin \|

	### Debug Commands

	```bash
	# Check CUDA availability
	python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

	# Check transformers version
	python -c "import transformers; print(f'Transformers: {transformers.__version__}')"

	# Test health endpoint locally
	curl http://localhost:7860/health

	# View application logs
	tail -f textlens.log
	```

	### Getting Help

	1. 📋 Check existing issues: [GitHub Issues](https://github.com/KumarAmrit30/textlens-ocr/issues)
	2. 🆕 Create new issue: Provide error details and environment info
	3. 💬 Join discussion: [GitHub Discussions](https://github.com/KumarAmrit30/textlens-ocr/discussions)
	4. 📧 Contact: Create an issue for direct support

	## 🤝 Contributing

	We welcome contributions! Here's how to get started:

	### 🔧 Development Setup

	1. Fork & Clone

	```bash
	git clone https://github.com/YOUR_USERNAME/textlens-ocr.git
	cd textlens-ocr
	```

	2. Create Branch

	```bash
	git checkout -b feature/your-feature-name
	```

	3. Make Changes

	- Add new features or fix bugs
	- Update tests and documentation
	- Follow code style guidelines

	4. Test Changes

	```bash
	python -m pytest tests/
	python -c "from models.ocr_processor import OCRProcessor; OCRProcessor()"
	```

	5. Submit PR
	```bash
	git add .
	git commit -m "feat: add your feature description"
	git push origin feature/your-feature-name
	```

	### 📝 Contribution Guidelines

	- Code Style: Follow PEP 8, use Black formatter
	- Documentation: Update README and docstrings
	- Tests: Add tests for new functionality
	- Commits: Use conventional commit messages
	- Issues: Link PRs to relevant issues

	## 📄 License

	This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

	### 🙏 Third-Party Licenses

	- Microsoft Florence-2: [MIT License](https://github.com/microsoft/Florence)
	- HuggingFace Transformers: [Apache License 2.0](https://github.com/huggingface/transformers)
	- Gradio: [Apache License 2.0](https://github.com/gradio-app/gradio)
	- EasyOCR: [Apache License 2.0](https://github.com/JaidedAI/EasyOCR)

	## 🌟 Acknowledgments

	Special thanks to:

	- Microsoft Research for the incredible Florence-2 vision-language model
	- HuggingFace for the transformers library and Spaces platform
	- Gradio Team for the amazing web interface framework
	- JaidedAI for EasyOCR fallback capabilities
	- Open Source Community for continuous support and contributions

	## 📈 Project Status

	\| Component \| Status \| Version \|
	\| ----------------- \| ------------- \| ------- \|
	\| Core OCR \| ✅ Stable \| v1.0.0 \|
	\| Web UI \| ✅ Stable \| v1.0.0 \|
	\| Deployment \| ✅ Production \| v1.0.0 \|
	\| API \| ✅ Stable \| v1.0.0 \|
	\| Documentation \| ✅ Complete \| v1.0.0 \|

	### 🎯 Roadmap

	- [ ] Multi-language UI support
	- [ ] Batch processing for multiple images
	- [ ] API rate limiting and authentication
	- [ ] Custom model fine-tuning support
	- [ ] Mobile app development
	- [ ] Cloud storage integration

	## 📞 Support & Community

	### 🔗 Links

	- 🏠 Homepage: [GitHub Repository](https://github.com/KumarAmrit30/textlens-ocr)
	- 🚀 Live Demo: [HuggingFace Spaces](https://huggingface.co/spaces/GoConqurer/textlens-ocr)
	- 📋 Issues: [Report Bugs](https://github.com/KumarAmrit30/textlens-ocr/issues)
	- 💬 Discussions: [GitHub Discussions](https://github.com/KumarAmrit30/textlens-ocr/discussions)
	- 📖 Documentation: [Deployment Guide](DEPLOYMENT.md)

	### 📊 Stats

	![GitHub stars](https://img.shields.io/github/stars/KumarAmrit30/textlens-ocr?style=social)
	![GitHub forks](https://img.shields.io/github/forks/KumarAmrit30/textlens-ocr?style=social)
	![GitHub watchers](https://img.shields.io/github/watchers/KumarAmrit30/textlens-ocr?style=social)

	---

	<div align="center">

	Made with ❤️ for the AI community

	[⭐ Star this repo](https://github.com/KumarAmrit30/textlens-ocr) • [🔗 Try the demo](https://huggingface.co/spaces/GoConqurer/textlens-ocr) • [📖 Read docs](DEPLOYMENT.md)

	</div>