---
title: TextLens - AI-Powered OCR
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---
# 🔍 TextLens - AI-Powered OCR

A modern Vision-Language Model (VLM) based OCR application that extracts text from images using Microsoft's Florence-2 model, with an intelligent fallback system.
## ✨ Features

- **🤖 Advanced VLM OCR**: Uses Microsoft Florence-2 for state-of-the-art text extraction
- **🔄 Smart Fallback System**: Automatically falls back to EasyOCR if Florence-2 fails
- **🧪 Demo Mode**: Test mode for demonstration when other methods are unavailable
- **🎨 Modern UI**: Clean, responsive Gradio interface with excellent UX
- **📱 Multiple Input Methods**: Upload, webcam, clipboard support
- **⚡ Real-time Processing**: Automatic text extraction on image upload
- **📋 Copy Functionality**: Easy text copying from results
- **🚀 GPU Acceleration**: Supports CUDA, MPS, and CPU inference
- **🛡️ Error Handling**: Robust error handling and user-friendly messages
## 🏗️ Architecture
```
textlens-ocr/
├── app.py                 # Main Gradio application
├── requirements.txt       # Python dependencies
├── README.md              # Project documentation
├── models/                # OCR processing modules
│   ├── __init__.py
│   └── ocr_processor.py   # Advanced OCR class with fallbacks
├── utils/                 # Utility functions
│   ├── __init__.py
│   └── image_utils.py     # Image preprocessing utilities
└── ui/                    # User interface components
    ├── __init__.py
    ├── interface.py       # Gradio interface
    ├── handlers.py        # Event handlers
    └── styles.py          # CSS styling
```
## 🚀 Quick Start
### Local Development
1. **Clone the repository**

   ```bash
   git clone https://github.com/KumarAmrit30/textlens-ocr.git
   cd textlens-ocr
   ```

2. **Set up a Python environment**

   ```bash
   python3 -m venv textlens_env
   source textlens_env/bin/activate  # On Windows: textlens_env\Scripts\activate
   ```

3. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

4. **Run the application**

   ```bash
   python app.py
   ```

5. **Open your browser**

   Navigate to `http://localhost:7860`
### Quick Test
Run the test suite to verify everything works:
```bash
python test_ocr.py
```
## 🔧 Technical Details
### OCR Processing Pipeline
1. **Primary**: Microsoft Florence-2 VLM
   - State-of-the-art vision-language model
   - Supports both basic OCR and region-based extraction
   - GPU-accelerated inference
2. **Fallback**: EasyOCR
   - Traditional OCR with good accuracy
   - Used when Florence-2 fails to load
   - Multi-language support
3. **Last Resort**: Demo Mode
   - Returns placeholder output for demonstration
   - Confirms the interface itself is working
   - Used when both Florence-2 and EasyOCR are unavailable
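The three tiers boil down to a try-in-order chain. A minimal sketch of that pattern (the backend functions below are illustrative stand-ins, not the project's actual API):

```python
def first_successful(image, backends):
    """Return (backend_name, text) from the first backend that doesn't raise."""
    for name, extract in backends:
        try:
            return name, extract(image)
        except Exception:
            continue  # this tier failed; try the next one
    # Last resort: demo mode, so the interface always has output to show
    return "demo", "[demo mode] no OCR backend available"

# Stand-in backends illustrating the order: Florence-2 first, then EasyOCR
def florence2_stub(image):
    raise RuntimeError("model failed to load")

def easyocr_stub(image):
    return "Hello from EasyOCR"

name, text = first_successful(None, [("florence2", florence2_stub),
                                     ("easyocr", easyocr_stub)])
print(name, text)  # easyocr Hello from EasyOCR
```

Catching `Exception` per tier keeps one backend's failure from taking down the whole pipeline.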
### Model Loading Strategy
The application uses an intelligent loading strategy:
```python
try:
    # Try Florence-2 pinned to a specific revision
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        revision="refs/pr/6",
        trust_remote_code=True,
    )
except Exception:
    # Fall back to the default Florence-2 revision
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base",
        trust_remote_code=True,
    )
```
### Device Detection
Automatically detects and uses the best available device:
- **CUDA**: NVIDIA GPUs with CUDA support
- **MPS**: Apple Silicon Macs (M1/M2/M3)
- **CPU**: Fallback for all systems
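The preference order can be sketched as a small pure function; in the application the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()` (torch is left out here so the sketch stays dependency-free):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Mirror the CUDA -> MPS -> CPU preference order."""
    if cuda_available:   # NVIDIA GPU with CUDA support
        return "cuda"
    if mps_available:    # Apple Silicon (M1/M2/M3)
        return "mps"
    return "cpu"         # universal fallback

# In the app: pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
print(pick_device(False, True))  # mps
```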
## 📊 Performance
| Model | Size | Speed | Accuracy | Use Case |
| ---------------- | ------ | ------ | --------- | --------------------- |
| Florence-2-base | 230M | Fast | High | General OCR |
| Florence-2-large | 770M | Medium | Very High | High accuracy needs |
| EasyOCR | ~100MB | Medium | Good | Fallback/Multilingual |
## 📋 Supported Image Formats
- **JPEG** (.jpg, .jpeg)
- **PNG** (.png)
- **WebP** (.webp)
- **BMP** (.bmp)
- **TIFF** (.tiff, .tif)
- **GIF** (.gif)
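A quick sketch of how the list above might be checked in code (the set and helper name are illustrative, not the project's actual code):

```python
from pathlib import Path

# Extensions from the supported-formats list above
SUPPORTED_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tiff", ".tif", ".gif"}

def is_supported(filename: str) -> bool:
    """Case-insensitive extension check against the supported formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTS

print(is_supported("scan.TIFF"), is_supported("notes.pdf"))  # True False
```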
## 🎯 Use Cases
- **📄 Document Digitization**: Convert physical documents to text
- **🏪 Receipt Processing**: Extract data from receipts and invoices
- **📱 Screenshot Text Extraction**: Get text from app screenshots
- **🚗 License Plate Reading**: Extract text from vehicle plates
- **📖 Book/Article Scanning**: Digitize printed materials
- **🌍 Multilingual Text**: Process text in various languages
## 🛠️ Configuration
### Model Selection
Change the model in `models/ocr_processor.py`:
```python
# For faster inference
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")
# For higher accuracy
ocr = OCRProcessor(model_name="microsoft/Florence-2-large")
```
### UI Customization
Modify the Gradio interface in `app.py`:
- Update colors and styling in the CSS section
- Change layout in the `create_interface()` function
- Add new features or components
## 🧪 Testing
The project includes comprehensive tests:
```bash
# Run all tests
python test_ocr.py
# Test specific functionality
python -c "from models.ocr_processor import OCRProcessor; ocr = OCRProcessor(); print(ocr.get_model_info())"
```
## 🚀 Deployment
### HuggingFace Spaces
1. Fork this repository
2. Create a new Space on HuggingFace
3. Connect your repository
4. The app will automatically deploy
### Docker Deployment
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```
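To try the Dockerfile above locally (image and container names are illustrative):

```bash
# Build the image and run it, mapping the Gradio port to the host
docker build -t textlens-ocr .
docker run --rm -p 7860:7860 textlens-ocr
```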
### Local Server
```bash
# Production server: Gradio serves an ASGI (FastAPI) app, so run gunicorn with
# uvicorn workers. This assumes app.py exposes the FastAPI object at module
# level, e.g. `fastapi_app = create_interface().app` (name is illustrative).
pip install gunicorn uvicorn
gunicorn -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:7860 app:fastapi_app
```
## 🌍 Environment Variables
| Variable | Description | Default |
| ---------------------- | --------------------- | ---------------------- |
| `GRADIO_SERVER_PORT` | Server port | 7860 |
| `TRANSFORMERS_CACHE` | Model cache directory | `~/.cache/huggingface` |
| `CUDA_VISIBLE_DEVICES` | GPU device selection | All available |
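For example, combining these to pin the app to a specific port and GPU (values are examples):

```bash
# Run on port 7861 using only the first GPU
export GRADIO_SERVER_PORT=7861
export CUDA_VISIBLE_DEVICES=0
python app.py
```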
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request
## 📚 API Reference
### OCRProcessor Class
```python
from models.ocr_processor import OCRProcessor
# Initialize
ocr = OCRProcessor(model_name="microsoft/Florence-2-base")
# Extract text
text = ocr.extract_text(image)
# Extract with regions
result = ocr.extract_text_with_regions(image)
# Get model info
info = ocr.get_model_info()
```
## 🐛 Troubleshooting
### Common Issues
1. **Model Loading Errors**

   ```bash
   # Install missing dependencies
   pip install einops timm
   ```

2. **CUDA Out of Memory**

   ```python
   # Use CPU instead
   ocr = OCRProcessor()
   ocr.device = "cpu"
   ```

3. **SSL Certificate Errors**

   ```bash
   # Update certificates (macOS)
   /Applications/Python\ 3.x/Install\ Certificates.command
   ```
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- **Microsoft** for the Florence-2 model
- **HuggingFace** for the transformers library
- **Gradio** for the web interface framework
- **EasyOCR** for fallback OCR capabilities
## 📞 Support
- Create an issue for bug reports
- Start a discussion for feature requests
- Check existing issues before posting
---
**Made with ❤️ for the AI community**