emelnov/ocr-captcha-v4-mailru

EN Description

The emelnov/ocr-captcha-v4-mailru model is a fine-tuned version of microsoft/trocr-base-printed (or anuashok/ocr-captcha-v3 if applicable), designed for recognizing text in CAPTCHA images. It was trained on a dataset of 1,000 CAPTCHA images from the Mail.ru platform and achieved 98% accuracy on that dataset.
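
The card does not say how the 98% figure was computed. A common choice for CAPTCHA OCR is exact-match accuracy, where a prediction counts only if the whole string is correct. The sketch below assumes that metric; the function name `exact_match_accuracy` and the sample strings are hypothetical and are not taken from the Mail.ru dataset.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that equal their label exactly (case-sensitive)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Made-up strings for illustration only
preds = ["a7Kp2", "xQ9mB", "h3LLo", "Zz81w"]
truth = ["a7Kp2", "xQ9mB", "h3Llo", "Zz81w"]
print(exact_match_accuracy(preds, truth))  # 0.75 (the third string differs in case)
```

Under this metric a single wrong character makes the whole CAPTCHA count as a miss, which is usually what matters in practice.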

Model Description:

Base Model: microsoft/trocr-base-printed (and/or anuashok/ocr-captcha-v3)
Purpose: Text recognition in CAPTCHA images
Model Size: 334 million parameters
Tensor Format: FP32
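
As a rough guide to what these two figures imply for memory, the weights alone can be estimated as parameter count times bytes per parameter. This is a back-of-the-envelope sketch: it assumes 4 bytes per FP32 value and ignores activations, the image batch, and framework overhead, so real usage will be higher. The helper name `weight_memory_gib` is hypothetical.

```python
NUM_PARAMS = 334_000_000  # "334 million parameters" from the model card
BYTES_PER_FP32 = 4        # one FP32 value occupies 4 bytes


def weight_memory_gib(num_params: int, bytes_per_param: int = BYTES_PER_FP32) -> float:
    """Approximate size of the model weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


print(f"{weight_memory_gib(NUM_PARAMS):.2f} GiB")  # 1.24 GiB
```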

Notes:

Ensure that the transformers, torch, and Pillow libraries are installed.

This model effectively recognizes text in CAPTCHA images, facilitating the automation of tasks involving text input from such images.

Code for Usage:

import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load the model and processor, using the GPU when one is available
model_name = "emelnov/ocr-captcha-v4-mailru"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
processor = TrOCRProcessor.from_pretrained(model_name)
model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)
model.eval()  # set inference mode once, rather than on every call

# Predict the text shown in a single CAPTCHA image
def predict_text(image_path):
    # The processor expects a 3-channel image
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device)
    # No gradients are needed for inference
    with torch.no_grad():
        output_ids = model.generate(pixel_values)
    # Decode the generated token ids into a plain string
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]

# Example usage
image_path = "path_to_your_captcha_image.jpg"
print(f"Recognized text: {predict_text(image_path)}")
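
When many images need to be recognized, passing them to the processor in small batches is usually faster than calling the model once per file. The following is a minimal sketch under that assumption, not part of the original card: `chunked`, `predict_batch`, and the default `batch_size` of 8 are hypothetical, and `processor` and `model` are expected to be the objects loaded above. The `torch` and `PIL` imports are deferred into the function so the batching helper can be used on its own.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def predict_batch(image_paths: Sequence[str], processor, model, batch_size: int = 8) -> List[str]:
    """Recognize text in many CAPTCHA images, `batch_size` images per forward pass."""
    # Deferred imports: only needed when inference actually runs
    import torch
    from PIL import Image

    texts: List[str] = []
    for paths in chunked(image_paths, batch_size):
        images = [Image.open(path).convert("RGB") for path in paths]
        # The processor accepts a list of images and returns one stacked tensor
        pixel_values = processor(images=images, return_tensors="pt").pixel_values
        pixel_values = pixel_values.to(model.device)
        with torch.no_grad():
            output_ids = model.generate(pixel_values)
        texts.extend(processor.batch_decode(output_ids, skip_special_tokens=True))
    return texts
```

A call would look like `predict_batch(["a.jpg", "b.jpg"], processor, model)`, which returns the recognized strings in the same order as the input paths.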
