---
license: gemma
library_name: transformers
base_model: google/gemma-3-1b-it
language:
- en
- ko
- ja
- zh
- es
- ru
- ar
- hi
- id
- ml
- fr
- de
pipeline_tag: image-text-to-text
---
# Gemma3-R1984-1B
# Model Overview
Gemma3-R1984-1B is a robust Agentic AI platform built on Google's Gemma-3-1B model. It integrates state-of-the-art deep research via web search with multimodal file processing—including images, videos, and documents—and handles long contexts of up to 8,000 tokens. Designed for local deployment on independent servers using NVIDIA L40S, L4, or A100 (ZeroGPU) GPUs, it provides high security, prevents data leakage, and delivers uncensored responses.
# Key Features
- **Multimodal Processing:** Supports multiple file types such as images (PNG, JPG, JPEG, GIF, WEBP), videos (MP4), and documents (PDF, CSV, TXT).
- **Deep Research (Web Search):** Automatically extracts keywords from user queries and uses the SERPHouse API to retrieve up to 20 real-time search results. The model incorporates multiple sources and cites them explicitly in the response (see the sketch after this list).
- **Long Context Handling:** Processes inputs of up to 8,000 tokens, ensuring comprehensive analysis of lengthy documents or conversations.
- **Robust Reasoning:** Employs extended chain-of-thought reasoning for systematic and accurate answer generation.
- **Secure Local Deployment:** Runs on independent local servers using NVIDIA L40S GPUs to maximize security and prevent information leakage.
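To illustrate the deep-research step, here is a minimal sketch of a SERPHouse live-search call. The endpoint, payload fields, and response parsing are assumptions based on SERPHouse's public API, and `serphouse_search` is a hypothetical helper, not the model's actual implementation:

```py
import os
import requests

def serphouse_search(query: str, num_results: int = 20) -> list[dict]:
    """Hypothetical helper: fetch organic web results for a query via SERPHouse."""
    response = requests.post(
        "https://api.serphouse.com/serp/live",  # assumed live-search endpoint
        headers={
            "Authorization": f"Bearer {os.environ['SERPHOUSE_API_KEY']}",
            "Content-Type": "application/json",
        },
        json={"data": {"q": query, "domain": "google.com", "lang": "en",
                       "device": "desktop", "serp_type": "web"}},
        timeout=30,
    )
    response.raise_for_status()
    # Response shape is an assumption; adjust to the actual SERPHouse payload.
    organic = response.json().get("results", {}).get("results", {}).get("organic", [])
    return organic[:num_results]
```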
**Experience the Power of Gemma3-R1984-1B**
- ✅ **Agentic AI Platform:** An autonomous system designed to make intelligent decisions and act independently.
- ✅ **Reasoning & Uncensored:** Delivers clear, accurate, and unfiltered responses by harnessing advanced reasoning capabilities.
- ✅ **Multimodal & VLM:** Seamlessly processes and interprets multiple input types—text, images, videos—empowering versatile applications.
- ✅ **Deep-Research & RAG:** Integrates state-of-the-art deep research and retrieval-augmented generation to provide comprehensive, real-time insights.
**Cutting-Edge Hardware for Maximum Security**
Gemma3-R1984-1B is engineered to operate on a dedicated **NVIDIA L40S GPU** within an independent local server environment. This robust setup not only guarantees optimal performance and rapid processing but also enhances security by isolating the model from external networks, effectively preventing information leakage. Whether handling sensitive data or complex queries, our platform ensures that your information remains secure and your AI interactions remain uncompromised.
# Use Cases
- Fast-response conversational agents
- Deep research and retrieval-augmented generation (RAG)
- Document comparison and detailed analysis
- Visual question answering from images and videos
- Complex reasoning and research-based inquiries
# Supported File Formats
- **Images:** PNG, JPG, JPEG, GIF, WEBP
- **Videos:** MP4
- **Documents:** PDF, CSV, TXT
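As a rough illustration of how these formats might be routed to loaders, here is a minimal sketch using the libraries listed under Requirements. `load_file` and its return types are hypothetical and do not reproduce the repository's actual preprocessing:

```py
from pathlib import Path

import cv2                    # opencv-python, for MP4 frame extraction
import pandas as pd           # for CSV tables
from PIL import Image         # for still images
from PyPDF2 import PdfReader  # for PDF text

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}

def load_file(path: str):
    """Hypothetical dispatcher: route a file to a loader by extension."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return Image.open(path)
    if ext == ".mp4":
        video = cv2.VideoCapture(path)
        ok, frame = video.read()  # grab the first frame as a sample
        video.release()
        return frame if ok else None
    if ext == ".pdf":
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if ext == ".csv":
        return pd.read_csv(path)
    if ext == ".txt":
        return Path(path).read_text(encoding="utf-8")
    raise ValueError(f"Unsupported file type: {ext}")
```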
# Model Details
- **Parameter Count:** Approximately 1B parameters
- **Context Window:** Up to 8,000 tokens
- **Hugging Face Model Path:** VIDraft/Gemma-3-R1984-1B
- **License:** MIT (Agentic AI) / Gemma (Gemma-3-1B)
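For programmatic use outside the Gradio app, below is a minimal text-only loading sketch assuming the standard Transformers causal-LM API; image and video inputs go through the repository's own preprocessing, which this sketch does not reproduce:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VIDraft/Gemma-3-R1984-1B"

# bfloat16 matches the Requirements section; device_map="auto" places the
# model on the available GPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the key features of this model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.15
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```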
# Installation and Setup
## Requirements
Ensure you have Python 3.8 or higher installed. The model relies on several libraries:
- PyTorch (with bfloat16 support)
- Transformers
- Gradio
- OpenCV (opencv-python)
- Pillow (PIL)
- PyPDF2
- Pandas
- Loguru
- Requests

Install the dependencies using pip:
```bash
pip install torch transformers gradio opencv-python pillow PyPDF2 pandas loguru requests
```
# Environment Variables
Set the following environment variables before running the model:
## SERPHOUSE_API_KEY
Your SERPHouse API key for web search functionality. Example:
```bash
export SERPHOUSE_API_KEY="your_api_key_here"
```
## MODEL_ID
(Optional) The model identifier; defaults to `VIDraft/Gemma-3-R1984-1B`.
## MAX_NUM_IMAGES
(Optional) Maximum number of images allowed per query (default: 5).
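A minimal sketch of how these variables might be read at startup; the names match the list above, and the fallbacks are the documented defaults:

```py
import os

# Read configuration from the environment, falling back to documented defaults.
SERPHOUSE_API_KEY = os.environ["SERPHOUSE_API_KEY"]           # required for web search
MODEL_ID = os.getenv("MODEL_ID", "VIDraft/Gemma-3-R1984-1B")  # optional override
MAX_NUM_IMAGES = int(os.getenv("MAX_NUM_IMAGES", "5"))        # optional, default 5
```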
# Running the Model
Gemma3-R1984-1B comes with a Gradio-based multimodal chat interface. To run the model locally:
1. **Clone the repository:** Ensure you have the repository containing the model code.
2. **Launch the application:** Execute the main Python file:
```bash
python your_filename.py
```
This will start a local Gradio interface. Open the provided URL in your browser to interact with the model.
# Example Code: Server and Client Request
## Server Example
You can deploy the model server locally using the provided Gradio code. Make sure your server is accessible at your designated URL.
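The repository ships its own Gradio app; the following is only a minimal, self-contained sketch of a comparable chat server, assuming the text-only Transformers pipeline API and Gradio's `type="messages"` chat format. It is not the repository's actual code:

```py
import gradio as gr
import torch
from transformers import pipeline

# Minimal chat-server sketch; the actual app adds file uploads and web search.
generator = pipeline(
    "text-generation",
    model="VIDraft/Gemma-3-R1984-1B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def respond(message, history):
    # history arrives as a list of {"role", "content"} dicts (type="messages").
    messages = (
        [{"role": "system", "content": "You are a powerful AI assistant."}]
        + history
        + [{"role": "user", "content": message}]
    )
    output = generator(messages, max_new_tokens=512, do_sample=True, temperature=0.15)
    # The pipeline returns the full conversation; the last message is the reply.
    return output[0]["generated_text"][-1]["content"]

gr.ChatInterface(respond, type="messages").launch(server_name="0.0.0.0", server_port=8000)
```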
## Client Request Example
Below is an example of how to interact with the model using an HTTP API call:
```py
import requests
import json
# Replace with your server URL and token
url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {
"Content-Type": "application/json",
"Authorization": "Bearer your_token_here"
}
# Construct the message payload
messages = [
{"role": "system", "content": "You are a powerful AI assistant."},
{"role": "user", "content": "Compare the contents of two PDF files."}
]
data = {
"model": "VIDraft/Gemma-3-R1984-1B",
"messages": messages,
"temperature": 0.15
}
# Send the POST request to the server
response = requests.post(url, headers=headers, data=json.dumps(data))
# Print the response from the model
print(response.json())
```
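Note that the request body in this example follows the OpenAI-style `/v1/chat/completions` schema, so OpenAI-compatible clients can typically be pointed at the server by swapping the base URL and token; the low temperature (0.15) keeps responses close to deterministic for tasks like document comparison.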
**Important Deployment Notice:**
For optimal performance, it is highly recommended to clone the repository using the following command. This model is designed to run on a server equipped with at least an NVIDIA L40S, L4, or A100 (ZeroGPU) GPU. The minimum VRAM requirement is 24 GB, and VRAM usage may temporarily peak at approximately 82 GB during processing.
```bash
git clone https://huggingface.co/spaces/VIDraft/Gemma-3-R1984-1B
```