Joash committed on
Commit 9eddb40 · 1 Parent(s): 69455b9

Remove 4-bit quantization and use regular model loading

Files changed (2):
  1. README.md +66 -44
  2. src/model_manager.py +3 -12
README.md CHANGED
@@ -9,61 +9,83 @@ pinned: false
 
 # Code Review Assistant
 
-This is a FastAPI application that provides automated code reviews using the Gemma model. It's deployed on Hugging Face Spaces.
+An automated code review system powered by Gemma-2b that provides intelligent code analysis, suggestions for improvements, and tracks review metrics.
 
 ## Features
 
-- Automated code review using Gemma-2b-it model
-- Support for multiple programming languages
-- Real-time feedback
-- Performance metrics tracking
-- Review history
-- Code quality analysis
-- Best practices recommendations
-- Security checks
-- Performance optimization suggestions
-
-## Technology Stack
-
-- FastAPI
-- Hugging Face Transformers
-- Docker
-- PostgreSQL
-- Prometheus
+### Automated Code Review
+- Analyzes code quality and suggests improvements
+- Identifies potential bugs and security issues
+- Recommends best practices and optimizations
+- Supports multiple programming languages (Python, JavaScript, Java, C++, TypeScript, Go, Rust)
+
+### LLMOps Integration
+- Uses Gemma-2b for intelligent code analysis
+- Tracks model performance and accuracy
+- Monitors response times and token usage
+
+### Performance Monitoring
+- Real-time metrics dashboard
+- Review history tracking
+- Response time monitoring
+- Usage statistics
+
+### Modern Web Interface
+- Interactive code submission
+- Syntax highlighting with CodeMirror
+- Real-time review results
+- Metrics visualization
 
 ## Environment Variables
 
 The following environment variables need to be set in your Hugging Face Space:
 
-- `HUGGING_FACE_TOKEN`: Your Hugging Face API token
-- `MODEL_NAME`: google/gemma-2-2b-it
-- `DEBUG`: false
-- `LOG_LEVEL`: INFO
-- `PORT`: 7860
+- `HUGGING_FACE_TOKEN`: Your Hugging Face API token (required)
+- `MODEL_NAME`: google/gemma-2b-it (default)
+- `DEBUG`: false (default)
+- `LOG_LEVEL`: INFO (default)
+- `PORT`: 7860 (default)
+
+## API Endpoints
+
+- `POST /api/v1/review`: Submit code for review
+  ```json
+  {
+    "code": "your code here",
+    "language": "python"
+  }
+  ```
+- `GET /api/v1/metrics`: Get system metrics
+- `GET /api/v1/history`: Get review history
+- `GET /health`: Check system health
 
 ## Usage
 
-1. Select your programming language
-2. Paste your code
+1. Enter your code in the editor
+2. Select the programming language
 3. Click "Submit for Review"
-4. Get instant feedback on your code
-
-## API Documentation
-
-Access the API documentation at:
-`https://huggingface.co/spaces/[YOUR-USERNAME]/code-review-assistant/docs`
-
-## Deployment Instructions
-
-1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
-2. Click "New Space"
-3. Choose:
-   - Owner: Your account
-   - Space name: code-review-assistant
-   - License: Choose appropriate license
-   - SDK: Docker
-4. Upload these files:
-   - All project files
-   - Rename `Dockerfile.huggingface` to `Dockerfile`
-5. Set the environment variables in Space Settings
-6. Deploy!
+4. View the detailed analysis including:
+   - Critical issues
+   - Suggested improvements
+   - Best practices
+   - Security considerations
+
+## Metrics
+
+The system tracks various metrics including:
+- Total reviews performed
+- Average response time
+- Number of suggestions per review
+- Daily usage statistics
+
+## Deployment
+
+This Space is deployed using Docker and runs on Hugging Face's infrastructure. The application automatically handles:
+- Model initialization and optimization
+- Memory management
+- Performance monitoring
+- Error handling and logging
+
+## License
+
+This project is licensed under the MIT License.
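The `POST /api/v1/review` contract documented above can be smoke-tested with a short client script. A minimal sketch, assuming a deployed Space at a placeholder URL; the exact response fields depend on the FastAPI app and are not specified by this commit:

```python
# Hypothetical client for the review endpoint described in the README.
# SPACE_URL is a placeholder, not a value from this repository.
import requests

SPACE_URL = "https://YOUR-USERNAME-code-review-assistant.hf.space"

payload = {
    "code": "def add(a, b):\n    return a + b",
    "language": "python",
}

resp = requests.post(f"{SPACE_URL}/api/v1/review", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())  # review text plus whatever metadata the API returns
```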
src/model_manager.py CHANGED
@@ -1,5 +1,5 @@
 import logging
-from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 from huggingface_hub import login
 from .config import Config
@@ -61,22 +61,13 @@ class ModelManager:
         logger.info(f"Loading model: {self.model_name}")
         logger.info(f"Using device: {self.device}")
 
-        # Configure 4-bit quantization
-        quantization_config = BitsAndBytesConfig(
-            load_in_4bit=True,
-            bnb_4bit_compute_dtype=torch.float16,
-            bnb_4bit_use_double_quant=True,
-            bnb_4bit_quant_type="nf4"
-        )
-
         # Load model with memory optimizations
         self.model = AutoModelForCausalLM.from_pretrained(
             self.model_name,
             device_map={"": self.device},
-            quantization_config=quantization_config,
+            torch_dtype=torch.float32,
            token=Config.HUGGING_FACE_TOKEN,
             low_cpu_mem_usage=True,
-            torch_dtype=torch.float16,  # Use fp16 for additional memory savings
             trust_remote_code=True
         )
         # Resize embeddings to match tokenizer
@@ -116,7 +107,7 @@ class ModelManager:
             pad_token_id=self.tokenizer.pad_token_id,
             eos_token_id=self.tokenizer.eos_token_id,
             num_beams=1,  # Disable beam search to save memory
-            use_cache=False,  # Disable KV cache
+            use_cache=True,  # Enable KV cache for faster generation
             early_stopping=True
         )
 
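Taken together, the three hunks replace 4-bit NF4 quantization with plain full-precision loading and re-enable the KV cache during generation. A self-contained sketch of the resulting load-and-generate path, for readers who want the net effect of the diff in one place; the prompt and token budget are illustrative assumptions, and the real logic lives in `ModelManager`:

```python
# Sketch of the post-commit loading path: full-precision weights, no
# BitsAndBytesConfig, KV cache enabled. Prompt and max_new_tokens are
# illustrative, not values from the repository.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "google/gemma-2b-it"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": device},
    torch_dtype=torch.float32,  # regular (fp32) loading, as in this commit
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

inputs = tokenizer(
    "Review this function: def add(a, b): return a + b",
    return_tensors="pt",
).to(device)
output = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=1,     # greedy decoding, matching the diff
    use_cache=True,  # KV cache re-enabled by this commit
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note the trade-off: fp32 weights for a roughly 2.5B-parameter model need on the order of 10 GB, versus well under 2 GB with 4-bit NF4, so this change buys loading simplicity and numerical stability at a real memory cost.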