Joash committed on
Commit 9eddb40 · 1 Parent(s): 69455b9

Remove 4-bit quantization and use regular model loading

Files changed (2):
  1. README.md +66 -44
  2. src/model_manager.py +3 -12
README.md CHANGED
@@ -9,61 +9,83 @@ pinned: false
 
 # Code Review Assistant
 
-This is a FastAPI application that provides automated code reviews using the Gemma model. It's deployed on Hugging Face Spaces.
+An automated code review system powered by Gemma-2b that provides intelligent code analysis, suggestions for improvements, and tracks review metrics.
 
 ## Features
 
-- Automated code review using Gemma-2b-it model
-- Support for multiple programming languages
-- Real-time feedback
-- Performance metrics tracking
-- Review history
-- Code quality analysis
-- Best practices recommendations
-- Security checks
-- Performance optimization suggestions
-
-## Technology Stack
-
-- FastAPI
-- Hugging Face Transformers
-- Docker
-- PostgreSQL
-- Prometheus
+### Automated Code Review
+- Analyzes code quality and suggests improvements
+- Identifies potential bugs and security issues
+- Recommends best practices and optimizations
+- Supports multiple programming languages (Python, JavaScript, Java, C++, TypeScript, Go, Rust)
+
+### LLMOps Integration
+- Uses Gemma-2b for intelligent code analysis
+- Tracks model performance and accuracy
+- Monitors response times and token usage
+
+### Performance Monitoring
+- Real-time metrics dashboard
+- Review history tracking
+- Response time monitoring
+- Usage statistics
+
+### Modern Web Interface
+- Interactive code submission
+- Syntax highlighting with CodeMirror
+- Real-time review results
+- Metrics visualization
 
 ## Environment Variables
 
 The following environment variables need to be set in your Hugging Face Space:
 
-- `HUGGING_FACE_TOKEN`: Your Hugging Face API token
-- `MODEL_NAME`: google/gemma-2-2b-it
-- `DEBUG`: false
-- `LOG_LEVEL`: INFO
-- `PORT`: 7860
+- `HUGGING_FACE_TOKEN`: Your Hugging Face API token (required)
+- `MODEL_NAME`: google/gemma-2b-it (default)
+- `DEBUG`: false (default)
+- `LOG_LEVEL`: INFO (default)
+- `PORT`: 7860 (default)
+
+## API Endpoints
+
+- `POST /api/v1/review`: Submit code for review
+  ```json
+  {
+    "code": "your code here",
+    "language": "python"
+  }
+  ```
+- `GET /api/v1/metrics`: Get system metrics
+- `GET /api/v1/history`: Get review history
+- `GET /health`: Check system health
 
 ## Usage
 
-1. Select your programming language
-2. Paste your code
+1. Enter your code in the editor
+2. Select the programming language
 3. Click "Submit for Review"
-4. Get instant feedback on your code
-
-## API Documentation
-
-Access the API documentation at:
-`https://huggingface.co/spaces/[YOUR-USERNAME]/code-review-assistant/docs`
-
-## Deployment Instructions
-
-1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
-2. Click "New Space"
-3. Choose:
-   - Owner: Your account
-   - Space name: code-review-assistant
-   - License: Choose appropriate license
-   - SDK: Docker
-4. Upload these files:
-   - All project files
-   - Rename `Dockerfile.huggingface` to `Dockerfile`
-5. Set the environment variables in Space Settings
-6. Deploy!
+4. View the detailed analysis including:
+   - Critical issues
+   - Suggested improvements
+   - Best practices
+   - Security considerations
+
+## Metrics
+
+The system tracks various metrics including:
+- Total reviews performed
+- Average response time
+- Number of suggestions per review
+- Daily usage statistics
+
+## Deployment
+
+This Space is deployed using Docker and runs on Hugging Face's infrastructure. The application automatically handles:
+- Model initialization and optimization
+- Memory management
+- Performance monitoring
+- Error handling and logging
+
+## License
+
+This project is licensed under the MIT License.
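The `POST /api/v1/review` contract documented above can be smoke-tested with a short client script. A minimal sketch, assuming a deployed Space at a placeholder URL; the exact response fields depend on the FastAPI app and are not specified by this commit:

```python
# Hypothetical client for the review endpoint described in the README.
# SPACE_URL is a placeholder, not a value from this repository.
import requests

SPACE_URL = "https://YOUR-USERNAME-code-review-assistant.hf.space"

payload = {
    "code": "def add(a, b):\n    return a + b",
    "language": "python",
}

resp = requests.post(f"{SPACE_URL}/api/v1/review", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())  # review text plus whatever metadata the API returns
```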
src/model_manager.py CHANGED
@@ -1,5 +1,5 @@
 import logging
-from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 from huggingface_hub import login
 from .config import Config
@@ -61,22 +61,13 @@ class ModelManager:
         logger.info(f"Loading model: {self.model_name}")
         logger.info(f"Using device: {self.device}")
 
-        # Configure 4-bit quantization
-        quantization_config = BitsAndBytesConfig(
-            load_in_4bit=True,
-            bnb_4bit_compute_dtype=torch.float16,
-            bnb_4bit_use_double_quant=True,
-            bnb_4bit_quant_type="nf4"
-        )
-
         # Load model with memory optimizations
         self.model = AutoModelForCausalLM.from_pretrained(
             self.model_name,
             device_map={"": self.device},
-            quantization_config=quantization_config,
+            torch_dtype=torch.float32,
            token=Config.HUGGING_FACE_TOKEN,
             low_cpu_mem_usage=True,
-            torch_dtype=torch.float16,  # Use fp16 for additional memory savings
             trust_remote_code=True
         )
         # Resize embeddings to match tokenizer
@@ -116,7 +107,7 @@ class ModelManager:
             pad_token_id=self.tokenizer.pad_token_id,
             eos_token_id=self.tokenizer.eos_token_id,
             num_beams=1,  # Disable beam search to save memory
-            use_cache=False,  # Disable KV cache
+            use_cache=True,  # Enable KV cache for faster generation
             early_stopping=True
         )
 
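Taken together, the three hunks replace 4-bit NF4 quantization with plain full-precision loading and re-enable the KV cache during generation. A self-contained sketch of the resulting load-and-generate path, for readers who want the net effect of the diff in one place; the prompt and token budget are illustrative assumptions, and the real logic lives in `ModelManager`:

```python
# Sketch of the post-commit loading path: full-precision weights, no
# BitsAndBytesConfig, KV cache enabled. Prompt and max_new_tokens are
# illustrative, not values from the repository.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "google/gemma-2b-it"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": device},
    torch_dtype=torch.float32,  # regular (fp32) loading, as in this commit
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

inputs = tokenizer(
    "Review this function: def add(a, b): return a + b",
    return_tensors="pt",
).to(device)
output = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=1,     # greedy decoding, matching the diff
    use_cache=True,  # KV cache re-enabled by this commit
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note the trade-off: fp32 weights for a roughly 2.5B-parameter model need on the order of 10 GB, versus well under 2 GB with 4-bit NF4, so this change buys loading simplicity and numerical stability at a real memory cost.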