Update README.md
README.md
CHANGED
@@ -1,344 +1,355 @@
---
title: CodeMind
emoji: 🧠
colorFrom: purple
colorTo: indigo
sdk: static
pinned: false
license: apache-2.0
short_description: AI-powered development assistant CLI Tool
---

# CodeMind

**CodeMind** is an AI-powered development assistant that runs entirely on your local machine. It helps you understand your codebase through semantic search over your documents and generates meaningful commit messages using locally hosted language models, so your code stays private and nothing depends on cloud services. It uses modern machine learning models for:

- **Efficient Knowledge Retrieval**: Makes searching and querying documentation more powerful by using semantic embeddings rather than keyword search.
- **Smarter Git Workflow**: Automates the creation of meaningful commit messages by analyzing git diffs and using an LLM to summarize changes.
- **AI-Powered Documentation**: Enables you to ask questions about your project, using your own docs/context rather than just generic answers.

**Check it out on Hugging Face Spaces:**
[CodeMind on Hugging Face Spaces](https://huggingface.co/spaces/dev-jas/CodeMind)

---

## Features

- **Document Embedding** (using [EmbeddingGemma-300m](https://huggingface.co/google/embeddinggemma-300m))
- **Semantic Search** (using [FAISS](https://github.com/facebookresearch/faiss) for vector similarity search)
- **Commit Message Generation** (using [Phi-2](https://huggingface.co/microsoft/phi-2-gguf) for text generation): Automatically generate descriptive commit messages based on your changes
- **Retrieval-Augmented Generation (RAG)**: Answers questions using indexed document context
- **Local Processing**: All AI processing happens on your machine with no data sent to cloud services
- **Flexible Configuration**: Customize models and parameters to suit your specific needs
- **FAISS Integration**: Efficient vector similarity search for fast retrieval
- **Multiple Model Support**: Compatible with GGUF and SentenceTransformers models

## Prerequisites

- **Python 3.8 or higher**
- **8GB+ RAM** recommended (for running language models)
- **4GB+ disk space** for model files
- **Git** for repository cloning
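
The prerequisites above can be checked before installing anything. The following is a minimal sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper, not part of CodeMind, and it skips the RAM check because the standard library has no portable way to read installed memory.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)       # "Python 3.8 or higher"
MIN_FREE_DISK_GB = 4      # "4GB+ disk space" for model files


def check_prerequisites(path="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # Free space on the filesystem that contains `path`.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"Only {free_gb:.1f} GB free; {MIN_FREE_DISK_GB} GB recommended")
    # Git is needed to clone the repository.
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    return problems


issues = check_prerequisites()
print("\n".join(issues) if issues else "All checked prerequisites look fine.")
```

An empty result only means these three checks passed; memory still has to be verified by hand.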

### Platform Recommendations

- **Linux** (Recommended for best compatibility)
- **macOS** (Good compatibility)
- **Windows** (May require additional setup for some dependencies)

## Installation

### 1. Clone the Repository

```bash
git clone https://github.com/devjas1/codemind.git
cd codemind
```

### 2. Set Up Python Environment

Create and activate a virtual environment:

```bash
# Create virtual environment
python -m venv venv

# Activate on macOS/Linux
source venv/bin/activate

# Activate on Windows
venv\Scripts\activate
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

**Note**: If you encounter installation errors related to C++/PyTorch/FAISS:

- Ensure you have Python development tools installed
- Linux/macOS are preferred for FAISS compatibility
- On Windows, you may need to install Visual Studio Build Tools

## Model Setup

### Directory Structure

Create the following directory structure for model files:

```text
models/
├── phi-2.Q4_0.gguf          # For commit message generation (Phi-2 model)
└── embeddinggemma-300m/     # For document embedding (EmbeddingGemma model)
    └── [model files here]
```

### Downloading Models

1. **Phi-2 Model** (for commit message generation):

   - Download `phi-2.Q4_0.gguf` from a trusted source
   - Place it in the `models/` directory

2. **EmbeddingGemma Model** (for document embedding):

   - Download the EmbeddingGemma-300m model files
   - Place all files in the `models/embeddinggemma-300m/` directory

> **Note**: The specific process for obtaining these models may vary. Check the documentation in each model folder for detailed instructions.

## Configuration

Edit the `config.yaml` file to match your local setup:

```yaml
# Model configuration for commit message generation
generator:
  model_path: "./models/phi-2.Q4_0.gguf"
  quantization: "Q4_0"
  max_tokens: 512
  n_ctx: 2048

# Model configuration for document embedding
embedding:
  model_path: "./models/embeddinggemma-300m"

# Retrieval configuration for semantic search
retrieval:
  vector_store: "faiss"
  top_k: 5 # Number of results to return
  similarity_threshold: 0.7 # Minimum similarity score (0.0 to 1.0)
```

### Configuration Tips

- Adjust `top_k` to control how many results are returned for each query
- Modify `similarity_threshold` to filter results by relevance
- Ensure all file paths are correct for your system
- For larger codebases, you may need to increase `max_tokens`, and possibly `n_ctx` if inputs exceed the context window
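
To make the interaction between `top_k` and `similarity_threshold` concrete, here is a minimal sketch of one plausible way the two settings combine. It assumes the `retrieval` section has already been parsed into a dictionary, and that the threshold is applied before the `top_k` cut; both function names are hypothetical and CodeMind's own ordering may differ.

```python
def validate_retrieval_config(cfg):
    """Check the two numeric settings of an already-parsed `retrieval` section."""
    top_k = cfg["top_k"]
    threshold = cfg["similarity_threshold"]
    if not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("similarity_threshold must be between 0.0 and 1.0")
    return top_k, threshold


def select_results(scored, top_k, threshold):
    """Keep hits scoring at least `threshold`, best first, at most `top_k` of them."""
    kept = [(doc, score) for doc, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


retrieval = {"vector_store": "faiss", "top_k": 2, "similarity_threshold": 0.7}
top_k, threshold = validate_retrieval_config(retrieval)

# Made-up (document, similarity) pairs for illustration only.
hits = [("README.md", 0.91), ("cli.py", 0.65), ("config.yaml", 0.78), ("embedder.py", 0.74)]
print(select_results(hits, top_k, threshold))
# [('README.md', 0.91), ('config.yaml', 0.78)]
```

Raising the threshold can return fewer than `top_k` results, which is usually preferable to padding an answer with weak matches.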

## Indexing Documents

To enable semantic search over your documentation or codebase, you need to create a FAISS index:

```bash
# Basic usage
python src/embedder.py path/to/your/documents config.yaml

# Example with docs directory
python src/embedder.py ./docs config.yaml

# Example with specific code directory
python src/embedder.py ./src config.yaml
```

This process:

1. Reads all documents from the specified directory
2. Generates embeddings using the configured model
3. Creates a FAISS index in the `vector_cache/` directory
4. Enables fast semantic search capabilities
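
The four steps above can be illustrated with a toy pipeline. This is a sketch under loud assumptions, not the code in `src/embedder.py`: `toy_embed` is a hashed bag-of-words stand-in for the real embedding model, a plain Python list with brute-force cosine similarity stands in for FAISS, and nothing is written to `vector_cache/`.

```python
import math
import zlib
from pathlib import Path

DIM = 64  # toy dimensionality; real embedding models use far more


def toy_embed(text):
    """Stand-in for the embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash() for strings.
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(doc_dir, suffixes=(".md", ".txt", ".py")):
    """Steps 1-3: read documents, embed them, collect (path, vector) pairs."""
    index = []
    for path in sorted(Path(doc_dir).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            index.append((str(path), toy_embed(text)))
    return index


def search(index, query, top_k=5):
    """Step 4: rank documents by cosine similarity to the query."""
    q = toy_embed(query)
    # Vectors are unit length, so the dot product equals the cosine similarity.
    scored = [(path, sum(a * b for a, b in zip(q, vec))) for path, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, `search(build_index("./docs"), "how is the index built?")` would return up to five `(path, score)` pairs. A real vector store replaces the linear scan with an index structure, which matters once the document count grows.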

> **Note**: The indexing process may take several minutes depending on the size of your codebase and your hardware capabilities.

## Usage

### Command Line Interface

Run the main CLI interface:

```bash
python cli.py
```

### Available Commands

#### Get Help

```bash
python cli.py --help
```

#### Ask Questions About Your Codebase

```bash
python cli.py ask "How does this repository work?"
python cli.py ask "Where is the main configuration handled?"
python cli.py ask "Show me examples of API usage"
```
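
Answering a question with retrieval-augmented generation generally means placing the retrieved passages in front of the question before the model sees it. The sketch below shows that assembly step only; the prompt wording, the `build_rag_prompt` name, and the character budget are assumptions made for illustration, not CodeMind's actual prompt.

```python
def build_rag_prompt(question, chunks, max_chars=2000):
    """Assemble a prompt from (source, text) chunks, most relevant first, within a size budget."""
    context_parts = []
    used = 0
    for source, text in chunks:
        entry = f"[{source}]\n{text.strip()}\n"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the context budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts) if context_parts else "(no context found)"
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "Where is the main configuration handled?",
    [("config.yaml", "retrieval:\n  top_k: 5")],
)
print(prompt)
```

Labelling each chunk with its source lets the model, and the reader, trace an answer back to a file.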

#### Generate Commit Messages

```bash
# Preview a generated commit message
python cli.py commit --preview

# Generate commit message without preview
python cli.py commit
```
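
Conceptually, commit message generation reads a diff and wraps it in an instruction for the language model. The following is a minimal sketch of that idea. It assumes the staged diff (`git diff --cached`) is the input, and both helper names and the prompt text are hypothetical; the actual `commit` command may collect and phrase things differently.

```python
import subprocess


def get_staged_diff(repo_dir="."):
    """Return the staged diff as text (empty string if nothing is staged)."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_commit_prompt(diff_text, max_chars=4000):
    """Turn a diff into an instruction prompt, truncating very large diffs."""
    if not diff_text.strip():
        raise ValueError("No changes to describe")
    truncated = diff_text[:max_chars]
    note = "\n[diff truncated]" if len(diff_text) > max_chars else ""
    return (
        "Write a concise git commit message (imperative mood, one summary line "
        "under 72 characters) for the following diff.\n\n"
        f"{truncated}{note}\n\nCommit message:"
    )
```

Inside a repository, `build_commit_prompt(get_staged_diff())` would produce the text to hand to the model. Truncating the diff keeps the prompt inside the model's context window at the cost of ignoring changes past the cut-off.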

#### API Server (Placeholder)

```bash
python cli.py serve --port 8000
```

> **Note**: The API server functionality is not yet implemented. This command will display: "API server functionality not implemented yet."

### Advanced Usage

For more advanced usage, you can modify the configuration to:

- Use different models for specific tasks
- Adjust the context window size for larger documents
- Customize the similarity threshold for retrieval
- Switch vector stores once others are supported (FAISS is currently the only option)

## Troubleshooting

### Common Issues

#### Model Errors

**Problem**: Model files not found or inaccessible
**Solution**:

- Verify model files are in the correct locations
- Check file permissions
- Ensure the paths in `config.yaml` are correct

#### FAISS Errors

**Problem**: "No FAISS index found" error
**Solution**:

- Run the embedder script to create the index
- Ensure the `vector_cache/` directory has write permissions

```bash
python src/embedder.py path/to/documents config.yaml
```

#### SentenceTransformers Issues

**Problem**: Compatibility errors with SentenceTransformers
**Solution**:

- Check that the model format is compatible with SentenceTransformers
- Verify that the installed version matches the one listed in `requirements.txt`
- Ensure all model files are present in the model directory

#### Performance Issues

**Problem**: Slow response times
**Solution**:

- Ensure you have adequate RAM
- Consider using smaller quantized models
- Close other memory-intensive applications

#### Platform-Specific Issues

**Windows-specific issues**:

- FAISS may require additional compilation
- Path separators may need adjustment in configuration

**macOS/Linux**:

- Generally fewer compatibility issues
- Ensure you have write permissions for all directories
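
On the path separator point for Windows: Python's file APIs generally accept forward slashes on Windows, so writing configuration paths with `/` tends to be the portable choice. A small sketch of normalising a backslash path with the standard library follows; `to_posix_style` is a hypothetical helper, and whether CodeMind needs this at all depends on how it reads `config.yaml`.

```python
from pathlib import PureWindowsPath


def to_posix_style(path_str):
    """Rewrite a backslash-separated relative path with forward slashes."""
    return PureWindowsPath(path_str).as_posix()


print(to_posix_style(r"models\phi-2.Q4_0.gguf"))
# models/phi-2.Q4_0.gguf
```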

### Validation Checklist

- All model files present in correct directories
- FAISS index built in `vector_cache/`
- `config.yaml` paths match your local setup
- Python environment activated
- All dependencies installed
- Adequate disk space available
- Sufficient RAM available
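
The file-related items on this checklist can be verified with a short script. This is a sketch, assuming the default layout described in this README and that it is run from the repository root; `run_checklist` is a hypothetical helper, and it cannot tell whether the environment is activated or the dependencies are installed.

```python
from pathlib import Path

# Paths taken from the project layout in this README, relative to the repo root.
REQUIRED_FILES = ["config.yaml", "requirements.txt", "models/phi-2.Q4_0.gguf"]
REQUIRED_DIRS = ["models/embeddinggemma-300m", "vector_cache"]


def run_checklist(root="."):
    """Return {item: bool} for each file or directory the checklist mentions."""
    root = Path(root)
    results = {}
    for rel in REQUIRED_FILES:
        results[rel] = (root / rel).is_file()
    for rel in REQUIRED_DIRS:
        target = root / rel
        # A directory counts only if it exists and is not empty.
        results[rel + "/"] = target.is_dir() and any(target.iterdir())
    return results


for item, ok in run_checklist().items():
    print(f"[{'x' if ok else ' '}] {item}")
```

An unchecked `vector_cache/` usually means the embedder script has not been run yet.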

### Getting Detailed Error Information

For specific errors, run commands with verbose output:

```bash
# Add debug flags if available
python cli.py --verbose ask "Your question"
```

## Project Structure

```text
codemind/
├── models/                    # AI model files
│   ├── phi-2.Q4_0.gguf        # Phi-2 model for generation
│   └── embeddinggemma-300m/   # Embedding model
│       └── [model files]
├── src/                       # Source code
│   └── embedder.py            # Document embedding script
├── vector_cache/              # FAISS vector store (auto-generated)
├── config.yaml                # Configuration file
├── requirements.txt           # Python dependencies
├── cli.py                     # Command-line interface
└── README.md                  # This file
```

## FAQ

> **Q:** **Can I use different models?**
> **A:** Yes, you can use any GGUF-compatible model for generation and any SentenceTransformers-compatible model for embeddings. Update the paths in `config.yaml` accordingly.

---

> **Q:** **How much RAM do I need?**
> **A:** For the Phi-2 Q4_0 model, 8GB RAM is recommended. Larger models will require more memory.

---

> **Q:** **Can I index multiple directories?**
> **A:** Yes, you can run the embedder script multiple times with different directories, or combine your documents into one directory before indexing.

---

> **Q:** **Is my data sent to the cloud?**
> **A:** No, all processing happens locally on your machine. No code or data is sent to external services.

---

> **Q:** **How often should I re-index my documents?**
> **A:** Re-index whenever your documentation or codebase changes significantly to keep search results relevant.

## Support

If you encounter issues:

1. Check the troubleshooting section above
2. Verify all model files are in correct locations
3. Confirm Python and library versions match requirements
4. Ensure proper directory permissions

For specific errors, please include the full traceback when seeking assistance.

## Contributing

Contributions to CodeMind are welcome! Please feel free to submit pull requests, create issues, or suggest new features.

## License

This project is licensed under the terms of the LICENSE file included in the repository.

© 2025 CodeMind. All rights reserved.