devjas1 committed on
Commit d397fab · 1 parent: 6bebf5c

(UPDATE): [docs]: add ABOUT.md for project overview and installation instructions; update copyright year in index.html

Files changed (3)
  1. ABOUT.md +228 -0
  2. README.md +2 -0
  3. index.html +2 -1
ABOUT.md ADDED
@@ -0,0 +1,228 @@
# CodeMind

CodeMind is a CLI tool for semantic document search, retrieval-augmented question answering, and commit message generation. It uses EmbeddingGemma-300m for embeddings, FAISS for vector storage, and Phi-2 for text generation.

## Features

- **Document Indexing**: Embed and index documents for semantic search
- **Semantic Search**: Find relevant documents using natural-language queries
- **Smart Commit Messages**: Generate meaningful commit messages from staged Git changes
- **RAG (Retrieval-Augmented Generation)**: Answer questions using indexed document context

## Setup

### Prerequisites

- Windows 11
- Conda (Anaconda or Miniconda)
- Git

### Installation

1. **Create a Conda environment:**

   ```bash
   conda create -n codemind python=3.9
   conda activate codemind
   ```

2. **Clone the repository:**

   ```bash
   git clone https://github.com/devjas1/codemind.git
   cd codemind
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Download models:**

   **Embedding model (EmbeddingGemma-300m):**

   - Download from Hugging Face: `google/embeddinggemma-300m`
   - Place the model files in the `./models/embeddinggemma-300m/` directory

   **Generation model (Phi-2 GGUF):**

   - Download the quantized Phi-2 model file `phi-2.Q4_0.gguf` from [Microsoft Phi-2 GGUF](https://huggingface.co/microsoft/phi-2-gguf) or another quantized build of Phi-2
   - Place it in the `./models/` directory

### Directory Structure

```
codemind/
├── cli.py                   # Main CLI entry point
├── config.yaml              # Configuration file
├── requirements.txt         # Python dependencies
├── models/                  # Model storage
│   ├── embeddinggemma-300m/ # Embedding model directory
│   └── phi-2.Q4_0.gguf      # Phi-2 quantized model file
├── src/                     # Core modules
│   ├── config_loader.py     # Configuration management
│   ├── embedder.py          # Document embedding
│   ├── retriever.py         # Semantic search
│   ├── generator.py         # Text generation
│   └── diff_analyzer.py     # Git diff analysis
├── docs/                    # Documentation
└── vector_cache/            # FAISS index storage (auto-created)
```

## Usage

### Initialize Document Index

Index documents from a directory for semantic search:

```bash
python cli.py init ./docs/
```

This will:

- Embed all documents in the specified directory
- Create a FAISS index in `vector_cache/`
- Save metadata for retrieval

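The indexing code in `src/embedder.py` is not reproduced here. As a rough, self-contained sketch of the steps listed above (walk a directory, embed each file, keep vectors and metadata in matching order), the example below swaps in a toy hashed bag-of-words embedding for EmbeddingGemma-300m and plain Python lists for the FAISS index. `toy_embed`, `build_index`, and the default file extensions are hypothetical, not CodeMind's API.

```python
import hashlib
import math
import re
from pathlib import Path
from typing import Dict, List, Tuple

DIM = 64  # toy dimension; the example config.yaml uses 768 for EmbeddingGemma-300m


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for EmbeddingGemma: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # md5 gives a hash that is stable across runs (unlike Python's built-in hash())
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(
    doc_dir: str, exts: Tuple[str, ...] = (".md", ".txt")
) -> Tuple[List[List[float]], List[Dict[str, str]]]:
    """Embed every matching file under doc_dir; vectors[i] belongs to metadata[i]."""
    vectors: List[List[float]] = []
    metadata: List[Dict[str, str]] = []
    for path in sorted(Path(doc_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in exts:
            text = path.read_text(encoding="utf-8", errors="ignore")
            vectors.append(toy_embed(text))
            metadata.append({"path": str(path), "preview": text[:80]})
    return vectors, metadata
```

A FAISS-backed version would typically add the normalized vectors to an index such as `faiss.IndexFlatIP` and persist the metadata list next to it in `vector_cache/`; keeping both in the same order is what lets a search hit be mapped back to its source file.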
### Semantic Search

Search for relevant documents using natural language:

```bash
python cli.py search "how to configure the model"
```

Returns ranked results with similarity scores.

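Ranked results with similarity scores are commonly produced by scoring the query embedding against every stored vector and keeping the best matches. A minimal brute-force sketch, assuming L2-normalized vectors (so the dot product equals cosine similarity) and borrowing the `retrieval.top_k` and `retrieval.similarity_threshold` settings from `config.yaml`; `search` is a hypothetical helper, not CodeMind's `src/retriever.py`.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


def search(
    query_vec: Sequence[float],
    vectors: List[Sequence[float]],
    top_k: int = 5,
    similarity_threshold: float = 0.75,
) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs, best first, keeping only scores >= threshold."""
    scored = [(i, dot(query_vec, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Threshold first, then truncate: fewer than top_k results may come back
    return [(i, s) for i, s in scored if s >= similarity_threshold][:top_k]
```

Applying the threshold before truncating means a query with no good matches returns an empty list rather than `top_k` weak ones, which is usually what a RAG prompt wants.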
### Ask Questions (RAG)

Get answers based on your indexed documents:

```bash
python cli.py ask "What are the configuration options?"
```

The command retrieves the most relevant indexed documents (up to `retrieval.top_k`) and passes them to Phi-2 as context for its answer.

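Retrieval-augmented generation generally means placing the retrieved passages in the prompt ahead of the question. The sketch below shows one way to do that while staying inside a context budget such as `n_ctx: 2048` with `max_tokens: 512` reserved for the answer. Assumptions: tokens are approximated by whitespace-separated words (a real implementation would count with the model's tokenizer), and the `Instruct: ... Output:` layout follows the QA format shown on the Phi-2 model card. `build_prompt` and `approx_tokens` are hypothetical.

```python
from typing import List


def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (not Phi-2's real tokenizer)."""
    return len(text.split())


def build_prompt(question: str, passages: List[str], n_ctx: int = 2048, max_tokens: int = 512) -> str:
    """Pack best-first passages into a Phi-2 style prompt without exceeding the budget."""
    header = "Answer the question using only the context below.\n\nContext:\n"
    footer = f"\n\nQuestion: {question}"
    # Tokens left for passages after reserving room for the answer, the template,
    # and the two "Instruct:" / "Output:" markers
    budget = n_ctx - max_tokens - approx_tokens(header) - approx_tokens(footer) - 2
    chosen: List[str] = []
    for passage in passages:  # assumed already sorted best-first by the retriever
        cost = approx_tokens(passage) + (1 if chosen else 0)  # +1 for the "---" separator
        if cost > budget:
            break  # stop rather than skip, so lower-ranked text never displaces better text
        chosen.append(passage)
        budget -= cost
    body = header + "\n---\n".join(chosen) + footer
    return f"Instruct: {body}\nOutput:"
```

Because `n_ctx` has to hold both the prompt and the generated tokens, reserving `max_tokens` up front keeps a long retrieved context from leaving no room for the answer.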
### Git Commit Message Generation

Generate commit messages from staged changes:

```bash
# Preview the commit message without applying it
python cli.py commit --preview

# Show staged files and analysis without generating a message
python cli.py commit --dry-run

# Generate and apply the commit message
python cli.py commit --apply
```

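The staged-change analysis lives in `src/diff_analyzer.py`, which is not shown here. One common way to get a machine-readable summary of staged changes is `git diff --cached --numstat`, which prints `added<TAB>deleted<TAB>path` for each file and uses `-` for both counts on binary files. A minimal parser for that output, as a hypothetical illustration rather than CodeMind's own analyzer (`FileChange`, `parse_numstat`, and `staged_changes` are assumed names):

```python
import subprocess
from typing import List, NamedTuple, Optional


class FileChange(NamedTuple):
    path: str
    added: Optional[int]    # None for binary files (git prints "-")
    deleted: Optional[int]  # None for binary files (git prints "-")


def parse_numstat(output: str) -> List[FileChange]:
    """Parse `git diff --cached --numstat` output into one FileChange per staged file."""
    changes: List[FileChange] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        changes.append(FileChange(
            path=path,
            added=None if added == "-" else int(added),
            deleted=None if deleted == "-" else int(deleted),
        ))
    return changes


def staged_changes(repo_dir: str = ".") -> List[FileChange]:
    """Run git in repo_dir and parse its staged-change summary (raises if git fails)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--numstat"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)
```

For this commit, the parsed summary would list `ABOUT.md` (+228/-0), `README.md` (+2/-0) and `index.html` (+2/-1). When Git detects a rename, the path field holds its `old => new` notation as-is.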
### Start API Server (Future Feature)

```bash
python cli.py serve --port 8000
```

_Note: API server functionality is planned for future releases._

## Configuration

Edit `config.yaml` to customize behavior:

```yaml
embedding:
  model_path: "./models/embeddinggemma-300m"
  dim: 768
  truncate_to: 128

generator:
  model_path: "./models/phi-2.Q4_0.gguf"
  quantization: "Q4_0"
  max_tokens: 512
  n_ctx: 2048

retrieval:
  vector_store: "faiss"
  top_k: 5
  similarity_threshold: 0.75

commit:
  tone: "imperative"
  style: "conventional"
  max_length: 72

logging:
  verbose: true
  telemetry: false
```

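`src/config_loader.py` handles this file; its code is not shown here. As a sketch of one common pattern, the function below deep-merges a user's settings over built-in defaults so that a partial `config.yaml` still yields every key. It takes an already-parsed dict (with PyYAML, `yaml.safe_load` returns one) so the example runs without extra dependencies. `DEFAULTS` mirrors the example file above, and `merge_config` is a hypothetical name.

```python
import copy
from typing import Any, Dict, Optional

# Defaults mirroring the example config.yaml above
DEFAULTS: Dict[str, Any] = {
    "embedding": {"model_path": "./models/embeddinggemma-300m", "dim": 768, "truncate_to": 128},
    "generator": {"model_path": "./models/phi-2.Q4_0.gguf", "quantization": "Q4_0",
                  "max_tokens": 512, "n_ctx": 2048},
    "retrieval": {"vector_store": "faiss", "top_k": 5, "similarity_threshold": 0.75},
    "commit": {"tone": "imperative", "style": "conventional", "max_length": 72},
    "logging": {"verbose": True, "telemetry": False},
}


def merge_config(defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Recursively overlay `overrides` onto a copy of `defaults`; neither input is mutated."""
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections key by key
        else:
            merged[key] = copy.deepcopy(value)  # scalars and new keys replace outright
    return merged
```

Usage might look like `merge_config(DEFAULTS, yaml.safe_load(open("config.yaml")))`. `yaml.safe_load` returns `None` for an empty file, which the `overrides or {}` guard tolerates.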
### Configuration Options

- **embedding.model_path**: Path to the EmbeddingGemma-300m model directory
- **generator.model_path**: Path to the Phi-2 GGUF model file
- **generator.max_tokens**: Maximum number of tokens to generate per response
- **generator.n_ctx**: Context window size in tokens for Phi-2, covering both the prompt and the generated output (Phi-2's maximum context length is 2,048 tokens)
- **retrieval.top_k**: Number of documents to retrieve as context
- **retrieval.similarity_threshold**: Minimum similarity score a result must reach to be returned

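The options list above does not cover the `commit` section. For reference, the Conventional Commits specification formats a header as `type(optional scope): description`, optionally with `!` before the colon to mark a breaking change. The sketch below checks a generated header against that shape, assuming `style: conventional` means this format and `max_length: 72` caps the header length. The type list is an illustrative assumption (the specification itself only defines `feat` and `fix`), imperative tone is not checked, and `check_header` is hypothetical.

```python
import re
from typing import List

# Illustrative type list (assumption); the spec itself only defines feat and fix
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore", "revert")

# type, optional (scope), optional "!", then ": " and a non-empty description
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<desc>\S.*)$")


def check_header(header: str, max_length: int = 72) -> List[str]:
    """Return a list of problems with a commit header; an empty list means it passes."""
    problems: List[str] = []
    if len(header) > max_length:
        problems.append(f"header is {len(header)} chars (max {max_length})")
    match = HEADER_RE.match(header)
    if match is None:
        problems.append("not in 'type(scope): description' form")
    elif match.group("type") not in TYPES:
        problems.append(f"unknown type {match.group('type')!r}")
    return problems
```

Returning a list of problems instead of a boolean makes it easy to feed the failures back into a regeneration prompt or show them in a `--preview`-style output.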
## Dependencies

- `sentence-transformers>=2.2.2` - Document embedding
- `faiss-cpu>=1.7.4` - Vector similarity search
- `llama-cpp-python>=0.2.23` - Phi-2 model inference (Windows compatible)
- `typer>=0.9.0` - CLI framework
- `PyYAML>=6.0` - Configuration file parsing

## Troubleshooting

### Model Loading Issues

If you encounter model loading errors:

1. **Embedding Model**: Ensure `embeddinggemma-300m` is a directory containing all model files
2. **Phi-2 Model**: Ensure `phi-2.Q4_0.gguf` is a single GGUF file
3. **Paths**: All paths in `config.yaml` should be relative to the project root

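The first two checks above can be scripted. A small sketch, assuming the default paths from `config.yaml` and that it runs from the project root; `check_models` is a hypothetical helper, not part of CodeMind. It assumes a complete Hugging Face snapshot contains a `config.json`, and it relies on GGUF files beginning with the ASCII magic bytes `GGUF`.

```python
from pathlib import Path
from typing import List


def check_models(embed_dir: str = "./models/embeddinggemma-300m",
                 gguf_path: str = "./models/phi-2.Q4_0.gguf") -> List[str]:
    """Return human-readable problems with the model layout; empty list means it looks OK."""
    problems: List[str] = []
    embed = Path(embed_dir)
    if not embed.is_dir():
        problems.append(f"{embed} is not a directory")
    elif not (embed / "config.json").is_file():
        # Assumption: a complete Hugging Face model snapshot includes config.json
        problems.append(f"{embed} has no config.json; the download may be incomplete")
    gguf = Path(gguf_path)
    if not gguf.is_file():
        problems.append(f"{gguf} is not a file")
    else:
        with gguf.open("rb") as fh:
            if fh.read(4) != b"GGUF":  # GGUF files start with these four magic bytes
                problems.append(f"{gguf} does not start with the GGUF magic bytes")
    return problems
```

Printing `check_models()` before loading anything turns a deep stack trace from the model loader into a one-line hint about which file is missing or wrong.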
### Memory Issues

For systems with limited RAM:

- Use Q4_0 quantization for Phi-2 (already configured)
- Reduce `n_ctx` in `config.yaml` if needed
- Process documents in smaller batches

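For the last suggestion above, a generic batching helper keeps only one batch of texts in memory at a time when it is fed from a lazy iterator (for example, a generator that reads files one by one). This is a sketch, not a CodeMind option: `batched` and `batch_size` are hypothetical names here, written out because `itertools.batched` only arrived in Python 3.12 and the Conda environment above uses Python 3.9.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))  # pull at most batch_size items
        if not batch:
            return
        yield batch
```

sentence-transformers' `SentenceTransformer.encode` also accepts a `batch_size` argument, but it still receives the whole input list at once; batching upstream bounds how many raw documents are held in memory.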
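### Windows-Specific Issues

- Ensure your `llama-cpp-python` version supports Windows and installed successfully; by default `pip` builds it from source, which on Windows requires a C/C++ build toolchain such as Visual Studio Build Tools
- Use PowerShell or Command Prompt to run CLI commands
- Check the path separators in `config.yaml`: forward slashes work on Windows, while backslashes inside double-quoted YAML strings must be escaped (`"models\\phi-2.Q4_0.gguf"`) or the string single-quoted
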
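Resolving configured paths with `pathlib` is one way to make either separator style work. A sketch, assuming relative paths are meant relative to the project root (as the Model Loading notes state); `resolve_config_path` is hypothetical, not part of CodeMind.

```python
from pathlib import Path, PureWindowsPath


def resolve_config_path(raw: str, project_root: str) -> Path:
    """Resolve a config.yaml path against the project root, accepting / or \\ separators."""
    # PureWindowsPath treats both separators as separators; as_posix() rewrites them to "/"
    normalized = PureWindowsPath(raw).as_posix()
    path = Path(normalized)
    # Absolute paths (on the current OS) are kept; relative ones are anchored at the root
    return path if path.is_absolute() else (Path(project_root) / path).resolve()
```

Treating every backslash as a separator is fine for paths like the defaults, but it would mangle a POSIX filename that genuinely contains one, so this fits configs meant to be shared across Windows and Unix machines.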
## Development

To check that the core modules import cleanly (run from the project root):

```bash
python -c "import src.config_loader, src.embedder, src.retriever, src.generator, src.diff_analyzer; print('All modules imported successfully')"
```

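A slightly more informative variant of this check imports each module under `src/` from the directory structure individually and reports which ones fail, instead of stopping at the first error. A sketch, assuming it runs from the project root; `check_imports` is a hypothetical helper.

```python
import importlib
import traceback
from typing import Dict, Iterable

# Module names taken from the src/ directory structure
MODULES = ("src.config_loader", "src.embedder", "src.retriever", "src.generator", "src.diff_analyzer")


def check_imports(names: Iterable[str]) -> Dict[str, str]:
    """Import each module by name; map module -> 'ok' or the last line of its error."""
    results: Dict[str, str] = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception:  # ImportError, or any error raised while the module body runs
            results[name] = traceback.format_exc().strip().splitlines()[-1]
    return results


if __name__ == "__main__":
    for module, status in check_imports(MODULES).items():
        print(f"{module}: {status}")
```

Catching `Exception` rather than only `ImportError` also surfaces modules that import fine but fail at load time, for example when a model path read at import is missing.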
To list the available commands and options:

```bash
python cli.py --help
```

## Contributing

Contributions to CodeMind are welcome! Feel free to submit pull requests, open issues, or suggest new features.

## License

This project is licensed under the terms of the LICENSE file included in the repository.

© 2025 CodeMind. All rights reserved.

README.md CHANGED
```diff
@@ -341,3 +341,5 @@ Contributions to CodeMind are welcome! Please feel free to submit pull requests,
  ## License
 
  This project is licensed under the terms of the LICENSE file included in the repository.
+
+ © 2025 CodeMind. All rights reserved.
```

index.html CHANGED
```diff
@@ -275,7 +275,8 @@
  <li><a href="#setup">Setup</a></li>
  </ul>
 
- <p>&copy; 2023 CodeMind. All rights reserved.</p>
+ <p>&copy; 2025 CodeMind. All rights reserved.
+ </p>
  </div>
  </footer>
  </body>
```