devjas1 committed on
Commit 03e744b · 1 Parent(s): 3143d77

(FEAT/DOCS)[Docs: Readme + .gitignore]: add README.md with project details and setup instructions

Files changed (3)
  1. .gitignore +33 -0
  2. .replit +0 -39
  3. README.md +226 -0
.gitignore CHANGED
@@ -11,3 +11,36 @@ rapid*
  ac4a*
  *.bin
  *.gguf
+ ac4a*
+ *.bin
+ *.gguf
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ codemind/
+ venv/
+ env/
+ ENV/
+
+ # Vector cache
+ vector_cache/
.replit DELETED
@@ -1,39 +0,0 @@
- modules = ["nodejs-20", "web", "postgresql-16"]
- run = "npm run dev"
- hidden = [".config", ".git", "generated-icon.png", "node_modules", "dist"]
-
- [nix]
- channel = "stable-24_05"
-
- [deployment]
- deploymentTarget = "autoscale"
- build = ["npm", "run", "build"]
- run = ["npm", "run", "start"]
-
- [[ports]]
- localPort = 5000
- externalPort = 80
-
- [env]
- PORT = "5000"
-
- [workflows]
- runButton = "Project"
-
- [[workflows.workflow]]
- name = "Project"
- mode = "parallel"
- author = "agent"
-
- [[workflows.workflow.tasks]]
- task = "workflow.run"
- args = "Start application"
-
- [[workflows.workflow]]
- name = "Start application"
- author = "agent"
-
- [[workflows.workflow.tasks]]
- task = "shell.exec"
- args = "npm run dev"
- waitForPort = 5000
README.md ADDED
@@ -0,0 +1,226 @@
# CodeMind

A CLI tool for intelligent document analysis and commit message generation. It uses EmbeddingGemma-300m for embeddings, FAISS for vector storage, and Phi-2 for text generation.

## Features

- **Document Indexing**: Embed and index documents for semantic search
- **Semantic Search**: Find relevant documents using natural language queries
- **Smart Commit Messages**: Generate meaningful commit messages from staged git changes
- **RAG (Retrieval-Augmented Generation)**: Answer questions using indexed document context

## Setup

### Prerequisites

- Windows 11
- Conda (Miniconda or Anaconda)
- Git

### Installation

1. **Create a Conda environment:**

   ```bash
   conda create -n codemind python=3.9
   conda activate codemind
   ```

2. **Clone the repository:**

   ```bash
   git clone https://github.com/devjas1/codemind.git
   cd codemind
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Download models:**

   **Embedding model (EmbeddingGemma-300m):**

   - Download `google/embeddinggemma-300m` from Hugging Face
   - Place the model files in the `./models/embeddinggemma-300m/` directory

   **Generation model (Phi-2 GGUF):**

   - Download the quantized Phi-2 model file `phi-2.Q4_0.gguf` from [Microsoft Phi-2 GGUF](https://huggingface.co/microsoft/phi-2-gguf) or another quantized Phi-2 release
   - Place it in the `./models/` directory

### Directory Structure

```
CodeMind/
├── cli.py                      # Main CLI entry point
├── config.yaml                 # Configuration file
├── requirements.txt            # Python dependencies
├── models/                     # Model storage
│   ├── embeddinggemma-300m/    # Embedding model directory
│   └── phi-2.Q4_0.gguf         # Phi-2 quantized model file
├── src/                        # Core modules
│   ├── config_loader.py        # Configuration management
│   ├── embedder.py             # Document embedding
│   ├── retriever.py            # Semantic search
│   ├── generator.py            # Text generation
│   └── diff_analyzer.py        # Git diff analysis
├── docs/                       # Documentation
└── vector_cache/               # FAISS index storage (auto-created)
```

## Usage

### Initialize Document Index

Index documents from a directory for semantic search:

```bash
python cli.py init ./docs/
```

This will:

- Embed all documents in the specified directory
- Create a FAISS index in `vector_cache/`
- Save metadata for retrieval

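The three indexing steps above can be pictured with a dependency-free sketch. This is not CodeMind's implementation: a hypothetical `toy_embed` (hashed bag-of-words) stands in for EmbeddingGemma-300m, a JSON file stands in for the FAISS index, and the `*.md` file pattern and `metadata.json` file name are assumptions.

```python
import hashlib
import json
import math
from pathlib import Path


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for EmbeddingGemma: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Stable hash so the same token always lands in the same bucket.
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(doc_dir: Path, cache_dir: Path, pattern: str = "*.md") -> int:
    """Embed every matching file under doc_dir and persist vectors plus metadata."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    records = []
    for path in sorted(doc_dir.rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        records.append({"path": str(path), "vector": toy_embed(text)})
    # A real setup would add the vectors to a FAISS index; here they go into JSON.
    (cache_dir / "metadata.json").write_text(json.dumps(records), encoding="utf-8")
    return len(records)
```

Search then means embedding the query with the same function and comparing it against the stored vectors; using one embedding function for both documents and queries is what makes the scores comparable.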
### Semantic Search

Search for relevant documents using natural language:

```bash
python cli.py search "how to configure the model"
```

Results are ranked by similarity score. `retrieval.top_k` and `retrieval.similarity_threshold` in `config.yaml` control how many results are returned and how similar they must be to the query.

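Ranking comes down to scoring each indexed vector against the query vector and keeping the best matches. This sketch assumes L2-normalized embeddings, so the dot product equals cosine similarity (which is also what a FAISS inner-product index computes on normalized vectors), and applies `top_k` and `similarity_threshold` as the Configuration Options section describes them. The `rank` function name is hypothetical.

```python
import heapq


def dot(a: list[float], b: list[float]) -> float:
    """Inner product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))


def rank(query: list[float], docs: dict[str, list[float]],
         top_k: int = 5, threshold: float = 0.75) -> list[tuple[str, float]]:
    """Return up to top_k (doc_id, score) pairs with score >= threshold, best first."""
    scored = ((doc_id, dot(query, vec)) for doc_id, vec in docs.items())
    kept = (pair for pair in scored if pair[1] >= threshold)
    # nlargest keeps only k items in a heap instead of sorting every score.
    return heapq.nlargest(top_k, kept, key=lambda pair: pair[1])
```

Filtering by threshold before taking the top `k` returns the same set as the reverse order, so a query with no sufficiently similar documents simply yields an empty list.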
### Ask Questions (RAG)

Get answers based on your indexed documents:

```bash
python cli.py ask "What are the configuration options?"
```

CodeMind retrieves the most relevant indexed documents and passes them to Phi-2 as context, so answers draw on your own documents (retrieval-augmented generation).

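Retrieval-augmented generation joins the two halves: the best-matching chunks are placed in the prompt ahead of the question, within the model's context budget. The sketch below is illustrative only; the prompt template, the `build_prompt` name, and the rough 4-characters-per-token estimate are assumptions, not CodeMind's actual prompt or Phi-2's tokenizer. The defaults mirror `n_ctx: 2048` and `max_tokens: 512` from `config.yaml`.

```python
def build_prompt(question: str, chunks: list[str],
                 n_ctx: int = 2048, max_tokens: int = 512,
                 chars_per_token: int = 4) -> str:
    """Pack retrieved chunks (best first) into a prompt that leaves room for the answer."""
    # Reserve max_tokens for the answer; the rest of n_ctx is the prompt budget.
    # chars_per_token is a rough heuristic, not a real tokenizer count.
    budget_chars = (n_ctx - max_tokens) * chars_per_token
    header = "Answer the question using only the context below.\n\nContext:\n"
    footer = f"\nQuestion: {question}\nAnswer:"
    used = len(header) + len(footer)
    selected = []
    for chunk in chunks:
        piece = f"- {chunk.strip()}\n"
        if used + len(piece) > budget_chars:
            break  # stop at the first chunk that would overflow the budget
        selected.append(piece)
        used += len(piece)
    return header + "".join(selected) + footer
```

Because chunks arrive ranked, stopping at the first one that does not fit keeps the most relevant context; a production version would count tokens with the model's own tokenizer.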
### Git Commit Message Generation

Generate commit messages from staged changes (stage them first with `git add`):

```bash
# Preview commit message without applying
python cli.py commit --preview

# Show staged files and analysis without generating a message
python cli.py commit --dry-run

# Generate and apply commit message
python cli.py commit --apply
```

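Commit message generation starts from the staged changes. The sketch below reads them with Git's real `git diff --cached --name-status` command and parses its tab-separated output. It also includes a hypothetical `draft_subject` helper that applies the `commit` settings from `config.yaml` (conventional style, 72-character limit). How CodeMind's `diff_analyzer.py` actually does this may differ.

```python
import subprocess


def staged_changes(repo: str = ".") -> list[tuple[str, str]]:
    """Return (status, path) pairs for the changes currently staged in repo."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-status"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return parse_name_status(out)


def parse_name_status(text: str) -> list[tuple[str, str]]:
    """Parse lines like 'M\tsrc/cli.py' or 'R100\told.py\tnew.py'."""
    changes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # First letter is the change type (A, M, D, R, ...); the last field is the current path.
        changes.append((fields[0][0], fields[-1]))
    return changes


def draft_subject(kind: str, summary: str, scope: str = "", max_length: int = 72) -> str:
    """Hypothetical Conventional Commits subject 'type(scope): summary', capped at max_length."""
    prefix = f"{kind}({scope}): " if scope else f"{kind}: "
    subject = prefix + summary.strip().rstrip(".")
    if len(subject) > max_length:
        subject = subject[: max_length - 3].rstrip() + "..."
    return subject
```

For example, `draft_subject("docs", "add README with setup instructions.")` returns `"docs: add README with setup instructions"`, with the trailing period dropped as Conventional Commits style suggests.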
### Start API Server (Future Feature)

```bash
python cli.py serve --port 8000
```

_Note: API server functionality is planned for future releases._

## Configuration

Edit `config.yaml` to customize behavior:

```yaml
embedding:
  model_path: "./models/embeddinggemma-300m"
  dim: 768
  truncate_to: 128

generator:
  model_path: "./models/phi-2.Q4_0.gguf"
  quantization: "Q4_0"
  max_tokens: 512
  n_ctx: 2048

retrieval:
  vector_store: "faiss"
  top_k: 5
  similarity_threshold: 0.75

commit:
  tone: "imperative"
  style: "conventional"
  max_length: 72

logging:
  verbose: true
  telemetry: false
```

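`embedding.dim: 768` matches EmbeddingGemma-300m's output size, and the model was trained with Matryoshka Representation Learning, so its embeddings can be shortened to 512, 256, or 128 dimensions. This README does not say how `truncate_to` is applied; assuming it works that way, truncation keeps the leading components and re-normalizes them, as in this hypothetical helper:

```python
import math


def truncate_embedding(vec: list[float], truncate_to: int = 128) -> list[float]:
    """Keep the first truncate_to dimensions and re-normalize to unit length."""
    if truncate_to > len(vec):
        raise ValueError(f"truncate_to={truncate_to} exceeds embedding size {len(vec)}")
    head = vec[:truncate_to]
    norm = math.sqrt(sum(x * x for x in head))
    # Re-normalizing keeps dot products usable as cosine similarities.
    return [x / norm for x in head] if norm else head
```

Smaller vectors shrink the FAISS index and speed up search at some cost in retrieval quality. Documents and queries must be truncated to the same size.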
### Configuration Options

- **embedding.model_path**: Path to the EmbeddingGemma-300m model directory
- **generator.model_path**: Path to the Phi-2 GGUF model file
- **generator.max_tokens**: Maximum number of tokens to generate per response
- **generator.n_ctx**: Context window size for Phi-2, in tokens; the prompt and the generated output must fit within it
- **retrieval.top_k**: Number of documents to retrieve for context
- **retrieval.similarity_threshold**: Minimum similarity score for a result to be returned

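Several of these options constrain one another. A hypothetical `check_config` helper could catch common mistakes before any model loads. It works on the parsed YAML as a plain dict, such as the result of PyYAML's `yaml.safe_load`. The rules below are assumptions drawn from what each option means, not checks CodeMind is known to perform.

```python
def check_config(cfg: dict) -> list[str]:
    """Return human-readable problems in a parsed config.yaml (empty list if none)."""
    problems = []
    emb = cfg.get("embedding", {})
    gen = cfg.get("generator", {})
    ret = cfg.get("retrieval", {})

    if emb.get("truncate_to", 0) > emb.get("dim", float("inf")):
        problems.append("embedding.truncate_to must not exceed embedding.dim")
    # The prompt and the answer share n_ctx, so max_tokens alone must not fill it.
    if gen.get("max_tokens", 0) >= gen.get("n_ctx", float("inf")):
        problems.append("generator.max_tokens must be smaller than generator.n_ctx")
    if ret.get("top_k", 1) < 1:
        problems.append("retrieval.top_k must be at least 1")
    # Cosine similarity lies in [-1, 1], so thresholds outside it filter everything or nothing.
    if not -1.0 <= ret.get("similarity_threshold", 0.0) <= 1.0:
        problems.append("retrieval.similarity_threshold must lie in [-1, 1]")
    return problems
```

With the sample `config.yaml` shown above, this returns an empty list; missing sections are treated as "nothing to check" rather than as errors.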
## Dependencies

- `sentence-transformers>=2.2.2` - Document embedding
- `faiss-cpu>=1.7.4` - Vector similarity search
- `llama-cpp-python>=0.2.23` - Phi-2 model inference (Windows compatible)
- `typer>=0.9.0` - CLI framework
- `PyYAML>=6.0` - Configuration file parsing

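To confirm that an environment meets these minimums, installed versions can be read with the standard library's `importlib.metadata` (Python 3.8+). The sketch uses a deliberately simple comparison of the leading numeric release segment, which is an assumption. For full PEP 440 handling (pre-releases, local versions), the `packaging` library's `Version` class is the usual choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as listed in this README.
MINIMUMS = {
    "sentence-transformers": "2.2.2",
    "faiss-cpu": "1.7.4",
    "llama-cpp-python": "0.2.23",
    "typer": "0.9.0",
    "PyYAML": "6.0",
}


def numeric(v: str) -> tuple[int, ...]:
    """'1.7.4.post2' -> (1, 7, 4); only the leading numeric release segment is kept."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def at_least(found: str, minimum: str) -> bool:
    """Compare release numbers, padding the shorter one with zeros ('6' equals '6.0')."""
    a, b = numeric(found), numeric(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_versions(minimums: dict[str, str], get_version=version) -> dict[str, str]:
    """Map each package to 'ok', 'missing', or 'too old (found X)'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(found, minimum) else f"too old (found {found})"
    return report
```

Calling `check_versions(MINIMUMS)` inside the activated `codemind` environment reports each package. Comparing integer tuples rather than strings matters: as strings, `"2.10.0"` would sort below `"2.2.2"`.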
## Troubleshooting

### Model Loading Issues

If you encounter model loading errors:

1. **Embedding model**: Ensure `embeddinggemma-300m` is a directory containing all of the model files downloaded from Hugging Face
2. **Phi-2 model**: Ensure `phi-2.Q4_0.gguf` is a single GGUF file, not a directory
3. **Paths**: Write the paths in `config.yaml` relative to the project root, and run `cli.py` from the project root so they resolve correctly

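The three checks above can be automated with a hypothetical pre-flight helper that is not part of CodeMind. It resolves each configured path against a project root passed in explicitly, rather than against the current directory. It then verifies the expected shape: a non-empty directory for the embedding model and a single `.gguf` file for Phi-2. The `cfg` argument is assumed to be the parsed `config.yaml` as a dict.

```python
from pathlib import Path


def check_model_paths(root: Path, cfg: dict) -> list[str]:
    """Return problems with the model paths in a parsed config.yaml (empty list if none)."""
    problems = []
    # pathlib drops the leading "./"; an absolute model_path replaces root entirely.
    emb = root / cfg["embedding"]["model_path"]
    gen = root / cfg["generator"]["model_path"]
    if not emb.is_dir():
        problems.append(f"embedding model directory not found: {emb}")
    elif not any(emb.iterdir()):
        problems.append(f"embedding model directory is empty: {emb}")
    if not gen.is_file():
        problems.append(f"Phi-2 model file not found: {gen}")
    elif gen.suffix.lower() != ".gguf":
        problems.append(f"expected a .gguf file, got: {gen.name}")
    return problems
```

Passing `Path(__file__).resolve().parent` from a script in the project root as `root` makes the check independent of the directory the command is launched from.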
### Memory Issues

For systems with limited RAM:

- Use Q4_0 quantization for Phi-2 (already configured)
- Reduce `generator.n_ctx` in `config.yaml`; the memory reserved for the context (the KV cache) grows linearly with it
- Process documents in smaller batches

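Rough arithmetic shows why quantization and `n_ctx` matter. The figures are assumptions taken from Phi-2's published architecture (about 2.7B parameters, 32 layers, hidden size 2560) and from llama.cpp's Q4_0 format (blocks of 32 four-bit weights plus one 16-bit scale, i.e. 18 bytes per 32 weights), with an fp16 KV cache. Real usage adds runtime overhead, and some tensors in a Q4_0 file may be stored at higher precision.

```python
PARAMS = 2.7e9        # Phi-2 parameter count (approximate, assumed)
N_LAYERS = 32         # transformer blocks in Phi-2 (assumed from its published config)
HIDDEN = 2560         # hidden size; keys and values per token per layer are this wide
Q4_0_BYTES_PER_WEIGHT = 18 / 32  # per block: 32 weights x 4 bits = 16 bytes, plus 2-byte scale
FP16_BYTES = 2


def weight_bytes(params: float = PARAMS) -> float:
    """Approximate size of Q4_0-quantized weights."""
    return params * Q4_0_BYTES_PER_WEIGHT


def kv_cache_bytes(n_ctx: int, n_layers: int = N_LAYERS, hidden: int = HIDDEN) -> int:
    """fp16 KV cache: factor 2 for keys and values, per layer and per context position."""
    return 2 * n_layers * n_ctx * hidden * FP16_BYTES


GIB = 1024 ** 3
for n_ctx in (2048, 1024):
    total = weight_bytes() + kv_cache_bytes(n_ctx)
    print(f"n_ctx={n_ctx}: weights ~{weight_bytes() / GIB:.2f} GiB, "
          f"KV cache {kv_cache_bytes(n_ctx) / 2**20:.0f} MiB, total ~{total / GIB:.2f} GiB")
```

Under these assumptions, the weights stay fixed at roughly 1.4 GiB. Halving `n_ctx` from 2048 to 1024 cuts the KV cache from 640 MiB to 320 MiB.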
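### Windows-Specific Issues

- Use a `llama-cpp-python` release that supports Windows. By default, `pip` builds llama.cpp from source when installing it, which requires a C/C++ compiler (for example, Visual Studio Build Tools with the C++ workload)
- Use PowerShell or Command Prompt for CLI commands
- Check path separators in `config.yaml`: prefer forward slashes (for example `./models/phi-2.Q4_0.gguf`), which Python accepts on Windows; inside a double-quoted YAML string, a backslash starts an escape sequence
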
## Development

To run a quick import smoke test of the modules:

```bash
python -c "from src import *; print('All modules imported successfully')"
```

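`from src import *` only imports the names that `src/__init__.py` defines or lists in `__all__`, so it can succeed even when a submodule is broken. A hypothetical per-module check using `importlib.import_module` reports each failure individually. The module names are taken from the directory structure in this README and assume `src` is importable as a package when run from the project root.

```python
import importlib

# Module names from the directory structure in this README (assumed package layout).
MODULES = [
    "src.config_loader",
    "src.embedder",
    "src.retriever",
    "src.generator",
    "src.diff_analyzer",
]


def check_imports(names: list[str]) -> dict[str, str]:
    """Import each module separately; map failing module names to their error message."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time error, not just ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    problems = check_imports(MODULES)
    for name, error in problems.items():
        print(f"FAIL {name}: {error}")
    print("all modules imported" if not problems else f"{len(problems)} module(s) failed")
```

Catching `Exception` rather than only `ImportError` also surfaces syntax errors and exceptions raised at import time, such as a model loaded at module level with a bad path.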
To list the available commands and options:

```bash
python cli.py --help
```

## License

[Insert your license information here]

## Contributing

[Insert contribution guidelines here]