Spaces:

dev-jas
/

CodeMind

Running

App Files Files Community

devjas1 commited on 7 days ago

Commit

d397fab

1 Parent(s): 6bebf5c

(UPDATE): [docs]: add ABOUT.md for project overview and installation instructions; update copyright year in index.html

Browse files

Files changed (3) hide show

ABOUT.md +228 -0
README.md +2 -0
index.html +2 -1

ABOUT.md ADDED Viewed

	@@ -0,0 +1,228 @@

+# CodeMind
+A CLI tool for intelligent document analysis and commit message generation using EmbeddingGemma-300m for embeddings, FAISS for vector storage, and Phi-2 for text generation.
+## Features
+- **Document Indexing**: Embed and index documents for semantic search
+- **Semantic Search**: Find relevant documents using natural language queries
+- **Smart Commit Messages**: Generate meaningful commit messages from staged git changes
+- **RAG (Retrieval-Augmented Generation)**: Answer questions using indexed document context
+## Setup
+### Prerequisites
+- Windows 11
+- Conda environment
+- Git
+### Installation
+1. **Create a Conda environment:**
+   ```bash
+   conda create -n codemind python=3.9
+   conda activate codemind
+   ```
+2. **Clone the repository:**
+   ```bash
+   git clone https://github.com/devjas1/codemind.git
+   cd codemind
+   ```
+3. **Install dependencies:**
+   ```bash
+   pip install -r requirements.txt
+   ```
+4. **Download models:**
+   **Embedding Model (EmbeddingGemma-300m):**
+   - Download from Hugging Face: `google/embeddinggemma-300m`
+   - Place in `./models/embeddinggemma-300m/` directory
+   **Generation Model (Phi-2 GGUF):**
+   - Download the quantized Phi-2 model: `phi-2.Q4_0.gguf`
+   - Place in `./models/` directory
+   - Download from: [Microsoft Phi-2 GGUF](https://huggingface.co/microsoft/phi-2-gguf) or similar quantized versions
+### Directory Structure
+```
+CodeMind/
+├── cli.py                      # Main CLI entry point
+├── config.yaml                 # Configuration file
+├── requirements.txt            # Python dependencies
+├── models/                     # Model storage
+│   ├── embeddinggemma-300m/    # Embedding model directory
+│   └── phi-2.Q4_0.gguf        # Phi-2 quantized model file
+├── src/                        # Core modules
+│   ├── config_loader.py        # Configuration management
+│   ├── embedder.py             # Document embedding
+│   ├── retriever.py            # Semantic search
+│   ├── generator.py            # Text generation
+│   └── diff_analyzer.py        # Git diff analysis
+├── docs/                       # Documentation
+└── vector_cache/              # FAISS index storage (auto-created)
+```
+## Usage
+### Initialize Document Index
+Index documents from a directory for semantic search:
+```bash
+python cli.py init ./docs/
+```
+This will:
+- Embed all documents in the specified directory
+- Create a FAISS index in `vector_cache/`
+- Save metadata for retrieval
+### Semantic Search
+Search for relevant documents using natural language:
+```bash
+python cli.py search "how to configure the model"
+```
+Returns ranked results with similarity scores.
+### Ask Questions (RAG)
+Get answers based on your indexed documents:
+```bash
+python cli.py ask "What are the configuration options?"
+```
+Uses retrieval-augmented generation to provide contextual answers.
+### Git Commit Message Generation
+Generate intelligent commit messages from staged changes:
+```bash
+# Preview commit message without applying
+python cli.py commit --preview
+# Show staged files and analysis without generating message
+python cli.py commit --dry-run
+# Generate and apply commit message
+python cli.py commit --apply
+```
+### Start API Server (Future Feature)
+```bash
+python cli.py serve --port 8000
+```
+_Note: API server functionality is planned for future releases._
+## Configuration
+Edit `config.yaml` to customize behavior:
+```yaml
+embedding:
+  model_path: "./models/embeddinggemma-300m"
+  dim: 768
+  truncate_to: 128
+generator:
+  model_path: "./models/phi-2.Q4_0.gguf"
+  quantization: "Q4_0"
+  max_tokens: 512
+  n_ctx: 2048
+retrieval:
+  vector_store: "faiss"
+  top_k: 5
+  similarity_threshold: 0.75
+commit:
+  tone: "imperative"
+  style: "conventional"
+  max_length: 72
+logging:
+  verbose: true
+  telemetry: false
+```
+### Configuration Options
+- **embedding.model_path**: Path to the EmbeddingGemma-300m model
+- **generator.model_path**: Path to the Phi-2 GGUF model file
+- **retrieval.top_k**: Number of documents to retrieve for context
+- **retrieval.similarity_threshold**: Minimum similarity score for results
+- **generator.max_tokens**: Maximum tokens for generation
+- **generator.n_ctx**: Context window size for Phi-2
+## Dependencies
+- `sentence-transformers>=2.2.2` - Document embedding
+- `faiss-cpu>=1.7.4` - Vector similarity search
+- `llama-cpp-python>=0.2.23` - Phi-2 model inference (Windows compatible)
+- `typer>=0.9.0` - CLI framework
+- `PyYAML>=6.0` - Configuration file parsing
+## Troubleshooting
+### Model Loading Issues
+If you encounter model loading errors:
+1. **Embedding Model**: Ensure `embeddinggemma-300m` is a directory containing all model files
+2. **Phi-2 Model**: Ensure `phi-2.Q4_0.gguf` is a single GGUF file
+3. **Paths**: All paths in `config.yaml` should be relative to the project root
+### Memory Issues
+For systems with limited RAM:
+- Use Q4_0 quantization for Phi-2 (already configured)
+- Reduce `n_ctx` in config.yaml if needed
+- Process documents in smaller batches
+### Windows-Specific Issues
+- Ensure `llama-cpp-python` version supports Windows
+- Use PowerShell or Command Prompt for CLI commands
+- Check file path separators in configuration
+## Development
+To test the modules:
+```bash
+python -c "from src import *; print('All modules imported successfully')"
+```
+To run in development mode:
+```bash
+python cli.py --help
+```
+## Contributing
+Contributions to CodeMind are welcome! Please feel free to submit pull requests, create issues, or suggest new features.
+## License
+This project is licensed under the terms of the LICENSE file included in the repository.
+© 2025 CodeMind. All rights reserved.

README.md CHANGED Viewed

@@ -341,3 +341,5 @@ Contributions to CodeMind are welcome! Please feel free to submit pull requests,
 ## License
 This project is licensed under the terms of the LICENSE file included in the repository.

 ## License
 This project is licensed under the terms of the LICENSE file included in the repository.
+© 2025 CodeMind. All rights reserved.

index.html CHANGED Viewed

@@ -275,7 +275,8 @@
                 <li><a href="#setup">Setup</a></li>
             </ul>
-            <p>&copy; 2023 CodeMind. All rights reserved.</p>
         </div>
     </footer>
 </body>

                 <li><a href="#setup">Setup</a></li>
             </ul>
+            <p>&copy; 2025 CodeMind. All rights reserved.
+            </p>
         </div>
     </footer>
 </body>