Space: dev-jas/CodeMind

devjas1 committed about 1 month ago

Commit 03e744b (1 parent: 3143d77)

(FEAT/DOCS)[Docs: Readme + .gitignore]: add README.md with project details and setup instructions

Files changed (3)

.gitignore +33 -0
.replit +0 -39
README.md +226 -0

.gitignore CHANGED

@@ -11,3 +11,36 @@ rapid*
 ac4a*
 *.bin
 *.gguf
+ac4a*
+*.bin
+*.gguf
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+# Virtual environments
+codemind/
+venv/
+env/
+ENV/
+# Vector cache
+vector_cache/

.replit DELETED

@@ -1,39 +0,0 @@
-modules = ["nodejs-20", "web", "postgresql-16"]
-run = "npm run dev"
-hidden = [".config", ".git", "generated-icon.png", "node_modules", "dist"]
-[nix]
-channel = "stable-24_05"
-[deployment]
-deploymentTarget = "autoscale"
-build = ["npm", "run", "build"]
-run = ["npm", "run", "start"]
-[[ports]]
-localPort = 5000
-externalPort = 80
-[env]
-PORT = "5000"
-[workflows]
-runButton = "Project"
-[[workflows.workflow]]
-name = "Project"
-mode = "parallel"
-author = "agent"
-[[workflows.workflow.tasks]]
-task = "workflow.run"
-args = "Start application"
-[[workflows.workflow]]
-name = "Start application"
-author = "agent"
-[[workflows.workflow.tasks]]
-task = "shell.exec"
-args = "npm run dev"
-waitForPort = 5000

README.md ADDED

	@@ -0,0 +1,226 @@

+# CodeMind
+A CLI tool for intelligent document analysis and commit message generation using EmbeddingGemma-300m for embeddings, FAISS for vector storage, and Phi-2 for text generation.
+## Features
+- **Document Indexing**: Embed and index documents for semantic search
+- **Semantic Search**: Find relevant documents using natural language queries
+- **Smart Commit Messages**: Generate meaningful commit messages from staged git changes
+- **RAG (Retrieval-Augmented Generation)**: Answer questions using indexed document context
+## Setup
+### Prerequisites
+- Windows 11
+- Conda (Anaconda or Miniconda)
+- Git
+### Installation
+1. **Create a Conda environment:**
+   ```bash
+   conda create -n codemind python=3.9
+   conda activate codemind
+   ```
+2. **Clone the repository:**
+   ```bash
+   git clone https://github.com/devjas1/codemind.git
+   cd codemind
+   ```
+3. **Install dependencies:**
+   ```bash
+   pip install -r requirements.txt
+   ```
+4. **Download models:**
+   **Embedding Model (EmbeddingGemma-300m):**
+   - Download from Hugging Face: `google/embeddinggemma-300m` (you may need to accept the model's license on Hugging Face first)
+   - Place all model files in the `./models/embeddinggemma-300m/` directory
+   **Generation Model (Phi-2 GGUF):**
+   - Download the quantized Phi-2 model file `phi-2.Q4_0.gguf`
+   - Place it in the `./models/` directory
+   - Source: [Microsoft Phi-2 GGUF](https://huggingface.co/microsoft/phi-2-gguf) or another GGUF conversion of `microsoft/phi-2`, such as the community repository `TheBloke/phi-2-GGUF`
+### Directory Structure
+```
+CodeMind/
+├── cli.py                      # Main CLI entry point
+├── config.yaml                 # Configuration file
+├── requirements.txt            # Python dependencies
+├── models/                     # Model storage
+│   ├── embeddinggemma-300m/    # Embedding model directory
+│   └── phi-2.Q4_0.gguf         # Phi-2 quantized model file
+├── src/                        # Core modules
+│   ├── config_loader.py        # Configuration management
+│   ├── embedder.py             # Document embedding
+│   ├── retriever.py            # Semantic search
+│   ├── generator.py            # Text generation
+│   └── diff_analyzer.py        # Git diff analysis
+├── docs/                       # Documentation
+└── vector_cache/               # FAISS index storage (auto-created)
+```
+## Usage
+### Initialize Document Index
+Index documents from a directory for semantic search:
+```bash
+python cli.py init ./docs/
+```
+This will:
+- Embed all documents in the specified directory
+- Create a FAISS index in `vector_cache/`
+- Save metadata for retrieval
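Conceptually, an indexing pass like `init` walks the directory, embeds each file, and stores each vector together with metadata that maps it back to its source file. The following is a minimal stdlib-only sketch of that flow, not CodeMind's implementation: it assumes plain-text `.md`/`.txt` files, `toy_embed` is a hypothetical stand-in for EmbeddingGemma (a hashed bag of words), and a single JSON file stands in for the FAISS index and metadata that the real tool writes to `vector_cache/`.

```python
import json
import math
import zlib
from pathlib import Path

DIM = 64  # toy dimension; the sample config uses dim: 768 for EmbeddingGemma


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical embedder: hash each word into a bucket, then L2-normalize."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(doc_dir: str, cache_dir: str = "vector_cache") -> int:
    """Embed every .md/.txt file under doc_dir; save vectors plus source paths."""
    entries = []
    for path in sorted(Path(doc_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".md", ".txt"}:
            text = path.read_text(encoding="utf-8", errors="ignore")
            entries.append({"path": str(path), "vector": toy_embed(text)})
    out = Path(cache_dir)
    out.mkdir(parents=True, exist_ok=True)  # auto-create the cache directory
    (out / "index.json").write_text(json.dumps(entries), encoding="utf-8")
    return len(entries)
```

Keeping the path next to each vector is what lets a later search turn "row 3 of the index" back into a document the user can open.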
+### Semantic Search
+Search for relevant documents using natural language:
+```bash
+python cli.py search "how to configure the model"
+```
+Returns ranked results with similarity scores.
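Ranking of this kind typically means cosine similarity between the query vector and each stored vector, keeping at most `retrieval.top_k` hits that reach `retrieval.similarity_threshold`. Below is a brute-force sketch of that step, assuming L2-normalized vectors so that the dot product equals cosine similarity (which is also how an inner-product FAISS index such as `IndexFlatIP` is commonly used for cosine search). The function names and data layout are hypothetical.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so that dot product == cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(query_vec: list[float], doc_vecs: list[list[float]],
           top_k: int = 5, threshold: float = 0.75) -> list[tuple[int, float]]:
    """Return up to top_k (doc_index, score) pairs with score >= threshold, best first."""
    q = normalize(query_vec)
    scored = []
    for i, vec in enumerate(doc_vecs):
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        if score >= threshold:
            scored.append((i, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With the sample config (`top_k: 5`, `similarity_threshold: 0.75`), a query that matches nothing closely returns an empty list instead of five weak matches.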
+### Ask Questions (RAG)
+Get answers based on your indexed documents:
+```bash
+python cli.py ask "What are the configuration options?"
+```
+Uses retrieval-augmented generation to provide contextual answers.
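The generation half of RAG amounts to packing the retrieved passages and the question into a prompt that fits Phi-2's context window (`generator.n_ctx`) while leaving room for the answer (`generator.max_tokens`). A sketch under two stated assumptions: roughly 4 characters per token as a crude budget (a real implementation would count tokens with the model's tokenizer), and the `Instruct:`/`Output:` question-answer format described on the Phi-2 model card. The function name and the budget heuristic are hypothetical.

```python
def build_rag_prompt(question: str, passages: list[str],
                     n_ctx: int = 2048, max_tokens: int = 512,
                     chars_per_token: int = 4) -> str:
    """Pack retrieved passages (best first) and the question into one prompt."""
    # Prompt and answer share the context window, so reserve max_tokens for the answer.
    budget_chars = (n_ctx - max_tokens) * chars_per_token
    header = "Instruct: Answer the question using only the context below.\n\nContext:\n"
    footer = f"\nQuestion: {question}\nOutput:"
    used = len(header) + len(footer)
    context = []
    for passage in passages:  # assumed already sorted by relevance
        chunk = f"- {passage.strip()}\n"
        if used + len(chunk) > budget_chars:
            break  # stop at the first passage that no longer fits
        context.append(chunk)
        used += len(chunk)
    return header + "".join(context) + footer
```

Stopping at the first passage that does not fit, rather than skipping it, keeps the context strictly in relevance order.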
+### Git Commit Message Generation
+Generate intelligent commit messages from staged changes:
+```bash
+# Preview the commit message without applying it
+python cli.py commit --preview
+# Show staged files and analysis without generating a message
+python cli.py commit --dry-run
+# Generate and apply the commit message
+python cli.py commit --apply
+```
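A command like `commit` needs two pieces: the staged diff as input, and a check that the generated subject line respects `commit.style` and `commit.max_length`. A sketch of both, assuming Git is on `PATH`. `git diff --cached` is the standard way to read staged changes; the helper names and the list of types (a common Conventional Commits set; the specification itself mandates only `feat` and `fix`) are assumptions rather than CodeMind's actual code.

```python
import re
import subprocess

# Assumed type list; Conventional Commits allows other types besides feat and fix.
CONVENTIONAL = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([\w./-]+\))?!?: \S.*$"
)


def staged_diff(repo: str = ".") -> str:
    """Return the staged diff (index vs HEAD), i.e. what `git commit` would record."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_subject(message: str, max_length: int = 72) -> list[str]:
    """Return problems with the first line of a generated commit message."""
    subject = message.splitlines()[0] if message else ""
    problems = []
    if len(subject) > max_length:
        problems.append(f"subject is {len(subject)} chars (limit {max_length})")
    if not CONVENTIONAL.match(subject):
        problems.append("subject is not in Conventional Commits form")
    return problems
```

Validating the model's output before committing is cheap, and a failed check can simply trigger another generation attempt.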
+### Start API Server (Future Feature)
+```bash
+python cli.py serve --port 8000
+```
+_Note: API server functionality is planned for future releases._
+## Configuration
+Edit `config.yaml` to customize behavior:
+```yaml
+embedding:
+  model_path: "./models/embeddinggemma-300m"
+  dim: 768
+  truncate_to: 128
+generator:
+  model_path: "./models/phi-2.Q4_0.gguf"
+  quantization: "Q4_0"
+  max_tokens: 512
+  n_ctx: 2048
+retrieval:
+  vector_store: "faiss"
+  top_k: 5
+  similarity_threshold: 0.75
+commit:
+  tone: "imperative"
+  style: "conventional"
+  max_length: 72
+logging:
+  verbose: true
+  telemetry: false
+```
+### Configuration Options
+- **embedding.model_path**: Path to the EmbeddingGemma-300m model
+- **generator.model_path**: Path to the Phi-2 GGUF model file
+- **retrieval.top_k**: Number of documents to retrieve for context
+- **retrieval.similarity_threshold**: Minimum similarity score for results
+- **generator.max_tokens**: Maximum tokens for generation
+- **generator.n_ctx**: Context window size for Phi-2
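Some of these options constrain each other. With llama.cpp, for example, the context window `n_ctx` has to hold both the prompt and up to `max_tokens` generated tokens, so `max_tokens` must be smaller than `n_ctx`. A sketch of a sanity check over the dictionary that `yaml.safe_load` returns for `config.yaml`; `validate_config` is hypothetical and its rules are assumptions derived from the option descriptions, not checks CodeMind is known to perform.

```python
def validate_config(cfg: dict) -> list[str]:
    """Return human-readable problems found in a parsed config.yaml dictionary."""
    problems = []
    gen = cfg.get("generator", {})
    ret = cfg.get("retrieval", {})

    # Prompt tokens and generated tokens share the same context window.
    if gen.get("max_tokens", 0) >= gen.get("n_ctx", 0):
        problems.append("generator.max_tokens must be smaller than generator.n_ctx")
    top_k = ret.get("top_k")
    if not isinstance(top_k, int) or top_k < 1:
        problems.append("retrieval.top_k must be a positive integer")
    # Assumes cosine similarity, which lies in [-1, 1].
    threshold = ret.get("similarity_threshold", 0.0)
    if not -1.0 <= threshold <= 1.0:
        problems.append("retrieval.similarity_threshold must lie in [-1, 1]")
    if cfg.get("commit", {}).get("max_length", 72) < 1:
        problems.append("commit.max_length must be positive")
    return problems


# Usage with the real file (requires PyYAML):
#   import yaml
#   with open("config.yaml", encoding="utf-8") as f:
#       print(validate_config(yaml.safe_load(f)))
```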
+## Dependencies
+- `sentence-transformers>=2.2.2` - Document embedding
+- `faiss-cpu>=1.7.4` - Vector similarity search
+- `llama-cpp-python>=0.2.23` - Phi-2 model inference (Windows compatible)
+- `typer>=0.9.0` - CLI framework
+- `PyYAML>=6.0` - Configuration file parsing
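Several of these packages are imported under a different name than the one they are installed with (`faiss-cpu` as `faiss`, `llama-cpp-python` as `llama_cpp`, `PyYAML` as `yaml`). A small standard-library check that reports which ones the active environment is missing; the script itself is a hypothetical helper, not part of the repository.

```python
import importlib.util

# Distribution name (as listed above) -> module name used in `import` statements.
REQUIRED = {
    "sentence-transformers": "sentence_transformers",
    "faiss-cpu": "faiss",
    "llama-cpp-python": "llama_cpp",
    "typer": "typer",
    "PyYAML": "yaml",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the distribution names whose import name cannot be found."""
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` locates a module without importing it, so the check stays fast even for heavy packages such as `sentence_transformers`.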
+## Troubleshooting
+### Model Loading Issues
+If you encounter model loading errors:
+1. **Embedding Model**: Ensure `embeddinggemma-300m` is a directory containing all model files
+2. **Phi-2 Model**: Ensure `phi-2.Q4_0.gguf` is a single GGUF file
+3. **Paths**: All paths in `config.yaml` should be relative to the project root
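These three checks can be scripted. A sketch that resolves the two model paths against the project root and confirms that the embedding model is a non-empty directory and the Phi-2 model is a single `.gguf` file. The function name is hypothetical and the defaults are the paths from the sample `config.yaml`; `pathlib` accepts forward slashes on Windows as well, so the same relative paths work there.

```python
from pathlib import Path


def check_model_paths(project_root: str,
                      embedding_path: str = "./models/embeddinggemma-300m",
                      generator_path: str = "./models/phi-2.Q4_0.gguf") -> list[str]:
    """Return problems with the configured model locations (empty list means OK)."""
    root = Path(project_root)
    emb = (root / embedding_path).resolve()
    gen = (root / generator_path).resolve()
    problems = []
    if not emb.is_dir():
        problems.append(f"embedding model directory not found: {emb}")
    elif not any(emb.iterdir()):
        problems.append(f"embedding model directory is empty: {emb}")
    if not gen.is_file():
        problems.append(f"Phi-2 GGUF file not found: {gen}")
    elif gen.suffix.lower() != ".gguf":
        problems.append(f"expected a .gguf file, got: {gen.name}")
    return problems
```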
+### Memory Issues
+For systems with limited RAM:
+- Use Q4_0 quantization for Phi-2 (already configured)
+- Reduce `n_ctx` in `config.yaml` if needed
+- Process documents in smaller batches
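Batching keeps only a slice of the corpus in memory at once. `SentenceTransformer.encode` already accepts a `batch_size` argument for the model's forward pass; the sketch below batches one level up, over files, so that reading and embedding never hold the whole directory at the same time. `embed_batch` stands in for whatever embedding call is used and is hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items (itertools.batched needs Python 3.12+)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def embed_in_batches(texts: Iterable[str],
                     embed_batch: Callable[[List[str]], List[list]],
                     size: int = 16) -> Iterator[list]:
    """Embed texts one batch at a time, yielding vectors as they are produced."""
    for batch in batched(texts, size):
        yield from embed_batch(batch)
```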
+### Windows-Specific Issues
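+- Ensure your `llama-cpp-python` installation supports Windows: unless you install a prebuilt wheel, `pip` compiles it from source, which requires a C/C++ build toolchain such as Visual Studio Build Tools
+- Use PowerShell or Command Prompt for CLI commands
+- Check path separators in `config.yaml`; Python accepts forward slashes on Windows, so the default paths such as `./models/phi-2.Q4_0.gguf` work as written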
+## Development
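+To check that every core module imports cleanly:
+```bash
+python -c "import src.config_loader, src.embedder, src.retriever, src.generator, src.diff_analyzer; print('All modules imported successfully')"
+```
+To list the available commands and options: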
+```bash
+python cli.py --help
+```
+## License
+[Insert your license information here]
+## Contributing
+[Insert contribution guidelines here]