CodeMind
A CLI tool for intelligent document analysis and commit message generation using EmbeddingGemma-300m for embeddings, FAISS for vector storage, and Phi-2 for text generation.
Features
- Document Indexing: Embed and index documents for semantic search
- Semantic Search: Find relevant documents using natural language queries
- Smart Commit Messages: Generate meaningful commit messages from staged git changes
- RAG (Retrieval-Augmented Generation): Answer questions using indexed document context
Setup
Prerequisites
- Windows 11
- Conda environment
- Git
Installation
Create a Conda environment:
conda create -n codemind python=3.9
conda activate codemind
Clone the repository:
git clone https://github.com/devjas1/codemind.git
cd codemind
Install dependencies:
pip install -r requirements.txt
Download models:
Embedding Model (EmbeddingGemma-300m):
- Download from Hugging Face: google/embeddinggemma-300m
- Place it in the ./models/embeddinggemma-300m/ directory
Generation Model (Phi-2 GGUF):
- Download the quantized Phi-2 model: phi-2.Q4_0.gguf
- Download from: Microsoft Phi-2 GGUF or a similar quantized version
- Place it in the ./models/ directory
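If you prefer scripting the downloads, here is a minimal sketch using huggingface_hub. The Phi-2 repo ID below is one community mirror and an assumption (use whichever source you trust), and the Gemma repo may require accepting its license on Hugging Face first:

# Fetch both models with huggingface_hub (pip install huggingface_hub).
# Repo IDs are assumptions; point them at the copies you trust.
from huggingface_hub import snapshot_download, hf_hub_download

snapshot_download(repo_id="google/embeddinggemma-300m",
                  local_dir="./models/embeddinggemma-300m")
hf_hub_download(repo_id="TheBloke/phi-2-GGUF",
                filename="phi-2.Q4_0.gguf",
                local_dir="./models")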
Directory Structure
CodeMind/
├── cli.py                     # Main CLI entry point
├── config.yaml                # Configuration file
├── requirements.txt           # Python dependencies
├── models/                    # Model storage
│   ├── embeddinggemma-300m/   # Embedding model directory
│   └── phi-2.Q4_0.gguf        # Phi-2 quantized model file
├── src/                       # Core modules
│   ├── config_loader.py       # Configuration management
│   ├── embedder.py            # Document embedding
│   ├── retriever.py           # Semantic search
│   ├── generator.py           # Text generation
│   └── diff_analyzer.py       # Git diff analysis
├── docs/                      # Documentation
└── vector_cache/              # FAISS index storage (auto-created)
Usage
Initialize Document Index
Index documents from a directory for semantic search:
python cli.py init ./docs/
This will:
- Embed all documents in the specified directory
- Create a FAISS index in vector_cache/
- Save metadata for retrieval
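Under the hood this is a standard embed-and-index pass. A minimal sketch of the idea (illustrative names only, not the actual src/embedder.py API; the file names under vector_cache/ are assumptions):

# Illustrative indexing flow: embed documents, store vectors in FAISS.
from pathlib import Path
import pickle
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("./models/embeddinggemma-300m")
docs = [p.read_text(encoding="utf-8") for p in Path("./docs").rglob("*.md")]
# Normalized vectors make inner product equivalent to cosine similarity.
vectors = model.encode(docs, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)
Path("vector_cache").mkdir(exist_ok=True)
faiss.write_index(index, "vector_cache/index.faiss")
with open("vector_cache/meta.pkl", "wb") as f:
    pickle.dump(docs, f)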
Semantic Search
Search for relevant documents using natural language:
python cli.py search "how to configure the model"
Returns ranked results with similarity scores.
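A search amounts to embedding the query and probing the cached index. A sketch, assuming the file names from the indexing sketch above:

# Illustrative query against the FAISS index built by `init`.
import pickle
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("./models/embeddinggemma-300m")
index = faiss.read_index("vector_cache/index.faiss")
with open("vector_cache/meta.pkl", "rb") as f:
    docs = pickle.load(f)

query = model.encode(["how to configure the model"],
                     normalize_embeddings=True).astype("float32")
scores, ids = index.search(query, 5)  # 5 == retrieval.top_k in config.yaml
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {docs[i][:80]!r}")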
Ask Questions (RAG)
Get answers based on your indexed documents:
python cli.py ask "What are the configuration options?"
Uses retrieval-augmented generation to provide contextual answers.
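Conceptually, this stitches the retrieval step above into a Phi-2 prompt. A minimal sketch, assuming the cached files from the indexing sketch; the prompt format is an assumption, not the project's actual template:

# Illustrative RAG step: retrieve context, then generate with llama-cpp-python.
import pickle
import faiss
from llama_cpp import Llama
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("./models/embeddinggemma-300m")
index = faiss.read_index("vector_cache/index.faiss")
with open("vector_cache/meta.pkl", "rb") as f:
    docs = pickle.load(f)

question = "What are the configuration options?"
q = model.encode([question], normalize_embeddings=True).astype("float32")
scores, ids = index.search(q, 5)
context = "\n\n".join(docs[i] for i in ids[0] if i != -1)

llm = Llama(model_path="./models/phi-2.Q4_0.gguf", n_ctx=2048)
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(llm(prompt, max_tokens=512)["choices"][0]["text"].strip())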
Git Commit Message Generation
Generate intelligent commit messages from staged changes:
# Preview commit message without applying
python cli.py commit --preview
# Show staged files and analysis without generating message
python cli.py commit --dry-run
# Generate and apply commit message
python cli.py commit --apply
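The core idea is to feed the staged diff to Phi-2 with the style constraints from config.yaml. An illustrative reduction (the real logic lives in src/diff_analyzer.py and src/generator.py; the prompt wording is an assumption):

# Illustrative: turn `git diff --cached` into a commit-message prompt.
import subprocess
from llama_cpp import Llama

diff = subprocess.run(["git", "diff", "--cached"],
                      capture_output=True, text=True, check=True).stdout
llm = Llama(model_path="./models/phi-2.Q4_0.gguf", n_ctx=2048)
prompt = ("Write a conventional-style commit message in the imperative mood, "
          "with a subject line of at most 72 characters, for this diff:\n"
          f"{diff}\n\nCommit message:")
print(llm(prompt, max_tokens=128)["choices"][0]["text"].strip())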
Start API Server (Future Feature)
python cli.py serve --port 8000
Note: API server functionality is planned for future releases.
Configuration
Edit config.yaml to customize behavior:
embedding:
  model_path: "./models/embeddinggemma-300m"
  dim: 768
  truncate_to: 128

generator:
  model_path: "./models/phi-2.Q4_0.gguf"
  quantization: "Q4_0"
  max_tokens: 512
  n_ctx: 2048

retrieval:
  vector_store: "faiss"
  top_k: 5
  similarity_threshold: 0.75

commit:
  tone: "imperative"
  style: "conventional"
  max_length: 72

logging:
  verbose: true
  telemetry: false
Configuration Options
- embedding.model_path: Path to the EmbeddingGemma-300m model
- generator.model_path: Path to the Phi-2 GGUF model file
- retrieval.top_k: Number of documents to retrieve for context
- retrieval.similarity_threshold: Minimum similarity score for results
- generator.max_tokens: Maximum tokens for generation
- generator.n_ctx: Context window size for Phi-2
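Loading these settings is plain PyYAML. A minimal sketch (the project's actual loader is src/config_loader.py):

# Read config.yaml into a nested dict with PyYAML.
import yaml

with open("config.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

print(cfg["retrieval"]["top_k"])        # 5
print(cfg["generator"]["model_path"])   # ./models/phi-2.Q4_0.gguf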
Dependencies
sentence-transformers>=2.2.2
- Document embeddingfaiss-cpu>=1.7.4
- Vector similarity searchllama-cpp-python>=0.2.23
- Phi-2 model inference (Windows compatible)typer>=0.9.0
- CLI frameworkPyYAML>=6.0
- Configuration file parsing
Troubleshooting
Model Loading Issues
If you encounter model loading errors:
- Embedding Model: Ensure embeddinggemma-300m is a directory containing all model files
- Phi-2 Model: Ensure phi-2.Q4_0.gguf is a single GGUF file
- Paths: All paths in config.yaml should be relative to the project root
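A quick sanity check for both paths, mirroring the defaults in config.yaml:

# Verify both model paths before launching the CLI.
from pathlib import Path

assert Path("./models/embeddinggemma-300m").is_dir(), "embedding model dir missing"
assert Path("./models/phi-2.Q4_0.gguf").is_file(), "Phi-2 GGUF file missing"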
Memory Issues
For systems with limited RAM:
- Use Q4_0 quantization for Phi-2 (already configured)
- Reduce n_ctx in config.yaml if needed
- Process documents in smaller batches (see the sketch below)
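Batching keeps peak memory bounded during embedding. A one-line illustration using sentence-transformers' batch_size parameter, with model and docs as in the indexing sketch above:

# Encode 16 documents at a time instead of all at once; tune batch_size to your RAM.
vectors = model.encode(docs, batch_size=16, normalize_embeddings=True)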
Windows-Specific Issues
- Ensure your llama-cpp-python version supports Windows
- Use PowerShell or Command Prompt for CLI commands
- Check file path separators in configuration
Development
To test the modules:
python -c "from src import *; print('All modules imported successfully')"
To run in development mode:
python cli.py --help
License
[Insert your license information here]
Contributing
[Insert contribution guidelines here]