---
base_model: unsloth/gemma-2b-it
library_name: peft
tags:
- text-to-mongodb
- LoRA
- instruction-tuning
- mongodb
- gemma
license: gemma
language:
- en
---

# 🧠 Gemma 2B - MongoDB Query Generator (LoRA)

This is a LoRA fine-tuned version of `unsloth/gemma-2b-it` that converts natural language instructions into **MongoDB query strings** like:

```js
db.users.find({ "isActive": true, "age": { "$gt": 30 } })
```

The model is instruction-tuned for a text-to-query use case across typical MongoDB collections such as `users`, `orders`, and `products`.

---

## ✨ Model Details

- **Base model**: [`unsloth/gemma-2b-it`](https://huggingface.co/unsloth/gemma-2b-it)
- **Fine-tuned with**: LoRA (on a 4-bit quantized base)
- **Framework**: [Unsloth](https://github.com/unslothai/unsloth) + PEFT
- **Dataset**: Synthetic instructions paired with MongoDB queries (300+ examples)
- **Use case**: Text-to-MongoDB query generation

---

## 📦 How to Use

```python
import torch
from unsloth import FastLanguageModel
from peft import PeftModel

# Load the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2b-it",
    max_seq_length = 1024,
    dtype = torch.float16,
    load_in_4bit = True,
)

# Attach the trained LoRA adapter from this repository
model = PeftModel.from_pretrained(model, "kihyun1998/gemma-2b-it-mongodb-lora")
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

prompt = """### Instruction:
Convert to MongoDB query string.

### Input:
Collection: users
Fields:
- name (string)
- age (int)
- isActive (boolean)
- country (string)

Question: Show all active users from Korea older than 30.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If you need to assemble prompts like this from collection metadata, see the prompt-construction sketch at the end of this card.

---

## 💡 Example Output

```js
db.users.find({ "isActive": true, "country": "Korea", "age": { "$gt": 30 } })
```

---

## 📚 Intended Use

- Converting business-friendly questions into executable MongoDB queries
- Powering internal dashboards, query builders, or no-code tools (see the post-processing sketch at the end of this card)
- Works best on structured fields and simple query logic

### Out of scope

- Complex joins or aggregation pipelines
- Nested or dynamic schema reasoning

---

## 📊 Training Details

- LoRA rank: 16 (alpha 32, dropout 0.05 on the attention projections `q_proj`, `k_proj`, `v_proj`, `o_proj`)
- Epochs: 3
- Dataset: 300+ synthetic natural language → MongoDB query pairs
- Training hardware: Google Colab (T4 GPU)

An approximate reproduction script is sketched at the end of this card.

---

## 🚧 Limitations

- Assumes the collection and its fields are supplied in the prompt (RAG-style schema context is required)
- May hallucinate field names not present in the provided context
- Limited handling of advanced MongoDB features such as aggregation pipeline stages (`$lookup`, `$group`, etc.)

---

## 🧾 License

The base model is released under the [Gemma license](https://ai.google.dev/gemma#license). This LoRA adapter inherits the same conditions.

---

## 🧑‍💻 Author

- 🐱 [@kihyun1998](https://huggingface.co/kihyun1998)
- 💬 Questions? Open an issue or contact via Hugging Face.

---

## 🏁 Citation

```bibtex
@misc{kihyun2025mongodb,
  title={Gemma 2B MongoDB Query Generator (LoRA)},
  author={Kihyun Park},
  year={2025},
  howpublished={\url{https://huggingface.co/kihyun1998/gemma-2b-it-mongodb-lora}}
}
```
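
---

## 🧩 Prompt Construction (sketch)

Since the model expects the schema inside the prompt (see Limitations), callers typically assemble the prompt from collection metadata. The helper below is a minimal illustrative sketch, not part of the released code; `build_prompt` and its argument names are hypothetical. It reproduces the exact template shown in "How to Use".

```python
# Illustrative helper (not part of this repository): formats a collection
# schema plus a user question into this card's prompt template.
def build_prompt(collection: str, fields: dict, question: str) -> str:
    field_lines = "\n".join(f"- {name} ({ftype})" for name, ftype in fields.items())
    return (
        "### Instruction:\n"
        "Convert to MongoDB query string.\n\n"
        "### Input:\n"
        f"Collection: {collection}\n"
        "Fields:\n"
        f"{field_lines}\n\n"
        f"Question: {question}\n\n"
        "### Response:\n"
    )

prompt = build_prompt(
    "users",
    {"name": "string", "age": "int", "isActive": "boolean", "country": "string"},
    "Show all active users from Korea older than 30.",
)
```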
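
---

## 🔎 Post-processing Output (sketch)

`tokenizer.decode` returns the prompt plus the completion, so downstream tools (dashboards, query builders) usually want only the query string. One possible extraction is sketched below, assuming the `db.<collection>.find(...)` shape this card targets; `extract_query` is a hypothetical helper. Treat the result as untrusted input and never run it against a live database without validation.

```python
import re
from typing import Optional

def extract_query(decoded: str) -> Optional[str]:
    """Illustrative: pull the query string out of a decoded generation.

    Takes everything after the last '### Response:' marker, then narrows to
    the first db.<collection>.find(...) call if one is present.
    """
    response = decoded.split("### Response:")[-1].strip()
    match = re.search(r"db\.\w+\.find\(.*\)", response, re.DOTALL)
    return match.group(0) if match else (response or None)

# Example:
print(extract_query('### Response:\ndb.users.find({ "age": { "$gt": 30 } })'))
```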
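
---

## 🏋️ Reproducing the Fine-tune (sketch)

The original training script is not published, so the following is an approximation assembled from the hyperparameters above (LoRA rank 16, alpha 32, dropout 0.05, 3 epochs, 4-bit base on a T4). The learning rate, batch sizes, and the inline one-example dataset are assumptions, and the `SFTTrainer` keyword arguments follow the older `trl` signature used in many Unsloth notebooks; newer `trl` versions move `dataset_text_field` and `max_seq_length` into `SFTConfig`.

```python
import torch
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2b-it",
    max_seq_length=1024,
    dtype=torch.float16,
    load_in_4bit=True,
)

# Fresh LoRA weights with the configuration this card reports.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)

# Stand-in for the unpublished ~300-pair synthetic training set:
# each row is one full prompt + response string.
dataset = Dataset.from_list([{
    "text": (
        "### Instruction:\nConvert to MongoDB query string.\n\n"
        "### Input:\nCollection: users\nFields:\n- age (int)\n\n"
        "Question: Users older than 30.\n\n"
        '### Response:\ndb.users.find({ "age": { "$gt": 30 } })'
    )
}])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed; fits a 16 GB T4
        gradient_accumulation_steps=4,   # assumed
        num_train_epochs=3,
        learning_rate=2e-4,              # assumed; a common LoRA default
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```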