Model Card for askmydocs-lora-v1
This model card provides detailed information about askmydocs-lora-v1, a fine-tuned conversational AI model.
Model Details
Model Description
askmydocs-lora-v1 is a lightweight, efficient instruction-tuned conversational AI model derived from Hermes-2-Pro-Mistral-7B and optimized with Low-Rank Adaptation (LoRA). It was fine-tuned on a curated 10,000-sample subset of the yahma/alpaca-cleaned dataset to improve performance on retrieval-oriented and conversational interactions.
Developed by: deanngkl
Model Type: Instruction-tuned conversational AI (LLM)
Languages: English (primarily)
License: Apache-2.0
Fine-tuned from model: Hermes-2-Pro-Mistral-7B
Model Sources
- Repository: https://huggingface.co/deanngkl/askmydocs-lora-v1
Uses
Direct Use
Conversational AI for general queries
Retrieval-Augmented Generation (RAG) tasks (see the prompt-assembly sketch after this list)
Document summarization and information extraction
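Since RAG is listed as a direct use, the sketch below shows one way retrieved passages could be assembled into a prompt for this model. The Alpaca-style instruction/input/response template is an assumption based on the yahma/alpaca-cleaned training data, not a format confirmed by this card; adjust it to whatever template the adapter was actually trained with.

```python
# Minimal RAG prompt-assembly sketch (the Alpaca-style template is an assumption).
def build_rag_prompt(question: str, retrieved_passages: list[str]) -> str:
    context = "\n\n".join(retrieved_passages)
    return (
        "Below is an instruction that describes a task, paired with an input that provides "
        "further context. Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{question}\n\n"
        f"### Input:\n{context}\n\n"
        "### Response:\n"
    )

prompt = build_rag_prompt(
    "Summarize the refund policy.",
    ["Refunds are issued within 14 days of purchase.", "Digital goods are non-refundable."],
)
```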
Downstream Use
Integration into conversational AI platforms
Customized document analysis systems
Enhanced customer support solutions
Out-of-Scope Use
Critical decision-making in healthcare, finance, or legal matters without thorough human review
Applications in languages other than English
Bias, Risks, and Limitations
May reflect biases present in training data (yahma/alpaca-cleaned)
Limited effectiveness in domains outside the training scope or highly specialized subjects
Recommendations
Users should carefully assess the model outputs for bias and accuracy, especially when deploying in sensitive contexts.
External validation is recommended for critical applications.
How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

tokenizer = AutoTokenizer.from_pretrained("deanngkl/askmydocs-lora-v1")

# 4-bit loading requires bitsandbytes; if the repository contains only LoRA adapter
# weights, the peft package must also be installed so the base model can be resolved.
model = AutoModelForCausalLM.from_pretrained(
    "deanngkl/askmydocs-lora-v1",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

chat = pipeline("text-generation", model=model, tokenizer=tokenizer)
response = chat("Document content here\n\nQ: Summarize the document.", max_new_tokens=256)
print(response[0]["generated_text"])
```
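The snippet above relies on the Transformers PEFT integration: if the repository contains only LoRA adapter weights (as the ~100 MB checkpoint size suggests), loading it with AutoModelForCausalLM requires the peft package, which fetches the base model and attaches the adapter automatically. Alternatively, the adapter can be attached explicitly with peft; the base repository path used below (NousResearch/Hermes-2-Pro-Mistral-7B) is an assumption, since the card names the base model but not its exact Hub path.

```python
# Sketch: explicitly attaching the LoRA adapter to the base model with peft.
# The base repository path is an assumption; the card only names the base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Hermes-2-Pro-Mistral-7B",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "deanngkl/askmydocs-lora-v1")
model = model.merge_and_unload()  # optional: merge adapter weights for faster inference

tokenizer = AutoTokenizer.from_pretrained("deanngkl/askmydocs-lora-v1")
```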
Training Details
Training Data
Dataset: yahma/alpaca-cleaned (10,000 samples)
Preprocessing: Prompt standardization, deduplication, and profanity/bias filtering (an illustrative sketch follows below)
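The exact preprocessing pipeline is not published; the sketch below only illustrates how prompt standardization, deduplication, and subset selection might look for yahma/alpaca-cleaned records, which carry instruction, input, and output fields. The profanity and bias filtering step is omitted, since the card does not describe the filters used.

```python
# Illustrative preprocessing sketch; not the actual pipeline used for this model.
from datasets import load_dataset

dataset = load_dataset("yahma/alpaca-cleaned", split="train")

def to_prompt(example):
    # Standardize records into an Alpaca-style prompt string.
    prompt = f"### Instruction:\n{example['instruction']}\n\n"
    if example["input"]:
        prompt += f"### Input:\n{example['input']}\n\n"
    prompt += f"### Response:\n{example['output']}"
    return {"text": prompt}

dataset = dataset.map(to_prompt)

# Exact-match deduplication on the formatted prompt.
seen = set()
dataset = dataset.filter(lambda ex: not (ex["text"] in seen or seen.add(ex["text"])))

# The card states a curated 10,000-sample subset was used.
dataset = dataset.shuffle(seed=42).select(range(10_000))
```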
Training Procedure
Method: LoRA (Low-Rank Adaptation)
Epochs: 3
Batch Size: 4 (gradient accumulation steps: 4)
Learning Rate: 1e-4
Optimizer: AdamW with a cosine learning-rate schedule and warm-up
Precision: Mixed (fp16)
Hardware: RunPod Cloud with an NVIDIA RTX A5000 GPU (24 GB VRAM); a configuration sketch reflecting these settings follows below
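The hyperparameters above translate roughly into the configuration sketch below, written against peft and transformers. The LoRA rank, alpha, dropout, target modules, and warm-up fraction are not stated in this card and are common defaults used purely for illustration; the base model may also have been loaded in 4-bit via BitsAndBytes (listed under Software), which is omitted here for brevity.

```python
# Configuration sketch matching the listed hyperparameters; LoRA rank/alpha/dropout,
# target modules, and warm-up fraction are assumptions, not values from the card.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

base = AutoModelForCausalLM.from_pretrained("NousResearch/Hermes-2-Pro-Mistral-7B")

lora_config = LoraConfig(
    r=16,                                                     # assumed rank
    lora_alpha=32,                                            # assumed scaling
    lora_dropout=0.05,                                        # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

training_args = TrainingArguments(
    output_dir="askmydocs-lora-v1",
    num_train_epochs=3,                # Epochs: 3
    per_device_train_batch_size=4,     # Batch size: 4
    gradient_accumulation_steps=4,     # effective batch size 16
    learning_rate=1e-4,                # Learning rate: 1e-4
    lr_scheduler_type="cosine",        # cosine decay
    warmup_ratio=0.03,                 # warm-up fraction is an assumption
    optim="adamw_torch",               # AdamW
    fp16=True,                         # mixed precision
)
```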
Speeds, Sizes, Times
Checkpoint Size: ~100 MB (LoRA adapters)
Training Duration: Approximately 3 hours
Evaluation
Testing Data, Factors & Metrics
Testing Data: Held-out validation split (5% of the training data)
Metrics: Validation loss reduction, response coherence, and instruction-following accuracy (a sketch of the loss computation follows below)
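The evaluation scripts are not published; the sketch below shows one straightforward way the reported validation loss could be computed on the held-out split (coherence and instruction-following were presumably judged qualitatively).

```python
# Sketch: mean causal-LM loss (and perplexity) over held-out validation texts.
import math
import torch

def validation_loss(model, tokenizer, texts, max_length=1024):
    model.eval()
    losses = []
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
        enc = {k: v.to(model.device) for k, v in enc.items()}
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    mean_loss = sum(losses) / len(losses)
    return mean_loss, math.exp(mean_loss)
```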
Results
Validation Loss: Decreased consistently, indicating stable training
Instruction-following: Improved coherence and context-awareness
Environmental Impact
Carbon emissions were minimized by using efficient LoRA fine-tuning on cloud infrastructure:
Hardware Type: NVIDIA RTX A5000
Cloud Provider: RunPod
Compute Region: US (West Coast)
Estimated Carbon Emissions: Low, given the short single-GPU training run (a rough estimate follows below)
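As a rough back-of-envelope check on that claim: the RTX A5000 has a board power of about 230 W, so a roughly 3-hour run draws under 1 kWh. The grid-intensity figure below (~0.4 kg CO2e/kWh) is an assumed US-average value, not a measured number for the actual compute region.

```python
# Back-of-envelope emissions estimate; grid intensity is an assumption.
gpu_power_kw = 0.230     # RTX A5000 board power (~230 W)
training_hours = 3       # approximate training duration from this card
grid_intensity = 0.4     # assumed kg CO2e per kWh (rough US-average figure)

energy_kwh = gpu_power_kw * training_hours
emissions_kg = energy_kwh * grid_intensity
print(f"~{energy_kwh:.2f} kWh, ~{emissions_kg:.2f} kg CO2e")  # ~0.69 kWh, ~0.28 kg CO2e
```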
Technical Specifications
Model Architecture and Objective
Architecture: Hermes-2-Pro-Mistral-7B with LoRA adapters
Objective: Enhanced conversational abilities for retrieval and instructional tasks
Compute Infrastructure
Hardware
Hardware: NVIDIA RTX A5000 (24 GB VRAM)
Software
Software: Hugging Face Transformers, PyTorch, BitsAndBytes
Citation
@misc{deanngkl_askmydocs_lora_v1_2025,
title = {askmydocs-lora-v1: Instruction-tuned Hermes-2-Pro-Mistral-7B via LoRA},
author = {deanngkl},
year = {2025},
howpublished = {\url{https://huggingface.co/deanngkl/askmydocs-lora-v1}}
}
Model Card Authors
Dean Ng Kwan Lung
Model Card Contact
Blog: Portfolio
LinkedIn: LinkedIn
GitHub: GitHub
Email: [email protected]