Cannabis Science QA Model
A specialized question-answering model fine-tuned on cannabis science literature. This model is based on IBM's Granite 3.0-8B Instruct and has been fine-tuned using QLoRA on 161,170 cannabis science Q&A pairs derived from peer-reviewed research papers.
Model Details
Model Description
This model specializes in answering scientific questions about cannabis, including topics in chemistry, biology, pharmacology, extraction methods, cultivation, and medical applications. It has been trained to provide accurate, research-backed responses based on peer-reviewed cannabis science literature.
- Developed by: Kellan Finney
- Funded by: Eighth Revolution
- Model type: Causal Language Model (Fine-tuned with QLoRA)
- Language(s): English
- License: Apache 2.0
- Base model: ibm-granite/granite-3.0-8b-instruct
- Training method: QLoRA (Quantized Low-Rank Adaptation)
Model Sources
- Repository: https://github.com/KellanFinney/Canna_LoRA
- Dataset: Cannabis Science QA Dataset
- Base Model: IBM Granite 3.0-8B Instruct
Uses
Direct Use
- Scientific Q&A: Answer questions about cannabis chemistry, biology, and pharmacology
- Research assistance: Help researchers understand cannabis science concepts
- Educational support: Provide explanations for cannabis science topics
- Literature synthesis: Summarize complex cannabis research findings
Downstream Use
- Educational chatbots for cannabis science courses
- Research tools for cannabis industry professionals
- Content generation for scientific cannabis publications
- Knowledge extraction from cannabis research literature
Out-of-Scope Use
- Medical diagnosis or treatment advice - This model is for research/educational purposes only
- Legal advice regarding cannabis regulations or compliance
- Commercial product claims without proper validation
- Replacement for professional medical consultation
Bias, Risks, and Limitations
Limitations
- Research scope bias: Reflects the focus areas of available cannabis research literature
- Publication bias: May favor well-studied aspects of cannabis science
- Geographic bias: Primarily based on Western/English-language research
- Temporal bias: Weighted toward more recent research (2010-2024)
- Generated content: May occasionally produce plausible but incorrect information
Risks
- Not for medical use: Should not be used for medical decision-making
- Fact-checking required: Important claims should be verified against original sources
- Regulatory compliance: Does not provide legal or regulatory guidance
- Professional consultation: Medical and legal decisions require professional expertise
Recommendations
Users should:
- Verify critical information against peer-reviewed sources
- Use for research and educational purposes only
- Consult healthcare professionals for medical questions
- Check current regulations for legal compliance
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model and tokenizer
base_model_id = "ibm-granite/granite-3.0-8b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the fine-tuned adapter and switch to inference mode
model = PeftModel.from_pretrained(model, "KellanF89/cannabis-science-qa-model")
model.eval()

# Format your question
question = "What are the main cannabinoids found in cannabis and their effects?"
prompt = f"Question: {question}\nAnswer:"

# Generate response (inputs must be on the same device as the model)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# The decoded text contains the prompt followed by the generated answer
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
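Because the decoded output echoes the prompt before the completion, it can be convenient to separate the two. The following is a minimal sketch that assumes the `Question: ...\nAnswer:` format used in the example; the helper names `build_prompt` and `extract_answer` are hypothetical and not part of the model repository, and the `decoded` string is a hand-written placeholder rather than real model output.

```python
def build_prompt(question: str) -> str:
    """Format a question with the 'Question: ... Answer:' template from the example."""
    return f"Question: {question.strip()}\nAnswer:"


def extract_answer(decoded: str, prompt: str) -> str:
    """Return only the generated answer from decoded text that echoes the prompt."""
    if decoded.startswith(prompt):
        answer = decoded[len(prompt):]
    else:
        # Fall back to the last answer marker if the echo is not exact
        # (for example, if whitespace changed during tokenization).
        answer = decoded.rsplit("Answer:", 1)[-1]
    # Assumption: if the model starts writing a follow-up question, cut it off there.
    answer = answer.split("\nQuestion:", 1)[0]
    return answer.strip()


prompt = build_prompt("What are terpenes?")
# Placeholder standing in for tokenizer.decode(...) output.
decoded = prompt + " Example answer text.\nQuestion: Another question?"
print(extract_answer(decoded, prompt))
```

In the example above, `extract_answer(response, prompt)` could be used in place of printing `response` directly.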
Training Details
Training Data
The model was fine-tuned on the Cannabis Science QA Dataset, which contains:
- 161,170 question-answer pairs derived from cannabis science literature
- 400+ source documents including peer-reviewed research papers and textbooks
- Comprehensive coverage of cannabis chemistry, biology, pharmacology, and applications
- Q&A pairs generated with GPT-4o-mini using specialized prompting
Training Procedure
Training Hyperparameters
- Training method: QLoRA (Quantized Low-Rank Adaptation)
- Base model quantization: 4-bit (NF4)
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.1
- Learning rate: 2e-4
- Batch size: 4 with 4 gradient accumulation steps (effective batch size: 16)
- Epochs: 3
- Total training steps: 30,222
- Optimizer: AdamW
- LR scheduler: Cosine with warmup
- Mixed precision: bf16
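The reported total of 30,222 training steps can be reproduced from the other figures in this card. The sketch below is one way to arrive at it, assuming a single GPU and that all 161,170 pairs were used for training with no held-out split; neither assumption is stated in the card.

```python
import math

num_examples = 161_170   # Q&A pairs reported under Training Data
batch_size = 4           # "Batch size: 4"
grad_accum_steps = 4     # "gradient accumulation steps: 4"
epochs = 3               # "Epochs: 3"
num_devices = 1          # assumption: single GPU (not stated in the card)

# One optimizer step consumes this many examples.
effective_batch = batch_size * grad_accum_steps * num_devices

# The last, partially filled batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch)   # 16
print(steps_per_epoch)   # 10074
print(total_steps)       # 30222
```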
Training Infrastructure
- Hardware: High-end GPU with 24GB+ VRAM
- Frameworks: Transformers, PEFT, bitsandbytes (configured through BitsAndBytesConfig)
- Training time: Approximately 24-48 hours
- Context length: 8,192 tokens during training
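For readers who want to set up a comparable run, the listed hyperparameters map onto `BitsAndBytesConfig` and `LoraConfig` roughly as sketched below. This is not the author's training script: the `target_modules` list and `bias` setting are assumptions (the card does not state them), and `QUANT_KWARGS`, `LORA_KWARGS`, and `build_configs` are hypothetical names used only for illustration.

```python
# Keyword arguments mirroring the hyperparameters listed in this card.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",           # "Base model quantization: 4-bit (NF4)"
    "bnb_4bit_compute_dtype": "bfloat16",   # matches "Mixed precision: bf16"
}

LORA_KWARGS = {
    "r": 16,                # "LoRA rank: 16"
    "lora_alpha": 32,       # "LoRA alpha: 32"
    "lora_dropout": 0.1,    # "LoRA dropout: 0.1"
    "bias": "none",         # assumption: not stated in the card
    "task_type": "CAUSAL_LM",
    # Assumption: attention projections only; the card does not list target modules.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def build_configs():
    """Build the config objects; imports are deferred so the dicts
    above can be inspected without the libraries installed."""
    from transformers import BitsAndBytesConfig
    from peft import LoraConfig

    return BitsAndBytesConfig(**QUANT_KWARGS), LoraConfig(**LORA_KWARGS)


# Standard LoRA scales the adapter update by lora_alpha / r.
print(LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"])  # 2.0
```

The quantization config would typically be passed as `quantization_config=` to `AutoModelForCausalLM.from_pretrained`, and the LoRA config applied with `peft.get_peft_model` after `peft.prepare_model_for_kbit_training`.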
Evaluation
Benchmark Results
The model was evaluated on standard academic benchmarks to assess both specialized and general performance:
Benchmark | Score | Performance Level |
---|---|---|
SciQ (Science Q&A) | 83.6% | Excellent |
ARC-Easy (Grade School Science) | 64.5% | Good |
ARC-Challenge (Advanced Science) | 36.7% | Below Average |
MMLU Overall | 36.4% | Below Average |
MMLU Breakdown:
- STEM Subjects: 33.8%
- Social Sciences: 39.7%
- Other (Medical/Applied): 40.2%
- Humanities: 33.4%
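The overall MMLU score is not the plain mean of the four category scores, because the categories contain different numbers of questions. The sketch below illustrates this. The per-category question counts are an assumption: they follow the grouping commonly used by lm-evaluation-harness for the 14,042 MMLU test questions, and the card does not say which harness or grouping produced its numbers.

```python
# Category scores as reported in this card (percent).
scores = {"STEM": 33.8, "Social Sciences": 39.7, "Other": 40.2, "Humanities": 33.4}

# Assumption: question counts per category (lm-evaluation-harness grouping).
counts = {"STEM": 3153, "Social Sciences": 3077, "Other": 3107, "Humanities": 4705}

# Unweighted mean treats every category equally.
macro_avg = sum(scores.values()) / len(scores)

# Weighted mean treats every question equally.
weighted_avg = sum(scores[k] * counts[k] for k in scores) / sum(counts.values())

print(round(macro_avg, 1))     # 36.8
print(round(weighted_avg, 1))  # 36.4
```

Under these assumed counts, the question-weighted average agrees with the reported overall score of 36.4%.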
Performance Analysis
Strengths:
- Exceptional science Q&A performance (83.6% on SciQ vs ~60-70% for general models)
- Strong domain specialization - excels in cannabis and general science topics
- Effective knowledge transfer from training dataset to science applications
Trade-offs:
- Specialized vs. general knowledge - Lower scores on broad academic benchmarks
- Domain focus - Optimized for science Q&A rather than general reasoning
- Expected specialization effect - This performance profile indicates successful fine-tuning for domain expertise
Comparison Context:
While general 8B models (Llama 3.1, Qwen2.5) typically score 65-72% on MMLU, this model sacrifices general knowledge for superior science Q&A performance. This trade-off is intentional and beneficial for cannabis science applications.
Evaluation Approach
- Standard benchmarks: SciQ, ARC, MMLU for quantitative assessment
- Domain expert review: Manual evaluation by cannabis science professionals
- Factual accuracy: Cross-reference with source literature
- Response coherence: Assessment of answer quality and relevance
- Safety evaluation: Ensuring appropriate disclaimers for medical topics
Technical Specifications
Model Architecture
- Base Architecture: Granite 3.0-8B Instruct (Transformer decoder)
- Parameter count: ~8 billion parameters (base model)
- LoRA parameters: ~67 million trainable parameters
- Quantization: 4-bit NF4 quantization for efficiency
- Context window: 8,192 tokens
Compute Infrastructure
- Training hardware: High-end GPU with 24GB+ VRAM
- Inference requirements: 12-16GB VRAM recommended
- Optimization: QLoRA enables training on consumer hardware
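As a rough guide to these requirements, weight memory can be estimated from the parameter counts above. This is a back-of-the-envelope sketch only: it counts weights alone and ignores activations, the KV cache, quantization constants, and optimizer state, all of which add to real usage. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


base_params = 8e9    # "~8 billion parameters (base model)"
lora_params = 67e6   # "~67 million trainable parameters"

print(f"bf16 base weights:  {weight_memory_gib(base_params, 16):.1f} GiB")
print(f"NF4 base weights:   {weight_memory_gib(base_params, 4):.1f} GiB")
print(f"bf16 LoRA adapter:  {weight_memory_gib(lora_params, 16):.2f} GiB")
print(f"trainable fraction: {lora_params / base_params:.2%}")
```

Under these assumptions, loading the base model in bf16 needs roughly 15 GiB for weights alone, while 4-bit NF4 needs under 4 GiB, which is why QLoRA leaves room for training on a 24GB card.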
Citation
If you use this model in your research, please cite:
@misc{finney2025cannabis_model,
title={Cannabis Science QA Model: QLoRA Fine-tuned Granite for Cannabis Research},
author={Kellan Finney},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/KellanF89/cannabis-science-qa-model}
}
Model Card Authors
Kellan Finney - Model development, training, and evaluation
Model Card Contact
For questions, feedback, or collaboration opportunities, please reach out via LinkedIn.
This model is designed to advance cannabis science research and education. Always consult qualified professionals for medical, legal, or regulatory decisions.