Cannabis Science QA Model

A specialized question-answering model fine-tuned on cannabis science literature. This model is based on IBM's Granite 3.0-8B Instruct and has been fine-tuned using QLoRA on 161,170 cannabis science Q&A pairs derived from peer-reviewed research papers and textbooks.

Model Details

Model Description

This model specializes in answering scientific questions about cannabis, including topics in chemistry, biology, pharmacology, extraction methods, cultivation, and medical applications. It has been trained to provide accurate, research-backed responses based on peer-reviewed cannabis science literature.

Developed by: Kellan Finney
Funded by: Eighth Revolution
Model type: Causal Language Model (Fine-tuned with QLoRA)
Language(s): English
License: Apache 2.0
Base model: ibm-granite/granite-3.0-8b-instruct
Training method: QLoRA (Quantized Low-Rank Adaptation)

Model Sources

Repository: https://github.com/KellanFinney/Canna_LoRA
Dataset: Cannabis Science QA Dataset
Base Model: IBM Granite 3.0-8B Instruct

Uses

Direct Use

Scientific Q&A: Answer questions about cannabis chemistry, biology, and pharmacology
Research assistance: Help researchers understand cannabis science concepts
Educational support: Provide explanations for cannabis science topics
Literature synthesis: Summarize complex cannabis research findings

Downstream Use

Educational chatbots for cannabis science courses
Research tools for cannabis industry professionals
Content generation for scientific cannabis publications
Knowledge extraction from cannabis research literature

Out-of-Scope Use

Medical diagnosis or treatment advice - This model is for research/educational purposes only
Legal advice regarding cannabis regulations or compliance
Commercial product claims without proper validation
Replacement for professional medical consultation

Bias, Risks, and Limitations

Limitations

Research scope bias: Reflects the focus areas of available cannabis research literature
Publication bias: May favor well-studied aspects of cannabis science
Geographic bias: Primarily based on Western/English-language research
Temporal bias: Weighted toward more recent research (2010-2024)
Generated content: May occasionally produce plausible but incorrect information

Risks

Not for medical use: Should not be used for medical decision-making
Fact-checking required: Important claims should be verified against original sources
Regulatory compliance: Does not provide legal or regulatory guidance
Professional consultation: Medical and legal decisions require professional expertise

Recommendations

Users should:

Verify critical information against peer-reviewed sources
Use for research and educational purposes only
Consult healthcare professionals for medical questions
Check current regulations for legal compliance

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model and tokenizer
base_model_id = "ibm-granite/granite-3.0-8b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the fine-tuned adapter
model = PeftModel.from_pretrained(model, "KellanF89/cannabis-science-qa-model")

# Format your question
question = "What are the main cannabinoids found in cannabis and their effects?"
prompt = f"Question: {question}\nAnswer:"

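# Generate response (move the inputs to the model's device first,
# otherwise generate() fails when the model sits on a GPU)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)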
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
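Because `generate` returns the prompt tokens followed by the new tokens, the decoded `response` above contains the question as well as the answer. A minimal sketch of stripping the prompt is shown below. The helper name `extract_answer` is hypothetical, and the `decoded` string is a hand-written stand-in, not actual model output.

```python
def extract_answer(decoded: str, prompt: str) -> str:
    """Return only the generated answer from a decoded generation."""
    if decoded.startswith(prompt):
        # Usual case: the decoded text begins with the exact prompt.
        answer = decoded[len(prompt):]
    else:
        # Fallback: keep whatever follows the last "Answer:" marker.
        answer = decoded.rsplit("Answer:", 1)[-1]
    return answer.strip()


prompt = "Question: What is CBD?\nAnswer:"
# Hand-written stand-in for tokenizer.decode(outputs[0], ...)
decoded = prompt + " CBD (cannabidiol) is a non-intoxicating cannabinoid."
print(extract_answer(decoded, prompt))
```

An alternative that avoids string matching is to slice off the prompt tokens before decoding: `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.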

Training Details

Training Data

The model was fine-tuned on the Cannabis Science QA Dataset, which contains:

161,170 question-answer pairs derived from cannabis science literature
400+ source documents including peer-reviewed research papers and textbooks
Comprehensive coverage of cannabis chemistry, biology, pharmacology, and applications
Q&A pairs generated from the source documents with GPT-4o-mini using specialized prompting

Training Procedure

Training Hyperparameters

Training method: QLoRA (Quantized Low-Rank Adaptation)
Base model quantization: 4-bit (NF4)
LoRA rank: 16
LoRA alpha: 32
LoRA dropout: 0.1
Learning rate: 2e-4
Batch size: 4 (gradient accumulation steps: 4; effective batch size: 16)
Epochs: 3
Total training steps: 30,222
Optimizer: AdamW
LR scheduler: Cosine with warmup
Mixed precision: bf16

Training Infrastructure

Hardware: GPU with 24GB+ VRAM
Frameworks: Transformers, PEFT, bitsandbytes (configured via BitsAndBytesConfig)
Training time: Approximately 24-48 hours
Context length: 8,192 tokens during training
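As a rough illustration of how the listed hyperparameters map onto the Transformers and PEFT APIs, the sketch below builds a 4-bit NF4 quantization config and a LoRA config. It is not the author's training script: `target_modules` and `bias` are not stated in this card and are assumptions, and `build_qlora_configs` is a hypothetical helper name.

```python
# Values copied from the Training Hyperparameters list.
HPARAMS = {
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "quant_type": "nf4",
}


def build_qlora_configs():
    """Return (quantization config, LoRA config) for QLoRA fine-tuning.

    Imports are done lazily so the hyperparameters above can be inspected
    without torch, transformers, or peft installed.
    """
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type=HPARAMS["quant_type"],
        bnb_4bit_compute_dtype=torch.bfloat16,  # matches "Mixed precision: bf16"
    )
    lora_config = LoraConfig(
        r=HPARAMS["lora_r"],
        lora_alpha=HPARAMS["lora_alpha"],
        lora_dropout=HPARAMS["lora_dropout"],
        target_modules="all-linear",  # ASSUMPTION: not stated in this card
        bias="none",                  # ASSUMPTION: not stated in this card
        task_type="CAUSAL_LM",
    )
    return bnb_config, lora_config
```

In a typical QLoRA setup, `bnb_config` is passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, and the loaded model is then wrapped with `peft.prepare_model_for_kbit_training` and `peft.get_peft_model(model, lora_config)`.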

Evaluation

Benchmark Results

The model was evaluated on standard academic benchmarks to assess both specialized and general performance:

Benchmark	Score	Performance Level
SciQ (Science Q&A)	83.6%	Excellent ⭐
ARC-Easy (Grade School Science)	64.5%	Good ✅
ARC-Challenge (Advanced Science)	36.7%	Below Average ⚠️
MMLU Overall	36.4%	Below Average ⚠️

MMLU Breakdown:

STEM Subjects: 33.8%
Social Sciences: 39.7%
Other (Medical/Applied): 40.2%
Humanities: 33.4%
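The four category scores do not average to the overall figure exactly, as a quick check shows. This is only arithmetic on the numbers reported above; how the overall score was aggregated is not stated in this card.

```python
mmlu_breakdown = {
    "STEM": 33.8,
    "Social Sciences": 39.7,
    "Other (Medical/Applied)": 40.2,
    "Humanities": 33.4,
}

# Unweighted (macro) mean across the four categories.
macro_avg = sum(mmlu_breakdown.values()) / len(mmlu_breakdown)
print(f"Unweighted mean of categories: {macro_avg:.1f}%")  # 36.8%
```

The reported overall score (36.4%) is slightly lower than this unweighted mean. That would be consistent with an overall score computed across all questions, where categories with more questions carry more weight, but the per-category question counts are not given here.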

Performance Analysis

Strengths:

Exceptional science Q&A performance (83.6% on SciQ vs ~60-70% for general models)
Strong domain specialization - excels in cannabis and general science topics
Effective knowledge transfer from training dataset to science applications

Trade-offs:

Specialized vs. general knowledge - Lower scores on broad academic benchmarks
Domain focus - Optimized for science Q&A rather than general reasoning
Expected specialization effect - This performance profile indicates successful fine-tuning for domain expertise

Comparison Context:

While general 8B models (Llama 3.1, Qwen2.5) typically score 65-72% on MMLU, this model sacrifices general knowledge for superior science Q&A performance. This trade-off is intentional and beneficial for cannabis science applications.

Evaluation Approach

Standard benchmarks: SciQ, ARC, MMLU for quantitative assessment
Domain expert review: Manual evaluation by cannabis science professionals
Factual accuracy: Cross-reference with source literature
Response coherence: Assessment of answer quality and relevance
Safety evaluation: Ensuring appropriate disclaimers for medical topics
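For the standard benchmarks, one possible way to run a comparable evaluation is EleutherAI's lm-evaluation-harness. This card does not say which harness, few-shot settings, or batch size were used, so the sketch below is an assumption and its scores may differ from those reported. `run_benchmarks` is a hypothetical helper name.

```python
TASKS = ["sciq", "arc_easy", "arc_challenge", "mmlu"]
BASE_MODEL = "ibm-granite/granite-3.0-8b-instruct"
ADAPTER = "KellanF89/cannabis-science-qa-model"


def run_benchmarks(batch_size: int = 8) -> dict:
    """Evaluate the base model plus LoRA adapter; returns per-task metrics."""
    import lm_eval  # lazy import: requires the lm-eval package

    results = lm_eval.simple_evaluate(
        model="hf",
        # The `peft=` argument loads the adapter on top of the base model.
        model_args=f"pretrained={BASE_MODEL},peft={ADAPTER},dtype=bfloat16",
        tasks=TASKS,
        batch_size=batch_size,  # ASSUMPTION: arbitrary default, tune to your GPU
    )
    return results["results"]
```

Calling `run_benchmarks()` downloads the model and datasets and needs a GPU with enough memory for the 8B base model.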

Technical Specifications

Model Architecture

Base Architecture: Granite 3.0-8B Instruct (Transformer decoder)
Parameter count: ~8 billion parameters (base model)
LoRA parameters: ~67 million trainable parameters
Quantization: 4-bit NF4 quantization for efficiency
Context window: 8,192 tokens

Compute Infrastructure

Training hardware: High-end GPU with 24GB+ VRAM
Inference requirements: 12-16GB VRAM recommended
Optimization: QLoRA enables training on consumer hardware
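To put the VRAM figures in context, the sketch below estimates the memory taken by the weights alone and shows one way to load the base model in 4-bit NF4 before attaching the adapter. The estimate ignores activations, the KV cache, and quantization overhead, so real usage is higher. `weight_memory_gb` and `load_in_4bit` are hypothetical helper names, and 4-bit inference is an option rather than a documented requirement of this model.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


print(weight_memory_gb(8e9, 16))  # bf16 weights: 16.0
print(weight_memory_gb(8e9, 4))   # 4-bit weights: 4.0


def load_in_4bit(base_model_id: str, adapter_id: str):
    """Load the base model quantized to 4-bit NF4, then attach the adapter."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )
    return PeftModel.from_pretrained(model, adapter_id)
```

For example, `load_in_4bit("ibm-granite/granite-3.0-8b-instruct", "KellanF89/cannabis-science-qa-model")` returns a model that can be used with the generation code in the getting-started section.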

Citation

If you use this model in your research, please cite:

@misc{finney2025cannabis_model,
  title={Cannabis Science QA Model: QLoRA Fine-tuned Granite for Cannabis Research},
  author={Kellan Finney},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/KellanF89/cannabis-science-qa-model}
}

Model Card Authors

Kellan Finney - Model development, training, and evaluation

Model Card Contact

For questions, feedback, or collaboration opportunities, please reach out via LinkedIn.

This model is designed to advance cannabis science research and education. Always consult qualified professionals for medical, legal, or regulatory decisions.
