Cannabis Science QA Model

A specialized question-answering model for cannabis science. It is based on IBM's Granite 3.0-8B Instruct and was fine-tuned with QLoRA on 161,170 cannabis science Q&A pairs derived from peer-reviewed research papers.

Model Details

Model Description

This model specializes in answering scientific questions about cannabis, including topics in chemistry, biology, pharmacology, extraction methods, cultivation, and medical applications. It has been trained to provide accurate, research-backed responses based on peer-reviewed cannabis science literature.

  • Developed by: Kellan Finney
  • Funded by: Eighth Revolution
  • Model type: Causal Language Model (Fine-tuned with QLoRA)
  • Language(s): English
  • License: Apache 2.0
  • Base model: ibm-granite/granite-3.0-8b-instruct
  • Training method: QLoRA (Quantized Low-Rank Adaptation)

Uses

Direct Use

  • Scientific Q&A: Answer questions about cannabis chemistry, biology, and pharmacology
  • Research assistance: Help researchers understand cannabis science concepts
  • Educational support: Provide explanations for cannabis science topics
  • Literature synthesis: Summarize complex cannabis research findings

Downstream Use

  • Educational chatbots for cannabis science courses
  • Research tools for cannabis industry professionals
  • Content generation for scientific cannabis publications
  • Knowledge extraction from cannabis research literature

Out-of-Scope Use

  • Medical diagnosis or treatment advice - This model is for research/educational purposes only
  • Legal advice regarding cannabis regulations or compliance
  • Commercial product claims without proper validation
  • Replacement for professional medical consultation

Bias, Risks, and Limitations

Limitations

  • Research scope bias: Reflects the focus areas of available cannabis research literature
  • Publication bias: May favor well-studied aspects of cannabis science
  • Geographic bias: Primarily based on Western/English-language research
  • Temporal bias: Weighted toward more recent research (2010-2024)
  • Generated content: May occasionally produce plausible but incorrect information

Risks

  • Not for medical use: Should not be used for medical decision-making
  • Fact-checking required: Important claims should be verified against original sources
  • Regulatory compliance: Does not provide legal or regulatory guidance
  • Professional consultation: Medical and legal decisions require professional expertise

Recommendations

Users should:

  • Verify critical information against peer-reviewed sources
  • Use for research and educational purposes only
  • Consult healthcare professionals for medical questions
  • Check current regulations for legal compliance

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model and tokenizer
base_model_id = "ibm-granite/granite-3.0-8b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the fine-tuned adapter
model = PeftModel.from_pretrained(model, "KellanF89/cannabis-science-qa-model")

# Format your question
question = "What are the main cannabinoids found in cannabis and their effects?"
prompt = f"Question: {question}\nAnswer:"

# Generate response (inputs must be on the same device as the model)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
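
Because decoding `outputs[0]` returns the prompt followed by the completion, the printed response begins with the question itself. The following is a minimal sketch of two helpers that apply the `Question: ...\nAnswer:` template and strip the echoed prompt. The names `build_prompt` and `extract_answer` are hypothetical and not part of the model's API, and the sketch assumes the decoded text reproduces the prompt verbatim, which usually but not always holds after a tokenizer round trip.

```python
def build_prompt(question: str) -> str:
    """Apply the Question/Answer template used in the example above."""
    return f"Question: {question.strip()}\nAnswer:"


def extract_answer(decoded: str, prompt: str) -> str:
    """Return only the generated answer from a decoded sequence.

    Assumes the decoded text starts with the prompt. If the prompt is
    not found verbatim, the full decoded text is returned instead.
    """
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    return decoded.strip()


if __name__ == "__main__":
    prompt = build_prompt("What are the main cannabinoids found in cannabis?")
    # Stand-in for tokenizer.decode(outputs[0], skip_special_tokens=True)
    decoded = prompt + " Example answer text."
    print(extract_answer(decoded, prompt))
```

In the example above, this would replace the final two lines with `print(extract_answer(response, prompt))`.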

Training Details

Training Data

The model was fine-tuned on the Cannabis Science QA Dataset, which contains:

  • 161,170 question-answer pairs derived from cannabis science literature
  • 400+ source documents including peer-reviewed research papers and textbooks
  • Comprehensive coverage of cannabis chemistry, biology, pharmacology, and applications
  • Q&A pairs generated with GPT-4o-mini using specialized prompting

Training Procedure

Training Hyperparameters

  • Training method: QLoRA (Quantized Low-Rank Adaptation)
  • Base model quantization: 4-bit (NF4)
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.1
  • Learning rate: 2e-4
  • Batch size: 4 (gradient accumulation steps: 4; effective batch size: 16)
  • Epochs: 3
  • Total training steps: 30,222
  • Optimizer: AdamW
  • LR scheduler: Cosine with warmup
  • Mixed precision: bf16
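
The hyperparameters above map onto `transformers` and `peft` configuration objects roughly as sketched below. This is an illustration under stated assumptions, not the author's training script: the card does not list which layers were adapted, so `target_modules` is a guess, and a single GPU with the full dataset used for training is assumed. The helper `total_optimizer_steps` (a hypothetical name) shows one way to arrive at the reported step count from the dataset size, batch size, and epoch count.

```python
import math

# Values copied from the hyperparameter list above.
HPARAMS = {
    "num_examples": 161_170,
    "per_device_batch_size": 4,
    "grad_accum_steps": 4,
    "epochs": 3,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "learning_rate": 2e-4,
}


def total_optimizer_steps(num_examples, batch_size, grad_accum, epochs, num_gpus=1):
    """Optimizer steps, rounding each epoch up to a whole step.

    num_gpus=1 is an assumption; the card does not state the GPU count.
    """
    effective_batch = batch_size * grad_accum * num_gpus
    return math.ceil(num_examples / effective_batch) * epochs


def build_qlora_configs(hp=HPARAMS):
    """Build 4-bit quantization and LoRA configs (needs torch, transformers, peft)."""
    # Imported lazily so the arithmetic above runs without these packages.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,  # matches "Mixed precision: bf16"
    )
    lora_config = LoraConfig(
        r=hp["lora_r"],
        lora_alpha=hp["lora_alpha"],
        lora_dropout=hp["lora_dropout"],
        task_type="CAUSAL_LM",
        # Assumption: the adapted layers are not listed in this card.
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )
    return bnb_config, lora_config


if __name__ == "__main__":
    steps = total_optimizer_steps(
        HPARAMS["num_examples"],
        HPARAMS["per_device_batch_size"],
        HPARAMS["grad_accum_steps"],
        HPARAMS["epochs"],
    )
    print(steps)  # 30222, matching "Total training steps"
```

With an effective batch of 16, each epoch takes ceil(161,170 / 16) = 10,074 steps, and three epochs give 30,222.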

Training Infrastructure

  • Hardware: GPU with 24GB+ VRAM
  • Framework: Transformers, PEFT, bitsandbytes (via BitsAndBytesConfig)
  • Training time: Approximately 24-48 hours
  • Context length: 8,192 tokens during training

Evaluation

Benchmark Results

The model was evaluated on standard academic benchmarks to assess both specialized and general performance:

Benchmark                        | Score | Performance Level
---------------------------------|-------|-------------------
SciQ (Science Q&A)               | 83.6% | Excellent ⭐
ARC-Easy (Grade School Science)  | 64.5% | Good ✅
ARC-Challenge (Advanced Science) | 36.7% | Below Average ⚠️
MMLU Overall                     | 36.4% | Below Average ⚠️

MMLU Breakdown:

  • STEM Subjects: 33.8%
  • Social Sciences: 39.7%
  • Other (Medical/Applied): 40.2%
  • Humanities: 33.4%
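
This card does not name the tooling used to produce these scores. One common way to score a PEFT adapter on SciQ, ARC, and MMLU is EleutherAI's lm-evaluation-harness; the sketch below assumes that tool and its default few-shot settings, so it is not guaranteed to reproduce the numbers reported here. `build_model_args` and `run_benchmarks` are hypothetical helper names.

```python
BASE_MODEL = "ibm-granite/granite-3.0-8b-instruct"
ADAPTER = "KellanF89/cannabis-science-qa-model"
TASKS = ["sciq", "arc_easy", "arc_challenge", "mmlu"]


def build_model_args(base_model: str, adapter: str, dtype: str = "bfloat16") -> str:
    """Comma-separated model_args string for the harness's "hf" backend."""
    return f"pretrained={base_model},peft={adapter},dtype={dtype}"


def run_benchmarks(tasks=TASKS, batch_size=8):
    """Run the benchmarks; requires `pip install lm-eval` and a GPU."""
    import lm_eval  # imported lazily so the helper above works without it

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=build_model_args(BASE_MODEL, ADAPTER),
        tasks=tasks,
        batch_size=batch_size,
    )
    return results["results"]  # per-task metric dictionaries


if __name__ == "__main__":
    print(build_model_args(BASE_MODEL, ADAPTER))
```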

Performance Analysis

Strengths:

  • Exceptional science Q&A performance (83.6% on SciQ vs ~60-70% for general models)
  • Strong domain specialization - excels in cannabis and general science topics
  • Effective knowledge transfer from training dataset to science applications

Trade-offs:

  • Specialized vs. general knowledge - Lower scores on broad academic benchmarks
  • Domain focus - Optimized for science Q&A rather than general reasoning
  • Expected specialization effect - This performance profile indicates successful fine-tuning for domain expertise

Comparison Context:

While general 8B models (Llama 3.1, Qwen2.5) typically score 65-72% on MMLU, this model sacrifices general knowledge for superior science Q&A performance. This trade-off is intentional and beneficial for cannabis science applications.

Evaluation Approach

  • Standard benchmarks: SciQ, ARC, MMLU for quantitative assessment
  • Domain expert review: Manual evaluation by cannabis science professionals
  • Factual accuracy: Cross-reference with source literature
  • Response coherence: Assessment of answer quality and relevance
  • Safety evaluation: Ensuring appropriate disclaimers for medical topics

Technical Specifications

Model Architecture

  • Base Architecture: Granite 3.0-8B Instruct (Transformer decoder)
  • Parameter count: ~8 billion parameters (base model)
  • LoRA parameters: ~67 million trainable parameters
  • Quantization: 4-bit NF4 quantization for efficiency
  • Context window: 8,192 tokens

Compute Infrastructure

  • Training hardware: High-end GPU with 24GB+ VRAM
  • Inference requirements: 12-16GB VRAM recommended
  • Optimization: QLoRA enables training on consumer hardware
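
As a rough check on the VRAM figures above, the memory needed for the weights alone can be estimated from the parameter count and the bits stored per parameter. The sketch below is back-of-the-envelope arithmetic only: it ignores activations, the KV cache, the LoRA adapter, and quantization metadata, all of which add overhead. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    n = 8e9  # "~8 billion parameters" from the Model Architecture section
    print(f"bf16 weights:  {weight_memory_gb(n, 16):.1f} GB")
    print(f"4-bit weights: {weight_memory_gb(n, 4):.1f} GB")
```

Under these assumptions the bf16 weights come to about 16 GB and the 4-bit weights to about 4 GB, which is why loading the base model in 4-bit leaves headroom on a 12-16GB card.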

Citation

If you use this model in your research, please cite:

@misc{finney2025cannabis_model,
  title={Cannabis Science QA Model: QLoRA Fine-tuned Granite for Cannabis Research},
  author={Kellan Finney},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/KellanF89/cannabis-science-qa-model}
}

Model Card Authors

Kellan Finney - Model development, training, and evaluation

Model Card Contact

For questions, feedback, or collaboration opportunities, please reach out via LinkedIn.


This model is designed to advance cannabis science research and education. Always consult qualified professionals for medical, legal, or regulatory decisions.
