PersianSciQA-LoRA: Scientific Question Generation for Persian Literature
A specialized LoRA adapter that transforms PersianLLaMA-13B into a scientific question generation system for Persian academic abstracts.
Academic Overview
PersianSciQA-LoRA targets a gap in Persian NLP: generating questions from academic text. The adapter is fine-tuned to produce relevant questions from Persian scientific abstracts across multiple domains.
Research Contributions
- First specialized Persian question generation model for scientific literature
- Efficient fine-tuning approach using LoRA methodology
- Cross-domain validation across medical, engineering, and computer science abstracts
- Training loss reduced by more than 30% while training only ~0.5% of the base model's parameters
Model Specifications
Parameter | Value |
---|---|
Base Model | PersianLLaMA-13B (13 billion parameters) |
Adaptation Method | LoRA (Low-Rank Adaptation) |
LoRA Rank (r) | 32 |
LoRA Alpha | 64 |
Trainable Parameters | ~67M (0.5% of base model) |
Target Modules | Query, Key, Value, Output, Gate, Up, Down projections |
Training Language | Persian/Farsi |
Domain | Scientific Literature |
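The trainable-parameter figure in the table follows from how LoRA attaches two low-rank matrices to each adapted projection. The sketch below shows that arithmetic and the resulting scaling factor alpha / r. The layer shapes and block count are hypothetical toy values, since this card does not list the base model's hidden or feed-forward sizes, so the printed total is illustrative only.

```python
# Sketch: how LoRA rank and alpha translate into trainable parameters.
# The shapes used here are toy values for illustration, NOT the real
# PersianLLaMA-13B dimensions (those are not stated in this card).

LORA_RANK = 32   # from the table above
LORA_ALPHA = 64  # from the table above


def lora_params_per_layer(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to one linear layer."""
    return rank * (d_in + d_out)


def lora_trainable_params(layer_shapes, rank: int, num_blocks: int) -> int:
    """Total adapter parameters for `num_blocks` identical transformer blocks."""
    per_block = sum(
        lora_params_per_layer(d_in, d_out, rank) for d_in, d_out in layer_shapes
    )
    return per_block * num_blocks


# Hypothetical block: four attention projections (hidden -> hidden) and
# three MLP projections (gate/up: hidden -> ffn, down: ffn -> hidden).
hidden, ffn = 64, 256
toy_shapes = [(hidden, hidden)] * 4 + [(hidden, ffn), (hidden, ffn), (ffn, hidden)]

total = lora_trainable_params(toy_shapes, LORA_RANK, num_blocks=2)
scaling = LORA_ALPHA / LORA_RANK  # standard LoRA scales the update by alpha / r

print(f"toy adapter parameters: {total}")
print(f"LoRA scaling factor: {scaling}")
```

Substituting the real projection shapes and layer count of the base model into `toy_shapes` and `num_blocks` gives the adapter size for this configuration.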
Training Methodology
Dataset
- Source: Curated Persian scientific abstracts
- Quality Filter: Relevance scores 2-3 (high quality)
- Domains: Medical, Engineering, Computer Science, Physics
- Size: 18,740 high-quality abstract-question pairs
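The quality filter above can be pictured as a simple selection over scored records. This is a minimal sketch only: the record layout and the field names `abstract`, `question`, and `relevance` are assumptions for illustration, not the actual dataset schema.

```python
# Sketch of the relevance filter described above. The record layout and the
# field names ("abstract", "question", "relevance") are hypothetical.

KEPT_SCORES = {2, 3}  # relevance scores treated as high quality in this card


def filter_pairs(records):
    """Keep only abstract-question pairs whose relevance score is 2 or 3."""
    return [r for r in records if r.get("relevance") in KEPT_SCORES]


sample = [
    {"abstract": "چکیده الف", "question": "سوال الف؟", "relevance": 3},
    {"abstract": "چکیده ب", "question": "سوال ب؟", "relevance": 1},
    {"abstract": "چکیده ج", "question": "سوال ج؟", "relevance": 2},
]

kept = filter_pairs(sample)
print(f"kept {len(kept)} of {len(sample)} pairs")
```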
Training Configuration
- Learning Rate: 2e-5 with cosine scheduling
- Batch Size: Effective batch size of 8 (via gradient accumulation)
- Epochs: 3 with early stopping
- Precision: Mixed precision (BF16)
- Hardware: RTX A6000 (48GB VRAM)
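To make the schedule concrete, the sketch below works out the step count implied by the numbers above and evaluates a plain cosine decay. Several details are assumptions: that all 18,740 pairs are used for training (no split is stated), that there is no warmup, and that the effective batch of 8 is split as 2 per device with 4 accumulation steps. Early stopping may also end training before the final step.

```python
import math

# Values taken from the configuration above.
PEAK_LR = 2e-5
EFFECTIVE_BATCH = 8
EPOCHS = 3
NUM_PAIRS = 18_740  # assumes every pair is used for training (split not stated)

# Assumption: per-device batch of 2 with 4 accumulation steps; any split whose
# product is 8 gives the same effective batch size.
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = EFFECTIVE_BATCH // PER_DEVICE_BATCH

steps_per_epoch = math.ceil(NUM_PAIRS / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step: int, total: int, peak: float = PEAK_LR) -> float:
    """Cosine decay from `peak` at step 0 to 0 at `total` (no warmup assumed)."""
    progress = min(step, total) / total
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
for step in (0, total_steps // 2, total_steps):
    print(f"step {step}: lr = {cosine_lr(step, total_steps):.2e}")
```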
Performance Metrics
- Training Loss Reduction: >30%
- Validation Stability: Consistent convergence
- Generation Quality: Coherent, contextually relevant questions
Usage
Installation
pip install transformers peft torch
Basic Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "ViraIntelligentDataMining/PersianLLaMA-13B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/PersianSciQA-LoRA")
model.eval()  # inference mode: disables dropout
tokenizer = AutoTokenizer.from_pretrained("ViraIntelligentDataMining/PersianLLaMA-13B")

# LLaMA-style tokenizers may not define a pad token; fall back to EOS
pad_token_id = tokenizer.pad_token_id
if pad_token_id is None:
    pad_token_id = tokenizer.eos_token_id

# Generate scientific question
abstract = "Your Persian scientific abstract here"
prompt = f"چکیده: {abstract}\nسوال:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=pad_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
question = tokenizer.decode(outputs[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(f"Generated Question: {question}")
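Because sampling with `max_new_tokens=50` can run past the end of the question, it may help to wrap the prompt template and trim the decoded text. The helpers below are a small sketch; `build_prompt` and `extract_question` are hypothetical names, and cutting at the first question mark is an assumed post-processing choice rather than part of the adapter.

```python
# Sketch: wrap the prompt format from the example above and trim the output.
# `build_prompt` and `extract_question` are hypothetical helper names.

PERSIAN_QUESTION_MARK = "؟"


def build_prompt(abstract: str) -> str:
    """Format an abstract with the "چکیده: ... سوال:" template used above."""
    return f"چکیده: {abstract.strip()}\nسوال:"


def extract_question(generated: str) -> str:
    """Keep the first line and cut after the earliest question mark, if any."""
    first_line = generated.strip().split("\n", 1)[0].strip()
    positions = [
        idx
        for idx in (first_line.find(PERSIAN_QUESTION_MARK), first_line.find("?"))
        if idx != -1
    ]
    if positions:
        return first_line[: min(positions) + 1]
    return first_line


raw_output = " این روش چه مزیتی دارد؟ چکیده: متن اضافی"
print(extract_question(raw_output))
```

In the usage example, `build_prompt(abstract)` would replace the inline f-string and `extract_question(question)` would be applied to the decoded text.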
Evaluation Results
Qualitative Assessment
- Relevance: Generated questions are contextually appropriate
- Fluency: Natural Persian language structure
- Complexity: Appropriate difficulty level for academic content
- Diversity: Varied question types
Training Efficiency
- Convergence: Achieved stable training within 3 epochs
- Memory Efficiency: 100MB adapter vs 26GB full model
- Training Time: ~4 hours on RTX A6000
Research Applications
Academic Use Cases
- Educational Assessment: Automatic question generation for Persian scientific courses
- Literature Review: Question formulation for systematic reviews
- Research Methodology: Hypothesis generation from existing literature
- Language Technology: Advancing Persian NLP capabilities
Technical Advantages
- Domain Adaptation: Specialized for scientific vocabulary
- Efficiency: Minimal computational requirements
- Transferability: Compatible with standard PEFT infrastructure
- Scalability: Easy integration into larger NLP pipelines
Citation
For academic use, please cite:
@misc{persiansciqa-lora-2025,
title={PersianSciQA-LoRA: Scientific Question Generation for Persian Literature},
author={[Your Name]},
year={2025},
url={https://huggingface.co/YOUR_USERNAME/PersianSciQA-LoRA},
note={LoRA adapter for Persian scientific question generation based on PersianLLaMA-13B}
}
License
Released under the Apache 2.0 License. Academic and research use is encouraged.
Research Collaboration
We welcome collaboration from Persian language researchers, educational technology developers, and NLP researchers focusing on low-resource languages.
Advancing Persian Academic NLP Through Efficient Fine-tuning
Model tree for safora/PersianSciQA-LoRA
- Base model: ViraIntelligentDataMining/PersianLLaMA-13B