PersianSciQA-LoRA: Scientific Question Generation for Persian Literature

A specialized LoRA adapter that transforms PersianLLaMA-13B into a scientific question generation system for Persian academic abstracts.

Academic Overview

PersianSciQA-LoRA targets a gap in Persian NLP: generating questions from academic text. The adapter is fine-tuned to produce relevant questions from Persian scientific abstracts across several domains.

Research Contributions

  • First specialized Persian question generation model for scientific literature
  • Efficient fine-tuning approach using LoRA methodology
  • Cross-domain validation across medical, engineering, and computer science abstracts
  • Training loss reduced by more than 30% while training only ~67M parameters (0.5% of the base model)

Model Specifications

Parameter | Value
--- | ---
Base Model | PersianLLaMA-13B (13 billion parameters)
Adaptation Method | LoRA (Low-Rank Adaptation)
LoRA Rank (r) | 32
LoRA Alpha | 64
Trainable Parameters | ~67M (0.5% of base model)
Target Modules | Query, Key, Value, Output, Gate, Up, Down projections
Training Language | Persian/Farsi
Domain | Scientific Literature
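
The specifications above map fairly directly onto a LoRA configuration. The sketch below collects them as plain keyword arguments and derives the scaling factor and the trainable share from the reported figures. The module names (`q_proj`, `k_proj`, and so on) follow the Hugging Face LLaMA implementation and are an assumption, since the specifications list projections by role rather than by name.

```python
# Sketch only: values come from the specifications above; the module names are
# assumed to follow Hugging Face LLaMA naming and are not confirmed by the card.
LORA_RANK = 32
LORA_ALPHA = 64

# Role listed in the specifications -> assumed module name in the base model
TARGET_MODULES = {
    "Query": "q_proj",
    "Key": "k_proj",
    "Value": "v_proj",
    "Output": "o_proj",
    "Gate": "gate_proj",
    "Up": "up_proj",
    "Down": "down_proj",
}

# Keyword arguments in the shape that peft.LoraConfig accepts
lora_config_kwargs = {
    "r": LORA_RANK,
    "lora_alpha": LORA_ALPHA,
    "target_modules": list(TARGET_MODULES.values()),
    "task_type": "CAUSAL_LM",
}

# Standard LoRA scales the low-rank update by alpha / r
scaling = LORA_ALPHA / LORA_RANK

# Trainable share of the model, using the rounded figures reported above
trainable_fraction = 67e6 / 13e9

print(f"scaling = {scaling}")
print(f"trainable fraction = {trainable_fraction:.2%}")
```

With `peft` installed, these arguments could be passed as `LoraConfig(**lora_config_kwargs)`; any dropout or bias settings used in training are not stated here, so they are left at library defaults in this sketch.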

Training Methodology

Dataset

  • Source: Curated Persian scientific abstracts
  • Quality Filter: Only pairs with relevance scores of 2 or 3 were kept (treated as high quality)
  • Domains: Medical, Engineering, Computer Science, Physics
  • Size: 18,740 high-quality abstract-question pairs
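
The quality filter described above could be applied along the following lines. This is a minimal sketch: the record layout and the field names (`abstract`, `question`, `relevance`) are hypothetical, as the actual dataset schema is not given here.

```python
from typing import Dict, List

# Relevance scores treated as high quality, per the Quality Filter bullet
KEPT_SCORES = {2, 3}


def filter_pairs(records: List[Dict]) -> List[Dict]:
    """Keep abstract-question pairs whose relevance score is 2 or 3."""
    return [r for r in records if r.get("relevance") in KEPT_SCORES]


# Hypothetical records; real entries would hold Persian abstracts and questions
sample = [
    {"abstract": "...", "question": "...", "relevance": 3},
    {"abstract": "...", "question": "...", "relevance": 1},
    {"abstract": "...", "question": "...", "relevance": 2},
]

kept = filter_pairs(sample)
print(f"kept {len(kept)} of {len(sample)} pairs")
```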

Training Configuration

  • Learning Rate: 2e-5 with cosine scheduling
  • Batch Size: Effective batch size of 8 (via gradient accumulation)
  • Epochs: 3 with early stopping
  • Precision: Mixed precision (BF16)
  • Hardware: RTX A6000 (48GB VRAM)
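
To make the schedule concrete, the sketch below computes an approximate step count and a cosine-decayed learning rate from the values listed above. It assumes that all 18,740 pairs are used for training, that there is no warmup, and that the rate decays to zero. None of these is stated in the configuration, and the per-device batch size and accumulation steps behind the effective batch of 8 are likewise unspecified.

```python
import math

BASE_LR = 2e-5
EFFECTIVE_BATCH = 8
EPOCHS = 3
NUM_PAIRS = 18_740  # assumption: every pair is used for training (no held-out split)

# One optimizer step consumes one effective batch
steps_per_epoch = math.ceil(NUM_PAIRS / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step: int, total: int = total_steps, base_lr: float = BASE_LR) -> float:
    """Cosine decay from base_lr at step 0 down to 0 at the final step (no warmup)."""
    progress = min(step, total) / total
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step):.2e}")
```

Early stopping would end training before `total_steps` is reached, so the schedule's tail may never be used in practice.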

Performance Metrics

  • Training Loss: Reduced by more than 30% over the course of training
  • Validation Stability: Consistent convergence
  • Generation Quality: Coherent, contextually relevant questions

Usage

Installation

pip install transformers peft torch accelerate

Basic Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "ViraIntelligentDataMining/PersianLLaMA-13B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/PersianSciQA-LoRA")
tokenizer = AutoTokenizer.from_pretrained("ViraIntelligentDataMining/PersianLLaMA-13B")

# Generate a scientific question.
# The Persian labels in the prompt mean "Abstract" (چکیده) and "Question" (سوال).
abstract = "Your Persian scientific abstract here"
prompt = f"چکیده: {abstract}\nسوال:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

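# Some LLaMA-style tokenizers define no pad token; fall back to EOS in that case
pad_token_id = tokenizer.pad_token_id
if pad_token_id is None:
    pad_token_id = tokenizer.eos_token_id

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=pad_token_id
    )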

question = tokenizer.decode(outputs[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(f"Generated Question: {question}")
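
Because generation stops only at `max_new_tokens` or an end-of-sequence token, the decoded text may run past the first question. The helper below is one possible way to trim it. `first_question` is a hypothetical name and not part of this model or of any library, and the sample string is made up for illustration.

```python
def first_question(generated: str) -> str:
    """Return text up to and including the first question mark, else the first line."""
    text = generated.strip()
    # Persian uses the Arabic question mark "؟"; the ASCII "?" is accepted as well
    cut_points = [i for i in (text.find("؟"), text.find("?")) if i != -1]
    if cut_points:
        return text[: min(cut_points) + 1].strip()
    return text.splitlines()[0].strip() if text else ""


# Made-up output: "What advantage does the proposed method have?" followed by spill-over
raw = "روش پیشنهادی چه مزیتی دارد؟\nچکیده: ..."
print(first_question(raw))
```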

Evaluation Results

Qualitative Assessment

  • Relevance: Generated questions are contextually appropriate
  • Fluency: Natural Persian language structure
  • Complexity: Appropriate difficulty level for academic content
  • Diversity: Varied question types

Training Efficiency

  • Convergence: Achieved stable training within 3 epochs
  • Memory Efficiency: 100MB adapter vs 26GB full model
  • Training Time: ~4 hours on RTX A6000
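
The 26GB figure is consistent with storing 13 billion parameters at 2 bytes each. The short calculation below reproduces it and compares it with the reported adapter size. It is back-of-the-envelope arithmetic on the rounded numbers above, not a measurement.

```python
BYTES_PER_PARAM_16BIT = 2     # BF16/FP16 store each weight in 2 bytes
BASE_PARAMS = 13e9            # 13 billion parameters, as reported for the base model
ADAPTER_SIZE_BYTES = 100e6    # ~100MB adapter, as reported above

base_size_bytes = BASE_PARAMS * BYTES_PER_PARAM_16BIT
size_ratio = ADAPTER_SIZE_BYTES / base_size_bytes

print(f"base model weights: {base_size_bytes / 1e9:.0f} GB")
print(f"adapter size relative to base: {size_ratio:.2%}")
```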

Research Applications

Academic Use Cases

  1. Educational Assessment: Automatic question generation for Persian scientific courses
  2. Literature Review: Question formulation for systematic reviews
  3. Research Methodology: Hypothesis generation from existing literature
  4. Language Technology: Advancing Persian NLP capabilities

Technical Advantages

  • Domain Adaptation: Specialized for scientific vocabulary
  • Efficiency: Minimal computational requirements
  • Transferability: Compatible with standard PEFT infrastructure
  • Scalability: Easy integration into larger NLP pipelines

Citation

For academic use, please cite:

@misc{persiansciqa-lora-2025,
  title={PersianSciQA-LoRA: Scientific Question Generation for Persian Literature},
  author={[Your Name]},
  year={2025},
  url={https://huggingface.co/YOUR_USERNAME/PersianSciQA-LoRA},
  note={LoRA adapter for Persian scientific question generation based on PersianLLaMA-13B}
}

License

Released under the Apache 2.0 License. Academic and research use is encouraged.

Research Collaboration

We welcome collaboration with Persian language researchers, educational technology developers, and NLP researchers working on low-resource languages.


Advancing Persian Academic NLP Through Efficient Fine-tuning
