Model Card: BART-Large-CNN with LoRA Fine-Tuning for Movie Plot Summarization

Model Overview --> This model, based on the facebook/bart-large-cnn architecture, has been fine-tuned using Low-Rank Adaptation (LoRA) on the vishnupriyavr/wiki-movie-plots-with-summaries dataset to generate concise and coherent summaries of movie plots. It is designed for efficient summarization of long inputs, particularly plots exceeding 300 words, accepting up to 1,024 input tokens and producing summaries of up to 128 tokens, which makes it well suited to applications that require succinct yet informative plot descriptions.
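A minimal inference sketch is shown below. It assumes the LoRA adapter is published under this repository (Kishan25/Story_Summarizer) and is loaded on top of the facebook/bart-large-cnn base model; the sample plot text is a placeholder.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

base_id = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForSeq2SeqLM.from_pretrained(base_id)

# Load the LoRA adapter on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, "Kishan25/Story_Summarizer")
model.eval()

plot = "A retired detective is drawn back into one last case when ..."  # placeholder plot text
inputs = tokenizer(plot, max_length=1024, truncation=True, return_tensors="pt")

with torch.no_grad():
    summary_ids = model.generate(**inputs, max_length=128, num_beams=4)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```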

Model Architecture --> The model leverages the facebook/bart-large-cnn backbone, a transformer-based sequence-to-sequence model pre-trained as a denoising autoencoder and subsequently fine-tuned on the CNN/DailyMail dataset for abstractive summarization. BART (Bidirectional and Auto-Regressive Transformers) combines a bidirectional encoder with an autoregressive decoder, enabling robust understanding and generation of text. The base model comprises approximately 406 million parameters, as detailed below:

Total Parameters: 406,291,456
Trainable Parameters (Pre-LoRA): 406,291,456
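These counts can be reproduced with a short, illustrative snippet (not part of the original training code):

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

# Count all parameters and those that would receive gradients before LoRA is applied.
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters:     {total:,}")
print(f"Trainable parameters: {trainable:,}")
```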

To enhance efficiency and reduce computational requirements, we applied LoRA, a parameter-efficient fine-tuning technique that introduces low-rank updates to specific weight matrices.

LoRA Configuration --> The LoRA configuration targets key attention modules in the BART architecture, ensuring minimal interference with the pre-trained weights while adapting the model for movie plot summarization:

Rank (r): 32
LoRA Alpha: 32
Target Modules: self_attn.q_proj, self_attn.v_proj, encoder_attn.q_proj, encoder_attn.v_proj
Dropout: 0.05
Bias: None
Task Type: Sequence-to-Sequence Language Modeling (TaskType.SEQ_2_SEQ_LM)
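The configuration above maps directly onto the peft library; a sketch of the setup (the surrounding wrapper code is illustrative) looks like this:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=[
        "self_attn.q_proj",
        "self_attn.v_proj",
        "encoder_attn.q_proj",
        "encoder_attn.v_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```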

Dataset --> The model was fine-tuned on the vishnupriyavr/wiki-movie-plots-with-summaries dataset, available on the Hugging Face Hub, which contains a rich collection of movie plots paired with human-written summaries.
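The dataset can be loaded with the datasets library; the snippet below only loads and inspects it, since the exact column names and preprocessing steps are not specified in this card:

```python
from datasets import load_dataset

dataset = load_dataset("vishnupriyavr/wiki-movie-plots-with-summaries", split="train")

# Inspect the schema before deciding which fields to use as input plot and target summary.
print(dataset.column_names)
print(dataset[0])
```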

Training Details --> The fine-tuning process was conducted using the Hugging Face transformers library, with the following training configuration:

Output Directory: ./bart_lora_finetuned
Learning Rate: 2e-5
Per-Device Train Batch Size: 2
Per-Device Evaluation Batch Size: 2
Number of Epochs: 2
Weight Decay: 0.01
Save Total Limit: 2 checkpoints
Mixed Precision (FP16): Disabled (for compatibility with Mac M2 hardware)
Evaluation Strategy: Disabled (to prioritize training speed; validation can be enabled as needed)
Random Seed: 42 (for reproducibility)
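Expressed as TrainingArguments, the configuration above would look roughly like this (note that the evaluation-strategy argument is named evaluation_strategy in older transformers releases and eval_strategy in newer ones):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./bart_lora_finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=2,
    weight_decay=0.01,
    save_total_limit=2,
    fp16=False,          # disabled for compatibility with Mac M2 hardware
    eval_strategy="no",  # evaluation disabled to prioritize training speed
    seed=42,
)
```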

The Trainer API from Hugging Face was used to orchestrate the training process, pairing the tokenized dataset with the LoRA-adapted model. Training was performed on a subset of the dataset to keep the computation manageable, and the resulting model performs well on long-input summarization tasks.
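Continuing from the snippets above, the Trainer wiring might look like the sketch below; tokenized_train is a hypothetical placeholder for the tokenized subset of the dataset, and the use of DataCollatorForSeq2Seq is an assumption (the card does not specify the collator).

```python
from transformers import Trainer, DataCollatorForSeq2Seq

# Pads inputs and labels dynamically per batch for the sequence-to-sequence model.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Trainer(
    model=model,                    # LoRA-adapted BART from the earlier snippet
    args=training_args,             # TrainingArguments shown above
    train_dataset=tokenized_train,  # hypothetical tokenized subset of the dataset
    data_collator=data_collator,
)

trainer.train()
trainer.save_model("./bart_lora_finetuned")
```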

Future Improvements --> A new version of this model that incorporates Reinforcement Learning from Human Feedback (RLHF) is planned for release in September 2025.
