
You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.


AAI-520 Final Project Models

This repository contains the models fine-tuned for the AAI-520 Final Project: SQuAD Q&A ChatBot. Each model is fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question answering, and together they cover several model architectures.

Authors


Introduction

The models in this repository are part of a project to develop a generative chatbot that can hold multi-turn conversations, adapt to context, and handle a wide range of topics. Fine-tuning on SQuAD is intended to make the answers to user questions accurate and relevant to the given context.

Available Models

The following fine-tuned models are available in this repository:

  1. BERT-base-cased Models

  2. DistilBERT-base-uncased Model

  3. DistilGPT-2 Model

  4. Retro-Reader Models

  5. ELECTRA Models (Recommended)

Note: We recommend using the ELECTRA models for the best performance.

Model Details

1. BERT-base-cased Model

Description: Fine-tuned the pre-trained bert-base-cased model on the SQuAD dataset for question-answering tasks.

Approach:

  • Initial Test: Trained on a subset of 1,000 data points to validate the setup.
  • Full Training: Extended training to the entire dataset after successful initial testing.
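Fixing the random seed keeps a small validation subset reproducible between runs. The sketch below assumes the training data is a plain list of SQuAD-style example dicts. `make_subset` is a hypothetical helper and is not part of the project code.

```python
import random
from typing import Any, Dict, List


def make_subset(examples: List[Dict[str, Any]], n: int, seed: int = 42) -> List[Dict[str, Any]]:
    """Return n examples sampled without replacement using a fixed seed.

    If n covers the whole dataset, every example is returned in its original
    order (the "full training" case).
    """
    if n >= len(examples):
        return list(examples)
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    indices = sorted(rng.sample(range(len(examples)), n))  # preserve original order
    return [examples[i] for i in indices]


# Toy stand-in for SQuAD examples (hypothetical data).
data = [{"id": str(i), "question": f"q{i}"} for i in range(5000)]
smoke_test = make_subset(data, 1000)      # initial test run
full_run = make_subset(data, len(data))   # full training run
print(len(smoke_test), len(full_run))
```

The Hugging Face `datasets` library produces a comparable fixed subset with `dataset.shuffle(seed=42).select(range(1000))`.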

Results:

  • Training Configuration:
    • Batch Size: 8
    • Epochs: 6
  • Observations:
    • Model performance improved with more epochs but eventually plateaued.
    • Initial tests confirmed that BERT was a feasible choice for the task.

2. DistilBERT-base-uncased Model

Description: Used distilbert-base-uncased, a smaller and faster distilled version of BERT, to reduce computational requirements.

Approach:

  • Trained on 10,000 data points due to resource constraints.
  • Adjusted the input formatting and preprocessing steps.
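For extractive QA with an encoder such as DistilBERT, preprocessing usually means converting SQuAD's character-level answer span into start and end token indices. The sketch below shows this mapping with a toy whitespace tokenizer that records character offsets. With a real fast Hugging Face tokenizer, the offsets come from passing `return_offsets_mapping=True`. The function names here are illustrative assumptions and are not taken from the project code.

```python
from typing import List, Optional, Tuple

Offsets = List[Tuple[int, int]]


def whitespace_offsets(text: str) -> Offsets:
    """Toy tokenizer: (start, end) character offsets of whitespace-separated tokens."""
    offsets, pos = [], 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        offsets.append((start, end))
        pos = end
    return offsets


def char_span_to_token_span(
    offsets: Offsets, answer_start: int, answer_text: str
) -> Optional[Tuple[int, int]]:
    """Map a character span to inclusive (start_token, end_token) indices.

    Returns None when the tokens do not cover the answer, for example when
    truncation removed it from a long context.
    """
    answer_end = answer_start + len(answer_text)
    start_tok = end_tok = None
    for i, (s, e) in enumerate(offsets):
        if start_tok is None and s <= answer_start < e:
            start_tok = i
        if s < answer_end <= e:
            end_tok = i
    if start_tok is None or end_tok is None:
        return None
    return start_tok, end_tok


context = "SQuAD was released by Stanford University in 2016."
answer = "Stanford University"
answer_start = context.index(answer)  # SQuAD stores this as answer_start
offsets = whitespace_offsets(context)
result = char_span_to_token_span(offsets, answer_start, answer)
print(result)  # (4, 5)
```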

Results:

  • Challenges:
    • Encountered low accuracy and performance issues.
    • Incompatibility with the Gradio frontend hindered deployment.
  • Conclusion:
    • The model did not meet the desired performance metrics.

3. DistilGPT-2 Model

Description: Experimented with distilgpt2 to test a generative approach to question answering.

Approach:

  • Prepared input data by combining context and questions.
  • Fine-tuned the model with custom tokenization and data collators.
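A decoder-only model like DistilGPT-2 has no span-prediction head. Question answering therefore becomes text continuation: the context and question are joined into a prompt, and the loss is computed only on the answer tokens. The sketch below makes three assumptions. It uses a hypothetical `context: ... question: ... answer:` template, a toy whitespace tokenizer in place of the GPT-2 tokenizer, and a simple padding collator. The `-100` label value is the default `ignore_index` of PyTorch's cross-entropy loss, which Hugging Face causal-LM models use when computing their loss.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_prompt(context: str, question: str) -> str:
    # Hypothetical prompt template; the project's actual format may differ.
    return f"context: {context} question: {question} answer:"


def encode_example(context: str, question: str, answer: str, vocab: Dict[str, int]) -> Dict[str, List[int]]:
    """Tokenize prompt + answer (toy whitespace tokenizer) and mask the prompt in the labels."""
    prompt_ids = [vocab.setdefault(tok, len(vocab)) for tok in build_prompt(context, question).split()]
    answer_ids = [vocab.setdefault(tok, len(vocab)) for tok in answer.split()]
    input_ids = prompt_ids + answer_ids
    # The model sees the prompt, but only the answer tokens contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return {"input_ids": input_ids, "labels": labels}


def pad_batch(batch: List[Dict[str, List[int]]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Minimal collator: right-pad to the longest example; padded labels are ignored."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    out: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in batch:
        pad = max_len - len(ex["input_ids"])
        out["input_ids"].append(ex["input_ids"] + [pad_id] * pad)
        out["attention_mask"].append([1] * len(ex["input_ids"]) + [0] * pad)
        out["labels"].append(ex["labels"] + [IGNORE_INDEX] * pad)
    return out


vocab: Dict[str, int] = {"<pad>": 0}  # toy vocabulary; id 0 doubles as the pad id
a = encode_example("Paris is in France.", "Where is Paris?", "France", vocab)
b = encode_example("The sky is blue.", "What color is the sky?", "blue", vocab)
batch = pad_batch([a, b])
print(batch["labels"][0])
```

The labels stay aligned with `input_ids` because Hugging Face causal-LM models shift the labels internally when computing the loss.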

Results:

  • Evaluation Metrics:
    • Evaluation loss was computed, but F1 and accuracy could not be calculated because of memory issues.
  • Challenges:
    • Resource limitations prevented extensive evaluation.
    • The model did not perform satisfactorily on the question-answering task.
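SQuAD is conventionally scored with exact match (EM) and token-level F1 on normalized answer strings. One way to keep memory bounded is to score decoded predictions one example at a time rather than accumulating logits for the whole evaluation set. The sketch below is a minimal re-implementation modeled on the normalization used by the official SQuAD v1.1 evaluation script. It is not the project's code. The `squad` metric in the Hugging Face `evaluate` library is a ready-made alternative.

```python
import re
import string
from collections import Counter
from typing import Callable, List

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_truths(metric: Callable[[str, str], float], prediction: str, truths: List[str]) -> float:
    # SQuAD can list several reference answers per question; keep the best match.
    return max(metric(prediction, t) for t in truths)


truths = ["Stanford Question Answering Dataset"]
pred = "the Stanford Question Answering Dataset"
print(best_over_truths(exact_match, pred, truths), best_over_truths(f1_score, pred, truths))  # 1.0 1.0
```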

4. Retro-Reader Model

Description: Implemented the Retro-Reader model, designed for machine reading comprehension tasks.

Approach:

  • Trained both the Sketchy Reading and Intensive Reading components.
  • Conducted experiments with datasets of 1,000 and 5,000 data points.
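In Retro-Reader, the Sketchy Reading stage judges whether a question is answerable, and the Intensive Reading stage predicts an answer span. A final verification step combines the two scores. The sketch below simplifies that threshold-based combination: the sketchy reader's no-answer vs. has-answer logit gap is mixed with the intensive reader's null-span vs. best-span score gap. The weights `beta1`, `beta2` and the `threshold` are hypothetical placeholders, not the project's settings. Answerability checking matters most for data that includes unanswerable questions, such as SQuAD 2.0.

```python
from typing import List, Optional, Tuple


def best_span_score(
    start_logits: List[float], end_logits: List[float], max_len: int = 30
) -> Tuple[float, int, int]:
    """Highest start+end score over spans with start <= end, skipping index 0 ([CLS])."""
    best = (float("-inf"), 0, 0)
    for i in range(1, len(start_logits)):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = start_logits[i] + end_logits[j]
            if score > best[0]:
                best = (score, i, j)
    return best


def rear_verify(
    sketchy_logits: Tuple[float, float],
    start_logits: List[float],
    end_logits: List[float],
    beta1: float = 0.5,  # hypothetical weight on the intensive-reader score
    beta2: float = 0.5,  # hypothetical weight on the sketchy-reader score
    threshold: float = 0.0,  # hypothetical decision threshold
) -> Optional[Tuple[int, int]]:
    """Return None if the question is judged unanswerable, else (start, end) token indices.

    sketchy_logits = (logit_has_answer, logit_no_answer) from the sketchy reader.
    """
    score_ext = sketchy_logits[1] - sketchy_logits[0]
    score_has, i, j = best_span_score(start_logits, end_logits)
    score_null = start_logits[0] + end_logits[0]  # "no answer" = span pointing at [CLS]
    v = beta1 * (score_null - score_has) + beta2 * score_ext
    return None if v > threshold else (i, j)


print(rear_verify((2.0, -1.0), [0.0, 0.1, 3.0, 0.2], [0.0, 0.1, 0.5, 2.5]))  # (2, 3)
print(rear_verify((-1.0, 3.0), [4.0, 0.1, 0.2, 0.1], [4.0, 0.1, 0.3, 0.1]))  # None
```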

Results:

  • Performance:
    • Achieved low accuracy in both Sketchy and Intensive modes.
  • Conclusion:
    • The model did not outperform the previous models.
    • It would require further research and optimization to be effective.

5. ELECTRA Model

Description: Adopted ELECTRA for its sample-efficient pre-training and strong performance on language understanding tasks.

Approach:

  • Trained on varying dataset sizes: 1,000, 5,000, 20,000, and the full dataset.
  • Utilized the google/electra-small-discriminator model.

Results:

  • Training Configuration:
    • Batch Size: 8
    • Epochs: 6
  • Observations:
    • Performance improved consistently as the training set grew.
    • ELECTRA outperformed the previous models and became the preferred choice for deployment.
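With a batch size of 8 and 6 epochs, the number of optimizer updates grows linearly with the training-set size. The arithmetic below is a sketch that assumes one tokenized feature per example, no gradient accumulation, and a single device. Long SQuAD contexts split with a stride produce more than one feature. `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(num_features: int, batch_size: int = 8, epochs: int = 6, grad_accum: int = 1) -> int:
    """Total optimizer updates; a final partial batch still counts as one step."""
    batches_per_epoch = math.ceil(num_features / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


for n in (1_000, 5_000, 20_000):
    print(n, optimizer_steps(n))
# prints: 1000 750 / 5000 3750 / 20000 15000 (one pair per line)
```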

Usage

Installation

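To use these models, install the transformers library along with a deep learning backend such as PyTorch:

pip install transformers torch

Because the repository is gated, accept its access conditions on the Hugging Face Hub and authenticate (for example, with huggingface-cli login) before downloading the model files.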

Loading a Model

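You can load any of the models with the from_pretrained method. The checkpoints sit in subfolders of a single repository, so pass the repository ID and the subfolder name separately. A combined "namespace/repo/subfolder" string is not a valid repository ID:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering

repo_id = "zainnobody/AAI-520-Final-Project-Models"
subfolder = "fine_tuned_electra_model_all"  # ELECTRA checkpoint

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = AutoModelForQuestionAnswering.from_pretrained(repo_id, subfolder=subfolder)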
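The question-answering head returns `start_logits` and `end_logits`, one score per input token, and the answer is the highest-scoring valid span. The plain-Python sketch below illustrates this decoding step on made-up logits, so it runs without downloading the checkpoint. `decode_answer` is a hypothetical helper. The `question-answering` pipeline performs a more complete version internally, including handling of long contexts. With a fast tokenizer, real offsets come from `return_offsets_mapping=True`, and `sequence_ids()` identifies which tokens belong to the context.

```python
import re
from typing import List, Optional, Tuple


def decode_answer(
    context: str,
    offsets: List[Optional[Tuple[int, int]]],
    start_logits: List[float],
    end_logits: List[float],
    max_answer_tokens: int = 30,
) -> str:
    """Pick the best valid span and slice the answer text out of the context.

    offsets[i] is the (start, end) character span of token i in the context, or
    None for tokens outside the context (special tokens, question tokens).
    """
    best_score, best_span = float("-inf"), None
    for i, s_off in enumerate(offsets):
        if s_off is None:
            continue  # an answer cannot start outside the context
        for j in range(i, min(i + max_answer_tokens, len(offsets))):
            e_off = offsets[j]
            if e_off is None:
                continue
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (s_off[0], e_off[1])
    return "" if best_span is None else context[best_span[0]:best_span[1]]


context = "The model was fine-tuned on SQuAD for six epochs."
# Toy layout: two non-context tokens, the context tokens, then one trailing special token.
spans = [m.span() for m in re.finditer(r"\w+(?:-\w+)*|[^\w\s]", context)]
offsets = [None, None] + spans + [None]
start_logits = [0.0] * len(offsets)
end_logits = [0.0] * len(offsets)
start_logits[0] = end_logits[0] = 9.0  # high score on a non-context token is ignored
start_logits[9], end_logits[10] = 4.0, 3.5  # "six" ... "epochs"
print(decode_answer(context, offsets, start_logits, end_logits))  # six epochs
```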

Example Usage

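from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline

repo_id = "zainnobody/AAI-520-Final-Project-Models"
subfolder = "fine_tuned_electra_model_all"

# Load from the subfolder explicitly, then pass the objects to the pipeline.
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = AutoModelForQuestionAnswering.from_pretrained(repo_id, subfolder=subfolder)
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)

context = "The Stanford Question Answering Dataset is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles."
question = "What does SQuAD stand for?"

result = qa_pipeline(question=question, context=context)

print(f"Answer: {result['answer']}")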

Output:

Answer: Stanford Question Answering Dataset

Citations

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • The models were fine-tuned using resources from Hugging Face.
  • OpenAI's ChatGPT and GitHub Copilot were used to create, iterate on, and improve the code documentation. The authors reviewed and edited all generated output before including it in the final versions.

For any questions or issues, please feel free to contact the authors or open an issue on the GitHub repository.
