Model Card for Model ID and Description

This model was fine-tuned from Gemma_2b_en. The training data is a synthetic set of 1K QA pairs generated with ChatGPT around 10 geology QA pairs with little diversity. The model performs well when trained for 15 epochs.

The training code is available on my Kaggle repo:

or my GitHub repo

The general idea is to have a bot that grades geoscience student assessments as quickly as possible, with a pass or a fail as the result. When an exam is submitted, the bot reports the student answers and the predicted answers, along with an evaluation metric comparing the two. That metric is then used to decide whether the result is a pass or a fail (coming soon).
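The grading idea described above can be sketched roughly as follows. This is only an illustration: the card does not say which metric is used, so `difflib.SequenceMatcher` stands in for it here, and the names `similarity`, `grade_exam`, and `PASS_THRESHOLD` (with its value of 0.6) are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical pass mark; the model card does not specify one.
PASS_THRESHOLD = 0.6


def similarity(student_answer: str, predicted_answer: str) -> float:
    """Return a 0..1 score between two answers (stand-in metric, an assumption)."""
    a = student_answer.strip().lower()
    b = predicted_answer.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def grade_exam(student_answers, predicted_answers, threshold=PASS_THRESHOLD):
    """Score each answer pair, average the scores, and map the mean to pass/fail."""
    scores = [similarity(s, p) for s, p in zip(student_answers, predicted_answers)]
    mean_score = sum(scores) / len(scores) if scores else 0.0
    verdict = "pass" if mean_score >= threshold else "fail"
    return {"scores": scores, "mean_score": mean_score, "verdict": verdict}


if __name__ == "__main__":
    student = ["Sedimentary rocks form from accumulated sediments."]
    predicted = ["Sedimentary rocks form from the accumulation of sediments."]
    print(grade_exam(student, predicted))
```

A character-level ratio is a crude proxy for answer quality. A semantic metric would probably suit exam grading better, but the choice is left open here because the card does not name one.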

Developed by: Dr. Michel M. Nzikou
Funded by [optional]: KaggleX Fellowship cohort 4 and Google (GCP credits)
Model type: Text generation model (chatbot)
Language(s) (NLP): Python 3.10, keras==3.6.0, keras_nlp==0.15.1
License: Apache 2.0
Finetuned from model [optional]: Gemma_2b_en

Sample Data

Please download the file geology-exam-test_for_gemma_model_2b_en_1000_10.json to test the UI. If you are using a Kaggle notebook, use the two questions below instead. Otherwise, create a JSON file with the same structure as the downloaded file.

"Question": "How do sedimentary rocks form?", "Response": "Sedimentary rocks form from the accumulation of sediments."
"Question": "What is igneous rock formation?", "Response": "Igneous rocks form when molten rock cools and solidifies."

To test or evaluate the model, try tweaking the questions and see how it responds. Please do not hesitate to contact me about further development or collaboration.
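If you want to build your own test file, a minimal sketch is shown here. It assumes the file is a JSON list of objects with "Question" and "Response" keys. That layout is a guess based on the two sample pairs, so check it against the downloaded file before relying on it. The helper names and the output filename are hypothetical.

```python
import json
from pathlib import Path

# The two sample pairs from this card, in an ASSUMED list-of-objects layout.
samples = [
    {
        "Question": "How do sedimentary rocks form?",
        "Response": "Sedimentary rocks form from the accumulation of sediments.",
    },
    {
        "Question": "What is igneous rock formation?",
        "Response": "Igneous rocks form when molten rock cools and solidifies.",
    },
]


def save_samples(items, path):
    """Write the QA pairs to a JSON file."""
    Path(path).write_text(json.dumps(items, indent=2), encoding="utf-8")


def load_samples(path):
    """Read the QA pairs back and check that each entry has both keys."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    for i, item in enumerate(data):
        missing = {"Question", "Response"} - set(item)
        if missing:
            raise ValueError(f"entry {i} is missing keys: {sorted(missing)}")
    return data


if __name__ == "__main__":
    save_samples(samples, "my_geology_test.json")  # hypothetical filename
    print(len(load_samples("my_geology_test.json")), "pairs loaded")
```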

Bias, Risks, and Limitations

The small fine-tuning dataset is a major limitation. However, the pipeline is ready, and if you have a small dataset of your own, you can use the GitHub repo (to be filled in soon) to train your model.
The data was generated with an existing LLM, which may introduce bias. However, the samples were pre-processed before being used for fine-tuning.

How to Get Started with the Model

Test the app with your own questions, or download the model and fine-tune on top of it. If you do, please share the model card for your variant.

Environmental Impact

The more paper assessments we use, the more trees have to be cut down, so in that sense this is a green model.

Hardware Type: [2 x GPU T4]
Hours used: [5 hours]
Cloud Provider: [Kaggle]
Compute Region: [AU]
Carbon Emitted: [CO2 emission to fill in the gap here :)]

Model Card Authors [optional]

Dr. Michel M. Nzikou, Research Fellow, Center of Exploration Targeting, UWA, Perth, Australia

Model Card Contact

[email protected]/[email protected]

ShebMichel/geobot_teacher-v0