{"cells":[{"cell_type":"markdown","metadata":{},"source":["# Introduction\n","## Gemma Fine-tuning for Interesting Telugu News Headline Generation\n","\n","In this notebook, we'll fine-tune the Gemma 2 model on Telugu news articles to generate engaging, interesting, and factual headlines.\n","\n","### Table of Contents:\n"," \n","1. Dataset Info
\n","2. Package Installation and Importing
\n","3. Data Loading
\n","4. Data Preprocessing for Training
\n","5. Loading the Gemma Model
\n","6. Q & A Results Before Finetuning
\n","7. Applying Gemma LoRA
\n","8. Training Gemma
\n","9. Q & A Results After Finetuning
\n","10. Conclusion
\n","\n","### Dataset Used\n","- Telugu news articles: the `saidines12/telugu_news_dataset` dataset on Hugging Face. Each record contains a `story_id`, a `headline`, and the full `article` text. The training split has roughly 83k examples and the validation split roughly 10k.\n","\n","---"]},{"cell_type":"markdown","metadata":{},"source":["# 1. Telugu News Articles Dataset\n","\n","**To be added**\n","\n","### Inputs and Outputs\n","\n","- **Input**: Gemma models take in text strings, which can range from questions and prompts to longer documents that require summarization. Here, the input is a Telugu news article.\n","- **Output**: In response, they generate text, such as answers, summaries, or other text-based output tailored to the input. Here, the target output is a Telugu headline.\n"]},{"cell_type":"markdown","metadata":{},"source":["# 2. Package Installation and Importing\n","\n","Before we start, we need to install the necessary packages. The install commands in the next cell are commented out; uncomment them when setting up a fresh environment."]},{"cell_type":"code","execution_count":4,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:18:21.704195Z","iopub.status.busy":"2024-04-13T17:18:21.703816Z","iopub.status.idle":"2024-04-13T17:18:48.412834Z","shell.execute_reply":"2024-04-13T17:18:48.411516Z","shell.execute_reply.started":"2024-04-13T17:18:21.704164Z"},"trusted":true},"outputs":[],"source":["import os\n","\n","# Restrict this process to the listed GPU indices; adjust them to match your machine.\n","os.environ['CUDA_VISIBLE_DEVICES'] = '1,2,3,4,5'\n","# Quietly install or upgrade the PEFT, evaluate, transformers, accelerate, and bitsandbytes packages.\n","# %pip install -q -U peft evaluate transformers accelerate bitsandbytes\n","#!pip3 install torch==2.0.1\n","# Quietly install or upgrade the trl and datasets packages.\n","#%pip install -U -q trl datasets"]},{"cell_type":"markdown","metadata":{},"source":["### Package Description\n","\n","#### Basic Python modules\n","- `os`: Provides ways to 
interact with the operating system and its environment variables.\n","- `torch`: PyTorch library for deep learning applications.\n","- `numpy`: Library for array computing, linear algebra, and other mathematical operations.\n","- `pandas`: Data processing library, well suited to CSV files and other structured data.\n","\n","#### transformers module\n","- `AutoTokenizer`: Automatically loads a pre-trained tokenizer.\n","- `AutoModelForCausalLM`: Automatically loads a pre-trained model for causal language modeling.\n","- `BitsAndBytesConfig`: Configuration class for quantized model loading (for example, 4-bit or 8-bit) through the `bitsandbytes` backend.\n","- `AutoConfig`: Automatically loads the model's configuration.\n","- `TrainingArguments`: Defines the arguments for the training setup.\n","\n","#### datasets module\n","- `Dataset`: A class for handling datasets.\n","\n","#### peft module\n","- `LoraConfig`: Configuration class for LoRA adapters.\n","- `PeftModel`: The class that wraps a base model with PEFT adapters.\n","- `prepare_model_for_kbit_training`: Prepares a quantized model for k-bit training.\n","- `get_peft_model`: Wraps a base model with a PEFT configuration and returns the resulting PEFT model.\n","\n","#### trl module\n","- `SFTTrainer`: Trainer class for supervised fine-tuning (SFT).\n","\n","#### IPython.display module\n","- `Markdown`: Renders text as Markdown.\n","- `display`: Displays objects in Jupyter notebooks."]},{"cell_type":"code","execution_count":2,"metadata":{},"outputs":[],"source":["# !python -m pip uninstall torch torchvision torchaudio\n","# !python -m pip install --pre torch torchvision torchaudio --index-url 
https://download.pytorch.org/whl/nightly/cu121"]},{"cell_type":"code","execution_count":3,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:18:48.416472Z","iopub.status.busy":"2024-04-13T17:18:48.415542Z","iopub.status.idle":"2024-04-13T17:18:48.423236Z","shell.execute_reply":"2024-04-13T17:18:48.422147Z","shell.execute_reply.started":"2024-04-13T17:18:48.416430Z"},"trusted":true},"outputs":[{"name":"stderr","output_type":"stream","text":["2025-01-06 17:39:47.044925: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n","2025-01-06 17:39:47.056750: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n","2025-01-06 17:39:47.068831: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n","2025-01-06 17:39:47.072356: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n","2025-01-06 17:39:47.083817: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n","To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n","2025-01-06 17:39:47.865939: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n"]}],"source":["import os\n","\n","import torch\n","\n","import numpy as np\n","import pandas as pd\n","\n","from 
transformers import (AutoTokenizer, \n"," AutoModelForCausalLM, \n"," BitsAndBytesConfig, \n"," AutoConfig,\n"," TrainingArguments)\n","\n","from datasets import Dataset\n","from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model\n","from trl import SFTTrainer\n","from IPython.display import Markdown, display"]},{"cell_type":"code","execution_count":4,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:18:48.424852Z","iopub.status.busy":"2024-04-13T17:18:48.424509Z","iopub.status.idle":"2024-04-13T17:18:48.435321Z","shell.execute_reply":"2024-04-13T17:18:48.434352Z","shell.execute_reply.started":"2024-04-13T17:18:48.424820Z"},"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["CUDA is available\n"]}],"source":["# Disable CA bundle check. Useful in certain environments where you may encounter SSL errors.\n","os.environ['CURL_CA_BUNDLE'] = ''\n","\n","# Set the order of devices as seen by CUDA to PCI bus ID order. This is to ensure consistency in device selection.\n","os.environ[\"CUDA_DEVICE_ORDER\"] = \"PCI_BUS_ID\"\n","\n","# Report whether CUDA is available. Which GPUs are visible is controlled by CUDA_VISIBLE_DEVICES, set earlier.\n","if torch.cuda.is_available():\n","    print(\"CUDA is available\")\n","else:\n","    print(\"CUDA is not available\")"]},{"cell_type":"markdown","metadata":{},"source":["Weights & Biases (wandb) is a tool for tracking and visualizing machine learning experiments. It helps you manage metrics, hyperparameters, experiment code, and model artifacts during model training.
\n","See the wandb GitHub repository for details."]},{"cell_type":"code","execution_count":5,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:18:48.437828Z","iopub.status.busy":"2024-04-13T17:18:48.437549Z","iopub.status.idle":"2024-04-13T17:18:48.468149Z","shell.execute_reply":"2024-04-13T17:18:48.467256Z","shell.execute_reply.started":"2024-04-13T17:18:48.437804Z"},"trusted":true},"outputs":[{"data":{"text/html":[""],"text/plain":[""]},"execution_count":5,"metadata":{},"output_type":"execute_result"}],"source":["# Wandb for experiment tracking\n","import wandb\n","\n","# Initialize Weights & Biases (wandb) for experiment tracking.\n","# If you have a wandb account, you can typically use it by specifying project and entity.\n","# For this example, we disable wandb by setting mode to \"disabled\".\n","wandb.init(mode=\"disabled\")"]},{"cell_type":"markdown","metadata":{},"source":["# 3. Data Loading\n","\n","Loading your data is the first step in the machine learning pipeline. This section will guide you through loading your dataset into the Jupyter notebook environment.\n","\n","## Why this Dataset?\n","We chose Telugu news for this challenge because, although Telugu is one of the oldest languages, very little of it has been digitized. Telugu newspapers, which often lean slightly in favor of local politicians, offer a wealth of information in rich language that is tied to current events. \n","\n","Telugu newspapers stand out for their catchy headlines that make readers rush to read the full story. While English headlines are direct, Telugu news headlines (mostly in the Eenadu newspaper) add drama through clever wordplay. These headlines aren't just clickbait - they reflect deep cultural understanding and creative expression unique to Telugu media. 
Local reporters developed this art form over decades, turning daily news into memorable stories.\n","\n","\n","\n","Here are some example headlines of this type, where the cleverness of the title makes readers eager to read the story.\n","\n","\n","#### \"గచ్చిబౌలి జంక్షన్ లో లారీ డ్రైవర్ గారి ఐటం షో\"\n","#### \"ట్రాఫిక్ ఎస్.ఐ గారు బిపి తో హాస్పిటల్ లో\"\n","#### \"బిర్యాని ప్యాకెట్ లో దొంగల గ్యాంగ్! బావార్చి గారి ఇన్వెస్టిగేషన్\"\n","#### \"కస్టమర్ కి పరిగెట్టు పరుగే\"\n","#### \"సిసిటివిలో కనపడ్డ మిస్టరీ గాంగ్ లీడర్\"\n","#### \"నిద్రపోయే ఎమ్.ఎల్.ఏ గారికి షాక్ ఇచిన కొత్త స్కీమ్\"\n","#### \"అసెంబ్లీ లో కుర్చి మీద స్నోరింగ్ సౌండ్స్\"\n","#### \"ఆపోజిషన్ లీడర్ గారి ఫోన్ తో వీడియో వైరల్\"\n","#### \"క్రికెట్ మ్యాచ్ లో జరిగిన రొమాన్స్\""]},{"cell_type":"markdown","metadata":{},"source":["Now, let's load the curated Telugu news articles dataset from Hugging Face."]},{"cell_type":"code","execution_count":6,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:18:48.509047Z","iopub.status.busy":"2024-04-13T17:18:48.508766Z","iopub.status.idle":"2024-04-13T17:18:48.519507Z","shell.execute_reply":"2024-04-13T17:18:48.518592Z","shell.execute_reply.started":"2024-04-13T17:18:48.509022Z"},"trusted":true},"outputs":[{"data":{"text/plain":["{'story_id': 212635,\n"," 'headline': 'ఆంజనేయస్వామి ఆలయాన్ని ఢీకొట్టిన లారీ',\n"," 'article': 'ప్రకాశం జిల్లాలో ఆంజనేయస్వామి ఆలయాన్ని ఓ లారీ ఢీకొట్టింది . ఈ ఘటనలో ఇద్దరు దుర్మరణం చెందారు . ప్రకాశం : జిల్లాలో ఘోర రోడ్డు ప్రమాదం జరిగింది . ఆంజనేయస్వామి ఆలయాన్ని ఓ లారీ ఢీకొట్టింది . ఈ ఘటనలో ఇద్దరు దుర్మరణం చెందారు . పోలీసుల కథనం ప్రకారం… విజయవాడ నుంచి ఒంగోలుకు వెళ్తున్న లారీ మార్గంమధ్యలో మార్చి 9 శనివారం తెల్లవారుజామున అద్దంకి మండలం వెంకటాపురం గ్రామం వద్ద ఒంగోలు - విజయవాడ నేషనల్ హైవే పక్కన గల ఆంజనేయస్వామి ఆలయాన్ని ఢీకొట్టింది . దీంతో లారీ డ్రైవర్ , క్లీనర్ కు\\xa0తీవ్ర గాయాలు కావడంతో అక్కడికక్కడే మృతి చెందారు . మృతదేహాలు లారీ క్యాబిన్లో ఇరుక్కుపోవడంతో స్థానికులు పోలీసుల సాయంతో బయటకు తీశారు . 
నిద్ర మత్తు కారణంగా ప్రమాదం జరిగి ఉండవచ్చని పోలీసులు భావిస్తున్నారు . లారీ బిహార్కు చెందినదిగా గుర్తించారు . పోస్టుమార్టం కోసం మృతదేహాలను అద్దంకి ప్రభుత్వ ఆస్పత్రికి తరలించారు . పోలీసులు కేసు నమోదు చేసుకుని దర్యాప్తు చేస్తున్నారు .'}"]},"execution_count":6,"metadata":{},"output_type":"execute_result"}],"source":["# Load the saidines12/telugu_news_dataset dataset from Hugging Face.\n","from datasets import load_dataset\n","\n","dataset = load_dataset('saidines12/telugu_news_dataset',\n"," trust_remote_code=True\n"," )\n","# Inspect one validation example.\n","dataset['validation'][10]"]},{"cell_type":"markdown","metadata":{},"source":["# 4. Data Preprocessing for Training\n","\n","Before training Gemma, we need to prepare the dataset so that it matches the format the trainer expects. We work with the `article` and `headline` columns, combining them into a single training example per record.\n","\n","Sequence length matters during preprocessing. Gemma is a Large Language Model (LLM), and very long training examples place heavy demands on GPU memory and slow training down. To keep resource usage in check, we recommend excluding unduly long examples from the training set. 
Doing so preserves GPU resources and keeps the training workflow efficient."]},{"cell_type":"code","execution_count":7,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:18:48.521560Z","iopub.status.busy":"2024-04-13T17:18:48.521215Z","iopub.status.idle":"2024-04-13T17:18:48.535154Z","shell.execute_reply":"2024-04-13T17:18:48.534281Z","shell.execute_reply.started":"2024-04-13T17:18:48.521525Z"},"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Average length of 'Question and Answer' in original dataset: 1253.3300860897145\n","Shortest length of 'Question and Answer' in original dataset: 46\n","Longest length of 'Question and Answer' in original dataset: 14611\n"]}],"source":["# The article plays the role of the question and the headline the role of the answer.\n","question_column = \"article\"\n","answer_column = \"headline\"\n","data_column = \"text\"\n","\n","\n","original_data = dataset['train']\n","\n","# Get the average, shortest, and longest combined length of article + headline.\n","# Lengths are measured in characters, not tokens.\n","lengths = [len(x[question_column] + x[answer_column]) for x in original_data]\n","average_length = np.mean(lengths)\n","shortest_length = np.min(lengths)\n","longest_length = np.max(lengths)\n","\n","# Print the statistics\n","print(\"Average length of 'Question and Answer' in original dataset:\", average_length)\n","print(\"Shortest length of 'Question and Answer' in original dataset:\", shortest_length)\n","print(\"Longest length of 'Question and Answer' in original dataset:\", longest_length)"]},{"cell_type":"markdown","metadata":{},"source":["Check a news headline together with its article."]},{"cell_type":"code","execution_count":8,"metadata":{},"outputs":[{"data":{"text/plain":["{'story_id': 2480992,\n"," 'headline': 'పంత్.. ఓ అద్భుతం: అక్రమ్',\n"," 'article': 'కరాచీ: ఘోర ప్రమాదం నుంచి కోలుకుని తిరిగి అంతర్జాతీయ క్రికెట్ ఆడుతున్న భారత వికెట్ కీపర్ రిషభ్ పంత్ ఓ అద్భుతమని పాకిస్థాన్ మాజీ కెప్టెన్ వసీమ్ అక్రమ్ కొనియాడాడు. 
‘రోడ్డు ప్రమాదం తర్వాత ఎవరికైనా కోలుకునేందుకు చాలా సమయం పడుతుంది. ఇక ఆటగాడికైతే మరింత కష్టంగా ఉంటుంది. కానీ పంత్ అలా కాదు. నిజంగా తను మిరాకిల్ కిడ్. అతడిని యువతరం ఆదర్శంగా తీసుకోవాల్సిందే. ఐపీఎల్, టీ20 ప్రపంచకప్లోనూ ప్రభావం చూపి ఇప్పుడు టెస్టుల్లోనూ ఆకట్టుకుంటున్నాడు. ఆసీస్తో టెస్టు సిరీస్లోనూ తను కీలకం కానున్నాడు’ అని అక్రమ్ ప్రశంసించాడు.'}"]},"execution_count":8,"metadata":{},"output_type":"execute_result"}],"source":["original_data[10]"]},{"cell_type":"markdown","metadata":{},"source":["# 5. Loading the Gemma Model\n","\n","Here, we'll cover how to load the Gemma model so it's ready for fine-tuning. This includes where to download the model from and how to load it into your notebook."]},{"cell_type":"markdown","metadata":{},"source":["### Adding the Gemma Model\n","1. In the \"Input\" section of the right-side menu in your Kaggle notebook, click on the \"+ Add Input\" button.\n","2. Below the search bar that appears, click on the \"Models\" option.\n","3. In the search bar, type \"Gemma\" to find the model.\n","4. From the filtered results, select the Gemma model by clicking on the \"+\" button next to it. Make sure to choose the correct version by noting the framework as \"Transformers\", the variation as \"2b-it\", and the version as \"v3\".\n","5. After selecting the correct Gemma model, click on \"Add Model\" at the bottom.\n","6. The Gemma model, specifically \"Gemma.v3\", should now be listed under the \"Models\" subsection of the \"Input\" section in the right-side menu of your notebook, indicating successful addition.\n","\n","**Note:** We are using the full version because our fine-tuning cluster supports it."]},{"cell_type":"markdown","metadata":{},"source":["### BitsAndBytesConfig Overview\n","\n","`BitsAndBytesConfig` is a configuration class provided by the `transformers` library that controls how a model is quantized when it is loaded, for both training and inference. 
Quantization is a technique used to reduce the memory footprint and computational requirements of deep learning models by representing model weights and activations in lower-precision data types, such as 8-bit integers (`int8`) or even 4-bit representations.\n","\n","#### Benefits of Quantization\n","\n","The primary benefits of quantization include:\n","\n","- **Reduced Memory Usage**: Lower-precision representations require less memory, enabling the deployment of larger models on devices with limited memory capacity.\n","- **Increased Inference Speed**: Operations with lower-precision data types can be executed faster, thus speeding up the inference time.\n","- **Energy Efficiency**: Reduced computational requirements translate to lower energy consumption, which is crucial for mobile and embedded devices.\n","\n","#### `BitsAndBytesConfig` Parameters\n","\n","In the context of the `transformers` library, `BitsAndBytesConfig` allows users to configure the quantization behavior specifically for using the `bitsandbytes` backend. Below is an example configuration along with comments explaining each parameter:\n"]},{"cell_type":"code","execution_count":9,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:18:48.599133Z","iopub.status.busy":"2024-04-13T17:18:48.598282Z","iopub.status.idle":"2024-04-13T17:18:55.870752Z","shell.execute_reply":"2024-04-13T17:18:55.869804Z","shell.execute_reply.started":"2024-04-13T17:18:48.599089Z"},"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["device: cuda\n"]},{"name":"stderr","output_type":"stream","text":["`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.\n","Gemma's activation function will be set to `gelu_pytorch_tanh`. 
Please, use\n","`config.hidden_activation` if you want to override this behaviour.\n","See https://github.com/huggingface/transformers/pull/29402 for more details.\n"]},{"data":{"application/vnd.jupyter.widget-view+json":{"model_id":"01d901a040c840839c11feeb56dd7a66","version_major":2,"version_minor":0},"text/plain":["Loading checkpoint shards: 0%| | 0/2 [00:00"]},"execution_count":13,"metadata":{},"output_type":"execute_result"}],"source":["instruction = \"ఘోర ప్రమాదం నుంచి కోలుకుని తిరిగి అంతర్జాతీయ క్రికెట్ ఆడుతున్న భారత వికెట్ కీపర్ రిషభ్ పంత్ ఓ అద్భుతమని పాకిస్థాన్ మాజీ కెప్టెన్ వసీమ్ అక్రమ్ కొనియాడాడు. ‘రోడ్డు ప్రమాదం తర్వాత ఎవరికైనా కోలుకునేందుకు చాలా సమయం పడుతుంది. ఇక ఆటగాడికైతే మరింత కష్టంగా ఉంటుంది. కానీ పంత్ అలా కాదు. నిజంగా తను మిరాకిల్ కిడ్. అతడిని యువతరం ఆదర్శంగా తీసుకోవాల్సిందే. ఐపీఎల్, టీ20 ప్రపంచకప్లోనూ ప్రభావం చూపి ఇప్పుడు టెస్టుల్లోనూ ఆకట్టుకుంటున్నాడు. ఆసీస్తో టెస్టు సిరీస్లోనూ తను కీలకం కానున్నాడు’ అని అక్రమ్ ప్రశంసించాడు. \"\n","\n","\n","prompt = template.format(\n"," article=instruction,\n"," response=\"\",\n",")\n","\n","# If the model and the inputs are on different devices, this call raises:\n","# RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!\n","response_text = generate_response(model, tokenizer, prompt, device, 256)\n","\n","Markdown(response_text)"]},{"cell_type":"markdown","metadata":{},"source":["The 2B model doesn't seem to understand the instruction and keeps repeating the context in Telugu. The 9B Gemma 2 model is better at understanding Telugu, but the 2B model is not. Let's fine-tune the smaller model on the Telugu news dataset and compare the results.\n"]},{"cell_type":"markdown","metadata":{},"source":["# 7. Applying Gemma LoRA\n","\n","In this section, we'll apply the LoRA (**Low-Rank Adaptation**) technique to the **Gemma model**, a method designed to make fine-tuning large models like Gemma both **fast and efficient**. 
LoRA, a **PEFT** (**Parameter-Efficient Fine-Tuning**) method, freezes the pre-trained weights and trains only small low-rank matrices added to selected layers. This drastically cuts down on the computational demands and GPU memory needs, and once the adapter weights are merged into the base model it adds no extra inference latency. Here's what makes LoRA so powerful for our purposes:\n","\n","

\n","Paper: LoRA: Low-Rank Adaptation of Large Language Models
\n","\n","- **Dramatically reduces the number of trainable parameters**, by up to **10,000 times**.\n","- **Cuts down GPU memory usage** by up to **three times**.\n","- **Maintains quick inference times** with **no additional latency**.\n","\n","The essence of PEFT, and by extension LoRA, is to adapt a model using minimal resources by fine-tuning only a small number of parameters for a specific task. This technique is particularly advantageous as it:\n"," \n","- Optimizes rank decomposition matrices, keeping the original model weights frozen while adding trainable low-rank weights **A** and **B**.\n","- Allows for up to **threefold reductions** in both time and computational costs.\n","- Enables easy swapping of the LoRA module (weights **A** and **B**) according to the task at hand, lowering storage requirements and avoiding any increase in inference time.\n","\n","When applied specifically to **Transformer architectures**, targeting **attention weights** and keeping MLP modules static, LoRA significantly enhances the model's efficiency. 
For instance, for GPT-3 175B, the LoRA paper reports that it:\n"," \n","- **Reduces VRAM usage** from **1.2TB to 350GB**.\n","- **Lowers checkpoint size** from **350GB to 35MB**.\n","- **Boosts training speed** by approximately **25%**.\n","\n","By integrating LoRA into Gemma, we aim to streamline the model's fine-tuning process in this section, making it quicker and more resource-efficient, without compromising on performance."]},{"cell_type":"code","execution_count":14,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:19:02.046698Z","iopub.status.busy":"2024-04-13T17:19:02.046069Z","iopub.status.idle":"2024-04-13T17:19:02.051888Z","shell.execute_reply":"2024-04-13T17:19:02.050874Z","shell.execute_reply.started":"2024-04-13T17:19:02.046662Z"},"trusted":true},"outputs":[],"source":["# LoRA configuration: sets up the parameters for Low-Rank Adaptation, a method for efficient fine-tuning of transformers.\n","# Use LoRA to save memory and computation.\n","lora_config = LoraConfig(\n"," r = 8, # Rank of the adaptation matrices. A lower rank means fewer parameters to train.\n"," target_modules = [\"q_proj\", \"o_proj\", \"k_proj\", \"v_proj\",\n"," \"gate_proj\", \"up_proj\", \"down_proj\"], # Transformer modules to apply LoRA.\n"," task_type = \"CAUSAL_LM\", # The type of task, here it is causal language modeling.\n",")"]},{"cell_type":"markdown","metadata":{},"source":["# 8. Evaluation Metrics"]},{"cell_type":"code","execution_count":15,"metadata":{},"outputs":[],"source":["# Create the ROUGE evaluation metric for Telugu headlines.\n","import evaluate\n","\n","metric = evaluate.load(\"rouge\")\n","\n","# Compute ROUGE between predictions and references.\n","# Note: ROUGE expects text strings, so the logits and label ids that the trainer passes in need to be decoded before scoring.\n","def compute_metrics(eval_pred):\n","    predictions, labels = eval_pred\n","    return metric.compute(predictions=predictions, references=labels)"]},{"cell_type":"markdown","metadata":{},"source":["# 8. Training Gemma\n","\n","Now that everything is set up, it's time to fine-tune the Gemma model on your data. 
This section walks through the training setup, including the trainer configuration and the choice of hyperparameters."]},{"cell_type":"code","execution_count":16,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:19:02.053215Z","iopub.status.busy":"2024-04-13T17:19:02.052920Z","iopub.status.idle":"2024-04-13T17:19:02.068689Z","shell.execute_reply":"2024-04-13T17:19:02.067788Z","shell.execute_reply.started":"2024-04-13T17:19:02.053190Z"},"trusted":true},"outputs":[],"source":["def formatting_func(examples):\n","    \"\"\"\n","    Formats a batch of examples using the predefined template.\n","    \n","    Parameters:\n","    - examples (dict): A batch from the dataset, mapping column names such as 'article' and 'headline' to lists of strings.\n","    \n","    Returns:\n","    - list: One formatted string per example, combining the article and its headline.\n","    \"\"\"\n","    # Each column holds a list of strings, so build one prompt per (article, headline) pair.\n","    articles = examples[question_column]\n","    responses = examples[answer_column]\n","    return [\n","        template.format(article=article, response=response)\n","        for article, response in zip(articles, responses)\n","    ]\n"]},{"cell_type":"code","execution_count":17,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:19:02.070337Z","iopub.status.busy":"2024-04-13T17:19:02.069881Z","iopub.status.idle":"2024-04-13T17:19:04.066929Z","shell.execute_reply":"2024-04-13T17:19:04.065995Z","shell.execute_reply.started":"2024-04-13T17:19:02.070304Z"},"trusted":true},"outputs":[{"name":"stderr","output_type":"stream","text":["huggingface/tokenizers: The current 
process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n","To disable this warning, you can either:\n","\t- Avoid using `tokenizers` before the fork if possible\n","\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"]},{"name":"stderr","output_type":"stream","text":["/home/watchtower/.pyenv/versions/3.10.2/envs/venv1/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:300: UserWarning: You passed a processing_class with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `processing_class.padding_side = 'right'` to your code.\n"," warnings.warn(\n"]}],"source":["!rm -rf outputs\n","os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'\n","# Setup for the trainer object that will handle fine-tuning of the model.\n","trainer = SFTTrainer(\n"," model=model, # The pre-trained model to fine-tune.\n"," train_dataset=dataset['train'], # The dataset used for training(83k)\n"," eval_dataset=dataset['validation'], # The dataset used for validation(10k)\n"," max_seq_length=512, # The maximum sequence length for the model inputs.\n"," compute_metrics=compute_metrics,\n"," args=TrainingArguments( # Arguments for training setup.\n"," per_device_train_batch_size= 4 , # Batch size per device (e.g., GPU).\n"," #gradient_accumulation_steps=4, # Number of steps to accumulate gradients before updating model weights.\n"," warmup_steps=10, # Number of steps to gradually increase the learning rate at the beginning of training.\n"," max_steps=10000, # Total number of training steps to perform.\n"," learning_rate=2e-4, # Learning rate for the optimizer.\n"," fp16=True, # Whether to use 16-bit floating point precision for training. 
False means 32-bit is used.\n"," logging_steps=1, # How often to log training information.\n"," output_dir=\"outputs\", # Directory where training outputs will be saved.\n"," eval_strategy=\"steps\", # Run evaluation at a fixed step interval.\n"," per_device_eval_batch_size=4, # Batch size per device during evaluation.\n"," gradient_checkpointing=True, # Enable gradient checkpointing to save memory.\n"," #optim=\"paged_adamw_8bit\", # The optimizer to use, with 8-bit precision for efficiency.\n"," eval_accumulation_steps = 4, # FIX for evaluation https://discuss.huggingface.co/t/cuda-out-of-memory-when-using-trainer-with-compute-metrics/2941/3\n"," eval_steps=2000 # Evaluate every 2000 training steps.\n"," \n"," ),\n"," # peft_config=lora_config, # Uncomment to apply the LoRA configuration defined above.\n"," formatting_func=formatting_func, # The function to format the dataset examples.\n"," \n",")\n"]},{"cell_type":"code","execution_count":18,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:19:04.068824Z","iopub.status.busy":"2024-04-13T17:19:04.068476Z","iopub.status.idle":"2024-04-13T17:19:46.369697Z","shell.execute_reply":"2024-04-13T17:19:46.368790Z","shell.execute_reply.started":"2024-04-13T17:19:04.068791Z"},"scrolled":true,"trusted":true},"outputs":[{"name":"stderr","output_type":"stream","text":["\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.\n","`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.\n"]},{"data":{"text/html":["\n","
\n","[ 1856/10000 29:54 < 2:11:22, 1.03 it/s, Epoch 0.09/1]\n","Step | Training Loss | Validation Loss
"],"text/plain":[""]},"metadata":{},"output_type":"display_data"}],"source":["# train the model to the processed data.\n","trainer.train()"]},{"cell_type":"code","execution_count":1,"metadata":{},"outputs":[{"ename":"NameError","evalue":"name 'trainer' is not defined","output_type":"error","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)","Cell \u001b[0;32mIn[1], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;66;03m# Push the model to huggingface under my user name saidies12 and model name telugu-news-headline-generation\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m \u001b[43mtrainer\u001b[49m\u001b[38;5;241m.\u001b[39mpush_to_hub(\n\u001b[1;32m 3\u001b[0m repository_name\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124msaidines12/telugu-news-headline-generation\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 4\u001b[0m )\n","\u001b[0;31mNameError\u001b[0m: name 'trainer' is not defined"]}],"source":["# Push the model to huggingface under my user name saidies12 and model name telugu-news-headline-generation\n","trainer.push_to_hub(\n"," repository_name=\"saidines12/telugu-news-headline-generation\",\n",")"]},{"cell_type":"markdown","metadata":{},"source":["# 9. Q&A Results After Finetuning\n","\n","After training, let's see how much our Gemma model has improved. 
We'll rerun the headline-generation test and compare the results to the pre-finetuning output."]},{"cell_type":"code","execution_count":null,"metadata":{"execution":{"iopub.execute_input":"2024-04-13T17:19:46.371158Z","iopub.status.busy":"2024-04-13T17:19:46.370828Z","iopub.status.idle":"2024-04-13T17:19:55.489935Z","shell.execute_reply":"2024-04-13T17:19:55.489009Z","shell.execute_reply.started":"2024-04-13T17:19:46.371132Z"},"trusted":true},"outputs":[{"data":{"text/markdown":["\n","రోడ్డు ప్రమాదం తర్వాత కోలుకున్నాడు రిషభ్ పంత్\n","రోడ్డు ప్ర"],"text/plain":[""]},"execution_count":25,"metadata":{},"output_type":"execute_result"}],"source":["instruction = \"అక్రమ్ కరాచీ ఘోర ప్రమాదం నుంచి కోలుకుని తిరిగి అంతర్జాతీయ క్రికెట్ ఆడుతున్న భారత వికెట్ కీపర్ రిషభ్ పంత్ ఓ అద్భుతమని పాకిస్థాన్ మాజీ కెప్టెన్ వసీమ్ అక్రమ్ కొనియాడాడు. ‘రోడ్డు ప్రమాదం తర్వాత ఎవరికైనా కోలుకునేందుకు చాలా సమయం పడుతుంది. ఇక ఆటగాడికైతే మరింత కష్టంగా ఉంటుంది. కానీ పంత్ అలా కాదు. నిజంగా తను మిరాకిల్ కిడ్. అతడిని యువతరం ఆదర్శంగా తీసుకోవాల్సిందే. ఐపీఎల్, టీ20 ప్రపంచకప్లోనూ ప్రభావం చూపి ఇప్పుడు టెస్టుల్లోనూ ఆకట్టుకుంటున్నాడు. ఆసీస్తో టెస్టు సిరీస్లోనూ తను కీలకం కానున్నాడు’ అని అక్రమ్ ప్రశంసించాడు. \"\n","\n","\n","prompt = template.format(\n"," article=instruction,\n"," response=\"\",\n",")\n","\n","response_text = generate_response(trainer.model, tokenizer, prompt, device, 32)\n","# TODO: Fix repetition in the response\n","\n","Markdown(response_text)"]},{"cell_type":"markdown","metadata":{},"source":["Although the performance of the Gemma 2B model is only okay, the headline is still better than repeating the article, as in the earlier result. There is a lot of room for improvement, since we are using LoRA with quantization. First try without LoRA; if the performance still doesn't match expectations, move up to the larger Gemma 9B model, which is really good at understanding Telugu and following instructions. "]},{"cell_type":"markdown","metadata":{},"source":["# 10. 
Conclusion\n","\n","In this beginner-friendly notebook, we've outlined the process of fine-tuning the Gemma model, a Large Language Model (LLM), specifically for Telugu news headline generation. Starting from data loading and preprocessing, we've demonstrated how to train the Gemma model, even for those new to working with LLMs.\n","\n","We used a dataset of Telugu news articles to train and refine the Gemma model's ability to generate interesting, factual headlines. This journey, while introductory, shows that there is a straightforward path to working with LLMs through the Gemma model.\n","\n","Achieving the best performance with the Gemma model (or any LLM) generally requires training on larger datasets and for more epochs. Future enhancements could include Retrieval-Augmented Generation (RAG), which incorporates external knowledge bases for more precise and relevant responses, and Direct Preference Optimization (DPO), which trains the model directly on pairs of preferred and rejected outputs.\n","\n","Ultimately, this notebook is designed to make the Gemma model approachable for beginners, illustrating that a few straightforward steps can unlock the potential of LLMs for domain-specific tasks. It encourages users to experiment with the Gemma model in other fields and languages."]},{"cell_type":"code","execution_count":9,"metadata":{},"outputs":[{"ename":"HfHubHTTPError","evalue":"502 Server Error: Bad Gateway for url: https://huggingface.co/api/models/saidines12/telugu-news-headline-generation/commit/main

\n\n","output_type":"error","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mHTTPError\u001b[0m Traceback (most recent call last)","File \u001b[0;32m~/.pyenv/versions/3.10.2/envs/venv1/lib/python3.10/site-packages/huggingface_hub/utils/_http.py:406\u001b[0m, in \u001b[0;36mhf_raise_for_status\u001b[0;34m(response, endpoint_name)\u001b[0m\n\u001b[1;32m 405\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 406\u001b[0m \u001b[43mresponse\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mraise_for_status\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 407\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m HTTPError \u001b[38;5;28;01mas\u001b[39;00m e:\n","File \u001b[0;32m~/.pyenv/versions/3.10.2/envs/venv1/lib/python3.10/site-packages/requests/models.py:1024\u001b[0m, in \u001b[0;36mResponse.raise_for_status\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1023\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m http_error_msg:\n\u001b[0;32m-> 1024\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m HTTPError(http_error_msg, response\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m)\n","\u001b[0;31mHTTPError\u001b[0m: 502 Server Error: Bad Gateway for url: https://huggingface.co/api/models/saidines12/telugu-news-headline-generation/commit/main","\nThe above exception was the direct cause of the following exception:\n","\u001b[0;31mHfHubHTTPError\u001b[0m Traceback (most recent call last)","Cell \u001b[0;32mIn[9], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mhuggingface_hub\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m HfApi\n\u001b[1;32m 2\u001b[0m api \u001b[38;5;241m=\u001b[39m HfApi()\n\u001b[0;32m----> 4\u001b[0m \u001b[43mapi\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mupload_file\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 5\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mpath_or_fileobj\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m/data1/max/telugu_corpus/andhrajyothy_data/gemma-fine-tuning-on-telugu-news-dataset.ipynb\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 6\u001b[0m \u001b[43m \u001b[49m\u001b[43mrepo_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43msaidines12/telugu-news-headline-generation\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 7\u001b[0m \u001b[43m \u001b[49m\u001b[43mrepo_type\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mmodel\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 8\u001b[0m \u001b[43m \u001b[49m\u001b[43mpath_in_repo\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mnotebooks/gemma-fine-tuning-on-telugu-news-dataset.ipynb\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 9\u001b[0m \u001b[43m)\u001b[49m\n","File \u001b[0;32m~/.pyenv/versions/3.10.2/envs/venv1/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:114\u001b[0m, in \u001b[0;36mvalidate_hf_hub_args.._inner_fn\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 111\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m check_use_auth_token:\n\u001b[1;32m 112\u001b[0m kwargs \u001b[38;5;241m=\u001b[39m smoothly_deprecate_use_auth_token(fn_name\u001b[38;5;241m=\u001b[39mfn\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m, has_token\u001b[38;5;241m=\u001b[39mhas_token, kwargs\u001b[38;5;241m=\u001b[39mkwargs)\n\u001b[0;32m--> 114\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfn\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n","File \u001b[0;32m~/.pyenv/versions/3.10.2/envs/venv1/lib/python3.10/site-packages/huggingface_hub/hf_api.py:1485\u001b[0m, in \u001b[0;36mfuture_compatible.._inner\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 1482\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mrun_as_future(fn, \u001b[38;5;28mself\u001b[39m, \u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n\u001b[1;32m 1484\u001b[0m \u001b[38;5;66;03m# Otherwise, call the function normally\u001b[39;00m\n\u001b[0;32m-> 1485\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfn\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n","File \u001b[0;32m~/.pyenv/versions/3.10.2/envs/venv1/lib/python3.10/site-packages/huggingface_hub/hf_api.py:4653\u001b[0m, in \u001b[0;36mHfApi.upload_file\u001b[0;34m(self, path_or_fileobj, path_in_repo, repo_id, token, repo_type, revision, commit_message, commit_description, create_pr, parent_commit, run_as_future)\u001b[0m\n\u001b[1;32m 4645\u001b[0m commit_message \u001b[38;5;241m=\u001b[39m (\n\u001b[1;32m 4646\u001b[0m commit_message \u001b[38;5;28;01mif\u001b[39;00m commit_message \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mUpload \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mpath_in_repo\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m with huggingface_hub\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 
4647\u001b[0m )\n\u001b[1;32m 4648\u001b[0m operation \u001b[38;5;241m=\u001b[39m CommitOperationAdd(\n\u001b[1;32m 4649\u001b[0m path_or_fileobj\u001b[38;5;241m=\u001b[39mpath_or_fileobj,\n\u001b[1;32m 4650\u001b[0m path_in_repo\u001b[38;5;241m=\u001b[39mpath_in_repo,\n\u001b[1;32m 4651\u001b[0m )\n\u001b[0;32m-> 4653\u001b[0m commit_info \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcreate_commit\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 4654\u001b[0m \u001b[43m \u001b[49m\u001b[43mrepo_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrepo_id\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4655\u001b[0m \u001b[43m \u001b[49m\u001b[43mrepo_type\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrepo_type\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4656\u001b[0m \u001b[43m \u001b[49m\u001b[43moperations\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m[\u001b[49m\u001b[43moperation\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4657\u001b[0m \u001b[43m \u001b[49m\u001b[43mcommit_message\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcommit_message\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4658\u001b[0m \u001b[43m \u001b[49m\u001b[43mcommit_description\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcommit_description\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4659\u001b[0m \u001b[43m \u001b[49m\u001b[43mtoken\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mtoken\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4660\u001b[0m \u001b[43m \u001b[49m\u001b[43mrevision\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrevision\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4661\u001b[0m \u001b[43m \u001b[49m\u001b[43mcreate_pr\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcreate_pr\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4662\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mparent_commit\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mparent_commit\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4663\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 4665\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m commit_info\u001b[38;5;241m.\u001b[39mpr_url \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 4666\u001b[0m revision \u001b[38;5;241m=\u001b[39m quote(_parse_revision_from_pr_url(commit_info\u001b[38;5;241m.\u001b[39mpr_url), safe\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n","File \u001b[0;32m~/.pyenv/versions/3.10.2/envs/venv1/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:114\u001b[0m, in \u001b[0;36mvalidate_hf_hub_args.._inner_fn\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 111\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m check_use_auth_token:\n\u001b[1;32m 112\u001b[0m kwargs \u001b[38;5;241m=\u001b[39m smoothly_deprecate_use_auth_token(fn_name\u001b[38;5;241m=\u001b[39mfn\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m, has_token\u001b[38;5;241m=\u001b[39mhas_token, kwargs\u001b[38;5;241m=\u001b[39mkwargs)\n\u001b[0;32m--> 114\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfn\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n","File \u001b[0;32m~/.pyenv/versions/3.10.2/envs/venv1/lib/python3.10/site-packages/huggingface_hub/hf_api.py:1485\u001b[0m, in \u001b[0;36mfuture_compatible.._inner\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 1482\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mrun_as_future(fn, \u001b[38;5;28mself\u001b[39m, \u001b[38;5;241m*\u001b[39margs, 
\u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n\u001b[1;32m 1484\u001b[0m \u001b[38;5;66;03m# Otherwise, call the function normally\u001b[39;00m\n\u001b[0;32m-> 1485\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfn\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n","File \u001b[0;32m~/.pyenv/versions/3.10.2/envs/venv1/lib/python3.10/site-packages/huggingface_hub/hf_api.py:3995\u001b[0m, in \u001b[0;36mHfApi.create_commit\u001b[0;34m(self, repo_id, operations, commit_message, commit_description, token, repo_type, revision, create_pr, num_threads, parent_commit, run_as_future)\u001b[0m\n\u001b[1;32m 3993\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 3994\u001b[0m commit_resp \u001b[38;5;241m=\u001b[39m get_session()\u001b[38;5;241m.\u001b[39mpost(url\u001b[38;5;241m=\u001b[39mcommit_url, headers\u001b[38;5;241m=\u001b[39mheaders, data\u001b[38;5;241m=\u001b[39mdata, params\u001b[38;5;241m=\u001b[39mparams)\n\u001b[0;32m-> 3995\u001b[0m \u001b[43mhf_raise_for_status\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcommit_resp\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mendpoint_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mcommit\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m 3996\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m RepositoryNotFoundError \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 3997\u001b[0m e\u001b[38;5;241m.\u001b[39mappend_to_message(_CREATE_COMMIT_NO_REPO_ERROR_MESSAGE)\n","File \u001b[0;32m~/.pyenv/versions/3.10.2/envs/venv1/lib/python3.10/site-packages/huggingface_hub/utils/_http.py:477\u001b[0m, in 
\u001b[0;36mhf_raise_for_status\u001b[0;34m(response, endpoint_name)\u001b[0m\n\u001b[1;32m 473\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m _format(HfHubHTTPError, message, response) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01me\u001b[39;00m\n\u001b[1;32m 475\u001b[0m \u001b[38;5;66;03m# Convert `HTTPError` into a `HfHubHTTPError` to display request information\u001b[39;00m\n\u001b[1;32m 476\u001b[0m \u001b[38;5;66;03m# as well (request id and/or server error message)\u001b[39;00m\n\u001b[0;32m--> 477\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m _format(HfHubHTTPError, \u001b[38;5;28mstr\u001b[39m(e), response) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01me\u001b[39;00m\n","\u001b[0;31mHfHubHTTPError\u001b[0m: 502 Server Error: Bad Gateway for url: https://huggingface.co/api/models/saidines12/telugu-news-headline-generation/commit/main
\n\n"]}],"source":["from huggingface_hub import HfApi\n","api = HfApi()\n","\n","api.upload_file(\n"," path_or_fileobj=\"/data1/max/telugu_corpus/andhrajyothy_data/gemma-fine-tuning-on-telugu-news-dataset.ipynb\",\n"," repo_id=\"saidines12/telugu-news-headline-generation\",\n"," repo_type=\"model\",\n"," path_in_repo=\"notebooks/gemma-fine-tuning-on-telugu-news-dataset.ipynb\",\n",")"]}],"metadata":{"kaggle":{"accelerator":"none","dataSources":[{"databundleVersionId":7669720,"sourceId":64148,"sourceType":"competition"},{"datasetId":4616621,"sourceId":7970419,"sourceType":"datasetVersion"},{"isSourceIdPinned":true,"modelInstanceId":8318,"sourceId":28785,"sourceType":"modelInstanceVersion"}],"dockerImageVersionId":30683,"isGpuEnabled":false,"isInternetEnabled":true,"language":"python","sourceType":"notebook"},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.2"}},"nbformat":4,"nbformat_minor":4}