---
language:
- en
license: mit
library_name: transformers
datasets:
- mlsquare/CLIENT_samantar_mixed_train_val
pipeline_tag: text-generation
---

# Model Card for Model ID

A test model for the Seshu pipeline.

## Model Details

### Model Description


This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** MLsquare
- **Model type:** Next Character Generation
- **Language(s) (NLP):** All languages in the ai4bharat/samanantar dataset
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/LegallyCoder/mamba-hf
- **Paper:** https://arxiv.org/abs/2312.00752

## Uses

Refer to the GitHub repository for more information.

### Direct Use

Refer to the GitHub repository for more information.


## How to Get Started with the Model

Refer to the GitHub repository: https://github.com/mlsquare/fedem

## Training Details

### Training Data

The model was trained on individual source and target sentences from the AI4Bharat Samanantar dataset. The sentences in all 11 languages, together with their translations, were stacked together and used for the next-character generation task.
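
A minimal sketch of the stacking step, assuming that "stacked" means each source and each target sentence becomes its own training example and the translation pairing is discarded. The function `stack_pairs` and the sample pairs are hypothetical illustrations, not code from the fedem repository or the real dataset fields.

```python
from typing import Iterable, List, Tuple


def stack_pairs(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Flatten (source, target) translation pairs into one list of
    individual sentences; the pairing itself is not kept."""
    sentences: List[str] = []
    for source, target in pairs:
        sentences.append(source)
        sentences.append(target)
    return sentences


# Hypothetical English-Hindi and English-Bengali pairs for illustration.
pairs = [
    ("How are you?", "आप कैसे हैं?"),
    ("Thank you.", "ধন্যবাদ।"),
]
print(stack_pairs(pairs))
```

Each stacked sentence is then an ordinary monolingual string for next-character prediction, so the model never sees a sentence and its translation as one example under this assumption.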

### Training Procedure 

The model was trained on the next-character generation task with a cross-entropy loss.
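
As a rough illustration of that objective, the sketch below computes the usual shift-by-one cross-entropy: the scores at position `t` are scored against the token at `t + 1`. It is written in plain Python so it runs without a framework; `next_token_cross_entropy` is a hypothetical helper, and a real training loop would normally use a framework's built-in cross-entropy instead.

```python
import math
from typing import Sequence


def next_token_cross_entropy(logits: Sequence[Sequence[float]], ids: Sequence[int]) -> float:
    """Mean cross-entropy (in nats) of predicting ids[t + 1] from position t.

    logits[t] holds unnormalised scores over the vocabulary at position t;
    the final position has no next token, so it is ignored.
    """
    if len(ids) < 2:
        raise ValueError("need at least two tokens to form a next-token target")
    total = 0.0
    for t in range(len(ids) - 1):
        row = logits[t]
        target = ids[t + 1]
        # Log-softmax via the log-sum-exp trick for numerical stability.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]
    return total / (len(ids) - 1)


vocab_size = 8  # hypothetical toy vocabulary
ids = [3, 1, 4, 1, 5]
uniform = [[0.0] * vocab_size for _ in ids]
print(next_token_cross_entropy(uniform, ids))  # ≈ math.log(8), a uniform guess
```

A model that spreads probability evenly over the vocabulary scores about ln(vocabulary size); training pushes the loss below that baseline.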

#### Preprocessing

Text was converted to raw UTF-8 bytes with the ByT5-large tokenizer before training.
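
To show what byte-level preprocessing produces, here is a dependency-free sketch that, to our understanding, mirrors the ByT5 tokenizer's convention: ids 0, 1 and 2 are reserved for pad, eos (`</s>`) and unk, each UTF-8 byte `b` maps to id `b + 3`, and eos is appended. The helper names are hypothetical; with `transformers` installed, `AutoTokenizer.from_pretrained("google/byt5-large")` loads the actual tokenizer.

```python
from typing import List

# Assumed ByT5-style special ids; byte ids start right after them.
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
BYTE_OFFSET = 3


def byt5_style_encode(text: str, add_eos: bool = True) -> List[int]:
    """Encode text as UTF-8 bytes shifted past the special-token ids."""
    ids = [b + BYTE_OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def byt5_style_decode(ids: List[int]) -> str:
    """Invert byt5_style_encode, skipping ids that are not byte tokens."""
    raw = bytes(i - BYTE_OFFSET for i in ids if BYTE_OFFSET <= i < BYTE_OFFSET + 256)
    return raw.decode("utf-8", errors="ignore")


print(byt5_style_encode("hello"))                  # [107, 104, 111, 111, 114, 1]
print(len(byt5_style_encode("क", add_eos=False)))  # 3
```

Note that one Devanagari character such as "क" becomes three byte tokens, so "next character" prediction at the token level is, strictly, next-byte prediction for most scripts in Samanantar.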


#### Training Hyperparameters

- **Training regime:**
  - `output_dir="mamba"`
  - `per_device_train_batch_size=1`
  - `per_device_eval_batch_size=1`
  - `num_train_epochs=4`
  - `weight_decay=0.1`
  - `lr_scheduler_type="cosine"`
  - `learning_rate=5e-4`
  - `fp16=False`
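
The hyperparameter names above match keyword arguments of `transformers.TrainingArguments`, so they appear to come from a Hugging Face `Trainer` setup. The sketch below collects them in a plain dict and, as an illustration, computes a cosine learning-rate curve. The card states neither warmup nor the total number of optimizer steps, so `warmup_steps=0` and `total = 1000` are assumptions, and `cosine_lr` is a hypothetical helper rather than the library's scheduler.

```python
import math

# Values from the list above. With transformers installed these could be
# passed as TrainingArguments(**training_kwargs).
training_kwargs = dict(
    output_dir="mamba",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=4,
    weight_decay=0.1,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    fp16=False,
)


def cosine_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup, then half-cosine decay from base_lr down to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total = 1000  # hypothetical run length, for illustration only
for s in (0, 250, 500, 750, 1000):
    print(s, cosine_lr(s, total, training_kwargs["learning_rate"]))
```

Under these assumptions the rate starts at 5e-4, is halved at the midpoint of training, and decays to 0 at the last step.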

## Evaluation

Only a simple cross-entropy loss was used for evaluation, as a check that the pipeline and the model work as intended.
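
Since evaluation reports only cross-entropy, the sketch below shows one common way to aggregate per-batch losses into a single figure and read it as perplexity or bits per token. It assumes losses are mean values in nats (natural log) and weights each batch by its token count; `summarize_eval_loss` and the numbers in the example are hypothetical, not results from this model.

```python
import math
from typing import Dict, Sequence


def summarize_eval_loss(batch_losses: Sequence[float], batch_tokens: Sequence[int]) -> Dict[str, float]:
    """Combine per-batch mean losses (in nats) into one token-weighted figure."""
    if len(batch_losses) != len(batch_tokens) or not batch_tokens:
        raise ValueError("need one token count per batch loss")
    total_tokens = sum(batch_tokens)
    mean_nats = sum(loss * n for loss, n in zip(batch_losses, batch_tokens)) / total_tokens
    return {
        "loss": mean_nats,
        "perplexity": math.exp(mean_nats),
        # Equals bits per byte when every token is a single UTF-8 byte.
        "bits_per_token": mean_nats / math.log(2),
    }


# Hypothetical numbers for illustration only.
print(summarize_eval_loss([1.2, 1.0], [300, 100]))
```

Weighting by token count keeps short batches from counting as much as long ones; an unweighted mean of batch losses is a simpler alternative when batches are roughly equal in size.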


## Model Card Contact

MLsquare