Update README.md
README.md
@@ -213,20 +213,6 @@ print(tokenizer.decode(tokens[0], skip_special_tokens=True))
 </details>
 
 
-### Model Architecture
-
-The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2023](https://arxiv.org/abs/2307.09288)) architecture with the following modifications:
-
-| Parameters    | Hidden Size | Layers | Heads | Sequence Length |
-|---------------|-------------|--------|-------|-----------------|
-| 1,644,417,024 | 2048        | 24     | 32    | 4096            |
-
-* **Position Embeddings**: Rotary Position Embeddings ([Su et al., 2021](https://arxiv.org/abs/2104.09864)) applied to the first 25% of head embedding dimensions for improved throughput following [Black et al. (2022)](https://arxiv.org/pdf/2204.06745.pdf).
-* **Normalization**: LayerNorm ([Ba et al., 2016](https://arxiv.org/abs/1607.06450)) with learned bias terms as opposed to RMSNorm ([Zhang & Sennrich, 2019](https://arxiv.org/abs/1910.07467)).
-* **Biases**: We remove all bias terms from the model except for attention Q,K,V projections ([Bai et al., 2023](https://arxiv.org/abs/2309.16609)).
-* **Tokenizer**: We use Arcade100k, a BPE tokenizer extended from OpenAI's [`tiktoken.cl100k_base`](https://github.com/openai/tiktoken). We split digits into individual tokens following findings by [Liu & Low (2023)](https://arxiv.org/abs/2305.14201).
-
-
 ## Use and Limitations
 
 ### Intended Use
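For anyone wanting to sanity-check the architecture table that this commit removes, a minimal sketch along the following lines can reproduce the numbers from a downloaded checkpoint. It assumes the `transformers` library; the model ID is a placeholder (this diff does not name one), and the `partial_rotary_factor` attribute name is an assumption that may differ between config classes.

```python
# Minimal sketch (not part of this commit): check a checkpoint's config against
# the architecture table removed above. "<model-id>" is a placeholder, and the
# attribute names follow common Hugging Face conventions; adjust as needed.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("<model-id>")  # placeholder, not named in this diff

print(config.hidden_size)              # expected: 2048
print(config.num_hidden_layers)        # expected: 24
print(config.num_attention_heads)      # expected: 32
print(config.max_position_embeddings)  # expected: 4096

# Rotary embeddings cover the first 25% of each head's dimensions:
head_dim = config.hidden_size // config.num_attention_heads  # 2048 // 32 = 64
rotary_pct = getattr(config, "partial_rotary_factor", 0.25)  # attribute name is an assumption
print(int(head_dim * rotary_pct))      # 0.25 * 64 = 16 rotary dimensions per head
```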
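Similarly, the tokenizer bullet's claim that digits are split into individual tokens can be checked in a couple of lines. Again the tokenizer ID is a placeholder, and the expected output shown in the comment is only what the removed text describes, not something verified here.

```python
# Minimal sketch (not part of this commit): inspect how the tokenizer handles
# digits. "<model-id>" is a placeholder; per the removed bullet, Arcade100k
# should split a digit string into one token per digit.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<model-id>")  # placeholder
print(tokenizer.tokenize("12345"))  # expected per the README: one token per digit
```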