Update README.md
README.md
CHANGED
@@ -1,10 +1,79 @@
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
|
5 |
# Multilingual SmolLM2-135M Base Model
|
6 |
|
7 |
-
This SmolLM2 variant is pretrained on a small corpus covering 50 languages, unlike the original English-only models.
|
8 |
With 135M parameters, it serves as a lightweight multilingual autocomplete but is **not instruction-tuned**—additional fine-tuning on your data is recommended.
|
9 |
|
10 |
## Key Points
|
@@ -45,6 +114,9 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
|
45 |
|
46 |
The **bold** text indicates the initial text used for the generation.
|
47 |
|
48 |
**Ara (Latin for "the Altar") is a southern constellation between Scorpius, Telescopium, Triangulum Australe, and Norma. It was (as ) one of the Greek bulk (namely 48) described by the 2nd-century astronomer Ptolemy, and it remains one of the 88 modern constellations designated by the International Astronomical Union.** "The most modern group of the modern universe" is the group of "The Tyball" whose most celebrated galaxy catalogues are the three galaxies from the Milky Way in the Milky Way. The galaxy Milky Way spans the Milky Way around the Milky Way, near where the Milky Way lies near the Milky Way. The Milky Way is also called the Milky Way because the Milky Way is the most widely known galaxies from the Milky Way. There is also a region that does not have the same Milky Way outside Milky Way in the Milky Way, where it appears to be the Milky Way. Milky Way is a very far-away galaxy, the Milky Way does not have the same Milky Way, which is where the Milky Way is.
|
49 |
|
50 |
**Febrer és el mes més curt de l'any, però el 2008 ens ha regalat un dia i m'ha donat temps de llegir tres llibres. El primer molt llarg, però d'aquells que devores, perquè no t'hi pots aturar. L'he tret del Club de Lectura de l'Institut i és de la madrilenya Almudena GRANDES.** 1589 - Tatua, de 1502 - Fardó, d'un llama del seu pont dels gavelles de l'aniversari de Madrid, d'america, l'americana i els galítics i l'americana de Barcelona. El 2002 i els 1510 y que tenen l'india del poulement en els 2010 i la 1527. En 1520 i en 1521. El sistema de los alcaldes més més es-halt l'espat de la llocs de l'espat de l'Esquerra el 2001 - està s'encluye por la lliga, els eures d'aparats i que està s'en formulades en els llocs dels de l'a l'Amar, de 3000, però el 3000.
|
|
|
1 |
---
|
2 |
+
language:
|
3 |
+
- multilingual
|
4 |
+
- ar
|
5 |
+
- az
|
6 |
+
- bg
|
7 |
+
- bn
|
8 |
+
- ca
|
9 |
+
- cs
|
10 |
+
- da
|
11 |
+
- de
|
12 |
+
- el
|
13 |
+
- en
|
14 |
+
- es
|
15 |
+
- et
|
16 |
+
- fa
|
17 |
+
- fi
|
18 |
+
- fr
|
19 |
+
- he
|
20 |
+
- hi
|
21 |
+
- hu
|
22 |
+
- hy
|
23 |
+
- id
|
24 |
+
- is
|
25 |
+
- it
|
26 |
+
- ja
|
27 |
+
- ka
|
28 |
+
- kk
|
29 |
+
- ko
|
30 |
+
- lt
|
31 |
+
- lv
|
32 |
+
- mk
|
33 |
+
- ml
|
34 |
+
- mr
|
35 |
+
- ne
|
36 |
+
- nl
|
37 |
+
- 'no'
|
38 |
+
- pl
|
39 |
+
- pt
|
40 |
+
- ro
|
41 |
+
- ru
|
42 |
+
- sk
|
43 |
+
- sl
|
44 |
+
- sq
|
45 |
+
- sr
|
46 |
+
- sv
|
47 |
+
- ta
|
48 |
+
- th
|
49 |
+
- tr
|
50 |
+
- uk
|
51 |
+
- ur
|
52 |
+
- vi
|
53 |
+
- zh
|
54 |
license: apache-2.0
|
55 |
+
datasets:
|
56 |
+
- agentlans/LinguaNova
|
57 |
+
base_model:
|
58 |
+
- HuggingFaceTB/SmolLM2-135M
|
59 |
+
tags:
|
60 |
+
- multilingual
|
61 |
+
- language-model
|
62 |
+
- small-model
|
63 |
+
- transformer
|
64 |
+
- causal-lm
|
65 |
+
- autocomplete
|
66 |
+
- base-model
|
67 |
+
- research
|
68 |
+
- pretrained
|
69 |
+
- 50-languages
|
70 |
+
- experimental
|
71 |
+
- not-instruction-tuned
|
72 |
---
|
73 |
|
74 |
# Multilingual SmolLM2-135M Base Model
|
75 |
|
76 |
+
This SmolLM2 variant is pretrained on [agentlans/LinguaNova](https://huggingface.co/datasets/agentlans/LinguaNova), a small corpus covering 50 languages, unlike the original English-only models.
|
77 |
With 135M parameters, it serves as a lightweight multilingual autocomplete but is **not instruction-tuned**—additional fine-tuning on your data is recommended.
|
78 |
|
79 |
## Key Points
|
|
|
114 |
|
115 |
The **bold** text indicates the initial text used for the generation.
|
116 |
|
117 |
+
These were autocompleted in [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) using the `min_p` preset.
|
118 |
+
Whenever generation stalled, inference was stopped and the output was regenerated.
|
119 |
+
|
120 |
**Ara (Latin for "the Altar") is a southern constellation between Scorpius, Telescopium, Triangulum Australe, and Norma. It was (as ) one of the Greek bulk (namely 48) described by the 2nd-century astronomer Ptolemy, and it remains one of the 88 modern constellations designated by the International Astronomical Union.** "The most modern group of the modern universe" is the group of "The Tyball" whose most celebrated galaxy catalogues are the three galaxies from the Milky Way in the Milky Way. The galaxy Milky Way spans the Milky Way around the Milky Way, near where the Milky Way lies near the Milky Way. The Milky Way is also called the Milky Way because the Milky Way is the most widely known galaxies from the Milky Way. There is also a region that does not have the same Milky Way outside Milky Way in the Milky Way, where it appears to be the Milky Way. Milky Way is a very far-away galaxy, the Milky Way does not have the same Milky Way, which is where the Milky Way is.
|
121 |
|
122 |
**Febrer és el mes més curt de l'any, però el 2008 ens ha regalat un dia i m'ha donat temps de llegir tres llibres. El primer molt llarg, però d'aquells que devores, perquè no t'hi pots aturar. L'he tret del Club de Lectura de l'Institut i és de la madrilenya Almudena GRANDES.** 1589 - Tatua, de 1502 - Fardó, d'un llama del seu pont dels gavelles de l'aniversari de Madrid, d'america, l'americana i els galítics i l'americana de Barcelona. El 2002 i els 1510 y que tenen l'india del poulement en els 2010 i la 1527. En 1520 i en 1521. El sistema de los alcaldes més més es-halt l'espat de la llocs de l'espat de l'Esquerra el 2001 - està s'encluye por la lliga, els eures d'aparats i que està s'en formulades en els llocs dels de l'a l'Amar, de 3000, però el 3000.
|
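The samples in this diff were generated with the `min_p` preset of text-generation-webui. The README does not state the preset's values, so the following is only a minimal sketch of the general min-p idea: drop every candidate token whose probability falls below `min_p` times the probability of the most likely token, renormalize, and sample from what is left. The function names `min_p_filter` and `sample_min_p`, the default `min_p=0.05`, and the toy token distribution are assumptions made for illustration, not values taken from the README or the web UI.

```python
import random
from typing import Dict, Optional


def min_p_filter(probs: Dict[str, float], min_p: float = 0.05) -> Dict[str, float]:
    """Keep tokens with probability >= min_p * (top probability), then renormalize.

    `probs` maps each candidate token to its probability. The default
    min_p=0.05 is an illustrative assumption, not a documented preset value.
    """
    if not probs:
        raise ValueError("probs must not be empty")
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must lie in [0, 1]")
    top = max(probs.values())
    if top <= 0.0:
        raise ValueError("at least one probability must be positive")
    threshold = min_p * top
    # The most likely token always passes, so `kept` is never empty.
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample_min_p(
    probs: Dict[str, float],
    min_p: float = 0.05,
    rng: Optional[random.Random] = None,
) -> str:
    """Draw one token from the min-p filtered, renormalized distribution."""
    rng = rng or random.Random()
    filtered = min_p_filter(probs, min_p)
    tokens = list(filtered)
    weights = [filtered[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Toy next-token distribution (hypothetical numbers).
    toy = {"Way": 0.6, "galaxy": 0.3, "star": 0.09, "zzz": 0.01}
    print(min_p_filter(toy, min_p=0.1))
    print(sample_min_p(toy, min_p=0.1, rng=random.Random(0)))
```

Because the cutoff scales with the top probability, the filter prunes aggressively when the model is confident and leaves more candidates when the distribution is flat. It does not by itself prevent the repetition loops visible in the samples above.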