agentlans committed on
Commit c43c710 · verified · 1 Parent(s): dc2fd02

Update README.md

Files changed (1)
  1. README.md +73 -1
README.md CHANGED
@@ -1,10 +1,79 @@
 ---
+language:
+- multilingual
+- ar
+- az
+- bg
+- bn
+- ca
+- cs
+- da
+- de
+- el
+- en
+- es
+- et
+- fa
+- fi
+- fr
+- he
+- hi
+- hu
+- hy
+- id
+- is
+- it
+- ja
+- ka
+- kk
+- ko
+- lt
+- lv
+- mk
+- ml
+- mr
+- ne
+- nl
+- 'no'
+- pl
+- pt
+- ro
+- ru
+- sk
+- sl
+- sq
+- sr
+- sv
+- ta
+- th
+- tr
+- uk
+- ur
+- vi
+- zh
 license: apache-2.0
+datasets:
+- agentlans/LinguaNova
+base_model:
+- HuggingFaceTB/SmolLM2-135M
+tags:
+- multilingual
+- language-model
+- small-model
+- transformer
+- causal-lm
+- autocomplete
+- base-model
+- research
+- pretrained
+- 50-languages
+- experimental
+- not-instruction-tuned
 ---
 
 # Multilingual SmolLM2-135M Base Model
 
-This SmolLM2 variant is pretrained on a small corpus covering 50 languages, unlike the original English-only models.
+This SmolLM2 variant is pretrained on [agentlans/LinguaNova](https://huggingface.co/datasets/agentlans/LinguaNova), a small corpus covering 50 languages, unlike the original English-only models.
 With 135M parameters, it serves as a lightweight multilingual autocomplete but is **not instruction-tuned**—additional fine-tuning on your data is recommended.
 
 ## Key Points
@@ -45,6 +114,9 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
 The **bold** text indicates the initial text used for the generation.
 
+These were autocompleted in [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) using the `min_p` preset.
+Whenever the generation stalled, the inference was stopped and regenerated.
+
 **Ara (Latin for "the Altar") is a southern constellation between Scorpius, Telescopium, Triangulum Australe, and Norma. It was (as ) one of the Greek bulk (namely 48) described by the 2nd-century astronomer Ptolemy, and it remains one of the 88 modern constellations designated by the International Astronomical Union.** "The most modern group of the modern universe" is the group of "The Tyball" whose most celebrated galaxy catalogues are the three galaxies from the Milky Way in the Milky Way. The galaxy Milky Way spans the Milky Way around the Milky Way, near where the Milky Way lies near the Milky Way. The Milky Way is also called the Milky Way because the Milky Way is the most widely known galaxies from the Milky Way. There is also a region that does not have the same Milky Way outside Milky Way in the Milky Way, where it appears to be the Milky Way. Milky Way is a very far-away galaxy, the Milky Way does not have the same Milky Way, which is where the Milky Way is.
 
 **Febrer és el mes més curt de l'any, però el 2008 ens ha regalat un dia i m'ha donat temps de llegir tres llibres. El primer molt llarg, però d'aquells que devores, perquè no t'hi pots aturar. L'he tret del Club de Lectura de l'Institut i és de la madrilenya Almudena GRANDES.** 1589 - Tatua, de 1502 - Fardó, d'un llama del seu pont dels gavelles de l'aniversari de Madrid, d'america, l'americana i els galítics i l'americana de Barcelona. El 2002 i els 1510 y que tenen l'india del poulement en els 2010 i la 1527. En 1520 i en 1521. El sistema de los alcaldes més més es-halt l'espat de la llocs de l'espat de l'Esquerra el 2001 - està s'encluye por la lliga, els eures d'aparats i que està s'en formulades en els llocs dels de l'a l'Amar, de 3000, però el 3000.
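The sample generations in this change were produced with text-generation-webui's `min_p` preset. Min-p sampling keeps only the tokens whose probability is at least `min_p` times the probability of the most likely token, renormalizes the survivors, and samples from them. The stdlib-only Python sketch below illustrates that rule. It is not text-generation-webui's code. The function names, the default `min_p=0.05`, and the choice to apply temperature before the cutoff are assumptions for illustration. The preset's actual settings are not given in this change.

```python
import math
import random


def min_p_filter(logits, min_p=0.05, temperature=1.0):
    """Return renormalized token probabilities after a min-p cutoff.

    Hypothetical helper for illustration; applying temperature before
    the cutoff is an assumption, not necessarily what the webui does.
    """
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")

    # Numerically stable softmax over temperature-scaled logits.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep tokens whose probability is at least min_p times the top probability.
    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]

    # Renormalize the survivors (the top token always survives since min_p <= 1).
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample_min_p(logits, min_p=0.05, temperature=1.0, rng=random):
    """Draw one token index from the min-p filtered distribution (hypothetical helper)."""
    probs = min_p_filter(logits, min_p, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -3.0]
    print(min_p_filter(logits, min_p=0.1))
    print(sample_min_p(logits, min_p=0.1, rng=random.Random(0)))
```

For example, with logits `[2.0, 1.0, 0.0, -3.0]` and `min_p=0.1`, the last token (probability about 0.004, below the cutoff of about 0.066) is dropped, and the other three are renormalized. Because the cutoff scales with the top token's probability, the filter prunes aggressively when the model is confident and keeps more candidates when the distribution is flat.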