Files changed (1)
  1. README.md +105 -93
README.md CHANGED
@@ -1,94 +1,106 @@
- ---
- license: mit
- language:
- - pt
- base_model:
- - Qwen/Qwen2.5-0.5B-Instruct
- pipeline_tag: text-generation
- ---
-
- This is Qwen2.5-0.5B-Instruct finetuned to perform the compression of chunks of text.
-
- The goal is to keep the information of each chunk in a RAG system more compressed and easier to read.
-
- The usage of this template is strict
-
- Sample inference:
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "cnmoro/Qwen2.5-0.5B-Chunk-Compressor"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
-
- prompt = """<|im_start|>system
- Você deve compactar textos sem necessidade de legibilidade para humanos, mas mantendo informações essenciais compreensíveis para outro modelo de linguagem.
-
- Regras para compressão:
- Remova palavras desnecessárias como artigos, preposições e pronomes quando possível, desde que a compreensão seja preservada.
- Preserve informações essenciais (nomes, locais, ações, conceitos-chave).
- Reduza expressões complexas mantendo o significado.
- Use listas e separadores para organizar as informações de forma eficiente.
- Remova redundâncias e detalhes secundários que não impactam a compreensão geral.<|im_end|>
- <|im_start|>user
- Texto para compressão:
- <Input>
- Cleaning the toilet is a task that doesn't interest people. Many, however, pray
- for technology that can save them from the unpleasant mission. Apparently, those
- prayers were answered: a group of Chinese scientists developed the concept of a
- self-cleaning toilet and managed to make it a reality. Thanks to 3D printing,
- researchers at Huazhong University of Science and Technology have managed to
- revolutionize the unpleasant household chore. The self-cleaning toilet, known
- as “ARSFT”, an acronym for “abrasion-resistant super slippery toilet flush” — the
- technology that allows automatic cleaning emerged from a complex combination
- of plastic and grains of sand that repel water. In plain English, the technology
- ensures that no substance sticks to the surface. Therefore, in addition to being
- a salvation for many, this can be a more sustainable alternative to conventional
- toilets. The website New Scientist interviewed one of the project's scientists,
- Yike Li, who created the self-cleaning toilet. According to Li, the Chinese used,
- in addition to the combination of plastic and grains of sand, a laser to bring the
- particles together, thus creating the 3D printed self-cleaning toilet. After printing,
- the researchers used silicon oil to lubricate the surface of the toilet, managing
- to penetrate it due to the structure of the model. This generated the toilet's
- self-cleaning capacity, with the following materials leaving no marks after
- flushing: Milk; Yogurt; Honey; Muddy water; Starch gel mixed with porridge.
- Chinese scientists also tested the self-cleaning toilet with synthetic feces,
- using a mixture of miso, yeast, peanut oil and water, managing to imitate human
- excrement. Although it may be strange that scientists work to create toilet technologies,
- several seemingly “unnecessary” innovations can have a major global impact.
- The self-cleaning toilet created by Chinese researchers can considerably reduce water waste.
- According to Chinese scientists, the self-cleaning toilet can withstand a thousand scraping
- cycles thanks to its super slippery capacity. Therefore, the self-cleaning toilet has
- a new flushing method that minimizes water consumption – and waste. The Daily Mail
- points out that, since its invention in the 18th century, although the toilet has
- increased hygiene, a significant amount of water is required due to the adhesion
- between the surface of the toilet and human feces and urine. Worldwide, toilet
- flushes correspond to 141 billion liters of water daily. Therefore, in addition
- to saving a valuable resource for humanity, the self-cleaning toilet also has another
- environmental benefit. In places such as public and chemical bathrooms, especially
- where there is no connection to the sanitation system, the self-cleaning toilet
- appears as an ideal solution.
- </Input><|im_end|>
- <|im_start|>assistant
- Texto comprimido:
- <Output>
- """
-
- inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
- outputs = model.generate(**inputs, max_new_tokens=384, temperature=0.5, do_sample=True)
-
- input_length = inputs.input_ids.shape[1]
- generated_tokens = outputs[0, input_length:]
- generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
-
- # Remove the stop token from the generated text
- generated_text = generated_text.split("</Output>")[0]
-
- print(generated_text)
- # Output text:
- # - Toilet cleaner - China developed self-cleaning toilet technology.
- # - 3D printing - recycled material repels water, prevents sticking.
- # - Self-cleaning toilet - reduces water use, waste, improves hygiene.
- # - Environmental benefits: reduced water usage globally (141 billion liters/day), reduces resource waste.
- # - Public/private bathroom solutions - ideal solution for areas lacking sanitation systems.
  ```
 
+ ---
+ license: mit
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-0.5B-Instruct
+ pipeline_tag: text-generation
+ ---
+
+ This is Qwen2.5-0.5B-Instruct finetuned to perform the compression of chunks of text.
+
+ The goal is to keep the information of each chunk in a RAG system more compressed and easier to read.
+
+ The usage of this template is strict
+
+ Sample inference:
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "cnmoro/Qwen2.5-0.5B-Chunk-Compressor"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
+
+ prompt = """<|im_start|>system
+ Você deve compactar textos sem necessidade de legibilidade para humanos, mas mantendo informações essenciais compreensíveis para outro modelo de linguagem.
+
+ Regras para compressão:
+ Remova palavras desnecessárias como artigos, preposições e pronomes quando possível, desde que a compreensão seja preservada.
+ Preserve informações essenciais (nomes, locais, ações, conceitos-chave).
+ Reduza expressões complexas mantendo o significado.
+ Use listas e separadores para organizar as informações de forma eficiente.
+ Remova redundâncias e detalhes secundários que não impactam a compreensão geral.<|im_end|>
+ <|im_start|>user
+ Texto para compressão:
+ <Input>
+ Cleaning the toilet is a task that doesn't interest people. Many, however, pray
+ for technology that can save them from the unpleasant mission. Apparently, those
+ prayers were answered: a group of Chinese scientists developed the concept of a
+ self-cleaning toilet and managed to make it a reality. Thanks to 3D printing,
+ researchers at Huazhong University of Science and Technology have managed to
+ revolutionize the unpleasant household chore. The self-cleaning toilet, known
+ as “ARSFT”, an acronym for “abrasion-resistant super slippery toilet flush” — the
+ technology that allows automatic cleaning emerged from a complex combination
+ of plastic and grains of sand that repel water. In plain English, the technology
+ ensures that no substance sticks to the surface. Therefore, in addition to being
+ a salvation for many, this can be a more sustainable alternative to conventional
+ toilets. The website New Scientist interviewed one of the project's scientists,
+ Yike Li, who created the self-cleaning toilet. According to Li, the Chinese used,
+ in addition to the combination of plastic and grains of sand, a laser to bring the
+ particles together, thus creating the 3D printed self-cleaning toilet. After printing,
+ the researchers used silicon oil to lubricate the surface of the toilet, managing
+ to penetrate it due to the structure of the model. This generated the toilet's
+ self-cleaning capacity, with the following materials leaving no marks after
+ flushing: Milk; Yogurt; Honey; Muddy water; Starch gel mixed with porridge.
+ Chinese scientists also tested the self-cleaning toilet with synthetic feces,
+ using a mixture of miso, yeast, peanut oil and water, managing to imitate human
+ excrement. Although it may be strange that scientists work to create toilet technologies,
+ several seemingly “unnecessary” innovations can have a major global impact.
+ The self-cleaning toilet created by Chinese researchers can considerably reduce water waste.
+ According to Chinese scientists, the self-cleaning toilet can withstand a thousand scraping
+ cycles thanks to its super slippery capacity. Therefore, the self-cleaning toilet has
+ a new flushing method that minimizes water consumption – and waste. The Daily Mail
+ points out that, since its invention in the 18th century, although the toilet has
+ increased hygiene, a significant amount of water is required due to the adhesion
+ between the surface of the toilet and human feces and urine. Worldwide, toilet
+ flushes correspond to 141 billion liters of water daily. Therefore, in addition
+ to saving a valuable resource for humanity, the self-cleaning toilet also has another
+ environmental benefit. In places such as public and chemical bathrooms, especially
+ where there is no connection to the sanitation system, the self-cleaning toilet
+ appears as an ideal solution.
+ </Input><|im_end|>
+ <|im_start|>assistant
+ Texto comprimido:
+ <Output>
+ """
+
+ inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+ outputs = model.generate(**inputs, max_new_tokens=384, temperature=0.5, do_sample=True)
+
+ input_length = inputs.input_ids.shape[1]
+ generated_tokens = outputs[0, input_length:]
+ generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
+
+ # Remove the stop token from the generated text
+ generated_text = generated_text.split("</Output>")[0]
+
+ print(generated_text)
+ # Output text:
+ # - Toilet cleaner - China developed self-cleaning toilet technology.
+ # - 3D printing - recycled material repels water, prevents sticking.
+ # - Self-cleaning toilet - reduces water use, waste, improves hygiene.
+ # - Environmental benefits: reduced water usage globally (141 billion liters/day), reduces resource waste.
+ # - Public/private bathroom solutions - ideal solution for areas lacking sanitation systems.
  ```
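
Note on the strict template: because the README says the template must be used exactly, it may help to build the prompt from a single function instead of hand-editing the long string for each chunk. The sketch below is one possible way to do that. The names `SYSTEM_PROMPT`, `build_prompt`, and `extract_output` are hypothetical helpers, not part of the model card, and stripping surrounding whitespace from the chunk is an assumption rather than documented behavior.

```python
# Hypothetical helpers (not from the model card) that reproduce the
# README's prompt layout around an arbitrary chunk of text.

# System prompt copied verbatim from the README sample; it stays in
# Portuguese because that is the text shown in the strict template.
SYSTEM_PROMPT = (
    "Você deve compactar textos sem necessidade de legibilidade para humanos, "
    "mas mantendo informações essenciais compreensíveis para outro modelo de linguagem.\n"
    "\n"
    "Regras para compressão:\n"
    "Remova palavras desnecessárias como artigos, preposições e pronomes quando possível, "
    "desde que a compreensão seja preservada.\n"
    "Preserve informações essenciais (nomes, locais, ações, conceitos-chave).\n"
    "Reduza expressões complexas mantendo o significado.\n"
    "Use listas e separadores para organizar as informações de forma eficiente.\n"
    "Remova redundâncias e detalhes secundários que não impactam a compreensão geral."
)


def build_prompt(chunk: str) -> str:
    """Wrap a chunk in the ChatML-style template used by the README sample."""
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        "<|im_start|>user\n"
        "Texto para compressão:\n"
        # Assumption: leading/trailing whitespace in the chunk is not significant.
        f"<Input>\n{chunk.strip()}\n</Input><|im_end|>\n"
        "<|im_start|>assistant\n"
        "Texto comprimido:\n"
        "<Output>\n"
    )


def extract_output(generated_text: str) -> str:
    """Keep only the text before the closing </Output> tag, as the sample does."""
    return generated_text.split("</Output>")[0].strip()
```

With these in place, `build_prompt(chunk)` can be passed to `tokenizer(...)` where the sample uses the literal `prompt` string, and `extract_output(...)` replaces the manual `split("</Output>")` step.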