In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

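# The Llama 2 7B and 13B checkpoints share the same tokenizer; this notebook reuses it for both 7B models below.
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-13b-hf')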

In [2]:
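nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4 bits at load time
    bnb_4bit_quant_type='nf4',              # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matrix multiplications in bfloat16
)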
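
As a rough sanity check on what this configuration buys, the sketch below estimates the weight memory of a 4-bit model with and without double quantization. It assumes the block sizes described in the QLoRA paper (64 weights per block, 256 first-level constants per second-level block) and a parameter count of about 6.74 billion for Llama 2 7B. `nf4_bits_per_param` and `approx_weight_gib` are hypothetical helper names, and the estimate ignores layers that stay unquantized (such as embeddings), so it is closer to a lower bound than a measurement.

```python
def nf4_bits_per_param(double_quant: bool,
                       block_size: int = 64,
                       nested_block_size: int = 256) -> float:
    """Approximate storage cost per weight, in bits, for 4-bit block-wise quantization."""
    weight_bits = 4.0
    if double_quant:
        # One 8-bit constant per block, plus one 32-bit constant
        # shared by `nested_block_size` blocks (assumed sizes).
        overhead = 8 / block_size + 32 / (block_size * nested_block_size)
    else:
        # One 32-bit constant per block.
        overhead = 32 / block_size
    return weight_bits + overhead


def approx_weight_gib(n_params: float, double_quant: bool) -> float:
    """Convert a parameter count into an approximate weight size in GiB."""
    return n_params * nf4_bits_per_param(double_quant) / 8 / 2**30


N_PARAMS_7B = 6.74e9  # assumption: approximate parameter count of Llama 2 7B

for dq in (False, True):
    print(f"double_quant={dq}: ~{approx_weight_gib(N_PARAMS_7B, dq):.2f} GiB")
```

Actual GPU usage will be higher once unquantized layers, activations, and the KV cache are included.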

In [5]:
base_model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf', quantization_config=nf4_config)

In [6]:
fpf_model = AutoModelForCausalLM.from_pretrained('mesolitica/llama-7b-hf-32768-fpf', quantization_config=nf4_config)

In [7]:
import time
from tqdm import tqdm

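# Sampling settings shared by every generation call below.
kwargs = {
    'temperature': 0.9,         # slightly sharpen the next-token distribution
    'max_new_tokens': 256,      # cap on generated tokens (the prompt is not counted)
    'top_p': 0.95,              # nucleus sampling threshold
    'repetition_penalty': 1.0,  # 1.0 means no penalty
    'do_sample': True,          # sample instead of greedy decoding
    'num_beams': 1,             # no beam search
}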
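
To make the effect of `temperature` and `top_p` concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering on a toy set of logits. It is an illustration under simplifying assumptions, not the `transformers` implementation; `apply_temperature`, `top_p_filter`, and the toy logits are made up for this example.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.0, -3.0]  # hypothetical scores for a 4-token vocabulary
probs = apply_temperature(toy_logits, temperature=0.9)
nucleus = top_p_filter(probs, top_p=0.95)
print(nucleus)  # the least likely token is dropped before sampling
```

With these toy values the three most likely tokens survive and the tail token is removed, which is the intended effect of `top_p=0.95`: sampling stays random but cannot pick from the very unlikely tail.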

In [8]:
# Colloquial Malay prompt, roughly "my armpits smell sour".
inputs = tokenizer(['ketiak ak masham'], return_tensors='pt').to('cuda')

In [9]:
# Merge the tokenized inputs (input_ids, attention_mask) with the sampling settings.
generate_kwargs = {**inputs, **kwargs}

## Base 7B

In [10]:
o = base_model.generate(**generate_kwargs)

In [11]:
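# The prompt contains no '[/INST]' marker, so the split is a no-op here and the full decoded text is printed.
print(tokenizer.decode(o[0], skip_special_tokens=True).split('[/INST]')[-1].strip())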

ketiak ak mashamat 17. броја (1282) - Pismo LXXVII
Dodatkowe usytuowanie elementów elementów stropowych 6.d. - wykonanie 6.d.
The aim of this work was to design a new and simple solution for the steel trusses' elements, which could be used in the construction of light-industrial buildings, as well as in the construction of the buildings in general. The goal of the research was to develop a new method of positioning the steel trusses, which would be better adapted to the construction of light-industrial buildings. The design of the new truss, which can be easily placed during the construction process, was prepared in a three-dimensional model, thanks to which the parameters of the truss were determined. As a result, it was found that the trusses have the following parameters: width of the arch: 64 cm and 111 cm, the length of the straight side: 60 cm, 150 cm and 175 cm, the angle of inclination of the arch side: 29, 53 and 68.46 °, the


## Malaysian Llama2 7B 32k

In [12]:
o = fpf_model.generate(**generate_kwargs)

In [13]:
print(tokenizer.decode(o[0], skip_special_tokens=True).split('[/INST]')[-1].strip())

ketiak ak masham. itu ak yg masham..hahahahhahahha..tapi sebenarnya kita takut nak duduk sebelah penyangkut baju..hahahahha! ak penakut dik..hehe..xnk duduk dkt2..ak jenis kena dptkan sesuatu tu dulu baru aku join dgn group..hahahaha! Aku kena cepat. Sbb ak jenis nak melompat..haha..kalau jenis yg suka duduk diam2, boleh la duduk kat penyangkut baju tuu. Tp ak jenis dlm drama, ak yg masham. Tp sebenarnya ak yg penakut..haha.. ak boleh plak kta ak penakut..haha..ak ni jenis kuat melompat..haha..so aku jenis takut lah..haha..pastu klu dh join group, ak suka yg ak kena first..hahahahaha..ak x
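
Both printed outputs begin with the prompt, because `generate` on a decoder-only model returns the prompt tokens followed by the newly generated ones. If only the continuation is wanted, one option is to strip the echoed prompt from the decoded string. The helper below is a minimal sketch; `strip_prompt` is a hypothetical name, and it assumes the decoded text reproduces the prompt exactly, which may not hold if the tokenizer normalizes whitespace.

```python
def strip_prompt(decoded: str, prompt: str) -> str:
    """Return only the continuation when `decoded` starts with `prompt`."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    # Leave the text untouched if the echoed prompt does not match exactly.
    return decoded


example = 'ketiak ak masham. itu ak yg masham'
print(strip_prompt(example, 'ketiak ak masham'))
```

In the notebook this could be applied to the result of `tokenizer.decode(...)` before printing.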


In [14]:
# Colloquial Malay prompt, roughly "prices of goods in Malaysia keep rising, what should we do".
inputs = tokenizer(['harga barang kat malaysia ni semakin naik, apa kita nak buat'], return_tensors='pt').to('cuda')
generate_kwargs = {**inputs, **kwargs}

## Base 7B

In [15]:
o = base_model.generate(**generate_kwargs)

In [16]:
print(tokenizer.decode(o[0], skip_special_tokens=True).split('[/INST]')[-1].strip())

harga barang kat malaysia ni semakin naik, apa kita nak buat..
 насељать, сравнить и привести в пример.
He was appointed a judge in the Supreme Court and later, the Chief Justice of the Supreme Court.
They are all from outside of the country.
He was appointed a judge in the Supreme Court and later, the Chief Justice of the Supreme Court.
It is not the first time that he had resigned in the last 10 years.
He was appointed a judge in the Supreme Court and later, the Chief Justice of the Supreme Court. It is not the first time that he had resigned in the last 10 years.
You have to learn to say that this is not the first time that he had resigned in the last 10 years.
I was appointed a judge in the Supreme Court and later, the Chief Justice of the Supreme Court.
It is not the first time that I had resigned in the last 10 years.
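
The base model's continuation above loops over the same few sentences. One rough way to put a number on that is a distinct-n ratio: unique n-grams divided by total n-grams, where lower values mean more repetition. The sketch below is an assumption-laden illustration, not part of the original notebook; `distinct_n` is a hypothetical helper and it splits on whitespace rather than using the model tokenizer.

```python
def distinct_n(text: str, n: int = 2) -> float:
    """Share of unique n-grams among all n-grams in whitespace-split text."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n tokens
    return len(set(ngrams)) / len(ngrams)


looped = "He was appointed a judge. " * 3
varied = "They are all from outside of the country."
print(distinct_n(looped), distinct_n(varied))
```

Running the same function on each model's decoded output gives a quick, if crude, comparison of how repetitive the two continuations are.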


## Malaysian Llama2 7B 32k

In [21]:
o = fpf_model.generate(**generate_kwargs)

In [22]:
print(tokenizer.decode(o[0], skip_special_tokens=True).split('[/INST]')[-1].strip())

harga barang kat malaysia ni semakin naik, apa kita nak buat? mula bisnes dari rumah., jual barang tak banyak pun, cukup buat kita makan. cuma kita kena keluar modal sikit untuk dapatkan bekalan. tapi sekali keluar modal, selepas tu, tak perlu keluar modal lagi. jual kepada rakan2, keluarga, atau jual di rumah sebagai contoh. tak rugi pun, dapat jual dapat untung. syarat kena ada modal la. sbb kita tak boleh nak untung banyak. kita nak jual sikit pun dah cukup, yang penting kita tak perlu bayar utk ambil barang. lagi satu sekali, kita nak keluarkan modal, kita kena tau mana nak beli barang yang kita nak jual tu. jangan beli kat pasaraya, jangan beli kat pembekal yang mahal, beli je kat mana-mana y
