Robert Agee
RobAgrees
AI & ML interests
None yet
Recent Activity
reacted to ProCreations's post (19 days ago):
Eyyyy 50 followers
new activity in google/gemma-3n-E4B-it-litert-preview (21 days ago): Driver Code or GemmaCPP support?
new activity in w4r10ck/SOLAR-10.7B-Instruct-v1.0-uncensored (21 days ago): Adding Evaluation Results
Organizations
None yet
RobAgrees's activity
reacted to ProCreations's post (19 days ago)
Driver Code or GemmaCPP support? (#7, opened 25 days ago by yoyou446)
Adding Evaluation Results (#4, opened 8 months ago by leaderboard-pr-bot)

This thing is hardly evil at all (#1, opened 22 days ago by RobAgrees)
Is video generation broken? (#2, opened 29 days ago by RobAgrees)
reacted to codys12's post (29 days ago):
Introducing bitnet-r1-llama-8b and bitnet-r1-qwen-32b preview! These models are the first successful sub-1-billion-token finetunes to the BitNet architecture. We discovered that by adding an additional input RMSNorm to each linear, you can finetune directly to BitNet with fast convergence to the original model's performance!
We are working on a pull request to use this extra RMS for any model.
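For intuition, here is a rough, hypothetical PyTorch sketch of what "an input RMSNorm on every linear, with ternary weights" can look like; this is not the fork's actual implementation, and names like BitLinearWithInputNorm are made up for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Standard RMSNorm: scale the input by the reciprocal of its root-mean-square.
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class BitLinearWithInputNorm(nn.Module):
    # Hypothetical drop-in for nn.Linear: normalize the input first, then apply a
    # ternary ({-1, 0, 1} * scale) quantized weight with a straight-through
    # estimator so the full-precision weights keep receiving gradients.
    def __init__(self, in_features, out_features, bias=False):
        super().__init__()
        self.norm = RMSNorm(in_features)
        self.linear = nn.Linear(in_features, out_features, bias=bias)

    def forward(self, x):
        x = self.norm(x)  # the extra per-linear input RMSNorm described above
        w = self.linear.weight
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1) * scale
        w_q = w + (w_q - w).detach()  # straight-through gradient to w
        return F.linear(x, w_q, self.linear.bias)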
To test these models now, install this fork of transformers:
pip install git+https://github.com/Codys12/transformers.git
Then load the models and test:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the BitNet-finetuned checkpoint onto the GPU
model_id = "codys12/bitnet-r1-qwen-32b"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
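The snippet above only loads the model; a minimal smoke test, assuming the fork keeps the standard transformers generate API (the prompt here is just an example), could be:

prompt = "Explain the BitNet architecture in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))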
bitnet-r1-llama-8b and bitnet-r1-qwen-32b were trained on ~300M and ~200M tokens of the open-thoughts/OpenThoughts-114k dataset respectively, and were still improving significantly at the end of training. This preview simply demonstrates that the concept works; for future training runs we will leave the lm_head unquantized and align the last hidden state with the original model.
Huge thanks to the team that made this possible:
Gavin Childress, Aaron Herbst, Gavin Jones, Jasdeep Singh, Eli Vang, and Keagan Weinstock from the MSOE AI Club.
When will it be available for the open source community? (#1, opened 7 months ago by arpitsh018)

IllusionDiffusion (5.21k): Generate stunning high quality illusion artwork
Stable Diffusion XL on TPUv5e (1.98k): Generate images from text prompts with various styles
Playground V2.5 (64)
Playground V2.5 (1.11k): Generate highly aesthetic images
Real-Time Text-to-Image SDXL Lightning (598): Real-Time Image Generation with SDXL Lightning