The training for SnowflakeCore-G1-1B and 7B will be redone, because I have now implemented DeepSpeed and managed to use two GPUs.
:D
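For reference, DeepSpeed plugs straight into the Hugging Face Trainer; the sketch below shows the general shape (the ZeRO stage 2 values, output path, and script name are just illustrative, not the exact SnowflakeCore setup):
from transformers import TrainingArguments

# Illustrative ZeRO stage 2 config; "auto" lets the Trainer fill values in
# from its own arguments. Not the exact SnowflakeCore settings.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="./snowflake-pretrain",  # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    deepspeed=ds_config,  # also accepts a path to a JSON config file
)

# Launched across both GPUs with the DeepSpeed launcher, e.g.:
#   deepspeed --num_gpus=2 pretrain.py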
hf: a faster, friendlier Hugging Face CLI ✨
hf auth login: easier to type and remember?
update: pre-training the model would need at least 300GB of RAM/VRAM.
No. I think.
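For anyone wondering where the 300GB estimate comes from, here is the rough arithmetic for full training with Adam in mixed precision (the 7B parameter count is the planned size; activations and any ZeRO sharding are ignored, so treat this as a floor for the model state only):
# Rule-of-thumb memory for model state during full training with Adam
# in mixed precision, per parameter:
#   2 B (bf16 weights) + 2 B (bf16 grads)
#   + 4 B (fp32 master weights) + 4 B + 4 B (Adam moments) = 16 B
# Activations, workspace buffers and framework overhead come on top.

def model_state_gb(n_params: float, bytes_per_param: int = 16) -> float:
    return n_params * bytes_per_param / 1024**3

for name, n in [("G1-Tiny (~356M)", 356e6), ("G1-1B", 1e9), ("G1-7B", 7e9)]:
    print(f"{name}: ~{model_state_gb(n):.0f} GB of model state")
# G1-7B lands around ~104 GB before activations; long contexts and
# bigger batches push the real total a lot higher.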
Hi :)
Hey! I did manage to fine-tune the model after all.
import os
import argparse

import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    Trainer,
    TrainingArguments,
)

# === Disable W&B logging ===
os.environ["WANDB_DISABLED"] = "true"

# === Config ===
config = {
    "model_name": "FlameF0X/SnowflakeCore-G1-Tiny",
    "output_dir": "./snowflake-chatbot",
    "context_window": 512,
    "per_device_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "max_steps": 500,
    "dataloader_workers": 4,
    "dataset_name": "tatsu-lab/alpaca",
    "dataset_split": "train[:10000]",
}

# === Derived ===
config["effective_batch_size"] = (
    config["per_device_batch_size"] * config["gradient_accumulation_steps"]
)
print(f"Effective batch size: {config['effective_batch_size']}")
print(f"Context window: {config['context_window']}")


# === 1. Load tokenizer and model ===
def load_model_and_tokenizer(config):
    print(f"Loading model and tokenizer from {config['model_name']}...")
    tokenizer = AutoTokenizer.from_pretrained(
        config["model_name"],
        trust_remote_code=True,
        force_download=True,
        use_safetensors=True,
        model_max_length=config["context_window"],
    )
    # Guard: some causal-LM tokenizers ship without a pad token,
    # which would break padding="max_length" during tokenization.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        config["model_name"],
        trust_remote_code=True,
        force_download=True,
        use_safetensors=True,
    )
    # Optional speed-up on PyTorch 2.x; fall back silently if it fails.
    if hasattr(torch, "compile"):
        try:
            print("Compiling model with torch.compile...")
            model = torch.compile(model)
        except Exception as e:
            print(f"Compilation failed: {e}")
    return tokenizer, model


# === 2. Load dataset ===
def load_custom_dataset(name, split):
    print(f"Loading dataset: {name} ({split})...")
    return load_dataset(name, split=split)


# === 3. Format dataset ===
def format_example(example):
    """Update this function to work with different datasets."""
    return {
        "text": (
            f"### Instruction:\n{example['instruction']}\n"
            f"### Input:\n{example['input']}\n"
            f"### Response:\n{example['output']}"
        )
    }


# === 4. Tokenize ===
def tokenize_example(example, tokenizer, max_length):
    tokens = tokenizer(
        example["text"],
        truncation=True,
        padding="max_length",
        max_length=max_length,
    )
    # Causal LM: labels are a copy of the input ids.
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens


# === 5. Train ===
def train_model(model, tokenizer, tokenized_dataset, config):
    print("Preparing training arguments...")
    training_args = TrainingArguments(
        output_dir=config["output_dir"],
        per_device_train_batch_size=config["per_device_batch_size"],
        gradient_accumulation_steps=config["gradient_accumulation_steps"],
        max_steps=config["max_steps"],
        logging_dir="./logs",
        logging_steps=20,
        save_strategy="no",
        # Prefer bf16 where the GPU supports it, otherwise fp16.
        fp16=torch.cuda.is_available() and not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_available() and torch.cuda.is_bf16_supported(),
        overwrite_output_dir=True,
        report_to=[],
        dataloader_num_workers=config["dataloader_workers"],
        optim="adamw_torch_fused" if torch.cuda.is_available() and hasattr(torch, "compile") else "adamw_torch",
        remove_unused_columns=False,
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
    )
    print("Starting training...")
    trainer.train()
    print("Training completed.")


# === 6. Save ===
def save_model(model, tokenizer, output_dir):
    print(f"Saving model to {output_dir}...")
    model.save_pretrained(output_dir, safe_serialization=False)
    tokenizer.save_pretrained(output_dir)
    print("✅ Model saved.")


# === Main ===
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", type=str, default=config["dataset_name"])
    parser.add_argument("--split", type=str, default=config["dataset_split"])
    args = parser.parse_args()

    tokenizer, model = load_model_and_tokenizer(config)
    dataset = load_custom_dataset(args.dataset, args.split)

    print("Formatting dataset...")
    dataset = dataset.map(
        format_example,
        num_proc=config["dataloader_workers"],
        load_from_cache_file=False,
    )

    print("Tokenizing dataset...")
    tokenized = dataset.map(
        lambda x: tokenize_example(x, tokenizer, config["context_window"]),
        batched=True,
        num_proc=config["dataloader_workers"],
        load_from_cache_file=False,
    )
    tokenized.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])

    train_model(model, tokenizer, tokenized, config)
    save_model(model, tokenizer, config["output_dir"])


if __name__ == "__main__":
    main()
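Once the script has saved the checkpoint, trying it out is straightforward; this is a minimal generation sketch (it assumes the same output_dir as above, follows the Alpaca-style template from format_example, and the instruction text and sampling settings are just examples):
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

output_dir = "./snowflake-chatbot"  # same path the training script saves to
tokenizer = AutoTokenizer.from_pretrained(output_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(output_dir, trust_remote_code=True)
model.eval()

# Same Alpaca-style template used during fine-tuning.
prompt = (
    "### Instruction:\nExplain what DeepSpeed is in one sentence.\n"
    "### Input:\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))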
Hello kivenemi! Thanks for waiting! The first metrics from SnowflakeCore-G1-Tiny (~355.87M params) are in.
The model is still under development, so the next versions should perform better.
You can find the full benchmark at FlameF0X/SnowflakeCore-G1-Benchmark.