sometimesanotion

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Organizations

Hugging Face Discord Community

sometimesanotion's activity

reacted to sequelbox's post with 🔥 about 5 hours ago
EARLY SNEAK PREVIEW: get a first look at the Celestia 3 science-reasoning dataset, built with DeepSeek's newest R1-0528 reasoning model! Subjects include physics, chemistry, biology, computer science, Earth science, astronomy, and information theory.

This early look contains the first 14k rows, all synthetic responses using deepseek-ai/DeepSeek-R1-0528.

SEE IT HERE: sequelbox/Celestia3-DeepSeek-R1-0528-PREVIEW

Support our releases: sequelbox/SupportOpenSource

Coming up, we'll have more dataset releases, including some novel reasoning and analysis methods - we think an important role for open-source researchers is experimenting with new response styles on top of the increasingly excellent base models available to finetune.

more to come soon!
replied to CultriX's post 2 days ago

Now imagine this as a hashtag generator, so that a RAG search can find great context. :)

replied to CultriX's post 2 days ago

Neat! I've transitioned from wanting more from a model's one-shot answers to breaking things down and walking through the problem with cached context. This effectively means simulating most of the thinking block, but through tool use and RAG.

I'm happily using our models from months ago to do it. If anything, even Lamarck 0.7's use of thinking blocks is a bit much. I'm using Lamarck 0.7 Fusion (my best GPQA model, though it didn't break your record; it's best used where modest IFEVAL isn't a blocker) and /nothink with ValiantLabs' Qwen3 models in concert.
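A minimal sketch of that workflow, assuming a local OpenAI-compatible endpoint (llama.cpp, Ollama, or similar) serving a Qwen3-style model, and a hypothetical retrieve() helper standing in for the RAG/cache layer; the endpoint URL and model name are placeholders:

```python
# Sketch only: skip the thinking block and lean on retrieved context instead.
# The endpoint, model name, and retrieve() helper below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def retrieve(query: str) -> str:
    """Hypothetical lookup against the local RAG/CAG store."""
    return "...cached passages relevant to the query..."

def answer(question: str) -> str:
    context = retrieve(question)
    messages = [
        # "/nothink" is the soft switch mentioned above for Qwen3-style models.
        {"role": "system", "content": "/nothink Answer concisely, grounded in the context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    reply = client.chat.completions.create(model="qwen3-14b", messages=messages)
    return reply.choices[0].message.content
```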

I suspect I'll try some merges soon to give this toolchain better models, leaderboard or no leaderboard!

replied to sequelbox's post 7 days ago

I've been using Esper3 8B and 14B for first-pass code review. I am quite pleased.

Have you considered fine-tuning a 1.7B or smaller model for autocomplete?

replied to CultriX's post 7 days ago

I've been thinking a lot about using small caches of embeddings for local RAG lately. Have you considered an HTTP caching proxy like Squid as a low-impact source? It would retrieve what a user is reading anyway, and what's in their field of interest. A browser extension to signal some limited ingestion when a page is bookmarked might fit a lot of use cases.
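A rough sketch of that bookmark-triggered ingestion idea, assuming a hypothetical browser extension POSTs the bookmarked URL to a small local service; the route, port, and embedding model are illustrative:

```python
# Sketch only: a local ingestion endpoint a browser extension could call on bookmark.
# Flask, requests, BeautifulSoup, and sentence-transformers are assumed to be installed.
from flask import Flask, request, jsonify
import requests
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer

app = Flask(__name__)
embedder = SentenceTransformer("all-MiniLM-L6-v2")
store = []  # stand-in for a real local vector store

@app.route("/ingest", methods=["POST"])
def ingest():
    url = request.json["url"]
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
    chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]
    vectors = embedder.encode(chunks)
    store.extend(zip(chunks, vectors))
    return jsonify({"url": url, "chunks_indexed": len(chunks)})

if __name__ == "__main__":
    app.run(port=5005)
```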

For many reasons, smart management of context windows is my top priority with AI now!

posted an update about 1 month ago
The capabilities of the new Qwen 3 models are fascinating, and I am watching that space!

My experience, however, is that context management is vastly more important with them. If you use a client with a typical session log with rolling compression, a Qwen 3 model will start to generate the same messages over and over. I don't think that detracts from them. They're optimized for a more advanced MCP environment. I honestly think the 8B is optimal for home use, given proper RAG/CAG.
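For readers unfamiliar with the pattern, here is a minimal sketch of a rolling-compression session log (helper names hypothetical); the observation above is that models fed this kind of lossy summary tend to loop, which is why fresher retrieved context matters more:

```python
# Sketch only: rolling compression of a session log, as described above.
# Older turns are collapsed into a summary; only the last few stay verbatim.
KEEP_VERBATIM = 6  # recent messages kept untouched

def summarize(messages, llm):
    """Hypothetical call asking the model for a short summary of older turns."""
    text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return llm(f"Summarize this conversation in a few sentences:\n{text}")

def compress_history(history, llm):
    if len(history) <= KEEP_VERBATIM:
        return history
    old, recent = history[:-KEEP_VERBATIM], history[-KEEP_VERBATIM:]
    summary = {"role": "system", "content": "Earlier context: " + summarize(old, llm)}
    return [summary] + recent
```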

In typical session chats, Lamarck and Chocolatine are still my daily drivers. I worked hard to give Lamarck v0.7 a sprinkling of CoT from both DRT and DeepSeek R1. While those models got surpassed on the leaderboards, in practice I still really enjoy their output.

My projects are focusing on application and context management, because that's where the payoff in improved quality is right now. But should a mix of finetunes come along that makes just the right blend, my recipes are standing by.
replied to their post 3 months ago

@Inschrift-Spruch-Raum, I am looking through recent PRs to mergekit, and I am optimistic that Lamarck's recipes will be working again soon!

When that happens, there will be two efforts: one to make a compelling non-CoT model, and another to blend CoT in right amounts.

Lamarck's multilingual capabilities improved noticeably from light influence of Krystalan/DRT-14B in v0.6, and merging from other CoT models like DeepSeek R1 is a matter of careful moderation. I will always put the overall apparent quality of translation, prose, and reasoning first.

replied to their post 3 months ago

No worries! See, I agree, the recipe behind Lamarck is pretty good, and there's a lot more to get out of it. It'll likely depend on getting multiple mergekit versions working in the pipeline. The new mergekit's fusion and sce merges offer some interesting potential, but I use fine-grained sliced merges to control the mix of branches, which, last I checked, work only with older mergekit and bitsandbytes.
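For readers who haven't seen one, a minimal sketch of the kind of fine-grained sliced merge being described, driven through mergekit's CLI from Python; the model names, layer ranges, and interpolation weight are placeholders, and config keys can differ between mergekit versions:

```python
# Sketch only: a sliced SLERP merge run through mergekit's CLI.
# Model names, layer ranges, and the interpolation weight are illustrative.
import pathlib
import subprocess

config = """\
slices:
  - sources:
      - model: sometimesanotion/branch-A   # placeholder branch
        layer_range: [0, 12]
      - model: sometimesanotion/branch-B   # placeholder branch
        layer_range: [0, 12]
  - sources:
      - model: sometimesanotion/branch-A
        layer_range: [12, 48]
      - model: sometimesanotion/branch-B
        layer_range: [12, 48]
merge_method: slerp
base_model: sometimesanotion/branch-A
parameters:
  t: 0.35          # varied per slice/branch in practice
dtype: bfloat16
"""

pathlib.Path("merge-config.yaml").write_text(config)
subprocess.run(["mergekit-yaml", "merge-config.yaml", "merged-model"], check=True)
```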

By now there are ample upgrades to try. I did feel Lamarck v0.7 was a proof-of-concept and had plenty of headroom to grow!

replied to their post 3 months ago

The numbers are in! The results are fascinating.
[Screenshot of benchmark results: Screenshot_20250225_044936.webp]

Though IFEVAL skewed low compared to the ancestor model's average, and Lamarckvergence's improved MATH didn't come through, this model is strong in several ways. The GPQA score suggests as much. These are scores I'm pretty sure I can improve without giving up much of the interesting gains.

What's more, my subjective impression is that its prose and consistency get a boost from Chocolatine. @jpacifico, I think arcee_fusion is a merge method that has a lot to offer for your future base models! This also bodes very well for the next several merges to come.

replied to csabakecskemeti's post 3 months ago

I've been doing all my LoRA work on AMD hardware with Linux; I'm looking forward to your notes! I sometimes still do it on CPU because it's easy to renice the task priority so the foreground tasks stay snappy.
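A tiny sketch of that renice trick, assuming a normal Python training script; lowering the process priority at the top keeps foreground work responsive during a CPU run:

```python
# Sketch only: drop this training process to the lowest scheduling priority
# so the desktop stays snappy during a CPU LoRA run.
import os

os.nice(19)  # same effect as launching with `nice -n 19 python train.py`

# ...the usual LoRA setup (datasets, peft.LoraConfig, Trainer) follows here...
```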

The main challenge I have is keeping a solid ROCm bitsandbytes install when other packages want updates.

reacted to csabakecskemeti's post with 🚀 3 months ago
Testing training on AMD/ROCm for the first time!

I've got my hands on an AMD Instinct MI100. It's about the same price used as a V100, but on paper it has more TOPS (V100: 14 TOPS vs. MI100: 23 TOPS), and the HBM has a faster clock, so the memory bandwidth is 1.2 TB/s.
For quantized inference it's a beast (the MI50 was also surprisingly fast).

For LoRA training in this quick test I could not make the bnb config work, so I'm running the fine-tune on the full-size model.

Will share all the install, setup, and settings I've learned in a blog post, together with the cooling shroud 3D design.
replied to their post 3 months ago
posted an update 3 months ago
I'd like to draw your attention to a Lamarck-based experiment which uses Arcee AI's newly published arcee_fusion merge method for three out of its four merges. Yes, just four. This is a simple one, and its recipe is fully open:

sometimesanotion/Lamarck-14B-v0.7-Fusion

It unifies three branches, all of which feature models that bring Lamarck-14B-v0.7 and Qwenvergence-14B-v12-Prose together. One side features @jpacifico's jpacifico/Chocolatine-2-14B-Instruct-v2.0.3, and the other features @suayptalha's suayptalha/Lamarckvergence-14B paired with my models that were their merge ancestors.

A fusion merge - of a fusion merge and a SLERP of a fusion and an older merge - should demonstrate the new merge method's behavior in interesting ways, especially in the first quarter of the model, where the SLERP has less impact.

I welcome you to kick the tires and learn from it. It has prose quality near Qwenvergence v12's - as you'd expect.

Thank you, @mradermacher and @MaziyarPanahi, for the first-day quantizations! Your work helped get me started. https://huggingface.co/models?other=base_model:quantized:sometimesanotion/Lamarck-14B-v0.7-Fusion
replied to jjokah's post 3 months ago

Right-sizing language models is something I'm really here for. I find it more scalable to have a 1.5B-parameter model front simple questions from a backing RAG source that a larger model gradually works on. Classic information sources and stores can be QA'd, and they don't have such huge energy footprints.
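A minimal sketch of that right-sizing pattern, assuming two OpenAI-compatible endpoints (a local ~1.5B front model and a larger backing model) and a hypothetical retrieve() helper; the escalation check is deliberately naive:

```python
# Sketch only: a small model fronts questions over RAG context and escalates
# to a larger model when it can't answer. Endpoints and names are placeholders.
from openai import OpenAI

small = OpenAI(base_url="http://localhost:8001/v1", api_key="none")  # ~1.5B model
large = OpenAI(base_url="http://localhost:8002/v1", api_key="none")  # larger model

def retrieve(question: str) -> str:
    """Hypothetical lookup against the classic, auditable information store."""
    return "...retrieved passages..."

def ask(question: str) -> str:
    prompt = (
        "Answer from the context, or reply exactly ESCALATE if unsure.\n"
        f"Context:\n{retrieve(question)}\n\nQuestion: {question}"
    )
    first = small.chat.completions.create(
        model="small-1.5b", messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content
    if "ESCALATE" not in first:
        return first
    return large.chat.completions.create(
        model="larger-model", messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content
```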

AI will work out better if we give humans, classic code, SLMs, and frontier LLMs the roles they're right-sized for, and ensure data privacy and individual dignity at every stage of the contract.

reacted to jjokah's post with 👍 3 months ago
The past few years have been a blast for artificial intelligence, with large language models (LLMs) stunning everyone with their capabilities and powering everything from chatbots to code assistants. However, not all applications demand the massive size and complexity of LLMs; the computational power required makes them impractical for many use cases. This is why Small Language Models (SLMs) entered the scene, making powerful AI models more accessible by shrinking them in size.

In this article we went through what SLMs are, how they are made small, their benefits and limitations, real-world use cases, and how they can be used on mobile and desktop devices.
https://huggingface.co/blog/jjokah/small-language-model
replied to their post 4 months ago
posted an update 4 months ago
I am really pleased to see jpacifico/Chocolatine-2-14B-Instruct-v2.0.3 take #4 in the 14B segment of the Open LLM leaderboard. It is a fine-tune of a merge of Arcee's arcee-ai/Virtuoso-Small-v2 with my sometimesanotion/Lamarck-14B-v0.7 and sometimesanotion/Qwenvergence-14B-v12-Prose-DS. Don't let the numbers fool you; in its element, it's quite smooth. I really enjoy merges of Lamarck with near siblings like this one.

Don't be surprised when it's challenging to bring the full reasoning strength of a reasoning-heavy prose model like Qwenvergence v12-DS into a high-IFEVAL model like Lamarck or Virtuoso Small v2. That's a lot of work to get right, because IFEVAL, precise reasoning, and prose quality are often in tension with each other. Gaining as much as this did is really respectable, and fine-tuning it makes it a more stable base for the coming iterations.
reacted to sequelbox's post with ➕ 4 months ago
reacted to CultriX's post with 🔥 4 months ago
# Multi-Agent Collaboration for Coding Tasks - Updated Space!

This version does not rely on AutoGen.
The user simply enters his OPENAI_API_KEY and a task, and the Space goes to work, employing:
- 1. a prompt-enhancer agent,
- 2. an orchestrator agent,
- 3. a coder agent,
- 4. a code-reviewing agent, and
- 5. a code documentation generator agent.

See the Space below for an example workflow image:

CultriX/MultiAgent-CodeTask
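A hedged sketch of how a sequential pipeline like the one described might be wired with the OpenAI Python client; the role prompts, model name, and structure are illustrative, not the Space's actual implementation:

```python
# Sketch only: a sequential multi-agent coding pipeline in the spirit described above.
# Prompts, model choice, and structure are illustrative, not the Space's code.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

ROLES = [
    ("prompt enhancer", "Rewrite the task as a precise, unambiguous specification."),
    ("orchestrator", "Break the specification into ordered implementation steps."),
    ("coder", "Write code for the steps. Return code only."),
    ("code reviewer", "Review the code for bugs and style; return a corrected version."),
    ("documenter", "Add docstrings and a short usage note to the final code."),
]

def run_agent(role: str, instructions: str, payload: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"You are the {role} agent. {instructions}"},
            {"role": "user", "content": payload},
        ],
    )
    return response.choices[0].message.content

def pipeline(task: str) -> str:
    artifact = task
    for role, instructions in ROLES:
        artifact = run_agent(role, instructions, artifact)
    return artifact

print(pipeline("Write a CLI tool that deduplicates lines in a text file."))
```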