sometimesanotion

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Recent Activity

replied to CultriX's post 1 minute ago
Script for QA-style dataset generation from custom data: Transform Your Personal Data into High-Quality Training Datasets with help from an LLM.

Inspired by a Reddit post (link below), I've created a script that converts custom documents into question-answer pairs for LLM fine-tuning.

What it does:
1. Splits the input data into chunks (note: this is important, more below!)
2. QA generation: creates contextually relevant question-answer pairs from each chunk.
3. Quality assurance: validates outputs using both rule-based filters and LLM judges.
4. Exports datasets in both CSV and JSON formats.

Key features:
- Separate model configurations for generation and evaluation
- Configurable chunk sizes and question length
- Multi-language support (English and Dutch, but easy to add your own!)
- Local and cloud API compatibility

Quick start: place your documents (.txt for now) in an input folder and run:

```
python generate-rag-qav4.py \
  --input-dir ./rag-input/ \
  --output-dir ./rag-output/ \
  --output-filename finetuning_qa_dataset \
  --gen-model google/gemma-3-4b \
  --gen-api-base http://127.0.0.1:1234/v1 \
  --judge-model google/gemma-3-4b \
  --judge-api-base http://127.0.0.1:1234/v1 \
  --min-chunk-len 200 \
  --question-chars 20 \
  --answer-chars 5 \
  --lang en
```

Pro tip: the --min-chunk-len parameter is critical. Too short (< 150 chars) and questions lack context; too long (> 1000 chars) and the model struggles with focus. Start with 200-400 characters and adjust based on your content type!

Use cases:
- Personal knowledge base fine-tuning
- Domain-specific QA dataset creation
- RAG system training data preparation

Note: the script includes comprehensive error handling and progress tracking, and allows resuming should the process get interrupted.

Note 2: the original Reddit post that gave me the idea: https://www.reddit.com/r/LocalLLaMA/s/avkdzk8NSn

The script can be found here: https://gist.github.com/CultriX-Github/9d53565214d56b12b9002a56230d1c00
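For a sense of the chunk-then-generate loop the post describes, here is a minimal sketch assuming an OpenAI-compatible local endpoint and the openai Python package; the helper names and the prompt are illustrative, not the gist's actual code:

```
# Sketch of the chunk -> QA-pair loop described above. Assumptions:
# an OpenAI-compatible endpoint (e.g. a local server on :1234) and the
# `openai` package. Helper names are illustrative, not the gist's code.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="not-needed")
MIN_CHUNK_LEN = 200  # mirrors --min-chunk-len: shorter chunks lack context

def chunk_text(text: str, target: int = 400) -> list[str]:
    """Greedily pack paragraphs into roughly target-sized chunks."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if len(current) + len(para) > target and len(current) >= MIN_CHUNK_LEN:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if len(current.strip()) >= MIN_CHUNK_LEN:
        chunks.append(current.strip())
    return chunks

def generate_qa(chunk: str, model: str = "google/gemma-3-4b") -> str:
    """Ask the generation model for one QA pair grounded in the chunk."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": "Write one question and its answer based only "
                              "on this text:\n\n" + chunk}],
    )
    return resp.choices[0].message.content
```

A second pass with the judge model would then score each pair before export, matching the post's quality-assurance step.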
liked a model 5 days ago
MaatAI/Seshat-Qwen3-8B

Organizations

Hugging Face Discord Community

Posts 6

The capabilities of the new Qwen 3 models are fascinating, and I am watching that space!

My experience, however, is that context management is vastly more important with them. If you use a client with a typical session log with rolling compression, a Qwen 3 model will start to generate the same messages over and over. I don't think that detracts from them. They're optimized for a more advanced MCP environment. I honestly think the 8B is optimal for home use, given proper RAG/CAG.
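To make "rolling compression" concrete, here's a toy sketch of the pattern I mean; the names are illustrative, not any particular client's code:

```
# Toy illustration of rolling compression: keep the system prompt,
# collapse older turns into a summary, keep recent turns verbatim.
def compress_history(messages: list[dict], keep_recent: int = 6) -> list[dict]:
    system, turns = messages[:1], messages[1:]
    if len(turns) <= keep_recent:
        return messages
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = {"role": "system",
               "content": "Summary of earlier turns: " + summarize(older)}
    return system + [summary] + recent

def summarize(turns: list[dict]) -> str:
    # Placeholder: a real client would prompt a model for this summary.
    return " / ".join(t["content"][:60] for t in turns)
```

Once distinct turns collapse into one summary, the model loses the signal that it already said something, which seems to be where the repetition comes from.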

In typical session chats, Lamarck and Chocolatine are still my daily drivers. I worked hard to give Lamarck v0.7 a sprinkling of CoT from both DRT and DeepSeek R1. While those models have since been surpassed on the leaderboards, in practice I still really enjoy their output.

My projects are focusing on application and context management, because that's where the payoff in improved quality is right now. But should just the right mix of finetunes be called for, my recipes are standing by.

I'd like to draw your attention to a Lamarck-based experiment which uses Arcee AI's newly published arcee_fusion merge method for three out of its four merges. Yes, just four. This is a simple one, and its recipe is fully open:

sometimesanotion/Lamarck-14B-v0.7-Fusion

It unifies three branches, all of which feature models that bring Lamarck-14B-v0.7 and Qwenvergence-14B-v12-Prose together. One side features @jpacifico's jpacifico/Chocolatine-2-14B-Instruct-v2.0.3, and the other features @suayptalha's suayptalha/Lamarckvergence-14B paired with my models that were their merge ancestors.

A fusion merge of a fusion merge and of a SLERP of a fusion and an older merge should demonstrate the new merge method's behavior in interesting ways, especially in the first quarter of the model, where the SLERP has less impact.
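For anyone who wants to try the method itself, here is a rough sketch of what such a recipe can look like when run through mergekit's mergekit-yaml tool; it assumes the method key mergekit exposes is arcee_fusion, and the model pairing below is illustrative, not the actual Lamarck-14B-v0.7-Fusion recipe (that one is open on the model card):

```
# Illustrative only: write a minimal fusion recipe and run it with the
# mergekit-yaml CLI. The model pairing is a placeholder, not the real
# Lamarck-14B-v0.7-Fusion recipe, and the `arcee_fusion` method key is
# an assumption about mergekit's naming.
import pathlib
import subprocess

recipe = """\
merge_method: arcee_fusion
base_model: sometimesanotion/Lamarck-14B-v0.7
models:
  - model: sometimesanotion/Qwenvergence-14B-v12-Prose
dtype: bfloat16
"""

pathlib.Path("fusion.yml").write_text(recipe)
subprocess.run(["mergekit-yaml", "fusion.yml", "./merged-model"], check=True)
```

Writing the recipe from Python is just a convenience here; a plain YAML file passed to mergekit-yaml works the same way.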

I welcome you to kick the tires and learn from it. It has prose quality near Qwenvergence v12's, as you'd expect.

Thank you, @mradermacher and @MaziyarPanahi, for the first-day quantizations! Your work helped get me started. https://huggingface.co/models?other=base_model:quantized:sometimesanotion/Lamarck-14B-v0.7-Fusion

Datasets 0

None public yet