Post
2650
Introducing Completionist, an open-source command-line tool that automates synthetic dataset generation.
It works by iterating over an existing HF dataset and by using a LLM to create completions.
- Problem: You need a fast way to create custom datasets for fine-tuning or RAG, but you want the flexibility to use different LLM backends or your own infrastructure.
- Solution: Completionist connects with any OpenAI-compatible endpoint, including Ollama and LM Studio, or a Hugging Face inference endpoint.
A simple CLI like Completionist gives you the possibility to take full control of your synthetic data generation workflow.
👉 Check out Completionist on GitHub: https://github.com/ethicalabs-ai/completionist
Synthetic Dataset Example: ethicalabs/kurtis-mental-health-v2-sft-reasoning
It works by iterating over an existing HF dataset and by using a LLM to create completions.
- Problem: You need a fast way to create custom datasets for fine-tuning or RAG, but you want the flexibility to use different LLM backends or your own infrastructure.
- Solution: Completionist connects with any OpenAI-compatible endpoint, including Ollama and LM Studio, or a Hugging Face inference endpoint.
A simple CLI like Completionist gives you the possibility to take full control of your synthetic data generation workflow.
👉 Check out Completionist on GitHub: https://github.com/ethicalabs-ai/completionist
Synthetic Dataset Example: ethicalabs/kurtis-mental-health-v2-sft-reasoning