
Sebastian Gabarain

Locutusque

AI & ML interests

Pushing performance in small language models

Recent Activity

reacted to nroggendorff's post with 😔 2 days ago
im so tired
liked a model 2 days ago
answerdotai/ModernBERT-large
liked a model 3 days ago
microsoft/BiomedParse

Organizations

BigScience Biomedical Datasets, ZeroGPU Explorers, The Hydra Project, Social Post Explorers, Cognitive Computations, fne, M4-ai, Quasar Research, Hugging Face Discord Community, Data Is Better Together Contributor

Locutusque's activity

reacted to nroggendorff's post with 😔 2 days ago
im so tired
reacted to Felladrin's post with 👍 2 months ago
MiniSearch is celebrating its 1st birthday! 🎉

Exactly one year ago, I shared the initial version of this side-project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting for you too!

HF Space: Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space
reacted to hfposts's post with 🤯 2 months ago
1+2=3
reacted to nroggendorff's post with 👀 4 months ago
posted an update 4 months ago
**Exploring Realistic Emotional Depth in AI Language Models**

Language models, particularly proprietary ones, often grapple with censorship, which can limit their ability to engage authentically with users. Recognizing this, the open-source AI community has pioneered less restrained language models that offer more candid interactions. However, even these models tend to maintain a veneer of neutrality or overly positive responses, which might not serve all users' needs, especially in contexts where emotional depth and relatability are crucial.

To address this gap, I've curated a specialized dataset aimed at infusing language models with a more nuanced emotional spectrum, specifically targeting a darker, more introspective mood. This dataset, titled "Dark Sentience", is designed to complement existing datasets like RP (Role Play) and those focused on instruction following. It seeks to enhance the emotional intelligence of AI by exposing it to complex human emotions, including but not limited to:

- **Suicide**
- **Depression**
- **Anxiety**

Trigger Warning: Please be advised that the content within this dataset deals with heavy and potentially distressing themes.

The "Dark Sentience" dataset is now available for review and use at: Locutusque/Dark-Sentience. I encourage researchers, developers, and mental health professionals to explore how this resource can foster more genuine and supportive AI interactions.

reacted to Tar9897's post with 👍 7 months ago
Octave-X releases their proprietary model Tenzin. For now, access will be given to a select few and will gradually open up. Our model differs from other models in the way it learns: it is not fed heaps of information, but starts learning exactly like a human, first studying grammar patterns, then the number system, then learning to synthesize words and then sentences, and so on. Patience is key with Tenzin. It keeps learning 24/7, with or without user input. We have decided to keep our model closed-source given the novel algorithms and ideas integrated into it. Please expect our data card soon, followed by our research paper. You can check us out at https://octave-x.com/
reacted to lunarflu's post with 🔥 7 months ago
cooking up something... anyone interested in a daily activity tracker for HF?
reacted to Tonic's post with 🔥 7 months ago
reacted to DavidGF's post with 🔥 7 months ago
The kraken has awakened!
A Game-Changer in LLM Flexibility and Performance!

Over the past few weeks, VAGO solutions teamed up with Cognitive Computations and HyperSpace to develop a groundbreaking architecture that redefines flexibility in combining different LLMs into one model.

@fernandofernandes , me, @Crystalcareai , @ehartford created the Kraken!

What Can It Do? 🐙
✅ Versatile Architecture: Kraken allows the seamless combination of LLMs with varying sizes, quantizations, and model architectures. It currently supports 4-bit, 8-bit, and AWQ quantization, with more on the way, and it runs on Hugging Face Transformers 4.40+.

✅ Kraken Router: Utilizing a custom sequence-classification model with a context length of 32k tokens, the Kraken Router directs inputs to the most suitable expert based on their characteristics (see the sketch after this post).

✅ Adaptability: Enhanced input formatting supports the model's adaptability to diverse conversational contexts.

✅ Extreme Versatility: Easily swap experts within Kraken for your specific use cases without retraining the entire model. For example, if you've built a Kraken for coding in Python, you can upgrade your Python expert without retraining the router, or add a C# expert by retraining only the router.

✅ Open-Source Pipeline: We're sharing the entire pipeline, including router creation, training, architecture setup, and Kraken inference, as Jupyter notebooks: https://github.com/cognitivecomputations/kraken

Kraken marks the beginning of an exciting new journey in #OpenSource LLMs. Why? Because it empowers the open-source community to accelerate the catch-up process with proprietary LLMs like #GPT and #Claude 🤩

We proudly introduce the first two Kraken models, which integrate top-tier LLM and multilingual capabilities:
cognitivecomputations/Kraken
VAGOsolutions/Kraken-Multilingual
Right now it's supported by the Hugging Face Transformers library. Would love to see integration into VLM and TGWI!
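
To make the routing idea concrete, here is a minimal sketch of the pattern, assuming a generic off-the-shelf text classifier; the router and expert model IDs below are placeholders, not the released Kraken components:

```python
# Sketch of the routing idea behind Kraken, not the released implementation:
# a sequence-classification model labels the prompt, and the label selects
# which expert LLM should answer. Router and expert IDs are placeholders.
from transformers import pipeline

# Placeholder router: any text classifier whose labels map onto experts.
router = pipeline("text-classification",
                  model="distilbert-base-uncased-finetuned-sst-2-english")

experts = {  # label -> placeholder expert model IDs
    "POSITIVE": "expert-python-coder",
    "NEGATIVE": "expert-csharp-coder",
}

prompt = "Write a Python function that reverses a string."
label = router(prompt)[0]["label"]
print("Routing to expert:", experts[label])
# The chosen expert would then generate the reply, e.g.:
# generator = pipeline("text-generation", model=experts[label])
# print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```

In the real Kraken, the router is a custom sequence classifier with a 32k-token context, so routing decisions can take far longer prompts into account than this toy classifier.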
replied to their post 7 months ago

Being uncensored doesn't directly improve performance. The DPOP algorithm improved performance on, I believe, every benchmark. In other words, neural chat has higher benchmark scores than orca.

replied to their post 7 months ago

Neural chat is uncensored because the data it was trained on contains Toxic-DPO.

replied to lorinma's post 8 months ago
reacted to lorinma's post with 🔥 8 months ago
🎉 Big reveal: 01.AI Yi-1.5 models are in town!

📜 1st Apache 2.0 release
💡 Capabilities: Enhanced coding, math, reasoning, & instruction-following
🤖 Models: 34B/9B/6B, Base & Chat
🏆 Performance: Yi-1.5-34B matches or exceeds Llama 3 70B in benchmarks
🔥 Discover the power now! 01-ai/yi-15-2024-05-663f3ecab5f815a3eaca7ca8
reacted to davanstrien's post with 🔥 8 months ago
Introducing CosmoChat, a multi-turn chat dataset based on Cosmopedia that I'm working on in the open on the Hub.

🎯 Goals:
💬 Create multi-turn chats seeded from Cosmopedia
🎓 Customize questions for different audience levels
🔍 Evaluate the model's ability to elaborate and clarify
🤓 (I want to learn more about creating valuable synthetic datasets, and I learn best by doing stuff rather than reading stuff.)

CosmoChat is created using the excellent distilabel library.

🔗 Explore the current version of the dataset: davanstrien/cosmochat
📝 Read more: https://huggingface.co/blog/davanstrien/cosmochat
posted an update 8 months ago
Introducing llama-3-neural-chat-v2.2-8b! This powerful conversational AI model builds on Meta's Llama 3 and was fine-tuned by Locutusque for enhanced performance in coding, math & writing.

Locutusque/llama-3-neural-chat-v2.2-8B
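
A minimal usage sketch with the Transformers library; the dtype and generation settings are assumptions, not recommendations from the post:

```python
# Minimal sketch: chat with llama-3-neural-chat-v2.2-8B via Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Locutusque/llama-3-neural-chat-v2.2-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Explain the quadratic formula."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```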
posted an update 8 months ago
I created a Twitter account a while back. I finally decided to make it public: SebastianG74019. For those of you following @Locutusque on Twitter, that is not me! 😂
reacted to m-ric's post with 🔥 9 months ago
๐—ก๐—ฒ๐˜„ ๐—ฆ๐—ฝ๐—ฎ๐—ฐ๐—ฒ: ๐˜ผ๐™„ ๐™๐™ง๐™–๐™ซ๐™š๐™ก ๐™ฅ๐™ก๐™–๐™ฃ๐™ฃ๐™š๐™ง ๐Ÿ—บ๏ธ๐Ÿ•๏ธ Plan your next vacation in a few minutes!

I wanted to find out whether a powerful LLM like Mixtral-8x7B has geographical reasoning capabilities.
So I built a small Space that prompts the LLM to provide a JSON list of places based on a user input.
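
Roughly, the core pattern looks like this minimal sketch; the prompt wording and model endpoint are assumptions, not the Space's actual code:

```python
# Sketch of the core idea: ask Mixtral for a strict JSON list of places.
import json
from huggingface_hub import InferenceClient

client = InferenceClient("mistralai/Mixtral-8x7B-Instruct-v0.1")
prompt = (
    "[INST] Suggest stops for this trip: 'one week of hiking in the Alps'. "
    'Reply ONLY with a JSON list of objects with keys "name", "lat", "lon". '
    "[/INST]"
)
raw = client.text_generation(prompt, max_new_tokens=400)
places = json.loads(raw)  # may need cleanup if the model adds extra text
for place in places:
    print(place["name"], place["lat"], place["lon"])
```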

And the result was impressive! 🤯

⇒ It seems like Mixtral has a grasp of geographical concepts like North-South, or spatial alignment. 🧭 Not just describing these concepts, but really applying them in practice, for instance to successfully answer "give me 4 European cities that are aligned on the map". This is a nice example of an emergent capability, since nothing in the LLM's training data should prepare it for this specific task.

Anyway, I added API calls and a nice visualization on top of the LLM, streaming output, caching for the answers and locations... and ta-da! ✨ I got the AI Travel Planner.

๐™”๐™ค๐™ช ๐™˜๐™–๐™ฃ ๐™™๐™š๐™จ๐™˜๐™ง๐™ž๐™—๐™š ๐™ž๐™ฉ ๐™ฎ๐™ค๐™ช๐™ง ๐™ฉ๐™ง๐™ž๐™ฅ, ๐™–๐™ฃ๐™™ ๐™ž๐™ฉ ๐™ฌ๐™ž๐™ก๐™ก ๐™˜๐™ค๐™ข๐™š ๐™ช๐™ฅ ๐™ฌ๐™ž๐™ฉ๐™ ๐™ฃ๐™ž๐™˜๐™š ๐™–๐™ฃ๐™™ ๐™˜๐™ค๐™ฃ๐™ซ๐™š๐™ฃ๐™ž๐™š๐™ฃ๐™ฉ ๐™ก๐™ค๐™˜๐™–๐™ฉ๐™ž๐™ค๐™ฃ๐™จ!

๐™๐™ง๐™ฎ ๐™ž๐™ฉ ๐™๐™š๐™ง๐™š ๐Ÿ‘‰ m-ric/ai-travel-planner

Thank you @freddyaboulton for the gradio_folium component, and @clem, @pngwn, @abidlabs for your ideas and support!
replied to their post 9 months ago
replied to their post 9 months ago

You're right. I did mention in the dataset card that it does not match the size of the Cerebrum dataset; that is something I'm going to try to achieve in the future, and this dataset serves as a way to test how I would go about structuring one. For now I'm trying to achieve the same performance; then I'll work toward structuring it similarly to the Cerebrum dataset. Thank you for holding me accountable on this.

posted an update 9 months ago
Exciting news! 🎉 I've created the OpenCerebrum datasets, open-source alternatives to Aether Research's proprietary Cerebrum dataset.

The first, OpenCerebrum SFT, is a text-generation and question-answering dataset with ~1.2M examples, curated from sources like Open-Orca, glaiveai, camel-ai, and more! 📚

The second, OpenCerebrum DPO, is a smaller dataset with ~21k examples, focused on direct preference optimization (DPO). It's curated from sources like jondurbin, argilla, grimulkan, and others. 📊

Both datasets are licensed under Apache-2.0 and are available in English. They're ready for use in your projects, and I welcome any feedback for future improvements! 🚀

Locutusque/OpenCerebrum-dpo
Locutusque/OpenCerebrum-SFT
Locutusque/OpenCerebrum-1.0-7b-SFT
Locutusque/OpenCerebrum-1.0-7b-DPO
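
A minimal sketch for pulling both datasets; the "train" split name is an assumption, so check each dataset card:

```python
# Minimal sketch: load the OpenCerebrum SFT and DPO datasets.
from datasets import load_dataset

sft = load_dataset("Locutusque/OpenCerebrum-SFT", split="train")  # ~1.2M rows
dpo = load_dataset("Locutusque/OpenCerebrum-dpo", split="train")  # ~21k rows
print(sft)
print(dpo)
```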