When I heard the Reasoning Dataset Competition deadline was extended to 9 May, I knew I had time to get in one more entry. ๐ฅ๐ฅ๐ฅ
With the rise of Vibe Coding, and the potential risks that are introduced by humans letting LLMs build their apps for them, lots of people are (rightfully) concerned about the safety of the code that is hitting prod.
In response to that, I'm happy to present my final submission to the Reasoning Dataset Competition and attempt to start benchmarking the ability of LLMs to identify unsafe and / or exploitable code by way of the CoSa (Code Safety) benchmark: ZennyKenny/cosa-benchmark-dataset
Currently a curated set of 200 examples, calibrated on OpenAI's standard issue models (GPT-4.1, o4 mini, and GPT-3.5 Turbo) as "baseline performance" (70% decile). Check it out and drop a โค๏ธ if you think it could be useful or hit the Community section with suggestions / critiques.
The same way the advent of Adobe Illustrator has led to innovation in the way that creative professionals work, I earnestly believe that AI will do the same (contrary to the popular opinion that it represents some regression in the world of creatives).
@natalika and I were speaking about this topic and like most illustrators she has some understandable concerns about the spread of AI in her field. She also told me how much time she spends generating concept art that will never see the light of day in >98% of cases. ๐ก
To me, that sounded like a perfect opportunity to leverage image diffusion in a way that helps artists spend more time creating cool stuff rather than just malevolently mining their work and using it without credit. Using the Black Forest Labs base model FLUX, Replicate, and about $5 of H100 compute, I post-trained a LoRA adapter on a set of her images associated with one project she's working on and spun up an app with Hugging Face Spaces (and Zero GPU for the win).
Now, generating concept art in her particular style takes seconds instead of hours and when it's time to put the work into production, a human designer is still invaluable. And building it in the open hopefully inspires other use cases amongst other designers. ๐
I've created a new dataset using the Algorithm of Thoughts architecture proposed by Sel et al. (2023) in a reasoning context. (paper: https://arxiv.org/pdf/2308.10379)
The dataset simulates the discovery phase of a fictitious VC firm called Reasoned Capital and, once expanded, can be used to create models which are able to make complex, subjective financial decisions based on different criteria.
The generation process encourages recursive problem-solving in increasingly complex prompts to encourage models to assess and reevaluate the conclusions and generated opinions of upstream models. Pretty neat stuff, and I'm not aware of this architecture being used in a reasoning context anywhere else.
I'd be interested to hear the community's thoughts on the applications of AI in the military. Especially in the wargaming space.
This is something that feels inevitable (and realistically, probably already in progress). Doesn't it make sense for us to have an understanding of the mechanics of such processes? Surely they will never be open source.
This dataset is designed to post-train Metareasoning agents, or those agents whose job it is to quickly (and importantly, cheaply) reason through whether it makes sense to launch a full reasoning job or simply use a simple completions job.
Generation notebook (linked in dataset) is open source and pretty well generalized if I don't say so myself, so you can use it to make your own Metareasoning datasets.
Shoutout to @onekq for his inspiring comment on this topic.
Using a Meta LLaMa checkpoint from Unsloth and some help from the HF community, you can capture handwritten notes and convert them into digital format in just a few second.
Really exciting times for AI builders on Hugging Face.
I've spent most of time working with AI on user-facing apps like Chatbots and TextGen, but today I decided to work on something that I think has a lot of applications for Data Science teams: ZennyKenny/comment_classification
This Space supports uploading a user CSV and categorizing the fields based on user-defined categories. The applications of AI in production are truly endless. ๐
Okay this is pretty crazy. Snowflake has CortexAI and Uber is already teasing QueryGPT, both of which prominently feature plain text to SQL features to query your database.
I decided to see how hard it would be to put together something similar using ๐ค smolagents. Turns out, it was pretty straightforward. I managed to get it done in London Luton airport this afternoon.
I've completed the first unit of the just-launched Hugging Face Agents Course. I would highly recommend it, even for experienced builders, because it is a great walkthrough of the smolagents library and toolkit.
GradientBoostingClassifier is an algorithm supported by the Python SciKit library, and now you can quickly train an ML model using this powerful technique on any (viable) dataset in the Hugging Face Hub without a line of code.
On-demand audio transcription is an often-requested service without many good options on the market.
Using Hugging Face Spaces with Gradio SDK and the OpenAI Whisper model, I've put together a simple interface that supports the transcription and summarisation of audio files up to five minutes in length, completely open source and running on CPU upgrade. The cool thing is that it's built without a dedicated inference endpoint, completely on public infrastructure.