Marco De Santis

marcodsn

Recent Activity

updated a dataset about 1 hour ago
marcodsn/arxiv-markdown
updated a dataset about 13 hours ago
marcodsn/academic-chains
published a dataset about 17 hours ago
marcodsn/arxiv-markdown

Organizations

MakerPlus Lab, Noetic Labs

marcodsn's activity

reacted to davanstrien's post with ❤️ about 23 hours ago
Came across a very nice submission from @marcodsn for the reasoning datasets competition (https://huggingface.co/blog/bespokelabs/reasoning-datasets-competition).

The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:

- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets
- Shows 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model

It's created using the Curator framework, with plans to scale across more scientific domains and incorporate multi-modal reasoning with charts and mathematics.

I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.

Dataset can be found here: marcodsn/academic-chains (give it a like!)
published a Space about 1 month ago