Heegyu Kim PRO

heegyu

https://sites.google.com/view/heegyu-kim/

AI & ML interests

NLP

Recent Activity

liked a dataset about 12 hours ago

KOREAson/YiSang-STEM_Code-Unfiltered

liked a dataset 3 days ago

data-agents/jupyter-agent-dataset

liked a dataset about 1 month ago

nvidia/Nemotron-Post-Training-Dataset-v1

upvoted a paper about 2 months ago

EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

Paper • 2507.11407 • Published Jul 15 • 57

upvoted a paper 5 months ago

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Paper • 2504.04823 • Published Apr 7 • 31

upvoted a paper 6 months ago

EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild

Paper • 2502.14892 • Published Feb 17 • 6

upvoted an article 7 months ago

DABStep: Data Agent Benchmark for Multi-step Reasoning

Article • By … and 5 others • Feb 4 • 107

upvoted a paper 7 months ago

Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge

Paper • 2502.16457 • Published Feb 23 • 11

upvoted a collection 7 months ago

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 168

upvoted a collection 9 months ago

EXAONE-3.5

EXAONE 3.5 language model series including instruction-tuned models of 2.4B, 7.8B, and 32B • 11 items • Updated Jul 7 • 119

upvoted a collection 10 months ago

Cosmos-Tokenizer

A suite of image and video tokenizers • 13 items • Updated 6 days ago • 41

upvoted 5 collections 11 months ago

Granite 3.0 Language Models

A series of language models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated May 2 • 98

Arch-Function

6 items • Updated Mar 28 • 8

LLM Safety Datasets

Korean safety and ethics datasets • 9 items • Updated Nov 23, 2024 • 3

En Ko Translate

English datasets translated into Korean. • 6 items • Updated Jul 15 • 2

Magpie Conversation Ko

Korean translation of the Magpie datasets (using @nayohan's translation model) • 10 items • Updated Nov 6, 2024 • 2

upvoted a paper 11 months ago

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

Paper • 2406.06565 • Published Jun 3, 2024 • 10

upvoted 5 collections about 1 year ago

3D

Stability AI's suite of models for 3D generation • 7 items • Updated May 20 • 47

4bit Instruct Models

18 items • Updated 18 days ago • 31

DeepSeek-Prover

DeepSeek-Prover-Series • 10 items • Updated Apr 30 • 58

Magpie-Qwen2 Datasets

Dataset built with Qwen2 72B and Qwen2 7B. • 6 items • Updated Jan 13 • 10

Awesome feedback datasets

A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12, 2024 • 68

upvoted a collection over 1 year ago

Standard-format-preference-dataset

We collect the open-source datasets and process them into the standard format. • 14 items • Updated May 8, 2024 • 25