Abhimanyu Hans

ahans1

https://ahans30.github.io/

AI & ML interests

None yet

ahans1's activity

liked a dataset 6 days ago

tomg-group-umd/fictionalqa

Viewer • Updated 8 days ago • 31.7k • 300 • 1

upvoted a paper 6 days ago

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Paper • 2506.07371 • Published 8 days ago • 8

upvoted a paper 8 days ago

MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning

Paper • 2506.05523 • Published 11 days ago • 32

liked 2 datasets about 1 month ago

PrimeIntellect/INTELLECT-2-RL-Dataset

Viewer • Updated May 13 • 285k • 3.04k • 62

nvidia/OpenMathReasoning

Viewer • Updated 20 days ago • 5.68M • 22.3k • 282

liked a model about 1 month ago

nvidia/OpenCodeReasoning-Nemotron-7B

Text Generation • Updated May 7 • 1.84k • 35

updated a dataset about 1 month ago

tomg-group-umd/wiki_10k

Viewer • Updated May 5 • 10k • 33

published a dataset about 1 month ago

tomg-group-umd/wiki_10k

Viewer • Updated May 5 • 10k • 33

liked a dataset about 2 months ago

openai/gsm8k

Viewer • Updated Jan 4, 2024 • 17.6k • 490k • 764

upvoted a collection 2 months ago

Llama 4

Collection

Llama 4 release • 13 items • Updated Apr 29 • 530

upvoted a paper 4 months ago

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation

Paper • 2502.19414 • Published Feb 26 • 20

authored a paper 4 months ago

Has My System Prompt Been Used? Large Language Model Prompt Membership Inference

Paper • 2502.09974 • Published Feb 14 • 9

upvoted 3 papers 4 months ago

Has My System Prompt Been Used? Large Language Model Prompt Membership Inference

Paper • 2502.09974 • Published Feb 14 • 9

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Paper • 2502.06857 • Published Feb 7 • 25

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 142

liked a dataset 5 months ago

code-search-net/code_search_net

Updated Jan 18, 2024 • 7.52k • 305

upvoted a paper 7 months ago

Cut Your Losses in Large-Vocabulary Language Models

Paper • 2411.09009 • Published Nov 13, 2024 • 50

liked a Space 8 months ago

CoTaEval Leaderboard

🚀

View and filter a leaderboard of language model evaluation results

upvoted an article 8 months ago

4D masks support in Transformers

Article • Jan 8, 2024 • 23

updated a model 9 months ago

tomg-group-umd/llama-2-7b-lora_r32_step128

Feature Extraction • Updated Sep 26, 2024 • 69