22

Steven Goldfeather

treehugg3

AI & ML interests

None yet

Recent Activity

new activity 2 days ago
nvidia/Llama-3.1-Nemotron-70B-Reward-HF:Comparability of the results for different prompts
new activity 2 days ago
nvidia/HelpSteer3:The HelpSteer datasets don't overlap, right?
reacted to merterbak's post with 🔥 about 1 month ago
OpenAI has released BrowseComp, an open-source benchmark designed to evaluate the web-browsing capabilities of AI agents. The dataset comprises 1,266 questions that challenge AI models to navigate the web and uncover complex, obscure information. Crafted by human trainers, the questions are intentionally difficult: unsolvable by another person in under ten minutes, and beyond the reach of existing models such as ChatGPT (with and without browsing) and an early version of OpenAI's Deep Research tool. Blog post: https://openai.com/index/browsecomp/ Paper: https://cdn.openai.com/pdf/5e10f4ab-d6f7-442e-9508-59515c65e35d/browsecomp.pdf Code in the simple-evals repo: https://github.com/openai/simple-evals

Organizations

None yet

models 0

None public yet

datasets 0

None public yet