Xing Han Lù

xhluca

https://xinghanlu.com

AI & ML interests

None yet

Recent Activity

authored a paper 4 days ago

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

upvoted a paper 5 days ago

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

Paper • 2606.29537 • Published 15 days ago • 22

upvoted a collection 17 days ago

A3: Agent-as-Annotators

Collection

Models and data from "Structured Distillation of Web Agent Capabilities Enables Generalization" (arXiv:2604.07776) • 6 items • Updated Apr 14 • 2

upvoted a paper about 1 month ago

Would you still call this Dax? Novel Visual References in VLMs and Humans

Paper • 2606.05409 • Published Jun 3 • 8

upvoted 2 papers about 2 months ago

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

Paper • 2605.25624 • Published May 25 • 35

Forecasting Downstream Performance of LLMs With Proxy Metrics

Paper • 2605.18607 • Published May 18 • 14

upvoted a paper 3 months ago

Structured Distillation of Web Agent Capabilities Enables Generalization

Paper • 2604.07776 • Published Apr 9 • 23

upvoted 2 papers 4 months ago

LLM2Vec-Gen: Generative Embeddings from Large Language Models

Paper • 2603.10913 • Published Mar 11 • 44

Humans and LLMs Diverge on Probabilistic Inferences

Paper • 2602.23546 • Published Feb 26 • 13

upvoted a paper 8 months ago

Grounding Computer Use Agents on Human Demonstrations

Paper • 2511.07332 • Published Nov 10, 2025 • 107

upvoted 2 papers 9 months ago

FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Paper • 2510.03204 • Published Oct 3, 2025 • 7

The Markovian Thinker

Paper • 2510.06557 • Published Oct 8, 2025 • 33

upvoted a paper 12 months ago

MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining

Paper • 2012.13978 • Published Dec 27, 2020 • 1

upvoted an article about 1 year ago

Article

How to Train Your LLM Web Agent: A Statistical Diagnosis

ppEmiliano • Jul 8, 2025 • 15

upvoted 3 papers about 1 year ago

upvoted an article about 1 year ago

Article

MIEB: The Benchmark That Stress-Tests Image-Text Embeddings Like Never Before

isaacchung • Apr 24, 2025 • 18

upvoted a paper about 1 year ago

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Paper • 2504.08942 • Published Apr 11, 2025 • 29

upvoted 2 papers over 1 year ago

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2, 2025 • 87

Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

Paper • 2503.08644 • Published Mar 11, 2025 • 16
