
Eugene Siow

eugenesiow

https://eugenesiow.com

AI & ML interests

None yet

Recent Activity

liked a Space about 2 months ago

aisheets/sheets

liked a model 2 months ago

KBlueLeaf/Kohaku-XL-Delta

liked a model 2 months ago

Linaqruf/animagine-xl



liked a Space about 2 months ago

387

Sheets

🗂

Create and enrich datasets using AI

liked 3 models 2 months ago

liked a dataset 2 months ago

open-r1/Mixture-of-Thoughts

Viewer • Updated May 26 • 699k • 5.92k • 262

liked a model 3 months ago

facebook/KernelLLM

Text Generation • 8B • Updated Jul 2 • 914 • 167

liked a Space 3 months ago

946

Computer Agent

🖥

Interact with an AI agent to perform web tasks

liked a model 4 months ago

Linaqruf/animagine-xl-2.0

Text-to-Image • Updated Nov 27, 2023 • 257 • 191

liked a dataset 4 months ago

nvidia/ClimbMix

Viewer • Updated Apr 22 • 355M • 1.11k • 30

liked a Space 4 months ago

WebApp1K Models Leaderboard

🥇

View leaderboard of web application models

reacted to hesamation's post with 👍 4 months ago

Post

2229

OpenAI just released a 34-page practical guide to building agents.

Here are 10 things it teaches us:

1➜ agents are different from workflows: they are fully autonomous systems that perform tasks on your behalf. many applications use LLMs inside workflows, but a workflow is not an agent.

2➜ use them for tricky stuff: complex decision making, dynamic rules, unstructured data

3➜ core recipe: each agent has three main components: Model (the brain), Tools, Instructions on how to behave

4➜ choose the right brain: set up evals to get a baseline performance, use a smart model to see what's possible, gradually downgrade the model for cost and speed

5➜ tools are key: choose well-defined and tested tools. an agent needs tools to retrieve data and context, and take actions.

6➜ instructions matter A LOT: be super clear when telling the agent its goals, steps, and rules. Vague instructions = unpredictable agent. Be explicit.

7➜ start simple, then scale: often a single agent with several tools is ok. don't jump to complex multi-agent systems immediately.

8➜ if you use multi-agents: you can have a "manager" agent directing traffic to specialist agents, or have agents hand off tasks to each other.

9➜ guardrails are a MUST: check user input for weird stuff, make sure the agent isn't about to do something risky, filter out private info, block harmful content. Don't let it run wild.

10➜ build and plan for humans: start small, test, improve. always have a plan for when the agent gets stuck or is about to do something high-risk.
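
To make points 3, 9, and 10 concrete, here's a rough Python sketch of the "model + tools + instructions" recipe with an input guardrail and a human hand-off. Everything in it is hypothetical and for illustration only: the `Agent` class, `keyword_model` (a toy stand-in for a real LLM call), the `BLOCKED_TERMS` list, and the stub weather tool are assumptions, not code from the guide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stand-in for an LLM call: (instructions, user_input) -> tool name.
ModelFn = Callable[[str, str], str]

# Illustrative guardrail list (an assumption, not taken from the guide).
BLOCKED_TERMS = ("password", "ssn")


@dataclass
class Agent:
    model: ModelFn                      # the "brain"
    instructions: str                   # goals, steps, and rules
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, user_input: str) -> str:
        # Guardrail: check the input before the model sees it.
        if any(term in user_input.lower() for term in BLOCKED_TERMS):
            return "blocked: input failed guardrail check"
        # The model decides which tool to use.
        tool_name = self.model(self.instructions, user_input)
        tool = self.tools.get(tool_name)
        if tool is None:
            # Plan for when the agent gets stuck: hand off to a human.
            return "escalated: no suitable tool, handing off to a human"
        return tool(user_input)


def keyword_model(instructions: str, user_input: str) -> str:
    """Toy model: picks a tool by keyword instead of calling a real LLM."""
    return "weather" if "weather" in user_input.lower() else "unknown"


agent = Agent(
    model=keyword_model,
    instructions="Answer weather questions using the weather tool.",
    tools={"weather": lambda query: "sunny (stub data)"},
)

print(agent.run("What's the weather in Paris?"))
print(agent.run("my password is hunter2"))
print(agent.run("book a flight"))
```

A single agent with a small tool dictionary matches point 7 (start simple); a "manager" agent from point 8 could be sketched the same way, with the tools replaced by other `Agent.run` callables.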

Download: https://t.co/fJaCkgf7ph

3 replies

posted an update 4 months ago

Post

1685

GPT-4.1 dropped this week - and it puts OpenAI back in the race for coding & agentic leadership.

⚙️ API only - no ChatGPT toggle for this.
💻 Coding performance is back on par with Claude 3.7 Sonnet & Gemini 2.5 Pro (though Gemini still leads).
💸 Pricing:
• Full: $3.50 / 1M tokens
• Mini: $0.70 / 1M
• Nano: $0.17 / 1M
👉 Gemini 2.5 Pro = best price/perf ($3.44 / 1M)
😵 Claude 3.5 Sonnet = $6 / 1M (!)
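
For a quick feel of what these per-1M rates mean for a given workload, here's a small Python sketch. The prices are copied from the list above as quoted (not independently verified), and the dictionary keys and the `estimate_cost` helper are hypothetical names chosen for illustration.

```python
# USD per 1M tokens, as quoted in the post (not independently verified).
# The keys are illustrative labels, not official model identifiers.
PRICE_PER_MILLION = {
    "gpt-4.1": 3.50,
    "gpt-4.1-mini": 0.70,
    "gpt-4.1-nano": 0.17,
}


def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated USD cost of processing `tokens` tokens."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


for name in PRICE_PER_MILLION:
    print(f"{name}: ${estimate_cost(name, 2_000_000):.2f} for 2M tokens")
```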

🧠 Not a "thinking" model.
📊 Mini shines on general reasoning tasks (e.g. GPQA), but only the full model holds up in SWE-bench-verified (GitHub issue solving).