ONEKQ AI

company

AI & ML interests

Benchmark, Code Generation, LLM

Recent Activity

onekq updated a Space 8 days ago
onekq-ai/README
onekq updated a Space 9 days ago
onekq-ai/WebApp1K-models-leaderboard
onekq updated a model 22 days ago
onekq-ai/OneSQL-v0.1-Qwen-1.5B-GGUF

onekq-ai's activity

onekq posted an update 3 days ago
I recently attended a panel on AI applications. The panelists were managers/directors of Fortune 500 companies. These people make things happen and own results, so their stories and pain points are fresh.

(1) Models are used EVERYWHERE: customer-facing, internal support, etc.
(2) A successful application must improve one of the following: revenue (💵💵), cost (💵💵), or CSAT (still 💵💵).
(3) They proactively search on 🤗HF🤗 for models and use them. Open source models (especially small ones) fit flexibly into their existing workflows/infra, which enables them to deliver, and fast.
(4) The main barrier to adoption is licensing. A director told me they picked a model and finetuned it, then learned they would have to share their enhancements. As a result, they dropped that model, and the million-dollar impact went to another model.

So, to fellow model builders:
(1) celebrate that our work is useful and generates a lot of value
(2) make your license permissive if you want maximum impact
onekq posted an update 5 days ago
Heard good things about this model, but no inference providers support it ...

THUDM/GLM-4-9B-0414
onekq posted an update 6 days ago
This post discusses the same trend as the Sutton post, but is more concrete and down-to-earth.

https://ysymyth.github.io/The-Second-Half/

Two takeaways for me. (1) The deep neural network is the backbone that unifies everything. RLHF will stand the test of time because it brings two distinct fields (NLP and RL) onto the same model weights. (2) Language models will continue to play a central role in the era of agents. They probably won't be the endgame to AGI, but they are definitely not an offramp.
onekq posted an update 8 days ago
This is the Bitter Lesson 2.0
https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf

If this reads too lofty to you, consider some low-hanging fruit. Experiences here are reward signals we send to LLMs, e.g. human scores in RLHF, verification in AlphaProof, or test results for code generation.

RFT (reinforced fine-tuning) will become mainstream and, IMO, will make LLMs behave more like agents.
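
To make "test results as reward" concrete, here is a minimal sketch of such a reward signal for code generation. The helper below and its pass/fail scoring are illustrative assumptions, not a specific RFT recipe, and a real pipeline would sandbox the execution.

```python
import os
import subprocess
import sys
import tempfile

def code_reward(generated_code: str, test_code: str, timeout: int = 10) -> float:
    """Illustrative reward signal: run the model's code against unit tests
    and return 1.0 on success, 0.0 on failure or timeout. Real pipelines
    typically sandbox the execution and may use finer-grained pass rates."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "solution.py")
        with open(path, "w") as f:
            f.write(generated_code + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                [sys.executable, path], capture_output=True, timeout=timeout
            )
            return 1.0 if result.returncode == 0 else 0.0
        except subprocess.TimeoutExpired:
            return 0.0

# Example with a hypothetical model completion:
sample = "def add(a, b):\n    return a + b"
print(code_reward(sample, "assert add(2, 3) == 5"))  # -> 1.0
```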
onekq updated a Space 8 days ago
onekq posted an update 9 days ago
onekq posted an update 10 days ago
onekq posted an update 11 days ago
onekq posted an update 12 days ago
I used three posts to explain GPU/CPU and LLM performance, and now I finally circle back to my own model. 😅

OneSQL needs a GPU because it processes long prompts. It is not a chatbot that replies to short prompts with long answers. I call models of this kind workhorse models.

We all have to scramble for GPUs to get adoption. Below are a few ways.

You can inherit it. If you have a new Mac, congratulations, you have a GPU.

You can leverage it. Get inference providers to adopt your model, and you switch from CapEx to OpEx.

Or you can buy it. Go frugal: find older GPUs with enough HBM to house your model.
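
As a rough back-of-the-envelope sketch of "enough HBM to house your model" (the bytes-per-parameter figures and the overhead factor are assumptions for illustration, not measured numbers):

```python
def estimate_vram_gb(params_billions: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight memory times an overhead factor for
    KV cache and activations. bytes_per_param: 2.0 for fp16 weights,
    roughly 0.5 for 4-bit quantization."""
    weights_gb = params_billions * bytes_per_param  # 1B params at 1 byte ~ 1 GB
    return weights_gb * overhead

# A 9B model: ~21.6 GB in fp16, but only ~5.4 GB at 4-bit,
# which is why quantized models fit on older, smaller GPUs.
print(estimate_vram_gb(9.0))       # fp16
print(estimate_vram_gb(9.0, 0.5))  # 4-bit
```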
onekq posted an update 13 days ago
I just compared tasks with different input/output lengths. CPU and GPU performance are very different here.

The LLMs we use today are autoregressive, or causal, inference models, meaning the generation of each output token depends on all previous tokens. Since the model must generate one token at a time, this sets a hard limit on parallelism. A chatbot simulating human typing is in fact a UI trick to gloss over this fundamental limit. This is great news for CPUs because it levels the playing field.

But when processing input tokens, this limit doesn't exist. The GPU can fire up thousands of cores (vs. dozens of CPU cores) to process as many input tokens as it can, all at once. Here, the GPU enjoys a significant speed margin over the CPU. The longer the prompt, the bigger the margin.

So, when it comes to user experience, both GPU and CPU can output text at a decent speed. What really distinguishes them is the initial wait time, i.e. the prompt-processing delay.
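
A minimal sketch of this asymmetry with Hugging Face transformers (the model name and token counts here are placeholders): the prompt goes through one batched forward pass, while output tokens come out of a strictly sequential loop.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works; "gpt2" is a placeholder choice.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Explain the difference between prompt processing and token generation. " * 20
inputs = tok(prompt, return_tensors="pt")

# Prefill: every prompt token is processed in ONE batched forward pass,
# so the hardware can parallelize across the whole sequence.
with torch.no_grad():
    out = model(**inputs, use_cache=True)
past = out.past_key_values
next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)

# Decode: each new token depends on the previous one, so generation is
# an inherently sequential loop -- one forward pass per output token.
generated = [next_token]
with torch.no_grad():
    for _ in range(20):
        out = model(input_ids=next_token, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_token)

print(tok.decode(torch.cat(generated, dim=-1)[0]))
```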
onekq posted an update 15 days ago
I just compared CPU vs GPU. The CPU is actually good for tasks with a short prompt and a long answer. For such tasks, we usually treat the LLM as a consultant or teacher.

Say you are filing taxes and ask "what is form XXXX?" The chatbot will return an essay explaining the form and walking you through scenarios.

But when you decide to file this form, the LLM becomes your assistant/agent. Suddenly the prompt becomes (much) longer than the answer. You throw in a bunch of documents and ask the LLM to fill out the form for you.

This is when we need a GPU. I will get into the details in the next post.
onekq posted an update 17 days ago
We desperately need GPUs for model inference. CPUs can't replace GPUs.

I will start with the basics. A GPU is designed to serve predictable workloads with many parallel units (pixels, tensors, tokens). So a GPU allocates as much of its transistor budget as possible to building thousands of compute units (CUDA cores on NVIDIA, execution units on Apple Silicon), each capable of running a thread.

But a CPU is designed to handle all kinds of workloads. CPU cores are much larger (hence far fewer), with branch prediction and other complex machinery. In addition, more and more transistors are allocated to larger caches (~50% of the die now) to house the unpredictable, devouring the compute budget.

Generalists can't beat specialists.
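
As a rough illustration of that gap (a sketch only; the matrix size, repetition count, and the presence of a CUDA device are assumptions), a large matrix multiply is exactly the kind of predictable, parallel workload where thousands of GPU cores pull far ahead of a handful of CPU cores:

```python
import time
import torch

def time_matmul(device: str, n: int = 4096, reps: int = 10) -> float:
    """Average time for an n x n matrix multiply -- a stand-in for the
    predictable, massively parallel tensor workloads LLM inference relies on."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    (a @ b).sum().item()  # warm-up; .item() also syncs any async GPU work
    start = time.perf_counter()
    for _ in range(reps):
        c = a @ b
    c.sum().item()  # force the result so GPU kernels are included in the timing
    return (time.perf_counter() - start) / reps

print(f"CPU: {time_matmul('cpu'):.4f} s per matmul")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.4f} s per matmul")
```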
onekq posted an update 18 days ago
onekq posted an update 19 days ago
onekq posted an update 22 days ago