罗杰斯

rojasdiego

https://rojasdiego.com

AI & ML interests

LLMs for Code Generation

Recent Activity

liked a model 21 days ago

nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4

liked a dataset 6 months ago

HuggingFaceFW/finepdfs

upvoted a paper 6 months ago

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8, 2025 • 206

upvoted a paper 10 months ago

Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding

Paper • 2504.06719 • Published Apr 9, 2025 • 8

upvoted 3 papers 12 months ago

LLM as a Broken Telephone: Iterative Generation Distorts Information

Paper • 2502.20258 • Published Feb 27, 2025 • 27

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Paper • 2503.04598 • Published Mar 6, 2025 • 21

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published Feb 9, 2025 • 40

upvoted 2 papers about 1 year ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30, 2025 • 61

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28, 2025 • 124

upvoted 3 papers over 1 year ago

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Paper • 2411.02337 • Published Nov 4, 2024 • 36

Why Does the Effective Context Length of LLMs Fall Short?

Paper • 2410.18745 • Published Oct 24, 2024 • 17

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Paper • 2410.02367 • Published Oct 3, 2024 • 50

upvoted a collection over 1 year ago

Llama 3.2

This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 657

upvoted 3 papers over 1 year ago

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published Sep 5, 2024 • 92

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Paper • 2405.04324 • Published May 7, 2024 • 25

Scaling Granite Code Models to 128K Context

Paper • 2407.13739 • Published Jul 18, 2024 • 20

upvoted 3 collections over 1 year ago

Code LLMs

6 items • Updated Jan 3, 2025 • 1

Arctic-embed

A collection of text embedding models optimized for retrieval accuracy and efficiency • 8 items • Updated Dec 5, 2024 • 27

MoEs papers reading list

60 items • Updated Nov 4, 2024 • 145

upvoted 3 papers over 1 year ago

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Paper • 2408.16532 • Published Aug 29, 2024 • 50

CogVLM2: Visual Language Models for Image and Video Understanding

Paper • 2408.16500 • Published Aug 29, 2024 • 57

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22, 2024 • 65