Abstract
MemOS is proposed as a memory operating system for Large Language Models to enhance memory management, enabling efficient storage and retrieval, and facilitating continual learning and personalized modeling.
Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI), yet their lack of well-defined memory management systems hinders the development of long-context reasoning, continual personalization, and knowledge consistency. Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods. While Retrieval-Augmented Generation (RAG) introduces external knowledge in plain text, it remains a stateless workaround without lifecycle control or integration with persistent representations. Recent work has modeled the training and inference cost of LLMs from a memory hierarchy perspective, showing that introducing an explicit memory layer between parameter memory and external retrieval can substantially reduce these costs by externalizing specific knowledge. Beyond computational efficiency, LLMs face broader challenges arising from how information is distributed over time and context, requiring systems capable of managing heterogeneous knowledge spanning different temporal scales and sources. To address this challenge, we propose MemOS, a memory operating system that treats memory as a manageable system resource. It unifies the representation, scheduling, and evolution of plaintext, activation-based, and parameter-level memories, enabling cost-efficient storage and retrieval. As the basic unit, a MemCube encapsulates both memory content and metadata such as provenance and versioning. MemCubes can be composed, migrated, and fused over time, enabling flexible transitions between memory types and bridging retrieval with parameter-based learning. MemOS establishes a memory-centric system framework that brings controllability, plasticity, and evolvability to LLMs, laying the foundation for continual learning and personalized modeling.
Community
🧠 MemOS: A Memory OS for AI Systems
We present MemOS, an open-source, industrial-grade memory operating system for large language models (LLMs), designed to systematically address the challenges of long-term dialogue, cross-session reasoning, and personalized memory management. Unlike traditional RAG or parameter-centric approaches, we treat memory as a first-class system resource. MemOS introduces MemCube, a unified abstraction that encapsulates plaintext, activation, and parameter memories under a standardized scheduling and orchestration framework.
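To make the MemCube abstraction concrete, here is a minimal sketch of such a unit as a Python dataclass. The class name, field names (`provenance`, `version`), and the `fuse` operation are illustrative assumptions based on the description above, not the actual MemOS API.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    """The three memory forms MemOS unifies."""
    PLAINTEXT = "plaintext"
    ACTIVATION = "activation"
    PARAMETER = "parameter"


@dataclass
class MemCube:
    """Hypothetical memory unit: content plus metadata
    such as provenance and versioning."""
    content: str
    mem_type: MemoryType
    provenance: str
    version: int = 1

    def fuse(self, other: "MemCube") -> "MemCube":
        """Merge two cubes of the same type into a new cube,
        tracking combined provenance and bumping the version."""
        assert self.mem_type == other.mem_type, "can only fuse like types"
        return MemCube(
            content=self.content + "\n" + other.content,
            mem_type=self.mem_type,
            provenance=f"{self.provenance}+{other.provenance}",
            version=max(self.version, other.version) + 1,
        )
```

The key design point is that memory content never travels without its metadata, which is what makes composition, migration, and fusion auditable over time.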
Our system adopts a three-layer architecture, consisting of a memory API layer, a memory scheduling and management layer, and a memory storage and infrastructure layer. We propose a novel Next-Scene Prediction mechanism that proactively preloads relevant memory fragments during inference, significantly reducing latency and token consumption.
🌳 MemOS organizes memory using a tree-structured hierarchy for clarity and scalability, while enabling graph-style cross-links for flexible semantic reasoning. Memories can be inserted, merged, or restructured over time — just like evolving human thought.
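A tree-with-cross-links structure like the one described can be sketched as follows. The class and method names are hypothetical; the point is that strict parent–child edges give hierarchy and scalability, while separate cross-link edges allow graph-style semantic hops outside the tree.

```python
class MemoryNode:
    """Hypothetical node in a memory tree with graph-style cross-links."""

    def __init__(self, label: str):
        self.label = label
        self.children: list["MemoryNode"] = []     # tree edges (hierarchy)
        self.cross_links: list["MemoryNode"] = []  # graph edges (semantics)

    def add_child(self, child: "MemoryNode") -> "MemoryNode":
        self.children.append(child)
        return child

    def link(self, other: "MemoryNode") -> None:
        # Bidirectional semantic link that cuts across the hierarchy.
        self.cross_links.append(other)
        other.cross_links.append(self)

    def find(self, label: str) -> "MemoryNode | None":
        # Depth-first search over tree edges only.
        if self.label == label:
            return self
        for child in self.children:
            hit = child.find(label)
            if hit is not None:
                return hit
        return None
```

Restructuring a memory then amounts to moving a node between parents, while its cross-links continue to capture semantic relations independent of where it sits in the tree.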
🧠 MemOS uses a Memory Scheduler to dynamically manage parametric, activation, and plaintext memories — selecting, preloading, and purifying the most relevant ones for each task. It’s like an OS scheduler, but for AI memory.
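The selection step of such a scheduler can be illustrated with a deliberately naive sketch: rank candidate memories by keyword overlap with the current query and keep the top-k. A real scheduler would use semantic similarity and cost models; this stand-in only shows the relevance-driven selection shape.

```python
def schedule_memories(memories: list[str], query: str, k: int = 2) -> list[str]:
    """Toy stand-in for a memory scheduler's selection step:
    rank plaintext memories by word overlap with the query
    and return the k most relevant ones."""
    query_terms = set(query.lower().split())

    def overlap(mem: str) -> int:
        return len(query_terms & set(mem.lower().split()))

    return sorted(memories, key=overlap, reverse=True)[:k]
```

In an OS analogy, this plays the role of the ready-queue policy: deciding which memories get loaded into the limited "working set" (the context window) before inference runs.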
On the LoCoMo benchmark, MemOS achieves a 159% improvement in temporal reasoning over OpenAI's global memory, with an overall accuracy gain of 38.97% and a 60.95% reduction in token overhead, setting a new state of the art in long-term memory management for LLMs.
MemOS is fully open-sourced, modular, and compatible with mainstream LLM ecosystems such as HuggingFace, OpenAI, and Ollama. We hope MemOS helps advance AI systems from static generators to continuously evolving, memory-driven agents.
- 🌐 Project website: https://memos.openmem.net
- 📄 Paper: https://arxiv.org/abs/2507.03724
- 💻 Code: https://github.com/MemTensor/MemOS
- 💬 Discord: https://discord.gg/Txbx3gebZR
- 📬 Contact: [email protected]