jw2yang committed
Commit 018fbe3 · 1 Parent(s): 2635739
Files changed (1):
  1. README.md (+17, -13)
README.md CHANGED

@@ -52,9 +52,10 @@ Magma is a multimodal agentic AI model that can generate text based on the input
 * **State-of-the-art Performance:** Magma achieves state-of-the-art performance on various multimodal tasks, including UI navigation, robotics manipulation, as well as generic image and video understanding, in particular the spatial understanding and reasoning!
 * **Scalable Pretraining Strategy:** Magma is designed to be **learned scalably from unlabeled videos** in the wild in addition to the existing agentic data, making it strong generalization ability and suitable for real-world applications!
 
-NOTE: The model is developed by Microsoft and is funded by Microsoft Research.
-NOTE: The model is shared by Microsoft Research and is licensed under the MIT License.
-NOTE: The model is developed based on Meta LLama-3 as the LLM.
+
+## License
+
+The model is developed by Microsoft and is funded by Microsoft Research. The model is shared by Microsoft Research and is licensed under the MIT License.
 
 <!-- {{ model_description | default("", true) }}
 
@@ -331,18 +332,21 @@ Our model is built based on:
 * [DeepSpeed](https://www.deepspeed.ai/)
 * [FlashAttenton](https://github.com/HazyResearch/flash-attention)
 
-
-
-## Citation [optional]
+## Citation
 
 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 
 **BibTeX:**
 
-N/A
-<!-- {{ citation_bibtex | default("[More Information Needed]", true)}} -->
-
-**APA:**
-
-N/A
-<!-- {{ citation_apa | default("[More Information Needed]", true)}} -->
+```bibtex
+@misc{yang2025magmafoundationmodelmultimodal,
+  title={Magma: A Foundation Model for Multimodal AI Agents},
+  author={Jianwei Yang and Reuben Tan and Qianhui Wu and Ruijie Zheng and Baolin Peng and Yongyuan Liang and Yu Gu and Mu Cai and Seonghyeon Ye and Joel Jang and Yuquan Deng and Lars Liden and Jianfeng Gao},
+  year={2025},
+  eprint={2502.13130},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2502.13130},
+}
+```
+<!-- {{ citation_bibtex | default("[More Information Needed]", true)}} -->