* **State-of-the-art Performance:** Magma achieves state-of-the-art performance on various multimodal tasks, including UI navigation, robotics manipulation, and generic image and video understanding, in particular spatial understanding and reasoning.
* **Scalable Pretraining Strategy:** Magma is designed to be **learned scalably from unlabeled videos** in the wild in addition to existing agentic data, giving it strong generalization ability and making it suitable for real-world applications.

## License

The model is developed by Microsoft and funded by Microsoft Research. It is shared by Microsoft Research and licensed under the MIT License.

<!-- {{ model_description | default("", true) }}
Our model is built based on:

* [DeepSpeed](https://www.deepspeed.ai/)
* [FlashAttention](https://github.com/HazyResearch/flash-attention)

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

```bibtex
@misc{yang2025magmafoundationmodelmultimodal,
      title={Magma: A Foundation Model for Multimodal AI Agents},
      author={Jianwei Yang and Reuben Tan and Qianhui Wu and Ruijie Zheng and Baolin Peng and Yongyuan Liang and Yu Gu and Mu Cai and Seonghyeon Ye and Joel Jang and Yuquan Deng and Lars Liden and Jianfeng Gao},
      year={2025},
      eprint={2502.13130},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.13130},
}
```

<!-- {{ citation_bibtex | default("[More Information Needed]", true)}} -->