microsoft-cognitive-service

company

AI & ML interests

None defined yet.

authored a paper

Computer-Use Agents as Judges for Generative User Interface

Paper • 2511.15567 • Published Nov 19, 2025

authored 19 papers

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

Paper • 2303.11381 • Published Mar 20, 2023

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

Paper • 2303.12346 • Published Mar 22, 2023

Equivariant Similarity for Vision-Language Foundation Models

Paper • 2303.14465 • Published Mar 25, 2023

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

Paper • 2304.06671 • Published Apr 13, 2023

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

Paper • 2309.17421 • Published Sep 29, 2023

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

Paper • 2310.07749 • Published Oct 11, 2023

GPT-4V(ision) as A Social Media Analysis Engine

Paper • 2311.07547 • Published Nov 13, 2023

MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Paper • 2306.04216 • Published Jun 7, 2023

InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

Paper • 2312.13503 • Published Dec 21, 2023

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024

Bring Metric Functions into Diffusion Models

Paper • 2401.02414 • Published Jan 4, 2024

DisCo: Disentangled Control for Referring Human Dance Generation in Real World

Paper • 2307.00040 • Published Jun 30, 2023

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

Paper • 2109.05014 • Published Sep 10, 2021

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

Paper • 2111.12085 • Published Nov 23, 2021

GIT: A Generative Image-to-text Transformer for Vision and Language

Paper • 2205.14100 • Published May 27, 2022

Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition

Paper • 2403.12339 • Published Mar 19, 2024

PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3

Paper • 2211.09699 • Published Nov 15, 2022

ReCo: Region-Controlled Text-to-Image Generation

Paper • 2211.15518 • Published Nov 23, 2022

GRiT: A Generative Region-to-text Transformer for Object Understanding

Paper • 2212.00280 • Published Dec 1, 2022