Science · Solutions · Society · Soul

Yu Gu

Where models meet meaning. Making intelligence useful in health, science, and beyond.

Large Language Models · Multimodal Foundation Models · Reasoning · Agents

About

I'm Yu Gu (also Aiden Gu; Chinese: 顾禹). I design large language models, multimodal foundation models, and agentic reasoning frameworks. I am currently a Principal Scientist at Microsoft Research and Health & Life Sciences, working at the intersection of AI, healthcare, and scientific discovery.

My work has been published in Nature/Science/Cell and at all major AI conferences (ICLR, NeurIPS, CVPR, etc.). I led the first enterprise-scale rollout of Microsoft's Health AI platform, including Healthcare Agent Orchestrator and production frontier medical foundation models (MedImageInsight, MedImageParse, CXRReportGen). Previously, I co-founded an AI startup that grew to a Series C acquisition.

Featured

Selected highlights from recent research.

The illusion of readiness: Stress testing large frontier models on multimodal medical benchmarks

Gu, Yu, Fu, Jingjing, Liu, Xiaodong, et al.

preprint · 2025

arXiv:2509.18234

Magma: A Foundation Model for Multimodal AI Agents

Jianwei Yang, Reuben Tan, Qianhui Wu, et al.

preprint · 2025

CVPR

BiomedJourney: Counterfactual biomedical image generation by instruction-learning from multimodal patient journeys

Gu, Yu, Yang, Jianwei, Usuyama, Naoto, et al.

preprint · 2024

arXiv:2310.10765

A whole-slide foundation model for digital pathology from real-world data

Xu, Hanwen, Usuyama, Naoto, Bagga, Jaspreet, et al.

journal · 2024

Nature

A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities

Zhao, Theodore*, Gu, Yu*, Yang, Jianwei, et al.

journal · 2024

Nature Methods

Domain-specific language model pretraining for biomedical natural language processing

Gu, Yu, Tinn, Robert, Cheng, Hao, et al.

journal · 2022

ACM

Publications

Research papers, conference proceedings, and scholarly contributions.


Scaling medical imaging report generation with multimodal reinforcement learning

Qianchu Liu, Sheng Zhang, ... Hoifung Poon

preprint · 2026

X-reasoner: Towards generalizable reasoning across modalities and domains

Liu, Qianchu, Zhang, Sheng, et al.

arXiv:2505.03981 (2025)

preprint · 2025

Research

Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale

Wong, Cliff, Preston, Sam, et al.

arXiv:2502.00943 (2025)

preprint · 2025

Research

The illusion of readiness: Stress testing large frontier models on multimodal medical benchmarks

Gu, Yu, Fu, Jingjing, et al.

arXiv:2509.18234 (2025)

preprint · 2025

Research

QuantRad: Advancing Quantitative Reliability in Radiology Report Generation with Cascaded Decoders

Jin, Ying, Codella, Noel C, ... Hwang, Jenq-Neng

arXiv:x (2025)

preprint · 2025

Research


Media Coverage

Press features, interviews, and notable mentions from industry leaders.

BusinessWire · October 21, 2025

Amalgam Rx's Medical-Grade AI Doubles Digital Health Engagement — Validated in Peer-Reviewed Studies

Forbes · October 3, 2025

AI Doctors Cheat Medical Tests

GeekWire · June 15, 2025

Microsoft and Providence Create AI That Unlocks Tumor Insights at a Scale Previously Out of Reach

CNBCTV18 · June 15, 2025

Microsoft Research Introduces GigaPath: A Novel Vision Transformer for Digital Pathology

Forbes · May 22, 2024

Microsoft Announces New Foundation Model for Digital Pathology, Diving Deeper into Clinical Medicine

VentureBeat · September 15, 2021

Microsoft Researchers Claim State-of-the-Art Biomedical NLP Model

Areas of Focus

Active research directions and ongoing work.

Large Language Models

active · 2020–Present

Building and adapting domain-specific LLMs for biomedical NLP and real-world healthcare tasks, including pretraining, fine-tuning, and evaluation.

Publications:

Domain-specific language model pretraining for biomedical natural language processing
Fine-tuning large neural language models for biomedical natural language processing
UniversalNER: Targeted distillation from large language models for open named entity recognition
Scaling clinical trial matching using large language models: a case study in oncology

LLM · Pretraining · Biomedical NLP · Evaluation

Multimodal Foundation Models

active · 2022–Present

Designing and stress-testing vision-language foundation models across medical imaging and multimodal benchmarks at scale.

Publications:

Magma: A Foundation Model for Multimodal AI Agents
A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities
A whole-slide foundation model for digital pathology from real-world data
BiomedJourney: Counterfactual biomedical image generation by instruction-learning from multimodal patient journeys
BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once
The illusion of readiness: Stress testing large frontier models on multimodal medical benchmarks

Multimodal · Foundation Models · Vision-Language · Medical Imaging

Reasoning

active · 2024–Present

Advancing generalizable reasoning across modalities and domains, and targeted distillation for robust information extraction.

Publications:

X-reasoner: Towards generalizable reasoning across modalities and domains
Magma: A Foundation Model for Multimodal AI Agents

Reasoning · Generalization · Distillation

Agents

active · 2024–Present

Developing multimodal AI agents and workflows that orchestrate tools and reasoning to act in complex real-world settings.

Publications:

Magma: A Foundation Model for Multimodal AI Agents
Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale
The illusion of readiness: Stress testing large frontier models on multimodal medical benchmarks

Agents · Tool Use · Planning · Multimodal

News & Updates

Recent publications, presentations, and milestones across research and collaborations.

📄

The Illusion of Readiness: Stress Testing Large Frontier Models on Multimodal Medical Benchmarks

September 18, 2025

In its first week, the paper sparked meaningful conversation: it was highlighted by Eric Topol, shared across Health AI communities, and prompted outreach from science and health AI leaders looking for what comes next.

🎤

Announcing QRad

September 2, 2025

We're excited to announce QRad, a new project that enhances radiology report generation through captioning-to-VQA reframing. It has been accepted to the NeurIPS Second Workshop on GenAI for Health: Potential, Trust, and Policy Compliance.

🎤

BiomedParse topped the CVPR 2025 3D Biomedical Image Segmentation Challenge!

June 5, 2025

Our model delivered best-in-class performance across 42 tasks spanning CT, MRI, PET, ultrasound, and microscopy.


Connect

Models, teams, and a dream — often in that order.

Email · ORCID · Google Scholar · LinkedIn · X / Twitter
