
Science · Solutions · Society · Soul
Models, teams, and a dream — building LLMs, multimodal systems, and agents for health and science.

I'm Yu Gu (also Aiden Gu; Chinese: 顾禹). I build large language models, multimodal systems, and agentic workflows for health and science. Currently a Principal Applied Scientist at Microsoft Research and Health & Life Sciences.
Previously co-founded a Series C AI startup. I have published in Nature, Science, and Cell, and in major AI venues including ICLR, NeurIPS, and CVPR. My work serves millions of users and supports real-world decisions every day.
Selected highlights from recent research.

The Illusion of Readiness: Stress Testing Large Frontier Models on Multimodal Medical Benchmarks
Gu, Yu, Fu, Jingjing, Liu, Xiaodong, et al.
arXiv:2509.18234

Magma: A Foundation Model for Multimodal AI Agents
Yang, Jianwei, Tan, Reuben, Wu, Qianhui, et al.
CVPR 2025

BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys
Gu, Yu, Yang, Jianwei, Usuyama, Naoto, et al.
arXiv:2310.10765

A whole-slide foundation model for digital pathology from real-world data
Xu, Hanwen, Usuyama, Naoto, Bagga, Jaspreet, et al.
Nature

A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities
Zhao, Theodore*, Gu, Yu*, Yang, Jianwei, et al.
Nature Methods

Domain-specific language model pretraining for biomedical natural language processing
Gu, Yu, Tinn, Robert, Cheng, Hao, et al.
ACM Transactions on Computing for Healthcare
Research papers, conference proceedings, and scholarly contributions.
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Liu, Qianchu, Zhang, Sheng, et al.
arXiv:2505.03981 (2025)
Wong, Cliff, Preston, Sam, et al.
arXiv:2502.00943 (2025)
The Illusion of Readiness: Stress Testing Large Frontier Models on Multimodal Medical Benchmarks
Gu, Yu, Fu, Jingjing, et al.
arXiv:2509.18234 (2025)
QRad: Enhancing Radiology Report Generation by Captioning-to-VQA Reframing
Jin, Ying, Codella, Noel C., ..., Hwang, Jenq-Neng
NeurIPS - The Second Workshop on GenAI for Health: Potential, Trust, and Policy Compliance (2025)
Current and past research projects and contributions to the field.
Building and adapting domain-specific LLMs for biomedical NLP and real-world healthcare tasks, including pretraining, fine-tuning, and evaluation.
Designing and stress-testing vision-language foundation models across medical imaging and multimodal benchmarks at scale; one example robustness probe is sketched after this list.
Advancing generalizable reasoning across modalities and domains, and targeted distillation for robust information extraction.
Developing multimodal AI agents and workflows that orchestrate tools and reasoning to act in complex real-world settings.
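
To give a flavor of what "stress testing" a benchmark means here, below is a minimal sketch of one robustness probe for a multimodal multiple-choice benchmark: permute the answer options and re-score the model. This is an illustrative sketch, not the actual evaluation code from the paper, and model_answer is a placeholder callable rather than any specific model's API.

```python
# Minimal sketch of one benchmark "stress test": shuffle the answer
# options of each multiple-choice question and re-score the model.
# A large accuracy drop versus the unshuffled benchmark suggests the
# model is exploiting option position rather than understanding content.
# model_answer is a placeholder callable, not a specific model API.

import random

def shuffle_options(options, answer_idx, seed=0):
    """Return permuted options and the gold answer's new index."""
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)
    return [options[i] for i in order], order.index(answer_idx)

def shuffled_accuracy(dataset, model_answer):
    """dataset: iterable of (question, options, gold_index) triples."""
    correct = total = 0
    for question, options, gold in dataset:
        opts, new_gold = shuffle_options(options, gold, seed=total)
        correct += int(model_answer(question, opts) == new_gold)
        total += 1
    return correct / max(total, 1)
```
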
Latest updates on publications, presentations, awards, and research activities.
The Illusion of Readiness: Stress Testing Large Frontier Models on Multimodal Medical Benchmarks is out. In just its first week, the paper has sparked meaningful conversation: it was highlighted by Eric Topol, shared across Health AI communities, and prompted outreach from Science and Health AI leaders looking for what comes next.
Related: The Illusion of Readiness: Stress Testing Large Frontier Models on Multimodal Medical Benchmarks →

We're excited to announce QRad, accepted to the NeurIPS Second Workshop on GenAI for Health: Potential, Trust, and Policy Compliance. QRad enhances radiology report generation by reframing report captioning as visual question answering (VQA).
Related: QRad: Enhancing Radiology Report Generation by Captioning-to-VQA Reframing →
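
As a rough illustration of the captioning-to-VQA idea, here is a minimal sketch of turning one (image, structured report) pair into VQA-style training triples. The section names and question templates are illustrative assumptions, not QRad's actual pipeline.

```python
# Sketch of captioning-to-VQA reframing: one (image, report) pair
# becomes several question-answer training examples.
# Section names and question templates below are illustrative
# assumptions, not the actual QRad implementation.

QUESTION_TEMPLATES = {
    "findings": "What findings are present in this chest X-ray?",
    "impression": "What is the overall impression?",
}

def report_to_vqa(image_id, report):
    """Turn a structured report into (image, question, answer) triples."""
    return [
        {"image_id": image_id, "question": question, "answer": report[section]}
        for section, question in QUESTION_TEMPLATES.items()
        if report.get(section)
    ]

print(report_to_vqa("cxr_00042",
                    {"findings": "No focal consolidation.",
                     "impression": "Normal study."}))
```
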
BiomedParse topped the CVPR 2025 3D Biomedical Image Segmentation Challenge! Our model delivered best-in-class performance across 42 tasks spanning CT, MRI, PET, ultrasound, and microscopy. Check out the announcement.
Related: A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities →

At Microsoft Build, we introduced the Healthcare Agent Orchestrator, now available in the Azure AI Foundry Agent Catalog. For details, see the HAO science blog.
Our recent publication in Nature Communications: A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings.
Related: A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings →

3D segmentation made simple: #MedImageParse 3D is live! #MedImageParse is now optimized for 3D imaging. Check out the blog by David Ardman: https://www.microsoft.com/en-us/industry/blog/healthcare/2025/03/03/leading-the-charge-to-transform-healthcare-with-advanced-ai/
We're excited to unveil **Magma**, our flagship multimodal AI project (Multimodal Agentic Model at Microsoft Research). Today, we released Magma on arXiv (2502.13130), along with its project page and GitHub repo. The project has already drawn significant community attention, with top influencers sharing the news.
Related: Magma: A Foundation Model for Multimodal AI Agents →

The LLaVA-Rad training data is out: a multimodal dataset of 400,042 X-ray image-text pairs from MIMIC-CXR, enhanced with GPT-4 for accurate report structuring and clarity.
Related: LLaVA-Rad MIMIC-CXR Annotations →
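
For anyone wiring the annotations up, here is a minimal loading sketch. The annotation file name and record fields ("image_path", "text") are assumptions for illustration; check the released files for the actual schema, and note that MIMIC-CXR images require credentialed access.

```python
# Sketch: pairing LLaVA-Rad annotations with local MIMIC-CXR images.
# The file name and field names are assumptions for illustration;
# consult the released annotation files for the actual schema.

import json
from pathlib import Path

MIMIC_CXR_ROOT = Path("/data/mimic-cxr-jpg")       # local copy (credentialed access)
ANNOTATIONS = Path("llava_rad_annotations.jsonl")  # hypothetical file name

def iter_pairs():
    """Yield (image_path, report_text) pairs from the annotation file."""
    with ANNOTATIONS.open() as f:
        for line in f:
            record = json.loads(line)
            yield MIMIC_CXR_ROOT / record["image_path"], record["text"]

for image_path, text in iter_pairs():
    print(image_path, text[:80])
    break
```
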
Satya Nadella shared our multi-agentic MTB (molecular tumor board) project.
Our **#CXRReportGen** model is featured in Forbes. Delivering state-of-the-art performance at half the model size, it is trained on commercially approved data, ensuring adaptability for specialized applications. Check it out at https://ai.azure.com/catalog/models/CxrReportGen
Models, teams, and a dream — often in that order.