Microsoft Adds Multi-Model ‘Critique’ and ‘Council’ to Microsoft 365 Copilot Researcher for Deeper, More Accurate AI Research

Microsoft Adds Multi-Model ‘Critique’ and ‘Council’ to Microsoft 365 Copilot Researcher for Deeper, More Accurate AI Research

User avatar placeholder
Written by Dave W. Shanahan

March 30, 2026

Microsoft is turning Microsoft 365 Copilot Researcher into a much more powerful deep research agent with new multi‑model features called Critique and Council. These upgrades aim to deliver research reports that are more accurate, more complete, and easier to trust for serious work.

Microsoft 365 Copilot Researcher gets multi‑model “Critique” and “Council”

In a new post on the Microsoft 365 Copilot Blog, Microsoft outlines how Researcher now uses multiple AI models instead of relying on a single system for everything. The company positions these changes as a major step toward higher‑quality, benchmark‑validated research output for complex tasks in enterprises and professional settings.

Microsoft Adds Multi-Model ‘Critique’ and ‘Council’ to Microsoft 365 Copilot Researcher for Deeper, More Accurate AI Research
DRACO (Deep Research Accuracy, Completeness, and Objectivity) benchmark scores across 100 complex research tasks spanning 10 domains. All results are sourced from the original paper [Zhong et al., arXiv:2602.11685 (Feb. 2026)], except for Researcher with Critique. Researcher with Critique achieves a substantial improvement of +7.0 points (SEM ±1.90) in the aggregated score, +13.88% over Perplexity Deep Research (Claude Opus 4.6 model), the best system reported in the paper.

Researcher is described as a deep research agent built for “complex research in the flow of work,” and these new capabilities are designed to improve not just what answers you get, but how well those answers are sourced, evaluated, and presented. The two features—Critique and Council—approach that goal from different angles.

Critique: two-model system for deeper, cleaner research

Critique is a new multi‑model deep research pipeline that deliberately separates generation from evaluation. One model plans the task, runs retrieval, and drafts a full report; a second model then steps in as an expert reviewer, focused on strengthening quality rather than rewriting the work from scratch.

Microsoft Adds Multi-Model ‘Critique’ and ‘Council’ to Microsoft 365 Copilot Researcher for Deeper, More Accurate AI Research

Microsoft says Critique uses models from “Frontier labs,” including Anthropic and OpenAI, and that this architecture outperforms traditional single‑model approaches on the DRACO benchmark (Deep Research Accuracy, Completeness, and Objectivity). Internally, Critique is driven by rubric‑based evaluation and focuses on three key dimensions:

  • Source reliability: Prioritizing reputable, domain‑appropriate, verifiable sources.

  • Report completeness: Ensuring the response fully addresses the user’s intent with relevant and unique insights.

  • Strict evidence grounding: Requiring key claims to be tightly tied to reliable sources and precise citations.

According to Microsoft, Critique becomes the default Researcher experience whenever users select “Auto” in the model picker, effectively making this two‑model architecture the standard path for deep research queries.

DRACO benchmark: big gains over single‑model systems

Microsoft Adds Multi-Model ‘Critique’ and ‘Council’ to Microsoft 365 Copilot Researcher for Deeper, More Accurate AI Research

To validate Critique, Microsoft tested it on the DRACO benchmark—100 complex research tasks across 10 domains, originally introduced by researchers from Perplexity and academia earlier in 2026. All systems in DRACO are graded by OpenAI’s GPT‑5.2, which the benchmark paper identifies as the strictest of three judge models.

Using the same evaluation protocol and configuration as the original DRACO paper, Microsoft reports that Researcher with Critique delivers:

  • A +7.0 point gain on the aggregated DRACO score, amounting to a 13.88% improvement over Perplexity Deep Research (Claude Opus 4.6), the best system reported in the original benchmark.

  • Statistically significant gains across all four DRACO dimensions: breadth and depth of analysis (+3.33), presentation quality (+3.04), and factual accuracy (+2.58), with citation quality also improving thanks to stricter grounding and better use of existing sources.

The query set spans domains like medicine, technology, and law, and Microsoft notes that Researcher with Critique scores higher than the single‑model configuration across all ten domains. Eight of those domains show statistically significant improvements (p < 0.05), with Academic and Needle‑in‑a‑Haystack being the only exceptions due to high variance.

Microsoft Adds Multi-Model ‘Critique’ and ‘Council’ to Microsoft 365 Copilot Researcher for Deeper, More Accurate AI Research

Council: side‑by‑side model comparison with a judging layer

While Critique focuses on a two‑step generator‑reviewer pipeline, Council introduces an alternate mode where multiple models work in parallel. When users choose “Model Council” in the Researcher model picker, an Anthropic model and an OpenAI model each generate their own full report on the same query.

After both reports are produced, a dedicated judge model analyzes them and creates:

  • A distilled summary of the key findings.

  • Highlights of where the models agree and where they diverge—in facts, magnitude, framing, or interpretation.

  • Callouts of unique insights that only one model surfaced.

Council is meant for scenarios where users want to compare perspectives and gain confidence by seeing how multiple high‑end models treat the same research question, rather than relying on a single synthesized answer.

Microsoft Adds Multi-Model ‘Critique’ and ‘Council’ to Microsoft 365 Copilot Researcher for Deeper, More Accurate AI Research

Availability for Frontier customers

Critique and Council are available today in the Frontier program, Microsoft’s early‑access initiative for customers driving large‑scale Copilot and agent adoption. Through Frontier, organizations can start experimenting with multi‑model intelligence in Researcher, incorporating these capabilities into their broader “Frontier transformation” with Microsoft 365 Copilot and agentic workflows.

For knowledge workers who depend on deep research—across domains like medicine, law, technology, and more—these upgrades position Researcher as a more robust, benchmark‑validated alternative to traditional single‑model AI research tools, with a strong emphasis on accuracy, completeness, and transparent evidence.

Recent Posts You Might Like


Discover more from Microsoft News Now

Subscribe to get the latest posts sent to your email.

Image placeholder

I'm Dave W. Shanahan, a Microsoft enthusiast with a passion for Windows, Xbox, Microsoft 365 Copilot, Azure, and more. I started MSFTNewsNow.com to keep the world updated on Microsoft news. Based in Massachusetts, you can email me at davewshanahan@gmail.com.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.