Microsoft Mu Is The Super Tiny Language Model That Will Power AI in Windows Settings on Copilot+ PCs

Written by Dave W. Shanahan

June 23, 2025

Microsoft has announced the introduction of Mu, a new on-device small language model (SLM) designed to revolutionize the way users interact with Windows Settings on Copilot+ PCs. Developed by the Windows Applied Sciences team, Mu is engineered for high efficiency, low latency, and seamless integration with Neural Processing Units (NPUs), enabling a new class of AI-powered experiences directly on Windows devices.

What Is Mu?

Mu is a compact, task-specific language model that runs entirely on-device, specifically leveraging the NPU hardware found in Copilot+ PCs. Unlike cloud-based models that require constant internet connectivity and can introduce latency, Mu operates locally, ensuring fast, private, and reliable responses to user queries within Windows Settings.

The model is currently available to Windows Insiders in the Dev Channel and is responsible for mapping natural language queries—such as “Turn on dark mode” or “Increase screen brightness”—directly to the appropriate system settings actions.

Technical Architecture and Efficiency

Mu distinguishes itself through several technical innovations:

Encoder-Decoder Transformer Architecture: Unlike traditional decoder-only models, Mu employs an encoder-decoder design. The encoder processes the input and generates a fixed-length latent representation, while the decoder uses this representation to generate the output. This separation allows Mu to reuse the encoded input, significantly reducing computational and memory overhead compared to decoder-only models.
Optimized for NPUs: Mu’s architecture and parameter shapes are meticulously tuned to align with the parallelism and memory constraints of NPUs. For example, the model uses a 2/3–1/3 split between encoder and decoder layers, maximizing performance per parameter. Weight sharing between input and output embeddings further reduces memory usage, a critical consideration for edge devices.
Performance Metrics: On a Qualcomm Hexagon NPU, Mu achieves 47% lower first-token latency and 4.7 times higher decoding speed compared to a similarly sized decoder-only model. This results in real-time responsiveness, with the model generating over 100 tokens per second—essential for a smooth user experience in Windows Settings.
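The reuse argument can be made concrete with a minimal Python sketch. It shows control flow only and is not Mu's implementation. `toy_encoder`, `toy_decoder_step`, and `CallCounter` are hypothetical stand-ins. The input is encoded once into a fixed-length latent, and every decoding step reads that same cached latent instead of re-processing the input. `split_layers` applies the 2/3–1/3 encoder/decoder split to an assumed layer budget. `embedding_params` shows the memory saved by tying input and output embeddings; the vocabulary size and model width used here are illustrative, not Mu's.

```python
from typing import List, Tuple


def split_layers(total_layers: int) -> Tuple[int, int]:
    """Split a layer budget roughly 2/3 encoder, 1/3 decoder."""
    encoder_layers = round(total_layers * 2 / 3)
    return encoder_layers, total_layers - encoder_layers


def embedding_params(vocab_size: int, d_model: int, tied: bool) -> int:
    """Embedding parameter count with or without sharing the input/output matrix."""
    one_matrix = vocab_size * d_model
    return one_matrix if tied else 2 * one_matrix


class CallCounter:
    """Counts how often each (hypothetical) component runs."""

    def __init__(self) -> None:
        self.encoder_calls = 0
        self.decoder_calls = 0


def toy_encoder(tokens: List[int], counter: CallCounter) -> List[float]:
    # Hypothetical encoder: summarizes the whole input as a fixed-length (3-value) latent.
    counter.encoder_calls += 1
    total = float(sum(tokens))
    return [total, float(len(tokens)), total / max(len(tokens), 1)]


def toy_decoder_step(latent: List[float], prefix: List[int], counter: CallCounter) -> int:
    # Hypothetical decoder step: conditions on the cached latent plus tokens generated so far.
    counter.decoder_calls += 1
    return int(latent[0] + len(prefix)) % 100


def generate(tokens: List[int], max_new_tokens: int, counter: CallCounter) -> List[int]:
    latent = toy_encoder(tokens, counter)  # the input is encoded exactly once
    output: List[int] = []
    for _ in range(max_new_tokens):
        output.append(toy_decoder_step(latent, output, counter))  # latent reused each step
    return output


counter = CallCounter()
print(generate([5, 7, 11], max_new_tokens=4, counter=counter))  # [23, 24, 25, 26]
print(counter.encoder_calls, counter.decoder_calls)  # 1 4
print(split_layers(12))  # (8, 4)
print(embedding_params(32000, 512, tied=True), embedding_params(32000, 512, tied=False))
```

However many tokens are generated, the encoder count stays at one. That is the property the encoder-decoder design relies on.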

Advanced Model Features

Mu incorporates several state-of-the-art transformer upgrades to maximize its efficiency and accuracy:

Dual LayerNorm:
Normalizing both before and after each sub-layer stabilizes training and keeps activations well-scaled with minimal overhead.
Rotary Positional Embeddings (RoPE):
These embeddings improve long-context reasoning, allowing Mu to handle sequences longer than those seen during training.
Grouped-Query Attention (GQA):
GQA reduces the number of attention parameters and memory requirements while maintaining head diversity, decreasing latency and power consumption on NPUs.
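These components can be sketched in pure Python using their textbook definitions. The sketch does not reproduce Mu's code. `apply_rope` rotates each pair of dimensions by a position-dependent angle, using base 10000 as in the original RoPE formulation; Mu's base is not stated. As a result, the dot product of a rotated query and key depends only on their relative offset. `gqa_kv_head` shows how grouped-query attention maps several query heads onto one shared key/value head. The head counts are assumptions. `dual_norm_block` places one LayerNorm before the sub-layer and one after the residual sum. The source does not specify that exact placement, so it is also an assumption.

```python
import math
from typing import Callable, List


def apply_rope(vec: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each (even, odd) dimension pair by an angle proportional to the position."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower dimension pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gqa_kv_head(query_head: int, num_query_heads: int, num_kv_heads: int) -> int:
    """Index of the shared key/value head used by a given query head."""
    if num_query_heads % num_kv_heads:
        raise ValueError("query heads must divide evenly into key/value groups")
    return query_head // (num_query_heads // num_kv_heads)


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dual_norm_block(x: List[float], sublayer: Callable[[List[float]], List[float]]) -> List[float]:
    # Assumed placement: pre-norm on the sub-layer input, post-norm on the residual sum.
    h = sublayer(layer_norm(x))
    return layer_norm([a + b for a, b in zip(x, h)])


q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.25]
# The same relative offset (3) at different absolute positions gives the same score.
print(math.isclose(dot(apply_rope(q, 5), apply_rope(k, 2)),
                   dot(apply_rope(q, 10), apply_rope(k, 7))))  # True
print([gqa_kv_head(h, 8, 2) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In this example, eight query heads share two key/value heads. The key/value cache therefore holds a quarter of the entries that standard multi-head attention would need, which is where the memory and power savings on NPUs come from.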

Training and Optimization

Mu was trained using A100 GPUs on Azure Machine Learning, following a multi-phase approach:

Pre-training:
The model learned language syntax, grammar, semantics, and world knowledge from hundreds of billions of high-quality educational tokens.
Distillation:
Knowledge was distilled from Microsoft’s Phi models, enabling Mu to achieve remarkable parameter efficiency despite its small size.
Task-Specific Fine-Tuning:
Mu was further refined using datasets tailored to specific tasks, including SQuAD, CodeXGLUE, and the Windows Settings agent task. Fine-tuning with low-rank adaptation (LoRA) methods allowed Mu to deliver strong performance despite its micro-sized parameter count.
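Both training techniques can be sketched in a few lines. The distillation loss below is the standard temperature-softened KL divergence between a teacher's and a student's output distributions, scaled by T². The source does not say which distillation objective Microsoft used with Phi, so treat this as the generic formulation. `lora_forward` shows how LoRA adds a trainable low-rank update (A then B) to a frozen weight matrix W. The temperature, rank, alpha, and matrix values are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float], student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_forward(x: Matrix, w_frozen: Matrix, a: Matrix, b: Matrix,
                 alpha: float, rank: int) -> Matrix:
    """y = x W + (alpha / rank) * x A B, where W stays frozen and only A, B are trained."""
    base = matmul(x, w_frozen)
    delta = matmul(matmul(x, a), b)  # low-rank path: (n x d_in)(d_in x r)(r x d_out)
    scale = alpha / rank
    return [[base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
            for i in range(len(base))]


x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight (identity for readability)
a = [[1.0], [1.0]]            # d_in x r, with rank r = 1
b_init = [[0.0, 0.0]]         # B starts at zero, so the adapter is a no-op at first
print(lora_forward(x, w, a, b_init, alpha=8.0, rank=1))  # [[1.0, 2.0]]
print(distillation_loss([3.0, 1.0, 0.0], [0.0, 1.0, 3.0]) > 0)  # True
```

Only A and B (here 2 + 2 values) would receive gradient updates, while W stays frozen. This keeps fine-tuning cheap even though the full model is being adapted.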

Task              Fine-tuned Mu   Fine-tuned Phi
SQuAD             0.692           0.846
CodeXGLUE         0.934           0.930
Settings Agent    0.738           0.815
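The table is easier to read as a ratio. The short Python calculation below expresses each fine-tuned Mu score as a fraction of the fine-tuned Phi score. It assumes higher is better on every task; the source does not name the metrics. On that assumption, Mu reaches roughly 82% to 100% of Phi's scores.

```python
scores = {
    "SQuAD": (0.692, 0.846),
    "CodeXGLUE": (0.934, 0.930),
    "Settings Agent": (0.738, 0.815),
}


def relative_score(mu: float, phi: float) -> float:
    """Fine-tuned Mu's score as a fraction of fine-tuned Phi's score on the same task."""
    return mu / phi


for task, (mu, phi) in scores.items():
    print(f"{task}: {relative_score(mu, phi):.1%} of Phi")
# SQuAD: 81.8% of Phi
# CodeXGLUE: 100.4% of Phi
# Settings Agent: 90.6% of Phi
```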

Quantization and Hardware Collaboration

To ensure Mu runs efficiently on Copilot+ PCs, Microsoft applied advanced quantization techniques:

Post-Training Quantization (PTQ):
Model weights and activations were converted from floating point to 8-bit and 16-bit integer representations, drastically reducing memory and compute requirements without sacrificing accuracy.
Silicon Partner Collaboration:
Microsoft worked closely with AMD, Intel, and Qualcomm to optimize Mu for their respective NPUs, tuning mathematical operators and validating performance across different hardware platforms.
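The core of post-training quantization can be shown with a per-tensor symmetric scheme. A single scale maps each float to the nearest signed integer, and dequantizing multiplies back by that scale. This is one common PTQ choice and is offered as an illustration only. The source does not say whether Mu uses per-tensor or per-channel scales. It also does not say how activation ranges were calibrated; that step needs sample inputs and is omitted here.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Per-tensor symmetric quantization: one scale maps floats to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8-bit, 32767 for 16-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
for bits in (8, 16):
    q, scale = quantize_symmetric(weights, num_bits=bits)
    worst = max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))
    print(bits, q, f"max error {worst:.2e}")  # rounding error is at most scale / 2
```

The 16-bit variant uses a much smaller scale, so its rounding error is far lower. That is one reason a scheme might keep sensitive tensors at 16 bits and move the rest to 8 bits.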

The result is a model capable of producing outputs at more than 200 tokens per second on devices like the Surface Laptop 7, with ultra-fast response times even for large input contexts.
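Throughput figures translate directly into wall-clock time. At 200 tokens per second, each token takes 5 ms. The small calculator below adds decoding time to a first-token latency and checks the total against the under-500-millisecond response target reported for the Settings agent. The 30-token output length and the 100 ms first-token latency are hypothetical inputs, not published Mu figures.

```python
def decode_time_ms(num_tokens: int, tokens_per_second: float) -> float:
    """Milliseconds spent generating num_tokens at a steady decoding rate."""
    return num_tokens * 1000.0 / tokens_per_second


def fits_budget(first_token_ms: float, num_tokens: int, tokens_per_second: float,
                budget_ms: float = 500.0) -> bool:
    """True if first-token latency plus decoding time stays within the budget."""
    return first_token_ms + decode_time_ms(num_tokens, tokens_per_second) <= budget_ms


print(decode_time_ms(1, 200))   # 5.0 ms per token
print(decode_time_ms(30, 200))  # 150.0
print(decode_time_ms(30, 100))  # 300.0
print(fits_budget(first_token_ms=100.0, num_tokens=30, tokens_per_second=200))  # True
```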

Mu in Action: The Windows Settings Agent

The primary application for Mu is the new AI-powered agent in Windows Settings. This agent is designed to simplify the process of changing system settings by understanding and executing natural language commands. For example, a user might type “Connect to Wi-Fi” or “Change keyboard layout,” and the agent will automatically navigate to the relevant setting and perform the requested action.
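The source does not describe Mu's output format. A common way to make a model's answer safely executable is to have it emit a structured action that is validated against an allowlist before any setting changes. The sketch below assumes JSON output of the form `{"setting": ..., "action": ..., "value": ...}`. The setting identifiers, `ALLOWED_ACTIONS`, `SettingsAction`, and `parse_action` are all hypothetical. Returning `None` so the caller can fall back to ordinary search is likewise an assumed policy.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist: setting identifier -> actions the agent may perform.
ALLOWED_ACTIONS = {
    "display.dark_mode": {"enable", "disable"},
    "display.brightness": {"increase", "decrease", "set"},
    "network.wifi": {"connect", "disconnect"},
}


@dataclass
class SettingsAction:
    setting: str
    action: str
    value: Optional[int] = None


def parse_action(model_output: str) -> Optional[SettingsAction]:
    """Validate structured model output; None tells the caller to fall back to search."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    setting, action = data.get("setting"), data.get("action")
    if not isinstance(setting, str) or not isinstance(action, str):
        return None
    if action not in ALLOWED_ACTIONS.get(setting, set()):
        return None
    value = data.get("value")
    if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
        return None
    return SettingsAction(setting, action, value)


print(parse_action('{"setting": "display.dark_mode", "action": "enable"}'))
print(parse_action('{"setting": "display.brightness", "action": "delete"}'))  # None
```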

Challenges and Solutions

Precision and Latency: Initial attempts with larger models, such as Phi fine-tuned with LoRA, achieved high precision but could not meet the strict latency requirements of real-time interaction. After task-specific fine-tuning on 3.6 million samples, with coverage expanded from 50 settings to hundreds, Mu met both the precision and latency goals, responding in under 500 milliseconds.
Handling Ambiguous Queries: The team curated a diverse evaluation set, combining real user inputs, synthetic queries, and common settings, to ensure Mu could handle a wide range of scenarios. To deal with short or ambiguous queries, the agent is integrated into the Settings search box: short queries return traditional search results, while multi-word queries invoke the agent for actionable responses.
Complex Settings Management: Some queries, like “Increase brightness,” can map to more than one setting (for example, the primary or a secondary monitor). Microsoft prioritized training data for the most commonly used settings and continues to refine the model for more complex scenarios.
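The routing rule and the multi-target problem can be sketched together. `route_query` sends short queries to classic search and multi-word queries to the agent. The two-word threshold is a hypothetical choice, since the source says only "multi-word". `resolve_targets` shows one possible policy for queries like "Increase brightness" that match several settings: act only when exactly one target matches, ask the user when several match, and fall back to search when none match. The source does not describe how Mu actually resolves such cases, so this policy and the target names are assumptions.

```python
from typing import List

MIN_AGENT_WORDS = 2  # hypothetical threshold; the source only says "multi-word"


def route_query(query: str, min_agent_words: int = MIN_AGENT_WORDS) -> str:
    """Send short queries to classic search and multi-word queries to the agent."""
    return "agent" if len(query.split()) >= min_agent_words else "search"


def resolve_targets(candidates: List[str]) -> str:
    """Apply directly only when exactly one target matches; otherwise ask or fall back."""
    if len(candidates) == 1:
        return f"apply:{candidates[0]}"
    if not candidates:
        return "fallback:search"
    return "clarify:" + ",".join(candidates)


print(route_query("brightness"))                  # search
print(route_query("Increase screen brightness"))  # agent
print(resolve_targets(["primary_monitor", "secondary_monitor"]))
# clarify:primary_monitor,secondary_monitor
```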

Real-World Impact and Future Directions

Mu’s deployment marks a significant step forward for on-device AI in Windows. By leveraging NPUs and advanced model optimization, Microsoft delivers a fast, private, and intelligent assistant that streamlines user interaction with system settings. The company is actively seeking feedback from Windows Insiders to further refine the experience and expand Mu’s capabilities.

“We welcome feedback from users in the Windows Insiders program as we continue to refine the experience for the agent in Settings.” — Vivek Pradeep, VP, Distinguished Engineer, Windows Applied Sciences.

The introduction of Mu demonstrates Microsoft’s commitment to advancing AI on the edge, delivering powerful language understanding and automation directly on user devices. As Microsoft’s Copilot+ PCs and NPUs become more prevalent, Mu’s efficient design and real-time performance set a new standard for intelligent, privacy-preserving user experiences in Windows.
