News Weekly: 2026-01-26–2026-02-01
Weekly AI & GPU Industry Recap: January 26 – February 1, 2026
🔑 Key Highlights
- Microsoft unveils Maia 200 (“Braga”) AI accelerator, a major leap over the Maia 100 featuring 216 GB HBM3E memory, 7 TB/sec bandwidth, and 10.15 petaflops at FP4 — now deployed in Azure US Central for OpenAI GPT-5.2 inference
- NVIDIA launches GeForce NOW natively on Linux as a Flatpak beta, delivering RTX 5080-powered cloud gaming at up to 5K/120fps to Ubuntu 24.04+ desktops — a landmark moment for Linux gaming
- AMD’s RDNA5 (GFX13) makes its first appearance in LLVM 23 Git, confirming next-generation GPU architecture work is underway well ahead of any product announcement
- Intel rolls out XeSS 3 Multi-Frame Generation via driver update, enabling 2x–4x frame multiplication across all Arc A/B-series and Core Ultra iGPU titles that already support XeSS 2
- NVIDIA deepens CoreWeave investment to ~13% stake with an additional $2B infusion, reinforcing its neocloud strategy as hyperscalers increasingly build custom AI XPUs to bypass GPU dependency
🤖 AI & Machine Learning
Microsoft Maia 200: Inference-Focused Custom XPU
Microsoft officially announced the “Braga” Maia 200, its second-generation AI accelerator built on TSMC N3P (3nm), targeted exclusively at AI inference — dropping the dual training/inference ambition of the Maia 100. Key specs:
- 216 GB HBM3E (SK Hynix, 6 stacks) at 7 TB/sec bandwidth — a 3.9x improvement over Maia 100
- 10.15 PFLOPS FP4 / 5.07 PFLOPS FP8 tensor performance at 750W TDP
- Expanded AI Transport Layer (ATL) interconnect scaling to 6,144-engine cluster domains across 1,536 nodes via 8-rail packet spraying
- Simplifies numeric formats to FP4/FP8 for tensors and BF16/FP32 for vectors, abandoning Maia 100’s proprietary MX6/MX9 formats
- Currently deployed in Azure US Central (Des Moines, Iowa) with US West 3 (Phoenix) next; serving inference for OpenAI GPT-5.2 and Microsoft 365 Copilots
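The spec-sheet numbers above hang together arithmetically; a quick sanity check (the ~1.8 TB/s Maia 100 figure is inferred from the stated 3.9x gain, not quoted in this article):

```python
# Cross-check the Maia 200 figures quoted above.
hbm_total_gb = 216       # six HBM3E stacks
stacks = 6
bandwidth_tb_s = 7.0     # aggregate HBM3E bandwidth
gain_over_maia100 = 3.9  # stated improvement factor

fp4_pflops = 10.15
fp8_pflops = 5.07        # half the FP4 rate, consistent with double-rate FP4

cluster_engines = 6144   # ATL cluster domain
cluster_nodes = 1536

print(f"{hbm_total_gb / stacks:.0f} GB per HBM stack")                    # 36 GB
print(f"~{bandwidth_tb_s / gain_over_maia100:.2f} TB/s implied Maia 100") # ~1.79
print(f"FP4:FP8 throughput ratio {fp4_pflops / fp8_pflops:.2f}")          # ~2.00
print(f"{cluster_engines // cluster_nodes} accelerators per node")        # 4
```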
NVIDIA DRIVE AV + Mercedes-Benz S-Class L4
NVIDIA announced the new Mercedes-Benz S-Class will be built on NVIDIA DRIVE Hyperion with full-stack NVIDIA DRIVE AV L4-ready software. The platform features:
- Redundant compute and multimodal sensors (cameras, radar, lidar)
- Parallel AI and classical safety stacks via NVIDIA Halos
- Training on NVIDIA DGX systems; simulation via NVIDIA Omniverse NuRec and Cosmos world models
- Partnership with Uber to enable robotaxi deployment on Uber’s mobility network
Physical AI & Robotics Ecosystem Expands
NVIDIA’s Omniverse/Isaac stack continued gaining traction with real deployments:
- Caterpillar using Nemotron + Jetson Thor for in-cab AI assistants
- LEM Surgical’s Dynamis robotic surgical system leveraging Holoscan + Isaac for Healthcare
- NEURA Robotics integrating SAP Joule agents with Isaac GR00T models
- Hugging Face integrating Isaac GR00T N1.6 and Isaac Lab-Arena into the LeRobot ecosystem
⚡ GPU & Hardware
AMD RDNA5 (GFX13) Surfaces in LLVM 23
The first GFX13 target — presumed to be RDNA5 — landed in LLVM 23 Git this week. The new target currently builds on the GFX12 (RDNA4) and GFX12.5/GFX1250 feature sets as a starting point and is expected to mature ahead of the LLVM 23.1 stable release in late August or September 2026. This follows the RDNA4 (GFX12) generation and the intermediate GFX1250 “RDNA4 refresh” IP.
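Once a compiler build containing the target is available, its presence should be visible from the standard LLVM CPU listing; an illustrative probe (the gfx13-series processor names are not yet public, so the output shown is a placeholder):

```
$ llc -march=amdgcn -mcpu=help
Available CPUs for this target:
  ...
  existing gfx12xx entries, followed by new gfx13-series targets once they land
```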
Intel XeSS 3 Multi-Frame Generation Goes Live
Intel shipped driver versions 32.0.101.8425 and 32.0.101.8362 enabling XeSS 3 MFG with 2x, 3x, and 4x frame generation modes — competitive with NVIDIA’s DLSS MFG. Crucially:
- No developer updates required — any XeSS 2 title gets MFG via driver-level override
- Supported on Arc A-series, B-series, and Xe2/Xe3 integrated graphics (Meteor Lake, Lunar Lake, Arrow Lake)
- Also serves as launch driver for Arc B390/B370 iGPUs on Core Ultra 3 (Panther Lake) mobile CPUs
- Early impressions noted impressive image quality but raised concerns about elevated input latency in fast-paced titles
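One reason latency concerns track frame generation: MFG multiplies displayed frames but not input sampling, which stays pegged to the base render rate. A toy model with illustrative numbers (not from Intel, and ignoring the overhead of generating frames):

```python
# Illustrative model of multi-frame generation (MFG) output rates.
def mfg_display_fps(base_render_fps: float, factor: int) -> float:
    """Displayed FPS when each rendered frame yields `factor` output frames."""
    return base_render_fps * factor

base = 40.0  # hypothetical base render rate
for factor in (2, 3, 4):
    shown = mfg_display_fps(base, factor)
    # Input is still sampled once per *rendered* frame:
    input_interval_ms = 1000.0 / base
    print(f"{factor}x MFG: {shown:.0f} fps shown, "
          f"input sampled every {input_interval_ms:.0f} ms")
```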
Apple M3 Linux Progress (Asahi)
Asahi Linux developer Michael Reeves demonstrated booting to a KDE Plasma desktop on Apple M3 hardware — storage, display, and input now functional. However, no GPU acceleration yet; the system relies on LLVMpipe CPU-based software rendering, causing significant CPU load and poor battery life. M4/M5 bring-up remains the longer-term challenge.
NVIDIA FrameView 1.7 Released
NVIDIA updated FrameView to version 1.7 with:
- Accurate FPS measurement at 800+ FPS (relevant for 6x MFG scenarios)
- Customizable in-game overlays (FPS, 1% lows, PC latency, GPU/CPU clocks)
- Memory leak fix for long Reflex-compatible sessions
- Compatibility fixes for The Finals, Arc Raiders, Starfield, Black Myth: Wukong, Battlefield 6
🏭 Industry & Market
NVIDIA’s $2B CoreWeave Investment in Strategic Context
NVIDIA increased its CoreWeave stake to ~13% with a $2B equity investment (~22.94M Class A shares), up from 7% at CoreWeave’s IPO in March 2025. Analysis from The Next Platform frames this as strategic channel management:
- CoreWeave needs $225–$300B in capital to meet its 5 GW capacity goal by 2030
- NVIDIA’s existing $6.3B MSA expansion guarantees GPU capacity purchases through 2032
- CoreWeave’s stock has declined ~46% from its June 2025 peak amid AI infrastructure funding concerns
- The round-trip investment model — NVIDIA funds CoreWeave, CoreWeave buys NVIDIA GPUs — mirrors traditional channel stuffing but at hyperscale magnitudes
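From the deal figures above, the implied purchase price works out to roughly $87 per Class A share (a derived estimate, not a reported number):

```python
# Implied per-share price of NVIDIA's equity infusion.
investment_usd = 2_000_000_000
shares = 22_940_000  # ~22.94M Class A shares

implied_price = investment_usd / shares
print(f"Implied price: ${implied_price:.2f} per share")  # ≈ $87.18
```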
Hyperscaler Custom Silicon Race Intensifies
Microsoft’s Maia 200 debut underscores the accelerating trend: AWS (Trainium/Inferentia), Google (TPU), Microsoft (Maia), Meta (MTIA), and others are all building inference-optimized XPUs to reduce per-token costs and GPU vendor dependency. NVIDIA’s CoreWeave investment is a direct strategic response — ensuring neocloud partners without the capital to build custom silicon remain dependent on NVIDIA H/B-series GPUs.
AMD DDR5 Memory Validation for Ryzen 7 9850X3D
Phoronix published a 300+ benchmark comparison of DDR5-4800 vs. DDR5-6000 on the Ryzen 7 9850X3D (Zen 5, 2nd Gen 3D V-Cache) on Ubuntu 25.10 + Linux 6.17. AMD’s messaging that DDR5-4800 is viable for gaming without major performance loss was validated — providing value context for buyers considering whether to invest in premium DDR5-6000 EXPO kits (~$70 premium for 2x16GB).
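The value question reduces to what the ~$70 premium buys in raw transfer rate; a simple ratio, separate from Phoronix's measured results (which depend heavily on the workload, since 3D V-Cache absorbs much of the memory traffic):

```python
# Raw transfer-rate headroom of DDR5-6000 over DDR5-4800.
base_mt_s = 4800
fast_mt_s = 6000
premium_usd = 70  # approximate premium for a 2x16GB EXPO kit

headroom = fast_mt_s / base_mt_s - 1.0
print(f"Transfer-rate headroom: {headroom:.0%}")  # 25%
print(f"Premium per point of headroom: ${premium_usd / (headroom * 100):.2f}")
```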
🛠️ Developer Ecosystem
NVIDIA GeForce NOW Lands Natively on Linux
The GeForce NOW Flatpak beta is now available directly from NVIDIA.com (not Flathub), officially supporting Ubuntu 24.04+ on x86_64. Key technical requirements:
- NVIDIA GPU: R580 series or newer (X.Org session)
- AMD/Intel GPU: Mesa 24.2+ (Wayland session recommended)
- Vulkan Video H.264/H.265 required; AV1 not yet supported
- Ultimate tier: RTX 5080 servers, 5K/120fps or 1080p/360fps, 8-hour sessions
- Access to 4,500+ games; free tier available for testing
AMD ROCm 7.2 & CK Tile GEMM Debugging Blog
AMD published a detailed ROCm blog post on debugging NaN results in Composable Kernel (CK) Tile GEMM using rocgdb. The post — authored by AMD engineers — walks through:
- Systematic GPU kernel debugging methodology: problem simplification, deterministic inputs, step-by-step execution tracing
- Root cause: a single-character typo (ALdsTile instead of BLdsTile) that produced an incorrect tensor distribution when instruction scheduling was enabled on Instinct GPUs
- Practical guidance for HIP kernel developers on using rocgdb breakpoints, thread inspection, and data flow tracing
- Concurrent ROCm 7.2 release (“Smarter, Faster, and More Scalable for Modern AI Workloads”) adds further ecosystem context
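The debugging workflow described in the post maps onto a fairly standard rocgdb session; an illustrative sketch (rocgdb inherits gdb's command set; the binary and symbol names below are placeholders, not taken from the blog):

```
$ rocgdb ./ck_tile_gemm_test
(gdb) break gemm_kernel        # stop at the suspect GPU kernel
(gdb) run                      # execute until the breakpoint hits
(gdb) info threads             # list GPU waves alongside host threads
(gdb) print a_tile_value       # inspect data feeding the matrix op
(gdb) next                     # step through the tile distribution code
```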
AMD Mesa 26.1: Low-Latency Video Decode
AMD’s RadeonSI Gallium3D driver (Mesa 26.1) gained a new low-latency video decode mode for the Video Core Next (VCN) pipeline, enabled via AMD_DEBUG=lowlatencydec. Trades higher GPU power consumption for reduced decode latency, mirroring the existing AMD_DEBUG=lowlatencyenc low-latency encode option. Authored by AMD’s David Rosca.
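Opting in is an environment toggle rather than an API change; a usage sketch (the mpv and ffmpeg invocations are placeholders; only the AMD_DEBUG values come from the Mesa change, and comma-joining flags follows Mesa's usual AMD_DEBUG convention):

```
# One-off low-latency decode run (Mesa 26.1+, RadeonSI VA-API):
AMD_DEBUG=lowlatencydec mpv --hwdec=vaapi input.mkv

# Combine with the existing low-latency encode flag when transcoding:
AMD_DEBUG=lowlatencydec,lowlatencyenc ffmpeg -hwaccel vaapi -i input.mkv out.mkv
```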
AMDGPU HDMI Gaming Features: v2 Patches Posted
A second iteration of AMDGPU kernel driver patches adding HDMI VRR (Variable Refresh Rate) and ALLM (Auto Low Latency Mode) support on Linux was posted for review:
- New module parameters: amdgpu.allm_mode= (0/1/2) and amdgpu.hdmi_vrr_desktop_mode=
- Developed via reverse engineering due to the HDMI Forum blocking open-source HDMI 2.1 support
- Too late for Linux 6.20~7.0; targeting summer 2026 kernel cycle — pending legal clearance from AMD
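If the patches land with these names, the options would presumably be set like any other amdgpu module parameter; a hypothetical modprobe fragment (the meanings of the 0/1/2 values are assumed, not documented in the coverage here):

```
# /etc/modprobe.d/amdgpu-hdmi.conf  (hypothetical; requires the v2 patches)
# allm_mode: 0/1/2 enumeration per the patch series; exact semantics assumed
options amdgpu allm_mode=1 hdmi_vrr_desktop_mode=1
```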
Libcamera 0.7: 15x GPU-Accelerated SoftISP
Libcamera 0.7 shipped with initial GPU acceleration for SoftISP, delivering up to 15x performance improvement in Debayer+CCM processing versus CPU-only, validated on Qualcomm RB5 hardware by Linaro. The GPU ISP is now set as the default for software-based pipeline scenarios.
Libgcrypt 1.12: 2x AES Performance on AMD Zen 5
Libgcrypt 1.12 adds a VAES/AVX-512 accelerated AES-OCB implementation, delivering approximately 2x performance on AMD Zen 5 processors. Also includes AVX2/AVX-512 CRC acceleration, RISC-V Vector crypto optimizations, and Dilithium (ML-DSA) post-quantum signature support.
📊 Key Takeaways
Microsoft’s Maia 200 debut is the most consequential infrastructure story of the week — with 7 TB/sec HBM3E bandwidth and a 6,144-engine cluster fabric, it signals that hyperscaler custom silicon is maturing fast enough to meaningfully compete with NVIDIA GPUs for inference at scale, and NVIDIA’s aggressive $2B CoreWeave investment reflects the strategic pressure this creates. On the consumer and developer side, NVIDIA’s native GeForce NOW Linux launch and Intel’s XeSS 3 MFG driver rollout both demonstrate that the frame generation and cloud gaming era is broadening beyond Windows-centric ecosystems, while AMD’s early GFX13/RDNA5 compiler footprint in LLVM 23 suggests the next GPU architecture war is already being staged in the toolchain. The open-source GPU ecosystem — from ROCm’s CK Tile debugging guides to Mesa’s VCN low-latency decode and AMDGPU HDMI VRR patches — continues to close gaps with proprietary stacks, making Linux an increasingly viable first-class platform for both gaming and AI workloads.