🖥️ AI & GPU Industry Weekly Recap: March 30 – April 5, 2026


🔑 Key Highlights

  • AMD’s MI355X dominates MLPerf Inference v6.0, posting record results on Llama 2 70B, gpt-oss-120b, and Wan-2.2-T2V-A14B using WMXFP4 quantization across multi-node clusters up to 94 GPUs, with nine AMD partners also submitting in the “Available” category
  • AMD’s ROCDXG goes production-ready, delivering open-source ROCm 7.2.1 compute support under Windows Subsystem for Linux (WSL2) for Radeon RX 9000/7000 series and Ryzen AI APUs — a major step toward native Windows ROCm support
  • NVIDIA’s App debuts Auto Shader Compilation (beta), automatically recompiling game shaders in the background after driver updates, previewing a broader push toward cloud-distributed Advanced Shader Delivery (ASD)
  • Google’s Gemma 4 family lands on NVIDIA hardware, with NVIDIA and Google collaborating to optimize the 2B–31B parameter multimodal models for RTX GPUs, DGX Spark, and Jetson Orin Nano edge devices
  • NVIDIA’s Nova open-source GPU driver and AMD’s next-gen AIE4 NPU both received meaningful upstream Linux kernel advances this week, reflecting accelerating open-source hardware ecosystem momentum

🤖 AI & Machine Learning

AMD MLPerf Inference v6.0 Results Published

AMD published full reproduction instructions for its MLPerf Inference v6.0 submission on the AMD Instinct MI355X platform. Key highlights:

  • Models benchmarked: Llama 2 70B, gpt-oss-120b (OpenAI’s 120B open model), and Wan-2.2-T2V-A14B (text-to-video)
  • Datatype: WMXFP4 (Weight MX FP4) for LLMs, BF16 for video generation
  • Cluster sizes: from a single node (8× MI355X GPUs) up to 87–94 GPU multi-node configurations
  • Llama 2 70B offline throughput: ~103,480 tokens/second on a single 8-GPU MI355X node
  • Nine AMD partners submitted independently in the “Available” (commercially purchasable/rentable) category
  • ROCm 7.1.0+ required; full Docker-based reproduction pipeline published via rocm/amd-mlperf container images
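For context on the datatype: WMXFP4 follows the OCP Microscaling (MX) convention, in which weights are grouped into small blocks that share one power-of-two scale and each element is stored as a 4-bit float (E2M1). The following is a minimal pure-Python sketch of that idea; block handling, rounding, and scale selection are simplified relative to the MX spec, and it is not AMD's actual kernel code:

```python
import math

# FP4 (E2M1) representable magnitudes per the OCP Microscaling spec
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def mxfp4_quantize(block):
    """Quantize one weight block to a shared power-of-two scale + FP4 values.

    Illustrative sketch only -- production kernels handle packing, saturation,
    and rounding modes that are omitted here.
    """
    amax = max(abs(v) for v in block)
    if amax == 0:
        return 1.0, [0.0] * len(block)
    # Pick a power-of-two scale so the largest element fits FP4's max (6.0)
    scale = 2.0 ** math.ceil(math.log2(amax / FP4_GRID[-1]))
    quantized = []
    for v in block:
        s = v / scale
        nearest = min(FP4_GRID, key=lambda g: abs(abs(s) - g))
        quantized.append(math.copysign(nearest, s))
    return scale, quantized

weights = [0.7, -1.3, 0.05, 2.4, -0.6, 0.0, 1.1, -2.2]  # toy 8-element block
scale, q = mxfp4_quantize(weights)
dequant = [scale * x for x in q]
max_err = max(abs(w - d) for w, d in zip(weights, dequant))
print(f"scale={scale}, max reconstruction error={max_err:.3f}")
```

The shared scale is what lets FP4's tiny dynamic range track weight magnitudes per block rather than per tensor, which is why MX formats hold up at 4 bits where plain FP4 would not.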

NVIDIA Software Lifts MLPerf Inference Performance

NVIDIA continued its narrative that its platform — encompassing CUDA, NVLink, and Dynamo (its open distributed inference framework) — is what drives benchmark leadership, not GPUs alone. Software-layer optimizations pushed NVIDIA’s MLPerf Inference v6.0 numbers to new highs, reinforcing Jensen Huang’s “more than chips” messaging from GTC 2026.

Google Gemma 4 Optimized for NVIDIA GPUs

Google and NVIDIA co-optimized the Gemma 4 model family (E2B, E4B, 26B, 31B) for local and edge deployment:

  • Supports reasoning, coding, agentic tool use, vision/video/audio, and 35+ languages
  • Runs locally via Ollama and llama.cpp on RTX PCs and DGX Spark
  • E2B/E4B target edge devices including Jetson Orin Nano; 26B/31B target RTX workstations and DGX Spark for agentic developer workflows
  • Compatible with OpenClaw local agent framework and Unsloth Studio for fine-tuning
  • NVIDIA Tensor Cores + CUDA stack deliver day-one efficiency without model-specific optimization overhead
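Local serving through Ollama means the models are reachable over Ollama's documented REST API (port 11434, /api/generate). A small sketch of querying a locally pulled Gemma 4 model that way; the model tag here is a guess, so check what the release actually registers as:

```python
import json
import urllib.error
import urllib.request

def ask_local(model, prompt, host="http://localhost:11434"):
    """Query a local Ollama server via its documented /api/generate endpoint.

    Returns the model's reply, or None if no server is running locally.
    """
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(
            {"model": model, "prompt": prompt, "stream": False}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp)["response"]
    except (urllib.error.URLError, OSError):
        return None  # Ollama not running on this machine

# "gemma4:e4b" is a hypothetical tag -- `ollama list` shows the real one
print(ask_local("gemma4:e4b", "Summarize Gemma 4 in one sentence."))
```

The same pattern works for the E2B edge variant on a Jetson Orin Nano, since Ollama exposes the identical API regardless of host hardware.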

AMD AIE4 NPU Linux Patches Posted

AMD submitted initial Linux kernel patches for the next-gen AIE4 NPU via the AMDXDNA accelerator driver:

  • Targets PCI device IDs 0x17F2 and 0x1B0B (NPU3)
  • Adds SR-IOV (Single Root I/O Virtualization) support — a notable new capability vs. current AIE2
  • Covers device initialization and basic mailbox communication
  • AMD’s proactive upstream Linux support approach aims to have drivers mainlined before retail hardware ships
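On Linux, SR-IOV virtual functions are driven through standard sysfs attributes (sriov_totalvfs, sriov_numvfs) regardless of vendor, so once the AMDXDNA driver lands, AIE4 VFs should be manageable the same way as any SR-IOV NIC or GPU. A sketch of that interface; the PCI address below is made up for illustration:

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci/devices")

def sriov_attrs(pci_addr):
    """Standard sysfs attributes for any SR-IOV-capable PCI function."""
    dev = SYSFS_PCI / pci_addr
    return dev / "sriov_totalvfs", dev / "sriov_numvfs"

def enable_vfs_cmd(pci_addr, count):
    """Shell one-liner an admin would run to spawn `count` virtual functions."""
    _, numvfs = sriov_attrs(pci_addr)
    return f"echo {count} > {numvfs}"

# Address is hypothetical; a real system would reveal it via `lspci -d 1022:`
print(enable_vfs_cmd("0000:c5:00.1", 4))
```

Writing 0 to sriov_numvfs tears the VFs back down, which is why virtualization stacks treat this attribute as the single control point for NPU partitioning.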

⚡ GPU & Hardware

AMD ROCDXG: Production-Ready ROCm Under WSL2

AMD’s ROCDXG (librocdxg) library — enabling ROCm 7.2.1 on Windows 11 WSL2 — reached production status:

  • Open-source under MIT license (one binary blob thunk remains)
  • Officially supports Radeon RX 9000 and RX 7000 series, plus Ryzen AI 300 “Strix Point” and Ryzen AI Max “Strix Halo” APUs
  • Independently versioned from ROCm releases and Windows display drivers — more flexible than legacy roc4wsl
  • Pairs with Adrenalin 26.2.2 Windows 11 driver for full AI and HPC workload support under WSL
  • Roadmap targets full native Windows ROCm support
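Once ROCm is live inside WSL2, a common smoke test is checking whether a ROCm build of PyTorch sees the GPU: ROCm wheels carry a HIP version in torch.version.hip (it is None on CUDA or CPU-only builds) and reuse the torch.cuda namespace. A quick check that degrades gracefully when PyTorch is absent:

```python
def rocm_status():
    """Report whether a ROCm-built PyTorch can see a GPU (safe anywhere)."""
    try:
        import torch
    except ImportError:
        return "pytorch-not-installed"
    # ROCm wheels set torch.version.hip; CUDA/CPU wheels leave it None
    if getattr(torch.version, "hip", None) is None:
        return "non-rocm-build"
    # ROCm reuses the torch.cuda namespace for device queries
    if torch.cuda.is_available():
        return f"rocm-ok: {torch.cuda.get_device_name(0)}"
    return "rocm-build-but-no-visible-gpu"

print(rocm_status())
```

On a working ROCDXG setup this should report the Radeon or Ryzen AI device by name; any other result points at the wheel, the driver pairing, or the WSL kernel.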

AMD Ryzen AI Max “Strix Halo” Shows Major Linux GPU Gains

Phoronix benchmarks of the Framework Desktop (Ryzen AI Max+ 395, 64GB LPDDR5-8000, Radeon 8060S iGPU) demonstrated significant Vulkan and OpenGL performance improvements when upgrading from Ubuntu 25.04 (Linux 6.14, Mesa 25.0) to Ubuntu 26.04 (Linux 7.0, Mesa 26.0):

  • RADV Vulkan driver and RadeonSI Gallium3D both showed meaningful generational uplift
  • Highlights the compounding benefits of upstream driver/kernel work on AMD integrated graphics

NVIDIA Nova Driver Advances in Linux 7.1

The NVIDIA Nova Core driver — the open-source successor to Nouveau, written in Rust — received its Linux 7.1 pull request:

  • Expanded NVIDIA Turing GPU support
  • Hardened GPU System Processor (GSP) command queue
  • Support for large RPCs, refactored Falcon firmware handling, and a DebugFS buffer for GSP-RM logs
  • Still not end-user ready, but advancing steadily in upstream Linux

HarfBuzz 14.0 Introduces GPU-Accelerated Text Rendering

The widely-used HarfBuzz text shaping engine released version 14.0 with the new libharfbuzz-gpu library:

  • GPU-based text rasterization using the Slug algorithm — decoding/rasterizing directly in the fragment shader
  • Shader support: GLSL, WGSL, Metal MSL, HLSL
  • New hb-gpu utility and interactive WebGPU/WebGL web demo included
  • Impacts GNOME, KDE, Chromium, LibreOffice, Flutter, Godot, and Java rendering pipelines

NVIDIA App Beta: Auto Shader Compilation

NVIDIA’s updated App introduced Auto Shader Compilation (ASC) in beta:

  • Background recompilation triggered after every GPU driver update
  • Configurable cache size (e.g., 100 GB ≈ 20 modern AAA titles) and system utilization tiers (low/medium/high)
  • Works only after initial per-game shader compilation is complete
  • Precursor to Advanced Shader Delivery (ASD) — Microsoft’s cloud-distributed precompiled shader framework, already adopted by Intel via “Precompiled Shader Distribution”

🏭 Industry & Market

Q1 2026 Linux Ecosystem Recap

Phoronix’s Q1 2026 retrospective highlighted the quarter’s dominant themes:

  • Intel Core Ultra Series 3 “Panther Lake” (Core Ultra X7 358H, Arc B390 / Xe3 graphics on Intel 18A process) was the most-benchmarked new platform, showing strong power efficiency gains — up to 95× faster than Penryn-era laptops
  • AMD Ryzen 7 9850X3D ($499) generated strong Linux gaming interest; DDR5-4800 proved sufficient for gaming due to 2nd Gen 3D V-Cache architecture
  • NVIDIA GB10 Blackwell (Dell Pro Max GB10) featured prominently in AI inference benchmarks, competing against Ryzen AI Max+ 395 “Strix Halo” in CPU-focused workloads
  • AI/LLM code contribution debates — including Linus Torvalds’ commentary on “vibe coding” and his own AudioNoise project built with AI assistance — dominated Linux community discourse

Canonical’s Ubuntu 26.04 ROCm Integration Still Pending

Ubuntu 26.04 LTS (due April 23) is racing the clock on AMD ROCm integration:

  • Canonical’s promised one-command installation (apt install rocm) remains undelivered at press time
  • Available archive packages still at ROCm 7.1 (vs. upstream ROCm 7.2.1)
  • A Canonical engineer (Talha Can Havadar) just received package upload rights — timeline uncertain
  • Current recommendation: use upstream AMD ROCm packages directly rather than Ubuntu archive versions

Intel Cache-Aware Scheduling v4 for Xeon and EPYC

Intel posted the fourth revision of its Cache-Aware Scheduling patches for the Linux kernel:

  • Targets modern Intel Xeon (Granite Rapids/Xeon 6) and AMD EPYC Turin processors with complex LLC domain topologies
  • v4 adds CPU scanning depth limits under NUMA balancing, improved LLC ID management, and low-load imbalance tuning
  • Prior testing showed significant server workload performance gains on both platforms
  • Not yet mainlined; community watching for Linux 7.x inclusion

🛠️ Developer Ecosystem

Rust Graphics Driver Momentum Builds for Linux 7.1

The Linux 7.1 DRM Rust pull request landed a broad set of infrastructure improvements:

  • Reworked DMA coherent API, GPU buddy allocator abstractions, DRM shared memory GEM helper abstraction
  • Benefits the NVIDIA Nova Core driver (Turing support, GSP hardening) and the Arm Mali Tyr driver
  • Reflects the formalization of Rust as a permanent part of the Linux kernel (Rust experiment officially concluded in Linux 7.0)

AMD ROCm Blogs: MLPerf v6.0 Reproduction Guide Published

AMD’s ROCm technical blog published a detailed step-by-step reproduction guide for MLPerf Inference v6.0 submissions:

  • Docker-based workflows: rocm/amd-mlperf:mi355x_llama2_70b_inference_6.0 and model-specific containers
  • WMXFP4 quantized model checkpoints available via Hugging Face (amd/Llama-2-70b-chat-hf-WMXFP4-...)
  • Covers offline, server, and interactive scenarios with accuracy validation scripts
  • Enables third-party customers and partners to independently verify AMD’s published numbers
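The container workflow typically follows AMD's usual ROCm-in-Docker pattern: pass through /dev/kfd (the compute interface) and /dev/dri (render nodes) to the container. A sketch of composing such an invocation using the container tag from the guide; the flags are illustrative of the common pattern, not the guide's exact command line:

```python
def rocm_docker_cmd(image):
    """Compose a typical ROCm container invocation (flags illustrative only)."""
    flags = [
        "--rm", "-it",
        "--device=/dev/kfd",   # ROCm/KFD compute interface
        "--device=/dev/dri",   # GPU render nodes
        "--group-add", "video",
        "--ipc=host",          # shared memory for multi-process inference
    ]
    return " ".join(["docker", "run", *flags, image])

print(rocm_docker_cmd("rocm/amd-mlperf:mi355x_llama2_70b_inference_6.0"))
```

Keeping the entire toolchain inside a pinned container image is what makes third-party reproduction of the submitted numbers tractable: the ROCm version, quantized checkpoints, and harness all travel together.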

Ubuntu 26.04 Ships Linux 7.0 + Mesa 26.0

Ubuntu 26.04 LTS (releasing April 23) ships a notably modern stack:

  • Linux 7.0 kernel (final stable release still expected in mid-April)
  • Mesa 26.0 graphics drivers with new OpenGL/Vulkan capabilities
  • GNOME 50, Python 3.14, OpenJDK 25, GCC 15.2
  • NVIDIA R590 series Linux driver available in both 25.10 and 26.04
  • Benchmarks on AMD Ryzen 9 9950X + RTX 5080 show meaningful gains vs. Ubuntu 25.10 (Linux 6.17)

NVIDIA AI-Assisted Driver Development Disclosed

In a notable industry first, NVIDIA publicly disclosed that development of its preview DRM Color Pipeline API Linux driver was substantially AI-assisted:

“Nearly all of the code was produced by [Claude Sonnet/Opus], but with a strong emphasis on explicit human direction, review, and iteration.”

  • The R595-derived preview driver enables Wayland compositors to leverage GPU hardware for HDR color processing
  • Signals growing industry normalization of LLM-assisted systems software development

📊 Key Takeaways

AMD had an exceptionally strong week across both the software and silicon fronts: the MI355X’s MLPerf Inference v6.0 results — including multi-node WMXFP4 inference at scale — demonstrate genuine datacenter competitiveness, while ROCDXG going production-ready and the AIE4 NPU patches signal a maturing, more accessible ROCm ecosystem that now spans Windows WSL2, Linux, and next-gen NPU silicon. NVIDIA, meanwhile, showed that software remains its most potent weapon — from Auto Shader Compilation improving the PC gaming experience to Dynamo and CUDA-layer optimizations driving MLPerf leadership, and even publicly normalizing AI-assisted driver development with Claude.

The broader Linux/open-source GPU ecosystem is at an inflection point: Rust-based drivers (Nova for NVIDIA, Tyr for Mali), Mesa 26.0, the Linux 7.0 kernel, and Ubuntu 26.04’s imminent release are converging to deliver a materially better open-source GPU compute and graphics experience — a rising tide that benefits AMD’s ROCm ambitions, NVIDIA’s Wayland HDR story, and Intel’s Arc/Xe3 momentum simultaneously.