🖥️ AI & GPU Industry Weekly Recap: March 30 – April 5, 2026


🔑 Key Highlights

  • AMD’s MI355X dominates MLPerf Inference v6.0, posting record results on Llama 2 70B, gpt-oss-120b, and Wan-2.2-T2V-A14B using WMXFP4 quantization across multi-node clusters up to 94 GPUs, with nine AMD partners also submitting in the “Available” category
  • AMD’s ROCDXG goes production-ready, delivering open-source ROCm 7.2.1 compute support under Windows Subsystem for Linux (WSL2) for Radeon RX 9000/7000 series and Ryzen AI APUs — a major step toward native Windows ROCm support
  • NVIDIA’s App debuts Auto Shader Compilation (beta), automatically recompiling game shaders in the background after driver updates, previewing a broader push toward cloud-distributed Advanced Shader Delivery (ASD)
  • Google’s Gemma 4 family lands on NVIDIA hardware, with NVIDIA and Google collaborating to optimize the 2B–31B parameter multimodal models for RTX GPUs, DGX Spark, and Jetson Orin Nano edge devices
  • NVIDIA’s Nova open-source GPU driver and AMD’s next-gen AIE4 NPU both received meaningful upstream Linux kernel advances this week, reflecting accelerating open-source hardware ecosystem momentum

🤖 AI & Machine Learning

AMD MLPerf Inference v6.0 Results Published

AMD published full reproduction instructions for its MLPerf Inference v6.0 submission on the AMD Instinct MI355X platform. Key highlights:

  • Models benchmarked: Llama 2 70B, gpt-oss-120b (OpenAI’s 120B open model), and Wan-2.2-T2V-A14B (text-to-video)
  • Datatype: WMXFP4 (Weight MX FP4) for LLMs, BF16 for video generation
  • Cluster sizes: from a single node (8× MI355X GPUs) up to 87–94 GPU multi-node configurations
  • Llama 2 70B offline throughput: ~103,480 tokens/second on a single 8-GPU MI355X node
  • Nine AMD partners submitted independently in the “Available” (commercially purchasable/rentable) category
  • ROCm 7.1.0+ required; full Docker-based reproduction pipeline published via rocm/amd-mlperf container images
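For context on the datatype: WMXFP4 follows the OCP Microscaling (MX) convention, in which weights are grouped into small blocks that share one power-of-two scale and each element is stored as a 4-bit float (E2M1). The following is a minimal pure-Python sketch of that idea; block handling, rounding, and scale selection are simplified relative to the MX spec, and it is not AMD's actual kernel code:

```python
import math

# FP4 (E2M1) representable magnitudes per the OCP Microscaling spec
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def mxfp4_quantize(block):
    """Quantize one weight block to a shared power-of-two scale + FP4 values.

    Illustrative sketch only -- production kernels handle packing, saturation,
    and rounding modes that are omitted here.
    """
    amax = max(abs(v) for v in block)
    if amax == 0:
        return 1.0, [0.0] * len(block)
    # Pick a power-of-two scale so the largest element fits FP4's max (6.0)
    scale = 2.0 ** math.ceil(math.log2(amax / FP4_GRID[-1]))
    quantized = []
    for v in block:
        s = v / scale
        nearest = min(FP4_GRID, key=lambda g: abs(abs(s) - g))
        quantized.append(math.copysign(nearest, s))
    return scale, quantized

weights = [0.7, -1.3, 0.05, 2.4, -0.6, 0.0, 1.1, -2.2]  # toy 8-element block
scale, q = mxfp4_quantize(weights)
dequant = [scale * x for x in q]
max_err = max(abs(w - d) for w, d in zip(weights, dequant))
print(f"scale={scale}, max reconstruction error={max_err:.3f}")
```

The shared scale is what lets FP4's tiny dynamic range track weight magnitudes per block rather than per tensor, which is why MX formats hold up at 4 bits where plain FP4 would not.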

NVIDIA Software Lifts MLPerf Inference Performance

NVIDIA continued its narrative that its platform — encompassing CUDA, NVLink, and Dynamo (its open distributed inference framework) — is what drives benchmark leadership, not GPUs alone. Software-layer optimizations pushed NVIDIA’s MLPerf Inference v6.0 numbers to new highs, reinforcing Jensen Huang’s “more than chips” messaging from GTC 2026.

Google Gemma 4 Optimized for NVIDIA GPUs

Google and NVIDIA co-optimized the Gemma 4 model family (E2B, E4B, 26B, 31B) for local and edge deployment:

  • Supports reasoning, coding, agentic tool use, vision/video/audio, and 35+ languages
  • Runs locally via Ollama and llama.cpp on RTX PCs and DGX Spark
  • E2B/E4B target edge devices including Jetson Orin Nano; 26B/31B target RTX workstations and DGX Spark for agentic developer workflows
  • Compatible with OpenClaw local agent framework and Unsloth Studio for fine-tuning
  • NVIDIA Tensor Cores + CUDA stack deliver day-one efficiency without model-specific optimization overhead
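Local serving through Ollama means the models are reachable over Ollama's documented REST API (port 11434, /api/generate). A small sketch of querying a locally pulled Gemma 4 model that way; the model tag here is a guess, so check what the release actually registers as:

```python
import json
import urllib.error
import urllib.request

def ask_local(model, prompt, host="http://localhost:11434"):
    """Query a local Ollama server via its documented /api/generate endpoint.

    Returns the model's reply, or None if no server is running locally.
    """
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(
            {"model": model, "prompt": prompt, "stream": False}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp)["response"]
    except (urllib.error.URLError, OSError):
        return None  # Ollama not running on this machine

# "gemma4:e4b" is a hypothetical tag -- `ollama list` shows the real one
print(ask_local("gemma4:e4b", "Summarize Gemma 4 in one sentence."))
```

The same pattern works for the E2B edge variant on a Jetson Orin Nano, since Ollama exposes the identical API regardless of host hardware.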

AMD AIE4 NPU Linux Patches Posted

AMD submitted initial Linux kernel patches for the next-gen AIE4 NPU via the AMDXDNA accelerator driver:

  • Targets PCI device IDs 0x17F2 and 0x1B0B (NPU3)
  • Adds SR-IOV (Single Root I/O Virtualization) support — a notable new capability vs. current AIE2
  • Covers device initialization and basic mailbox communication
  • AMD’s proactive upstream Linux support approach aims to have drivers mainlined before retail hardware ships
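On Linux, SR-IOV virtual functions are driven through standard sysfs attributes (sriov_totalvfs, sriov_numvfs) regardless of vendor, so once the AMDXDNA driver lands, AIE4 VFs should be manageable the same way as any SR-IOV NIC or GPU. A sketch of that interface; the PCI address below is made up for illustration:

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci/devices")

def sriov_attrs(pci_addr):
    """Standard sysfs attributes for any SR-IOV-capable PCI function."""
    dev = SYSFS_PCI / pci_addr
    return dev / "sriov_totalvfs", dev / "sriov_numvfs"

def enable_vfs_cmd(pci_addr, count):
    """Shell one-liner an admin would run to spawn `count` virtual functions."""
    _, numvfs = sriov_attrs(pci_addr)
    return f"echo {count} > {numvfs}"

# Address is hypothetical; a real system would reveal it via `lspci -d 1022:`
print(enable_vfs_cmd("0000:c5:00.1", 4))
```

Writing 0 to sriov_numvfs tears the VFs back down, which is why virtualization stacks treat this attribute as the single control point for NPU partitioning.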

⚡ GPU & Hardware

AMD ROCDXG: Production-Ready ROCm Under WSL2

AMD’s ROCDXG (librocdxg) library — enabling ROCm 7.2.1 on Windows 11 WSL2 — reached production status:

  • Open-source under MIT license (one binary blob thunk remains)
  • Officially supports Radeon RX 9000 and RX 7000 series, plus Ryzen AI 300 “Strix Point” and Ryzen AI Max “Strix Halo” APUs
  • Independently versioned from ROCm releases and Windows display drivers — more flexible than legacy roc4wsl
  • Pairs with Adrenalin 26.2.2 Windows 11 driver for full AI and HPC workload support under WSL
  • Roadmap targets full native Windows ROCm support
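Once ROCm is live inside WSL2, a common smoke test is checking whether a ROCm build of PyTorch sees the GPU: ROCm wheels carry a HIP version in torch.version.hip (it is None on CUDA or CPU-only builds) and reuse the torch.cuda namespace. A quick check that degrades gracefully when PyTorch is absent:

```python
def rocm_status():
    """Report whether a ROCm-built PyTorch can see a GPU (safe anywhere)."""
    try:
        import torch
    except ImportError:
        return "pytorch-not-installed"
    # ROCm wheels set torch.version.hip; CUDA/CPU wheels leave it None
    if getattr(torch.version, "hip", None) is None:
        return "non-rocm-build"
    # ROCm reuses the torch.cuda namespace for device queries
    if torch.cuda.is_available():
        return f"rocm-ok: {torch.cuda.get_device_name(0)}"
    return "rocm-build-but-no-visible-gpu"

print(rocm_status())
```

On a working ROCDXG setup this should report the Radeon or Ryzen AI device by name; any other result points at the wheel, the driver pairing, or the WSL kernel.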

AMD Ryzen AI Max “Strix Halo” Shows Major Linux GPU Gains

Phoronix benchmarks of the Framework Desktop (Ryzen AI Max+ 395, 64GB LPDDR5-8000, Radeon 8060S iGPU) demonstrated significant Vulkan and OpenGL performance improvements when upgrading from Ubuntu 25.04 (Linux 6.14, Mesa 25.0) to Ubuntu 26.04 (Linux 7.0, Mesa 26.0):

  • RADV Vulkan driver and RadeonSI Gallium3D both showed meaningful generational uplift
  • Highlights the compounding benefits of upstream driver/kernel work on AMD integrated graphics

NVIDIA Nova Driver Advances in Linux 7.1

The NVIDIA Nova Core driver — the open-source successor to Nouveau, written in Rust — received its Linux 7.1 pull request:

  • Expanded NVIDIA Turing GPU support
  • Hardened GPU System Processor (GSP) command queue
  • Support for large RPCs, refactored Falcon firmware handling, and a DebugFS buffer for GSP-RM logs
  • Still not end-user ready, but advancing steadily in upstream Linux

HarfBuzz 14.0 Introduces GPU-Accelerated Text Rendering

The widely-used HarfBuzz text shaping engine released version 14.0 with the new libharfbuzz-gpu library:

  • GPU-based text rasterization using the Slug algorithm — decoding/rasterizing directly in the fragment shader
  • Shader support: GLSL, WGSL, Metal MSL, HLSL
  • New hb-gpu utility and interactive WebGPU/WebGL web demo included
  • Impacts GNOME, KDE, Chromium, LibreOffice, Flutter, Godot, and Java rendering pipelines

NVIDIA App Beta: Auto Shader Compilation

NVIDIA’s updated App introduced Auto Shader Compilation (ASC) in beta:

  • Background recompilation triggered after every GPU driver update
  • Configurable cache size (e.g., 100 GB ≈ 20 modern AAA titles) and system utilization tiers (low/medium/high)
  • Works only after initial per-game shader compilation is complete
  • Precursor to Advanced Shader Delivery (ASD) — Microsoft’s cloud-distributed precompiled shader framework, already adopted by Intel via “Precompiled Shader Distribution”

🏭 Industry & Market

Q1 2026 Linux Ecosystem Recap

Phoronix’s Q1 2026 retrospective highlighted the quarter’s dominant themes:

  • Intel Core Ultra Series 3 “Panther Lake” (Core Ultra X7 358H, Arc B390 / Xe3 graphics on Intel 18A process) was the most-benchmarked new platform, showing strong power efficiency gains — up to 95× faster than Penryn-era laptops
  • AMD Ryzen 7 9850X3D ($499) generated strong Linux gaming interest; DDR5-4800 proved sufficient for gaming due to 2nd Gen 3D V-Cache architecture
  • NVIDIA GB10 Blackwell (Dell Pro Max GB10) featured prominently in AI inference benchmarks, competing against Ryzen AI Max+ 395 “Strix Halo” in CPU-focused workloads
  • AI/LLM code contribution debates — including Linus Torvalds’ commentary on “vibe coding” and his own AudioNoise project built with AI assistance — dominated Linux community discourse

Canonical’s Ubuntu 26.04 ROCm Integration Still Pending

Ubuntu 26.04 LTS (due April 23) is racing the clock on AMD ROCm integration:

  • Canonical’s promised one-command installation (apt install rocm) remains undelivered at press time
  • Available archive packages still at ROCm 7.1 (vs. upstream ROCm 7.2.1)
  • A Canonical engineer (Talha Can Havadar) just received package upload rights — timeline uncertain
  • Current recommendation: use upstream AMD ROCm packages directly rather than Ubuntu archive versions

Intel Cache-Aware Scheduling v4 for Xeon and EPYC

Intel posted the fourth revision of its Cache-Aware Scheduling patches for the Linux kernel:

  • Targets modern Intel Xeon (Granite Rapids/Xeon 6) and AMD EPYC Turin processors with complex LLC domain topologies
  • v4 adds CPU scanning depth limits under NUMA balancing, improved LLC ID management, and low-load imbalance tuning
  • Prior testing showed significant server workload performance gains on both platforms
  • Not yet mainlined; community watching for Linux 7.x inclusion

🛠️ Developer Ecosystem

Rust Graphics Driver Momentum Builds for Linux 7.1

The Linux 7.1 DRM Rust pull request landed a broad set of infrastructure improvements:

  • Reworked DMA coherent API, GPU buddy allocator abstractions, DRM shared memory GEM helper abstraction
  • Benefits the NVIDIA Nova Core driver (Turing support, GSP hardening) and the Arm Mali Tyr driver
  • Reflects the formalization of Rust as a permanent part of the Linux kernel (Rust experiment officially concluded in Linux 7.0)

AMD ROCm Blogs: MLPerf v6.0 Reproduction Guide Published

AMD’s ROCm technical blog published a detailed step-by-step reproduction guide for MLPerf Inference v6.0 submissions:

  • Docker-based workflows: rocm/amd-mlperf:mi355x_llama2_70b_inference_6.0 and model-specific containers
  • WMXFP4 quantized model checkpoints available via Hugging Face (amd/Llama-2-70b-chat-hf-WMXFP4-...)
  • Covers offline, server, and interactive scenarios with accuracy validation scripts
  • Enables third-party customers and partners to independently verify AMD’s published numbers
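The container workflow typically follows AMD's usual ROCm-in-Docker pattern: pass through /dev/kfd (the compute interface) and /dev/dri (render nodes) to the container. A sketch of composing such an invocation using the container tag from the guide; the flags are illustrative of the common pattern, not the guide's exact command line:

```python
def rocm_docker_cmd(image):
    """Compose a typical ROCm container invocation (flags illustrative only)."""
    flags = [
        "--rm", "-it",
        "--device=/dev/kfd",   # ROCm/KFD compute interface
        "--device=/dev/dri",   # GPU render nodes
        "--group-add", "video",
        "--ipc=host",          # shared memory for multi-process inference
    ]
    return " ".join(["docker", "run", *flags, image])

print(rocm_docker_cmd("rocm/amd-mlperf:mi355x_llama2_70b_inference_6.0"))
```

Keeping the entire toolchain inside a pinned container image is what makes third-party reproduction of the submitted numbers tractable: the ROCm version, quantized checkpoints, and harness all travel together.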

Ubuntu 26.04 Ships Linux 7.0 + Mesa 26.0

Ubuntu 26.04 LTS (releasing April 23) ships a notably modern stack:

  • Linux 7.0 kernel (final stable release still expected in mid-April)
  • Mesa 26.0 graphics drivers with new OpenGL/Vulkan capabilities
  • GNOME 50, Python 3.14, OpenJDK 25, GCC 15.2
  • NVIDIA R590 series Linux driver available in both 25.10 and 26.04
  • Benchmarks on AMD Ryzen 9 9950X + RTX 5080 show meaningful gains vs. Ubuntu 25.10 (Linux 6.17)

NVIDIA AI-Assisted Driver Development Disclosed

In a notable industry first, NVIDIA publicly disclosed that development of its preview DRM Color Pipeline API Linux driver was substantially AI-assisted:

“Nearly all of the code was produced by [Claude Sonnet/Opus], but with a strong emphasis on explicit human direction, review, and iteration.”

  • The R595-derived preview driver enables Wayland compositors to leverage GPU hardware for HDR color processing
  • Signals growing industry normalization of LLM-assisted systems software development

📊 Key Takeaways

AMD had an exceptionally strong week across both the software and silicon fronts: the MI355X’s MLPerf Inference v6.0 results — including multi-node WMXFP4 inference at scale — demonstrate genuine datacenter competitiveness, while ROCDXG going production-ready and the AIE4 NPU patches signal a maturing, more accessible ROCm ecosystem that now spans Windows WSL2, Linux, and next-gen NPU silicon. NVIDIA, meanwhile, showed that software remains its most potent weapon — from Auto Shader Compilation improving the PC gaming experience to Dynamo and CUDA-layer optimizations driving MLPerf leadership, and even publicly normalizing AI-assisted driver development with Claude.

The broader Linux/open-source GPU ecosystem is at an inflection point: Rust-based drivers (Nova for NVIDIA, Tyr for Mali), Mesa 26.0, the Linux 7.0 kernel, and Ubuntu 26.04’s imminent release are converging to deliver a materially better open-source GPU compute and graphics experience — a rising tide that benefits AMD’s ROCm ambitions, NVIDIA’s Wayland HDR story, and Intel’s Arc/Xe3 momentum simultaneously.