Technical Intelligence Report: 2026-02-16

Executive Summary

  • Linux Kernel 7.0 & AMD Zen 5: Mainline Linux 7.0 has merged critical CXL address translation support for AMD Zen 5 (EPYC 9005) systems, resolving “Normalized Address” handling via ACPI PRM.
  • Intel Compute Stack Competition: Benchmarks of Intel’s “Panther Lake” (Arc B390) on Linux 6.19 reveal strong “out-of-the-box” OpenCL performance using the open-source Intel Compute Runtime, positioning it as a viable competitor to AMD Strix Point/ROCm 7.2 configurations.
  • NVIDIA Infrastructure Expansion: NVIDIA released performance claims for the GB300 NVL72 (Blackwell Ultra), targeting “Agentic AI” with 50x efficiency gains over Hopper and 1.5x lower costs for long-context workloads compared to the standard GB200.

🤖 ROCm Updates & Software

[2026-02-16] Linux 7.0 CXL Enables AMD Zen 5 Address Translation Feature

Source: Phoronix

Key takeaway relevant to AMD:

  • Ensures stable and correct memory addressing on the latest Linux kernels for AMD EPYC 9005 (Zen 5) servers that use CXL devices.
  • Foundational work that prepares the Linux ecosystem for upcoming Zen 6 platforms.

Summary:

  • Linux kernel 7.0 has successfully merged ACPI PRMT-based address translation for the Compute Express Link (CXL) subsystem after ten rounds of code review.
  • This update specifically addresses how AMD Zen 5 platforms handle physical address translation.

Details:

  • Technical Challenge: Zen 5 systems can be configured to use “Normalized addresses,” where Host Physical Addresses (HPA) differ from System Physical Addresses (SPA). In this mode, CXL endpoints are programmed in passthrough (DPA == HPA) with interleaving disabled.
  • The Solution: The kernel now uses the ACPI Platform Runtime Mechanism (PRM) handler to translate the Device Physical Address (DPA) to the SPA (a conceptual sketch follows this list).
  • Implementation:
    • Introduces a new file: core/atl.c (handling ACPI PRM-specific translation).
    • While the naming mimics the AMD Address Translation Library (CONFIG_AMD_ATL), this kernel implementation is vendor-agnostic and is enabled through its own Kbuild/Kconfig options.
  • Hardware Scope: Debuted with the AMD EPYC 9005 series; the same addressing mode is expected to carry over to Zen 6.
  • Additional Changes: The merge also includes CXL port error protocol handling/reporting and documentation updates for ACPI PRM CXL Address Translation.
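
To ground the addressing flow described above, here is a minimal, self-contained C sketch of the “Normalized address” path. Every name in it (prm_translate_hpa_to_spa, cxl_normalized_dpa_to_spa) and the fixed-offset firmware stub are hypothetical illustrations, not the merged Linux 7.0 code; the real implementation performs the HPA-to-SPA step by invoking the platform’s ACPI PRM handler from the new core/atl.c.

```c
/*
 * Illustrative sketch only -- NOT the Linux 7.0 kernel code.
 * Shows the "Normalized address" flow: endpoints are programmed
 * pass-through (DPA == HPA) with interleaving disabled, and firmware
 * (the ACPI PRM handler) owns the final HPA -> SPA translation.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t dpa_t; /* Device Physical Address (CXL endpoint view)   */
typedef uint64_t hpa_t; /* Host Physical Address (normalized address)    */
typedef uint64_t spa_t; /* System Physical Address (what the OS expects) */

/*
 * Stand-in for the firmware-owned ACPI PRM translation handler. A real
 * platform applies its own opaque math here; the fixed offset below is
 * purely a placeholder for the example.
 */
static spa_t prm_translate_hpa_to_spa(hpa_t hpa)
{
    const spa_t placeholder_offset = 0x2000000000ULL; /* made-up value */
    return hpa + placeholder_offset;
}

/*
 * With pass-through programming (DPA == HPA) and interleaving disabled,
 * the only remaining step is the firmware-assisted HPA -> SPA translation.
 */
static spa_t cxl_normalized_dpa_to_spa(dpa_t dpa)
{
    hpa_t hpa = (hpa_t)dpa;               /* pass-through: DPA == HPA  */
    return prm_translate_hpa_to_spa(hpa); /* PRM handler: HPA -> SPA   */
}

int main(void)
{
    dpa_t dpa = 0x1000;
    printf("DPA 0x%llx -> SPA 0x%llx\n",
           (unsigned long long)dpa,
           (unsigned long long)cxl_normalized_dpa_to_spa(dpa));
    return 0;
}
```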

🤼‍♂️ Market & Competitors

[2026-02-16] Arc B390 Graphics With Panther Lake Performing Great On Open-Source Intel Compute Runtime

Source: Phoronix

Key takeaway relevant to AMD:

  • Intel’s open-source compute stack on Linux is maturing rapidly: the Arc B390 (Xe3) competes strongly with the AMD Ryzen AI 9 HX 370 (Strix Point) in OpenCL workloads.
  • ROCm 7.2 served as the AMD compute stack in this comparison, making it the current reference point against Intel’s latest Compute Runtime.

Summary:

  • Phoronix benchmarked the Intel Core Ultra X7 358H “Panther Lake” with Arc B390 Xe3 graphics using the open-source Intel Compute Runtime on Linux.
  • Performance was compared against prior Intel generations and the AMD Ryzen AI 9 HX 370.

Details:

  • Test Environment:
    • OS: Linux 6.19 (Ubuntu 26.04 development build).
    • Intel Stack: Compute Runtime 26.05.37020.3, Intel Graphics Compiler 2.28.4.
    • AMD Stack: ROCm 7.2 (used on Ryzen AI 9 HX 370 / ASUS Zenbook S 14).
  • Hardware Comparison:
    • Intel: Core Ultra X7 358H (Panther Lake/Xe3), Core Ultra 7 258V (Lunar Lake), various older gens (Meteor/Alder/Tiger Lake).
    • AMD: Ryzen AI 9 HX 370 (Strix Point).
    • Note: No “Strix Halo” (Ryzen AI Max+) samples were available for this test cycle.
  • Observations:
    • Intel Arc B390 worked “out of the box” thanks to production-level support in the Compute Runtime (a minimal device-enumeration sketch follows this list).
    • Benchmarks focused on OpenCL and GPU compute performance.
    • Testing included SoC power consumption monitoring to evaluate efficiency alongside raw throughput.
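
As a quick way to sanity-check the “out-of-the-box” behavior noted above, the short C program below uses only standard OpenCL API calls (clGetPlatformIDs, clGetDeviceIDs, clGetDeviceInfo) to list the platforms and GPU devices that the installed runtimes expose, which is how one would confirm a GPU such as the Arc B390 is visible through the open-source Intel Compute Runtime. It is a generic probe, not part of the Phoronix test suite, and the file name in the build command is an assumption.

```c
/* Generic OpenCL device probe. Build with: gcc probe.c -lOpenCL */
#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;

    /* Enumerate every OpenCL platform the installed ICDs register. */
    if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS ||
        num_platforms == 0) {
        fprintf(stderr, "No OpenCL platforms found\n");
        return 1;
    }

    for (cl_uint p = 0; p < num_platforms; p++) {
        char pname[256] = {0};
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                          sizeof(pname), pname, NULL);
        printf("Platform %u: %s\n", p, pname);

        /* List the GPU devices behind this platform. */
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                           8, devices, &num_devices) != CL_SUCCESS)
            continue;

        for (cl_uint d = 0; d < num_devices; d++) {
            char dname[256] = {0};
            cl_uint cus = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(dname), dname, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS,
                            sizeof(cus), &cus, NULL);
            printf("  GPU %u: %s (%u compute units)\n", d, dname, cus);
        }
    }
    return 0;
}
```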

[2026-02-16] New SemiAnalysis InferenceX Data Shows NVIDIA Blackwell Ultra Delivers up to 50x Better Performance and 35x Lower Costs for Agentic AI

Source: NVIDIA Blog

Key takeaway relevant to AMD:

  • NVIDIA is raising the bar for “Agentic AI” and long-context inference, directly challenging the memory-capacity advantages often touted by AMD’s MI300/MI325 series.
  • The software optimization stack (TensorRT-LLM, Mooncake) is cited as a major driver of these gains, emphasizing the need for continued ROCm software optimization.

Summary:

  • NVIDIA released data regarding the Blackwell Ultra (GB300 NVL72) platform, highlighting massive efficiency gains over the Hopper platform and cost reductions compared to the standard GB200.
  • The focus is on “Agentic AI” workloads, which require low latency and the processing of massive contexts such as entire codebases.

Details:

  • GB300 NVL72 vs. Hopper:
    • Throughput: Up to 50x higher throughput per megawatt.
    • Cost: 35x lower cost per token for low-latency workloads (see the cost-per-token arithmetic sketch after this list).
  • GB300 NVL72 (Blackwell Ultra) vs. GB200 NVL72:
    • Long-Context: 1.5x lower cost per token for workloads with 128,000-token inputs and 8,000-token outputs.
    • Compute Specs: 1.5x higher NVFP4 compute performance; 2x faster attention processing.
  • Software Stack Optimizations:
    • Utilizes TensorRT-LLM, NVIDIA Dynamo, Mooncake, and SGLang.
    • TensorRT-LLM alone delivered 5x better performance on GB200 for low-latency workloads compared to 4 months prior.
    • Uses “programmatic dependent launch” to minimize GPU idle time between dependent kernels.
  • Future Roadmap (Rubin Platform):
    • NVIDIA teased the “Vera Rubin NVL72.”
    • Claims 10x higher throughput per megawatt compared to Blackwell for MoE inference.
    • Can train large MoE models with one quarter the number of GPUs required on Blackwell.
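
As a purely illustrative piece of arithmetic for the metrics above, the sketch below shows how “throughput per megawatt” feeds into the energy component of “cost per token.” Every input value (rack power, token throughput, energy price) is a placeholder chosen for the example, not a figure from the NVIDIA or SemiAnalysis data; the point is only that, at fixed power draw and energy price, a 50x tokens-per-megawatt gain divides this energy cost per token by the same factor.

```c
/*
 * Placeholder arithmetic only -- no value below is a measured or
 * published figure. Relates tokens-per-megawatt to the energy share
 * of cost per token.
 */
#include <stdio.h>

int main(void)
{
    /* Hypothetical assumptions for illustration. */
    double rack_power_mw       = 0.12;  /* rack power draw in MW     */
    double tokens_per_sec      = 5.0e5; /* sustained output tokens/s */
    double energy_cost_per_mwh = 80.0;  /* USD per MWh               */

    double tokens_per_mw  = tokens_per_sec / rack_power_mw;   /* tokens/s per MW */
    double tokens_per_mwh = tokens_per_mw * 3600.0;           /* tokens per MWh  */
    double usd_per_mtok   = energy_cost_per_mwh / tokens_per_mwh * 1.0e6;

    printf("Throughput per MW : %.3e tokens/s per MW\n", tokens_per_mw);
    printf("Energy cost       : %.4f USD per million tokens\n", usd_per_mtok);

    /* A 50x tokens-per-MW improvement at the same power and energy
     * price divides the energy cost per token by the same factor. */
    printf("With 50x tokens/MW: %.4f USD per million tokens\n",
           usd_per_mtok / 50.0);
    return 0;
}
```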