News Weekly: 2026-02-23–2026-03-01
🗞️ AI & GPU Industry Weekly Recap: Feb 23 – Mar 1, 2026
🔑 Key Highlights
- AMD seals a landmark 5-year, 6-gigawatt deal with Meta Platforms, mirroring its earlier OpenAI pact, featuring a custom MI450 GPU accelerator and co-designed “Helios” Open Rack Wide v3 rackscale systems — potentially worth $115B+ in GPU revenue alone
- MSI RTX 5090 Lightning Z scalping hits absurd levels, with eBay listings reaching nearly $27,000 (500%+ premium) for the limited 1,300-unit run, underscoring extreme demand for NVIDIA’s flagship Blackwell consumer GPUs
- AMD launches JAX-AITER, a new open-source bridge bringing its AITER-optimized AI kernels to the JAX framework on ROCm, delivering up to 9.68× speedups over pure-JAX attention implementations on AMD Instinct MI350 GPUs
- AMD joins ARM’s new “CoreCollective” consortium as a founding member alongside Google, Microsoft, Qualcomm, and Samsung — a notable strategic signal given rumors of AMD’s ARM-powered “Sound Wave” APU
- AMD’s ROCm ecosystem gets a surge of developer tooling, with new blogs covering PyTorch TunableOp offline tuning, the AMD Resource Manager for Kubernetes-based GPU sharing, and JAX-AITER kernel integration
🤖 AI & Machine Learning
JAX-AITER: AMD Brings Optimized Kernels to JAX
AMD published a detailed technical blog introducing JAX-AITER, an open-source package (github.com/ROCm/jax-aiter) that bridges JAX’s Foreign Function Interface (FFI) to AMD’s AITER (AI Tensor Engine Repository) high-performance kernel library. The integration targets AMD Instinct MI300 and MI350 series GPUs running ROCm.
Key architectural details:
- Uses JAX FFI + C++ bridge to route JAX device buffers directly to AITER GPU kernels with zero-copy buffer sharing
- Implements `jax.custom_vjp` for proper autodiff support in training loops
- First target: multi-head attention (MHA/FMHA) via the `flash_attn` and `flash_attn_varlen` APIs
- GEMM and custom ops currently retain PyTorch as a dependency; the roadmap targets fully framework-neutral entry points
Benchmark highlights on AMD Instinct MI350 (bf16, pure JAX vs. JAX-AITER):
| Config | Pure JAX | JAX-AITER | Speedup |
|---|---|---|---|
| batch=4, seq=4096, heads=32, dim=64 | 8.594ms | 0.888ms | 9.68× |
| batch=1, seq=8192, heads=8, dim=64 | 2.230ms | 0.301ms | 7.39× |
| batch=2, seq=4096, heads=16, dim=192 | 4.221ms | 0.742ms | 5.69× |
| batch=2, seq=2048, heads=16, dim=192 | 1.125ms | 0.245ms | 4.59× |
Speedups are most pronounced at longer sequence lengths and higher head counts — exactly the configurations that matter most for large-scale LLM training and inference.
PyTorch TunableOp Offline Tuning on ROCm
AMD’s ROCm team published a comprehensive guide to PyTorch TunableOp offline tuning, available in PyTorch v2.6+. The workflow decouples BLAS kernel selection from model execution:
- Collection phase: Record GEMM operations to `tunableop_untuned0.csv`
- Tuning phase: Run `torch.cuda.tunable.tune_gemm_in_file()` independently of model execution
- Deployment phase: Load the pre-tuned results for accelerated inference
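The three phases can be sketched roughly as below. The environment variable names follow PyTorch's TunableOp documentation, but treat this as an outline to verify against your PyTorch version rather than a definitive recipe:

```python
import os

# Phase 1 (collection): run the model with tuning disabled but untuned
# GEMM recording enabled, producing tunableop_untuned0.csv. These env
# vars must be set before the first GEMM executes.
os.environ["PYTORCH_TUNABLEOP_ENABLED"] = "1"
os.environ["PYTORCH_TUNABLEOP_TUNING"] = "0"
os.environ["PYTORCH_TUNABLEOP_RECORD_UNTUNED"] = "1"

def tune_offline(untuned_csv="tunableop_untuned0.csv"):
    # Phase 2 (tuning): run in a separate process; no model code needed.
    import torch  # deferred import so the env vars above take effect first
    torch.cuda.tunable.tune_gemm_in_file(untuned_csv)

# Phase 3 (deployment): launch the model with PYTORCH_TUNABLEOP_ENABLED=1
# and PYTORCH_TUNABLEOP_TUNING=0; the tuned results CSV is picked up and
# the pre-selected GEMM kernels are used directly.
```

The key benefit is that the (potentially slow) tuning sweep never runs inside the production process.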
New TunableOp features highlighted:
- FP8 and TF32 datatype support on MI300 series; MX FP8/FP4 incoming for MI350
- Rotating buffer simulation (up to 512MB) for cold-cache-accurate benchmarking
- Real-time result saving (PyTorch 2.10+) to prevent tuning loss on crashes
- Numerical tolerance checks with absolute and relative thresholds
- Support for batch GEMM and GEMM with bias tuning
⚡ GPU & Hardware
MSI RTX 5090 Lightning Z: Scalper Frenzy
NVIDIA’s RTX 5090 Founders Edition carries a $2,000 MSRP, but MSI’s limited-edition RTX 5090 Lightning Z (MSRP: $5,090, only 1,300 units produced) has become a scalper magnet:
- eBay “sold” listings: $6,700–$8,800
- Active listings: $6,000–$15,000
- One outlier UK listing: ~$27,000
Performance numbers justify some enthusiasm:
- ~12% faster than RTX 5090 FE out of the box
- ~18% faster with manual overclocking (comparable to a theoretical RTX 5090 Ti)
- Supports a 1,000W Extreme vBIOS (200W over stock OC) and a 2,500W XOC BIOS for competitive overclockers
- One overclocker cracked a sample from thermal shock at extreme power levels — four samples remain for world record attempts
AMD-Meta MI450: Custom Silicon at Gigawatt Scale
The AMD-Meta deal introduces the first custom MI450 GPU of the MI400 generation (analogous to the MI300A custom part for LLNL). Key details:
- Tuned specifically for Meta’s inference workloads
- No additional tapeout required within the MI400 cycle (per AMD CFO Jean Hu)
- Possible customizations: adjusted HBM stack count/speed, clock frequencies, or chiplet configuration ratios
- Delivered within “Helios” Open Rack Wide v3 rackscale systems, co-designed with Meta
- First 1GW delivery targeted for H2 2026
Meta is also adopting AMD’s upcoming “Venice” Zen 6 EPYC 9006 and future “Verrano” Zen 7 EPYC 9007 CPUs for both AI racks and general datacenter workloads (Facebook, Instagram).
Google Cloud N4 Series: Axion vs. EPYC Turin vs. Xeon
Phoronix benchmarked Google Cloud’s N4-series VMs at 16 vCPUs:
- N4A (Google Axion ARM64): $0.71/hr
- N4D (AMD EPYC 9B45 “Turin” Zen 5): $0.77/hr
- N4 (Intel Xeon Platinum 8581C “Emerald Rapids”): $0.82/hr
AMD EPYC Turin offers competitive performance at a slight cost premium over Axion, while Intel’s Emerald Rapids (notably not the newer Granite Rapids) comes in as the most expensive option. Performance-per-dollar analysis favored Axion for many workloads.
🏭 Industry & Market
AMD-Meta Platforms: A $115B+ Strategic Partnership
The headline deal of the week: AMD and Meta Platforms announced a 5-year, 6-gigawatt strategic partnership structured almost identically to AMD’s October 2025 OpenAI deal:
Financial structure:
- AMD issued Meta a warrant for 160 million shares (same structure as the OpenAI deal), with an estimated value of roughly $96B by 2030 if AMD stock reaches the $600 target (160M × $600)
- “Double digit billions per gigawatt” confirmed by CEO Lisa Su
- At ~$35,000/GPU average and ~550K GPUs/GW: ~$115.5B in GPU revenue over 5 years (~$23B/year average)
- Full rackscale system cost (~$35B/GW) pushes total higher
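The GPU revenue estimate above can be checked back-of-envelope; the per-GPU price and GPUs-per-gigawatt figures are the article's approximations:

```python
# Back-of-envelope check of the quoted revenue figures.
gpu_price = 35_000        # ~average selling price per GPU, USD
gpus_per_gw = 550_000     # ~GPUs per gigawatt of capacity
gigawatts = 6
years = 5

gpu_revenue = gpu_price * gpus_per_gw * gigawatts
per_year = gpu_revenue / years

assert gpu_revenue == 115_500_000_000     # ~$115.5B over the deal
assert round(per_year / 1e9, 1) == 23.1   # ~$23B/year average
```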
Strategic implications:
- AMD could achieve ~40% revenue share for AI accelerators at Meta (vs. ~50% for NVIDIA, ~10% for Meta’s own MTIA)
- OpenAI + Meta commitments alone = 2GW confirmed, providing manufacturing pipeline confidence
- Subsequent gigawatt tranches to be contracted through 2030 (~1.25GW/year from 2027–2030)
- Represents AMD’s clearest path to competing structurally with NVIDIA’s Blackwell/Rubin roadmap at hyperscaler scale
Lisa Su’s quote of the week: “We are making a big bet on Meta, and Meta is making a big bet on AMD.”
Context: NVIDIA also signed a deal with Meta last week for “millions of Blackwell and Rubin GPU accelerators,” estimated at $110B–$167B for GB300 NVL72 equivalents — Meta is clearly hedging across both vendors at enormous scale.
AMD Joins ARM’s CoreCollective Consortium
Arm and Linaro launched CoreCollective, a new open-source industry consortium focused on the ARM software ecosystem. AMD joined as a founding member alongside Google, Microsoft, Qualcomm, Samsung, Canonical, Fujitsu, Ampere Computing, Graphcore, CIX, and SUSE. NVIDIA is notably absent.
Focus areas: Android, data centers, confidential computing, edge computing, Linux fundamentals, virtualization.
AMD’s membership is strategically interesting given persistent rumors of an AMD ARM-powered APU codenamed “Sound Wave”, plus existing ARM exposure through the Xilinx acquisition.
🛠️ Developer Ecosystem
AMD Resource Manager: Enterprise GPU Sharing
AMD published a full walkthrough for the AMD Resource Manager, part of the AMD Enterprise AI Suite — a Kubernetes-native platform for centralized AI infrastructure governance on Instinct GPUs.
Key capabilities:
- Project-based isolation: GPU/CPU/memory quotas per team, with resource borrowing and preemption for priority-based scheduling
- Unified monitoring: Tracks workloads submitted via kubectl, Kubeflow, Flyte, and other tools
- GUI + CLI control plane: Dashboard, cluster health, user/secret/storage management
- AMD Inference Microservices (AIM) integration for LLM serving (e.g., `meta-llama-llama-3-1-8b-instruct`)
The preemption model allows lower-priority workloads to be suspended when higher-priority projects need their guaranteed quota resources back — critical for multi-team R&D clusters.
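To make the borrowing-and-preemption idea concrete, here is a toy scheduler sketch. It is purely illustrative, not AMD Resource Manager code, and every class and name below is invented for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Project:
    name: str
    guaranteed: int  # guaranteed GPU quota
    running: list = field(default_factory=list)  # list of (job_name, gpus)

class ToyCluster:
    def __init__(self, total_gpus, projects):
        self.total = total_gpus
        self.projects = {p.name: p for p in projects}

    def used(self, project):
        return sum(gpus for _, gpus in project.running)

    def free(self):
        return self.total - sum(self.used(p) for p in self.projects.values())

    def submit(self, project_name, job, gpus):
        """Schedule a job; preempt borrowers only when needed to satisfy
        the requester's guaranteed quota. Returns preempted jobs."""
        p = self.projects[project_name]
        preempted = []
        if self.free() < gpus and self.used(p) + gpus <= p.guaranteed:
            # Requester is within its guarantee: reclaim GPUs from projects
            # running above their own guarantee (i.e., borrowers). Whole
            # jobs are suspended, mirroring workload-level preemption.
            for other in self.projects.values():
                while (other.running and self.used(other) > other.guaranteed
                       and self.free() < gpus):
                    preempted.append(other.running.pop())
        if self.free() >= gpus:
            p.running.append((job, gpus))
        return preempted
```

For example, on an 8-GPU cluster with two projects guaranteed 4 GPUs each, a project can borrow all 8 idle GPUs, but its borrowed job is suspended the moment the other project submits work within its own guarantee.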
AMD SEV-SNP BTB Isolation: Confidential Computing Hardening
AMD posted Linux kernel patches enabling SEV-SNP Branch Target Buffer (BTB) isolation for AMD EPYC processors. The feature ensures that guest VMs protected by Secure Encrypted Virtualization-Secure Nested Paging cannot have their BTB state contaminated by host or peer-VM activity. Companion QEMU patches were also submitted. This advances AMD’s confidential computing story for regulated enterprise and government workloads.
ROCm Tooling Summary This Week
| Tool | Update |
|---|---|
| JAX-AITER | New open-source bridge; up to 9.68× attention speedup on MI350 |
| PyTorch TunableOp | Offline tuning guide; FP8/TF32 support; rotating buffer benchmarking |
| AMD Resource Manager | Enterprise GPU sharing with Kubernetes-native preemption |
| SEV-SNP BTB Isolation | Linux kernel patches posted for confidential VM security |
📊 Key Takeaways
AMD had arguably its most consequential week of 2026. The Meta Platforms partnership, a five-year, 6-gigawatt, potentially $115B+ commitment built around custom MI450 silicon and co-designed Helios rackscale systems, represents a structural shift in how hyperscalers are diversifying away from NVIDIA dependency; AMD now holds confirmed multi-gigawatt commitments from both OpenAI and Meta simultaneously. On the software front, the ROCm ecosystem is maturing rapidly: JAX-AITER delivers up to ~10× kernel-level speedups, and TunableOp offline tuning provides an enterprise-grade GEMM optimization workflow, critical infrastructure for developers choosing Instinct GPUs over CUDA-native alternatives. Meanwhile, the RTX 5090 Lightning Z scalping frenzy and the parallel Meta-NVIDIA Blackwell/Rubin deal underscore a key market dynamic: AI hardware demand remains so acute that both AMD and NVIDIA are simultaneously winning landmark contracts at a scale the industry has never seen.