Technical Intelligence Report: 2026-01-21
January 21, 2026 · Generated 09:22 PM PT
Executive Summary
- ROCm Ecosystem Maturity: ROCm 7.2 has been released, introducing support for RDNA4 hardware (RX 9060 XT LP) and the new ROCm Optiq visualization tool. Simultaneously, ROCm has achieved “First-Class Platform” status in vLLM, with CI pass rates hitting 93% and official Docker/Wheel support.
- Framework Updates: PyTorch 2.10 is live, featuring improved RDNA 3.5 (GFX1150) support and Grouped GEMM via CK for AMD GPUs.
- Next-Gen Hardware Prep: Linux patches reveal Zen 6 “Venice” EPYC features, specifically focusing on advanced bandwidth enforcement (GLBE, GLSBE) and privilege management (PLZA).
- Market Competition: Upscale AI raised $200M to build “SkyHammer,” a UALink-compatible switch ASIC intended to rival NVIDIA’s NVSwitch, bolstering the open ecosystem AMD utilizes.
- NVIDIA Activity: Benchmarks compare NVIDIA’s GB10 (Grace-Blackwell) Arm cores against AMD’s “Strix Halo” Ryzen AI Max+. Jensen Huang is visiting China to negotiate H200 shipments and advocating for “AI as Infrastructure” at Davos.
🤖 ROCm Updates & Software
[2026-01-21] ROCm Becomes a First-Class Platform in vLLM (#2016)
Source: ROCm Tech Blog
Key takeaway relevant to AMD:
- Critical Stability Milestone: ROCm is now a “first-class citizen” in vLLM, meaning AMD hardware support is no longer experimental. This significantly reduces friction for enterprise deployment of LLMs on MI300/MI350 series.
- Ease of Deployment: Official Docker images and pip wheels remove the need for developers to build from source, previously a major pain point.
Summary:
- ROCm support in vLLM (v0.12.0 - v0.14.0) has been massively upgraded.
- CI (Continuous Integration) stability for AMD hardware improved from 37% passing (Nov 2025) to 93% passing (Jan 2026).
- Native support added for vLLM-omni (multimodal inference).
Details:
- Official Docker Images: Now available via `vllm/vllm-openai-rocm:v0.14.0`.
- Installation: Simplified to `uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/`. Supports ROCm 7.0 and Python 3.12.
- Performance Optimizations:
- Quantization: Native AITER FP8 kernels, fused LayerNorm/SiLU FP8, MXFP4 w4a4 MoE inference.
- Architecture: Optimized KV cache, assembly Paged Attention, and removal of DeepSeek MLA D2D copies.
- Hardware: Validated on MI300X, MI325X, MI350X, MI355X (gfx942, gfx950 architectures).
- vLLM-omni: Supports audio/image/video input and text/audio output. Optimized configs provided for Qwen2.5-Omni and Qwen3-Omni-MoE.
[2026-01-21] AMD ROCm 7.2 Now Released With More Radeon Graphics Cards Supported, ROCm Optiq Introduced
Source: Phoronix
Key takeaway relevant to AMD:
- RDNA4 Support: Early support for next-gen consumer hardware (RX 9060 XT LP) indicates RDNA4 compute support is arriving faster than previous generations.
- Tooling Expansion: The introduction of ROCm Optiq addresses a long-standing gap in visualization and profiling tools compared to NVIDIA Nsight.
Summary:
- ROCm 7.2.0 released with expanded hardware support and new tooling.
- Official support added for RDNA4 and specific RDNA3 consumer cards.
- Introduction of “ROCm Optiq” (Beta) and “ROCm Simulation”.
Details:
- New Hardware Support:
- RDNA4: AMD Radeon RX 9060 XT LP (32 CUs, 64 AI accelerators, 16GB GDDR6).
- Workstation: AMD Radeon AI PRO R9600D (3072 SPs, 32GB GDDR6).
- RDNA3: Official support for Radeon RX 7700 series.
- Software Features:
- ROCm Optiq: A new visualization platform (Windows/Linux) for viewing GPU traces from profiling tools.
- ROCm Simulation: Toolkit for physics-based/numerical simulations.
- HIP Updates: SPIR-V support for hipCUB and rocThrust; node power management for multi-GPU nodes.
- OS Support: MI350X/MI355X support added to SUSE Linux Enterprise Server 15 SP7.
[2026-01-21] PyTorch 2.10 Released With More Improvements For AMD ROCm & Intel GPUs
Source: Phoronix
Key takeaway relevant to AMD:
- RDNA 3.5 Integration: Support for GFX1150/GFX1151 (likely Ryzen AI APUs) in `hipblaslt` enables better on-device AI performance for laptops/embedded.
- GEMM Optimization: Grouped GEMM support via Composable Kernel (CK) improves efficiency for transformer workloads.
Summary:
- PyTorch 2.10 released with broad updates for ROCm, Intel XPU, and CUDA.
- Focus on kernel optimizations and Windows support for ROCm.
Details:
- AMD ROCm Improvements:
- Enabled grouped GEMM via regular GEMM fallback and via CK (Composable Kernel).
- Added GFX1150/GFX1151 (RDNA 3.5) to `hipblaslt`-supported GEMM lists.
- Support for `scaled_mm` v2 and AOTriton `scaled_dot_product_attention`.
- Code generation support for `fast_tanhf`.
- Improved heuristics for pointwise kernels.
- Enhanced ROCm support on Windows.
- General/Competitor Updates:
- Intel: New Torch XPU APIs, SYCL support for custom operators on Windows.
- NVIDIA: CUDA 13 compatibility improvements, CUTLASS MATMULs on Thor.
- Python: Experimental support for Python 3.14 free-threaded build.
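The grouped GEMM mentioned above batches many independent matmuls of differing shapes (e.g. per-expert projections in MoE) into one launch. A minimal NumPy sketch of the semantics only; a fused kernel such as the CK path runs all groups in a single launch, but its results must match this reference loop:

```python
import numpy as np

def grouped_gemm(a_list, b_list):
    """Reference semantics of grouped GEMM: one independent matmul
    per group (a fused kernel must produce identical results)."""
    return [a @ b for a, b in zip(a_list, b_list)]

rng = np.random.default_rng(0)
# Groups share K/N but differ in M, as in MoE expert layers where
# each expert is routed a different number of tokens.
shapes = [(4, 8, 16), (2, 8, 16), (6, 8, 16)]
a_list = [rng.standard_normal((m, k)) for m, k, _ in shapes]
b_list = [rng.standard_normal((k, n)) for _, k, n in shapes]
outs = grouped_gemm(a_list, b_list)
```

Fusing the groups amortizes launch overhead and lets the kernel pack small, uneven problems onto the GPU together instead of serializing many tiny matmuls.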
[2026-01-21] Add new author: Mohit Deopujari (#2013)
Source: ROCm Tech Blog
Key takeaway relevant to AMD:
- Minimal technical impact; indicates expanding documentation/blogging team for ROCm.
Summary:
- Administrative commit adding Mohit Deopujari to the ROCm blog author list.
Details:
- Updated `.authorlist.txt` and added author metadata/image.
🔲 AMD Hardware & Products
[2026-01-21] AMD Sends Out Linux Patches For Next-Gen EPYC Features: GLBE, GLSBE & PLZA
Source: Phoronix
Key takeaway relevant to AMD:
- Zen 6 “Venice” Features: Confirms advanced QoS (Quality of Service) features for next-gen EPYC servers, critical for multi-tenant cloud and mixed-workload AI clusters.
- Resource Control: AMD is significantly enhancing the `resctrl` (Resource Control) capabilities in Linux to manage bandwidth contention.
Summary:
- 19 Linux kernel patches released for EPYC “Venice” (Zen 6) processors.
- Introduces three key features: GLBE, GLSBE, and PLZA.
Details:
- GLBE (Global Bandwidth Enforcement): Allows software to set bandwidth limits for thread groups spanning multiple QoS Domains. Sets a ceiling for “L3 External Bandwidth.”
- GLSBE (Global Slow Bandwidth Enforcement): Similar to GLBE but specifically for “Slow Memory” (CXL or tiered memory), managing bandwidth limits across multiple QoS domains.
- PLZA (Privilege Level Zero Association): Hardware mechanism to automatically associate Ring 0 (kernel/privileged) execution with a specific Class of Service (COS) or Resource Monitoring Identifier (RMID), overriding per-thread associations.
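GLBE's distinguishing feature is that the ceiling is global: one budget shared by all QoS domains in the thread group, rather than a separate per-domain limit. A toy Python model of that semantics (illustrative only; the real mechanism is enforced in hardware and would be configured through the `resctrl` filesystem, not an API like this):

```python
class GlobalBandwidthLimiter:
    """Toy model of a GLBE-style global ceiling: a single bandwidth
    budget shared across all QoS domains of a thread group.
    Names and units are illustrative, not the hardware interface."""

    def __init__(self, ceiling_gbps: float):
        self.ceiling = ceiling_gbps
        self.used = 0.0

    def request(self, domain: str, gbps: float) -> float:
        # Grant whatever still fits under the *global* ceiling,
        # regardless of which QoS domain is asking.
        grant = min(gbps, self.ceiling - self.used)
        self.used += grant
        return grant

limiter = GlobalBandwidthLimiter(ceiling_gbps=100.0)
g1 = limiter.request("domain0", 70.0)  # fits fully under the ceiling
g2 = limiter.request("domain1", 50.0)  # only 30 GB/s of budget remains
```

With a per-domain scheme, domain1 could have received its full 50; under a global ceiling the two domains contend for one budget, which is exactly the multi-tenant isolation case the patches target.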
🤼‍♂️ Market & Competitors
[2026-01-21] The CPU Performance Of The NVIDIA GB10 With The Dell Pro Max vs. AMD Ryzen AI Max+ “Strix Halo”
Source: Phoronix
Key takeaway relevant to AMD:
- Direct Comparison: Phoronix is benchmarking the “Grace” portion of the Grace-Blackwell superchip against AMD’s top-tier Strix Halo APU. This highlights the blurring line between high-end client APUs and server-grade Arm CPUs.
Summary:
- Benchmarking NVIDIA GB10 (Grace-Blackwell) CPU cores vs. AMD Ryzen AI Max+ 395 (Framework Desktop).
- Focus is on traditional Linux CPU workloads, not just AI.
Details:
- NVIDIA GB10 Specs: 20 Arm cores (10x Cortex-X925 performance cores + 10x Cortex-A725 efficiency cores). 128GB LPDDR5x memory.
- AMD System: Framework Desktop with Ryzen AI Max+ 395 “Strix Halo”.
- Testing Constraint: NVIDIA GB10 does not expose CPU power metrics via Linux PowerCap/RAPL; testing relied on total AC system power.
- Software Environment: Both systems running Ubuntu 24.04.3 LTS, Linux 6.14 kernel, GCC 13.3.
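On the AMD side, package power is typically read from the Linux powercap/RAPL sysfs energy counters, which is what the GB10 does not expose. A hedged sketch of that measurement (zone path is parameterized since the sysfs layout varies by system; note the driver directory is named `intel-rapl` even on AMD CPUs):

```python
from pathlib import Path

def read_energy_uj(zone: Path) -> int:
    """Read a powercap RAPL zone's cumulative energy counter in
    microjoules, e.g. from /sys/class/powercap/intel-rapl:0."""
    return int((zone / "energy_uj").read_text().strip())

def average_watts(e0_uj: int, e1_uj: int, seconds: float) -> float:
    """Average power between two counter samples taken `seconds` apart
    (microjoules -> joules, then divide by elapsed time)."""
    return (e1_uj - e0_uj) / 1e6 / seconds
```

Usage is two samples bracketing the workload: read once, run the benchmark, read again, and divide by the elapsed time. Without this interface, a wall-power meter (as used for the GB10) measures the whole system, PSU losses included.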
[2026-01-21] Upscale AI Nabs Cash To Forge “SkyHammer” Scale Up Fabric Switch
Source: The Next Platform
Key takeaway relevant to AMD:
- UALink Momentum: Upscale AI is building a switch for UALink, the AMD-backed open standard competitor to NVLink. A high-performance merchant silicon switch for UALink is required for AMD (and others) to effectively compete with NVIDIA’s rack-scale NVSwitch architecture.
Summary:
- Upscale AI raised $200M (Series A) to develop “SkyHammer,” a high-radix scale-up fabric switch.
- Valuation over $1 billion.
- Targeting samples in late 2026, volume in 2027.
Details:
- Technology: “SkyHammer” ASIC supports UALink (Ultra Accelerator Link) and Meta’s ESUN standard.
- Goal: Create a heterogeneous, high-bandwidth memory coherent fabric switch to compete with NVIDIA NVSwitch.
- Scale: The UALink 1.0 spec allows up to 1,024 compute engines in a single-level fabric; SkyHammer aims to support this scale.
- Founders: Ex-Auradine, Cavium, and Innovium executives (Rajiv Khemani, Barun Kar).
- Market Context: Provides an alternative to proprietary NVLink, essential for the “anti-NVIDIA” coalition (AMD, Intel, Broadcom, etc.).
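The 1,024-endpoint, single-level figure is what makes a high-radix switch necessary: in a one-hop fabric built from parallel switch planes, every switch needs a port per endpoint, so radix bounds the endpoint count, while extra links per endpoint add bandwidth (more planes) rather than reach. A toy calculation (port and link counts are illustrative; SkyHammer's actual radix is not public):

```python
def single_level_fabric(endpoints: int, switch_radix: int,
                        links_per_endpoint: int) -> dict:
    """Sketch of one-hop fabric sizing: each switch plane dedicates
    one port to every endpoint, so the radix must cover the endpoint
    count; each extra link per endpoint adds a parallel plane."""
    if switch_radix < endpoints:
        raise ValueError("radix too small for a single-level fabric")
    return {"planes": links_per_endpoint,
            "ports_used_per_switch": endpoints}

# UALink 1.0's 1,024-accelerator ceiling with a hypothetical
# 1,024-port switch and 4 links per accelerator:
cfg = single_level_fabric(endpoints=1024, switch_radix=1024,
                          links_per_endpoint=4)
```

A smaller-radix switch forces a multi-level (Clos) topology, adding hops and latency, which is why a merchant high-radix UALink switch matters for rack-scale competition with NVSwitch.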
[2026-01-21] Nvidia CEO Jensen Huang to visit China as company prepares to start H200 shipments…
Source: Tom’s Hardware
Key takeaway relevant to AMD:
- Export Control Dynamics: NVIDIA is actively pushing H200 into China despite restrictions. If successful, this maintains NVIDIA’s CUDA dominance in China, making it harder for AMD’s MI300 series to gain a foothold in Alibaba/Baidu clouds despite AMD’s unrestricted offerings.
Summary:
- Jensen Huang is visiting China (Beijing) during Lunar New Year.
- Purpose: Negotiate H200 GPU shipments and attend internal events.
Details:
- H200 Situation: US government allows some export; Chinese government has curbs on imports to foster self-sufficiency.
- Target Clients: Alibaba, Baidu (commercial use cases relying on CUDA).
- Restrictions: China plans to prohibit H200 for military/state-owned enterprises.
[2026-01-21] ‘Largest Infrastructure Buildout in Human History’: Jensen Huang on AI’s ‘Five-Layer Cake’ at Davos
Source: NVIDIA Blog
Key takeaway relevant to AMD:
- Infrastructure Narrative: NVIDIA is positioning AI not as hardware but as “critical national infrastructure” (like roads/electricity). This narrative drives sovereign AI investments, a sector AMD is also targeting.
Summary:
- Jensen Huang spoke at World Economic Forum (Davos) with BlackRock CEO Larry Fink.
- Described AI as the “largest infrastructure buildout in human history.”
Details:
- Five-Layer Cake: Energy, Chips, Cloud Infrastructure, AI Models, Applications.
- Economic Impact: 2025 saw >$100B in VC funding, mostly for AI-native companies.
- Workforce: Emphasized AI creating demand for skilled labor (electricians, datacenters) and augmenting roles (radiologists, nurses) rather than replacing them.
💬 Reddit & Community
[2026-01-21] Unlucky customer buys RTX 5080, receives relabelled RTX 5060 Ti in the box instead…
Source: Tom’s Hardware
Key takeaway relevant to AMD:
- High-End GPU Scarcity/Value: Highlights the chaos in the high-end consumer GPU market. Though this case involves NVIDIA, such “return scams” also hit high-value AMD cards (RX 7900/9000 series), making it a warning about retail channel integrity.
Summary:
- Amazon customer bought an RTX 5080 but received an RTX 5060 Ti inside the box.
- The scammer applied fake “RTX 5080” stickers to the lower-end card.
Details:
- Detection: The card had an 8-pin PCIe connector (Blackwell RTX 5080 uses 16-pin), exposing the swap.
- Method: Likely a “return switcheroo”: a previous buyer kept the 5080 and returned a relabelled 5060 Ti, which Amazon’s restocking process failed to catch.
📈 GitHub Stats
| Category | Repository | Total Stars | 1-Day |
|---|---|---|---|
| AMD Ecosystem | AMD-AGI/GEAK-agent | 56 | 0 |
| AMD Ecosystem | AMD-AGI/Primus | 66 | 0 |
| AMD Ecosystem | AMD-AGI/TraceLens | 54 | 0 |
| AMD Ecosystem | ROCm/MAD | 31 | 0 |
| AMD Ecosystem | ROCm/ROCm | 6,094 | +6 |
| Compilers | openxla/xla | 3,914 | 0 |
| Compilers | tile-ai/tilelang | 4,781 | +12 |
| Compilers | triton-lang/triton | 18,205 | +11 |
| Google / JAX | AI-Hypercomputer/JetStream | 403 | +1 |
| Google / JAX | AI-Hypercomputer/maxtext | 2,102 | +2 |
| Google / JAX | jax-ml/jax | 34,655 | +14 |
| HuggingFace | huggingface/transformers | 155,512 | +49 |
| Inference Serving | alibaba/rtp-llm | 1,027 | +3 |
| Inference Serving | efeslab/Atom | 333 | 0 |
| Inference Serving | llm-d/llm-d | 2,380 | 0 |
| Inference Serving | sgl-project/sglang | 22,650 | +32 |
| Inference Serving | vllm-project/vllm | 68,077 | +99 |
| Inference Serving | xdit-project/xDiT | 2,510 | +3 |
| NVIDIA | NVIDIA/Megatron-LM | 14,987 | +13 |
| NVIDIA | NVIDIA/TransformerEngine | 3,103 | +5 |
| NVIDIA | NVIDIA/apex | 8,899 | +2 |
| Optimization | deepseek-ai/DeepEP | 8,908 | +5 |
| Optimization | deepspeedai/DeepSpeed | 41,331 | +14 |
| Optimization | facebookresearch/xformers | 10,284 | +1 |
| PyTorch & Meta | meta-pytorch/monarch | 953 | +2 |
| PyTorch & Meta | meta-pytorch/torchcomms | 321 | 0 |
| PyTorch & Meta | meta-pytorch/torchforge | 600 | +2 |
| PyTorch & Meta | pytorch/FBGEMM | 1,519 | +1 |
| PyTorch & Meta | pytorch/ao | 2,641 | +2 |
| PyTorch & Meta | pytorch/audio | 2,814 | +1 |
| PyTorch & Meta | pytorch/pytorch | 96,801 | +36 |
| PyTorch & Meta | pytorch/torchtitan | 4,987 | +3 |
| PyTorch & Meta | pytorch/vision | 17,460 | +2 |
| RL & Post-Training | THUDM/slime | 3,464 | +17 |
| RL & Post-Training | radixark/miles | 749 | +5 |
| RL & Post-Training | volcengine/verl | 18,596 | +34 |