News Weekly: 2026-01-19–2026-01-25
🖥️ AI & GPU Weekly Recap: January 19–25, 2026
🔑 Key Highlights
- AMD ROCm 7.2 officially released, introducing ROCm Optiq and expanding Radeon GPU support, marking a significant milestone for AMD’s AI/compute software stack
- ROCm achieves first-class status in vLLM, with official Docker images, pip-installable wheels, and CI stability jumping from 37% to 93% pass rate in just two months — a watershed moment for AMD in the AI inference ecosystem
- Eric Demers, the legendary GPU architect behind ATI’s R300/R600 and virtually all of Qualcomm’s Adreno lineup, joins Intel to lead AI accelerator GPU design, signaling a major talent injection for Intel’s struggling datacenter ambitions
- NVIDIA GB10 benchmarked against GH200, providing concrete performance reference data between the compact Dell Pro Max GB10 and the high-end Grace Hopper Superchip on CUDA 13.0 / Ubuntu 24.04
- Linux patches for AMD’s next-generation EPYC “Venice” (Zen 6) emerge, revealing Global Bandwidth Enforcement (GLBE), Global Slow Bandwidth Enforcement (GLSBE), and Privilege Level Zero Association (PLZA) server features ahead of launch
🤖 AI & Machine Learning
AMD ROCm Becomes First-Class in vLLM
This week’s biggest software story: AMD formally announced that ROCm is now a first-class platform in the vLLM ecosystem. The journey, documented in a detailed ROCm Tech Blog post, covers milestones across vLLM v0.12.0 through v0.14.0:
- Quantization gains: Native AITER FP8 kernels, fused LayerNorm/SiLU FP8 block quantization, MXFP4 w4a4 MoE inference, and FP8 MLA decode — all now available on AMD Instinct GPUs
- Performance wins: Optimized KV cache + assembly Paged Attention, AITER sampling ops, removal of DeepSeek MLA D2D copies, and fastsafetensors loading
- New model support: DeepSeek v3.2 + SparseMLA, Whisper v1 with AITER attention, Qwen3-Omni MoE with Tensor Parallelism
- CI stability: From 37% to 93% of AMD CI test groups passing between mid-November 2025 and mid-January 2026
- Official packages: Docker images (vllm/vllm-openai-rocm:v0.14.0) and pip wheels (uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/) are now available
- vLLM-omni Day-0 ROCm support validated on AMD Instinct MI300/MI350 GPUs (gfx942, gfx950), with an official Docker image (vllm/vllm-omni-rocm:v0.12.0rc1) available since January 6, 2026
Intel LLM-Scaler-Omni 0.1.0-b5
Intel released LLM-Scaler-Omni 0.1.0-b5 for Arc Battlemage graphics, adding:
- Python 3.12 and PyTorch 2.9 support
- ComfyUI upgrades including Qwen-Image-Layered, Qwen-Image-Edit-2511, HY-Motion, and ComfyUI-GGUF model support
- SGLang Diffusion updates: CacheDiT, Tensor Parallelism for multi-XPU inference, SGLD ComfyUI custom node
AMD GPU Patents Signal Ray Tracing Shift
Reddit discussions this week pointed to AMD GPU patents suggesting an architectural shift in hardware-accelerated ray tracing for future generations, though the full details are not yet publicly accessible.
⚡ GPU & Hardware
AMD ROCm 7.2 Released
AMD formally released ROCm 7.2.0 for Linux, first teased at CES earlier in January. Key additions include:
- Expanded Radeon graphics card support
- Introduction of ROCm Optiq (a new component within the ROCm stack)
- Continued Instinct MI300 and MI350 series optimization
NVIDIA GB10 vs. GH200 Benchmarks
Phoronix published reference benchmarks comparing the NVIDIA GB10 (as found in the Dell Pro Max GB10) against the NVIDIA GH200 Grace Hopper Superchip. Both systems ran Ubuntu 24.04 LTS / NVIDIA DGX OS with Linux 6.14 kernel and CUDA 13.0. Key takeaway: GH200 remains substantially faster than GB10 — expected given the hardware tier difference — but the GB10 punches above its weight for a compact, power-efficient form factor. Testing was facilitated by GPTshop.ai providing remote GH200 access on a Pegatron JIMBO P4352 motherboard.
AMDGPU Linux Driver: HDMI VRR & Auto Low Latency Mode
Community developer Tomasz Pakuła submitted new AMDGPU kernel driver patches enabling:
- HDMI Variable Refresh Rate (VRR) — previously only working over DisplayPort/FreeSync
- HDMI 2.1 Auto Low Latency Mode (ALLM) — automatic “Game Mode” for reduced input lag
- Fixes for VRR detection with GTF flag monitors and DP→HDMI PCON handling
These patches were developed using public information rather than HDMI Forum licensed specs, circumventing prior restrictions.
Valve Contributes AMDGPU GCN 1.0 Power Management Fixes
Valve contractor Timur Kristóf posted new AMDGPU patches targeting 14-year-old GCN 1.0 (Southern Islands) graphics processors:
- Radeon R5 430 power management fix: a workaround that raises the TDP from 24W to 32W to prevent powertune throttling
- Revised maximum SCLK limit to 780 MHz (VBIOS spec) for the Radeon 430
- Better handling of power2_cap for cards unable to read power limits
RADV Vulkan Driver Gets HPLOC Ray-Tracing Acceleration
Valve contractor Konstantin Seurer merged HPLOC (Hierarchical Parallel Locally-Ordered Clustering) support into the RADV Vulkan driver for Mesa 26.0:
- HPLOC replaces PLOC for BVH (Bounding Volume Hierarchy) construction
- Saves ~1ms of frame time on TLAS builds
- Cyberpunk 2077 sees a ~5% performance improvement; other games see 2–3% gains
- Merged and on track for Mesa 26.0 stable release in February 2026
Intel Arc GPU Firmware Updates Coming to Non-x86 Platforms
Patches queued for Linux 6.20/7.0 will enable Intel discrete GPU (Arc) firmware updating on ARM64 and RISC-V systems via the Intel Xe kernel driver. Previously restricted to x86/x86_64, these patches have landed in Greg Kroah-Hartman’s char-misc-next branch and are expected to merge in the mid-February window.
GPU Counterfeit Scam Warning
A notable consumer warning surfaced: China-based fraudsters are exploiting the ongoing DRAM shortage to pass off NVIDIA RTX 3060 Mobile (GA106, Ampere) chips as RTX 4080 (AD103, Ada Lovelace) GPUs on second-hand platforms like Xianyu. Even the die engravings were counterfeited. With NVIDIA reportedly cutting board partner supply by up to 20% and retail prices surging, the risk of GPU fraud is elevated.
🏭 Industry & Market
Eric Demers Leaves Qualcomm for Intel — Massive Talent Move
The most significant industry personnel move of the week: Eric Demers, widely regarded as one of the world’s top GPU architects, has joined Intel with a focus on AI accelerators after 14 years at Qualcomm. His credentials:
- Designed ATI’s iconic R300 (Radeon 9700/9500 series) and R600 GPUs
- Served as AMD’s graphics Chief Technology Officer post-ATI acquisition until 2012
- Spearheaded virtually all of Qualcomm’s Adreno GPU designs across Snapdragon chips
- Early career at Silicon Graphics and Matrox
Demers is expected to focus on Intel’s Falcon Shores / Jaguar Shores AI accelerator roadmap and potentially Crescent Island (inference-focused design), rather than consumer Arc graphics. Analysts at Moor Insights and Strategy called the move “bigger than people realize,” noting that GPU architects capable of building from the ground up are extraordinarily rare.
Intel’s current datacenter AI lineage, three generations of Gaudi (the latest being Gaudi 3 from 2024, positioned as an H100 alternative), is seen as underperforming against NVIDIA Blackwell and AMD MI300X/MI350X, making this hire strategically critical.
AMD Commits to MSRP Pricing Amid DRAM Crisis
Amid the escalating DRAM shortage, AMD has publicly committed to keeping GPU prices as close to MSRP as possible. NVIDIA, by contrast, has reportedly slashed supply to board partners by up to 20%, contributing to price surges across the retail and secondary markets.
AMD EPYC Venice (Zen 6) Server Features Revealed
AMD submitted 19 Linux kernel patches revealing new enterprise features tied to next-generation EPYC “Venice” (Zen 6) processors:
- GLBE (Global Bandwidth Enforcement): Cross-domain L3 external bandwidth ceiling control
- GLSBE (Global Slow Bandwidth Enforcement): Bandwidth ceiling for L3 external bandwidth to slow memory across QOS domains
- PLZA (Privilege Level Zero Association): Automatic COS/RMID association for CPL=0 execution
These features integrate with Linux’s resctrl resource control framework. Official AMD documentation is expected in the coming weeks, ahead of EPYC Venice’s anticipated launch later in 2026.
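For readers unfamiliar with resctrl, it is exposed as a filesystem (conventionally mounted at /sys/fs/resctrl) whose schemata file lists per-domain limits for each resource. The source does not say how GLBE, GLSBE, or PLZA will appear there, so the minimal sketch below only parses the existing schemata line format; the function name is hypothetical and the sample uses the MB and L3 resource names that current kernels already expose.

```python
from pathlib import Path


def parse_schemata(text: str) -> dict[str, dict[int, str]]:
    """Parse resctrl schemata text into {resource: {domain_id: value}}.

    Each line looks like "MB:0=2048;1=2048" (resource, then
    semicolon-separated domain=value pairs). Values are kept as strings
    because some resources use hex bitmasks and others plain numbers.
    """
    result: dict[str, dict[int, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        resource, _, domains = line.partition(":")
        entries: dict[int, str] = {}
        for item in domains.split(";"):
            domain, _, value = item.partition("=")
            entries[int(domain)] = value.strip()
        result[resource.strip()] = entries
    return result


if __name__ == "__main__":
    schemata = Path("/sys/fs/resctrl/schemata")  # present only if resctrl is mounted
    if schemata.exists():
        text = schemata.read_text()
    else:
        text = "MB:0=2048;1=2048\nL3:0=ffff;1=ffff"  # sample data, not from real hardware
    print(parse_schemata(text))
```

Once the Venice patches and AMD's documentation settle, any new resource names should show up as additional keys in the returned dictionary, assuming they reuse the same line format.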
🛠️ Developer Ecosystem
vLLM ROCm: pip install Just Works
Official wheels at wheels.vllm.ai/rocm/ mean that pip install vllm (with the ROCm extra index URL) now works out of the box on AMD ROCm hardware, a developer experience milestone. The current wheel supports Python 3.12, ROCm 7.0, and glibc ≥ 2.35. Nightly builds and broader Radeon consumer GPU support are on the roadmap.
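The stated wheel requirements can be checked before attempting an install. The following is a minimal preflight sketch using only the Python standard library; the helper names are hypothetical, treating "supports Python 3.12" as an exact-version match is an assumption, and the ROCm 7.0 requirement is not checked at all.

```python
import platform
import sys

REQUIRED_PYTHON = (3, 12)  # from the recap: the wheel supports Python 3.12
MIN_GLIBC = (2, 35)        # from the recap: glibc >= 2.35


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '2.35' into (2, 35); '' becomes ()."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def wheel_compatible(python: tuple[int, int], libc_name: str, libc_version: str) -> bool:
    """Return True if the interpreter and C library match the stated requirements."""
    if tuple(python) != REQUIRED_PYTHON:
        return False
    if libc_name != "glibc":
        return False
    return parse_version(libc_version) >= MIN_GLIBC


if __name__ == "__main__":
    # platform.libc_ver() returns ('', '') when the C library cannot be identified.
    libc_name, libc_version = platform.libc_ver()
    ok = wheel_compatible(sys.version_info[:2], libc_name, libc_version)
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}, "
          f"{libc_name or 'unknown libc'} {libc_version}: "
          f"{'looks compatible' if ok else 'does not match the stated requirements'}")
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.4" would sort above "2.35".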
Mesa 26.0 on Track for February Release
The RADV HPLOC merge and ongoing AMDGPU improvements are landing in Mesa ahead of the Mesa 26.0 stable release expected in February 2026, delivering meaningful ray-tracing performance improvements for Linux gamers on AMD hardware with no user action required.
DragonFlyBSD AMDGPU Gains GCN 1.1 Support
DragonFlyBSD enabled optional GCN 1.1 (Sea Islands/CIK) support in its AMDGPU driver, requiring drm.amdgpu.cik_support=1 in the bootloader config. The BSD project’s AMDGPU driver remains based on Linux 4.20.17 kernel sources — significantly behind upstream — but progress continues on legacy GPU support.
AMD EPYC Zen 6 Compiler Support Already Merged
As noted alongside the GLBE/GLSBE/PLZA patches, AMD’s Zen 6 “znver6” ISA support is already present in GCC 16, including expected features like AVX-512 BMM and support for 16-channel memory. The Linux kernel patches for Venice are being proactively submitted well ahead of hardware launch.
📊 Key Takeaways
The week’s defining narrative was AMD’s accelerating ecosystem momentum. ROCm’s elevation to first-class status in vLLM, backed by real CI metrics, official packages, and multimodal model support on Instinct MI300/MI350, represents AMD’s most credible challenge yet to NVIDIA’s CUDA-dominated AI software moat. Meanwhile, Intel’s hire of Eric Demers is a strategic bet that exceptional architectural talent can accelerate its Gaudi successor roadmap and close the widening gap with NVIDIA Blackwell and AMD Instinct in the datacenter AI race. Underpinning everything is an increasingly constrained hardware supply environment: the DRAM shortage is driving both legitimate price increases and a surge in GPU counterfeiting, which makes software stack differentiation and open-ecosystem trust more critical than ever for AMD and Intel as they challenge NVIDIA’s dominance.