News Weekly: 2026-02-09–2026-02-15
🧠 AI & GPU Industry Weekly Recap: Feb 9–15, 2026
🔑 Key Highlights
- AMD Instinct MI300X outperforms NVIDIA H100 in RLHF training benchmarks using the verl 0.6.0 framework with ROCm 7.0, achieving up to 56% higher PPO throughput on DeepSeek-LLM-7B and 36% on Qwen2-7B-Instruct
- Cisco debuted the G300 ASIC at Cisco Live Amsterdam — a 102.4 Tb/sec switch chip with 1.6 Tb/sec ports targeting NVIDIA InfiniBand and NVSwitch in AI scale-up/scale-out networks
- AMD ROCm TheRock 7.11 released as the latest technology preview toward the ROCm 8.0 generation, while Canonical advances native ROCm packaging for Ubuntu 26.04 LTS
- AMD’s Peak Tops Limiter (PTL) feature for GFX 9.4.4 Instinct accelerators is entering Linux kernel review, enabling hardware-level compute throughput capping for power/thermal management
- NVIDIA GTC 2026 (March 16–19, San Jose) is shaping up as the major industry inflection point, with Jensen Huang’s keynote expected to deliver hardware and platform announcements
🤖 AI & Machine Learning
verl RLHF on AMD Instinct MI300X Beats H100
AMD’s ROCm team published detailed benchmarks showing verl 0.6.0 with ROCm 7.0.0 and vLLM 0.11.0.dev running RLHF workloads on 8x MI300X GPUs outperforming equivalent NVIDIA H100 setups:
| Algorithm | Model | MI300X Throughput | H100 Throughput | Delta |
|---|---|---|---|---|
| PPO | deepseek-llm-7b-chat | 1,428 tok/GPU/s | 910 tok/GPU/s | +56% |
| PPO | Qwen2-7B-Instruct | 1,514 tok/GPU/s | 1,109 tok/GPU/s | +36% |
| GRPO | deepseek-llm-7b-chat | 2,781 tok/GPU/s | 2,480 tok/GPU/s | +12% |
| GRPO | Qwen2-7B-Instruct | 2,739 tok/GPU/s | 2,467 tok/GPU/s | +11% |
The verl framework leverages FSDP, Megatron, vLLM, SGLang, and Ray for hybrid orchestration. AMD provides prebuilt Docker images (rocm/verl:verl-0.6.0.amd0_rocm7.0_vllm0.11.0.dev) for streamlined deployment on Slurm-managed multi-node clusters.
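For readers who want to try this, a minimal single-node launcher against the published image might look like the sketch below. The docker flags are the usual ones for ROCm containers; the verl entry point and Hydra-style override names are assumptions based on verl's public config schema, and the model path and GPU counts are placeholders, so check AMD's own scripts before relying on any of it.

```python
# Hypothetical single-node launcher for AMD's prebuilt verl image.
# The docker flags are standard for ROCm containers; the entry point and
# override key names follow verl's public config schema but should be
# verified against the verl 0.6.0 docs. Model path and GPU counts are
# illustrative placeholders.
import subprocess

IMAGE = "rocm/verl:verl-0.6.0.amd0_rocm7.0_vllm0.11.0.dev"

cmd = [
    "docker", "run", "--rm",
    "--device=/dev/kfd", "--device=/dev/dri",  # expose ROCm GPU devices
    "--ipc=host", "--network=host",
    "--group-add", "video",
    IMAGE,
    "python3", "-m", "verl.trainer.main_ppo",
    "actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat",
    "trainer.n_gpus_per_node=8",
    "trainer.nnodes=1",
]
subprocess.run(cmd, check=True)
```

For multi-node runs, AMD's published Slurm scripts handle node orchestration on top of the same image.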
AMD Enterprise AI Solution Blueprints Launched
AMD published a catalog of 7 production-ready AI deployment blueprints built on AMD Inference Microservices (AIMs) and ROCm-powered Instinct GPUs, deployed via Helm charts on Kubernetes:
| Blueprint | Key Stack |
|---|---|
| AutoGen Studio | Microsoft AutoGen + Llama-3.3-70B @ FP8 |
| Continue.dev Coding Assistant | Code-server + VS Code + Qwen2.5-Coder-7B |
| LLM Chat | OpenWebUI |
| Financial Stock Intelligence (FSI) | LangChain + yfinance + Gradio |
| Agentic Translation | LangChain + Streamlit |
| Talk to Your Documents | ChromaDB + Infinity Embeddings + Gradio |
| Agentic Testing | Pydantic AI + Playwright MCP |
The modular architecture uses composable Helm “application charts” (aimchart-llm, aimchart-embedding, aimchart-chromadb) enabling consistent GPU scheduling and ROCm compatibility across all blueprints — a meaningful step toward lowering enterprise AI deployment friction on AMD hardware.
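Assuming the AIM-backed LLM services expose an OpenAI-compatible route (typical for vLLM-based inference microservices, though the blueprint documentation should be confirmed), a minimal in-cluster smoke test could look like the following; the service hostname, port, and model id are placeholders:

```python
# Minimal smoke test against a blueprint's LLM service, assuming it exposes
# an OpenAI-compatible /v1/chat/completions route. Host, port, and model id
# below are placeholders; confirm against the blueprint's chart docs.
import json
import urllib.request

URL = "http://aim-llm.blueprints.svc.cluster.local:8000/v1/chat/completions"

payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",  # placeholder model id
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```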
NVIDIA GTC 2026 — The Industry Calendar Event
NVIDIA GTC 2026 (March 16–19, San Jose’s SAP Center) is set to be the year’s defining AI hardware event. Jensen Huang’s keynote will anchor the conference, with tracks covering physical AI/robotics, agentic AI, and AI factory infrastructure. Certification workshops and hackathons targeting NVIDIA platform developers are part of the agenda. The event will be livestreamed globally.
⚡ GPU & Hardware
AMD Peak Tops Limiter (PTL) Coming to AMDGPU/AMDKFD
AMD is introducing the Peak Tops Limiter (PTL) for GFX 9.4.4 IP (current-gen Instinct accelerators). The hardware feature allows operators to cap peak TOPS delivery per GPU, dynamically adjusting engine frequency to stay within defined power/thermal budgets. Key details:
- Exposed via the /sys/class/drm/cardX/device/ptl/ sysfs interface (root access)
- Controls: ptl_enable, ptl_supported_formats, ptl_format
- Integrated with the AMD SMI library and ROCm APIs for programmatic control
- New IOCTL option for user-space profiling control
- amdgpu.ptl= kernel module parameter for boot-time configuration
- Currently under patch review; will not land in the AMDGPU v7.0 cycle
This is a meaningful datacenter operations feature, giving cloud operators fine-grained control over workload-to-power ratios on Instinct clusters.
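A minimal sketch of driving the interface from user space, assuming the file names listed above and a simple 0/1 enable semantic (both taken from the in-review patches and subject to change before merge; root is required):

```python
# Sketch of reading/toggling PTL through the sysfs files named in the
# in-review patches. File semantics and accepted values may change before
# the feature lands; requires root, and card0 is illustrative.
from pathlib import Path

PTL_DIR = Path("/sys/class/drm/card0/device/ptl")

def ptl_status() -> dict:
    """Read whichever PTL control files are present."""
    names = ("ptl_enable", "ptl_supported_formats", "ptl_format")
    return {n: (PTL_DIR / n).read_text().strip()
            for n in names if (PTL_DIR / n).exists()}

def set_ptl_enable(enabled: bool) -> None:
    """Write the enable knob (assumes a 0/1 interface; verify in patches)."""
    (PTL_DIR / "ptl_enable").write_text("1" if enabled else "0")

if __name__ == "__main__":
    print(ptl_status())
```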
Cisco G300 ASIC: 102.4 Tb/sec Switch Targeting InfiniBand
Cisco unveiled the G300 Silicon One ASIC at Cisco Live Amsterdam, doubling the G200’s bandwidth and targeting NVIDIA InfiniBand and NVSwitch in AI backend networks:
- 102.4 Tb/sec aggregate bandwidth; 512 SerDes at 200 Gb/sec each (post-encoding)
- Supports 1.6 Tb/sec ports (64-port config), 800 Gb/sec (128-port), or high-radix lower-speed configurations
- 252 MB unified shared buffer — at least 2x G200, improving congestion handling
- Multichip design, likely TSMC 3nm (packet engine) + 4nm (SerDes chiplets)
- Lidless chip design for improved thermal management
- Deployed in: Nexus N9364-SG3 (air-cooled, NX-OS), Cisco 8133 (SONiC), Nexus N9363-SG2 (liquid-cooled, OCP ORv3N)
- Native LPO (Linear Pluggable Optics) support delivers ~30% total switch power savings vs. DSP-based optics
- Claims 33% higher network utilization and 28% faster AI job completion time vs. G200
- Also expanding P200 router lineup with commercial Nexus gear and line cards
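The port configurations fall directly out of the aggregate bandwidth; a quick sanity check of the radix math from the figures above:

```python
# Sanity-check the G300 radix options: port count x per-port rate should
# reproduce the 102.4 Tb/s aggregate. All figures are from Cisco's announcement.
AGGREGATE_TBPS = 102.4

configs = {1.6: 64, 0.8: 128}  # per-port rate in Tb/s -> port count
for speed, ports in configs.items():
    total = ports * speed
    assert abs(total - AGGREGATE_TBPS) < 1e-6
    print(f"{ports:3d} ports x {speed} Tb/s = {total:.1f} Tb/s")

# SerDes view: 512 lanes at 200 Gb/s (post-encoding) gives the same total.
assert abs(512 * 0.2 - AGGREGATE_TBPS) < 1e-6
```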
Intel Xeon 6780E Sierra Forest vs. AMD EPYC 9965 Turin Dense
Fresh Linux 6.18 LTS benchmarks pit the Intel Xeon 6780E (2x, 144 cores/socket, 330W TDP, DDR5-6400 8-channel) against the AMD EPYC 9965 (2x, 192 cores/384 threads/socket, 500W TDP, DDR5-6400 12-channel, 384MB L3, AVX-512) on Ubuntu 25.10 + GCC 15.2. Intel’s Sierra Forest has benefited from ~14% performance improvements post-launch through Linux software optimizations, but EPYC 9965’s core density, memory bandwidth advantage (12 vs. 8 channels), and AVX-512 support continue to be key differentiators in HPC/AI workloads.
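The channel-count gap translates into a sizable difference in theoretical peak DRAM bandwidth per socket (a simple upper bound that ignores real-world efficiency):

```python
# Theoretical peak DRAM bandwidth per socket: channels x MT/s x 8 bytes/transfer.
def peak_gbs(channels: int, mts: int) -> float:
    return channels * mts * 8 / 1000  # GB/s

print("EPYC 9965 :", peak_gbs(12, 6400), "GB/s")  # 614.4
print("Xeon 6780E:", peak_gbs(8, 6400), "GB/s")   # 409.6
```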
MSI Afterburner 4.6.7 Beta: 16-Pin Connector Safety Feature
MSI released Afterburner 4.6.7 beta adding GPU Safeguard+ monitoring for the 12V-2x6 (12VHPWR) connector on RTX 5080/5090-class cards. The feature works exclusively with MPG Ai1300TS and MPG Ai1600TS PSUs:
- Real-time pop-up warnings on connector fault detection
- Automatic power limit reduction to 75% via PSU telemetry through the PSU.dll plugin
- Covers both NVIDIA and AMD GPUs with 16-pin connectors
- Also adds overclocking support for RTX 5090 LIGHTNING series
GNU Linux-Libre 6.19 and Kernel Open-Source Compliance
GNU Linux-libre 6.19-gnu has been released, stripping firmware blobs from Intel Xe graphics, Intel IWLWIFI WiFi, and, critically, the NVIDIA Nova-Core kernel graphics driver — confirming that Nova-Core remains firmware-dependent and outside free software standards.
GNU Binutils 2.46: AMD Zen 6 Assembler Support Lands
GNU Binutils 2.46 has been released with initial AMD Zen 6 (znver6) support in the GNU Assembler (Gas), joining the znver6 enablement already underway in GCC and LLVM/Clang. Key caveat: there is no tuned instruction scheduling model yet. Additional highlights include SFrame Version 3, Armv9.7 instructions, and removal of legacy NaCl and Solaris/PowerPC support.
🏭 Industry & Market
Cisco Challenges NVIDIA’s Networking Monopoly
The G300 announcement is Cisco’s most aggressive move yet into AI networking. With NVIDIA dominating both the GPU and InfiniBand/NVSwitch networking layers, Cisco’s pitch is a multi-vendor, Ethernet-native alternative that preserves security, microsegmentation, and supply chain diversity — all absent from InfiniBand. The G300 consolidates what previously required six G200 switches into a single unit, fundamentally changing rack-level economics for AI clusters. With hyperscalers facing gigawatt-scale infrastructure buildouts, the ~30% power savings from LPO optics is commercially significant.
AMD vs. NVIDIA in Enterprise AI Software
AMD’s dual moves this week — Solution Blueprints for enterprise AI deployment and verl RLHF benchmarks — represent a coordinated strategy to close the software/ecosystem gap with NVIDIA CUDA. By publishing Helm-based reference architectures for agentic AI, RAG, and RLHF, AMD is lowering the barrier to entry for enterprise teams evaluating Instinct GPUs. The verl throughput numbers, if reproducible in production, are commercially competitive with H100-based clusters.
Ubuntu 26.04 LTS ROCm Integration: Strategic Milestone
Canonical’s move to natively package ROCm in Ubuntu 26.04 LTS (April 2026) removes a longstanding friction point for AMD GPU compute adoption. The 30+ ROCm packages being upstreamed (including rocblas, MIOpen, hipblaslt, rccl, roctracer) mean enterprise Linux users will no longer need third-party AMD repositories — a distribution-level legitimization comparable to what CUDA enjoys on Ubuntu today.
🛠️ Developer Ecosystem
ROCm TheRock 7.11 Released
AMD released ROCm TheRock 7.11 as the latest technology preview build on the path to ROCm 8.0. While no formal changelog has been published, Git activity indicates improvements across hardware support and optimization layers. TheRock build system (introduced with ROCm 7.9 in October 2025) continues to serve as the modern build infrastructure for the ROCm stack. Stable ROCm versions remain in the 7.0–7.8 range; 7.9+ releases are preview-only.
verl 0.6.0 + ROCm 7.0: RLHF Framework Gains AMD-Native Support
AMD’s ROCm team has fully validated verl 0.6.0 with ROCm 7.0.0 and vLLM 0.11.0.dev, publishing Docker images, Slurm multi-node training scripts, and detailed PPO/GRPO configuration examples. Supported RLHF algorithms include PPO, GRPO, ReMax, REINFORCE++, RLOO, and PRIME. This is a notable expansion of AMD’s post-training ecosystem, particularly as RLHF becomes central to reasoning model development (o1/R1-class models).
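In verl, most of these algorithms are selected through configuration rather than separate trainers. The sketch below shows illustrative override strings; the algorithm.adv_estimator key and its values follow verl's documented config schema, but exact names and the group-size setting should be confirmed against the 0.6.0 release:

```python
# Illustrative Hydra-style overrides for picking an RLHF algorithm in verl.
# Key names and values follow verl's documented config schema; confirm them
# against the verl 0.6.0 release before use. The GRPO group size is a
# placeholder value.
ALGO_OVERRIDES = {
    "ppo":         ["algorithm.adv_estimator=gae"],
    "grpo":        ["algorithm.adv_estimator=grpo",
                    "actor_rollout_ref.rollout.n=8"],  # group size, illustrative
    "remax":       ["algorithm.adv_estimator=remax"],
    "rloo":        ["algorithm.adv_estimator=rloo"],
    "reinforce++": ["algorithm.adv_estimator=reinforce_plus_plus"],
}

def overrides_for(algo: str) -> list:
    """Return the extra CLI overrides for the chosen algorithm."""
    return ALGO_OVERRIDES[algo]

print(overrides_for("grpo"))
```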
AMD Enterprise AI Blueprints: Kubernetes-Native AI Deployment
AMD’s Solution Blueprint catalog introduces a Helm-based, OCI registry-deployable architecture for production AI on Instinct GPUs. The composable chart system (aimchart-llm, aimchart-embedding, aimchart-chromadb) provides a developer toolkit analogous to NVIDIA’s NIM containerized inference approach — signaling AMD’s intent to compete on the full-stack enterprise AI deployment experience, not just raw compute.
GNU Binutils 2.46 Toolchain Updates
For developers targeting AMD’s next-generation CPU architecture, GNU Binutils 2.46 delivers initial Zen 6 (znver6) assembler support. Combined with GCC and LLVM/Clang znver6 support already in progress, the upstream toolchain foundation for AMD Zen 6 is taking shape — though production-optimized instruction scheduling models are still pending.
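A quick way to see whether an installed compiler already has the flag wired up is to probe it directly; this sketch only tests whether -march=znver6 is accepted and says nothing about scheduling quality:

```python
# Probe whether the local toolchain accepts -march=znver6 yet. Compiler
# support is still landing upstream, so a failure simply means the installed
# GCC/Clang predates the znver6 enablement.
import shutil
import subprocess

def accepts_znver6(compiler: str) -> bool:
    if shutil.which(compiler) is None:
        return False
    probe = subprocess.run(
        [compiler, "-march=znver6", "-x", "c", "-c", "-o", "/dev/null", "-"],
        input=b"int main(void){return 0;}",
        capture_output=True,
    )
    return probe.returncode == 0

for cc in ("gcc", "clang"):
    print(cc, "znver6:", accepts_znver6(cc))
```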
Cisco G300 Programmable with P4
The G300 continues Silicon One’s tradition of full P4 programmability, allowing network engineers to implement custom forwarding logic and new features without hardware changes — an important capability for AI cluster operators who need to rapidly iterate on network behavior as GPU/XPU generations evolve.
📊 Key Takeaways
AMD had a remarkably dense week across the full hardware and software stack: the verl RLHF benchmarks showing MI300X outperforming H100 by up to 56% in PPO throughput, combined with the Kubernetes-native Solution Blueprints launch and Ubuntu 26.04 ROCm integration progress, represent the most coherent enterprise AI ecosystem push AMD has made to date — moving well beyond raw TOPS comparisons into deployment-ready software parity with NVIDIA. Cisco’s G300 ASIC is the most credible direct challenge yet to NVIDIA’s networking dominance, offering a single-chip alternative to six G200 units with 30% power savings through LPO optics, directly targeting the InfiniBand and NVSwitch revenue streams that have been among NVIDIA’s most profitable AI infrastructure businesses. With NVIDIA GTC 2026 arriving March 16, the competitive intensity across compute, networking, and software is setting up Q1 2026 as one of the most consequential quarters the AI infrastructure market has seen.