News Weekly: 2026-02-09–2026-02-15
🧠 AI & GPU Industry Weekly Recap: Feb 9–15, 2026
🔑 Key Highlights
- AMD Instinct MI300X outperforms NVIDIA H100 in RLHF training benchmarks using the verl 0.6.0 framework with ROCm 7.0, achieving up to 56% higher PPO throughput on DeepSeek-LLM-7B and 36% on Qwen2-7B-Instruct
- Cisco debuted the G300 ASIC at Cisco Live Amsterdam — a 102.4 Tb/sec switch chip with 1.6 Tb/sec ports targeting NVIDIA InfiniBand and NVSwitch in AI scale-up/scale-out networks
- AMD ROCm TheRock 7.11 released as the latest technology preview toward the ROCm 8.0 generation, while Canonical advances native ROCm packaging for Ubuntu 26.04 LTS
- AMD’s Peak Tops Limiter (PTL) feature for GFX 9.4.4 Instinct accelerators is entering Linux kernel review, enabling hardware-level compute throughput capping for power/thermal management
- NVIDIA GTC 2026 (March 16–19, San Jose) is shaping up as the major industry inflection point, with Jensen Huang’s keynote expected to deliver hardware and platform announcements
🤖 AI & Machine Learning
verl RLHF on AMD Instinct MI300X Beats H100
AMD’s ROCm team published detailed benchmarks showing verl 0.6.0 with ROCm 7.0.0 and vLLM 0.11.0.dev running RLHF workloads on 8x MI300X GPUs outperforming equivalent NVIDIA H100 setups:
| Algorithm | Model | MI300X Throughput | H100 Throughput | Delta |
|---|---|---|---|---|
| PPO | deepseek-llm-7b-chat | 1,428 tok/GPU/s | 910 tok/GPU/s | +56% |
| PPO | Qwen2-7B-Instruct | 1,514 tok/GPU/s | 1,109 tok/GPU/s | +36% |
| GRPO | deepseek-llm-7b-chat | 2,781 tok/GPU/s | 2,480 tok/GPU/s | +12% |
| GRPO | Qwen2-7B-Instruct | 2,739 tok/GPU/s | 2,467 tok/GPU/s | +11% |
The verl framework leverages FSDP, Megatron, vLLM, SGLang, and Ray for hybrid orchestration. AMD provides prebuilt Docker images (rocm/verl:verl-0.6.0.amd0_rocm7.0_vllm0.11.0.dev) for streamlined deployment on Slurm-managed multi-node clusters.
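For readers who want to try this, a minimal single-node launcher against the published image might look like the sketch below. The docker flags are the usual ones for ROCm containers; the verl entry point and Hydra-style override names are assumptions based on verl's public config schema, and the model path and GPU counts are placeholders, so check AMD's own scripts before relying on any of it.

```python
# Hypothetical single-node launcher for AMD's prebuilt verl image.
# The docker flags are standard for ROCm containers; the entry point and
# override key names follow verl's public config schema but should be
# verified against the verl 0.6.0 docs. Model path and GPU counts are
# illustrative placeholders.
import subprocess

IMAGE = "rocm/verl:verl-0.6.0.amd0_rocm7.0_vllm0.11.0.dev"

cmd = [
    "docker", "run", "--rm",
    "--device=/dev/kfd", "--device=/dev/dri",  # expose ROCm GPU devices
    "--ipc=host", "--network=host",
    "--group-add", "video",
    IMAGE,
    "python3", "-m", "verl.trainer.main_ppo",
    "actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat",
    "trainer.n_gpus_per_node=8",
    "trainer.nnodes=1",
]
subprocess.run(cmd, check=True)
```

For multi-node runs, AMD's published Slurm scripts handle node orchestration on top of the same image.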
AMD Enterprise AI Solution Blueprints Launched
AMD published a catalog of 7 production-ready AI deployment blueprints built on AMD Inference Microservices (AIMs) and ROCm-powered Instinct GPUs, deployed via Helm charts on Kubernetes:
| Blueprint | Key Stack |
|---|---|
| AutoGen Studio | Microsoft AutoGen + Llama-3.3-70B @ FP8 |
| Continue.dev Coding Assistant | Code-server + VS Code + Qwen2.5-Coder-7B |
| LLM Chat | OpenWebUI |
| Financial Stock Intelligence (FSI) | LangChain + yfinance + Gradio |
| Agentic Translation | LangChain + Streamlit |
| Talk to Your Documents | ChromaDB + Infinity Embeddings + Gradio |
| Agentic Testing | Pydantic AI + Playwright MCP |
The modular architecture uses composable Helm “application charts” (aimchart-llm, aimchart-embedding, aimchart-chromadb) enabling consistent GPU scheduling and ROCm compatibility across all blueprints — a meaningful step toward lowering enterprise AI deployment friction on AMD hardware.
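Assuming the AIM-backed LLM services expose an OpenAI-compatible route (typical for vLLM-based inference microservices, though the blueprint documentation should be confirmed), a minimal in-cluster smoke test could look like the following; the service hostname, port, and model id are placeholders:

```python
# Minimal smoke test against a blueprint's LLM service, assuming it exposes
# an OpenAI-compatible /v1/chat/completions route. Host, port, and model id
# below are placeholders; confirm against the blueprint's chart docs.
import json
import urllib.request

URL = "http://aim-llm.blueprints.svc.cluster.local:8000/v1/chat/completions"

payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",  # placeholder model id
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```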
NVIDIA GTC 2026 — The Industry Calendar Event
NVIDIA GTC 2026 (March 16–19, San Jose’s SAP Center) is set to be the year’s defining AI hardware event. Jensen Huang’s keynote will anchor the conference, with tracks covering physical AI/robotics, agentic AI, and AI factory infrastructure. Certification workshops and hackathons targeting NVIDIA platform developers are part of the agenda. The event will be livestreamed globally.
⚡ GPU & Hardware
AMD Peak Tops Limiter (PTL) Coming to AMDGPU/AMDKFD
AMD is introducing the Peak Tops Limiter (PTL) for GFX 9.4.4 IP (current-gen Instinct accelerators). The hardware feature allows operators to cap peak TOPS delivery per GPU, dynamically adjusting engine frequency to stay within defined power/thermal budgets. Key details:
- Exposed via the /sys/class/drm/cardX/device/ptl/ sysfs interface (root access)
- Controls: ptl_enable, ptl_supported_formats, ptl_format
- Integrated with the AMD SMI library and ROCm APIs for programmatic control
- New IOCTL option for user-space profiling control
- amdgpu.ptl= kernel module parameter for boot-time configuration
- Currently under patch review; will not land in the AMDGPU v7.0 cycle
This is a meaningful datacenter operations feature, giving cloud operators fine-grained control over workload-to-power ratios on Instinct clusters.
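A minimal sketch of driving the interface from user space, assuming the file names listed above and a simple 0/1 enable semantic (both taken from the in-review patches and subject to change before merge; root is required):

```python
# Sketch of reading/toggling PTL through the sysfs files named in the
# in-review patches. File semantics and accepted values may change before
# the feature lands; requires root, and card0 is illustrative.
from pathlib import Path

PTL_DIR = Path("/sys/class/drm/card0/device/ptl")

def ptl_status() -> dict:
    """Read whichever PTL control files are present."""
    names = ("ptl_enable", "ptl_supported_formats", "ptl_format")
    return {n: (PTL_DIR / n).read_text().strip()
            for n in names if (PTL_DIR / n).exists()}

def set_ptl_enable(enabled: bool) -> None:
    """Write the enable knob (assumes a 0/1 interface; verify in patches)."""
    (PTL_DIR / "ptl_enable").write_text("1" if enabled else "0")

if __name__ == "__main__":
    print(ptl_status())
```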
Cisco G300 ASIC: 102.4 Tb/sec Switch Targeting InfiniBand
Cisco unveiled the G300 Silicon One ASIC at Cisco Live Amsterdam, doubling the G200’s bandwidth and targeting NVIDIA InfiniBand and NVSwitch in AI backend networks:
- 102.4 Tb/sec aggregate bandwidth; 512 SerDes at 200 Gb/sec each (post-encoding)
- Supports 1.6 Tb/sec ports (64-port config), 800 Gb/sec (128-port), or high-radix lower-speed configurations
- 252 MB unified shared buffer — at least 2x G200, improving congestion handling
- Multichip design, likely TSMC 3nm (packet engine) + 4nm (SerDes chiplets)
- Lidless chip design for improved thermal management
- Deployed in: Nexus N9364-SG3 (air-cooled, NX-OS), Cisco 8133 (SONiC), Nexus N9363-SG2 (liquid-cooled, OCP ORv3N)
- Native LPO (Linear Pluggable Optics) support delivers ~30% total switch power savings vs. DSP-based optics
- Claims 33% higher network utilization and 28% faster AI job completion time vs. G200
- Also expanding P200 router lineup with commercial Nexus gear and line cards
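The port configurations fall directly out of the aggregate bandwidth; a quick sanity check of the radix math from the figures above:

```python
# Sanity-check the G300 radix options: port count x per-port rate should
# reproduce the 102.4 Tb/s aggregate. All figures are from Cisco's announcement.
AGGREGATE_TBPS = 102.4

configs = {1.6: 64, 0.8: 128}  # per-port rate in Tb/s -> port count
for speed, ports in configs.items():
    total = ports * speed
    assert abs(total - AGGREGATE_TBPS) < 1e-6
    print(f"{ports:3d} ports x {speed} Tb/s = {total:.1f} Tb/s")

# SerDes view: 512 lanes at 200 Gb/s (post-encoding) gives the same total.
assert abs(512 * 0.2 - AGGREGATE_TBPS) < 1e-6
```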
Intel Xeon 6780E Sierra Forest vs. AMD EPYC 9965 Turin Dense
Fresh Linux 6.18 LTS benchmarks pit the Intel Xeon 6780E (2x, 144 cores/socket, 330W TDP, DDR5-6400 8-channel) against the AMD EPYC 9965 (2x, 192 cores/384 threads/socket, 500W TDP, DDR5-6400 12-channel, 384MB L3, AVX-512) on Ubuntu 25.10 + GCC 15.2. Intel’s Sierra Forest has benefited from ~14% performance improvements post-launch through Linux software optimizations, but EPYC 9965’s core density, memory bandwidth advantage (12 vs. 8 channels), and AVX-512 support continue to be key differentiators in HPC/AI workloads.
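The channel-count gap translates into a sizable difference in theoretical peak DRAM bandwidth per socket (a simple upper bound that ignores real-world efficiency):

```python
# Theoretical peak DRAM bandwidth per socket: channels x MT/s x 8 bytes/transfer.
def peak_gbs(channels: int, mts: int) -> float:
    return channels * mts * 8 / 1000  # GB/s

print("EPYC 9965 :", peak_gbs(12, 6400), "GB/s")  # 614.4
print("Xeon 6780E:", peak_gbs(8, 6400), "GB/s")   # 409.6
```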
MSI Afterburner 4.6.7 Beta: 16-Pin Connector Safety Feature
MSI released Afterburner 4.6.7 beta adding GPU Safeguard+ monitoring for the 12V-2x6 (12VHPWR) connector on RTX 5080/5090-class cards. The feature works exclusively with MPG Ai1300TS and MPG Ai1600TS PSUs:
- Real-time pop-up warnings on connector fault detection
- Automatic power limit reduction to 75% via PSU telemetry through the PSU.dll plugin
- Covers both NVIDIA and AMD GPUs with 16-pin connectors
- Also adds overclocking support for RTX 5090 LIGHTNING series
GNU Linux-Libre 6.19 and Kernel Open-Source Compliance
GNU Linux-libre 6.19-gnu has been released, stripping firmware blobs from Intel Xe graphics, Intel IWLWIFI WiFi, and, critically, the NVIDIA Nova-Core kernel graphics driver — confirming that Nova-Core remains firmware-dependent and outside free software standards.
GNU Binutils 2.46: AMD Zen 6 Assembler Support Lands
GNU Binutils 2.46 has been released with initial AMD Zen 6 (znver6) support in the GNU Assembler (Gas), joining the znver6 enablement already underway in GCC and LLVM/Clang. Key caveat: there is no tuned instruction scheduling model yet. Additional highlights include SFrame Version 3, Armv9.7 instructions, and removal of legacy NaCl and Solaris/PowerPC support.
🏭 Industry & Market
Cisco Challenges NVIDIA’s Networking Monopoly
The G300 announcement is Cisco’s most aggressive move yet into AI networking. With NVIDIA dominating both the GPU and InfiniBand/NVSwitch networking layers, Cisco’s pitch is a multi-vendor, Ethernet-native alternative that preserves security, microsegmentation, and supply chain diversity — all absent from InfiniBand. The G300 consolidates what previously required six G200 switches into a single unit, fundamentally changing rack-level economics for AI clusters. With hyperscalers facing gigawatt-scale infrastructure buildouts, the ~30% power savings from LPO optics is commercially significant.
AMD vs. NVIDIA in Enterprise AI Software
AMD’s dual moves this week — Solution Blueprints for enterprise AI deployment and verl RLHF benchmarks — represent a coordinated strategy to close the software/ecosystem gap with NVIDIA CUDA. By publishing Helm-based reference architectures for agentic AI, RAG, and RLHF, AMD is lowering the barrier to entry for enterprise teams evaluating Instinct GPUs. The verl throughput numbers, if reproducible in production, are commercially competitive with H100-based clusters.
Ubuntu 26.04 LTS ROCm Integration: Strategic Milestone
Canonical’s move to natively package ROCm in Ubuntu 26.04 LTS (April 2026) removes a longstanding friction point for AMD GPU compute adoption. The 30+ ROCm packages being upstreamed (including rocblas, MIOpen, hipblaslt, rccl, roctracer) mean enterprise Linux users will no longer need third-party AMD repositories — a distribution-level legitimization comparable to what CUDA enjoys on Ubuntu today.
🛠️ Developer Ecosystem
ROCm TheRock 7.11 Released
AMD released ROCm TheRock 7.11 as the latest technology preview build on the path to ROCm 8.0. While no formal changelog has been published, Git activity indicates improvements across hardware support and optimization layers. TheRock build system (introduced with ROCm 7.9 in October 2025) continues to serve as the modern build infrastructure for the ROCm stack. Stable ROCm versions remain in the 7.0–7.8 range; 7.9+ releases are preview-only.
verl 0.6.0 + ROCm 7.0: RLHF Framework Gains AMD-Native Support
AMD’s ROCm team has fully validated verl 0.6.0 with ROCm 7.0.0 and vLLM 0.11.0.dev, publishing Docker images, Slurm multi-node training scripts, and detailed PPO/GRPO configuration examples. Supported RLHF algorithms include PPO, GRPO, ReMax, REINFORCE++, RLOO, and PRIME. This is a notable expansion of AMD’s post-training ecosystem, particularly as RLHF becomes central to reasoning model development (o1/R1-class models).
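In verl, most of these algorithms are selected through configuration rather than separate trainers. The sketch below shows illustrative override strings; the algorithm.adv_estimator key and its values follow verl's documented config schema, but exact names and the group-size setting should be confirmed against the 0.6.0 release:

```python
# Illustrative Hydra-style overrides for picking an RLHF algorithm in verl.
# Key names and values follow verl's documented config schema; confirm them
# against the verl 0.6.0 release before use. The GRPO group size is a
# placeholder value.
ALGO_OVERRIDES = {
    "ppo":         ["algorithm.adv_estimator=gae"],
    "grpo":        ["algorithm.adv_estimator=grpo",
                    "actor_rollout_ref.rollout.n=8"],  # group size, illustrative
    "remax":       ["algorithm.adv_estimator=remax"],
    "rloo":        ["algorithm.adv_estimator=rloo"],
    "reinforce++": ["algorithm.adv_estimator=reinforce_plus_plus"],
}

def overrides_for(algo: str) -> list:
    """Return the extra CLI overrides for the chosen algorithm."""
    return ALGO_OVERRIDES[algo]

print(overrides_for("grpo"))
```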
AMD Enterprise AI Blueprints: Kubernetes-Native AI Deployment
AMD’s Solution Blueprint catalog introduces a Helm-based, OCI registry-deployable architecture for production AI on Instinct GPUs. The composable chart system (aimchart-llm, aimchart-embedding, aimchart-chromadb) provides a developer toolkit analogous to NVIDIA’s NIM containerized inference approach — signaling AMD’s intent to compete on the full-stack enterprise AI deployment experience, not just raw compute.
GNU Binutils 2.46 Toolchain Updates
For developers targeting AMD’s next-generation CPU architecture, GNU Binutils 2.46 delivers initial Zen 6 (znver6) assembler support. Combined with GCC and LLVM/Clang znver6 support already in progress, the upstream toolchain foundation for AMD Zen 6 is taking shape — though production-optimized instruction scheduling models are still pending.
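A quick way to see whether an installed compiler already has the flag wired up is to probe it directly; this sketch only tests whether -march=znver6 is accepted and says nothing about scheduling quality:

```python
# Probe whether the local toolchain accepts -march=znver6 yet. Compiler
# support is still landing upstream, so a failure simply means the installed
# GCC/Clang predates the znver6 enablement.
import shutil
import subprocess

def accepts_znver6(compiler: str) -> bool:
    if shutil.which(compiler) is None:
        return False
    probe = subprocess.run(
        [compiler, "-march=znver6", "-x", "c", "-c", "-o", "/dev/null", "-"],
        input=b"int main(void){return 0;}",
        capture_output=True,
    )
    return probe.returncode == 0

for cc in ("gcc", "clang"):
    print(cc, "znver6:", accepts_znver6(cc))
```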
Cisco G300 Programmable with P4
The G300 continues Silicon One’s tradition of full P4 programmability, allowing network engineers to implement custom forwarding logic and new features without hardware changes — an important capability for AI cluster operators who need to rapidly iterate on network behavior as GPU/XPU generations evolve.
📊 Key Takeaways
AMD had a remarkably dense week across the full hardware and software stack: the verl RLHF benchmarks showing MI300X outperforming H100 by up to 56% in PPO throughput, combined with the Kubernetes-native Solution Blueprints launch and Ubuntu 26.04 ROCm integration progress, represent the most coherent enterprise AI ecosystem push AMD has made to date — moving well beyond raw TOPS comparisons into deployment-ready software parity with NVIDIA. Cisco’s G300 ASIC is the most credible direct challenge yet to NVIDIA’s networking dominance, offering a single-chip alternative to six G200 units with 30% power savings through LPO optics, directly targeting the InfiniBand and NVSwitch revenue streams that have been among NVIDIA’s most profitable AI infrastructure businesses. With NVIDIA GTC 2026 arriving March 16, the competitive intensity across compute, networking, and software is setting up Q1 2026 as one of the most consequential quarters the AI infrastructure market has seen.