News: 2026-02-12
February 12, 2026 · Generated 08:36 AM PT
Technical Intelligence Report: 2026-02-12
Executive Summary
- ROCm/MI300X Performance: AMD has released performance data for the verl RLHF framework on ROCm 7.0.0. The MI300X (8x) demonstrates up to 56% higher throughput in PPO training compared to the NVIDIA H100.
- Linux Driver Unification: AMD has unified video decode implementations between RadeonSI (Gallium3D) and RADV (Vulkan) in Mesa 26.1, enabling Vulkan Video support for older GPUs (Hawaii generation) and reducing code maintenance.
- Linux Scheduler Optimizations: Intel released v3 patches for “Cache Aware Scheduling” for the Linux kernel. While Intel-authored, benchmarks indicate significant performance gains for AMD EPYC Turin processors by reducing cache bouncing.
- Competitor Expansion: NVIDIA is aggressively expanding Sovereign AI initiatives in Latin America, securing a cloud partnership with Claro in Brazil amidst a $4B government AI investment plan.
🤖 ROCm Updates & Software
[2026-02-12] Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0
Source: ROCm Tech Blog
Key takeaway relevant to AMD:
- Performance Win: MI300X outperforms NVIDIA H100 in both PPO and GRPO RLHF training throughput (up to 56% faster).
- Software Maturity: Validates ROCm 7.0.0 and vLLM 0.11.0.dev stability for complex reinforcement learning workflows.
- Ease of Deployment: AMD is providing prebuilt Docker images (rocm/verl) to streamline adoption.
Summary:
- AMD integrated the verl (Volcano Engine Reinforcement Learning) framework with ROCm 7.0.0.
- The update supports single-node and multi-node training using Ray and vLLM.
- Benchmarks show leadership performance on AMD Instinct MI300X versus H100.
Details:
- Software Stack: verl 0.6.0, ROCm 7.0.0, vLLM 0.11.0.dev.
- Hardware Setup: 8x AMD Instinct MI300X (192 GB HBM3) vs. 8x NVIDIA H100.
- PPO (Proximal Policy Optimization) Results:
  - Deepseek-llm-7b-chat: MI300X achieved 1,428.64 tokens/GPU/s vs. H100’s 910.42 (+56%).
  - Qwen2-7B-Instruct: MI300X achieved 1,514.46 tokens/GPU/s vs. H100’s 1,109.38 (+36%).
  - Convergence accuracy remained comparable between platforms (~69% for Deepseek, ~85% for Qwen2).
- GRPO (Group Relative Policy Optimization) Results:
  - Deepseek-llm-7b-chat: MI300X achieved 2,781.74 tokens/GPU/s vs. H100’s 2,480.54 (+12%).
  - Qwen2-7B-Instruct: MI300X achieved 2,739.34 tokens/GPU/s vs. H100’s 2,467.10 (+11%).
- Implementation:
  - Uses HIP_VISIBLE_DEVICES for GPU isolation (main difference from CUDA).
  - Supports Slurm for multi-node scaling.
  - Docker image available: rocm/verl:verl-0.6.0.amd0_rocm7.0_vllm0.11.0.dev.
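The uplift percentages quoted in the PPO and GRPO results above can be re-derived from the reported tokens/GPU/s figures. The following is a minimal sketch of that arithmetic; the throughput numbers are copied from this report, while the names RESULTS, uplift_percent, and uplift_table are illustrative and not part of verl or ROCm.

```python
# Tokens/GPU/s as quoted in this report: (MI300X, H100).
RESULTS = {
    ("PPO", "Deepseek-llm-7b-chat"): (1428.64, 910.42),
    ("PPO", "Qwen2-7B-Instruct"): (1514.46, 1109.38),
    ("GRPO", "Deepseek-llm-7b-chat"): (2781.74, 2480.54),
    ("GRPO", "Qwen2-7B-Instruct"): (2739.34, 2467.10),
}


def uplift_percent(mi300x, h100):
    """Relative throughput advantage of MI300X over H100, in percent."""
    return (mi300x / h100 - 1.0) * 100.0


def uplift_table(results=RESULTS):
    """Map each (algorithm, model) pair to its uplift percentage."""
    return {key: uplift_percent(*pair) for key, pair in results.items()}


if __name__ == "__main__":
    for (algo, model), pct in uplift_table().items():
        print(f"{algo:5s} {model:22s} +{pct:.1f}%")
```

The ratios work out to roughly 56.9%, 36.5%, 12.1%, and 11.0%, so the whole-number percentages quoted above appear to be truncated rather than rounded.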
[2026-02-12] AMD Video Decode Now Unified Between RadeonSI & RADV Vulkan Video
Source: Phoronix (AMD Linux)
Key takeaway relevant to AMD:
- Driver Efficiency: Unified code path reduces maintenance overhead (-1.4k lines of code) in Mesa 26.1.
- Feature Expansion: Enables RADV Vulkan Video support for legacy hardware (Hawaii GPUs and older), which previously lacked this capability in Vulkan.
Summary:
- Merged into Mesa 26.1-devel today, unifying video decode logic for RadeonSI (Gallium3D) and RADV (Vulkan).
- Led by AMD engineer David Rosca.
Details:
- Scope: Covers Video Core Next (VCN), VCN JPEG, and Unified Video Decode (UVD) engines.
- Technical Change: Shifts ~6,000 lines of code to a shared interface, resulting in a net reduction of ~1,400 lines.
- Impact:
  - RADV driver now uses the same robust decode implementation as RadeonSI.
  - Immediate support for Vulkan Video decoding on older hardware generations (pre-Polaris/Vega).
  - Stable debut expected in Mesa 26.1 (Q2 2026).
🤼♂️ Market & Competitors
[2026-02-12] Intel Posts 2026 Update For Cache Aware Scheduling On Linux
Source: Phoronix (AMD Linux)
Key takeaway relevant to AMD:
- Cross-Vendor Benefit: Although authored by Intel, independent benchmarks show these patches provide “Big Potential” for AMD EPYC Turin processors.
- Performance Uplift: Improves performance in multi-cache domain architectures (Chiplet/CCD designs common in AMD EPYC) by reducing cache misses and bouncing.
Summary:
- Intel posted v3 patches for “Cache Aware Scheduling” for Linux.
- Targeting inclusion in the Linux kernel in 2026 (potentially post-Linux 7.0).
Details:
- Mechanism: Colocates tasks sharing data to the same cache domain to ensure better LLC (Last Level Cache) locality.
- New Optimization: The scheduler now skips cache-aware behavior after repeated load balancing failures to prevent performance regression.
- Relevance:
  - Critical for high-core-count server CPUs (Intel Xeon 6 Granite Rapids & AMD EPYC Turin).
  - Accounting for tasks preferring specific LLCs is now maintained in the lowest-level sched domain per CPU.
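The mechanism described above (co-locating tasks that share data in one cache domain, and giving up on that preference after repeated failures) can be illustrated with a toy placement model. This is a sketch under loud assumptions, not the kernel patch series' algorithm: the function place_tasks, its parameters, the "group" label standing in for shared data, and the max_failures threshold are all hypothetical.

```python
from collections import defaultdict


def place_tasks(tasks, num_domains, cpus_per_domain, max_failures=2):
    """Toy cache-aware placement; NOT the Linux scheduler's real logic.

    tasks: list of (task_name, group) pairs. Tasks in the same group are
    assumed to share data, so the sketch tries to keep them in one LLC
    domain. After max_failures consecutive failed attempts for a group,
    it stops trying and uses the least-loaded domain instead.
    """
    load = [0] * num_domains      # tasks placed in each domain so far
    preferred = {}                # group -> preferred domain index
    failures = defaultdict(int)   # group -> consecutive failed attempts
    placement = {}

    for name, group in tasks:
        least_loaded = min(range(num_domains), key=lambda d: load[d])
        target = least_loaded
        if failures[group] < max_failures:
            # First task of a group picks the preferred domain.
            want = preferred.setdefault(group, least_loaded)
            if load[want] < cpus_per_domain:
                target = want
                failures[group] = 0
            else:
                failures[group] += 1  # preferred domain is full
        load[target] += 1
        placement[name] = target
    return placement


if __name__ == "__main__":
    demo = [("a1", "A"), ("b1", "B"), ("a2", "A"), ("b2", "B")]
    print(place_tasks(demo, num_domains=2, cpus_per_domain=2))
```

In this toy model, two groups of two tasks each end up in separate domains, which is the locality the patches aim for on multi-CCD parts. The real scheduler works with load-balancing statistics and sched domains rather than explicit group labels.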
[2026-02-12] Code, Compute and Connection: Inside the Inaugural NVIDIA AI Day São Paulo
Source: NVIDIA Blog
Key takeaway relevant to AMD:
- Regional Dominance: NVIDIA is solidifying its “Sovereign AI” moat in Latin America, capitalizing on a $4B Brazilian government investment plan.
- Partnership Lock-in: Claro has been named the first NVIDIA Cloud Partner in Latin America, potentially locking regional infrastructure into the CUDA ecosystem.
Summary:
- NVIDIA held “AI Day” in São Paulo with 500+ attendees.
- Focus on Sovereign AI, AI Agents, and Open Models.
Details:
- Investment Context: The “Brazilian Artificial Intelligence Plan” (2024-2028) allocates ~$4B for infrastructure and R&D.
- Tech Stack Push: Promoting NVIDIA NeMo and NIM microservices for local startups and universities.
- Key Sectors: Biotechnology (genomics), Financial Services, and Telecommunications.
- Startup Ecosystem: NVIDIA Inception program is actively recruiting local startups to build on NVIDIA stacks.
[2026-02-12] SPARC & Alpha CPU Ports Still Seeing Activity In 2026 With Linux 7.0
Source: Phoronix (AMD Linux)
Key takeaway relevant to AMD:
- Kernel Context: Confirms the Linux 7.0 development cycle is active, which includes the critical x86_64 and AMD GPU updates mentioned in other reports.
Summary:
- Maintenance patches for legacy architectures (SPARC, Alpha, m68k) were merged into Linux 7.0.
Details:
- Alpha: Fixes user-space corruption during memory compaction.
- SPARC: Header changes, fork/clone bug fixes, and clone3 support.
- m68k: Formatting string updates (vsprintf to vsnprintf) and NuBus driver fixes.