Technical Intelligence Report: 2026-02-12

Executive Summary

  • ROCm/MI300X Performance: AMD has published performance data for the verl RLHF framework on ROCm 7.0.0. An 8x MI300X system delivers up to 56% higher per-GPU throughput in PPO training than an 8x NVIDIA H100 system.
  • Linux Driver Unification: AMD has unified the video decode implementations of RadeonSI (Gallium3D) and RADV (Vulkan) in Mesa 26.1. This enables Vulkan Video support on older GPUs (Hawaii generation) and reduces the amount of code to maintain.
  • Linux Scheduler Optimizations: Intel posted v3 of its “Cache Aware Scheduling” patch series for the Linux kernel. Although the work is Intel-authored, benchmarks indicate significant performance gains on AMD EPYC Turin processors from reduced cache bouncing.
  • Competitor Expansion: NVIDIA is aggressively expanding its Sovereign AI initiatives in Latin America, securing a cloud partnership with Claro in Brazil amid a $4B government AI investment plan.

🤖 ROCm Updates & Software

[2026-02-12] Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

Source: ROCm Tech Blog

Key takeaway relevant to AMD:

  • Performance Win: MI300X outperforms NVIDIA H100 in both PPO and GRPO RLHF training throughput (up to 56% faster).
  • Software Maturity: Validates the stability of ROCm 7.0.0 and vLLM 0.11.0.dev for complex reinforcement learning workflows.
  • Ease of Deployment: AMD is providing prebuilt Docker images (rocm/verl) to streamline adoption.

Summary:

  • AMD integrated the verl (Volcano Engine Reinforcement Learning) framework with ROCm 7.0.0.
  • The update supports single-node and multi-node training using Ray and vLLM.
  • Benchmarks show higher training throughput on AMD Instinct MI300X than on H100.

Details:

  • Software Stack: verl 0.6.0, ROCm 7.0.0, vLLM 0.11.0.dev.
  • Hardware Setup: 8x AMD Instinct MI300X (192 GB HBM3) vs. 8x NVIDIA H100.
  • PPO (Proximal Policy Optimization) Results:
    • Deepseek-llm-7b-chat: MI300X achieved 1,428.64 tokens/GPU/s vs. H100’s 910.42 (+56%).
    • Qwen2-7B-Instruct: MI300X achieved 1,514.46 tokens/GPU/s vs. H100’s 1,109.38 (+36%).
    • Convergence accuracy remained comparable between platforms (~69% for Deepseek, ~85% for Qwen2).
  • GRPO (Group Relative Policy Optimization) Results:
    • Deepseek-llm-7b-chat: MI300X achieved 2,781.74 tokens/GPU/s vs. H100’s 2,480.54 (+12%).
    • Qwen2-7B-Instruct: MI300X achieved 2,739.34 tokens/GPU/s vs. H100’s 2,467.10 (+11%).
  • Implementation:
    • Uses HIP_VISIBLE_DEVICES instead of CUDA_VISIBLE_DEVICES for GPU isolation (the main difference from the CUDA setup).
    • Supports Slurm for multi-node scaling.
    • Docker image available: rocm/verl:verl-0.6.0.amd0_rocm7.0_vllm0.11.0.dev.
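The percentage gains above are relative throughput differences, (MI300X - H100) / H100. The short Python sketch below recomputes them from the reported tokens/GPU/s figures; the numbers are copied from this report, and the helper name `relative_gain` is our own. The exact ratios come out to roughly +56.9%, +36.5%, +12.1% and +11.0%, so the quoted whole-number percentages appear to be truncated rather than rounded (the Deepseek PPO gain would round to 57%).

```python
# Per-GPU throughput in tokens/GPU/s as (MI300X, H100), copied from this report.
RESULTS = {
    ("PPO", "Deepseek-llm-7b-chat"): (1428.64, 910.42),
    ("PPO", "Qwen2-7B-Instruct"): (1514.46, 1109.38),
    ("GRPO", "Deepseek-llm-7b-chat"): (2781.74, 2480.54),
    ("GRPO", "Qwen2-7B-Instruct"): (2739.34, 2467.10),
}


def relative_gain(mi300x: float, h100: float) -> float:
    """Percent by which MI300X throughput exceeds H100 throughput."""
    return (mi300x - h100) / h100 * 100.0


if __name__ == "__main__":
    # One line per (algorithm, model) pair, gain shown to one decimal place.
    for (algo, model), (amd, nvidia) in RESULTS.items():
        print(f"{algo:<5}{model:<22}+{relative_gain(amd, nvidia):.1f}%")
```

Running it as a script prints one line per algorithm/model pair, which makes it easy to spot how the headline "up to 56%" relates to the raw figures.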
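As a minimal sketch of the GPU-isolation difference, the helper below builds a child-process environment that restricts visibility to chosen GPUs, using HIP_VISIBLE_DEVICES on ROCm and CUDA_VISIBLE_DEVICES on CUDA. The function name `gpu_isolation_env` and its `backend` parameter are hypothetical; this is not part of verl's API, and how verl and Ray set these variables internally is not described in the source.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def gpu_isolation_env(
    gpu_ids: Iterable[int],
    backend: str = "rocm",
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment that limits a child process to gpu_ids.

    Hypothetical helper: ROCm/HIP reads HIP_VISIBLE_DEVICES, CUDA reads
    CUDA_VISIBLE_DEVICES. The caller's environment is never mutated.
    """
    if backend == "rocm":
        var = "HIP_VISIBLE_DEVICES"
    elif backend == "cuda":
        var = "CUDA_VISIBLE_DEVICES"
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    # Copy the base environment (defaults to the current process environment).
    env = dict(os.environ if base_env is None else base_env)
    env[var] = ",".join(str(i) for i in gpu_ids)
    return env


if __name__ == "__main__":
    print(gpu_isolation_env([0, 1, 2, 3], backend="rocm", base_env={}))
```

The returned dict can be passed as `env=` to `subprocess.run`, for example to launch one worker per GPU subset without touching the parent process environment.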

[2026-02-12] AMD Video Decode Now Unified Between RadeonSI & RADV Vulkan Video

Source: Phoronix (AMD Linux)

Key takeaway relevant to AMD:

  • Driver Efficiency: The unified code path in Mesa 26.1 reduces maintenance overhead (a net reduction of ~1.4k lines of code).
  • Feature Expansion: Enables RADV Vulkan Video support for legacy hardware (Hawaii GPUs and older), which previously lacked this capability in Vulkan.

Summary:

  • Merged into Mesa 26.1-devel today, unifying video decode logic for RadeonSI (Gallium3D) and RADV (Vulkan).
  • Led by AMD engineer David Rosca.

Details:

  • Scope: Covers Video Core Next (VCN), VCN JPEG, and Unified Video Decode (UVD) engines.
  • Technical Change: Shifts ~6,000 lines of code to a shared interface, resulting in a net reduction of ~1,400 lines.
  • Impact:
    • The RADV driver now uses the same robust decode implementation as RadeonSI.
    • Vulkan Video decode support extends to older hardware generations (pre-Polaris/Vega).
    • Stable debut expected in Mesa 26.1 (Q2 2026).
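The unification follows a familiar pattern: one decode core with thin driver-specific front ends. The Python sketch below is a deliberately simplified model of that pattern, not Mesa code (Mesa is written in C); every class and method name here is hypothetical. Because both front ends delegate to the same core, an engine the core already handles (such as UVD) becomes reachable from the Vulkan path without a second implementation, which is how a shared implementation can bring decode support to older GPUs in RADV.

```python
from typing import Dict


class SharedVideoDecoder:
    """Hypothetical shared decode core used by both driver front ends."""

    ENGINES = ("UVD", "VCN", "VCN JPEG")  # engine families named in the report

    def __init__(self, engine: str) -> None:
        if engine not in self.ENGINES:
            raise ValueError(f"unsupported engine: {engine!r}")
        self.engine = engine

    def build_decode_job(self, codec: str, bitstream: bytes) -> Dict[str, object]:
        # The single place where engine-specific decode work is described;
        # a fix made here reaches both front ends at once.
        return {"engine": self.engine, "codec": codec, "bytes": len(bitstream)}


class RadeonSIFrontend:
    """Hypothetical Gallium3D-style entry point: a thin adapter over the core."""

    def __init__(self, core: SharedVideoDecoder) -> None:
        self.core = core

    def decode_frame(self, codec: str, bitstream: bytes) -> Dict[str, object]:
        return self.core.build_decode_job(codec, bitstream)


class RADVFrontend:
    """Hypothetical Vulkan Video-style entry point over the same core."""

    def __init__(self, core: SharedVideoDecoder) -> None:
        self.core = core

    def cmd_decode_video(self, codec: str, bitstream: bytes) -> Dict[str, object]:
        return self.core.build_decode_job(codec, bitstream)


if __name__ == "__main__":
    core = SharedVideoDecoder("UVD")
    frame = b"\x00" * 64
    # Both front ends produce the same job because they share one core.
    print(RadeonSIFrontend(core).decode_frame("H.264", frame))
    print(RADVFrontend(core).cmd_decode_video("H.264", frame))
```

The two print calls produce identical output, which is the point of the design: per-engine logic lives in one place instead of being duplicated per API.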

🤼‍♂️ Market & Competitors

[2026-02-12] Intel Posts 2026 Update For Cache Aware Scheduling On Linux

Source: Phoronix (AMD Linux)

Key takeaway relevant to AMD:

  • Cross-Vendor Benefit: Although authored by Intel, independent benchmarks show these patches provide “Big Potential” for AMD EPYC Turin processors.
  • Performance Uplift: Improves performance on architectures with multiple last-level caches (chiplet/CCD designs such as AMD EPYC) by reducing cache misses and cache-line bouncing.

Summary:

  • Intel posted v3 patches for “Cache Aware Scheduling” for Linux.
  • Targeting inclusion in the Linux kernel in 2026 (potentially post-Linux 7.0).

Details:

  • Mechanism: Places tasks that share data in the same cache domain to improve LLC (Last Level Cache) locality.
  • New Optimization: The scheduler now skips cache-aware behavior after repeated load-balancing failures to avoid performance regressions.
  • Relevance:
    • Critical for high-core-count server CPUs (Intel Xeon 6 Granite Rapids & AMD EPYC Turin).
    • Accounting for tasks preferring specific LLCs is now maintained in the lowest-level sched domain per CPU.
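To make the mechanism concrete, here is a toy Python model of cache-aware placement with a failure cutoff. It is an illustrative simplification under stated assumptions, not the kernel's algorithm: tasks of the same process are assumed to share data, each process adopts the LLC domain where its first task lands, and an attempt counts as a failure when that preferred LLC has no free CPU. After `max_failures` consecutive failures (a hypothetical knob) the process falls back to plain least-loaded placement.

```python
from collections import defaultdict
from typing import Dict, List


class ToyCacheAwarePlacer:
    """Toy model of cache-aware task placement; NOT the Linux scheduler."""

    def __init__(self, llc_capacity: List[int], max_failures: int = 2) -> None:
        self.capacity = list(llc_capacity)   # CPUs per LLC domain (e.g. per CCD)
        self.load = [0] * len(self.capacity)  # tasks currently placed per LLC
        self.max_failures = max_failures      # hypothetical cutoff
        self.preferred: Dict[str, int] = {}   # process -> preferred LLC
        self.failures: Dict[str, int] = defaultdict(int)

    def _least_loaded(self) -> int:
        # Plain load balancing: LLC with the most free CPUs (ties -> lowest index).
        # The toy allows oversubscription when every LLC is full.
        return max(range(len(self.load)), key=lambda i: self.capacity[i] - self.load[i])

    def place(self, process: str) -> int:
        """Place one task of `process` and return the chosen LLC index."""
        pref = self.preferred.get(process)
        cache_aware = self.failures[process] < self.max_failures
        if cache_aware and pref is not None and self.load[pref] < self.capacity[pref]:
            llc = pref
            self.failures[process] = 0  # success resets the failure streak
        else:
            if cache_aware and pref is not None:
                self.failures[process] += 1  # preferred LLC was full
            llc = self._least_loaded()
            self.preferred.setdefault(process, llc)
        self.load[llc] += 1
        return llc


if __name__ == "__main__":
    sched = ToyCacheAwarePlacer([4, 4])
    print([sched.place("db") for _ in range(3)])
    print([sched.place("web") for _ in range(2)])
```

With two 4-CPU LLC domains, all three "db" tasks land on LLC 0 and the "web" tasks start on LLC 1, whereas a purely least-loaded placer would alternate the "db" tasks between domains and split their shared data across caches.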

[2026-02-12] Code, Compute and Connection: Inside the Inaugural NVIDIA AI Day São Paulo

Source: NVIDIA Blog

Key takeaway relevant to AMD:

  • Regional Dominance: NVIDIA is solidifying its “Sovereign AI” moat in Latin America, capitalizing on a $4B Brazilian government investment plan.
  • Partnership Lock-in: Claro has been named the first NVIDIA Cloud Partner in Latin America, potentially locking regional infrastructure into the CUDA ecosystem.

Summary:

  • NVIDIA held “AI Day” in São Paulo with 500+ attendees.
  • Focused on Sovereign AI, AI Agents, and Open Models.

Details:

  • Investment Context: The “Brazilian Artificial Intelligence Plan” (2024-2028) allocates ~$4B for infrastructure and R&D.
  • Tech Stack Push: Promoting NVIDIA NeMo and NIM microservices for local startups and universities.
  • Key Sectors: Biotechnology (genomics), Financial Services, and Telecommunications.
  • Startup Ecosystem: NVIDIA Inception program is actively recruiting local startups to build on NVIDIA stacks.

[2026-02-12] SPARC & Alpha CPU Ports Still Seeing Activity In 2026 With Linux 7.0

Source: Phoronix (AMD Linux)

Key takeaway relevant to AMD:

  • Kernel Context: Confirms that the Linux 7.0 development cycle, which includes the critical x86_64 and AMD GPU updates covered in other reports, is active.

Summary:

  • Maintenance patches for legacy architectures (SPARC, Alpha, m68k) were merged into Linux 7.0.

Details:

  • Alpha: Fixes user-space corruption during memory compaction.
  • SPARC: Header changes, fork/clone bug fixes, and clone3 support.
  • m68k: String-formatting updates (vsprintf converted to vsnprintf) and NuBus driver fixes.