Technical Intelligence Report: 2026-02-12

Executive Summary

ROCm/MI300X Performance: AMD has released performance data for the verl RLHF framework on ROCm 7.0.0. An 8x MI300X configuration delivers up to 56% higher per-GPU throughput in PPO training than an 8x NVIDIA H100 configuration.
Linux Driver Unification: AMD has unified video decode implementations between RadeonSI (Gallium3D) and RADV (Vulkan) in Mesa 26.1, enabling Vulkan Video support for older GPUs (Hawaii generation) and reducing code maintenance.
Linux Scheduler Optimizations: Intel posted v3 of its “Cache Aware Scheduling” patches for the Linux kernel. Although the patches are Intel-authored, benchmarks indicate significant performance gains on AMD EPYC Turin processors from reduced cache bouncing.
Competitor Expansion: NVIDIA is aggressively expanding Sovereign AI initiatives in Latin America, securing a cloud partnership with Claro in Brazil amidst a $4B government AI investment plan.

🤖 ROCm Updates & Software

[2026-02-12] Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm 7.0.0

Source: ROCm Tech Blog

Key takeaway relevant to AMD:

Performance Win: MI300X outperforms NVIDIA H100 in both PPO and GRPO RLHF training throughput (up to 56% faster).
Software Maturity: Validates ROCm 7.0.0 and vLLM 0.11.0.dev stability for complex reinforcement learning workflows.
Ease of Deployment: AMD is providing prebuilt Docker images (rocm/verl) to streamline adoption.

Summary:

AMD integrated the verl (Volcano Engine Reinforcement Learning) framework with ROCm 7.0.0.
The update supports single-node and multi-node training using Ray and vLLM.
Benchmarks show higher PPO and GRPO training throughput on 8x AMD Instinct MI300X than on 8x NVIDIA H100.

Details:

Software Stack: verl 0.6.0, ROCm 7.0.0, vLLM 0.11.0.dev.
Hardware Setup: 8x AMD Instinct MI300X (192 GB HBM3) vs. 8x NVIDIA H100.
PPO (Proximal Policy Optimization) Results:
- Deepseek-llm-7b-chat: MI300X achieved 1,428.64 tokens/GPU/s vs. H100’s 910.42 (+56%).
- Qwen2-7B-Instruct: MI300X achieved 1,514.46 tokens/GPU/s vs. H100’s 1,109.38 (+36%).
- Convergence accuracy remained comparable between platforms (~69% for Deepseek, ~85% for Qwen2).
GRPO (Group Relative Policy Optimization) Results:
- Deepseek-llm-7b-chat: MI300X achieved 2,781.74 tokens/GPU/s vs. H100’s 2,480.54 (+12%).
- Qwen2-7B-Instruct: MI300X achieved 2,739.34 tokens/GPU/s vs. H100’s 2,467.10 (+11%).
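
As a quick sanity check, the percentage uplifts quoted above can be recomputed from the tokens/GPU/s figures. The sketch below is illustrative only: the `uplift_pct` helper and the `RESULTS` layout are assumptions made for this example and are not part of verl or the AMD blog post. The quoted whole-number percentages are consistent with truncating (not rounding) these ratios.

```python
# Throughput figures (tokens/GPU/s) copied from the report above:
# (MI300X, H100) per algorithm and model.
RESULTS = {
    ("PPO", "Deepseek-llm-7b-chat"): (1428.64, 910.42),
    ("PPO", "Qwen2-7B-Instruct"): (1514.46, 1109.38),
    ("GRPO", "Deepseek-llm-7b-chat"): (2781.74, 2480.54),
    ("GRPO", "Qwen2-7B-Instruct"): (2739.34, 2467.10),
}


def uplift_pct(mi300x: float, h100: float) -> float:
    """Relative throughput advantage of MI300X over H100, in percent."""
    return (mi300x / h100 - 1.0) * 100.0


if __name__ == "__main__":
    for (algo, model), (mi300x, h100) in RESULTS.items():
        print(f"{algo:4s} {model:22s} {uplift_pct(mi300x, h100):5.1f}%")
```
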
Implementation:
- Uses HIP_VISIBLE_DEVICES for GPU isolation in place of CUDA_VISIBLE_DEVICES (the main difference from a CUDA setup).
- Supports Slurm for multi-node scaling.
- Docker image available: rocm/verl:verl-0.6.0.amd0_rocm7.0_vllm0.11.0.dev.
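
A minimal sketch of the GPU-isolation point: on ROCm, visible devices are restricted through the `HIP_VISIBLE_DEVICES` environment variable, where a CUDA setup would use `CUDA_VISIBLE_DEVICES`. The helper name `rocm_env` and the example device list are hypothetical, and the launch command is deliberately left as a placeholder, not a verl entry point.

```python
import os


def rocm_env(gpu_ids, base_env=None):
    """Return a copy of the environment with HIP_VISIBLE_DEVICES set to gpu_ids.

    Hypothetical helper for illustration; not part of verl or ROCm.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


if __name__ == "__main__":
    env = rocm_env([0, 1, 2, 3])
    print(env["HIP_VISIBLE_DEVICES"])  # prints "0,1,2,3"
    # The environment would then be handed to the training process, e.g.
    # subprocess.run([...], env=env), with the command filled in by the user.
```

Setting the variable before the GPU runtime initializes matters, which is why the sketch builds the environment for a child process instead of changing it after a framework has been imported.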

[2026-02-12] AMD Video Decode Now Unified Between RadeonSI & RADV Vulkan Video

Source: Phoronix (AMD Linux)

Key takeaway relevant to AMD:

Driver Efficiency: Unified code path reduces maintenance overhead (-1.4k lines of code) in Mesa 26.1.
Feature Expansion: Enables RADV Vulkan Video support for legacy hardware (Hawaii GPUs and older), which previously lacked this capability in Vulkan.

Summary:

The change was merged into Mesa 26.1-devel on 2026-02-12, unifying the video decode logic for RadeonSI (Gallium3D) and RADV (Vulkan).
Led by AMD engineer David Rosca.

Details:

Scope: Covers Video Core Next (VCN), VCN JPEG, and Unified Video Decode (UVD) engines.
Technical Change: Shifts ~6,000 lines of code to a shared interface, resulting in a net reduction of ~1,400 lines.
Impact:
- RADV driver now uses the same robust decode implementation as RadeonSI.
- Immediate support for Vulkan Video decoding on older hardware generations (pre-Polaris/Vega).
- Stable debut expected in Mesa 26.1 (Q2 2026).

🤼‍♂️ Market & Competitors

[2026-02-12] Intel Posts 2026 Update For Cache Aware Scheduling On Linux

Source: Phoronix (AMD Linux)

Key takeaway relevant to AMD:

Cross-Vendor Benefit: Although the patches are authored by Intel, independent benchmarks point to “Big Potential” for them on AMD EPYC Turin processors.
Performance Uplift: Improves performance in multi-cache domain architectures (Chiplet/CCD designs common in AMD EPYC) by reducing cache misses and bouncing.
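
To see why chiplet designs expose multiple cache domains, it can help to list which logical CPUs share a last-level cache on a given Linux machine. The sketch below reads the standard sysfs cache topology files; the function name `llc_domains` is hypothetical, and treating `index3` as the last-level cache is an assumption that holds on typical x86 servers but not necessarily everywhere.

```python
from collections import defaultdict
from pathlib import Path


def llc_domains(sysfs_root="/sys/devices/system/cpu"):
    """Group logical CPUs by the set of CPUs sharing their L3 cache.

    Reads cache/index3/shared_cpu_list under each cpuN directory.
    CPUs without an index3 entry are skipped.
    """
    domains = defaultdict(list)
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        shared = cpu_dir / "cache" / "index3" / "shared_cpu_list"
        if shared.is_file():
            cpu_id = int(cpu_dir.name[3:])  # "cpu12" -> 12
            domains[shared.read_text().strip()].append(cpu_id)
    return {key: sorted(cpus) for key, cpus in domains.items()}


if __name__ == "__main__":
    for shared_list, cpus in sorted(llc_domains().items()):
        print(f"LLC shared by CPUs {shared_list}: {len(cpus)} logical CPUs")
```

On a multi-CCD part, several distinct groups would be expected, one per L3 domain.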

Summary:

Intel posted v3 patches for “Cache Aware Scheduling” for Linux.
Targeting inclusion in the Linux kernel in 2026 (potentially post-Linux 7.0).

Details:

Mechanism: Colocates tasks sharing data to the same cache domain to ensure better LLC (Last Level Cache) locality.
New Optimization: The scheduler now skips cache-aware behavior after repeated load balancing failures to prevent performance regression.
Relevance:
- Critical for high-core-count server CPUs (Intel Xeon 6 Granite Rapids & AMD EPYC Turin).
- Accounting for tasks preferring specific LLCs is now maintained in the lowest-level sched domain per CPU.
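
The two ideas described above (colocating data-sharing tasks in one LLC domain, and giving up on that preference after repeated failures) can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's algorithm: the `ToyScheduler` class, the capacity model, and the `MAX_FAILED_BALANCES` threshold are all invented for illustration, and a full preferred domain stands in for a failed load-balancing attempt.

```python
from dataclasses import dataclass, field

MAX_FAILED_BALANCES = 3  # illustrative threshold, not a kernel constant


@dataclass
class ToyScheduler:
    """Toy model of cache-aware placement; not the Linux implementation."""
    llc_capacity: int  # tasks each LLC domain can hold before it counts as full
    num_llcs: int
    load: dict = field(default_factory=dict)       # llc id -> task count
    preferred: dict = field(default_factory=dict)  # group id -> preferred llc id
    failures: dict = field(default_factory=dict)   # group id -> failed attempts

    def _least_loaded(self) -> int:
        return min(range(self.num_llcs), key=lambda llc: self.load.get(llc, 0))

    def place(self, group: int) -> int:
        """Pick an LLC domain for a new task in a data-sharing group."""
        cache_aware = self.failures.get(group, 0) < MAX_FAILED_BALANCES
        target = self.preferred.get(group)
        if cache_aware and target is not None:
            if self.load.get(target, 0) < self.llc_capacity:
                # Colocate with the rest of the group for LLC locality.
                self.load[target] = self.load.get(target, 0) + 1
                return target
            # Preferred domain is full: record a failed colocation attempt.
            self.failures[group] = self.failures.get(group, 0) + 1
        # Fall back to plain load balancing across domains.
        target = self._least_loaded()
        self.load[target] = self.load.get(target, 0) + 1
        self.preferred.setdefault(group, target)
        return target


if __name__ == "__main__":
    sched = ToyScheduler(llc_capacity=2, num_llcs=2)
    print([sched.place(group=7) for _ in range(3)])  # prints [0, 0, 1]
```

In this model the first two tasks of a group share a domain, the third spills over once that domain is full, and after three failed attempts the group is placed by load alone.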

[2026-02-12] Code, Compute and Connection: Inside the Inaugural NVIDIA AI Day São Paulo

Source: NVIDIA Blog

Key takeaway relevant to AMD:

Regional Dominance: NVIDIA is solidifying its “Sovereign AI” moat in Latin America, capitalizing on a $4B Brazilian government investment plan.
Partnership Lock-in: Claro has been named the first NVIDIA Cloud Partner in Latin America, potentially locking regional infrastructure into the CUDA ecosystem.

Summary:

NVIDIA held “AI Day” in São Paulo with 500+ attendees.
Focus on Sovereign AI, AI Agents, and Open Models.

Details:

Investment Context: The “Brazilian Artificial Intelligence Plan” (2024-2028) allocates ~$4B for infrastructure and R&D.
Tech Stack Push: Promoting NVIDIA NeMo and NIM microservices for local startups and universities.
Key Sectors: Biotechnology (genomics), Financial Services, and Telecommunications.
Startup Ecosystem: NVIDIA Inception program is actively recruiting local startups to build on NVIDIA stacks.

[2026-02-12] SPARC & Alpha CPU Ports Still Seeing Activity In 2026 With Linux 7.0

Source: Phoronix (AMD Linux)

Key takeaway relevant to AMD:

Kernel Context: Confirms the Linux 7.0 development cycle is active, which includes the critical x86_64 and AMD GPU updates mentioned in other reports.

Summary:

Maintenance patches for legacy architectures (SPARC, Alpha, m68k) were merged into Linux 7.0.

Details:

Alpha: Fixes user-space corruption during memory compaction.
SPARC: Header changes, fork/clone bug fixes, and clone3 support.
m68k: String-formatting updates (vsprintf replaced with vsnprintf) and NuBus driver fixes.
