Thursday, September 25, 2025

Dying Light: The Beast - Hotfix 1.2.1 just dropped on Steam

Shortly after its launch and stratospheric sales success, Dying Light: The Beast has already received its first patch with targeted fixes.



"Hotfix 1.2.1 for Dying Light: The Beast is now live on PC! Make sure to update your game—console updates will follow soon.

This hotfix addresses the Indoor Rain issue and fixes the Disturbed Day/Night Cycle.

Once the update is live on all platforms, the APEX Car Skin will appear in the in-game stash for anyone who pre-ordered or owns the Dying Light 2 Ultimate Edition."


Check Steam to stay updated.

Tuesday, September 23, 2025

AMD Patents High-Bandwidth RAM Architecture to Double DDR5 Speeds - Strix Halo-class iGPU performance for the masses?

 

AMD Patents High-Bandwidth RAM Architecture to Double DDR5 Speeds

AMD has unveiled a groundbreaking patent for a new RAM architecture aimed at overcoming the bandwidth bottlenecks of DDR5 memory. The innovation, dubbed High-Bandwidth Dual Inline Memory Module (HB-DIMM), promises to double data rates to 12.8 Gbps on the memory bus, far exceeding DDR5's native 6.4 Gbps. This development comes as DDR5 struggles to keep pace with the escalating demands of high-performance gaming, graphics processors, and servers.

Key Features of the HB-DIMM Architecture

The patent introduces several advanced elements to enhance memory performance:

  • Dual-Speed Data Buffering: Multiple DRAM chips connect to data buffer chips that transmit data at twice the speed of standard memory chips, enabling non-interleaved transfers for simpler signal integrity and lower latency.
  • Pseudo Channels and Intelligent Routing: A register clock driver (RCD) uses a chip identifier (CID) bit to route commands to independently addressable pseudo-channels, boosting parallel access and throughput.
  • Flexible Operating Modes: Supports 1n and 2n modes for optimized clocking, along with programmable switches between pseudo-channel and quad-rank setups, ensuring compatibility with DDR5 standards.
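To put the headline numbers in perspective, here is a back-of-envelope calculation (a sketch; the 64-bit channel width and transfer rates are standard DDR5 figures, not taken from the patent):

```python
# Back-of-envelope DDR5 vs. HB-DIMM bandwidth per 64-bit channel.
BUS_WIDTH_BYTES = 64 // 8  # a standard DDR5 channel is 64 data bits wide


def channel_bandwidth_gbs(transfer_rate_gtps: float) -> float:
    """Peak bandwidth in GB/s for one 64-bit channel at a given transfer rate."""
    return transfer_rate_gtps * BUS_WIDTH_BYTES


ddr5 = channel_bandwidth_gbs(6.4)      # DDR5-6400: 6.4 GT/s native
hb_dimm = channel_bandwidth_gbs(12.8)  # HB-DIMM target: 12.8 GT/s on the bus

print(f"DDR5-6400: {ddr5:.1f} GB/s, HB-DIMM: {hb_dimm:.1f} GB/s")
# -> DDR5-6400: 51.2 GB/s, HB-DIMM: 102.4 GB/s
```

Doubling the transfer rate on the same 64-bit bus doubles peak bandwidth per channel, which is the whole point of buffering the DRAM chips behind faster data-buffer chips.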



According to the patent, "The memory bandwidth required for applications such as high-performance graphics processors... are outpacing the roadmap of bandwidth improvements for DDR DRAM chips." This architecture leverages existing DDR5 chips without major manufacturing overhauls, making it a scalable upgrade for future systems.

Implications for Gaming and AI

If implemented, HB-DIMM could revolutionize RAM performance in high-end PCs, AI workloads, and data centers by addressing DDR5's stagnation. AMD's move aligns with its recent patents, including blower fan designs for laptops and smart cache systems for processors, signaling a push toward next-gen hardware innovation. The biggest beneficiaries of this kind of advancement would be iGPUs, which rely on system RAM as VRAM. The IP described here could be integrated into the next generation of handhelds and consoles, bringing the performance of costly high-end chips like Strix Halo to the masses.
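As a rough illustration of that claim, compare aggregate bandwidth. The figures below are approximations assumed for illustration: Strix Halo is commonly cited with a 256-bit LPDDR5X-8000 interface, while a hypothetical dual-channel (128-bit) HB-DIMM desktop would run at the patent's 12.8 GT/s:

```python
def memory_bandwidth_gbs(bus_bits: int, rate_gtps: float) -> float:
    """Peak DRAM bandwidth in GB/s for a given bus width and transfer rate."""
    return bus_bits / 8 * rate_gtps


strix_halo = memory_bandwidth_gbs(256, 8.0)   # 256-bit LPDDR5X-8000 (cited spec)
hb_dimm = memory_bandwidth_gbs(128, 12.8)     # dual-channel HB-DIMM (hypothetical)
ddr5_today = memory_bandwidth_gbs(128, 6.4)   # dual-channel DDR5-6400 today

print(strix_halo, hb_dimm, ddr5_today)
# -> 256.0 204.8 102.4  (GB/s)
```

Under these assumptions, a plain dual-channel HB-DIMM system would close most of the gap to Strix Halo's wide soldered-memory interface, which is exactly what an iGPU starved for bandwidth needs.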

This patent, accessible via WIPO, raises questions about the future of RAM evolution amid rising computational needs.

Tuesday, September 16, 2025

ROCm 7.0 - Bringing proper competition to CUDA

ROCm 7.0: AMD's AI Powerhouse for Next-Gen Performance and Efficiency



In the fast-evolving world of AI, AMD is pushing boundaries with ROCm 7.0, a robust open-source platform tailored for generative AI, large-scale training, inference, and accelerated discovery. This release spotlights the new AMD Instinct MI350 series GPUs, delivering unprecedented computational power, energy savings, and scalability to meet the demands of enterprise AI workloads.

Empowering the MI350X Era 



At the heart of ROCm 7.0 is support for the MI350X and MI355X GPUs, featuring eight Accelerator Complex Dies (XCDs) with 256 CDNA 4 Compute Units and 256 MB of Infinity Cache for low-latency memory access. These GPUs introduce novel data types like FP4, FP6, and FP8, boosting throughput while slashing energy use, ideal for tackling the inference bottlenecks in modern AI models. Backed by AMD's GPU driver 30.10.0, ROCm now runs seamlessly on OSes including Rocky Linux 9, Ubuntu 22.04.5/24.04.3, RHEL 9.4/9.6, and Oracle Linux 9, with flexible partitioning for bare-metal setups.
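As a toy illustration of what a 4-bit floating-point format gives up, the sketch below enumerates the representable magnitudes of an E2M1 FP4 layout (1 sign, 2 exponent, 1 mantissa bit, bias 1 — an assumed OCP-style encoding for illustration; the ROCm docs define the exact formats) and rounds inputs to the nearest one:

```python
def fp4_values(bias: int = 1):
    """All values representable in a 1-sign/2-exponent/1-mantissa FP4 format."""
    vals = set()
    for sign in (1.0, -1.0):
        for e in range(4):        # 2 exponent bits
            for f in range(2):    # 1 mantissa bit
                if e == 0:        # subnormal: no implicit leading 1
                    mag = (f / 2) * 2 ** (1 - bias)
                else:
                    mag = (1 + f / 2) * 2 ** (e - bias)
                vals.add(sign * mag)
    return sorted(vals)


def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 value."""
    return min(fp4_values(), key=lambda v: abs(v - x))


print(fp4_values())       # magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6 (plus signs)
print(quantize_fp4(2.6))  # -> 3.0
```

Only 15 distinct values exist, which is why FP4 cuts memory and bandwidth so dramatically — and why careful per-tensor scaling is needed to keep model accuracy acceptable.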

Software Innovations Driving AI Forward

ROCm 7.0 supercharges AI frameworks with day-one compatibility for PyTorch 2.7/2.8, TensorFlow 2.19.1, and JAX 0.6.x. Highlights include optimized Docker images for efficient deployment, new kernels like 3D BatchNorm and APEX Fused RoPE, and C++ compilation via amdclang++. For inference, vLLM and SGLang now natively handle FP4 on MI350 GPUs, enabling distributed prefill/decode for dense LLMs and MoE models.

Model optimization shines with AMD Quark's production-ready quantized models, such as OpenAI's gpt-oss-120b/20b, DeepSeek R1, Llama 3.3 70B, Llama 4 variants, and Qwen3 (up to 235B parameters). Tools like Primus streamline end-to-end training and fine-tuning on Instinct GPUs, with reinforcement learning on the horizon. Enterprise features, including AMD Resource Manager for smart scheduling and AI Workbench for Kubernetes/Slurm integration, make scaling effortless.
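The practical payoff of quantization is memory. A rough footprint estimate for the weights alone (illustrative arithmetic only; it ignores activations, KV cache, and per-tensor scale overhead):

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate model weight size in GB (using 1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"120B model @ {bits}-bit: {weight_footprint_gb(120, bits):.0f} GB")
# -> 240 GB at 16-bit, 120 GB at 8-bit, 60 GB at 4-bit
```

At FP4, a 120B-parameter model's weights fit far more comfortably in a single accelerator's HBM, which is what makes native FP4 support in vLLM and SGLang on MI350-class GPUs meaningful for deployment.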

Performance Boosts and Ecosystem Synergy

Expect major gains from the Stream-K algorithm, which auto-balances GEMM operations for peak GPU utilization without manual tweaks. Libraries like hipBLASLt, rocBLAS, hipSPARSE, and rocSOLVER now support low-precision formats (FP8/BF8) with fused operations, accelerating AI and HPC tasks. RCCL's zero-copy transfers and FP8 precision speed up multi-GPU comms, while rocAL and RPP enhance vision pipelines with hardware decoding and FP16 support.
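The idea behind Stream-K, sketched in pure Python under a deliberately simplified model of my own (not AMD's implementation): instead of assigning whole output tiles to workers, which leaves some idle when tiles don't divide evenly, the total K-loop iterations across all tiles are flattened and split evenly, so every worker gets an almost identical share even across tile boundaries:

```python
def tile_per_worker(num_tiles: int, iters_per_tile: int, workers: int):
    """Classic data-parallel split: whole tiles per worker (can be unbalanced)."""
    loads = [0] * workers
    for t in range(num_tiles):
        loads[t % workers] += iters_per_tile
    return loads


def stream_k(num_tiles: int, iters_per_tile: int, workers: int):
    """Stream-K split: flatten all K-iterations, give each worker an even slice."""
    total = num_tiles * iters_per_tile
    return [total // workers + (1 if w < total % workers else 0)
            for w in range(workers)]


# 10 tiles of 32 K-iterations on 8 workers:
print(tile_per_worker(10, 32, 8))  # -> [64, 64, 32, 32, 32, 32, 32, 32]
print(stream_k(10, 32, 8))         # -> [40, 40, 40, 40, 40, 40, 40, 40]
```

The trade-off is that workers whose slice straddles a tile boundary produce partial results that must be reduced in a fix-up step, but the even load is what delivers peak utilization without per-shape manual tuning.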

Partnerships amplify this: Collaborations with PyTorch, TensorFlow, JAX, OpenAI, and inference engines like vLLM ensure seamless integration. Benchmarks show impressive results for models like DeepSeek R1 (FP4) and Llama 3.3 70B (FP8), with detailed metrics available in ROCm docs.

Profiling gets smarter too—ROCProfV3 and AQL Profiler add PC-sampling and SQL exports, while ROCgdb aids debugging. HIP 7.0 adds CUDA-like APIs and zero-copy GPU-NIC transfers, powered by LLVM 20.

Looking Ahead: Innovation Without Limits

ROCm 7.0 isn't just a release; it's a foundation for future AI breakthroughs. Upcoming updates include refreshed profiler UIs, AMD Infinity Storage to tackle I/O hurdles, and expanded Primus features. As an open, enterprise-grade ecosystem, ROCm continues to democratize high-performance AI on AMD hardware.

Whether you're training massive models or deploying at scale, ROCm 7.0 equips developers with the tools for faster, greener AI. Dive in and experience the difference.

Source: https://rocm.blogs

Monday, September 8, 2025

AI Age - NVIDIA CFO Highlights Blackwell GB300 Ramp and Surging AI Chip Demand in Q2

 

NVIDIA CFO Highlights Blackwell GB300 Ramp and Surging AI Chip Demand in Q2

NVIDIA CFO Colette Kress recently shared insights on the company's Q2 performance, emphasizing significant growth in data center revenues and the rapid scaling of its Blackwell GB200 and GB300 AI solutions. Below are the key takeaways from the discussion for those tracking NVIDIA's advancements in AI and data center technology.

Strong Data Center Revenue Growth

NVIDIA reported a 12% quarter-over-quarter revenue increase in Q2, driven by its data center and networking segments, even after excluding China-specific H20 AI GPUs. Looking ahead, NVIDIA projects a robust 17% sequential growth for Q3, signaling strong demand for its AI and computing solutions.

Blackwell GB200 and GB300 Scale-Up Success

The ramp-up of NVIDIA's Blackwell GB200 rack-scale systems and GB300 Ultra has exceeded expectations. Kress described the transition as "seamless," with significant scale and volume hitting the market. Analysts predict up to 300% sequential growth for the GB300 in Q3, underscoring NVIDIA's leadership in high-performance AI infrastructure.

Navigating China’s H20 AI GPU Market

Despite geopolitical challenges, NVIDIA has secured licenses to ship H20 AI GPUs to key Chinese customers. While uncertainties remain, Kress expressed optimism about completing these shipments, potentially adding $2 billion to $5 billion in revenue. This reflects NVIDIA's strategic focus on maintaining its foothold in the Chinese market amid local pushes for domestic chip alternatives.

Addressing AI Chip Competition and Power Efficiency

Recent market concerns, including Broadcom's $10 billion custom AI chip contract, have sparked debates about cost-effective AI chips. Kress emphasized that power efficiency is critical for AI computing, particularly for reasoning models and agentic AI. NVIDIA's focus on data center-scale solutions prioritizes performance per watt and per dollar, ensuring long-term efficiency for large-scale AI clusters.

Next-Gen Vera Rubin AI Chips on Track

NVIDIA's next-generation Vera Rubin AI chips are progressing on a one-year cadence, with all six chips already taped out. Kress highlighted early demand, noting "several gigawatts" of power needs already penciled in for Rubin-powered data centers, positioning NVIDIA to meet future AI infrastructure demands.

Why NVIDIA’s Strategy Matters

NVIDIA’s ability to scale its Blackwell GB200 and GB300 solutions, combined with its forward-looking approach to power-efficient AI systems, reinforces its dominance in the AI and data center markets. As demand for AI-driven computing grows, NVIDIA’s innovations in rack-scale solutions and next-gen chips like Vera Rubin ensure it remains a key player in the industry.

For the latest updates on NVIDIA’s AI advancements and market performance, stay tuned to xxxpctech for in-depth insights.