For years, GPUs have been the workhorse of AI. They’re powerful, massively parallel, and great at crunching the huge matrices that deep learning demands. But here’s the thing: GPUs were never designed for AI — they were designed for graphics. AI/ML just happened to fit.
Enter the NPU, or Neural Processing Unit. Unlike GPUs, NPUs are purpose‑built for AI workloads. They don’t carry the baggage of graphics pipelines or shader cores. Instead, they’re optimized for the dataflow patterns of neural networks: moving data as little as possible, keeping it close to the compute units, and executing operations with extreme efficiency.
## Why XDNA 2 Is a Leap Forward
AMD’s XDNA 2 architecture, debuting in Strix Point, is a perfect example of this new breed. It delivers up to 50 TOPS of AI performance with native BF16 support, all in less than 10% of the SoC’s die area. That’s roughly 20 mm², or about 30 mm² with a basic LPDDR5X memory interface added.
To put that in perspective:

- A GPU block capable of similar AI throughput would be many times larger and draw far more power.
- XDNA 2 achieves a 5× compute uplift over first‑gen XDNA at 2× the power efficiency. That means more AI work per watt, and less heat to manage.
- Estimated TDP is 2–5 W, rising to roughly 10 W if pushed beyond the efficiency curve above the 50 TOPS threshold (sanity‑checked in the sketch below).
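
Here is that back‑of‑envelope check, using only the numbers quoted above; nothing in it is a measured value:

```python
# Back-of-envelope check of the XDNA 2 figures quoted in this post.
# Inputs are the claims above; nothing here is a measured value.

NPU_TOPS = 50        # BF16 TOPS claimed for XDNA 2 in Strix Point
NPU_AREA_MM2 = 20    # ~10% of the SoC die, per the estimate above
NPU_TDP_W = 5        # upper end of the 2-5 W TDP estimate

density = NPU_TOPS / NPU_AREA_MM2   # TOPS per mm^2
efficiency = NPU_TOPS / NPU_TDP_W   # TOPS per watt

print(f"Compute density:  {density:.1f} TOPS/mm^2")  # 2.5 TOPS/mm^2
print(f"Power efficiency: {efficiency:.1f} TOPS/W")  # 10.0 TOPS/W
```

Ten TOPS per watt is a level a discrete GPU only approaches with aggressive quantization, and never in a 20 mm² footprint.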
This efficiency comes from its tiled AI Engine arrays, local SRAM, and deterministic interconnects — all designed to minimize data movement, which is the hidden energy hog in AI processing.
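
To make “hidden energy hog” concrete, compare rough per‑access energy costs. The picojoule values below are order‑of‑magnitude ballparks from the computer‑architecture literature, not AMD specs:

```python
# Approximate energy cost of supplying one operand to a MAC unit,
# by where the operand lives. Values are literature ballparks
# (order-of-magnitude only), not AMD specifications.

ENERGY_PJ = {
    "MAC operation itself":  1.0,   # the BF16 multiply-accumulate
    "local SRAM read":       5.0,   # operand already on the tile
    "off-chip DRAM read":  640.0,   # operand fetched from LPDDR5X
}

mac_pj = ENERGY_PJ["MAC operation itself"]
for source, pj in ENERGY_PJ.items():
    print(f"{source:22s} {pj:6.1f} pJ  ({pj / mac_pj:5.0f}x the MAC)")
```

Every operand kept in on‑tile SRAM instead of round‑tripping to DRAM saves roughly two orders of magnitude in energy, which is exactly the trade the tiled layout is built around.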
Because NPUs are so compact and efficient, they scale in ways GPUs can’t. You can add more NPUs without blowing up your power budget or die size. That’s why the idea of putting an XDNA 2 into a USB stick form factor isn’t just possible: it’s practical.
I’d venture that if AMD scaled up their NPUs to, say, 10× the current size, a 250 mm² die could deliver 500–550 TOPS while consuming under 50 W. An MCM design could reach 2,500 TOPS BF16 (dense or sparse) at 200 W, outperforming every GPU currently used for inference.
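
As a sanity check, here is that projection as pure linear scaling from the Strix Point figures. Linear scaling is optimistic, since interconnect overhead and memory bandwidth eat into it, so treat these as upper bounds:

```python
# Naive linear extrapolation of the Strix Point NPU block.
# Real silicon would lose some of this to interconnect overhead
# and memory bandwidth limits: upper bounds, not predictions.

BASE_TOPS, BASE_AREA_MM2, BASE_WATTS = 50, 20, 5

def scale(factor: int, chiplets: int = 1) -> dict:
    """Scale the base NPU block by `factor`, across `chiplets` dies."""
    n = factor * chiplets
    return {"TOPS": BASE_TOPS * n,
            "area_mm2": BASE_AREA_MM2 * n,
            "watts": BASE_WATTS * n}

print(scale(10))               # ~500 TOPS, ~200 mm^2, ~50 W
print(scale(10, chiplets=5))   # MCM: ~2500 TOPS, but ~250 W linear
```

The linear model lands close to the single‑die numbers; hitting 2,500 TOPS at 200 W rather than 250 W would need modestly better‑than‑linear efficiency, plausibly by running more tiles at lower clocks.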
## The iROCm Stick Concept - Hopefully AMD will be inspired by my idea
Imagine a sleek red USB4/Thunderbolt stick, branded iROCm, with an XDNA 2 NPU inside and LPDDR5X memory onboard.
Two models could make AI acceleration accessible to everyone:
| Model | Memory | Price | Target Audience |
| --- | --- | --- | --- |
| iROCm Stick 8GB | 8 GB LPDDR5X @ 7533 MT/s | $100–120 | Students, hobbyists, AI learners |
| iROCm Stick 16GB | 16 GB LPDDR5X @ 7533 MT/s | $150 | Indie devs, researchers, edge AI prototyping |
You plug it in, and your laptop, even a thin‑and‑light, instantly gains a dedicated AI accelerator. No driver nightmares, no bulky eGPU enclosure. Just ROCm‑powered AI in your pocket: under 10 W, portable, and affordable, so everyone (and their dog) can try the ROCm ecosystem and any apps AMD develops.
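
For a sense of what that could look like from an application’s point of view: AMD’s current Ryzen AI stack exposes XDNA NPUs to apps through ONNX Runtime’s Vitis AI execution provider, and a stick like this could plausibly present the same interface. A minimal sketch, assuming the stick enumerates as an ordinary XDNA device; the model file and input shape are placeholders:

```python
# Hypothetical: running inference on the iROCm stick's NPU.
# The ONNX Runtime API calls below are real; routing a USB-attached
# NPU through the Vitis AI execution provider is an assumption.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder model file
    providers=[
        "VitisAIExecutionProvider",  # XDNA NPU path (assumed for the stick)
        "CPUExecutionProvider",      # graceful fallback if it's unplugged
    ],
)

# Placeholder input matching a typical 224x224 image classifier.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: x})
print(outputs[0].shape)
```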