How Hardware-Accelerated Ray Tracing Works in Modern GPUs

How Hardware-Accelerated Ray Tracing Works in Modern GPUs

What if your GPU could trace millions of light rays in real time without tanking frame rates?
Modern GPUs do that by shifting heavy work off shader cores and into fixed-function RT Cores.
These units handle BVH traversal and ray–triangle intersections far faster than general compute.
That change makes realistic reflections, shadows, and lighting possible at playable frame rates.
This piece shows how RT Cores work, why BVH quality and ray coherence still limit performance, and what to watch or tweak if you care about smooth, ray-traced graphics.

Core Mechanics Behind Hardware-Accelerated Ray Tracing

c-tq7Nn1VROlx6vh5oDo_g

Hardware-accelerated ray tracing shifts the heaviest work off your GPU’s shader cores and onto dedicated units called RT Cores. The two operations that eat the most cycles? BVH traversal and ray–triangle intersection. When a frame starts, the GPU fires primary rays from the camera, one per pixel. Each ray needs to figure out which surface it hits first. Without RT Cores, every single ray would test against thousands or millions of triangles using the same ALUs that handle rasterization, mesh shading, everything else. RT Cores handle these tests with specialized circuits that run orders of magnitude faster than general compute.

Ray coherence is huge here. Coherent rays travel in similar directions and hit nearby surfaces, so they traverse the same BVH branches. RT Cores can batch these tests and maximize throughput. Incoherent rays, the kind you get with glossy reflections or diffuse bounces, scatter across the scene and force the hardware to check tons of unrelated BVH nodes. Efficiency tanks. Modern pipelines try to sort or reorder rays before traversal to boost coherence, but divergence is still a practical ceiling. RT Cores work around this by pipelining multiple rays at once and hiding latency with deep queues. While one ray waits on memory, the hardware processes others.

The jump from software to hardware isn’t incremental. Early demos in 2018 showed NVIDIA’s Turing GPUs finishing the same ray tracing workload in one fifth the time Pascal needed, and Pascal was doing everything in shaders. Real-time rendering has tight deadlines. Gaming targets 16 ms per frame at 60 fps or 8 ms at 120 fps, compared to minutes or hours for offline film rendering. Hardware acceleration makes that deadline possible by turning thousands of shader instructions into a handful of fixed-function operations.

RT Cores specifically speed up:

  • BVH node traversal – testing ray–box intersections to prune huge geometry chunks without touching individual triangles.
  • Ray–triangle intersection tests – computing barycentric coordinates and hit distance with hardwired Möller–Trumbore or Watertight algorithms.
  • Stackless or micro-stack traversal – managing node queues in on-chip SRAM to cut down DRAM round-trips.
  • Primitive filtering and backface culling – rejecting non-contributing geometry early in the pipeline.

BVH Acceleration Structures and Their Role in Hardware Ray Tracing

zUE3YF3AXZaLXcG2riTprQ

A Bounding Volume Hierarchy organizes scene triangles into a tree of nested axis-aligned boxes. Each BVH node holds a bounding box and pointers to two child nodes or a list of leaf triangles. When a ray hits a parent box, the RT Core tests both children. If the ray misses a box entirely, the hardware skips that whole subtree without touching any triangles inside. This recursive pruning cuts the number of expensive ray–triangle tests from potentially millions down to tens or hundreds per ray.

BVH quality directly controls ray tracing performance. A well-balanced tree with tight bounds lets the traversal unit reject large scene portions quickly. A poorly built BVH forces the hardware to test tons of irrelevant nodes and triangles. Real-time engines rebuild or refit BVHs every frame for dynamic objects like characters, vehicles, animated foliage. Stale acceleration structures produce wrong shadows and reflections.

Key BVH pieces:

  • Internal nodes – store axis-aligned bounding boxes for two child subtrees. No geometry data.
  • Leaf nodes – contain references to one or more triangles. Traversal ends here with intersection tests.
  • Hierarchy depth – typical real-time BVHs run 15 to 30 levels. Deeper trees reduce intersection counts but add traversal overhead.
  • Split strategies – surface-area heuristic (SAH) minimizes expected ray–node intersection cost. Real-time builders use fast approximations like binned SAH or linear bounding-box splits.
  • Update types – full rebuild replaces the entire tree. Refitting adjusts bounding boxes in place, faster but less optimal. Both happen on the GPU between frames.
  • Memory layout – nodes pack into GPU buffers as tightly as possible to maximize cache hits during traversal. Typical node size is 64–128 bytes.

BVH efficiency caps RT performance. Even the fastest intersection hardware stalls if the traversal unit feeds it irrelevant triangles or revisits the same nodes over and over. Modern APIs expose BVH build flags that trade construction time for traversal speed, and drivers apply heuristics to pick the best strategy per object type.

Understanding RT Core Architecture and Dedicated Ray-Processing Units

eiF_OpozWVqXhJ9F-TWQcw

RT Cores sit alongside shader cores and texture units in the GPU’s streaming multiprocessor. Each RT Core has two main fixed-function blocks: the traversal unit and the intersection unit. The traversal unit walks the BVH tree by testing rays against bounding boxes. The intersection unit does ray–triangle math for leaf geometry. Both units operate in lockstep with the shader pipeline. When a shader issues a trace-ray instruction, the RT Core takes over, does traversal and intersection, then returns hit data back to the shader. Triangle ID, barycentric coordinates, hit distance. The shader continues with shading or secondary-ray logic.

RT Cores use SIMT execution like shader cores but with narrower SIMD width tuned for divergent control flow. Rays in a warp rarely follow identical paths through the BVH, so the traversal unit maintains per-ray state and can context-switch between rays to hide memory latency. This parallelism needs on-chip storage for node stacks or queues. High-end RT Cores include small SRAMs that cache the most recent BVH nodes, cutting the need to fetch from L2 or VRAM. The microarchitecture bottleneck? Memory bandwidth. Large scenes with deep BVHs generate tons of cache misses, so traversal throughput depends heavily on how well the acceleration structure fits in GPU caches.

When Turing launched in 2018, NVIDIA reported about five times faster ray tracing compared to Pascal, which did the same operations in shaders. That speedup came from eliminating thousands of scalar instructions per ray and replacing them with a few cycles of fixed-function logic. Tensor Cores, introduced with Volta in 2018, complement RT Cores by accelerating denoising and super-resolution. Engines can trace fewer rays per pixel and reconstruct a clean image via machine learning.

Traversal Unit Flow

The traversal unit keeps a work queue of BVH nodes to visit. It pops the front node, tests the ray against both child bounding boxes, and pushes any intersected children back onto the queue. Nodes are processed in near-to-far order to find the closest hit early and skip distant subtrees. Stackless traversal implementations use a restart trail or thread-local node cache instead of a traditional stack, reducing on-chip memory pressure. Bounding tests use interval arithmetic. Ray origin plus t·direction must lie within the box’s min and max coordinates on all three axes. The hardware evaluates all six plane intersections in parallel, rejecting the node if any axis interval is empty.

Intersection Unit Logic

When traversal reaches a leaf node, the intersection unit takes the list of triangle indices, fetches vertex positions from a separate buffer, and computes the ray–triangle intersection using the Möller–Trumbore algorithm or a watertight variant. The math solves for barycentric coordinates (u, v) and hit distance t. If 0 ≤ u, v ≤ 1 and u + v ≤ 1, the ray hits the triangle. The intersection unit runs this test in a few clock cycles using hardwired multiply-add trees and comparison logic. Multiple triangles in the same leaf get tested in parallel when SIMD width allows, and results are reduced to the single closest hit. Backface culling and alpha-test masks can be applied here before returning to the shader, filtering out non-opaque or irrelevant geometry without extra shader invocations.

Software vs Hardware Ray Tracing: Performance and Architectural Differences

vgmXCY4XGGas_SIL7sg8Q

Software ray tracing runs entirely on programmable compute or pixel shaders. The shader code manually walks the BVH, testing bounding boxes in a loop, and calculates ray–triangle intersections using scalar or vector ALUs shared with all other rendering work. DXR and Vulkan ray tracing extensions allow software ray tracing on older GPUs like NVIDIA Pascal by compiling trace-ray calls into long shader routines. This works but burns hundreds of instructions per ray and competes for registers, cache, and execution slots with rasterization, compute, and texture sampling. Real-time frame rates need either very low resolution, very simple scenes, or hybrid pipelines that use ray tracing sparingly.

Hardware acceleration offloads traversal and intersection to RT Cores, freeing shader cores for shading, denoising, and secondary-ray generation. The latency of a single ray trace drops from thousands of cycles to tens, and throughput scales with the number of RT Cores rather than being limited by shader occupancy. NVIDIA’s internal measurements on Turing showed a representative workload completing in one fifth the time compared to Pascal. Effectively a 5× speedup for the same scene and ray count. That performance gap widens in complex scenes with deep BVHs or high triangle counts, because fixed-function units handle exponential growth in traversal cost more gracefully than shader loops.

Approach Execution Units Strengths Weaknesses
Software (shader-based) Programmable shader cores (CUDA, compute, pixel shaders) Works on any DXR/Vulkan GPU; flexible for custom traversal logic High instruction overhead; shares resources with rasterization; low throughput
Hardware (RT Cores) Dedicated fixed-function traversal + intersection units Order-of-magnitude speedup; low latency; frees shader cores for other work Requires RT-capable GPU; less flexible for non-standard traversal
Hybrid (RT + Tensor) RT Cores + Tensor Cores + shaders Real-time ray tracing + ML denoising/upscaling; best image quality per watt Complex pipeline; developer learning curve; still expensive for full path tracing
CPU software CPU scalar or SIMD ALUs No GPU required; useful for offline rendering or pre-baking Minutes to hours per frame; not viable for real-time interaction

The trade-off is flexibility versus speed. Software paths let developers experiment with custom BVH layouts or intersection routines, but hardware paths deliver the performance needed to hit 60 fps with ray traced global illumination and reflections on consumer GPUs.

Hybrid Rendering Pipelines: Merging Rasterization and Hardware Ray Tracing

TmDnPvmeVFqtZ3gQXgBBxA

Modern real-time engines don’t replace rasterization with ray tracing. They combine both. Rasterization handles primary visibility, the initial image of opaque surfaces, because it’s extremely fast and well-optimized for depth buffering and early-Z rejection. Ray tracing gets applied selectively to effects that benefit most: reflections on glossy materials, accurate soft shadows from area lights, one-bounce global illumination, screen-space ambient occlusion replacement, and depth-of-field or motion blur. This hybrid approach keeps total frame time within the few-millisecond budget needed for 60 or 120 fps gaming.

A typical frame starts with a rasterized G-buffer pass that writes surface normals, albedo, roughness, and depth. The lighting pass then traces rays only where needed. Shadow rays toward each light source, reflection rays for mirrors and water, diffuse rays for indirect lighting. Each ray type gets scheduled separately, letting the engine balance quality versus performance by adjusting ray counts per pixel. Unreal Engine’s Lumen system traces a sparse set of screen-space and world-space rays for global illumination and uses temporal accumulation plus denoising to fill in the missing samples. Unity’s ray tracing pipeline similarly layers RT effects on top of a rasterized base, with artist-controlled quality presets that scale ray counts and BVH resolution.

Hybrid pipelines inject ray tracing at these points:

  • Global illumination – one or more diffuse bounces to capture color bleeding and indirect light. Replaces or augments baked lightmaps.
  • Reflections – glossy and mirror reflections that exceed screen-space limits. Reflects off-screen objects and handles multiple bounces.
  • Ambient occlusion – ray traced AO tests occlusion in all directions rather than sampling screen-space depth, producing more accurate contact shadows.
  • Shadows – shadow rays replace shadow maps for area lights, producing physically correct penumbra width that varies with distance.
  • Camera effects – depth-of-field and motion blur can be calculated via ray distribution and temporal sampling, though rasterized post-process is often still faster.

Screen-space techniques like SSAO and SSR remain in the pipeline as fallbacks or supplements, especially on lower-end hardware, ensuring the game runs across a wide performance range.

The Role of Tensor Cores: Denoising, Upscaling, and Image Reconstruction

J6H1eXfpXFWDQk_ddOe5uQ

Ray tracing at real-time frame rates forces a trade-off: trace fewer rays to stay within the millisecond budget, but accept noisy, undersampled images. A single ray per pixel produces a sparse, grainy result, especially for indirect lighting and glossy reflections, and it flickers badly across frames. Tensor Cores solve this by running trained neural networks that reconstruct a clean, high-resolution image from noisy, low-sample input. NVIDIA’s DLSS (Deep Learning Super Sampling) trains on thousands of reference frames rendered at high quality, learning to infer missing detail, remove noise, and even upscale from a lower internal resolution to the target display resolution.

Denoising networks take the raw ray traced lighting buffer plus auxiliary inputs (surface normals, motion vectors, depth, albedo) and produce a temporally stable result in a few milliseconds. Because Tensor Cores are built for matrix multiply-accumulate operations, running a convolutional or transformer-based denoiser is far faster than equivalent work on shader cores. Volta introduced Tensor Cores in 2018 for AI training. Turing adapted them for real-time inference, and later architectures expanded throughput and precision modes. Engines can trace one ray per pixel (or even less with checkerboard patterns) and reconstruct an image that looks comparable to 16 or 32 samples per pixel via traditional accumulation.

Temporal accumulation is the other half. The denoiser uses motion vectors to reproject the previous frame’s output onto the current frame, blending history with new samples to build up quality over time. This works well for static or slow-moving scenes but needs careful handling of disocclusions, regions that were hidden last frame and have no valid history. Tensor-accelerated networks can detect and in-paint these areas using learned priors, maintaining visual coherence without the ghosting artifacts common in naive temporal filters.

Tensor-accelerated operations in the RT pipeline:

  • Denoise inference – running the trained denoising network on per-frame lighting buffers to remove Monte Carlo noise.
  • Motion vectors – sometimes generated or refined via learned models to improve temporal reprojection accuracy.
  • Temporal history reuse – blending current and reprojected frames with learned confidence weights.
  • Super-resolution – upscaling the internal render resolution (e.g., 1080p) to display resolution (e.g., 4K) using detail synthesis networks like DLSS.

API and Engine Support for Hardware Ray Tracing

rxX8YR-DUgqXNLXkxRJ-5g

DirectX Raytracing (DXR) and Vulkan ray tracing extensions expose GPU ray tracing hardware through a unified programming model. Developers describe acceleration structures, bind shaders for ray generation, closest-hit, any-hit, and miss events, then dispatch rays from compute or pixel shaders. The API handles BVH construction and updates via driver calls (BuildRaytracingAccelerationStructure in DXR or vkCmdBuildAccelerationStructuresKHR in Vulkan) and provides trace-ray intrinsics that invoke RT Cores when available or fall back to software traversal on older GPUs. Unity and Unreal, which together power roughly 90 percent of shipped games, have integrated DXR support, abstracting the low-level API into artist-friendly lighting and reflection components.

OptiX, NVIDIA’s earlier GPU ray tracing framework introduced in 2009, pioneered programmable ray tracing pipelines on CUDA and laid the groundwork for DXR’s shader binding and execution model. OptiX remains in use for high-performance offline rendering, scientific visualization, and real-time applications that need fine-grained control over traversal and intersection behavior. Millions of gamers now use RTX-branded GPUs with dedicated RT and Tensor Cores, creating a large installed base that justifies the development cost of ray traced rendering paths.

DXR Execution Flow

A DXR ray tracing pipeline consists of multiple shader stages: ray generation, closest-hit, any-hit, miss, and optional intersection shaders for custom primitives. The ray-generation shader runs once per dispatch thread and calls TraceRay, which hands control to the RT Cores. Traversal proceeds until a triangle is hit. The any-hit shader fires for every potential intersection (used for alpha testing), and the closest-hit shader executes once the nearest surface is confirmed. If no geometry is struck, the miss shader runs. Each shader stage has access to ray payload data, a small structure passed through the pipeline, allowing shaders to accumulate lighting, spawn secondary rays, or terminate early.

Vulkan Ray Tracing Extensions

Vulkan ray tracing (VK_KHR_ray_tracing_pipeline, VK_KHR_acceleration_structure) mirrors DXR but uses Vulkan’s explicit pipeline-state design. Developers create separate pipeline objects for ray generation, hit groups, and callable shaders, then bind them via a shader binding table. Acceleration structures are Vulkan buffers built with vkCmdBuildAccelerationStructuresKHR. Top-level structures reference bottom-level structures (one per mesh or instance), enabling efficient instancing and transform updates. Vulkan’s model gives more control over memory allocation and synchronization but requires more boilerplate than DXR’s higher-level abstractions.

Shader Binding Table Overview

The shader binding table (SBT) maps ray–geometry intersections to the correct hit-group shaders. Each row in the table corresponds to a geometry instance and contains function pointers plus optional inline data (material IDs, texture indices). When an RT Core reports a hit, the hardware indexes into the SBT using the instance ID and geometry index, fetches the shader handle, and dispatches the appropriate closest-hit or any-hit function. This indirection allows different materials to use different shading logic without branching in a monolithic shader, keeping divergence low and occupancy high.

GPU Memory, Bandwidth, and Scene Data Considerations

HyY_NgsjUB-Js3HFmRUn9g

Ray tracing stresses GPU memory bandwidth more than rasterization. Traversal reads BVH nodes scattered across VRAM, and intersection fetches vertex data from separate buffers. Both access patterns are less cache-friendly than the linear, tile-based reads typical of rasterized rendering. High-end GPUs pair RT Cores with large L2 caches (tens of megabytes) and wide memory buses (384-bit or 512-bit GDDR6X) to supply enough bandwidth. Even so, scenes with hundreds of millions of triangles or deeply nested BVHs can saturate memory controllers, causing RT Cores to stall while waiting for data.

Host-to-device transfers add another bottleneck. Dynamic scenes require BVH updates every frame, which means uploading new vertex positions or transform matrices over PCIe. Standard PCIe 3.0 x16 provides roughly 16 GB/s bidirectional. PCIe 4.0 doubles that, and NVLink (on workstation-class GPUs) offers 50–100 GB/s. Uploading a 10 MB BVH update over PCIe 3.0 takes about 0.6 ms, nontrivial when the total frame budget is 8 ms at 120 fps. Out-of-core geometry streaming, where only visible or high-priority meshes are resident in VRAM, helps manage memory capacity but introduces latency and can cause pop-in if not carefully managed.

Bottleneck Cause Impact
BVH traversal bandwidth Random node fetches; poor cache locality for incoherent rays RT Core stalls; reduced traversal throughput; longer frame times
PCIe upload latency Host-to-device BVH or vertex updates for dynamic geometry Adds 0.5–2 ms per frame; limits animation complexity or scene scale
VRAM capacity Large scenes exceed GPU memory; forces streaming or LOD swaps Texture/geometry pop-in; reduced BVH quality; lower visual fidelity

Real-Time Ray Tracing Performance on Modern Hardware

ps4W8K1zU1KtnddN75jI7w

Measuring ray tracing performance requires isolating the RT workload from rasterization, denoising, and other pipeline stages. GPU profiling tools expose hardware counters that track RT Core utilization, traversal steps per ray, and intersection tests per frame. One 2026 benchmark on Apple Silicon tested a minimal renderer: one ray per pixel, fullscreen quad, Crytek Sponza scene with 262,267 triangles. The fragment shader output the albedo texture color of the first ray–scene intersection, with all textures packed into a single buffer and sampled without filtering. Timing used GPU counters normalized to CPU time via Apple’s timestamp-conversion method, reporting the median fragment-stage duration over the last 100 frames.

On the M2 baseline, frame rates were “in the decimals” at fullscreen resolution. Effectively single-digit fps, meaning roughly 100–200 ms per frame. The M3 family, introduced roughly half a year before the benchmark publication (mid-2025), delivered about 2× speedup under the same workload. One M3 Pro run showed instability: fragment-stage times ranged up to 10 ms with occasional reports of 5 ms, but the median didn’t settle across 100 frames. Possible causes included background processes on the colleague’s test machine, inaccuracies in GPU-to-CPU timestamp normalization, or core-frequency fluctuations affecting the conversion factor. The author flagged the 5 ms reading as uncertain and noted that the ~2× improvement assumes the shader doesn’t execute divergent loops or heavy per-ray logic, which could reduce or negate hardware-acceleration gains.

Key timing observations:

  • M2 baseline: frame rates in the low single digits at fullscreen. Approximately 100+ ms per frame for the one-ray-per-pixel Sponza test.
  • M3 performance: roughly 2× faster than M2 under the tested conditions. Estimated ~50 ms per frame or slightly better.
  • M3 Pro instability: measured durations showed spikes and didn’t converge to a stable median. Upper range ~10 ms, uncertain low reading ~5 ms. Reproducibility affected by background activity and timestamp normalization limits.

Parallelism and thread divergence also shape RT performance. Coherent rays (similar directions, nearby hits) allow RT Cores to batch BVH traversal and keep all SIMT lanes active. Incoherent rays, common in diffuse global illumination or glossy reflections with high roughness, force the hardware to traverse disjoint branches, wasting cycles and reducing effective throughput. Modern GPUs improve divergence handling with wider ray queues and more aggressive reordering, but the fundamental trade-off remains: coherent workloads scale well, divergent workloads hit microarchitecture bottlenecks in traversal scheduling and cache utilization.

Future Directions in Ray Tracing Hardware

Lrb4RTWcUMmJIy0bwR_0fg

Next-generation RT hardware will likely integrate even tighter coupling between RT Cores, Tensor Cores, and shader cores. Machine-learning co-processors are expected to assist not just with denoising but also with adaptive sampling, predicting which pixels or regions need more rays and which can be interpolated from neighbors. Early research prototypes use neural importance sampling to focus ray budgets on high-variance areas like caustics or fine geometric detail, while coarse regions get fewer samples and rely on learned reconstruction. This could double effective ray throughput without adding silicon area.

Silicon roadmaps point toward smaller process nodes (3 nm, 2 nm) and chiplet designs that dedicate separate dies to RT and AI acceleration. Chiplets allow vendors to scale RT Core count independently of shader-core count, tuning the balance for workstation rendering, real-time gaming, or cloud streaming. Memory bandwidth will remain a bottleneck unless HBM (High Bandwidth Memory) becomes standard on consumer GPUs. Current GDDR6X tops out around 1 TB/s, while HBM3 can exceed 3 TB/s, dramatically improving BVH traversal speed for large scenes.

Cloud-based GPU rendering services are pushing real-time ray tracing into high-end visualization and virtual production without requiring local hardware. Studios stream frames from remote GPU farms running OptiX or DXR at resolutions and quality levels impossible on client devices. Latency-sensitive applications like VR and live broadcast need sub-20 ms round-trips, so edge data centers co-located with fiber backbones are emerging to keep cloud RT viable for interactive use. As 5G and fiber access expand, cloud RT may become the default for mobile and thin-client scenarios, offloading the entire rendering pipeline to centralized hardware.

Emerging hardware directions include:

  • ML-guided sampling: neural networks predict where to cast rays, concentrating samples in high-detail or high-variance regions and interpolating low-frequency areas.
  • Improved BVH hardware: second-generation RT Cores with wider node caches, hardware-assisted BVH refitting, and multi-level traversal queues to handle deeper hierarchies without spilling to DRAM.
  • Cloud RT farms: centralized GPU clusters streaming ray traced frames to thin clients over low-latency networks, enabling photorealistic rendering on devices that lack local RT hardware.
  • Hybrid-AI workflows: Tensor Cores move beyond denoising to handle importance sampling, material inference, and even procedural detail synthesis, blurring the line between rendering and generative AI.

Final Words

We traced rays from camera to pixel, showing primary visibility, BVH traversal, and fixed-function RT Cores that handle traversal and ray–triangle intersection. We also covered Tensor Cores for denoising, API and engine support, memory and performance tradeoffs, and hybrid rasterization pipelines.

This matters because offloading math to hardware turns slow, shader-heavy tasks into millisecond work, letting games and apps add realistic lighting without tanking frame rates.

You now understand how hardware-accelerated ray tracing works and why it’s becoming practical. Expect steadier, more realistic real-time lighting ahead.

FAQ

Q: Should I have hardware-accelerated GPU on or off?

A: You should keep hardware-accelerated GPU on for better performance in video playback, browsers, and graphics apps. Turn it off only if you see crashes, rendering bugs, or high power/heat—test per app and driver.

Q: Is RTX or RX better?

A: RTX or RX is better depending on your needs: RTX leads in ray tracing and DLSS (AI upscaling) support and usually better RT performance; RX often offers similar raster performance and better value per dollar.

Q: Does hardware-accelerated GPU increase FPS?

A: Hardware-accelerated GPU increases FPS when it offloads math-heavy tasks like BVH traversal, intersections, or video decode to fixed-function units, freeing shaders; actual FPS gain depends on game, scene complexity, and driver.

Q: How does hardware acceleration work?

A: Hardware acceleration works by moving specific graphics or compute tasks from the CPU to specialized GPU hardware—like RT Cores for BVH traversal and Tensor Cores for denoising—so those operations run much faster and use less power.

Check out our other content

Check out other tags:

Most Popular Articles