Ray Tracer¶

The ray-tracing engine at the core of USC's carving pipeline. Implements a fused DDA (Digital Differential Analyser) kernel that traverses rays through the voxel grid and accumulates per-voxel hit scores in a single pass.

trace_and_score_dda -- the main workhorse. Traces batches of rays through the grid, incrementing each voxel's score by the ray's weight on every intersection. Runs on GPU via Warp or falls back to CPU.
generate_sky_patch_rays / generate_sun_rays -- expand sample points × directions into flat ray arrays (origins + directions) ready for batch tracing.
auto_batch_size -- estimates the maximum ray batch that fits in GPU memory given the grid resolution, to avoid OOM errors on large scenes.

Rays are processed in batches whose size adapts to available VRAM. The batch loop is the primary computational bottleneck; see Performance for tuning guidance.

`urbansolarcarver.raytracer` ¶

UrbanSolarCarver -- Ray Generation & Voxel Grid Traversal¶

This module provides ray generation (face-culled sky-patch and sun-vector rays) and voxel-grid traversal for the carving pipeline.

Two traversal backends are available:

DDA (Amanatides & Woo 1987) -- a Warp GPU kernel that launches one thread per ray and walks through voxels using the standard 3-D DDA algorithm. Each voxel is visited exactly once per ray, eliminating duplicate hits and reducing VRAM usage by ~10x compared to the broadcasting approach. This is the default when Warp is available and the device is CUDA.
Fixed-step (legacy) -- pure-PyTorch tensor broadcasting. Materializes the full R x S x 3 sample-point tensor in GPU memory. Kept as a CPU fallback and for validation.

References

Amanatides, J. & Woo, A. (1987). "A Fast Voxel Traversal Algorithm for Ray Tracing." Eurographics '87, pp. 3-10.

Coordinate conventions

World-to-grid: grid_idx = floor((world_pt - origin) / cell_size)
Cell size: cell_size = scale / resolution (meters per voxel)
Floor (not round) avoids checkerboard aliasing on planar lattices.

`trace_and_score_dda(min_corner, scale, resolution, origins, ray_dirs, patch_ids, patch_weights, scores, ray_length)` ¶

Fused DDA traversal + score accumulation (Warp GPU kernel).

Atomically adds patch weights to the score volume for every voxel each ray visits. No output buffer, no post-processing. Modifies scores in-place.

Parameters:

Name	Type	Description	Default
`min_corner`	`tuple of float`	Grid origin in world space.	required
`scale`	`float`	Grid extent in meters.	required
`resolution`	`int`	Voxels per axis.	required
`origins`	`(Tensor, shape(R, 3))`	Ray start points and unit directions.	required
`ray_dirs`	`(Tensor, shape(R, 3))`	Ray start points and unit directions.	required
`patch_ids`	`(Tensor, shape(R))`	Sky patch index per ray.	required
`patch_weights`	`(Tensor, shape(P))`	Weight per sky patch.	required
`scores`	`(Tensor, shape(N))`	Flat score volume, modified in-place.	required
`ray_length`	`float`	Maximum march distance.	required

`generate_sky_patch_rays(pts, norms, patch_dirs, device, *, ray_id=None, session=None)` ¶

Build rays for each sample point and sky patch direction. 1. We have P sample locations on a surface and V directions in the sky (divided into patches). 2. For each surface point, we want to know which sky patches are 'visible' (face-culling) by checking if the normal faces that patch. 3. We then record an origin (the point) and a direction (the sky patch) for each valid combination. 4. Normals_per_ray carries the surface normal for each ray, useful later for shading or weighting.

Returns: origins : Tensor of 3D positions where rays start (one per valid combination). ray_dirs : Tensor of unit vectors indicating ray directions. sky_patch_ids : Integer index for which sky patch each ray corresponds to. normals_per_ray : Surface normal vectors repeated per ray. point_idx : Integer index mapping each ray back to its source sample point.

`generate_sun_rays(points_3d, point_normals, sun_directions, device)` ¶

Build ray origins and directions for direct-beam sampling.

Steps: 1) Convert the input arrays (points, normals, sun vectors) to float32 Torch tensors on the specified device. 2) Compute a (P x S) dot-product matrix between each normal and each sun vector. 3) Mask for entries where dot(normal, sun_direction) > 0 -- these are the facets actually facing the sun. 4) Extract the (point_idx, sun_idx) pairs that satisfy the facing test. 5) Gather and return the corresponding origin points and direction vectors.

Args: points_3d: (P,3) NumPy array of sample points on the surface. point_normals:(P,3) NumPy array of normals at each sample point. sun_directions:(S,3) NumPy array of sun direction vectors (should be unit length). device: Torch device (e.g. torch.device('cuda')).

Returns: ray_origins: (R,3) float32 tensor of ray start points. ray_directions:(R,3) float32 tensor of ray direction vectors.

`trace_multi_hit_grid(min_corner, scale, resolution, origins, ray_dirs, sky_patch_ids, voxel_size, ray_length)` ¶

Trace rays through a uniform voxel grid, recording every voxel hit.

Dispatches to the Warp DDA kernel on CUDA (Amanatides & Woo 1987) or falls back to fixed-step PyTorch broadcasting on CPU. The DDA backend visits each voxel exactly once per ray -- no deduplication needed.

Parameters:

Name	Type	Description	Default
`min_corner`	`tuple of float`	World-space (x, y, z) of the grid's lower corner.	required
`scale`	`Tensor or float`	Full spatial extent of the grid in meters (grid_extent).	required
`resolution`	`int`	Number of voxels per axis (grid is resolution³).	required
`origins`	`(Tensor, shape(R, 3))`	Ray start points.	required
`ray_dirs`	`(Tensor, shape(R, 3))`	Unit direction vectors.	required
`sky_patch_ids`	`(Tensor, shape(R))`	Sky patch index per ray.	required
`voxel_size`	`float`	Physical edge length of one voxel (meters).	required
`ray_length`	`float`	Maximum march distance (meters).	required

Returns:

Name	Type	Description
`ray_ids`	`(Tensor, shape(H))`	Which ray produced each hit.
`patch_ids`	`(Tensor, shape(H))`	Sky patch index for each hit.
`voxel_indices`	`(Tensor, shape(H, 3))`	3-D grid coordinate of each visited voxel.

`auto_batch_size(resolution, device, *, vram_fraction=0.6, fallback=300000)` ¶

Compute an optimal ray batch size based on available GPU memory.

For the DDA backend, each ray needs ~4 output ints (ray_id, vx, vy, vz) times an estimated hits-per-ray. We size the batch so that output buffers fit within vram_fraction of free VRAM.

For the legacy fixed-step backend, the dominant cost is the R x S x 3 sample-point tensor (12 bytes per element at float32). We size the batch so this tensor fits within the VRAM budget.

Falls back to fallback on CPU or if VRAM cannot be queried.

Parameters:

Name	Type	Description	Default
`resolution`	`int`	Voxel grid resolution (used to estimate hits-per-ray).	required
`device`	`device`	Target compute device.	required
`vram_fraction`	`float`	Fraction of free VRAM to budget for ray tracing (default 0.6).	`0.6`
`fallback`	`int`	Batch size when VRAM cannot be queried (default 300K).	`300000`

Returns:

Type	Description
`int`	Recommended ray batch size.

Ray Tracer¶

urbansolarcarver.raytracer ¶

UrbanSolarCarver -- Ray Generation & Voxel Grid Traversal¶

trace_and_score_dda(min_corner, scale, resolution, origins, ray_dirs, patch_ids, patch_weights, scores, ray_length) ¶

generate_sky_patch_rays(pts, norms, patch_dirs, device, *, ray_id=None, session=None) ¶

generate_sun_rays(points_3d, point_normals, sun_directions, device) ¶

trace_multi_hit_grid(min_corner, scale, resolution, origins, ray_dirs, sky_patch_ids, voxel_size, ray_length) ¶

auto_batch_size(resolution, device, *, vram_fraction=0.6, fallback=300000) ¶

`urbansolarcarver.raytracer` ¶

`trace_and_score_dda(min_corner, scale, resolution, origins, ray_dirs, patch_ids, patch_weights, scores, ray_length)` ¶

`generate_sky_patch_rays(pts, norms, patch_dirs, device, *, ray_id=None, session=None)` ¶

`generate_sun_rays(points_3d, point_normals, sun_directions, device)` ¶

`trace_multi_hit_grid(min_corner, scale, resolution, origins, ray_dirs, sky_patch_ids, voxel_size, ray_length)` ¶

`auto_batch_size(resolution, device, *, vram_fraction=0.6, fallback=300000)` ¶