Runtime Architecture

The TopoBathySim Runtime is a stateless engine that executes Fusion Policies.

Core Concept

Unlike traditional pipelines that hardcode logic (e.g., “Always put Lidar over GEBCO”), the Runtime is agnostic. It simply executes the list of steps defined in the Policy YAML.
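As a sketch of what such a Policy might look like, the following uses the provider aliases and operators mentioned elsewhere in this document; the field names themselves are illustrative, not the real schema:

```yaml
# Hypothetical policy sketch — field names are illustrative, not the real schema.
name: coastal_demo
crs: "EPSG:32615"
resolution: 10
steps:
  - provider: usgs_3dep      # alias resolved via the Provider Registry
    operator: overwrite      # default operator
  - provider: ncei_bag
    operator: overwrite
    transitions:
      - neighbor: usgs_3dep
        operator: feather    # specific operator applied at this seam
  - provider: gebco_2025
    operator: overwrite
```

The Runtime never inspects what the providers mean; it only walks this list in order.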

    graph TD
        subgraph Policy Loading
            Policy[Policy YAML] -->|Load & Validate| Schema[Pydantic Models]
            Schema --> Runtime[Runtime Engine]
        end

        subgraph Runtime Execution
            Runtime -->|1. Initialize| Canvas["Canvas: Elevation + Provenance"]

            Runtime -->|2. Loop Steps| Registry[Provider Registry]
            Registry -->|Get Provider| Match

            Runtime -->|3. Fetch & Align| Match["Fetch Layer (Lazy/Cached) + Reproject"]

            Match -->|4. Check Rules| Rules{Transition Rules?}

            Rules -- Yes --> TransOp["Apply Specific Operator (e.g. Feather)"]
            Rules -- No --> DefOp["Apply Default Operator (e.g. Overwrite)"]

            TransOp --> Blend[Blend into Canvas]
            DefOp --> Blend

            Blend -->|Update| Provenance[Update Source Mask]
        end

        subgraph Output
            Provenance -->|Finalize| Dataset[Xarray Dataset]
            Dataset -->|Save| Zarr[Fused Zarr Cache]
            Dataset -->|Write sidecars| Sidecars["_meta.json + _src.npz"]
        end
    

Components

Runtime Engine (topobathysim.runtime)

The entry point run(policy_path, bbox) orchestrates the entire process:

  1. Canvas Initialization: Creates a blank xarray.Dataset covering the requested BBox in the Policy’s CRS.

  2. Step Execution: Iterates through policy steps.

  3. Data Fetching: Calls provider.fetch_layer().

  4. Alignment: Reprojects the fetched layer to match the Canvas pixel grid (using rio.reproject_match).

  5. Composition: Applies the blending operator (Overwrite/Feather) to merge the aligned layer into the canvas.
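The five steps above can be condensed into a short loop. This sketch uses plain NumPy arrays in place of the real xarray canvas and a fill-where-empty “overwrite” variant; every name here except fetch_layer() is an assumption:

```python
# Minimal, self-contained sketch of the run() loop. StubProvider and
# run_sketch are illustrative; only fetch_layer() comes from the doc.
import numpy as np


class StubProvider:
    """Stands in for a real provider; fetch_layer() returns (elevation, source_id)."""

    def __init__(self, value, source_id):
        self.value, self.source_id = value, source_id

    def fetch_layer(self, shape):
        elev = np.full(shape, self.value, dtype=np.float32)
        src = np.full(shape, self.source_id, dtype=np.uint32)
        return elev, src


def run_sketch(steps, shape):
    # 1. Canvas initialization: blank elevation + provenance grids
    elevation = np.full(shape, np.nan, dtype=np.float32)
    source_id = np.zeros(shape, dtype=np.uint32)
    for provider in steps:                       # 2. Step execution
        elev, src = provider.fetch_layer(shape)  # 3. Fetching (4. alignment elided)
        gaps = np.isnan(elevation)               # 5. Composition: fill empty pixels
        elevation[gaps] = elev[gaps]
        source_id[gaps] = src[gaps]              # provenance tracks the winner
    return elevation, source_id
```

Because earlier steps claim pixels first, a later provider only fills what remains, which is one common reading of priority-ordered composition.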

Providers (topobathysim.providers)

Providers are standardized adapters that fetch data from remote sources.

  • Lazy Loading: Providers like gebco_2025 only open network connections when a requested tile is not in the local Zarr cache.

  • Smart Caching: Data is cached locally as Zarr (for dense grids) or COGs/LAZ (for source files).

  • Aliases: Providers are registered with short names (e.g., usgs_3dep, ncei_bag) used in the Policy YAML.

  • No-Data Signalling: When a provider has no coverage for a requested cell it raises ProviderNoDataError (a LookupError subclass). This is a normal operating condition — the runtime skips the provider silently at DEBUG log level. Only genuine unexpected failures are logged at ERROR with a traceback.
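The no-data contract above can be sketched as follows; ProviderNoDataError’s base class comes from the doc, while the handler shape and names are assumptions:

```python
# Sketch of the no-data signalling contract. fetch_step() is a hypothetical
# runtime-side wrapper; only ProviderNoDataError(LookupError) is from the doc.
import logging

log = logging.getLogger("topobathysim.runtime")


class ProviderNoDataError(LookupError):
    """Raised when a provider has no coverage for the requested cell."""


def fetch_step(provider, bbox):
    try:
        return provider.fetch_layer(bbox)
    except ProviderNoDataError:
        # Normal operating condition: skip this provider for this cell.
        log.debug("no coverage from %r for %r", provider, bbox)
        return None
    except Exception:
        # Genuine unexpected failure: keep the traceback at ERROR.
        log.exception("provider %r failed for %r", provider, bbox)
        raise
```

Subclassing LookupError lets callers that only care about “found it or not” catch the broader class without special-casing the runtime’s exception type.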

Provenance System

Every tile carries a full record of which source dataset contributed each pixel.

Provider → Dataset

Each fetch_layer() call returns an xr.Dataset containing two arrays:

  • elevation — float32 depth/height values

  • source_id — uint32 pixel map where each value identifies the contributing survey or sub-tile (provider-assigned IDs for BAG/BlueTopo; hash-derived IDs for other providers)

The dataset’s attrs["provenance_dict"] maps each source_id integer to a {"name": "...", "provider": "..."} dict.
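The per-provider return contract can be mimicked with a plain dict so the example stays dependency-light (the real object is an xr.Dataset); the key names, dtypes, and attrs["provenance_dict"] shape mirror the doc, while the survey name and ID are illustrative:

```python
# Sketch of a provider layer: "elevation" (float32), "source_id" (uint32),
# and attrs["provenance_dict"]. The survey name/ID are hypothetical.
import numpy as np


def make_layer_sketch(shape=(4, 4)):
    return {
        "elevation": np.full(shape, -12.5, dtype=np.float32),  # depth/height
        "source_id": np.full(shape, 7, dtype=np.uint32),       # per-pixel source
        "attrs": {
            "provenance_dict": {7: {"name": "H13455_MB", "provider": "ncei_bag"}}
        },
    }
```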

Runtime → Fused Dataset

As the runtime processes policy steps, it accumulates provenance_dict entries from all providers into a single cell_provenance_dict. The final fused xr.Dataset carries the merged provenance across all cells.
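The accumulation step amounts to a dict fold; merge_provenance and its keep-first collision guard are assumptions, since the doc implies IDs are unique across providers:

```python
# Sketch of folding each step's provenance_dict into one cell-wide mapping.
def merge_provenance(provenance_dicts):
    cell_provenance_dict = {}
    for d in provenance_dicts:
        for source_id, entry in d.items():
            # Keep the first definition; later steps should not redefine an ID.
            cell_provenance_dict.setdefault(source_id, entry)
    return cell_provenance_dict
```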

Service → Tile Sidecars

When a rendered tile is written to the tile cache, two sidecar files are written alongside it:

  • {y}_{hash}_meta.json — provenance_dict filtered to the source IDs present in this specific tile. Read by GET /tiles/{z}/{x}/{y}/metadata.

  • {y}_{hash}_src.npz — Compressed uint32 source_elevation array (512 × 512, padding stripped). Read by GET /tiles/{z}/{x}/{y}/pixel for per-pixel source lookups.

Viewer

The Leaflet popup fetches both /metadata and /pixel for each clicked location, displaying the pixel-level contributing survey (name + provider color swatch) and a deduplicated, priority-sorted list of all surveys present in the tile.

Caching Strategy

  1. Source Cache: Original files (TIFF, BAG, LAZ) downloaded from agencies (download providers only; streaming providers write directly to Zarr).

  2. Provider Zarr Cache: Intermediate rasterised chunks per provider, stored for fast repeated access.

  3. Fused Zarr Cache: Multi-provider composite grids keyed by {policy, cell_bbox, resolution, crs} hash.

  4. Tile Cache: Final XYZ PNG/NPY/NPZ tiles at ~/.cache/topobathysim/tiles/, with _meta.json and _src.npz sidecars.
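The fused-cache key in tier 3 can be sketched as a deterministic hash over the four fields listed above; the serialization, hash algorithm, and truncation length here are assumptions:

```python
# Sketch of a fused-cache key over {policy, cell_bbox, resolution, crs}.
import hashlib
import json


def fused_cache_key(policy_name, cell_bbox, resolution, crs):
    payload = json.dumps(
        {
            "policy": policy_name,
            "cell_bbox": list(cell_bbox),
            "resolution": resolution,
            "crs": crs,
        },
        sort_keys=True,  # stable key order -> deterministic across runs
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

Any change to the policy, cell, resolution, or CRS yields a different key, so a stale composite is never served for a modified request.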

See CLI Tools for cache tier management.