Geospatial Metric Taxonomy for ETL
Architecture
Geospatial ETL pipelines operate across distributed compute clusters, cloud storage tiers, and heterogeneous coordinate reference systems, demanding an observability baseline that captures both infrastructure state and geometric integrity. The foundation begins with establishing telemetry collection boundaries between ingestion nodes, transformation workers, and spatial indexers. In production, engineers must decouple metric emission from the data plane to prevent backpressure. Lightweight telemetry sidecars run alongside spatial operators (e.g., GDAL/OGR workers, PostGIS transform pods), routing structured logs, custom counters, and distributed traces to a centralized sink via asynchronous batched gRPC streams.
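The decoupling described above can be sketched as a bounded, non-blocking buffer that a background thread drains in batches. This is a minimal illustration using only the standard library, not the sidecar or gRPC transport itself: `BufferedEmitter`, its parameters, and the `sink` callable are hypothetical names, and a real deployment would hand batches to an exporter instead.

```python
import queue
import threading
from typing import Callable, List


class BufferedEmitter:
    """Buffers telemetry events off the data path and flushes them in batches."""

    def __init__(self, sink: Callable[[List[dict]], None], max_queue: int = 10_000,
                 batch_size: int = 100, flush_interval_s: float = 1.0) -> None:
        self._sink = sink  # hypothetical: stands in for a batched exporter
        self._queue: "queue.Queue[dict]" = queue.Queue(maxsize=max_queue)
        self._batch_size = batch_size
        self._flush_interval_s = flush_interval_s
        self._stop = threading.Event()
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def emit(self, event: dict) -> bool:
        """Never blocks the spatial operator; drops the event when the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _drain(self) -> List[dict]:
        """Pull at most one batch from the queue without blocking."""
        batch: List[dict] = []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch

    def _flush(self) -> None:
        """Send batches to the sink until the queue is empty."""
        batch = self._drain()
        while batch:
            self._sink(batch)
            batch = self._drain()

    def _run(self) -> None:
        while not self._stop.is_set():
            self._stop.wait(self._flush_interval_s)  # wake early on close()
            self._flush()

    def close(self) -> None:
        self._stop.set()
        self._worker.join()
        self._flush()  # catch events queued after the worker's last drain
```

Dropping on a full buffer is a deliberate trade-off in this sketch: losing a telemetry event is treated as cheaper than stalling a transformation worker, and the `dropped` count can itself be exported.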
When designing this layer, pipeline architects must map data flow against Defining Spatial Data Trust Boundaries to isolate exactly where coordinate transformations, topology validations, and schema mutations occur. Trust boundaries dictate metric authority: raw WKB ingestion emits baseline geometry counts and SRID validation flags, while post-join stages track topology preservation ratios and ring orientation compliance. In distributed deployments, regional edge caches, replicated tile servers, and cross-availability-zone replication introduce latency asymmetries that distort spatial freshness metrics. The architecture must standardize telemetry envelopes to preserve SRID metadata, bounding box extents, and feature density signals, ensuring downstream alerting correlates infrastructure health with geometric fidelity.
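One way to picture a standardized telemetry envelope is a small immutable record that every stage fills in before emitting metrics. The sketch below is an assumption about what such an envelope could hold; the class name, field names, and attribute keys are illustrative and not taken from any existing schema.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TelemetryEnvelope:
    """Hypothetical envelope carrying spatial context alongside a metric."""

    stage: str                                # e.g. "raw_ingest" or "post_join"
    srid: int                                 # EPSG code of the geometries
    bbox: Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y) in CRS units
    feature_count: int

    def __post_init__(self) -> None:
        min_x, min_y, max_x, max_y = self.bbox
        if min_x > max_x or min_y > max_y:
            raise ValueError("bbox must be ordered as (min_x, min_y, max_x, max_y)")
        if self.srid <= 0:
            raise ValueError("srid must be a positive integer")

    @property
    def feature_density(self) -> float:
        """Features per square CRS unit; 0.0 when the bbox has no area."""
        min_x, min_y, max_x, max_y = self.bbox
        area = (max_x - min_x) * (max_y - min_y)
        return self.feature_count / area if area > 0 else 0.0

    def to_attributes(self) -> Dict[str, str]:
        """Flatten to string attributes suitable for metric labels."""
        return {
            "pipeline.stage": self.stage,
            "spatial.srid": str(self.srid),
            "spatial.bbox": ",".join(f"{v:.6f}" for v in self.bbox),
        }
```

Validating the envelope at construction time means a stage with a missing SRID or an inverted bounding box fails loudly at its own trust boundary instead of corrupting downstream correlation.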
Metric Taxonomy
A geospatial metric taxonomy must extend beyond traditional row-count and latency measurements to capture spatial state transitions. The taxonomy is organized into four operational dimensions: structural, volumetric, temporal, and quality. Each metric includes a standardized naming convention, unit, and production-ready detection threshold.
```mermaid
flowchart LR
    T["Geospatial metric taxonomy"] --> S["Structural"]
    T --> V["Volumetric"]
    T --> M["Temporal"]
    T --> Q["Quality"]
    S --> S1["crs_mismatch_count"]
    S --> S2["geom_type_drift_ratio"]
    S --> S3["topology_error_rate"]
    V --> V1["feature_density_variance"]
    V --> V2["bbox_drift_degrees"]
    V --> V3["null_geom_ratio"]
    M --> M1["index_build_lag_ms"]
    M --> M2["transform_queue_backlog"]
    M --> M3["source_sync_delta_hours"]
    Q --> Q1["coordinate_precision_loss_meters"]
    Q --> Q2["self_intersection_count"]
```
| Dimension | Metric Key | Description | Unit | Warning Threshold | Critical Threshold |
|---|---|---|---|---|---|
| Structural | spatial.crs_mismatch_count | Features ingested with unexpected or undefined SRIDs | Count | > 50 per batch | > 200 per batch |
| | spatial.geom_type_drift_ratio | Ratio of unexpected geometry types (e.g., MultiPolygon vs Polygon) | % | > 2% | > 8% |
| | spatial.topology_error_rate | Invalid geometries (self-intersections, unclosed rings) post-validation | % | > 1% | > 5% |
| Volumetric | spatial.feature_density_variance | Standard deviation of feature count per spatial partition/tile | σ | > 3.0 | > 7.5 |
| | spatial.bbox_drift_degrees | Bounding box expansion after spatial joins or buffering | Degrees | > 0.001° | > 0.01° |
| | spatial.null_geom_ratio | Percentage of records with NULL or empty geometry payloads | % | > 0.5% | > 3% |
| Temporal | spatial.index_build_lag_ms | Time delta between feature commit and spatial index availability | ms | > 1500 | > 5000 |
| | spatial.transform_queue_backlog | Pending CRS conversion or topology validation tasks | Count | > 500 | > 2000 |
| | spatial.source_sync_delta_hours | Staleness relative to authoritative upstream feeds | Hours | > 2.0 | > 6.0 |
| Quality | spatial.coordinate_precision_loss_meters | RMS error introduced during projection shifts (e.g., WGS84 → UTM) | Meters | > 0.5 m | > 2.0 m |
| | spatial.self_intersection_count | Detected self-intersecting polygons/lines after snapping | Count | > 100 | > 1000 |
These metrics must align with Observability Scoping Rules for Vector Data, which mandate that telemetry collection respects feature hierarchy boundaries. Point, line, and polygon datasets require distinct baselines; linework pipelines track segment snapping tolerance violations, while polygon pipelines monitor ring orientation and hole containment ratios.
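To make two of the taxonomy entries concrete, the sketch below computes spatial.null_geom_ratio and spatial.coordinate_precision_loss_meters from plain Python data and classifies each value against its warning and critical thresholds. The function names are hypothetical, geometries are represented as opaque objects or `None`, and the RMS calculation assumes both coordinate lists are already in a metric CRS.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float]


def null_geom_ratio(geometries: Sequence[Optional[object]]) -> float:
    """Percentage of records whose geometry is None (empty-geometry checks omitted)."""
    if not geometries:
        return 0.0
    nulls = sum(1 for geom in geometries if geom is None)
    return 100.0 * nulls / len(geometries)


def rms_error_m(before: Sequence[Coord], after: Sequence[Coord]) -> float:
    """Root-mean-square displacement between paired coordinates, in meters."""
    if not before or len(before) != len(after):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = [
        (bx - ax) ** 2 + (by - ay) ** 2
        for (bx, by), (ax, ay) in zip(before, after)
    ]
    return math.sqrt(sum(squared) / len(squared))


def classify(value: float, warning: float, critical: float) -> str:
    """Map a metric value onto the taxonomy's strict '>' thresholds."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


# Usage: thresholds are the ones listed for each metric in the taxonomy.
records = [None] + [object()] * 199
print(classify(null_geom_ratio(records), warning=0.5, critical=3.0))
```

Because the thresholds use strict inequality, a value that sits exactly on the warning line (0.5% in the usage example) is still classified as ok.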
Pipeline Integration & Telemetry Routing
Integrating this taxonomy into existing ETL frameworks requires explicit OpenTelemetry instrumentation and deterministic metric routing. The following Python snippet demonstrates how to attach spatial attributes to OTel counters using the official SDK:
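```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader

# Pull-based reader: Prometheus scrapes the values collected by the SDK.
reader = PrometheusMetricReader()
provider = MeterProvider(metric_readers=[reader])
metrics.set_meter_provider(provider)
meter = metrics.get_meter("geospatial.etl")

# Track cumulative invalid geometry count (use Counter, not a ratio gauge)
invalid_geom_counter = meter.create_counter(
    "spatial.topology_invalid_total",
    description="Cumulative count of invalid geometries",
    unit="1",
)

# Denominator for deriving an error rate at query time (illustrative metric name)
features_counter = meter.create_counter(
    "spatial.features_processed_total",
    description="Cumulative count of features validated",
    unit="1",
)


def emit_topology_metrics(batch_id: str, srid: int, error_count: int, total_features: int) -> None:
    # batch.id is high-cardinality; consider dropping it if series counts grow.
    attributes = {
        "batch.id": batch_id,
        "spatial.srid": str(srid),
        "spatial.geom_type": "polygon",
        "pipeline.stage": "post_join_validation",
    }
    invalid_geom_counter.add(error_count, attributes=attributes)
    features_counter.add(total_features, attributes=attributes)
```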
For orchestration, pipeline health checks must run deterministically before downstream consumers read indexed tiles. Following Setting up spatial pipeline health checks in Airflow ensures that DAGs pause or reroute when spatial quality gates fail. A typical Airflow sensor configuration validates index freshness and topology thresholds before triggering tile generation; the pseudo-config below shows the topology check:
```yaml
# airflow_dag_spatial_gate.yaml (pseudo-config for documentation; wire via Python DAG)
spatial_quality_gate:
  task_type: PythonSensor
  task_id: validate_spatial_integrity
  timeout: 1800
  mode: poke
  retries: 2
  retry_delay: 120
  op_kwargs:
    prometheus_url: "http://prometheus:9090/api/v1/query"
    # Alert when topology error rate exceeds 5% for the ETL pipeline
    query: "spatial_topology_error_rate{pipeline='etl_national_boundaries'} > 0.05"
    fail_on_match: true
```
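Since the pseudo-config is meant to be wired through a Python DAG, the callable behind the sensor might look like the sketch below. It assumes the Prometheus HTTP API's instant-query response shape (`status`, `data.result`); the function name `spatial_gate_passes` and the injectable `fetch` parameter are hypothetical, with `fail_on_match` mirroring the key in the pseudo-config.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable


def _http_get_json(url: str) -> dict:
    """Default fetcher: GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def spatial_gate_passes(
    prometheus_url: str,
    query: str,
    fail_on_match: bool = True,
    fetch: Callable[[str], dict] = _http_get_json,  # injectable for tests
) -> bool:
    """Return True when no series matches the threshold query."""
    url = f"{prometheus_url}?{urllib.parse.urlencode({'query': query})}"
    payload = fetch(url)
    if payload.get("status") != "success":
        return False  # query failed; let the sensor poke again
    matches = payload["data"]["result"]
    if matches and fail_on_match:
        # A comparison query returns only the series that breach the threshold.
        raise RuntimeError(f"spatial quality gate failed for {len(matches)} series")
    return not matches
```

In an Airflow DAG this function would typically be passed as `python_callable` to `PythonSensor`, with the `op_kwargs` from the pseudo-config supplying `prometheus_url`, `query`, and `fail_on_match`. Raising on a match fails the task attempt instead of poking until the timeout, which is one reading of what `fail_on_match: true` intends.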
Detection Thresholds & Alerting
Thresholds must be calibrated to the spatial resolution and use-case of the dataset. High-precision cadastral pipelines require tighter bounds than continental-scale rasterized vector layers. Alerting rules should be tiered:
- Page (P1): spatial.crs_mismatch_count exceeds critical threshold, or spatial.source_sync_delta_hours > 6.0. Indicates broken ingestion or upstream feed failure.
- Ticket (P2): spatial.topology_error_rate > 5% or spatial.index_build_lag_ms > 5000. Requires engineering review of transformation workers or index rebuild jobs.
- Dashboard Warning (P3): spatial.feature_density_variance > 3.0 or spatial.bbox_drift_degrees > 0.001°. Signals potential data skew or inefficient spatial join predicates.
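The tiering above can be expressed as an ordered rule list that returns the most severe tier a metric snapshot triggers. The sketch below is illustrative only: `TIER_RULES` and `highest_tier` are hypothetical names, the limits are copied from the three tiers, and rate values are expressed in percent.

```python
from typing import Dict, List, Optional, Tuple

# Ordered from most to least severe; limits mirror the tier definitions.
TIER_RULES: List[Tuple[str, Dict[str, float]]] = [
    ("P1", {"spatial.crs_mismatch_count": 200,
            "spatial.source_sync_delta_hours": 6.0}),
    ("P2", {"spatial.topology_error_rate": 5.0,
            "spatial.index_build_lag_ms": 5000}),
    ("P3", {"spatial.feature_density_variance": 3.0,
            "spatial.bbox_drift_degrees": 0.001}),
]


def highest_tier(snapshot: Dict[str, float]) -> Optional[str]:
    """Return the most severe tier whose limits are exceeded, or None."""
    for tier, limits in TIER_RULES:
        for metric, limit in limits.items():
            # Metrics missing from the snapshot are treated as not breaching.
            if metric in snapshot and snapshot[metric] > limit:
                return tier
    return None


# Usage: a stale upstream feed outranks a simultaneous density warning.
snapshot = {
    "spatial.source_sync_delta_hours": 7.5,
    "spatial.feature_density_variance": 4.2,
}
print(highest_tier(snapshot))
```

Evaluating tiers in severity order keeps a page from being masked by a lower-priority signal that fires at the same time.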
Prometheus recording rules should pre-aggregate spatial metrics to reduce query load during incident response:
```yaml
# prometheus-rules.yaml
groups:
  - name: spatial_etl_recording
    rules:
      # Average the error-rate gauge; avg_over_time on a cumulative counter
      # would not yield a ratio comparable to 0.05.
      - record: spatial:topology_error_rate:avg5m
        expr: avg_over_time(spatial_topology_error_rate[5m])
  - name: spatial_etl_alerts
    rules:
      - alert: SpatialTopologyDegradation
        expr: spatial:topology_error_rate:avg5m > 0.05
        for: 10m
        labels:
          severity: critical
          team: gis-platform
        annotations:
          summary: "Spatial topology error rate exceeds 5% for {{ $labels.pipeline }}"
          description: "Check CRS alignment and snapping tolerance in the transformation worker logs."
```
Troubleshooting Spatial Metric Lag
When metrics drift from actual pipeline state, engineers must isolate whether the lag originates in telemetry collection, metric aggregation, or spatial processing bottlenecks. Follow this diagnostic sequence:
- Verify OTel Exporter Flush Intervals: Export buffering can hold telemetry back before it reaches the sink. For traces, the batch span processor flushes every 5 seconds by default; reduce OTEL_BSP_MAX_EXPORT_BATCH_SIZE to 50 and OTEL_BSP_SCHEDULE_DELAY to 2000 for near-real-time visibility. The OTEL_BSP_* variables do not affect counters: push-based metric export is governed by OTEL_METRIC_EXPORT_INTERVAL (60000 ms by default), and with a pull-based Prometheus reader the scrape interval sets the delay.
- Cross-Region Sync Validation: In federated deployments, cross-AZ replication queues can delay metric propagation. Ensure Prometheus federation endpoints scrape regional aggregators with honor_timestamps: true to preserve original emission times.
- CRS Conversion Bottlenecks: High spatial.transform_queue_backlog paired with stable CPU metrics often indicates thread contention in projection libraries. Profile GDAL/OGR workers with perf record -g and verify that PROJ_NETWORK is disabled in air-gapped environments to prevent HTTP timeout stalls.
- Fallback Chain Activation: When primary spatial APIs degrade, pipelines should route to cached geometry stores. Monitor spatial.fallback_activation_ratio and ensure fallback payloads carry identical SRID and precision metadata to prevent silent quality degradation.
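The fallback check in the last step can be sketched as a metadata comparison plus a ratio calculation. Everything here is an assumption for illustration: `PayloadMeta`, its fields, and both function names are hypothetical, and precision is modeled simply as a number of decimal places.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PayloadMeta:
    """Hypothetical metadata attached to a geometry payload."""

    srid: int
    coordinate_precision: int  # decimal places retained in coordinates


def fallback_is_equivalent(primary: PayloadMeta, fallback: PayloadMeta) -> bool:
    """True only when the fallback matches the primary's SRID and precision."""
    return (
        primary.srid == fallback.srid
        and primary.coordinate_precision == fallback.coordinate_precision
    )


def fallback_activation_ratio(fallback_requests: int, total_requests: int) -> float:
    """Fraction of requests served from the fallback store."""
    if total_requests <= 0:
        return 0.0
    return fallback_requests / total_requests


# Usage: a cached store truncated to 5 decimals is flagged as non-equivalent.
primary = PayloadMeta(srid=4326, coordinate_precision=7)
cached = PayloadMeta(srid=4326, coordinate_precision=5)
print(fallback_is_equivalent(primary, cached))
```

Requiring exact equality is the strictest interpretation; a pipeline that tolerates lower precision from the cache would compare with a minimum instead.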
For persistent lag, validate that vector scoping rules are enforced at the collector level. Misconfigured attribute filters can drop high-cardinality spatial labels, causing metric aggregation collisions and artificial latency spikes. Always correlate spatial.index_build_lag_ms with database pg_stat_activity to confirm that index creation isn’t blocked by long-running transaction snapshots.
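The pg_stat_activity correlation can be sketched as a small helper that lists long-running transactions through any DB-API cursor. The query uses standard pg_stat_activity columns; the helper name, the 300-second default, and the `%s` placeholder style (as used by psycopg) are assumptions.

```python
from typing import Any, List, Sequence

# Transactions open longer than the given number of seconds, oldest first.
LONG_TXN_SQL = """
SELECT pid, state, now() - xact_start AS xact_age, left(query, 120) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND xact_start < now() - make_interval(secs => %s)
ORDER BY xact_start
"""


def find_long_transactions(cursor: Any, min_age_seconds: float = 300.0) -> List[Sequence[Any]]:
    """Return rows for transactions that may be holding back index builds."""
    cursor.execute(LONG_TXN_SQL, (min_age_seconds,))
    return cursor.fetchall()
```

If this returns rows while spatial.index_build_lag_ms is climbing, the old snapshots they hold are a plausible cause worth ruling out before tuning the telemetry path.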