Managed multi-tenant inference for teams running CV on gigapixel aerial imagery. Bring your own model or start with a tuned YOLO baseline. Per-tenant cost attribution — GPU-seconds, tiles, latency — baked into the API, not bolted on in Grafana.
Tile-aware scheduling, georeferenced outputs, GPU sharing — running on Heronflux right now.
Python SDK, HTTP, or async webhook. Tag every job with a tenant, and Heronflux meters usage per tile and per GPU-second automatically.
# pip install heronflux
from heronflux import Client

client = Client(api_key=...)

job = client.jobs.create(
    model="heronflux/solar-defects-yolov11",
    inputs=["s3://acme/flights/2026-05/*.tif"],
    tenant="midwest-power-and-light",
    geo=True,
)

result = client.jobs.wait(job.id)
# result.detections: GeoJSON
# result.usage: per-tenant cost & tiles
Horizontal inference platforms (Modal, Replicate, SageMaker) don't know what a tile is. Vertical drone-data SaaS doesn't expose the inference layer. Heronflux is the missing layer between them, built for aerial CV and only aerial CV.
Tile-aware batching, weight caching across tenants, and GPU sharing with no quality loss. The substrate that makes per-customer inference viable instead of margin-eating.
Tiling, projection, and georeferencing are first-class. Inputs can be 50,000×50,000 pixels without you writing a single line of tiling code.
Per-tenant GPU quotas, weight isolation, and line-item cost attribution. Bill your end-customers what they actually consumed — to the cent.
Start in minutes with a tuned YOLO baseline, or push your own ONNX, TorchScript, or container. Pin a version per tenant. Cold-start, weight caching, and rollouts handled — you focus on the model.
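To make the tiling claim concrete, here is a minimal sketch of how an overlapping tile grid over a very large raster might be computed. It is not Heronflux's scheduler: the function names `tile_windows` and `_starts`, the 1024-pixel tile size, and the 128-pixel overlap are all illustrative assumptions.

```python
from typing import Iterator, List, Tuple

# (x_offset, y_offset, width, height), all in pixels
Window = Tuple[int, int, int, int]


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Tile start offsets along one axis that cover [0, length) completely."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    # Place the final tile flush with the image edge so no tile is clipped.
    starts.append(length - tile)
    return starts


def tile_windows(
    width: int, height: int, tile: int = 1024, overlap: int = 128
) -> Iterator[Window]:
    """Yield overlapping windows covering a width x height raster.

    Hypothetical helper: tile size and overlap are assumed defaults,
    not values taken from Heronflux.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    for y in _starts(height, tile, stride):
        for x in _starts(width, tile, stride):
            yield (x, y, min(tile, width - x), min(tile, height - y))
```

With these assumed defaults, a 50,000×50,000 input produces a 56×56 grid, or 3,136 windows. Adjacent windows share at least `overlap` pixels, so an object that straddles a tile boundary appears whole in at least one tile as long as it is no larger than the overlap.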
Drop it into your existing pipeline; don't replace it.
Start with the tuned YOLO baseline, or push your own weights as ONNX, TorchScript, or a container. Tag a version per tenant. Heronflux handles cold-start, weight caching, and rollouts.
COG, GeoTIFF, S3 URI, or a flight folder. Heronflux tiles with the right overlap, schedules across the fleet, and respects per-tenant quotas automatically.
Bounding boxes, masks, and classifications — all reprojected to the source CRS. GeoJSON, COG, or streamed to your pipeline via webhook.
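As an illustration of what "georeferenced output" involves, the sketch below maps a bounding box from tile pixel coordinates to a GeoJSON polygon in the raster's own CRS. It assumes a six-element affine geotransform in GDAL ordering; the helper names `pixel_to_geo` and `bbox_to_feature` and the property keys are hypothetical, and this is not necessarily how Heronflux structures its output.

```python
from typing import Dict, List, Sequence, Tuple


def pixel_to_geo(gt: Sequence[float], col: float, row: float) -> Tuple[float, float]:
    """Apply a GDAL-ordered affine geotransform to a pixel coordinate.

    gt = (x_origin, pixel_width, row_rotation,
          y_origin, col_rotation, pixel_height)
    pixel_height is negative for north-up imagery.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return (x, y)


def bbox_to_feature(
    gt: Sequence[float],
    tile_offset: Tuple[int, int],
    bbox: Tuple[float, float, float, float],
    properties: Dict,
) -> Dict:
    """Convert a tile-local pixel bbox (x0, y0, x1, y1) to a GeoJSON Feature.

    Coordinates stay in the raster's CRS. Strict RFC 7946 GeoJSON expects
    WGS 84, so consumers need to be told which CRS applies.
    """
    x_off, y_off = tile_offset
    x0, y0, x1, y1 = bbox
    # Top-left, bottom-left, bottom-right, top-right, then close the ring.
    # For north-up imagery this order is counterclockwise in map space.
    corners = [(x0, y0), (x0, y1), (x1, y1), (x1, y0), (x0, y0)]
    ring: List[List[float]] = [
        list(pixel_to_geo(gt, x_off + c, y_off + r)) for c, r in corners
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties,
    }
```

The tile offset is added before the transform is applied, so detections from any tile land in the coordinate frame of the full image rather than the tile.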
From solar O&M scans to autonomous perimeter security — anywhere aerial imagery hits a model and a tenant gets billed.
Heronflux tracks GPU-time, tile-count, and inference latency per tenant — automatically. Bill your end-customers from real consumption data, not estimates.
Stop over-provisioning to cover the long tail. Start running tight margins on a fleet that does only what it needs to.
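The per-tenant metering described above can be pictured as a roll-up from job-level usage records to invoice line items. The sketch below is a rough illustration only: the record fields (`tenant`, `gpu_seconds`, `tiles`), the two rates, and the function `invoice_lines` are hypothetical and do not reflect the actual shape of `result.usage` or Heronflux pricing.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, Iterable

# Hypothetical rates, chosen only to make the arithmetic visible.
GPU_SECOND_RATE = Decimal("0.0004")  # currency units per GPU-second
TILE_RATE = Decimal("0.00002")       # currency units per tile


def invoice_lines(records: Iterable[Dict]) -> Dict[str, Dict]:
    """Aggregate per-job usage records into one line item per tenant.

    Each record is assumed to look like:
        {"tenant": str, "gpu_seconds": float, "tiles": int}
    """
    totals = defaultdict(lambda: {"gpu_seconds": Decimal("0"), "tiles": 0})
    for rec in records:
        entry = totals[rec["tenant"]]
        # Go through str() so binary float noise does not leak into billing.
        entry["gpu_seconds"] += Decimal(str(rec["gpu_seconds"]))
        entry["tiles"] += rec["tiles"]

    lines = {}
    for tenant, entry in totals.items():
        amount = entry["gpu_seconds"] * GPU_SECOND_RATE + entry["tiles"] * TILE_RATE
        lines[tenant] = {
            "gpu_seconds": entry["gpu_seconds"],
            "tiles": entry["tiles"],
            # Round to the cent only once, at the very end.
            "amount": amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP),
        }
    return lines
```

Using `Decimal` and rounding once per line item keeps the totals reproducible, which matters when the numbers end up on a customer invoice.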
See how Heronflux handles your imagery, your models, your tenants. We'll walk you through a live workload in 30 minutes.