rasterioxyz

In my previous role, we regularly dealt with large swathes of commercial very-high-resolution (VHR) satellite and aerial imagery. Not only was this data required by in-house imagery analysts for property damage assessment but for display on the client-facing insurance exposure and claims platform. Such imagery was typically divided into constituent tiles, significant in total volume, in less common format (e.g., .tif and .tfw, MrSID), and projected in a local CRS (e.g., UTM zones). With this in mind, the third-party mapping service employed by the company - where a majority of client-facing data was hosted - impose several constraints on the data uploaded to their platform:

  • Must be byte (uint8) data type
  • Must be GeoTIFF format
  • Must be projected in either EPSG:3857 or EPSG:4326, though data projected in the latter is reprojected on the service-side
  • Must be less than 300 MB when uploaded through the web platform or 10 GB through the API
  • Must upload and process within a one hour timeout when uploaded through the web platform
  • Must be uploaded as a whole and cannot be appended to an existing dataset
  • Lengthy processing on the service-side comprising tiling of data into a proprietary format
  • Often longer than expected upload times, especially through the API where large datasets appeared to stall
  • Hosting costs, albeit with a generous free tier
These combined necessitated identification of a fast, inexpensive alternative. Despite having used XYZ tiles through the entirety of my career thus far, as well as much of my geospatial education, it was discovery of the simplicity in serving these from cloud storage that they became the clear choice.

In practice, these were generated within QGIS, written to local storage, then uploaded to Azure Blob Storage via the Azure SDK for Python, AzCopy, or Azure Storage Explorer. While largely fire-and-forget tasks, this workflow required manual intervention between steps. Furthermore, with larger datasets, QGIS' CPU and memory use during tiling often made the machine on which it was running near unusable.

Inspired by a need to automate the tiling of commercial high-resolution satellite and aerial imagery, rasterioxyz is a lightweight, low-dependency Python package for tiling georeferenced rasterio datasets according to the XYZ tiles standard. Most, if not all, geospatial professionals will have used XYZ tiles at some point, be it OpenStreetMap or Google Satellite. Many of these may have created their own XYZ dataset(s) at another. It is my hope that this package may be of use to those seeking a clean, efficient, and easily integrated way to do so in their own applications.


Design

While reprojecting the entire source image at the maximum resolution required, thereof dictated by the maximum zoom level specified, would result in faster tiling, this represents a considerable potential source of memory issues. Such an approach would preclude the tiling of large images, be they large due to spatial resolution, data type, number of bands, and/or area covered. By lazily reading and, if needed, reprojecting windows of the source dataset at a given tile's resolution, memory use is kept low.
Data of all projections and data types can be tiled with no alterations made to the original.


Comparisons

This comparison employs the same Sentinel-2 L2A tile (T17RLK) for an area of western Florida and the Gulf of Mexico, shortly after the passage of Hurricane Ian. Natively projected in EPSG:32617 (WGS 84 / UTM zone 17N), QGIS' on-the-fly projection could be advantageous regarding tiling speed. However, as can be seen in the below images, QGIS-generated tiles introduce not insignificant shifting in pixel locations and artefacts from resampling.

Source image (EPSG:32617)

Source image, OpenStreetMap overlay

QGIS zoom level 12

QGIS zoom level 12, OpenStreetMap overlay

rasterioxyz zoom level 12

rasterioxyz zoom level 12, OpenStreetMap overlay


While - at present - rasterioxyz is several times slower than QGIS' equivalent functionality, rasterioxyz utilises just one thread compared to QGIS' 16 in testing on the same machine. Investigation of threads' effectiveness in rasterioxyz's reprojection and writing functionality is high priority.

Roadmap

A number of improvements to rasterioxyz are possible and/or planned:

  • Enable tiling within a user-specified subset of a dataset
  • Enable user selection of raster bands/channels to tile
  • Reduce the write time of tile images, ideally without further dependencies
  • Resume/continue mode for interrupted tiling operations
  • Greater support for different argument values/types (e.g., int/float for resampling)
  • Command line functionality
  • In-built support for AWS, Azure, and GCP cloud storage write, likely through requests to reduce further dependencies
  • Enable tiling of sequences of rasterio datasets
  • Generate HTML map of tiled data


-->