Introduce a Notebook gallery image to process Geospatial data from Planetary Computer with STAC API

Introduction 

We are introducing a spaceborne data processing Notebook, published to the Azure Synapse Analytics Gallery. The Notebook uses the STAC (SpatioTemporal Asset Catalog) API to search and download geospatial data from Microsoft Planetary Computer to an Azure Storage account, and performs a basic geospatial transformation. 

 

What is Azure Orbital Analytics? 

Azure Orbital Analytics is a set of capabilities that use spaceborne data and AI to help you discover and distribute valuable insights. More specifically, Azure Orbital Analytics provides the ability to downlink spaceborne data from Azure Orbital Ground Station (AOGS), first- or third-party archives, or customer-acquired data directly into Azure. The raw spaceborne sensor data is then converted into analysis-ready data by Azure Orbital Analytics processing pipelines. 

 

What problem are we solving? 

Spaceborne data originating from remote sensors is characterized by high volume, high velocity, and a high degree of variety. The core problem we strive to solve is how to ingest and store massive volumes of spaceborne data from different sources, transform and process that data in a repeatable workflow, and integrate with best-in-class ML models to drive analysis and the extraction of business insights. This Notebook demonstrates the capability to ingest and transform spaceborne data in an automated workflow. We'll cover the integration of AI/ML in a future blog. 

 

Simple Image Procurement and Processing from the Planetary Computer STAC API 

We will run through the steps involved in defining an area of interest, searching for data over it, downloading the data, and performing a simple geospatial transformation. Please check out the demo video for a detailed walkthrough of running the Notebook in Synapse Studio. 

 

i. Pre-requisites 

Before starting the steps in this Notebook, there are a few things to set up: 

 

a. A Storage account (Data Lake Storage Gen2) 

 

ii. Setup 

This Notebook requires that your Synapse Workspace instance be set up with: 

 

a. Python packages installed through a requirements.yml file 

 

Upload the requirements.yml file below to your Spark pool to install the required Python package pre-requisites by following the steps in this document. 

 

 

 

 

name: demo
channels:
  - conda-forge
  - defaults
dependencies:
  - pystac-client
  - rich
  - planetary-computer
  - gdal
  - pip:
    - geopandas

 

 

 

 

b. Create a Linked Service to the Storage Account by following the steps in this document.

 

c. Create a container called my-container in your storage account.
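
If you prefer to script this step instead of using the Azure portal, the sketch below creates the container with the azure-storage-blob SDK. This is a minimal illustration, not part of the published Notebook; the account URL is a placeholder to replace with your own.

from azure.core.exceptions import ResourceExistsError
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Placeholder URL -- substitute your own storage account name.
account_url = "https://mygen2account.blob.core.windows.net"

# DefaultAzureCredential resolves whatever identity you are signed in with
# (Azure CLI, managed identity, environment variables, ...).
client = BlobServiceClient(account_url, credential=DefaultAzureCredential())

try:
    client.create_container("my-container")
except ResourceExistsError:
    pass  # the container already exists, nothing to do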

 

Reading Data from the STAC API

 

The Planetary Computer catalogs the datasets we host using the STAC (SpatioTemporal Asset Catalog) specification. It provides a STAC API endpoint that can be used to search the datasets by space, time, and more.

 

from pystac_client import Client

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

 

 

 

 

{"session_id":8,"execution_finish_time":"2022-06-20T04:33:04.4973641Z","session_start_time":"2022-06-20T04:28:52.3428885Z","livy_statement_state":"available","state":"finished","execution_start_time":"2022-06-20T04:33:01.7453198Z","queued_time":"2022-06-20T04:28:52.2842171Z","statement_id":1,"spark_pool":"pystacpool"}

 

 

 

 

Mount Azure Storage Account

Azure Synapse Analytics has two new mount/unmount APIs in the mssparkutils package. You can use them to mount/attach remote storage (Blob, Gen2, Azure File Share) to all working nodes (the driver node and worker nodes); after that, you can access data in storage as if it were on the local file system, using local file APIs.


Substitute the values of your Storage account, Container and Linked Service in the next code section.


from notebookutils import mssparkutils

 

 

 

 

mssparkutils.fs.unmount("/test")
mssparkutils.fs.mount(
    "abfss://[email protected]",
    "/test",
    {"linkedService": "mygen2account"}
)

Searching

 

 

 

 

Using the STAC API, you can search for assets meeting certain criteria. These range from the most common criteria for geospatial data, such as spatial extent and date/time, to less common attributes such as cloud cover.
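
If you want to push such an attribute filter into the search itself, pystac-client supports it through the query parameter of catalog.search. The sketch below is a hypothetical variant of the search we run next, with an arbitrary 10% cloud-cover ceiling added:

# Same search as below, but additionally requiring less than 10% cloud
# cover so that cloudy scenes are filtered out server-side.
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-95.42668576006342, 29.703952013356414,
          -95.27826526902945, 29.745286282512772],
    datetime="2021-12-01/2022-12-31",
    query={"eo:cloud_cover": {"lt": 10}},
)
clear_items = search.get_all_items()
print(len(clear_items))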


In this example, we'll search for Sentinel-2 Level-2A imagery of the area around Houston, Texas.

 

 

 

 

time_range = "2021-12-01/2022-12-31" # Bounding box must be within (-180, -90, 180, 90) bbox = [-95.42668576006342, 29.703952013356414, -95.27826526902945, 29.745286282512772] search = catalog.search(collections=["sentinel-2-l2a"], bbox=bbox, datetime=time_range) items = search.get_all_items() len(items) {"session_id":8,"execution_finish_time":"2022-06-20T04:35:22.5309275Z","session_start_time":null,"livy_statement_state":"available","state":"finished","execution_start_time":"2022-06-20T04:35:20.7033969Z","queued_time":"2022-06-20T04:35:05.6668302Z","statement_id":3,"spark_pool":"pystacpool"} 100

 

 

 

 

Each pystac.Item (https://pystac.readthedocs.io/en/stable/api/pystac.html#pystac.Item) in this ItemCollection includes all the metadata for that scene. STAC Items are GeoJSON Features, so they can be loaded by libraries like geopandas.

 

 

 

 

import geopandas

df = geopandas.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326")
df

The resulting GeoDataFrame has 100 rows and 34 columns: the scene geometry plus metadata columns such as datetime, platform, proj:epsg, instruments, s2:mgrs_tile, constellation, and eo:cloud_cover.


Next, we select the item with the lowest cloud cover:

 

 

 

 

selected_item = min(items, key=lambda item: item.properties["eo:cloud_cover"])
selected_item

<Item id=S2A_MSIL2A_20220401T165851_R069_T15RTP_20220402T055130>

Next, we load the preview image from the selected item and display it using the Image class. This shows us a preview of the area of interest.

from IPython.display import Image

Image(url=selected_item.assets["rendered_preview"].href, width=500)

Each STAC Item also exposes a set of downloadable assets; we can list their keys and titles in a table:

import rich.table

table = rich.table.Table("Asset Key", "Description")
for asset_key, asset in selected_item.assets.items():
    table.add_row(asset_key, asset.title)
table

┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Asset Key          ┃ Description                           ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ AOT                │ Aerosol optical thickness (AOT)       │
│ B01                │ Band 1 - Coastal aerosol - 60m        │
│ B02                │ Band 2 - Blue - 10m                   │
│ B03                │ Band 3 - Green - 10m                  │
│ B04                │ Band 4 - Red - 10m                    │
│ B05                │ Band 5 - Vegetation red edge 1 - 20m  │
│ B06                │ Band 6 - Vegetation red edge 2 - 20m  │
│ B07                │ Band 7 - Vegetation red edge 3 - 20m  │
│ B08                │ Band 8 - NIR - 10m                    │
│ B09                │ Band 9 - Water vapor - 60m            │
│ B11                │ Band 11 - SWIR (1.6) - 20m            │
│ B12                │ Band 12 - SWIR (2.2) - 20m            │
│ B8A                │ Band 8A - Vegetation red edge 4 - 20m │
│ SCL                │ Scene classfication map (SCL)         │
│ WVP                │ Water vapour (WVP)                    │
│ visual             │ True color image                      │
│ preview            │ Thumbnail                             │
│ safe-manifest      │ SAFE manifest                         │
│ granule-metadata   │ Granule metadata                      │
│ inspire-metadata   │ INSPIRE metadata                      │
│ product-metadata   │ Product metadata                      │
│ datastrip-metadata │ Datastrip metadata                    │
│ tilejson           │ TileJSON with default rendering       │
│ rendered_preview   │ Rendered preview                      │
└────────────────────┴───────────────────────────────────────┘

 

 

 

 

Download the Raster Data


The GeoTIFF cannot be downloaded from the Planetary Computer using the asset link directly, as doing so results in a 404. That's because the Planetary Computer uses Azure Blob Storage SAS tokens to control access to the data.


To get a token for access, you can use the Planetary Computer's Data Authentication API. You can access it anonymously, or you can provide an API key for higher rate limits and longer-lived tokens.
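
If you do have an API key, the planetary-computer library lets you register it once per session before signing any assets. A minimal sketch; the key value is a placeholder:

import planetary_computer

# Placeholder -- substitute the API key from your Planetary Computer account.
planetary_computer.settings.set_subscription_key("<your-api-key>")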


Using the SAS token generated via the planetary-computer library, we download the data to the storage account for further processing.

 

 

 

 

import planetary_computer
import requests

jobId = mssparkutils.env.getJobId()

signed_href = planetary_computer.sign(selected_item).assets["visual"].href
print(signed_href)

response = requests.get(signed_href)
open("/synfs/%s/test/sample_image.tif" % jobId, 'wb').write(response.content)

 

 

 

 

Read the Raster Data Information


We start by reading the GDAL dataset information. To do this, we use the gdal.Info() method on the downloaded GeoTIFF file. gdalinfo reports the format driver used to access the file, the raster size, the coordinate system, and so on.


The raster data, in the form of the GeoTIFF we saved to the storage account in the previous step by calling the STAC API directly, is read from the mounted file path, which starts with the synfs prefix. The storage account is mounted under a jobId specific to the run.
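
If you'd rather not build the synfs path by hand, mssparkutils can resolve the local path backing a mount point. A small sketch, assuming the /test mount created earlier:

from notebookutils import mssparkutils

# Resolve the local file-system path that backs the /test mount.
local_path = mssparkutils.fs.getMountPath("/test")
print(local_path)  # e.g. /synfs/<jobId>/test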


from osgeo import gdal

 

 

 

 

gdal.UseExceptions()

dataset_info = gdal.Info("/synfs/%s/test/sample_image.tif" % jobId)
print(dataset_info)

Driver: GTiff/GeoTIFF
Files: /synfs/8/test/sample_image.tif
Size is 10980, 10980
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 15N",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 15N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-93,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Between 96°W and 90°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - Manitoba; Nunavut; Ontario. Ecuador -Galapagos. Guatemala. Mexico. United States (USA)."],
        BBOX[0,-96,84,-90]],
    ID["EPSG",32615]]
Data axis to CRS axis mapping: 1,2
Origin = (199980.000000000000000,3400020.000000000000000)
Pixel Size = (10.000000000000000,-10.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=DEFLATE
  INTERLEAVE=PIXEL
  LAYOUT=COG
  PREDICTOR=2
Corner Coordinates:
Upper Left  (  199980.000, 3400020.000) ( 96d 7'55.26"W, 30d41'42.99"N)
Lower Left  (  199980.000, 3290220.000) ( 96d 6' 2.95"W, 29d42'21.14"N)
Upper Right (  309780.000, 3400020.000) ( 94d59'11.49"W, 30d43' 4.32"N)
Lower Right (  309780.000, 3290220.000) ( 94d58' 0.17"W, 29d43'39.32"N)
Center      (  254880.000, 3345120.000) ( 95d32'47.47"W, 30d12'46.47"N)
Band 1 Block=512x512 Type=Byte, ColorInterp=Red
  NoData Value=0
  Overviews: 5490x5490, 2745x2745, 1372x1372, 686x686, 343x343
Band 2 Block=512x512 Type=Byte, ColorInterp=Green
  NoData Value=0
  Overviews: 5490x5490, 2745x2745, 1372x1372, 686x686, 343x343
Band 3 Block=512x512 Type=Byte, ColorInterp=Blue
  NoData Value=0
  Overviews: 5490x5490, 2745x2745, 1372x1372, 686x686, 343x343

 

 

 

 

Convert GeoTIFF to PNG


The conversion from GeoTIFF to PNG can be performed using GDAL Translate. The gdal_translate utility converts raster data between different formats, and can also perform operations such as subsetting, resampling, and rescaling pixels in the process.


The resultant PNG file is saved back to the Storage Account.

 

 

 

 

tiff_in = "/synfs/%s/test/sample_image.tif" % jobId png_out = "/synfs/%s/test/sample_image.png" % jobId options = gdal.TranslateOptions(format='PNG') gdal.Translate(png_out, tiff_in, options=options) {"session_id":8,"execution_finish_time":"2022-06-20T05:47:19.3033767Z","session_start_time":null,"livy_statement_state":"available","state":"finished","execution_start_time":"2022-06-20T05:46:23.6883292Z","queued_time":"2022-06-20T05:46:23.5222257Z","statement_id":34,"spark_pool":"pystacpool"} <osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x7f384effd120> >

 

 

 

 
