
sentinel_2 #

BestProductsForFeatures #

BestProductsForFeatures(
    sentinel2_tiling_grid: GeoDataFrame,
    sentinel2_tiling_grid_column: str,
    vector_features: GeoDataFrame,
    vector_features_column: str,
    date_ranges: list[str] | None = None,
    max_cloud_cover: int = 5,
    max_no_data_value: int = 5,
    logger: Logger = LOGGER,
)

Class that facilitates and automates searching for Sentinel 2 products, using the Sentinel 2 tiling grid as a reference.

A current limitation is that each vector feature must fit, or be completely contained, inside a single Sentinel 2 tile.

For larger features, a mosaic of products is necessary.

This class was conceived first and foremost to be used with numerous smaller vector features, such as polygon grids created with geospatial_tools.vector.create_vector_grid.
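
A minimal usage sketch, assuming the tiling grid and a polygon grid of features are already available as files (the file names, column values and date parameters below are illustrative):

```python
from pathlib import Path

import geopandas as gpd

from geospatial_tools.planetary_computer.sentinel_2 import BestProductsForFeatures

# Illustrative inputs: the Sentinel 2 tiling grid and a grid of small polygon features
sentinel2_grid = gpd.read_file("sentinel2_tiling_grid.gpkg")
feature_grid = gpd.read_file("vector_grid.gpkg")

best_products = BestProductsForFeatures(
    sentinel2_tiling_grid=sentinel2_grid,
    sentinel2_tiling_grid_column="name",
    vector_features=feature_grid,
    vector_features_column="s2_tiles",
    max_cloud_cover=5,
)

# Search March through May of 2023 and 2024, then pick the best product for each feature
best_products.create_date_ranges(start_year=2023, end_year=2024, start_month=3, end_month=5)
best_products.find_best_complete_products()
features_with_products = best_products.select_best_products_per_feature()
best_products.to_file(output_dir=Path("results"))
```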

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `sentinel2_tiling_grid` | `GeoDataFrame` | GeoDataFrame containing the Sentinel 2 tiling grid | *required* |
| `sentinel2_tiling_grid_column` | `str` | Name of the column in `sentinel2_tiling_grid` that contains the tile names (e.g. `10SDJ`) | *required* |
| `vector_features` | `GeoDataFrame` | GeoDataFrame containing the vector features for which the best Sentinel 2 products will be chosen | *required* |
| `vector_features_column` | `str` | Name of the column in `vector_features` to which the best Sentinel 2 products will be written | *required* |
| `date_ranges` | `list[str] \| None` | Date ranges used to search for Sentinel 2 products. Should be created separately with `geospatial_tools.utils.create_date_range_for_specific_period`, or with `BestProductsForFeatures.create_date_ranges` after initialization. | `None` |
| `max_cloud_cover` | `int` | Maximum percentage of cloud cover allowed when searching for Sentinel 2 products | `5` |
| `max_no_data_value` | `int` | Maximum percentage of no-data pixels allowed per Sentinel 2 product | `5` |
| `logger` | `Logger` | Logger instance | `LOGGER` |
Source code in geospatial_tools/planetary_computer/sentinel_2.py
def __init__(
    self,
    sentinel2_tiling_grid: GeoDataFrame,
    sentinel2_tiling_grid_column: str,
    vector_features: GeoDataFrame,
    vector_features_column: str,
    date_ranges: list[str] | None = None,
    max_cloud_cover: int = 5,
    max_no_data_value: int = 5,
    logger: logging.Logger = LOGGER,
):
    """

    Args:
        sentinel2_tiling_grid: GeoDataFrame containing the Sentinel 2 tiling grid
        sentinel2_tiling_grid_column: Name of the column in `sentinel2_tiling_grid` that contains the tile names
            (e.g. 10SDJ)
        vector_features: GeoDataFrame containing the vector features for which the best Sentinel 2
            products will be chosen
        vector_features_column: Name of the column in `vector_features` to which the best Sentinel 2 products
            will be written
        date_ranges: Date ranges used to search for Sentinel 2 products. Should be created separately using
            `geospatial_tools.utils.create_date_range_for_specific_period`,
            or with `BestProductsForFeatures.create_date_ranges` after initialization.
        max_cloud_cover: Maximum percentage of cloud cover allowed when searching for Sentinel 2 products
        max_no_data_value: Maximum percentage of no-data pixels allowed per Sentinel 2 product
        logger: Logger instance
    """
    self.logger = logger
    self.sentinel2_tiling_grid = sentinel2_tiling_grid
    self.sentinel2_tiling_grid_column = sentinel2_tiling_grid_column
    self.sentinel2_tile_list = sentinel2_tiling_grid["name"].to_list()
    self.vector_features = vector_features
    self.vector_features_column = vector_features_column
    self.vector_features_best_product_column = "best_s2_product_id"
    self.vector_features_with_products = None
    self._date_ranges = date_ranges
    self._max_cloud_cover = max_cloud_cover
    self.max_no_data_value = max_no_data_value
    self.successful_results = {}
    self.incomplete_results = []
    self.error_results = []

max_cloud_cover property writable #

max_cloud_cover

Max % of cloud cover used for Sentinel 2 product search.

date_ranges property writable #

date_ranges

Date ranges used to search for Sentinel 2 products.
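
Both properties are writable and can be adjusted after initialization, for example:

```python
best_products.max_cloud_cover = 10
best_products.date_ranges = ["2023-03-01/2023-05-31"]  # range string format shown is assumed; prefer create_date_ranges below
```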

create_date_ranges #

create_date_ranges(
    start_year: int, end_year: int, start_month: int, end_month: int
) -> list[str]

This function creates a list of date ranges.

For example, to create date ranges for 2020 and 2021, but only for the months from March to May, two ranges are expected: [2020-03-01 to 2020-05-31, 2021-03-01 to 2021-05-31].

It automatically determines the last day of the end month and also handles periods that cross over years.

For example, to create date ranges for 2020 and 2022, but only for the months from November to January, two ranges are expected: [2020-11-01 to 2021-01-31, 2021-11-01 to 2022-01-31].
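
Continuing the earlier sketch, the first example above could be produced like this (the exact formatting of the returned range strings is determined by create_date_range_for_specific_period; the values shown are only indicative):

```python
# Two ranges are expected, one per year: roughly 2020-03-01 to 2020-05-31 and 2021-03-01 to 2021-05-31
date_ranges = best_products.create_date_ranges(start_year=2020, end_year=2021, start_month=3, end_month=5)
print(date_ranges)
```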

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `start_year` | `int` | Start year for the ranges | *required* |
| `end_year` | `int` | End year for the ranges | *required* |
| `start_month` | `int` | Starting month for each period | *required* |
| `end_month` | `int` | End month for each period (inclusive) | *required* |

Returns:

| Type | Description |
| --- | --- |
| `list[str]` | List of date ranges |

Source code in geospatial_tools/planetary_computer/sentinel_2.py
def create_date_ranges(self, start_year: int, end_year: int, start_month: int, end_month: int) -> list[str]:
    """
    This function creates a list of date ranges.

    For example, to create date ranges for 2020 and 2021, but only for the months from March to May,
    two ranges are expected: [2020-03-01 to 2020-05-31, 2021-03-01 to 2021-05-31].

    It automatically determines the last day of the end month and also handles periods that cross over years.

    For example, to create date ranges for 2020 and 2022, but only for the months from November to January,
    two ranges are expected: [2020-11-01 to 2021-01-31, 2021-11-01 to 2022-01-31].

    Args:
      start_year: Start year for ranges
      end_year: End year for ranges
      start_month: Starting month for each period
      end_month: End month for each period (inclusively)

    Returns:
        List of date ranges
    """
    self.date_ranges = create_date_range_for_specific_period(
        start_year=start_year, end_year=end_year, start_month_range=start_month, end_month_range=end_month
    )
    return self.date_ranges

find_best_complete_products #

find_best_complete_products(
    max_cloud_cover: int | None = None, max_no_data_value: int = 5
) -> dict

Finds the best complete product for each Sentinel 2 tile. This function filters out all products that have more than max_no_data_value percent of no-data pixels (5% by default).

Tiles whose products were all filtered out are stored in self.incomplete_results, and tiles for which the search found no results at all are stored in self.error_results.
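
A sketch of running the search and inspecting its outcome, continuing the earlier sketch (the result structure follows find_best_product_per_s2_tile below):

```python
results = best_products.find_best_complete_products()

for tile_name, product in results.items():
    print(tile_name, product["id"], product["cloud_cover"], product["no_data"])

# Tiles without a fully covering product, and tiles without any result at all
print("Incomplete tiles:", best_products.incomplete_results)
print("Tiles with errors:", best_products.error_results)
```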

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `max_cloud_cover` | `int \| None` | Maximum percentage of cloud cover allowed for the search. Falls back to `self.max_cloud_cover` if not provided. | `None` |
| `max_no_data_value` | `int` | Maximum percentage of no-data pixels allowed per Sentinel 2 product | `5` |

Returns:

| Type | Description |
| --- | --- |
| `dict` | Dictionary mapping Sentinel 2 tile names to their best product (ID, cloud cover and no-data percentage). |

Source code in geospatial_tools/planetary_computer/sentinel_2.py
def find_best_complete_products(self, max_cloud_cover: int | None = None, max_no_data_value: int = 5) -> dict:
    """
    Finds the best complete product for each Sentinel 2 tile. This function filters out all products that have
    more than `max_no_data_value` percent of no-data pixels (5% by default).

    Tiles whose products were all filtered out are stored in `self.incomplete_results`, and tiles for which
    the search found no results at all are stored in `self.error_results`.

    Args:
      max_cloud_cover: Maximum percentage of cloud cover allowed for the search. Falls back to
        `self.max_cloud_cover` if not provided.  (Default value = None)
      max_no_data_value: Maximum percentage of no-data pixels allowed per Sentinel 2 product  (Default value = 5)

    Returns:
        Dictionary mapping Sentinel 2 tile names to their best product (ID, cloud cover and no-data percentage).
    """
    cloud_cover = self.max_cloud_cover
    if max_cloud_cover is not None:
        cloud_cover = max_cloud_cover
    no_data_value = self.max_no_data_value
    if max_no_data_value is not None:
        no_data_value = max_no_data_value

    tile_dict, incomplete_list, error_list = find_best_product_per_s2_tile(
        date_ranges=self.date_ranges,
        max_cloud_cover=cloud_cover,
        s2_tile_grid_list=self.sentinel2_tile_list,
        num_of_workers=4,
        max_no_data_value=no_data_value,
    )
    self.successful_results = tile_dict
    self.incomplete_results = incomplete_list
    if incomplete_list:
        self.logger.warning(
            "Warning, some of the input Sentinel 2 tiles do not have products covering the entire tile. "
            "These tiles will need to be handled differently (ex. creating a mosaic with multiple products"
        )
        self.logger.warning(f"Incomplete list: {incomplete_list}")
    self.error_results = error_list
    if error_list:
        self.logger.warning(
            "Warning, products for some Sentinel 2 tiles could not be found. "
            "Consider either extending date range input or max cloud cover"
        )
        self.logger.warning(f"Error list: {error_list}")
    return self.successful_results

select_best_products_per_feature #

select_best_products_per_feature() -> GeoDataFrame

Return a GeoDataFrame of the vector features, with the best Sentinel 2 product selected for each feature.
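
Continuing the earlier sketch (s2_tiles and best_s2_product_id are the column names used throughout this page):

```python
features_with_products = best_products.select_best_products_per_feature()
print(features_with_products[["s2_tiles", "best_s2_product_id"]].head())
```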

Source code in geospatial_tools/planetary_computer/sentinel_2.py
def select_best_products_per_feature(self) -> GeoDataFrame:
    """Return a GeoDataFrame containing the best products for each Sentinel 2 tile."""
    spatial_join_results = spatial_join_within(
        polygon_features=self.sentinel2_tiling_grid,
        polygon_column=self.sentinel2_tiling_grid_column,
        vector_features=self.vector_features,
        vector_column_name=self.vector_features_column,
    )
    write_best_product_ids_to_dataframe(
        spatial_join_results=spatial_join_results,
        tile_dictionary=self.successful_results,
        best_product_column=self.vector_features_best_product_column,
        s2_tiles_column=self.vector_features_column,
    )
    self.vector_features_with_products = spatial_join_results
    return self.vector_features_with_products

to_file #

to_file(output_dir: str | Path) -> None

Write the search results (successful, incomplete and error results) to file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `output_dir` | `str \| Path` | Output directory used to write the results to file | *required* |
Source code in geospatial_tools/planetary_computer/sentinel_2.py
def to_file(self, output_dir: str | pathlib.Path) -> None:
    """

    Args:
      output_dir: Output directory used to write to file
    """
    write_results_to_file(
        cloud_cover=self.max_cloud_cover,
        successful_results=self.successful_results,
        incomplete_results=self.incomplete_results,
        error_results=self.error_results,
        output_dir=output_dir,
    )

sentinel_2_complete_tile_search #

sentinel_2_complete_tile_search(
    tile_id: str,
    date_ranges: list[str],
    max_cloud_cover: int,
    max_no_data_value: int = 5,
) -> tuple[str, str, float | None, float | None] | None

Searches for the single best Sentinel 2 product for a given tile: only products whose no-data percentage is below max_no_data_value are kept, and the one with the lowest cloud cover is returned.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tile_id` | `str` | Sentinel 2 MGRS tile name to search for (e.g. `10SDJ`) | *required* |
| `date_ranges` | `list[str]` | Date ranges used for the search | *required* |
| `max_cloud_cover` | `int` | Maximum percentage of cloud cover allowed for the search | *required* |
| `max_no_data_value` | `int` | Maximum percentage of no-data pixels allowed per product (Default value = 5) | `5` |

Returns:

| Type | Description |
| --- | --- |
| `tuple[str, str, float \| None, float \| None] \| None` | Tuple of (tile ID, best product ID or error/incomplete message, cloud cover, no-data percentage), or None if an unexpected error occurs. |

Source code in geospatial_tools/planetary_computer/sentinel_2.py
def sentinel_2_complete_tile_search(
    tile_id: str,
    date_ranges: list[str],
    max_cloud_cover: int,
    max_no_data_value: int = 5,
) -> tuple[str, str, float | None, float | None] | None:
    """

    Args:
      tile_id:
      date_ranges:
      max_cloud_cover:
      max_no_data_value: (Default value = 5)

    Returns:


    """
    client = StacSearch(PLANETARY_COMPUTER)
    collection = "sentinel-2-l2a"
    tile_ids = [tile_id]
    query = {"eo:cloud_cover": {"lt": max_cloud_cover}, "s2:mgrs_tile": {"in": tile_ids}}
    sortby = [{"field": "properties.eo:cloud_cover", "direction": "asc"}]

    client.search_for_date_ranges(
        date_ranges=date_ranges, collections=collection, query=query, sortby=sortby, limit=100
    )
    try:
        sorted_items = client.sort_results_by_cloud_coverage()
        if not sorted_items:
            return tile_id, "error: No results found", None, None
        filtered_items = client.filter_no_data(
            property_name="s2:nodata_pixel_percentage", max_no_data_value=max_no_data_value
        )
        if not filtered_items:
            return tile_id, "incomplete: No results found that cover the entire tile", None, None
        optimal_result = filtered_items[0]
        if optimal_result:
            return (
                tile_id,
                optimal_result.id,
                optimal_result.properties["eo:cloud_cover"],
                optimal_result.properties["s2:nodata_pixel_percentage"],
            )

    except (IndexError, TypeError) as error:
        print(error)
        return tile_id, f"error: {error}", None, None

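A sketch of a standalone call; the tile name, date range string and thresholds are illustrative, and the date range format is assumed to follow a start/end convention (in practice, build it with create_date_range_for_specific_period):

```python
from geospatial_tools.planetary_computer.sentinel_2 import sentinel_2_complete_tile_search

# May also return None if an unexpected error occurs
result = sentinel_2_complete_tile_search(
    tile_id="10SDJ",
    date_ranges=["2023-03-01/2023-05-31"],
    max_cloud_cover=5,
    max_no_data_value=5,
)
if result is not None:
    tile_id, product_id, cloud_cover, no_data = result
    if product_id.startswith(("error:", "incomplete:")):
        print(f"No usable product for {tile_id}: {product_id}")
    else:
        print(f"Best product for {tile_id}: {product_id} ({cloud_cover}% clouds, {no_data}% no-data)")
```
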
find_best_product_per_s2_tile #

find_best_product_per_s2_tile(
    date_ranges: list[str],
    max_cloud_cover: int,
    s2_tile_grid_list: list,
    max_no_data_value: int = 5,
    num_of_workers: int = 4,
)

Searches for the best Sentinel 2 product for each tile in s2_tile_grid_list, running the searches concurrently with a thread pool.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `date_ranges` | `list[str]` | Date ranges used for the search | *required* |
| `max_cloud_cover` | `int` | Maximum percentage of cloud cover allowed for the search | *required* |
| `s2_tile_grid_list` | `list` | List of Sentinel 2 MGRS tile names to search | *required* |
| `max_no_data_value` | `int` | Maximum percentage of no-data pixels allowed per product (Default value = 5) | `5` |
| `num_of_workers` | `int` | Number of concurrent search workers (Default value = 4) | `4` |

Returns:

| Type | Description |
| --- | --- |
| `tuple[dict, list, list]` | Tuple of (successful results keyed by tile name, incomplete tiles, tiles with errors). |

Source code in geospatial_tools/planetary_computer/sentinel_2.py
def find_best_product_per_s2_tile(
    date_ranges: list[str],
    max_cloud_cover: int,
    s2_tile_grid_list: list,
    max_no_data_value: int = 5,
    num_of_workers: int = 4,
):
    """

    Args:
      date_ranges:
      max_cloud_cover:
      s2_tile_grid_list:
      max_no_data_value:  (Default value = 5)
      num_of_workers: (Default value = 4)

    Returns:


    """
    successful_results = {}
    for tile in s2_tile_grid_list:
        successful_results[tile] = ""
    incomplete_results = []
    error_results = []
    with ThreadPoolExecutor(max_workers=num_of_workers) as executor:
        future_to_tile = {
            executor.submit(
                sentinel_2_complete_tile_search,
                tile_id=tile,
                date_ranges=date_ranges,
                max_cloud_cover=max_cloud_cover,
                max_no_data_value=max_no_data_value,
            ): tile
            for tile in s2_tile_grid_list
        }

        for future in as_completed(future_to_tile):
            tile_id, optimal_result_id, max_cloud_cover, no_data = future.result()
            if optimal_result_id.startswith("error:"):
                error_results.append(tile_id)
                continue
            if optimal_result_id.startswith("incomplete:"):
                incomplete_results.append(tile_id)
                continue
            successful_results[tile_id] = {"id": optimal_result_id, "cloud_cover": max_cloud_cover, "no_data": no_data}
        cleaned_successful_results = {k: v for k, v in successful_results.items() if v != ""}
    return cleaned_successful_results, incomplete_results, error_results
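
A sketch of a direct call, outside of BestProductsForFeatures (tile names, date range and thresholds are illustrative):

```python
from geospatial_tools.planetary_computer.sentinel_2 import find_best_product_per_s2_tile

successful, incomplete, errors = find_best_product_per_s2_tile(
    date_ranges=["2023-03-01/2023-05-31"],
    max_cloud_cover=5,
    s2_tile_grid_list=["10SDJ", "10SEJ"],
    max_no_data_value=5,
    num_of_workers=2,
)
print(successful)   # e.g. {"10SDJ": {"id": "...", "cloud_cover": 1.2, "no_data": 0.0}}
print(incomplete)   # tiles without a product covering the full tile
print(errors)       # tiles for which the search returned nothing
```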

write_best_product_ids_to_dataframe #

write_best_product_ids_to_dataframe(
    spatial_join_results: GeoDataFrame,
    tile_dictionary: dict,
    best_product_column: str = "best_s2_product_id",
    s2_tiles_column: str = "s2_tiles",
    logger: Logger = LOGGER,
)

Writes the best Sentinel 2 product ID for each feature to spatial_join_results, based on the Sentinel 2 tiles intersecting each feature.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `spatial_join_results` | `GeoDataFrame` | GeoDataFrame of vector features joined with their Sentinel 2 tiles | *required* |
| `tile_dictionary` | `dict` | Dictionary of best products per Sentinel 2 tile, as returned by `find_best_product_per_s2_tile` | *required* |
| `best_product_column` | `str` | Name of the column to write the best product IDs to | `'best_s2_product_id'` |
| `s2_tiles_column` | `str` | Name of the column containing the Sentinel 2 tiles of each feature | `'s2_tiles'` |
| `logger` | `Logger` | Logger instance | `LOGGER` |

Returns:

| Type | Description |
| --- | --- |
| `None` | The `best_product_column` column is added to `spatial_join_results` in place. |

Source code in geospatial_tools/planetary_computer/sentinel_2.py
def write_best_product_ids_to_dataframe(
    spatial_join_results: GeoDataFrame,
    tile_dictionary: dict,
    best_product_column: str = "best_s2_product_id",
    s2_tiles_column: str = "s2_tiles",
    logger: logging.Logger = LOGGER,
):
    """

    Args:
      spatial_join_results:
      tile_dictionary:
      best_product_column:
      s2_tiles_column:
      logger:

    Returns:


    """
    logger.info("Writing best product IDs to dataframe")
    spatial_join_results[best_product_column] = spatial_join_results[s2_tiles_column].apply(
        lambda x: _get_best_product_id_for_each_grid_tile(s2_tile_search_results=tile_dictionary, feature_s2_tiles=x)
    )

write_results_to_file #

write_results_to_file(
    cloud_cover: int,
    successful_results: dict,
    incomplete_results: list | None = None,
    error_results: list | None = None,
    output_dir: str | Path = DATA_DIR,
    logger: Logger = LOGGER,
) -> dict

Writes the successful, incomplete and error search results to JSON files in output_dir.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `cloud_cover` | `int` | Maximum cloud cover percentage used for the search; included in the output file names | *required* |
| `successful_results` | `dict` | Dictionary of successful search results | *required* |
| `incomplete_results` | `list \| None` | List of tiles without a product covering the entire tile | `None` |
| `error_results` | `list \| None` | List of tiles for which the search found no results | `None` |
| `output_dir` | `str \| Path` | Output directory used to write the files | `DATA_DIR` |
| `logger` | `Logger` | Logger instance | `LOGGER` |

Returns:

| Type | Description |
| --- | --- |
| `dict` | Dictionary containing the paths of the written files (`tile_filename`, `incomplete_filename`, `errors_filename`). |

Source code in geospatial_tools/planetary_computer/sentinel_2.py
def write_results_to_file(
    cloud_cover: int,
    successful_results: dict,
    incomplete_results: list | None = None,
    error_results: list | None = None,
    output_dir: str | pathlib.Path = DATA_DIR,
    logger: logging.Logger = LOGGER,
) -> dict:
    """

    Args:
      cloud_cover:
      successful_results:
      incomplete_results:
      error_results:
      output_dir:
      logger:

    Returns:


    """
    tile_filename = output_dir / f"data_lt{cloud_cover}cc.json"
    with open(tile_filename, "w", encoding="utf-8") as json_file:
        json.dump(successful_results, json_file, indent=4)
    logger.info(f"Results have been written to {tile_filename}")

    incomplete_filename = "None"
    if incomplete_results:
        incomplete_dict = {"incomplete": incomplete_results}
        incomplete_filename = output_dir / f"incomplete_lt{cloud_cover}cc.json"
        with open(incomplete_filename, "w", encoding="utf-8") as json_file:
            json.dump(incomplete_dict, json_file, indent=4)
        logger.info(f"Incomplete results have been written to {incomplete_filename}")

    error_filename = "None"
    if error_results:
        error_dict = {"errors": error_results}
        error_filename = output_dir / f"errors_lt{cloud_cover}cc.json"
        with open(error_filename, "w", encoding="utf-8") as json_file:
            json.dump(error_dict, json_file, indent=4)
        logger.info(f"Errors results have been written to {error_filename}")

    return {
        "tile_filename": tile_filename,
        "incomplete_filename": incomplete_filename,
        "errors_filename": error_filename,
    }
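
A sketch of a direct call, reusing the results from the find_best_product_per_s2_tile sketch above; the output file names follow the patterns shown in the source code:

```python
from pathlib import Path

from geospatial_tools.planetary_computer.sentinel_2 import write_results_to_file

output_dir = Path("results")
output_dir.mkdir(parents=True, exist_ok=True)

filenames = write_results_to_file(
    cloud_cover=5,
    successful_results=successful,
    incomplete_results=incomplete,
    error_results=errors,
    output_dir=output_dir,
)
print(filenames)  # {"tile_filename": .../data_lt5cc.json, "incomplete_filename": ..., "errors_filename": ...}
```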

download_and_process_sentinel2_asset #

download_and_process_sentinel2_asset(
    product_id: str,
    product_bands: list[str],
    collections: str = "sentinel-2-l2a",
    target_projection: int | str | None = None,
    base_directory: str | Path = DATA_DIR,
    delete_intermediate_files: bool = False,
    logger: Logger = LOGGER,
) -> Asset

This function downloads a Sentinel 2 product based on the product ID provided.

It downloads the individual asset bands listed in the product_bands argument, merges them all into a single tif, and then reprojects the result to the target projection.
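
A sketch of downloading and reprojecting a product; the product ID, band names, target EPSG code and directory are illustrative:

```python
from pathlib import Path

from geospatial_tools.planetary_computer.sentinel_2 import download_and_process_sentinel2_asset

asset = download_and_process_sentinel2_asset(
    product_id="S2A_MSIL2A_20230401T184921_R113_T10SDJ_20230402T055054",  # illustrative product ID
    product_bands=["B02", "B03", "B04", "B08"],  # illustrative band names
    target_projection=5070,
    base_directory=Path("data/s2"),
    delete_intermediate_files=True,
)
```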

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `product_id` | `str` | ID of the Sentinel 2 product to be downloaded | *required* |
| `product_bands` | `list[str]` | List of the product bands to be downloaded | *required* |
| `collections` | `str` | Collection to download from. Defaults to `sentinel-2-l2a` | `'sentinel-2-l2a'` |
| `target_projection` | `int \| str \| None` | The CRS projection for the end product. If `None`, the reprojection step is skipped | `None` |
| `base_directory` | `str \| Path` | The base directory path where the downloaded files will be stored | `DATA_DIR` |
| `delete_intermediate_files` | `bool` | Flag to determine if intermediate files should be deleted. Defaults to `False` | `False` |
| `logger` | `Logger` | Logger instance | `LOGGER` |

Returns:

| Type | Description |
| --- | --- |
| `Asset` | Asset instance |

Source code in geospatial_tools/planetary_computer/sentinel_2.py
def download_and_process_sentinel2_asset(
    product_id: str,
    product_bands: list[str],
    collections: str = "sentinel-2-l2a",
    target_projection: int | str | None = None,
    base_directory: str | pathlib.Path = DATA_DIR,
    delete_intermediate_files: bool = False,
    logger: logging.Logger = LOGGER,
) -> Asset:
    """
    This function downloads a Sentinel 2 product based on the product ID provided.

    It downloads the individual asset bands listed in the `product_bands` argument,
    merges them all into a single tif, and then reprojects the result to the target projection.

    Args:
      product_id: ID of the Sentinel 2 product to be downloaded
      product_bands: List of the product bands to be downloaded
      collections: Collection to download from. Defaults to `sentinel-2-l2a`
      target_projection: The CRS projection for the end product. If `None`, the reprojection step will be
        skipped
      base_directory: The base directory path where the downloaded files will be stored
      delete_intermediate_files: Flag to determine if intermediate files should be deleted. Defaults to False
      logger: Logger instance

    Returns:
        Asset instance
    """
    base_file_name = f"{base_directory}/{product_id}"
    merged_file = f"{base_file_name}_merged.tif"
    reprojected_file = f"{base_file_name}_reprojected.tif"

    merged_file_exists = pathlib.Path(merged_file).exists()
    reprojected_file_exists = pathlib.Path(reprojected_file).exists()

    if reprojected_file_exists:
        logger.info(f"Reprojected file [{reprojected_file}] already exists")
        asset = Asset(asset_id=product_id, bands=product_bands, reprojected_asset=reprojected_file)
        return asset

    if merged_file_exists:
        logger.info(f"Merged file [{merged_file}] already exists")
        asset = Asset(asset_id=product_id, bands=product_bands, merged_asset_path=merged_file)
        if target_projection:
            logger.info(f"Reprojecting merged file [{merged_file}]")
            asset.reproject_merged_asset(
                base_directory=base_directory,
                target_projection=target_projection,
                delete_merged_asset=delete_intermediate_files,
            )
        return asset

    stac_client = StacSearch(catalog_name=PLANETARY_COMPUTER)
    items = stac_client.search(collections=collections, ids=[product_id])
    logger.info(items)
    asset_list = stac_client.download_search_results(bands=product_bands, base_directory=base_directory)
    logger.info(asset_list)
    asset = asset_list[0]
    asset.merge_asset(base_directory=base_directory, delete_sub_items=delete_intermediate_files)
    if not target_projection:
        logger.info("Skipping reprojection")
        return asset
    asset.reproject_merged_asset(
        target_projection=target_projection,
        base_directory=base_directory,
        delete_merged_asset=delete_intermediate_files,
    )
    return asset