Land cover classification at the Mississippi Delta¶

This notebook uses a k-means unsupervised clustering algorithm to group pixels by similar spectral signatures. k-means is an exploratory method for finding patterns in data. Because it is unsupervised, no training data is needed for the model. You also can’t measure how well it “performs” because the clusters will not correspond to any particular land cover class. However, we expect at least some of the clusters to be identifiable as different types of land cover.

We will use the Harmonized Landsat and Sentinel-2 (HLS) multispectral dataset. The data are accessible with a free Earthdata account and the earthaccess library from NSIDC.

STEP 1: SET UP¶

In [1]:
import os
import pathlib
import pickle
import re
import requests
import warnings

import cartopy.crs as ccrs
import datetime
import earthaccess
import earthpy as et
import geopandas as gpd
import geoviews as gv
import hvplot.pandas
import hvplot.xarray
import numpy as np
import pandas as pd
import rioxarray as rxr
import rioxarray.merge as rxrmerge
from tqdm.notebook import tqdm
import xarray as xr
from shapely.geometry import Polygon
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

os.environ["GDAL_HTTP_MAX_RETRY"] = "5"
os.environ["GDAL_HTTP_RETRY_DELAY"] = "1"

warnings.simplefilter('ignore')
In [2]:
# define data directories
data_dir = os.path.join(
    pathlib.Path.home(),
    'earth-analytics',
    'data',
    # Project Directory
    'land-cover-clustering'
)
# Create the project directory if it does not already exist
os.makedirs(data_dir, exist_ok=True)
In [3]:
def cached(func_key, override=False):
    """
    A decorator to cache function results
    
    Parameters
    ==========
    func_key: str
      File basename used to save pickled results
    override: bool
      When True, re-compute even if the results are already stored
    """
    def compute_and_cache_decorator(compute_function):
        """
        Wrap the caching function
        
        Parameters
        ==========
        compute_function: function
          The function to run and cache results
        """
        def compute_and_cache(*args, **kwargs):
            """
            Perform a computation and cache, or load cached result.
            
            Parameters
            ==========
            args
              Positional arguments for the compute function
            kwargs
              Keyword arguments for the compute function
            """
            # Add an identifier from the particular function call
            if 'cache_key' in kwargs:
                key = '_'.join((func_key, kwargs['cache_key']))
            else:
                key = func_key

            path = os.path.join(
                et.io.HOME, et.io.DATA_NAME, 'jars', f'{key}.pickle')
            
            # Check if the cache exists already or override caching
            if not os.path.exists(path) or override:
                # Make jars directory if needed
                os.makedirs(os.path.dirname(path), exist_ok=True)
                
                # Run the compute function as the user did
                result = compute_function(*args, **kwargs)
                
                # Pickle the object
                with open(path, 'wb') as file:
                    pickle.dump(result, file)
            else:
                # Unpickle the object
                with open(path, 'rb') as file:
                    result = pickle.load(file)
                    
            return result
        
        return compute_and_cache
    
    return compute_and_cache_decorator
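Note that the wrapper forwards all arguments, including the cache_key keyword, to the wrapped function, so the wrapped function must accept a cache_key parameter. A minimal usage sketch (the function below is hypothetical, not part of the analysis):

```python
# Hypothetical example of the cached decorator above. The cache_key
# kwarg is appended to the pickle file name and also forwarded to the
# wrapped function, so the signature must accept it.
@cached('expensive_sum')
def expensive_sum(n, cache_key=None):
    return sum(range(n))

# Computed on the first call, loaded from the pickle afterwards
result = expensive_sum(10_000_000, cache_key='10m')
```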

STEP 2: STUDY SITE¶

For this analysis, you will use a watershed from the Watershed Boundary Dataset (WBD), using the HUC-12 boundaries (WBDHU12.shp).

Download the Watershed Boundary Dataset for region 8 (Lower Mississippi)

  • Select watershed 080902030506
  • Generate a site map of the watershed
  • We chose this watershed because it covers parts of New Orleans and is near the Mississippi Delta. Deltas are boundary areas between the land and the ocean, and as a result tend to contain a rich variety of different land cover and land use types.

    In [42]:
    @cached('wbd_08')
    def read_wbd(wbd_filename, huc_level, cache_key):
        """Download the WBD archive and read one HUC level into a GeoDataFrame."""
        # URL for the Watershed Boundary Dataset Region 8 shapefile archive
        wbd_url = (
            "https://prd-tnm.s3.amazonaws.com/"
            "StagedProducts/Hydrography/WBD/HU2/Shape/"
            f"{wbd_filename}.zip")
        wbd_dir = et.data.get_data(url=wbd_url)

        # Read the boundaries at the requested HUC level
        wbd_path = os.path.join(wbd_dir, 'Shape', f'WBDHU{huc_level}.shp')
        wbd_gdf = gpd.read_file(wbd_path, engine='pyogrio')

        return wbd_gdf
    
    huc_level = 12
    wbd_gdf = read_wbd('WBD_08_HU2_Shape', huc_level, cache_key=f'hu{huc_level}')
    
    delta_gdf = (wbd_gdf[wbd_gdf[f'huc{huc_level}']=='080902030506']
                 .dissolve())
    
    delta_map = (
        delta_gdf.to_crs(ccrs.Mercator())
        .hvplot(
            alpha=0.2, fill_color='white',
            crs=ccrs.Mercator(), tiles='EsriImagery',
            title='Louisiana Delta Watershed 080902030506')
            .opts(width=600, height=300)
    )
    delta_map
    
    Out[42]:

    The Mississippi River Delta

    This watershed is nestled between the Mississippi River to the West and the small town of Delacroix, Louisiana to the East, near where the Mississippi River meets the Gulf of Mexico. To the Northeast lies Lake Lery, with Grand Lake and Petit Lake to the Southeast. It is part of a large expanse of coastal wetlands that makes up 37% of the estuarine herbaceous marshes in the contiguous United States.

    Reference:

    Couvillion, B.R., Barras, J.A., Steyer, G.D., Sleavin, William, Fischer, Michelle, Beck, Holly, Trahan, Nadine, Griffin, Brad, and Heckman, David, 2011, Land area change in coastal Louisiana from 1932 to 2010: U.S. Geological Survey Scientific Investigations Map 3164, scale 1:265,000, 12 p. pamphlet.

    STEP 3: MULTISPECTRAL DATA¶

    Search for data¶

    Log in to the earthaccess service using your Earthdata credentials:

    ```python
    earthaccess.login(persist=True)
    ```

  • Modify the following sample code to search for granules of the HLSL30 product overlapping the watershed boundary from May to October 2023 (there should be 76 granules):

    ```python
    results = earthaccess.search_data(
        short_name="...",
        cloud_hosted=True,
        bounding_box=tuple(gdf.total_bounds),
        temporal=("...", "..."),
    )
    ```
  • In [5]:
    # Log in to earthaccess
    earthaccess.login(strategy='interactive', persist=True)
    # Search for HLS tiles
    delta_results = earthaccess.search_data(
        short_name='HLSL30',
        cloud_hosted=True,
        bounding_box=tuple(delta_gdf.total_bounds),
        temporal=("2023-05", "2023-10")
    )
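    As a quick sanity check, the number of results should match the granule count mentioned above (76):

    ```python
    # The search above should return 76 HLSL30 granules
    print(len(delta_results))
    ```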
    

    Compile information about each granule¶

    For each search result:

    1. Get the following information (HINT: look at the [‘umm’] values for each search result):
      • granule id (UR)
      • datetime
      • geometry (HINT: check out the shapely.geometry.Polygon class to convert points to a Polygon)
    2. Open the granule files. I recommend opening one granule at a time, e.g. with earthaccess.open([result]).
    3. For each file (band), get the following information:
      • file handler returned from earthaccess.open()
      • tile id
      • band number
  • Compile all the information you collected into a GeoDataFrame
  • In [6]:
    # Regular expression to search for metadata
    url_re = re.compile(
        r'\.(?P<tile_id>\w+)\.\d+T\d+\.v\d\.\d\.(?P<band>[A-Za-z0-9]+)\.tif')
    # Loop through each granule
    granules = []
    
    for result in delta_results:
        # Get granule information
        granule_id = result['umm']['GranuleUR']
        date_time = pd.to_datetime(
            result['umm']['TemporalExtent']['RangeDateTime']['BeginningDateTime'])
        points = (result['umm']['SpatialExtent']['HorizontalSpatialDomain']
                  ['Geometry']['GPolygons'][0]['Boundary']['Points'])
        geometry = Polygon([(p['Longitude'], p['Latitude']) for p in points])
        # Get URL
        files = earthaccess.open([result])
        # Build metadata DataFrame rows
        for file in files:
            match = url_re.search(file.full_name)
            if match is not None:
                granules.append(
                    gpd.GeoDataFrame(
                        dict(
                            granule_id = [granule_id],
                            date_time = [date_time],
                            geometry = [geometry],
                            tile_id = [match.group('tile_id')],
                            band_id = [match.group('band')],
                            url=[file]
                        ),
                        crs="EPSG:4326"
                    )
                )
            
    
    # Concatenate metadata DataFrame
    metadata_df = pd.concat(granules).reset_index(drop=True)
    metadata_df
    
    Out[6]:
    granule_id date_time geometry tile_id band_id url
    0 HLS.L30.T16RBT.2023124T163132.v2.0 2023-05-04 16:31:32.101000+00:00 POLYGON ((-90.07373 28.80445, -89.57473 28.814... T16RBT B03 <File-like object HTTPFileSystem, https://data...
    1 HLS.L30.T16RBT.2023124T163132.v2.0 2023-05-04 16:31:32.101000+00:00 POLYGON ((-90.07373 28.80445, -89.57473 28.814... T16RBT B07 <File-like object HTTPFileSystem, https://data...
    2 HLS.L30.T16RBT.2023124T163132.v2.0 2023-05-04 16:31:32.101000+00:00 POLYGON ((-90.07373 28.80445, -89.57473 28.814... T16RBT B06 <File-like object HTTPFileSystem, https://data...
    3 HLS.L30.T16RBT.2023124T163132.v2.0 2023-05-04 16:31:32.101000+00:00 POLYGON ((-90.07373 28.80445, -89.57473 28.814... T16RBT B02 <File-like object HTTPFileSystem, https://data...
    4 HLS.L30.T16RBT.2023124T163132.v2.0 2023-05-04 16:31:32.101000+00:00 POLYGON ((-90.07373 28.80445, -89.57473 28.814... T16RBT B01 <File-like object HTTPFileSystem, https://data...
    ... ... ... ... ... ... ...
    1315 HLS.L30.T15RYP.2023300T163218.v2.0 2023-10-27 16:32:18.830000+00:00 POLYGON ((-89.79864 29.70348, -89.76644 30.692... T15RYP B06 <File-like object HTTPFileSystem, https://data...
    1316 HLS.L30.T15RYP.2023300T163218.v2.0 2023-10-27 16:32:18.830000+00:00 POLYGON ((-89.79864 29.70348, -89.76644 30.692... T15RYP B02 <File-like object HTTPFileSystem, https://data...
    1317 HLS.L30.T15RYP.2023300T163218.v2.0 2023-10-27 16:32:18.830000+00:00 POLYGON ((-89.79864 29.70348, -89.76644 30.692... T15RYP B09 <File-like object HTTPFileSystem, https://data...
    1318 HLS.L30.T15RYP.2023300T163218.v2.0 2023-10-27 16:32:18.830000+00:00 POLYGON ((-89.79864 29.70348, -89.76644 30.692... T15RYP B03 <File-like object HTTPFileSystem, https://data...
    1319 HLS.L30.T15RYP.2023300T163218.v2.0 2023-10-27 16:32:18.830000+00:00 POLYGON ((-89.79864 29.70348, -89.76644 30.692... T15RYP B04 <File-like object HTTPFileSystem, https://data...

    1320 rows × 6 columns
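    As a quick illustration of the regular expression used above, here is what it extracts from a file name of the form used by the granules in this table:

    ```python
    # File name following the HLS naming pattern shown in the table above
    sample = 'HLS.L30.T15RYN.2023124T163132.v2.0.B03.tif'
    match = url_re.search(sample)
    print(match.group('tile_id'), match.group('band'))  # T15RYN B03
    ```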

    Open, crop, and mask data¶

    For each granule:

    1. Open the Fmask band, crop, and compute a quality mask for the granule. You can use the following code as a starting point, making sure that mask_bits contains the quality bits you want to consider (a toy example of the bit logic follows this list):

    ```python
    # Expand into a new dimension of binary bits
    bits = (
        np.unpackbits(da.astype(np.uint8), bitorder='little')
        .reshape(da.shape + (-1,))
    )

    # Select the required bits and check if any are flagged
    mask = np.prod(bits[..., mask_bits]==0, axis=-1)
    ```

    2. For each band that starts with ‘B’:

      1. Open the band, crop, and apply the scale factor
      2. Name the DataArray after the band using the .name attribute
      3. Apply the cloud mask using the .where() method
      4. Store the DataArray in your data structure (e.g. by adding a GeoDataFrame column holding the DataArray; note that you will need to remove the rows for unused bands)
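    To see what the bit logic does, here is a toy example with mask_bits = [1, 2, 3] (in the HLS v2.0 Fmask these bits should correspond to cloud, adjacent to cloud/shadow, and cloud shadow, but verify against the HLS user guide):

    ```python
    import numpy as np

    # Toy Fmask values: 0 is clear; 2 has bit 1 set; 8 has bit 3 set
    da = np.array([0, 2, 8], dtype=np.uint8)
    mask_bits = [1, 2, 3]

    bits = np.unpackbits(da, bitorder='little').reshape(da.shape + (-1,))
    mask = np.prod(bits[..., mask_bits] == 0, axis=-1)
    print(mask)  # [1 0 0]: only the first (clear) pixel is kept
    ```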
    In [7]:
    @cached('delta_reflectance')
    def process_reflectance(meta_df, bounds_gdf):
        """    """
        def open_dataarray(url, boundary_proj_gdf, scale=1, masked=True):
            """
            Load, crop, and scale a raster image from earthaccess
    
            Parameters
            ----------
            url: file-like or path-like
              File accessor downloaded or obtained from earthaccess
            boundary_proj_gdf: gpd.GeoDataFrame or None
              Area of interest, already projected to the raster CRS;
              when None, the enclosing function's bounds_gdf is
              reprojected to match
            scale: float
              Scale factor applied to the raster values
            masked: bool
              Whether to mask nodata values when opening

            Returns
            -------
            cropped_da: xr.DataArray
              Processed raster
            """
            # Load tile of data
            da = rxr.open_rasterio(url, masked=masked).squeeze() * scale
    
            # Project the study bounds to match the raster CRS (the HLS
            # tiles here span two UTM zones, so this happens per tile)
            if boundary_proj_gdf is None:
                boundary_proj_gdf = bounds_gdf.to_crs(da.rio.crs)
                
            # Crop raster data
            cropped_da = da.rio.clip_box(*boundary_proj_gdf.total_bounds)
    
            return cropped_da
        
        def process_cloud_mask(da, bits_to_mask=[1,2,3]):
            """
            Load an 8-bit Fmask file and process to a boolean mask
    
            Parameters
            ----------
            da: xr.DataArray
              DataArray of the Fmask band
            bits_to_mask: list of int
              The indices of the bits to mask if set

            Returns
            -------
            cloud_mask: np.array
              Mask array (1 = clear, 0 = flagged)
            """
    
            bits = (
                np.unpackbits(
                    da.astype(np.uint8), bitorder='little'
                ).reshape(da.shape + (-1,))
            )
            
            cloud_mask = np.prod(bits[..., bits_to_mask]==0, axis=-1)
    
            return cloud_mask
        
        granule_da_rows = []
        boundary_proj_gdf = None
    
        # Loop through each image
        group_iter = meta_df.groupby(['date_time', 'tile_id'])
        for (date_time, tile_id), granule_df in group_iter:
            print(f"Processing granule {tile_id} {date_time}")
            # Open granule cloud cover
            Fmask_url = (
                granule_df.loc[granule_df.band_id=='Fmask', 'url']
                .values[0])
            cloud_mask_da = open_dataarray(
                Fmask_url, boundary_proj_gdf, masked=False)
            
            # Compute cloud mask
            cloud_mask = process_cloud_mask(cloud_mask_da)
    
            # Loop through each spectral band
            for i, row in granule_df.iterrows():
                if row.band_id.startswith('B'):
                    # Open and crop the band
                    cropped_band = open_dataarray(
                        row.url, boundary_proj_gdf, scale=0.0001)
                    cropped_band.name = row.band_id
                    # Mask the band using the cloud mask
                    row['masked_da'] = cropped_band.where(cloud_mask)
                    # Append the row, now carrying the DataArray
                    granule_da_rows.append(row.to_frame().T)
    
        # Reassemble the metadata DataFrame
        return pd.concat(granule_da_rows)
    
    reflectance_da_df = process_reflectance(metadata_df, delta_gdf)
    
    In [8]:
    reflectance_da_df
    
    Out[8]:
    granule_id date_time geometry tile_id band_id url masked_da
    15 HLS.L30.T15RYN.2023124T163132.v2.0 2023-05-04 16:31:32.101000+00:00 POLYGON ((-89.82661214 28.80213717, -89.795837... T15RYN B03 <File-like object HTTPFileSystem, https://data... [[<xarray.DataArray 'B03' ()> Size: 4B\narray(...
    17 HLS.L30.T15RYN.2023124T163132.v2.0 2023-05-04 16:31:32.101000+00:00 POLYGON ((-89.82661214 28.80213717, -89.795837... T15RYN B01 <File-like object HTTPFileSystem, https://data... [[<xarray.DataArray 'B01' ()> Size: 4B\narray(...
    18 HLS.L30.T15RYN.2023124T163132.v2.0 2023-05-04 16:31:32.101000+00:00 POLYGON ((-89.82661214 28.80213717, -89.795837... T15RYN B02 <File-like object HTTPFileSystem, https://data... [[<xarray.DataArray 'B02' ()> Size: 4B\narray(...
    19 HLS.L30.T15RYN.2023124T163132.v2.0 2023-05-04 16:31:32.101000+00:00 POLYGON ((-89.82661214 28.80213717, -89.795837... T15RYN B06 <File-like object HTTPFileSystem, https://data... [[<xarray.DataArray 'B06' ()> Size: 4B\narray(...
    20 HLS.L30.T15RYN.2023124T163132.v2.0 2023-05-04 16:31:32.101000+00:00 POLYGON ((-89.82661214 28.80213717, -89.795837... T15RYN B09 <File-like object HTTPFileSystem, https://data... [[<xarray.DataArray 'B09' ()> Size: 4B\narray(...
    ... ... ... ... ... ... ... ...
    1269 HLS.L30.T16RBU.2023300T163218.v2.0 2023-10-27 16:32:18.830000+00:00 POLYGON ((-90.10082051 29.70587294, -89.305832... T16RBU B05 <File-like object HTTPFileSystem, https://data... [[<xarray.DataArray 'B05' ()> Size: 4B\narray(...
    1271 HLS.L30.T16RBU.2023300T163218.v2.0 2023-10-27 16:32:18.830000+00:00 POLYGON ((-90.10082051 29.70587294, -89.305832... T16RBU B11 <File-like object HTTPFileSystem, https://data... [[<xarray.DataArray 'B11' ()> Size: 4B\narray(...
    1272 HLS.L30.T16RBU.2023300T163218.v2.0 2023-10-27 16:32:18.830000+00:00 POLYGON ((-90.10082051 29.70587294, -89.305832... T16RBU B01 <File-like object HTTPFileSystem, https://data... [[<xarray.DataArray 'B01' ()> Size: 4B\narray(...
    1273 HLS.L30.T16RBU.2023300T163218.v2.0 2023-10-27 16:32:18.830000+00:00 POLYGON ((-90.10082051 29.70587294, -89.305832... T16RBU B02 <File-like object HTTPFileSystem, https://data... [[<xarray.DataArray 'B02' ()> Size: 4B\narray(...
    1274 HLS.L30.T16RBU.2023300T163218.v2.0 2023-10-27 16:32:18.830000+00:00 POLYGON ((-90.10082051 29.70587294, -89.305832... T16RBU B06 <File-like object HTTPFileSystem, https://data... [[<xarray.DataArray 'B06' ()> Size: 4B\narray(...

    880 rows × 7 columns

    Merge and Composite Data¶

    You will notice for this watershed that:

    1. The raster data for each date are spread across 4 granules
    2. Any given image is incomplete because of clouds

    Try It
    1. For each band:

      1. For each date:

        1. Merge all 4 granules
        2. Mask any negative values created by interpolating from the nodata value of -9999 (rioxarray should account for this, but doesn't appear to when merging; if you leave these values in, they will create problems down the line). A toy illustration of this masking follows the list.
      2. Concatenate the merged DataArrays along a new date dimension

      3. Take the mean in the date dimension to create a composite image that fills cloud gaps

      4. Add the band as a dimension, and give the DataArray a name

    2. Concatenate along the band dimension
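    Masking the artifact values is a one-liner with where(); a toy illustration of step 1.2 (the values here are made up):

    ```python
    import xarray as xr

    # One interpolation artifact (-0.3) and one raw nodata value (-9999)
    da = xr.DataArray([0.05, -0.3, 0.12, -9999.0])
    print(da.where(da > 0).values)  # 0.05, nan, 0.12, nan
    ```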

    In [9]:
    @cached('delta_reflectance_da')
    def merge_composite_arrays(granule_da_df):
        # Merge granules and composite an image for each band
        da_list = []
        for band_id, band_df in tqdm(granule_da_df.groupby('band_id')):
            merged_das = []
            for date_time, date_df in tqdm(band_df.groupby('date_time')):
                # Merge granules for each date
                merged_da = rxrmerge.merge_arrays(list(date_df.masked_da))
            # Mask non-positive values left by interpolation from nodata
                merged_da = merged_da.where(merged_da>0)
                merged_das.append(merged_da)
    
        # Composite across dates (median is used here; it is less
        # sensitive to residual cloud artifacts than the mean)
            composite_da = xr.concat(merged_das, dim='date_time').median('date_time')
            composite_da['band_id'] = int(band_id[1:])
            composite_da.name = 'reflectance'
            da_list.append(composite_da)
        
        return xr.concat(da_list, dim='band_id')
    
    reflectance_da = merge_composite_arrays(reflectance_da_df)
    reflectance_da
    
    Out[9]:
    <xarray.DataArray 'reflectance' (band_id: 10, y: 556, x: 624)> Size: 14MB
    array([[[nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            ...,
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan]],
    
           [[nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            ...,
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan]],
    
           [[nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            ...,
    ...
            ...,
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan]],
    
           [[nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            ...,
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan]],
    
           [[nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            ...,
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan],
            [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)
    Coordinates:
      * x            (x) float64 5kB 7.926e+05 7.926e+05 ... 8.112e+05 8.113e+05
      * y            (y) float64 4kB 3.304e+06 3.304e+06 ... 3.287e+06 3.287e+06
        band         int64 8B 1
        spatial_ref  int64 8B 0
      * band_id      (band_id) int64 80B 1 2 3 4 5 6 7 9 10 11

    STEP 4: K-MEANS¶

    Cluster the data by spectral signature using the k-means algorithm.

    In [34]:
    # Convert spectral DataArray to a tidy DataFrame
    model_df = reflectance_da.to_dataframe().reflectance.unstack('band_id')
    model_df = model_df.drop(columns=[10,11]).dropna()
    # Fit the model. Because k-means is unsupervised, there is no target
    # variable; the fitted labels_ attribute holds each pixel's cluster.
    k_means = KMeans(n_clusters=4)
    
    k_means.fit(model_df[[1, 2, 3, 4, 5, 6, 7, 9]])
    # Add the predicted values back to the model DataFrame
    model_df['k_means_labels'] = k_means.labels_
    
    In [35]:
    model_vars = model_df[[1, 2, 3, 4, 5, 6, 7, 9]]
    model_vars
    
    Out[35]:
    band_id 1 2 3 4 5 6 7 9
    y x
    3.303783e+06 810148.062907 0.00670 0.0114 0.0219 0.0200 0.0083 0.0038 0.00490 0.0007
    810178.062907 0.00810 0.0124 0.0217 0.0196 0.0095 0.0072 0.00440 0.0008
    810208.062907 0.01070 0.0127 0.0233 0.0201 0.0077 0.0035 0.00785 0.0008
    810238.062907 0.00810 0.0159 0.0255 0.0227 0.0445 0.0198 0.00940 0.0008
    810268.062907 0.00750 0.0171 0.0258 0.0227 0.0662 0.0189 0.01070 0.0007
    ... ... ... ... ... ... ... ... ... ...
    3.287163e+06 793798.062907 0.02520 0.0361 0.0639 0.0458 0.0448 0.0307 0.02290 0.0010
    793828.062907 0.02775 0.0354 0.0621 0.0434 0.0440 0.0290 0.02100 0.0008
    793858.062907 0.03260 0.0382 0.0639 0.0459 0.0487 0.0309 0.02320 0.0008
    793888.062907 0.03615 0.0443 0.0711 0.0541 0.0556 0.0384 0.02950 0.0008
    793918.062907 0.03430 0.0451 0.0727 0.0560 0.0548 0.0396 0.03010 0.0009

    318263 rows × 8 columns

    In [36]:
    model_df
    
    Out[36]:
    band_id 1 2 3 4 5 6 7 9 k_means_labels
    y x
    3.303783e+06 810148.062907 0.00670 0.0114 0.0219 0.0200 0.0083 0.0038 0.00490 0.0007 0
    810178.062907 0.00810 0.0124 0.0217 0.0196 0.0095 0.0072 0.00440 0.0008 0
    810208.062907 0.01070 0.0127 0.0233 0.0201 0.0077 0.0035 0.00785 0.0008 0
    810238.062907 0.00810 0.0159 0.0255 0.0227 0.0445 0.0198 0.00940 0.0008 0
    810268.062907 0.00750 0.0171 0.0258 0.0227 0.0662 0.0189 0.01070 0.0007 0
    ... ... ... ... ... ... ... ... ... ... ...
    3.287163e+06 793798.062907 0.02520 0.0361 0.0639 0.0458 0.0448 0.0307 0.02290 0.0010 0
    793828.062907 0.02775 0.0354 0.0621 0.0434 0.0440 0.0290 0.02100 0.0008 0
    793858.062907 0.03260 0.0382 0.0639 0.0459 0.0487 0.0309 0.02320 0.0008 0
    793888.062907 0.03615 0.0443 0.0711 0.0541 0.0556 0.0384 0.02950 0.0008 0
    793918.062907 0.03430 0.0451 0.0727 0.0560 0.0548 0.0396 0.03010 0.0009 0

    318263 rows × 9 columns
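    Note that silhouette_score was imported in the setup but never used. As a sketch of how you might choose n_clusters, you could score a random subsample (the subsample size and k range here are arbitrary choices; silhouette computation is quadratic in the number of samples, so avoid the full pixel set):

    ```python
    # Hypothetical model-selection sketch using silhouette_score
    sample = model_vars.sample(5000, random_state=42)
    for k in range(2, 7):
        labels = KMeans(n_clusters=k, random_state=42).fit_predict(sample)
        print(k, silhouette_score(sample, labels))
    ```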

    STEP 5: PLOT¶

    In [38]:
    rgb = reflectance_da.sel(band_id=[2, 3, 4])
    # Scale to 8-bit while keeping the NaN mask (rgb != np.nan is always
    # True element-wise, so it cannot detect missing values; use notnull())
    rgb_uint8 = (rgb * 255).astype(np.uint8).where(rgb.notnull())
    # Brighten the dim reflectance values, clipping at the 8-bit maximum
    rgb_bright = rgb_uint8 * 10
    rgb_sat = rgb_bright.where(rgb_bright < 255, 255)
    
    # Plot the k-means clusters
    combined_plot = (
        rgb_sat.hvplot.rgb(
            x='x', y='y', bands='band_id',
            data_aspect=1,
            xaxis=None, yaxis=None,
            title='RGB Reflectance of the\nMississippi River Delta')
        + 
        model_df.k_means_labels.to_xarray().sortby(['x', 'y']).hvplot(
            cmap="Colorblind", aspect='equal',
            title='Land Cover Classification of\nthe Mississippi River Delta') 
    )
    
    combined_plot
    
    Out[38]:
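    If you want the interactive figure outside the notebook, hvplot can save it to a standalone HTML file (the file name here is arbitrary):

    ```python
    # Optional: save the interactive figure as standalone HTML
    hvplot.save(combined_plot, 'delta_land_cover.html')
    ```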

    LAND COVER CLASSIFICATION REFLECTS PHYSIOGRAPHIC FEATURES

    While the k-means classification does not assign meaning to its categories, inferences can be made from satellite imagery in the visible spectrum. In this instance, possible categories include:

    1. (Blue) Deep water bodies, including Lake Lery to the Northeast, and Grand Lake and Petit Lake to the Southeast.
    2. (Yellow) Vegetation, especially the wide expanses of Spartina throughout the delta.
    3. (Red) Shallow water bodies, such as the canals that run through the delta, appearing as smooth continuous lines in the figure.
    4. (Black) Exposed sediments, such as sandbars or man-made hardscaping. I figured this one out by looking at the Google Maps Street View of the town of Delacroix, LA, which is just barely visible on the eastern edge of the RGB figure as a white bend. This category may also include vegetated areas able to support woody growth.