Data Discovery and Access via earthdata library

Credits

Objective

  • Discover and access NASA DAAC data programmatically using the earthdata library.


Motivation and Background

TL;DR earthdata uses NASA APIs to search, preview and access NASA datasets on-prem and in the cloud (with 4 lines of Python!).


There are many ways to access NASA datasets. We can use the Earthdata search portal, DAAC-specific portals or tools, or Open Altimetry. These web portals are powerful, but they are not designed for programmatic access and reproducible workflows, both of which are extremely important in the age of the cloud and reproducible open science.

The good news is that NASA also exposes APIs that allow us to search, transform and access data in a programmatic way. There are already some very useful client libraries for these APIs:

  • icepyx

  • python-cmr

  • eo-metadata-tools

  • harmony-py

  • Hyrax (OpenDAP)

  • cmr-stac

  • others

Each of these libraries has amazing features and some similarities.

In this context, earthdata aims to be a simple library that deals with the important parts of the metadata so we can access or download data without having to worry about whether a given dataset is on-premises (DAAC server) or in the cloud. earthdata is a work in progress and is updated often. You are encouraged to contribute to this open-source library.

Some important strengths of the earthdata library:

  • Discovery of and access to on-prem and cloud-hosted data.

  • Access to data across all NASA DAACs.

  • Easy handling of S3 credentials for direct access to cloud-hosted data.

Key Steps for Programmatic Data Access

There are a few key steps for accessing data from the NASA DAAC APIs:

  1. Authenticate with NASA Earthdata Login (and, for cloud-hosted data, with AWS access keys and a token).

  2. Query CMR to find data using filters, such as spatial extent and temporal range.

  3. Order and download your data by interacting with DAAC APIs.

We’ll go through each of these steps during this tutorial and, at the end, summarize how earthdata streamlines this process into a few lines of code.


Step 0. Import classes

# Import classes from earthdata

from earthdata import Auth, DataCollections, DataGranules, Store

Step 1. Earthdata login

To access data with the library, you must first log in with Earthdata Login. Run the following code cell, then enter your NASA Earthdata credentials when prompted.

Note: If you don’t have NASA Earthdata credentials you have to register first at the link above. You don’t need to be a NASA employee to register with NASA Earthdata! Note that if you did not enter your Earthdata Login username and email into the form in the pre-Hackweek email, you will not be on the ICESat-2 cloud data early access list and you will not have access to ICESat-2 data in the cloud. You will still have access to all publicly available data sets.

# Log in with our Earthdata Login credentials: try ~/.netrc first,
# then fall back to an interactive prompt.

auth = Auth().login(strategy='netrc')
if not auth.authenticated:
    auth = Auth().login(strategy='interactive')
You're now authenticated with NASA Earthdata Login
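
The netrc strategy presumably depends on a .netrc file that holds an entry for the Earthdata Login host. The sketch below checks for such an entry using only the Python standard library. It is an illustration, not part of earthdata: the helper name has_netrc_entry is hypothetical, and the host name urs.earthdata.nasa.gov is assumed from the Earthdata profile URL that appears in the collection metadata later in this tutorial.

```python
import netrc
import os
import tempfile

# Assumption: Earthdata Login is served from this host.
EDL_HOST = "urs.earthdata.nasa.gov"


def has_netrc_entry(path, host=EDL_HOST):
    """Return True if the netrc file at `path` holds credentials for `host`."""
    try:
        entries = netrc.netrc(path)
    except (FileNotFoundError, netrc.NetrcParseError):
        return False
    return entries.authenticators(host) is not None


# Demonstrate with a throwaway file instead of the real ~/.netrc.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "netrc")
    with open(demo_path, "w") as f:
        f.write(f"machine {EDL_HOST} login demo_user password demo_pass\n")
    found = has_netrc_entry(demo_path)
    missing = has_netrc_entry(os.path.join(tmp, "does-not-exist"))

print(found, missing)
```

If a check like this fails on your own machine, the interactive strategy is the fallback to expect.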

Step 2. Query the Common Metadata Repository (CMR)

Query CMR for Data Collections

The DataCollections class can query CMR for any collection (collection = data set) using all of CMR’s query parameters and has built-in accessors for the common ones. This makes it ideal for one-liners and easier notation.

This means we can narrow our search in CMR by filtering on keyword, temporal range, area of interest, and data provider, e.g.:

  • temporal("2020-03-01", "2020-03-30")

  • keyword('land ice')

  • bounding_box(-134.7,58.9,-133.9,59.2)

  • provider("NSIDC_ECS")
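
To make the four filters above more concrete, here is a minimal sketch of how they could map onto a CMR search URL. It only builds the URL and makes no network request. The endpoint, the helper name build_collection_query, and the way the dates are padded to full timestamps are assumptions for illustration; earthdata may construct its requests differently. The bounding box is given as west, south, east, north.

```python
from urllib.parse import urlencode

# Assumption: the public CMR collection search endpoint.
CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.umm_json"


def build_collection_query(keyword, bounding_box, temporal, provider):
    """Build a CMR collection search URL from the four filters."""
    west, south, east, north = bounding_box
    start, end = temporal
    params = {
        "keyword": keyword,
        # CMR expects lower-left lon/lat followed by upper-right lon/lat.
        "bounding_box": f"{west},{south},{east},{north}",
        # Assumption: pad plain dates so they cover whole days.
        "temporal": f"{start}T00:00:00Z,{end}T23:59:59Z",
        "provider": provider,
    }
    return f"{CMR_COLLECTIONS_URL}?{urlencode(params)}"


url = build_collection_query(
    keyword="land ice",
    bounding_box=(-134.7, 58.9, -133.9, 59.2),
    temporal=("2020-03-01", "2020-03-30"),
    provider="NSIDC_ECS",
)
print(url)
```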

We’re going to go through a couple of examples of querying CMR and accessing data: the first for accessing on-prem data and the second for accessing cloud-hosted data.

The first thing we’ll do is set up our query object.

Query = DataCollections().keyword('land ice').bounding_box(-134.7,58.9,-133.9,59.2).provider("NSIDC_ECS")

# Query = DataCollections().keyword('land ice').bounding_box(-134.7,58.9,-133.9,59.2).daac("NSIDC")
# Query = DataCollections().keyword('land ice').bounding_box(-134.7,58.9,-133.9,59.2).data_center("NSIDC")

print(f'Collections found: {Query.hits()}')
Collections found: 144

Then we’ll create a collections object from our query.

collections = Query.get(10)

# Inspect 1st result.

print(collections[0:1])
[{
  "meta": {
    "revision-id": 2,
    "deleted": false,
    "format": "application/iso19115+xml",
    "provider-id": "NSIDC_ECS",
    "user-id": "jbehnke",
    "has-formats": true,
    "associations": {
      "services": [
        "S1977894169-NSIDC_ECS",
        "S1568899363-NSIDC_ECS",
        "S2013515292-NSIDC_ECS",
        "S1613689509-NSIDC_ECS",
        "S2008499525-NSIDC_ECS"
      ],
      "tools": [
        "TL2000645101-NSIDC_ECS",
        "TL1950215144-NSIDC_ECS",
        "TL1977971361-NSIDC_ECS",
        "TL2012682515-NSIDC_ECS",
        "TL1993837300-NSIDC_ECS",
        "TL1977912846-NSIDC_ECS",
        "TL2140660378-NSIDC_ECS",
        "TL1956547654-NSIDC_ECS",
        "TL2011654705-NSIDC_ECS",
        "TL1956087574-NSIDC_ECS",
        "TL1952642907-NSIDC_ECS",
        "TL1994100033-NSIDC_ECS"
      ]
    },
    "has-spatial-subsetting": true,
    "native-id": "ATLAS/ICESat-2 L3A Land Ice Height V005",
    "has-transforms": false,
    "has-variables": true,
    "concept-id": "C2144439155-NSIDC_ECS",
    "revision-date": "2021-11-29T18:34:55.355Z",
    "granule-count": 132,
    "has-temporal-subsetting": true,
    "concept-type": "collection"
  },
  "umm": {
    "DataLanguage": "eng; usa",
    "CollectionCitations": [
      {
        "Title": "ATLAS/ICESat-2 L3A Land Ice Height V005",
        "Publisher": "NASA National Snow and Ice Data Center Distributed Active Archive Center",
        "Version": "005"
      }
    ],
    "AdditionalAttributes": [
      {
        "Name": "identifier_product_doi",
        "Description": "Digital object identifier that uniquely identifies this data product",
        "DataType": "STRING"
      },
      {
        "Name": "identifier_product_doi_authority",
        "Description": "URL of the digital object identifier resolving authority",
        "DataType": "STRING"
      }
    ],
    "SpatialExtent": {
      "SpatialCoverageType": "HORIZONTAL",
      "HorizontalSpatialDomain": {
        "Geometry": {
          "CoordinateSystem": "CARTESIAN",
          "BoundingRectangles": [
            {
              "WestBoundingCoordinate": -180.0,
              "NorthBoundingCoordinate": 90.0,
              "EastBoundingCoordinate": 180.0,
              "SouthBoundingCoordinate": -90.0
            }
          ]
        },
        "ResolutionAndCoordinateSystem": {
          "HorizontalDataResolution": {
            "GriddedResolutions": [
              {
                "XDimension": 20.0,
                "Unit": "Meters"
              }
            ]
          }
        }
      },
      "OrbitParameters": {
        "SwathWidth": 36.0,
        "Period": 96.8,
        "InclinationAngle": 92.0,
        "NumberOfOrbits": 0.071428571,
        "StartCircularLatitude": 0.0
      },
      "GranuleSpatialRepresentation": "ORBIT"
    },
    "CollectionProgress": "PLANNED",
    "ScienceKeywords": [
      {
        "Category": "EARTH SCIENCE",
        "Topic": "CRYOSPHERE",
        "Term": "GLACIERS/ICE SHEETS",
        "VariableLevel1": "GLACIER ELEVATION/ICE SHEET ELEVATION"
      }
    ],
    "TemporalExtents": [
      {
        "EndsAtPresentFlag": false,
        "RangeDateTimes": [
          {
            "BeginningDateTime": "2018-10-14T00:00:00.000Z"
          }
        ]
      }
    ],
    "ProcessingLevel": {
      "ProcessingLevelDescription": "Geophysical variables mapped on a grid",
      "Id": "Level 3"
    },
    "DOI": {
      "DOI": "10.5067/ATLAS/ATL06.005"
    },
    "ShortName": "ATL06",
    "EntryTitle": "ATLAS/ICESat-2 L3A Land Ice Height V005",
    "ISOTopicCategories": [
      "CLIMATOLOGY/METEOROLOGY/ATMOSPHERE"
    ],
    "AccessConstraints": {
      "Description": " These data are freely, openly, and fully accessible, provided that you are logged into your NASA Earthdata profile (https://urs.earthdata.nasa.gov/)."
    },
    "RelatedUrls": [
      {
        "Description": "Direct download via HTTPS protocol.",
        "URLContentType": "DistributionURL",
        "Type": "GET DATA",
        "Subtype": "DIRECT DOWNLOAD",
        "URL": "https://n5eil01u.ecs.nsidc.org/ATLAS/ATL06.005/",
        "GetData": {
          "Format": "Not provided",
          "Size": 0.0,
          "Unit": "KB"
        }
      },
      {
        "Description": "NASA's newest search and order tool for subsetting, reprojecting, and reformatting data.",
        "URLContentType": "DistributionURL",
        "Type": "GET DATA",
        "Subtype": "Earthdata Search",
        "URL": "https://search.earthdata.nasa.gov/search/granules?p=C2144439155-NSIDC_ECS&pg[0][v]=f&pg[0][gsk]=-start_date&q=atl06%20v005&tl=1637620429.826!3!!&m=-12.374623815234187!-42.75!1!1!0!0%2C2",
        "GetData": {
          "Format": "Not provided",
          "Size": 0.0,
          "Unit": "KB"
        }
      },
      {
        "Description": "Platform for ICESat, ICESat-2 to visualize and access vector data along one or more individual tracks",
        "URLContentType": "DistributionURL",
        "Type": "GET DATA",
        "URL": "https://openaltimetry.org/",
        "GetData": {
          "Format": "Not provided",
          "Size": 0.0,
          "Unit": "KB"
        }
      },
      {
        "Description": "Data Access link for ITSD project",
        "URLContentType": "DistributionURL",
        "Type": "GET DATA",
        "URL": "https://nsidc.org/data/data-access-tool/ATL06/versions/5/",
        "GetData": {
          "Format": "Not provided",
          "Size": 0.0,
          "Unit": "KB"
        }
      },
      {
        "Description": "Provides access to data, documentation, tools, citation information, support, and other resources.",
        "URLContentType": "CollectionURL",
        "Type": "DATA SET LANDING PAGE",
        "URL": "https://doi.org/10.5067/ATLAS/ATL06.005"
      },
      {
        "Description": "Includes a user's guide, supplemental documents like ATBDs and academic papers, How Tos, FAQs, etc.",
        "URLContentType": "PublicationURL",
        "Type": "VIEW RELATED INFORMATION",
        "Subtype": "GENERAL DOCUMENTATION",
        "URL": "https://doi.org/10.5067/ATLAS/ATL06.005"
      }
    ],
    "ContactGroups": [
      {
        "Roles": [
          "User Services"
        ],
        "ContactInformation": {
          "Addresses": [
            {
              "City": "Boulder",
              "StateProvince": "Colorado",
              "Country": "USA"
            }
          ]
        },
        "GroupName": "NASA National Snow and Ice Data Center Distributed Active Archive Center"
      }
    ],
    "Abstract": "This data set (ATL06) provides geolocated, land-ice surface heights (above the WGS 84 ellipsoid, ITRF2014 reference frame), plus ancillary parameters that can be used to interpret and assess the quality of the height estimates. The data were acquired by the Advanced Topographic Laser Altimeter System (ATLAS) instrument on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory.",
    "Purpose": "Scientific Research",
    "LocationKeywords": [
      {
        "Category": "GEOGRAPHIC REGION",
        "Type": "GLOBAL"
      }
    ],
    "MetadataDates": [
      {
        "Date": "2021-11-29T00:00:00.000Z",
        "Type": "CREATE"
      },
      {
        "Date": "2021-11-29T00:00:00.000Z",
        "Type": "UPDATE"
      }
    ],
    "Version": "005",
    "UseConstraints": {
      "LicenseText": " These data are freely, openly, and fully available to use without restrictions, provided that you cite the data according to the recommended citation at [https://nsidc.org/about/use_copyright.html](https://nsidc.org/about/use_copyright.html). For more information on the NASA EOSDIS Data Use Policy, see [https://earthdata.nasa.gov/earth-observation-data/data-use-policy](https://earthdata.nasa.gov/earth-observation-data/data-use-policy)."
    },
    "ContactPersons": [
      {
        "Roles": [
          "Technical Contact"
        ],
        "ContactInformation": {
          "ContactMechanisms": [
            {
              "Type": "Telephone",
              "Value": "1 303 492 6199"
            },
            {
              "Type": "Fax",
              "Value": "1 303 492 2468"
            },
            {
              "Type": "Email",
              "Value": "nsidc@nsidc.org"
            }
          ],
          "Addresses": [
            {
              "StreetAddresses": [
                "CIRES, 449 UCB",
                "University of Colorado"
              ],
              "City": "Boulder",
              "StateProvince": "CO",
              "Country": "USA",
              "PostalCode": "80309-0449"
            }
          ]
        },
        "FirstName": "NSIDC",
        "MiddleName": "User",
        "LastName": "Services"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "ContactInformation": {
          "Addresses": [
            {
              "StreetAddresses": [
                "University of Washington",
                "Department of Earth and Space Sciences"
              ],
              "City": "Seattle",
              "StateProvince": "WA",
              "Country": "USA",
              "PostalCode": "98195"
            }
          ]
        },
        "FirstName": "Ben",
        "LastName": "Smith"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "ContactInformation": {
          "Addresses": [
            {
              "StreetAddresses": [
                "University of California, San Diego",
                "9500 Gilman Dr."
              ],
              "City": "La Jolla",
              "StateProvince": "California",
              "Country": "USA",
              "PostalCode": "92093"
            }
          ]
        },
        "FirstName": "Helen",
        "MiddleName": "A.",
        "LastName": "Fricker"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Alex",
        "LastName": "Gardner"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Matthew",
        "MiddleName": "R.",
        "LastName": "Siegfried"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Susheel",
        "LastName": "Adusumilli"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "et",
        "LastName": "al"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "ContactInformation": {
          "Addresses": [
            {
              "StreetAddresses": [
                "University of Washington",
                "Department of Earth and Space Sciences"
              ],
              "City": "Seattle",
              "StateProvince": "WA",
              "Country": "USA",
              "PostalCode": "98195"
            }
          ]
        },
        "FirstName": "Ben",
        "LastName": "Smith"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Susheel",
        "LastName": "Adusumilli"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Be\u00e1ta",
        "MiddleName": "M.",
        "LastName": "Csath\u00f3"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Denis",
        "LastName": "Felikson"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "ContactInformation": {
          "Addresses": [
            {
              "StreetAddresses": [
                "University of California, San Diego",
                "9500 Gilman Dr."
              ],
              "City": "La Jolla",
              "StateProvince": "California",
              "Country": "USA",
              "PostalCode": "92093"
            }
          ]
        },
        "FirstName": "Helen",
        "MiddleName": "A.",
        "LastName": "Fricker"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Alex",
        "LastName": "Gardner"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Nick",
        "LastName": "Holschuh"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Jeff",
        "LastName": "Lee"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Johan",
        "LastName": "Nilsson"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Fernando",
        "MiddleName": "S.",
        "LastName": "Paolo"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Matthew",
        "MiddleName": "R.",
        "LastName": "Siegfried"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "Tyler",
        "LastName": "Sutterley"
      },
      {
        "Roles": [
          "Technical Contact"
        ],
        "FirstName": "the",
        "MiddleName": "ICESat-2 Science",
        "LastName": "Team"
      }
    ],
    "DataCenters": [
      {
        "Roles": [
          "ARCHIVER"
        ],
        "ShortName": "NASA NSIDC DAAC",
        "ContactInformation": {
          "RelatedUrls": [
            {
              "Description": "Archiving Data Center",
              "URLContentType": "DataCenterURL",
              "Type": "HOME PAGE",
              "URL": "https://nsidc.org/daac"
            }
          ],
          "ContactMechanisms": [
            {
              "Type": "Telephone",
              "Value": "1 303 492 6199"
            },
            {
              "Type": "Email",
              "Value": "nsidc@nsidc.org"
            }
          ],
          "Addresses": [
            {
              "StreetAddresses": [
                "National Snow and Ice Data Center",
                "CIRES, 449 UCB",
                "University of Colorado"
              ],
              "City": "Boulder",
              "StateProvince": "CO",
              "Country": "USA",
              "PostalCode": "80309-0449"
            }
          ]
        }
      },
      {
        "Roles": [
          "DISTRIBUTOR"
        ],
        "ShortName": "NASA NSIDC DAAC",
        "ContactInformation": {
          "RelatedUrls": [
            {
              "Description": "Archiving Data Center",
              "URLContentType": "DataCenterURL",
              "Type": "HOME PAGE",
              "URL": "https://nsidc.org/daac"
            }
          ],
          "ContactMechanisms": [
            {
              "Type": "Telephone",
              "Value": "1 303 492 6199"
            },
            {
              "Type": "Email",
              "Value": "nsidc@nsidc.org"
            }
          ],
          "Addresses": [
            {
              "StreetAddresses": [
                "National Snow and Ice Data Center",
                "CIRES, 449 UCB",
                "University of Colorado"
              ],
              "City": "Boulder",
              "StateProvince": "CO",
              "Country": "USA",
              "PostalCode": "80309-0449"
            }
          ]
        }
      },
      {
        "Roles": [
          "PROCESSOR"
        ],
        "ShortName": "NASA/GSFC/EOS/ESDIS",
        "ContactInformation": {
          "RelatedUrls": [
            {
              "Description": "Originating Data Center",
              "URLContentType": "DataCenterURL",
              "Type": "HOME PAGE",
              "URL": "https://earthdata.nasa.gov/about/esdis-project"
            }
          ]
        }
      },
      {
        "Roles": [
          "ORIGINATOR"
        ],
        "ShortName": "NASA/GSFC/EOS/ESDIS",
        "ContactInformation": {
          "RelatedUrls": [
            {
              "Description": "Originating Data Center",
              "URLContentType": "DataCenterURL",
              "Type": "HOME PAGE",
              "URL": "https://earthdata.nasa.gov/about/esdis-project"
            }
          ]
        }
      }
    ],
    "Platforms": [
      {
        "Type": "Earth Observation Satellites",
        "ShortName": "ICESat-2",
        "LongName": "Ice, Cloud, and land Elevation Satellite-2",
        "Instruments": [
          {
            "ShortName": "ATLAS",
            "LongName": "Advanced Topographic Laser Altimeter System",
            "Technique": "instrument",
            "NumberOfInstruments": 1,
            "ComposedOf": [
              {
                "ShortName": "ATLAS",
                "LongName": "Advanced Topographic Laser Altimeter System"
              }
            ]
          }
        ]
      }
    ],
    "ArchiveAndDistributionInformation": {
      "FileDistributionInformation": [
        {
          "FormatType": "Native",
          "Format": "HDF5",
          "FormatDescription": "HTTPS"
        }
      ]
    }
  }
}]

To reduce the number of metadata fields displayed, we can select which fields to print when creating our collections object.

collections = Query.fields(['ShortName','Abstract']).get(5)

# Inspect 5 results printing just the ShortName and Abstract

print(collections[0:5])
[{
  "meta": {
    "concept-id": "C2144439155-NSIDC_ECS",
    "granule-count": 132,
    "provider-id": "NSIDC_ECS"
  },
  "umm": {
    "ShortName": "ATL06",
    "Abstract": "This data set (ATL06) provides geolocated, land-ice surface heights (above the WGS 84 ellipsoid, ITRF2014 reference frame), plus ancillary parameters that can be used to interpret and assess the quality of the height estimates. The data were acquired by the Advanced Topographic Laser Altimeter System (ATLAS) instrument on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory."
  }
}, {
  "meta": {
    "concept-id": "C2144424132-NSIDC_ECS",
    "granule-count": 120,
    "provider-id": "NSIDC_ECS"
  },
  "umm": {
    "ShortName": "ATL08",
    "Abstract": "This data set (ATL08) contains along-track heights above the WGS84 ellipsoid (ITRF2014 reference frame) for the ground and canopy surfaces. The canopy and ground surfaces are processed in fixed 100 m data segments, which typically contain more than 100 signal photons. The data were acquired by the Advanced Topographic Laser Altimeter System (ATLAS) instrument on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory."
  }
}, {
  "meta": {
    "concept-id": "C2003771331-NSIDC_ECS",
    "granule-count": 110,
    "provider-id": "NSIDC_ECS"
  },
  "umm": {
    "ShortName": "ATL06",
    "Abstract": "This data set (ATL06) provides geolocated, land-ice surface heights (above the WGS 84 ellipsoid, ITRF2014 reference frame), plus ancillary parameters that can be used to interpret and assess the quality of the height estimates. The data were acquired by the Advanced Topographic Laser Altimeter System (ATLAS) instrument on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory."
  }
}, {
  "meta": {
    "concept-id": "C2003772626-NSIDC_ECS",
    "granule-count": 105,
    "provider-id": "NSIDC_ECS"
  },
  "umm": {
    "ShortName": "ATL08",
    "Abstract": "This data set (ATL08) contains along-track heights above the WGS84 ellipsoid (ITRF2014 reference frame) for the ground and canopy surfaces. The canopy and ground surfaces are processed in fixed 100 m data segments, which typically contain more than 100 signal photons. The data were acquired by the Advanced Topographic Laser Altimeter System (ATLAS) instrument on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory."
  }
}, {
  "meta": {
    "concept-id": "C2199186776-NSIDC_ECS",
    "granule-count": 5,
    "provider-id": "NSIDC_ECS"
  },
  "umm": {
    "ShortName": "ATL08QL",
    "Abstract": "ATL08QL is the quick look version of ATL08. Once final ATL08 files are available the corresponding ATL08QL files will be removed. \nATL08 contains along-track heights above the WGS84 ellipsoid (ITRF2014 reference frame) for the ground and canopy surfaces. The canopy and ground surfaces are processed in fixed 100 m data segments, which typically contain more than 100 signal photons. The data were acquired by the Advanced Topographic Laser Altimeter System (ATLAS) instrument on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory."
  }
}]
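
The trimmed output above keeps three meta keys and only the requested umm fields. A rough sketch of that kind of filtering on a plain nested dict follows. The helper select_fields is hypothetical and is not how earthdata implements fields(); the record is a cut-down copy of values printed earlier.

```python
def select_fields(record, fields):
    """Keep only the requested UMM fields plus a few identifying meta keys."""
    meta_keys = ("concept-id", "granule-count", "provider-id")
    return {
        "meta": {k: record["meta"][k] for k in meta_keys if k in record["meta"]},
        "umm": {k: record["umm"][k] for k in fields if k in record["umm"]},
    }


# A trimmed-down record using values from the output above.
record = {
    "meta": {
        "concept-id": "C2144439155-NSIDC_ECS",
        "granule-count": 132,
        "provider-id": "NSIDC_ECS",
        "revision-id": 2,
    },
    "umm": {
        "ShortName": "ATL06",
        "Version": "005",
        "EntryTitle": "ATLAS/ICESat-2 L3A Land Ice Height V005",
    },
}

# "Abstract" is requested but absent from this toy record, so it is skipped.
slim = select_fields(record, ["ShortName", "Abstract"])
print(slim)
```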

The results from DataCollections are enhanced Python dict objects, so we can pick out individual CMR metadata fields with dotted keys such as "meta.concept-id".

The concept ID is an important parameter in CMR. It’s a unique identifier for a data collection (collection = data set). We’ll use the concept ID when querying for data granules (granules = files) below.

collections[0]["meta.concept-id"]
'C2144439155-NSIDC_ECS'
collections[0]["umm.ShortName"]
'ATL06'
collections[0]["umm.RelatedUrls"]
[{'Description': 'Direct download via HTTPS protocol.',
  'URLContentType': 'DistributionURL',
  'Type': 'GET DATA',
  'Subtype': 'DIRECT DOWNLOAD',
  'URL': 'https://n5eil01u.ecs.nsidc.org/ATLAS/ATL06.005/',
  'GetData': {'Format': 'Not provided', 'Size': 0.0, 'Unit': 'KB'}},
 {'Description': "NASA's newest search and order tool for subsetting, reprojecting, and reformatting data.",
  'URLContentType': 'DistributionURL',
  'Type': 'GET DATA',
  'Subtype': 'Earthdata Search',
  'URL': 'https://search.earthdata.nasa.gov/search/granules?p=C2144439155-NSIDC_ECS&pg[0][v]=f&pg[0][gsk]=-start_date&q=atl06%20v005&tl=1637620429.826!3!!&m=-12.374623815234187!-42.75!1!1!0!0%2C2',
  'GetData': {'Format': 'Not provided', 'Size': 0.0, 'Unit': 'KB'}},
 {'Description': 'Platform for ICESat, ICESat-2 to visualize and access vector data along one or more individual tracks',
  'URLContentType': 'DistributionURL',
  'Type': 'GET DATA',
  'URL': 'https://openaltimetry.org/',
  'GetData': {'Format': 'Not provided', 'Size': 0.0, 'Unit': 'KB'}},
 {'Description': 'Data Access link for ITSD project',
  'URLContentType': 'DistributionURL',
  'Type': 'GET DATA',
  'URL': 'https://nsidc.org/data/data-access-tool/ATL06/versions/5/',
  'GetData': {'Format': 'Not provided', 'Size': 0.0, 'Unit': 'KB'}},
 {'Description': 'Provides access to data, documentation, tools, citation information, support, and other resources.',
  'URLContentType': 'CollectionURL',
  'Type': 'DATA SET LANDING PAGE',
  'URL': 'https://doi.org/10.5067/ATLAS/ATL06.005'},
 {'Description': "Includes a user's guide, supplemental documents like ATBDs and academic papers, How Tos, FAQs, etc.",
  'URLContentType': 'PublicationURL',
  'Type': 'VIEW RELATED INFORMATION',
  'Subtype': 'GENERAL DOCUMENTATION',
  'URL': 'https://doi.org/10.5067/ATLAS/ATL06.005'}]
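
The dotted keys used above ("meta.concept-id", "umm.ShortName") walk down a nested dictionary one level at a time. As a sketch of the idea on a plain dict, with a hypothetical helper named get_nested that is not part of earthdata:

```python
def get_nested(record, dotted_key):
    """Resolve a key such as "meta.concept-id" against a nested dict."""
    value = record
    for part in dotted_key.split("."):
        value = value[part]  # raises KeyError if any level is missing
    return value


# Toy record using values from the output above.
record = {
    "meta": {"concept-id": "C2144439155-NSIDC_ECS"},
    "umm": {"ShortName": "ATL06"},
}

concept_id = get_nested(record, "meta.concept-id")
short_name = get_nested(record, "umm.ShortName")
print(concept_id, short_name)
```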

Query CMR for Data Granules

The DataGranules class provides functionality similar to the DataCollections class. As mentioned above, concept IDs are unique identifiers for data sets (collections). To query for granules from the exact data set and version you are interested in, query granules using the concept ID. You can search for data granules using a short name, but that could (and most likely will) return different versions of the same data granules. Even when you specify both short name and version number, a query won’t distinguish between on-prem and cloud-hosted granules.

In this example we’re querying for data granules from the ICESat-2 ATL06 version 005 data set.

# Generally speaking we won't need the auth instance for *queries* to collections and granules, unless the data set is under restricted access (like NSIDC_CPRD).

Query = DataGranules().concept_id('C2144439155-NSIDC_ECS').bounding_box(-134.7,58.9,-133.9,59.2).temporal("2020-03-01", "2020-03-30")
print(f'Granules found: {Query.hits()}')
Granules found: 4
granules = Query.get()
print(granules[0:4])
[Collection: {'EntryTitle': 'ATLAS/ICESat-2 L3A Land Ice Height V005'}
Spatial coverage: {'HorizontalSpatialDomain': {'Orbit': {'AscendingCrossing': 51.00122639850689, 'StartLatitude': 59.5, 'StartDirection': 'D', 'EndLatitude': 27.0, 'EndDirection': 'D'}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2020-03-06T12:23:20.622Z', 'EndingDateTime': '2020-03-06T12:23:43.274Z'}}
Size(MB): 2.7875404358
Data: ['https://n5eil01u.ecs.nsidc.org/DP7/ATLAS/ATL06.005/2020.03.06/ATL06_20200306122320_10810606_005_01.h5'], Collection: {'EntryTitle': 'ATLAS/ICESat-2 L3A Land Ice Height V005'}
Spatial coverage: {'HorizontalSpatialDomain': {'Orbit': {'AscendingCrossing': -126.52898255861147, 'StartLatitude': 27.0, 'StartDirection': 'A', 'EndLatitude': 59.5, 'EndDirection': 'A'}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2020-03-08T23:49:46.098Z', 'EndingDateTime': '2020-03-08T23:50:24.457Z'}}
Size(MB): 4.3645324707
Data: ['https://n5eil01u.ecs.nsidc.org/DP7/ATLAS/ATL06.005/2020.03.08/ATL06_20200308234154_11190602_005_01.h5'], Collection: {'EntryTitle': 'ATLAS/ICESat-2 L3A Land Ice Height V005'}
Spatial coverage: {'HorizontalSpatialDomain': {'Orbit': {'AscendingCrossing': 50.222511037176204, 'StartLatitude': 59.5, 'StartDirection': 'D', 'EndLatitude': 27.0, 'EndDirection': 'D'}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2020-03-10T12:15:10.646Z', 'EndingDateTime': '2020-03-10T12:15:44.699Z'}}
Size(MB): 2.6717844009
Data: ['https://n5eil01u.ecs.nsidc.org/DP7/ATLAS/ATL06.005/2020.03.10/ATL06_20200310121504_11420606_005_01.h5'], Collection: {'EntryTitle': 'ATLAS/ICESat-2 L3A Land Ice Height V005'}
Spatial coverage: {'HorizontalSpatialDomain': {'Orbit': {'AscendingCrossing': -127.30761451447074, 'StartLatitude': 27.0, 'StartDirection': 'A', 'EndLatitude': 59.5, 'EndDirection': 'A'}}}
Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2020-03-12T23:41:28.626Z', 'EndingDateTime': '2020-03-12T23:42:07.293Z'}}
Size(MB): 14.1388778687
Data: ['https://n5eil01u.ecs.nsidc.org/DP7/ATLAS/ATL06.005/2020.03.12/ATL06_20200312233336_11800602_005_01.h5']]
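
The concept ID used in the query above also tells us which provider holds the data, which is what separates on-prem (NSIDC_ECS) from cloud-hosted (NSIDC_CPRD) copies of the same data set. The sketch below splits a concept ID into its parts. The pattern is inferred from the IDs printed in this tutorial (C for collections, S for services, TL for tools) rather than taken from a formal specification, and split_concept_id is a hypothetical helper.

```python
import re

# Assumption: <letters><digits>-<provider>, as seen in this tutorial's output.
CONCEPT_ID_PATTERN = re.compile(
    r"^(?P<prefix>[A-Z]+)(?P<number>\d+)-(?P<provider>\w+)$"
)


def split_concept_id(concept_id):
    """Return (prefix, number, provider) for a CMR concept ID."""
    match = CONCEPT_ID_PATTERN.match(concept_id)
    if match is None:
        raise ValueError(f"Unrecognized concept ID: {concept_id!r}")
    return match.group("prefix"), int(match.group("number")), match.group("provider")


on_prem = split_concept_id("C2144439155-NSIDC_ECS")
cloud = split_concept_id("C2153572614-NSIDC_CPRD")
tool = split_concept_id("TL2000645101-NSIDC_ECS")
print(on_prem, cloud, tool)
```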

Pretty printing data granules

Since we are in a notebook, we can use the built-in display function to see a more user-friendly version of the granules. This renders a browse image for the granule if one is available, and will eventually produce a representation similar to the one in the Earthdata search portal.

# printing 2 granules using display
[display(granule) for granule in granules[0:2]]

Data: https://n5eil01u.ecs.nsidc.org/DP7/ATLAS/ATL06.005/2020.03.06/ATL06_20200306122320_10810606_005_01.h5

Size: 2.7875404358 MB

Spatial: {'HorizontalSpatialDomain': {'Orbit': {'AscendingCrossing': 51.00122639850689, 'StartLatitude': 59.5, 'StartDirection': 'D', 'EndLatitude': 27.0, 'EndDirection': 'D'}}}

Data Preview

Data: https://n5eil01u.ecs.nsidc.org/DP7/ATLAS/ATL06.005/2020.03.08/ATL06_20200308234154_11190602_005_01.h5

Size: 4.3645324707 MB

Spatial: {'HorizontalSpatialDomain': {'Orbit': {'AscendingCrossing': -126.52898255861147, 'StartLatitude': 27.0, 'StartDirection': 'A', 'EndLatitude': 59.5, 'EndDirection': 'A'}}}

Data Preview
[None, None]

Step 3. Accessing the data

On-prem access 📡

DAAC hosted data

%%time
# Accessing on-prem data means downloading it if we are in a local environment, or "uploading" it if we are in the cloud.
access = Store(auth)
files = access.get(granules[1:2], local_path="/tmp/demo-atl06")
 Getting 1 granules, approx download size: 0.0 GB
CPU times: user 109 ms, sys: 16.7 ms, total: 126 ms
Wall time: 2.41 s
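
The "approx download size: 0.0 GB" message above is most likely a rounding effect: the single granule we requested is about 4.4 MB. The sketch below reproduces that arithmetic with the sizes printed earlier. The exact formula the library uses is an assumption here (division by 1024 and rounding to two decimals), and approx_size_gb is a hypothetical helper.

```python
# Granule sizes in MB, copied from the query output earlier in this tutorial.
sizes_mb = [2.7875404358, 4.3645324707, 2.6717844009, 14.1388778687]


def approx_size_gb(sizes, ndigits=2):
    """Sum sizes given in MB and express the total in GB."""
    return round(sum(sizes) / 1024, ndigits)


# granules[1:2] selects only the second granule.
one_granule = approx_size_gb(sizes_mb[1:2])
all_granules = approx_size_gb(sizes_mb)
print(one_granule, all_granules)
```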

In a terminal, run “ls /tmp/demo-atl06” to see the downloaded files.

Cloud access ☁️

Same API, just a different place

The cloud is not something magical, but having infrastructure on demand is quite handy for many scientific workflows, especially if the data already lives in “the cloud”. As for NASA, data migration started in 2020 and will continue into the foreseeable future. Most, though not all, NASA data will be available in the Amazon Web Services Simple Storage Service (AWS S3).

To work with this data, the first thing we need to do is get the proper credentials for accessing data in the S3 buckets. These credentials are issued on a per-DAAC basis and last only 1 hour. In the near future, the Auth class will keep track of this and regenerate the credentials as needed.
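
Until the library tracks expiry for us, the bookkeeping could look something like the sketch below. It only illustrates the one-hour lifetime described above: the TemporaryCredentials class, its fields, and the five-minute safety margin are all hypothetical and are not part of earthdata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TemporaryCredentials:
    """Hypothetical record of when a DAAC's S3 credentials were issued."""

    daac: str
    issued_at: datetime
    lifetime: timedelta = timedelta(hours=1)

    def needs_refresh(self, now, margin=timedelta(minutes=5)):
        """True once we are within `margin` of the expiry time."""
        return now >= self.issued_at + self.lifetime - margin


issued = datetime(2022, 3, 1, 12, 0, tzinfo=timezone.utc)
creds = TemporaryCredentials(daac="NSIDC", issued_at=issued)

# 30 minutes in: still valid. 56 minutes in: inside the safety margin.
fresh = creds.needs_refresh(issued + timedelta(minutes=30))
stale = creds.needs_refresh(issued + timedelta(minutes=56))
print(fresh, stale)
```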

With earthdata, a researcher can get the files with the same API call regardless of whether they are on-prem or cloud-based. An important consideration, however, is that if we want direct access to data in the cloud, we must run the code in the cloud. This is because some S3 buckets are configured to only allow direct access (s3:// links) if the requester is in the same region, us-west-2.
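
The difference between the two kinds of links is visible in the URL scheme. As a small sketch, the hypothetical helper access_mode below labels a link as direct (s3://) or download (https://). The HTTPS link comes from the granule output earlier in this tutorial, while the S3 link is a made-up placeholder.

```python
from urllib.parse import urlparse


def access_mode(url):
    """Classify a data link by its URL scheme."""
    scheme = urlparse(url).scheme
    if scheme == "s3":
        return "direct"  # only usable from inside the bucket's region
    if scheme in ("http", "https"):
        return "download"
    raise ValueError(f"Unsupported link: {url!r}")


https_link = (
    "https://n5eil01u.ecs.nsidc.org/DP7/ATLAS/ATL06.005/"
    "2020.03.06/ATL06_20200306122320_10810606_005_01.h5"
)
s3_link = "s3://example-bucket/ATLAS/ATL06/example.h5"  # hypothetical placeholder

modes = [access_mode(https_link), access_mode(s3_link)]
print(modes)
```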

Query = DataCollections(auth).keyword('land ice').bounding_box(-134.7,58.9,-133.9,59.2).provider("NSIDC_CPRD")

print(f'Collections found: {Query.hits()}') 
Collections found: 0

Oh no! What!? Zero hits? :(

The ‘hits’ method above tells you the number of query hits, but only for publicly available data sets.
Because cloud-hosted ICESat-2 data are not yet publicly available, CMR returns “0” hits if you filter DataCollections by provider = NSIDC_CPRD. Until cloud-hosted ICESat-2 data become publicly available, we need an alternative way to see how many cloud data sets are available at NSIDC: we can create a collections object (we’re going to want one of these soon anyhow) and print its len() to see the true number of hits.

Create a collections object

# We can create a collections object from our query.

collections = Query.fields(['ShortName','Abstract']).get()

print(len(collections))
# Inspect the first 5 results.

print(collections[0:5])

Query = DataGranules(auth).concept_id("C2153572614-NSIDC_CPRD").bounding_box(-134.7,58.9,-133.9,59.2).temporal("2020-03-01", "2020-03-30")
print(f"Granule hits: {Query.hits()}")

cloud_granules = Query.get(4)
print(len(cloud_granules))

# Is this a cloud-hosted data granule?
cloud_granules[0].cloud_hosted

%%time

# If we get an error here, it is most likely because we are running this code outside the us-west-2 region.
try:
    files = access.get(cloud_granules[0:2], "/tmp/demo-NSIDC_CPRD/")
except Exception as e:
    print("If we are here maybe we are not in us-west-2 or the collection ")
    # Show the underlying error instead of silently discarding it.
    print(f"Error: {e}")

Recap

from earthdata import Auth, DataGranules, DataCollections, Store
auth = Auth().login()
access = Store(auth)

Query = DataGranules(auth).concept_id("C2144439155-NSIDC_ECS").bounding_box(-134.7,58.9,-133.9,59.2).temporal("2020-03-01", "2020-03-30")
granules = Query.get()
files = access.get(granules)

Wait, we said 4 lines of Python

from earthdata import Auth, DataGranules, Store
auth = Auth().login()
granules = DataGranules(auth).concept_id("C2144439155-NSIDC_ECS").bounding_box(-134.7,58.9,-133.9,59.2).temporal("2020-03-01", "2020-03-30").get_all()
files = Store(auth).get(granules, '/tmp')

The Demo notebook in the earthdata library GitHub repo showcases much more of earthdata’s capabilities, including many handy methods for querying CMR for collections and granules. Please take a look on your own when you are ready to start using the earthdata library. You are invited to contribute!

Data provider ID cheat sheet.

[Image: data provider ID cheat sheet]