Basic Class Structure

The three primary classes (Survey, Tabular, and Raster) all store their data and metadata in Xarray Datasets. This example shows how to access the xarray object for each class and how to explore its data and metadata.

This example uses raw ASEG-formatted AEM data from the Tempest system and a 2-D GeoTIFF of magnetic data.

Dataset Reference: Minsley, B.J., James, S.R., Bedrosian, P.A., Pace, M.D., Hoogenboom, B.E., and Burton, B.L., 2021, Airborne electromagnetic, magnetic, and radiometric survey of the Mississippi Alluvial Plain, November 2019 - March 2020: U.S. Geological Survey data release, https://doi.org/10.5066/P9E44CTQ.

import matplotlib.pyplot as plt
from os.path import join
from gspy import Survey
from pprint import pprint

First Create the Survey & Data Objects

# Initialize the Survey
data_path = join('..', '..', 'supplemental', 'region', 'MAP')
metadata = join(data_path, 'data', 'Tempest_survey_md.json')
survey = Survey(metadata)

# Add Tabular and Raster Datasets
t_data = join(data_path, 'data', 'Tempest.dat')
t_supp = join(data_path, 'data', 'Tempest_data_md.json')
survey.add_tabular(type='aseg', data_filename=t_data, metadata_file=t_supp)
r_supp = join(data_path, 'data', 'Tempest_raster_md.json')
survey.add_raster(metadata_file=r_supp)

Accessing the Xarray object

Survey

# The Survey's metadata is accessed through the xarray property
print('Survey:\n')
print(survey.xarray)
Survey:

<xarray.Dataset>
Dimensions:                 ()
Coordinates:
    spatial_ref             float64 0.0
Data variables:
    survey_information      float64 nan
    survey_units            float64 nan
    system_information      float64 nan
    flightline_information  float64 nan
    survey_equipment        float64 nan
Attributes:
    title:        Example Tempest Airborne Electromagnetic (AEM) Dataset
    institution:  USGS Geology, Geophysics, & Geochemistry Science Center
    source:       Contractor provided ASEG-formatted data
    history:      <date and time when the data were produced and/or modified>
    references:   <data release reference>
    comment:      <additional details or ancillary information>
    content:      <summary list of file contents, e.g. raw data (/survey/tabu...
    conventions:  CF-1.8, GS-0.0
    created_by:   gspy==0.0.1

To view only the attributes:

print('Survey Attributes:\n')
pprint(survey.xarray.attrs)
Survey Attributes:

{'comment': '<additional details or ancillary information>',
 'content': '<summary list of file contents, e.g. raw data '
            '(/survey/tabular/0), processed data (/survey/tabular/1)>',
 'conventions': 'CF-1.8, GS-0.0',
 'created_by': 'gspy==0.0.1',
 'history': '<date and time when the data were produced and/or modified>',
 'institution': 'USGS Geology, Geophysics, & Geochemistry Science Center',
 'references': '<data release reference>',
 'source': 'Contractor provided ASEG-formatted data',
 'title': 'Example Tempest Airborne Electromagnetic (AEM) Dataset'}
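Several Survey attributes above still contain template placeholders wrapped in angle brackets. The following is a minimal sketch that lists which attributes still need to be filled in. It assumes placeholders always take the form `<...>`. The helper `find_placeholders` is hypothetical and not part of GSPy. Because it operates on a plain dict, it should also accept `survey.xarray.attrs` directly.

```python
def find_placeholders(attrs):
    """Return the sorted names of attributes whose string value looks like '<...>'."""
    # Assumed convention: an unfilled template value starts with '<' and ends with '>'
    return sorted(
        name
        for name, value in attrs.items()
        if isinstance(value, str) and value.startswith("<") and value.endswith(">")
    )


# Attribute values copied from the printed Survey attributes above
survey_attrs = {
    "comment": "<additional details or ancillary information>",
    "content": "<summary list of file contents, e.g. raw data "
               "(/survey/tabular/0), processed data (/survey/tabular/1)>",
    "conventions": "CF-1.8, GS-0.0",
    "created_by": "gspy==0.0.1",
    "history": "<date and time when the data were produced and/or modified>",
    "institution": "USGS Geology, Geophysics, & Geochemistry Science Center",
    "references": "<data release reference>",
    "source": "Contractor provided ASEG-formatted data",
    "title": "Example Tempest Airborne Electromagnetic (AEM) Dataset",
}

print(find_placeholders(survey_attrs))
# -> ['comment', 'content', 'history', 'references']
```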

Or expand a specific variable

print('Survey Information:\n')
print(survey.xarray['survey_information'])
Survey Information:

<xarray.DataArray 'survey_information' ()>
array(nan)
Coordinates:
    spatial_ref  float64 0.0
Attributes:
    contractor_project_number:  603756FWA
    contractor:                 CGG Canada Services Ltd.
    client:                     U.S. Geological Survey
    survey_type:                ['electromagnetic', 'magnetic', 'radiometric']
    survey_area_name:           Mississippi Alluvial Plain (MAP)
    state:                      ['MO', 'AR', 'TN', 'MS', 'LA', 'IL', 'KY']
    country:                    USA
    acquisition_start:          20191120
    acquisition_end:            20200307
    dataset_created:            20200420

Tabular & Raster

Datasets are attached to the Survey as lists. However, if only one Dataset of a given type is present, its xarray object is returned directly by the group name (survey.tabular or survey.raster) without indexing:

# Tabular
print('Tabular:\n')
print(survey.tabular)

# Raster
print('\nRaster:\n')
print(survey.raster)
Tabular:

<xarray.Dataset>
Dimensions:          (index: 20701, gate_times: 15, nv: 2)
Coordinates:
    spatial_ref      float64 0.0
  * index            (index) int32 0 1 2 3 4 5 ... 20696 20697 20698 20699 20700
  * gate_times       (gate_times) float64 1.085e-05 3.255e-05 ... 0.01338
  * nv               (nv) int64 0 1
    x                (index) float64 3.579e+05 3.579e+05 ... 4.907e+05 4.906e+05
    y                (index) float64 1.211e+06 1.211e+06 ... 1.577e+06 1.577e+06
    z                (index) float64 45.83 46.61 46.95 ... 177.0 179.4 177.2
Data variables: (12/62)
    gate_times_bnds  (gate_times, nv) float64 5.43e-06 1.628e-05 ... 0.01666
    Line             (index) int32 225401 225401 225401 ... 262001 262001 262001
    Flight           (index) int32 10 10 10 10 10 10 10 ... 70 70 70 70 70 70 70
    Fiducial         (index) float64 7.836e+03 7.836e+03 ... 1.282e+04 1.282e+04
    Proj_CGG         (index) int32 603756 603756 603756 ... 603756 603756 603756
    Proj_Client      (index) int32 9999 9999 9999 9999 ... 9999 9999 9999 9999
    ...               ...
    Z_PrimaryField   (index) float64 14.69 14.53 15.06 ... 16.77 15.95 14.99
    Z_VLF1           (index) float64 3.696 3.733 3.729 ... 3.732 3.734 3.71
    Z_VLF2           (index) float64 3.684 3.711 3.705 ... 3.701 3.717 3.699
    Z_VLF3           (index) float64 3.637 3.607 3.623 ... 3.654 3.602 3.614
    Z_VLF4           (index) float64 3.567 3.576 3.621 ... 3.616 3.594 3.586
    Z_Geofact        (index) float64 0.9969 0.9862 1.022 ... 1.123 1.069 1.004
Attributes:
    content:  raw data
    comment:  This dataset includes minimally processed (raw) AEM data

Raster:

<xarray.Dataset>
Dimensions:       (x: 599, nv: 2, y: 1212)
Coordinates:
    spatial_ref   float64 0.0
  * x             (x) float64 2.928e+05 2.934e+05 ... 6.51e+05 6.516e+05
  * nv            (nv) int64 0 1
  * y             (y) float64 1.607e+06 1.606e+06 ... 8.808e+05 8.802e+05
Data variables:
    x_bnds        (x, nv) float64 2.925e+05 2.931e+05 ... 6.513e+05 6.519e+05
    y_bnds        (y, nv) float64 1.607e+06 1.606e+06 ... 8.805e+05 8.799e+05
    magnetic_tmi  (y, x) float64 1.701e+38 1.701e+38 ... 1.701e+38 1.701e+38
Attributes:
    comment:  <additional details or ancillary information>
    content:  gridded magnetic map

Multiple Groups

# If more than one Dataset is present in a group, the group is returned as a list and must be indexed
# For example, add a second Tabular Dataset containing the inverted resistivity models
m_data = join(data_path, 'model//Tempest_model.dat')
m_supp = join(data_path, 'model//Tempest_model_md.json')
survey.add_tabular(type='aseg', data_filename=m_data, metadata_file=m_supp)

Now the first dataset is accessed at index 0

print('First Tabular Group:\n')
print(survey.tabular[0])
First Tabular Group:

<xarray.Dataset>
Dimensions:          (index: 20701, gate_times: 15, nv: 2)
Coordinates:
    spatial_ref      float64 0.0
  * index            (index) int32 0 1 2 3 4 5 ... 20696 20697 20698 20699 20700
  * gate_times       (gate_times) float64 1.085e-05 3.255e-05 ... 0.01338
  * nv               (nv) int64 0 1
    x                (index) float64 3.579e+05 3.579e+05 ... 4.907e+05 4.906e+05
    y                (index) float64 1.211e+06 1.211e+06 ... 1.577e+06 1.577e+06
    z                (index) float64 45.83 46.61 46.95 ... 177.0 179.4 177.2
Data variables: (12/62)
    gate_times_bnds  (gate_times, nv) float64 5.43e-06 1.628e-05 ... 0.01666
    Line             (index) int32 225401 225401 225401 ... 262001 262001 262001
    Flight           (index) int32 10 10 10 10 10 10 10 ... 70 70 70 70 70 70 70
    Fiducial         (index) float64 7.836e+03 7.836e+03 ... 1.282e+04 1.282e+04
    Proj_CGG         (index) int32 603756 603756 603756 ... 603756 603756 603756
    Proj_Client      (index) int32 9999 9999 9999 9999 ... 9999 9999 9999 9999
    ...               ...
    Z_PrimaryField   (index) float64 14.69 14.53 15.06 ... 16.77 15.95 14.99
    Z_VLF1           (index) float64 3.696 3.733 3.729 ... 3.732 3.734 3.71
    Z_VLF2           (index) float64 3.684 3.711 3.705 ... 3.701 3.717 3.699
    Z_VLF3           (index) float64 3.637 3.607 3.623 ... 3.654 3.602 3.614
    Z_VLF4           (index) float64 3.567 3.576 3.621 ... 3.616 3.594 3.586
    Z_Geofact        (index) float64 0.9969 0.9862 1.022 ... 1.123 1.069 1.004
Attributes:
    content:  raw data
    comment:  This dataset includes minimally processed (raw) AEM data

and the second is located at index 1

print('Second Tabular Group:\n')
print(survey.tabular[1])
Second Tabular Group:

<xarray.Dataset>
Dimensions:                  (index: 20701, layer_depth: 30, nv: 2,
                              gate_times: 15)
Coordinates:
    spatial_ref              float64 0.0
  * index                    (index) int32 0 1 2 3 4 ... 20697 20698 20699 20700
  * layer_depth              (layer_depth) float64 1.5 4.65 ... 424.2 467.5
  * nv                       (nv) int64 0 1
  * gate_times               (gate_times) float64 1.085e-05 ... 0.01338
    x                        (index) float64 3.579e+05 3.579e+05 ... 4.906e+05
    y                        (index) float64 1.211e+06 1.211e+06 ... 1.577e+06
    z                        (index) float64 45.83 46.61 46.95 ... 179.4 177.2
Data variables: (12/49)
    layer_depth_bnds         (layer_depth, nv) float64 0.0 3.0 ... 445.9 489.1
    gate_times_bnds          (gate_times, nv) float64 5.43e-06 ... 0.01666
    uniqueid                 (index) int32 0 1 2 3 4 ... 20697 20698 20699 20700
    survey                   (index) int32 9999 9999 9999 ... 9999 9999 9999
    date                     (index) int32 20191128 20191128 ... 20200227
    flight                   (index) int32 10 10 10 10 10 10 ... 70 70 70 70 70
    ...                       ...
    PhiC                     (index) float64 0.4491 0.4759 0.129 ... 1.61 1.289
    PhiT                     (index) float64 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    PhiG                     (index) float64 0.9652 0.6608 ... 0.7603 1.457
    PhiS                     (index) float64 0.1158 0.1392 ... 0.2877 0.1705
    Lambda                   (index) float64 0.5968 0.5487 ... 0.3808 1.771
    Iterations               (index) int32 20 19 25 25 25 18 ... 28 30 30 27 29
Attributes:
    content:  inverted resistivity models
    comment:  This dataset includes inverted resistivity models derived from ...
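The access pattern described above can be illustrated with a small stand-in class. This is a hypothetical toy (`ToySurvey`), not GSPy's Survey implementation, and the strings stand in for xarray Datasets. It shows the behavior described in the text: a single dataset comes back directly, and several come back as an indexable list.

```python
class ToySurvey:
    """Hypothetical stand-in illustrating single-vs-list access; not GSPy's Survey."""

    def __init__(self):
        self._tabular = []  # datasets kept in the order they were attached

    def add_tabular(self, dataset):
        self._tabular.append(dataset)

    @property
    def tabular(self):
        # One dataset: return it directly; otherwise return the list for indexing
        if len(self._tabular) == 1:
            return self._tabular[0]
        return self._tabular


toy = ToySurvey()
toy.add_tabular("raw data")
print(toy.tabular)      # the single entry itself
toy.add_tabular("inverted resistivity models")
print(toy.tabular[0])   # first dataset
print(toy.tabular[1])   # second dataset
```

With this design, the type returned by `tabular` changes once a second dataset is added. If GSPy behaves as described above, scripts that may encounter several groups are safer indexing explicitly.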

Coordinates, Dimensions, and Attributes

All data variables must have dimensions, coordinates, and attributes.

Dimensions

Tabular data are typically 1-D or 2-D variables whose primary dimension is index, which corresponds to the rows of the input text file (one row per individual measurement).

print(survey.tabular[1]['index'])
<xarray.DataArray 'index' (index: 20701)>
array([    0,     1,     2, ..., 20698, 20699, 20700], dtype=int32)
Coordinates:
    spatial_ref  float64 0.0
  * index        (index) int32 0 1 2 3 4 5 ... 20696 20697 20698 20699 20700
    x            (index) float64 3.579e+05 3.579e+05 ... 4.907e+05 4.906e+05
    y            (index) float64 1.211e+06 1.211e+06 ... 1.577e+06 1.577e+06
    z            (index) float64 45.83 46.61 46.95 46.66 ... 177.0 179.4 177.2
Attributes:
    standard_name:  index
    long_name:      Index of individual data points
    units:          not_defined
    null_value:     not_defined
    valid_range:    [    0 20700]
    grid_mapping:   spatial_ref
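The relationship between text rows and the index dimension can be sketched with the standard library. This is a simplified illustration that assumes a headerless, whitespace-delimited table. It is not how GSPy reads ASEG files, and the helper `rows_to_columns` and the sample rows are hypothetical.

```python
def rows_to_columns(text, names):
    """Split whitespace-delimited rows into named columns plus a 0-based 'index'."""
    rows = [line.split() for line in text.strip().splitlines()]
    columns = {name: [float(row[i]) for row in rows] for i, name in enumerate(names)}
    # One index value per row (measurement), counting from 0 as in the printed output
    columns["index"] = list(range(len(rows)))
    return columns


# Illustrative rows only (Line, Flight, Fiducial); not read from Tempest.dat
sample = """
225401 10 7836.0
225401 10 7836.1
225401 10 7836.2
"""

cols = rows_to_columns(sample, ["Line", "Flight", "Fiducial"])
valid_range = [min(cols["index"]), max(cols["index"])]
print(cols["index"], valid_range)
```

By the same logic, the 20,701 rows of the Tempest file give the index valid_range of [0, 20700] shown above.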

If a dimension is not discrete, meaning it represents ranges (such as depth layers), then the bounds on each dimension value also need to be defined, and are linked to the dimension through the “bounds” attribute.

print('example non-discrete dimension:\n')
print(survey.tabular[1]['gate_times'])
print('\n\ncorresponding bounds on non-discrete dimension:\n')
print(survey.tabular[1]['gate_times_bnds'])
example non-discrete dimension:

<xarray.DataArray 'gate_times' (gate_times: 15)>
array([1.085000e-05, 3.255000e-05, 5.426000e-05, 8.681000e-05, 1.410600e-04,
       2.278700e-04, 3.689300e-04, 5.859500e-04, 9.114800e-04, 1.410630e-03,
       2.191900e-03, 3.418070e-03, 5.338690e-03, 8.301020e-03, 1.337928e-02])
Coordinates:
    spatial_ref  float64 0.0
  * gate_times   (gate_times) float64 1.085e-05 3.255e-05 ... 0.008301 0.01338
Attributes:
    standard_name:  gate_times
    long_name:      receiver gate times
    units:          seconds
    null_value:     not_defined
    valid_range:    [1.085000e-05 1.337928e-02]
    grid_mapping:   spatial_ref
    bounds:         gate_times_bnds


corresponding bounds on non-discrete dimension:

<xarray.DataArray 'gate_times_bnds' (gate_times: 15, nv: 2)>
array([[5.430000e-06, 1.628000e-05],
       [2.713000e-05, 3.798000e-05],
       [4.883000e-05, 5.968000e-05],
       [7.053000e-05, 1.030800e-04],
       [1.139400e-04, 1.681900e-04],
       [1.790400e-04, 2.767000e-04],
       [2.875500e-04, 4.503200e-04],
       [4.611700e-04, 7.107400e-04],
       [7.215900e-04, 1.101380e-03],
       [1.112230e-03, 1.709030e-03],
       [1.719880e-03, 2.663920e-03],
       [2.674770e-03, 4.161360e-03],
       [4.172210e-03, 6.505170e-03],
       [6.516030e-03, 1.008600e-02],
       [1.009686e-02, 1.666171e-02]])
Coordinates:
    spatial_ref  float64 0.0
  * nv           (nv) int64 0 1
  * gate_times   (gate_times) float64 1.085e-05 3.255e-05 ... 0.008301 0.01338
Attributes:
    standard_name:  gate_times_bounds
    long_name:      receiver gate times cell boundaries
    units:          seconds
    null_value:     not_defined
    valid_range:    [5.430000e-06 1.666171e-02]
    grid_mapping:   spatial_ref
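A quick worked check using the printed values confirms how the gate_times dimension relates to its bounds. Each gate time lies inside its own interval and, within print rounding, at that interval's midpoint. The 1e-3 relative tolerance is an assumption chosen to absorb the rounding in the printed output. These are observations about this dataset, not GSPy requirements.

```python
import math

# Values transcribed from the printed gate_times and gate_times_bnds above
gate_times = [
    1.085000e-05, 3.255000e-05, 5.426000e-05, 8.681000e-05, 1.410600e-04,
    2.278700e-04, 3.689300e-04, 5.859500e-04, 9.114800e-04, 1.410630e-03,
    2.191900e-03, 3.418070e-03, 5.338690e-03, 8.301020e-03, 1.337928e-02,
]
gate_times_bnds = [
    (5.430000e-06, 1.628000e-05), (2.713000e-05, 3.798000e-05),
    (4.883000e-05, 5.968000e-05), (7.053000e-05, 1.030800e-04),
    (1.139400e-04, 1.681900e-04), (1.790400e-04, 2.767000e-04),
    (2.875500e-04, 4.503200e-04), (4.611700e-04, 7.107400e-04),
    (7.215900e-04, 1.101380e-03), (1.112230e-03, 1.709030e-03),
    (1.719880e-03, 2.663920e-03), (2.674770e-03, 4.161360e-03),
    (4.172210e-03, 6.505170e-03), (6.516030e-03, 1.008600e-02),
    (1.009686e-02, 1.666171e-02),
]

pairs = list(zip(gate_times, gate_times_bnds))
# Every gate time should lie strictly inside its own [lower, upper] interval
inside = all(lo < t < hi for t, (lo, hi) in pairs)
# In this dataset each gate time is also the interval midpoint, up to print rounding
centered = all(math.isclose(t, 0.5 * (lo + hi), rel_tol=1e-3) for t, (lo, hi) in pairs)
# In this dataset the intervals do not touch: each upper bound is below the next lower bound
separated = all(a[1] < b[0] for a, b in zip(gate_times_bnds, gate_times_bnds[1:]))
print(inside, centered, separated)
```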

Coordinates

Coordinates define the spatial and temporal position of the data (x, y, z, t). In addition, every dimension is classified as a coordinate by default, so a dataset can have both dimensional and non-dimensional coordinates. Dimensional coordinates are marked with a * (or bold text) in the printed xarray output. In this example, these are index, gate_times, and nv:

print(survey.tabular[0].coords)
Coordinates:
    spatial_ref  float64 0.0
  * index        (index) int32 0 1 2 3 4 5 ... 20696 20697 20698 20699 20700
  * gate_times   (gate_times) float64 1.085e-05 3.255e-05 ... 0.008301 0.01338
  * nv           (nv) int64 0 1
    x            (index) float64 3.579e+05 3.579e+05 ... 4.907e+05 4.906e+05
    y            (index) float64 1.211e+06 1.211e+06 ... 1.577e+06 1.577e+06
    z            (index) float64 45.83 46.61 46.95 46.66 ... 177.0 179.4 177.2
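The distinction follows xarray's convention: a dimension coordinate is a 1-D coordinate whose name matches its only dimension. This sketch applies that rule to the dimensions transcribed from the printed output above. With a live Dataset, the same tuples would come from `ds[name].dims`. The helper `is_dimension_coordinate` is hypothetical.

```python
# Dimensions of each coordinate, transcribed from the printed survey.tabular[0].coords
coord_dims = {
    "spatial_ref": (),
    "index": ("index",),
    "gate_times": ("gate_times",),
    "nv": ("nv",),
    "x": ("index",),
    "y": ("index",),
    "z": ("index",),
}


def is_dimension_coordinate(name, dims):
    # A 1-D coordinate that shares its name with its only dimension
    return dims == (name,)


dimensional = [n for n, d in coord_dims.items() if is_dimension_coordinate(n, d)]
non_dimensional = [n for n, d in coord_dims.items() if not is_dimension_coordinate(n, d)]
print("dimensional:", dimensional)
print("non-dimensional:", non_dimensional)
```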

Tabular Coordinates

In Tabular data, coordinates are typically non-dimensional, since the primary dataset dimension is index. By default, we define the spatial coordinates, x and y, based on the longitude and latitude (or easting/northing) data variables. If relevant, z and t coordinate variables can also be defined, representing the vertical and temporal coordinates of the data points.

Note: All coordinates must match the coordinate reference system defined in the Survey.

Raster Coordinates

Raster data are gridded, typically representing maps or multi-dimensional models. Therefore, Raster data almost always have dimensional coordinates, i.e., the data dimensions correspond directly to either spatial or temporal coordinates (x, y, z, t).

print(survey.raster.coords)
Coordinates:
    spatial_ref  float64 0.0
  * x            (x) float64 2.928e+05 2.934e+05 2.94e+05 ... 6.51e+05 6.516e+05
  * nv           (nv) int64 0 1
  * y            (y) float64 1.607e+06 1.606e+06 ... 8.808e+05 8.802e+05
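Because raster dimensions are themselves the x and y coordinates, the coordinate values and their x_bnds/y_bnds follow from a grid origin and a cell size. The following is a minimal sketch that assumes a north-up grid defined by the outer edge of its first cell. The printed x values suggest cells of roughly 600 m, but the origin values below are round illustrative numbers, not values read from the GeoTIFF. The helper `cell_centers_and_bounds` is hypothetical.

```python
def cell_centers_and_bounds(edge, step, n):
    """Return n cell-center coordinates and their (start, end) bounds.

    edge: outer edge of the first cell; step: signed cell size
    (negative for y in a north-up raster, where values decrease down the rows).
    """
    centers = [edge + (i + 0.5) * step for i in range(n)]
    bounds = [(edge + i * step, edge + (i + 1) * step) for i in range(n)]
    return centers, bounds


# Illustrative grid: 4 columns and 3 rows of 600 m cells (origin values are made up)
x, x_bnds = cell_centers_and_bounds(edge=292_500.0, step=600.0, n=4)
y, y_bnds = cell_centers_and_bounds(edge=1_607_000.0, step=-600.0, n=3)
print(x)
print(y_bnds)
```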

The Spatial Reference Coordinate

The spatial_ref coordinate variable is a non-dimensional coordinate that contains information on the coordinate reference system. For more information, see Coordinate Reference Systems.

Attributes

Both datasets and data variables have attributes (metadata fields). Certain attributes are required; see our documentation on the GS standard for more details.

Dataset attributes

Dataset attributes provide users a way to document and describe supplementary information about a dataset group as a whole, such as model inversion parameters or other processing descriptions. At a minimum, a content attribute should contain a brief summary of the contents of the dataset.

pprint(survey.tabular[1].attrs)
{'comment': 'This dataset includes inverted resistivity models derived from '
            'processed AEM data produced by USGS',
 'content': 'inverted resistivity models'}
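A simple pre-export check for the minimum requirement stated above might look like the following sketch. The helper `check_dataset_attrs` is hypothetical and not part of GSPy, which may perform its own validation. Only content is required by default, and any further required names would be your own assumption.

```python
def check_dataset_attrs(attrs, required=("content",)):
    """Return the required attribute names that are missing or blank."""
    # 'content' is the minimum named in the text; extend `required` for stricter checks
    return [key for key in required if not str(attrs.get(key, "")).strip()]


# Attributes copied from the printed survey.tabular[1].attrs
model_attrs = {
    "comment": "This dataset includes inverted resistivity models derived from "
               "processed AEM data produced by USGS",
    "content": "inverted resistivity models",
}

print(check_dataset_attrs(model_attrs))                # -> []
print(check_dataset_attrs({"comment": "no summary"}))  # -> ['content']
```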

Variable attributes

Each data variable must contain attributes detailing the metadata of that individual variable. These follow the Climate and Forecast (CF) metadata conventions.

pprint(survey.tabular[1]['conductivity'].attrs)
{'format': '30e15.6',
 'grid_mapping': 'spatial_ref',
 'long_name': 'not_defined',
 'null_value': 'not_defined',
 'standard_name': 'conductivity',
 'units': 'not_defined',
 'valid_range': array([1.e-04, 1.e+01])}
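Under the CF conventions, values that fall outside a variable's valid_range are treated as missing. This sketch applies that rule to the conductivity range shown above, [1e-4, 1e1]. The sample values are made up, and `mask_outside_valid_range` is a hypothetical helper, not a GSPy function. With xarray itself, a similar effect is usually obtained with something like `da.where((da >= low) & (da <= high))`.

```python
import math


def mask_outside_valid_range(values, valid_range):
    """Replace values outside the inclusive [low, high] valid_range with NaN."""
    low, high = valid_range
    return [v if low <= v <= high else math.nan for v in values]


conductivity_valid_range = (1e-4, 1e1)  # from the printed attributes above
sample = [5e-5, 0.02, 0.35, 12.0]       # made-up conductivity values
masked = mask_outside_valid_range(sample, conductivity_valid_range)
print(masked)
```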
