Basic Class Structure
The three primary classes (Survey, Tabular, and Raster) all contain data and metadata within Xarray Datasets. This example demonstrates how to access the xarray object for each class, and methods for exploring the data and metadata.
This example uses ASEG-formatted raw AEM data from the Tempest system and a 2-D GeoTIFF of magnetic data.
Dataset Reference: Minsley, B.J., James, S.R., Bedrosian, P.A., Pace, M.D., Hoogenboom, B.E., and Burton, B.L., 2021, Airborne electromagnetic, magnetic, and radiometric survey of the Mississippi Alluvial Plain, November 2019 - March 2020: U.S. Geological Survey data release, https://doi.org/10.5066/P9E44CTQ.
import matplotlib.pyplot as plt
from os.path import join
from gspy import Survey
from pprint import pprint
First Create the Survey & Data Objects
# Initialize the Survey
data_path = '..//..//supplemental//region//MAP'
metadata = join(data_path, "data//Tempest_survey_md.json")
survey = Survey(metadata)
# Add Tabular and Raster Datasets
t_data = join(data_path, 'data//Tempest.dat')
t_supp = join(data_path, 'data//Tempest_data_md.json')
survey.add_tabular(type='aseg', data_filename=t_data, metadata_file=t_supp)
r_supp = join(data_path, 'data//Tempest_raster_md.json')
survey.add_raster(metadata_file = r_supp)
Accessing the Xarray object
Survey
# The Survey's metadata is accessed through the xarray property
print('Survey:\n')
print(survey.xarray)
Survey:
<xarray.Dataset>
Dimensions: ()
Coordinates:
spatial_ref float64 0.0
Data variables:
survey_information float64 nan
survey_units float64 nan
system_information float64 nan
flightline_information float64 nan
survey_equipment float64 nan
Attributes:
title: Example Tempest Airborne Electromagnetic (AEM) Dataset
institution: USGS Geology, Geophysics, & Geochemistry Science Center
source: Contractor provided ASEG-formatted data
history: <date and time when the data were produced and/or modified>
references: <data release reference>
comment: <additional details or ancillary information>
content: <summary list of file contents, e.g. raw data (/survey/tabu...
conventions: CF-1.8, GS-0.0
created_by: gspy==0.0.1
To look just at the attributes
print('Survey Attributes:\n')
pprint(survey.xarray.attrs)
Survey Attributes:
{'comment': '<additional details or ancillary information>',
'content': '<summary list of file contents, e.g. raw data '
'(/survey/tabular/0), processed data (/survey/tabular/1)>',
'conventions': 'CF-1.8, GS-0.0',
'created_by': 'gspy==0.0.1',
'history': '<date and time when the data were produced and/or modified>',
'institution': 'USGS Geology, Geophysics, & Geochemistry Science Center',
'references': '<data release reference>',
'source': 'Contractor provided ASEG-formatted data',
'title': 'Example Tempest Airborne Electromagnetic (AEM) Dataset'}
Or expand a specific variable
print('Survey Information:\n')
print(survey.xarray['survey_information'])
Survey Information:
<xarray.DataArray 'survey_information' ()>
array(nan)
Coordinates:
spatial_ref float64 0.0
Attributes:
contractor_project_number: 603756FWA
contractor: CGG Canada Services Ltd.
client: U.S. Geological Survey
survey_type: ['electromagnetic', 'magnetic', 'radiometric']
survey_area_name: Mississippi Alluvial Plain (MAP)
state: ['MO', 'AR', 'TN', 'MS', 'LA', 'IL', 'KY']
country: USA
acquisition_start: 20191120
acquisition_end: 20200307
dataset_created: 20200420
Tabular & Raster
Datasets are attached to the Survey as lists. However, if only one Dataset of a given type is present, the xarray object is returned directly by the name of the group (for example, survey.tabular or survey.raster).
# Tabular
print('Tabular:\n')
print(survey.tabular)
# Raster
print('\nRaster:\n')
print(survey.raster)
Tabular:
<xarray.Dataset>
Dimensions: (index: 20701, gate_times: 15, nv: 2)
Coordinates:
spatial_ref float64 0.0
* index (index) int32 0 1 2 3 4 5 ... 20696 20697 20698 20699 20700
* gate_times (gate_times) float64 1.085e-05 3.255e-05 ... 0.01338
* nv (nv) int64 0 1
x (index) float64 3.579e+05 3.579e+05 ... 4.907e+05 4.906e+05
y (index) float64 1.211e+06 1.211e+06 ... 1.577e+06 1.577e+06
z (index) float64 45.83 46.61 46.95 ... 177.0 179.4 177.2
Data variables: (12/62)
gate_times_bnds (gate_times, nv) float64 5.43e-06 1.628e-05 ... 0.01666
Line (index) int32 225401 225401 225401 ... 262001 262001 262001
Flight (index) int32 10 10 10 10 10 10 10 ... 70 70 70 70 70 70 70
Fiducial (index) float64 7.836e+03 7.836e+03 ... 1.282e+04 1.282e+04
Proj_CGG (index) int32 603756 603756 603756 ... 603756 603756 603756
Proj_Client (index) int32 9999 9999 9999 9999 ... 9999 9999 9999 9999
... ...
Z_PrimaryField (index) float64 14.69 14.53 15.06 ... 16.77 15.95 14.99
Z_VLF1 (index) float64 3.696 3.733 3.729 ... 3.732 3.734 3.71
Z_VLF2 (index) float64 3.684 3.711 3.705 ... 3.701 3.717 3.699
Z_VLF3 (index) float64 3.637 3.607 3.623 ... 3.654 3.602 3.614
Z_VLF4 (index) float64 3.567 3.576 3.621 ... 3.616 3.594 3.586
Z_Geofact (index) float64 0.9969 0.9862 1.022 ... 1.123 1.069 1.004
Attributes:
content: raw data
comment: This dataset includes minimally processed (raw) AEM data
Raster:
<xarray.Dataset>
Dimensions: (x: 599, nv: 2, y: 1212)
Coordinates:
spatial_ref float64 0.0
* x (x) float64 2.928e+05 2.934e+05 ... 6.51e+05 6.516e+05
* nv (nv) int64 0 1
* y (y) float64 1.607e+06 1.606e+06 ... 8.808e+05 8.802e+05
Data variables:
x_bnds (x, nv) float64 2.925e+05 2.931e+05 ... 6.513e+05 6.519e+05
y_bnds (y, nv) float64 1.607e+06 1.606e+06 ... 8.805e+05 8.799e+05
magnetic_tmi (y, x) float64 1.701e+38 1.701e+38 ... 1.701e+38 1.701e+38
Attributes:
comment: <additional details or ancillary information>
content: gridded magnetic map
Multiple Groups
# If more than one Dataset is present under a group, the group is accessed as a list and indexed by position
# For example, let's add a second Tabular Dataset
m_data = join(data_path, 'model//Tempest_model.dat')
m_supp = join(data_path, 'model//Tempest_model_md.json')
survey.add_tabular(type='aseg', data_filename=m_data, metadata_file=m_supp)
Now the first dataset is accessed at index 0
print('First Tabular Group:\n')
print(survey.tabular[0])
First Tabular Group:
<xarray.Dataset>
Dimensions: (index: 20701, gate_times: 15, nv: 2)
Coordinates:
spatial_ref float64 0.0
* index (index) int32 0 1 2 3 4 5 ... 20696 20697 20698 20699 20700
* gate_times (gate_times) float64 1.085e-05 3.255e-05 ... 0.01338
* nv (nv) int64 0 1
x (index) float64 3.579e+05 3.579e+05 ... 4.907e+05 4.906e+05
y (index) float64 1.211e+06 1.211e+06 ... 1.577e+06 1.577e+06
z (index) float64 45.83 46.61 46.95 ... 177.0 179.4 177.2
Data variables: (12/62)
gate_times_bnds (gate_times, nv) float64 5.43e-06 1.628e-05 ... 0.01666
Line (index) int32 225401 225401 225401 ... 262001 262001 262001
Flight (index) int32 10 10 10 10 10 10 10 ... 70 70 70 70 70 70 70
Fiducial (index) float64 7.836e+03 7.836e+03 ... 1.282e+04 1.282e+04
Proj_CGG (index) int32 603756 603756 603756 ... 603756 603756 603756
Proj_Client (index) int32 9999 9999 9999 9999 ... 9999 9999 9999 9999
... ...
Z_PrimaryField (index) float64 14.69 14.53 15.06 ... 16.77 15.95 14.99
Z_VLF1 (index) float64 3.696 3.733 3.729 ... 3.732 3.734 3.71
Z_VLF2 (index) float64 3.684 3.711 3.705 ... 3.701 3.717 3.699
Z_VLF3 (index) float64 3.637 3.607 3.623 ... 3.654 3.602 3.614
Z_VLF4 (index) float64 3.567 3.576 3.621 ... 3.616 3.594 3.586
Z_Geofact (index) float64 0.9969 0.9862 1.022 ... 1.123 1.069 1.004
Attributes:
content: raw data
comment: This dataset includes minimally processed (raw) AEM data
and the second is located at index 1
print('Second Tabular Group:\n')
print(survey.tabular[1])
Second Tabular Group:
<xarray.Dataset>
Dimensions: (index: 20701, layer_depth: 30, nv: 2,
gate_times: 15)
Coordinates:
spatial_ref float64 0.0
* index (index) int32 0 1 2 3 4 ... 20697 20698 20699 20700
* layer_depth (layer_depth) float64 1.5 4.65 ... 424.2 467.5
* nv (nv) int64 0 1
* gate_times (gate_times) float64 1.085e-05 ... 0.01338
x (index) float64 3.579e+05 3.579e+05 ... 4.906e+05
y (index) float64 1.211e+06 1.211e+06 ... 1.577e+06
z (index) float64 45.83 46.61 46.95 ... 179.4 177.2
Data variables: (12/49)
layer_depth_bnds (layer_depth, nv) float64 0.0 3.0 ... 445.9 489.1
gate_times_bnds (gate_times, nv) float64 5.43e-06 ... 0.01666
uniqueid (index) int32 0 1 2 3 4 ... 20697 20698 20699 20700
survey (index) int32 9999 9999 9999 ... 9999 9999 9999
date (index) int32 20191128 20191128 ... 20200227
flight (index) int32 10 10 10 10 10 10 ... 70 70 70 70 70
... ...
PhiC (index) float64 0.4491 0.4759 0.129 ... 1.61 1.289
PhiT (index) float64 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
PhiG (index) float64 0.9652 0.6608 ... 0.7603 1.457
PhiS (index) float64 0.1158 0.1392 ... 0.2877 0.1705
Lambda (index) float64 0.5968 0.5487 ... 0.3808 1.771
Iterations (index) int32 20 19 25 25 25 18 ... 28 30 30 27 29
Attributes:
content: inverted resistivity models
comment: This dataset includes inverted resistivity models derived from ...
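The access pattern shown above (a lone dataset returned directly, a list once a second one is added) can be sketched in plain Python. This is only an illustration of the behavior, not gspy's implementation: the `GroupStore` class and its `add` and `value` names are hypothetical, and small dicts stand in for the xarray Datasets.

```python
class GroupStore:
    """Hypothetical container, for illustration only (not part of gspy)."""

    def __init__(self):
        self._items = []

    def add(self, dataset):
        # Attach another dataset to the group
        self._items.append(dataset)

    @property
    def value(self):
        # Exactly one entry: return it directly. Otherwise return a list copy.
        if len(self._items) == 1:
            return self._items[0]
        return list(self._items)


tabular = GroupStore()
tabular.add({"content": "raw data"})
single = tabular.value      # the dataset itself, no indexing needed

tabular.add({"content": "inverted resistivity models"})
multiple = tabular.value    # now a list, indexed from 0
```

Code that must work in both situations can normalize the result first, for example by wrapping a non-list value in a one-element list before indexing.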
Coordinates, Dimensions, and Attributes
All data variables must have dimensions, coordinates, and attributes.
Dimensions
Tabular data are typically 1-D or 2-D variables with the primary dimension being index, which
corresponds to the rows of the input text file representing individual measurements.
print(survey.tabular[1]['index'])
<xarray.DataArray 'index' (index: 20701)>
array([ 0, 1, 2, ..., 20698, 20699, 20700], dtype=int32)
Coordinates:
spatial_ref float64 0.0
* index (index) int32 0 1 2 3 4 5 ... 20696 20697 20698 20699 20700
x (index) float64 3.579e+05 3.579e+05 ... 4.907e+05 4.906e+05
y (index) float64 1.211e+06 1.211e+06 ... 1.577e+06 1.577e+06
z (index) float64 45.83 46.61 46.95 46.66 ... 177.0 179.4 177.2
Attributes:
standard_name: index
long_name: Index of individual data points
units: not_defined
null_value: not_defined
valid_range: [ 0 20700]
grid_mapping: spatial_ref
If a dimension is not discrete, meaning it represents ranges (such as depth layers), then the bounds on each dimension value also need to be defined, and are linked to the dimension through the “bounds” attribute.
print('example non-discrete dimension:\n')
print(survey.tabular[1]['gate_times'])
print('\n\ncorresponding bounds on non-discrete dimension:\n')
print(survey.tabular[1]['gate_times_bnds'])
example non-discrete dimension:
<xarray.DataArray 'gate_times' (gate_times: 15)>
array([1.085000e-05, 3.255000e-05, 5.426000e-05, 8.681000e-05, 1.410600e-04,
2.278700e-04, 3.689300e-04, 5.859500e-04, 9.114800e-04, 1.410630e-03,
2.191900e-03, 3.418070e-03, 5.338690e-03, 8.301020e-03, 1.337928e-02])
Coordinates:
spatial_ref float64 0.0
* gate_times (gate_times) float64 1.085e-05 3.255e-05 ... 0.008301 0.01338
Attributes:
standard_name: gate_times
long_name: receiver gate times
units: seconds
null_value: not_defined
valid_range: [1.085000e-05 1.337928e-02]
grid_mapping: spatial_ref
bounds: gate_times_bnds
corresponding bounds on non-discrete dimension:
<xarray.DataArray 'gate_times_bnds' (gate_times: 15, nv: 2)>
array([[5.430000e-06, 1.628000e-05],
[2.713000e-05, 3.798000e-05],
[4.883000e-05, 5.968000e-05],
[7.053000e-05, 1.030800e-04],
[1.139400e-04, 1.681900e-04],
[1.790400e-04, 2.767000e-04],
[2.875500e-04, 4.503200e-04],
[4.611700e-04, 7.107400e-04],
[7.215900e-04, 1.101380e-03],
[1.112230e-03, 1.709030e-03],
[1.719880e-03, 2.663920e-03],
[2.674770e-03, 4.161360e-03],
[4.172210e-03, 6.505170e-03],
[6.516030e-03, 1.008600e-02],
[1.009686e-02, 1.666171e-02]])
Coordinates:
spatial_ref float64 0.0
* nv (nv) int64 0 1
* gate_times (gate_times) float64 1.085e-05 3.255e-05 ... 0.008301 0.01338
Attributes:
standard_name: gate_times_bounds
long_name: receiver gate times cell boundaries
units: seconds
null_value: not_defined
valid_range: [5.430000e-06 1.666171e-02]
grid_mapping: spatial_ref
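As a minimal sketch of how the "bounds" attribute links a dimension to its bounds variable, the snippet below rebuilds the first four gate times and their bounds from the printed output in a standalone xarray Dataset (it does not use gspy), follows the attribute to the bounds variable, and checks that every gate center lies inside its interval. The variable names `centers`, `bounds`, and `inside` are illustrative.

```python
import numpy as np
import xarray as xr

# First four gate centers and their bounds, copied from the output above
centers = np.array([1.085e-05, 3.255e-05, 5.426e-05, 8.681e-05])
bounds = np.array([
    [5.430e-06, 1.628e-05],
    [2.713e-05, 3.798e-05],
    [4.883e-05, 5.968e-05],
    [7.053e-05, 1.0308e-04],
])

ds = xr.Dataset(
    data_vars={"gate_times_bnds": (("gate_times", "nv"), bounds)},
    coords={
        # The "bounds" attribute names the variable holding the cell edges
        "gate_times": ("gate_times", centers, {"bounds": "gate_times_bnds"}),
        "nv": [0, 1],
    },
)

# Follow the link from the dimension to its bounds variable
bnds = ds[ds["gate_times"].attrs["bounds"]]
lower = bnds.isel(nv=0)
upper = bnds.isel(nv=1)

# Each center should fall between its lower and upper bound
inside = (lower <= ds["gate_times"]) & (ds["gate_times"] <= upper)
```

Here `nv` indexes the two vertices of each interval, so `nv=0` selects the lower edge and `nv=1` the upper edge.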
Coordinates
Coordinates define the spatial and temporal positioning of the data (X Y Z T).
Additionally, all dimensions are by default classified as a coordinate.
This means a dataset can have both dimensional and non-dimensional coordinates.
Dimensional coordinates are noted with a * (or bold text) in printed output of the xarray,
such as index, gate_times, and nv in this example:
print(survey.tabular[0].coords)
Coordinates:
spatial_ref float64 0.0
* index (index) int32 0 1 2 3 4 5 ... 20696 20697 20698 20699 20700
* gate_times (gate_times) float64 1.085e-05 3.255e-05 ... 0.008301 0.01338
* nv (nv) int64 0 1
x (index) float64 3.579e+05 3.579e+05 ... 4.907e+05 4.906e+05
y (index) float64 1.211e+06 1.211e+06 ... 1.577e+06 1.577e+06
z (index) float64 45.83 46.61 46.95 46.66 ... 177.0 179.4 177.2
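The split between dimensional and non-dimensional coordinates can also be checked programmatically. The sketch below builds a small standalone xarray Dataset that mimics the tabular layout (the x and y values are made up for illustration) and sorts the coordinate names into the two kinds by testing whether each name is also a dimension.

```python
import numpy as np
import xarray as xr

# Toy tabular dataset: one dimension (index) plus x/y positions per row.
# The coordinate values are illustrative, not taken from the survey.
ds = xr.Dataset(
    data_vars={"Flight": ("index", np.array([10, 10, 10]))},
    coords={
        "index": ("index", np.arange(3)),
        "x": ("index", np.array([357900.0, 357910.0, 357920.0])),
        "y": ("index", np.array([1211000.0, 1211005.0, 1211010.0])),
    },
)

# A coordinate is dimensional when its name is also a dimension name
dimensional = sorted(name for name in ds.coords if name in ds.sizes)
non_dimensional = sorted(name for name in ds.coords if name not in ds.sizes)
```

Under these assumptions, `index` is the only dimensional coordinate, while `x` and `y` are non-dimensional coordinates that vary along `index`.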
Tabular Coordinates
In Tabular data, coordinates are typically non-dimensional, since the primary dataset
dimension is index. By default, we define the spatial coordinates, x and y,
based on the longitude and latitude (or easting/northing) data variables. If relevant,
z and t coordinate variables can also be defined, representing the vertical and
temporal coordinates of the data points.
Note: All coordinates must match the coordinate reference system defined in the Survey.
Raster Coordinates
Raster data are gridded, typically representing maps or multi-dimensional models.
Therefore, Raster data almost always have dimensional coordinates, i.e., the
data dimensions correspond directly to either spatial or temporal coordinates (x, y, z, t).
print(survey.raster.coords)
Coordinates:
spatial_ref float64 0.0
* x (x) float64 2.928e+05 2.934e+05 2.94e+05 ... 6.51e+05 6.516e+05
* nv (nv) int64 0 1
* y (y) float64 1.607e+06 1.606e+06 ... 8.808e+05 8.802e+05
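One practical consequence of dimensional coordinates is that grid cells can be selected by coordinate value rather than by integer position. The sketch below uses a small standalone xarray DataArray with made-up values and a descending y axis (as in the raster above) to pick the cell nearest to a given location. The name `magnetic_tmi` is reused only as a label; none of the numbers come from the survey.

```python
import numpy as np
import xarray as xr

# Toy grid: y descends from north to south, x ascends from west to east
x = np.array([0.0, 600.0, 1200.0])
y = np.array([1800.0, 1200.0, 600.0, 0.0])
values = np.arange(12, dtype=float).reshape(4, 3)

grid = xr.DataArray(
    values,
    dims=("y", "x"),
    coords={"y": y, "x": x},
    name="magnetic_tmi",
)

# Select by coordinate value; "nearest" snaps to the closest grid node
cell = grid.sel(x=650.0, y=1100.0, method="nearest")
```

The selected cell keeps its own `x` and `y` coordinate values, so the snapped location can be read back from `cell["x"]` and `cell["y"]`.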
The Spatial Reference Coordinate
The spatial_ref coordinate variable is a non-dimensional coordinate that
contains information on the coordinate reference system. For more information,
see Coordinate Reference Systems.
Attributes
Both datasets and data variables have attributes (metadata fields). Certain attributes are required; see our documentation on the GS standard for more details.
Dataset attributes
Dataset attributes provide users a way to document and describe supplementary
information about a dataset group as a whole, such as model inversion parameters
or other processing descriptions. At a minimum, a content attribute should
contain a brief summary of the contents of the dataset.
pprint(survey.tabular[1].attrs)
{'comment': 'This dataset includes inverted resistivity models derived from '
'processed AEM data produced by USGS',
'content': 'inverted resistivity models'}
Variable attributes
Each data variable must contain attributes detailing the metadata of that individual variable. These follow the Climate and Forecast (CF) metadata conventions.
pprint(survey.tabular[1]['conductivity'].attrs)
{'format': '30e15.6',
'grid_mapping': 'spatial_ref',
'long_name': 'not_defined',
'null_value': 'not_defined',
'standard_name': 'conductivity',
'units': 'not_defined',
'valid_range': array([1.e-04, 1.e+01])}
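Several attributes in the output above still hold the placeholder string 'not_defined'. A quick way to spot such fields is sketched below in plain Python. The helper name `placeholder_attrs` is hypothetical (it is not part of gspy), and the attribute dict is copied from the printed output with `valid_range` written as a plain list.

```python
def placeholder_attrs(attrs, placeholder="not_defined"):
    """Return the sorted names of attributes still set to the placeholder."""
    return sorted(
        name
        for name, value in attrs.items()
        # Only compare strings, so array-like values are skipped safely
        if isinstance(value, str) and value == placeholder
    )


# Attributes of the 'conductivity' variable, as printed above
conductivity_attrs = {
    "format": "30e15.6",
    "grid_mapping": "spatial_ref",
    "long_name": "not_defined",
    "null_value": "not_defined",
    "standard_name": "conductivity",
    "units": "not_defined",
    "valid_range": [1.0e-04, 1.0e01],
}

missing = placeholder_attrs(conductivity_attrs)
```

With a real dataset, the same helper could be applied to `survey.tabular[1]['conductivity'].attrs`, since xarray exposes attributes as a dict.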