API#

intake-axds Python API#

intake-axds catalog#

Set up a catalog for Axiom assets.

AXDSCatalog

class intake_axds.axds_cat.AXDSCatalog(*args, **kwargs)[source]#

Makes data sources out of all datasets for a given AXDS data type.

pglabels#

If keys_to_match or standard_names is input to search on, they are converted to parameterGroupLabels and saved to the catalog metadata.

Type:

list[str]

pgids#

If keys_to_match or standard_names is input to search on, they are converted to parameterGroupIds and saved to the catalog metadata. In the case that query_type=="intersection_constrained" and datatype=="sensor_station", the pgids are passed to the sensor source so that only data from variables corresponding to those pgids are returned.

Type:

list[int]

Parameters:
  • datatype (str) – Axiom data type. Currently “platform2” or “sensor_station” but eventually also “module”. Platforms and sensors are returned as dataframe containers.

  • keys_to_match (str, list, optional) – Name of keys to match with system-available variable parameterNames using criteria. To filter search by variables, either input keys_to_match and a vocabulary or input standard_names. Results from multiple values will be combined according to query_type.

  • standard_names (str, list, optional) – Standard names to select from Axiom search parameterNames. If more than one is input, the search is for a logical OR of datasets containing the standard_names. To filter search by variables, either input keys_to_match and a vocabulary or input standard_names. Results from multiple values will be combined according to query_type.

  • bbox (tuple of 4 floats, optional) – For explicit geographic search queries, pass a tuple of four floats in the bbox argument. The bounding box parameters are (min_lon, min_lat, max_lon, max_lat).

  • start_time (str, optional) – For explicit search queries for datasets that contain data after start_time. If start_time is input, end_time must be too.

  • end_time (str, optional) – For explicit search queries for datasets that contain data before end_time. If end_time is input, start_time must be too.

  • search_for (str, list of strings, optional) – For explicit search queries for datasets that contain any of the terms specified in this keyword argument. Results from multiple values will be combined according to query_type.

  • kwargs_search (dict, optional) –

    Keyword arguments to input to search on the server before making the catalog. Options are:

    • to search by bounding box: include all of min_lon, max_lon, min_lat, max_lat: (int, float). Longitudes must be between -180 and +180.

    • to search within a datetime range: include both of min_time, max_time: interpretable datetime string, e.g., “2021-1-1”

    • to search using a textual keyword: include search_for as a string or list of strings. Results from multiple values will be combined according to query_type.

  • query_type (str, default "union") –

    Specifies how the catalog should apply the query parameters. Choices are:

    • "union": the results will be the union of each resulting dataset. This is equivalent to a logical OR.

    • "intersection": the set of results will be the intersection of each individual query made to the server. This is equivalent to a logical AND of the results.

    • "intersection_constrained": the set of results will be the intersection of queries but also only the variables requested (using either keys_to_match or standard_names) will be returned in the DataFrame, instead of all available variables. This only applies to datatype=="sensor_station".

  • qartod (bool, int, list, optional) –

    Whether to return QARTOD agg flags when available, which is only for sensor_stations. Can instead input an int or a list of ints representing the _qa_agg flags for which to return data values. More information about QARTOD testing and flags can be found here: https://cdn.ioos.noaa.gov/media/2020/07/QARTOD-Data-Flags-Manual_version1.2final.pdf. Only used by datatype “sensor_station”. Is not available if binned==True.

    Examples of ways to use this input are:

    • qartod=True: Return aggregate QARTOD flags as a column for each data variable.

    • qartod=False: Do not return any QARTOD flag columns.

    • qartod=1: nan any data values for which the aggregated QARTOD flags are not equal to 1.

    • qartod=[1,3]: nan any data values for which the aggregated QARTOD flags are not equal to 1 or 3.

    Flags are:

    • 1: Pass

    • 2: Not Evaluated

    • 3: Suspect

    • 4: Fail

    • 9: Missing Data

  • use_units (bool, optional) – If True, include units in column names with the syntax “standard_name [units]”; if False, column names are simply “standard_name”. Only used by datatype “sensor_station”.

  • binned (bool, optional) – True for binned data, False for raw, by default False. Only used by datatype “sensor_station”.

  • bin_interval (Optional[str], optional) – If binned=True, input the binning interval to return. Options are hourly, daily, weekly, monthly, yearly. If bin_interval is input, binned is set to True. Only used by datatype “sensor_station”.

  • page_size (int, optional) – Number of results to return; fewer is faster. The default is 10, so if you want to make sure you get all available datasets, input a large number like 50000.

  • verbose (bool, optional) – Set to True for helpful information.

  • ttl (int, optional) – Time to live for catalog (in seconds): how long before force-reloading the catalog. Set to None to disable force-reloading.

  • name (str, optional) – Name for catalog.

  • description (str, optional) – Description for catalog.

  • metadata (dict, optional) – Metadata for catalog.

  • kwargs – Other input arguments are passed to the intake Catalog class. They can include getenv, getshell, persist_mode, storage_options, and user_parameters, in addition to some that are surfaced directly in this class.

Notes

Only datatype sensor_station uses the following parameters: qartod, use_units, binned, and bin_interval.

Datatype sensor_station skips webcam data.
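
For orientation, here is a minimal usage sketch; the search values below are hypothetical and constructing the catalog requires access to the Axiom servers:

>>> from intake_axds.axds_cat import AXDSCatalog
>>> cat = AXDSCatalog(
...     datatype="sensor_station",
...     standard_names="sea_water_temperature",
...     bbox=(-154.0, 56.0, -148.0, 60.0),  # (min_lon, min_lat, max_lon, max_lat)
...     start_time="2022-1-1",
...     end_time="2022-2-1",  # end_time is required when start_time is input
...     qartod=True,  # add aggregate QARTOD flag columns (sensor_station only)
...     page_size=1000,  # the default of 10 may miss datasets
...     verbose=True,
... )
>>> list(cat)  # entry names returned by the search
>>> df = cat[list(cat)[0]].read()  # read one entry into a pandas DataFrame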

Attributes:
auth
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

alias for DataSource.plot

is_persisted
kwargs
plot

Plot API accessor

plots

List custom associated quick-plots

shape

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Close open resources corresponding to this data source.

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

filter(func)

Create a Catalog of a subset of entries based on a condition

force_reload()

Imperative reload data now

from_dict(entries, **kwargs)

Create Catalog from the given set of entries

get(**kwargs)

Create a new instance of this source with altered arguments

get_search_urls()

Gather all search urls for catalog.

items()

Get an iterator over (key, source) tuples for the catalog entries.

keys()

Entry names in this catalog as an iterator (alias for __iter__)

persist([ttl])

Save data from this source to local persistent storage

pop(key)

Remove entry from catalog and return it

read()

Load entire dataset into a container and return it

read_chunked()

Return iterator over container fragments of data source

read_partition(i)

Return a part of the data corresponding to i-th partition.

reload()

Reload catalog if sufficient time has passed

save(url[, storage_options])

Output this catalog to a file as YAML

search_url([pglabel, text_search])

Set up one url for searching.

serialize()

Produce YAML version of this catalog.

to_dask()

Return a dask container for this data source

to_spark()

Provide an equivalent data object in Apache Spark

values()

Get an iterator over the sources for catalog entries.

walk([sofar, prefix, depth])

Get all entries in this catalog and sub-catalogs

yaml()

Return YAML representation of this data-source

get_persisted

search

set_cache_dir

get_search_urls() list[source]#

Gather all search urls for catalog.

Inputs that can produce more than one search url are pglabels and a search_for list.

Returns:

List of search urls.

Return type:

list

search_url(pglabel: Optional[str] = None, text_search: Optional[str] = None) str[source]#

Set up one url for searching.

Parameters:
  • pglabel (Optional[str], optional) – Parameter Group Label (not ID), by default None

  • text_search (Optional[str], optional) – free text search, by default None

Returns:

URL to use to search Axiom systems.

Return type:

str
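
A short sketch of inspecting the search URLs, reusing the hypothetical cat from the earlier example:

>>> for url in cat.get_search_urls():
...     print(url)
>>> url = cat.search_url(text_search="temperature")  # both arguments are optional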

intake-axds sensor source#

AXDSSensorSource

class intake_axds.axds.AXDSSensorSource(*args, **kwargs)[source]#

Intake Source for AXDS sensor

Parameters:
  • internal_id (Optional[int], optional) – Internal station id for Axiom, by default None. Not the UUID. Need to input internal_id or UUID. If both are input, be sure they are for the same station.

  • uuid (Optional[str], optional) – The UUID for the station, by default None. Not the internal_id. Need to input internal_id or UUID. If both are input, be sure they are for the same station. Note that there may also be a “datasetId” parameter which is sometimes but not always the same as the UUID.

  • start_time (Optional[str], optional) – Datetime at which the returned data should start, by default None. Must be interpretable by pandas Timestamp. If not input, the datetime at which the dataset starts will be used.

  • end_time (Optional[str], optional) – Datetime at which the returned data should end, by default None. Must be interpretable by pandas Timestamp. If not input, the datetime at which the dataset ends will be used.

  • qartod (bool, int, list, optional) –

    Whether to return QARTOD agg flags when available, which is only for sensor_stations. Can instead input an int or a list of ints representing the _qa_agg flags for which to return data values. More information about QARTOD testing and flags can be found here: https://cdn.ioos.noaa.gov/media/2020/07/QARTOD-Data-Flags-Manual_version1.2final.pdf. Only used by datatype “sensor_station”. Is not available if binned==True.

    Examples of ways to use this input are:

    • qartod=True: Return aggregate QARTOD flags as a column for each data variable.

    • qartod=False: Do not return any QARTOD flag columns.

    • qartod=1: nan any data values for which the aggregated QARTOD flags are not equal to 1.

    • qartod=[1,3]: nan any data values for which the aggregated QARTOD flags are not equal to 1 or 3.

    Flags are:

    • 1: Pass

    • 2: Not Evaluated

    • 3: Suspect

    • 4: Fail

    • 9: Missing Data

  • use_units (bool, optional) – If True include units in column names. Syntax is “standard_name [units]”. If False, no units. Then syntax for column names is “standard_name”. This is currently specific to sensor_station only. Only used by datatype “sensor_station”.

  • metadata (dict, optional) – Metadata for catalog.

  • binned (bool, optional) – True for binned data, False for raw, by default False. Only used by datatype “sensor_station”.

  • bin_interval (Optional[str], optional) – If binned=True, input the binning interval to return. Options are hourly, daily, weekly, monthly, yearly. If bin_interval is input, binned is set to True. Only used by datatype “sensor_station”.

  • only_pgids (list, optional) – If input, only return data associated with these parameterGroupIds. This is separate from parameterGroupLabels and parameterGroupIds that might be present in the metadata.

Raises:

ValueError – If neither internal_id nor uuid is input.
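
A minimal sketch of constructing and reading a sensor source; the station id is hypothetical and reading requires server access:

>>> from intake_axds.axds import AXDSSensorSource
>>> source = AXDSSensorSource(
...     internal_id=12345,  # hypothetical internal station id; could use uuid instead
...     start_time="2022-1-1",
...     end_time="2022-1-8",
...     qartod=1,  # nan data values whose aggregate flag is not 1 (raw data only)
...     use_units=True,  # column names like "standard_name [units]"
... )
>>> df = source.read()  # pandas DataFrame of sensor data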

Attributes:
cache
cache_dirs
cat
classname
data_urls

Prepare to load in data by getting data_urls.

description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

alias for DataSource.plot

is_persisted
plot

Plot API accessor

plots

List custom associated quick-plots

shape

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Close open resources corresponding to this data source.

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

get_filters()

Return appropriate filter for stationid.

persist([ttl])

Save data from this source to local persistent storage

read()

Read data in.

read_chunked()

Return iterator over container fragments of data source

read_partition(i)

Return a part of the data corresponding to i-th partition.

to_dask()

Return a dask container for this data source

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir

property data_urls#

Prepare to load in data by getting data_urls.

For V1 sources there will be a data_url per parameterGroupId but not for V2 sources.

get_filters()[source]#

Return appropriate filter for stationid.

Which filter form to use depends on whether the station is V1 or V2.

For V1, use each parameterGroupId only once to make a filter since all data of that type will be read in together.

Following Sensor API https://admin.axds.co/#!/sensors/api/overview

read()[source]#

Read data in.

intake-axds utilities#

Utility functions for intake-axds.

intake_axds.utils.available_names() list[source]#

Return available parameterNames for variables.

Returns:

parameterNames, which are a superset of standard_names.

Return type:

list
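
An illustrative sketch:

>>> from intake_axds import utils
>>> names = utils.available_names()
>>> len(names)  # number of available parameterNames
>>> "sea_water_temperature" in names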

intake_axds.utils.check_station(metadata: dict, verbose: bool) bool[source]#

Whether to keep station or not.

Parameters:
  • metadata (dict) – metadata about station.

  • verbose (bool, optional) – Set to True for helpful information.

Returns:

True to keep station, False to skip.

Return type:

bool

intake_axds.utils.load_metadata(datatype: str, results: dict) dict[source]#

Load metadata for catalog entry.

Parameters:
  • datatype (str) – Axiom data type, e.g. “platform2” or “sensor_station”.

  • results (dict) – Returned results from call to server for a single dataset.

Returns:

Metadata to store with catalog entry.

Return type:

dict

intake_axds.utils.make_data_url(filter: str, start_time: str, end_time: str, binned: bool = False, bin_interval: Optional[str] = None) str[source]#

Create url for accessing sensor data, raw or binned.

Parameters:
  • filter (str) – get this from make_filter(); contains station and potentially variable info.

  • start_time (str) – e.g. “2022-1-1”. Needs to be interpretable by pandas Timestamp.

  • end_time (str) – e.g. “2022-1-2”. Needs to be interpretable by pandas Timestamp.

  • binned (bool, optional) – True for binned data, False for raw, by default False.

  • bin_interval (Optional[str], optional) – If binned=True, input the binning interval to return. Options are hourly, daily, weekly, monthly, yearly.

Returns:

URL from which to access data.

Return type:

str

intake_axds.utils.make_filter(internal_id: int, parameterGroupId: Optional[int] = None) str[source]#

Make filter for Axiom Sensors API.

Parameters:
  • internal_id (int) – internal id for station. Not the uuid.

  • parameterGroupId (Optional[int], optional) – Parameter Group ID to narrow search, by default None

Returns:

filter to use in station metadata and data access

Return type:

str
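
A sketch tying make_filter to make_data_url; the ids are hypothetical:

>>> from intake_axds import utils
>>> station_filter = utils.make_filter(internal_id=12345, parameterGroupId=6)
>>> url = utils.make_data_url(station_filter, "2022-1-1", "2022-1-2")  # raw data
>>> url_binned = utils.make_data_url(
...     station_filter, "2022-1-1", "2022-1-2", binned=True, bin_interval="hourly"
... )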

intake_axds.utils.make_label(label: str, units: Optional[str] = None, use_units: bool = True) str[source]#

Make a column name for a variable.

Parameters:
  • label (str) – variable label to use in column header

  • units (Optional[str], optional) – units to use in column name, if not None, by default None

  • use_units (bool, optional) – Users can choose not to include units in column name, by default True

Returns:

string to use as column name

Return type:

str
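
An illustrative sketch, assuming the “standard_name [units]” syntax described for use_units:

>>> from intake_axds import utils
>>> utils.make_label("sea_water_temperature", units="degC")
'sea_water_temperature [degC]'
>>> utils.make_label("sea_water_temperature", units="degC", use_units=False)
'sea_water_temperature'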

intake_axds.utils.make_metadata_url(filter: str) str[source]#

Make url for finding metadata

Parameters:

filter (str) – filter for Sensors API. Use make_filter to make this.

Returns:

url for metadata.

Return type:

str
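
For example, building on the hypothetical filter from above:

>>> from intake_axds import utils
>>> station_filter = utils.make_filter(internal_id=12345)  # hypothetical id
>>> meta_url = utils.make_metadata_url(station_filter)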

intake_axds.utils.make_search_docs_url(internal_id: Optional[int] = None, uuid: Optional[str] = None) str[source]#

URL for Axiom Search docs.

Uses whichever of internal_id and uuid is not None to formulate the url.

Parameters:
  • internal_id (Optional[int], optional) – Internal station id for Axiom. Not the UUID.

  • uuid (Optional[str], optional) – UUID for station.

Returns:

URL for finding Axiom Search docs.

Return type:

str
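
A small sketch; either identifier works, and the values are hypothetical:

>>> from intake_axds import utils
>>> utils.make_search_docs_url(internal_id=12345)
>>> utils.make_search_docs_url(uuid="hypothetical-station-uuid")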

intake_axds.utils.match_key_to_parameter(keys_to_match: list, criteria: Optional[dict] = None) list[source]#

Find Parameter Group values that match keys_to_match.

Parameters:
  • keys_to_match (list) – Custom criteria keys to narrow the search; they will be matched to the category results using the custom criteria, which must be set up ahead of time with cf-pandas.

  • criteria (dict, optional) – Criteria to use to map from variable to attributes describing the variable. If user has defined custom_criteria, this will be used by default.

Returns:

Parameter Group values that match the keys, according to the custom criteria.

Return type:

list
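
A sketch with a hypothetical cf-pandas vocabulary in which the key "temp" matches standard_name attributes ending in "sea_water_temperature":

>>> from intake_axds import utils
>>> criteria = {"temp": {"standard_name": "sea_water_temperature$"}}
>>> pglabels = utils.match_key_to_parameter(["temp"], criteria=criteria)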

intake_axds.utils.match_std_names_to_parameter(standard_names: list) list[source]#

Find Parameter Group values that match standard_names.

Parameters:

standard_names (list) – standard_names values to narrow the search.

Returns:

Parameter Group values that match standard_names.

Return type:

list
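
For example, a minimal sketch:

>>> from intake_axds import utils
>>> pglabels = utils.match_std_names_to_parameter(["sea_water_temperature"])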

intake_axds.utils.response_from_url(url: str) Union[list, dict][source]#

Return response from url.

Parameters:

url (str) – URL to check.

Returns:

Response content; a list or dict depending on the url.

Return type:

list, dict
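
A sketch reusing the hypothetical metadata url from above; this requires server access:

>>> from intake_axds import utils
>>> station_filter = utils.make_filter(internal_id=12345)  # hypothetical id
>>> metadata = utils.response_from_url(utils.make_metadata_url(station_filter))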

Inherited from intake#

intake-axds catalog#

Set up a catalog for Axiom assets.

class intake_axds.axds_cat.AXDSCatalog(*args, **kwargs)[source]#

Bases: Catalog

Makes data sources out of all datasets for a given AXDS data type.

close()#

Close open resources corresponding to this data source.

configure_new(**kwargs)#

Create a new instance of this source with altered arguments

Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.

Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.

describe()#

Description from the entry spec

discover()#

Open resource and populate the source attributes.

export(path, **kwargs)#

Save this data for sharing with other people

Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).

Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).

filter(func)#

Create a Catalog of a subset of entries based on a condition

Warning

This function operates on CatalogEntry objects not DataSource objects.

Note

Note that, whatever specific class this is performed on, the return instance is a Catalog. The entries are passed unmodified, so they will still reference the original catalog instance and include its details such as directory.

Parameters:

func (function) – This should take a CatalogEntry and return True or False. Those items returning True will be included in the new Catalog, with the same entry names

Returns:

New catalog with Entries that still refer to their parents

Return type:

Catalog
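
For instance, a sketch assuming cat is an AXDSCatalog (or any intake Catalog) and that each entry's describe() includes a description field:

>>> sub = cat.filter(
...     lambda entry: "temperature" in (entry.describe().get("description") or "")
... )
>>> list(sub)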

force_reload()#

Imperative reload data now

classmethod from_dict(entries, **kwargs)#

Create Catalog from the given set of entries

Parameters:
  • entries (dict-like) – A mapping of name:entry which supports dict-like functionality, e.g., is derived from collections.abc.Mapping.

  • kwargs (passed on to the constructor) – Things like metadata, name; see __init__.

Return type:

Catalog instance

get(**kwargs)#

Create a new instance of this source with altered arguments

Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.

Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.

property gui#

Source GUI, with parameter selection and plotting

property has_been_persisted#

The base class does not interact with persistence

property hvplot#

alias for DataSource.plot

property is_persisted#

The base class does not interact with persistence

items()#

Get an iterator over (key, source) tuples for the catalog entries.

keys()#

Entry names in this catalog as an iterator (alias for __iter__)

persist(ttl=None, **kwargs)#

Save data from this source to local persistent storage

Parameters:
  • ttl (numeric, optional) – Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.

  • kwargs (passed to the _persist method on the base container) –

property plot#

Plot API accessor

This property exposes both predefined plots (described in the source metadata) and general-purpose plotting via the hvPlot library. Supported containers are: array, dataframe and xarray.

To display in a notebook, be sure to run intake.output_notebook() first.

The set of plots defined for this source can be found by

>>> source.plots
["plot1", "plot2"]

and to display one of these:

>>> source.plot.plot1()
<holoviews/panel output>

To create new plot types and supply custom configuration, use one of the methods of hvplot.hvPlot:

>>> source.plot.line(x="fieldX", y="fieldY")

The full set of arguments that can be passed, and the types of plot they refer to, can be found in the doc and attributes of hvplot.HoloViewsConverter.

Once you have found a suitable plot, you may wish to update the plots definitions of the source. Simply add the plotname= optional argument (this will overwrite any existing plot of that name). The source’s YAML representation will include the new plot, and it could be saved into a catalog with this new definition.

>>> source.plot.line(plotname="new", x="fieldX", y="fieldY");
>>> source.plots
["plot1", "plot2", "new"]
property plots#

List custom associated quick-plots

pop(key)#

Remove entry from catalog and return it

This relies on the _entries attribute being mutable, which it normally is. Note that if a catalog automatically reloads, any entry removed here may soon reappear

Parameters:

key (str) – Key to give the entry in the cat

read()#

Load entire dataset into a container and return it

read_chunked()#

Return iterator over container fragments of data source

read_partition(i)#

Return a part of the data corresponding to i-th partition.

By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.

reload()#

Reload catalog if sufficient time has passed

save(url, storage_options=None)#

Output this catalog to a file as YAML

Parameters:
  • url (str) – Location to save to, perhaps remote

  • storage_options (dict) – Extra arguments for the file-system
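
For example (the path is hypothetical):

>>> cat.save("axds_catalog.yaml")  # write this catalog to a local YAML file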

serialize()#

Produce YAML version of this catalog.

Note that this is not the same as .yaml(), which produces a YAML block referring to this catalog.

to_dask()#

Return a dask container for this data source

to_spark()#

Provide an equivalent data object in Apache Spark

The mapping of python-oriented data containers to Spark ones will be imperfect, and only a small number of drivers are expected to be able to produce Spark objects. The standard arguments may be translated, unsupported or ignored, depending on the specific driver.

This method requires the package intake-spark

values()#

Get an iterator over the sources for catalog entries.

walk(sofar=None, prefix=None, depth=2)#

Get all entries in this catalog and sub-catalogs

Parameters:
  • sofar (dict or None) – Within recursion, use this dict for output

  • prefix (list of str or None) – Names of levels already visited

  • depth (int) – Number of levels to descend; needed to truncate circular references and for cleaner output

Returns:

Dict where the keys are the entry names in dotted syntax, and the values are entry instances.

yaml()#

Return YAML representation of this data-source

The output may be roughly appropriate for inclusion in a YAML catalog. This is a best-effort implementation

intake-axds sensor source#

class intake_axds.axds.AXDSSensorSource(*args, **kwargs)[source]#

Bases: DataSource

Intake Source for AXDS sensor

close()#

Close open resources corresponding to this data source.

configure_new(**kwargs)#

Create a new instance of this source with altered arguments

Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.

Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.

describe()#

Description from the entry spec

discover()#

Open resource and populate the source attributes.

export(path, **kwargs)#

Save this data for sharing with other people

Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).

Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).

get(**kwargs)#

Create a new instance of this source with altered arguments

Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.

Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.

property gui#

Source GUI, with parameter selection and plotting

property has_been_persisted#

The base class does not interact with persistence

property hvplot#

alias for DataSource.plot

property is_persisted#

The base class does not interact with persistence

persist(ttl=None, **kwargs)#

Save data from this source to local persistent storage

Parameters:
  • ttl (numeric, optional) – Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.

  • kwargs (passed to the _persist method on the base container) –

property plot#

Plot API accessor

This property exposes both predefined plots (described in the source metadata) and general-purpose plotting via the hvPlot library. Supported containers are: array, dataframe and xarray.

To display in a notebook, be sure to run intake.output_notebook() first.

The set of plots defined for this source can be found by

>>> source.plots
["plot1", "plot2"]

and to display one of these:

>>> source.plot.plot1()
<holoviews/panel output>

To create new plot types and supply custom configuration, use one of the methods of hvplot.hvPlot:

>>> source.plot.line(x="fieldX", y="fieldY")

The full set of arguments that can be passed, and the types of plot they refer to, can be found in the doc and attributes of hvplot.HoloViewsConverter.

Once you have found a suitable plot, you may wish to update the plots definitions of the source. Simply add the plotname= optional argument (this will overwrite any existing plot of that name). The source’s YAML representation will include the new plot, and it could be saved into a catalog with this new definition.

>>> source.plot.line(plotname="new", x="fieldX", y="fieldY");
>>> source.plots
["plot1", "plot2", "new"]
property plots#

List custom associated quick-plots

read_chunked()#

Return iterator over container fragments of data source

read_partition(i)#

Return a part of the data corresponding to i-th partition.

By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.

to_dask()#

Return a dask container for this data source

to_spark()#

Provide an equivalent data object in Apache Spark

The mapping of python-oriented data containers to Spark ones will be imperfect, and only a small number of drivers are expected to be able to produce Spark objects. The standard arguments may be translated, unsupported or ignored, depending on the specific driver.

This method requires the package intake-spark

yaml()#

Return YAML representation of this data-source

The output may be roughly appropriate for inclusion in a YAML catalog. This is a best-effort implementation