API#
intake-axds Python API#
intake-axds catalog#
Set up a catalog for Axiom assets.
AXDSCatalog
- class intake_axds.axds_cat.AXDSCatalog(*args, **kwargs)[source]#
Makes data sources out of all datasets for a given AXDS data type.
- pglabels#
If keys_to_match or standard_names is input to search on, they are converted to parameterGroupLabels and saved to the catalog metadata.
- Type: list[str]
- pgids#
If keys_to_match or standard_names is input to search on, they are converted to parameterGroupIds and saved to the catalog metadata. In the case that query_type=="intersection_constrained" and datatype=="platform2", the pgids are passed to the sensor source so that only data from variables corresponding to those pgids are returned.
- Type: list[int]
- Parameters:
datatype (str) – Axiom data type. Currently “platform2” or “sensor_station” but eventually also “module”. Platforms and sensors are returned as dataframe containers.
keys_to_match (str, list, optional) – Name of keys to match with system-available variable parameterNames using criteria. To filter the search by variables, either input keys_to_match and a vocabulary or input standard_names. Results from multiple values will be combined according to query_type.
standard_names (str, list, optional) – Standard names to select from Axiom search parameterNames. If more than one is input, the search is for a logical OR of datasets containing the standard_names. To filter the search by variables, either input keys_to_match and a vocabulary or input standard_names. Results from multiple values will be combined according to query_type.
bbox (tuple of 4 floats, optional) – For explicit geographic search queries, pass a tuple of four floats in the bbox argument. The bounding box parameters are (min_lon, min_lat, max_lon, max_lat).
start_time (str, optional) – For explicit search queries for datasets that contain data after start_time. Must include end_time if start_time is included.
end_time (str, optional) – For explicit search queries for datasets that contain data before end_time. Must include start_time if end_time is included.
search_for (str, list of strings, optional) – For explicit search queries for datasets that contain any of the terms specified in this keyword argument. Results from multiple values will be combined according to query_type.
kwargs_search (dict, optional) – Keyword arguments to input to search on the server before making the catalog. Options are:
- to search by bounding box: include all of min_lon, max_lon, min_lat, max_lat (int, float). Longitudes must be between -180 and +180.
- to search within a datetime range: include both min_time and max_time as interpretable datetime strings, e.g., "2021-1-1".
- to search using a textual keyword: include search_for as a string or list of strings. Results from multiple values will be combined according to query_type.
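As an illustration, a kwargs_search dict combining all three kinds of options might look like the following (the locations, dates, and keyword are hypothetical):

```python
# Hypothetical kwargs_search combining the documented option types
kwargs_search = {
    # bounding box: longitudes must be between -180 and +180
    "min_lon": -124.0,
    "max_lon": -120.0,
    "min_lat": 40.0,
    "max_lat": 45.0,
    # datetime range: interpretable datetime strings
    "min_time": "2021-1-1",
    "max_time": "2021-2-1",
    # textual keyword search: a string or list of strings
    "search_for": "Humboldt",
}

# sanity-check the documented longitude constraint
assert all(-180 <= kwargs_search[k] <= 180 for k in ("min_lon", "max_lon"))
```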
query_type (str, default "union") – Specifies how the catalog should apply the query parameters. Choices are:
- "union": the results will be the union of each resulting dataset. This is equivalent to a logical OR.
- "intersection": the set of results will be the intersection of each individual query made to the server. This is equivalent to a logical AND of the results.
- "intersection_constrained": the set of results will be the intersection of queries, but also only the variables requested (using either keys_to_match or standard_names) will be returned in the DataFrame, instead of all available variables. This only applies to datatype=="sensor_station".
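The set logic behind these choices can be sketched with plain Python sets (the dataset IDs below are made up for illustration, not actual server results):

```python
# Hypothetical dataset IDs returned by two individual server queries
results_a = {"dataset1", "dataset2", "dataset3"}
results_b = {"dataset2", "dataset3", "dataset4"}

# query_type="union": logical OR of the individual query results
union = results_a | results_b

# query_type="intersection": logical AND of the individual query results
intersection = results_a & results_b
```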
qartod (bool, int, list, optional) – Whether to return QARTOD agg flags when available, which is only for sensor_stations. Can instead input an int or a list of ints representing the _qa_agg flags for which to return data values. More information about QARTOD testing and flags can be found here: https://cdn.ioos.noaa.gov/media/2020/07/QARTOD-Data-Flags-Manual_version1.2final.pdf. Only used by datatype "sensor_station". Is not available if binned==True.
Examples of ways to use this input are:
- qartod=True: Return aggregate QARTOD flags as a column for each data variable.
- qartod=False: Do not return any QARTOD flag columns.
- qartod=1: NaN any data values for which the aggregated QARTOD flags are not equal to 1.
- qartod=[1,3]: NaN any data values for which the aggregated QARTOD flags are not equal to 1 or 3.
Flags are:
1: Pass
2: Not Evaluated
3: Suspect
4: Fail
9: Missing Data
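As a sketch of the masking behavior (not the package's actual implementation), qartod=[1, 3] amounts to the following, with made-up data values and flags:

```python
import math

# Hypothetical data values and their aggregated QARTOD (_qa_agg) flags
values = [10.2, 11.5, 9.8, 12.0, 10.9]
agg_flags = [1, 4, 3, 2, 9]

# qartod=[1, 3]: keep values flagged Pass (1) or Suspect (3), NaN the rest
keep = [1, 3]
masked = [v if f in keep else math.nan for v, f in zip(values, agg_flags)]
```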
use_units (bool, optional) – If True, include units in column names with the syntax "standard_name [units]". If False, column names use just "standard_name". Only used by datatype "sensor_station".
binned (bool, optional) – True for binned data, False for raw, by default False. Only used by datatype “sensor_station”.
bin_interval (Optional[str], optional) – If binned=True, input the binning interval to return. Options are hourly, daily, weekly, monthly, yearly. If bin_interval is input, binned is set to True. Only used by datatype "sensor_station".
page_size (int, optional) – Number of results; fewer is faster. Note that the default is 10, so if you want to make sure you get all available datasets, input a large number like 50000.
verbose (bool, optional) – Set to True for helpful information.
ttl (int, optional) – Time to live for catalog (in seconds). How long before force-reloading catalog. Set to None to not do this.
name (str, optional) – Name for catalog.
description (str, optional) – Description for catalog.
metadata (dict, optional) – Metadata for catalog.
kwargs – Other input arguments are passed to the intake Catalog class. They can include getenv, getshell, persist_mode, storage_options, and user_parameters, in addition to some that are surfaced directly in this class.
Notes
Only datatype sensor_station uses the following parameters: qartod, use_units, binned, bin_interval.
The sensor_station datatype skips webcam data.
- Attributes:
Methods
__call__(**kwargs): Create a new instance of this source with altered arguments.
close(): Close open resources corresponding to this data source.
configure_new(**kwargs): Create a new instance of this source with altered arguments.
describe(): Description from the entry spec.
discover(): Open resource and populate the source attributes.
export(path, **kwargs): Save this data for sharing with other people.
filter(func): Create a Catalog of a subset of entries based on a condition.
force_reload(): Imperative reload data now.
from_dict(entries, **kwargs): Create Catalog from the given set of entries.
get(**kwargs): Create a new instance of this source with altered arguments.
get_search_urls(): Gather all search urls for catalog.
items(): Get an iterator over (key, source) tuples for the catalog entries.
keys(): Entry names in this catalog as an iterator (alias for __iter__).
persist([ttl]): Save data from this source to local persistent storage.
pop(key): Remove entry from catalog and return it.
read(): Load entire dataset into a container and return it.
read_chunked(): Return iterator over container fragments of data source.
read_partition(i): Return a part of the data corresponding to the i-th partition.
reload(): Reload catalog if sufficient time has passed.
save(url[, storage_options]): Output this catalog to a file as YAML.
search_url([pglabel, text_search]): Set up one url for searching.
serialize(): Produce YAML version of this catalog.
to_dask(): Return a dask container for this data source.
to_spark(): Provide an equivalent data object in Apache Spark.
values(): Get an iterator over the sources for catalog entries.
walk([sofar, prefix, depth]): Get all entries in this catalog and sub-catalogs.
yaml(): Return YAML representation of this data-source.
get_persisted
search
set_cache_dir
- get_search_urls() list [source]#
Gather all search urls for catalog.
Inputs that can produce more than one search_url are pglabels and the search_for list.
- Returns:
List of search urls.
- Return type:
list
- search_url(pglabel: Optional[str] = None, text_search: Optional[str] = None) str [source]#
Set up one url for searching.
- Parameters:
pglabel (Optional[str], optional) – Parameter Group Label (not ID), by default None
text_search (Optional[str], optional) – free text search, by default None
- Returns:
URL to use to search Axiom systems.
- Return type:
str
intake-axds sensor source#
AXDSSensorSource
- class intake_axds.axds.AXDSSensorSource(*args, **kwargs)[source]#
Intake Source for AXDS sensor
- Parameters:
internal_id (Optional[int], optional) – Internal station id for Axiom, by default None. Not the UUID. Need to input internal_id or UUID. If both are input, be sure they are for the same station.
uuid (Optional[str], optional) – The UUID for the station, by default None. Not the internal_id. Need to input internal_id or UUID. If both are input, be sure they are for the same station. Note that there may also be a "datasetId" parameter which is sometimes but not always the same as the UUID.
start_time (Optional[str], optional) – At what datetime for data to start, by default None. Must be interpretable by pandas Timestamp. If not input, the datetime at which the dataset starts will be used.
end_time (Optional[str], optional) – At what datetime for data to end, by default None. Must be interpretable by pandas Timestamp. If not input, the datetime at which the dataset ends will be used.
qartod (bool, int, list, optional) – Whether to return QARTOD agg flags when available, which is only for sensor_stations. Can instead input an int or a list of ints representing the _qa_agg flags for which to return data values. More information about QARTOD testing and flags can be found here: https://cdn.ioos.noaa.gov/media/2020/07/QARTOD-Data-Flags-Manual_version1.2final.pdf. Only used by datatype "sensor_station". Is not available if binned==True.
Examples of ways to use this input are:
- qartod=True: Return aggregate QARTOD flags as a column for each data variable.
- qartod=False: Do not return any QARTOD flag columns.
- qartod=1: NaN any data values for which the aggregated QARTOD flags are not equal to 1.
- qartod=[1,3]: NaN any data values for which the aggregated QARTOD flags are not equal to 1 or 3.
Flags are:
1: Pass
2: Not Evaluated
3: Suspect
4: Fail
9: Missing Data
use_units (bool, optional) – If True, include units in column names with the syntax "standard_name [units]". If False, column names use just "standard_name". Only used by datatype "sensor_station".
metadata (dict, optional) – Metadata for catalog.
binned (bool, optional) – True for binned data, False for raw, by default False. Only used by datatype "sensor_station".
bin_interval (Optional[str], optional) – If binned=True, input the binning interval to return. Options are hourly, daily, weekly, monthly, yearly. If bin_interval is input, binned is set to True. Only used by datatype "sensor_station".
only_pgids (list, optional) – If input, only return data associated with these parameterGroupIds. This is separate from parameterGroupLabels and parameterGroupIds that might be present in the metadata.
- Raises:
ValueError – _description_
- Attributes:
Methods
__call__(**kwargs): Create a new instance of this source with altered arguments.
close(): Close open resources corresponding to this data source.
configure_new(**kwargs): Create a new instance of this source with altered arguments.
describe(): Description from the entry spec.
discover(): Open resource and populate the source attributes.
export(path, **kwargs): Save this data for sharing with other people.
get(**kwargs): Create a new instance of this source with altered arguments.
get_filters(): Return appropriate filter for stationid.
persist([ttl]): Save data from this source to local persistent storage.
read(): Read data in.
read_chunked(): Return iterator over container fragments of data source.
read_partition(i): Return a part of the data corresponding to the i-th partition.
to_dask(): Return a dask container for this data source.
to_spark(): Provide an equivalent data object in Apache Spark.
yaml(): Return YAML representation of this data-source.
get_persisted
set_cache_dir
- property data_urls#
Prepare to load in data by getting data_urls.
For V1 sources there will be a data_url per parameterGroupId but not for V2 sources.
- get_filters()[source]#
Return appropriate filter for stationid.
What filter form to use depends on if V1 or V2.
For V1, use each parameterGroupId only once to make a filter since all data of that type will be read in together.
Follows the Sensor API: https://admin.axds.co/#!/sensors/api/overview
intake-axds utilities#
Utility functions.
- intake_axds.utils.available_names() list [source]#
Return available parameterNames for variables.
- Returns:
parameterNames, which are a superset of standard_names.
- Return type:
list
- intake_axds.utils.check_station(metadata: dict, verbose: bool) bool [source]#
Whether to keep station or not.
- Parameters:
metadata (dict) – metadata about station.
verbose (bool, optional) – Set to True for helpful information.
- Returns:
True to keep station, False to skip.
- Return type:
bool
- intake_axds.utils.load_metadata(datatype: str, results: dict) dict [source]#
Load metadata for catalog entry.
- Parameters:
datatype (str) – Axiom data type.
results (dict) – Returned results from call to server for a single dataset.
- Returns:
Metadata to store with catalog entry.
- Return type:
dict
- intake_axds.utils.make_data_url(filter: str, start_time: str, end_time: str, binned: bool = False, bin_interval: Optional[str] = None) str [source]#
Create url for accessing sensor data, raw or binned.
- Parameters:
filter (str) – Get this from make_filter(); contains station and potentially variable info.
start_time (str) – e.g. "2022-1-1". Needs to be interpretable by pandas Timestamp.
end_time (str) – e.g. "2022-1-2". Needs to be interpretable by pandas Timestamp.
binned (bool, optional) – True for binned data, False for raw, by default False.
bin_interval (Optional[str], optional) – If binned=True, input the binning interval to return. Options are hourly, daily, weekly, monthly, yearly.
- Returns:
URL from which to access data.
- Return type:
str
- intake_axds.utils.make_filter(internal_id: int, parameterGroupId: Optional[int] = None) str [source]#
Make filter for Axiom Sensors API.
- Parameters:
internal_id (int) – internal id for station. Not the uuid.
parameterGroupId (Optional[int], optional) – Parameter Group ID to narrow search, by default None
- Returns:
filter to use in station metadata and data access
- Return type:
str
- intake_axds.utils.make_label(label: str, units: Optional[str] = None, use_units: bool = True) str [source]#
Make a column name.
- Parameters:
label (str) – variable label to use in column header
units (Optional[str], optional) – units to use in column name, if not None, by default None
use_units (bool, optional) – Users can choose not to include units in column name, by default True
- Returns:
string to use as column name
- Return type:
str
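The documented column-name syntax can be sketched as a small function (a re-implementation for illustration only, not the package's actual code):

```python
def make_label_sketch(label, units=None, use_units=True):
    """Sketch of the documented column-name syntax."""
    # "standard_name [units]" when units are available and wanted;
    # otherwise just "standard_name"
    if units is None or not use_units:
        return label
    return f"{label} [{units}]"
```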
- intake_axds.utils.make_metadata_url(filter: str) str [source]#
Make url for finding metadata.
- Parameters:
filter (str) – Filter for Sensors API. Use make_filter to make this.
- Returns:
url for metadata.
- Return type:
str
- intake_axds.utils.make_search_docs_url(internal_id: Optional[int] = None, uuid: Optional[str] = None) str [source]#
Url for Axiom Search docs.
Uses whichever of internal_id and uuid is not None to formulate url.
- Parameters:
internal_id (Optional[int], optional) – Internal station id for Axiom. Not the UUID.
uuid (Optional[str], optional) – UUID for station.
- Returns:
Url for finding Axiom Search docs
- Return type:
str
- intake_axds.utils.match_key_to_parameter(keys_to_match: list, criteria: Optional[dict] = None) list [source]#
Find Parameter Group values that match keys_to_match.
- Parameters:
keys_to_match (list) – The custom_criteria key to narrow the search, which will be matched to the category results using the custom_criteria that must be set up ahead of time with cf-pandas.
criteria (dict, optional) – Criteria to use to map from variable to attributes describing the variable. If user has defined custom_criteria, this will be used by default.
- Returns:
Parameter Group values that match key, according to the custom criteria.
- Return type:
list
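Custom criteria follow the cf-pandas pattern of mapping each custom key to regular expressions over variable attributes. A minimal illustrative mapping is shown below; the keys and regexes are hypothetical, not part of intake-axds:

```python
import re

# Hypothetical cf-pandas-style criteria: each custom key maps variable
# attributes (here standard_name) to regular expressions.
criteria = {
    "temp": {"standard_name": "sea_water_temperature$"},
    "salt": {"standard_name": "sea_water_practical_salinity$"},
}

# A key like "temp" would then be usable as keys_to_match, with
# match_key_to_parameter mapping it to Parameter Group values.
match = re.search(criteria["temp"]["standard_name"], "sea_water_temperature")
```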
Inherited from intake#
intake-axds catalog#
Set up a catalog for Axiom assets.
- class intake_axds.axds_cat.AXDSCatalog(*args, **kwargs)[source]#
Bases: Catalog
Makes data sources out of all datasets for a given AXDS data type.
- close()#
Close open resources corresponding to this data source.
- configure_new(**kwargs)#
Create a new instance of this source with altered arguments
Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.
Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.
- describe()#
Description from the entry spec
- discover()#
Open resource and populate the source attributes.
- export(path, **kwargs)#
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- filter(func)#
Create a Catalog of a subset of entries based on a condition
Warning
This function operates on CatalogEntry objects not DataSource objects.
Note
Note that, whatever specific class this is performed on, the return instance is a Catalog. The entries are passed unmodified, so they will still reference the original catalog instance and include its details such as directory.
- Parameters:
func (function) – This should take a CatalogEntry and return True or False. Those items returning True will be included in the new Catalog, with the same entry names
- Returns:
New catalog with Entries that still refer to their parents
- Return type:
Catalog
- force_reload()#
Imperative reload data now
- classmethod from_dict(entries, **kwargs)#
Create Catalog from the given set of entries
- Parameters:
entries (dict-like) – A mapping of name:entry which supports dict-like functionality, e.g., is derived from collections.abc.Mapping.
kwargs (passed on to the constructor) – Things like metadata, name; see __init__.
- Return type:
Catalog instance
- get(**kwargs)#
Create a new instance of this source with altered arguments
Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.
Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.
- get_search_urls() list [source]#
Gather all search urls for catalog.
Inputs that can have more than one search_url are pglabels and search_for list.
- Returns:
List of search urls.
- Return type:
list
- property gui#
Source GUI, with parameter selection and plotting
- property has_been_persisted#
The base class does not interact with persistence
- property hvplot#
Alias for DataSource.plot.
- property is_persisted#
The base class does not interact with persistence
- items()#
Get an iterator over (key, source) tuples for the catalog entries.
- keys()#
Entry names in this catalog as an iterator (alias for __iter__)
- persist(ttl=None, **kwargs)#
Save data from this source to local persistent storage
- Parameters:
ttl (numeric, optional) – Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
kwargs – Passed to the _persist method on the base container.
- property plot#
Plot API accessor
This property exposes both predefined plots (described in the source metadata) and general-purpose plotting via the hvPlot library. Supported containers are: array, dataframe and xarray.
To display in a notebook, be sure to run intake.output_notebook() first.
The set of plots defined for this source can be found by
>>> source.plots
["plot1", "plot2"]
and to display one of these:
>>> source.plot.plot1()
<holoviews/panel output>
To create new plot types and supply custom configuration, use one of the methods of hvplot.hvPlot:
>>> source.plot.line(x="fieldX", y="fieldY")
The full set of arguments that can be passed, and the types of plot they refer to, can be found in the docs and attributes of hvplot.HoloViewsConverter.
Once you have found a suitable plot, you may wish to update the plots definitions of the source. Simply add the plotname= optional argument (this will overwrite any existing plot of that name). The source's YAML representation will include the new plot, and it could be saved into a catalog with this new definition.
>>> source.plot.line(plotname="new", x="fieldX", y="fieldY")
>>> source.plots
["plot1", "plot2", "new"]
- property plots#
List custom associated quick-plots
- pop(key)#
Remove entry from catalog and return it
This relies on the _entries attribute being mutable, which it normally is. Note that if a catalog automatically reloads, any entry removed here may soon reappear
- Parameters:
key (str) – Key to give the entry in the cat
- read()#
Load entire dataset into a container and return it
- read_chunked()#
Return iterator over container fragments of data source
- read_partition(i)#
Return a part of the data corresponding to i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- reload()#
Reload catalog if sufficient time has passed
- save(url, storage_options=None)#
Output this catalog to a file as YAML
- Parameters:
url (str) – Location to save to, perhaps remote
storage_options (dict) – Extra arguments for the file-system
- search_url(pglabel: Optional[str] = None, text_search: Optional[str] = None) str [source]#
Set up one url for searching.
- Parameters:
pglabel (Optional[str], optional) – Parameter Group Label (not ID), by default None
text_search (Optional[str], optional) – free text search, by default None
- Returns:
URL to use to search Axiom systems.
- Return type:
str
- serialize()#
Produce YAML version of this catalog.
Note that this is not the same as .yaml(), which produces a YAML block referring to this catalog.
- to_dask()#
Return a dask container for this data source
- to_spark()#
Provide an equivalent data object in Apache Spark
The mapping of python-oriented data containers to Spark ones will be imperfect, and only a small number of drivers are expected to be able to produce Spark objects. The standard arguments may be translated, unsupported or ignored, depending on the specific driver.
This method requires the package intake-spark.
- values()#
Get an iterator over the sources for catalog entries.
- walk(sofar=None, prefix=None, depth=2)#
Get all entries in this catalog and sub-catalogs
- Parameters:
sofar (dict or None) – Within recursion, use this dict for output
prefix (list of str or None) – Names of levels already visited
depth (int) – Number of levels to descend; needed to truncate circular references and for cleaner output
- Returns:
Dict where the keys are the entry names in dotted syntax, and the
values are entry instances.
- yaml()#
Return YAML representation of this data-source
The output may be roughly appropriate for inclusion in a YAML catalog. This is a best-effort implementation
intake-axds sensor source#
- class intake_axds.axds.AXDSSensorSource(*args, **kwargs)[source]#
Bases: DataSource
Intake Source for AXDS sensor
- Parameters:
internal_id (Optional[int], optional) – Internal station id for Axiom, by default None. Not the UUID. Need to input internal_id or UUID. If both are input, be sure they are for the same station.
uuid (Optional[str], optional) – The UUID for the station, by default None. Not the internal_id. Need to input internal_id or UUID. If both are input, be sure they are for the same station. Note that there may also be a “datasetId” parameter which is sometimes but not always the same as the UUID.
start_time (Optional[str], optional) – At what datetime for data to start, by default None. Must be interpretable by pandas
Timestamp
. If not input, the datetime at which the dataset starts will be used.end_time (Optional[str], optional) – At what datetime for data to end, by default None. Must be interpretable by pandas
Timestamp
. If not input, the datetime at which the dataset ends will be used.qartod (bool, int, list, optional) –
Whether to return QARTOD agg flags when available, which is only for sensor_stations. Can instead input an int or a list of ints representing the _qa_agg flags for which to return data values. More information about QARTOD testing and flags can be found here: https://cdn.ioos.noaa.gov/media/2020/07/QARTOD-Data-Flags-Manual_version1.2final.pdf. Only used by datatype “sensor_station”. Is not available if binned==True.
Examples of ways to use this input are:
qartod=True: Return aggregate QARTOD flags as a column for each data variable.
qartod=False: Do not return any QARTOD flag columns.
qartod=1: NaN any data values for which the aggregated QARTOD flags are not equal to 1.
qartod=[1,3]: NaN any data values for which the aggregated QARTOD flags are not equal to 1 or 3.
Flags are:
1: Pass
2: Not Evaluated
3: Suspect
4: Fail
9: Missing Data
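The flag-based masking described above can be sketched with pandas (column names and data are hypothetical; this illustrates the documented behavior, not the package's internal code):

```python
import numpy as np
import pandas as pd

# Hypothetical sensor data with an aggregate QARTOD flag column.
df = pd.DataFrame({
    "sea_water_temperature": [10.1, 10.3, 99.9, 10.2],
    "sea_water_temperature_qc_agg": [1, 3, 4, 2],
})

# qartod=[1, 3]: NaN any values whose aggregate flag is not
# 1 (Pass) or 3 (Suspect); 2 (Not Evaluated) and 4 (Fail) are masked.
keep = df["sea_water_temperature_qc_agg"].isin([1, 3])
df.loc[~keep, "sea_water_temperature"] = np.nan
```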
use_units (bool, optional) – If True, include units in column names with the syntax "standard_name [units]"; if False, column names are simply "standard_name". Only used by datatype "sensor_station".
metadata (dict, optional) – Metadata for catalog.
binned (bool, optional) – True for binned data, False for raw, by default False. Only used by datatype “sensor_station”.
bin_interval (Optional[str], optional) – If binned=True, input the binning interval to return. Options are hourly, daily, weekly, monthly, yearly. If bin_interval is input, binned is set to True. Only used by datatype "sensor_station".
only_pgids (list, optional) – If input, only return data associated with these parameterGroupIds. This is separate from the parameterGroupLabels and parameterGroupIds that might be present in the metadata.
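The documented interaction between binned and bin_interval (passing an interval forces binned data; only five interval names are valid) can be sketched as a simplified illustration, not the package's internal code:

```python
VALID_BIN_INTERVALS = {"hourly", "daily", "weekly", "monthly", "yearly"}

def resolve_binning(binned=False, bin_interval=None):
    """Return (binned, bin_interval) following the documented rules:
    a bin_interval implies binned=True, and only the listed interval
    names are accepted.
    """
    if bin_interval is not None:
        if bin_interval not in VALID_BIN_INTERVALS:
            raise ValueError(
                f"bin_interval must be one of {sorted(VALID_BIN_INTERVALS)}"
            )
        binned = True  # bin_interval implies binned data
    return binned, bin_interval

resolve_binning(bin_interval="daily")  # (True, "daily")
resolve_binning()                      # (False, None)
```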
- Raises:
ValueError – _description_
- Attributes:
Methods
__call__(**kwargs) – Create a new instance of this source with altered arguments.
close() – Close open resources corresponding to this data source.
configure_new(**kwargs) – Create a new instance of this source with altered arguments.
describe() – Description from the entry spec.
discover() – Open resource and populate the source attributes.
export(path, **kwargs) – Save this data for sharing with other people.
get(**kwargs) – Create a new instance of this source with altered arguments.
get_filters() – Return appropriate filter for stationid.
persist([ttl]) – Save data from this source to local persistent storage.
read() – Read data in.
read_chunked() – Return iterator over container fragments of data source.
read_partition(i) – Return a part of the data corresponding to i-th partition.
to_dask() – Return a dask container for this data source.
to_spark() – Provide an equivalent data object in Apache Spark.
yaml() – Return YAML representation of this data-source.
get_persisted
set_cache_dir
- close()#
Close open resources corresponding to this data source.
- configure_new(**kwargs)#
Create a new instance of this source with altered arguments
Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.
Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.
- property data_urls#
Prepare to load in data by getting data_urls.
For V1 sources there will be a data_url per parameterGroupId but not for V2 sources.
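The one-URL-per-parameterGroupId structure for V1 sources can be sketched as follows; the URL template and function name here are hypothetical, and only the per-pgid structure reflects the documented behavior:

```python
def build_v1_data_urls(station_id, pgids,
                       base="https://example.com/sensors/data"):
    """Sketch: build one data URL per parameterGroupId for a V1 source.

    The base URL and query-parameter names are illustrative only.
    """
    return [
        f"{base}?stationid={station_id}&parameterGroupId={pgid}"
        for pgid in pgids
    ]

urls = build_v1_data_urls(12345, [6, 7])
# two URLs, one per parameterGroupId
```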
- describe()#
Description from the entry spec
- discover()#
Open resource and populate the source attributes.
- export(path, **kwargs)#
Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
- get(**kwargs)#
Create a new instance of this source with altered arguments
Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.
Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.
- get_filters()[source]#
Return appropriate filter for stationid.
What filter form to use depends on if V1 or V2.
For V1, use each parameterGroupId only once to make a filter since all data of that type will be read in together.
Following Sensor API https://admin.axds.co/#!/sensors/api/overview
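The V1 rule above (each parameterGroupId used only once, since all data of that type is read in together) amounts to deduplicating pgids before building filters. A minimal sketch, assuming a hypothetical filter-dict shape:

```python
def v1_filters(pgids):
    """Build one filter per unique parameterGroupId, preserving order.

    The {"parameterGroupId": ...} dict shape is an assumption for
    illustration; only the deduplication reflects the documented V1 rule.
    """
    seen = []
    for pgid in pgids:
        if pgid not in seen:
            seen.append(pgid)
    return [{"parameterGroupId": pgid} for pgid in seen]

v1_filters([6, 7, 6])  # two filters: pgids 6 and 7
```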
- property gui#
Source GUI, with parameter selection and plotting
- property has_been_persisted#
The base class does not interact with persistence
- property hvplot#
Alias for DataSource.plot.
- property is_persisted#
The base class does not interact with persistence
- persist(ttl=None, **kwargs)#
Save data from this source to local persistent storage
- Parameters:
ttl (numeric, optional) – Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
kwargs – Passed to the _persist method on the base container.
- property plot#
Plot API accessor
This property exposes both predefined plots (described in the source metadata) and general-purpose plotting via the hvPlot library. Supported containers are: array, dataframe and xarray.
To display in a notebook, be sure to run intake.output_notebook() first.
The set of plots defined for this source can be found by
>>> source.plots
["plot1", "plot2"]
and to display one of these:
>>> source.plot.plot1()
<holoviews/panel output>
To create new plot types and supply custom configuration, use one of the methods of hvplot.hvPlot:
>>> source.plot.line(x="fieldX", y="fieldY")
The full set of arguments that can be passed, and the types of plot they refer to, can be found in the doc and attributes of hvplot.HoloViewsConverter.
Once you have found a suitable plot, you may wish to update the plot definitions of the source. Simply add the plotname= optional argument (this will overwrite any existing plot of that name). The source's YAML representation will include the new plot, and it can be saved into a catalog with this new definition.
>>> source.plot.line(plotname="new", x="fieldX", y="fieldY")
>>> source.plots
["plot1", "plot2", "new"]
- property plots#
List custom associated quick-plots
- read_chunked()#
Return iterator over container fragments of data source
- read_partition(i)#
Return a part of the data corresponding to i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
- to_dask()#
Return a dask container for this data source
- to_spark()#
Provide an equivalent data object in Apache Spark
The mapping of python-oriented data containers to Spark ones will be imperfect, and only a small number of drivers are expected to be able to produce Spark objects. The standard arguments may be translated, unsupported or ignored, depending on the specific driver.
This method requires the package intake-spark.
- yaml()#
Return YAML representation of this data-source.
The output may be roughly appropriate for inclusion in a YAML catalog. This is a best-effort implementation.