Data API Index

The data package is part of PyEarthTools. It contains code for fetching, loading and transforming data from a wide variety of sources. It supports industry-standard data sources of common interest and also helps users manage their own data in their own projects.

Many research facilities already hold data on disk, so their users do not need to fetch it. Many other users do need to fetch data for their own projects. Both situations are catered for, but bear in mind that this means the package serves a broad and diverse set of user requirements.

Within PyEarthTools, the data package is used for:

  • Fetching known data sources (such as ERA5 or ISD)
  • Indexing into either pre-existing or newly downloaded data
  • Subsetting and reprocessing that data for efficient storage and access
  • Loading that data into memory for efficient use in machine learning
  • Performing scientific operations on that data as part of data pre-processing
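
The tasks above chain together: index into stored data, subset it, apply pre-processing, then load the result into memory. The following is a minimal standard-library sketch of that pattern only. `ToyTimeIndex`, `select`, `kelvin_to_celsius` and `window` are hypothetical names invented for illustration and are not part of the PyEarthTools API; the values are synthetic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List

Record = Dict[str, float]  # one time step: variable name -> value
Transform = Callable[[Record], Record]


@dataclass
class ToyTimeIndex:
    """Hypothetical stand-in for a time-based index (not the PyEarthTools API)."""

    store: Dict[datetime, Record]  # pretend data holdings, keyed by time
    transforms: List[Transform] = field(default_factory=list)

    def __getitem__(self, time: datetime) -> Record:
        record = dict(self.store[time])  # copy so the store is left untouched
        for transform in self.transforms:  # apply pre-processing in order
            record = transform(record)
        return record

    def window(self, start: datetime, end: datetime, step: timedelta) -> List[Record]:
        """Load every step in [start, end) into memory as a list."""
        out = []
        current = start
        while current < end:
            out.append(self[current])
            current += step
        return out


def select(*names: str) -> Transform:
    """Subsetting: keep only the named variables."""
    return lambda record: {name: record[name] for name in names}


def kelvin_to_celsius(name: str) -> Transform:
    """Scientific pre-processing: convert one variable from K to degrees C."""
    return lambda record: {**record, name: round(record[name] - 273.15, 2)}


start = datetime(2020, 1, 1)
hour = timedelta(hours=1)
# Synthetic values standing in for fetched data.
store = {start + i * hour: {"t2m": 290.15 + i, "msl": 101325.0} for i in range(3)}

index = ToyTimeIndex(store, transforms=[select("t2m"), kelvin_to_celsius("t2m")])
samples = index.window(start, start + 3 * hour, hour)
print(samples)
```

Keeping the transforms on the index means that every retrieval, whether a single time step or a window, passes through the same pre-processing. The real classes and their signatures are listed in the reference table on this page and in the API docs.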

The data package API supports each of these tasks. Users looking for "how-to guides" or worked examples should review the Tutorial Gallery.

The rest of this page contains reference information for the components of the data package. The full API documentation can be viewed at Data API Docs.

| Module | Purpose | API Docs |
|---|---|---|
| data.archive | Indexing and loading from known data holdings | ZarrIndex, ZarrTimeIndex, extensions.register_archive, reset_root, set_root, config_root |
| data.derived | Calculated derived fields | DerivedValue, TimeDerivedValue, AdvancedTimeDerivedValue, Insolation |
| data.download | Publicly available datasets | ARCOERA5, WB2ERA5, WB2ERA5Clim |
| data.indexes | | Index, DataIndex, FileSystemIndex, TimeIndex, SingleTimeIndex, TimeDataIndex, AdvancedTimeIndex, AdvancedTimeDataIndex, BaseTimeIndex, DataFileSystemIndex, ArchiveIndex, ForecastIndex, StaticDataIndex, CachingForecastIndex, IntakeIndex, IntakeIndexCache, cacheIndex.BaseCacheIndex, cacheIndex.FileSystemCacheIndex, cacheIndex.CacheFactory, cacheIndex.FunctionalCache, combine.InterpolationIndex, fake.FakeIndex, extensions.register_accessor |
| data.modifications | | Modification, register_modification, variable_modifications, aggregations.Aggregation, aggregations.AggregationGeneral, aggregations.Mean, aggregations.Accumulate, constants.Constant, decorator.VariableModification, decorator.Modifier, reductions.Reduction, reductions.Groupby, reductions.Hourly, reductions.Daily, reductions.Monthly, register.register_modification |
| data.operations | | percentile, aggregation, binning, SpatialInterpolation, TemporalInterpolation, FullInterpolation, index_routines.series, index_routines.safe_series, index_operations.split_ds, index_operations.split_ds_gen, index_operations.aggregation, index_operations.find_range, utils.identify_time_dimension, forecast_op.forecast_series, forecast_op.forecast_as_basetime, forecast_op.forecast_select_time |
| data.patterns | | PatternIndex, PatternTimeIndex, PatternForecastIndex, PatternVariableAware, Argument, ArgumentExpansion, ArgumentExpansionVariable, Direct, TemporalDirect, ForecastDirect, DirectVariable, ForecastDirectVariable, TemporalDirectVariable, DirectFactory, ExpandedDate, TemporalExpandedDate, ForecastExpandedDate, ExpandedDateVariable, ForecastExpandedDateVariable, TemporalExpandedDateVariable, ExpandedDateFactory, Static, ParsingPattern, ZarrIndex, ZarrTimeIndex |
| data.save | | save.save, ManageFiles, ManageTemp, array.save, dask.save, dataset.save, dataset.to_netcdf, dataset.to_zarr, jsonsave.save, plot.save, save_utils.check_if_exists, save_utils.make_new_filename, save_utils.keep_clear |
| data.transforms | | Transform, TransformCollection, FunctionTransform, Derive, aggregation.over, aggregation.leaving, aggregation.Aggregate, attributes.SetAttributes, attributes.SetEncoding, attributes.SetType, attributes.Rename, coordinates.get_longitude, coordinates.StandardLongitude, coordinates.ReIndex, coordinates.StandardCoordinateNames, coordinates.Select, coordinates.Drop, coordinates.Assign, coordinates.Pad, default.get_default_transforms, derive.evaluate, derive.derive_equations, dimensions.StandardDimensionNames, dimensions.Expand, interpolation.Interpolate, interpolation.XESMF, interpolation.InterpolateNan, interpolation.like, mask.UnderlyingMaskTransform, mask.Dataset, mask.Replace, optimisation.Rechunk, region.check_shape, region.order, region.like, region.Bounding, region.Select, region.ISelect, region.PointBox, region.Lookup, region.Geosearch, region.ShapeFile, utils.parse_dataset |
| data.catalog | | Catalog, CatalogEntry, get_name |
| data.collection | | Collection, LabelledCollection |
| data.exceptions | | InvalidIndexError, InvalidDataError, DataNotFoundError |
| data.load | | load |
| data.time | | multisplit, find_components, strip_to_common_resolution, time_delta, time_delta_resolution, range_samples, TimeResolution, Petdt, TimeDelta, TimeRange |
| data.warnings | | pyearthtoolsDataWarning, IndexWarning, AccessorRegistrationWarning |
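
Entries in the API Docs column are named relative to their module, so `region.Bounding` under `data.transforms` refers to `data.transforms.region.Bounding`. The sketch below shows one way to turn such a dotted name into a Python object with `importlib`. The helper `resolve` is hypothetical, and it assumes the package is importable under the top-level name `pyearthtools`; adjust `root` if your installation differs.

```python
import importlib
from typing import Any


def resolve(dotted_name: str, root: str = "pyearthtools") -> Any:
    """Resolve a name such as 'data.transforms.region.Bounding' under `root`.

    Hypothetical helper for illustration; not part of PyEarthTools.
    """
    obj = importlib.import_module(root)
    path = root
    for part in dotted_name.split("."):
        path = f"{path}.{part}"
        try:
            obj = getattr(obj, part)
        except AttributeError:
            # A parent package does not always import its submodules,
            # so fall back to importing the dotted path directly.
            obj = importlib.import_module(path)
    return obj


try:
    # Module 'data.transforms' plus table entry 'region.Bounding'.
    print(resolve("data.transforms.region.Bounding"))
except ImportError:
    # Raised when the package (or the named submodule) is not available.
    print("pyearthtools.data is not importable in this environment")
```

The same helper works for any row of the table. It raises `ImportError` when a name cannot be found, which makes it a quick check that an entry listed here still exists in the installed version.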