plans.datasets.core
The core module for datasets.
- Description:
The ``datasets.core`` module provides primitive (abstract) objects to handle plans data.
- License:
This software is released under the GNU General Public License v3.0 (GPL-3.0). For details, see: https://www.gnu.org/licenses/gpl-3.0.html
- Author:
Iporã Possantti
- Contact:
Overview
>>> from plans.datasets.core import *
Example
import numpy as np
from plans import analyst
# get data to a vector
data_vector = np.random.rand(1000)
# instantiate the Univar object
uni = analyst.Univar(data=data_vector, name="my_data")
# view data
uni.view()
Functions
- dataframe_prepro: Utility function for dataframe pre-processing.
- get_colors: Utility function to get a list of random colors.
Classes
- A Quali-Hard is a hard-coded qualitative map (that is, the table is pre-set).
- Basic qualitative raster map dataset.
- The raster test_collection base dataset.
- The basic raster map dataset.
- A collection of time series objects with associated metadata.
- Zones map dataset.
- plans.datasets.core.dataframe_prepro(dataframe)[source]
Utility function for dataframe pre-processing.
- Parameters
dataframe (``pandas.DataFrame``) – incoming dataframe
- Returns
prepared dataframe
- Return type
``pandas.DataFrame``
- plans.datasets.core.get_colors(size=10, cmap='tab20', randomize=True)[source]
Utility function to get a list of random colors.
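No parameter details survive in this entry. As a rough illustration only, a stand-in for such a utility is sketched below; the real function draws colors from the matplotlib colormap named by ``cmap``, while this sketch generates random hex strings instead (an assumption, not the library's logic):

```python
import random

def get_colors_sketch(size=10, randomize=True, seed=None):
    # stand-in logic: the real utility samples from the matplotlib
    # colormap named by `cmap`; random hex strings are used here instead
    rng = random.Random(seed)
    colors = ["#{:06x}".format(rng.randrange(0x1000000)) for _ in range(size)]
    if randomize:
        rng.shuffle(colors)
    return colors

palette = get_colors_sketch(size=5, seed=42)
print(len(palette))  # 5
```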
- class plans.datasets.core.TimeSeries(name='MyTimeSeries', alias='TS0')[source]
Bases: ``DataSet``
- __init__(name='MyTimeSeries', alias='TS0')[source]
Initialize the ``TimeSeries`` object. Expected to increment superior methods.
- _set_view_specs()[source]
Set view specifications. Expected to overwrite superior methods.
- Returns
None
- Return type
None
- _set_frequency()[source]
Guess the datetime resolution of a time series based on the consistency of timestamp components (e.g., seconds, minutes).
- Returns
None
- Return type
None
Notes:
This method infers the datetime frequency of the time series data
based on the consistency of timestamp components.
Examples:
>>> ts._set_frequency()
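The consistency check described above can be sketched with plain pandas. The ``guess_frequency`` helper and its return labels are hypothetical, not the module's actual names: the idea is that the finest timestamp component that varies across the series sets the resolution.

```python
import pandas as pd

def guess_frequency(dt_index):
    # hypothetical helper: a component that never varies is treated as a
    # constant placeholder; the finest varying component sets the resolution
    if pd.Index(dt_index.second).nunique() > 1:
        return "second"
    if pd.Index(dt_index.minute).nunique() > 1:
        return "minute"
    if pd.Index(dt_index.hour).nunique() > 1:
        return "hour"
    if pd.Index(dt_index.day).nunique() > 1:
        return "day"
    return "month"

idx = pd.to_datetime(["2022-01-01 12:00:00", "2022-01-02 12:00:00",
                      "2022-01-03 12:00:00"])
print(guess_frequency(idx))  # day: only the day component varies
```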
- get_metadata()[source]
Get a dictionary with object metadata. Expected to increment superior methods.
Note
Metadata does not necessarily include all object attributes.
- Returns
dictionary with all metadata
- Return type
dict
- update()[source]
Update internal attributes based on the current data.
- Returns
None
- Return type
None
Notes:
Calls the ``_set_frequency`` method to update the datetime frequency attribute.
Updates the ``start`` attribute with the minimum datetime value in the data.
Updates the ``end`` attribute with the maximum datetime value in the data.
Updates the ``var_min`` attribute with the minimum value of the variable field in the data.
Updates the ``var_max`` attribute with the maximum value of the variable field in the data.
Updates the ``data_size`` attribute with the length of the data.
Examples:
>>> ts.update()
- set(dict_setter, load_data=True)[source]
Set selected attributes based on an incoming dictionary. Expected to increment superior methods.
- set_data(input_df, input_dtfield, input_varfield, filter_dates=None, dropnan=True)[source]
Set time series data from an input DataFrame.
- Parameters
input_df (``pandas.DataFrame``) – Input DataFrame containing time series data.
input_dtfield (str) – Name of the datetime field in the input DataFrame.
input_varfield (str) – Name of the variable field in the input DataFrame.
filter_dates (list, optional) – List of [Start, End] used for filtering the date range.
dropnan (bool, optional) – If True, drop NaN values from the DataFrame. Default is True.
- Returns
None
- Return type
None
Notes:
Assumes the input DataFrame has a datetime column in the format "YYYY-mm-DD HH:MM:SS".
Renames columns to the standard format (datetime: ``self.dtfield``, variable: ``self.varfield``).
Converts the datetime column to the standard format.
Examples:
>>> input_data = pd.DataFrame({
...     'Date': ['2022-01-01 12:00:00', '2022-01-02 12:00:00', '2022-01-03 12:00:00'],
...     'Temperature': [25.1, 26.5, 24.8]
... })
>>> ts.set_data(input_data, input_dtfield='Date', input_varfield='Temperature')
- load_data(file_data, input_dtfield=None, input_varfield=None, in_sep=';', filter_dates=None)[source]
Load data from file. Expected to overwrite superior methods.
- Parameters
- Returns
None
- Return type
None
Notes:
Assumes the input file is in ``csv`` format.
Expects a datetime column in the format ``YYYY-mm-DD HH:MM:SS``.
Examples:
>>> ts.load_data("path/to/data.csv", input_dtfield="Date", input_varfield="TempDB", in_sep=",")
Important
The ``DateTime`` field in the incoming file must be a full timestamp ``YYYY-mm-DD HH:MM:SS``. Even if the data is a daily time series, make sure to include a constant, default timestamp like the following example:
DateTime; Temperature
2020-02-07 12:00:00; 24.5
2020-02-08 12:00:00; 25.1
2020-02-09 12:00:00; 28.7
2020-02-10 12:00:00; 26.5
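The required layout can be produced with plain pandas. A minimal sketch (the round trip through ``io.StringIO`` stands in for an actual file; the ``ts.load_data`` call itself is omitted):

```python
import io
import pandas as pd

# daily series with a constant 12:00:00 timestamp, as required above
df = pd.DataFrame({
    "DateTime": pd.date_range("2020-02-07 12:00:00", periods=4, freq="D"),
    "Temperature": [24.5, 25.1, 28.7, 26.5],
})
buf = io.StringIO()
df.to_csv(buf, sep=";", index=False)  # semicolon-separated, matching in_sep=';'
buf.seek(0)

# round-trip check: the timestamps keep their full "YYYY-mm-DD HH:MM:SS" form
check = pd.read_csv(buf, sep=";", parse_dates=["DateTime"])
print(check["DateTime"].iloc[0])  # 2020-02-07 12:00:00
```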
- cut_edges(inplace=False)[source]
Cut off initial and final NaN records in a given time series.
- Parameters
inplace (bool, optional) – If True, the operation is performed in place and the original data is modified. If False, a new DataFrame with cut edges is returned and the original data remains unchanged. Default is False.
- Returns
If inplace is False, a new DataFrame with cut edges. If inplace is True, returns None, and the original data is modified in place.
- Return type
``pandas.DataFrame`` or None
Notes:
This function removes leading and trailing rows with NaN values in the specified variable field.
When ``inplace`` is False, the operation is performed on a copy of the data, and the original data remains unchanged.
Examples:
>>> ts.cut_edges(inplace=True)
>>> trimmed_ts = ts.cut_edges(inplace=False)
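The trimming behaviour can be sketched with pandas ``first_valid_index``/``last_valid_index``. The ``cut_edges_sketch`` helper is illustrative, not the library implementation; note that interior gaps survive, only the edges are removed:

```python
import numpy as np
import pandas as pd

def cut_edges_sketch(df, varfield="V"):
    # hypothetical helper: drop leading/trailing NaN rows only,
    # leaving interior gaps untouched
    first = df[varfield].first_valid_index()
    if first is None:
        return df.iloc[0:0]  # all-NaN series: nothing to keep
    last = df[varfield].last_valid_index()
    return df.loc[first:last].reset_index(drop=True)

df = pd.DataFrame({"V": [np.nan, np.nan, 1.0, np.nan, 2.0, np.nan]})
trimmed = cut_edges_sketch(df)
print(trimmed["V"].tolist())  # [1.0, nan, 2.0]
```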
- standardize(start=None, end=None)[source]
Standardize the data based on regular datetime steps and the time resolution.
- Parameters
start (``pandas.Timestamp``, optional) – The starting datetime for the standardization. Defaults to the first datetime in the data.
end (``pandas.Timestamp``, optional) – The ending datetime for the standardization. Defaults to the last datetime in the data.
- Returns
None
- Return type
None
Notes:
Handles missing start and end values by using the first and last datetimes in the data.
Standardizes the incoming start and end datetimes to midnight of their respective days.
Creates a full date range with the specified frequency for the standardization period.
Groups the data by epochs (based on the frequency and datetime field), applies the specified aggregation function, and fills in missing values with left merges.
Cuts off any edges with missing data.
Updates internal attributes, including ``self.isstandard``, to indicate that the data has been standardized.
Examples:
>>> ts.standardize()
Warning
The ``standardize`` method modifies the internal data representation. Be sure to review the data after standardization.
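The core of the standardization logic, a regular date range plus a left merge, can be sketched as follows. This is a simplified stand-in that ignores the epoch grouping and aggregation steps named in the notes:

```python
import pandas as pd

# irregular daily series with one missing day
raw = pd.DataFrame({
    "DateTime": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"]),
    "V": [1.0, 2.0, 4.0],
})

# 1) build a regular date range covering the whole period
full = pd.DataFrame({
    "DateTime": pd.date_range(raw["DateTime"].min(), raw["DateTime"].max(), freq="D")
})
# 2) left-merge the observed data onto it, exposing the gap as NaN
std = full.merge(raw, on="DateTime", how="left")
print(std["V"].tolist())  # [1.0, 2.0, nan, 4.0]
```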
- get_epochs(inplace=False)[source]
Get Epochs (periods) for continuous time series (0 = gap epoch).
- Parameters
inplace (bool, optional) – Option to set Epochs in place. Default is False.
- Returns
A DataFrame if inplace is False, otherwise None.
- Return type
``pandas.DataFrame`` or None
Notes:
This function labels continuous chunks of data as Epochs, with Epoch 0 representing gaps in the time series.
Examples:
>>> df_epochs = ts.get_epochs()
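The epoch-labeling idea can be sketched with numpy. The ``label_epochs`` helper is hypothetical; it reproduces the convention above, where 0 marks gaps and each continuous run of valid data gets its own id:

```python
import numpy as np

def label_epochs(values):
    # hypothetical helper: 0 marks gaps; each continuous run of valid
    # records gets an incrementing epoch id
    isgap = np.isnan(values)
    # an epoch starts where a valid record follows a gap (or the array start)
    starts = ~isgap & np.concatenate(([True], isgap[:-1]))
    return np.where(isgap, 0, np.cumsum(starts))

v = np.array([1.0, 2.0, np.nan, np.nan, 3.0, 4.0, np.nan, 5.0])
print(label_epochs(v).tolist())  # [1, 1, 0, 0, 2, 2, 0, 3]
```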
- update_epochs_stats()[source]
Update all epochs statistics.
- Returns
None
- Return type
None
Notes:
This function updates statistics for all epochs in the time series.
Ensures that the data is standardized by calling the ``standardize`` method if it is not already standardized.
Removes epoch 0 from the statistics, since it typically represents non-standardized or invalid data.
Groups the data by ``Epoch_Id`` and calculates statistics such as count, start, and end timestamps for each epoch.
Generates random colors for each epoch using the ``get_random_colors`` function with a specified colormap (the ``cmap`` attribute).
Includes the time series name in the statistics for identification.
Organizes the statistics DataFrame to include the relevant columns: ``Name``, ``Epoch_Id``, ``Count``, ``Start``, ``End``, and ``Color``.
Updates the ``epochs_n`` attribute with the number of epochs in the statistics.
Examples:
>>> ts.update_epochs_stats()
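The groupby step described in the notes can be sketched as follows (column names follow the docstring; the color and name columns are omitted):

```python
import pandas as pd

# epoch-labeled data as produced by get_epochs() (0 = gap)
df = pd.DataFrame({
    "DateTime": pd.date_range("2020-01-01", periods=6, freq="D"),
    "V": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "Epoch_Id": [1, 1, 0, 2, 2, 2],
})

# drop the gap epoch, then compute count/start/end per epoch
stats = (df[df["Epoch_Id"] > 0]
         .groupby("Epoch_Id")
         .agg(Count=("V", "size"), Start=("DateTime", "min"), End=("DateTime", "max"))
         .reset_index())
print(stats["Count"].tolist())  # [2, 3]
```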
- interpolate_gaps(method='linear', constant=0, inplace=False)[source]
Fill gaps in a time series by interpolating missing values. If the time series is not in standard form, it will be standardized before interpolation.
- Parameters
method (str, optional) – Specifies the interpolation method. Default is ``linear``. Other supported methods include ``constant``, ``nearest``, ``zero``, ``slinear``, ``quadratic``, ``cubic``, etc. The ``constant`` method applies a constant value to gaps. Refer to the documentation of ``scipy.interpolate.interp1d`` for more options.
constant (float, optional) – Value used in the case of the ``constant`` method. Default is 0.
inplace (bool, optional) – If True, the interpolation is performed in place and the original data is modified. If False, a new DataFrame with interpolated values is returned and the original data remains unchanged. Default is False.
- Returns
If inplace is False, a new DataFrame with interpolated values. If inplace is True, returns None, and the original data is modified in place.
- Return type
``pandas.DataFrame`` or None
Notes:
The interpolation is performed for each unique epoch in the time series.
The ``method`` parameter determines the interpolation technique. Common options include ``constant``, ``linear``, ``nearest``, ``zero``, ``slinear``, ``quadratic``, and ``cubic``. See the documentation of ``scipy.interpolate.interp1d`` for additional methods and details.
If ``linear`` is chosen, the interpolation is a linear interpolation. For ``nearest``, it uses the value of the nearest data point. ``zero`` uses zero-order interpolation (nearest-neighbor). ``slinear`` and ``quadratic`` are spline interpolations of first and second order, respectively. ``cubic`` is a cubic spline interpolation.
If the method is ``linear``, the ``fill_value`` parameter is set to ``extrapolate`` to allow extrapolation beyond the data range.
Examples:
>>> ts.interpolate_gaps(method="linear", inplace=True)
>>> interpolated_ts = ts.interpolate_gaps(method="linear", inplace=False)
>>> interpolated_ts = ts.interpolate_gaps(method="constant", constant=1, inplace=False)
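A minimal sketch of the linear case, using ``numpy.interp`` in place of ``scipy.interpolate.interp1d`` for brevity (the method itself operates per epoch and supports the other kinds listed above, both of which are omitted here):

```python
import numpy as np

# one epoch of regularly spaced data with an interior gap
t = np.arange(6, dtype=float)
v = np.array([0.0, 1.0, np.nan, np.nan, 4.0, 5.0])

valid = ~np.isnan(v)
# linear interpolation over the valid points, applied only at the gaps
filled = np.where(valid, v, np.interp(t, t[valid], v[valid]))
print(filled)  # approximately [0, 1, 2, 3, 4, 5]
```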
- aggregate(freq, bad_max, agg_funcs=None)[source]
Aggregate the time series data at a specified frequency using various aggregation functions.
- Parameters
freq (str) – Pandas-like frequency alias at which to aggregate the time series data. Common options include ``H`` for hourly, ``D`` for daily, ``W`` for weekly, ``MS`` for month-start, ``QS`` for quarter-start, and ``YS`` for year-start frequency. More options and details can be found in the Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases.
bad_max (int) – The maximum number of ``Bad`` records allowed in a time window for aggregation. Windows with more ``Bad`` entries are excluded from the aggregated result. Default is 7.
agg_funcs (dict, optional) – A dictionary specifying customized aggregation functions for each variable. Default is None, which uses standard aggregation functions (sum, mean, median, min, max, std, var, percentiles).
- Returns
A new ``pandas.DataFrame`` with aggregated values at the specified frequency.
- Return type
``pandas.DataFrame``
Notes:
Resamples the time series data to the specified frequency using Pandas-like alias strings.
Aggregates the values using the specified aggregation functions.
Counts the number of ``Bad`` records in each time window and excludes time windows with more ``Bad`` entries than the specified threshold.
Examples:
>>> agg_result = ts.aggregate(freq='D', agg_funcs={'sum': 'sum', 'mean': 'mean'}, bad_max=5)
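The resample-and-exclude logic can be sketched as follows (the ``Bad`` flag column, the daily frequency, and the mean aggregation are illustrative choices):

```python
import numpy as np
import pandas as pd

# hourly data with a quality flag; "Bad" rows count against each daily window
df = pd.DataFrame({
    "DateTime": pd.date_range("2020-01-01", periods=48, freq="h"),
    "V": np.arange(48, dtype=float),
    "Bad": 0,
})
df.loc[:9, "Bad"] = 1  # the first day holds 10 bad records

g = df.set_index("DateTime").resample("D")
agg = g["V"].mean().to_frame("V_mean")
agg["Bad_count"] = g["Bad"].sum()

bad_max = 7
# exclude windows with more Bad entries than allowed
agg.loc[agg["Bad_count"] > bad_max, "V_mean"] = np.nan
print(agg["V_mean"].tolist())  # [nan, 35.5]
```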
- view_epochs(show=True)[source]
Get a basic visualization. Expected to overwrite superior methods.
- Parameters
show (bool) – option for showing instead of saving.
- Returns
None or file path to figure
- Return type
None or str
Notes:
Uses values in the ``view_specs`` attribute for plotting.
Examples:
Simple visualization:
>>> ds.view(show=True)
Customize view specs:
>>> ds.view_specs["title"] = "My Custom Title"
>>> ds.view_specs["xlabel"] = "The X variable"
>>> ds.view(show=True)
Save the figure:
>>> ds.view_specs["folder"] = "path/to/folder"
>>> ds.view_specs["filename"] = "my_visual"
>>> ds.view_specs["fig_format"] = "png"
>>> ds.view(show=False)
- view(show=True)[source]
Get a basic visualization. Expected to overwrite superior methods.
- Parameters
show (bool) – option for showing instead of saving.
- Returns
None or file path to figure
- Return type
None or str
Notes:
Uses values in the ``view_specs`` attribute for plotting.
Examples:
Simple visualization:
>>> ds.view(show=True)
Customize view specs:
>>> ds.view_specs["title"] = "My Custom Title"
>>> ds.view_specs["xlabel"] = "The X variable"
>>> ds.view(show=True)
Save the figure:
>>> ds.view_specs["folder"] = "path/to/folder"
>>> ds.view_specs["filename"] = "my_visual"
>>> ds.view_specs["fig_format"] = "png"
>>> ds.view(show=False)
- __annotations__ = {}
- class plans.datasets.core._TimeSeries(name='MyTS', alias=None, varname='variable', varfield='V', units='units')[source]
Bases: ``object``
The primitive time series object
- __init__(name='MyTS', alias=None, varname='variable', varfield='V', units='units')[source]
Deploy the time series object.
- Parameters
name (str, optional) – Name for the object. Default is "MyTS".
alias (str, optional) – Alias for the object. Default is None, in which case it is set to the first three characters of the name.
varname (str, optional) – Variable name. Default is "variable".
varfield (str, optional) – Variable field alias. Default is "V".
units (str, optional) – Units of the variable. Default is "units".
Notes:
The ``alias`` parameter defaults to the first three characters of the ``name`` parameter if not provided.
Various attributes related to optional and auto-update information are initialized to ``None`` or default values.
The ``dtfield`` attribute is set to "DateTime" as the default datetime field.
The ``cmap`` attribute is set to "tab20b" as the default colormap.
The ``_set_view_specs`` method is called to set additional specifications.
Examples:
Creating a time series object with default parameters:
>>> ts_default = TimeSeries()
Creating a time series object with custom parameters:
>>> ts_custom = TimeSeries(name="CustomTS", alias="Cust", varname="Temperature", varfield="Temp", units="Celsius")
- load_data(input_file, input_varfield, input_dtfield='DateTime', sep=';', filter_dates=None)[source]
Load data from file.
- Parameters
- Returns
None
- Return type
None
Notes:
Assumes the input file is in CSV format.
Expects a datetime column in the format “YYYY-mm-DD HH:MM:SS”.
Examples:
>>> ts.load_data("data.csv", "Temperature", input_dtfield="Date", sep=",")
Important
The DateTime field in the incoming file must be a full timestamp ``YYYY-mm-DD HH:MM:SS``. Even if the data is a daily time series, make sure to include a constant, default timestamp like the following example:
DateTime; Temperature
2020-02-07 12:00:00; 24.5
2020-02-08 12:00:00; 25.1
2020-02-09 12:00:00; 28.7
2020-02-10 12:00:00; 26.5
- set_data(input_df, input_dtfield, input_varfield, filter_dates=None, dropnan=True)[source]
Set time series data from an input DataFrame.
- Parameters
input_df (``pandas.DataFrame``) – Input DataFrame containing time series data.
input_dtfield (str) – Name of the datetime field in the input DataFrame.
input_varfield (str) – Name of the variable field in the input DataFrame.
filter_dates (list, optional) – List of [Start, End] used for filtering the date range.
dropnan (bool, optional) – If True, drop NaN values from the DataFrame. Default is True.
- Returns
None
- Return type
None
Notes:
Assumes the input DataFrame has a datetime column in the format "YYYY-mm-DD HH:MM:SS".
Renames columns to the standard format (datetime: ``self.dtfield``, variable: ``self.varfield``).
Converts the datetime column to the standard format.
Examples:
>>> input_data = pd.DataFrame({
...     'Date': ['2022-01-01 12:00:00', '2022-01-02 12:00:00', '2022-01-03 12:00:00'],
...     'Temperature': [25.1, 26.5, 24.8]
... })
>>> ts.set_data(input_data, input_dtfield='Date', input_varfield='Temperature')
- export(folder)[source]
Export data (time series and epoch stats) to csv files.
- Parameters
folder (str) – Path to the output folder.
- Returns
None
- Return type
None
- standardize(start=None, end=None)[source]
Standardize the data based on regular datetime steps and the time resolution.
- Parameters
start (``pandas.Timestamp``, optional) – The starting datetime for the standardization. Defaults to the first datetime in the data.
end (``pandas.Timestamp``, optional) – The ending datetime for the standardization. Defaults to the last datetime in the data.
- Returns
None
- Return type
None
Notes:
Handles missing start and end values by using the first and last datetimes in the data.
Standardizes the incoming start and end datetimes to midnight of their respective days.
Creates a full date range with the specified frequency for the standardization period.
Groups the data by epochs (based on the frequency and datetime field), applies the specified aggregation function, and fills in missing values with left merges.
Cuts off any edges with missing data.
Updates internal attributes, including ``self.isstandard``, to indicate that the data has been standardized.
Examples:
>>> ts.standardize()
Warning
The ``standardize`` method modifies the internal data representation. Be sure to review the data after standardization.
- get_epochs(inplace=False)[source]
Get Epochs (periods) for continuous time series (0 = gap epoch).
- Parameters
inplace (bool, optional) – Option to set Epochs in place. Default is False.
- Returns
A DataFrame if inplace is False, otherwise None.
- Return type
``pandas.DataFrame`` or None
Notes:
This function labels continuous chunks of data as Epochs, with Epoch 0 representing gaps in the time series.
Examples:
>>> df_epochs = ts.get_epochs()
- update()[source]
Update internal attributes based on the current data.
- Returns
None
- Return type
None
Notes:
Calls the ``_set_frequency`` method to update the datetime frequency attribute.
Updates the ``start`` attribute with the minimum datetime value in the data.
Updates the ``end`` attribute with the maximum datetime value in the data.
Updates the ``var_min`` attribute with the minimum value of the variable field in the data.
Updates the ``var_max`` attribute with the maximum value of the variable field in the data.
Updates the ``data_size`` attribute with the length of the data.
Examples:
>>> ts.update()
- update_epochs_stats()[source]
Update all epochs statistics.
- Returns
None
- Return type
None
Notes:
This function updates statistics for all epochs in the time series.
Ensures that the data is standardized by calling the ``standardize`` method if it is not already standardized.
Removes epoch 0 from the statistics, since it typically represents non-standardized or invalid data.
Groups the data by ``Epoch_Id`` and calculates statistics such as count, start, and end timestamps for each epoch.
Generates random colors for each epoch using the ``get_random_colors`` function with a specified colormap (the ``cmap`` attribute).
Includes the time series name in the statistics for identification.
Organizes the statistics DataFrame to include the relevant columns: ``Name``, ``Epoch_Id``, ``Count``, ``Start``, ``End``, and ``Color``.
Updates the ``epochs_n`` attribute with the number of epochs in the statistics.
Examples:
>>> ts.update_epochs_stats()
- interpolate_gaps(method='linear', constant=0, inplace=False)[source]
Fill gaps in a time series by interpolating missing values. If the time series is not in standard form, it will be standardized before interpolation.
- Parameters
method (str, optional) – Specifies the interpolation method. Default is ``linear``. Other supported methods include ``constant``, ``nearest``, ``zero``, ``slinear``, ``quadratic``, ``cubic``, etc. The ``constant`` method applies a constant value to gaps. Refer to the documentation of ``scipy.interpolate.interp1d`` for more options.
constant (float, optional) – Value used in the case of the ``constant`` method. Default is 0.
inplace (bool, optional) – If True, the interpolation is performed in place and the original data is modified. If False, a new DataFrame with interpolated values is returned and the original data remains unchanged. Default is False.
- Returns
If inplace is False, a new DataFrame with interpolated values. If inplace is True, returns None, and the original data is modified in place.
- Return type
``pandas.DataFrame`` or None
Notes:
The interpolation is performed for each unique epoch in the time series.
The ``method`` parameter determines the interpolation technique. Common options include ``constant``, ``linear``, ``nearest``, ``zero``, ``slinear``, ``quadratic``, and ``cubic``. See the documentation of ``scipy.interpolate.interp1d`` for additional methods and details.
If ``linear`` is chosen, the interpolation is a linear interpolation. For ``nearest``, it uses the value of the nearest data point. ``zero`` uses zero-order interpolation (nearest-neighbor). ``slinear`` and ``quadratic`` are spline interpolations of first and second order, respectively. ``cubic`` is a cubic spline interpolation.
If the method is ``linear``, the ``fill_value`` parameter is set to ``extrapolate`` to allow extrapolation beyond the data range.
Examples:
>>> ts.interpolate_gaps(method="linear", inplace=True)
>>> interpolated_ts = ts.interpolate_gaps(method="linear", inplace=False)
>>> interpolated_ts = ts.interpolate_gaps(method="constant", constant=1, inplace=False)
- cut_edges(inplace=False)[source]
Cut off initial and final NaN records in a given time series.
- Parameters
inplace (bool, optional) – If True, the operation is performed in place and the original data is modified. If False, a new DataFrame with cut edges is returned and the original data remains unchanged. Default is False.
- Returns
If inplace is False, a new DataFrame with cut edges. If inplace is True, returns None, and the original data is modified in place.
- Return type
``pandas.DataFrame`` or None
Notes:
This function removes leading and trailing rows with NaN values in the specified variable field.
When ``inplace`` is False, the operation is performed on a copy of the data, and the original data remains unchanged.
Examples:
>>> ts.cut_edges(inplace=True)
>>> trimmed_ts = ts.cut_edges(inplace=False)
- aggregate(freq, bad_max, agg_funcs=None)[source]
Aggregate the time series data at a specified frequency using various aggregation functions.
- Parameters
freq (str) – Pandas-like frequency alias at which to aggregate the time series data. Common options include ``H`` for hourly, ``D`` for daily, ``W`` for weekly, ``MS`` for month-start, ``QS`` for quarter-start, and ``YS`` for year-start frequency. More options and details can be found in the Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases.
bad_max (int) – The maximum number of ``Bad`` records allowed in a time window for aggregation. Windows with more ``Bad`` entries are excluded from the aggregated result. Default is 7.
agg_funcs (dict, optional) – A dictionary specifying customized aggregation functions for each variable. Default is None, which uses standard aggregation functions (sum, mean, median, min, max, std, var, percentiles).
- Returns
A new ``pandas.DataFrame`` with aggregated values at the specified frequency.
- Return type
``pandas.DataFrame``
Notes:
Resamples the time series data to the specified frequency using Pandas-like alias strings.
Aggregates the values using the specified aggregation functions.
Counts the number of ``Bad`` records in each time window and excludes time windows with more ``Bad`` entries than the specified threshold.
Examples:
>>> agg_result = ts.aggregate(freq='D', agg_funcs={'sum': 'sum', 'mean': 'mean'}, bad_max=5)
- _set_frequency()[source]
Guess the datetime resolution of a time series based on the consistency of timestamp components (e.g., seconds, minutes).
- Returns
None
- Return type
None
Notes:
This method infers the datetime frequency of the time series data based on the consistency of timestamp components.
Examples:
>>> ts._set_frequency()
- view_epochs(show=True, folder='./output', filename=None, dpi=300, fig_format='jpg', suff='')[source]
- view(show=True, folder='./output', filename=None, dpi=300, fig_format='jpg', suff='')[source]
Visualize the time series data using a scatter plot with colored epochs.
- Parameters
show (bool, optional) – If True, the plot is displayed interactively. If False, the plot is saved to a file. Default is True.
folder (str, optional) – The folder where the plot file will be saved. Used only if show is False. Default is "./output".
filename (str or None, optional) – The base name of the plot file. Used only if show is False. If None, a default filename is generated. Default is None.
dpi (int, optional) – The dots per inch (resolution) of the plot file. Used only if show is False. Default is 300.
fig_format (str, optional) – The format of the plot file. Used only if show is False. Default is "jpg".
raw (bool, optional) – Option for considering a raw data series, with no epochs analysis. Default is False.
- Returns
None. If show is True, the plot is displayed interactively; if show is False, the plot is saved to a file.
- Return type
None
Notes:
This function generates a scatter plot with colored epochs based on the epochs’ start and end times. The plot includes data points within each epoch, and each epoch is labeled with its corresponding ID.
Examples:
>>> ts.view(show=True)
>>> ts.view(show=False, folder="./output", filename="time_series_plot", dpi=300, fig_format="png")
- class plans.datasets.core.TimeSeriesCollection(name='myTSCollection', base_object=None)[source]
Bases: ``Collection``
A collection of time series objects with associated metadata.
The ``TimeSeriesCollection`` (or simply TSC) class extends the ``Collection`` class and is designed to handle time series data.
- __init__(name='myTSCollection', base_object=None)[source]
Deploy the time series collection data structure.
- Parameters
name (str, optional) – Name of the time series collection. Default is "myTSCollection".
base_object (TimeSeries or None, optional) – Base object for the time series collection. If None, a default TimeSeries object is created. Default is None.
Notes:
If ``base_object`` is not provided, a default ``TimeSeries`` object is created.
Examples:
>>> ts_collection = TimeSeriesCollection(name="MyTSCollection")
- update(details=False)[source]
Update the time series collection.
- Parameters
details (bool, optional) – If True, update additional details. Default is False.
Examples:
>>> ts_collection.update(details=True)
- load_data(table_file, filter_dates=None)[source]
Load data from table file (information table) into the time series collection.
- Parameters
table_file (str) – Path to file. Expected to be a ``csv`` table.
(todo: place this in IO files docs) Required columns:
``Id``: int, required. Unique number id.
``Name``: str, required. Simple name.
``Alias``: str, required. Short nickname.
``X``: float, required. Longitude in WGS 84 Datum (EPSG4326).
``Y``: float, required. Latitude in WGS 84 Datum (EPSG4326).
``Code``: str, required.
``Source``: str, required.
``Description``: str, required.
``Color``: str, required.
``Units`` or ``<Varname>_Units``: str, required. Units of data.
``VarField`` or ``<Varname>_VarField``: str, required. Variable column in the data file.
``DtField`` or ``<Varname>_DtField``: str, required. Date-time column in the data file.
``File`` or ``<Varname>_File``: str, required. Name of or path to the data time series ``csv`` file.
Examples:
>>> ts_collection.load_data(table_file="data.csv")
- set_data(df_info, src_dir=None, filter_dates=None)[source]
Set data for the time series collection from a info DataFrame.
- Parameters
df_info (``pandas.DataFrame``) – DataFrame containing metadata information for the time series collection. This DataFrame is expected to have fields matching the metadata keys.
Required fields:
``Id``: int, required. Unique number id.
``Name``: str, required. Simple name.
``Alias``: str, required. Short nickname.
``X``: float, required. Longitude in WGS 84 Datum (EPSG4326).
``Y``: float, required. Latitude in WGS 84 Datum (EPSG4326).
``Code``: str, required.
``Source``: str, required.
``Description``: str, required.
``Color``: str, optional.
``Units`` or ``<Varname>_Units``: str, required. Units of data.
``VarField`` or ``<Varname>_VarField``: str, required. Variable column in the data file.
``DtField`` or ``<Varname>_DtField``: str, required. Date-time column in the data file.
``File`` or ``<Varname>_File``: str, required. Name of or path to the data time series ``csv`` file.
src_dir (str, optional) – Path to the input directory, for the case where the ``File`` column holds only file names.
filter_dates (list, optional) – List of Start and End dates for filtering the data.
Notes:
The ``set_data`` method populates the time series collection with data based on the provided DataFrame.
It creates time series objects, loads data, and performs additional processing steps.
Adjust ``skip_process`` according to your data processing needs.
Examples:
>>> ts_collection.set_data(df, "path/to/data", filter_dates=["2020-01-01 00:00:00", "2020-03-12 00:00:00"])
- clear_outliers()[source]
Clear outliers in the collection based on the ``datarange_min`` and ``datarange_max`` attributes.
- Returns
None
- Return type
None
- merge_data()[source]
Merge data from multiple sources into a single DataFrame.
- Returns
DataFrame A merged DataFrame with datetime and variable fields from different sources.
- Return type
pandas.DataFrame
Notes:
Updates the catalog details.
Merges data from different sources based on the specified datetime field and variable field.
The merged DataFrame includes a date range covering the entire period.
Examples:
>>> merged_df = ts.merge_data()
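The merge strategy described in the notes can be sketched as follows (the source names ``Temp`` and ``Rain`` are illustrative):

```python
import pandas as pd

# two sources with different coverage
temp = pd.DataFrame({"DateTime": pd.date_range("2020-01-01", periods=3, freq="D"),
                     "Temp": [20.0, 21.0, 22.0]})
rain = pd.DataFrame({"DateTime": pd.date_range("2020-01-02", periods=3, freq="D"),
                     "Rain": [0.0, 5.0, 1.0]})

# date range covering the entire period, then one left merge per source
start = min(temp["DateTime"].min(), rain["DateTime"].min())
end = max(temp["DateTime"].max(), rain["DateTime"].max())
merged = pd.DataFrame({"DateTime": pd.date_range(start, end, freq="D")})
for src in (temp, rain):
    merged = merged.merge(src, on="DateTime", how="left")
print(merged.shape)  # (4, 3)
```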
- standardize()[source]
Standardize the time series data.
This method standardizes all time series objects in the collection.
- Returns
None
- Return type
None
Notes:
The method iterates through each time series in the collection and standardizes it.
After standardizing individual time series, the data is merged.
The merged data is then reset for each individual time series in the collection.
Epoch statistics are updated for each time series after the reset.
Finally, the collection catalog is updated with details.
Examples:
>>> ts_collection = TimeSeriesCollection()
>>> ts_collection.standardize()
- merge_local_epochs()[source]
Merge local epochs statistics from individual time series within the collection.
- Returns
Merged ``pandas.DataFrame`` containing epochs statistics.
- Return type
``pandas.DataFrame``
Notes:
This method creates an empty list to store individual epochs statistics dataframes.
It iterates through each time series in the collection.
For each time series, it updates the local epochs statistics using the
update_epochs_stats
method. The local epochs statistics dataframes are then appended to the list.
Finally, the list of dataframes is concatenated into a single dataframe.
Examples:
>>> ts_collection = TimeSeriesCollection()
>>> merged_epochs = ts_collection.merge_local_epochs()
- get_epochs()[source]
Calculate epochs for the time series data.
- Returns
DataFrame with epochs information.
- Return type
pandas.DataFrame
Notes:
This method merges the time series data to create a working DataFrame.
It creates a copy of the DataFrame for NaN-value calculation.
Converts non-NaN values to 1 and NaN values to 0.
Calculates the sum of non-NaN values for each row and updates the DataFrame with the result.
Extracts relevant columns for epoch calculation.
Sets 0 values in the specified
overfield
column to NaN. Creates a new
TimeSeries
instance for epoch calculation using the
overfield
values. Calculates epochs using the
get_epochs
method of the new
TimeSeries
instance. Updates the original DataFrame with the calculated epochs.
Examples:
>>> epochs_data = ts_instance.get_epochs()
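The NaN-count step described in the notes can be sketched as follows (a minimal illustration under stated assumptions, not the library's code; column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Non-NaN values become 1, NaN values become 0; the row-wise sum
# counts how many variables have data at each timestamp.
df = pd.DataFrame({
    "V1": [1.0, np.nan, 3.0, np.nan],
    "V2": [np.nan, np.nan, 2.0, 4.0],
})
count = df.notna().astype(int).sum(axis=1)
# rows where no variable has data separate the epochs
is_gap = count == 0
```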
- view(show=True, folder='./output', filename=None, dpi=300, fig_format='jpg', suff='', usealias=False)[source]
Visualize the time series collection.
- Parameters
show (bool) – bool, optional If True, the plot will be displayed interactively. If False, the plot will be saved to a file. Default is True.
folder (str) – str, optional The folder where the plot file will be saved. Used only if show is False. Default is “./output”.
filename (str or None) – str, optional The base name of the plot file. Used only if show is False. If None, a default filename is generated. Default is None.
dpi (int) – int, optional The dots per inch (resolution) of the plot file. Used only if show is False. Default is 300.
fig_format (str) – str, optional The format of the plot file. Used only if show is False. Default is “jpg”.
usealias (bool) – bool, optional Option for using the Alias instead of Name in the plot. Default is False.
- Returns
None If show is True, the plot is displayed interactively. If show is False, the plot is saved to a file.
- Return type
None
Notes:
This function generates a scatter plot with colored epochs based on the epochs’ start and end times. The plot includes data points within each epoch, and each epoch is labeled with its corresponding ID.
Examples:
>>> ts.view(show=True)
>>> ts.view(show=False, folder="./output", filename="time_series_plot", dpi=300, fig_format="png")
- export_views(folder, dpi=300, fig_format='jpg', suff='', skip_main=False, raw=False)[source]
Export views of time series data and individual time series within the collection.
- Parameters
folder (str) – str The folder path where the views will be exported.
dpi (int) – int, optional Dots per inch (resolution) for the exported images, default is 300.
fig_format (str) – str, optional Format for the exported figures, default is “jpg”.
suff (str) – str, optional Suffix to be appended to the exported file names, default is an empty string.
skip_main (bool) – bool, optional Option for skipping the main plot (panel). Default is False.
raw (bool) – bool, optional Option for considering a raw data series (no epochs analysis). Default is False.
- Returns
None
- Return type
None
Notes:
Updates the collection details and epoch statistics.
Calls the
view
method for the entire collection and for individual time series with the specified parameters. Sets view specifications for individual time series, such as y-axis limits and time range.
Examples:
>>> tscoll.export_views(folder="/path/to/export", dpi=300, fig_format="jpg", suff="_views")
- __annotations__ = {}
- class plans.datasets.core.TimeSeriesCluster(name='myTimeSeriesCluster', base_object=None)[source]
Bases:
TimeSeriesCollection
The
TimeSeriesCluster
instance is designed for holding a collection of same-variable time series. That is, no miscellaneous data is allowed.
- __init__(name='myTimeSeriesCluster', base_object=None)[source]
Deploy the time series collection data structure.
- Parameters
name (str) – str, optional Name of the time series collection. Default is “myTSCollection”.
base_object (TimeSeries or None) – TimeSeries, optional Base object for the time series collection. If None, a default TimeSeries object is created. Default is None.
Notes:
If
base_object
is not provided, a default
TimeSeries
object is created.
Examples:
>>> ts_collection = TimeSeriesCollection(name="MyTSCollection")
- __annotations__ = {}
- class plans.datasets.core.TimeSeriesSamples(name='myTimeSeriesSamples', base_object=None)[source]
Bases:
TimeSeriesCluster
The
TimeSeriesSamples
instance is designed for holding a collection of same-variable time series arising from the same underlying process. This means that all elements in the collection are statistical samples. This instance allows for the
reducer()
method.- __init__(name='myTimeSeriesSamples', base_object=None)[source]
Deploy the time series collection data structure.
- Parameters
name (str) – str, optional Name of the time series collection. Default is “myTSCollection”.
base_object (TimeSeries or None) – TimeSeries, optional Base object for the time series collection. If None, a default TimeSeries object is created. Default is None.
Notes:
If
base_object
is not provided, a default
TimeSeries
object is created.
Examples:
>>> ts_collection = TimeSeriesCollection(name="MyTSCollection")
- __annotations__ = {}
- class plans.datasets.core.TimeSeriesSpatialSamples(name='myTimeSeriesSpatialSample', base_object=None)[source]
Bases:
TimeSeriesSamples
The
TimeSeriesSpatialSamples
instance is designed for holding a collection of same-variable time series arising from the same underlying process in space. This means that all elements in the collection are statistical samples in space. This instance allows for the
regionalization()
method.- __init__(name='myTimeSeriesSpatialSample', base_object=None)[source]
Deploy the time series collection data structure.
- Parameters
name (str) – str, optional Name of the time series collection. Default is “myTSCollection”.
base_object (TimeSeries or None) – TimeSeries, optional Base object for the time series collection. If None, a default TimeSeries object is created. Default is None.
Notes:
If
base_object
is not provided, a default
TimeSeries
object is created.
Examples:
>>> ts_collection = TimeSeriesCollection(name="MyTSCollection")
- regionalize(method='average')[source]
Regionalize the time series data using a specified method.
- Parameters
method (str) – str, optional Method for regionalization, default is “average”.
- Returns
None
- Return type
None
Notes:
This method handles standardization. If the time series data is not standardized, it applies standardization.
Computes epochs for the time series data.
Iterates through each time series in the collection and performs regionalization.
For each time series, sets up source and destination vectors, computes weights, and calculates regionalized values.
Updates the destination column in-place with the regionalized values and updates epochs statistics.
Updates the collection catalog with details.
Examples:
>>> ts_instance = YourTimeSeriesClass()
>>> ts_instance.regionalize(method="average")
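The "average" method can be sketched as follows (an assumption-laden illustration: real regionalization may use spatial weights, while this sketch fills gaps with the plain mean of the source series):

```python
import numpy as np

# Gaps (NaN) in a destination series are filled with the mean across
# the source series at the same timestamps.
sources = np.array([
    [1.0, 2.0, 3.0],   # source series A
    [3.0, 4.0, 5.0],   # source series B
])
dst = np.array([np.nan, 4.0, np.nan])
fill = sources.mean(axis=0)
regionalized = np.where(np.isnan(dst), fill, dst)
```

Observed destination values are kept in place; only the gaps are filled.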
- __annotations__ = {}
- class plans.datasets.core.Raster(name='myRasterMap', dtype='float32')[source]
Bases:
object
The basic raster map dataset.
- __init__(name='myRasterMap', dtype='float32')[source]
Deploy a basic raster map object.
- Parameters
Attributes:
grid (None): Main grid of the raster.
backup_grid (None): Backup grid for AOI operations.
isaoi (False): Flag indicating whether an AOI mask is applied.
asc_metadata (dict): Metadata dictionary with keys: ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value.
nodatavalue (None): NODATA value from asc_metadata.
cellsize (None): Cell size from asc_metadata.
name (str): Name of the raster map.
dtype (str): Data type of raster cells.
cmap (“jet”): Default color map for visualization.
varname (“Unknown variable”): Variable name associated with the raster.
varalias (“Var”): Variable alias.
description (None): Description of the raster map.
units (“units”): Measurement units of the raster values.
date (None): Date associated with the raster map.
source_data (None): Source data information.
prj (None): Projection information.
path_ascfile (None): Path to the .asc raster file.
path_prjfile (None): Path to the .prj projection file.
view_specs (None): View specifications for visualization.
Examples:
>>> # Create a raster map with default settings
>>> raster = Raster()
>>> # Create a raster map with custom name and data type
>>> custom_raster = Raster(name="CustomRaster", dtype="int16")
- set_grid(grid)[source]
Set the data grid for the raster object.
This function allows setting the data grid for the raster object. The incoming grid should be a NumPy array.
- Parameters
grid (
numpy.ndarray
) –numpy.ndarray
The data grid to be set for the raster.
Notes:
The function overwrites the existing data grid in the raster object with the incoming grid, ensuring that the data type matches the raster’s dtype.
Nodata values are masked after setting the grid.
Examples:
>>> # Example of setting a new grid
>>> new_grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> raster.set_grid(new_grid)
- set_asc_metadata(metadata)[source]
Set metadata for the raster object based on incoming metadata.
This function allows setting metadata for the raster object from an incoming metadata dictionary. The metadata should include information such as the number of columns, number of rows, corner coordinates, cell size, and nodata value.
- Parameters
metadata (dict) –
dict A dictionary containing metadata for the raster. Example metadata for a
.asc
file raster:meta = { 'ncols': 366, 'nrows': 434, 'xllcorner': 559493.08, 'yllcorner': 6704832.2, 'cellsize': 30, 'NODATA_value': -1 }
Notes:
The function updates the raster object’s metadata based on the provided dictionary, ensuring that existing metadata keys are preserved.
It specifically updates nodata value and cell size attributes in the raster object.
Examples:
>>> # Example of setting metadata
>>> metadata_dict = {'ncols': 200, 'nrows': 300, 'xllcorner': 500000.0, 'yllcorner': 6000000.0, 'cellsize': 25, 'NODATA_value': -9999}
>>> raster.set_asc_metadata(metadata_dict)
- load(asc_file, prj_file=None)[source]
Load data from files to the raster object.
This function loads data from
.asc
raster and ‘.prj’ projection files into the raster object.- Parameters
- Returns
None
- Return type
None
Notes:
The function first loads the raster data from the
.asc
file using the
load_asc_raster
method. If a ‘.prj’ file is not explicitly provided, the function attempts to use a ‘.prj’ file with the same path and name as the
method.If a ‘.prj’ file is not explicitly provided, the function attempts to use a ‘.prj’ file with the same path and name as the
.asc
file.The function then loads the projection information from the ‘.prj’ file using the
load_prj_file
method.
Examples:
>>> # Example of loading data
>>> raster.load(asc_file="path/to/raster.asc")
>>> # Example of loading data with a specified projection file
>>> raster.load(asc_file="path/to/raster.asc", prj_file="path/to/raster.prj")
- load_tif_raster(file)[source]
Load data from ‘.tif’ raster files.
This function loads data from ‘.tif’ raster files into the raster object. Note that metadata may be provided from other sources.
- Parameters
file (str) – str The file path of the ‘.tif’ raster file.
- Returns
None
- Return type
None
Notes:
The function uses the Pillow (PIL) library to open the ‘.tif’ file and converts it to a NumPy array.
Metadata may need to be provided separately, as this function focuses on loading raster data.
The loaded data grid is set using the
set_grid
method of the raster object.
Examples:
>>> # Example of loading data from a '.tif' file
>>> raster.load_tif_raster(file="path/to/raster.tif")
- load_asc_raster(file)[source]
Load data and metadata from
.asc
raster files.This function loads both data and metadata from
.asc
raster files into the raster object.- Parameters
file (str) – str The file path to the
.asc
raster file.- Returns
None
- Return type
None
Notes:
The function reads the content of the
.asc
file, extracts metadata, and constructs the data grid.The metadata includes information such as the number of columns, number of rows, corner coordinates, cell size, and nodata value.
The data grid is constructed from the array information provided in the
.asc
file.The function depends on the existence of a properly formatted
.asc
file.No additional dependencies beyond standard Python libraries are required.
Examples:
>>> # Example of loading data and metadata from a ``.asc`` file
>>> raster.load_asc_raster(file="path/to/raster.asc")
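The parsing described in the notes can be sketched with the standard library and NumPy (a minimal sketch of the file format, not the library's parser):

```python
import numpy as np

# A ``.asc`` file has six "key value" header lines followed by
# whitespace-separated rows of grid values.
asc_text = (
    "ncols 3\n"
    "nrows 2\n"
    "xllcorner 500000.0\n"
    "yllcorner 6000000.0\n"
    "cellsize 30.0\n"
    "NODATA_value -1\n"
    "1 2 3\n"
    "4 5 6\n"
)
lines = asc_text.strip().splitlines()
meta = {}
for line in lines[:6]:
    key, value = line.split()
    meta[key] = float(value)
grid = np.array([row.split() for row in lines[6:]], dtype="float32")
```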
- load_asc_metadata(file)[source]
Load only metadata from
.asc
raster files.This function extracts metadata from
.asc
raster files and sets it as attributes in the raster object.- Parameters
file (str) – str The file path to the
.asc
raster file.- Returns
None
- Return type
None
Notes:
The function reads the first six lines of the
.asc
file to extract metadata.Metadata includes information such as the number of columns, number of rows, corner coordinates, cell size, and nodata value.
The function sets the metadata as attributes in the raster object using the
set_asc_metadata
method.This function is useful when only metadata needs to be loaded without the entire data grid.
Examples:
>>> # Example of loading metadata from a ``.asc`` file
>>> raster.load_asc_metadata(file="path/to/raster.asc")
- load_prj_file(file)[source]
Load ‘.prj’ auxiliary file to the ‘prj’ attribute.
This function loads the content of a ‘.prj’ auxiliary file and sets it as the ‘prj’ attribute in the raster object.
- Parameters
file (str) – str The file path to the ‘.prj’ auxiliary file.
- Returns
None
- Return type
None
Notes:
The function reads the content of the ‘.prj’ file and assigns it to the ‘prj’ attribute.
The ‘prj’ attribute typically contains coordinate system information in Well-Known Text (WKT) format.
This function is useful for associating coordinate system information with raster data.
Examples:
>>> # Example of loading coordinate system information from a '.prj' file
>>> raster.load_prj_file(file="path/to/raster.prj")
- copy_structure(raster_ref, n_nodatavalue=None)[source]
Copy structure (asc_metadata and prj file) from another raster object.
This function copies the structure, including asc_metadata and prj, from another raster object to the current raster object.
- Parameters
raster_ref (
datasets.Raster
) –datasets.Raster
The reference incoming raster object from which to copy asc_metadata and prj.n_nodatavalue (float) – float, optional The new nodata value for different raster objects. If None, the nodata value remains unchanged.
- Returns
None
- Return type
None
Notes:
The function copies the asc_metadata and prj attributes from the reference raster object to the current raster object.
If a new nodata value is provided, it updates the ‘NODATA_value’ in the copied asc_metadata.
This function is useful for ensuring consistency in metadata and coordinate system information between raster objects.
Examples:
>>> # Example of copying structure from a reference raster object
>>> new_raster.copy_structure(raster_ref=reference_raster, n_nodatavalue=-9999.0)
- export(folder, filename=None)[source]
Export raster data to a folder.
This function exports raster data, including the
.asc
raster file and ‘.prj’ projection file, to the specified folder.- Parameters
- Returns
None
- Return type
None
Notes:
The function exports the raster data to the specified folder, creating
.asc
and ‘.prj’ files.If a filename is not provided, the function uses the name of the raster object.
The exported files will have the same filename with different extensions (
.asc
and ‘.prj’).This function is useful for saving raster data to a specified directory.
Examples:
>>> # Example of exporting raster data to a folder
>>> raster.export(folder="path/to/export_folder", filename="exported_raster")
- export_asc_raster(folder, filename=None)[source]
Export an
.asc
raster file.This function exports the raster data as an
.asc
file to the specified folder.- Parameters
- Returns
str The full file name (path and extension) of the exported
.asc
raster file.
- Return type
str
Notes:
The function exports the raster data to an
.asc
file in the specified folder.If a filename is not provided, the function uses the name of the raster object.
The exported
.asc
file contains metadata and data information.This function is useful for saving raster data in ASCII format.
Examples:
>>> # Example of exporting an ``.asc`` raster file to a folder
>>> raster.export_asc_raster(folder="path/to/export_folder", filename="exported_raster")
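The export described in the notes can be sketched as follows (an illustration of the file layout, not the library's writer; `io.StringIO` stands in for the output file):

```python
import io
import numpy as np

# Header lines first, then grid rows, with NaN cells written back
# as the NODATA value.
meta = {"ncols": 3, "nrows": 2, "xllcorner": 0.0,
        "yllcorner": 0.0, "cellsize": 30.0, "NODATA_value": -1.0}
grid = np.array([[1.0, np.nan, 3.0], [4.0, 5.0, 6.0]])
out = io.StringIO()
for key, value in meta.items():
    out.write(f"{key} {value}\n")
filled = np.where(np.isnan(grid), meta["NODATA_value"], grid)
for row in filled:
    out.write(" ".join(str(v) for v in row) + "\n")
content = out.getvalue()
```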
- export_prj_file(folder, filename=None)[source]
Export a ‘.prj’ file.
This function exports the coordinate system information to a ‘.prj’ file in the specified folder.
- Parameters
- Returns
str or None The full file name (path and extension) of the exported ‘.prj’ file, or None if no coordinate system information is available.
- Return type
str or None
Notes:
The function exports the coordinate system information to a ‘.prj’ file in the specified folder.
If a filename is not provided, the function uses the name of the raster object.
The exported ‘.prj’ file contains coordinate system information in Well-Known Text (WKT) format.
This function is useful for saving coordinate system information associated with raster data.
Examples:
>>> # Example of exporting a '.prj' file to a folder
>>> raster.export_prj_file(folder="path/to/export_folder", filename="exported_prj")
- mask_nodata()[source]
Mask grid cells as NaN where data is NODATA.
- Returns
None
- Return type
None
Notes:
The function masks grid cells as NaN where the data is equal to the specified NODATA value.
If NODATA value is not set, no masking is performed.
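The masking rule can be sketched in one NumPy expression (a minimal sketch, not the library's code):

```python
import numpy as np

# Cells equal to the NODATA value become NaN so downstream
# statistics can ignore them.
nodatavalue = -9999.0
grid = np.array([[1.0, -9999.0], [3.0, 4.0]])
masked = np.where(grid == nodatavalue, np.nan, grid)
```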
- insert_nodata()[source]
Insert grid cells as NODATA where data is NaN.
- Returns
None
- Return type
None
Notes:
The function inserts NODATA values into grid cells where the data is NaN.
If NODATA value is not set, no insertion is performed.
- rebase_grid(base_raster, inplace=False, method='linear_model')[source]
Rebase the grid of a raster.
This function creates a new grid based on a provided reference raster. Both rasters are expected to be in the same coordinate system and have overlapping bounding boxes.
- Parameters
base_raster (
datasets.Raster
) –datasets.Raster
The reference raster used for rebase. It should be in the same coordinate system and have overlapping bounding boxes.inplace (bool) – bool, optional If True, the rebase operation will be performed in-place, and the original raster’s grid will be modified. If False, a new rebased grid will be returned, and the original data will remain unchanged. Default is False.
method (str) – str, optional Interpolation method for rebasing the grid. Options include “linear_model,” “nearest,” and “cubic.” Default is “linear_model.”
- Returns
numpy.ndarray or None
If inplace is False, a new rebased grid as a NumPy array. If inplace is True, returns None, and the original raster’s grid is modified in-place.
- Return type
numpy.ndarray or None
Notes:
The rebase operation involves interpolating the values of the original grid to align with the reference raster’s grid.
The method parameter specifies the interpolation method and can be “linear_model,” “nearest,” or “cubic.”
The rebase assumes that both rasters are in the same coordinate system and have overlapping bounding boxes.
Examples:
>>> # Example with inplace=True
>>> raster.rebase_grid(base_raster=reference_raster, inplace=True)
>>> # Example with inplace=False
>>> rebased_grid = raster.rebase_grid(base_raster=reference_raster, inplace=False)
- apply_aoi_mask(grid_aoi, inplace=False)[source]
Apply AOI (area of interest) mask to the raster map.
This function applies an AOI (area of interest) mask to the raster map, replacing values outside the AOI with the NODATA value.
- Parameters
grid_aoi (
numpy.ndarray
) –numpy.ndarray
Map of AOI (masked array or pseudo-boolean). Expected to have the same grid shape as the raster.inplace (bool) – bool, optional If True, overwrite the main grid with the masked values. If False, create a backup and modify a copy of the grid. Default is False.
- Returns
None
- Return type
None
Notes:
The function replaces values outside the AOI (where grid_aoi is 0) with the NODATA value.
If NODATA value is not set, no replacement is performed.
If inplace is True, the main grid is modified. If False, a backup of the grid is created before modification.
This function is useful for focusing analysis or visualization on a specific area within the raster map.
Examples:
>>> # Example of applying an AOI mask to the raster map
>>> raster.apply_aoi_mask(grid_aoi=aoi_mask, inplace=True)
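The AOI rule from the notes can be sketched as follows (a minimal sketch under stated assumptions, not the library's implementation):

```python
import numpy as np

# Where the pseudo-boolean AOI grid is 0, the cell is replaced with
# the NODATA value; a backup copy allows release_aoi_mask() to
# restore the original values later.
nodatavalue = -1.0
grid = np.array([[10.0, 20.0], [30.0, 40.0]])
grid_aoi = np.array([[1, 0], [1, 1]])
backup_grid = grid.copy()
grid = np.where(grid_aoi == 0, nodatavalue, grid)
```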
- release_aoi_mask()[source]
Release AOI mask from the main grid. Backup grid is restored.
This function releases the AOI (area of interest) mask from the main grid, restoring the original values from the backup grid.
- Returns
None
- Return type
None
Notes:
If an AOI mask has been applied, this function restores the original values to the main grid from the backup grid.
If no AOI mask has been applied, the function has no effect.
After releasing the AOI mask, the backup grid is set to None, and the raster object is no longer considered to have an AOI mask.
Examples:
>>> # Example of releasing the AOI mask from the main grid
>>> raster.release_aoi_mask()
- cut_edges(upper, lower, inplace=False)[source]
Cutoff upper and lower values of the raster grid.
- Parameters
- Returns
numpy.ndarray or None
The processed grid if inplace is False. If inplace is True, returns None.
- Return type
Union[None, np.ndarray]
Notes:
Values in the raster grid below the lower value are set to the lower value.
Values in the raster grid above the upper value are set to the upper value.
If inplace is False, a processed copy of the grid is returned, leaving the original grid unchanged.
This function is useful for clipping extreme values in the raster grid.
Examples:
>>> # Example of cutting off upper and lower values in the raster grid
>>> processed_grid = raster.cut_edges(upper=100, lower=0, inplace=False)
>>> # Alternatively, modify the main grid in-place
>>> raster.cut_edges(upper=100, lower=0, inplace=True)
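The cutoff amounts to clipping the grid to the [lower, upper] interval, which NumPy provides directly (a minimal sketch of the rule, not the library's code):

```python
import numpy as np

# Values below lower are set to lower; values above upper to upper.
grid = np.array([[-5.0, 50.0], [120.0, 80.0]])
clipped = np.clip(grid, 0.0, 100.0)   # lower=0, upper=100
```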
- get_metadata()[source]
Get all metadata from the base object.
- Returns
Metadata dictionary.
”Name” (str): Name of the raster.
”Variable” (str): Variable name.
”VarAlias” (str): Variable alias.
”Units” (str): Measurement units.
”Date” (str): Date information.
”Source” (str): Data source.
”Description” (str): Description of the raster.
”cellsize” (float): Cell size of the raster.
”ncols” (int): Number of columns in the raster grid.
”nrows” (int): Number of rows in the raster grid.
”xllcorner” (float): X-coordinate of the lower-left corner.
”yllcorner” (float): Y-coordinate of the lower-left corner.
”NODATA_value” (Union[float, None]): NODATA value in the raster.
”Prj” (str): Projection information.
”Path_ASC” (str): File path to the ASC raster file.
”Path_PRJ” (str): File path to the PRJ projection file.
- Return type
dict
- get_bbox()[source]
Get the Bounding Box of the map.
- Returns
Dictionary of xmin, xmax, ymin, and ymax.
”xmin” (float): Minimum x-coordinate.
”xmax” (float): Maximum x-coordinate.
”ymin” (float): Minimum y-coordinate.
”ymax” (float): Maximum y-coordinate.
- Return type
dict
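The bounding-box arithmetic follows directly from the `.asc` metadata (a sketch under the usual convention that the metadata gives the lower-left corner):

```python
# The upper-right corner is the lower-left corner plus the cell
# count times the cell size along each axis.
meta = {"ncols": 3, "nrows": 2, "xllcorner": 100.0,
        "yllcorner": 200.0, "cellsize": 30.0}
bbox = {
    "xmin": meta["xllcorner"],
    "xmax": meta["xllcorner"] + meta["ncols"] * meta["cellsize"],
    "ymin": meta["yllcorner"],
    "ymax": meta["yllcorner"] + meta["nrows"] * meta["cellsize"],
}
```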
- get_grid_datapoints(drop_nan=False)[source]
Get flat and cleared grid data points (x, y, and z).
- Parameters
drop_nan (bool) – Option to ignore nan values.
- Returns
DataFrame of x, y, and z fields.
- Return type
pandas.DataFrame or None
If the grid is None, returns None.
Notes:
This function extracts coordinates (x, y, and z) from the raster grid.
The x and y coordinates are determined based on the grid cell center positions.
If drop_nan is True, nan values are ignored in the resulting DataFrame.
The resulting DataFrame includes columns for x, y, z, i, and j coordinates.
Examples:
>>> # Get grid data points with nan values included
>>> datapoints_df = raster.get_grid_datapoints(drop_nan=False)
>>> # Get grid data points with nan values ignored
>>> clean_datapoints_df = raster.get_grid_datapoints(drop_nan=True)
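The cell-center coordinates mentioned in the notes can be sketched with `numpy.meshgrid` (an illustration of the geometry, not the library's code):

```python
import numpy as np

# x grows with column index j from the left edge; y grows with row
# index i measured down from the top row of the grid.
meta = {"ncols": 3, "nrows": 2, "xllcorner": 0.0,
        "yllcorner": 0.0, "cellsize": 10.0}
jj, ii = np.meshgrid(np.arange(meta["ncols"]), np.arange(meta["nrows"]))
xs = meta["xllcorner"] + (jj + 0.5) * meta["cellsize"]
ys = meta["yllcorner"] + (meta["nrows"] - ii - 0.5) * meta["cellsize"]
```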
- get_grid_data()[source]
Get flat and cleared grid data.
- Returns
1D vector of cleared data.
- Return type
numpy.ndarray or None
If the grid is None, returns None.
Notes:
This function extracts and flattens the grid data, removing any masked or NaN values.
For integer grids, the masked values are ignored.
For floating-point grids, both masked and NaN values are ignored.
Examples:
>>> # Get flattened and cleared grid data >>> data_vector = raster.get_grid_data()
- get_grid_stats()[source]
Get basic statistics from flat and cleared data.
- Returns
DataFrame of basic statistics.
- Return type
pandas.DataFrame or None
If the grid is None, returns None.
Notes:
This function computes basic statistics from the flattened and cleared grid data.
Basic statistics include measures such as mean, median, standard deviation, minimum, and maximum.
Requires the ‘plans.analyst’ module for statistical analysis.
Examples:
>>> # Get basic statistics from the raster grid
>>> stats_dataframe = raster.get_grid_stats()
- get_aoi(by_value_lo, by_value_hi)[source]
Get the AOI map from an interval of values (values are expected to exist in the raster).
- Parameters
- Returns
AOI map.
- Return type
AOI
object
Notes:
This function creates an AOI (Area of Interest) map based on a specified value range.
The AOI map is constructed as a binary grid where values within the specified range are set to 1, and others to 0.
Examples:
>>> # Get AOI map for values between 10 and 20
>>> aoi_map = raster.get_aoi(by_value_lo=10, by_value_hi=20)
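The interval rule from the notes reduces to a boolean mask (a minimal sketch, not the library's code):

```python
import numpy as np

# Cells inside [by_value_lo, by_value_hi] become 1, all others 0.
grid = np.array([[5.0, 15.0], [25.0, 12.0]])
by_value_lo, by_value_hi = 10.0, 20.0
aoi_grid = np.where((grid >= by_value_lo) & (grid <= by_value_hi), 1, 0)
```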
- _set_view_specs()[source]
Set default view specs.
- Returns
None
- Return type
None
Notes:
This private method sets default view specifications for visualization.
The view specs include color, colormap, titles, dimensions, and other parameters for visualization.
These default values can be adjusted based on specific requirements.
Examples:
>>> # Set default view specifications
>>> obj._set_view_specs()
- view(accum=True, show=True, stats=True, folder='./output', filename=None, dpi=300, fig_format='jpg')[source]
Plot a basic panel of the raster map.
- Parameters
accum (bool) – boolean to include an accumulated probability plot, defaults to True
show (bool) – boolean to show the plot instead of saving, defaults to True
stats (bool) – boolean to include stats, defaults to True
folder (str) – path to the output folder, defaults to “./output”
filename (str) – name of the file, defaults to None
dpi (int) – image resolution, defaults to 300
fig_format (str) – image format (e.g., jpg or png), defaults to “jpg”
Notes:
This function generates a basic panel for visualizing the raster map, including the map itself, a histogram, metadata, and basic statistics.
The panel includes various customization options such as color, titles, dimensions, and more.
The resulting plot can be displayed or saved based on the specified parameters.
Examples:
>>> # Show the plot without saving
>>> raster.view()
>>> # Save the plot to a file
>>> raster.view(show=False, folder="./output", filename="raster_plot", dpi=300, fig_format="png")
- class plans.datasets.core.QualiRaster(name='QualiMap', dtype='uint8')[source]
Bases:
Raster
Basic qualitative raster map dataset.
Attributes dataframe must at least have:
* Id field
* Name field
* Alias field
- set_asc_metadata(metadata)[source]
Set metadata for the raster object based on incoming metadata.
This function allows setting metadata for the raster object from an incoming metadata dictionary. The metadata should include information such as the number of columns, number of rows, corner coordinates, cell size, and nodata value.
- Parameters
metadata (dict) –
dict A dictionary containing metadata for the raster. Example metadata for a
.asc
file raster:meta = { 'ncols': 366, 'nrows': 434, 'xllcorner': 559493.08, 'yllcorner': 6704832.2, 'cellsize': 30, 'NODATA_value': -1 }
Notes:
The function updates the raster object’s metadata based on the provided dictionary, ensuring that existing metadata keys are preserved.
It specifically updates nodata value and cell size attributes in the raster object.
Examples:
>>> # Example of setting metadata
>>> metadata_dict = {'ncols': 200, 'nrows': 300, 'xllcorner': 500000.0, 'yllcorner': 6000000.0, 'cellsize': 25, 'NODATA_value': -9999}
>>> raster.set_asc_metadata(metadata_dict)
- rebase_grid(base_raster, inplace=False)[source]
Rebase the grid of a raster.
This function creates a new grid based on a provided reference raster. Both rasters are expected to be in the same coordinate system and have overlapping bounding boxes.
- Parameters
base_raster (
datasets.Raster
) –datasets.Raster
The reference raster used for rebase. It should be in the same coordinate system and have overlapping bounding boxes.inplace (bool) – bool, optional If True, the rebase operation will be performed in-place, and the original raster’s grid will be modified. If False, a new rebased grid will be returned, and the original data will remain unchanged. Default is False.
- Returns
numpy.ndarray`
or None If inplace is False, a new rebased grid as a NumPy array. If inplace is True, returns None, and the original raster’s grid is modified in-place.- Return type
numpy.ndarray`
or None
Notes:
The rebase operation involves interpolating the values of the original grid to align with the reference raster’s grid.
The method parameter specifies the interpolation method and can be “linear_model,” “nearest,” or “cubic.”
The rebase assumes that both rasters are in the same coordinate system and have overlapping bounding boxes.
Examples:
>>> # Example with inplace=True
>>> raster.rebase_grid(base_raster=reference_raster, inplace=True)
>>> # Example with inplace=False
>>> rebased_grid = raster.rebase_grid(base_raster=reference_raster, inplace=False)
- reclassify(dict_ids, df_new_table, talk=False)[source]
Reclassify QualiRaster Ids in grid and table
- load(asc_file, prj_file, table_file)[source]
Load data from files to the raster.
- Parameters
asc_file (str) – path to the .asc raster file
prj_file (str) – path to the .prj projection file
table_file (str) – path to the .txt table file
- Returns
None
- Return type
None
- load_table(file)[source]
Load attributes dataframe from
csv
.txt
file (separator must be ;).
- Parameters
file (str) – path to the file
- export(folder, filename=None)[source]
Export raster data.
- Parameters
folder (str) – path to the output directory
filename (str) – file name without extension, defaults to None
- Returns
None
- Return type
None
- set_table(dataframe)[source]
Set attributes dataframe from incoming
pandas.DataFrame
.- Parameters
dataframe (
pandas.DataFrame
) – incoming pandas dataframe
- get_areas(merge=False)[source]
Get export_areas in map of each category in table.
- Parameters
merge (bool, defaults to False) – option to merge data with raster table
- Returns
export_areas dataframe
- Return type
pandas.DataFrame
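The area computation can be sketched as cell counts per category times the cell area (a minimal illustration under that assumption, not the library's code):

```python
import numpy as np

# Count cells per category id and multiply by cellsize squared.
cellsize = 30.0
grid = np.array([[1, 1], [2, 1]])
ids, counts = np.unique(grid, return_counts=True)
areas = counts * cellsize ** 2        # in squared map units
```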
- get_zonal_stats(raster_sample, merge=False, skip_count=False)[source]
Get zonal stats from other raster map to sample.
- get_aoi(by_value_id)[source]
Get the AOI map from a specific value id (value is expected to exist in the raster).
- Parameters
by_value_id (int) – category id value
- Returns
AOI map
- Return type
AOI
object
- view(show=True, export_areas=True, folder='./output', filename=None, dpi=300, fig_format='jpg', filter=False, n_filter=6)[source]
Plot a basic panel of the qualitative raster map.
- Parameters
show (bool) – option to show plot instead of saving, defaults to True
export_areas (bool) – option to export areas table, defaults to True
folder (str) – folder_main to output folder, defaults to
./output
filename (str) – name of file, defaults to None
dpi (int) – image resolution, defaults to 96
fig_format (str) – image fig_format (ex: png or jpg). Default jpg
filter (bool) – option for collapsing to n classes max (create “other” class)
n_filter (int) – number of total classes + others
- Returns
None
- Return type
None
- __annotations__ = {}
- class plans.datasets.core.QualiHard(name='qualihard')[source]
Bases:
QualiRaster
A Quali-Hard is a hard-coded qualitative map (that is, the table is pre-set).
- __annotations__ = {}
- class plans.datasets.core.Zones(name='ZonesMap')[source]
Bases:
QualiRaster
Zones map dataset
- set_table()[source]
Set the attributes dataframe. Note that, unlike `QualiRaster.set_table(dataframe)`, this method takes no arguments.
- set_grid(grid)[source]
Set the data grid for the raster object.
This function allows setting the data grid for the raster object. The incoming grid should be a NumPy array.
- Parameters
grid (`numpy.ndarray`) – the data grid to be set for the raster
Notes:
The function overwrites the existing data grid in the raster object with the incoming grid, ensuring that the data type matches the raster’s dtype.
Nodata values are masked after setting the grid.
Examples:
>>> # Example of setting a new grid
>>> new_grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> raster.set_grid(new_grid)
- load(asc_file, prj_file)[source]
Load data from files into the raster.
- Parameters
asc_file (str) – path to the `.asc` raster file
prj_file (str) – path to the `.prj` projection file
- Returns
None
- Return type
None
- get_aoi(zone_id)[source]
Get the AOI map for a given zone id.
- Parameters
zone_id (int) – zone id number
- Returns
AOI map
- Return type
`AOI` object
- view(show=True, folder='./output', filename=None, specs=None, dpi=150, fig_format='jpg')[source]
Plot a basic panel of the raster map.
- Parameters
show (bool) – option to show the plot instead of saving, defaults to True
folder (str) – path to the output folder, defaults to ./output
filename (str) – name of file, defaults to None
specs (dict) – specifications dictionary, defaults to None
dpi (int) – image resolution, defaults to 150
fig_format (str) – image format (e.g., png or jpg), defaults to jpg
- __annotations__ = {}
- class plans.datasets.core.RasterCollection(name='myRasterCollection')[source]
Bases:
Collection
The raster collection base dataset. This data structure is designed for holding and comparing `Raster` objects.
- __init__(name='myRasterCollection')[source]
Deploy the raster collection data structure.
- Parameters
name (str) – name of the raster collection
- load(name, asc_file, prj_file=None, varname=None, varalias=None, units=None, date=None, dtype='float32', skip_grid=False)[source]
Load a `Raster` object from a `.asc` raster file.
- Parameters
name (str) – `Raster.name` name attribute
asc_file (str) – path to the `.asc` raster file
varname (str) – `Raster.varname` variable name attribute, defaults to None
varalias (str) – `Raster.varalias` variable alias attribute, defaults to None
units (str) – `Raster.units` units attribute, defaults to None
date (str) – `Raster.date` date attribute, defaults to None
skip_grid (bool) – option to load only the metadata, defaults to False
- reducer(reducer_func, reduction_name, extra_arg=None, skip_nan=False, talk=False)[source]
Reduce the collection by applying a numpy broadcasting function (example: np.mean).
- Parameters
reducer_func (numpy function) – reducer numpy function (example: np.mean)
reduction_name (str) – name for the output raster
extra_arg (any) – extra argument for the function (example: the percentile value for np.percentile), defaults to None
skip_nan (bool) – option to skip NaN values in the map, defaults to False
talk (bool) – option to print messages, defaults to False
- Returns
raster object based on the first object found in the collection
- Return type
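The reduction itself amounts to stacking the collection's grids and applying the numpy function along the stack axis; a minimal sketch with assumed toy grids (the skip_nan option corresponds to using the NaN-aware variant, and the percentile value illustrates extra_arg):

```python
import numpy as np

# Two toy co-registered grids standing in for a collection (illustrative)
stack = np.stack([
    np.array([[1.0, 2.0], [3.0, np.nan]]),
    np.array([[3.0, 4.0], [5.0, 6.0]]),
])

mean_grid = np.mean(stack, axis=0)           # NaN propagates into the result
mean_skip = np.nanmean(stack, axis=0)        # NaN cells are skipped (skip_nan=True)
p50_grid = np.percentile(stack, 50, axis=0)  # extra_arg example: percentile value
print(mean_skip)
```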
- percentile(percentile, skip_nan=False, talk=False)[source]
Reduce the collection to the Nth-percentile raster.
- get_collection_stats()[source]
Get basic statistics from the collection.
- Returns
statistics data
- Return type
pandas.DataFrame
- get_views(show=True, folder='./output', dpi=300, fig_format='jpg')[source]
Plot the basic panel of every raster map in the collection.
- view_bboxes(colors=None, datapoints=False, show=True, folder='./output', filename=None, dpi=150, fig_format='jpg')[source]
View the bounding boxes of the raster collection.
- Parameters
colors (list) – list of colors for plotting, expected to be the same size as the catalog
datapoints (bool) – option to plot datapoints as well, defaults to False
show (bool) – option to show the plot instead of saving, defaults to True
folder (str) – path to the output folder, defaults to ./output
filename (str) – name of file, defaults to None
dpi (int) – image resolution, defaults to 150
fig_format (str) – image format (e.g., png or jpg), defaults to jpg
- Returns
None
- Return type
None
- __annotations__ = {}
- class plans.datasets.core.QualiRasterCollection(name)[source]
Bases:
RasterCollection
The raster collection base dataset.
This data structure is designed for holding and comparing `QualiRaster` objects.
- load(name, asc_file, prj_file=None, table_file=None)[source]
Load a `QualiRaster` object from a `.asc` raster file.
- __annotations__ = {}
- class plans.datasets.core.RasterSeries(name, varname, varalias, units, dtype='float32')[source]
Bases:
RasterCollection
A `RasterCollection` where date matters and all maps in the collection are expected to share the same variable, projection, and grid.
- load(name, date, asc_file, prj_file=None)[source]
Load a `Raster` object from a `.asc` raster file.
- load_folder(folder, name_pattern='map_*', talk=False)[source]
Load all rasters from a folder following a name pattern. The date is expected at the end of the name, before the file extension.
- apply_aoi_masks(grid_aoi, inplace=False)[source]
Batch method to apply an AOI mask over all maps in the collection.
- Parameters
grid_aoi (`numpy.ndarray`) – AOI grid
inplace (bool) – overwrite the main grid if True, defaults to False
- Returns
None
- Return type
None
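The effect of an AOI mask can be sketched with a numpy masked array (assumed toy data; the actual method operates on the collection's Raster objects): cells outside the AOI drop out of subsequent statistics.

```python
import numpy as np

# Toy grid and AOI mask (illustrative, not real plans data)
grid = np.array([
    [10.0, 20.0],
    [30.0, 40.0],
])
grid_aoi = np.array([
    [1, 0],
    [1, 1],
])

# Cells outside the AOI (aoi == 0) are excluded from computations
masked = np.ma.masked_where(grid_aoi == 0, grid)
print(masked.count())  # 3 cells remain active
```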
- release_aoi_masks()[source]
Batch method to release the AOI mask over all maps in the collection.
- Returns
None
- Return type
None
- rebase_grids(base_raster, talk=False)[source]
Batch method to rebase all maps in the collection.
- Parameters
base_raster (`datasets.Raster`) – base raster for rebasing
talk (bool) – option to print messages, defaults to False
- Returns
None
- Return type
None
- get_series_stats()[source]
Get the raster series statistics
- Returns
dataframe of raster series statistics
- Return type
pandas.DataFrame
- get_views(show=True, folder='./output', view_specs=None, dpi=300, fig_format='jpg', talk=False)[source]
Plot the basic panel of every raster map in the collection.
- Parameters
show (bool) – option to show the plot instead of saving, defaults to True
folder (str) – path to the output folder, defaults to ./output
view_specs (dict) – specifications dictionary, defaults to None
dpi (int) – image resolution, defaults to 300
fig_format (str) – image format (e.g., png or jpg), defaults to jpg
talk (bool) – option to print messages, defaults to False
- Returns
None
- Return type
None
- view_series_stats(statistic='Mean', folder='./output', filename=None, specs=None, show=True, dpi=150, fig_format='jpg')[source]
View the raster series statistics.
- Parameters
statistic (str) – statistic to view, defaults to Mean
show (bool) – option to show the plot instead of saving, defaults to True
folder (str) – path to the output folder, defaults to ./output
filename (str) – name of file, defaults to None
specs (dict) – specifications dictionary, defaults to None
dpi (int) – image resolution, defaults to 150
fig_format (str) – image format (e.g., png or jpg), defaults to jpg
- Returns
None
- Return type
None
- __annotations__ = {}
- class plans.datasets.core.QualiRasterSeries(name, varname, varalias, dtype='uint8')[source]
Bases:
RasterSeries
A `RasterSeries` where date matters and all maps in the collection are expected to be `QualiRaster` objects with the same variable, projection, and grid.
- update_table(clear=True)[source]
Update the series table (attributes).
- Parameters
clear (bool) – option to clear unfound values from the table, defaults to True
- Returns
None
- Return type
None
- append(raster)[source]
Append a `Raster` object to the collection. Pre-existing objects with the same `Raster.name` attribute are replaced.
- Parameters
raster (`Raster`) – incoming `Raster` to append
- load(name, date, asc_file, prj_file=None, table_file=None)[source]
Load a `QualiRaster` object from a `.asc` raster file.
- load_folder(folder, table_file, name_pattern='map_*', talk=False, use_parallel=False, num_threads=None)[source]
Load all rasters from a folder following a name pattern. The date is expected at the end of the name, before the file extension. Supports both serial and parallel processing using threads.
- Parameters
- Returns
None
- Return type
None
- _load_folder(folder, table_file, name_pattern='map_*', talk=False)[source]
Load all rasters from a folder following a name pattern. The date is expected at the end of the name, before the file extension.
- get_series_areas()[source]
Get the area prevalence for the whole series.
- Returns
dataframe of series areas
- Return type
pandas.DataFrame
- view_series_areas(specs=None, show=True, export_areas=True, folder='./output', filename=None, dpi=300, fig_format='jpg')[source]
View the series areas.
- Parameters
specs (dict) – specifications dictionary, defaults to None
show (bool) – option to show the plot instead of saving, defaults to True
export_areas (bool) – option to export the areas table, defaults to True
folder (str) – path to the output folder, defaults to ./output
filename (str) – name of file, defaults to None
dpi (int) – image resolution, defaults to 300
fig_format (str) – image format (e.g., png or jpg), defaults to jpg
- Returns
None
- Return type
None
- get_views(show=True, export_areas=True, filter=False, n_filter=6, folder='./output', view_specs=None, dpi=300, fig_format='jpg', talk=False)[source]
Plot the basic panel of every raster map in the collection.
- Parameters
show (bool) – option to show the plot instead of saving, defaults to True
folder (str) – path to the output folder, defaults to ./output
view_specs (dict) – specifications dictionary, defaults to None
dpi (int) – image resolution, defaults to 300
fig_format (str) – image format (e.g., png or jpg), defaults to jpg
talk (bool) – option to print messages, defaults to False
- Returns
None
- Return type
None
- __annotations__ = {}