calipy.data (API)
This module provides basic functionality to represent and access data in a way that interacts well with calipy’s basic classes and methods.
The classes and functions are:
- DataTuple: A class for holding tuples of various objects with explicit names. It is the basic object to be used for input variables, observations, etc., as it makes explicit the meaning of the tensors passed or produced.
For example, the DataTuple class is often used to manage and package data, including for the various forward() methods when activating CalipyNodes.
The script is meant solely for educational and illustrative purposes. Written by Dr. Jemil Avers Butt, Atlas optimization GmbH, www.atlasoptimization.com.
- class calipy.data.CalipyDataset(input_data, output_data, homogeneous=False)
Bases: Dataset
CalipyDataset is a class mimicking the functionality of the Dataset class in torch.utils.data but providing some streamlined prebuilt functions needed in the context of calipy. This includes support for subsampling based on CalipyDict objects. It is meant to be subclassed, augmenting user-specified datasets with additional, calipy-ready functionality.
- Parameters:
input_data (NoneType, CalipyTensor, CalipyDict, CalipyIO) –
The input_data of the dataset reflecting the inputs to the model that evoke the corresponding outputs. Valid input types include:
None => No input data (no input)
CalipyTensor => Single tensor (single input)
CalipyDict => Dictionary containing CalipyTensors (multiple inputs)
CalipyIO => List containing CalipyDicts containing CalipyTensors (multiple inputs, possibly of inhomogeneous shape and type)
output_data (NoneType, CalipyTensor, CalipyDict, CalipyIO) –
The output_data of the dataset reflecting the outputs of the model evoked by the corresponding inputs. Valid input types include:
None => No output data (no output)
CalipyTensor => Single tensor (single output)
CalipyDict => Dictionary containing CalipyTensors (multiple outputs)
CalipyIO => List containing CalipyDicts containing CalipyTensors (multiple outputs, possibly of inhomogeneous shape and type)
batch_dims (DimTuple) – A DimTuple object defining the batch dimensions among which flattening and subsampling is performed.
- Returns:
An instance of CalipyDataset, suitable for accessing datasets and passing them to DataLoader objects.
- Return type:
The following scenarios need to be covered by the construction procedure:
(Input, Output) = (None, CalipyTensor)
(Input, Output) = (CalipyTensor, CalipyTensor)
(Input, Output) = (None, dict(CalipyTensor))
(Input, Output) = (dict(CalipyTensor), dict(CalipyTensor))
(Input, Output) = (None, list(dict(CalipyTensor)))
(Input, Output) = (list(dict(CalipyTensor)), list(dict(CalipyTensor)))
(Input, Output) = (None, list_mixed)
(Input, Output) = (list_mixed, list_mixed)
where list_mixed means a list of dicts in which the entries for some keys may be None or of non-matching shapes.
Example usage:
# i) Imports and definitions

import torch
import pyro
from calipy.utils import dim_assignment
from calipy.data import CalipyDataset, io_collate
from calipy.tensor import CalipyTensor
from torch.utils.data import DataLoader

# Definitions
n_meas = 2
n_event = 1
n_subbatch = 7

# ii) Create data for dataset

# Set up sample distributions
mu_true = torch.tensor(0.0)
sigma_true = torch.tensor(0.1)

# Sample from distributions & wrap result
data_distribution = pyro.distributions.Normal(mu_true, sigma_true)
data = data_distribution.sample([n_meas, n_event])
data_dims = dim_assignment(['bd_data', 'ed_data'], dim_sizes = [n_meas, n_event])
data_cp = CalipyTensor(data, data_dims, name = 'data')

# dataset_inputs
data_none = None
data_ct = data_cp
data_cd = {'a': data_cp, 'b': data_cp}
data_io = [data_cd, data_cd]
data_io_mixed = [data_cd, {'a': None, 'b': data_cp}, {'a': data_cp, 'b': None}, data_cd]

# iii) Build datasets

# Build datasets and check
dataset_none_none = CalipyDataset(input_data = data_none, output_data = data_none)
dataset_none_ct = CalipyDataset(input_data = data_none, output_data = data_ct)
dataset_none_cd = CalipyDataset(input_data = data_none, output_data = data_cd)
dataset_none_io = CalipyDataset(input_data = data_none, output_data = data_io)
dataset_none_iomixed = CalipyDataset(input_data = data_none, output_data = data_io_mixed)
dataset_ct_ct = CalipyDataset(input_data = data_ct, output_data = data_ct)
dataset_ct_cd = CalipyDataset(input_data = data_ct, output_data = data_cd)
dataset_ct_io = CalipyDataset(input_data = data_ct, output_data = data_io)
dataset_ct_iomixed = CalipyDataset(input_data = data_ct, output_data = data_io_mixed)
dataset_cd_ct = CalipyDataset(input_data = data_cd, output_data = data_ct)
dataset_cd_cd = CalipyDataset(input_data = data_cd, output_data = data_cd)
dataset_cd_io = CalipyDataset(input_data = data_cd, output_data = data_io)
dataset_cd_iomixed = CalipyDataset(input_data = data_cd, output_data = data_io_mixed)
dataset_io_ct = CalipyDataset(input_data = data_io, output_data = data_ct)
dataset_io_cd = CalipyDataset(input_data = data_io, output_data = data_cd)
dataset_io_io = CalipyDataset(input_data = data_io, output_data = data_io)
dataset_io_iomixed = CalipyDataset(input_data = data_io, output_data = data_io_mixed)
dataset_iomixed_ct = CalipyDataset(input_data = data_io_mixed, output_data = data_ct)
dataset_iomixed_cd = CalipyDataset(input_data = data_io_mixed, output_data = data_cd)
dataset_iomixed_io = CalipyDataset(input_data = data_io_mixed, output_data = data_io)
dataset_iomixed_iomixed = CalipyDataset(input_data = data_io_mixed, output_data = data_io_mixed)

# iv) Build dataloader and subsample
dataset = CalipyDataset(input_data = [None, data_ct, data_cd],
                        output_data = [None, data_ct, data_cd])
dataloader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=io_collate)

# Iterate through the DataLoader
for batch_input, batch_output, batch_index in dataloader:
    print(batch_input, batch_output, batch_index)
- infer_length(query_data)
- class calipy.data.CalipyDict(data=None)
Bases: dict
A dictionary-like container that can store single or multiple items. If it contains exactly one item, calipy_dict.value can be called to retrieve it directly. It is meant as a convenient wrapper around DataTuple functionality and is the basis for CalipyIO, the standard input/output/observation format handled inside CalipyNode objects. It is typically autowrapped around dictionaries or single objects provided by the user, e.g. to the forward() method. Construction is idempotent and leaves CalipyDict objects unchanged.
CalipyDict allows heterogeneous tensor shapes for flexible datasets. Keys represent measurement identifiers (‘mean’, ‘var’, etc.); values are e.g. CalipyTensors with potentially differing shapes across CalipyDict instances.
- Parameters:
data (None, dict, CalipyDict, DataTuple, or any single object.) –
The data used to construct the CalipyDict. Valid input types include:
None: Initializes an empty dict.
dict[str, item]: Multi-item dictionary.
CalipyDict: Returns unchanged.
DataTuple: Converted to dict.
Single object: Stored under a default key ‘__single__’.
- Returns:
An instance of CalipyDict
- Return type:
Example usage:
# Imports and definitions
import torch
from calipy.data import DataTuple, CalipyDict

# Create data for CalipyDict initialization
tensor_A = torch.ones(2, 3)
tensor_B = torch.ones(4, 5)
names = ['tensor_A', 'tensor_B']
values = [tensor_A, tensor_B]
data_tuple = DataTuple(names, values)
data_dict = {'tensor_A': tensor_A, 'tensor_B': tensor_B}

# Create CalipyDict objects
dict_from_none = CalipyDict()
dict_from_dict = CalipyDict(data_dict)
dict_from_tuple = CalipyDict(data_tuple)
dict_from_calipy = CalipyDict(dict_from_dict)
dict_from_single = CalipyDict(tensor_A)

# Print contents and investigate
for cp_dict in [dict_from_none, dict_from_dict, dict_from_tuple,
                dict_from_calipy, dict_from_single]:
    print(cp_dict)

dict_from_single.has_single_item
dict_from_single.value
dict_from_dict.as_datatuple()
- as_datatuple() DataTuple
Convert this CalipyDict into a DataTuple for dimension-aware operations or other advanced uses.
- property has_single_item: bool
- Returns:
True if exactly one item is in this dict, else False.
- property is_null
Indicates whether this CalipyDict has only one element and that element is trivial.
- rename_keys(rename_dict)
Renames current keys to the ones given by rename_dict[key].
- Parameters:
rename_dict (dict) – Dictionary s.t. for each key in rename_dict, key is in self.keys() with rename_dict[key] being the string that is the key in the newly produced CalipyDict.
- Returns:
CalipyDict with the same values but with changed keys.
- Return type:
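The key-renaming semantics can be sketched in plain Python. This is only a behavior sketch, not the calipy implementation: it assumes (as a hedge) that keys absent from rename_dict are kept unchanged, and it returns a plain dict where the real method returns a CalipyDict.

```python
# Plain-dict sketch of CalipyDict.rename_keys: every key listed in
# rename_dict is replaced by rename_dict[key]; values are carried over.
def rename_keys_sketch(data, rename_dict):
    return {rename_dict.get(key, key): value for key, value in data.items()}

d = {'mean': 1.0, 'var': 0.1}
renamed = rename_keys_sketch(d, {'mean': 'mu'})
# renamed == {'mu': 1.0, 'var': 0.1}
```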
- stack(other)
Overloads the + operator to return a new CalipyDict when adding two CalipyDict objects. Addition is defined entrywise for dicts with matching keys: the values stored under each key are stacked together.
- Parameters:
other (CalipyDict) – The CalipyDict to add.
- Returns:
A new CalipyDict with elements from each dict stacked.
- Return type:
- Raises:
ValueError – If both DataTuples do not have matching keys.
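A minimal sketch of this stacking behavior, using plain dicts with tuples as stand-ins for stacked tensors; the key-matching check mirrors the ValueError described above. This is an illustrative stand-in, not calipy's actual stacking code.

```python
# Sketch of CalipyDict.stack / the + overload: both dicts must have
# matching keys; the values stored under each key are grouped together.
def stack_sketch(left, right):
    if left.keys() != right.keys():
        raise ValueError("Both dicts must have matching keys")
    return {key: (left[key], right[key]) for key in left}

stacked = stack_sketch({'a': 1, 'b': 2}, {'a': 3, 'b': 4})
# stacked == {'a': (1, 3), 'b': (2, 4)}
```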
- subsample_tensors(dim, indices)
Allows accessing CalipyTensor elements of CalipyDict by passing a list of integer indices and a single dimension along which all of the CalipyTensors in the dict are to be sliced.
- Parameters:
indices (list of int) – List of integer indices that is used for indexing self in the dimension dim
dim (DimTuple) – A DimTuple containing a single CalipyDim object declaring which dim is to be subsampled
- Returns:
A new CalipyDict with keys of self and corresponding values = value[…, indices, …] i.e. the values indexed by the indices in dimension dim.
- Return type:
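The slicing semantics can be illustrated with nested lists standing in for CalipyTensors. The sketch below only handles subsampling along the first dimension, whereas the real method takes an arbitrary dim from a DimTuple.

```python
# Sketch of CalipyDict.subsample_tensors for dim 0: every value in the
# dict is sliced with the same integer indices along its first dimension.
def subsample_dim0_sketch(data, indices):
    return {key: [value[i] for i in indices] for key, value in data.items()}

cal_dict = {'mean': [10, 20, 30, 40], 'var': [1, 2, 3, 4]}
sub = subsample_dim0_sketch(cal_dict, [0, 2])
# sub == {'mean': [10, 30], 'var': [1, 3]}
```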
- property value: Any
If there’s exactly one item in this CalipyDict, return it. Otherwise, raise an error. This property allows single-output usage.
- class calipy.data.CalipyIO(data=None, name='io_noname')
Bases: object
A data container that can store single or multiple dict-like containers. It is meant as a convenient wrapper for homogeneous and inhomogeneous lists of data where each list element is a CalipyDict. CalipyIO is the standard input/output/observation format handled inside CalipyNode objects. It is typically autowrapped around lists of dictionaries or single objects provided by the user, e.g. to the forward() method. Construction is idempotent and leaves CalipyIO objects unchanged; i.e. wrapping multiple times is equivalent to wrapping once. CalipyIO objects are also the output of iterating through InhomogeneousDataLoader objects; i.e. datasets and subbatched datasets are represented in this way.
- Special access rules:
If calipy_io contains a single dict in its list, calipy_io.dict returns it.
If calipy_io contains a single dict in its list, calipy_io[key] returns the corresponding value dict[key].
If calipy_io contains a single dict in its list and that dict holds a single key-value pair, calipy_io.value returns that value.
- Parameters:
data (None, dict, CalipyDict, DataTuple, list, CalipyList, CalipyIO, or any single object.) –
The data used to construct the CalipyIO. Valid input types include:
None => empty IO
A single item => stored under a default key ‘__single__’
A dict {str -> item} => single multi-item dict
A CalipyDict => wrapped into a one-element list
A DataTuple => converted to dict
A list of dicts or a CalipyList => multiple dicts
A CalipyIO => left unchanged
- Returns:
An instance of CalipyIO
- Return type:
Example usage:
# Imports and definitions
import torch
from calipy.data import DataTuple, CalipyDict, CalipyList, CalipyIO
from calipy.tensor import CalipyTensor
from calipy.utils import dim_assignment

# Create data for CalipyList
calipy_list_empty = CalipyList()
calipy_list = CalipyList(data = ['a', 'b'])
calipy_same_list = CalipyList(calipy_list)

# Pass data into CalipyIO and investigate
# Legal input types are None, single object, dict, CalipyDict, DataTuple, list,
# CalipyList, CalipyIO.

# Build inputs
none_input = None
single_input = torch.tensor([1.0])
dict_input = {'a': 1, 'b': 2}
CalipyDict_input = CalipyDict(dict_input)
DataTuple_input = CalipyDict_input.as_datatuple()
list_input = [dict_input, {'c': 3}, {'d': 4}]
CalipyList_input = CalipyList(list_input)
CalipyIO_input = CalipyIO(dict_input)

# Build CalipyIO's
none_io = CalipyIO(none_input)
single_io = CalipyIO(single_input)
dict_io = CalipyIO(dict_input)
CalipyDict_io = CalipyIO(CalipyDict_input)
DataTuple_io = CalipyIO(DataTuple_input)
list_io = CalipyIO(list_input, name = 'io_from_list')
CalipyList_io = CalipyIO(CalipyList_input)
CalipyIO_io = CalipyIO(CalipyIO_input)

# Check properties
none_io.is_null
single_io.is_null
print(single_io)

# Functionality includes:
# 1. Iteration
# 2. Fetch by index
# 3. Associated CalipyIndex
#    - Has global and local index
# 4. Comes with collate function

# 1. Iteration
# Proceed to investigate one of the built calipy_io objects, here list_io
for io in list_io:
    print(io)
    print(io.indexer.global_index)

# 2. Fetch by index
# Access values (special if list and dict only have 1 element)
single_io.dict
single_io.value
single_io.calipy_dict
single_io.calipy_list
single_io.data_tuple
single_io['__single__']
list_io[0]['a']

# 3. a) Associated Indexer
# Content of indexer
list_io.batch_dim_flattened
list_io.indexer
list_io.indexer.local_index
list_io_sub = list_io[1:2]
list_io_sub.indexer.data_source_name
list_io_sub.indexer.index_tensor_dims

# 3. b) Associated CalipyIndex
# Content of specific IOIndexer
list_io_sub.indexer.local_index.tuple
list_io_sub.indexer.local_index.tensor
list_io_sub.indexer.local_index.index_name_dict
list_io_sub.indexer.global_index.tuple
list_io_sub.indexer.global_index.tensor
list_io_sub.indexer.global_index.index_name_dict

# Iteration produces sub_io's
for io in list_io:
    print(io.indexer.global_index)
    print(io.indexer.global_index.tensor)

# 3. c) Index / IO interaction
# Subsampling and indexing: via ints, tuples, slices, and CalipyIndex
sub_io_1 = list_io[0]
sub_io_2 = list_io[1]
sub_io_3 = list_io[1:3]
sub_io_1.indexer.local_index
sub_io_2.indexer.local_index
sub_io_3.indexer.local_index
sub_io_1.indexer.global_index
sub_io_2.indexer.global_index
sub_io_3.indexer.global_index
global_index_1 = sub_io_1.indexer.global_index
global_index_2 = sub_io_2.indexer.global_index
global_index_3 = sub_io_3.indexer.global_index
assert(list_io[global_index_1] == list_io[0])
assert(list_io[global_index_2] == list_io[1])
assert(list_io[global_index_3] == list_io[1:3])

# 4. Collate function
# Check collation functionality for autoreducing ios
mean_dims = dim_assignment(['bd_1', 'ed_1'])
var_dims = dim_assignment(['bd_1', 'ed_1'])
mean_1 = CalipyTensor(torch.randn(3, 2), mean_dims)
mean_2 = CalipyTensor(torch.randn(5, 2), mean_dims)
var_1 = CalipyTensor(torch.randn(3, 2), var_dims)
var_2 = CalipyTensor(torch.randn(5, 2), var_dims)
io_obj = CalipyIO([
    CalipyDict({'mean': mean_1, 'var': var_1}),
    CalipyDict({'mean': mean_2, 'var': var_2})
])
collated_io = io_obj.reduce_list()

# Rename all entries in the dicts in CalipyIO
rename_dict = {'a': 'new_a', 'b': 'new_b'}
renamed_io = list_io.rename_keys(rename_dict)
- as_datatuple() DataTuple
Convert this CalipyIO into a DataTuple for dimension-aware operations or other advanced uses.
- property dict: Any
If there’s exactly one dict in this CalipyIO, return it. Otherwise, raise an error. This property allows single-output usage.
- property has_single_item: bool
- Returns:
True if exactly one item is in this io, else False.
- property is_null
Indicates whether this CalipyIO has only one element and that element is trivial.
- property is_reducible
- preprocess_for_node(nodestructure)
- reduce_list()
Attempts to merge all CalipyDict elements in self.calipy_list into a single CalipyDict by concatenating tensors along the first dimension. This method succeeds only if all CalipyDict elements have exactly matching keys and tensor dimensions (excluding the first dimension).
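With lists standing in for tensors (list concatenation playing the role of concatenation along the first dimension), the merge can be sketched as follows; this is an illustrative stand-in for reduce_list, not calipy's implementation.

```python
# Sketch of CalipyIO.reduce_list: all dicts must have matching keys;
# per key, the values are concatenated along the first dimension.
def reduce_list_sketch(dicts):
    keys = dicts[0].keys()
    if any(d.keys() != keys for d in dicts):
        raise ValueError("All dicts must have matching keys")
    return {key: [row for d in dicts for row in d[key]] for key in keys}

reduced = reduce_list_sketch([
    {'mean': [[0.1, 0.2]], 'var': [[1.0, 1.0]]},
    {'mean': [[0.3, 0.4]], 'var': [[2.0, 2.0]]},
])
# reduced['mean'] == [[0.1, 0.2], [0.3, 0.4]]
```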
- rename_keys(rename_dict)
Renames current keys in all the dicts to the ones given by rename_dict[key].
- Parameters:
rename_dict (dict) – Dictionary such that for each key in rename_dict, key is in self.calipy_list[k].keys(), with rename_dict[key] being the string that is the key in the newly produced CalipyDicts.
- Returns:
CalipyIO with the same values but with changed keys.
- Return type:
- property value: Any
If there’s exactly one dict in this CalipyIO and one entry in it, return that entry. Otherwise, raise an error. This property allows single-output usage.
- class calipy.data.CalipyList(data=None)
Bases: list
A list-like container that can store single or multiple dict-like containers. It is meant as a convenient wrapper for homogeneous and inhomogeneous lists of data where each list element is a CalipyDict. CalipyList is a central element of the standard input/output/observation format handled inside CalipyNode objects. It is typically autowrapped around lists of dictionaries or single objects provided by the user, e.g. to the forward() method. Construction is idempotent and leaves CalipyList objects unchanged; i.e. wrapping multiple times is equivalent to wrapping once. It is an ingredient of CalipyIO.
- Parameters:
data (Any) –
The data being used to construct the CalipyList. Valid input types include:
A single item => CalipyList containing a single item
A list => CalipyList containing the list of objects
- Returns:
An instance of CalipyList
- Return type:
- property has_single_item: bool
- Returns:
True if exactly one item is in this list, else False.
- property is_null
Indicates whether this CalipyList has only one element and that element is trivial.
- property value: Any
If there’s exactly one item in this CalipyList, return it. Otherwise, raise an error. This property allows single-output usage.
- class calipy.data.DataTuple(names, values)
Bases: object
Custom class for holding tuples of various objects with explicit names. Provides methods for easily distributing functions over the entries in the tuple and thereby makes modifying collections of objects easier. This is routinely used to perform actions on grouped observation tensors, batch_dims, or event_dims.
- Parameters:
names (list of string) – A list of names serving as keys for the DataTuple.
values (list of obj) – A list of objects serving as values for the DataTuple.
- Returns:
An instance of DataTuple containing the key, value pairs and additional attributes and methods.
- Return type:
Example usage:
# Imports and definitions
import torch
from calipy.data import DataTuple
from calipy.tensor import CalipyTensor
from calipy.utils import dim_assignment

# Create DataTuple of tensors
names = ['tensor_A', 'tensor_B']
values = [torch.ones(2, 3), torch.ones(4, 5)]
data_tuple = DataTuple(names, values)
data_tuple['tensor_A']

# Apply functions
fun = lambda x: x + 1
result_tuple_1 = data_tuple.apply_elementwise(fun)
print("Result of applying function:", result_tuple_1,
      result_tuple_1['tensor_A'], result_tuple_1['tensor_B'])
fun_dict = {'tensor_A': lambda x: x + 1, 'tensor_B': lambda x: x - 1}
result_tuple_2 = data_tuple.apply_from_dict(fun_dict)
print("Result of applying function dictionary:", result_tuple_2,
      result_tuple_2['tensor_A'], result_tuple_2['tensor_B'])

# Create DataTuple of dimensions
batch_dims_A = dim_assignment(dim_names = ['bd_A'])
event_dims_A = dim_assignment(dim_names = ['ed_A'])
batch_dims_B = dim_assignment(dim_names = ['bd_B'])
event_dims_B = dim_assignment(dim_names = ['ed_B'])
batch_dims_tuple = DataTuple(names, [batch_dims_A, batch_dims_B])
event_dims_tuple = DataTuple(names, [event_dims_A, event_dims_B])

# Add them
added_tensor_tuple = data_tuple + data_tuple
full_dims_datatuple = batch_dims_tuple + event_dims_tuple

# Construct indexer
data_tuple_cp = data_tuple.calipytensor_construct(full_dims_datatuple)
augmented_tensor = data_tuple_cp['tensor_A']
augmented_tensor.indexer.local_index

# Access subattributes
shapes_tuple = data_tuple.get_subattributes('shape')
print("Shapes of each tensor in DataTuple:", shapes_tuple)
batch_dims_tuple.get_subattributes('sizes')
batch_dims_tuple.get_subattributes('build_torchdims')

# Set new item
data_tuple['tensor_C'] = torch.ones([6, 6])
print(data_tuple)

# Apply class over each element
class DifferentClass:
    def __init__(self, tensor):
        self.tensor = tensor

    def __repr__(self):
        return f"DifferentClass(tensor={self.tensor})"

different_tuple = data_tuple.apply_class(DifferentClass)
print("Result of applying DifferentClass to DataTuple:", different_tuple)

# DataTuple and CalipyTensor interact well: in the following we showcase
# that a DataTuple of CalipyTensors can be subsampled by providing a
# DataTuple of CalipyIndexes or a single CalipyIndex that is automatically
# distributed over the CalipyTensors for indexing.

# Set up DataTuple of CalipyTensors
batch_dims = dim_assignment(dim_names = ['bd_1'])
event_dims_A = dim_assignment(dim_names = ['ed_1_A', 'ed_2_A'])
data_dims_A = batch_dims + event_dims_A
event_dims_B = dim_assignment(dim_names = ['ed_1_B'])
data_dims_B = batch_dims + event_dims_B
data_A_torch = torch.normal(0, 1, [6, 4, 2])
data_A_cp = CalipyTensor(data_A_torch, data_dims_A, 'data_A')
data_B_torch = torch.normal(0, 1, [6, 3])
data_B_cp = CalipyTensor(data_B_torch, data_dims_B, 'data_B')
data_AB_tuple = DataTuple(['data_A_cp', 'data_B_cp'], [data_A_cp, data_B_cp])

# Subsample the data individually
data_AB_subindices = TensorIndexer.create_simple_subsample_indices(
    batch_dims[0], data_A_cp.shape[0], 5)
data_AB_subindex = data_AB_subindices[0]
data_A_subindex = data_AB_subindex.expand_to_dims(data_dims_A, data_A_cp.shape)
data_B_subindex = data_AB_subindex.expand_to_dims(data_dims_B, data_B_cp.shape)
data_AB_sub_1 = DataTuple(['data_A_cp_sub', 'data_B_cp_sub'],
                          [data_A_cp[data_A_subindex], data_B_cp[data_B_subindex]])

# Use subsampling functionality for DataTuples, either by passing a DataTuple of
# CalipyIndex or a single CalipyIndex that is broadcasted
data_AB_subindex_tuple = DataTuple(['data_A_cp', 'data_B_cp'],
                                   [data_A_subindex, data_B_subindex])
data_AB_sub_2 = data_AB_tuple.subsample(data_AB_subindex_tuple)
data_AB_sub_3 = data_AB_tuple.subsample(data_AB_subindex)
assert ((data_AB_sub_1[0] - data_AB_sub_2[0]).tensor == 0).all()
assert ((data_AB_sub_2[0] - data_AB_sub_3[0]).tensor == 0).all()
- apply_class(class_type)
Allows applying a class constructor to all elements in the DataTuple. For example, DifferentClass(data_tuple) will apply DifferentClass to each element in data_tuple.
- Parameters:
class_type – The class constructor to be applied to each element.
- Returns:
New DataTuple with the class constructor applied to each element.
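The mapping over entries can be sketched with a plain dict and a hypothetical wrapper class (BoxSketch is invented here for illustration; the real method returns a DataTuple).

```python
# Sketch of DataTuple.apply_class: the constructor is called once per
# entry, and a new name -> wrapped-object mapping is returned.
class BoxSketch:
    def __init__(self, value):
        self.value = value

def apply_class_sketch(data, class_type):
    return {key: class_type(value) for key, value in data.items()}

wrapped = apply_class_sketch({'a': 1, 'b': 2}, BoxSketch)
# wrapped['a'].value == 1
```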
- apply_elementwise(function)
Returns a new DataTuple with keys = self._data_dict.keys() and each associated value replaced by function(value).
- apply_from_dict(fun_dict)
Applies functions from a dictionary to corresponding entries in the DataTuple. If a key in fun_dict matches a key in DataTuple, the function is applied.
- Parameters:
fun_dict – Dictionary with keys corresponding to DataTuple keys and values as functions.
- Returns:
New DataTuple with functions applied where specified.
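The per-key dispatch can be sketched in plain Python; as a hedge, this stand-in assumes entries without a matching function are passed through unchanged, and it returns a plain dict rather than a DataTuple.

```python
# Sketch of DataTuple.apply_from_dict: entries whose key appears in
# fun_dict are transformed; all other entries are passed through.
def apply_from_dict_sketch(data, fun_dict):
    return {key: fun_dict[key](value) if key in fun_dict else value
            for key, value in data.items()}

result = apply_from_dict_sketch({'a': 1, 'b': 10}, {'a': lambda x: x + 1})
# result == {'a': 2, 'b': 10}
```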
- as_dict()
Returns the underlying dictionary linking names and values
- calipytensor_construct(dims_datatuple)
Applies construction of the TensorIndexer to build for each tensor in self the CalipyTensor construction used for indexing. Requires all elements of self to be tensors and requires dims_datatuple to be a DataTuple containing DimTuples.
- Parameters:
self – A DataTuple containing indexable tensors to be indexed
dims_datatuple – A DataTuple containing the DimTuples used for indexing
- Returns:
Nothing is returned; a calipy indexer is integrated into the tensors in self.
- Return type:
- Raises:
ValueError – If both DataTuples do not have matching keys.
- get_subattributes(attr)
Allows direct access to attributes or methods of elements inside the DataTuple. For example, calling data_tuple.get_subattributes(‘shape’) will return a DataTuple containing the shapes of each tensor in data_tuple.
- Parameters:
attr – The attribute or method name to be accessed.
- Returns:
DataTuple containing the attribute values or method results for each element.
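The attribute lookup can be sketched with getattr on a plain dict; as a hedge, this stand-in assumes that callable members are invoked and their results kept, which matches the 'attributes or methods' wording above, and it returns a plain dict rather than a DataTuple.

```python
# Sketch of DataTuple.get_subattributes: per entry, look up attr via
# getattr; if the member is callable, call it and keep the result.
def get_subattributes_sketch(data, attr):
    out = {}
    for key, value in data.items():
        member = getattr(value, attr)
        out[key] = member() if callable(member) else member
    return out

attrs = get_subattributes_sketch({'x': 1 + 2j, 'y': 3 + 4j}, 'imag')
# attrs == {'x': 2.0, 'y': 4.0}
```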
- get_tensors()
Extracts the .tensor attribute from each CalipyTensor contained in the DataTuple, leaving other objects in the tuple unperturbed.
- Returns:
DataTuple containing, for each key-value pair, either the tensor subattribute value.tensor or the original value.
- items()
- keys()
- rename_keys(rename_dict)
Renames current keys to the ones given by rename_dict[key].
- Parameters:
rename_dict (dict) – Dictionary s.t. for each key in rename_dict, key is in self.keys() with rename_dict[key] being the string that is the key in the newly produced DataTuple.
- Returns:
DataTuple with the same values but with changed keys.
- Return type:
- subsample(datatuple_indices)
Subsamples a DataTuple containing CalipyTensors by applying to each entry the corresponding CalipyIndex object from datatuple_indices. The argument datatuple_indices can also consist of just one CalipyIndex, which is then applied to all elements of self for subsampling. If a CalipyTensor in self does not feature the dim subsampled by the CalipyIndex, it is not subsampled.
- Parameters:
datatuple_indices (DataTuple or CalipyIndex) – The DataTuple containing the CalipyIndex objects or a single CalipyIndex object.
- Returns:
A new DataTuple with each CalipyTensor subsampled by the indices.
- Return type:
- Raises:
ValueError – If both DataTuples do not have matching keys.
- values()
- calipy.data.io_collate(batch, reduce=False)
Custom collate function that collates ios together by concatenating the contained list elements into a longer list, around which a new CalipyIO object is built. This new CalipyIO object contains a list of dicts. If reduce=True, the list elements themselves are additionally stacked (e.g. tensors along their first dimensions) to create a single dict containing stacked elements. Used primarily as the collate function for the DataLoader to perform automated subsampling.
- Parameters:
batch (list of CalipyIO) – A list of CalipyIO objects containing info on input_vars, observations, and the corresponding index that was used to produce them via dataset.__getitem__(idx)
- Returns:
An instance of CalipyIO, where multiple CalipyDict objects are collated together either into a list of dicts or into a single calipy_io containing stacked CalipyTensors.
- Return type:
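The concatenation step for reduce=False can be sketched with plain lists of dicts standing in for CalipyIO objects; the real function additionally wraps the merged list into a new CalipyIO.

```python
# Sketch of io_collate with reduce=False: the per-sample lists of dicts
# are concatenated into one longer list.
def io_collate_sketch(batch):
    merged = []
    for io_list in batch:
        merged.extend(io_list)
    return merged

collated = io_collate_sketch([[{'a': 1}], [{'a': 2}, {'a': 3}]])
# collated == [{'a': 1}, {'a': 2}, {'a': 3}]
```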
- calipy.data.preprocess_args(input_vars, observations, subsample_index)
Function for preprocessing arguments to forward passes. Converts different forms of input to CalipyIO objects reflecting a standardized form of inputs and outputs. Typically just wraps input into CalipyIO.
- Parameters:
input_vars (None, single object, dict, CalipyDict, list, CalipyList, or CalipyIO containing CalipyTensors.) – Input input_vars to some .forward() method call. Specific contents depend on the node but typically None or a dict containing CalipyTensors with keys as specified in the node's input_vars_schema
observations – Input observations to some .forward() method call. Specific contents depend on the node but typically a dict containing CalipyTensors with keys as specified in the node's observation_schema
subsample_index – Input subsample_index to some .forward() method call. Specific contents depend on the node but typically None if no subsampling happens, or a dict containing CalipyIndex objects in case of subsampling, with keys as specified in the node's subsampling_schema
- Returns:
A tuple containing instances of CalipyIO that represent input_vars, observations, subsample_index in a way that forward methods can handle them well and they are easily passable between nodes.
- Return type:
tuple of CalipyIO
Example usage:
# Imports and definitions
import torch
from calipy.data import DataTuple, CalipyDict, CalipyList, CalipyIO
from calipy.tensor import CalipyTensor