calipy.data  (API)

This module provides basic functionality to represent and access data in a way that interacts well with calipy’s basic classes and methods.

The classes and functions are:

DataTuple: A class for holding tuples of various objects with explicit names. It is the basic object to be used for input variables, observations, etc., as it makes explicit the meaning of the tensors passed or produced.

sample:

The DataTuple class is often used to manage and package data, including for the various forward() methods when activating CalipyNodes.

The script is meant solely for educational and illustrative purposes. Written by Dr. Jemil Avers Butt, Atlas optimization GmbH, www.atlasoptimization.com.

class calipy.data.CalipyDataset(input_data, output_data, homogeneous=False)

Bases: Dataset

CalipyDataset is a class mimicking the functionality of the Dataset class in torch.utils.data but providing some streamlined prebuilt functions needed in the context of calipy. This includes support for subsampling based on CalipyDict objects. Is meant to be subclassed for augmenting user-specified datasets with additional, calipy-ready functionality.

Parameters:
  • input_data (NoneType, CalipyTensor, CalipyDict, CalipyIO) –

    The input_data of the dataset reflecting the inputs to the model that evoke the corresponding outputs. Valid input types include:

    • None => No input data (no input)

    • CalipyTensor => Single tensor (single input)

    • CalipyDict => Dictionary containing CalipyTensors (multiple inputs)

    • CalipyIO => List containing CalipyDict containing CalipyTensors

      (multiple inputs, possibly of inhomogeneous shape and type)

  • output_data (NoneType, CalipyTensor, CalipyDict, CalipyIO) –

    The output_data of the dataset reflecting the outputs of the model evoked by the corresponding inputs. Valid input types include:

    • None => No output data (no output)

    • CalipyTensor => Single tensor (single output)

    • CalipyDict => Dictionary containing CalipyTensors (multiple outputs)

    • CalipyIO => List containing CalipyDict containing CalipyTensors

(multiple outputs, possibly of inhomogeneous shape and type)

  • batch_dims (DimTuple) – A DimTuple object defining the batch dimensions among which flattening and subsampling is performed.

Returns:

An instance of CalipyDataset, suitable for accessing datasets and passing them to DataLoader objects.

Return type:

CalipyDataset

The following scenarios need to be covered by the construction procedure:

  1. (Input, Output) = (None, CalipyTensor)

  2. (Input, Output) = (CalipyTensor, CalipyTensor)

  3. (Input, Output) = (None, dict(CalipyTensor))

  4. (Input, Output) = (dict(CalipyTensor), dict(CalipyTensor))

  5. (Input, Output) = (None, list(dict(CalipyTensor)))

  6. (Input, Output) = (list(dict(CalipyTensor)), list(dict(CalipyTensor)))

  7. (Input, Output) = (None, list_mixed)

  8. (Input, Output) = (list_mixed, list_mixed)

where list_mixed means a list of dicts whose values are sometimes None or of nonmatching shapes.

Example usage:

# i) Imports and definitions
import torch
import pyro        
from calipy.utils import dim_assignment
from calipy.data import  CalipyDataset, io_collate
from calipy.tensor import CalipyTensor
from torch.utils.data import DataLoader

# Definitions        
n_meas = 2
n_event = 1
n_subbatch = 7


# ii) Create data for dataset

# Set up sample distributions
mu_true = torch.tensor(0.0)
sigma_true = torch.tensor(0.1)

# Sample from distributions & wrap result
data_distribution = pyro.distributions.Normal(mu_true, sigma_true)
data = data_distribution.sample([n_meas, n_event])
data_dims = dim_assignment(['bd_data', 'ed_data'], dim_sizes = [n_meas, n_event])
data_cp = CalipyTensor(data, data_dims, name = 'data')

# dataset_inputs
data_none = None
data_ct = data_cp
data_cd = {'a': data_cp, 'b' : data_cp}
data_io = [data_cd, data_cd]
data_io_mixed = [data_cd, {'a' : None, 'b' : data_cp} , {'a': data_cp, 'b':None}, data_cd]


# iii) Build datasets

# Build datasets and check
dataset_none_none = CalipyDataset(input_data = data_none, output_data = data_none)
dataset_none_ct = CalipyDataset(input_data = data_none, output_data = data_ct)
dataset_none_cd = CalipyDataset(input_data = data_none, output_data = data_cd)
dataset_none_io = CalipyDataset(input_data = data_none, output_data = data_io)
dataset_none_iomixed = CalipyDataset(input_data = data_none, output_data = data_io_mixed)

dataset_ct_ct = CalipyDataset(input_data = data_ct, output_data = data_ct)
dataset_ct_cd = CalipyDataset(input_data = data_ct, output_data = data_cd)
dataset_ct_io = CalipyDataset(input_data = data_ct, output_data = data_io)
dataset_ct_iomixed = CalipyDataset(input_data = data_ct, output_data = data_io_mixed)

dataset_cd_ct = CalipyDataset(input_data = data_cd, output_data = data_ct)
dataset_cd_cd = CalipyDataset(input_data = data_cd, output_data = data_cd)
dataset_cd_io = CalipyDataset(input_data = data_cd, output_data = data_io)
dataset_cd_iomixed = CalipyDataset(input_data = data_cd, output_data = data_io_mixed)

dataset_io_ct = CalipyDataset(input_data = data_io, output_data = data_ct)
dataset_io_cd = CalipyDataset(input_data = data_io, output_data = data_cd)
dataset_io_io = CalipyDataset(input_data = data_io, output_data = data_io)
dataset_io_iomixed = CalipyDataset(input_data = data_io, output_data = data_io_mixed)

dataset_iomixed_ct = CalipyDataset(input_data = data_io_mixed, output_data = data_ct)
dataset_iomixed_cd = CalipyDataset(input_data = data_io_mixed, output_data = data_cd)
dataset_iomixed_io = CalipyDataset(input_data = data_io_mixed, output_data = data_io)
dataset_iomixed_iomixed = CalipyDataset(input_data = data_io_mixed, output_data = data_io_mixed)


# iv) Build dataloader and subsample

dataset = CalipyDataset(input_data = [None, data_ct, data_cd],
                        output_data = [None, data_ct, data_cd] )
dataloader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=io_collate)

# Iterate through the DataLoader
for batch_input, batch_output, batch_index in dataloader:
    print(batch_input, batch_output, batch_index)

infer_length(query_data)
class calipy.data.CalipyDict(data=None)

Bases: dict

A dictionary-like container that can store single or multiple items. If it contains exactly one item, calipy_dict.value can be called to retrieve it directly. Is meant as a convenient wrapper to DataTuple functionality and is the basis for the standard input/output/observation format CalipyIO handled inside of the CalipyNode objects. Is typically autowrapped around dictionaries or single objects provided by the user towards e.g. the forward() method. Has idempotent property and leaves CalipyDict objects unchanged.

CalipyDict allows heterogeneous tensor shapes for flexible datasets. Keys represent measurement identifiers (‘mean’, ‘var’, etc.); values are e.g. CalipyTensors with potentially differing shapes across CalipyDict instances.

Parameters:

data (None, dict, CalipyDict, DataTuple, or any single object.) –

The data used to construct the CalipyDict. Valid input types include:

  • None: Initializes an empty dict.

  • dict[str, item]: Multi-item dictionary.

  • CalipyDict: Returns unchanged.

  • DataTuple: Converted to dict.

  • Single object: Stored under a default key ‘__single__’.

Returns:

An instance of CalipyDict

Return type:

CalipyDict

Example usage:

# Imports and definitions
import torch
from calipy.data import DataTuple, CalipyDict


# Create data for CalipyDict initialization
tensor_A = torch.ones(2, 3)
tensor_B = torch.ones(4, 5)
names = ['tensor_A', 'tensor_B']
values = [tensor_A, tensor_B]
data_tuple = DataTuple(names, values)
data_dict = {'tensor_A': tensor_A, 'tensor_B' : tensor_B}

# Create CalipyDict objects
dict_from_none = CalipyDict()
dict_from_dict = CalipyDict(data_dict)
dict_from_tuple = CalipyDict(data_tuple)
dict_from_calipy = CalipyDict(dict_from_dict)
dict_from_single = CalipyDict(tensor_A)

# Print contents and investigate 
for cp_dict in [dict_from_none, dict_from_dict, dict_from_tuple, 
                dict_from_calipy, dict_from_single]:
    print(cp_dict)

dict_from_single.has_single_item
dict_from_single.value
dict_from_dict.as_datatuple()

as_datatuple() → DataTuple

Convert this CalipyDict into a DataTuple for dimension-aware operations or other advanced uses.

property has_single_item: bool
Returns:

True if exactly one item is in this dict, else False.

property is_null

Indicate if CalipyDict only has one element and that one is trivial

rename_keys(rename_dict)

Renames current keys to the ones given by rename_dict[key].

Parameters:

rename_dict (dict) – Dictionary s.t. for each key in rename_dict, key is in self.keys() with rename_dict[key] being the string that is the key in the newly produced CalipyDict.

Returns:

CalipyDict with the same values but with changed keys.

Return type:

CalipyDict
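The renaming rule can be sketched independently of calipy with a plain dict (a hypothetical helper, assuming keys not listed in rename_dict pass through unchanged):

```python
def rename_keys_sketch(d, rename_dict):
    # Mirror of the documented semantics: each key found in rename_dict
    # is replaced by rename_dict[key]; all other keys are kept as-is.
    return {rename_dict.get(k, k): v for k, v in d.items()}

renamed = rename_keys_sketch({'a': 1, 'b': 2}, {'a': 'new_a'})
# renamed == {'new_a': 1, 'b': 2}
```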

stack(other)

Overloads the + operator to return a new CalipyDict when adding two CalipyDict objects. Addition is defined by stacking the elements of matching keys.

Parameters:

other (CalipyDict) – The CalipyDict to add.

Returns:

A new CalipyDict with elements from each dict stacked.

Return type:

CalipyDict

Raises:

ValueError – If both CalipyDict objects do not have matching keys.
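A minimal plain-dict sketch of the stacking semantics (a hypothetical helper; the real method stacks CalipyTensors, here lists stand in for them):

```python
def stack_sketch(d1, d2):
    # Matching keys are required, mirroring the documented ValueError;
    # the values of both dicts are combined per key (here into a pair).
    if set(d1) != set(d2):
        raise ValueError("Both dicts must have matching keys")
    return {k: [d1[k], d2[k]] for k in d1}

stacked = stack_sketch({'a': 1}, {'a': 2})
# stacked == {'a': [1, 2]}
```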

subsample_tensors(dim, indices)

Allows accessing CalipyTensor elements of CalipyDict by passing a list of integer indices and a single dimension along which all of the CalipyTensors in the dict are to be sliced.

Parameters:
  • indices (list of int) – List of integer indices that is used for indexing self in the dimension dim

  • dim (DimTuple) – A DimTuple containing a single CalipyDim object declaring which dim is to be subsampled

Returns:

A new CalipyDict with keys of self and corresponding values = value[…, indices, …] i.e. the values indexed by the indices in dimension dim.

Return type:

CalipyDict
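The slicing rule can be sketched with nested lists (a hypothetical stand-in; the real method slices CalipyTensors along a CalipyDim, here the first list axis plays that role):

```python
def subsample_sketch(data, indices):
    # Every value in the dict is sliced by the same integer indices
    # along its first axis, keeping the keys unchanged.
    return {k: [v[i] for i in indices] for k, v in data.items()}

sub = subsample_sketch({'x': [10, 20, 30], 'y': [1, 2, 3]}, [0, 2])
# sub == {'x': [10, 30], 'y': [1, 3]}
```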

property value: Any

If there’s exactly one item in this CalipyDict, return it. Otherwise, raise an error. This property allows single-output usage.

class calipy.data.CalipyIO(data=None, name='io_noname')

Bases: object

A data container that can store single or multiple dict like containers. Is meant as a convenient wrapper for homogeneous and inhomogeneous lists of data where each list element is a CalipyDict. CalipyIO is the standard input/output/observation format handled inside of the CalipyNode objects. Is typically autowrapped around lists of dictionaries or single objects provided by the user towards e.g. the forward() method. Has idempotent property and leaves CalipyIO objects unchanged; i.e. wrapping multiple times is equivalent to wrapping once. CalipyIO objects are also the output of iterating through InhomogeneousDataLoader objects; i.e. datasets and subbatched datasets are represented in this way.

Special access rules:
  • If calipy_io contains in its list a single dict, calipy_io.dict returns it

  • If calipy_io contains in its list a single dict, calipy_io[key] returns

    the corresponding value dict[key]

  • If calipy_io contains in its list a single dict and in that dict a single

    key, value pair, then calipy_io.value returns that value.

Parameters:

data (None, single object, dict, CalipyDict, DataTuple, list, CalipyList, or CalipyIO.) –

The data being used to construct the CalipyIO. Valid input types include:

  • None => empty CalipyIO

  • A dict {str -> item} or CalipyDict => CalipyIO wrapping that dict

  • A DataTuple => converted to dict, then wrapped

  • A list or CalipyList of dicts => CalipyIO wrapping multiple dicts

  • A single item => stored under the default key ‘__single__’

  • A CalipyIO => left unchanged

Returns:

An instance of CalipyIO

Return type:

CalipyIO

Example usage:

# Imports and definitions
import torch
from calipy.data import DataTuple, CalipyDict, CalipyList, CalipyIO
from calipy.tensor import CalipyTensor
from calipy.utils import dim_assignment


# Create data for CalipyList
calipy_list_empty = CalipyList()
calipy_list = CalipyList(data = ['a','b'])
calipy_same_list = CalipyList(calipy_list)


# Pass data into CalipyIO and investigate

# Legal input types are None, single object, dict, CalipyDict, DataTuple, list,
# CalipyList, CalipyIO. 

# Build inputs
none_input = None
single_input = torch.tensor([1.0])
dict_input = {'a': 1, 'b' : 2}
CalipyDict_input = CalipyDict(dict_input)
DataTuple_input = CalipyDict_input.as_datatuple()
list_input = [dict_input, {'c' : 3}, {'d' : 4}]
CalipyList_input = CalipyList(list_input)
CalipyIO_input = CalipyIO(dict_input)

# Build CalipyIO's
none_io = CalipyIO(none_input)
single_io = CalipyIO(single_input)
dict_io = CalipyIO(dict_input)
CalipyDict_io = CalipyIO(CalipyDict_input)
DataTuple_io = CalipyIO(DataTuple_input)
list_io = CalipyIO(list_input, name = 'io_from_list')
CalipyList_io = CalipyIO(CalipyList_input)
CalipyIO_io = CalipyIO(CalipyIO_input)


# Check properties
none_io.is_null
single_io.is_null
print(single_io)


# Functionality includes:
#   1. Iteration
#   2. Fetch by index
#   3. Associated CalipyIndex
#      -  Has global and local index
#   4. Comes with collate function

# 1. Iteration
# Proceed to investigate one of the built calipy_io objects, here list_io
for io in list_io:
    print(io)
    print(io.indexer.global_index)

# 2. Fetch by index
# Access values (special if list and dict only have 1 element)
single_io.dict
single_io.value
single_io.calipy_dict
single_io.calipy_list
single_io.data_tuple
single_io['__single__']
list_io[0]['a']


# 3. a) Associated Indexer
# Content of indexer
list_io.batch_dim_flattened
list_io.indexer
list_io.indexer.local_index
list_io_sub = list_io[1:2]
list_io_sub.indexer.data_source_name
list_io_sub.indexer.index_tensor_dims

# 3. b) Associated CalipyIndex
# Content of specific IOIndexer
list_io_sub.indexer.local_index.tuple
list_io_sub.indexer.local_index.tensor
list_io_sub.indexer.local_index.index_name_dict

list_io_sub.indexer.global_index.tuple
list_io_sub.indexer.global_index.tensor
list_io_sub.indexer.global_index.index_name_dict

# Iteration produces sub_io's
for io in list_io:
    print(io.indexer.global_index)
    print(io.indexer.global_index.tensor)

# 3. c) Index / IO interaction
# subsampling and indexing: via ints, tuples, slices, and CalipyIndex
sub_io_1 = list_io[0]
sub_io_2 = list_io[1]
sub_io_3 = list_io[1:3]

sub_io_1.indexer.local_index
sub_io_2.indexer.local_index
sub_io_3.indexer.local_index

sub_io_1.indexer.global_index
sub_io_2.indexer.global_index
sub_io_3.indexer.global_index

global_index_1 = sub_io_1.indexer.global_index
global_index_2 = sub_io_2.indexer.global_index
global_index_3 = sub_io_3.indexer.global_index

assert(list_io[global_index_1] == list_io[0])
assert(list_io[global_index_2] == list_io[1])
assert(list_io[global_index_3] == list_io[1:3])

# 4. Collate function
# Check collation functionality for autoreducing io s
mean_dims = dim_assignment(['bd_1', 'ed_1'])
var_dims = dim_assignment(['bd_1', 'ed_1'])

mean_1 = CalipyTensor(torch.randn(3, 2), mean_dims)
mean_2 = CalipyTensor(torch.randn(5, 2), mean_dims)
var_1 = CalipyTensor(torch.randn(3, 2), var_dims)
var_2 = CalipyTensor(torch.randn(5, 2), var_dims)

io_obj = CalipyIO([
    CalipyDict({'mean': mean_1, 'var': var_1}),
    CalipyDict({'mean': mean_2, 'var': var_2})
])

collated_io = io_obj.reduce_list()

# Rename all entries in the dicts in CalipyIO
rename_dict = {'a' : 'new_a', 'b' : 'new_b'}
renamed_io = list_io.rename_keys(rename_dict)

as_datatuple() → DataTuple

Convert this CalipyDict into a DataTuple for dimension-aware operations or other advanced uses.

property dict: Any

If there’s exactly one dict in this CalipyIO, return it. Otherwise, raise an error. This property allows single-output usage.

property has_single_item: bool
Returns:

True if exactly one item is in this io, else False.

property is_null

Indicate if CalipyIO only has one element and that one is trivial

property is_reducible
preprocess_for_node(nodestructure)
reduce_list()

Attempts to merge all CalipyDict elements in self.calipy_list into a single CalipyDict by concatenating tensors along the first dimension. This method succeeds only if all CalipyDict elements have exactly matching keys and tensor dimensions (excluding the first dimension).
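The merge rule can be sketched with lists standing in for tensors (a hypothetical helper; concatenation along the first dimension becomes plain list concatenation):

```python
def reduce_list_sketch(dict_list):
    # Succeeds only if all dicts share exactly the same keys, mirroring
    # the documented precondition; values are concatenated per key.
    keys = set(dict_list[0])
    if any(set(d) != keys for d in dict_list):
        raise ValueError("All dicts must have matching keys")
    return {k: [x for d in dict_list for x in d[k]] for k in keys}

merged = reduce_list_sketch([{'mean': [1, 2]}, {'mean': [3]}])
# merged == {'mean': [1, 2, 3]}
```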

rename_keys(rename_dict)

Renames current keys in all the dicts to the ones given by rename_dict[key].

Parameters:

rename_dict (dict) – Dictionary s.t. for each key in rename_dict, key is in self.calipy_list[k].keys() with rename_dict[key] being the string that is the key in the newly produced CalipyDict.

Returns:

CalipyIO with the same values but with changed keys.

Return type:

CalipyIO

property value: Any

If there’s exactly one dict in this CalipyIO and one entry in it return the entry. Otherwise, raise an error. This property allows single-output usage.

class calipy.data.CalipyList(data=None)

Bases: list

A list-like container that can store single or multiple dict like containers. Is meant as a convenient wrapper for homogeneous and inhomogeneous lists of data where each list element is a CalipyDict. CalipyList is a central element to the standard input/output/observation format handled inside of the CalipyNode objects. Is typically autowrapped around lists of dictionaries or single objects provided by the user towards e.g. the forward() method. Has idempotent property and leaves CalipyList objects unchanged; i.e. wrapping multiple times is equivalent to wrapping once. Ingredient to CalipyIO.

Parameters:

data (Any) –

The data being used to construct the CalipyList. Valid input types include:

  • A single item => CalipyList containing single item

  • A list => CalipyList containing list of objects

Returns:

An instance of CalipyList

Return type:

CalipyList
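The idempotent wrapping rule can be sketched in plain Python (a hypothetical stand-in, not the actual constructor):

```python
def wrap_as_list(data):
    # None -> empty list; an existing list passes through unchanged
    # (idempotence); any other single item is wrapped into a list.
    if data is None:
        return []
    if isinstance(data, list):
        return list(data)
    return [data]

# Wrapping twice is equivalent to wrapping once
assert wrap_as_list(wrap_as_list('a')) == wrap_as_list('a')
```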

property has_single_item: bool
Returns:

True if exactly one item is in this list, else False.

property is_null

Indicate if CalipyList only has one element and that one is trivial

property value: Any

If there’s exactly one item in this CalipyList, return it. Otherwise, raise an error. This property allows single-output usage.

class calipy.data.DataTuple(names, values)

Bases: object

Custom class for holding tuples of various objects with explicit names. Provides methods for easily distributing functions over the entries in the tuple and thereby makes modifying collections of objects easier. This is routinely used to perform actions on grouped observation tensors, batch_dims, or event_dims.

Parameters:
  • names (list of string) – A list of names serving as keys for the DataTuple.

  • values (list of obj) – A list of objects serving as values for the DataTuple.

Returns:

An instance of DataTuple containing the key, value pairs and additional attributes and methods.

Return type:

DataTuple

Example usage:

# Create DataTuple of tensors
names = ['tensor_A', 'tensor_B']
values = [torch.ones(2, 3), torch.ones(4, 5)]
data_tuple = DataTuple(names, values)
data_tuple['tensor_A']

# Apply functions
fun = lambda x: x +1
result_tuple_1 = data_tuple.apply_elementwise(fun)
print("Result of applying function:", result_tuple_1, result_tuple_1['tensor_A'], result_tuple_1['tensor_B'])
fun_dict = {'tensor_A': lambda x: x + 1, 'tensor_B': lambda x: x - 1}
result_tuple_2 = data_tuple.apply_from_dict(fun_dict)
print("Result of applying function dictionary:", result_tuple_2, result_tuple_2['tensor_A'], result_tuple_2['tensor_B'])


# Create DataTuple of dimensions
batch_dims_A = dim_assignment(dim_names = ['bd_A',])
event_dims_A = dim_assignment(dim_names = ['ed_A'])       
batch_dims_B = dim_assignment(dim_names = ['bd_B'])
event_dims_B = dim_assignment(dim_names = ['ed_B'])

batch_dims_tuple = DataTuple(names, [batch_dims_A, batch_dims_B])
event_dims_tuple = DataTuple(names, [event_dims_A, event_dims_B])

# Add them 
added_tensor_tuple = data_tuple + data_tuple
full_dims_datatuple = batch_dims_tuple + event_dims_tuple

# Construct indexer
data_tuple_cp = data_tuple.calipytensor_construct(full_dims_datatuple)
augmented_tensor = data_tuple_cp['tensor_A']
augmented_tensor.indexer.local_index

# Access subattributes
shapes_tuple = data_tuple.get_subattributes('shape')
print("Shapes of each tensor in DataTuple:", shapes_tuple)
batch_dims_tuple.get_subattributes('sizes')
batch_dims_tuple.get_subattributes('build_torchdims')

# Set new item
data_tuple['tensor_C'] = torch.ones([6,6])
print(data_tuple)

# Apply class over each element
class DifferentClass:
    def __init__(self, tensor):
        self.tensor = tensor

    def __repr__(self):
        return f"DifferentClass(tensor={self.tensor})"

different_tuple = data_tuple.apply_class(DifferentClass)
print("Result of applying DifferentClass to DataTuple:", different_tuple)


# DataTuple and CalipyTensor interact well: In the following we showcase
# that a DataTuple of CalipyTensors can be subsampled by providing a
# DataTuple of CalipyIndexes or a single CalipyIndex that is automatically
# distributed over the CalipyTensors for indexing.

# Set up DataTuple of CalipyTensors
batch_dims = dim_assignment(dim_names = ['bd_1'])
event_dims_A = dim_assignment(dim_names = ['ed_1_A', 'ed_2_A'])
data_dims_A = batch_dims + event_dims_A
event_dims_B = dim_assignment(dim_names = ['ed_1_B'])
data_dims_B = batch_dims + event_dims_B
data_A_torch = torch.normal(0,1,[6,4,2])
data_A_cp = CalipyTensor(data_A_torch, data_dims_A, 'data_A')
data_B_torch = torch.normal(0,1,[6,3])
data_B_cp = CalipyTensor(data_B_torch, data_dims_B, 'data_B')

data_AB_tuple = DataTuple(['data_A_cp', 'data_B_cp'], [data_A_cp, data_B_cp])

# subsample the data individually
data_AB_subindices = TensorIndexer.create_simple_subsample_indices(batch_dims[0], data_A_cp.shape[0], 5)
data_AB_subindex = data_AB_subindices[0]
data_A_subindex = data_AB_subindex.expand_to_dims(data_dims_A, data_A_cp.shape)
data_B_subindex = data_AB_subindex.expand_to_dims(data_dims_B, data_B_cp.shape)
data_AB_sub_1 = DataTuple(['data_A_cp_sub', 'data_B_cp_sub'], [data_A_cp[data_A_subindex], data_B_cp[data_B_subindex]])

# Use subsampling functionality for DataTuples, either by passing a DataTuple of
# CalipyIndex or a single CalipyIndex that is broadcasted
data_AB_subindex_tuple = DataTuple(['data_A_cp', 'data_B_cp'], [data_A_subindex, data_B_subindex])
data_AB_sub_2 = data_AB_tuple.subsample(data_AB_subindex_tuple)
data_AB_sub_3 = data_AB_tuple.subsample(data_AB_subindex)
assert ((data_AB_sub_1[0] - data_AB_sub_2[0]).tensor == 0).all()
assert ((data_AB_sub_2[0] - data_AB_sub_3[0]).tensor == 0).all()

apply_class(class_type)

Allows applying a class constructor to all elements in the DataTuple. For example, DifferentClass(data_tuple) will apply DifferentClass to each element in data_tuple.

Parameters:

class_type – The class constructor to be applied to each element.

Returns:

New DataTuple with the class constructor applied to each element.

apply_elementwise(function)

Returns a new DataTuple with keys = self._data_dict.keys() and associated values = function(self._data_dict.values())

apply_from_dict(fun_dict)

Applies functions from a dictionary to corresponding entries in the DataTuple. If a key in fun_dict matches a key in DataTuple, the function is applied.

Parameters:

fun_dict – Dictionary with keys corresponding to DataTuple keys and values as functions.

Returns:

New DataTuple with functions applied where specified.

as_dict()

Returns the underlying dictionary linking names and values

calipytensor_construct(dims_datatuple)

Applies construction of the TensorIndexer to build for each tensor in self the CalipyTensor construction used for indexing. Requires all elements of self to be tensors and requires dims_datatuple to be a DataTuple containing DimTuples.

Parameters:
  • self – A DataTuple containing indexable tensors to be indexed

  • dims_datatuple – A DataTuple containing the DimTuples used for indexing

Returns:

Nothing returned, calipy.indexer integrated into tensors in self

Return type:

DataTuple

Raises:

ValueError – If both DataTuples do not have matching keys.

get_subattributes(attr)

Allows direct access to attributes or methods of elements inside the DataTuple. For example, calling data_tuple.get_subattributes(‘shape’) will return a DataTuple containing the shapes of each tensor in data_tuple.

Parameters:

attr – The attribute or method name to be accessed.

Returns:

DataTuple containing the attribute values or method results for each element.

get_tensors()

Allows to extract .tensor attribute out of a DataTuple that contains CalipyTensors leaving other objects in the tuple unperturbed.

Returns:

DataTuple containing for each key, value pair either the tensor subattribute value.tensor or the original value.
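The extraction rule can be sketched with a minimal stand-in class (hypothetical; any object exposing a .tensor attribute is unwrapped, everything else passes through):

```python
class WrappedTensor:
    # Hypothetical stand-in for a CalipyTensor-like object with .tensor
    def __init__(self, tensor):
        self.tensor = tensor

def get_tensors_sketch(data):
    # Pull out .tensor where it exists, leave other values unperturbed.
    return {k: getattr(v, 'tensor', v) for k, v in data.items()}

out = get_tensors_sketch({'a': WrappedTensor(7), 'b': 'metadata'})
# out == {'a': 7, 'b': 'metadata'}
```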

items()
keys()
rename_keys(rename_dict)

Renames current keys to the ones given by rename_dict[key].

Parameters:

rename_dict (dict) – Dictionary s.t. for each key in rename_dict, key is in self.keys() with rename_dict[key] being the string that is the key in the newly produced DataTuple.

Returns:

DataTuple with the same values but with changed keys.

Return type:

DataTuple

subsample(datatuple_indices)

Subsamples a DataTuple containing CalipyTensors by applying to each of it the corresponding CalipyIndex object from the datatuple_indices. The arg datatuple_indices can also consist of just 1 entry of CalipyIndex that is then applied to all elements of self for subsampling. If a CalipyTensor in self does not feature the dim subsampled in CalipyIndex, then it is not subsampled

Parameters:

datatuple_indices (DataTuple or CalipyIndex) – The DataTuple containing the CalipyIndex objects or a single CalipyIndex object.

Returns:

A new DataTuple with each CalipyTensor subsampled by the indices.

Return type:

DataTuple

Raises:

ValueError – If both DataTuples do not have matching keys.

values()

calipy.data.io_collate(batch, reduce=False)

Custom collate function that collates CalipyIO objects together by concatenating their contained list elements into a longer list around which a new CalipyIO object is built. This new CalipyIO object contains a list of dicts. If reduce = True, an attempt is made to stack the list elements themselves (e.g. tensors along their first dimensions) to create a single dict containing stacked elements. Used primarily as a collate function for the DataLoader to perform automated subsampling.

Parameters:

batch (list of CalipyIO) – A list of CalipyIO objects containing info on input_vars, observations, and the corresponding index that was used to produce them via dataset.__getitem__(idx)

Returns:

An instance of CalipyIO, where multiple CalipyDict objects are collated together either into a list of dicts or into a single calipy_io containing stacked CalipyTensors.

Return type:

CalipyIO
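The default (reduce = False) behaviour can be sketched with plain lists of dicts (a hypothetical helper; the real function builds a new CalipyIO around the merged list):

```python
def io_collate_sketch(batch):
    # Concatenate each sample's list of dicts into one longer list;
    # with reduce = False no stacking of the dict values is attempted.
    merged = []
    for io in batch:
        merged.extend(io)
    return merged

collated = io_collate_sketch([[{'a': 1}], [{'b': 2}, {'c': 3}]])
# collated == [{'a': 1}, {'b': 2}, {'c': 3}]
```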

calipy.data.preprocess_args(input_vars, observations, subsample_index)

Function for preprocessing arguments to forward passes. Converts different forms of input to CalipyIO objects reflecting a standardized form of inputs and outputs. Typically just wraps input into CalipyIO.

Parameters:
  • input_vars (None, single object, Dict, CalipyDict, list, CalipyList, or CalipyIO containing CalipyTensors.) – Input input_vars to some .forward() method call. Specific contents depend on the node but typically None or a dict containing CalipyTensors with keys as specified in the nodes’ input_vars_schema

  • observations – Input observations to some .forward() method call. Specific contents depend on the node but typically a dict containing CalipyTensors with keys as specified in the nodes’ observation_schema

  • subsample_index – Input subsample_index to some .forward() method call. Specific contents depend on the node but typically None if no subsampling happens, or of type Dict containing CalipyIndex objects in case of subsampling. The keys are as specified in the nodes’ subsampling_schema

Returns:

A tuple containing instances of CalipyIO that represent input_vars, observations, subsample_index in a way that forward methods can handle them well and they are easily passable between nodes.

Return type:

tuple of CalipyIO

Example usage:

# Imports and definitions
import torch
from calipy.data import DataTuple, CalipyDict, CalipyList, CalipyIO
from calipy.tensor import CalipyTensor
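The wrapping behaviour can be sketched independently of calipy with plain containers (a hypothetical stand-in for the autowrapping into CalipyIO, not the actual implementation):

```python
def preprocess_args_sketch(input_vars, observations, subsample_index):
    # Each argument is standardized into a list-of-dicts form: None
    # becomes empty, dicts and lists are wrapped or passed through,
    # and a bare object lands under the default key '__single__'.
    def wrap(x):
        if x is None:
            return []
        if isinstance(x, dict):
            return [x]
        if isinstance(x, list):
            return x
        return [{'__single__': x}]
    return tuple(wrap(a) for a in (input_vars, observations, subsample_index))

result = preprocess_args_sketch(None, {'y': 1}, 5)
# result == ([], [{'y': 1}], [{'__single__': 5}])
```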