calipy.tensor  (API)

This module provides basic dimension-aware tensor functionality needed for subsampling, indexing, attaching dimensions to tensors, and other quality-of-life features that allow tensors to keep and communicate extra structure.

The classes and functions are:

CalipyIndex: Class of objects that can be used to index normal torch.Tensors and CalipyTensors or CalipyDistributions. Contains indextensors, tuples, dims, names of the elements in the tensor, and functionality for reducing and expanding indices.

CalipyIndexer: Abstract class of objects that forms the basis for TensorIndexer and DistributionIndexer and bundles indexing methods and attributes used by both of these classes.

TensorIndexer: Class responsible for indexing tensors. Is attached directly to CalipyTensors and takes over active duties like subsampling, keeping track of the origin of subsampled data, and creating CalipyIndex objects based on user demands.

The TensorIndexer and CalipyIndex classes provide basic functionality that is used regularly in the context of defining basic effects and enabling primitives that need to implement subsampling (like sampling or calling parameters).

The script is meant solely for educational and illustrative purposes. Written by Dr. Jemil Avers Butt, Atlas optimization GmbH, www.atlasoptimization.com.

class calipy.tensor.CalipyIndex(index_tensor, index_tensor_dims=None, name=None)

Bases: object

Class acting as a collection of information on a specific index tensor, collecting the basic index_tensor, index_tensor_tuple, and index_tensor_named representations.

  • index_tensor.tensor is the original index tensor

  • index_tensor.tuple can be used for indexing via data[tuple]

  • index_tensor.named can be used for dimension-specific operations

property bound_dims

Returns the dims of the subsample index (ssi) bound to the current actual sizes.

expand_to_dims(dims, dim_sizes)

Expand the current index to include additional dimensions.

Parameters:
  • dims – A DimTuple containing the target dimensions.

  • dim_sizes – A list containing sizes for the target dimensions.

Returns:

A new CalipyIndex instance with the expanded index tensor.
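The mechanics of this expansion can be sketched in plain torch, outside of calipy: a reduced index over a batch dimension is paired with every position of an additional event dimension by broadcasting (dimension names and sizes here are illustrative, not part of the calipy API):

```python
import torch

# Data with a batch dim of size 6 and an event dim of size 3
data = torch.randn(6, 3)
# Reduced index covering only the batch dim
batch_idx = torch.tensor([0, 2, 5])

# Expansion: pair every batch index with every event index
bb = batch_idx.unsqueeze(-1).expand(3, 3)       # batch indices, shape [3, 3]
ee = torch.arange(3).unsqueeze(0).expand(3, 3)  # event indices, shape [3, 3]
index_expanded = torch.stack([bb, ee], dim=-1)  # index tensor, shape [3, 3, 2]

# Indexing with the expanded index selects whole batch rows
subsample = data[index_expanded.unbind(-1)]
assert (subsample == data[batch_idx, :]).all()
```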

generate_index_name_dict()

Generate a dictionary that maps indices to unique names.

Parameters:

global_index – CalipyIndex containing global index tensor.

Returns:

Dict mapping each element of the index tensor to a unique name.

property is_empty

Indicates that no indexing is actually performed, e.g. because the indexed quantity is a scalar and does not have dims. is_empty is passed to the CalipyTensor __getitem__ method to check if the full tensor is to be returned.

property is_null

Indicates that the index is null and will not perform any subsampling, just returning the original tensor.

is_reducible(dims_to_keep)

Determine if the index is reducible to the specified dimensions without loss.

Parameters:

dims_to_keep – DimTuple of CalipyDims to keep.

Returns:

True if reducible without loss, False otherwise.

classmethod null()

Construct a null index object.

reduce_to_dims(dims_to_keep)

Reduce the current index to cover some subset of dimensions.

Parameters:

dims_to_keep – A DimTuple containing the target dimensions.

Returns:

A new CalipyIndex instance with the reduced index tensor.
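Conversely, an index that is a cartesian product over its dims can be reduced without loss by dropping the replicated axes. A plain-torch sketch of this mechanic (dimension names and sizes are illustrative):

```python
import torch

# Expanded index over dims (bd, ed): the bd entry does not vary along ed,
# so the index is a cartesian product and reducible to a pure bd index.
batch_idx = torch.tensor([0, 2, 5])
bb = batch_idx.unsqueeze(-1).expand(3, 4)
ee = torch.arange(4).unsqueeze(0).expand(3, 4)
index_expanded = torch.stack([bb, ee], dim=-1)  # shape [3, 4, 2]

# Reduction: keep only the bd column and drop the replicated ed axis
index_reduced = index_expanded[:, 0, 0:1]       # shape [3, 1]
assert (index_reduced.squeeze(-1) == batch_idx).all()
```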

class calipy.tensor.CalipyIndexer(dims, name=None)

Bases: object

Base class of an Indexer that implements methods for assigning dimensions to specific slices of e.g. tensors or distributions. The methods and attributes are rarely called directly; user interaction happens mostly with the subclasses TensorIndexer and DistIndexer. Within these, functionality for subsampling, batching, name generation etc. is concretized. See those classes for examples and specific implementation details.

classmethod convert_indextensor_to_tuple(indextensor)

Converts an indextensor to an indextuple so that their actions for indexing are equivalent in the sense that indexed tensor has entries from tensor indexed by indextensor values: tensor[indextuple] = tensor[indextensor.unbind(-1)].

Parameters:

indextensor – An indextensor containing an index at each location of value in tensor

Returns:

An index tuple that can be used to index tensors
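The stated identity tensor[indextuple] = tensor[indextensor.unbind(-1)] can be checked directly in plain torch:

```python
import torch

# A tensor and an indextensor enumerating every position in it
tensor = torch.arange(12).reshape(3, 4)
ii, jj = torch.meshgrid(torch.arange(3), torch.arange(4), indexing='ij')
indextensor = torch.stack([ii, jj], dim=-1)  # shape [3, 4, 2]

# unbind(-1) yields the tuple form; indexing with it reproduces the tensor
indextuple = indextensor.unbind(-1)
assert (tensor[indextuple] == tensor).all()
```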

classmethod convert_slice_to_indextensor(indexslice_list)

Converts an indexslice_list to an indextensor so that their actions for indexing are equivalent in the sense that sliced tensor has entries from tensor indexed by indextensor values: tensor[indexslice_list] = tensor[indextensor.unbind(-1)].

Parameters:

indexslice_list – A list of index slices that can be used to index tensors

Returns:

An indextensor containing an index at each location of value in tensor

classmethod convert_tuple_to_indextensor(indextuple)

Converts an indextuple to an indextensor so that their actions for indexing are equivalent in the sense that the indexed tensor has entries from tensor indexed by indextensor values: tensor[indextuple] = tensor[indextensor.unbind(-1)].

Parameters:

indextuple – An index tuple that can be used to index tensors

Returns:

An indextensor containing an index at each location of value in tensor

class calipy.tensor.CalipyTensor(tensor, dims=None, name='tensor_noname')

Bases: object

Class that wraps torch.Tensor objects and augments them with indexing operations and dimension upkeep functionality, while referring most torch functions to its wrapped torch.Tensor object. Can be sliced and indexed in the usual ways, which produces another CalipyTensor whose indexer is inherited.

Special creation rules for calipy tensors:
  1. If tensor is None, dims must be None. Produces a null object.

  2. If tensor exists and dims are None, produces a calipy tensor with generic dims.

  3. If a calipy tensor is passed as input, produces the same calipy tensor.

  4. If a calipy tensor is passed as input together with dims, produces a new calipy tensor with the new dims.

  5. If tensor exists and dims exist, produces a regular calipy tensor.

Parameters:
  • tensor (torch.Tensor) – The tensor which should be embedded into CalipyTensor

  • dims (DimTuple) – A DimTuple containing the dimensions of the tensor or None

  • name (string) – A name for the CalipyTensor, useful for keeping track of derived CalipyTensors. Default is 'tensor_noname'.

Returns:

An instance of CalipyTensor containing functionality for dimension upkeep, indexing, and function call referral.

Return type:

CalipyTensor

Example usage:

# Imports and definitions
import torch
from calipy.tensor import CalipyTensor, TensorIndexer, CalipyIndex
from calipy.utils import dim_assignment
from calipy.data import DataTuple

# Create CalipyTensors -----------------------------------------------
#
# Create DimTuples and tensors
data_A_torch = torch.normal(0,1,[6,4,2])
batch_dims_A = dim_assignment(dim_names = ['bd_1_A', 'bd_2_A'])
event_dims_A = dim_assignment(dim_names = ['ed_1_A'])
data_dims_A = batch_dims_A + event_dims_A
data_A_cp = CalipyTensor(data_A_torch, data_dims_A, name = 'data_A')

# Confirm that subsampling works as intended
subtensor_1 = data_A_cp[0,0:3,...]  # identical to next line
subtensor_1 = data_A_cp[0:1,0:3,...]
subtensor_1.dims == data_A_cp.dims
assert((subtensor_1.tensor - data_A_cp.tensor[0:1,0:3,...] == 0).all())
# subsample has global_index that can be used for subsampling on tensors
# and on CalipyTensors
assert((data_A_cp.tensor[subtensor_1.indexer.global_index.tuple] 
        - data_A_cp.tensor[0:1,0:3,...] == 0).all())
assert(((data_A_cp[subtensor_1.indexer.global_index] 
        - data_A_cp[0:1,0:3,...]).tensor == 0).all())

# When using an integer, dims are kept; i.e. singleton dims are not reduced
subtensor_2 = data_A_cp[0,0:3,...]
assert((subtensor_2.tensor == data_A_cp.tensor[0,0:3,...].unsqueeze(0)).all())

# Indexing of CalipyTensors via int, tuple, slice, and CalipyIndex
data_A_cp[0,:]
local_index = data_A_cp.indexer.local_index
data_A_cp[local_index]
# During addressing, appropriate indexers are built
data_A_cp[0,:].indexer.global_index
data_A_cp[local_index].indexer.global_index

# CalipyTensors work well even when some dims are empty
# Set up data and dimensions
data_0dim = torch.ones([])
data_1dim = torch.ones([5])
data_2dim = torch.ones([5,2])

batch_dim = dim_assignment(['bd'])
event_dim = dim_assignment(['ed'])
empty_dim = dim_assignment(['empty'], dim_sizes = [])

data_0dim_cp = CalipyTensor(data_0dim, empty_dim)
data_1dim_cp = CalipyTensor(data_1dim, batch_dim)
data_1dim_cp = CalipyTensor(data_1dim, batch_dim + empty_dim)
data_1dim_cp = CalipyTensor(data_1dim, empty_dim + batch_dim + empty_dim)

data_2dim_cp = CalipyTensor(data_2dim, batch_dim + event_dim)
data_2dim_cp = CalipyTensor(data_2dim, batch_dim + empty_dim + event_dim)

# Indexing a scalar with an empty index just returns the scalar
data_0dim_cp.indexer
zerodim_index = data_0dim_cp.indexer.local_index
zerodim_index.is_empty
data_0dim_cp[zerodim_index]

# These produce errors or warnings, as they should:
# data_0dim_cp = CalipyTensor(data_0dim, batch_dim) # Trying to assign nonempty dim to scalar
# data_1dim_cp = CalipyTensor(data_1dim, empty_dim) # Trying to assign empty dim to vector
# data_2dim_cp = CalipyTensor(data_2dim, batch_dim + empty_dim) # Trying to assign empty dim to matrix


# CalipyTensor / DataTuple interaction ---------------------------------
#
# DataTuple and CalipyTensor interact well: In the following we showcase
# that a DataTuple of CalipyTensors can be subsampled by providing a
# DataTuple of CalipyIndexes or a single CalipyIndex that is automatically
# distributed over the CalipyTensors for indexing.

# Set up DataTuple of CalipyTensors
batch_dims = dim_assignment(dim_names = ['bd_1'])
event_dims_A = dim_assignment(dim_names = ['ed_1_A', 'ed_2_A'])
data_dims_A = batch_dims + event_dims_A
event_dims_B = dim_assignment(dim_names = ['ed_1_B'])
data_dims_B = batch_dims + event_dims_B
data_A_torch = torch.normal(0,1,[6,4,2])
data_A_cp = CalipyTensor(data_A_torch, data_dims_A, 'data_A')
data_B_torch = torch.normal(0,1,[6,3])
data_B_cp = CalipyTensor(data_B_torch, data_dims_B, 'data_B')

data_AB_tuple = DataTuple(['data_A_cp', 'data_B_cp'], [data_A_cp, data_B_cp])


# Subsampling functionality -------------------------------------------
#
# subsample the data individually
data_AB_subindices = TensorIndexer.create_simple_subsample_indices(batch_dims[0], data_A_cp.shape[0], 5)
data_AB_subindex = data_AB_subindices[0]
data_A_subindex = data_AB_subindex.expand_to_dims(data_dims_A, data_A_cp.shape)
data_B_subindex = data_AB_subindex.expand_to_dims(data_dims_B, data_B_cp.shape)
data_AB_sub_1 = DataTuple(['data_A_cp_sub', 'data_B_cp_sub'], [data_A_cp[data_A_subindex], data_B_cp[data_B_subindex]])

# Use subsampling functionality for DataTuples, either by passing a DataTuple of
# CalipyIndex or a single CalipyIndex that is broadcasted
data_AB_subindex_tuple = DataTuple(['data_A_cp', 'data_B_cp'], [data_A_subindex, data_B_subindex])
data_AB_sub_2 = data_AB_tuple.subsample(data_AB_subindex_tuple)
data_AB_sub_3 = data_AB_tuple.subsample(data_AB_subindex)
assert ((data_AB_sub_1[0] - data_AB_sub_2[0]).tensor == 0).all()
assert ((data_AB_sub_2[0] - data_AB_sub_3[0]).tensor == 0).all()


# Expansion and reordering -------------------------------------------
#
# Expand a tensor by copying it among some dimensions.
data_dims_A = data_dims_A.bind([6,4,2])
data_dims_B = data_dims_B.bind([6,3])
data_dims_expanded = data_dims_A + data_dims_B[1:]
data_A_expanded_cp = data_A_cp.expand_to_dims(data_dims_expanded)
assert((data_A_expanded_cp[:,:,:,0].tensor.squeeze() - data_A_cp.tensor == 0).all())
# Ordering of dims is also ordering of result
data_dims_expanded_reordered = data_dims_A[1:] + data_dims_A[0:1] + data_dims_B[1:]
data_A_expanded_reordered_cp = data_A_cp.expand_to_dims(data_dims_expanded_reordered)
assert((data_A_expanded_reordered_cp.tensor -
        data_A_expanded_cp.tensor.permute([1,2,0,3]) == 0).all())

# There also exists a CalipyTensor.reorder(dims) method
data_dims_A_reordered = event_dims_A + batch_dims
data_A_reordered_cp = data_A_cp.reorder(data_dims_A_reordered)
assert((data_A_reordered_cp.tensor - data_A_cp.tensor.permute([1,2,0]) == 0).all())
assert(data_A_reordered_cp.dims == data_dims_A_reordered)


# Null object functionality -------------------------------------------

# CalipyTensors and CalipyIndex also work with None inputs to produce Null objects
# Create data for initialization
tensor_dims = dim_assignment(['bd', 'ed'])
tensor_cp = CalipyTensor(torch.ones(6, 3), tensor_dims) 
tensor_none = None

index_full = tensor_cp.indexer.local_index
index_none = None

# ii) Create and investigate null CalipyIndex
CI_none = CalipyIndex(None)
print(CI_none)
CI_expanded = CI_none.expand_to_dims(tensor_dims, [5,2])

# Passing a null index to CalipyTensor returns the original tensor.
tensor_cp[CI_none]
tensor_cp[CI_expanded]
# The following errors out, as intended: 
#   CalipyIndex(torch.ones([1]), index_tensor_dims = None)

# iii) Create and investigate null CalipyTensor
CT_none = CalipyTensor(None)
CT_none
CT_none[CI_none] 
CT_none[CI_expanded]

tensor_dims_bound = tensor_dims.bind(tensor_cp.shape)
CT_expanded = CT_none.expand_to_dims(tensor_dims_bound)
# The following errors out, as intended: 
#   CalipyIndex(torch.ones([1]), index_tensor_dims = None)


# Special creation rules for calipy tensors ---------------------------
#   i) If tensor is None, dims must be None. Produces a null object
#   ii) If tensor exists and dims are None, produces a calipy tensor with generic dims
#   iii) If a calipy tensor is passed as input, produces the same calipy tensor
#   iv) If a calipy tensor is passed as input together with dims, produces a new calipy tensor with the new dims

tensor_A = torch.ones([5,2])
dims_A = dim_assignment(['bd', 'ed'])
dims_A_alt = dim_assignment(['bd_alt', 'ed_alt'])
tensor_A_cp = CalipyTensor(tensor_A, dims_A)

tensor_cp_None = CalipyTensor(None)
tensor_cp_default = CalipyTensor(tensor_A)
tensor_cp_idempotent = CalipyTensor(tensor_A_cp)
tensor_cp_alt = CalipyTensor(tensor_A_cp, dims_A_alt)
print(tensor_cp_alt)
property T
expand_to_dims(dims)

Expands the current CalipyTensor to another CalipyTensor with the dims specified in the argument dims. Returns a CalipyTensor with those dims that consists of copies of self where expansion is necessary.

Parameters:

dims (DimTuple) – A DimTuple instance that contains the dims of the current CalipyTensor and prescribes the dims of the expanded CalipyTensor

Returns:

An instance of CalipyTensor expanded to match dims.

Return type:

CalipyTensor

Example usage:

# Create DimTuples and tensors
data_torch = torch.normal(0,1,[10,3])
batch_dims = dim_assignment(dim_names = ['bd_1'], dim_sizes = [10])
event_dims = dim_assignment(dim_names = ['ed_1'], dim_sizes = [3])
data_dims = batch_dims + event_dims
data_cp = CalipyTensor(data_torch, data_dims, name = 'data')

batch_dims_expanded = dim_assignment(dim_names = ['bd_1', 'bd_2'], dim_sizes = [10,5])
data_dims_expanded = batch_dims_expanded + event_dims
data_expanded_cp = data_cp.expand_to_dims(data_dims_expanded)
assert((data_expanded_cp[:,0,:].tensor.squeeze() - data_cp.tensor == 0).all())
flatten(dims_to_flatten, name_flat_dim='flat_dim')

Flattens the current CalipyTensor to another CalipyTensor by collapsing the dims dims_to_flatten into a new dim with name name_flat_dim. The procedure consists of reordering followed by reshaping and repackaging into a new CalipyTensor.

Parameters:
  • dims_to_flatten (DimTuple) – A DimTuple instance that contains some dims of the current CalipyTensor that are to be flattened.

  • name_flat_dim – A string providing the name of the flattened dim that will replace the collapsed dims.

Returns:

An instance of CalipyTensor reordered to match dims.

Return type:

CalipyTensor

Example usage:

# Imports and definitions
import torch
from calipy.utils import dim_assignment
from calipy.tensor import CalipyTensor

# Create DimTuples and tensors
data_torch = torch.normal(0,1,[10,5,3])
batch_dims = dim_assignment(dim_names = ['bd_1', 'bd_2'], dim_sizes = [10,5])
event_dims = dim_assignment(dim_names = ['ed_1'], dim_sizes = [3])
data_dims = batch_dims + event_dims
data_cp = CalipyTensor(data_torch, data_dims, name = 'data')

data_flattened_cp = data_cp.flatten(batch_dims)
assert((data_flattened_cp.tensor - data_cp.tensor.reshape([50,3]) == 0).all())
get_element(dims, indices)

Access the element of self located at positions indices in dimensions dims and return it. Returns a CalipyTensor with the same dims as self.

Parameters:
  • dims (DimTuple) – A DimTuple instance that represents the dims in which indexing should be performed

  • indices (list of int) – A list of integers specifying where to access the corresponding dimensions. The match between dims and indices is done via ordering.

Returns:

An instance of CalipyTensor containing the single element of self at [dims = indices, …].

Return type:

CalipyTensor

Example usage:

# Imports and definitions
import torch
from calipy.tensor import CalipyTensor
from calipy.utils import dim_assignment

# Create DimTuples and tensors
data_torch = torch.normal(0,1,[10,5,3])
batch_dims = dim_assignment(dim_names = ['bd_1', 'bd_2'], dim_sizes = [10,5])
event_dims = dim_assignment(dim_names = ['ed_1'], dim_sizes = [3])
data_dims = batch_dims + event_dims
data_cp = CalipyTensor(data_torch, data_dims, name = 'data')

# Access the single element where batch_dim 'bd_1' has the value 5
data_cp_element_1 = data_cp.get_element(batch_dims[0:1], [5])
assert((data_cp_element_1.tensor.squeeze() - data_cp.tensor[5,...] == 0).all())

# Access the single element where batch_dims has the value [5,2]
data_cp_element_2 = data_cp.get_element(batch_dims, [5,2])
assert((data_cp_element_2.tensor.squeeze() - data_cp.tensor[5,2,...] == 0).all())
property is_null

Indicates if the CalipyTensor is null, i.e. contains no underlying tensor data.

reorder(dims)

Reorders the current CalipyTensor to another CalipyTensor with dims ordered as specified in argument dims.

Parameters:

dims (DimTuple) – A DimTuple instance that contains the dims of the current CalipyTensor and prescribes the dims of the reordered CalipyTensor

Returns:

An instance of CalipyTensor reordered to match dims.

Return type:

CalipyTensor

Example usage:

# Create DimTuples and tensors
data_torch = torch.normal(0,1,[10,5,3])
batch_dims = dim_assignment(dim_names = ['bd_1', 'bd_2'], dim_sizes = [10,5])
event_dims = dim_assignment(dim_names = ['ed_1'], dim_sizes = [3])
data_dims = batch_dims + event_dims
data_cp = CalipyTensor(data_torch, data_dims, name = 'data')

data_dims_reordered = event_dims + batch_dims
data_reordered_cp = data_cp.reorder(data_dims_reordered)
assert((data_reordered_cp.tensor - data_cp.tensor.permute([2,0,1]) == 0).all())
assert(data_reordered_cp.dims == data_dims_reordered)
class calipy.tensor.IOIndexer(calipy_io, dims, name=None)

Bases: CalipyIndexer

Class to handle indexing operations for CalipyIO objects, including creating local and global indices, managing subsampling, and generating named dictionaries for indexing purposes. Takes as input a CalipyIO object and a DimTuple object and creates a CalipyIndexer object that can be used to produce indices, bind dimensions, order the calipy_io, and provide other similar support functionality. Indexing is performed over the calipy_list elements.

Parameters:
  • calipy_io (CalipyIO) – The CalipyIO object for which the indexer is to be constructed

  • dims (DimTuple) – A DimTuple containing the dimensions of the tensor

  • name (string) – A name for the indexer, useful for keeping track of subservient indexers. Default is None.

Returns:

An instance of IOIndexer containing functionality for indexing the input CalipyIO object including subbatching, naming, and index tensors.

Return type:

IOIndexer

Example usage:

# Create DimTuples and tensors
data_A_torch = torch.normal(0,1,[6,4,2])
batch_dims_A = dim_assignment(dim_names = ['bd_1_A', 'bd_2_A'])
create_global_index(subsample_indextensor=None, data_source_name=None)

Create a global CalipyIndex object enumerating all possible indices for all the dims. The indices global_index_tensor are chosen such that they can be used to access the data in data_source with name data_source_name via self.tensor = data_source[global_index_tensor_tuple]

Parameters:
  • subsample_indextensor – An index tensor that enumerates for all the entries of self.tensor which index needs to be used to access it in some global dataset.

  • data_source_name – A string serving as info to record which object the global indices are indexing.

Returns:

A CalipyIndex object global index containing indexing data that describes how the tensor is related to the superpopulation it has been sampled from.

create_local_index()

Create a local index tensor enumerating all indices for the list calipy_list inside of the CalipyIO object.

Returns:

Returns CalipyIndex containing torch tensors with indices representing all list indices.

simple_subsample(batch_dim, subsample_size)

Generate indices for subbatching across a single batch dimension and extract the subbatches.

Parameters:
  • batch_dim – Element of DimTuple (typically CalipyDim) along which subbatching happens.

  • subsample_size – Single size determining length of batches to create.

Returns:

List of tensors and CalipyIndex representing the subbatches.
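The underlying subsampling step amounts to drawing a random subset of indices along a single dimension. A plain-torch sketch of this mechanic (sizes are illustrative):

```python
import torch

data = torch.randn(6, 4)
subsample_size = 3

# Draw a random subset of indices along the batch dimension (dim 0)
perm = torch.randperm(data.shape[0])[:subsample_size]
subsample = data[perm, :]
assert subsample.shape == (subsample_size, 4)
```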

class calipy.tensor.TensorIndexer(tensor, dims, name=None)

Bases: CalipyIndexer

Class to handle indexing operations for observations, including creating local and global indices, managing subsampling, and generating named dictionaries for indexing purposes. Takes as input a tensor and a DimTuple object and creates a CalipyIndexer object that can be used to produce indices, bind dimensions, order the tensor, and provide other similar support functionality.

Parameters:
  • tensor (torch.Tensor) – The tensor for which the indexer is to be constructed

  • dims (DimTuple) – A DimTuple containing the dimensions of the tensor

  • name (string) – A name for the indexer, useful for keeping track of subservient indexers. Default is None.

Returns:

An instance of TensorIndexer containing functionality for indexing the input tensor including subbatching, naming, index tensors.

Return type:

TensorIndexer

Example usage:

# Imports and definitions
import copy
import torch
from calipy.tensor import CalipyTensor, TensorIndexer, CalipyIndex
from calipy.utils import dim_assignment, DimTuple  # DimTuple import path assumed
from calipy.data import DataTuple

# Create DimTuples and tensors
data_A_torch = torch.normal(0,1,[6,4,2])
batch_dims_A = dim_assignment(dim_names = ['bd_1_A', 'bd_2_A'])
event_dims_A = dim_assignment(dim_names = ['ed_1_A'])
data_dims_A = batch_dims_A + event_dims_A


# Evoke indexer
data_A = CalipyTensor(data_A_torch, data_dims_A, 'data_A')
indexer = data_A.indexer
print(indexer)

# Indexer contains the tensor, its dims, and bound tensor
indexer.tensor
indexer.tensor_dims
indexer.tensor_dims.__class__
indexer.tensor_dims.sizes
indexer.tensor_torchdims
indexer.tensor_torchdims.__class__
indexer.tensor_torchdims.sizes
indexer.tensor_named
indexer.index_dim
indexer.index_tensor_dims

# Functionality indexer
attr_list = [attr for attr in dir(indexer) if '__' not in attr]
print(attr_list)

# Functionality index
local_index = data_A.indexer.local_index
local_index
local_index.dims
local_index.tensor.shape
local_index.index_name_dict
assert (data_A.tensor[local_index.tuple] == data_A.tensor).all()
assert ((data_A[local_index] - data_A).tensor == 0).all()


# Reordering and indexing by DimTuple
reordered_dims = DimTuple((data_dims_A[1], data_dims_A[2], data_dims_A[0]))
data_A_reordered = data_A.indexer.reorder(reordered_dims)
data_tdims_A = data_dims_A.build_torchdims()
data_tdims_A_reordered = data_tdims_A[reordered_dims]
data_A_named_tensor = data_A.tensor[data_tdims_A]
data_A_named_tensor_reordered = data_A_reordered.tensor[data_tdims_A_reordered]
assert (data_A_named_tensor.order(*data_tdims_A) == data_A_named_tensor_reordered.order(*data_tdims_A)).all()

# Subbatching along one or multiple dims
subsamples, subsample_indices = data_A.indexer.simple_subsample(batch_dims_A[0], 5)
print('Shape subsamples = {}'.format([subsample.shape for subsample in subsamples]))
block_batch_dims_A = batch_dims_A
block_subsample_sizes_A = [5,3]
block_subsamples, block_subsample_indices = data_A.indexer.block_subsample(block_batch_dims_A, block_subsample_sizes_A)
print('Shape block subsamples = {}'.format([subsample.shape for subsample in block_subsamples]))

# Inheritance - by construction
# Suppose we got data_C as a subset of data_B with derived ssi CalipyIndex and
# now want to index data_C with proper names and references
#   1. generate data_B
batch_dims_B = dim_assignment(['bd_1_B', 'bd_2_B'])
event_dims_B = dim_assignment(['ed_1_B'])
data_dims_B = batch_dims_B + event_dims_B
data_B_torch = torch.normal(0,1,[7,5,2])
data_B = CalipyTensor(data_B_torch, data_dims_B, 'data_B')

#   2. subsample data_C from data_B
block_data_C, block_indices_C = data_B.indexer.block_subsample(batch_dims_B, [5,3])
block_nr = 3
data_C = block_data_C[block_nr]
block_index_C = block_indices_C[block_nr]

#   3. subsampling has created an indexer for data_C
data_C.indexer
data_C.indexer.local_index
data_C.indexer.global_index
data_C.indexer.local_index.tensor
data_C.indexer.global_index.tensor
data_C.indexer.global_index.index_name_dict
data_C.indexer.data_source_name

data_C_local_index = data_C.indexer.local_index
data_C_global_index = data_C.indexer.global_index
assert (data_C.tensor[data_C_local_index.tuple] == data_B.tensor[data_C_global_index.tuple]).all()
assert ((data_C[data_C_local_index] - data_B[data_C_global_index]).tensor == 0).all()

# Inheritance - by declaration
# If data comes out of some external subsampling and only the corresponding indextensors
# are known, the calipy_indexer can be evoked manually.
data_D_torch = copy.copy(data_C.tensor)
index_tensor_D = block_index_C.tensor

data_D = CalipyTensor(data_D_torch, data_dims_B, 'data_D')
data_D.indexer.create_global_index(index_tensor_D, 'from_data_D')
data_D_global_index = data_D.indexer.global_index

assert (data_D.tensor == data_B.tensor[data_D_global_index.tuple]).all()
assert ((data_D - data_B[data_D_global_index]).tensor == 0).all()

# Alternative way of calling via DataTuples
data_E_torch = torch.normal(0,1,[5,3])
batch_dims_E = dim_assignment(dim_names = ['bd_1_E'])
event_dims_E = dim_assignment(dim_names = ['ed_1_E'])
data_dims_E = batch_dims_E + event_dims_E

data_names_list = ['data_A', 'data_E']
data_list = [data_A_torch, data_E_torch]
data_datatuple_torch = DataTuple(data_names_list, data_list)

batch_dims_datatuple = DataTuple(data_names_list, [batch_dims_A, batch_dims_E])
event_dims_datatuple = DataTuple(data_names_list, [event_dims_A, event_dims_E])
data_dims_datatuple = batch_dims_datatuple + event_dims_datatuple

data_datatuple = data_datatuple_torch.calipytensor_construct(data_dims_datatuple)
data_datatuple['data_A'].indexer


# Functionality for creating indices with TensorIndexer class methods
# It is possible to create subsample_indices even when no tensor is given
# simply by calling the class method TensorIndexer.create_block_subsample_indices
# or TensorIndexer.create_simple_subsample_indices and providing the 
# appropriate size specifications.         
# i) Create the dims (with unspecified size so no conflict later when subbatching)
batch_dims_FG = dim_assignment(['bd_1_FG', 'bd_2_FG'])
event_dims_F = dim_assignment(['ed_1_F', 'ed_2_F'])
event_dims_G = dim_assignment(['ed_1_G'])
data_dims_F = batch_dims_FG + event_dims_F
data_dims_G = batch_dims_FG + event_dims_G

# ii) Sizes
batch_dims_FG_sizes = [10,7]
event_dims_F_sizes = [6,5]
event_dims_G_sizes = [4]
data_dims_F_sizes = batch_dims_FG_sizes + event_dims_F_sizes
data_dims_G_sizes = batch_dims_FG_sizes + event_dims_G_sizes

# iii) Then create the data
data_F_torch = torch.normal(0,1, data_dims_F_sizes)
data_F = CalipyTensor(data_F_torch, data_dims_F, 'data_F')
data_G_torch = torch.normal(0,1, data_dims_G_sizes)
data_G = CalipyTensor(data_G_torch, data_dims_G, 'data_G')

# iv) Create and expand the reduced_index
indices_reduced = TensorIndexer.create_block_subsample_indices(batch_dims_FG, batch_dims_FG_sizes, [9,5])
index_reduced = indices_reduced[0]

# Functionality for expanding, reducing, and reordering indices
# Indices like the ones above can be used flexibly by expanding them to
# fit tensors with various dimensions. They can also be changed w.r.t 
# their order.

# i) Expand index to fit data_F and data_G
index_expanded_F = index_reduced.expand_to_dims(data_dims_F, [None]*len(batch_dims_FG) + event_dims_F_sizes)
index_expanded_G = index_reduced.expand_to_dims(data_dims_G, [None]*len(batch_dims_FG) + event_dims_G_sizes)
assert (data_F.tensor[index_expanded_F.tuple] == data_F.tensor[index_reduced.tensor[:,:,0], index_reduced.tensor[:,:,1], :,:]).all()
assert ((data_F[index_expanded_F] - data_F[index_reduced.tensor[:,:,0], index_reduced.tensor[:,:,1], :,:]).tensor ==0).all()

# ii) Reordering is done by passing in a differently ordered DimTuple
data_dims_F_reordered = dim_assignment(['ed_2_F', 'bd_2_FG', 'ed_1_F', 'bd_1_FG'])
data_dims_F_reordered_sizes = [5, None, 6, None]
index_expanded_F_reordered = index_reduced.expand_to_dims(data_dims_F_reordered, data_dims_F_reordered_sizes)
data_F_reordered = data_F.indexer.reorder(data_dims_F_reordered)
data_F_subsample = data_F[index_expanded_F]
data_F_reordered_subsample = data_F_reordered[index_expanded_F_reordered]
assert (data_F_subsample.tensor == data_F_reordered_subsample.tensor.permute([3,1,2,0])).all()

# iii) Index expansion can also be performed by the indexer of a tensor;
# this is usually more convenient
index_expanded_F_alt = data_F.indexer.expand_index(index_reduced)
index_expanded_G_alt = data_G.indexer.expand_index(index_reduced)
data_F_subsample_alt = data_F[index_expanded_F_alt.tuple]
data_G_subsample_alt = data_G[index_expanded_G_alt.tuple]
assert (data_F_subsample.tensor == data_F_subsample_alt.tensor).all()
assert ((data_F_subsample - data_F_subsample_alt).tensor == 0).all()

# Inverse operation is index_reduction (only possible when index is cartesian product)
assert (index_expanded_F.is_reducible(batch_dims_FG))
assert (index_reduced.tensor == index_expanded_F.reduce_to_dims(batch_dims_FG).tensor).all()
assert (index_reduced.tensor == index_expanded_G.reduce_to_dims(batch_dims_FG).tensor).all()

# Illustrate nonseparable case
inseparable_index = CalipyIndex(torch.randint(10, [10,7,6,5,4]), data_dims_F)
inseparable_index.is_reducible(batch_dims_FG)
inseparable_index.reduce_to_dims(batch_dims_FG) # Produces a warning as it should
block_subsample(batch_dims, subsample_sizes)

Generate indices for block subbatching across multiple batch dimensions and extract the subbatches.

Parameters:
  • batch_dims – DimTuple with dims along which subbatching happens

  • subsample_sizes – Tuple with sizes of the blocks to create.

Returns:

List of tensors and CalipyIndex representing the block subbatches.
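Block subbatching along a single dimension amounts to partitioning shuffled indices into blocks. A plain-torch sketch of this mechanic for one batch dim (sizes are illustrative):

```python
import torch

data = torch.randn(6, 4, 2)

# Partition shuffled batch indices into blocks of size 5;
# the last block may be shorter
perm = torch.randperm(data.shape[0])
blocks = torch.split(perm, 5)
subbatches = [data[idx, ...] for idx in blocks]
assert subbatches[0].shape == (5, 4, 2)
assert sum(b.shape[0] for b in subbatches) == data.shape[0]
```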

classmethod create_block_subsample_indices(batch_dims, tensor_shape, subsample_sizes)

Create a CalipyIndex that indexes only the specified batch_dims.

Parameters:
  • batch_dims – DimTuple of batch dimensions to index.

  • tensor_shape – List containing the sizes of the unsubsampled tensor

  • subsample_sizes – Sizes for subsampling along each batch dimension.

Returns:

A list of CalipyIndex instances indexing the batch_dims.

create_global_index(subsample_indextensor=None, data_source_name=None)

Create a global CalipyIndex object enumerating all possible indices for all the dims. The indices global_index_tensor are chosen such that they can be used to access the data in data_source with name data_source_name via self.tensor = data_source[global_index_tensor_tuple]

Parameters:
  • subsample_indextensor – An index tensor that enumerates for all the entries of self.tensor which index needs to be used to access it in some global dataset.

  • data_source_name – A string serving as info to record which object the global indices are indexing.

Returns:

A CalipyIndex object global index containing indexing data that describes how the tensor is related to the superpopulation it has been sampled from.

create_local_index()

Create a local index tensor enumerating all possible indices for all the dims. The indices local_index_tensor are chosen such that they can be used for indexing the tensor via value = tensor[i,j,k] = tensor[local_index_tensor[i,j,k,:]], i.e. the index at [i,j,k] is [i,j,k]. A more compact form of indexing is given by directly accessing the index tuples via tensor = tensor[local_index_tensor_tuple]

Returns:

Writes torch tensors with indices representing all possible positions into the index local_index:
  • local_index.tensor – index tensor containing an index at each location of a value in the tensor

  • local_index.tuple – index tensor split into a tuple for straightforward indexing
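A plain-torch sketch of what such a local index looks like (illustrative variable names, not the calipy implementation):

```python
import torch

shape = (2, 3)
# Build an index tensor where local_index_tensor[i, j, :] == [i, j]
grids = torch.meshgrid(*[torch.arange(s) for s in shape], indexing='ij')
local_index_tensor = torch.stack(grids, dim=-1)          # shape [2, 3, 2]

# The tuple form indexes a tensor of that shape back onto itself
tensor = torch.randn(shape)
local_index_tuple = tuple(local_index_tensor.unbind(-1))
roundtrip = tensor[local_index_tuple]
```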

classmethod create_simple_subsample_indices(batch_dim, batch_dim_size, subsample_size)

Create a CalipyIndex that indexes only the specified singular batch_dim.

Parameters:
  • batch_dim – Element of DimTuple (typically CalipyDim) along which subbatching happens.

  • batch_dim_size – The integer size of the unsubsampled tensor

  • subsample_size – Single integer size determining length of batches to create.

Returns:

A list of CalipyIndex instances indexing the batch_dim.

expand_index(index_reduced)

Expand the CalipyIndex index_reduced to align with the dimensions self.dims of the current tensor self.tensor.

Parameters:

index_reduced – A CalipyIndex instance whose dims are a subset of the dims of the current tensor.

Returns:

A new CalipyIndex instance with the expanded index tensor.
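The expansion mechanics can be sketched in plain torch (hypothetical names, not the calipy API): an index over a subset of the dims is paired with all positions along the remaining dims.

```python
import torch

# A reduced index over the first (batch) dim only: pick rows 2 and 0
row_index = torch.tensor([2, 0])

# Expanding to the full dims of a [4, 3] tensor pairs each selected row
# with every column index, giving an index tensor of shape [2, 3, 2]
n_cols = 3
rows = row_index.unsqueeze(-1).expand(-1, n_cols)                    # [2, 3]
cols = torch.arange(n_cols).unsqueeze(0).expand(len(row_index), -1)  # [2, 3]
expanded_index = torch.stack([rows, cols], dim=-1)                   # [2, 3, 2]

data = torch.arange(12).reshape(4, 3)
selected = data[expanded_index[..., 0], expanded_index[..., 1]]
```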

reorder(order_dimtuple)

Generate out of self.tensor a new tensor that is reordered to align with the order given in the order_dimtuple DimTuple object.

Parameters:

order_dimtuple – DimTuple of CalipyDim objects whose sequence determines permutation and index binding of the produced tensor.

Returns:

A tensor with a calipy.indexer where all ordering is aligned to order_dimtuple.
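The underlying permutation can be sketched with plain torch (a sketch with name lists standing in for DimTuples):

```python
import torch

# A tensor whose axes carry names, and a target name order
tensor = torch.randn(2, 3, 4)
dims = ['bd_1', 'ed_1', 'ed_2']
order = ['ed_2', 'bd_1', 'ed_1']

# Reordering permutes the axes so they follow the target name sequence
perm = [dims.index(name) for name in order]
reordered = tensor.permute(*perm)
```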

simple_subsample(batch_dim, subsample_size)

Generate indices for subbatching across a single batch dimension and extract the subbatches.

Parameters:
  • batch_dim – Element of DimTuple (typically CalipyDim) along which subbatching happens.

  • subsample_size – Single size determining length of batches to create.

Returns:

List of tensors and CalipyIndex representing the subbatches.
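For a single batch dimension the mechanics reduce to a 1-D split (again a plain-torch sketch, not the calipy API):

```python
import torch

data = torch.arange(20).reshape(5, 4)
batch_dim, subsample_size = 0, 2

# Split the single batch dimension into consecutive chunks; the last
# chunk may be shorter when the sizes do not divide evenly
indices = torch.arange(data.shape[batch_dim]).split(subsample_size)
subbatches = [data.index_select(batch_dim, idx) for idx in indices]
# sizes along the batch dim: 2, 2, 1
```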

calipy.tensor.broadcast_dims(dims_1, dims_2)

Check if DimTuples dims_1 and dims_2 can be broadcast together and, if so, produce a new DimTuple that employs PyTorch’s broadcasting logic on extended dims. Unlike a simple right-to-left alignment, this version explicitly pads the DimTuples towards a consistent superDimTuple, injecting dims of size=1 where needed for consistency. PyTorch’s broadcasting functionality is then called on the extended shapes.

This helps avoid the scenario where the last dimension of one tensor is matched with the first dimension of another just because of naive negative indexing. We thereby do not fully emulate how PyTorch handles missing dims but achieve a more dimension-aware broadcast.

Steps:
  1. Merge dimension name sequences into a minimal supersequence.

  2. Expand each DimTuple to that full name list, filling size=1 for missing dims.

  3. Let PyTorch do shape-based broadcasting.

  4. Build a final DimTuple from the broadcasted shape, reusing dimension names from the supersequence.

Parameters:
  • dims_1 (DimTuple or None) – The first DimTuple to broadcast, or None/empty to indicate no dims.

  • dims_2 (DimTuple or None) – The second DimTuple to broadcast, or None/empty to indicate no dims.

Returns:

A DimTuple reflecting the broadcasted shape if compatible, otherwise None.

Return type:

DimTuple or None

Example usage:

import torch
from calipy.tensor import CalipyTensor, broadcast_dims
from calipy.utils import Dim, DimTuple, dim_assignment

# Suppose we have:
#   c_cp of shape [2, 1], dims=('dim1','dim2')
#   b_cp of shape [2],    dims=('dim1',)
# This function pads the second tensor's shape to [2, 1] (injecting a
# size-1 'dim2'), then does standard broadcasting, leading to final
# shape [2, 1]. Dimension names are unified by name, not by position.
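The padding step (steps 2 and 3 above) can be sketched with plain torch, using dicts of name-to-size as stand-ins for DimTuples:

```python
import torch

# Name-aware padding before broadcasting
supersequence = ['dim1', 'dim2']
shape_c = {'dim1': 2, 'dim2': 1}     # c_cp has shape [2, 1]
shape_b = {'dim1': 2}                # b_cp has shape [2]

# Pad each shape to the supersequence, inserting size 1 for missing dims
padded_c = [shape_c.get(name, 1) for name in supersequence]   # [2, 1]
padded_b = [shape_b.get(name, 1) for name in supersequence]   # [2, 1]

# Standard PyTorch broadcasting on the padded shapes
broadcast_shape = torch.broadcast_shapes(torch.Size(padded_c), torch.Size(padded_b))
# 'dim1' aligns with 'dim1' by name, so the result is [2, 1], not the
# [2, 2] that naive right-to-left alignment would produce
```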
calipy.tensor.build_dim_supersequence(seq1, seq2)

Builds a minimal supersequence of dimension names that contains seq1 and seq2 as subsequences in the same relative order. Names that appear in both sequences are placed (and unified) only once, if they appear in a non-contradictory order.

If no valid ordering is possible (e.g., seq1 = [dim1, dim2] and seq2 = [dim2, dim1]), this raises a ValueError.

Parameters:
  • seq1 (list[str]) – The first dimension name sequence (list of strings).

  • seq2 (list[str]) – The second dimension name sequence (list of strings).

Returns:

A minimal supersequence (list of strings) that includes seq1 and seq2 in order, unifying repeated names.

Return type:

list[str]

Example usage:

# Good case:
seq1 = ['dim1','dim2']
seq2 = ['dim2','dim3']
supersequence = build_dim_supersequence(seq1, seq2)
# => supersequence = ['dim1','dim2','dim3']

# More complicated case
seq1 = ['dim1', 'dim2', 'dim4']
seq2 = ['dim2',  'dim3', 'dim4']
supersequence = build_dim_supersequence(seq1, seq2)
# => supersequence  = ['dim1', 'dim2', 'dim3', 'dim4']

# Even more complicated case:
seq1 = ['dim1', 'dim2', 'dim4', 'dim5']
seq2 = ['dim2',  'dim3', 'dim4', 'dim6']
supersequence = build_dim_supersequence(seq1, seq2)
# => supersequence  = ['dim1', 'dim2', 'dim3', 'dim4', 'dim5', 'dim6']  

# Contradiction:
seq1 = ['dim1','dim2']
seq2 = ['dim2','dim1']
supersequence = build_dim_supersequence(seq1, seq2)
# => raises ValueError
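Conceptually the merge is a shortest-common-supersequence construction on the name lists; a minimal pure-Python sketch (not the calipy implementation) that reproduces the cases above:

```python
def supersequence_sketch(seq1, seq2):
    # Merge two name sequences into a supersequence, unifying names that
    # occur in both; raise if shared names appear in conflicting order.
    shared = [name for name in seq1 if name in seq2]
    if shared != [name for name in seq2 if name in seq1]:
        raise ValueError("contradictory dimension order")
    result, i, j = [], 0, 0
    for name in shared:
        # Emit the unshared prefixes of both sequences, then the shared name
        while seq1[i] != name:
            result.append(seq1[i])
            i += 1
        while seq2[j] != name:
            result.append(seq2[j])
            j += 1
        result.append(name)
        i += 1
        j += 1
    return result + seq1[i:] + seq2[j:]

supersequence_sketch(['dim1', 'dim2'], ['dim2', 'dim3'])
# => ['dim1', 'dim2', 'dim3']
```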
calipy.tensor.preprocess_args(args, kwargs)

Recursively preprocesses and unwraps input arguments and keyword arguments by replacing any nested CalipyTensor objects with their underlying torch.Tensor instances. Supports arbitrary nesting including dictionaries, lists, tuples, and sets.

Parameters:
  • args (tuple) – Positional arguments potentially containing nested CalipyTensor instances.

  • kwargs (dict) – Keyword arguments potentially containing nested CalipyTensor instances.

Returns:

A tuple consisting of unwrapped positional arguments and keyword arguments with all CalipyTensor instances replaced by torch.Tensor objects.

Return type:

(tuple, dict)

Example usage:

# Imports and definitions
import torch
from calipy.tensor import CalipyTensor, preprocess_args
from calipy.utils import dim_assignment

# Create sample CalipyTensors
batch_dims = dim_assignment(dim_names = ['bd_1'])
event_dims = dim_assignment(dim_names = ['ed_1'])
data_dims = batch_dims + event_dims
tensor_a = CalipyTensor(torch.ones([2, 3]), dims = data_dims)
tensor_b = CalipyTensor(torch.ones([4, 5]), dims = data_dims)

# Nested structure containing CalipyTensors
args = (tensor_a, {'key1': tensor_b, 'key2': [tensor_a, 10]})
kwargs = {'param': {'nested': tensor_b}}

unwrapped_args, unwrapped_kwargs = preprocess_args(args, kwargs)

assert isinstance(unwrapped_args[0], torch.Tensor)
assert isinstance(unwrapped_args[1]['key1'], torch.Tensor)
assert isinstance(unwrapped_kwargs['param']['nested'], torch.Tensor)
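The recursion behaves as in this pure-Python sketch, where FakeCalipyTensor is a stand-in for illustration, not the calipy class:

```python
def unwrap_sketch(obj):
    # Recursively replace wrapper objects by their .tensor payload,
    # descending into dicts, lists, tuples, and sets
    if hasattr(obj, 'tensor'):
        return obj.tensor
    if isinstance(obj, dict):
        return {key: unwrap_sketch(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(unwrap_sketch(value) for value in obj)
    return obj

class FakeCalipyTensor:
    def __init__(self, tensor):
        self.tensor = tensor

nested = (FakeCalipyTensor([1, 2]), {'key': [FakeCalipyTensor([3]), 10]})
unwrapped = unwrap_sketch(nested)
# => ([1, 2], {'key': [[3], 10]})
```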