utils

data_mask

ensure_id(mask: List[str])
filter_data(data: dict, mask: dict | None)
masks_overlap(pub: dict | None, sub: dict | None)

Calculates whether the pub and sub filters of two models overlap. This function assumes that both filters have already been validated using validate_mask.
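As an illustration of what an overlap check on two nested filters can look like, here is a minimal sketch. It is not the library's implementation: the name masks_overlap_sketch is hypothetical, and it assumes that None at any level means "no restriction" and therefore overlaps with anything.

```python
def masks_overlap_sketch(pub, sub) -> bool:
    """Hypothetical overlap check for two already validated filters."""
    # Assumption: None means "everything", so it overlaps with any filter.
    if pub is None or sub is None:
        return True
    if isinstance(pub, dict) and isinstance(sub, dict):
        # Only keys present on both sides can contribute to an overlap.
        shared = pub.keys() & sub.keys()
        return any(masks_overlap_sketch(pub[key], sub[key]) for key in shared)
    # Leaf level: two lists of attribute names.
    return bool(set(pub) & set(sub))


pub = {"some_dataset": {"some_entity_group": ["attribute1", "attribute2"]}}
sub = {"some_dataset": {"some_entity_group": ["attribute2"]}}
print(masks_overlap_sketch(pub, sub))
```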

validate_mask(data_mask: dict | None)

Determines whether the dataset filter has the correct shape: it must consist of lists inside dictionaries inside a dictionary, e.g. {"some_dataset": {"some_entity_group": ["attribute1", "attribute2"]}}

Also, at every level, the filter must either be filled or be None. It cannot be an empty container. For example, the following filters are invalid:

  • {"some_dataset": {}}

  • {"some_dataset": {"some_entity_group": ["attribute1"], "empty_group": []}}
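The shape rules above can be sketched as a small standalone check. This is only an illustration under stated assumptions: the helper name is_well_formed_mask is hypothetical, and returning a bool (rather than raising) is an assumption, not the documented behaviour of validate_mask.

```python
def is_well_formed_mask(mask) -> bool:
    """Hypothetical shape check: dict -> dict -> list, None allowed at every level."""
    if mask is None:
        return True
    if not isinstance(mask, dict) or not mask:
        return False  # wrong type or empty container at the dataset level
    for groups in mask.values():
        if groups is None:
            continue
        if not isinstance(groups, dict) or not groups:
            return False  # wrong type or empty container at the entity group level
        for attributes in groups.values():
            if attributes is None:
                continue
            if not isinstance(attributes, list) or not attributes:
                return False  # wrong type or empty attribute list
    return True


print(is_well_formed_mask({"some_dataset": {"some_entity_group": ["attribute1"]}}))
print(is_well_formed_mask({"some_dataset": {}}))
```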

lifecycle

deprecated(obj=None, alternative: str | None = None)
has_deprecations(cls)

logging

captureWarnings(logger)

If logger is an instance of logging.Logger, redirect all warnings to that logger. If logger is None, ensure that warnings are not redirected to logging but to their original destinations.

get_logger(settings: Settings, name=None, capture_warnings=True)

path

DatasetPath(*args, **kwargs)

DatasetPath is a subclass of pathlib.Path that points to a Movici format dataset file. It has one additional method, read_dict, that returns the contents of the dataset as a dictionary.

Parameters:

path – The location of the dataset file
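To illustrate the idea of a path object that can load its own contents, here is a minimal sketch. JsonDatasetPath is a hypothetical stand-in, and reading the file as UTF-8 JSON is an assumption; the real class may support other formats or options.

```python
import json
from pathlib import Path


# Subclass the concrete path class for this platform (PosixPath or WindowsPath),
# which works across Python versions where subclassing Path directly does not.
class JsonDatasetPath(type(Path())):
    """Hypothetical path subclass with a read_dict convenience method."""

    def read_dict(self) -> dict:
        # Assumption: the dataset file is a UTF-8 encoded JSON document.
        return json.loads(self.read_text(encoding="utf-8"))
```

Because it is still a pathlib path, the usual operations (joining with /, exists(), parent, and so on) remain available next to read_dict.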

strategies

get_instance(strat: Type[T], **kwargs) → T
get_type(strat: Type[T]) → Type[T]
reset()
set(strat)

time

string_to_datetime(datetime_str: str, max_year=5000, **kwargs) → datetime

Convert a string into a datetime. datetime_str can be one of the following:

  • A year (e.g. '2025')

  • A unix timestamp (in seconds) (e.g. '1626684322')

  • A dateutil parsable string

Parameters:
  • max_year – int. The cutoff that decides whether a datetime_str representing a single integer is interpreted as a year or as a unix timestamp

  • kwargs – Additional parameters passed directly into the dateutil.parser to customize parsing. For example dayfirst=True.
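The three input forms and the role of max_year can be sketched as follows. This is an illustration, not the library's implementation: parse_datetime_sketch is a hypothetical name, the cutoff is assumed to be inclusive, timestamps are assumed to be interpreted as UTC, and datetime.fromisoformat stands in for the dateutil parser to keep the example free of third-party dependencies.

```python
from datetime import datetime, timezone


def parse_datetime_sketch(datetime_str: str, max_year: int = 5000) -> datetime:
    """Hypothetical sketch of the year / unix timestamp / date string decision."""
    text = datetime_str.strip()
    try:
        number = int(text)
    except ValueError:
        # Not a single integer: treat it as a date string.
        # Stand-in for dateutil: only ISO 8601 strings are handled here.
        return datetime.fromisoformat(text)
    if number <= max_year:
        # Assumption: integers up to and including max_year are years.
        return datetime(number, 1, 1)
    # Larger integers are unix timestamps in seconds (assumed UTC, returned naive).
    return datetime.fromtimestamp(number, tz=timezone.utc).replace(tzinfo=None)


print(parse_datetime_sketch("2025"))
print(parse_datetime_sketch("1626684322"))
print(parse_datetime_sketch("2021-07-19"))
```

Raising max_year shifts the boundary: with max_year=7000 the string '6000' is read as a year, while with the default it is read as 6000 seconds after the epoch.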

unicode

determine_new_unicode_dtype(a: ndarray, b: ndarray | str, max_size=256) → dtype | None

Determine the new unicode dtype for array a if it needs to be updated with data coming from b.

Returns: a new np.dtype if required, or None if the dtype can remain the same. The size of a new dtype is the first power of 2 that fits the dtype of b
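The sizing rule can be sketched on plain string lengths, without NumPy. The names below are hypothetical, and treating max_len as a hard cap on the result is an assumption; the library may handle oversized strings differently.

```python
from typing import Optional


def first_power_of_two_at_least(n: int) -> int:
    """Smallest power of two that is greater than or equal to n."""
    size = 1
    while size < n:
        size *= 2
    return size


def new_unicode_length(current_len: int, incoming_len: int, max_len: int = 256) -> Optional[int]:
    """Hypothetical sketch: new string length for an array, or None if it already fits."""
    if incoming_len <= current_len:
        return None  # the existing dtype is wide enough
    # Assumption: the result never exceeds max_len.
    return min(first_power_of_two_at_least(incoming_len), max_len)


print(new_unicode_length(8, 5))
print(new_unicode_length(8, 9))
```

Growing in powers of two means that a series of slightly longer strings triggers only a few dtype changes rather than one per update.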

equal_str_dtypes(a: ndarray, b: ndarray)
get_unicode_dtype(size, max_size=256)
largest_unicode_dtype(a: ndarray, b: ndarray | str, max_size=256)

Determines whether the dtype of unicode array a and/or b must be upcast to the larger dtype of the two, so that both can be used in numba jit compiled functions. Numba requires unicode arrays to have the same itemsize for certain operations, such as comparisons.

Returns: The largest dtype of the two, or None if no upcasting has to be done (or when the arrays involved are not unicode or bytes)

next_power_of_two(val, max_val=256)

Module contents

DatasetPath(*args, **kwargs)

DatasetPath is a subclass of pathlib.Path that points to a Movici format dataset file. It has one additional method, read_dict, that returns the contents of the dataset as a dictionary.

Parameters:

path – The location of the dataset file

determine_new_unicode_dtype(a: ndarray, b: ndarray | str, max_size=256) → dtype | None

Determine the new unicode dtype for array a if it needs to be updated with data coming from b.

Returns: a new np.dtype if required, or None if the dtype can remain the same. The size of a new dtype is the first power of 2 that fits the dtype of b

filter_data(data: dict, mask: dict | None)
get_logger(settings: Settings, name=None, capture_warnings=True)
largest_unicode_dtype(a: ndarray, b: ndarray | str, max_size=256)

Determines whether the dtype of unicode array a and/or b must be upcast to the larger dtype of the two, so that both can be used in numba jit compiled functions. Numba requires unicode arrays to have the same itemsize for certain operations, such as comparisons.

Returns: The largest dtype of the two, or None if no upcasting has to be done (or when the arrays involved are not unicode or bytes)

masks_overlap(pub: dict | None, sub: dict | None)

Calculates whether the pub and sub filters of two models overlap. This function assumes that both filters have already been validated using validate_mask.

string_to_datetime(datetime_str: str, max_year=5000, **kwargs) → datetime

Convert a string into a datetime. datetime_str can be one of the following:

  • A year (e.g. '2025')

  • A unix timestamp (in seconds) (e.g. '1626684322')

  • A dateutil parsable string

Parameters:
  • max_year – int. The cutoff that decides whether a datetime_str representing a single integer is interpreted as a year or as a unix timestamp

  • kwargs – Additional parameters passed directly into the dateutil.parser to customize parsing. For example dayfirst=True.

validate_mask(data_mask: dict | None)

Determines whether the dataset filter has the correct shape: it must consist of lists inside dictionaries inside a dictionary, e.g. {"some_dataset": {"some_entity_group": ["attribute1", "attribute2"]}}

Also, at every level, the filter must either be filled or be None. It cannot be an empty container. For example, the following filters are invalid:

  • {"some_dataset": {}}

  • {"some_dataset": {"some_entity_group": ["attribute1"], "empty_group": []}}