Utility functions

Additional features for PyTorch

radbm.utils.torch.multi_bernoulli.poisson_binomial.log_hamming_binomial(log_p10, log_p11, log_p20, log_p21)

Computes the log probability of each Hamming Binomial event, i.e. of each possible Hamming distance between two random Multi-Bernoulli vectors parameterized by p1 and p2.

Parameters:

log_p10 (torch.tensor (dtype=torch.float)) – The log probability that each bit of the first random vector is zero. The Hamming Binomial is taken over the last dim. shape=(a1,a2,a3,…,am,n), where n is the number of bits in each vector. a1,a2,a3,…,am are arbitrary but must be broadcastable with the other inputs.
log_p11 (torch.tensor (dtype=torch.float)) – The log probability that each bit of the first random vector is one. Same shape conventions as log_p10.
log_p20 (torch.tensor (dtype=torch.float)) – The log probability that each bit of the second random vector is zero. Same shape conventions as log_p10.
log_p21 (torch.tensor (dtype=torch.float)) – The log probability that each bit of the second random vector is one. Same shape conventions as log_p10.

Returns:	log_hb – The log probability of each Hamming Binomial event. shape=(a1,a2,a3,…,am,n+1). log_hb[i1,i2,i3,…,im,k] is the log probability that the Hamming distance between the two random Multi-Bernoullis parameterized by log_p11.exp()[i1,i2,i3,…,im] and log_p21.exp()[i1,i2,i3,…,im] is k.
Return type:	torch.tensor (dtype=torch.float)

Notes

See the notes of log_poisson_binomial.
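To make the quantity above concrete, here is a minimal pure-Python sketch (not radbm's implementation) of the per-bit step: for two independent bits, it computes the log probability that they agree and that they differ. Under that independence assumption, the Hamming distance is the sum of the "differ" indicators, so these per-bit values are exactly the kind of input a Poisson Binomial computation expects. The helper names `logaddexp` and `bit_difference_log_probs` are hypothetical.

```python
import math

def logaddexp(a, b):
    """Stable log(exp(a) + exp(b)) for two floats."""
    if a == -math.inf and b == -math.inf:
        return -math.inf
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def bit_difference_log_probs(log_p10, log_p11, log_p20, log_p21):
    """Per-bit log probabilities that two independent bits agree / differ."""
    log_same, log_diff = [], []
    for a0, a1, b0, b1 in zip(log_p10, log_p11, log_p20, log_p21):
        log_same.append(logaddexp(a0 + b0, a1 + b1))  # both 0 or both 1
        log_diff.append(logaddexp(a0 + b1, a1 + b0))  # exactly one is 1
    return log_same, log_diff

# Example: P(bit = 1) for each bit of the two vectors.
p1 = [0.2, 0.9]
p2 = [0.7, 0.5]
log_same, log_diff = bit_difference_log_probs(
    [math.log(1 - p) for p in p1], [math.log(p) for p in p1],
    [math.log(1 - p) for p in p2], [math.log(p) for p in p2],
)
```

For the first bit, the probability of a difference is 0.8·0.7 + 0.2·0.3 = 0.62.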

radbm.utils.torch.multi_bernoulli.poisson_binomial.log_poisson_binomial(log_q0, log_q1)

Computes the log probability of each event for a batch of Poisson Binomial random variables. The computation is numerically stable.

Parameters:

log_q0 (torch.tensor (dtype=torch.float)) – The log probability that each bit is zero. The Poisson Binomial is taken over the last dim. shape=(a1,a2,a3,…,am,n), where n is the number of bits in each Poisson Binomial. a1,a2,a3,…,am are arbitrary but must match those of log_q1.
log_q1 (torch.tensor (dtype=torch.float)) – The log probability that each bit is one. The Poisson Binomial is taken over the last dim. shape=(a1,a2,a3,…,am,n), where n is the number of bits in each Poisson Binomial. a1,a2,a3,…,am are arbitrary but must match those of log_q0.

Returns:	log_pb – The log probability of each Poisson Binomial event. shape=(a1,a2,a3,…,am,n+1). log_pb[i1,i2,i3,…,im,k] is the log probability that the sum of the n Bernoullis with parameters log_q1.exp()[i1,i2,i3,…,im] equals k.
Return type:	torch.tensor (dtype=torch.float)

Notes

In theory, (1-log_q0.exp()).log() = log_q1, so the input of this function is over-specified. For numerical stability, however, both log_q0 and log_q1 are needed, and neither can be computed from the other in a numerically stable way. This is why both are required as input. In some cases both can be computed stably from a shared quantity, e.g. log_q0 = log_sigmoid(-logits) and log_q1 = log_sigmoid(logits).
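The following is a minimal one-dimensional sketch of a log-space Poisson Binomial computation, written in pure Python for illustration. It is an assumption about how such a computation can be done (a standard dynamic program over the bits), not a description of radbm's batched implementation. The names `log_sigmoid`, `logaddexp` and `log_poisson_binomial_1d` are hypothetical; the inputs are built from logits as suggested in the note above.

```python
import math

def log_sigmoid(z):
    """Stable log(1 / (1 + exp(-z)))."""
    return -(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))

def logaddexp(a, b):
    """Stable log(exp(a) + exp(b)) for two floats."""
    if a == -math.inf and b == -math.inf:
        return -math.inf
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def log_poisson_binomial_1d(log_q0, log_q1):
    """Return log P(sum of bits = k) for k = 0..n, for independent bits."""
    n = len(log_q0)
    # Before looking at any bit, the sum is 0 with probability 1.
    log_pb = [0.0] + [-math.inf] * n
    for i in range(n):
        new = [-math.inf] * (n + 1)
        for k in range(i + 2):
            stay = log_pb[k] + log_q0[i]  # bit i is 0, sum stays at k
            up = log_pb[k - 1] + log_q1[i] if k > 0 else -math.inf  # bit i is 1
            new[k] = logaddexp(stay, up)
        log_pb = new
    return log_pb

logits = [0.0, 0.0]  # two fair bits
log_pb = log_poisson_binomial_1d(
    [log_sigmoid(-z) for z in logits],
    [log_sigmoid(z) for z in logits],
)
```

With two fair bits the resulting probabilities are 0.25, 0.5 and 0.25 for sums of 0, 1 and 2.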

radbm.utils.torch.multi_bernoulli.log_arithmetic.multi_bernoulli_activated_equality(xz, yz, az)

Computes the bitwise log probability that two Multi-Bernoullis are equal or that a third Multi-Bernoulli is one.

Parameters:

xz (torch.tensor) – the logits (before sigmoid) of the first Multi-Bernoulli
yz (torch.tensor) – the logits (before sigmoid) of the second Multi-Bernoulli
az (torch.tensor) – the logits of the third Multi-Bernoulli, which acts as an activation of the equality.

Returns:

log_p0 (torch.tensor) – the bitwise log probability that the two Multi-Bernoulli are not equal and the third is zero.
log_p1 (torch.tensor) – the bitwise log probability that the two Multi-Bernoulli are equal or the third is one.

Notes

xz and yz need not have the same shape, but they must be broadcastable.
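As a rough illustration of the "activated" combination described above, the sketch below takes the log probabilities of an arbitrary per-bit event (for instance "the two bits are equal") and combines them with an activation logit. It assumes the activation bit is independent of the event, which the documentation does not state explicitly. The function name `activate` and its arguments are hypothetical.

```python
import math

def log_sigmoid(z):
    """Stable log(1 / (1 + exp(-z)))."""
    return -(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))

def logaddexp(a, b):
    """Stable log(exp(a) + exp(b)) for two floats."""
    if a == -math.inf and b == -math.inf:
        return -math.inf
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def activate(log_e0, log_e1, az):
    """Combine an event with an independent activation bit.

    log_e0 / log_e1 are the log probabilities that the event is false / true.
    Returns (log_p0, log_p1): event false AND activation 0, event true OR activation 1.
    """
    log_a0, log_a1 = log_sigmoid(-az), log_sigmoid(az)
    log_p0 = log_e0 + log_a0
    log_p1 = logaddexp(log_a1, log_a0 + log_e1)
    return log_p0, log_p1

# Event true with probability 0.38, activation bit on with probability 0.25.
log_p0, log_p1 = activate(math.log(0.62), math.log(0.38), math.log(0.25 / 0.75))
```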

radbm.utils.torch.multi_bernoulli.log_arithmetic.multi_bernoulli_activated_subset(xz, yz, az)

Computes the bitwise log probability that the first Multi-Bernoulli is lower than or equal to the second, or that a third Multi-Bernoulli is one.

Parameters:

xz (torch.tensor) – the logits (before sigmoid) of the first Multi-Bernoulli
yz (torch.tensor) – the logits (before sigmoid) of the second Multi-Bernoulli
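az (torch.tensor) – the logits of the third Multi-Bernoulli, which acts as an activation of the “subset”.

Returns:

log_p0 (torch.tensor) – the bitwise log probability that the first Multi-Bernoulli is not lower than or equal to the second and the third is zero.
log_p1 (torch.tensor) – the bitwise log probability that the first Multi-Bernoulli is lower than or equal to the second or the third is one.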

Notes

xz and yz need not have the same shape, but they must be broadcastable.

radbm.utils.torch.multi_bernoulli.log_arithmetic.multi_bernoulli_equality(xz, yz)

Compute the bitwise log probability that two Multi-Bernoulli are equal.

Parameters:

xz (torch.tensor) – the logits (before sigmoid) of the first Multi-Bernoulli
yz (torch.tensor) – the logits (before sigmoid) of the second Multi-Bernoulli

Returns:

log_p0 (torch.tensor) – the bitwise log probability that the two Multi-Bernoulli are not equal
log_p1 (torch.tensor) – the bitwise log probability that the two Multi-Bernoulli are equal

Notes

xz and yz need not have the same shape, but they must be broadcastable.
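For a single pair of bits, the equality probabilities can be sketched directly from the logits. The code below is an illustrative scalar version in pure Python, assuming the two bits are independent; it is not the library's tensor implementation, and the names `log_sigmoid`, `logaddexp` and `bit_equality_log_probs` are hypothetical.

```python
import math

def log_sigmoid(z):
    """Stable log(1 / (1 + exp(-z)))."""
    return -(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))

def logaddexp(a, b):
    """Stable log(exp(a) + exp(b)) for two finite floats."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def bit_equality_log_probs(xz, yz):
    """Return (log_p0, log_p1) = (log P(x != y), log P(x == y)) for one bit."""
    x0, x1 = log_sigmoid(-xz), log_sigmoid(xz)
    y0, y1 = log_sigmoid(-yz), log_sigmoid(yz)
    log_p0 = logaddexp(x0 + y1, x1 + y0)  # exactly one bit is 1
    log_p1 = logaddexp(x0 + y0, x1 + y1)  # both 0 or both 1
    return log_p0, log_p1

# Bits with P(x = 1) = 0.2 and P(y = 1) = 0.7, expressed as logits.
log_p0, log_p1 = bit_equality_log_probs(math.log(0.2 / 0.8), math.log(0.7 / 0.3))
```

Here P(x == y) = 0.2·0.7 + 0.8·0.3 = 0.38.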

radbm.utils.torch.multi_bernoulli.log_arithmetic.multi_bernoulli_subset(xz, yz)

Computes the bitwise log probability that the first Multi-Bernoulli is lower than or equal to the second.

Parameters:

xz (torch.tensor) – the logits (before sigmoid) of the first Multi-Bernoulli
yz (torch.tensor) – the logits (before sigmoid) of the second Multi-Bernoulli

Returns:

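log_p0 (torch.tensor) – the bitwise log probability that the first Multi-Bernoulli is not lower than or equal to the second (not a subset)
log_p1 (torch.tensor) – the bitwise log probability that the first Multi-Bernoulli is lower than or equal to the second (subset)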

Notes

xz and yz need not have the same shape, but they must be broadcastable.
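For one pair of independent bits, "x lower than or equal to y" fails only when x = 1 and y = 0. The scalar sketch below illustrates this in pure Python; the independence assumption and the names `log_sigmoid`, `logaddexp` and `bit_subset_log_probs` are mine, not taken from radbm.

```python
import math

def log_sigmoid(z):
    """Stable log(1 / (1 + exp(-z)))."""
    return -(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))

def logaddexp(a, b):
    """Stable log(exp(a) + exp(b)) for two finite floats."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def bit_subset_log_probs(xz, yz):
    """Return (log_p0, log_p1) = (log P(x > y), log P(x <= y)) for one bit."""
    log_p0 = log_sigmoid(xz) + log_sigmoid(-yz)  # x = 1 and y = 0
    # x <= y holds when x = 0, or when x = 1 and y = 1.
    log_p1 = logaddexp(log_sigmoid(-xz), log_sigmoid(xz) + log_sigmoid(yz))
    return log_p0, log_p1

# Bits with P(x = 1) = 0.2 and P(y = 1) = 0.7, expressed as logits.
log_p0, log_p1 = bit_subset_log_probs(math.log(0.2 / 0.8), math.log(0.7 / 0.3))
```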

radbm.utils.torch.multi_bernoulli.log_arithmetic.torch_log_prob_any(log_q0, log_q1)

Similar to x.any() but for log probabilities (instead of booleans). The any is taken across the last dim.

Parameters:

log_q0 (torch.tensor (dtype=torch.float)) – The log probability that each bit is zero. The any operation is over the last dim. shape=(a1,a2,a3,…,am,n), where n is the number of (independent) Bernoullis. a1,a2,a3,…,am are arbitrary but must match those of log_q1.
log_q1 (torch.tensor (dtype=torch.float)) – The log probability that each bit is one. The any operation is over the last dim. shape=(a1,a2,a3,…,am,n), where n is the number of (independent) Bernoullis. a1,a2,a3,…,am are arbitrary but must match those of log_q0.

Returns:

log_nor (torch.tensor (dtype=torch.float)) – the log probability that none of the bits is one.
log_or (torch.tensor (dtype=torch.float)) – the log probability that at least one of the bits is one.
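One way to see why both log_q0 and log_q1 are useful here is the sketch below, which is an assumption about a possible approach and not radbm's code. The probability that no bit is one is the product of the q0 values, and the probability that at least one bit is one can be written as a sum over "the first one occurs at bit i", which avoids computing log(1 - exp(log_nor)). The names `logsumexp` and `log_prob_any_1d` are hypothetical.

```python
import math

def logsumexp(terms):
    """Stable log(sum(exp(t) for t in terms)) for a non-empty list."""
    m = max(terms)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(t - m) for t in terms))

def log_prob_any_1d(log_q0, log_q1):
    """Return (log_nor, log_or) for independent bits."""
    log_nor = sum(log_q0)  # every bit is zero
    terms, prefix = [], 0.0
    for l0, l1 in zip(log_q0, log_q1):
        terms.append(prefix + l1)  # bits before this one are 0, this one is 1
        prefix += l0
    return log_nor, logsumexp(terms)

q1 = [0.1, 0.2, 0.3]
log_nor, log_or = log_prob_any_1d(
    [math.log(1 - q) for q in q1], [math.log(q) for q in q1]
)
```

In this example P(no bit is one) = 0.9·0.8·0.7 = 0.504, so P(any) = 0.496.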

Probabilistic Distributions

class radbm.utils.time.chronometer.Chronometer

A chronometer for timing code.

reset(): Resets the chronometer.

start(): Starts the chronometer.

stop(): Stops the chronometer.

time(): Returns the number of seconds on the chronometer (float).
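A chronometer with this interface could look roughly like the sketch below. It is a stand-in written for illustration, not radbm's class: the name `SketchChronometer`, the use of `time.perf_counter`, and the choice to accumulate time across several start/stop cycles are all assumptions.

```python
import time

class SketchChronometer:
    """Hypothetical stand-in with the same four methods as Chronometer."""

    def __init__(self):
        self._elapsed = 0.0       # seconds accumulated over finished runs
        self._started_at = None   # perf_counter value of the current run, if any

    def start(self):
        if self._started_at is None:
            self._started_at = time.perf_counter()

    def stop(self):
        if self._started_at is not None:
            self._elapsed += time.perf_counter() - self._started_at
            self._started_at = None

    def reset(self):
        self._elapsed = 0.0
        self._started_at = None

    def time(self):
        running = 0.0
        if self._started_at is not None:
            running = time.perf_counter() - self._started_at
        return self._elapsed + running

chrono = SketchChronometer()
chrono.start()
sum(range(10_000))  # the code being timed
chrono.stop()
elapsed_seconds = chrono.time()
```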

radbm.utils.stats.generators.greatest_k_multi_bernoulli_outcomes_generator(log_probs0, log_probs1=None, k=None)

Generator that yields the outcomes of a Multi-Bernoulli in decreasing order of probability. This works by reducing to the problem of generating the subsets of a set in increasing order of their sum.

Notes

Bit probabilities must lie in ]0,1[ (i.e. they cannot be zero or one).

For numerical stability, it is possible to provide log_probs1 explicitly instead of letting it be derived from log_probs0.

Parameters:

log_probs0 (numpy.ndarray) – Log probabilities for the bits to be zero.
log_probs1 (numpy.ndarray, optional) – Log probabilities for the bits to be one. If not given, log_probs1 = np.log(1-np.exp(log_probs0)) is used, which might be unstable.
k (int, optional) – The maximum number of outcomes to yield. By default, all outcomes are yielded.

Yields:	bits (tuple) – The bit outcomes in decreasing order of probability.
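The reduction mentioned above can be illustrated as follows. Start from the most probable outcome (each bit set to its likelier value); flipping bit i multiplies the probability by a factor below one, i.e. it costs |log_probs1[i] - log_probs0[i]| in log space. Outcomes in decreasing probability therefore correspond to sets of flipped bits in increasing total cost. The sketch below enumerates the subsets by brute force for clarity (exponential in the number of bits), so it only demonstrates the reduction and is not the library's algorithm; all function names are hypothetical.

```python
import itertools
import math

def outcomes_by_decreasing_probability(log_probs0, log_probs1):
    n = len(log_probs0)
    # Most probable outcome: each bit takes its likelier value.
    base = tuple(int(l1 > l0) for l0, l1 in zip(log_probs0, log_probs1))
    # Log-probability lost when a bit is flipped away from its likelier value.
    costs = [abs(l1 - l0) for l0, l1 in zip(log_probs0, log_probs1)]
    subsets = [s for r in range(n + 1) for s in itertools.combinations(range(n), r)]
    subsets.sort(key=lambda s: sum(costs[i] for i in s))  # brute force, for clarity
    for s in subsets:
        yield tuple(b ^ 1 if i in s else b for i, b in enumerate(base))

def outcome_log_prob(bits, log_probs0, log_probs1):
    return sum(l1 if b else l0 for b, l0, l1 in zip(bits, log_probs0, log_probs1))

p1 = [0.9, 0.3, 0.6]
log_probs0 = [math.log(1 - p) for p in p1]
log_probs1 = [math.log(p) for p in p1]
ranked = list(outcomes_by_decreasing_probability(log_probs0, log_probs1))
```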

radbm.utils.stats.generators.least_k_subset_sum_generator(values, k=None)

Generator that yields subsets of indices of values in increasing order of their sum. The values must all be positive.

Parameters:	values (numpy.ndarray) – The values from which to take the subsets
Yields:	subset (tuple) – Subsets of indices of the values in increasing order of their sum
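A common way to generate subsets in increasing order of sum is a priority-queue search over the sorted values, sketched below. This is an illustration of the general technique and may differ from radbm's implementation; in particular, yielding the empty subset first and the meaning given to `k` (maximum number of subsets) are assumptions, and the function name is hypothetical.

```python
import heapq

def subsets_by_increasing_sum(values, k=None):
    """Yield tuples of indices into `values`, in nondecreasing order of their sum."""
    if k is not None and k <= 0:
        return
    order = sorted(range(len(values)), key=lambda i: values[i])
    v = [values[i] for i in order]  # values in increasing order
    yield ()  # the empty subset has the smallest sum (0)
    count = 1
    if not v:
        return
    # Heap entries: (sum, last sorted position used, sorted positions in the subset).
    heap = [(v[0], 0, (0,))]
    while heap and (k is None or count < k):
        s, last, pos = heapq.heappop(heap)
        yield tuple(order[p] for p in pos)
        count += 1
        nxt = last + 1
        if nxt < len(v):
            # Child 1: add the next larger value.
            heapq.heappush(heap, (s + v[nxt], nxt, pos + (nxt,)))
            # Child 2: swap the largest value for the next larger one.
            heapq.heappush(heap, (s - v[last] + v[nxt], nxt, pos[:-1] + (nxt,)))

values = [3, 1, 2]
all_subsets = list(subsets_by_increasing_sum(values))
first_three = list(subsets_by_increasing_sum(values, k=3))
```

Each non-empty subset has exactly one parent in this search, and a child never has a smaller sum than its parent when the values are non-negative, which is why popping from the heap produces the subsets in order.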

radbm.utils.stats.hypergeometric.hypergeometric(N, K)

This function computes the pmf of the Hypergeometric(N, K, n) for each possible value of n (i.e. n in {0,1,2,…,N}).

In the context where an urn with N marbles contains K white marbles, this function outputs an array P of shape (N+1,K+1) where P[i, j] corresponds to the probability that with i samples without replacement we select j white marbles.

Parameters:

N (int) – The number of marbles in the urn
K (int) – The number of white marbles in the urn

Returns:	P – P[i, j] is the probability that with i samples without replacement we select j white marbles. P.shape is (N+1, K+1)
Return type:	numpy.ndarray

Notes

This is equivalent to np.array([scipy.stats.hypergeom(N, K, n).pmf(range(0,K+1)) for n in range(N+1)]) but faster. If only one row is needed (e.g. P[i]), then using scipy is faster.

This algorithm uses the following recursive formula: P[i, j] = P[i-1, j]*(1-Q[i-1, j]) + P[i-1, j-1]*Q[i-1, j-1], with Q[i, j] = (K-j)/(N-i) the probability of sampling a white marble given that i marbles were sampled, of which j were white.
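The recursion above can be written out in a few lines of pure Python. The sketch below is illustrative only (radbm returns a numpy.ndarray and may vectorize the computation); the function names are hypothetical, and `direct_pmf` is just the textbook closed form used as a cross-check.

```python
from math import comb

def hypergeometric_table(N, K):
    """P[i][j] = probability of j white marbles among i draws without replacement."""
    P = [[0.0] * (K + 1) for _ in range(N + 1)]
    P[0][0] = 1.0  # with zero draws we have zero white marbles
    for i in range(1, N + 1):
        remaining = N - (i - 1)  # marbles left in the urn before draw i
        for j in range(K + 1):
            # Q[i-1][j]: chance that draw i is white, given j whites so far.
            stay = P[i - 1][j] * (1 - (K - j) / remaining)
            move = P[i - 1][j - 1] * (K - (j - 1)) / remaining if j > 0 else 0.0
            P[i][j] = stay + move
    return P

def direct_pmf(N, K, i, j):
    """Closed-form hypergeometric pmf, used only to check the recursion."""
    if j > i or i - j > N - K:
        return 0.0
    return comb(K, j) * comb(N - K, i - j) / comb(N, i)

N, K = 6, 3
P = hypergeometric_table(N, K)
max_error = max(
    abs(P[i][j] - direct_pmf(N, K, i, j))
    for i in range(N + 1) for j in range(K + 1)
)
```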

radbm.utils.stats.hypergeometric.superdupergeometric(N, K)

This is the scenario where we have an urn with N marbles and K of them are white. We sample from the urn without replacement until we obtain k white marbles. The function gives the probability that we need n samples to obtain k white marbles.

Parameters:

N (int) – The number of marbles in the urn
K (int) – The number of white marbles in the urn
Returns:	SP – SP[i, j] is the probability that it requires i samples without replacement to select j white marbles. SP.shape is (N+1, K+1)
Return type:	numpy.ndarray

Notes

Related to the negative hypergeometric distribution.

This uses the hypergeometric (hence the name): SP[i, j] = P[i-1, j-1]*Q[i-1, j-1], where P = hypergeometric(N, K) and Q[i, j] = (K-j)/(N-i) is the probability of sampling a white marble given that i marbles were sampled, of which j were white.
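The formula above can be sketched in pure Python as follows. For self-containment, the hypergeometric probabilities are taken from the closed-form pmf rather than from radbm. The formula only covers i ≥ 1 and j ≥ 1, so this sketch leaves row 0 and column 0 at zero, which is an assumption and may not match the library's convention. Function names are hypothetical.

```python
from math import comb

def hypergeometric_pmf(N, K, i, j):
    """P(j white marbles among i draws without replacement)."""
    if j < 0 or j > K or i - j < 0 or i - j > N - K:
        return 0.0
    return comb(K, j) * comb(N - K, i - j) / comb(N, i)

def draws_needed_table(N, K):
    """SP[i][j] = probability that exactly i draws are needed to reach j white marbles."""
    SP = [[0.0] * (K + 1) for _ in range(N + 1)]
    for i in range(1, N + 1):
        for j in range(1, K + 1):
            # Q[i-1][j-1]: chance that draw i is white given j-1 whites in i-1 draws.
            q = (K - (j - 1)) / (N - (i - 1))
            SP[i][j] = hypergeometric_pmf(N, K, i - 1, j - 1) * q
    return SP

N, K = 6, 3
SP = draws_needed_table(N, K)
# For each j >= 1, the j-th white marble is always found eventually.
column_sums = [sum(SP[i][j] for i in range(N + 1)) for j in range(1, K + 1)]
```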

radbm.utils.stats.hypergeometric.superdupergeometric_expectations(N, K)

This is the scenario where we have an urn with N marbles and K of them are white. We sample from the urn without replacement until we obtain k white marbles. The function gives the expected number of samples n required to get k white marbles (for each k).

Parameters:

N (int) – The number of marbles in the urn
K (int) – The number of white marbles in the urn

Returns:	ESP – ESP[k] is the expected number of samples without replacement required to get k white marbles. ESP.shape is (K+1,)
Return type:	numpy.ndarray

Notes

This is equivalent to (but much faster than) (SP*np.arange(N+1)[:,None]).sum(axis=0) where SP = superdupergeometric(N, K)
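As a sanity check on what these expectations look like, the sketch below computes them by direct summation over the negative hypergeometric pmf and compares them with the known closed form k(N+1)/(K+1) for the expected position of the k-th white marble. Whether radbm uses this closed form is not stated in the documentation; the function names and the convention ESP[0] = 0 are assumptions.

```python
from math import comb

def draws_needed_pmf(N, K, i, j):
    """P(the j-th white marble appears exactly at draw i), for 1 <= j <= K, j <= i <= N."""
    return comb(i - 1, j - 1) * comb(N - i, K - j) / comb(N, K)

def expected_draws(N, K):
    # esp[0] = 0: no draw is needed for zero white marbles (assumed convention).
    esp = [0.0] * (K + 1)
    for j in range(1, K + 1):
        esp[j] = sum(i * draws_needed_pmf(N, K, i, j) for i in range(j, N + 1))
    return esp

N, K = 10, 3
esp = expected_draws(N, K)
closed_form = [j * (N + 1) / (K + 1) for j in range(K + 1)]
```

For N = 10 and K = 3 both computations give 2.75, 5.5 and 8.25 draws for one, two and three white marbles.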

Others

radbm.utils.Ramp(x0, x1, y0, y1)

Parameters:

x0 (float) – The input value where we start ramping
x1 (float) – The input value where we stop ramping
y0 (float) – The output value where the ramp starts
y1 (float) – The output value where the ramp stops

Returns:	ramp – The ramping function
Return type:	function float -> float
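The documentation does not spell out the shape of the ramp. One plausible reading, sketched below under that assumption, is a linear interpolation from y0 to y1 between x0 and x1 that is held constant outside that interval. The name `make_ramp` is hypothetical and the behaviour of radbm's Ramp may differ.

```python
def make_ramp(x0, x1, y0, y1):
    """Return a function that ramps linearly from y0 (at x0) to y1 (at x1)."""
    def ramp(x):
        if x <= x0:
            return y0  # before the ramp starts (assumed: held at y0)
        if x >= x1:
            return y1  # after the ramp stops (assumed: held at y1)
        t = (x - x0) / (x1 - x0)  # progress through the ramp, in [0, 1]
        return y0 + t * (y1 - y0)
    return ramp

ramp = make_ramp(0.0, 10.0, 1.0, 3.0)
midpoint = ramp(5.0)
```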

radbm.utils.unique_list(it)

Creates a list of the unique elements of an iterable, preserving their order.

Parameters:	it (iterable) – Items should be hashable and comparable
Returns:	A list in which every item is unique and the order of the iterable is preserved.
Return type:	list
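An order-preserving de-duplication like this can be sketched in one line with a dict, since Python dicts keep insertion order (Python 3.7+). The function name is hypothetical and this is not necessarily how radbm implements it.

```python
def unique_list_sketch(it):
    """Keep the first occurrence of each hashable item, in order."""
    return list(dict.fromkeys(it))

deduplicated = unique_list_sketch([3, 1, 3, 2, 1])
```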

radbm.utils.os.safe_load(path, map_location=None)

Safely loads an object. If a .tmp file is present, it is preferred unless it is corrupted (i.e. pickle.load fails).

Parameters:	path (str) – the path where the object is saved
Returns:	obj – the object loaded
Return type:	object

radbm.utils.os.safe_save(obj, path, pickle_protocol=2)

Safely saves obj in path by creating a .tmp file before overwriting an existing file.

Parameters:

obj (object) – the object to be saved
path (str) – the path where to save obj
pickle_protocol (int, optional) – The pickle protocol to use. By default, the torch default_protocol is used.

Returns:	obj – the same obj received as input
Return type:	object
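The save-to-.tmp-then-replace pattern described for safe_save and safe_load can be sketched with the standard library as below. This is only an illustration of the idea: the real functions appear to go through torch serialization (note the map_location argument of safe_load), whereas this sketch uses plain pickle, and the names `sketch_safe_save` and `sketch_safe_load` are hypothetical.

```python
import os
import pickle
import tempfile

def sketch_safe_save(obj, path, pickle_protocol=2):
    """Write to path + '.tmp' first, then move it over path."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(obj, f, protocol=pickle_protocol)
    os.replace(tmp, path)  # replaces any existing file at path
    return obj

def sketch_safe_load(path):
    """Prefer path + '.tmp' if it exists and unpickles, else fall back to path."""
    tmp = path + ".tmp"
    if os.path.exists(tmp):
        try:
            with open(tmp, "rb") as f:
                return pickle.load(f)
        except Exception:
            pass  # corrupted or partial .tmp: use the main file instead
    with open(path, "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "obj.pkl")
    sketch_safe_save({"a": 1}, target)
    loaded = sketch_safe_load(target)
    # Simulate an interrupted write: a truncated .tmp next to a good file.
    with open(target + ".tmp", "wb") as f:
        f.write(b"\x80")
    recovered = sketch_safe_load(target)
```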

radbm.utils.gdrive.download.available_files()

Returns:	files – The files available for download
Return type:	list of str

radbm.utils.gdrive.download.download_file(file, path=None, verbose=False)

Downloads a file from Google Drive. This uses the gdown package. Warning: this function overwrites any existing file.

Parameters:

file (str) – The name of the file to download. Use available_files() to see which files are available for download.
path (str, optional (default=file)) – The path where to download the file. If the path does not exist, it will be created. This should contain the filename.
verbose (bool, optional (default=False)) – If True, output the download progress.

Returns:	path – The path where the file has been downloaded.
Return type:	str

radbm.utils.fetch.expend_paths(paths, subpaths)

Parameters:

dirs (list of str (directory path)) –
subpaths (list of str (sub-directory path)) –

radbm.utils.fetch.fetch_file(file, path=None, data_type=None, subdirs=None, download=True)

Looks for file on the machine; if it is not found, downloads it.

Parameters:

file (str) – The name of the file to fetch
path (str, optional (default=None)) – The principal path in which to look for the file. If the file is not found, this function will attempt to download the file to this path.
data_type (str ('dataset' or 'model'), optional (default=None)) – Modifies the paths to consider when looking for the file. If path is None and the file is not found, it also affects where the file is downloaded; see get_directories_list for more information.
subdirs (list of str, optional (default=None)) – Additional sub-directories to look in
download (bool) – Whether to download the file if it is not found on the machine

Returns:	paths_list – The list of path where to find the file
Return type:	list of str

radbm.utils.fetch.get_directories_list(path=None, data_type=None)

Returns the list of directories that are relevant for RADBM. In order, it returns:

path
$DATASETS_DIR if data_type=='dataset'
$MODELS_DIR if data_type=='model'
$PYTHONRADBM_DUMP/<data_type>
$HOME/.radbm/<data_type>
.

Parameters:

path (str, optional) – The path to look in first. This is helpful when a user specifies a path to a file or to a relevant directory.
data_type (str, optional) – Should be 'dataset' or 'model'. This indicates which dataset-specific or model-specific environment variables to look up.

Returns:	dirs – The relevant directories in order
Return type:	list of str
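The lookup order above can be sketched as follows. Several details are guesses made for illustration: what happens when an environment variable is unset, how a missing data_type is handled, and whether the current directory is appended at the end (this sketch omits it). The function name and the `environ` parameter, added so the example does not depend on the real environment, are hypothetical.

```python
import os

def sketch_directories_list(path=None, data_type=None, environ=None):
    """Build the candidate directories in the documented order (hypothetical sketch)."""
    env = os.environ if environ is None else environ
    sub = data_type or ""  # assumed: no sub-directory when data_type is None
    dirs = []
    if path is not None:
        dirs.append(path)
    if data_type == "dataset" and "DATASETS_DIR" in env:
        dirs.append(env["DATASETS_DIR"])
    if data_type == "model" and "MODELS_DIR" in env:
        dirs.append(env["MODELS_DIR"])
    if "PYTHONRADBM_DUMP" in env:
        dirs.append(os.path.join(env["PYTHONRADBM_DUMP"], sub))
    if "HOME" in env:
        dirs.append(os.path.join(env["HOME"], ".radbm", sub))
    return dirs

example_env = {"DATASETS_DIR": "/data", "MODELS_DIR": "/models", "HOME": "/home/u"}
dataset_dirs = sketch_directories_list("/tmp/x", "dataset", example_env)
```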