Checks that there are fewer (or an equal number of) bins than nonzero CalcConfig.lowerAvgEntryLimit for the lowest fraction of jackknifed data in both dimensions, OR checks that the number of unique values in each dimension is lower than the number of bins in the respective dimensions
Alternate function to generate RegData instances regardless of randomization
Builds an infcalcs.tables.ContingencyTable with or without some specified weight
data points entered into table
bins for data points
Weight wrapped in Option monad
specifies if data should be shuffled to destroy information
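As a rough illustration of the parameters listed above, the following sketch bins paired data into a table of counts, optionally scales each row by a weight, and optionally shuffles the responses to break the signal/response pairing. Everything in it (the `TableSketch` object, the `Table` alias, `binIndex`, the use of upper bin bounds, and one weight per row) is a hypothetical stand-in chosen for illustration, not the library's own API.

```scala
import scala.util.Random

object TableSketch {
  // Hypothetical stand-in for a contingency table: rows index signal bins,
  // columns index response bins, cells hold (possibly weighted) counts.
  type Table = Vector[Vector[Double]]

  // Index of the first bin whose upper bound is >= x; values above the
  // last bound fall into the last bin.
  def binIndex(x: Double, upperBounds: Vector[Double]): Int = {
    val i = upperBounds.indexWhere(x <= _)
    if (i < 0) upperBounds.length - 1 else i
  }

  def build(
      data: Vector[(Double, Double)],
      sigBounds: Vector[Double],
      respBounds: Vector[Double],
      weights: Option[Vector[Double]] = None, // assumed: one weight per row
      shuffle: Boolean = false,
      rng: Random = new Random(0)): Table = {
    val (sig, resp) = data.unzip
    // Shuffling only the responses destroys the signal/response pairing
    // while leaving both marginal distributions unchanged.
    val pairs = sig.zip(if (shuffle) rng.shuffle(resp) else resp)
    val counts = Array.fill(sigBounds.length, respBounds.length)(0.0)
    for ((s, r) <- pairs) {
      val row = binIndex(s, sigBounds)
      val col = binIndex(r, respBounds)
      val w = weights.map(_(row)).getOrElse(1.0)
      counts(row)(col) += w
    }
    counts.map(_.toVector).toVector
  }
}
```

Passing `None` for the weights leaves every data point with a count of 1, which mirrors the "with or without some specified weight" wording above.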
Calculate the estimated mutual information and bootstrap confidence intervals for a particular binning of the data
an Estimates instance
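For orientation, the quantity being estimated for a given binning can be sketched as the plug-in mutual information of a table of counts. The `MISketch` object and the choice of bits (base-2 logarithm) are assumptions made for this example; bootstrap confidence intervals are not shown.

```scala
object MISketch {
  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  // Plug-in mutual information (assumed to be in bits) of a table of
  // nonnegative counts: sum over cells of p(x,y) * log2(p(x,y) / (p(x) p(y))).
  def mutualInformation(table: Vector[Vector[Double]]): Double = {
    val total = table.map(_.sum).sum
    require(total > 0.0, "table must contain at least one entry")
    val rowP = table.map(_.sum / total)           // marginal over rows
    val colP = table.transpose.map(_.sum / total) // marginal over columns
    val terms = for {
      (row, i) <- table.zipWithIndex
      (count, j) <- row.zipWithIndex
      if count > 0.0 // empty cells contribute nothing
    } yield {
      val p = count / total
      p * log2(p / (rowP(i) * colP(j)))
    }
    terms.sum
  }
}
```

A perfectly diagonal 2x2 table carries one bit under this definition, while a uniform table carries none.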
Calculates regression model for both original and randomized data.
Calculates a linear regression of the mutual information of each subsampled or randomized dataset vs. the inverse sample size. Returns results as a tuple: the first entry in the tuple contains the regression line (as an instance of OLS) for the original dataset; the second entry in the tuple contains a list of regression lines (OLS objects), one for each of numRandTables rounds of randomization. Because linear regression may fail on the randomized data, some entries in the list may be None.
(RegData, RegDataRand) structure as returned by buildRegData.
(regression on original data, list of regressions on random data)
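The regression step described above can be sketched as an ordinary least squares fit of mutual information against inverse sample size, returning `None` when the fit is undefined. The `RegressionSketch` object, the `Line` case class, and the `extrapolate` helper are hypothetical names for this illustration; the library's own OLS class is not reproduced here.

```scala
object RegressionSketch {
  // Hypothetical minimal result type for a fitted line.
  final case class Line(slope: Double, intercept: Double)

  // Ordinary least squares fit of y = slope * x + intercept.
  // Returns None when the fit is undefined: fewer than two points,
  // or no spread in x.
  def fit(xs: Seq[Double], ys: Seq[Double]): Option[Line] = {
    require(xs.length == ys.length, "xs and ys must have equal length")
    val n = xs.length
    if (n < 2) None
    else {
      val mx = xs.sum / n
      val my = ys.sum / n
      val sxx = xs.map(x => (x - mx) * (x - mx)).sum
      if (sxx == 0.0) None
      else {
        val sxy = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
        val slope = sxy / sxx
        Some(Line(slope, my - slope * mx))
      }
    }
  }

  // Regress MI on 1 / sampleSize; the intercept is the estimate
  // extrapolated to infinite sample size.
  def extrapolate(sampleSizes: Seq[Int], mi: Seq[Double]): Option[Line] =
    fit(sampleSizes.map(1.0 / _), mi)
}
```

Wrapping the result in `Option` corresponds to the note above that regression on randomized data may fail for some tables.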
Takes the pair of n-dimensional bin number vectors resulting in maximal mutual information in order to estimate all relevant quantities as defined in tables.CTable.ctVals. These data are written to an information file.
Alternate to genEstimatesMultImp when using bootstrapping approach
Vector of EstTupleBS instances
Gets mutual information estimates for range of response bin sizes by regression.
The range of bin sizes attempted depends on the calculation Parameters, which define the response bin spacing and the number of biased calculations before reaching a stop criterion. For each set of bin sizes given, this function:
- builds the randomized and resampled contingency tables by calling buildRegData
- estimates the unbiased mutual information for the resampled and/or randomized datasets by linear regression, by calling calcMultRegs
- extracts the intercepts and confidence intervals from the regression results by calling multIntercepts
specified bin configuration
Weight wrapped in Option
tracks number of biased estimates
results
An imperative alternative to genEstimatesMult
Filter for determining if mutual information estimates are biased based on the presence of nontrivial quantities of information in randomized data sets
Returns intercepts and confidence intervals given multiple regression data.
The list of intercepts and intervals returned will be the same length as the list of regression lines forming the second entry in the tuple passed as an argument.
(regression, list of regressions), as from calcMultRegs
Estimates instance
Alternate version of optMIMult that uses data structures compatible with bootstrapping
Returns the MI estimate that is maximized for real but not randomized data.
Takes a list of EstTuple instances for a range of bin sizes, extracted by linear regression for both real and randomized data (e.g., as provided by genEstimatesMult), and finds the bin size/MI combination that maximizes the mutual information while still keeping the estimated MI of the randomized data below the cutoff specified by the "cutoffValue" parameter.
vector of EstTuple instances
Entry from the list d that optimizes the MI estimate.
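The selection rule described above can be sketched as a filter followed by a maximum. The `Est` case class is a hypothetical, simplified stand-in for EstTuple, and treating an entry as acceptable only when every randomized estimate is at or below the cutoff is an assumption of this example rather than a statement of the library's exact rule.

```scala
object OptSketch {
  // Hypothetical simplified estimate: bin counts, MI for the real data,
  // and MI estimates for each randomized dataset.
  final case class Est(bins: (Int, Int), mi: Double, randMI: Seq[Double])

  // Keep only entries whose randomized estimates all stay at or below the
  // cutoff (assumed rule), then take the entry with the largest real MI.
  // Returns None when no entry passes the filter.
  def optimal(ests: Vector[Est], cutoffValue: Double): Option[Est] = {
    val unbiased = ests.filter(_.randMI.forall(_ <= cutoffValue))
    if (unbiased.isEmpty) None else Some(unbiased.maxBy(_.mi))
  }
}
```

With a cutoff of 0.05, an entry with the largest MI is still rejected if its randomized tables show 0.2 bits, so a smaller but unbiased estimate is returned instead.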
Generates a new infcalcs.tables.ContingencyTable from a Tree of CtPos instances.
This function is a way to generate integer-based infcalcs.tables.ContingencyTable instances with a specified number of entries, given some probability distribution over two dimensions. Generally the probability distribution is obtained from a previous infcalcs.tables.ContingencyTable that was created using the CTBuild.buildTable method.
total number of entries in the new infcalcs.tables.ContingencyTable
rows in the new infcalcs.tables.ContingencyTable
columns in the new infcalcs.tables.ContingencyTable
Tree of CtPos instances that defines the probability distribution
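To make the idea concrete, the sketch below draws an integer table with a fixed number of entries from a two-dimensional probability distribution. It swaps the Tree of CtPos instances for a plain vector of cumulative probabilities searched linearly, so it illustrates the sampling idea only; `SampleSketch`, `sampleTable`, and the row-major `probs` layout are assumptions made for this example.

```scala
import scala.util.Random

object SampleSketch {
  // probs: row-major cell probabilities (rows * cols entries).
  // Returns a rows x cols table of integer counts that sum to n.
  def sampleTable(
      n: Int,
      rows: Int,
      cols: Int,
      probs: Vector[Double],
      rng: Random = new Random(0)): Vector[Vector[Int]] = {
    require(probs.length == rows * cols, "one probability per cell")
    require(probs.forall(_ >= 0.0) && probs.sum > 0.0, "invalid distribution")
    // Cumulative probabilities: cell i owns the interval [cum(i-1), cum(i)).
    val cum = probs.scanLeft(0.0)(_ + _).tail
    val counts = Array.fill(rows * cols)(0)
    for (_ <- 0 until n) {
      // Scale by the total so small rounding errors in probs are tolerated.
      val u = rng.nextDouble() * cum.last
      val i = cum.indexWhere(u < _)
      val cell = if (i < 0) counts.length - 1 else i
      counts(cell) += 1
    }
    counts.toVector.grouped(cols).toVector
  }
}
```

Cells with zero probability own an empty interval and therefore never receive an entry. A tree, as used in the description above, would replace the linear scan with a logarithmic-time lookup.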
Increments the bins for response space
Increments the bins for signal space
Returns resampled and randomized contingency tables for estimation of MI.
The data structures returned by this function contain all of the information required to calculate the MI for a single pair of bin sizes, including contingency tables for randomly subsampled datasets (for unbiased estimation of MI at each bin size) and randomly shuffled contingency tables (for selection of the appropriate bin size).
Resampling of the dataset is performed using the resample method.
Pair of tuples containing the numbers of row and column bins.
The input/output dataset.
An optional weights vector to be applied to the rows.
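The subsampling half of this step can be sketched as drawing random fractions of the dataset without replacement and recording the inverse sample size of each draw, which is the regressor used later. The `ResampleSketch` object, its method names, and the rounding rule are hypothetical; the library's own resample method is not reproduced here.

```scala
import scala.util.Random

object ResampleSketch {
  // Keep a random fraction of the data points, drawn without replacement.
  def subsample[A](data: Vector[A], fraction: Double, rng: Random): Vector[A] = {
    require(data.nonEmpty, "data must not be empty")
    require(fraction > 0.0 && fraction <= 1.0, "fraction must be in (0, 1]")
    val keep = math.max(1, math.round(fraction * data.length).toInt)
    rng.shuffle(data).take(keep)
  }

  // One subsample per fraction, paired with its inverse sample size.
  def subsamples[A](
      data: Vector[A],
      fractions: Seq[Double],
      rng: Random = new Random(0)): Seq[(Double, Vector[A])] =
    fractions.map { f =>
      val s = subsample(data, f, rng)
      (1.0 / s.length, s)
    }
}
```

Each subsample would then be binned into its own contingency table before the mutual information is computed.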
Contains methods for estimating mutual information.
The principal methods used by top-level calling functions are:
- EstimateMI.genEstimatesMult, which takes a dataset and returns mutual information estimates for each attempted bin size, and
- EstimateMI.optMIMult, which takes the mutual information estimates produced by EstimateMI.genEstimatesMult and finds the bin size and mutual information estimate that maximize the mutual information for the real but not the randomized datasets.
Other important methods include:
- EstimateMI.buildRegData, which builds resampled and randomized contingency tables
- EstimateMI.calcMultRegs, which estimates the mutual information by linear regression.