Weight whose weights are bimodal and derived from the sum of two Gaussian probability distributions
Case class representing a discrete entity in one dimension of a contingency table (equivalent to a histogram bin).
bin number
list of entries in bin
the bin can only contain values above this bound (non-inclusive)
Parser for command line options, inherited by EstCC
Class containing configuration information for channel capacity estimation, passed to numerous functions and methods
Case class for storing calculations without a regression estimator
Case class whose fields comprise the required and optional command line options for channel capacity calculation.
For more details see the usage text with the flag "--help"
Case class that contains the cumulative probability of a particular location in a contingency table
(row,col) location in a contingency table
cumulative probability
lower bound on probability (non-inclusive)
Weight whose weights are defined by the user
list of doubles to serve as weights
Structure for input-output (dose-response) data.
Both the signal and response points are entered into Vectors whose elements are themselves Vectors of length N, where N is the number of distinct signal or response types of the system (e.g., given a channel that simultaneously takes signal A and signal B, the input Vector would have elements of length 2).
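As a minimal sketch of this layout (the object name and local NTuple alias below are illustrative stand-ins, not the library's API), a channel taking two simultaneous signals and a single response could be held as:

{{{
// Illustrative only: shows the Vector-of-Vectors layout described above for a
// channel with two simultaneous signals (A, B) and one response.
object DRDataLayoutExample {
  type NTuple = Vector[Double] // local stand-in for the library's n-tuple alias

  // Each signal observation is a length-2 Vector: (signal A level, signal B level)
  val signal: Vector[NTuple] = Vector(
    Vector(0.0, 1.0),
    Vector(0.0, 2.0),
    Vector(1.0, 1.0)
  )

  // Each response observation is a length-1 Vector (a single response type)
  val response: Vector[NTuple] = Vector(
    Vector(0.4),
    Vector(0.9),
    Vector(1.7)
  )

  def main(args: Array[String]): Unit = {
    require(signal.length == response.length, "one response per signal observation")
    println(s"signal dimensionality:   ${signal.head.length}")   // 2
    println(s"response dimensionality: ${response.head.length}") // 1
  }
}
}}}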
Case class that contains all mutual information estimates for a particular pair of signal and response bin sets and a Weight
numbers of bins for given data dimensionality
pairs of (mean, 95% conf) values
true if estimate is not biased
Alternative to EstTuple for use with bootstrapping approach
Case class with estimates for the actual data set and one randomized data set, using a bootstrapping approach
mean and 95% confidence interval bounds
Case class with the actual and randomized mutual information estimates. Also includes the coefficient of determination for the actual linear fit
Weight whose weights are generated from a Gaussian probability distribution
An n-tuple (for clarity)
A node representing the head of a subtree in a binary tree.
Class that performs ordinary least squares regression
Weight whose weights are uniformly distributed over interior bins, and exterior bins are weighted to be 0
A two-tuple.
Case class holding the calculation parameters as denoted by the InfConfig object and optional parameter file.
parameters that have list values
parameters that have numeric values
parameters that have boolean values
parameters that have string values
(optional) parameters governing signal/response space
Case class holding all the necessary SubCalc data for performing the linear regression estimates
Same as RegData but holds arbitrary numbers of SubCalc instances for randomization purposes
Case class for holding an infcalcs.tables.CTable instance and the inverse of its sample size, resulting from a subsampling operation
Abstract binary tree definition that defines the methods and properties of each node in the tree: regular nodes are implemented by the class Node, while empty nodes (indicating that their parent is a terminal node) are implemented by the singleton object EmptyTree.
Implemented by the class Node and the object EmptyTree. A binary tree for a set of numbers (Doubles) is constructed by first obtaining an ordered (but unconnected) list of Node instances using the function Tree.buildOrderedNodeList. The ordered list of Nodes returned by this function can then be passed to the function Tree.buildTree to return an instance of Tree containing the binary tree.
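The following self-contained sketch illustrates the same two-step idea with simplified stand-in types: sorting the values stands in for Tree.buildOrderedNodeList, and the recursive buildTree helper stands in for Tree.buildTree (neither matches the library's actual signatures).

{{{
// Illustrative sketch of the two-step construction described above: first an
// ordered sequence of values, then a balanced binary search tree built from it.
// The types and method below are simplified stand-ins, not the library's API.
object TreeSketch {
  sealed trait BTree
  case object EmptyTree extends BTree
  case class Node(value: Double, left: BTree, right: BTree) extends BTree

  // Build a balanced tree by repeatedly taking the median of the sorted values.
  def buildTree(sorted: Vector[Double]): BTree =
    if (sorted.isEmpty) EmptyTree
    else {
      val mid = sorted.length / 2
      Node(sorted(mid), buildTree(sorted.take(mid)), buildTree(sorted.drop(mid + 1)))
    }

  def main(args: Array[String]): Unit = {
    val ordered = Vector(3.0, 1.0, 2.0, 5.0, 4.0).sorted // step 1: ordered values
    val tree = buildTree(ordered)                        // step 2: connected tree
    println(tree)
  }
}
}}}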
Weight whose weights are distributed uniformly over all bins
Mixin that defines key aspects of a novel probability function
Contains methods for building contingency tables from data.
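A minimal, self-contained sketch of the kind of count matrix these methods produce, using naive uniform binning (the library's own binning and CTable construction are more involved and configurable; the helper names below are hypothetical):

{{{
// Illustrative only: a contingency table built by uniform binning of paired
// (signal, response) observations.
object ContingencyTableSketch {
  def binIndex(x: Double, min: Double, max: Double, nBins: Int): Int = {
    val i = (((x - min) / (max - min)) * nBins).toInt
    math.min(math.max(i, 0), nBins - 1) // clamp the maximum value into the last bin
  }

  def buildTable(sig: Vector[Double], resp: Vector[Double],
                 sigBins: Int, respBins: Int): Array[Array[Int]] = {
    val table = Array.fill(sigBins, respBins)(0)
    (sig zip resp) foreach { case (s, r) =>
      val i = binIndex(s, sig.min, sig.max, sigBins)
      val j = binIndex(r, resp.min, resp.max, respBins)
      table(i)(j) += 1
    }
    table
  }

  def main(args: Array[String]): Unit = {
    val sig  = Vector(0.1, 0.2, 0.9, 1.0, 0.5, 0.6)
    val resp = Vector(1.1, 1.3, 4.0, 4.2, 2.5, 2.6)
    buildTable(sig, resp, 2, 3) foreach (row => println(row.mkString("\t")))
  }
}
}}}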
Companion object for CalcConfig
Singleton object representing a nonexistent child of a terminal node.
Top-level main function for channel capacity calculation.
- Collects command-line arguments
- Loads the data
- Sets configuration parameters
- Generates unimodal and bimodal weights
- Calculates channel capacity for each weighting scheme
Contains functions for generating signal weights and estimating the channel capacity.
Most important are the EstimateCC.estimateCC and EstimateCC.estimateCCVerbose functions, which are used in the EstCC main object to estimate the channel capacity.
The weighting functions EstimateCC.genUnimodalWeights and EstimateCC.genBimodalWeights generate weights for unimodal and bimodal Gaussian-based probability density functions, and EstimateCC.genPieceWiseUniformWeights produces discrete piecewise probability mass functions by iteratively selecting signal values to omit from the mutual information estimation.
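As an illustrative sketch of the unimodal case only (the gaussianWeights helper below is a hypothetical, self-contained stand-in; the library's functions additionally sweep the Gaussian parameters across the signal range and return Weight instances):

{{{
// Illustrative only: evaluate one Gaussian at the signal bin midpoints and
// normalize so the weights sum to 1 over the signal bins.
object GaussianWeightSketch {
  def gaussianWeights(binMidpoints: Vector[Double], mu: Double, sigma: Double): Vector[Double] = {
    val unnormalized = binMidpoints map { x =>
      math.exp(-math.pow(x - mu, 2) / (2 * sigma * sigma))
    }
    val total = unnormalized.sum
    unnormalized map (_ / total)
  }

  def main(args: Array[String]): Unit = {
    val midpoints = Vector(0.5, 1.5, 2.5, 3.5, 4.5)
    println(gaussianWeights(midpoints, mu = 2.5, sigma = 1.0))
  }
}
}}}

A bimodal weight, as described above, would instead normalize the sum of two such Gaussians.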
Contains methods for estimating mutual information.
The principal methods used by top-level calling functions are:
- EstimateMI.genEstimatesMult, which takes a dataset and returns mutual information estimates for each attempted bin size, and
- EstimateMI.optMIMult, which takes the mutual information estimates produced by EstimateMI.genEstimatesMult and finds the bin size and mutual information that maximizes the mutual information for the real but not the randomized datasets.
Other important methods include:
- EstimateMI.buildRegData, which builds resampled and randomized contingency tables, and
- EstimateMI.calcMultRegs, which estimates the mutual information by linear regression (see the sketch following this list).
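As a minimal sketch of the idea behind that regression (not the calcMultRegs or OLS implementation; the numbers and the ols helper below are made up for illustration), mutual information estimates from subsamples are fit against the inverse sample size, and the intercept is taken as the bias-corrected estimate:

{{{
// Illustrative only: regress MI estimates against inverse sample size; the
// intercept (1/N -> 0) is the bias-corrected MI estimate.
object MIRegressionSketch {
  // Ordinary least squares for y = a + b * x; returns (intercept, slope)
  def ols(x: Vector[Double], y: Vector[Double]): (Double, Double) = {
    val n = x.length.toDouble
    val xMean = x.sum / n
    val yMean = y.sum / n
    val slope = ((x zip y) map { case (xi, yi) => (xi - xMean) * (yi - yMean) }).sum /
      (x map (xi => math.pow(xi - xMean, 2))).sum
    (yMean - slope * xMean, slope)
  }

  def main(args: Array[String]): Unit = {
    val invSampleSizes = Vector(1.0 / 1000, 1.0 / 800, 1.0 / 600, 1.0 / 400)
    val miEstimates    = Vector(1.02, 1.03, 1.05, 1.08) // MI inflates as samples shrink
    val (intercept, _) = ols(invSampleSizes, miEstimates)
    println(f"bias-corrected MI estimate: $intercept%.3f bits")
  }
}
}}}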
This object contains methods for writing and reading various types of data to and from files.
Defines default values for the channel capacity calculation parameters.
Contains a handful of useful mathematical functions.
Companion object to OLS class
Defines orderings for various case classes employed in Tree searches
Contains a handful of utility functions.
Companion object to the infcalcs.Tree trait.
Companion object to the Weight trait.
Information theory calculations.
This package implements calculation of information-theoretic quantities, in particular estimates of mutual information and channel capacity for continuous variables.
For information on building, testing, and using the code, see the README file.
The steps in the channel capacity calculation are outlined below. For more details on the theory underlying the approach taken, see the supplementary information for Suderman, Bachman et al. (2016).
- In the top-level main function, infcalcs.EstCC, command-line arguments are parsed and the data and configuration parameters are loaded into an infcalcs.CalcConfig object.
- Because the channel capacity depends on the input distribution, various input weights are generated to determine which input weightings yield the highest mutual information between input and output. Input weights are generated using the functions infcalcs.EstimateCC.genUnimodalWeights and infcalcs.EstimateCC.genBimodalWeights, which allow weighting of the input distribution according to unimodal and bimodal Gaussian distributions, respectively, as well as discrete piecewise distributions with uniform probability using the function infcalcs.EstimateCC.genPieceWiseUniformWeights.
- Mutual information is estimated for each proposed input weighting by the function infcalcs.EstimateMI.genEstimatesMult.
- For each weighting, the algorithm tries a wide variety of bin numbers/sizes to arrive at an estimate that is not biased by the bin size.
- For each unique combination of input/output bin sizes, the algorithm builds the contingency tables for the raw data as well as for randomly selected subsamples of the data. These contingency tables are generated by infcalcs.EstimateMI.buildRegData.
- After calculating the mutual information for each contingency table (implemented in infcalcs.tables.CTable.mutualInformation; an illustrative version of this computation appears after this overview), the unbiased mutual information is estimated by performing a linear regression of the mutual information of each subsampled dataset against the inverse sample size; the intercept of the linear regression is the unbiased mutual information estimate. The regression calculation is performed by infcalcs.EstimateMI.calcMultRegs.
- Because increasing the number of bins can artifactually inflate estimates of MI, identical MI calculations are performed on shuffled datasets of varying sizes. The function infcalcs.EstimateMI.optMIMult then selects the MI estimate that maximizes the MI estimate for real data while keeping the MI estimate for randomized data below the cutoff specified in the configuration.
- infcalcs.EstimateCC.estimateCC then reports the channel capacity estimate as the maximum mutual information estimate obtained for all of the input weightings tested.
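As a schematic companion to the overview above, the following self-contained snippet computes, for a plain matrix of joint counts, the same quantity that infcalcs.tables.CTable.mutualInformation implements (an illustration only, not the library's code):

{{{
// Illustrative only: MI = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) * p(y)) ),
// computed from a matrix of joint counts.
object MutualInformationSketch {
  def mutualInformation(counts: Array[Array[Int]]): Double = {
    val total = counts.map(_.sum).sum.toDouble
    val rowP  = counts.map(_.sum / total)           // marginal p(x)
    val colP  = counts.transpose.map(_.sum / total) // marginal p(y)
    val terms = for {
      i <- counts.indices
      j <- counts(i).indices
      if counts(i)(j) > 0
      pxy = counts(i)(j) / total
    } yield pxy * (math.log(pxy / (rowP(i) * colP(j))) / math.log(2))
    terms.sum
  }

  def main(args: Array[String]): Unit = {
    // A perfectly informative 2x2 table: one bit of mutual information
    val table = Array(Array(50, 0), Array(0, 50))
    println(f"MI = ${mutualInformation(table)}%.3f bits") // ~1.000
  }
}
}}}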