Package

infcalcs

package infcalcs

Information theory calculations.

This package implements calculation of information-theoretic quantities, in particular estimates of mutual information and channel capacity for continuous variables.

For information on building, testing, and using the code, see the README file.

The steps in the channel capacity calculation are outlined below. For more details on the theory underlying the approach taken, see the supplementary information for Suderman, Bachman et al. (2016).

- In the top-level main function, infcalcs.EstCC, command-line arguments are parsed and the data and configuration parameters are loaded into an infcalcs.CalcConfig object.

- Because the channel capacity depends on the input distribution, various input weightings are generated to determine which one yields the highest mutual information between input and output. The functions infcalcs.EstimateCC.genUnimodalWeights and infcalcs.EstimateCC.genBimodalWeights weight the input distribution according to unimodal and bimodal Gaussian distributions, respectively, and the function infcalcs.EstimateCC.genPieceWiseUniformWeights generates discrete piecewise distributions with uniform probability.

- Mutual information is estimated for each proposed input weighting by the function infcalcs.EstimateMI.genEstimatesMult.

- For each weighting, the algorithm tries a wide variety of bin numbers/sizes to arrive at an estimate that is not biased by the bin size.

- For each unique combination of input/output bin sizes, the algorithm builds the contingency tables for the raw data as well as for randomly selected subsamples of the data. These contingency tables are generated by infcalcs.EstimateMI.buildRegData.

- After calculating the mutual information for each contingency table (implemented in infcalcs.tables.CTable.mutualInformation) the unbiased mutual information is estimated by performing a linear regression of the mutual information of each subsampled dataset against the inverse sample size; the intercept of the linear regression is the unbiased mutual information estimate. The regression calculation is performed by infcalcs.EstimateMI.calcMultRegs.

- Because increasing the number of bins can artifactually inflate MI estimates, identical MI calculations are performed on shuffled datasets of varying sizes. The function infcalcs.EstimateMI.optMIMult then selects the largest MI estimate for the real data among those whose MI estimate for the randomized data stays below the cutoff specified in the configuration.

- infcalcs.EstimateCC.estimateCC then reports the channel capacity estimate as the maximum mutual information estimate obtained for all of the input weightings tested.
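
A minimal sketch of the selection logic in the last two steps is shown below. It is an illustration under simplifying assumptions, not the package's code: BinEstimate, SelectionSketch, optMI, and channelCapacity are hypothetical names, each bin combination carries a single randomized MI value, and "below the cutoff" is read as strictly less than the cutoff.

```scala
// Hypothetical simplified record; not the package's Calculation or EstTuple classes.
// bins = (signal bins, response bins), mi = real-data estimate, randMi = randomized-data estimate.
final case class BinEstimate(bins: (Int, Int), mi: Double, randMi: Double)

object SelectionSketch {
  // Among bin combinations whose randomized-data MI is below the cutoff,
  // return the one with the largest real-data MI (None if none qualify).
  def optMI(estimates: Seq[BinEstimate], cutoff: Double): Option[BinEstimate] = {
    val admissible = estimates.filter(_.randMi < cutoff)
    if (admissible.isEmpty) None else Some(admissible.maxBy(_.mi))
  }

  // Channel capacity estimate: the maximum of the per-weighting optimal MI values.
  def channelCapacity(perWeighting: Seq[Seq[BinEstimate]], cutoff: Double): Option[Double] = {
    val best = perWeighting.flatMap(optMI(_, cutoff)).map(_.mi)
    if (best.isEmpty) None else Some(best.max)
  }
}
```

Returning Option makes the case where no bin combination passes the randomization check explicit instead of silently reporting zero.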

Linear Supertypes
AnyRef, Any

Type Members

  1. case class BWeight(p1: Pair[Double], p2: Pair[Double], w: Double, bt: Tree[Bin]) extends Weight with Product with Serializable

    Weight whose weights are bimodal and derived from the weighted sum of two Gaussian probability distributions

    p1

    (mu, sigma) for the first Gaussian

    p2

    (mu, sigma) for the second Gaussian

    w

    relative weight of the first Gaussian to the second (must be between 0 and 1)

    bt

    Tree of Bin instances defining signal space
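
    As a rough illustration of bimodal weighting, the sketch below evaluates the mixture w * N(mu1, sigma1) + (1 - w) * N(mu2, sigma2) at a set of representative signal values and normalizes the result to sum to 1. The names are hypothetical, and it assumes one representative value per bin instead of the package's Tree[Bin]; the package may instead integrate the density over each bin.

    ```scala
    object BimodalWeightSketch {
      // Gaussian probability density with mean mu and standard deviation sigma.
      def gaussianPdf(x: Double, mu: Double, sigma: Double): Double = {
        val z = (x - mu) / sigma
        math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.Pi))
      }

      // Hypothetical helper: weights for signal values xs from the two-Gaussian
      // mixture, with p1 = (mu1, sigma1), p2 = (mu2, sigma2), normalized to sum to 1.
      def bimodalWeights(xs: Seq[Double], p1: (Double, Double), p2: (Double, Double), w: Double): Seq[Double] = {
        require(w >= 0.0 && w <= 1.0, "w must be between 0 and 1")
        val raw = xs.map { x =>
          w * gaussianPdf(x, p1._1, p1._2) + (1.0 - w) * gaussianPdf(x, p2._1, p2._2)
        }
        val total = raw.sum
        raw.map(_ / total)
      }
    }
    ```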

  2. case class Bin(index: Int, values: List[Double], lowerBound: Double) extends Product with Serializable

    Case class representing a discrete entity in one dimension of a contingency table (equivalent to a histogram bin).

    index

    bin number

    values

    list of entries in bin

    lowerBound

    the bin can contain only values greater than this bound (non-inclusive)

  3. trait CLOpts extends AnyRef

    Parser for command line options, inherited by EstCC

  4. class CalcConfig extends AnyRef

    Class passed to numerous functions and methods that contains configuration information for channel capacity estimation

  5. case class Calculation(pairBinTuple: Pair[NTuple[Int]], mi: Double, randMi: Double, unBiased: Boolean) extends Product with Serializable

    Case class for storing calculations without regression estimator

  6. case class Config(verbose: Boolean = false, noReg: Boolean = false, dataFile: String = "", paramFile: String = "", seed: Int = 1, cores: Int = 1) extends Product with Serializable

    Case class whose fields comprise the required and optional command line options for channel capacity calculation.

    For more details see the usage text with the flag "--help"

  7. case class CtPos(coord: Pair[Int], cumProb: Double, lowerBound: Double) extends Product with Serializable

    Case class that contains the cumulative probability of a particular location in a contingency table

    coord

    (row,col) location in a contingency table

    cumProb

    cumulative probability

    lowerBound

    lower bound on probability (non-inclusive)

  8. case class CustomWeight(label: String, wts: List[Double]) extends Weight with Product with Serializable

    Weight whose weights are defined by the user

    wts

    list of doubles to serve as weights

  9. class DRData extends AnyRef

    Structure for input-output (dose-response) data.

    Both the signal and response points are stored in Vectors whose elements are themselves Vectors of length N, where N is the number of distinct signal or response types of the system (e.g., for a channel that simultaneously takes signal A and signal B, each element of the input Vector has length 2).

  10. case class EstTuple(pairBinTuples: Pair[NTuple[Int]], estimates: Option[Estimates], weight: Option[Weight], unbiased: Boolean) extends Product with Serializable

    Case class that contains all mutual information estimates for a particular pair of signal and response bin sets and a Weight

    pairBinTuples

    numbers of bins for given data dimensionality

    estimates

    pairs of (mean, 95% confidence interval) values

    unbiased

    true if estimate is not biased

  11. case class EstTupleBS(pairBinTuples: Pair[NTuple[Int]], estimates: Option[EstimateBS], weight: Option[Weight], unbiased: Boolean) extends Product with Serializable

    Alternative to EstTuple for use with bootstrapping approach

  12. case class EstimateBS(dataEstimate: (Double, Pair[Double]), randDataEstimate: (Double, Pair[Double]), coeffOfDetermination: Double) extends Product with Serializable

    Case class with the estimates for the actual data and for one randomized dataset, obtained using a bootstrapping approach

    dataEstimate

    mean and 95% confidence interval bounds
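
    One common way to obtain a mean and 95% confidence interval from bootstrap replicates is the percentile method, sketched below. Whether the package computes its bounds this way is not stated here; the object name and the nearest-rank percentile rule are assumptions.

    ```scala
    object BootstrapSummarySketch {
      // Hypothetical helper: mean of the replicates plus a percentile-based 95% interval
      // (empirical 2.5th and 97.5th percentiles, nearest-rank rule).
      def summarize(replicates: Seq[Double]): (Double, (Double, Double)) = {
        require(replicates.nonEmpty, "need at least one replicate")
        val sorted = replicates.sorted.toVector
        val n = sorted.length
        def percentile(p: Double): Double = {
          // Nearest rank: the ceil(p * n)-th smallest value, clamped to [1, n].
          val rank = math.ceil(p * n).toInt
          sorted(math.min(math.max(rank, 1), n) - 1)
        }
        val mean = sorted.sum / n
        (mean, (percentile(0.025), percentile(0.975)))
      }
    }
    ```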

  13. case class Estimates(dataEstimate: Pair[Double], randDataEstimate: List[Pair[Double]], coeffOfDetermination: Double) extends Product with Serializable

    Case class with the actual and randomized mutual information estimates. Also includes the coefficient of determination for the linear fit to the actual data.

  14. case class GWeight(p: Pair[Double], bt: Tree[Bin]) extends Weight with Product with Serializable

    Weight whose weights are generated from a Gaussian probability distribution

    p

    (mu, sigma) for calculating the Gaussian probability

    bt

    Tree of Bin instances defining signal space

  15. type NTuple[T] = Vector[T]

    An n-tuple (for clarity)

  16. case class Node[T](index: Int, value: Some[T], left: Tree[T], right: Tree[T]) extends Tree[T] with Product with Serializable

    A node representing the head of a subtree in a binary tree.

    index

    position in tree

    value

    value of the Node

    left

    left subtree

    right

    right subtree

  17. class OLS extends AnyRef

    Class that performs ordinary least squares regression
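
    The regression step described in the package overview (mutual information regressed against inverse sample size, with the intercept taken as the unbiased estimate) can be sketched with a plain least-squares fit. RegressionSketch.ols is a hypothetical helper, not this class's API; it also returns the coefficient of determination, analogous to the value stored in Estimates.

    ```scala
    object RegressionSketch {
      // Ordinary least squares fit of y = intercept + slope * x.
      // Returns (intercept, slope, rSquared).
      def ols(xs: Seq[Double], ys: Seq[Double]): (Double, Double, Double) = {
        require(xs.length == ys.length && xs.length >= 2, "need at least two paired points")
        val n = xs.length.toDouble
        val mx = xs.sum / n
        val my = ys.sum / n
        val sxy = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
        val sxx = xs.map(x => (x - mx) * (x - mx)).sum
        val slope = sxy / sxx
        val intercept = my - slope * mx
        // Residual and total sums of squares for the coefficient of determination.
        val ssRes = xs.zip(ys).map { case (x, y) =>
          val e = y - (intercept + slope * x)
          e * e
        }.sum
        val ssTot = ys.map(y => (y - my) * (y - my)).sum
        val r2 = if (ssTot == 0.0) 1.0 else 1.0 - ssRes / ssTot
        (intercept, slope, r2)
      }
    }
    ```

    For example, if the subsample MI values follow 1.0 + 50 / N exactly, fitting them against 1 / N recovers an intercept of approximately 1.0, the extrapolated estimate for infinite sample size.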

  18. case class PWeight(b: Pair[Int], bt: Tree[Bin]) extends Weight with Product with Serializable

    Weight whose weights are uniformly distributed over interior bins, and exterior bins are weighted to be 0

    b

    (low, high): all bins below 'low' and above 'high' are given weight 0

    bt

    Tree of Bin instances defining signal space
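
    A minimal sketch of piecewise uniform weighting over bin indices. It assumes that 'low' and 'high' themselves belong to the uniform region (the description above only says that bins below 'low' and above 'high' get weight 0); the object and method names are hypothetical.

    ```scala
    object PiecewiseWeightSketch {
      // Hypothetical helper: uniform weight over bin indices low..high (inclusive),
      // zero weight for all other bins.
      def piecewiseUniform(numBins: Int, low: Int, high: Int): Vector[Double] = {
        require(0 <= low && low <= high && high < numBins, "need 0 <= low <= high < numBins")
        val p = 1.0 / (high - low + 1)
        Vector.tabulate(numBins)(i => if (i >= low && i <= high) p else 0.0)
      }
    }
    ```

    For example, piecewiseUniform(5, 1, 3) gives weight 1/3 to bins 1, 2, and 3, and 0 to bins 0 and 4.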

  19. type Pair[T] = (T, T)

    A two-tuple.

  20. case class Parameters(listParams: Map[String, List[Double]], numParams: Map[String, Double], boolParams: Map[String, Boolean], stringParams: Map[String, String], sigRespParams: Map[String, Option[Vector[NTuple[Double]]]]) extends Product with Serializable

    Case class holding the calculation parameters as specified by the InfConfig object and the optional parameter file.

    listParams

    parameters that have list values

    numParams

    parameters that have numeric values

    boolParams

    parameters that have boolean values

    stringParams

    parameters that have string values

    sigRespParams

    (optional) parameters governing signal/response space

  21. case class RegData(subCalcs: Seq[SubCalc], label: String) extends Product with Serializable

    Case class holding all the necessary SubCalc data for performing the linear regression estimates

  22. case class RegDataRand(subCalcs: Vector[Vector[SubCalc]], label: String, value: String = "mutualInformation") extends Product with Serializable

    Same as RegData but holds arbitrary numbers of SubCalc instances for randomization purposes

  23. case class SubCalc(inv: Double, table: CTable[Double]) extends Product with Serializable

    Case class for holding an infcalcs.tables.CTable instance and the inverse of its sample size, resulting from a subsampling operation
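
    For reference, the plug-in mutual information of a table of counts is I = sum over (x, y) of p(x, y) * log2[p(x, y) / (p(x) p(y))]. The sketch below assumes rows index signal bins and columns index response bins, and reports the result in bits; it is a hypothetical stand-in, not CTable.mutualInformation itself.

    ```scala
    object MutualInformationSketch {
      // Plug-in mutual information (bits) of a rectangular table of counts.
      def mutualInformation(table: Vector[Vector[Double]]): Double = {
        val total = table.map(_.sum).sum
        require(total > 0, "table must contain at least one count")
        val rowSums = table.map(_.sum)           // signal marginal counts
        val colSums = table.transpose.map(_.sum) // response marginal counts
        val terms = for {
          (row, i) <- table.zipWithIndex
          (c, j) <- row.zipWithIndex
          if c > 0                               // empty cells contribute 0
        } yield {
          val pxy = c / total
          val px = rowSums(i) / total
          val py = colSums(j) / total
          pxy * math.log(pxy / (px * py)) / math.log(2.0)
        }
        terms.sum
      }
    }
    ```

    A perfectly diagonal table such as [[5, 0], [0, 5]] yields 1 bit, and a table with equal counts in every cell yields 0.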

  24. sealed trait Tree[+T] extends AnyRef

    Abstract binary tree definition that defines the methods and properties of each node in the tree: regular nodes are implemented by the class Node, while empty nodes (indicating that their parent is a terminal node) are implemented by the singleton object EmptyTree.

    Implemented by the class Node and the object EmptyTree. A binary tree for a set of numbers (Doubles) is constructed by first obtaining an ordered (but unconnected) list of Node instances using the function Tree.buildOrderedNodeList. The ordered list of Nodes returned by this function can then be passed to the function Tree.buildTree to return an instance of Tree containing the binary tree.
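
    The construction described above (an ordered list turned into a binary tree) can be sketched by taking the middle element of each ordered range as the root of its subtree. BTree, BNode, BEmpty, and TreeSketch are hypothetical stand-ins for Tree, Node, and EmptyTree, and findBin assumes the tree holds ascending bin lower bounds that are non-inclusive, as in Bin.lowerBound.

    ```scala
    // Hypothetical stand-ins for Tree, Node, and EmptyTree (renamed to avoid confusion).
    sealed trait BTree[+T]
    case object BEmpty extends BTree[Nothing]
    final case class BNode[T](index: Int, value: T, left: BTree[T], right: BTree[T]) extends BTree[T]

    object TreeSketch {
      // Build a balanced tree by recursively using the middle element as the root.
      def buildTree[T](ordered: IndexedSeq[T]): BTree[T] = {
        def go(lo: Int, hi: Int): BTree[T] =
          if (lo > hi) BEmpty
          else {
            val mid = (lo + hi) / 2
            BNode(mid, ordered(mid), go(lo, mid - 1), go(mid + 1, hi))
          }
        go(0, ordered.length - 1)
      }

      // Index of the bin whose lower bound is the largest bound strictly below x,
      // or None if x is at or below every bound.
      def findBin(tree: BTree[Double], x: Double): Option[Int] = {
        @annotation.tailrec
        def go(t: BTree[Double], best: Option[Int]): Option[Int] = t match {
          case BEmpty => best
          case BNode(i, lb, l, r) =>
            if (x > lb) go(r, Some(i)) else go(l, best)
        }
        go(tree, None)
      }
    }
    ```

    With lower bounds 0.0, 1.0, 2.0, 3.0, the value 1.5 falls in bin 1, while 1.0 falls in bin 0 because the bounds are non-inclusive.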

  25. case class UWeight(bt: Tree[Bin]) extends Weight with Product with Serializable

    Weight whose weights are distributed uniformly over all bins

    bt

    Tree of Bin instances defining signal space

  26. trait Weight extends AnyRef

    Mixin that defines key aspects of a novel probability function

Value Members

  1. object CTBuild

    Contains methods for building contingency tables from data.
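
    A minimal sketch of how a contingency table of counts might be filled from (signal, response) pairs, given ascending, non-inclusive bin lower bounds for each dimension. The names are hypothetical; a linear scan replaces a tree search for brevity, and values at or below the first bound are assigned to bin 0 by assumption.

    ```scala
    object ContingencyTableSketch {
      // Bin index for x: the last lower bound strictly below x (clamped to 0).
      def binIndex(lowerBounds: IndexedSeq[Double], x: Double): Int =
        math.max(lowerBounds.lastIndexWhere(_ < x), 0)

      // Count (signal, response) pairs into a signal-by-response table.
      def build(pairs: Seq[(Double, Double)],
                sigBounds: IndexedSeq[Double],
                respBounds: IndexedSeq[Double]): Vector[Vector[Int]] = {
        val counts = Array.ofDim[Int](sigBounds.length, respBounds.length)
        for ((s, r) <- pairs) {
          counts(binIndex(sigBounds, s))(binIndex(respBounds, r)) += 1
        }
        counts.map(_.toVector).toVector
      }
    }
    ```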

  2. object CalcConfig

    Companion object for CalcConfig

  3. object EmptyTree extends Tree[Nothing] with Product with Serializable

    Singleton object representing a nonexistent child of a terminal node.

  4. object EstCC extends App with CLOpts

    Top-level main function for channel capacity calculation.

    - Collects command-line arguments;
    - Loads the data;
    - Sets configuration parameters;
    - Generates unimodal and bimodal weights;
    - Calculates channel capacity for each weighting scheme.

  5. object EstimateCC

    Contains functions for generating signal weights and estimating the channel capacity.

    The most important are the EstimateCC.estimateCC and EstimateCC.estimateCCVerbose functions, which are used in the EstCC main object to estimate the channel capacity.

    The weighting functions EstimateCC.genUnimodalWeights and EstimateCC.genBimodalWeights generate weights for unimodal and bimodal Gaussian-based probability density functions and EstimateCC.genPieceWiseUniformWeights produces discrete piecewise probability mass functions by iteratively selecting signal values to omit from the mutual information estimation.

  6. object EstimateMI

    Contains methods for estimating mutual information.

    The principal methods used by top-level calling functions are:

    - EstimateMI.genEstimatesMult, which takes a dataset and returns mutual information estimates for each attempted bin size, and
    - EstimateMI.optMIMult, which takes the mutual information estimates produced by EstimateMI.genEstimatesMult and finds the bin size and mutual information that maximize the mutual information for the real dataset while keeping the estimate for the randomized datasets below the configured cutoff.

    Other important methods include:

    - EstimateMI.buildRegData, which builds resampled and randomized contingency tables, and
    - EstimateMI.calcMultRegs, which estimates the mutual information by linear regression.
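
    The subsampling and randomization that buildRegData is described as performing could look roughly like the following. This is a sketch under stated assumptions: ResamplingSketch is hypothetical, subsamples are drawn without replacement, and randomization permutes the responses relative to the signals, which keeps both marginal distributions intact while removing any dependence between them.

    ```scala
    import scala.util.Random

    object ResamplingSketch {
      // Random subsample (without replacement) containing `fraction` of the data.
      def subsample[A](data: Vector[A], fraction: Double, rng: Random): Vector[A] = {
        require(fraction > 0.0 && fraction <= 1.0, "fraction must be in (0, 1]")
        val n = math.max(1, math.round(fraction * data.length).toInt)
        rng.shuffle(data).take(n)
      }

      // Randomized dataset: permute responses relative to signals.
      def shuffleResponses(pairs: Vector[(Double, Double)], rng: Random): Vector[(Double, Double)] = {
        val (signals, responses) = pairs.unzip
        signals.zip(rng.shuffle(responses))
      }
    }
    ```

    Passing an explicitly seeded Random keeps runs reproducible, in the spirit of the seed option in Config.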

  7. object IOFile

    This object contains methods for writing and reading various types of data to and from files.

  8. object InfConfig

    Defines default values for the channel capacity calculation parameters.

  9. object MathFuncs

    Contains a handful of useful mathematical functions.

  10. object OLS

    Companion object to OLS class

  11. object Orderings

    Defines orderings for various case classes employed in Tree searches

  12. object ParameterFuncs

    Contains a handful of utility functions.

  13. object Tree

    Companion object to the infcalcs.Tree trait.

  14. object Weight

    Companion object to Weight trait.

  15. package actors

  16. package exceptions

  17. package tables
