infcalcs

Information theory calculations.

This package implements calculation of information-theoretic quantities, in particular estimates of mutual information and channel capacity for continuous variables.

For information on building, testing, and using the code, see the README file.

The steps in the channel capacity calculation are outlined below. For more details on the theory underlying the approach taken, see the supplementary information for Suderman, Bachman et al. (2016).

- In the top-level main function, infcalcs.EstCC, command-line arguments are parsed and the data and configuration parameters are loaded into an infcalcs.CalcConfig object.

- Because the channel capacity depends on the input distribution, various input weightings are generated to determine which one yields the highest mutual information between input and output. The functions infcalcs.EstimateCC.genUnimodalWeights and infcalcs.EstimateCC.genBimodalWeights weight the input distribution according to unimodal and bimodal Gaussian distributions, respectively, and infcalcs.EstimateCC.genPieceWiseUniformWeights generates discrete piecewise distributions with uniform probability.

- Mutual information is estimated for each proposed input weighting by the function infcalcs.EstimateMI.genEstimatesMult.

- For each weighting, the algorithm tries a wide variety of bin numbers/sizes to arrive at an estimate that is not biased by the bin size.

- For each unique combination of input/output bin sizes, the algorithm builds the contingency tables for the raw data as well as for randomly selected subsamples of the data. These contingency tables are generated by infcalcs.EstimateMI.buildRegData.

- After the mutual information is calculated for each contingency table (implemented in infcalcs.tables.CTable.mutualInformation), the unbiased mutual information is estimated with a linear regression of the mutual information of each subsampled dataset against the inverse sample size. The intercept of this regression corresponds to an infinitely large sample (an inverse sample size of zero) and is the unbiased mutual information estimate. The regression is calculated by infcalcs.EstimateMI.calcMultRegs.

- Because increasing the number of bins can artifactually inflate MI estimates, identical MI calculations are performed on shuffled (randomized) datasets of varying sizes. The function infcalcs.EstimateMI.optMIMult then selects the bin numbers that maximize the MI estimate for the real data while keeping the MI estimate for the randomized data below the cutoff specified in the configuration.

- infcalcs.EstimateCC.estimateCC then reports the channel capacity estimate as the maximum mutual information estimate obtained for all of the input weightings tested.
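The selection rule in the last two steps can be sketched as follows. This minimal illustration is not the infcalcs implementation. The `Candidate` case class and the `bestEstimate` and `channelCapacity` helpers are hypothetical, and the randomized-data criterion is simplified to a single randomized estimate per bin configuration, compared against the cutoff.

```scala
// Hypothetical summary of one bin configuration for one input weighting.
final case class Candidate(
    weightLabel: String, // which input weighting produced this estimate
    bins: (Int, Int),    // (signal bins, response bins)
    mi: Double,          // MI estimate for the real data
    randMi: Double       // MI estimate for the randomized data
)

// Best estimate for one weighting: the largest real-data MI among the bin
// configurations whose randomized-data MI stays below the cutoff.
def bestEstimate(cands: Seq[Candidate], cutoff: Double): Option[Candidate] = {
  val admissible = cands.filter(_.randMi < cutoff)
  if (admissible.isEmpty) None else Some(admissible.maxBy(_.mi))
}

// Channel capacity: the largest of the per-weighting best estimates.
def channelCapacity(byWeight: Map[String, Seq[Candidate]], cutoff: Double): Option[Candidate] = {
  val best = byWeight.values.flatMap(cs => bestEstimate(cs, cutoff)).toSeq
  if (best.isEmpty) None else Some(best.maxBy(_.mi))
}
```

Returning an `Option` makes explicit the case in which no bin configuration passes the randomized-data test for any weighting.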

Linear Supertypes

AnyRef, Any

Type Members

case class BWeight(p1: Pair[Double], p2: Pair[Double], w: Double, bt: Tree[Bin]) extends Weight with Product with Serializable

Weight whose weights are bimodal and derived from the sum of two Gaussian probability distributions
p1
(mu, sigma) for the first Gaussian
p2
(mu, sigma) for the second Gaussian
w
relative weight of the first Gaussian to the second (must be between 0 and 1)
bt
Tree of Bin instances defining signal space
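As an illustration of the bimodal weighting, the sketch below gives each signal bin the value of the two-component mixture `w * N(mu1, sigma1) + (1 - w) * N(mu2, sigma2)` at a representative point of the bin, then normalizes so the weights sum to 1. The assumptions here are evaluating the density at one point per bin (here, hypothetical bin midpoints instead of a `Tree[Bin]`) rather than integrating over the bin, and the name `bimodalWeights`.

```scala
// Gaussian probability density with mean mu and standard deviation sigma.
def gaussianPdf(x: Double, mu: Double, sigma: Double): Double = {
  val z = (x - mu) / sigma
  math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.Pi))
}

// Hypothetical helper: normalized bimodal weights for bins given by midpoints.
// p1 and p2 are (mu, sigma); w is the relative weight of the first Gaussian.
def bimodalWeights(
    mids: Seq[Double],
    p1: (Double, Double),
    p2: (Double, Double),
    w: Double
): Seq[Double] = {
  require(w >= 0.0 && w <= 1.0, "w must be between 0 and 1")
  val raw = mids.map { x =>
    w * gaussianPdf(x, p1._1, p1._2) + (1 - w) * gaussianPdf(x, p2._1, p2._2)
  }
  val total = raw.sum
  raw.map(_ / total) // weights now sum to 1
}
```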
case class Bin(index: Int, values: List[Double], lowerBound: Double) extends Product with Serializable

Case class representing a discrete entity in one dimension of a contingency table, (equivalent to a histogram bin).
index
bin number
values
list of entries in bin
lowerBound
lower bound of the bin; the bin can contain values above this value (non-inclusive)
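The sketch below makes the non-inclusive `lowerBound` convention concrete by finding the bin that holds a value. It assumes that a value belongs to the bin with the largest lower bound strictly below it. The `SimpleBin` class and the `binIndexFor` helper are hypothetical. infcalcs stores its bins in a `Tree[Bin]` and searches that tree rather than scanning a list.

```scala
// Hypothetical stand-in for Bin, keeping only the fields needed here.
final case class SimpleBin(index: Int, lowerBound: Double)

// Index of the bin holding v: the bin with the largest lowerBound strictly
// less than v (bounds are non-inclusive). None if v is at or below every bound.
def binIndexFor(bins: Seq[SimpleBin], v: Double): Option[Int] = {
  val below = bins.filter(_.lowerBound < v)
  if (below.isEmpty) None else Some(below.maxBy(_.lowerBound).index)
}
```

Under this convention, a value exactly equal to a bin's lower bound falls into the preceding bin.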
trait CLOpts extends AnyRef

Parser for command line options, inherited by EstCC
class CalcConfig extends AnyRef

Class passed to numerous functions and methods that contains configuration information for channel capacity estimation
case class Calculation(pairBinTuple: Pair[NTuple[Int]], mi: Double, randMi: Double, unBiased: Boolean) extends Product with Serializable

Case class for storing calculations made without the regression estimator
case class Config(verbose: Boolean = false, noReg: Boolean = false, dataFile: String = "", paramFile: String = "", seed: Int = 1, cores: Int = 1) extends Product with Serializable

Case class whose fields comprise the required and optional command line options for channel capacity calculation.
For more details, see the usage text printed with the "--help" flag.
case class CtPos(coord: Pair[Int], cumProb: Double, lowerBound: Double) extends Product with Serializable

Case class that contains the cumulative probability of a particular location in a contingency table
coord
(row,col) location in a contingency table
cumProb
cumulative probability
lowerBound
lower bound on probability (non-inclusive)
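A cumulative probability paired with a non-inclusive lower bound is the information needed to sample contingency-table cells by inverting the cumulative distribution. The sketch below assumes that each cell covers the interval (lowerBound, cumProb] of (0, 1]. A uniform draw then selects the cell whose interval contains it. `Pos`, `positions`, `cellFor`, and `sampleCell` are hypothetical names. Whether infcalcs uses CtPos in exactly this way is an assumption.

```scala
import scala.util.Random

// Hypothetical mirror of CtPos: a cell covering (lowerBound, cumProb].
final case class Pos(coord: (Int, Int), cumProb: Double, lowerBound: Double)

// Lay the cells of a table of counts end to end, row by row.
def positions(counts: Vector[Vector[Int]]): Vector[Pos] = {
  val total = counts.flatten.sum.toDouble
  val cells = for {
    (row, r) <- counts.zipWithIndex
    (c, col) <- row.zipWithIndex
  } yield ((r, col), c / total)
  val cums = cells.map(_._2).scanLeft(0.0)(_ + _) // running cumulative sums
  cells.zipWithIndex.map { case ((coord, _), i) => Pos(coord, cums(i + 1), cums(i)) }
}

// Cell whose interval contains u; empty (zero-probability) cells are skipped.
def cellFor(ps: Vector[Pos], u: Double): (Int, Int) = {
  val occupied = ps.filter(p => p.cumProb > p.lowerBound)
  // Fall back to the last occupied cell if rounding leaves u above the final cumProb.
  occupied.find(p => u > p.lowerBound && u <= p.cumProb).getOrElse(occupied.last).coord
}

// Draw u from (0, 1], matching the non-inclusive lower bounds.
def sampleCell(ps: Vector[Pos], rng: Random): (Int, Int) =
  cellFor(ps, 1.0 - rng.nextDouble())
```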
case class CustomWeight(label: String, wts: List[Double]) extends Weight with Product with Serializable

Weight whose weights are defined by the user
wts
list of doubles to serve as weights
class DRData extends AnyRef

Structure for input-output (dose-response) data.
Both the signal and response points are stored in Vectors whose elements are themselves Vectors of length N, where N is the number of distinct signal or response types in the system (e.g., for a channel that simultaneously takes signal A and signal B, the input Vector would have elements of length 2).
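The sketch below shows the layout just described for a channel with two signal types and one response type. `DoseResponse` is a hypothetical stand-in for DRData, whose constructor is not shown here. This sketch also assumes that signal and response points are paired one to one. It only checks that all signal tuples share one length, and likewise all response tuples.

```scala
// Mirrors the infcalcs type alias.
type NTuple[T] = Vector[T]

// Hypothetical stand-in for DRData: paired signal/response observations.
final case class DoseResponse(sig: Vector[NTuple[Double]], resp: Vector[NTuple[Double]]) {
  require(sig.length == resp.length, "each signal point needs a response point")
  require(sig.map(_.length).distinct.length <= 1, "signal tuples must share one length")
  require(resp.map(_.length).distinct.length <= 1, "response tuples must share one length")

  // Number of distinct signal and response types (N in the description).
  def sigDim: Int = sig.headOption.map(_.length).getOrElse(0)
  def respDim: Int = resp.headOption.map(_.length).getOrElse(0)
}
```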
case class EstTuple(pairBinTuples: Pair[NTuple[Int]], estimates: Option[Estimates], weight: Option[Weight], unbiased: Boolean) extends Product with Serializable

Case class that contains all mutual information estimates for a particular pair of signal and response bin sets and a Weight
pairBinTuples
numbers of signal and response bins for each dimension of the data
estimates
pairs of (mean, 95% confidence interval) values
unbiased
true if estimate is not biased
case class EstTupleBS(pairBinTuples: Pair[NTuple[Int]], estimates: Option[EstimateBS], weight: Option[Weight], unbiased: Boolean) extends Product with Serializable

Alternative to EstTuple for use with bootstrapping approach
case class EstimateBS(dataEstimate: (Double, Pair[Double]), randDataEstimate: (Double, Pair[Double]), coeffOfDetermination: Double) extends Product with Serializable

Case class with the actual-data estimate and one randomized-data estimate, obtained using a bootstrapping approach
dataEstimate
mean and 95% confidence interval bounds
case class Estimates(dataEstimate: Pair[Double], randDataEstimate: List[Pair[Double]], coeffOfDetermination: Double) extends Product with Serializable

Case class with the actual and randomized mutual information estimates. Also includes the coefficient of determination for the linear fit to the actual data.
case class GWeight(p: Pair[Double], bt: Tree[Bin]) extends Weight with Product with Serializable

Weight whose weights are generated from a Gaussian probability distribution
p
(mu, sigma) for calculating the Gaussian probability
bt
Tree of Bin instances defining signal space
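For the unimodal case, the sketch below represents each signal bin by its (lower, upper) edges and weights it by the Gaussian probability mass inside it. The Scala standard library has no error function, so the hypothetical `gaussianBinWeights` helper approximates each bin's mass with the midpoint rule (the density at the bin center times the bin width) before normalizing. infcalcs may compute bin probabilities differently.

```scala
// Gaussian probability density with mean mu and standard deviation sigma.
def gaussPdf(x: Double, mu: Double, sigma: Double): Double = {
  val z = (x - mu) / sigma
  math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.Pi))
}

// Hypothetical helper: normalized weights for bins given as (lower, upper) edges.
// Midpoint rule: mass ~ pdf(center) * width, so unequal bin widths are respected.
def gaussianBinWeights(edges: Seq[(Double, Double)], p: (Double, Double)): Seq[Double] = {
  val (mu, sigma) = p
  require(sigma > 0.0, "sigma must be positive")
  val raw = edges.map { case (lo, hi) => gaussPdf((lo + hi) / 2.0, mu, sigma) * (hi - lo) }
  val total = raw.sum
  raw.map(_ / total)
}
```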
type NTuple[T] = Vector[T]

An n-tuple (for clarity)
case class Node[T](index: Int, value: Some[T], left: Tree[T], right: Tree[T]) extends Tree[T] with Product with Serializable

A node representing the head of a subtree in a binary tree.
index
position in tree
value
value of the Node
left
left subtree
right
right subtree
class OLS extends AnyRef

Class that performs ordinary least squares regression
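The regression step described in the overview fits mutual information against inverse sample size and keeps the intercept. It can be sketched with closed-form simple linear regression. `SimpleOLS` is a hypothetical stand-in for the OLS class, whose methods are not documented here. It exposes the slope, the intercept (the extrapolated value at an inverse sample size of zero), and a coefficient of determination analogous to the `coeffOfDetermination` field of `Estimates`.

```scala
// Hypothetical closed-form simple linear regression: y = intercept + slope * x.
final case class SimpleOLS(xs: Seq[Double], ys: Seq[Double]) {
  require(xs.length == ys.length && xs.length >= 2, "need at least two paired points")
  private val n = xs.length.toDouble
  private val xMean = xs.sum / n
  private val yMean = ys.sum / n
  private val sxx = xs.map(x => (x - xMean) * (x - xMean)).sum
  private val sxy = xs.zip(ys).map { case (x, y) => (x - xMean) * (y - yMean) }.sum
  require(sxx > 0.0, "x values must not all be equal")

  val slope: Double = sxy / sxx
  val intercept: Double = yMean - slope * xMean // estimate at 1/N = 0

  // Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
  val rSquared: Double = {
    val ssRes = xs.zip(ys).map { case (x, y) =>
      val e = y - (intercept + slope * x)
      e * e
    }.sum
    val ssTot = ys.map(y => (y - yMean) * (y - yMean)).sum
    if (ssTot == 0.0) 1.0 else 1.0 - ssRes / ssTot
  }
}
```

For subsampled estimates that lie exactly on a line in 1/N, the intercept recovers the limiting value and R^2 is 1. Noisy estimates lower R^2.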
case class PWeight(b: Pair[Int], bt: Tree[Bin]) extends Weight with Product with Serializable

Weight whose weights are uniformly distributed over interior bins, and exterior bins are weighted to be 0
b
(low, high) all bins below 'low' and above 'high' are weighted to 0
bt
Tree of Bin instances defining signal space
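Following the description of `b` above, bins `low` through `high` inclusive share the probability mass equally and every other bin gets 0. The sketch below illustrates this. `piecewiseUniform` is a hypothetical helper. `allPiecewise` enumerates every (low, high) range as one possible exploration strategy, which is not necessarily how EstimateCC.genPieceWiseUniformWeights selects signal values to omit.

```scala
// Hypothetical helper: uniform weight over bins low..high (inclusive), 0 elsewhere.
def piecewiseUniform(numBins: Int, b: (Int, Int)): Vector[Double] = {
  val (low, high) = b
  require(0 <= low && low <= high && high < numBins, "need 0 <= low <= high < numBins")
  val w = 1.0 / (high - low + 1)
  Vector.tabulate(numBins)(i => if (i >= low && i <= high) w else 0.0)
}

// Hypothetical helper: one weighting for every admissible (low, high) pair.
def allPiecewise(numBins: Int): Seq[Vector[Double]] =
  for {
    low <- 0 until numBins
    high <- low until numBins
  } yield piecewiseUniform(numBins, (low, high))
```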
type Pair[T] = (T, T)

A two-tuple.
case class Parameters(listParams: Map[String, List[Double]], numParams: Map[String, Double], boolParams: Map[String, Boolean], stringParams: Map[String, String], sigRespParams: Map[String, Option[Vector[NTuple[Double]]]]) extends Product with Serializable

Case class holding the calculation parameters as denoted by the InfConfig object and optional parameter file.
listParams
parameters that have list values
numParams
parameters that have numeric values
boolParams
parameters that have boolean values
stringParams
parameters that have string values
sigRespParams
(optional) parameters governing signal/response space
case class RegData(subCalcs: Seq[SubCalc], label: String) extends Product with Serializable

Case class holding all the necessary SubCalc data for performing the linear regression estimates
case class RegDataRand(subCalcs: Vector[Vector[SubCalc]], label: String, value: String = "mutualInformation") extends Product with Serializable

Same as RegData but holds arbitrary numbers of SubCalc instances for randomization purposes
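Randomized datasets like those held by RegDataRand can be produced by shuffling the responses relative to the signals. Shuffling keeps both marginal distributions but destroys any dependence between input and output, so any MI measured on the shuffled data reflects binning and finite-sample bias. The hypothetical `shuffledPairs` helper below sketches this idea. It is not the infcalcs routine.

```scala
import scala.util.Random

// Hypothetical helper: pair each signal with a randomly permuted response.
// Marginals are unchanged; only the pairing (and hence the dependence) is lost.
def shuffledPairs[A, B](sig: Vector[A], resp: Vector[B], rng: Random): Vector[(A, B)] = {
  require(sig.length == resp.length, "signal and response must have equal length")
  sig.zip(rng.shuffle(resp))
}
```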
case class SubCalc(inv: Double, table: CTable[Double]) extends Product with Serializable

Case class for holding an infcalcs.tables.CTable instance and the inverse of its sample size, resulting from a subsampling operation
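Each subsampled table contributes one regression point: its inverse sample size and its mutual information. The sketch below computes the plug-in estimate I = Σ p(x,y) log2[p(x,y) / (p(x) p(y))]. It assumes a table of raw counts with signal bins as rows and response bins as columns, and it reports MI in bits. `mutualInformation` and `regressionPoint` are hypothetical functions, not the CTable method.

```scala
// Hypothetical helper: plug-in mutual information (bits) of a table of counts.
// p(x,y) / (p(x) p(y)) simplifies to c * total / (rowSum * colSum).
def mutualInformation(counts: Vector[Vector[Int]]): Double = {
  val total = counts.map(_.sum).sum.toDouble
  val rowSums = counts.map(_.sum.toDouble)
  val colSums = counts.transpose.map(_.sum.toDouble)
  val terms = for {
    (row, i) <- counts.zipWithIndex
    (c, j) <- row.zipWithIndex
    if c > 0 // empty cells contribute 0 (the limit of p log p)
  } yield (c / total) * math.log(c * total / (rowSums(i) * colSums(j))) / math.log(2.0)
  terms.sum
}

// Hypothetical helper: (inverse sample size, MI), one point for the regression.
def regressionPoint(counts: Vector[Vector[Int]]): (Double, Double) =
  (1.0 / counts.map(_.sum).sum, mutualInformation(counts))
```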
sealed trait Tree[+T] extends AnyRef

Abstract binary tree definition that defines the methods and properties of each node in the tree: regular nodes are implemented by the class Node, while empty nodes (indicating that their parent is a terminal node) are implemented by the singleton object EmptyTree.
Implemented by the class Node and the object EmptyTree. A binary tree for a set of numbers (Doubles) is constructed by first obtaining an ordered (but unconnected) list of Node instances using the function Tree.buildOrderedNodeList. The ordered list of Nodes returned by this function can then be passed to the function Tree.buildTree to return an instance of Tree containing the binary tree.
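The two-step construction described above (an ordered list of nodes, then a tree) can be sketched as a balanced binary search tree. The tree is built from sorted values by recursively making the middle element the root. `STree`, `SNode`, `SEmpty`, `build`, `contains`, and `depth` are hypothetical names that mirror Node and EmptyTree. The signatures of Tree.buildOrderedNodeList and Tree.buildTree may differ.

```scala
sealed trait STree[+T]
case object SEmpty extends STree[Nothing]
final case class SNode[T](index: Int, value: T, left: STree[T], right: STree[T]) extends STree[T]

// Build a balanced tree from values sorted ascending: the middle element becomes
// the root and each half becomes a subtree. `index` is the position in the input.
def build[T](sorted: IndexedSeq[T], offset: Int = 0): STree[T] =
  if (sorted.isEmpty) SEmpty
  else {
    val mid = sorted.length / 2
    SNode(offset + mid, sorted(mid),
      build(sorted.take(mid), offset),
      build(sorted.drop(mid + 1), offset + mid + 1))
  }

// Binary search: descend left or right by comparing against each node's value.
def contains[T](t: STree[T], x: T)(implicit ord: Ordering[T]): Boolean = t match {
  case SEmpty => false
  case SNode(_, v, l, r) =>
    if (ord.equiv(x, v)) true
    else if (ord.lt(x, v)) contains(l, x)
    else contains(r, x)
}

// Number of levels in the tree; about log2(n) + 1 for n values.
def depth[T](t: STree[T]): Int = t match {
  case SEmpty => 0
  case SNode(_, _, l, r) => 1 + math.max(depth(l), depth(r))
}
```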
case class UWeight(bt: Tree[Bin]) extends Weight with Product with Serializable

Weight whose weights are distributed uniformly over all bins
bt
Tree of Bin instances defining signal space
trait Weight extends AnyRef

Mixin that defines key aspects of a novel probability function

Value Members

object CTBuild

Contains methods for building contingency tables from data.
object CalcConfig

Companion object for CalcConfig
object EmptyTree extends Tree[Nothing] with Product with Serializable

Singleton object representing a nonexistent child of a terminal node.
object EstCC extends App with CLOpts

Top-level main function for channel capacity calculation.
- Collects command-line arguments;
- Loads the data;
- Sets configuration parameters;
- Generates unimodal and bimodal weights;
- Calculates channel capacity for each weighting scheme.
object EstimateCC

Contains functions for generating signal weights and estimating the channel capacity.
The most important are the EstimateCC.estimateCC and EstimateCC.estimateCCVerbose functions, which are used in the EstCC main object to estimate the channel capacity.
The weighting functions EstimateCC.genUnimodalWeights and EstimateCC.genBimodalWeights generate weights for unimodal and bimodal Gaussian-based probability density functions and EstimateCC.genPieceWiseUniformWeights produces discrete piecewise probability mass functions by iteratively selecting signal values to omit from the mutual information estimation.
object EstimateMI

Contains methods for estimating mutual information.
The principal methods used by top-level calling functions are:
- EstimateMI.genEstimatesMult, which takes a dataset and returns mutual information estimates for each attempted bin size, and
- EstimateMI.optMIMult, which takes the mutual information estimates produced by EstimateMI.genEstimatesMult and finds the bin size and mutual information that maximize the mutual information for the real datasets while keeping that of the randomized datasets below the configured cutoff.
Other important methods include:
- EstimateMI.buildRegData, which builds resampled and randomized contingency tables, and
- EstimateMI.calcMultRegs, which estimates the mutual information by linear regression.
object IOFile

This object contains methods for writing and reading various types of data to and from files.
object InfConfig

Defines default values for the channel capacity calculation parameters.
object MathFuncs

Contains a handful of useful mathematical functions.
object OLS

Companion object to OLS class
object Orderings

Defines orderings for various case classes employed in Tree searches
object ParameterFuncs

Contains a handful of utility functions.
object Tree

Companion object to the infcalcs.Tree trait.
object Weight

Companion object to Weight trait.
package actors
package exceptions
package tables
