Histogram

PortfolioOptimisers.AbstractBins - Type

abstract type AbstractBins end

Abstract supertype for all histogram binning algorithms.

AbstractBins is the abstract type for all binning algorithm types used in histogram-based calculations within PortfolioOptimisers.jl, such as mutual information and variation of information analysis. Concrete subtypes implement specific binning strategies (e.g., Knuth, Freedman-Diaconis, Scott, Hacine-Gharbi-Ravier) and provide a consistent interface for bin selection.
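
A binning strategy is selected by passing an instance (or a plain integer bin count) to the histogram-based estimators. A minimal usage sketch, assuming `mutual_info` and the binning types are in scope:

```julia
using PortfolioOptimisers

X = randn(200, 3)  # observations × variables

# Any `AbstractBins` subtype, or a plain integer bin count, is accepted
# wherever a binning strategy is expected:
mutual_info(X, HacineGharbiRavier())  # adaptive rule (the default)
mutual_info(X, 10)                    # fixed number of bins
```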

PortfolioOptimisers.AstroPyBins - Type
abstract type AstroPyBins <: AbstractBins end

Abstract supertype for all histogram binning algorithms implemented using AstroPy's bin width selection methods.

AstroPyBins is the abstract type for all binning algorithm types that rely on bin width selection functions from the AstroPy Python library, such as Knuth, Freedman-Diaconis, and Scott. Concrete subtypes implement specific binning strategies and provide a consistent interface for bin selection in histogram-based calculations within PortfolioOptimisers.jl.

PortfolioOptimisers.FreedmanDiaconis - Type
FreedmanDiaconis <: AstroPyBins

Histogram binning algorithm using the Freedman-Diaconis rule.

FreedmanDiaconis implements the Freedman-Diaconis rule for selecting the number of bins in a histogram, as provided by the AstroPy library. This method determines bin width based on the interquartile range (IQR) and the number of data points, making it robust to outliers and suitable for skewed distributions.
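
The rule itself is simple to state: the bin width is h = 2 * IQR / n^(1/3), so the bin count is the data range divided by h. A self-contained sketch of the resulting count (illustrative only; the package delegates the width computation to astropy.stats.freedman_bin_width, and `fd_nbins` is a hypothetical helper name):

```julia
using StatsBase  # for `iqr`

# Freedman-Diaconis rule: bin width h = 2 * IQR(x) / n^(1/3),
# giving nbins ≈ (max - min) / h.
function fd_nbins(x::AbstractVector)
    h = 2 * iqr(x) / length(x)^(1 / 3)
    return round(Int, (maximum(x) - minimum(x)) / h)
end
```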

PortfolioOptimisers.HacineGharbiRavier - Type
HacineGharbiRavier <: AbstractBins

Histogram binning algorithm using the Hacine-Gharbi–Ravier rule.

HacineGharbiRavier implements the Hacine-Gharbi–Ravier rule for selecting the number of bins in a histogram. This method adapts the bin count based on the correlation structure and sample size, and is particularly useful for information-theoretic measures such as mutual information and variation of information.

PortfolioOptimisers.get_bin_width_func - Function
get_bin_width_func(bins::Knuth)
get_bin_width_func(bins::FreedmanDiaconis)
get_bin_width_func(bins::Scott)
get_bin_width_func(bins::Union{<:HacineGharbiRavier, <:Integer})

Return the bin width selection function associated with a histogram binning algorithm.

This utility dispatches on the binning algorithm type and returns the corresponding bin width function from the AstroPy Python library for Knuth, FreedmanDiaconis, and Scott. For HacineGharbiRavier and integer bin counts, it returns nothing, as these strategies do not use a bin width function.

Arguments

  • bins::Knuth: Use Knuth's rule (astropy.stats.knuth_bin_width).
  • bins::FreedmanDiaconis: Use the Freedman-Diaconis rule (astropy.stats.freedman_bin_width).
  • bins::Scott: Use Scott's rule (astropy.stats.scott_bin_width).
  • bins::Union{<:HacineGharbiRavier, <:Integer}: No bin width function (returns nothing).

Returns

  • bin_width_func: The corresponding bin width function (callable), or nothing if not applicable.

Examples

julia> PortfolioOptimisers.get_bin_width_func(Knuth())
Python: <function knuth_bin_width at 0x7da1178e0fe0>

julia> PortfolioOptimisers.get_bin_width_func(FreedmanDiaconis())
Python: <function freedman_bin_width at 0x7da1178e0fe0>

julia> PortfolioOptimisers.get_bin_width_func(Scott())
Python: <function scott_bin_width at 0x7da1178e0fe0>

julia> PortfolioOptimisers.get_bin_width_func(HacineGharbiRavier())

julia> PortfolioOptimisers.get_bin_width_func(10)

PortfolioOptimisers.calc_num_bins - Function
calc_num_bins(bins::AstroPyBins,
              xj::AbstractVector, xi::AbstractVector, j::Integer, i::Integer,
              bin_width_func, T::Integer)
calc_num_bins(bins::HacineGharbiRavier,
              xj::AbstractVector, xi::AbstractVector, j::Integer, i::Integer,
              bin_width_func, T::Integer)
calc_num_bins(bins::Integer,
              xj::AbstractVector, xi::AbstractVector, j::Integer, i::Integer,
              bin_width_func, T::Integer)

Compute the number of histogram bins for a pair of variables using a specified binning algorithm.

This function determines the number of bins to use for histogram-based calculations (such as mutual information or variation of information) between two variables, based on the selected binning strategy. It dispatches on the binning algorithm type and uses the appropriate method for each:

  • For AstroPyBins, it computes the bin width using the provided bin_width_func and calculates the number of bins as the range divided by the bin width, rounding to the nearest integer. For off-diagonal pairs, it uses the maximum of the two variables' bin counts.
  • For HacineGharbiRavier, it uses the Hacine-Gharbi–Ravier rule, which adapts the bin count based on the correlation and sample size.
  • For an integer, it returns the specified number of bins directly.

Arguments

  • bins::AstroPyBins: Binning algorithm type.
  • bins::HacineGharbiRavier: Use the Hacine-Gharbi–Ravier rule.
  • bins::Integer: Use a fixed number of bins.
  • xj::AbstractVector: Data vector for variable j.
  • xi::AbstractVector: Data vector for variable i.
  • j::Integer: Index of variable j.
  • i::Integer: Index of variable i.
  • bin_width_func: Bin width selection function (from get_bin_width_func), or nothing.
  • T::Integer: Number of observations (used by some algorithms).

Returns

  • nbins::Int: The computed number of bins for the variable pair.
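
The AstroPyBins branch described above can be sketched as follows (a hypothetical helper, not the package's internal code; `bin_width_func` stands in for an AstroPy width selector):

```julia
# nbins = range / bin_width, rounded; for off-diagonal pairs (j != i)
# the larger of the two variables' bin counts is used.
function sketch_num_bins(xj, xi, j, i, bin_width_func)
    k(x) = round(Int, (maximum(x) - minimum(x)) / bin_width_func(x))
    return j == i ? k(xj) : max(k(xj), k(xi))
end
```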

PortfolioOptimisers.calc_hist_data - Function
calc_hist_data(xj::AbstractVector, xi::AbstractVector, bins::Integer)

Compute histogram-based marginal and joint distributions for two variables.

This function computes the normalised histograms (probability mass functions) for two variables xj and xi using the specified number of bins, as well as their joint histogram. It returns the marginal entropies and the joint histogram, which are used in mutual information and variation of information calculations.

Arguments

  • xj::AbstractVector: Data vector for variable j.
  • xi::AbstractVector: Data vector for variable i.
  • bins::Integer: Number of bins to use for the histograms.

Returns

  • ex::Float64: Entropy of xj.
  • ey::Float64: Entropy of xi.
  • hxy::Matrix{Float64}: Joint histogram (counts, not normalised to probability).

Details

  • The histograms are computed using StatsBase.fit(Histogram, ...) over the range of each variable, with bin edges expanded slightly using eps to ensure all data is included.
  • The marginal histograms are normalised to sum to 1 before entropy calculation.
  • The joint histogram is not normalised, as it is used directly in mutual information calculations.
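
The steps listed above can be sketched with StatsBase (illustrative; the helper name and exact edge handling are assumptions, not the package's internals):

```julia
using StatsBase

function sketch_hist_data(xj::AbstractVector, xi::AbstractVector, bins::Integer)
    # Bin edges over each variable's range, nudged by eps so the
    # extremes fall inside the outer bins.
    function edges(x)
        lo, hi = minimum(x), maximum(x)
        return range(lo - eps(lo), hi + eps(hi); length = bins + 1)
    end
    # Marginal histograms normalised to probability mass functions
    # before computing entropies.
    wx = fit(Histogram, xj, edges(xj)).weights
    wy = fit(Histogram, xi, edges(xi)).weights
    ex = entropy(wx / sum(wx))
    ey = entropy(wy / sum(wy))
    # Joint histogram kept as raw counts.
    hxy = fit(Histogram, (xj, xi), (edges(xj), edges(xi))).weights
    return ex, ey, hxy
end
```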

PortfolioOptimisers.intrinsic_mutual_info - Function
intrinsic_mutual_info(X::AbstractMatrix)

Compute the intrinsic mutual information from a joint histogram.

This function calculates the mutual information between two variables given their joint histogram matrix X. It is used as a core step in information-theoretic measures such as mutual information and variation of information.

Arguments

  • X::AbstractMatrix: Joint histogram matrix (typically from calc_hist_data).

Returns

  • mi::Float64: The intrinsic mutual information between the two variables.

Details

  • The function computes marginal distributions by summing over rows and columns.
  • Only nonzero entries in the joint histogram are considered.
  • The mutual information is computed as the sum over all nonzero joint probabilities of p(x, y) * log(p(x, y) / (p(x) * p(y))), with careful handling of log and normalisation.
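
The computation described above amounts to the following sketch (illustrative, not the package's exact implementation):

```julia
# Mutual information from a joint-count matrix X.
function sketch_mutual_info(X::AbstractMatrix)
    p = X ./ sum(X)             # joint probabilities
    px = vec(sum(p; dims = 2))  # marginal over rows
    py = vec(sum(p; dims = 1))  # marginal over columns
    mi = 0.0
    for j in axes(p, 2), i in axes(p, 1)
        pij = p[i, j]
        # Only nonzero joint entries contribute.
        pij > 0 && (mi += pij * log(pij / (px[i] * py[j])))
    end
    return mi
end
```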

PortfolioOptimisers.variation_info - Function
variation_info(X::AbstractMatrix,
               bins::Union{<:AbstractBins, <:Integer} = HacineGharbiRavier(),
               normalise::Bool = true)

Compute the variation of information (VI) matrix for a set of variables.

This function calculates the pairwise variation of information between all columns of the data matrix X, using histogram-based entropy and mutual information estimates. VI quantifies the amount of information lost and gained when moving from one variable to another, and is a true metric on the space of discrete distributions.

Arguments

  • X::AbstractMatrix: Data matrix (observations × variables).
  • bins::Union{<:AbstractBins, <:Integer}: Binning algorithm or fixed number of bins.
  • normalise::Bool: Whether to normalise the VI by the joint entropy.

Returns

  • var_mtx::Matrix{Float64}: Symmetric matrix of pairwise variation of information values.

Details

  • For each pair of variables, the function computes marginal entropies and the joint histogram using calc_hist_data.
  • The mutual information is computed using intrinsic_mutual_info.
  • VI is calculated as H(X) + H(Y) - 2 * I(X, Y). If normalise is true, it is divided by the joint entropy.
  • The result is clamped to [0, typemax(eltype(X))] and is symmetric.
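
For a single pair, the formulas above combine as in this sketch (hypothetical helper name):

```julia
# ex, ey: marginal entropies; mi: mutual information of the pair.
# VI = H(X) + H(Y) - 2 I(X, Y); the joint entropy used for
# normalisation is H(X, Y) = H(X) + H(Y) - I(X, Y).
function sketch_vi(ex, ey, mi; normalise::Bool = true)
    vi = ex + ey - 2 * mi
    return normalise ? vi / (ex + ey - mi) : vi
end
```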

PortfolioOptimisers.mutual_info - Function
mutual_info(X::AbstractMatrix,
            bins::Union{<:AbstractBins, <:Integer} = HacineGharbiRavier(),
            normalise::Bool = true)

Compute the mutual information (MI) matrix for a set of variables.

This function calculates the pairwise mutual information between all columns of the data matrix X, using histogram-based entropy and mutual information estimates. MI quantifies the amount of shared information between pairs of variables, and is widely used in information-theoretic analysis of dependencies.

Arguments

  • X::AbstractMatrix: Data matrix (observations × variables).
  • bins::Union{<:AbstractBins, <:Integer}: Binning algorithm or fixed number of bins.
  • normalise::Bool: Whether to normalise the MI by the minimum marginal entropy.

Returns

  • mut_mtx::Matrix{Float64}: Symmetric matrix of pairwise mutual information values.

Details

  • For each pair of variables, the function computes marginal entropies and the joint histogram using calc_hist_data.
  • The mutual information is computed using intrinsic_mutual_info.
  • If normalise is true, the MI is divided by the minimum of the two marginal entropies.
  • The result is clamped to [0, typemax(eltype(X))] and is symmetric.
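
The per-pair normalisation and clamping described above reduce to a one-liner (hypothetical helper name):

```julia
# ex, ey: marginal entropies; mi: mutual information of the pair.
# Normalised MI divides by the smaller marginal entropy; the result
# is clamped below at zero.
sketch_nmi(ex, ey, mi) = clamp(mi / min(ex, ey), 0.0, Inf)
```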
