Histogram
PortfolioOptimisers.AbstractBins — Type

Abstract supertype for all histogram binning algorithms.

AbstractBins is the abstract type for all binning algorithm types used in histogram-based calculations within PortfolioOptimisers.jl, such as mutual information and variation of information analysis. Concrete subtypes implement specific binning strategies (e.g. Knuth, Freedman-Diaconis, Scott, Hacine-Gharbi–Ravier) and provide a consistent interface for bin selection.
PortfolioOptimisers.AstroPyBins — Type

abstract type AstroPyBins <: AbstractBins end

Abstract supertype for all histogram binning algorithms implemented using AstroPy's bin width selection methods.

AstroPyBins is the abstract type for all binning algorithm types that rely on bin width selection functions from the AstroPy Python library, such as Knuth, Freedman-Diaconis, and Scott. Concrete subtypes implement specific binning strategies and provide a consistent interface for bin selection in histogram-based calculations within PortfolioOptimisers.jl.
PortfolioOptimisers.Knuth — Type

Knuth <: AstroPyBins

Histogram binning algorithm using Knuth's rule.

Knuth implements Knuth's rule for selecting the optimal number of bins in a histogram, as provided by the AstroPy library. This method aims to maximise the posterior probability of the histogram given the data, resulting in an adaptive binning strategy that balances bias and variance.
PortfolioOptimisers.FreedmanDiaconis — Type

FreedmanDiaconis <: AstroPyBins

Histogram binning algorithm using the Freedman-Diaconis rule.

FreedmanDiaconis implements the Freedman-Diaconis rule for selecting the number of bins in a histogram, as provided by the AstroPy library. This method determines bin width from the interquartile range (IQR) and the number of data points, making it robust to outliers and suitable for skewed distributions.
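The Freedman-Diaconis bin width has a simple closed form, h = 2 · IQR(x) / n^(1/3), which is the formula astropy.stats.freedman_bin_width computes. A minimal illustrative sketch in Julia (independent of the package's AstroPy interop):

using Statistics

# Illustrative Freedman-Diaconis bin width: h = 2 * IQR / n^(1/3).
fd_bin_width(x) = 2 * (quantile(x, 0.75) - quantile(x, 0.25)) / length(x)^(1 / 3)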
PortfolioOptimisers.Scott — Type

Scott <: AstroPyBins

Histogram binning algorithm using Scott's rule.

Scott implements Scott's rule for selecting the number of bins in a histogram, as provided by the AstroPy library. This method chooses bin width based on the standard deviation of the data and the number of observations, providing a good default for normally distributed data.
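Scott's bin width is likewise closed form, h = (24√π)^(1/3) · σ / n^(1/3) ≈ 3.49 · σ · n^(-1/3). A minimal illustrative sketch:

using Statistics

# Illustrative Scott bin width, commonly quoted as 3.49 * std(x) * n^(-1/3).
scott_bin_width(x) = (24 * sqrt(pi))^(1 / 3) * std(x) / length(x)^(1 / 3)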
PortfolioOptimisers.HacineGharbiRavier — Type

HacineGharbiRavier <: AbstractBins

Histogram binning algorithm using the Hacine-Gharbi–Ravier rule.

HacineGharbiRavier implements the Hacine-Gharbi–Ravier rule for selecting the number of bins in a histogram. This method adapts the bin count based on the correlation structure and sample size, and is particularly useful for information-theoretic measures such as mutual information and variation of information.
PortfolioOptimisers.get_bin_width_func — Function

get_bin_width_func(bins::Knuth)
get_bin_width_func(bins::FreedmanDiaconis)
get_bin_width_func(bins::Scott)
get_bin_width_func(bins::Union{<:HacineGharbiRavier, <:Integer})

Return the bin width selection function associated with a histogram binning algorithm.

This utility dispatches on the binning algorithm type and returns the corresponding bin width function from the AstroPy Python library for Knuth, FreedmanDiaconis, and Scott. For HacineGharbiRavier and integer bin counts, it returns nothing, as these strategies do not use a bin width function.

Arguments

- bins::Knuth: Use Knuth's rule (astropy.stats.knuth_bin_width).
- bins::FreedmanDiaconis: Use the Freedman-Diaconis rule (astropy.stats.freedman_bin_width).
- bins::Scott: Use Scott's rule (astropy.stats.scott_bin_width).
- bins::Union{<:HacineGharbiRavier, <:Integer}: No bin width function (returns nothing).

Returns

- bin_width_func: The corresponding bin width function (callable), or nothing if not applicable.
Examples
julia> PortfolioOptimisers.get_bin_width_func(Knuth())
Python: <function knuth_bin_width at 0x7da1178e0fe0>
julia> PortfolioOptimisers.get_bin_width_func(FreedmanDiaconis())
Python: <function freedman_bin_width at 0x7da1178e0fe0>
julia> PortfolioOptimisers.get_bin_width_func(Scott())
Python: <function scott_bin_width at 0x7da1178e0fe0>
julia> PortfolioOptimisers.get_bin_width_func(HacineGharbiRavier())
julia> PortfolioOptimisers.get_bin_width_func(10)
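The returned callable can be applied to a data vector. The "Python:" prefix in the outputs above suggests PythonCall.jl provides the AstroPy interop; that, and the pyconvert call to bring the width back as a Julia Float64, are assumptions in the sketch below:

julia> using PythonCall  # assumption: PythonCall.jl backs the AstroPy interop

julia> f = PortfolioOptimisers.get_bin_width_func(Scott());

julia> w = pyconvert(Float64, f(randn(1000)));  # Scott bin width for the sample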
PortfolioOptimisers.calc_num_bins — Function

calc_num_bins(bins::AstroPyBins, xj::AbstractVector, xi::AbstractVector,
              j::Integer, i::Integer, bin_width_func, T::Integer)
calc_num_bins(bins::HacineGharbiRavier, xj::AbstractVector, xi::AbstractVector,
              j::Integer, i::Integer, bin_width_func, T::Integer)
calc_num_bins(bins::Integer, xj::AbstractVector, xi::AbstractVector,
              j::Integer, i::Integer, bin_width_func, T::Integer)

Compute the number of histogram bins for a pair of variables using a specified binning algorithm.

This function determines the number of bins to use for histogram-based calculations (such as mutual information or variation of information) between two variables, based on the selected binning strategy. It dispatches on the binning algorithm type and uses the appropriate method for each:

- For AstroPyBins, it computes the bin width using the provided bin_width_func and calculates the number of bins as the range divided by the bin width, rounded to the nearest integer. For off-diagonal pairs, it uses the maximum of the two variables' bin counts.
- For HacineGharbiRavier, it uses the Hacine-Gharbi–Ravier rule, which adapts the bin count based on the correlation and sample size (see the sketch at the end of this entry).
- For an integer, it returns the specified number of bins directly.

Arguments

- bins::AstroPyBins: Binning algorithm type.
- bins::HacineGharbiRavier: Use the Hacine-Gharbi–Ravier rule.
- bins::Integer: Use a fixed number of bins.
- xj::AbstractVector: Data vector for variable j.
- xi::AbstractVector: Data vector for variable i.
- j::Integer: Index of variable j.
- i::Integer: Index of variable i.
- bin_width_func: Bin width selection function (from get_bin_width_func), or nothing.
- T::Integer: Number of observations (used by some algorithms).

Returns

- nbins::Int: The computed number of bins for the variable pair.
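The following sketch illustrates the two data-driven branches. The AstroPy branch assumes bin_width_func accepts a Julia vector and returns a real-valued width; the Hacine-Gharbi–Ravier branch assumes the published bivariate formula. Both are illustrative, not the package's exact code.

using Statistics

# Illustrative AstroPy-style branch: bins = data range / bin width for each
# variable, then the maximum across the pair.
function astropy_nbins_sketch(xj, xi, bin_width_func)
    nj = round(Int, (maximum(xj) - minimum(xj)) / bin_width_func(xj))
    ni = round(Int, (maximum(xi) - minimum(xi)) / bin_width_func(xi))
    return max(nj, ni)
end

# Illustrative Hacine-Gharbi–Ravier branch for an off-diagonal pair, assuming
# the published bivariate rule driven by the pair correlation and sample size T.
function hgr_nbins_sketch(xj, xi, T::Integer)
    rho = cor(xj, xi)
    return round(Int, sqrt(1 + sqrt(1 + 24 * T / (1 - rho^2))) / sqrt(2))
end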
PortfolioOptimisers.calc_hist_data — Function

calc_hist_data(xj::AbstractVector, xi::AbstractVector, bins::Integer)

Compute histogram-based marginal and joint distributions for two variables.

This function computes the normalised histograms (probability mass functions) for two variables xj and xi using the specified number of bins, as well as their joint histogram. It returns the marginal entropies and the joint histogram, which are used in mutual information and variation of information calculations.

Arguments

- xj::AbstractVector: Data vector for variable j.
- xi::AbstractVector: Data vector for variable i.
- bins::Integer: Number of bins to use for the histograms.

Returns

- ex::Float64: Entropy of xj.
- ey::Float64: Entropy of xi.
- hxy::Matrix{Float64}: Joint histogram (counts, not normalised to probability).

Details

- The histograms are computed using StatsBase.fit(Histogram, ...) over the range of each variable, with bin edges expanded slightly using eps to ensure all data is included (as in the sketch below).
- The marginal histograms are normalised to sum to 1 before entropy calculation.
- The joint histogram is not normalised, as it is used directly in mutual information calculations.
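A minimal sketch of these steps; the edge handling (nextfloat rather than an eps offset) and the names are illustrative, not the package's exact implementation:

using StatsBase

# Illustrative reconstruction of calc_hist_data as described above.
function calc_hist_data_sketch(xj::AbstractVector, xi::AbstractVector, bins::Integer)
    # Expand the upper edge slightly so the maximum falls inside the last bin.
    edges(x) = range(minimum(x), nextfloat(float(maximum(x))); length = bins + 1)
    px = fit(Histogram, xj, edges(xj)).weights
    py = fit(Histogram, xi, edges(xi)).weights
    hxy = fit(Histogram, (xj, xi), (edges(xj), edges(xi))).weights  # joint counts
    ex = entropy(px / sum(px))  # marginal entropy of xj from the normalised pmf
    ey = entropy(py / sum(py))  # marginal entropy of xi
    return ex, ey, hxy
end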
PortfolioOptimisers.intrinsic_mutual_info — Function

intrinsic_mutual_info(X::AbstractMatrix)

Compute the intrinsic mutual information from a joint histogram.

This function calculates the mutual information between two variables given their joint histogram matrix X. It is used as a core step in information-theoretic measures such as mutual information and variation of information.

Arguments

- X::AbstractMatrix: Joint histogram matrix (typically from calc_hist_data).

Returns

- mi::Float64: The intrinsic mutual information between the two variables.

Details

- The function computes marginal distributions by summing over rows and columns.
- Only nonzero entries in the joint histogram are considered.
- The mutual information is computed as the sum over all nonzero joint probabilities of p(x, y) * log(p(x, y) / (p(x) * p(y))), with careful handling of logs and normalisation (sketched below).
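A minimal sketch of that computation from a joint count matrix; illustrative, not the package's exact implementation:

# Mutual information from a joint count matrix, following the Details above.
function intrinsic_mutual_info_sketch(X::AbstractMatrix)
    P = X / sum(X)               # normalise counts to a joint pmf
    px = vec(sum(P; dims = 2))   # marginal of the row variable
    py = vec(sum(P; dims = 1))   # marginal of the column variable
    mi = 0.0
    for j in axes(P, 2), i in axes(P, 1)
        p = P[i, j]
        p > 0 || continue        # only nonzero joint entries contribute
        mi += p * log(p / (px[i] * py[j]))
    end
    return mi
end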
PortfolioOptimisers.variation_info — Function

variation_info(X::AbstractMatrix,
               bins::Union{<:AbstractBins, <:Integer} = HacineGharbiRavier(),
               normalise::Bool = true)

Compute the variation of information (VI) matrix for a set of variables.

This function calculates the pairwise variation of information between all columns of the data matrix X, using histogram-based entropy and mutual information estimates. VI quantifies the amount of information lost and gained when moving from one variable to another, and is a true metric on the space of discrete distributions.

Arguments

- X::AbstractMatrix: Data matrix (observations × variables).
- bins::Union{<:AbstractBins, <:Integer}: Binning algorithm or fixed number of bins.
- normalise::Bool: Whether to normalise the VI by the joint entropy.

Returns

- var_mtx::Matrix{Float64}: Symmetric matrix of pairwise variation of information values.

Details

- For each pair of variables, the function computes marginal entropies and the joint histogram using calc_hist_data.
- The mutual information is computed using intrinsic_mutual_info.
- VI is calculated as H(X) + H(Y) - 2 * I(X, Y), as sketched below. If normalise is true, it is divided by the joint entropy.
- The result is clamped to [0, typemax(eltype(X))] and is symmetric.
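Putting the pieces together, a sketch of the pairwise assembly, reusing the illustrative calc_hist_data_sketch and intrinsic_mutual_info_sketch helpers from the previous entries; a fixed integer bin count stands in for the binning strategies:

# Illustrative pairwise VI assembly mirroring the Details list above.
function variation_info_sketch(X::AbstractMatrix, bins::Integer; normalise::Bool = true)
    N = size(X, 2)
    V = zeros(N, N)  # VI of a variable with itself is zero, so the diagonal stays 0
    for j in 1:N, i in (j + 1):N
        ex, ey, hxy = calc_hist_data_sketch(X[:, j], X[:, i], bins)
        mi = intrinsic_mutual_info_sketch(hxy)
        vi = ex + ey - 2 * mi              # H(X) + H(Y) - 2 I(X, Y)
        if normalise
            vi /= ex + ey - mi             # joint entropy H(X, Y)
        end
        V[i, j] = V[j, i] = clamp(vi, 0, Inf)
    end
    return V
end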
PortfolioOptimisers.mutual_info — Function

mutual_info(X::AbstractMatrix,
            bins::Union{<:AbstractBins, <:Integer} = HacineGharbiRavier(),
            normalise::Bool = true)

Compute the mutual information (MI) matrix for a set of variables.

This function calculates the pairwise mutual information between all columns of the data matrix X, using histogram-based entropy and mutual information estimates. MI quantifies the amount of shared information between pairs of variables, and is widely used in information-theoretic analysis of dependencies.

Arguments

- X::AbstractMatrix: Data matrix (observations × variables).
- bins::Union{<:AbstractBins, <:Integer}: Binning algorithm or fixed number of bins.
- normalise::Bool: Whether to normalise the MI by the minimum marginal entropy.

Returns

- mut_mtx::Matrix{Float64}: Symmetric matrix of pairwise mutual information values.

Details

- For each pair of variables, the function computes marginal entropies and the joint histogram using calc_hist_data.
- The mutual information is computed using intrinsic_mutual_info.
- If normalise is true, the MI is divided by the minimum of the two marginal entropies.
- The result is clamped to [0, typemax(eltype(X))] and is symmetric.
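A minimal usage sketch, calling the functions fully qualified as in the earlier examples; outputs are suppressed because the values depend on the random sample:

julia> using PortfolioOptimisers

julia> X = randn(200, 4);

julia> M = PortfolioOptimisers.mutual_info(X);       # 4×4 normalised MI matrix

julia> V = PortfolioOptimisers.variation_info(X);    # 4×4 normalised VI matrix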