Combinatorial

PortfolioOptimisers.CombinatorialCrossValidation Type
julia
struct CombinatorialCrossValidation{__T_n_folds, __T_n_test_folds, __T_purged_size, __T_embargo_size} <: NonSequentialCrossValidationEstimator

Implements combinatorial non-sequential cross-validation with purging and embargoing, allowing for all possible combinations of test folds.

Fields

  • n_folds: Number of folds.

  • n_test_folds: Number of test folds.

  • purged_size: Number of observations to purge between train and test sets.

  • embargo_size: Number of observations to embargo after the test set.

Constructors

julia
CombinatorialCrossValidation(;
    n_folds::Integer = 10,
    n_test_folds::Integer = 8,
    purged_size::Integer = 0,
    embargo_size::Integer = 0,
    warn_comb::Integer = 100_000,
) -> CombinatorialCrossValidation

Keyword arguments correspond to the struct's fields, except warn_comb, which is not stored and only sets the threshold for the combinations warning.

Validation

  • n_folds must be non-empty, greater than zero, and finite.

  • n_test_folds must be non-empty, greater than zero, and finite.

  • purged_size and embargo_size must be non-empty and finite.

  • Warns if the number of combinations exceeds warn_comb.

Examples

julia
julia> CombinatorialCrossValidation(; n_folds = 10, n_test_folds = 8, purged_size = 2,
                                    embargo_size = 1)
CombinatorialCrossValidation
       n_folds ┼ Int64: 10
  n_test_folds ┼ Int64: 8
   purged_size ┼ Int64: 2
  embargo_size ┴ Int64: 1
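
To get a feel for how quickly the number of combinations grows relative to warn_comb, the count can be sketched with Base's binomial. The helper name count_splits is hypothetical and not part of PortfolioOptimisers; the sketch assumes each split corresponds to one choice of n_test_folds test folds out of n_folds.

```julia
# Hypothetical helper (not part of PortfolioOptimisers): number of train/test
# splits, assuming one split per choice of `n_test_folds` folds out of `n_folds`.
count_splits(n_folds::Integer, n_test_folds::Integer) = binomial(n_folds, n_test_folds)

n_splits = count_splits(10, 8)        # the configuration from the example above
large_splits = count_splits(20, 10)   # a much larger configuration

# Compare against the default `warn_comb` threshold of 100_000.
exceeds_default = n_splits > 100_000
large_exceeds_default = large_splits > 100_000
```

Under this assumption the example configuration yields 45 splits, well below the default threshold, while 20 folds with 10 test folds yields 184756 and would be expected to trigger the warning.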

PortfolioOptimisers.CombinatorialCrossValidationResult Type
julia
struct CombinatorialCrossValidationResult{__T_train_idx, __T_test_idx, __T_path_ids} <: NonSequentialCrossValidationResult

Result type produced by CombinatorialCrossValidation after splitting data into combinatorial training and testing folds.

Stores the train index vectors, nested test index vectors (one per path), and a matrix of path IDs mapping folds to paths.

Fields

  • train_idx: Training set indices.

  • test_idx: Test set indices.

  • path_ids: Path identifiers for cross-validation splits.

Constructors

julia
CombinatorialCrossValidationResult(;
    train_idx::VecVecInt,
    test_idx::VecVecVecInt,
    path_ids::AbstractMatrix{<:Integer}
) -> CombinatorialCrossValidationResult

Keywords correspond to the struct's fields.

Validation

  • !isempty(train_idx).

  • !isempty(test_idx).

  • !isempty(path_ids).

  • length(train_idx) == length(test_idx) == size(path_ids, 2).

PortfolioOptimisers.CombCVER Type
julia
const CombCVER = Union{<:CombinatorialCrossValidation,
                       <:CombinatorialCrossValidationResult}

Alias for a combinatorial cross-validation estimator or result.

Matches either a CombinatorialCrossValidation estimator or a CombinatorialCrossValidationResult.

Base.split Method
julia
Base.split(ccv::CombinatorialCrossValidation, rd::ReturnsResult) -> CombinatorialCrossValidationResult

Split the returns data rd into all possible combinations of training and test folds using combinatorial cross-validation with optional purging and embargoing.

Arguments

  • ccv::CombinatorialCrossValidation: Combinatorial cross-validation estimator.

  • rd::ReturnsResult: Returns data to split.

Returns

  • CombinatorialCrossValidationResult: Result containing train indices, nested test index vectors (one per path), and a matrix of path IDs mapping folds to paths.

PortfolioOptimisers.test_set_index Method
julia
test_set_index(ccv)

Generate all test set index combinations for combinatorial cross-validation.

Returns a vector of test fold index combinations for ccv.

Arguments

  • ccv: Combinatorial cross-validation estimator.

Returns

  • Vector of test index combinations.
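
As an illustration of what "all test fold index combinations" means, the following minimal sketch enumerates every k-element subset of the folds 1:n in lexicographic order. The function fold_combinations is a hypothetical stand-in written in plain Julia; it is not the package's test_set_index, and the ordering it produces is an assumption.

```julia
# Hypothetical helper: all k-element subsets of 1:n, in lexicographic order.
function fold_combinations(n::Integer, k::Integer)
    result = Vector{Vector{Int}}()
    current = Int[]
    # Recursively extend the partial combination with fold indices >= start.
    function extend!(start)
        if length(current) == k
            push!(result, copy(current))
            return
        end
        for i in start:n
            push!(current, i)
            extend!(i + 1)
            pop!(current)
        end
    end
    extend!(1)
    return result
end

combos = fold_combinations(4, 2)
# combos == [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
```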

PortfolioOptimisers.binary_train_test_sets Method
julia
binary_train_test_sets(ccv)

Generate binary train/test set assignment matrices for combinatorial cross-validation.

Returns a matrix indicating which samples are in train (0) and test (1) sets for each combination.

Arguments

  • ccv: Combinatorial cross-validation estimator.

Returns

  • Binary train/test assignment matrix.
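
A minimal sketch of such an assignment matrix, built at fold granularity (rows are folds, columns are combinations), may help picture the 0/1 encoding. The helper binary_fold_matrix, the fold-level layout, and the column ordering are all assumptions made for illustration; the package's own matrix may be laid out differently.

```julia
# Hypothetical helper: rows are folds, columns are combinations;
# 1 marks a test fold and 0 a training fold.
function binary_fold_matrix(n::Integer, k::Integer)
    # Every n-bit mask with exactly k set bits is one choice of k test folds.
    masks = [m for m in 0:(2^n - 1) if count_ones(m) == k]
    B = zeros(Int, n, length(masks))
    for (j, m) in enumerate(masks), i in 1:n
        B[i, j] = (m >> (i - 1)) & 1   # bit i-1 of the mask flags fold i
    end
    return B
end

B = binary_fold_matrix(4, 2)
# size(B) == (4, 6); each column contains two 1s and each row contains three 1s
```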

PortfolioOptimisers.get_path_ids Method
julia
get_path_ids(ccv)

Get path identifiers for each test combination in combinatorial cross-validation.

Returns the path assignment for each test combination, mapping combinations to their recombined paths.

Arguments

  • ccv: Combinatorial cross-validation estimator.

Returns

  • Vector of path IDs.

PortfolioOptimisers.n_test_paths Function
julia
n_test_paths(n_folds, n_test_folds)

Compute the number of test paths in combinatorial cross-validation.

Returns the number of unique recombined test paths obtained when n_test_folds of the n_folds folds are held out for testing in each combination. Also accepts a CombinatorialCrossValidation object directly.

Arguments

  • n_folds: Total number of folds.

  • n_test_folds: Number of test folds per combination.

Returns

  • Integer number of test paths.
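
The count can be sketched with the usual combinatorial cross-validation formula, in which each fold is tested in the same number of combinations. The function paths_count below is a hypothetical re-implementation for illustration, and the formula is an assumption about how n_test_paths is computed rather than something stated on this page.

```julia
# Hypothetical re-implementation, assuming the standard formula
# binomial(n_folds, n_test_folds) * n_test_folds / n_folds.
paths_count(n_folds::Integer, n_test_folds::Integer) =
    binomial(n_folds, n_test_folds) * n_test_folds ÷ n_folds

paths_default = paths_count(10, 8)   # 45 * 8 ÷ 10
paths_small = paths_count(6, 2)      # 15 * 2 ÷ 6
```

Under this assumption the default configuration (10 folds, 8 test folds) gives 36 paths, and 6 folds with 2 test folds gives 5.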

PortfolioOptimisers.average_train_size Function
julia
average_train_size(T, n_folds, n_test_folds)

Compute the average training set size for combinatorial cross-validation.

Arguments

  • T: Total number of observations.

  • n_folds: Total number of folds.

  • n_test_folds: Number of test folds per combination.

Returns

  • Average number of observations in each training set.
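
As a rough sketch, if the T observations are spread evenly over the folds and purging and embargoing are ignored, the training set of each combination holds the observations of the n_folds - n_test_folds remaining folds. The function mean_train_size is hypothetical, and the formula is an assumption rather than the package's documented definition.

```julia
# Hypothetical helper: average training size, assuming equally sized folds
# and ignoring any observations dropped by purging or embargoing.
mean_train_size(T, n_folds, n_test_folds) = T / n_folds * (n_folds - n_test_folds)

avg = mean_train_size(1000, 10, 8)   # 100 observations per fold, 2 training folds
```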

PortfolioOptimisers.recombined_paths Function
julia
recombined_paths(ccv)

Generate the recombined test paths for combinatorial cross-validation.

Returns a vector of vectors representing the recombined test paths, that is, sequences of test fold indices that together cover the entire dataset.

Arguments

  • ccv: Combinatorial cross-validation estimator.

Returns

  • Vector of recombined path index vectors.
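
One way to picture recombination is the small case of 3 folds with 2 test folds. Every fold is tested in two of the three combinations, so two full paths can be assembled by picking, for each fold, one combination that tests it. The sketch below represents a path as the combination index used for each fold; this representation and all variable names are assumptions and may differ from what recombined_paths returns.

```julia
# All 2-fold test sets out of 3 folds, written out by hand for the example.
test_combos = [[1, 2], [1, 3], [2, 3]]
n_folds = 3

# For each fold, the indices of the combinations in which that fold is tested.
splits_per_fold = [[s for (s, c) in enumerate(test_combos) if f in c] for f in 1:n_folds]

# Each fold is tested equally often; that count is the number of paths.
n_paths = length(first(splits_per_fold))

# Path p uses, for every fold, the p-th combination that tests that fold.
paths = [[splits_per_fold[f][p] for f in 1:n_folds] for p in 1:n_paths]
# paths == [[1, 1, 2], [2, 3, 3]]
```

Each path touches every fold exactly once, which is why a single path covers the entire dataset.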

PortfolioOptimisers.optimal_number_folds Function
julia
optimal_number_folds(T::Integer, target_train_size::Integer,
                     target_n_test_paths::Integer; train_size_w::Number = 1,
                     n_test_paths_w::Number = 1, maxval::Number = 1e5) -> Tuple{Int, Int}

Find the optimal (n_folds, n_test_folds) pair for combinatorial cross-validation by minimising a weighted cost that balances the average training size against the number of test paths.

Mathematical definition

The cost function for a candidate (n_folds, n_test_folds) pair is:

$$\mathrm{cost} = w_{\mathrm{ntp}} \frac{\lvert P(n,k) - P \rvert}{P} + w_{\mathrm{tr}} \frac{\lvert \bar{T}(n,k) - T \rvert}{T}.$$

Where:

  • $\mathrm{cost}$: Weighted cost for the candidate fold configuration.

  • $w_{\mathrm{ntp}}$: Weight on the test-paths component.

  • $w_{\mathrm{tr}}$: Weight on the training-size component.

  • $P(n,k)$: Number of test paths for $n$ folds and $k$ test folds.

  • $\bar{T}(n,k)$: Average training size for $n$ folds and $k$ test folds.

  • $P$: Target number of test paths (target_n_test_paths).

  • $T$: Target training size (target_train_size).

Arguments

  • T: Total number of observations in the dataset.

  • target_train_size: Desired average number of observations in each training set.

  • target_n_test_paths: Desired number of recombined test paths.

  • train_size_w: Weight applied to the training-size component of the cost (default 1).

  • n_test_paths_w: Weight applied to the test-paths component of the cost (default 1).

  • maxval: Early-exit threshold: once a fold configuration's cost exceeds maxval, subsequent higher n_test_folds values are skipped (default 1e5).

Returns

  • Tuple{Int, Int}: The optimal (n_folds, n_test_folds) pair minimising the weighted cost. Returns (0, 0) when no valid configuration is found.
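
The search can be sketched as a plain grid search over candidate pairs using the weighted cost defined above. Everything in this block is an assumption made for illustration: the function name sketch_optimal_folds, the max_folds bound on the search range, and the formulas used for the number of paths and the average training size. It also omits the maxval pruning, so it is not the package's implementation.

```julia
# Hypothetical grid search, not the package's implementation.
function sketch_optimal_folds(T, target_train_size, target_n_test_paths;
                              train_size_w = 1, n_test_paths_w = 1,
                              max_folds = 20)   # hypothetical search bound
    best, best_cost = (0, 0), Inf
    for n in 3:max_folds, k in 2:(n - 1)
        n_paths = binomial(n, k) * k ÷ n   # assumed path-count formula
        avg_train = T / n * (n - k)        # assumed average training size
        # Weighted relative deviation from both targets.
        cost = n_test_paths_w * abs(n_paths - target_n_test_paths) / target_n_test_paths +
               train_size_w * abs(avg_train - target_train_size) / target_train_size
        if cost < best_cost
            best, best_cost = (n, k), cost
        end
    end
    return best
end

best = sketch_optimal_folds(1000, 200, 36)
# best == (10, 8): that pair matches both targets exactly, so its cost is zero
```

Raising one weight relative to the other shifts the result toward configurations that match that target more closely at the expense of the other.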
