remove_correlated() removes one column from each pair of columns in a sparse or dense matrix whose absolute sample correlation is greater than a user-defined threshold.

Usage

remove_correlated(x, threshold)

# S3 method for CsparseMatrix
remove_correlated(x, threshold = 0.99)

# S3 method for matrix
remove_correlated(x, threshold = 0.99)

Arguments

x

A matrix or CsparseMatrix.

threshold

A double between 0 and 1 specifying the absolute correlation threshold above which columns are removed.

Value

x with one column removed from each highly correlated column pair.

Details

remove_correlated() is an S3 generic with methods for:

  • CsparseMatrix

  • matrix
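
To make the idea of threshold-based removal concrete, here is a minimal base-R sketch for a dense matrix. It is not the package's implementation: the helper name drop_correlated_sketch() is hypothetical, and the rule of keeping the earlier column of each pair is an assumption chosen to match the example output shown under Examples.

```r
# Hypothetical helper, not part of the package: a base-R sketch of
# threshold-based removal of correlated columns from a dense matrix.
drop_correlated_sketch <- function(x, threshold = 0.99) {
  r <- abs(stats::cor(x))             # absolute sample correlations between columns
  r[is.na(r)] <- 0                    # constant columns yield NA; treat as uncorrelated
  r[lower.tri(r, diag = TRUE)] <- 0   # keep each pair once, as (i, j) with i < j
  # Assumption: drop column j when it exceeds the threshold with any earlier column
  drop <- apply(r > threshold, 2, any)
  x[, !drop, drop = FALSE]
}

# Small deterministic matrix with three mutually uncorrelated columns
m <- cbind(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(1, -1, 1, -1, 1, -1),
  x3 = c(0, 0, 1, 1, 0, 0)
)
# x4 duplicates x1; x5 is a negated, rescaled x2 (correlation of -1)
m <- cbind(m, x4 = m[, "x1"], x5 = -2 * m[, "x2"])

kept <- drop_correlated_sketch(m, threshold = 0.99)
colnames(kept)
```

Because the comparison uses the absolute correlation, the negatively correlated column x5 is dropped along with the exact duplicate x4. The sketch computes a dense correlation matrix, so it does not preserve sparsity the way a dedicated CsparseMatrix method could.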

Examples

# Create a random sparse matrix (about half of the entries are non-zero)
x <- Matrix::rsparsematrix(10, 5, 0.5)
# Append two copies each of columns 4 and 5 to create perfectly correlated columns
x <- cbind(x, x[, 4:5], x[, 4:5])
colnames(x) <- paste0("x", 1:9)
# Print x
x
#> 10 x 9 sparse Matrix of class "dgCMatrix"
#>           x1    x2    x3     x4    x5     x6    x7     x8    x9
#>  [1,] -1.300 -0.80 -0.18  0.880 -0.85  0.880 -0.85  0.880 -0.85
#>  [2,]  .      0.12  .     .      .     .      .     .      .   
#>  [3,]  .      .     .    -0.530 -0.22 -0.530 -0.22 -0.530 -0.22
#>  [4,] -0.980 -0.31  0.10 -0.013  0.57 -0.013  0.57 -0.013  0.57
#>  [5,]  0.091 -0.16  1.60  0.490  .     0.490  .     0.490  .   
#>  [6,]  0.300  .     .    -0.850 -0.11 -0.850 -0.11 -0.850 -0.11
#>  [7,]  .      .     .     .      .     .      .     .      .   
#>  [8,] -0.380  .     .     .      .     .      .     .      .   
#>  [9,]  1.400  .     .     .      .     .      .     .      .   
#> [10,]  .      .    -0.85  0.430  1.70  0.430  1.70  0.430  1.70

# Same matrix in dense format
xdense <- as.matrix(x)

# Drop highly correlated columns
remove_correlated(x, threshold = 0.99)
#> 10 x 5 sparse Matrix of class "dgCMatrix"
#>           x1    x2    x3     x4    x5
#>  [1,] -1.300 -0.80 -0.18  0.880 -0.85
#>  [2,]  .      0.12  .     .      .   
#>  [3,]  .      .     .    -0.530 -0.22
#>  [4,] -0.980 -0.31  0.10 -0.013  0.57
#>  [5,]  0.091 -0.16  1.60  0.490  .   
#>  [6,]  0.300  .     .    -0.850 -0.11
#>  [7,]  .      .     .     .      .   
#>  [8,] -0.380  .     .     .      .   
#>  [9,]  1.400  .     .     .      .   
#> [10,]  .      .    -0.85  0.430  1.70
remove_correlated(xdense, threshold = 0.99)
#>           x1    x2    x3     x4    x5
#>  [1,] -1.300 -0.80 -0.18  0.880 -0.85
#>  [2,]  0.000  0.12  0.00  0.000  0.00
#>  [3,]  0.000  0.00  0.00 -0.530 -0.22
#>  [4,] -0.980 -0.31  0.10 -0.013  0.57
#>  [5,]  0.091 -0.16  1.60  0.490  0.00
#>  [6,]  0.300  0.00  0.00 -0.850 -0.11
#>  [7,]  0.000  0.00  0.00  0.000  0.00
#>  [8,] -0.380  0.00  0.00  0.000  0.00
#>  [9,]  1.400  0.00  0.00  0.000  0.00
#> [10,]  0.000  0.00 -0.85  0.430  1.70