Remove highly correlated columns
remove_correlated() removes one column from each pair of columns in a sparse or
dense matrix whose absolute sample correlation is greater than a user-defined
threshold.
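The filtering rule described above can be sketched in base R. This is a hypothetical helper, drop_correlated_sketch(), written for illustration only. It assumes a dense numeric matrix and is not necessarily how remove_correlated() is implemented. It keeps the earlier column of each offending pair, which is consistent with the Examples output, where the duplicated columns x6 to x9 are dropped.

```r
# Hypothetical helper: a base-R sketch of correlation-based column filtering.
# Assumes a dense numeric matrix; remove_correlated() itself may differ.
drop_correlated_sketch <- function(x, threshold = 0.99) {
  stopifnot(is.matrix(x), is.numeric(threshold),
            threshold >= 0, threshold <= 1)
  # Absolute pairwise sample correlations between columns
  cors <- abs(suppressWarnings(stats::cor(x)))
  # Constant columns give NA correlations; treat them as uncorrelated
  cors[is.na(cors)] <- 0
  # Zero the diagonal and lower triangle so each pair is counted once;
  # column j then holds its correlations with the earlier columns 1..(j - 1)
  cors[lower.tri(cors, diag = TRUE)] <- 0
  # Flag a column if it is too correlated with any earlier column
  to_drop <- apply(cors, 2, function(col) any(col > threshold))
  x[, !to_drop, drop = FALSE]
}
```

Note that this simple rule flags a column even if the earlier column it correlates with is itself flagged; a greedy, column-by-column pass would avoid that at the cost of a loop.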
Usage
remove_correlated(x, threshold)
# S3 method for CsparseMatrix
remove_correlated(x, threshold = 0.99)
# S3 method for matrix
remove_correlated(x, threshold = 0.99)
Arguments
- x
A matrix or CsparseMatrix.
- threshold
A double between 0 and 1 giving the absolute correlation above which one column of a pair is removed. Defaults to 0.99.
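Because threshold is compared with the absolute correlation, a sign-flipped copy of a column is as redundant as an exact copy. The base R check below is illustrative only (it does not call remove_correlated()); it shows that such a pair has a sample correlation of -1, whose absolute value exceeds any threshold below 1.

```r
# Two columns that are exact negatives of each other
x <- cbind(a = c(1, 3, 2, 5, 4), b = -c(1, 3, 2, 5, 4))
r <- stats::cor(x)["a", "b"]
# The raw correlation is -1 (up to floating-point rounding) ...
print(r)
# ... but its absolute value exceeds the default threshold of 0.99
print(abs(r) > 0.99)
```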
Examples
# Create a 10 x 5 sparse matrix with about half of its entries nonzero
x <- Matrix::rsparsematrix(10, 5, 0.5)
# Append two copies of columns 4 and 5, creating perfectly correlated columns
x <- cbind(x, x[, 4:5], x[, 4:5])
# Name the columns x1 to x9
colnames(x) <- paste0("x", 1:9)
# Print x
x
#> 10 x 9 sparse Matrix of class "dgCMatrix"
#> x1 x2 x3 x4 x5 x6 x7 x8 x9
#> [1,] -1.300 -0.80 -0.18 0.880 -0.85 0.880 -0.85 0.880 -0.85
#> [2,] . 0.12 . . . . . . .
#> [3,] . . . -0.530 -0.22 -0.530 -0.22 -0.530 -0.22
#> [4,] -0.980 -0.31 0.10 -0.013 0.57 -0.013 0.57 -0.013 0.57
#> [5,] 0.091 -0.16 1.60 0.490 . 0.490 . 0.490 .
#> [6,] 0.300 . . -0.850 -0.11 -0.850 -0.11 -0.850 -0.11
#> [7,] . . . . . . . . .
#> [8,] -0.380 . . . . . . . .
#> [9,] 1.400 . . . . . . . .
#> [10,] . . -0.85 0.430 1.70 0.430 1.70 0.430 1.70
# Same matrix in dense format
xdense <- as.matrix(x)
# Drop highly correlated columns
remove_correlated(x, threshold = 0.99)
#> 10 x 5 sparse Matrix of class "dgCMatrix"
#> x1 x2 x3 x4 x5
#> [1,] -1.300 -0.80 -0.18 0.880 -0.85
#> [2,] . 0.12 . . .
#> [3,] . . . -0.530 -0.22
#> [4,] -0.980 -0.31 0.10 -0.013 0.57
#> [5,] 0.091 -0.16 1.60 0.490 .
#> [6,] 0.300 . . -0.850 -0.11
#> [7,] . . . . .
#> [8,] -0.380 . . . .
#> [9,] 1.400 . . . .
#> [10,] . . -0.85 0.430 1.70
remove_correlated(xdense, threshold = 0.99)
#> x1 x2 x3 x4 x5
#> [1,] -1.300 -0.80 -0.18 0.880 -0.85
#> [2,] 0.000 0.12 0.00 0.000 0.00
#> [3,] 0.000 0.00 0.00 -0.530 -0.22
#> [4,] -0.980 -0.31 0.10 -0.013 0.57
#> [5,] 0.091 -0.16 1.60 0.490 0.00
#> [6,] 0.300 0.00 0.00 -0.850 -0.11
#> [7,] 0.000 0.00 0.00 0.000 0.00
#> [8,] -0.380 0.00 0.00 0.000 0.00
#> [9,] 1.400 0.00 0.00 0.000 0.00
#> [10,] 0.000 0.00 -0.85 0.430 1.70