This function implements model-consistent Lasso estimation through the bootstrap. It supports parallel processing by way of the future package, allowing the user to flexibly specify many parallelization methods. This method was developed as a variable-selection algorithm, but this package also supports making ensemble predictions on new data using the bagged Lasso models.
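As a minimal sketch of the parallel workflow described above, a future plan can be set before calling bolasso(). The choice of the multisession backend and of two workers is an assumption made for illustration; any plan supported by the future package should be usable in the same way.

```r
library(bolasso)

# Assumption: two local background R sessions; adjust `workers` to your machine.
future::plan(future::multisession, workers = 2)

# Bootstrap replicates are then fitted under the active future plan.
fit <- bolasso(
  formula = mpg ~ .,
  data = mtcars,
  n.boot = 10,
  nfolds = 5,        # passed through `...` to glmnet::cv.glmnet
  progress = FALSE
)

# Return to sequential evaluation when done.
future::plan(future::sequential)
```

Resetting the plan at the end keeps later code in the same session from unintentionally running in background workers.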

Usage

bolasso(
  formula,
  data,
  n.boot = 100,
  progress = TRUE,
  implement = "glmnet",
  x = NULL,
  y = NULL,
  ...
)

Arguments

formula

An optional object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. Can be omitted when x and y are non-missing.

data

An optional object of class data.frame that contains the modeling variables referenced in formula. Can be omitted when x and y are non-missing.

n.boot

An integer specifying the number of bootstrap replicates.

progress

A logical indicating whether to display progress across bootstrap replicates.

implement

A character string, either 'glmnet' or 'gamlr', specifying which Lasso implementation to use. For specific modeling details, see glmnet::cv.glmnet or gamlr::cv.gamlr.

x

An optional predictor matrix in lieu of formula and data.

y

An optional response vector in lieu of formula and data.

...

Additional parameters to pass to either glmnet::cv.glmnet or gamlr::cv.gamlr.

Value

An object of class bolasso. This object is a list of length n.boot of cv.glmnet or cv.gamlr objects.
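To make concrete what a list of cross-validated Lasso fits over bootstrap replicates represents, the following is a hand-rolled sketch that uses glmnet directly. It is not the package's internal implementation: the simulated data, the names n_boot, fits, sel_freq and stable_vars, and the 0.9 threshold are all assumptions chosen for illustration.

```r
library(glmnet)

set.seed(1)

# Simulated data (assumption): only x1 and x2 truly affect y.
n <- 100
p <- 8
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", seq_len(p))))
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)

# Fit one cross-validated Lasso per bootstrap resample of the rows.
n_boot <- 20
fits <- lapply(seq_len(n_boot), function(b) {
  idx <- sample(n, replace = TRUE)
  cv.glmnet(x[idx, ], y[idx], nfolds = 5)
})

# For each fit, flag which coefficients (excluding the intercept) are non-zero.
nonzero <- sapply(fits, function(fit) {
  beta <- as.matrix(coef(fit, s = "lambda.min"))[-1, 1]
  beta != 0
})

# Share of bootstrap fits in which each variable was selected.
sel_freq <- rowMeans(nonzero)

# Keep variables selected in at least 90% of the replicates.
stable_vars <- names(sel_freq)[sel_freq >= 0.9]
print(sel_freq)
```

Variables that survive the threshold are those the Lasso picks consistently across resamples, which is the intuition behind using the bootstrap for variable selection.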

References

Bach FR (2008). “Bolasso: model consistent Lasso estimation through the bootstrap.” CoRR, abs/0804.1302. https://arxiv.org/abs/0804.1302.

See also

glmnet::cv.glmnet and gamlr::cv.gamlr for full details on the respective implementations and arguments that can be passed to ....

Examples

mtcars[, c(2, 10:11)] <- lapply(mtcars[, c(2, 10:11)], as.factor)
idx <- sample(nrow(mtcars), 22)
mtcars_train <- mtcars[idx, ]
mtcars_test <- mtcars[-idx, ]

## Formula Interface

# Train model
set.seed(123)
bolasso_form <- bolasso(
  formula = mpg ~ .,
  data = mtcars_train,
  n.boot = 20,
  nfolds = 5,
  implement = "glmnet"
)
#> Loaded glmnet 4.1-8

# Extract selected variables
selected_vars(bolasso_form, threshold = 0.9, select = "lambda.min")
#> # A tibble: 1 × 2
#>   variable  mean_coef
#>   <chr>         <dbl>
#> 1 Intercept      23.6

# Bagged ensemble prediction on test data
predict(bolasso_form,
        new.data = mtcars_test,
        select = "lambda.min")
#>                     boot1    boot2    boot3     boot4    boot5    boot6
#> Mazda RX4        23.53203 21.31155 20.38801 22.290720 22.27351 21.94408
#> Datsun 710       26.39838 25.65987 30.73719 27.331865 23.36205 24.69222
#> Duster 360       15.27134 15.34670 13.64426  8.423965 14.22472 15.46800
#> Merc 240D        23.13465 22.65999 27.44668 21.980070 23.39864 21.78321
#> Merc 280         18.72157 18.82589 18.99172 17.249912 21.24348 18.30049
#> Honda Civic      29.66803 28.55722 31.92853 28.131520 27.31862 27.60294
#> Toyota Corolla   28.63992 27.65163 31.55401 30.084933 25.56220 26.69050
#> Camaro Z28       14.61451 14.34784 13.39836  8.986365 14.97222 14.66826
#> Pontiac Firebird 17.25657 14.82313 16.19174 12.794893 16.11565 16.05771
#> Maserati Bora    14.18150 14.71184 17.21216  8.396929 11.88185 14.61647
#>                     boot7    boot8    boot9   boot10   boot11   boot12   boot13
#> Mazda RX4        21.42965 18.82831 23.22819 20.14767 15.38615 21.52801 20.42563
#> Datsun 710       31.17257 25.62445 25.92730 24.50229 26.38631 26.26061 29.43843
#> Duster 360       13.31440 15.80368 15.89598 17.17782 15.19136 12.60124 10.22709
#> Merc 240D        23.64343 21.01089 23.67867 21.78253 24.44045 21.91810 26.32699
#> Merc 280         18.29764 17.98024 19.86190 17.58422 18.97526 17.93253 17.81826
#> Honda Civic      35.80988 25.13158 29.70004 26.70624 29.92394 30.08307 32.93965
#> Toyota Corolla   33.05723 27.56742 28.51666 26.01848 26.83720 27.57016 30.29603
#> Camaro Z28       17.50096 15.51962 14.91479 16.33376 16.78562 14.44168  9.86533
#> Pontiac Firebird 17.97528 15.62626 16.96378 16.31812 16.54070 16.04707 15.59535
#> Maserati Bora    20.20025 17.60688 13.24071 17.30561 19.69561 21.45137 17.86446
#>                    boot14   boot15   boot16   boot17   boot18   boot19   boot20
#> Mazda RX4        20.96366 23.00697 19.63938 20.72184 22.53705 21.92048 21.68905
#> Datsun 710       29.04419 26.01095 21.00265 25.77933 25.60174 25.00468 22.52004
#> Duster 360       12.14012 14.26750 16.01328 14.81375 15.41257 14.91257 15.08037
#> Merc 240D        22.16729 23.64211 19.49836 23.43581 22.98283 22.26822 21.15043
#> Merc 280         18.50003 19.07564 18.52614 19.48435 18.95140 18.24974 20.15770
#> Honda Civic      31.84528 27.46691 22.16310 27.58397 28.90394 27.96428 24.74005
#> Toyota Corolla   30.00382 27.00526 21.95821 26.47687 27.86820 27.03570 23.91624
#> Camaro Z28       13.48879 14.26750 15.80876 14.58134 14.54991 14.16212 14.90464
#> Pontiac Firebird 16.06674 16.75329 15.26705 15.22047 16.91750 15.85567 15.11960
#> Maserati Bora    24.75042 14.54118 16.73050 17.24715 14.33267 13.79177 14.54958

## Alternative Matrix Interface

# Train model
set.seed(123)
bolasso_mat <- bolasso(
  x = model.matrix(mpg ~ . - 1, mtcars_train),
  y = mtcars_train[, 1],
  data = mtcars_train,
  n.boot = 20,
  nfolds = 5,
  implement = "glmnet"
)

# Extract selected variables
selected_vars(bolasso_mat, threshold = 0.9, select = "lambda.min")
#> # A tibble: 1 × 2
#>   variable  mean_coef
#>   <chr>         <dbl>
#> 1 Intercept      23.6

# Bagged ensemble prediction on test data
predict(bolasso_mat,
        new.data = model.matrix(mpg ~ . - 1, mtcars_test),
        select = "lambda.min")
#>                     boot1    boot2    boot3     boot4    boot5    boot6
#> Mazda RX4        23.53203 21.31155 20.38801 22.290720 22.27351 21.94408
#> Datsun 710       26.39838 25.65987 30.73719 27.331865 23.36205 24.69222
#> Duster 360       15.27134 15.34670 13.64426  8.423965 14.22472 15.46800
#> Merc 240D        23.13465 22.65999 27.44668 21.980070 23.39864 21.78321
#> Merc 280         18.72157 18.82589 18.99172 17.249912 21.24348 18.30049
#> Honda Civic      29.66803 28.55722 31.92853 28.131520 27.31862 27.60294
#> Toyota Corolla   28.63992 27.65163 31.55401 30.084933 25.56220 26.69050
#> Camaro Z28       14.61451 14.34784 13.39836  8.986365 14.97222 14.66826
#> Pontiac Firebird 17.25657 14.82313 16.19174 12.794893 16.11565 16.05771
#> Maserati Bora    14.18150 14.71184 17.21216  8.396929 11.88185 14.61647
#>                     boot7    boot8    boot9   boot10   boot11   boot12   boot13
#> Mazda RX4        21.42965 18.82831 23.22819 20.14767 15.38615 21.52801 20.42563
#> Datsun 710       31.17257 25.62445 25.92730 24.50229 26.38631 26.26061 29.43843
#> Duster 360       13.31440 15.80368 15.89598 17.17782 15.19136 12.60124 10.22709
#> Merc 240D        23.64343 21.01089 23.67867 21.78253 24.44045 21.91810 26.32699
#> Merc 280         18.29764 17.98024 19.86190 17.58422 18.97526 17.93253 17.81826
#> Honda Civic      35.80988 25.13158 29.70004 26.70624 29.92394 30.08307 32.93965
#> Toyota Corolla   33.05723 27.56742 28.51666 26.01848 26.83720 27.57016 30.29603
#> Camaro Z28       17.50096 15.51962 14.91479 16.33376 16.78562 14.44168  9.86533
#> Pontiac Firebird 17.97528 15.62626 16.96378 16.31812 16.54070 16.04707 15.59535
#> Maserati Bora    20.20025 17.60688 13.24071 17.30561 19.69561 21.45137 17.86446
#>                    boot14   boot15   boot16   boot17   boot18   boot19   boot20
#> Mazda RX4        20.96366 23.00697 19.63938 20.72184 22.53705 21.92048 21.68905
#> Datsun 710       29.04419 26.01095 21.00265 25.77933 25.60174 25.00468 22.52004
#> Duster 360       12.14012 14.26750 16.01328 14.81375 15.41257 14.91257 15.08037
#> Merc 240D        22.16729 23.64211 19.49836 23.43581 22.98283 22.26822 21.15043
#> Merc 280         18.50003 19.07564 18.52614 19.48435 18.95140 18.24974 20.15770
#> Honda Civic      31.84528 27.46691 22.16310 27.58397 28.90394 27.96428 24.74005
#> Toyota Corolla   30.00382 27.00526 21.95821 26.47687 27.86820 27.03570 23.91624
#> Camaro Z28       13.48879 14.26750 15.80876 14.58134 14.54991 14.16212 14.90464
#> Pontiac Firebird 16.06674 16.75329 15.26705 15.22047 16.91750 15.85567 15.11960
#> Maserati Bora    24.75042 14.54118 16.73050 17.24715 14.33267 13.79177 14.54958