--- title: "Getting Started" output: rmarkdown::html_vignette bibliography: ../inst/references.bib vignette: > %\VignetteIndexEntry{Getting Started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} set.seed(42) library("mlr3") library("mlr3spatiotempcv") knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction ```{r, include=FALSE} library(mlr3spatiotempcv) ``` This package adds resampling methods for the {mlr3} package framework suited for spatial, temporal and spatiotemporal data. These methods can help to reduce the influence of autocorrelation on performance estimates when performing cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the [mlr3book section on "Spatiotemporal Analysis"](https://mlr3book.mlr-org.com/chapters/chapter13/beyond_regression_and_classification.html#sec-spatiotemporall). After loading the package via `library("mlr3spatiotempcv")`, the spatiotemporal resampling methods and example tasks provided by {mlr3spatiotempcv} are available to the user **alongside the default {mlr3} resampling methods and tasks**. ## Creating a spatial Task To make use of spatial resampling methods, a {mlr3} task that is aware of its spatial characteristic needs to be created. Two `Task` child classes exist in {mlr3spatiotempcv} for this purpose: - `TaskClassifST` - `TaskRegrST` To create one of these, you have multiple options: 1. Use the constructor of the `Task` directly via `$new()` - this only works for data.table backends (!) 1. Use the `as_task_*` converters (e.g. if your data is stored in an `sf` object) We recommend the latter, as the `as_task_*` converters aim to make task construction easier, e.g., by creating the `DataBackend` (which is required to create a Task in {mlr3}) automatically and setting the `crs` and `coordinate_names` fields. Let's assume your (point) data is stored in with an `sf` object, which is a common scenario for spatial analysis in R. ```{r 08-special-spatiotemp-002} # create 'sf' object data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"), crs = 32717) # create `TaskClassifST` from `sf` object task = as_task_classif_st(data_sf, id = "ecuador_task", target = "slides", positive = "TRUE") ``` You can also use a plain `data.frame`. In this case, `crs` and `coordinate_names` need to be passed along explicitly as they cannot be inferred directly from the `sf` object: ```{r 08-special-spatiotemp-003} task = as_task_classif_st(ecuador, id = "ecuador_task", target = "slides", positive = "TRUE", coordinate_names = c("x", "y"), crs = 32717) ``` The `*ST` task family prints a subset of the coordinates by default: ```{r 08-special-spatiotemp-004} print(task) ``` All `*ST` tasks can be treated as their super class equivalents `TaskClassif` or `TaskRegr` in subsequent {mlr3} modeling steps. # Contributed reflections by {mlr3spatiotempcv} In {mlr3}, dictionaries are used for overview purposes of available methods. The following sections show which dictionaries get appended with new entries when loading {mlr3spatiotempcv}. # Task Type - `TaskClassifST` - `TaskRegrST` ```{r} mlr_reflections$task_types ``` # Task Column Roles - `coordinate` - `space` - `time` ```{r} mlr_reflections$task_col_roles ``` # Resampling Methods * `mlr_resampling_spcv_block` * `mlr_resampling_spcv_buffer` * `mlr_resampling_spcv_coords` * `mlr_resampling_spcv_knndm` * `mlr_resampling_spcv_disc` * `mlr_resampling_spcv_tiles` * `mlr_resampling_spcv_env` * `mlr_resampling_sptcv_cstf` and their respective repeated versions. See `as.data.table(mlr_resamplings)` for the full dictionary. # Examples Tasks - `tsk("ecuador")` (spatial, classif) - `tsk("cookfarm_mlr3")` (spatiotemp, regr) # Upstream Packages and Scientific References The following table lists all spatiotemporal methods implemented in {mlr3spatiotempcv} (or {mlr3}), their upstream R package and scientific references. All methods besides `"spcv_buffer"` also have a corresponding "repeated" method. +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Category | (Package) Method Name | Reference | mlr3 Notation | +================================+===============================================+===================+==============================================================================+ | **Buffering**, spatial | (blockCV) Spatial Buffering | @valavi2018 | `mlr_resamplings_spcv_buffer` | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | **Buffering**, spatial | (sperrorest) Spatial Disc | @brenning2012 | `mlr_resamplings_spcv_disc` | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | **Blocking**, spatial | (blockCV) Spatial Blocking | @valavi2018 | `mlr_resamplings_spcv_block` | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | **Blocking**, spatial | (sperrorest) Spatial Tiles | @valavi2018 | `mlr_resamplings_spcv_tiles` | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | **Clustering**, spatial | (sperrorest) Spatial CV | @brenning2012 | `mlr_resamplings_spcv_coords` | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | **Clustering**, spatial | (CAST) KNNDM | @linnenbrink2023 | `mlr_resamplings_spcv_knndm` | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | **Clustering**, feature-space | (blockCV) Environmental Blocking | @valavi2018 | `mlr_resamplings_spcv_env` | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | ------------------------------ | --------------------------------------------- | ----------------- | ---------------------------------------------------------------------------- | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | **Grouping**, predefined inds | (mlr3) Predefined partitions | | `mlr_resamplings_custom_cv` | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | **Grouping**, spatiotemporal | (mlr3) via `col_roles` `"group"` | | `mlr_resamplings_cv`, `Task$set_col_roles(, "group")` | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | **Grouping**, spatiotemporal | (CAST) Leave-Location-and-Time-Out | @meyer2018 | `mlr_resamplings_sptcv_cstf`, `Task$set_col_roles(, "space|time")` | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ # References