Title: | Connector Between 'mlr3' and 'OpenML' |
---|---|
Description: | Provides an interface to 'OpenML.org' to list and download machine learning data, tasks and experiments. The 'OpenML' objects can be automatically converted to 'mlr3' objects. For a more sophisticated interface with more upload options, see the 'OpenML' package. |
Authors: | Michel Lang [aut] , Sebastian Fischer [cre, aut] |
Maintainer: | Sebastian Fischer <[email protected]> |
License: | LGPL-3 |
Version: | 0.10.0 |
Built: | 2024-10-28 05:06:00 UTC |
Source: | https://github.com/mlr-org/mlr3oml |
Start by reading the Large-Scale Benchmarking chapter from the mlr3book.
This package adds the mlr3::Task "oml" and the mlr3::Resampling "oml" to mlr3::mlr_tasks and mlr3::mlr_resamplings, respectively.
For the former you may pass either a data_id or a task_id; the latter requires a task_id.
Furthermore, OpenML objects can be converted to mlr3 objects using the usual S3 generics such as mlr3::as_task, mlr3::as_learner, mlr3::as_resampling, mlr3::as_resample_result, mlr3::as_benchmark_result or mlr3::as_data_backend. This allows for a frictionless integration of OpenML and mlr3.
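As a minimal sketch of the two routes described above (the "oml" dictionary keys and the S3 converters), the following wraps each route in a small helper. The helper names `oml_task_via_dictionary()` and `oml_task_via_converter()` are hypothetical and exist only for this illustration; calling them is assumed to require the mlr3 and mlr3oml packages as well as network access to OpenML.

```r
# Hypothetical helpers for illustration; they are not part of mlr3oml.

# Route 1: use the "oml" keys in mlr3::mlr_tasks and mlr3::mlr_resamplings.
oml_task_via_dictionary <- function(task_id) {
  # Load the mlr3oml namespace so that the "oml" keys are available.
  loadNamespace("mlr3oml")
  list(
    task = mlr3::tsk("oml", task_id = task_id),
    resampling = mlr3::rsmp("oml", task_id = task_id)
  )
}

# Route 2: construct the OpenML object first, then convert it
# with the usual S3 generics.
oml_task_via_converter <- function(task_id) {
  otask <- mlr3oml::otsk(task_id)
  list(
    task = mlr3::as_task(otask),
    resampling = mlr3::as_resampling(otask)
  )
}

# Usage (requires network access); pass the id of an OpenML task:
# res <- oml_task_via_dictionary(task_id = 59)
# res$task
```

Both routes are expected to yield equivalent mlr3 objects; the second one additionally gives access to the intermediate OpenML object and its meta information.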
mlr3oml.cache: Enables or disables caching globally. If set to FALSE, caching is disabled. If set to TRUE, the cache directory as reported by R_user_dir() is used. Alternatively, you can specify a path on the local file system here. Default is FALSE.
mlr3oml.api_key: API key to use. All operations supported by this package work without an API key, but you might get rate limited without one. If not set, defaults to the value of the environment variable OPENMLAPIKEY.
mlr3oml.arff_parser: ARFF parser to use. Defaults to the internal one, which relies on data.table::fread(). Can also be set to "RWeka" to use the parser in RWeka.
mlr3oml.parquet: Enables or disables parquet as the default file format. If set to TRUE, the parquet version of datasets will be used by default. If set to FALSE, the arff version of datasets will be used by default. Note that the OpenML server is still transitioning from arff to parquet and some features will work better with arff. Default is FALSE.
mlr3oml.retries: An integer defining the number of retries when downloading data from OpenML. If it is NULL, the number of retries is set to 3.
Relevant for developers
mlr3oml.test_server: The default value for whether to use the OpenML test server. Default is FALSE.
mlr3oml.test_api_key: API key to use for the test server. If not set, defaults to the value of the environment variable TESTOPENMLAPIKEY.
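The options above are ordinary R options, so they can be set with `options()` and read back with `getOption()`. The following sketch sets a few of them for the current session and restores the previous values afterwards; the concrete values are chosen for illustration only.

```r
# Set package options for the current session and remember the old values.
old_opts <- options(
  mlr3oml.cache   = TRUE,   # cache in the directory reported by R_user_dir()
  mlr3oml.parquet = FALSE,  # use the arff version of datasets by default
  mlr3oml.retries = 5L      # number of retries when downloading data
)

# Inspect what is currently configured.
current <- list(
  cache   = getOption("mlr3oml.cache"),
  parquet = getOption("mlr3oml.parquet"),
  retries = getOption("mlr3oml.retries")
)
print(current)

# Restore the previous settings when done.
options(old_opts)
```

To make such settings permanent, the same `options()` call could be placed in a startup file such as `.Rprofile`.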
The lgr package is used for logging.
To change the threshold, use lgr::get_logger("mlr3oml")$set_threshold().
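A minimal sketch of changing the threshold, assuming the lgr package is installed. The level name "warn" is one of lgr's standard log levels; the guard makes the snippet a no-op if lgr is missing.

```r
# Raise the logging threshold so that only warnings and errors are shown.
if (requireNamespace("lgr", quietly = TRUE)) {
  logger <- lgr::get_logger("mlr3oml")
  logger$set_threshold("warn")
}
```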
Maintainer: Sebastian Fischer [email protected] (ORCID)
Authors:
Michel Lang [email protected] (ORCID)
Useful links:
Report bugs at https://github.com/mlr-org/mlr3oml/issues
By default this function creates a Pseudo-Learner (that cannot be used for training or prediction) for the given task type. This enables the conversion of OpenML Runs to mlr3::ResampleResults. This is well defined because each subcomponent (i.e. id) can only appear once in a Flow according to the OpenML docs.
## S3 method for class 'OMLFlow' as_learner(x, task_type = NULL, ...)
x |
(OMLFlow) The OMLFlow that is converted to a mlr3::Learner. |
task_type |
( |
... |
Additional arguments. |
These functions query data sets, tasks, flows, setups, runs, and evaluation measures from https://www.openml.org/search?type=data&sort=runs&status=active using some simple filter criteria.
To find datasets for a specific task type, use list_oml_tasks(), which supports filtering by task type.
Another heuristic to search for possible regression tasks is to search for data sets with zero classes, i.e. by specifying number_classes = 0.
list_oml_data( data_id = NULL, data_name = NULL, number_instances = NULL, number_features = NULL, number_classes = NULL, number_missing_values = NULL, tag = NULL, limit = limit_default(), test_server = test_server_default(), ... ) list_oml_evaluations( run_id = NULL, task_id = NULL, measures = NULL, tag = NULL, limit = limit_default(), test_server = test_server_default(), ... ) list_oml_flows( uploader = NULL, tag = NULL, limit = limit_default(), test_server = test_server_default(), ... ) list_oml_measures(test_server = test_server_default()) list_oml_runs( run_id = NULL, task_id = NULL, tag = NULL, flow_id = NULL, limit = limit_default(), test_server = test_server_default(), ... ) list_oml_setups( flow_id = NULL, setup_id = NULL, tag = NULL, limit = limit_default(), test_server = test_server_default(), ... ) list_oml_tasks( task_id = NULL, data_id = NULL, number_instances = NULL, number_features = NULL, number_classes = NULL, number_missing_values = NULL, tag = NULL, limit = limit_default(), test_server = test_server_default(), type = NULL, ... )
data_id |
( |
data_name |
( |
number_instances |
( |
number_features |
( |
number_classes |
( |
number_missing_values |
( |
tag |
( |
limit |
( |
test_server |
( |
... |
(any) |
run_id |
( |
task_id |
( |
measures |
( |
uploader |
( |
flow_id |
( |
setup_id |
( |
type |
( |
Filter values are usually provided as single atomic values (typically integer or character).
Provide a numeric vector of length 2 (c(l, u)) to find matches in the range [l, u].
Note that only a subset of filters is exposed here.
For a more feature-complete package, see OpenML.
Alternatively, you can pass additional filters via ... using the names of the official API, cf. the REST tab of https://www.openml.org/apis.
(data.table()) of results, or a null data.table if no data set matches the filter criteria.
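To illustrate how single-value and range filters combine, here is a minimal sketch that wraps list_oml_data() in a helper. The helper name `find_binary_datasets()` and its arguments are hypothetical; the call itself only uses arguments from the usage section above and needs network access to OpenML.

```r
# Hypothetical helper for illustration; it is not part of mlr3oml.
# Lists data sets with two classes whose number of instances lies
# within a given range.
find_binary_datasets <- function(min_rows, max_rows, limit = 10) {
  mlr3oml::list_oml_data(
    number_instances = c(min_rows, max_rows),  # range filter c(l, u)
    number_classes = 2,                        # single-value filter
    limit = limit                              # cap the number of results
  )
}

# Usage (requires network access):
# tab <- find_binary_datasets(100, 1000)
# print(tab)
```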
Casalicchio G, Bossek J, Lang M, Kirchhoff D, Kerschke P, Hofner B, Seibold H, Vanschoren J, Bischl B (2017). “OpenML: An R Package to Connect to the Machine Learning Platform OpenML.” Computational Statistics, 1–15. doi:10.1007/s00180-017-0742-2.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.
# For technical reasons, examples cannot be included in this R package. # Instead, these are some relevant resources: # # Large-Scale Benchmarking chapter in the mlr3book: # https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html # # Package Article: # https://mlr3oml.mlr-org.com/articles/tutorial.html
Creates an OMLCollection
instance.
ocl(id, test_server = test_server_default())
id |
( |
test_server |
( |
Creates an OMLData
instance.
odt(id, parquet = parquet_default(), test_server = test_server_default())
id |
( |
parquet |
( |
test_server |
( |
(OMLData
)
Creates an OMLFlow
instance.
oflw(id, test_server = test_server_default())
id |
( |
test_server |
( |
(OMLFlow
)
This is the class for collections (previously known as studies) served on https://www.openml.org.
A collection can either be a task collection or a run collection.
This object can also be constructed using the sugar function ocl().
Run Collection
A run collection contains runs, flows, datasets and tasks.
The primary objects are the runs (main_entity_type is "run").
The flows, datasets and tasks are those used in the runs.
Task Collection
A task collection contains tasks and datasets.
The primary objects are the tasks (main_entity_type is "task").
The datasets are those used in the tasks.
Note: All Benchmark Suites on OpenML are also collections.
Because collections on OpenML can be modified (ids can be added), it is not possible to cache this object.
Obtain a list of mlr3::Tasks using mlr3::as_tasks.
Obtain a list of mlr3::Resamplings using mlr3::as_resamplings.
Obtain a list of mlr3::Learners using mlr3::as_learners (if main_entity_type is "run").
Obtain a mlr3::BenchmarkResult using mlr3::as_benchmark_result (if main_entity_type is "run").
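The conversions listed above can be sketched as follows. The helper name `collection_to_mlr3()` is hypothetical and only serves this illustration; running it needs network access and may download every task in the collection.

```r
# Hypothetical helper for illustration; it is not part of mlr3oml.
collection_to_mlr3 <- function(collection_id) {
  collection <- mlr3oml::ocl(collection_id)

  out <- list(
    main_entity_type = collection$main_entity_type,
    tasks = mlr3::as_tasks(collection),
    resamplings = mlr3::as_resamplings(collection)
  )

  # Learners and a benchmark result are only available for run collections.
  if (identical(collection$main_entity_type, "run")) {
    out$learners <- mlr3::as_learners(collection)
    out$benchmark_result <- mlr3::as_benchmark_result(collection)
  }
  out
}

# Usage (requires network access); pass the id of an OpenML collection:
# res <- collection_to_mlr3(collection_id)
```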
mlr3oml::OMLObject
-> OMLCollection
desc
(list()
)
Collection description (meta information), downloaded and converted from the JSON API response.
parquet
(logical(1)
)
Whether to use parquet.
main_entity_type
(character(n)
)
The main entity type, either "run"
or "task"
.
flow_ids
(integer(n)
)
A vector containing the flow ids of the collection.
data_ids
(integer(n)
)
A vector containing the data ids of the collection.
run_ids
(integer(n)
)
A vector containing the run ids of the collection.
task_ids
(integer(n)
)
A vector containing the task ids of the collection.
new()
Creates a new instance of this R6 class.
OMLCollection$new(id, test_server = test_server_default())
id
(integer(1)
)
OpenML id for the object.
test_server
(logical(1)
)
Whether to use the OpenML test server or public server.
Defaults to value of option "mlr3oml.test_server"
, or FALSE
if not set.
print()
Prints the object.
OMLCollection$print()
download()
Downloads the whole object for offline usage.
OMLCollection$download()
clone()
The objects of this class are cloneable with this method.
OMLCollection$clone(deep = FALSE)
deep
Whether to make a deep clone.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.
# For technical reasons, examples cannot be included in this R package. # Instead, these are some relevant resources: # # Large-Scale Benchmarking chapter in the mlr3book: # https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html # # Package Article: # https://mlr3oml.mlr-org.com/articles/tutorial.html
This is the class for data sets served on OpenML.
This object can also be constructed using the sugar function odt().
A mlr3::Task can be obtained by calling mlr3::as_task().
The target column must either be the default target (this is the default behaviour) or one of $feature_names.
If the target is specified to be one of $feature_names, the default target is added to the features of the task.
A mlr3::DataBackend can be obtained by calling mlr3::as_data_backend(). Depending on the selected file type, the returned backend is a mlr3::DataBackendDataTable (arff) or mlr3db::DataBackendDuckDB (parquet).
Note that a converted backend can contain columns beyond the target and the features (id column or ignore columns).
Column names that don't comply with R's naming scheme are renamed (see base::make.names()).
This means that the names can differ from those on OpenML.
The datasets on OpenML are stored either as (sparse) ARFF or as parquet.
When creating a new OMLData object, the constructor argument parquet allows switching between arff and parquet. Note that not all data files are necessarily available as parquet.
The option mlr3oml.parquet can be used to set a default.
If parquet is TRUE but not available, "arff" will be used as a fallback.
This package comes with its own reader for ARFF files, based on data.table::fread().
For sparse ARFF files, and if the RWeka package is installed, the reader automatically falls back to the implementation in RWeka (RWeka::read.arff()).
For the handling of parquet files, we rely on duckdb and DBI.
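A minimal sketch of the conversions described above, wrapped in a helper so that nothing is downloaded until it is called. The helper name `data_to_mlr3()` is hypothetical; the fields and converters it uses are the ones documented for this class, and the task is built with the default target.

```r
# Hypothetical helper for illustration; it is not part of mlr3oml.
data_to_mlr3 <- function(data_id, parquet = FALSE) {
  odata <- mlr3oml::odt(data_id, parquet = parquet)

  list(
    # Meta information taken from the OpenML description.
    target_names = odata$target_names,
    feature_names = odata$feature_names,
    # mlr3 task using the default target.
    task = mlr3::as_task(odata),
    # Backend; its class depends on the selected file type.
    backend = mlr3::as_data_backend(odata)
  )
}

# Usage (requires network access); pass the id of an OpenML data set:
# res <- data_to_mlr3(data_id = 31)
# res$task
```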
mlr3oml::OMLObject
-> OMLData
qualities
(data.table()
)
Data set qualities (performance values), downloaded from the JSON API response and
converted to a data.table::data.table()
with columns "name"
and "value"
.
tags
(character()
)
Returns all tags of the object.
parquet
(logical(1)
)
Whether to use parquet.
data
(data.table()
)
Returns the data (without the row identifier and ignored columns).
features
(data.table()
)
Information about data set features (including target), downloaded from the JSON API response and
converted to a data.table::data.table()
with columns:
"index"
(integer()
): Column position.
"name"
(character()
): Name of the feature.
"data_type"
(factor()
): Type of the feature: "nominal"
or "numeric"
.
"nominal_value"
(list()
): Levels of the feature, or NULL
for numeric features.
"is_target"
(logical()
): TRUE
for target column, FALSE
otherwise.
"is_ignore"
(logical()
): TRUE
if this feature should be ignored.
Ignored features are removed automatically from the data set.
"is_row_identifier"
(logical()
): TRUE
if the column encodes a row identifier.
Row identifiers are removed automatically from the data set.
"number_of_missing_values"
(integer()
): Number of missing values in the column.
target_names
(character()
)
Name of the default target, as extracted from the OpenML data set description.
feature_names
(character()
)
Name of the features, as extracted from the OpenML data set description.
nrow
(integer()
)
Number of observations, as extracted from the OpenML data set qualities.
ncol
(integer()
)
Number of features (including targets), as extracted from the table of data set features.
This excludes row identifiers and ignored columns.
license
(character()
)
Returns the license of the dataset.
parquet_path
(character()
)
Downloads the parquet file (or loads from cache) and returns the path of the parquet file.
Note that this also normalizes the names of the parquet file.
new()
Creates a new instance of this R6 class.
OMLData$new( id, parquet = parquet_default(), test_server = test_server_default() )
id
(integer(1)
)
OpenML id for the object.
parquet
(logical(1)
)
Whether to use parquet instead of arff.
If parquet is not available, it will fall back to arff.
Defaults to value of option "mlr3oml.parquet"
or FALSE
if not set.
test_server
(logical(1)
)
Whether to use the OpenML test server or public server.
Defaults to value of option "mlr3oml.test_server"
, or FALSE
if not set.
print()
Prints the object.
For a more detailed printer, convert to a mlr3::Task via as_task()
.
OMLData$print()
download()
Downloads the whole object for offline usage.
OMLData$download()
quality()
Returns the value of a single OpenML data set quality.
OMLData$quality(name)
name
(character(1)
)
Name of the quality to extract.
clone()
The objects of this class are cloneable with this method.
OMLData$clone(deep = FALSE)
deep
Whether to make a deep clone.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.
# For technical reasons, examples cannot be included in this R package. # Instead, these are some relevant resources: # # Large-Scale Benchmarking chapter in the mlr3book: # https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html # # Package Article: # https://mlr3oml.mlr-org.com/articles/tutorial.html
This is the class for flows served on OpenML.
Flows represent machine learning algorithms.
This object can also be constructed using the sugar function oflw()
.
Obtain a mlr3::Learner using mlr3::as_learner()
.
mlr3oml::OMLObject
-> OMLFlow
parameter
(data.table
)
The parameters of the flow.
dependencies
(character()
)
The dependencies of the flow.
tags
(character()
)
Returns all tags of the object.
new()
Creates a new instance of this R6 class.
OMLFlow$new(id, test_server = test_server_default())
id
(integer(1)
)
OpenML id for the object.
test_server
(logical(1)
)
Whether to use the OpenML test server or public server.
Defaults to value of option "mlr3oml.test_server"
, or FALSE
if not set.
print()
Prints the object.
OMLFlow$print()
download()
Downloads the whole object for offline usage.
OMLFlow$download()
clone()
The objects of this class are cloneable with this method.
OMLFlow$clone(deep = FALSE)
deep
Whether to make a deep clone.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.
# For technical reasons, examples cannot be included in this R package. # Instead, these are some relevant resources: # # Large-Scale Benchmarking chapter in the mlr3book: # https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html # # Package Article: # https://mlr3oml.mlr-org.com/articles/tutorial.html
All OML objects inherit from this class. Don't use this class directly.
desc
(list()
)
Description of OpenML object.
cache_dir
(logical(1)
| character(1)
)
Stores the location of the cache for objects retrieved from OpenML.
If set to FALSE
, caching is disabled.
Objects from the test server are stored in the subdirectory 'test', those from the public
server are stored in the subdirectory 'public'.
The package qs is required for caching.
id
(integer(1)
)
OpenML data id.
server
(character(1)
)
The server for this object.
man
(character(1)
)
The manual entry.
name
(character(1)
)
The name of the object.
type
(character()
)
The type of OpenML object (e.g. task, run, ...).
test_server
(logical(1)
)
Whether the object is using the test server.
new()
Creates a new instance of this R6 class.
OMLObject$new(id, test_server = test_server_default(), type)
id
(integer(1)
)
OpenML id for the object.
test_server
(logical(1)
)
Whether to use the OpenML test server or public server.
Defaults to value of option "mlr3oml.test_server"
, or FALSE
if not set.
type
(character()
)
The type of OpenML object (e.g. run, task, ...).
help()
Opens the corresponding help page referenced by field $man
.
OMLObject$help()
clone()
The objects of this class are cloneable with this method.
OMLObject$clone(deep = FALSE)
deep
Whether to make a deep clone.
This is the class for OpenML Runs, which are conceptually similar to mlr3::ResampleResults.
This object can also be constructed using the sugar function orn().
An OMLTask is returned by accessing the active field $task.
An OMLData is returned by accessing the active field $data (short for $task$data).
An OMLFlow is returned by accessing the active field $flow.
The raw predictions are returned by accessing the active field $prediction.
A mlr3::ResampleResult is returned when calling mlr3::as_resample_result().
A mlr3::Task is returned when calling mlr3::as_task().
A mlr3::DataBackend is returned when calling mlr3::as_data_backend().
An instantiated mlr3::Resampling is returned when calling mlr3::as_resampling().
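The accessors and converters listed above can be combined as in the following sketch. The helper name `run_to_mlr3()` is hypothetical; it only uses the fields and generics named in this section and needs network access when called.

```r
# Hypothetical helper for illustration; it is not part of mlr3oml.
run_to_mlr3 <- function(run_id) {
  orun <- mlr3oml::orn(run_id)

  list(
    # OpenML objects reachable through active fields.
    flow = orun$flow,
    otask = orun$task,
    # Raw predictions as returned by OpenML (not in mlr3 format).
    raw_prediction = orun$prediction,
    # mlr3 objects obtained through the S3 converters.
    task = mlr3::as_task(orun),
    resampling = mlr3::as_resampling(orun),
    resample_result = mlr3::as_resample_result(orun)
  )
}

# Usage (requires network access); pass the id of an OpenML run:
# res <- run_to_mlr3(run_id)
# res$resample_result
```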
mlr3oml::OMLObject
-> OMLRun
flow_id
(integer(1)
)
The id of the flow.
flow
(OMLFlow)
The OpenML Flow.
tags
(character()
)
Returns all tags of the object.
parquet
(logical(1)
)
Whether to use parquet.
task_id
(integer(1)
)
The id of the task solved by this run.
task
(OMLTask)
The task solved by this run.
data_id
(integer(1)
)
The id of the dataset.
data
(OMLData)
The data used in this run.
task_type
(character()
)
The task type.
parameter_setting
(data.table()
)
The parameter setting for this run.
prediction
(data.table()
)
The raw predictions of the run as returned by OpenML, not in standard mlr3 format.
Formatted predictions are accessible after converting to a mlr3::ResampleResult via
as_resample_result()
.
evaluation
(data.table()
)
The evaluations calculated by the OpenML server.
new()
Creates a new instance of this R6 class.
OMLRun$new( id, parquet = parquet_default(), test_server = test_server_default() )
id
(integer(1)
)
OpenML id for the object.
parquet
(logical(1)
)
Whether to use parquet instead of arff.
If parquet is not available, it will fall back to arff.
Defaults to value of option "mlr3oml.parquet"
or FALSE
if not set.
test_server
(logical(1)
)
Whether to use the OpenML test server or public server.
Defaults to value of option "mlr3oml.test_server"
, or FALSE
if not set.
print()
Prints the object.
OMLRun$print()
download()
Downloads the whole object for offline usage.
OMLRun$download()
clone()
The objects of this class are cloneable with this method.
OMLRun$clone(deep = FALSE)
deep
Whether to make a deep clone.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.
# For technical reasons, examples cannot be included in this R package. # Instead, these are some relevant resources: # # Large-Scale Benchmarking chapter in the mlr3book: # https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html # # Package Article: # https://mlr3oml.mlr-org.com/articles/tutorial.html
This is the class for tasks served on OpenML.
It consists of a dataset and other meta-information such as the target variable for supervised problems.
This object can also be constructed using the sugar function otsk().
Obtain a mlr3::Task by calling as_task().
Obtain a mlr3::Resampling by calling as_resampling().
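As a sketch of how the converted task and resampling might be used together, the helper below runs a featureless baseline on the splits provided by OpenML. The helper name `resample_on_oml_task()` is hypothetical, and the choice of a featureless learner is an assumption made for illustration; it presumes a classification or regression task.

```r
# Hypothetical helper for illustration; it is not part of mlr3oml.
resample_on_oml_task <- function(task_id) {
  otask <- mlr3oml::otsk(task_id)

  # Convert the OpenML task into an mlr3 task and its resampling.
  task <- mlr3::as_task(otask)
  resampling <- mlr3::as_resampling(otask)

  # Assumption: the task type is "classif" or "regr", for which mlr3
  # ships a featureless baseline learner.
  learner <- mlr3::lrn(paste0(task$task_type, ".featureless"))

  mlr3::resample(task, learner, resampling)
}

# Usage (requires network access); pass the id of an OpenML task:
# rr <- resample_on_oml_task(task_id)
# rr$aggregate()
```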
mlr3oml::OMLObject
-> OMLTask
estimation_procedure
(list()
)
The estimation procedure, returns NULL
if none is available.
task_splits
(data.table()
)
A data.table containing the splits as provided by OpenML.
tags
(character()
)
Returns all tags of the object.
parquet
(logical(1)
)
Whether to use parquet.
name
(character(1)
)
Name of the task, extracted from the task description.
task_type
(character(1)
)
The OpenML task type.
data_id
(integer()
)
Data id, extracted from the task description.
data
(OMLData)
Access to the underlying OpenML data set via a OMLData object.
nrow
(integer()
)
Number of rows, extracted from the OMLData object.
ncol
(integer()
)
Number of columns, as extracted from the OMLData object.
target_names
(character()
)
Name of the targets, as extracted from the OpenML task description.
feature_names
(character()
)
Name of the features (without targets of this OMLTask).
data_name
(character()
)
Name of the dataset (inferred from the task name).
new()
Creates a new instance of this R6 class.
OMLTask$new( id, parquet = parquet_default(), test_server = test_server_default() )
id
(integer(1)
)
OpenML id for the object.
parquet
(logical(1)
)
Whether to use parquet instead of arff.
If parquet is not available, it will fall back to arff.
Defaults to value of option "mlr3oml.parquet"
or FALSE
if not set.
test_server
(logical(1)
)
Whether to use the OpenML test server or public server.
Defaults to value of option "mlr3oml.test_server"
, or FALSE
if not set.
print()
Prints the object.
For a more detailed printer, convert to a mlr3::Task via as_task().
OMLTask$print()
download()
Downloads the whole object for offline usage.
OMLTask$download()
clone()
The objects of this class are cloneable with this method.
OMLTask$clone(deep = FALSE)
deep
Whether to make a deep clone.
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014). “OpenML.” ACM SIGKDD Explorations Newsletter, 15(2), 49–60. doi:10.1145/2641190.2641198.
# For technical reasons, examples cannot be included in this R package. # Instead, these are some relevant resources: # # Large-Scale Benchmarking chapter in the mlr3book: # https://mlr3book.mlr-org.com/chapters/chapter11/large-scale_benchmarking.html # # Package Article: # https://mlr3oml.mlr-org.com/articles/tutorial.html
Creates an OMLRun
instance.
orn(id, parquet = parquet_default(), test_server = test_server_default())
id |
( |
parquet |
( |
test_server |
( |
(OMLRun
)
Creates an OMLTask
instance.
otsk(id, parquet = parquet_default(), test_server = test_server_default())
id |
( |
parquet |
( |
test_server |
( |
(OMLTask
)
Publish a collection to OpenML. This can also be achieved through the website.
publish_collection( ids, name, desc, main_entity_type = "task", alias = NULL, api_key = NULL, test_server = test_server_default() )
ids |
( |
name |
( |
desc |
( |
main_entity_type |
( |
alias |
( |
api_key |
( In case |
test_server |
( |
Upload a dataset to OpenML. This can also be achieved through the website.
publish_data( data, name, desc, license = NULL, default_target = NULL, citation = NULL, row_identifier = NULL, ignore_attribute = NULL, original_data_url = NULL, paper_url = NULL, test_server = test_server_default(), api_key = NULL )
data |
( |
name |
( |
desc |
( |
license |
( |
default_target |
( |
citation |
( |
row_identifier |
( |
ignore_attribute |
( |
original_data_url |
(character(1)) |
paper_url |
( |
test_server |
( |
api_key |
( In case |
Publish a task on OpenML. This can also be achieved through the website.
publish_task( id, type, estimation_procedure, target, api_key = NULL, test_server = test_server_default() )
id |
( |
type |
( |
estimation_procedure |
( |
target |
( |
api_key |
( In case |
test_server |
( |
Parses a file located at path and returns a data.table().
Limitations:
Only works for dense files, no support for sparse data. Use RWeka instead.
Dates (even if there is no time component) are read in as POSIXct.
The date-format from the ARFF specification is currently ignored.
Instead, we rely on the auto-detection of data.table's fread().
read_arff(path)
path |
( |
(data.table()
).
Writes a data.frame() to an ARFF file.
Limitations:
Logicals are written as categorical features.
POSIXct columns are converted to UTC.
write_arff(data, path, relation = deparse(substitute(data)))
data |
( |
path |
( |
relation |
( |
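The two ARFF helpers can be tried without network access. The following is a minimal round-trip sketch on a small, made-up data frame; it assumes mlr3oml is installed and is skipped otherwise. The relation name "toy" is arbitrary.

```r
# Round trip: write a small data.frame to ARFF and read it back.
if (requireNamespace("mlr3oml", quietly = TRUE)) {
  path <- tempfile(fileext = ".arff")

  # Made-up dense data with one numeric and one categorical column.
  original <- data.frame(
    x = c(1.5, 2.5, 3.5),
    y = factor(c("a", "b", "a"))
  )

  mlr3oml::write_arff(original, path, relation = "toy")
  roundtrip <- mlr3oml::read_arff(path)  # returns a data.table
  print(roundtrip)

  unlink(path)  # clean up the temporary file
}
```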