twinlab.Emulator.train
- Emulator.train(dataset, inputs, outputs, params=<twinlab.params.TrainParams object>, wait=True, verbose=True)
Train an emulator on the twinLab cloud.
This is the primary functionality of twinLab: an emulator is trained on a dataset, learns the trends in it, and can then make predictions on new data. The new data points need not coincide with the training points; the emulator effectively interpolates between them. The emulator can also extrapolate beyond the range of the training data, but these predictions are less reliable.
Emulators can be trained on datasets with multiple inputs and outputs, and can then make predictions for multiple inputs and outputs. The algorithms in twinLab not only make predictions but also quantify the uncertainty in them, which lets you judge how reliable each prediction is.
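To illustrate why an uncertainty estimate is useful, and why extrapolation is less reliable than interpolation, here is a minimal, self-contained sketch of Gaussian-process regression in plain Python. It is a generic textbook model and not twinLab's implementation; the helper names (rbf, solve, gp_predict), the kernel, its length scale and the noise level are all assumptions chosen for illustration.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Return the posterior mean and standard deviation at x_new.

    Uses a zero-mean prior, so predictions far from the training data
    fall back towards zero with a large standard deviation.
    """
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x, x_new) for x in x_train]
    alpha = solve(K, y_train)  # K^-1 y
    v = solve(K, k_star)       # K^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))


# Same toy data as the first training example on this page.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [1.0, 4.0, 9.0, 16.0]

mean_near, std_near = gp_predict(x_train, y_train, 2.5)  # interpolation
mean_far, std_far = gp_predict(x_train, y_train, 8.0)    # extrapolation

print(f"x=2.5: mean={mean_near:.2f}, std={std_near:.3f}")
print(f"x=8.0: mean={mean_far:.2f}, std={std_far:.3f}")
```

In this sketch the standard deviation at x=8.0, far outside the training range, is much larger than at x=2.5, which sits between training points. This is the kind of signal an uncertainty-aware emulator gives about how far to trust a prediction.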
See the documentation for TrainParams() for more information on the available training parameters. The documentation for EstimatorParams() contains information about estimator types and kernels. Finally, the documentation for ModelSelectionParams() details automatic model selection parameters.
- Parameters:
dataset (Dataset) – The training and test data for the emulator. The ratio of train to test data can be set in TrainParams.
inputs (list[str]) – A list of the input column names in the training dataset. These correspond to the independent variables in the dataset, which are often the parameters of a model. These are usually known as X (note the capital) values.
outputs (list[str]) – A list of the output column names in the training dataset. These correspond to the dependent variables in the dataset, which are often the results of a model. These are usually known as y values.
params (TrainParams, optional) – A training parameter configuration that contains all optional training parameters.
wait (bool, optional) – If True, wait for the job to complete; otherwise return the process ID and exit. Setting wait=False is useful for running longer training jobs. The status of all emulators, including those currently training, can be queried using tl.list_emulators(verbose=True).
verbose (bool, optional) – Display information about the operation while running.
- Returns:
If wait=True, the function runs until the emulator is trained on the cloud. If wait=False, the function returns the process ID and exits, which is useful for longer training jobs. The training status can then be checked later using Emulator.status().
Example
Train a simple emulator:
    import pandas as pd
    import twinlab as tl

    df = pd.DataFrame({"X": [1, 2, 3, 4], "y": [1, 4, 9, 16]})
    dataset = tl.Dataset("my_dataset")
    dataset.upload(df)
    emulator = tl.Emulator("my_emulator")
    emulator.train(dataset, ["X"], ["y"])
Train an emulator with dimensionality reduction (here on the output):
    dataset = tl.Dataset("my_dataset")
    emulator = tl.Emulator("my_emulator")
    params = tl.TrainParams(output_retained_dimensions=1)
    emulator.train(dataset, ["X"], ["y1", "y2"], params)
Train an emulator with a specified (here variational) estimator type:
    dataset = tl.Dataset("my_dataset")
    emulator = tl.Emulator("my_emulator")
    estimator_params = tl.EstimatorParams(estimator_type="variational_gp")
    params = tl.TrainParams(estimator_params=estimator_params)
    emulator.train(dataset, ["X"], ["y"], params)
Train an emulator with a specific (here linear) kernel:
    dataset = tl.Dataset("my_dataset")
    emulator = tl.Emulator("my_emulator")
    params = tl.TrainParams(estimator_params=tl.EstimatorParams(kernel="LIN"))
    emulator.train(dataset, ["X"], ["y"], params)
Train an emulator using automatic kernel selection to find the best kernel:
    dataset = tl.Dataset("my_dataset")
    emulator = tl.Emulator("my_emulator")
    params = tl.TrainParams(model_selection=True)
    emulator.train(dataset, ["X"], ["y"], params)
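The wait=False option described under Parameters has no example in the list above. A minimal sketch follows, using only calls named on this page (tl.Dataset, tl.Emulator, Emulator.train and tl.list_emulators). The function name train_in_background and the dataset and emulator names are illustrative assumptions, and running it requires a configured twinLab account with the dataset already uploaded.

```python
def train_in_background(dataset_id="my_dataset", emulator_id="my_emulator"):
    """Start a training job without blocking, then list emulator statuses."""
    # Imported inside the function so that this sketch can be loaded
    # on a machine where twinLab is not installed.
    import twinlab as tl

    dataset = tl.Dataset(dataset_id)  # assumes this dataset was uploaded earlier
    emulator = tl.Emulator(emulator_id)

    # With wait=False the call returns the process ID straight away
    # instead of blocking until training has finished.
    process_id = emulator.train(dataset, ["X"], ["y"], wait=False)

    # Query the status of all emulators, including those still training.
    tl.list_emulators(verbose=True)
    return process_id
```

Calling train_in_background() starts the job and returns its process ID. Emulator.status() can then be used to check on that job later, as noted under Returns; see its own documentation for the arguments it expects.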