twinlab.Emulator.train

Emulator.train(dataset, inputs, outputs, params=<twinlab.params.TrainParams object>, wait=True, verbose=True)

Train an emulator on the twinLab cloud.

This is the primary functionality of twinLab: an emulator is trained to learn the trends in a dataset and can then make predictions on new data. These new data need not coincide with the training data points; the emulator effectively interpolates between them. The emulator can also be used to extrapolate beyond the training data, but this is less reliable.

Emulators can be trained on datasets with multiple inputs and outputs, and can make predictions on new data with multiple inputs and outputs. The algorithms in twinLab allow the emulator not only to make predictions but also to quantify the uncertainty in them, which indicates how far each prediction can be trusted.
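
The difference between interpolation and extrapolation, and the role of predictive uncertainty, can be illustrated locally with a small Gaussian process written in NumPy. This is only a sketch under stated assumptions: it uses a unit-variance squared-exponential kernel with a fixed length scale, and the names rbf_kernel and gp_predict are hypothetical helpers, not part of twinLab. It is not a description of the algorithms twinLab runs on the cloud.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_predict(x_train, y_train, x_new, length_scale=1.0, jitter=1e-8):
    """Return the posterior mean and standard deviation at x_new."""
    # Jitter on the diagonal keeps the linear solve numerically stable.
    k_tt = rbf_kernel(x_train, x_train, length_scale) + jitter * np.eye(len(x_train))
    k_nt = rbf_kernel(x_new, x_train, length_scale)
    mean = k_nt @ np.linalg.solve(k_tt, y_train)
    # The prior variance is 1 for this kernel; subtract what the data explain.
    explained = np.sum(k_nt * np.linalg.solve(k_tt, k_nt.T).T, axis=1)
    std = np.sqrt(np.clip(1.0 - explained, 0.0, None))
    return mean, std


# Assumed toy data: four training points on a smooth curve.
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.sin(x_train)

# 2.5 lies between training points (interpolation); 8.0 lies far outside them.
x_new = np.array([2.5, 8.0])
mean, std = gp_predict(x_train, y_train, x_new)
print("predictive std (interpolation, extrapolation):", std)
```

In this sketch the predictive standard deviation is small at the interpolated point and close to the prior value of 1 at the extrapolated point, which is the sense in which extrapolation is less reliable.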

The training process will start on the twinLab cloud using the ID of the emulator previously instantiated (with tl.Emulator(id=)). If that emulator has not been trained already, the training process will start directly. Otherwise, the process will require you to rename the emulator or delete the existing one (with Emulator.delete()).

See the documentation for TrainParams() for more information on the available training parameters. The documentation for EstimatorParams() contains information about estimator types and kernels. Finally, the documentation for ModelSelectionParams() details automatic model selection parameters.

Parameters:
  • dataset (Dataset) – The training and test data for the emulator. The ratio of train to test data can be set in TrainParams.

  • inputs (list[str]) – A list of the input column names in the training dataset. These correspond to the independent variables in the dataset, which are often the parameters of a model. These are usually known as X (note the capital) values.

  • outputs (list[str]) – A list of the output column names in the training dataset. These correspond to the dependent variables in the dataset, which are often the results of a model. These are usually known as y values.

  • params (TrainParams, optional) – A training parameter configuration that contains all optional training parameters.

  • wait (bool, optional) – If True, wait for the job to complete; otherwise, return the process ID and exit. Setting wait=False is useful for running longer training jobs. The status of all emulators, including those currently training, can be queried using tl.list_emulators(verbose=True).

  • verbose (bool, optional) – Display information about the operation while running.

Returns:

If wait=True, the function runs until the emulator is trained on the cloud. If wait=False, the function returns the process ID and exits, which is useful for longer training jobs. The training status can then be checked later using Emulator.status().
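
When training is started with wait=False, the caller has to check back later. A generic polling loop for this can be sketched as below. The helper wait_until is hypothetical and not part of twinLab. Because this section does not describe the format of the value returned by Emulator.status(), the helper takes a caller-supplied function is_done that wraps whatever status check is appropriate. The example exercises the loop with a stand-in check rather than a real cloud job.

```python
import time


def wait_until(is_done, interval=5.0, timeout=600.0):
    """Call is_done() every `interval` seconds until it returns True.

    Returns True if the job finished, or False if `timeout` seconds
    elapsed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real status check: reports completion on the third call.
calls = {"n": 0}


def fake_is_done():
    calls["n"] += 1
    return calls["n"] >= 3


finished = wait_until(fake_is_done, interval=0.01, timeout=1.0)
print("finished:", finished)
```

In real use, is_done would be a small function that calls Emulator.status() (or tl.list_emulators(verbose=True)) and interprets the result. The timeout keeps a script from blocking indefinitely if a job stalls.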

Example

Train a simple emulator:

import pandas as pd
import twinlab as tl

df = pd.DataFrame({"X": [1, 2, 3, 4], "y": [1, 4, 9, 16]})
dataset = tl.Dataset("my_dataset")
dataset.upload(df)
emulator = tl.Emulator("my_emulator")
emulator.train(dataset, ["X"], ["y"])

Train an emulator with dimensionality reduction (here on the output):

dataset = tl.Dataset("my_dataset")
emulator = tl.Emulator("my_emulator")
params = tl.TrainParams(output_retained_dimensions=1)
emulator.train(dataset, ["X"], ["y1", "y2"], params)

Train an emulator with a specified (here variational) estimator type:

dataset = tl.Dataset("my_dataset")
emulator = tl.Emulator("my_emulator")
estimator_params = tl.EstimatorParams(estimator_type="variational_gp")
params = tl.TrainParams(estimator_params=estimator_params)
emulator.train(dataset, ["X"], ["y"], params)

Train an emulator with a specific (here linear) kernel:

dataset = tl.Dataset("my_dataset")
emulator = tl.Emulator("my_emulator")
params = tl.TrainParams(estimator_params=tl.EstimatorParams(kernel="LIN"))
emulator.train(dataset, ["X"], ["y"], params)

Train an emulator using automatic kernel selection to find the best kernel:

dataset = tl.Dataset("my_dataset")
emulator = tl.Emulator("my_emulator")
params = tl.TrainParams(model_selection=True)
emulator.train(dataset, ["X"], ["y"], params)