twinlab.TrainParams

class twinlab.TrainParams(estimator='gaussian_process_regression', estimator_params=<twinlab.params.EstimatorParams object>, input_explained_variance=None, input_retained_dimensions=None, output_explained_variance=None, output_retained_dimensions=None, fidelity=None, class_column=None, dataset_std=None, train_test_ratio=1.0, model_selection=False, model_selection_params=<twinlab.params.ModelSelectionParams object>, shuffle=True, seed=42)

Parameter configuration for training an emulator.

This includes parameters that control the training itself, such as the ratio of training to testing data, as well as parameters that control how the model is set up, such as the number of dimensions to retain after decomposition.
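
As a rough illustration of how these two kinds of parameter are passed together, the sketch below collects a few keyword arguments from the signature above and hands them to the constructor. It assumes twinLab is installed and importable under the alias `tl`. If it is not, the sketch only builds the dictionary and leaves `params` as None. The specific values (0.8 and 5) are arbitrary choices for the example.

```python
# Keyword arguments taken from the TrainParams signature documented on this page.
train_kwargs = {
    "train_test_ratio": 0.8,          # training-related: keep 20% of samples for testing
    "output_retained_dimensions": 5,  # setup-related: reduce the output to 5 dimensions
    "shuffle": True,
    "seed": 42,
}

try:
    import twinlab as tl  # assumption: twinLab is installed in this environment
except ImportError:
    tl = None

# Build the parameter object only when the library is available.
params = tl.TrainParams(**train_kwargs) if tl is not None else None
print(params)
```

Arguments that are left out keep the defaults shown in the signature.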

Variables:
  • estimator (str, optional) – The type of estimator (emulator) to be trained. Currently only "gaussian_process_regression" is supported, which is the default value.

  • estimator_params (EstimatorParams, optional) – The set of parameters for the emulator.

  • input_retained_dimensions (Union[int, None], optional) – The number of input dimensions to retain after applying dimensionality reduction. This cannot be specified at the same time as input_explained_variance. The maximum number of input dimensions currently allowed by twinLab is 20. The default value is None, which means that dimensionality reduction is not applied to the input unless input_explained_variance is specified.

  • input_explained_variance (Union[float, None], optional) – The fraction of the variance of the input data that is retained after applying dimensionality reduction. This must be a number between 0 and 1. This cannot be specified at the same time as input_retained_dimensions. The default value is None, which means that dimensionality reduction is not applied to the input unless input_retained_dimensions is specified.

  • output_retained_dimensions (Union[int, None], optional) – The number of output dimensions to retain after applying dimensionality reduction. This cannot be specified at the same time as output_explained_variance. The maximum number of output dimensions currently allowed by twinLab is 10. The default value is None, which means that dimensionality reduction is not applied to the output unless output_explained_variance is specified.

  • output_explained_variance (Union[float, None], optional) – The fraction of the variance of the output data that is retained after applying dimensionality reduction. This must be a number between 0 and 1. This cannot be specified at the same time as output_retained_dimensions. The default value is None, which means that dimensionality reduction is not applied to the output unless output_retained_dimensions is specified.

  • fidelity (Union[str, None], optional) – Name of the column in the dataset corresponding to the fidelity parameter if a multi-fidelity model (estimator_type="multi_fidelity_gp" in EstimatorParams) is being trained. Fidelity is used to differentiate the quality of individual data samples on which the emulator is being trained. The default value is None, because this argument is not required unless a multi-fidelity model is being trained.

  • class_column (Union[str, None], optional) – The name of the column that contains the classification labels if training a mixture-of-experts model (estimator_type="mixture_of_experts_gp" in EstimatorParams). The classification labels distinguish different groups of data, which the emulator uses to train a set of expert models, with one expert tailored to each group. If the training data contains n classes, the classes must be labelled from 0 to n-1. The default value is None, because this argument is not required unless a mixture-of-experts model is being trained.

  • train_test_ratio (Union[float, None], optional) – The fraction of the dataset that is used for training, with the remainder held back for testing. This must be a number between 0 and 1. The default value is 1, which means that all of the provided data is used for training. This makes the most of a dataset, but leaves no test data, so it will not be possible to score or benchmark the performance of the emulator.

  • dataset_std (Union[Dataset, None], optional) – A twinLab dataset object that contains the standard deviation of the training data. This is necessary when training a heteroskedastic or fixed noise emulator.

  • model_selection (bool, optional) – Whether to run Bayesian model selection, a form of automatic machine learning. The default value is False, which trains only the specified emulator rather than iterating over candidate emulators.

  • model_selection_params (ModelSelectionParams, optional) – The parameters for model selection, if it is being used.

  • shuffle (bool, optional) – Whether to randomly shuffle the training data before splitting it into training and testing sets. The default value is True. Take particular care when using this parameter with time-series data, because shuffling discards the temporal ordering of the samples.

  • seed (Union[int, None], optional) – The seed used to initialise the random number generators. Setting this to an integer is necessary for reproducible results, and the default value is 42. It can be set to None to generate a random seed each time. Be aware that the seed is used in the training process, so with a seed of None the trained emulator will not be reproducible.
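
The interaction between train_test_ratio, shuffle and seed can be pictured with a small standard-library sketch. This is an illustration only and not twinLab's internal implementation: the helper `split_indices`, its rounding rule, and the choice to accept both endpoints 0 and 1 are assumptions made for the example.

```python
import random


def split_indices(n_samples, train_test_ratio=1.0, shuffle=True, seed=42):
    """Hypothetical helper: return (train, test) lists of sample indices."""
    # Assumption: both endpoints of the documented range [0, 1] are accepted.
    if not 0 <= train_test_ratio <= 1:
        raise ValueError("train_test_ratio must lie between 0 and 1")
    indices = list(range(n_samples))
    if shuffle:
        # A fixed integer seed gives the same ordering on every call;
        # seed=None lets Python pick a fresh seed each time.
        random.Random(seed).shuffle(indices)
    # Assumption: the number of training samples is rounded to the nearest integer.
    n_train = int(round(n_samples * train_test_ratio))
    return indices[:n_train], indices[n_train:]


train, test = split_indices(10, train_test_ratio=0.8)
print(len(train), len(test))
```

With shuffle=False the first samples go to training and the last to testing, which is usually the behaviour wanted for time-series data. With the default ratio of 1.0 the test list is empty, which is why scoring is not possible in that case.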

__init__(estimator='gaussian_process_regression', estimator_params=<twinlab.params.EstimatorParams object>, input_explained_variance=None, input_retained_dimensions=None, output_explained_variance=None, output_retained_dimensions=None, fidelity=None, class_column=None, dataset_std=None, train_test_ratio=1.0, model_selection=False, model_selection_params=<twinlab.params.ModelSelectionParams object>, shuffle=True, seed=42)
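
Several of these arguments constrain one another, so it can help to check a configuration before passing it to the constructor. The sketch below is a hypothetical pre-check written from the parameter descriptions on this page. It is not twinLab's own validation, and the library may raise different errors or apply the rules differently. The helper name `check_reduction_settings`, the reading of the 20 and 10 dimension limits as caps on the retained dimensions, and the acceptance of the endpoints 0 and 1 are all assumptions.

```python
MAX_INPUT_DIMENSIONS = 20   # limit quoted in the input_retained_dimensions description
MAX_OUTPUT_DIMENSIONS = 10  # limit quoted in the output_retained_dimensions description


def check_reduction_settings(retained_dimensions=None, explained_variance=None,
                             max_dimensions=None):
    """Hypothetical pre-check for one side (input or output) of the reduction settings."""
    # The two ways of requesting a reduction are mutually exclusive.
    if retained_dimensions is not None and explained_variance is not None:
        raise ValueError("set retained_dimensions or explained_variance, not both")
    # Assumption: both endpoints of the documented range [0, 1] are accepted.
    if explained_variance is not None and not 0 <= explained_variance <= 1:
        raise ValueError("explained_variance must lie between 0 and 1")
    if retained_dimensions is not None:
        if retained_dimensions < 1:
            raise ValueError("retained_dimensions must be a positive integer")
        # Assumption: the documented maximum applies to the retained dimensions.
        if max_dimensions is not None and retained_dimensions > max_dimensions:
            raise ValueError(f"at most {max_dimensions} dimensions are allowed")


# Both calls pass silently because each uses only one of the two options.
check_reduction_settings(retained_dimensions=5, max_dimensions=MAX_INPUT_DIMENSIONS)
check_reduction_settings(explained_variance=0.95, max_dimensions=MAX_OUTPUT_DIMENSIONS)
```

Leaving both options as None corresponds to the documented default, where no reduction is applied.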

Methods

__init__([estimator, estimator_params, ...])

unpack_parameters()