twinlab.ScoreParams
- class twinlab.ScoreParams(metric='MSLL', combined_score=False)
Parameter configuration for scoring a trained emulator.
- Variables:
metric (str, optional) –
Metric used for scoring the performance of an emulator. Can be one of:
“MSLL”: The mean standardised log loss (MSLL), calculated as the mean log loss of the emulator minus the mean log loss of a trivial model. The log loss is the negative log of the probability (density) that a predicted distribution assigns to the test data value. The trivial model is a Gaussian distribution with mean and standard deviation equal to those of the training data. Lower (more negative) scores are better, while positive scores indicate serious problems: the model performs worse than the trivial model. The MSLL can be thought of as a measure of how much better a model is than simply taking the mean and standard deviation of the training data. This is the default metric, and the only metric that accounts for the model uncertainty, which is usually necessary when training a probabilistic model.
“MSE”: The mean squared error (MSE) is the average of the squared differences between the predicted mean values and the test set values. The MSE quantifies deviations in the model mean predictions only, and is not affected by the model uncertainty estimates. A value of zero indicates a model that fits the data perfectly, but this is not necessarily desirable, as it may indicate overfitting.
“RMSE”: The root mean squared error (RMSE) is the square root of the MSE and provides a measure of the expected error in the output. The RMSE may be considered more interpretable than the MSE, because it has the same units as the output values. However, as with the MSE and R2, model uncertainty is not part of how this metric is calculated, so a desirable RMSE score may hide a poorly-fitting model.
“R2”: A dimensionless number calculated as one minus the ratio of the MSE to the variance of the test set. A value of 1 indicates a perfect model, while a value of 0 indicates a model that is no better than predicting the mean of the test set. Negative values are possible but unusual, and indicate a model that is worse than simply predicting the mean of the test set. As with MSE and RMSE, the model uncertainty is not accounted for in this score; thus, it is possible to have a high R2 score but a poorly-fitting model.
The default metric is “MSLL”.
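The four metric definitions above can be sketched for a single output dimension as follows. This is only an illustration of the formulas as described, not twinlab's implementation: the function names are hypothetical, the emulator's predictive distribution is assumed to be Gaussian with a predicted mean and standard deviation per test point, and the population (not sample) standard deviation and variance are assumed.

```python
import math
from statistics import fmean, pstdev, pvariance


def gaussian_log_loss(y, mean, std):
    """Negative log density of y under a Gaussian N(mean, std**2)."""
    return 0.5 * math.log(2.0 * math.pi * std**2) + (y - mean) ** 2 / (2.0 * std**2)


def msll(y_test, pred_mean, pred_std, y_train):
    """Mean log loss of the model minus mean log loss of the trivial model."""
    # Trivial model: Gaussian with the training data's mean and standard deviation.
    # Assumption: population standard deviation (pstdev).
    train_mean = fmean(y_train)
    train_std = pstdev(y_train)
    model_loss = fmean(
        [gaussian_log_loss(y, m, s) for y, m, s in zip(y_test, pred_mean, pred_std)]
    )
    trivial_loss = fmean([gaussian_log_loss(y, train_mean, train_std) for y in y_test])
    return model_loss - trivial_loss


def mse(y_test, pred_mean):
    """Average squared difference between predicted means and test values."""
    return fmean([(y - m) ** 2 for y, m in zip(y_test, pred_mean)])


def rmse(y_test, pred_mean):
    """Square root of the MSE; same units as the output."""
    return math.sqrt(mse(y_test, pred_mean))


def r2(y_test, pred_mean):
    """One minus the ratio of the MSE to the variance of the test set."""
    return 1.0 - mse(y_test, pred_mean) / pvariance(y_test)


# Made-up data for illustration only.
y_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_test = [1.0, 2.0, 3.0, 4.0]
pred_mean = [1.1, 1.9, 3.1, 3.9]
pred_std = [0.2, 0.2, 0.2, 0.2]

print("MSLL:", msll(y_test, pred_mean, pred_std, y_train))
print("MSE: ", mse(y_test, pred_mean))
print("RMSE:", rmse(y_test, pred_mean))
print("R2:  ", r2(y_test, pred_mean))
```

Note that only `msll` takes the predicted standard deviations as input, which is why it is the one metric here that reflects the quality of the uncertainty estimates.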
combined_score (bool, optional) – Determine whether to combine (average) the emulator score across output dimensions. If False, a dataframe of scores will be returned, with the score for each output dimension, even if there is only a single emulator output dimension. If True, a single number will be returned, which is the average score across all output dimensions. The default is False.
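The effect of combined_score can be sketched as below. This is a hypothetical stand-in rather than the library's code: the function name is invented, MSE is used as the example metric, and a plain dict keyed by output name takes the place of the dataframe that the real call returns.

```python
from statistics import fmean


def score_outputs(y_test, pred_mean, combined_score=False):
    """Score each output dimension with MSE, optionally averaging the results.

    y_test and pred_mean map an output name to a list of values.
    """
    per_output = {
        name: fmean([(y - m) ** 2 for y, m in zip(y_test[name], pred_mean[name])])
        for name in y_test
    }
    if combined_score:
        # A single number: the average score across all output dimensions.
        return fmean(list(per_output.values()))
    # One score per output dimension, even if there is only one output.
    return per_output


# Made-up data for illustration only.
y_test = {"a": [1.0, 2.0], "b": [0.0, 0.0]}
pred_mean = {"a": [1.0, 2.0], "b": [1.0, 1.0]}

print(score_outputs(y_test, pred_mean))                       # per-output scores
print(score_outputs(y_test, pred_mean, combined_score=True))  # single averaged score
```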
Methods
__init__([metric, combined_score])
unpack_parameters()