twinlab.Dataset.analyse_variance

Dataset.analyse_variance(columns, verbose=False)

Return an analysis of the variance retained per dimension after performing singular value decomposition (SVD) on the dataset.

SVD shows how much of the dataset's variance is retained after projecting it onto a new basis. SVD components are naturally ordered by the amount of variance they retain, with the first component retaining the most, so the cumulative variance retained indicates how many dimensions to keep. This analysis is usually performed on either the input columns or the output columns of the dataset.
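The idea can be illustrated with plain NumPy. This is only a minimal sketch of the general technique, not twinLab's implementation: the helper name cumulative_variance, the column centring step, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def cumulative_variance(data):
    """Cumulative fraction of variance retained by the leading SVD components.

    Hypothetical helper for illustration; not part of twinLab.
    """
    # Centre each column so the squared singular values are proportional to
    # variance (an assumption; twinLab's preprocessing is not documented here).
    centred = data - data.mean(axis=0)
    # NumPy returns the singular values sorted in descending order.
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variance = singular_values**2
    # Prepend 0 so that entry k is the variance retained by k dimensions.
    return np.concatenate(([0.0], np.cumsum(variance) / variance.sum()))


rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Two strongly correlated columns: most of the variance lies along one direction.
data = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=200)])
print(cumulative_variance(data))
```

The result has one more entry than the number of columns, starts at 0 and ends at 1, mirroring the layout of the table in the example below.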

Parameters:
  • columns (list[str]) – List of columns to evaluate. This is typically either the set of input or output columns.

  • verbose (bool, optional) – Display information about the operation while running. Defaults to False.

Returns:

A pandas.DataFrame containing the variance analysis: the cumulative variance retained for each number of dimensions, starting from zero.

Return type:

pandas.DataFrame

Example

import twinlab as tl

dataset = tl.Dataset("quickstart")
dataset.analyse_variance(columns=["x", "y"])  # Typically either input or output columns
   Number of Dimensions  Cumulative Variance
0                     0             0.000000
1                     1             0.925741
2                     2             1.000000
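One way to act on this table is to pick the smallest number of dimensions whose cumulative variance reaches a chosen threshold. The sketch below rebuilds the table above by hand so that it runs without a twinLab account; the helper dimensions_for and the thresholds are hypothetical and not part of the twinLab API.

```python
import pandas as pd

# Values copied from the example output above.
analysis = pd.DataFrame(
    {
        "Number of Dimensions": [0, 1, 2],
        "Cumulative Variance": [0.000000, 0.925741, 1.000000],
    }
)


def dimensions_for(analysis, threshold):
    """Smallest number of dimensions whose cumulative variance meets threshold.

    Hypothetical helper for illustration; threshold is a fraction in (0, 1].
    """
    enough = analysis[analysis["Cumulative Variance"] >= threshold]
    if enough.empty:
        raise ValueError("threshold exceeds the total variance in the table")
    return int(enough["Number of Dimensions"].min())


print(dimensions_for(analysis, 0.90))  # 1
print(dimensions_for(analysis, 0.99))  # 2
```

In practice the DataFrame returned by analyse_variance could be passed to such a helper directly, assuming its column names match those shown in the output.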