Example datasets#

Explore our example datasets on this page. Understanding their properties will help you choose the best one for trialling twinLab’s capabilities.

Many of these datasets have been used in our tutorials, which you can check out on the Examples page.

The quickstart dataset#

The quickstart dataset is a simple, non-contextual dataset. In the dataset, the rows are the samples and there are two columns.

Dataset property
Author	Dr Freddy Wordingham (digiLab Solutions Ltd.)
Provenance	Generated randomly with packages like numpy and scipy.
Copyright information	MIT license
Size	20
Shape	(10,2)
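
The Size and Shape rows are related: Size is the total number of values in the table, which is the number of rows multiplied by the number of columns. A minimal sketch of that check follows, using a randomly generated stand-in table rather than the real quickstart data. The column names `x` and `y` are hypothetical.

```python
import random

random.seed(0)  # reproducible stand-in values

# Hypothetical stand-in for the quickstart dataset: 10 samples (rows), 2 columns.
# The column names "x" and "y" are assumptions, not taken from the real dataset.
columns = ["x", "y"]
rows = [[random.random() for _ in columns] for _ in range(10)]

shape = (len(rows), len(columns))  # (number of samples, number of columns)
size = shape[0] * shape[1]         # total number of values in the table

print(shape, size)
```

The same relation holds for every property table on this page, so it can be used as a quick sanity check after loading any of the datasets.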

We can see how the dataset is distributed if we plot the points as a scatterplot:

A scatterplot of the quickstart dataset.

The advancedstart dataset#

The advancedstart dataset is a simple, non-contextual dataset. In the dataset, the rows are the samples and there are three columns.

Dataset property
Author	Dr Alexander Mead (digiLab Solutions Ltd.)
Provenance	Generated randomly with packages like numpy and scipy.
Copyright information	MIT license
Size	75
Shape	(25,3)

We can see how the dataset is distributed if we plot the points as a 3D scatterplot:

A scatterplot of the advancedstart dataset.

The biscuits dataset#

The biscuits dataset explores a hypothetical pricing optimisation problem for the manager of a biscuit factory. In the dataset, the rows are the samples and the columns are:

Pack price in GBP
The number of biscuits per pack
The number of packs sold
Profit made in GBP

Dataset property
Author	Dr Freddy Wordingham (digiLab Solutions Ltd.)
Provenance	Generated randomly with packages like numpy and scipy.
Copyright information	MIT license
Size	48
Shape	(12,4)

We can see how the dataset is distributed if we plot scatterplots. Because this is a 4D dataset, we present two different plots, each representing one of the y-values via its colorbar:

The first 3D scatterplot of the biscuits dataset.

The second 3D scatterplot of the biscuits dataset.

The gardening dataset#

The gardening dataset explores a hypothetical growth optimisation problem for an intrepid gardener seeking to understand what makes their plants grow best. In the dataset, the rows are the samples and the columns are:

Sunlight in hours per day
The number of times the garden was watered per week
The number of units of fruit produced

Dataset property
Author	Dr Alexander Mead (digiLab Solutions Ltd.)
Provenance	Generated randomly with packages like numpy and scipy.
Copyright information	MIT license
Size	75
Shape	(25,3)

We can see how the dataset is distributed if we plot the points on a 3D scatterplot:

A 3D scatterplot of the gardening dataset.

The tritium-desorption-small dataset#

The tritium-desorption-small dataset explores macroscopic transport of tritium in fusion reactor materials. In the dataset, the rows are the samples and the columns are:

E1, E2, E3: the detrapping energies of the tritium traps in a reactor.
n1, n2: the densities of the intrinsic traps.
y0-y623: the flux of tritium across the trap boundary as a function of time, in atomic fractions.

The dataset was created from Achlys simulations run using the UM-Bridge software. Achlys models the macroscopic transport (and subsequent desorption) of tritium through fusion reactor materials using Foster-McNabb equations.

Dataset property
Authors	Dr Mikkel Lykkegaard (digiLab Solutions Ltd.) and Dr Anne Reinarz (Durham University)
Provenance	This dataset was created as part of the simulations in Seelinger et al. 2024 (arXiv: 2402.13768v4). The software used to generate this dataset was UM-Bridge. More details about the simulation and the generated dataset can be found in the UM-Bridge documentation, in both the inverse benchmark documentation and the benchmark documentation.
Copyright information	MIT license
Size	251,600
Shape	(400,629)
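
The column description above can be checked against the Shape and Size rows. The sketch below rebuilds the column names from the listed ranges. Treating E1-E3 and n1-n2 as inputs and y0-y623 as outputs is our reading of the description, not something the dataset itself states, and the variable names are hypothetical.

```python
# Column names as described above: three detrapping energies, two trap
# densities, and 624 flux values (y0 to y623 inclusive).
input_columns = ["E1", "E2", "E3", "n1", "n2"]   # assumed to be the inputs
output_columns = [f"y{i}" for i in range(624)]   # assumed to be the outputs
all_columns = input_columns + output_columns

n_rows = 400  # number of samples, taken from the Shape row
shape = (n_rows, len(all_columns))
size = shape[0] * shape[1]

print(shape, size)
```

Five parameter columns plus 624 flux columns give the 629 columns in the Shape row, and 400 rows of 629 values give the listed Size of 251,600.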

The tritium-desorption-temperature-grid dataset#

The tritium-desorption-temperature-grid dataset is an accompaniment to the tritium-desorption-small dataset.

The grid is derived from the accompanying Achlys simulations, run using the UM-Bridge software. Achlys models the macroscopic transport (and subsequent desorption) of tritium through fusion reactor materials using Foster-McNabb equations.

Dataset property
Authors	Dr Mikkel Lykkegaard (digiLab Solutions Ltd.) and Dr Anne Reinarz (Durham University)
Provenance	This dataset was derived as part of the simulations in Seelinger et al. 2024 (arXiv: 2402.13768v4). The software used to generate this dataset was UM-Bridge. More details about the simulation and the generated dataset can be found in the UM-Bridge documentation, in both the inverse benchmark documentation and the benchmark documentation.
Copyright information	MIT license
Size	623
Shape	(623,1)

The jet-confinement dataset#

This dataset explores the confinement of magnetic fusion devices and how it changes with device parameters. Derived from a larger dataset that features dozens of fusion experiments around the world, it is a subset describing the outcomes of high-confinement mode experiments at the Joint European Torus (JET). JET is a record-breaking tokamak located in the UK.

In the dataset, the rows are the samples and the columns are:

Magnetic field strength: the intensity of the magnetic field, in units of teslas, applied to confine the plasma within the tokamak.
Plasma current: the electric current flowing through the plasma, in amperes.
Thermal power: the estimated amount of heat energy, in watts, consumed by the plasma.
Major radius: the distance in meters from the center of the tokamak to the center of the plasma.
Elongation: the ratio of the plasma’s height to its width.
Electron density: the number of electrons per unit volume (per cubic meter) within the plasma.
Effective mass number: the average atomic mass (amu) of the ions in the plasma, weighted by their abundance.
Inverse aspect ratio: the ratio of the plasma’s minor radius to its major radius.
Energy confinement time: the duration, in seconds, for which the plasma retains its energy before it is lost to the surroundings.
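
The column definitions above can be sketched as a typed record, which also shows how the geometric columns relate: because the inverse aspect ratio is the minor radius divided by the major radius, the minor radius can be recovered by multiplying the two columns. The class name, field names, and numeric values below are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass, fields


@dataclass
class JetConfinementSample:
    """One row of the dataset; the field names here are hypothetical."""

    magnetic_field_strength: float  # teslas
    plasma_current: float           # amperes
    thermal_power: float            # watts
    major_radius: float             # meters
    elongation: float               # dimensionless (height / width)
    electron_density: float         # electrons per cubic meter
    effective_mass_number: float    # amu
    inverse_aspect_ratio: float     # dimensionless (minor / major radius)
    energy_confinement_time: float  # seconds

    @property
    def minor_radius(self) -> float:
        """Minor radius in meters, recovered from the inverse aspect ratio."""
        return self.inverse_aspect_ratio * self.major_radius


# Illustrative values only; this is not a row from the real dataset.
sample = JetConfinementSample(
    magnetic_field_strength=2.5,
    plasma_current=2.5e6,
    thermal_power=1.0e7,
    major_radius=3.0,
    elongation=1.7,
    electron_density=5.0e19,
    effective_mass_number=2.0,
    inverse_aspect_ratio=0.3,
    energy_confinement_time=0.3,
)

n_columns = len(fields(JetConfinementSample))  # properties are not counted
print(n_columns, sample.minor_radius)
```

Keeping the units next to each field makes it harder to mix up, for example, amperes and mega-amperes when preparing inputs.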

Dataset property
Authors	Verdoolaege, G. (Ghent University), Kaye, S. M. (Princeton University), Angioni, C. (Max-Planck-Institut für Plasmaphysik), Kardaun, O. W. J. F. (Max-Planck-Institut für Plasmaphysik), Maslov, M. (United Kingdom Atomic Energy Authority), Romanelli, M. (United Kingdom Atomic Energy Authority), Ryter, F. (Max-Planck-Institut für Plasmaphysik), and Thomsen, K. (Max-Planck-Institut für Plasmaphysik).
Provenance	This dataset is a subset of JET experiment data from the ITPA Global H-mode Confinement Database. This dataset was published by Princeton Plasma Physics Laboratory, Princeton University. It was funded by the United States Department of Energy and the Euratom Research and Training Programme.
Copyright information	Creative Commons Attribution 4.0 International (CC BY)
Size	29160
Shape	(3240, 9)

We can see how the energy confinement time is distributed against the magnetic field strength if we plot the data as a scatterplot:

A scatterplot of the energy confinement time versus the magnetic field strength for JET experiments.