Physics Datasets

CUORE
This dataset contains 10,000 triggered events from the CUORE (Cryogenic Underground Observatory for Rare Events) experiment. Each event is a 10-s waveform that begins with a 3-s pretrigger baseline. CUORE digitizes its detector readout at 1 kHz, so each event contains 10,000 samples. The dataset includes two classes of CUORE detector pulses: single-pulse events and pile-up events, which contain two or more pulses. The dataset is evenly split between the two classes. 90% of the dataset is labeled for training; the remaining 10% is unlabeled and reserved for testing.
- Input Data: 1D array with 10000 samples (Waveform), eventId
- Labels: 1 Binary Classification label (clean single-pulse event: 0, pile-up pulse event: 1)
- Format: HDF5
Note: The waveforms stored in the HDF5 files are raw and have not been normalized. The provided normalization parameters allow normalization to be applied during data loading.
Download Dataset (Zenodo)
Read Description under cuore directory on NERSC
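
The sample counts and the normalization note above can be sketched in a few lines. This is a minimal sketch under assumptions, not the loader shipped with the dataset: the description does not say what form the provided normalization parameters take, so the `mean` and `std` arguments below are hypothetical stand-ins, and both function names are illustrative.

```python
import numpy as np

SAMPLING_RATE_HZ = 1_000   # digitization rate stated in the description
EVENT_LENGTH_S = 10        # each event is a 10-s waveform
PRETRIGGER_S = 3           # the first 3 s are pretrigger baseline

N_SAMPLES = SAMPLING_RATE_HZ * EVENT_LENGTH_S    # 10,000 samples per event
N_PRETRIGGER = SAMPLING_RATE_HZ * PRETRIGGER_S   # 3,000 baseline samples


def pretrigger_baseline(waveform):
    """Mean of the pretrigger region along the last (time) axis."""
    waveform = np.asarray(waveform, dtype=np.float64)
    return waveform[..., :N_PRETRIGGER].mean(axis=-1)


def normalize_waveform(waveform, mean, std):
    """Scale a raw waveform as (x - mean) / std.

    ASSUMPTION: `mean` and `std` stand in for the dataset's provided
    normalization parameters, whose actual form is not documented here.
    """
    waveform = np.asarray(waveform, dtype=np.float64)
    if waveform.shape[-1] != N_SAMPLES:
        raise ValueError(f"expected {N_SAMPLES} samples, got {waveform.shape[-1]}")
    return (waveform - mean) / std
```

Applying the scaling inside the data loader, rather than rewriting the HDF5 files, keeps the raw waveforms available for other preprocessing choices.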
KLZ
This dataset has been approved for use for the 0νββ AI Summer School but not for public release. Students interested in analyzing this dataset will receive details and access during the summer school.
MJD
This dataset contains over 3 million data points derived from the MAJORANA DEMONSTRATOR (MJD) experiment, which utilizes High Purity Germanium Detectors to search for Neutrinoless Double-Beta Decay. Each data point represents a real time-series waveform generated by the detector.
- Input Data: 1D Numpy vector with 4000 samples (Time Series)
- Labels: 4 Binary Classification labels + 1 Energy Regression label
- Format: HDF5
Download Dataset (Zenodo)
Read Description (arXiv)
nEXO
This dataset contains 83.5M simulated double beta decay events as recorded by the charge collectors of a single-phase liquid xenon TPC. The simulation follows the approximate design specifications of the nEXO experiment, which seeks to measure neutrinoless double beta (0νββ) decay. The charge collection portion of nEXO consists of square tiles embedded with electrodes, arranged as 16 by 16 channels in perpendicular directions so that charge signals can be reconstructed into position information. Here, events are confined to a single tile, so each recorded event is an array of 32 channels of 1D charge waveform data within a window of 1377 time steps of 0.5 microseconds each. A single event appears in both sets of perpendicular channels, because both record the charge. Each event has a clean (“noiseless”) and a noisy version, along with an energy value in MeV.
- Input Data: EArrays (convertible to Numpy and Pandas; code will be provided): clean event, noisy event, energy
- Labels: clean event (denoiser_img_clean), noisy event (denoiser_img), energy
- Format: HDF5
Download Dataset (Zenodo)
Read Description (arXiv)
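
The event layout described above (32 channels by 1377 time steps of 0.5 microseconds) can be illustrated as follows. This is a sketch under stated assumptions: it assumes the channel axis comes first and that the first 16 channels belong to one direction and the last 16 to the perpendicular one. Neither ordering is confirmed by the description, so check the dataset documentation before relying on it. The function names are illustrative.

```python
import numpy as np

N_CHANNELS = 32            # 16 + 16 perpendicular channels on one tile
N_PER_DIRECTION = 16
N_TIME_STEPS = 1377
TIME_STEP_US = 0.5         # microseconds per time step

WINDOW_US = N_TIME_STEPS * TIME_STEP_US   # 688.5 microseconds per event


def time_axis_us():
    """Start time of each time step, in microseconds."""
    return np.arange(N_TIME_STEPS) * TIME_STEP_US


def split_directions(event):
    """Split one event into its two sets of perpendicular channels.

    ASSUMPTION: shape is (channels, time) and channels 0-15 form one
    direction while channels 16-31 form the other.
    """
    event = np.asarray(event)
    if event.shape != (N_CHANNELS, N_TIME_STEPS):
        raise ValueError(f"expected shape {(N_CHANNELS, N_TIME_STEPS)}, got {event.shape}")
    return event[:N_PER_DIRECTION], event[N_PER_DIRECTION:]
```

The same split applies to both the clean (`denoiser_img_clean`) and noisy (`denoiser_img`) versions of an event, since they share one layout.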
NEXT
This dataset contains a simulation of ~650k signal (0νββ) and ~500k background (Bi-214) events inside the NEXT detector, which uses a high-pressure gaseous Time Projection Chamber to search for Neutrinoless Double-Beta Decay. Each unique event represents the voxelized spatial positions (x, y, z) and energy depositions made by an electron or gamma ray in the xenon gas medium. Signal events contain a two-electron signature, while background events contain a single-electron signature originating from a gamma interaction.
- Input Data: Pandas dataframe containing columns with event_id, x, y, z, energy
- Labels: 1 Binary Classification label (signal or background)
- Format: HDF5
Note: The full dataset is fairly large. We recommend starting with 10% of it (e.g. 0nubb_part_1.tar + Bi_part_1.tar) before training on the full dataset.
Download Dataset (Zenodo)
Read Description (arXiv)
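
Because each event spans many voxel rows that share one `event_id`, a common first step is to aggregate rows per event. The sketch below sums the deposited energy per event using NumPy arrays taken from the `event_id` and `energy` columns. The function name is illustrative and this is not part of the dataset's own tooling. With the columns in a Pandas dataframe `df`, `df.groupby("event_id")["energy"].sum()` would give an equivalent result.

```python
import numpy as np


def total_energy_per_event(event_id, energy):
    """Sum voxel energies for each distinct event_id.

    Returns (unique_ids, totals), with unique_ids sorted ascending and
    totals[i] the summed energy of all voxels with id unique_ids[i].
    """
    event_id = np.asarray(event_id).ravel()
    energy = np.asarray(energy, dtype=np.float64).ravel()
    if event_id.shape != energy.shape:
        raise ValueError("event_id and energy must have the same length")
    # `inverse` maps every voxel row to the index of its event in unique_ids.
    unique_ids, inverse = np.unique(event_id, return_inverse=True)
    totals = np.bincount(inverse, weights=energy, minlength=len(unique_ids))
    return unique_ids, totals
```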
Project 8
This dataset contains ~10,000 simulated Cyclotron Radiation Emission Spectroscopy (CRES) events from the Project 8 experiment. Each event is a two-dimensional time–frequency spectrogram of shape (1024, 512) (1024 frequency bins × 512 time bins), in which a β-decay electron appears as one or more narrow, positively sloped tracks separated by frequency jumps from scattering. The events are simulated cavity CRES signals with electron energies near the tritium β-decay endpoint (~18.6 keV). Each spectrogram is accompanied by a pixel-level segmentation label (track vs. background) and a per-pixel weight map that addresses the severe class imbalance, since track pixels make up well under 1% of each frame. White Gaussian noise is added to otherwise signal-only spectrograms at a configured signal-to-noise ratio. The dataset is split 90%/10%: the 90% training portion is fully labeled, while the 10% test portion has its labels and weight maps removed and is reserved for evaluation.
- Input Data: 2D time–frequency spectrogram of shape (1024, 512) per event, ~10,000 events (spectrogram), weight, eventId, energy, pitch
- Labels: 1 per-pixel binary segmentation mask (noise/empty time–frequency bins: 0, bins belonging to an electron's cyclotron-radiation track: 1)
- Format: HDF5
Note: Arrays are index-aligned across all datasets. The train file contains all six datasets; the test file contains only spectrogram, event_id, energy, and pitch.
Dataset and description can be found in the p8 directory on NERSC.
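
One way a per-pixel weight map can offset the class imbalance described above is to use it as a multiplicative weight in a binary cross-entropy loss. The sketch below shows that idea in NumPy. It assumes the weights are meant to be applied this way, which the description does not confirm, and the function name and `eps` parameter are illustrative.

```python
import numpy as np


def weighted_bce(pred_prob, label, weight, eps=1e-7):
    """Weighted mean binary cross-entropy over all pixels.

    pred_prob : predicted track probability per pixel, in [0, 1]
    label     : segmentation mask (0 = background, 1 = track)
    weight    : per-pixel weight map (ASSUMED to multiply the loss)
    """
    p = np.clip(np.asarray(pred_prob, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(label, dtype=np.float64)
    w = np.asarray(weight, dtype=np.float64)
    per_pixel = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    # Normalize by the total weight so the loss scale does not depend on it.
    return float((w * per_pixel).sum() / w.sum())
```

With track pixels well under 1% of a frame, an unweighted loss is dominated by background pixels. Larger weights on track pixels raise their share of the total.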
SuperNEMO
The dataset contains simulated data from the SuperNEMO Demonstrator, a detector equipped with electron trajectory reconstruction technology. It captures simulations of four distinct physical processes occurring in the source foil. The signal process is neutrinoless double beta (0νββ) decay in ⁸²Se. The background processes are two-neutrino double beta (2νββ) decay in ⁸²Se, decay of the foil contaminant ²⁰⁸Tl, and decay of the foil contaminant ²¹⁴Bi.
- Input Data: Numpy array containing columns with ev_no, E1, E2, tX, tY, tZ, tR, dY, dZ, theta, phiR, phiS
- Labels: 1 Multiclass Classification label (0nubb, 2nubb, Bi214, Tl208)
- Format: HDF5
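
For the multiclass label listed above, class names are usually mapped to integer indices before training. The sketch below shows one such encoding, together with a helper that drops the `ev_no` bookkeeping column from the feature table. The index order, the column order, and the function names are all assumptions for illustration. The dataset's own encoding may differ.

```python
import numpy as np

# ASSUMPTION: class order is illustrative, not taken from the dataset files.
CLASS_NAMES = ("0nubb", "2nubb", "Bi214", "Tl208")
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}

# Column names as listed in the description; the order is assumed to match.
FEATURE_COLUMNS = ("ev_no", "E1", "E2", "tX", "tY", "tZ", "tR",
                   "dY", "dZ", "theta", "phiR", "phiS")


def encode_labels(names):
    """Map class-name strings to integer class indices."""
    return np.array([CLASS_TO_INDEX[n] for n in names], dtype=np.int64)


def one_hot(indices):
    """Convert integer class indices to one-hot rows."""
    return np.eye(len(CLASS_NAMES))[np.asarray(indices)]


def feature_matrix(table, drop=("ev_no",)):
    """Keep every column of a (rows, 12) array except those in `drop`."""
    keep = [i for i, col in enumerate(FEATURE_COLUMNS) if col not in drop]
    return np.asarray(table)[:, keep]
```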