Scaling IMAS performance

This example shows a scaling performance study for manipulating OMAS data in hierarchical or tensor format.

The hierarchical organization of the IMAS data structure can in some situations hinder IMAS's ability to efficiently manipulate large data sets. This contrasts with the multidimensional-array (i.e. tensor) approach that is commonly used in computer science for high-performance numerical calculations.

Based on this observation, OMAS implements a transformation that casts the data contained in the IMAS hierarchical structure as a list of tensors, taking advantage of the homogeneity of grid sizes that is commonly found across arrays of structures. This transformation and a summary of the scaling results are illustrated here for a hypothetical IDS whose data is organized as a series of time slices:

[Figure: OMAS implements a transformation that casts the data contained in the IMAS hierarchical structure as a list of tensors]
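As a conceptual illustration only (not the actual OMAS implementation), the sketch below collects the 1D profiles stored in a series of hypothetical time-slice structures into a single 2D tensor, which is possible precisely because every slice shares the same grid size:

import numpy

# hypothetical hierarchical data: one structure per time slice,
# each holding a 1D profile defined on a grid of the same size
n_times, n_psi = 5, 33
slices = [{'profiles_1d': {'psi': (k + 1) * numpy.linspace(0, 1, n_psi)}} for k in range(n_times)]

# because the grid size is homogeneous across the array of structures,
# the per-slice 1D arrays can be stacked into a single (time, psi) tensor
psi_tensor = numpy.array([s['profiles_1d']['psi'] for s in slices])
print(psi_tensor.shape)  # (5, 33)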

The favorable scaling observed when representing IMAS data in tensor form makes a strong case for adopting it. Implementing the same scheme in the IMAS backend data storage and in-memory representation would likely yield large performance gains in many real-world applications.

The tensor representation would also greatly simplify the integration of IMAS with the broad range of tools and numerical libraries that are commonly used across many fields of science.

Finally, adding an extra dimension to the tensors could be used to efficiently store multiple realizations of signals drawn from the distribution of uncertain quantities. Such a feature would enable support for uncertainty quantification workflows and Bayesian integrated data analysis within IMAS.
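A minimal sketch of this idea, using a hypothetical signal rather than actual IMAS data, is to prepend a realization axis to the tensor so that uncertainty statistics become simple reductions over that axis:

import numpy

# hypothetical uncertain signal sampled at 10 time points
n_realizations, n_time = 1000, 10
nominal = numpy.sin(numpy.linspace(0.0, 2.0 * numpy.pi, n_time))

# prepend a realization axis: shape becomes (realization, time)
realizations = nominal[numpy.newaxis, :] + numpy.random.normal(0.0, 0.1, (n_realizations, n_time))

# uncertainty quantification then reduces to operations over axis 0
mean = realizations.mean(axis=0)
std = realizations.std(axis=0)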

Scaling study in detail

OMAS can seamlessly use either the hierarchical or the tensor representation as the backend for storing data, both in memory and on file, and can transform from one format to the other. The mapping function is generic and can handle nested hierarchical lists of structures (not only in time). OMAS can also automatically determine which data can be collected across the hierarchical structure and which cannot, and handles both at the same time.
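For instance, the same functions exercised in the benchmark script below can be used to convert back and forth between the two representations and to save/load the tensor format (the file path here is purely illustrative):

from omas import *

ods = ODS()
ods.sample_equilibrium()                     # first sample time slice
for k in range(1, 3):
    ods.sample_equilibrium(time_index=k)     # add two more time slices

odx = ods_2_odx(ods)                         # hierarchical -> tensor representation
save_omas_dx(odx, '/tmp/omas_example.ds')    # store the tensor representation on file
odx = load_omas_dx('/tmp/omas_example.ds')   # read it back
ods = odx_2_ods(odx)                         # tensor -> hierarchical representation

# both representations accept the same access syntax for data collected across time slices
print(odx['equilibrium.time_slice.:.profiles_1d.psi'].shape)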

The following diagram summarizes the tests performed in this scaling study. Benchmarks show that most operations stemming from the hierarchical representation of the data scale linearly with the number of time slices in the sample IDS (red markers in the diagram), whereas operations that make use only of the tensor representation show little to no dependence on the dataset size (green markers in the diagram). As a result, the tensor representation can be several orders of magnitude faster than a hierarchical organization, even for datasets of modest size.

[Figure: OMAS can seamlessly use either hierarchical or tensor representations as the backend for storing data both in memory and on file, and transform from one format to the other]

Scaling plots and code used for the benchmark follow:

  • Read/Write
  • Mapping
  • Access
1/100
2/100
3/100
6/100
10/100
15/100
25/100
39/100
63/100
100/100
[1, 2, 3, 6, 10, 15, 25, 39, 63, 100]
{'HA': [0.00010418891906738281,
        8.480548858642578e-05,
        8.648236592610677e-05,
        9.101629257202148e-05,
        9.390830993652344e-05,
        0.00010384400685628257,
        8.572864532470703e-05,
        8.50518544514974e-05,
        8.57988993326823e-05,
        8.33871364593506e-05],
 'HB': [0.00015289783477783202,
        0.00018270015716552733,
        0.00026547908782958984,
        0.0005499839782714844,
        0.0008525609970092774,
        0.0011939048767089845,
        0.0019532442092895508,
        0.002906513214111328,
        0.004805064201354981,
        0.007996368408203124],
 'HM': [0.371168851852417,
        0.3298532962799072,
        0.3309476375579834,
        0.3515448570251465,
        0.3888132572174072,
        0.42308664321899414,
        0.44945836067199707,
        0.5318121910095215,
        0.7051033973693848,
        0.9257996082305908],
 'HR': [0.06508994102478027,
        0.08557367324829102,
        0.13012170791625977,
        0.3017001152038574,
        0.42336153984069824,
        0.6358833312988281,
        1.0139694213867188,
        1.5469889640808105,
        2.482639789581299,
        4.0948591232299805],
 'HS': [0.00017023086547851562,
        0.00019502639770507812,
        0.00030533472696940106,
        0.0005220969518025717,
        0.0008637666702270508,
        0.0012135982513427734,
        0.0019127464294433595,
        0.0029315520555545124,
        0.004766872950962612,
        0.007518811225891113],
 'HW': [0.09755444526672363,
        0.0544123649597168,
        0.07622385025024414,
        0.2165844440460205,
        0.2487044334411621,
        0.37813639640808105,
        0.5997560024261475,
        0.9178194999694824,
        1.5057709217071533,
        2.5562644004821777],
 'TA': [0.00010132789611816406,
        9.855031967163086e-05,
        9.171168009440103e-05,
        9.7199281056722e-05,
        0.0001003122329711914,
        9.734948476155599e-05,
        9.174919128417968e-05,
        9.124462421123798e-05,
        9.340823642791263e-05,
        8.955979347229003e-05],
 'TB': [6.377696990966797e-05,
        5.723237991333008e-05,
        5.4717063903808594e-05,
        5.730787913004557e-05,
        5.8898925781250005e-05,
        5.68548838297526e-05,
        5.253314971923828e-05,
        5.5655454977964736e-05,
        5.8775477939181855e-05,
        5.721902847290039e-05],
 'TM': [0.0245816707611084,
        0.03773951530456543,
        0.050867557525634766,
        0.09443163871765137,
        0.15196847915649414,
        0.22064709663391113,
        0.34643101692199707,
        0.5621933937072754,
        0.8826532363891602,
        1.4467198848724365],
 'TR': [0.11832022666931152,
        0.053864240646362305,
        0.04956936836242676,
        0.05408072471618652,
        0.053997039794921875,
        0.04836606979370117,
        0.04897594451904297,
        0.04874563217163086,
        0.05333685874938965,
        0.04897809028625488],
 'TS': [6.508827209472656e-05,
        6.37054443359375e-05,
        6.081263224283854e-05,
        6.168683369954427e-05,
        6.026029586791992e-05,
        5.866527557373047e-05,
        5.2299499511718746e-05,
        5.74490962884365e-05,
        6.139770386711e-05,
        5.6256771087646484e-05],
 'TW': [0.09149861335754395,
        0.07061767578125,
        0.06848764419555664,
        0.07145190238952637,
        0.0701601505279541,
        0.06984782218933105,
        0.06870722770690918,
        0.06549477577209473,
        0.0702970027923584,
        0.07221126556396484]}

import os
import time
from omas import *
import numpy
from pprint import pprint
from matplotlib import pyplot

ods = ODS()
ods.sample_equilibrium()

max_n = 100
max_samples = 11
stats_reps = 10
samples = numpy.unique(list(map(int, numpy.logspace(0, numpy.log10(max_n), max_samples)))).tolist()
max_samples = len(samples)

times = {}
for type in ['H', 'T']:  # hierarchical or tensor
    for action in ['R', 'W', 'M', 'A', 'S', 'B']:  # Read, Write, Mapping, Array access, Stripe access, Bulk access
        times[type + action] = []

try:
    __file__
except NameError:
    import inspect

    __file__ = inspect.getfile(lambda: None)
for n in samples:
    print('%d/%d' % (n, samples[-1]))

    # keep adding time slices to the data structure
    for k in range(len(ods['equilibrium.time_slice']), n):
        ods.sample_equilibrium(time_index=k)

    # hierarchical write to HDF5
    filename = omas_testdir(__file__) + '/tmp.h5'
    t0 = time.time()
    save_omas_h5(ods, filename)
    times['HW'].append(time.time() - t0)

    # hierarchical read from HDF5
    t0 = time.time()
    load_omas_h5(filename)
    times['HR'].append(time.time() - t0)

    # hierarchical access to individual array
    t0 = time.time()
    for k in range(stats_reps):
        for kk in range(n):
            ods['equilibrium.time_slice.%d.profiles_1d.psi' % kk]
    times['HA'].append((time.time() - t0) / n / float(stats_reps))

    # hierarchical slice across the data structure
    t0 = time.time()
    for kk in range(n):
        ods['equilibrium.time_slice.:.profiles_1d.psi'][:, 0]
    times['HS'].append((time.time() - t0) / n)

    # hierarchical bulk access to data
    t0 = time.time()
    for k in range(stats_reps):
        ods['equilibrium.time_slice.:.profiles_1d.psi']
    times['HB'].append((time.time() - t0) / float(stats_reps))

    # hierarchical mapping to tensor
    t0 = time.time()
    odx = ods_2_odx(ods)
    times['HM'].append(time.time() - t0)

    filename = omas_testdir(__file__) + '/tmp.ds'

    # tensor write to HDF5
    t0 = time.time()
    save_omas_dx(odx, filename)
    times['TW'].append(time.time() - t0)

    # tensor read from HDF5
    t0 = time.time()
    odx = load_omas_dx(filename)
    times['TR'].append(time.time() - t0)

    # tensor mapping to hierarchical
    t0 = time.time()
    ods = odx_2_ods(odx)
    times['TM'].append(time.time() - t0)

    # tensor access to individual array
    t0 = time.time()
    for k in range(stats_reps):
        for kk in range(n):
            odx['equilibrium.time_slice.%d.profiles_1d.psi' % kk]
    times['TA'].append((time.time() - t0) / n / float(stats_reps))

    # tensor slice across the data structure
    t0 = time.time()
    for k in range(stats_reps):
        for kk in range(n):
            odx['equilibrium.time_slice.:.profiles_1d.psi'][:, 0]
    times['TS'].append((time.time() - t0) / n / float(stats_reps))

    # tensor bulk access to data
    t0 = time.time()
    for k in range(stats_reps):
        for kk in range(n):
            odx['equilibrium.time_slice.:.profiles_1d.psi']
    times['TB'].append((time.time() - t0) / n / float(stats_reps))

# print numbers to screen
print(samples)
pprint(times)

# plot read/write scaling
pyplot.figure()
for type in ['H', 'T']:  # hierarchical or tensor
    for action in ['R', 'W']:  # Read, Write
        pyplot.loglog(samples, times[type + action], label=type + action, lw=1.5, ls=['-', '--']['H' in type])
pyplot.xlabel('# of Equilibrium Time Slices')
pyplot.ylabel('Time [s]')
pyplot.legend(loc='upper left', frameon=False)
pyplot.title('Read/Write', y=0.85, x=0.7)

# plot mapping scaling
pyplot.figure()
for type in ['H', 'T']:  # hierarchical or tensor
    for action in ['M']:  # Mapping
        pyplot.loglog(samples, times[type + action], label=type + action, lw=1.5, ls=['-', '--']['H' in type])
pyplot.xlabel('# of Equilibrium Time Slices')
pyplot.ylabel('Time [s]')
pyplot.legend(loc='upper left', frameon=False)
pyplot.title('Mapping', y=0.85, x=0.7)

# plot access scaling
pyplot.figure()
for type in ['H', 'T']:  # hierarchical or tensor
    for action in ['A', 'S', 'B']:  # Array access, Stripe access, Bulk access
        pyplot.loglog(samples, times[type + action], label=type + action, lw=1.5, ls=['-', '--']['H' in type])
pyplot.xlabel('# of Equilibrium Time Slices')
pyplot.ylabel('Time [s]')
pyplot.legend(loc='upper left', frameon=False)
pyplot.title('Access', y=0.85, x=0.7)

pyplot.show()
