# Parallel processing¶

It is common to want to study the magnetic response of a device as a function one or more properties of the device, or as a function of one of the applied boundary conditions. The superscreen.solve_many() function provides a convenient interface for performing “sweeps” of this type, where the properies that can be swept include:

• The applied field $$H_{z,\,\mathrm{applied}}$$

• Any circulating currents associated with trapped flux

• The number and position of vortices trapped in the Device

• The effective penetration depth $$\Lambda(x, y)$$ (Lambda) or $$\lambda(x, y)$$ (london_lambda) of each Layer in the Device

• The vertical position $$z_0$$ of each Layer in the Device

The geometry of the Device parallel to the $$x-y$$ plane cannot be swept in solve_many() because changing the geometry requires re-generating the mesh.

:

%config InlineBackend.figure_formats = {"retina", "png"}
%matplotlib inline

import os
import logging

logging.basicConfig(level=logging.CRITICAL)

import ray
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (5, 4)
plt.rcParams["font.size"] = 10
from IPython.display import clear_output

import superscreen as sc
from superscreen.geometry import circle


## Introduction¶

The idea behind superscreen.solve_many() is to create a set of models based on a single Device and solve them all with a single command. A single model consists of the following:

• A superscreen.Parameter that calculates the applied magnetic field (if any).

• A dict of {hole_name: circulating_current} specifying the circulating currents in the Device (if any).

• A list of superscreen.Vortex objects representing vortices trapped in the Device.

Additionally, for each model the penetration depth and/or $$z$$-position of each layer in the Device can be modified by a layer_updater function that takes some update_kwargs such that

new_layer = layer_updater(old_layer, **update_kwargs)


In pseudocode, superscreen.solve_many() does the following:

# Simplified pseudocode for superscreen.solve_many().
# Given a Device, a layer_updater function, and a list of "models":
model_solutions = []
for model in models:
applied_field, circulating_currents, vortices, update_kwargs = model
device.layers_list = [layer_updater(layer, **update_kwargs) for layer in device.layers_list]
solutions = superscreen.solve(
device,
applied_field=applied_field,
circulating_currents=circulating_currents,
vortices=vortices,
...,
)
model_solutions.append(solutions)
# Save and/or return model_solutions.


There are a few reasons one would want to use superscreen.solve_many() instead of simply writing out a for loop:

1. solve_many() takes care of generating the list of models to solve.

2. solve_many() takes care of saving results in a convenient way.

3. The for loop in the pseudocode above can be parallelized to run across multiple CPUs or even multiple nodes in a computing cluster.

### Parallel methods¶

There are three different “parallel methods” that can be specified when calling superscreen.solve_many():

1. "serial" or None: All models are solved sequentially in a single Python process (a simple for loop as sketched above).

2. "multiprocessing" or "mp": Models are solved in parallel using at most ncpu processes, where ncpu is the number of CPU cores available. This option uses the multiprocessing package from the Python standard library.

3. "ray": Models are solved in parallel on multiple CPUs using the third-party distributed computing package, ray.

Both the "multiprocessing" and "ray" options are implemented such that a Device's “big” arrays, i.e. the mesh, Laplace operator, and kernel matrix, are placed in shared memory rather than copied to each worker process. The "multiprocessing" option is limited to work on a single machine (i.e. a laptop, workstation, or a single node in a compute cluster). The "ray" option can in principle also be used across multiple nodes in a cluster (see the ray documentation).

### Disclaimer¶

It is important to note that the relative performance of the three different “parallel methods” will in general depend on the specific models your are solving, and on the setup of your machine (how many CPU cores, what other programs are running, etc.). With both multiprocessing and ray, there is some overhead associated with placing arrays in shared memory and spinning up and managing new processes. This means that if each call to solve() (see the pseudocode above) takes only a few seconds, you are unlikely to see a big performance improvement relative to serial execution. In the limit that each call to solve() takes a long time relative to this overhead, you can in principle get a speedup of order ncpu, where ncpu is the number of CPUs avilable. We recommend comparing the three parallel methods in your own application before deciding which to use.

:

sc.version_table()

:

SoftwareVersion
SuperScreen0.8.0
Numpy1.23.1
SciPy1.8.1
matplotlib3.5.2
ray1.13.0
jax0.3.15
IPython8.4.0
Python3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0]
OSposix [linux]
Number of CPUsPhysical: 11, Logical: 11
BLAS InfoOPENBLAS
Wed Dec 14 18:00:59 2022 PST

## Example¶

Here we will demonstrate the use of superscreen.solve_many() by modeling the mutual inductance between two concentric rings as a function of both their penetration depth $$\Lambda$$ and the vertical distance $$\Delta z$$ between them.

### Setup¶

Define a function to create out Device with two concentric rings.

:

def two_rings(inner_radius=3, outer_radius=5, length_units="um"):
"""Returns a Device representing two concentric rings."""
layers = [
sc.Layer("layer0", Lambda=0, z0=0),
sc.Layer("layer1", Lambda=0, z0=1),
]
films = [
]
holes = [
]
abstract_regions = [
]
return sc.Device(
"two_rings",
layers=layers,
films=films,
holes=holes,
abstract_regions=abstract_regions,
length_units=length_units,
)

:

device = two_rings()
_ = device.draw(exclude="bounding_box", subplots=True) :

device.make_mesh(min_points=3000)


Recall that to model the mutual inductance $$M$$ between two holes, we specify a current $$I_\mathrm{circ}$$ to circulate around one of the holes, and calculate the fluxoid $$\Phi^f_S$$ induced in the other hole. The mutual inductance is then $$M=\Phi^f_S / I_\mathrm{circ}$$.

:

I_circ = device.ureg("1 mA")
circulating_currents = {"big_hole": str(I_circ)}

:

max_cpus = sc.parallel.cpu_count()
num_cpus =  + list(range(2, max_cpus, 2))


### Sweep $$\Lambda$$¶

:

def update_Lambda(layer, **Lambdas):
"""Updates a layer based on a dict Lambdas = {layer_name: layer_Lambda}."""
if layer.name in Lambdas:
layer.Lambda = Lambdas[layer.name]
return layer

def sweep_Lambda(Lambdas, ncpus, iterations=4, parallel_method=None):
"""Sweep Lambda using superscreen.solve_many()."""

layer_update_kwargs = [
{layer.name: Lambda for layer in device.layers.values()} for Lambda in Lambdas
]

solutions, paths = sc.solve_many(
device,
circulating_currents=circulating_currents,
layer_updater=update_Lambda,
layer_update_kwargs=layer_update_kwargs,
parallel_method=parallel_method,
num_cpus=ncpus,
iterations=iterations,
return_solutions=True,
keep_only_final_solution=True,
)
clear_output(wait=True)
return solutions

:

Lambdas = np.linspace(0, 0.95, 20)

:

serial_result = %timeit -o sweep_Lambda(Lambdas, 1, parallel_method=None)

55.3 s ± 185 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

:

ray_results = []

for ncpus in num_cpus:
ray.init(num_cpus=ncpus)
ray_result = %timeit -o sweep_Lambda(Lambdas, ncpus, parallel_method="ray")
ray_results.append(ray_result)
ray.shutdown()

7.29 s ± 213 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

:

mp_results = []

for ncpus in num_cpus:
mp_result = %timeit -o sweep_Lambda(Lambdas, ncpus, parallel_method="multiprocessing")
mp_results.append(mp_result)

28.6 s ± 909 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

:

fig, ax = plt.subplots()
for results, label in [(ray_results, "ray"), (mp_results, "multiprocessing")]:
xs = num_cpus
ys = [serial_result.average / result.average for result in results]
ax.plot(xs, ys, "o--", label=label)
ax.legend(loc=0)
ax.set_xlabel("Number of CPUs")
ax.set_ylabel("Speed-up vs. serial")
ax.grid(True) Calculcate and plot the mutual inductance $$M$$ for each Solution:

:

solutions = sweep_Lambda(Lambdas, num_cpus[-1], parallel_method="ray")
mutuals = []
for i, solution in enumerate(solutions):
print(i, end=" ")
fluxoid = sum(solution.hole_fluxoid("little_hole"))
mutuals.append((fluxoid / I_circ).to("pH").magnitude)

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

:

fig, ax = plt.subplots()
ax.plot(Lambdas, mutuals, "o-")
ax.grid(True)
ax.set_xlabel("$\\Lambda$ [$\\mu$m]")
ax.set_ylabel("Mutual inductance [pH]")
DeltaZ = device.layers["layer1"].z0 - device.layers["layer0"].z0
ax.set_title(f"$\\Delta z$ = {DeltaZ:.1f} $\\mu$m")

:

Text(0.5, 1.0, '$\\Delta z$ = 1.0 $\\mu$m') ### Sweep $$\Delta z$$¶

:

def update_height(layer, **heights):
if layer.name in heights:
layer.z0 = heights[layer.name]
return layer

def sweep_height(heights, ncpus, iterations=6, parallel_method=None):
"""Sweep \Delta z using superscreen.solve_many()."""
layer_update_kwargs = [dict(layer0=0, layer1=z0) for z0 in heights]
solutions, paths = sc.solve_many(
device,
circulating_currents=circulating_currents,
layer_updater=update_height,
layer_update_kwargs=layer_update_kwargs,
parallel_method=parallel_method,
num_cpus=ncpus,
iterations=iterations,
return_solutions=True,
keep_only_final_solution=True,
)
clear_output(wait=True)
return solutions

:

z0s = np.linspace(0.5, 5.5, 40)

# Set Lambda = 0.5 microns for both layers
Lambda = 0.5
for layer in device.layers.values():
layer.Lambda = Lambda

:

serial_result = %timeit -o sweep_height(z0s, 1, parallel_method=None)

2min 19s ± 618 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

:

ray_results = []

for ncpus in num_cpus:
ray.init(num_cpus=ncpus)
ray_result = %timeit -o sweep_height(z0s, ncpus, parallel_method="ray")
ray_results.append(ray_result)
ray.shutdown()

17.6 s ± 132 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

:

mp_results = []

for ncpus in num_cpus:
mp_result = %timeit -o sweep_height(z0s, ncpus, parallel_method="multiprocessing")
mp_results.append(mp_result)

1min 20s ± 1.54 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

:

fig, ax = plt.subplots()
for results, label in [(ray_results, "ray"), (mp_results, "multiprocessing")]:
xs = num_cpus
ys = [serial_result.average / result.average for result in results]
ax.plot(xs, ys, "o--", label=label)
ax.legend(loc=0)
ax.set_xlabel("Number of CPUs")
ax.set_ylabel("Speed-up vs. serial")
ax.grid(True) Calculcate and plot the mutual inductance $$M$$ for each Solution:

:

solutions = sweep_height(z0s, num_cpus[-1], parallel_method="multiprocessing")
mutuals = []
for i, solution in enumerate(solutions):
print(i, end=" ")
fluxoid = sum(solution.hole_fluxoid("little_hole"))
mutuals.append((fluxoid / I_circ).to("pH").magnitude)

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

:

fig, ax = plt.subplots()
ax.plot(z0s, mutuals, "o-")
ax.grid(True)
ax.set_xlabel("$\\Delta z$ [$\\mu$m]")
ax.set_ylabel("Mutual inductance [pH]")
ax.set_title(f"$\\Lambda$ = {Lambda:.1f} $\\mu$m")

:

Text(0.5, 1.0, '$\\Lambda$ = 0.5 $\\mu$m') [ ]: