Parallel processing¶
It is common to want to study the magnetic response of a device as a function of one or more properties of the device, or as a function of one of the applied boundary conditions. The superscreen.solve_many() function provides a convenient interface for performing “sweeps” of this type, where the properties that can be swept include:
- The applied field \(H_{z,\,\mathrm{applied}}\)
- Any circulating currents associated with trapped flux
- The number and position of vortices trapped in the Device
- The effective penetration depth \(\Lambda(x, y)\) (Lambda) or \(\lambda(x, y)\) (london_lambda) of each Layer in the Device
- The vertical position \(z_0\) of each Layer in the Device

The geometry of the Device parallel to the \(x-y\) plane cannot be swept in solve_many(), because changing the geometry requires re-generating the mesh.
[1]:
%config InlineBackend.figure_formats = {"retina", "png"}
%matplotlib inline
import os
import logging
os.environ["OPENBLAS_NUM_THREADS"] = "1"
logging.basicConfig(level=logging.CRITICAL)
import ray
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (5, 4)
plt.rcParams["font.size"] = 10
from IPython.display import clear_output
import superscreen as sc
from superscreen.geometry import circle
Introduction¶
The idea behind superscreen.solve_many() is to create a set of models based on a single Device and solve them all with a single command. A single model consists of the following:

- A superscreen.Parameter that calculates the applied magnetic field (if any).
- A dict of {hole_name: circulating_current} specifying the circulating currents in the Device (if any).
- A list of superscreen.Vortex objects representing vortices trapped in the Device.
Additionally, for each model the penetration depth and/or \(z\)-position of each Layer in the Device can be modified by a layer_updater function that takes some update_kwargs, such that
new_layer = layer_updater(old_layer, **update_kwargs)
In pseudocode, superscreen.solve_many() does the following:
# Simplified pseudocode for superscreen.solve_many().
# Given a Device, a layer_updater function, and a list of "models":
model_solutions = []
for model in models:
    applied_field, circulating_currents, vortices, update_kwargs = model
    device.layers_list = [
        layer_updater(layer, **update_kwargs) for layer in device.layers_list
    ]
    solutions = superscreen.solve(
        device,
        applied_field=applied_field,
        circulating_currents=circulating_currents,
        vortices=vortices,
        ...,
    )
    model_solutions.append(solutions)
# Save and/or return model_solutions.
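For example, a layer_updater that overrides the effective penetration depth of each Layer, together with a list of layer_update_kwargs defining one model per value of \(\Lambda\), might look like the sketch below. This is a minimal illustration only: the function name set_Lambda, the layer names, and the Lambda values are placeholders, but essentially the same pattern is used in the Example section of this notebook.

def set_Lambda(layer, **Lambdas):
    # Override layer.Lambda if this layer's name appears in the
    # update_kwargs for the current model; otherwise leave it unchanged.
    if layer.name in Lambdas:
        layer.Lambda = Lambdas[layer.name]
    return layer

# One dict of update_kwargs per model, e.g. sweeping Lambda for two layers
# named "layer0" and "layer1" (illustrative names and values):
layer_update_kwargs = [
    {"layer0": Lambda, "layer1": Lambda} for Lambda in np.linspace(0.1, 1.0, 10)
]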
There are a few reasons one would want to use superscreen.solve_many() instead of simply writing out a for loop:

- solve_many() takes care of generating the list of models to solve.
- solve_many() takes care of saving the results in a convenient way.
- The for loop in the pseudocode above can be parallelized to run across multiple CPUs, or even across multiple nodes in a computing cluster.
Parallel methods¶
There are three different “parallel methods” that can be specified when calling superscreen.solve_many():

- "serial" or None: All models are solved sequentially in a single Python process (a simple for loop, as sketched above).
- "multiprocessing" or "mp": Models are solved in parallel using at most ncpu processes, where ncpu is the number of CPU cores available. This option uses the multiprocessing package from the Python standard library.
- "ray": Models are solved in parallel on multiple CPUs using the third-party distributed computing package, ray.
Both the "multiprocessing" and "ray" options are implemented such that a Device's “big” arrays (i.e., the mesh, Laplace operator, and kernel matrix) are placed in shared memory rather than copied to each worker process. The "multiprocessing" option is limited to a single machine (i.e., a laptop, workstation, or a single node in a compute cluster). The "ray" option can in principle also be used across multiple nodes in a cluster (see the ray documentation).
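In terms of the API, the parallel method is selected via the parallel_method and num_cpus arguments of solve_many(). The sketch below shows the three variants using the same keyword arguments that appear in the timing example later in this notebook; it assumes device, circulating_currents, update_Lambda, and layer_update_kwargs are defined as they are there, and the value num_cpus=4 is just an example.

# A minimal sketch, assuming `device`, `circulating_currents`, `update_Lambda`,
# and `layer_update_kwargs` are defined as in the example below.
common_kwargs = dict(
    circulating_currents=circulating_currents,
    layer_updater=update_Lambda,
    layer_update_kwargs=layer_update_kwargs,
    return_solutions=True,
    keep_only_final_solution=True,
)

# Serial: a plain for loop in the current Python process.
solutions, paths = sc.solve_many(device, parallel_method=None, **common_kwargs)

# multiprocessing: up to 4 worker processes on this machine.
solutions, paths = sc.solve_many(device, parallel_method="mp", num_cpus=4, **common_kwargs)

# ray: initialize ray before calling solve_many(), then shut it down afterwards.
ray.init(num_cpus=4)
solutions, paths = sc.solve_many(device, parallel_method="ray", num_cpus=4, **common_kwargs)
ray.shutdown()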
Disclaimer¶
It is important to note that the relative performance of the three “parallel methods” will in general depend on the specific models you are solving and on the setup of your machine (how many CPU cores it has, what other programs are running, etc.). With both multiprocessing and ray, there is some overhead associated with placing arrays in shared memory and with spinning up and managing new processes. This means that if each call to solve() (see the pseudocode above) takes only a few seconds, you are unlikely to see a big performance improvement relative to serial execution. In the limit that each call to solve() takes a long time relative to this overhead, you can in principle get a speedup of order ncpu, where ncpu is the number of CPUs available. We recommend comparing the three parallel methods in your own application before deciding which to use.
[2]:
sc.version_table()
[2]:
Software | Version |
---|---|
SuperScreen | 0.8.0 |
Numpy | 1.23.1 |
SciPy | 1.8.1 |
matplotlib | 3.5.2 |
ray | 1.13.0 |
jax | 0.3.15 |
IPython | 8.4.0 |
Python | 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0] |
OS | posix [linux] |
Number of CPUs | Physical: 11, Logical: 11 |
BLAS Info | OPENBLAS |
Wed Dec 14 18:00:59 2022 PST |
Example¶
Here we will demonstrate the use of superscreen.solve_many()
by modeling the mutual inductance between two concentric rings as a function of both their penetration depth \(\Lambda\) and the vertical distance \(\Delta z\) between them.
Setup¶
Define a function to create our Device with two concentric rings.
[3]:
def two_rings(inner_radius=3, outer_radius=5, length_units="um"):
    """Returns a Device representing two concentric rings."""
    layers = [
        sc.Layer("layer0", Lambda=0, z0=0),
        sc.Layer("layer1", Lambda=0, z0=1),
    ]
    films = [
        sc.Polygon("big_ring", layer="layer0", points=circle(1.5 * outer_radius)),
        sc.Polygon("little_ring", layer="layer1", points=circle(outer_radius)),
    ]
    holes = [
        sc.Polygon("big_hole", layer="layer0", points=circle(1.5 * inner_radius)),
        sc.Polygon("little_hole", layer="layer1", points=circle(inner_radius)),
    ]
    abstract_regions = [
        sc.Polygon("bounding_box", layer="layer0", points=circle(2 * outer_radius)),
    ]
    return sc.Device(
        "two_rings",
        layers=layers,
        films=films,
        holes=holes,
        abstract_regions=abstract_regions,
        length_units=length_units,
    )
[4]:
device = two_rings()
_ = device.draw(exclude="bounding_box", subplots=True)

[5]:
device.make_mesh(min_points=3000)
Recall that to model the mutual inductance \(M\) between two holes, we specify a current \(I_\mathrm{circ}\) to circulate around one of the holes, and calculate the fluxoid \(\Phi^f_S\) induced in the other hole. The mutual inductance is then \(M=\Phi^f_S / I_\mathrm{circ}\).
[6]:
I_circ = device.ureg("1 mA")
circulating_currents = {"big_hole": str(I_circ)}
[7]:
max_cpus = sc.parallel.cpu_count()
num_cpus = [1] + list(range(2, max_cpus, 2))
Sweep \(\Lambda\)¶
[8]:
def update_Lambda(layer, **Lambdas):
    """Updates a layer based on a dict Lambdas = {layer_name: layer_Lambda}."""
    if layer.name in Lambdas:
        layer.Lambda = Lambdas[layer.name]
    return layer


def sweep_Lambda(Lambdas, ncpus, iterations=4, parallel_method=None):
    """Sweep Lambda using superscreen.solve_many()."""
    layer_update_kwargs = [
        {layer.name: Lambda for layer in device.layers.values()} for Lambda in Lambdas
    ]
    solutions, paths = sc.solve_many(
        device,
        circulating_currents=circulating_currents,
        layer_updater=update_Lambda,
        layer_update_kwargs=layer_update_kwargs,
        parallel_method=parallel_method,
        num_cpus=ncpus,
        iterations=iterations,
        return_solutions=True,
        keep_only_final_solution=True,
    )
    clear_output(wait=True)
    return solutions
[9]:
Lambdas = np.linspace(0, 0.95, 20)
[10]:
serial_result = %timeit -o sweep_Lambda(Lambdas, 1, parallel_method=None)
55.3 s ± 185 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[11]:
ray_results = []
for ncpus in num_cpus:
    ray.init(num_cpus=ncpus)
    ray_result = %timeit -o sweep_Lambda(Lambdas, ncpus, parallel_method="ray")
    ray_results.append(ray_result)
    ray.shutdown()
7.29 s ± 213 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[12]:
mp_results = []
for ncpus in num_cpus:
    mp_result = %timeit -o sweep_Lambda(Lambdas, ncpus, parallel_method="multiprocessing")
    mp_results.append(mp_result)
28.6 s ± 909 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[13]:
fig, ax = plt.subplots()
for results, label in [(ray_results, "ray"), (mp_results, "multiprocessing")]:
    xs = num_cpus
    ys = [serial_result.average / result.average for result in results]
    ax.plot(xs, ys, "o--", label=label)
ax.legend(loc=0)
ax.set_xlabel("Number of CPUs")
ax.set_ylabel("Speed-up vs. serial")
ax.grid(True)

Calculate and plot the mutual inductance \(M\) for each Solution:
[14]:
solutions = sweep_Lambda(Lambdas, num_cpus[-1], parallel_method="ray")
mutuals = []
for i, solution in enumerate(solutions):
    print(i, end=" ")
    fluxoid = sum(solution.hole_fluxoid("little_hole"))
    mutuals.append((fluxoid / I_circ).to("pH").magnitude)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
[15]:
fig, ax = plt.subplots()
ax.plot(Lambdas, mutuals, "o-")
ax.grid(True)
ax.set_xlabel("$\\Lambda$ [$\\mu$m]")
ax.set_ylabel("Mutual inductance [pH]")
DeltaZ = device.layers["layer1"].z0 - device.layers["layer0"].z0
ax.set_title(f"$\\Delta z$ = {DeltaZ:.1f} $\\mu$m")
[15]:
Text(0.5, 1.0, '$\\Delta z$ = 1.0 $\\mu$m')

Sweep \(\Delta z\)¶
[16]:
def update_height(layer, **heights):
    if layer.name in heights:
        layer.z0 = heights[layer.name]
    return layer


def sweep_height(heights, ncpus, iterations=6, parallel_method=None):
    r"""Sweep \Delta z using superscreen.solve_many()."""
    layer_update_kwargs = [dict(layer0=0, layer1=z0) for z0 in heights]
    solutions, paths = sc.solve_many(
        device,
        circulating_currents=circulating_currents,
        layer_updater=update_height,
        layer_update_kwargs=layer_update_kwargs,
        parallel_method=parallel_method,
        num_cpus=ncpus,
        iterations=iterations,
        return_solutions=True,
        keep_only_final_solution=True,
    )
    clear_output(wait=True)
    return solutions
[17]:
z0s = np.linspace(0.5, 5.5, 40)
# Set Lambda = 0.5 microns for both layers
Lambda = 0.5
for layer in device.layers.values():
    layer.Lambda = Lambda
[18]:
serial_result = %timeit -o sweep_height(z0s, 1, parallel_method=None)
2min 19s ± 618 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[19]:
ray_results = []
for ncpus in num_cpus:
    ray.init(num_cpus=ncpus)
    ray_result = %timeit -o sweep_height(z0s, ncpus, parallel_method="ray")
    ray_results.append(ray_result)
    ray.shutdown()
17.6 s ± 132 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[20]:
mp_results = []
for ncpus in num_cpus:
    mp_result = %timeit -o sweep_height(z0s, ncpus, parallel_method="multiprocessing")
    mp_results.append(mp_result)
1min 20s ± 1.54 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
[21]:
fig, ax = plt.subplots()
for results, label in [(ray_results, "ray"), (mp_results, "multiprocessing")]:
    xs = num_cpus
    ys = [serial_result.average / result.average for result in results]
    ax.plot(xs, ys, "o--", label=label)
ax.legend(loc=0)
ax.set_xlabel("Number of CPUs")
ax.set_ylabel("Speed-up vs. serial")
ax.grid(True)

Calculate and plot the mutual inductance \(M\) for each Solution:
[22]:
solutions = sweep_height(z0s, num_cpus[-1], parallel_method="multiprocessing")
mutuals = []
for i, solution in enumerate(solutions):
    print(i, end=" ")
    fluxoid = sum(solution.hole_fluxoid("little_hole"))
    mutuals.append((fluxoid / I_circ).to("pH").magnitude)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
[23]:
fig, ax = plt.subplots()
ax.plot(z0s, mutuals, "o-")
ax.grid(True)
ax.set_xlabel("$\\Delta z$ [$\\mu$m]")
ax.set_ylabel("Mutual inductance [pH]")
ax.set_title(f"$\\Lambda$ = {Lambda:.1f} $\\mu$m")
[23]:
Text(0.5, 1.0, '$\\Lambda$ = 0.5 $\\mu$m')
