arXiv:2312.04327v1 [cs.LG] 7 Dec 2023

Learning to sample in Cartesian MRI

Thesis No. 9981, presented on 3 June 2022
Faculté des sciences et techniques de l'ingénieur
Laboratoire de systèmes d'information et d'inférence
Programme doctoral en informatique et communications
For the degree of Docteur ès Sciences by Thomas Sanchez

Accepted on the proposal of the jury:
Prof. Alexandre Alahi, jury president
Prof. Volkan Cevher, thesis director
Prof. Dimitri van de Ville, examiner
Dr. Ruud van Heeswijk, examiner
Dr. Philippe Ciuciu, examiner

On God rests my salvation and my glory; my mighty rock, my refuge is God. — The Bible, Psalm 62:8

Acknowledgements

This thesis was a journey that would have been impossible without the help and support of many people. First, I would like to thank my advisor Volkan Cevher. He never gave up on me despite some tumultuous steps, and fought for me to reach the end of my thesis. Few would have done this. I am also thankful that he let me pursue my research interests freely, and I am grateful to him for having built an amazing group with colleagues who are both brilliant researchers and great friends.

Second, I would like to express my gratitude to Prof. Alexandre Alahi, Prof. Dimitri van de Ville, Dr. Ruud van Heeswijk and Dr. Philippe Ciuciu for being part of my thesis committee and for the interest that they took in reading and evaluating my thesis, as well as for their useful suggestions. I would like to thank in particular Dr. Ruud van Heeswijk and Dr. Philippe Ciuciu for all the insight that they gave me about MRI, and Dr. Philippe Ciuciu for hosting me at Neurospin, despite the visit being shortened by the start of a pandemic. I have very good memories of my short time there.

My thanks then naturally go to my friends and colleagues, who supported me during these four years and made them unforgettable. I can never repay them enough for their kindness. Thank you, Baran, Paul, Ya-Ping, Ahmet, Fatih, Ali K., Kamal, Thomas, Fabian, Leello, Pedro, Igor, Chaehwan, Maria, Arda, Armin, Yura, Nadav, Grigoris, Ali R., Andrej, Luca, Stratis, Kimon, Fanghui, and all the others for your friendship and the moments that we have shared. Of course, a couple of special mentions are in order. I want to thank Baran for introducing me to the world of MRI, and teaching me much about research and life. I want to thank Igor for the close collaboration that we had throughout my PhD, for putting up with me through manifold argumentative and extended meetings, pesky bugs and seemingly incomprehensible experimental results, and for being so patient and supportive. I am also thankful for his healthy German honesty and for the good friend he is. I also want to thank Leello for being one of the most fun people on this planet – I don't think that anyone made me laugh more than him – for his optimistic outlook on life, for his practical wisdom and overall for being such an amazing friend. Thank you for bringing joy and friendship to everyone around you.

Of course, I ought to thank Gosia, the secretary of our lab, for her amazing work and efficiency in every practical matter. I suspect that she is the reason that the lab has not yet collapsed into chaos. I am also grateful to all my friends for their support, in particular to Clément and Nathan Dupertuis, Peter Lecomte, Marion Conus and Christian Cuendet, as well as everyone from church, from the house group and from GBU.
They are too numerous to be listed individually, and I feel very fortunate to be surrounded by so many kind and generous people.

Of course, without the support from my family, this thesis would not have been possible. I have been blessed by the unconditional love and encouragement from my parents, Esther and David, from my sister Myriam and her fiancé Sam, and from my brother Nicolas. I also cannot thank enough my parents-in-law, Karin and Jean-Paul.

Finally, my words turn to my best friend and amazing wife, Aline. Thank you for who you are. Thank you for your love, your support and unfailing encouragement. Without you, I could not have completed this thesis. You make every day brighter and push me to become a better man.

Lausanne, May 12, 2022
T. S.

Abstract

Magnetic Resonance Imaging (MRI) is a non-invasive, non-ionizing imaging modality with unmatched soft tissue contrast. However, compared to imaging methods like X-ray radiography, MRI suffers from long scanning times, due to its inherently sequential acquisition procedure. Shortening scanning times is crucial in clinical settings, as it increases patient comfort, decreases examination costs and improves throughput. Recent developments thanks to compressed sensing (CS) and, lately, deep learning make it possible to reconstruct high-quality images from undersampled images, and have the potential to greatly accelerate MRI. Many algorithms have been proposed in the context of reconstruction, but comparatively little work has been done to find acquisition trajectories that optimize the quality of the reconstructed image downstream. Although this problem has gained attention in recent years, it is still unclear what the best approach to designing acquisition trajectories in Cartesian MRI is.

In this thesis, we aim at contributing to this problem along two complementary directions. First, we provide novel methodological contributions to this problem. We first propose two algorithms that drastically improve the greedy learning-based compressed sensing (LBCS) approach of Gözcü et al. (2018). These two algorithms, called lazy LBCS and stochastic LBCS, scale to large, clinically relevant problems such as multi-coil 3D and dynamic MRI that were inaccessible to LBCS. We also show that generative adversarial networks (GANs), used to model the posterior distribution in inverse problems, provide a natural criterion for adaptive sampling by leveraging their variance in the measurement domain to guide the acquisition procedure.

Secondly, we aim at deepening the understanding of the kind of structures or assumptions that enable mask design algorithms to perform well in practice. In particular, our experiments show that state-of-the-art approaches based on deep reinforcement learning (RL), which have the ability to adapt trajectories on the fly to the patient and to perform long-horizon planning, bring at best a marginal improvement over stochastic LBCS, which is neither adaptive nor performs long-term planning. Overall, our results suggest that methods like stochastic LBCS offer promising alternatives to deep RL. They shine in particular by their scalability and computational efficiency and could be key in the deployment of optimized acquisition trajectories in Cartesian MRI.
Keywords: magnetic resonance imaging, experiment design, inverse problems, compressed sensing, reinforcement learning, deep learning, generative adversarial networks iv Résumé L’imagerie par résonance magnétique (IRM) est une modalité d’imagerie non invasive, non ionisante et offrant un contraste inégalé des tissus mous. Cependant, en raison de procédures d’acquisition inhéremment séquentielles, la vitesse d’acquisition en IRM est lente comparé à des méthodes telles que la radiographie par rayons X. La réduction des temps d’acquisition est cruciale en milieu clinique, car elle profite au confort du patient, diminue les coûts d’examen et améliore le rendement. Les innovations récentes grâce à l’acquisition comprimée (CS) et dernièrement l’apprentissage profond permettent de reconstruire des images de haute qualité à partir d’images sous-échantillonnées et ont le potentiel de grandement accélérer l’IRM. De nombreux algorithmes ont été proposés dans le contexte de la reconstruction, mais comparativement peu de travaux ont été réalisés afin de trouver des trajectoires d’acquisition qui optimisent la qualité de l’image reconstruite en aval. Bien que ces dernières années, ce problème ait gagné en attention, il n’est toujours pas clair quelle est la meilleure approche pour concevoir des trajectoires d’acquisition en IRM cartésienne. Dans cette thèse, nous visons à contribuer à ce problème selon deux directions complémentaires. Premièrement, nous proposons de nouvelles contributions méthodologiques à ce problème. Nous proposons tout d’abord deux algorithmes qui améliorent considérablement l’approche gloutonne d’acquisition comprimée basée sur l’apprentissage (LBCS) de (Gözcü et al., 2018). Ces deux algorithmes, appelés LBCS paresseux et LBCS stochastique, s’étendent à des problèmes importants et cliniquement pertinents, tels que l’IRM parallèle 3D et dynamique, qui étaient inaccessibles à LBCS. Nous montrons également que les réseaux antagonistes génératifs (GAN), utilisés pour modéliser la distribution a posteriori dans les problèmes inverses, fournissent un critère naturel pour l’acquisition adaptative en utilisant leur variance dans le domaine de la mesure pour guider la procédure d’acquisition. Deuxièmement, nous cherchons à approfondir la compréhension du type de structures ou d’hypothèses qui permettent aux algorithmes de conception de masques d’être performants en pratique. En particulier, nos expériences montrent que les approches de pointe basées sur l’apprentissage par renforcement (RL) profond, qui ont la capacité v Résumé d’adapter leurs trajectoires au patient à la volée et d’effectuer une planification à long terme, apportent au mieux une amélioration marginale par rapport à LBCS stochastique, qui n’est ni adaptatif ni ne fait pas de planification à long terme. Dans l’ensemble, nos résultats suggèrent que des méthodes comme LBCS stochastique offrent des alternatives prometteuses au RL profond. Elles brillent notamment par leur extensibilité et leur efficacité de calcul et pourraient être déterminantes dans le déploiement de trajectoires d’acquisition optimisées en IRM cartésienne. Mots-clé : imagerie par résonance magnétique, planification d’expériences, problèmes inverses, acquisition comprimée, apprentissage par renforcement, apprentissage profond, réseaux antagonistes génératifs. vi Contents Acknowledgements i Abstract (English/Français) iii 1 Introduction 1 2 MRI Background 2.1 MRI physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.1 Magnetization . . 
2.1.2 Excitation
2.1.3 Relaxation
2.1.4 Image acquisition
2.1.5 Parallel imaging
2.1.6 Dynamic MRI
2.2 Mathematical description
2.2.1 Accelerated MRI
3 Reconstruction methods
3.1 Model-driven approaches
3.1.1 Compressed Sensing
3.2 Data-driven approaches
3.2.1 Dictionary learning
3.2.2 Unrolled neural networks
3.2.3 Deep image prior
3.2.4 A Bayesian interpretation of reconstruction
3.2.5 Generative approaches
3.3 Trade-offs
4 Optimizing sampling patterns
4.1 A taxonomy of mask design methods
4.2 Compressed sensing and MRI: Model-based, model-driven approaches
4.2.1 The contribution of Lustig et al. (2007)
4.2.2 Model-based, model-driven sampling in the sequel
4.3 Towards learning-based, data-driven approaches
4.4 A matter of performance: how do these methods compare to each other?
4.5 The advent of deep learning: Learning-based, data-driven approaches
4.6 Formalizing learning-based, data-driven mask design
4.6.1 Non-adaptive sampling masks
4.6.2 Adaptive sampling masks
4.6.3 On the optimality of the discrete mask optimization problem
5 Scalable learning-based sampling optimization
5.1 Learning-based Compressive Sensing (LBCS)
5.2 LBCS motivation: Submodularity
5.3 Scaling up LBCS
5.3.1 Limitations of LBCS
5.3.2 Stochastic LBCS
5.3.3 Lazy LBCS
5.4 Experiments - Stochastic LBCS
5.4.1 Experimental setting
5.4.2 Comparison of greedy algorithms
5.4.3 Single coil results
5.4.4 Multicoil experiment
5.4.5 Large scale static results
5.5 Experiments - Lazy LBCS
5.5.1 2D multicoil setting
5.5.2 3D multicoil setting
5.6 Discussion and Conclusion
6 On the benefits of deep RL in accelerated MRI sampling
6.1 Reinforcement Learning to optimize sampling trajectories
6.1.1 A short RL primer
6.1.2 The special case of MRI
6.1.3 Double Deep Q-Networks
6.1.4 Policy Gradient
6.1.5 LBCS as a simple RL policy
6.1.6 The questions at hand
6.2 The MRI data processing pipeline
6.3 Re-examining deep RL for MRI sampling
6.3.1 Experimental setting
6.3.2 Main results on Bakker et al. (2020)
6.3.3 Main results on Pineda et al. (2020)
6.4 Discussion
6.5 Relation to other works
6.6 Conclusion
7 Uncertainty driven adaptive sampling via GANs
7.1 Generative adversarial networks (GANs) for posterior modeling
7.1.1 Generative adversarial networks (GANs)
7.1.2 Conditional GANs
7.1.3 Conditional GANs for posterior modeling
7.1.4 Learning the posterior or learning moments
7.2 GAN based adaptive sampling
7.3 Comparison with Zhang et al. (2019b)
7.4 Experiments
7.4.1 MRI adaptive sampling
7.4.2 Beyond Fourier sampling: image-domain conditional sampling
7.5 Related works
7.6 Discussion
7.7 Conclusion
8 Conclusion and future works
A Appendix for Chapter 6
A.1 Implementation details
A.2 Further experiments and results on Bakker's setting
A.2.1 Data processing differences
A.2.2 Exact reproduction of Bakker et al.
A.2.3 Further results in the Bakker experiment
A.3 LBCS vs long-range adaptivity
A.4 LBCS complexity and parameters
A.5 Visual comparisons
A.6 Detail on the practical recommendations
B Appendix for Chapter 7
B.1 Implementation details
B.1.1 Knee experiments
B.2 Extended results
B.2.1 Additional MRI knee results
B.2.2 Brain experiment
B.2.3 Additional MNIST results
B.2.4 CIFAR10 experiment
Bibliography

1 Introduction

Since its inception in the 1970s, Magnetic Resonance Imaging (MRI) has deeply impacted radiology and medicine. It allows for non-invasive, non-ionizing imaging of the body and makes it possible, among other things, to visualize anatomical structures, physiological functions and metabolic information with unmatched precision (Wright, 1997; Feng et al., 2016b).

Magnetic Resonance Imaging, as its name indicates, constructs an image of an object of interest by exploiting magnetic resonance. Namely, when subjected to a strong static magnetic field B0, the protons in the object, mostly in water molecules, will resonate when excited through a radio frequency pulse. In order to encode spatial information in the observed signal instead of receiving a global response, additional time- and spatially-varying magnetic fields, referred to as gradient fields, are superimposed on the original magnetic field B0. However, this process does not give direct access to an image, but rather informs us about the different frequencies that constitute it. By varying the gradient fields, it is possible to obtain frequency information relating to different locations and to represent it in what is known as the Fourier space, or k-space. Figure 1.1 shows the difference between an image and its Fourier space representation. The object of interest is excited several times using different varying gradient fields, which defines a k-space trajectory along which data are acquired. When sufficient information has been acquired, an image can be readily obtained by computing the inverse Fourier transform of the signal.

There is considerable flexibility in the design of k-space trajectories, but a particularly common approach is to cover the k-space in a grid-like fashion, known as Cartesian sampling. A grid-like covering of k-space implies that the image can be directly obtained by applying a fast Fourier transform (FFT). Other approaches, known as non-Cartesian, do not cover k-space in a grid-like fashion, but can consist of radial spokes (Lauterbur, 1973), spirals (Meyer et al., 1992) or space-filling curves (Lazarus et al., 2019). They can provide a faster acquisition than Cartesian sampling and benefits such as resistance to motion artifacts, but require a more involved and slower reconstruction through techniques such as gridding (O'sullivan, 1985; Jackson et al., 1991) or the non-uniform fast Fourier transform (NUFFT) (Fessler and Sutton, 2003), as well as density compensation (Pipe and Menon, 1999; Pauly, 2007). In addition to this, non-Cartesian trajectories can be sensitive to errors in the magnetic field gradients, due to field imperfections of various sources (Vannesjo et al., 2013). As a result, Cartesian trajectories have been the most popular in practice (Lustig et al., 2008; Feng et al., 2016b).

Figure 1.1: Fourier domain and image domain representation of a knee MRI scan.

However, MRI examinations suffer from being relatively slow compared to other imaging modalities, due to the sequential nature of the acquisition. For instance, MRI acquisition can easily exceed 30 minutes (Zbontar et al., 2019).
Improving imaging speed has been a major concern for researchers in the community and has led to many innovations over the years, such as rapid imaging sequences (Frahm et al., 1986; Hennig et al., 1986) or parallel/multi-coil imaging (Sodickson and Manning, 1997; Pruessmann et al., 1999; Griswold et al., 2002a). But there is a need to further accelerate imaging. As scanning time is directly related to the number of acquisitions, collecting fewer of them directly enables faster imaging. In Cartesian imaging, this can be achieved by undersampling the Fourier space, which is implemented by not acquiring some lines in Fourier space. However, as we see in Figure 1.2, such undersampling degrades the quality of the image by introducing artifacts. This is called aliasing. As a result, an additional reconstruction step is required in order to obtain a cleaner estimation of the original image.

The problem of reconstructing data from undersampled measurements is a typical instance of an ill-posed inverse problem. It is called an inverse problem because we aim at constructing an image of an object of interest from indirect measurements, in this case frequency information about the image. It is ill-posed because there is not sufficient information to directly retrieve the image of interest from the undersampled measurements. More precisely, there are infinitely many signals that could produce these partial measurements.

Figure 1.2: Accelerated MRI by undersampling. Undersampling is governed by a sampling mask that tells what locations are acquired. The undersampled Fourier spectrum is then mapped back to the image domain through an inverse Fourier transform, which gives an aliased image that needs to go through a reconstruction method in order to obtain a de-aliased estimation of the original, unobserved image. Here ifft stands for inverse fast Fourier transform.

The typical way of solving this problem is by exploiting prior knowledge about the data, by leveraging some structure that is present in them. A particularly successful approach relies on sparsity-based priors, where it is assumed that if the signal can be represented sparsely, or compactly, in some transform domain, then it can be reconstructed from undersampled measurements. This approach is known as Compressed Sensing (CS) (Donoho, 2006; Candès et al., 2006), and has been successfully applied to MRI (Lustig et al., 2008), for instance by representing the signal sparsely in the wavelet domain. The signal is retrieved by solving an optimization problem with two components: a data-fidelity term that ensures that the reconstructed image is close to the measurements, and a regularization term that encourages the reconstructed image to be sparse in some domain.

In the latter half of the 2010s, deep learning-based approaches pushed the boundaries of the state of the art in MRI reconstruction even further (Sun et al., 2016; Schlemper et al., 2017; Mardani et al., 2018). By leveraging increasingly large amounts of data, these methods are able to learn the structure directly from data, rather than imposing it through a mathematical model such as sparsity. These methods not only improved the quality of reconstruction, but also enabled fast reconstruction times, whereas CS methods often relied on slow iterative procedures.
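To make the pipeline of Figure 1.2 concrete, the following NumPy sketch retrospectively undersamples a fully sampled image by masking phase-encoding lines in k-space and zero-filling the rest before the inverse FFT. It is only an illustration of the undersampling step described above, not code from this thesis; the function and variable names (undersample, img, acceleration, center_fraction) are placeholders.

    import numpy as np

    def undersample(img, acceleration=4, center_fraction=0.08, seed=0):
        # Keep a random subset of phase-encoding (ky) lines plus the low-frequency
        # center, zero-fill the rest, and return the aliased zero-filled image.
        rng = np.random.default_rng(seed)
        ny, nx = img.shape
        kspace = np.fft.fftshift(np.fft.fft2(img))        # fully sampled Cartesian k-space
        mask = rng.random(ny) < 1.0 / acceleration        # random lines at the target rate
        n_center = int(center_fraction * ny)
        mask[ny // 2 - n_center // 2: ny // 2 + n_center // 2] = True  # keep the center lines
        zero_filled = np.fft.ifft2(np.fft.ifftshift(kspace * mask[:, None]))
        return mask, np.abs(zero_filled)                  # aliased image, to be de-aliased

The zero-filled output corresponds to the "undersampled image" of Figure 1.2; it is the input that a reconstruction method would then try to de-alias.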
However, in parallel to these developments, only a few works tackled the problem of how the k-space should be undersampled. Although CS prescribed random sampling, practitioners quickly turned to heuristics favoring low-frequency sampling, where a lot of energy and information about the structure of the signal is located (Lustig et al., 2008). Nevertheless, there were only a few works in the early 2010s aiming at directly optimizing the sampling mask¹, which can be attributed to the difficulty of the problem. Optimizing a sampling mask is a combinatorial problem, for which commonly used gradient-based optimization techniques are not readily available. As a result, two main trends emerged: approximate solutions to the combinatorial problem (Ravishankar and Bresler, 2011a; Liu et al., 2012), or the optimization of an underlying probability distribution from which a random mask is then sampled (Knoll et al., 2011b; Chauffert et al., 2013).

In more recent years, there has been a renewed interest in this problem, and a flurry of approaches have been proposed (Gözcü et al., 2018; Jin et al., 2019; Sherry et al., 2019; Bahadir et al., 2019; Huijben et al., 2020; Pineda et al., 2020; Bakker et al., 2020; Yin et al., 2021). This is mainly due to the wide availability of training data, the fast reconstruction times of deep learning-based methods and the wider use of auto-differentiation frameworks such as PyTorch and TensorFlow, which facilitate the routine computation of derivatives through complex models. While these methods are very diverse, they fall under two distinctive categories. Whereas previous approaches leveraged an underlying model from which candidate sampling masks were drawn, modern approaches are learning-based in the sense that they learn this model directly from data. Furthermore, earlier methods were model-driven, designing masks to minimize a mathematical criterion such as coherence (Lustig et al., 2008). In contrast, recent works are data-driven, as they learn policies that minimize the reconstruction error against a ground truth.

In this thesis, we aim at contributing to the problem of optimizing sampling masks along two complementary directions. First, we will provide novel methodological contributions to the problem of sampling optimization. More specifically, we aim at developing mask optimization algorithms that do not depend on a specific reconstruction algorithm, as this will allow the method to remain usable in a field where the state of the art evolves very quickly. In addition, we are concerned with designing scalable methods. Most clinically relevant settings involve high-resolution, multi-coil images, and it is imperative for sampling optimization methods to scale up to such challenging settings in order to provide practical value. Secondly, we aim at deepening the understanding of the kind of structures or assumptions that enable mask design algorithms to perform well in practice. It is not clear from the current literature what components are fundamental in a mask design algorithm. Should the mask design algorithm plan several steps ahead? Should it adapt to each different patient? What kind of feedback should be available to the mask design algorithm in order to achieve the best performance? Amidst the race to develop ever more complex models, we aim at identifying the key components that drive the performance of modern mask design algorithms, in order to efficiently reach state-of-the-art performance.
¹ We will refer to this problem interchangeably as the problem of mask design, sampling optimization, mask optimization, sampling mask optimization, sampling pattern optimization, or optimizing the acquisition trajectory.

Figure 1.3: Outline of the different chapters in this thesis. Grey boxes refer to chapters dealing with background material. Red boxes denote chapters with methodological contributions, and blue boxes denote chapters that contribute to a better understanding of the problem of optimizing sampling. Arrows show particular, direct dependencies between chapters.

Organization and contributions

Figure 1.3 illustrates the organization of this thesis, and each chapter is briefly described below, along with their respective contributions.

Chapter 2 – MRI Background
In this chapter, we provide a brief introduction to the physics underlying image acquisition in MRI, focusing on Cartesian sampling. We introduce multi-coil acquisition as well as dynamic imaging. We then provide the mathematical model that will be used to describe the acquisition in accelerated MRI.

Chapter 3 – Reconstruction methods
In this chapter, we review the algorithms used to reconstruct images from undersampled measurements. We describe the shift that has taken place from model-driven approaches based on Compressed Sensing (Lustig et al., 2007) towards data-driven approaches that construct a model from training data. We then examine the different trends within data-driven approaches, focusing on trained reconstruction models and generative approaches. Finally, we conclude by highlighting the trade-offs between the speed of reconstruction and the dependence on large training datasets.

Chapter 4 – Optimizing sampling patterns
In this last introductory chapter, we review the different approaches that have been taken to optimize sampling masks.
We argue that, in a similar way to how reconstruction methods moved from model-driven to data-driven approaches, a double shift has taken place in mask design. We show that mask design methods gradually moved from model-based, model-driven to learning-based, data-driven approaches. We discuss how this trend can be seen in the literature, and how it ties in with the empirical performance of these methods. Then, we proceed to formalizing the problem of learning-based, data-driven mask design and show how this problem changes when one aims at designing fixed policies – where a mask is trained and re-used as is at test time – or adaptive policies – which adapt to the current patient by integrating information on the fly as it is acquired. We conclude by proving the optimality of the discrete mask optimization problem, compared to the problem of finding the optimal sampling distribution.

Related publication. Sanchez, T., Gözcü, B., van Heeswijk, R. B., Eftekhari, A., Ilıcak, E., Çukur, T., and Cevher, V. (2020a). Scalable learning-based sampling optimization for compressive dynamic MRI. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8584–8588.

Chapter 5 – Scalable learning-based sampling optimization
In this chapter, we extend the Learning-Based Compressive Sensing (LBCS) approach of Gözcü et al. (2018). We primarily aim at drastically improving the scalability of their method. We first discuss how their approach is rooted in submodular optimization (Krause and Golovin, 2014) and how algorithms proposed in this field could provide solutions to the limitations of LBCS. Their method scales as O(P²), where P is the resolution of the image; such a complexity is prohibitive in high-dimensional settings such as 3D MRI or dynamic MRI, and is not convenient when reconstruction becomes slow, such as in multi-coil MRI. To address this, we first propose lazy LBCS (lLBCS) for the multi-coil and 3D settings. This algorithm leverages lazy evaluations (Minoux, 1978), where a list of upper bounds is precomputed and then traversed sequentially, drastically reducing the amount of computation over LBCS. Secondly, in the context of dynamic (multi-coil) MRI and large datasets, we propose stochastic LBCS (sLBCS) and show that it can achieve a very significant computational gain compared to LBCS, while retaining its performance.

Related publications. Gözcü, B., Sanchez, T., and Cevher, V. (2019). Rethinking sampling in parallel MRI: A data-driven approach. In 27th European Signal Processing Conference (EUSIPCO).
Sanchez, T., Gözcü, B., van Heeswijk, R. B., Eftekhari, A., Ilıcak, E., Çukur, T., and Cevher, V. (2020a). Scalable learning-based sampling optimization for compressive dynamic MRI. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8584–8588.

Chapter 6 – On the benefits of deep RL in accelerated MRI sampling
In this chapter, we assess the performance of state-of-the-art (SotA) deep reinforcement learning (RL) methods for mask design. These methods improve in principle over LBCS by designing policies that adapt to the patient at hand and leverage long-horizon planning in order to make better decisions. We mainly aim at making sense of seemingly contradictory conclusions. The work of Pineda et al.
(2020) seems to indicate that long-term planning could be the most important component in deep RL, as their results show that a non-adaptive, long-term planning policy model trained on the dataset can perform as well as an adaptive, long-term planning policy. On the contrary, the contribution of Bakker et al. (2020) highlights the importance of adaptivity, as a greedy policy that does not do long-term planning is found to closely match policies that do. Our results, surprisingly, show that sLBCS, a non-adaptive method that does not attempt long-term planning, can perform as well as the state-of-the-art approaches of Pineda et al. (2020) and Bakker et al. (2020). We further show that the current SotA RL methods bring at best a limited added value, and that other factors such as the architecture of the reconstruction algorithm or the masks used to train it have a much larger impact on the overall performance. This work highlights the need for further discussions in the community about standardized metrics, strong baselines, and careful design of experimental pipelines to evaluate MRI sampling policies fairly.

Related preprint. Sanchez, T.∗, Krawczuk, I.∗ and Cevher, V. (2021). On the benefits of deep RL in accelerated MRI sampling. Available at https://openreview.net/pdf?id=fRb9LBWUo56. ∗ denotes equal contribution.

Chapter 7 – Uncertainty driven adaptive sampling via GANs
In this chapter, we take a step back from the question of designing the best sampling policy, and primarily aim at exploring how conditional Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) can be used to model inverse problems in a Bayesian fashion. GANs have been used mainly as reconstruction models in the context of inverse problems (Yang et al., 2018; Chen et al., 2022), but a few works took an additional step to show the value of learning the full posterior distribution in order to be able to provide uncertainty quantification as well (Adler and Öktem, 2018a). We argue that they do more, and that conditional GANs for MRI reconstruction can naturally provide a criterion for adaptive sampling: by sequentially observing the location with the largest posterior variance in the measurement domain, the GAN provides an adaptive sampling policy without ever being trained for it. Such a criterion is not restricted to acquisition in the Fourier domain, as in MRI, but can also be readily used for image-domain sampling. In this context, we show that our GAN-based policy can strongly outperform the non-adaptive greedy policy from sLBCS. However, in the Fourier domain, the GAN-based policy does not match the performance of sLBCS. We provide an explanation for this phenomenon rooted in the concept of the information horizon used to make a decision at each step. We show that our model uses less information than LBCS to design its policy, and argue that this is the reason leading to its inferior performance. However, when compared with models that use the same amount of information to inform their policy, our model largely outperforms the competition.

Related preprint and workshop paper. Sanchez, T., Krawczuk, I., Sun, Z. and Cevher, V. (2019). Closed loop deep Bayesian inversion: Uncertainty driven acquisition for fast MRI. Preprint available at https://openreview.net/pdf?id=BJlPOlBKDB.
Sanchez, T., Krawczuk, I., Sun, Z. and Cevher, V. (2020). Uncertainty-Driven Adaptive Sampling via GANs. In NeurIPS 2020 Workshop on Deep Learning and Inverse Problems.
Available at https://openreview.net/pdf?id=lWLYCQmtvW.

Chapter 8 – Conclusion and future works
In this last chapter, we review the contributions of the different chapters and summarize the insights that our contributions bring to the design of masks for Cartesian MRI sampling. We then provide an outlook on the opportunities for future work.

Remark (Bibliographic note). At the end of Chapters 4, 5, 6 and 7, we include a "bibliographic note" section, where we distinguish the contributions of the co-authors of the abovementioned publications.

Additional publication not included in this thesis. Sun, Z., Latorre, F., Sanchez, T., and Cevher, V. (2021). A plug-and-play deep image prior. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8103-8107.

2 MRI Background

Before diving in greater detail into the problems of reconstruction from undersampled measurements and optimization of the sampling masks, we first present the physical underpinnings of MRI. Although this survey will be very short, it hints at the remarkable depth and flexibility of MRI. This introduction is based upon the well-known textbook of Brown et al. (2014); part of its content was presented in Sanchez (2018).

2.1 MRI physics

2.1.1 Magnetization

Let us consider a proton under a strong external magnetic field B0. In practice, such a field is obtained by running a large electric current through a coil around the scanner. A high current is required because typical magnetic fields in MRI are very large, around 3 T, and the strength of the magnetic field is directly proportional to that of the electric current (for comparison, the strength of the magnetic field of the earth is typically between 40 and 70 µT). The spin of a proton, which is positively charged, can be represented as a gyroscope precessing about the field direction, as illustrated in Figure 2.1. The precession angular frequency can be quantified as the Larmor frequency ω0, with

ω0 = γ B0,  (2.1)

where γ is known as the gyromagnetic ratio, and is a particle-dependent constant. For instance, in water, the hydrogen proton has γ̄ ≜ γ/(2π) ≈ 42.6 MHz T⁻¹. Note that deriving this formula is involved and requires tools from relativistic quantum mechanics.

Figure 2.1: Left: Illustration of the precession of a particle when subject to an external field B0 (large green arrow). The small red arrow corresponds to the spin of the particle, and its temporal evolution is illustrated by the black arrow. Right: Illustration of the transverse magnetization appearing when a body is subject to an RF pulse, due to the protons precessing synchronously and reaching the higher energy level, which results in arrows pointing downwards. This figure was first used in Sanchez (2018).

This description considers a single hydrogen nucleus in a static magnetic field, but in practice, we are dealing with an entire population. Depending on their energy levels, some protons of a population will align parallel to the external field B0, while others will take an antiparallel orientation. Although the number of spins parallel to the magnetic field only slightly exceeds the number of spins antiparallel to it, this spin excess is sufficient to yield a noticeable net magnetization M0 at the population level.
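(As a brief aside, added here to fix orders of magnitude for Equation (2.1) using the values given above: at a clinical field strength of B0 = 3 T, hydrogen precesses at

f0 = γ̄ B0 ≈ 42.6 MHz T⁻¹ · 3 T ≈ 128 MHz,

i.e. at radio frequencies, which is why the excitation discussed in Section 2.1.2 is performed with radiofrequency pulses.)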
M0 will be parallel to B0 because the different protons precess with incoherent phases, so that their transverse components cancel each other out on average, leaving a net magnetization parallel to the static field. We call this resulting state the equilibrium of the system, which is illustrated in the middle of Figure 2.1, where in this case M0 = Mz. However, this net magnetization cannot be detected at equilibrium: we need to excite the protons in order to induce a change in the magnetization, which in turn will induce an electrical current in a receive coil. This coil allows us to record a signal corresponding to the response of the protons to the excitation. The different components of the MRI scanner that will be discussed throughout this chapter are illustrated in Figure 2.2.

Figure 2.2: Illustration of an MRI scanner and the different coils and components required to produce an image.

2.1.2 Excitation

Let us denote by ẑ the direction of the external field B0, and let us consider a body that we wish to image, for instance the knee of a patient. It is clear from the previous section that the hydrogen nuclei of the body will produce a net magnetization M0 = M0 ẑ, where M0 denotes the intensity of magnetization. The atoms will precess around ẑ at the Larmor frequency ω0.

In the excitation step, the system is destabilized by sending a radiofrequency (RF) pulse. It is easier to describe it and consider its effect by placing ourselves in a frame rotating at an angular frequency ω0 around the axis ẑ. This gives the following rotating unit vectors x̂′, ŷ′:

x̂′ = cos(ω0 t) x̂ − sin(ω0 t) ŷ
ŷ′ = sin(ω0 t) x̂ + cos(ω0 t) ŷ

and ẑ remains unchanged. We define the RF pulse as a circularly rotating pulse of intensity B1, namely B1 = B1 x̂′. Applying this RF pulse will cause the nuclei to resonate and flip the equilibrium magnetization M0 towards the x̂′ − ŷ′ plane, resulting in a new net magnetization M. M will now have both a transverse component in the x̂′ − ŷ′ plane, and a longitudinal component along the ẑ axis. We refer to the angle between M and the ẑ-axis as the flip angle (FA); it quantifies how strongly the magnetization M has been moved away from its equilibrium state M0. This process is illustrated on the right of Figure 2.1. In particular, we see that the transverse component arises due to the synchronization of the precessing phases of the hydrogen nuclei, which cancelled each other out at equilibrium.

2.1.3 Relaxation

After applying the RF pulse, the system will gradually return to equilibrium, where M0 is aligned with B0. This will require the nuclei to re-emit the energy that they absorbed during the excitation phase. The relaxation is governed by two main components, namely a longitudinal relaxation, characterized by a time constant T1, and a transverse relaxation, characterized by T2.

T1 Relaxation. The T1 relaxation describes the regrowth of the longitudinal component of the magnetization, i.e. how the ẑ component of M grows back over the relaxation. This is also known as spin-lattice relaxation, as this describes a local process due to the interaction of the spins with their surroundings.
The process is formally described as

Mz(t) = M0 (1 − exp(−t/T1)) + Mz(0) exp(−t/T1),  (2.2)

where Mz(t) is the longitudinal magnetization, Mz(0) describes the amount of longitudinal magnetization left after the RF pulse, and T1 is a constant that depends on the type of tissue considered. We see that the longitudinal magnetization will be recovered at an exponential speed, and the speed will be determined by T1.

T2 Relaxation. The T2 relaxation process describes how clusters of spins gradually lose phase with each other after excitation. This dephasing will cause the transverse magnetization Mx̂′,ŷ′, or M⊥, to progressively disappear as the equilibrium state is reached (cf. middle of Figure 2.1). We have

M⊥(t) = M0 exp(−t/T2),  (2.3)

where T2 is a tissue-dependent time constant that describes how fast the transverse magnetization vanishes. In practice, T2 ≪ T1, making the T2 relaxation much faster. Also, in practice, the signal is likely to suffer additional suppression due to dephasing from inhomogeneities in the external field B0.

As T1 and T2 are tissue-dependent constants, constructing images by performing measurements at different steps of the relaxation makes it possible to highlight different characteristics of tissues. They are one of the key sources of the flexibility of MR imaging.

2.1.4 Image acquisition

Until now, we have discussed the global response of a sample to an RF pulse, but we have not addressed how to encode spatial information into the signal, in order to be able to construct an image. As previously discussed, the signal will be recorded by a receive coil, where an electric current is induced depending on the relaxation of the spins. But the question of localizing the signal remains entirely open so far. To solve this issue, we will try to give each location in the body a unique combination of phase and frequency, in order to be able to retrieve its contribution from the signal in the receive coil. This is achieved by introducing three additional coils to the scanner, known as gradient coils, whose role is to produce spatially varying magnetic fields in the x̂, ŷ and ẑ directions respectively. As a result, the tissues of specific areas of the object will be excited, and this will allow us to observe their local responses. Then, by repeating a measurement with different gradients, we will be able to sequentially construct a map of how the tissue responds to the RF pulses depending on its location. This is why an acquisition is often referred to as a pulse sequence, as we gather a sequence of measurements in order to obtain a full image. There exist several ways to construct an image with MRI; we will illustrate how it works by considering a 2D gradient echo sequence and explain its different components. Sequences are frequently represented as diagrams, such as the one in Figure 2.3, where each line represents how a different component varies in intensity through time.

If we simply excite a body with an RF pulse, neglecting the relaxation effects, we can write the signal observed in the receive coil as

S(t) = ∫ ρ(x) e^{i(ω0 t + ϕ(x,t))} dx,  (2.4)

where ρ(x) is the effective spin density, the quantity that we wish to measure, and ϕ(x, t) is the accumulated phase

ϕ(x, t) = − ∫_0^t ω(x, t′) dt′.  (2.5)

We see that in Equation 2.4, without additional considerations, the signal that we observe is integrated over the whole volume being imaged.
In Equation 2.5, the main reason that ω is now a function of space and time is that we will add spatially varying gradient fields that in turn make ω dependent on space and time. One can also see from Equation 2.4 that measurements will not be provided directly in image space, but rather in Fourier space, often referred to as k-space: MRI allows us to obtain frequency information about our object of interest, and then an image can be obtained by mapping these measurements back into image space.

Slice Selecting (SS) Gradient. As we see in Figure 2.3, during the RF pulse, a constant z-gradient is applied during a time τRF with intensity GSS, and reversed afterwards. We will not detail here why this gradient is reversed, but this has the effect of making all components in the slice in phase, with a common accumulated phase ϕ = 0. This makes it possible to select the slice of interest and make sure that the signal will be strong for the acquisition. As a result, for a slice at location z0 with thickness Δz, the signal will be integrated over the slice only, i.e. we will have

S(t1) = ∫∫ [ ∫_{z0 − Δz/2}^{z0 + Δz/2} ρ(x, y, z) dz ] dy dx ≈ ∫∫ ρ(x, y, z0) dy dx.

Phase Encoding (PE) Gradient. Let us now add the phase-encoding y-gradient to the picture. Assume that it is applied during a time τPE at intensity Gmin ≤ Gy ≤ Gmax. This will result in the spins acquiring a y-dependent phase, and as a result

S(t2) = ∫∫ ρ(x, y, z0) e^{−i 2π γ̄ Gy τPE y} dy dx.

Figure 2.3: A 2D gradient echo sequence diagram. Each line describes how different components interact during the time of a single excitation. On the first line, we see the RF pulse enabled at t = 0. On the second line, we have the slice selecting gradient Gz,SS, on the third line we have the phase encoding gradient Gy,PE, on the fourth we have the readout gradient Gx,R, and finally the analog-to-digital converter (ADC).

Readout (R) Gradient. Finally, the x-gradient is first turned on negatively, dephasing the spins, followed by a rephasing stage where the observation through the ADC is carried out. It is enabled at the strength Gx and for a duration TS. Given the shifted variable t′ = t − TE, we have as a result

S(t′) = ∫ [ ∫ ρ(x, y, z0) e^{−i 2π γ̄ Gy τPE y} dy ] e^{−i 2π γ̄ Gx t′ x} dx,  −TS/2 ≤ t′ ≤ TS/2.  (2.6)

Equation 2.6 is very interesting, as one can rewrite the (x, y)-directions as spatial frequencies depending on t′ and τPE, namely (kx, ky) = (γ̄ Gx t′, γ̄ Gy τPE). This in turn enables us to rewrite Equation 2.6 as

S(kx, ky) = ∫∫ ρ(x, y, z0) e^{−i 2π (kx x + ky y)} dx dy,  (2.7)

which turns out to be the Fourier transform of the spin density. More precisely, the signal obtained in the presence of a gradient echo structure in the readout direction, after its encoding by a fixed Gy, describes a line in the k-space of the Fourier transform of ρ(x). This implies that by acquiring several lines at different ky by varying Gy, we can effectively cover the k-space, as illustrated in Figure 2.4. As a result, when the k-space has been sufficiently densely covered, the spin density can be recovered by taking the inverse Fourier transform of Equation 2.7, i.e.

ρ(x, y, z0) = ∫∫ S(kx, ky) e^{+i 2π (kx x + ky y)} dkx dky.  (2.8)

This approach is referred to as Cartesian sampling because it covers k-space in a grid-like fashion, as illustrated in Figure 2.4.
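As a concrete, discrete illustration of Equations (2.7) and (2.8), the following toy NumPy sketch (added for illustration; the array rho is a stand-in for the spin density of one slice, and the sizes are arbitrary) builds k-space one phase-encoding line at a time and recovers the spin density with a single inverse FFT once the Cartesian grid is fully covered:

    import numpy as np

    Ny, Nx = 64, 64
    rho = np.random.rand(Ny, Nx)                 # stand-in spin density of one slice
    y = np.arange(Ny)

    kspace = np.zeros((Ny, Nx), dtype=complex)
    for ky in range(Ny):
        # One excitation with a fixed phase-encoding gradient: every y-position
        # carries the phase -2*pi*ky*y/Ny, and the readout then samples kx along x.
        phase_encoded = (rho * np.exp(-2j * np.pi * ky * y / Ny)[:, None]).sum(axis=0)
        kspace[ky, :] = np.fft.fft(phase_encoded)          # one line of k-space

    assert np.allclose(kspace, np.fft.fft2(rho))           # line-by-line acquisition = 2D DFT
    rho_rec = np.fft.ifft2(kspace)                         # discrete analogue of Equation (2.8)
    assert np.allclose(rho_rec.real, rho)                  # exact recovery from full coverage

This exactness holds only when every line is acquired; it is precisely what is lost once lines are skipped, which is the starting point of the accelerated MRI setting considered in this thesis.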
Note that there exist different sequences that can perform Cartesian sampling, and even more sequences that are not Cartesian. We refer to such approaches as non-Cartesian sampling approaches; they can include radial sampling (Lauterbur, 1973), spiral sampling (Meyer et al., 1992) or even some more flexible coverages of k-space (Lazarus et al., 2019). Cartesian MRI has the advantage over these methods of enabling the use of the fast Fourier transform to compute the inversion of Equation 2.8, which is not the case for non-Cartesian methods. These approaches require a more sophisticated inversion, involving gridding (O'sullivan, 1985) and non-uniform Fourier transforms (Fessler and Sutton, 2003). The differences are however deeper than merely the inversion technique: radial trajectories are known to be more robust to motion artifacts, while spiral acquisitions enable fast imaging but are prone to other artifacts; Lustig et al. (2008) discuss these issues in greater depth.

Figure 2.4: Traversal of k-space for a gradient echo pulse sequence. The combined application of the phase encoding gradient Gy,PE and readout gradient Gx,R makes the signal move in k-space to the desired location in a diagonal trajectory. The phase encoding gradient is then stopped, and the readout gradient is reversed, bringing the spins back through a complete set of kx values while keeping ky constant. The experiment is then repeated by returning to a different ky value until the whole of k-space has been covered in a grid-like fashion.

2.1.5 Parallel imaging

The need to cover k-space through sequential readouts makes MR imaging a slow imaging technique, and the quality of the images is greatly dependent on the absence of motion during the acquisition. This also implies that for body MRI, the patient should not breathe during the whole acquisition process, as this motion would yield a reconstructed image with many undesired artifacts (Glockner et al., 2005). This has motivated further research into ways to accelerate the acquisition.

A first step in this direction is the use of parallel imaging. This technique relies on a multicoil MRI acquisition, which was introduced at the beginning of the 1990s by Roemer et al. (1990). The idea was to use several receive coils positioned at different locations instead of a single large one. The coils record different signal intensities depending on their spatial location, which makes it possible to achieve a larger signal-to-noise ratio (SNR) compared to a single-coil acquisition. It was subsequently observed that, instead of increasing the SNR compared to a single-coil acquisition, the geometry of the coil array could be exploited to help accelerate MRI by resolving the aliasing due to spacing out successive k-space phase encodes by a factor R, known as the acceleration factor (Sodickson and Manning, 1997). This is what is known as parallel imaging. However, this acceleration introduces a tradeoff between acceleration and SNR, as increasing R will reduce the SNR by a factor √R. Reconstructing the image also becomes non-trivial, as the images from each coil have a decreased field of view (FoV), resulting in aliasing when extended to the full FoV, as illustrated in Figure 2.5.

Figure 2.5: Example of parallel imaging on a cardiac image, simulated with three coils, whose sensitivities are three superimposed rectangles, each covering one third of the image. First column: combination via the sum of squares (SoS) technique, where the images from each coil are simply summed weighted by their modulus.
Second column: Reconstruction taking into account the sensitivity maps, allowing for precise reconstruction for an acceleration rate R up to 3. Third column: Image reconstructed for the second coil when taking into account its sensitivity. Fourth column: k-space of the second channel at different undersampling rates.

Various approaches have tackled this challenge, but they generally rely on some patient-dependent calibration. For instance, SENSE (sensitivity encoding) (Pruessmann et al., 1999) considered coil sensitivity maps, which describe how each coil responds to a signal at each location of the FoV. They are typically estimated from data from a prescan (Blaimer et al., 2004). GRAPPA (generalized autocalibrating partially parallel acquisitions) (Griswold et al., 2002a) uses a small part of the signal near the center of k-space, called the autocalibration signal, in order to compute how the data from different channels (coils) should be combined. Several reviews (Blaimer et al., 2004; Glockner et al., 2005; Deshmane et al., 2012) examine the differences and similarities of these techniques in greater detail and describe various clinical applications. Calibrationless methods have also been explored in more recent years (Trzasko and Manduca, 2011; Majumdar and Ward, 2012; El Gueddari et al., 2019).

An example of parallel imaging with coil sensitivities. We illustrate in Figure 2.5 the role of sensitivity maps in providing additional information to resolve the aliasing due to the acceleration. This example assumes three coils with box-like sensitivities, each covering a third of the image. We see that in the case R = 1, there is no aliasing, and reconstruction is simply achieved by an inverse Fourier transform. In the case R = 3, adding the information of each coil still makes it possible to resolve the aliasing, but this is not sufficient for R = 4, where the three coils do not provide enough information to resolve the aliasing. We can also see in the last column how increasing R results in acquisition lines getting more spread out in Fourier space. Note that this is an idealized example, as in this case the coil sensitivities do not overlap, which means that summing up the individual images multiplied by their respective sensitivities suffices to reconstruct the image. The SENSE algorithm is more involved and can resolve images with overlapping coil sensitivities.

2.1.6 Dynamic MRI

Before terminating this chapter, we will also cover the case of dynamic MRI, which will be relevant to Chapter 5 of this thesis. Dynamic MRI, as its name suggests, aims at imaging dynamic, moving objects. It can be used for instance for cardiac imaging (Tsao et al., 2003) or vocal tract imaging (Echternach et al., 2010). The data are acquired in k-space at different times, resulting in k-t raw data, and the direct reconstruction is done with a frame-by-frame inverse Fourier transform. In this setting, imaging speed is most critical for several reasons: if one wants to image a fast-moving object precisely, then a high temporal resolution will be required, which presents a challenge for MRI (Tsao et al., 2003). Accelerating the acquisition in this case will not only improve the comfort of the patient, but might also provide higher temporal resolution.
However, the motion between two frames should not be too large, which in practice would require breath-held examinations for Cartesian cardiac imaging (Gamper et al., 2008; Jung et al., 2009), something that is often impractical for patients. In this case, techniques such as radial sampling can be greatly beneficial, as radial spokes continually sample the center of k-space, and motion can then be retrospectively corrected in free-breathing acquisitions (Winkelmann et al., 2007; Uecker et al., 2010; Feng et al., 2014, 2016b). If one wishes to perform Cartesian sampling in this case, higher acceleration rates would be required, and parallel imaging might not be sufficient to resolve aliasing. As a result, one might use a priori structure in the images to constrain the reconstruction and obtain a good quality outcome. This is known as Compressed Sensing (Candès et al., 2006; Donoho, 2006), and will be discussed in Chapter 3.

2.2 Mathematical description

We close this chapter by introducing the mathematical framework that will be used in the sequel. So far, we have described our problem in a continuous fashion, where the signal is a function of a continuous time t or k-space location (kx, ky). However, in a Cartesian acquisition, the data can be represented on a finite grid as a discrete set of values, depending on the total number of readouts. Within each readout, samples are recorded at a time interval ∆t, and so the resolution of k-space can be described as

\[ \Delta k_x = \gamma\, G_x\, \Delta t, \qquad \Delta k_y = \gamma\, \Delta G_{\mathrm{PE}}\, \tau_{\mathrm{PE}}. \]

Note that in order to guarantee an image without aliasing, ∆t and ∆G_PE must be sufficiently small, and their precise values are dictated by the famous Nyquist–Shannon sampling theorem. Assuming a total of Ny phase encoding lines, each containing Nx samples, we can represent the signal S(kx, ky) as a matrix S ∈ C^{Nx×Ny}, which can be written in vectorized form s ∈ C^P, where P = Nx · Ny. In the Cartesian case, the spin density ρ(x, y) is obtained by computing an inverse Fourier transform of s. It is common for the resulting image to be described as ρ ∈ C^P, with the same dimensions as s. The inversion is performed by the inverse Fourier transform F^{-1} applied to s, namely

\[ \rho = F^{-1}(s). \tag{2.9} \]

This operation is often represented as a matrix-vector multiplication, where the operator F^{-1} is made explicit as the inverse FFT matrix F^{-1}; in addition, one can model the imperfections in the measurement as additive Gaussian noise ϵ ∈ C^P, which yields

\[ \rho = F^{-1} s + \epsilon. \tag{2.10} \]

Note that in practice, Equation 2.10 is not evaluated explicitly; rather, the inverse Fourier transform is computed directly using the Fast Fourier Transform (FFT) algorithm.

2.2.1 Accelerated MRI

In the case of accelerated MRI, the full k-space is not observed: in practice, entire phase encoding lines are skipped at once, resulting in an acceleration of the scanning time proportional to the number of lines removed. We represent this operation as masking the unobserved coefficients, which is achieved by defining a diagonal masking matrix Pω that retains the coefficients indexed by the set ω, namely Pω,ii = 1 if i ∈ ω and 0 otherwise. ω is called the sampling mask and has cardinality |ω| = N. ω is typically structured so that entire lines are skipped at once; a formal presentation of this structure will be given in Chapter 5. We refer to the ratio N/P as the sampling rate.
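To make this concrete, the following NumPy sketch builds a line-skipping Cartesian mask and applies it to the spectrum of a toy image; the function name and the chosen line indices are illustrative and not part of the thesis.

```python
import numpy as np

def cartesian_mask(nx, ny, acquired_lines):
    """Boolean mask over an nx-by-ny k-space grid that keeps whole
    phase-encoding lines (columns indexed by ky)."""
    mask = np.zeros((nx, ny), dtype=bool)
    mask[:, list(acquired_lines)] = True
    return mask

# Toy example: 8 out of 32 phase-encoding lines, i.e. a sampling rate N/P = 0.25.
nx, ny = 32, 32
omega = {0, 1, 2, 15, 16, 17, 30, 31}          # a few low-frequency and outer lines
mask = cartesian_mask(nx, ny, omega)

x = np.random.randn(nx, ny)                    # stand-in for an image
s = np.fft.fft2(x, norm="ortho")               # full k-space
y = mask * s                                   # P_omega applied to s (unobserved entries set to 0)
zero_filled = np.fft.ifft2(y, norm="ortho")    # naive reconstruction from the partial data
print("sampling rate:", mask.mean())           # N / P
```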
This masking results in observing the partial image

\[ \rho = F^{-1} P_\omega s + \epsilon. \tag{2.11} \]

Remark (Sampling masks and trajectories). There is a difference between a sampling mask and a sampling trajectory. The former only provides a fixed view of all locations acquired, while the latter comprises a dynamic component: the sampling trajectory tells not only what locations were acquired, but also the temporal ordering of the acquisition, detailing the order in which the samples are acquired throughout the multiple acquisition steps.

In the sequel, we will consider problems where we aim at learning how to reconstruct data from partial Fourier measurements, and how to optimize sampling masks to obtain the best reconstruction quality. In order to be aligned with the commonly used terminology, we will need to redefine some of the notation used until here. We assume that there exists some underlying ground truth image x ∈ C^P from which we acquire partial Fourier measurements (or observations) yω that can be noisy, i.e.

\[ y_\omega = A_\omega x + \epsilon = P_\omega F x + \epsilon. \tag{2.12} \]

This will be our basic acquisition model for the whole of this thesis. Aω is defined as a shorthand for PωF. Note that Equations 2.11 and 2.12 can be related by setting s = Fx and ρ = F^{-1} yω. This reformulation describes the image x as the fundamental unknown, with the partial measurements expressed in the Fourier domain.

Remark. In the case of parallel or multicoil acquisition, Equation 2.12 is slightly adapted to account for acquisition on j = 1, . . . , C coils with different spatial sensitivities,

\[ y_{\omega,j} = A_\omega S_j x + \epsilon_j = P_\omega F S_j x + \epsilon_j, \tag{2.13} \]

where the coil sensitivities Sj can be represented as P × P diagonal matrices in which Sj,ii describes the spatial sensitivity of coil j at the i-th pixel. We will also sometimes use the compact notation

\[ y_\omega = A_\omega S x + \epsilon \tag{2.14} \]

for multicoil MRI; in this case, yω ∈ C^{CP} represents the stacked observations, Aω the coil-wise Fourier transform, and S ∈ R^{CP×P} the stacked sensitivities.

Recovering the ground truth image x from the partial measurements yω is challenging, as the problem is fundamentally ill-posed: there are mathematically infinitely many solutions consistent with yω, but most of them are not physically meaningful. Such problems are often referred to as ill-posed inverse problems, where we aim at recovering a reference image from partial observations that entail a loss of information. In this context, Equation 2.12 is referred to as a forward model. In the next chapter, we will discuss the approaches that have been studied in the literature to construct an estimate x̂ of x from the partial measurements yω.

3 Reconstruction methods

In this chapter, we consider the solutions that have been explored in the literature for the problem of accelerated MRI. We defined this problem formally in Section 2.2.1: we consider partial measurements yω obtained through the forward model of Equation 2.12, and aim at constructing an estimate x̂ of the underlying, unknown ground truth x. The difficulty of this problem resides in its ill-posedness: there exist infinitely many solutions consistent with the observations that are not physically meaningful. To counter this, it is necessary to impart additional information to the problem in order to reach a sensible solution. This was the approach taken in Compressed Sensing (CS) applied to MRI (Lustig et al., 2008).
The ill-posedness of the problem is alleviated by imposing additional structure on it. All subsequent approaches rely on imposing some form of structure to estimate x from yω, but they have relied on increasingly complex, data-driven models. We will trace the evolution from CS approaches, which used simple mathematical models to obtain structure, to recent deep learning-based approaches that construct their models from training data: the field moved from model-driven approaches to data-driven approaches. A more exhaustive presentation of these approaches can be found in Ravishankar et al. (2019); Doneva (2020).

3.1 Model-driven approaches

3.1.1 Compressed Sensing

It has long been known that a band-limited signal can be exactly reconstructed from a set of uniformly sampled frequencies, provided that their density is at least twice the highest frequency of the original signal. This result is famously known as the Nyquist–Shannon sampling theorem. However, sampling at the Nyquist rate can remain expensive, and there has been extensive research to achieve sub-Nyquist rate sampling and reconstruction.

In their seminal works, Donoho (2006); Candès et al. (2006) introduced the formalism of Compressed Sensing (CS), where they proved that by leveraging the structure present in natural images, such as sparsity or low-rankness, one can perfectly reconstruct a signal sampled much below the Nyquist rate. Achieving this requires a non-linear reconstruction method, framed as the following prototypical problem

\[ \hat{x} = \arg\min_{x} \ \frac{1}{2}\|A_\omega x - y_\omega\|_2^2 + \lambda R(x) \tag{3.1} \]

where the first term, known as data-fidelity, enforces consistency with the measurements, and the second term, called regularization, defines a statistical model capturing the desired structure of the signal. In the literature, R(x) has taken various forms, but typical models include sparsity in the wavelet domain, R(x) = ∥Wx∥₁ (Candès et al., 2006; Donoho, 2006; Lustig et al., 2007), or sparsity with Total Variation (TV) type constraints (Block et al., 2007; Knoll et al., 2011a). An alternative approach imposes a low-rank structure on the resulting image, typically using R(x) = ∥x∥∗ (Lingala et al., 2011), or more sophisticated structures such as the Hankel matrix (Jin et al., 2016). The common trend in all of these methods is the idea that the structure of the signal implies a redundancy that can be exploited to represent the signal compactly when it is transformed appropriately. As expressing this structure mathematically is not straightforward, these methods are the expression of various attempts to capture this idea.

Note that all these approaches typically introduce a trade-off between acquisition speed (by lowering the number of k-space lines acquired) and reconstruction speed. Solving Equation 3.1 requires an iterative method, and while the computation is carried out offline, the reconstruction can last up to hours for specific settings with large accelerations (Feng et al., 2016a).

We also note for completeness that CS relies not only on leveraging additional structure in the image to be reconstructed, but also on incoherent sampling (Lustig et al., 2008). In practice, this often involves random-like sampling strategies, which cause the aliasing to be incoherent, i.e. to feature a noise-like structure, as illustrated in Figure 3.1. We defer a more complete discussion of sampling to the next chapter.
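The incoherence argument can be checked numerically. The sketch below (a toy illustration, not the experiment behind Figure 3.1) compares the point spread functions of a random and a regular Cartesian mask at the same 25% sampling rate.

```python
import numpy as np

def psf(mask):
    """Point spread function of an undersampling mask: the image-domain
    response to a single point, i.e. the inverse FFT of the mask."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(mask), norm="ortho"))

n = 128
rng = np.random.default_rng(0)

# Random mask: keep 25% of the phase-encoding lines, drawn uniformly at random.
random_lines = rng.choice(n, size=n // 4, replace=False)
random_mask = np.zeros((n, n)); random_mask[:, random_lines] = 1

# Regular mask: keep every 4th line (grid-like undersampling).
regular_mask = np.zeros((n, n)); regular_mask[:, ::4] = 1

for name, m in [("random", random_mask), ("regular", regular_mask)]:
    p = np.abs(psf(m))
    sidelobe = np.sort(p.ravel())[-2] / p.max()   # largest secondary peak vs. main peak
    print(name, "relative sidelobe level:", round(float(sidelobe), 3))
# The regular mask produces strong coherent replicas (fold-over artifacts), while the
# random mask spreads the aliasing energy into many small, noise-like sidelobes.
```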
Overall, it is clear that such approaches rely on statistical models constructed to implement the designer's assumptions, rather than being learned from data. While such methods enabled great progress in the last decade, the statistical models restrict the expressivity of the approach: they express abstract mathematical structures encoding our assumptions rather than the real data distribution expressed through a training set. This is why we will refer to such approaches as model-driven.

Figure 3.1: Random (a) and regular (d) sampling masks, acquiring 25% of the total locations, and their corresponding point spread functions (PSF) in (b) and (e). The convolution of the PSF with the Shepp–Logan phantom is shown respectively in figures (c) and (f). We see that the random undersampling produces incoherent or noise-like aliasing, which still allows us to distinguish the original image. On the other hand, the aliasing produced by regular, grid-based sampling (known as fold-over artifacts) yields a low-intensity result where the original image cannot be distinguished.

3.2 Data-driven approaches

Many approaches have tried to learn their model directly from data, and in this section, we survey a few of the most significant representatives of data-driven methods.

3.2.1 Dictionary learning

Dictionary learning approaches rely on the assumption that an image can be represented as a sparse combination of atoms from a dictionary, i.e. x = Dz, where D is a dictionary and z the sparse coefficient vector. This is sometimes referred to as a synthesis formulation, as opposed to Equation 3.1, which is an analysis formulation. Synthesis models are often applied patch-wise to images, and they aim at simultaneously learning a reconstruction x, a dictionary D and an encoding Z (Ravishankar and Bresler, 2011b; Caballero et al., 2014):

\[ \min_{x,D,Z} \ \frac{1}{2}\|A_\omega x - y_\omega\|_2^2 + \beta \sum_{j=1}^{N} \|P_j x - D z_j\|_2^2 \quad \text{s.t.} \quad \|z_j\|_0 \le s, \ \|d_i\|_2 = 1, \ \forall i, j \tag{3.2} \]

3.2.2 Unrolled neural networks

Unrolled neural networks are a data-driven, deep learning-based approach inspired by the iterative algorithms used to solve Equation 3.1. This problem is known as a composite optimization problem, and can be solved by various approaches. A famous one, the Iterative Soft-Thresholding Algorithm (ISTA) (Daubechies et al., 2004), or Proximal Gradient method (Combettes and Pesquet, 2011), solves the problem by iteratively computing

\[ x_{t+1} = \mathrm{prox}_{\lambda R}\big( x_t - \alpha A_\omega^\dagger (A_\omega x_t - y_\omega) \big). \tag{3.3} \]

The proximal operator is defined as prox_g(x) ≜ arg min_v g(v) + ½∥v − x∥₂², and for some functions it has a closed-form solution. This is typically the case when R(x) = ∥Wx∥₁, where the proximal operator takes the form of a soft-thresholding operator, i.e. prox_{λ∥W·∥₁}(x) = sign(Wx) ⊙ max(|Wx| − λ, 0).

Motivated by the formulation of Equation 3.3, unrolled neural networks parameterize the proximal operator with a neural network f(yω, ω; θ), and then train a model of the form

\[ x_{t+1} = f\big( x_t - \alpha A_\omega^\dagger (A_\omega x_t - y_\omega);\ \theta^{(t)} \big) \tag{3.4} \]

for a fixed, small number of iterations t = 1, . . . , T. The output of the model is thus f(yω, ω; θ) = x_T, where θ = {θ^{(1)}, . . . , θ^{(T)}}. As the proximal operator captures the statistical properties of the signal that we wish to recover, parameterizing it with a neural network enables learning a model directly from data. The model being more flexible, it allows for improved reconstruction performance over methods using statistical models.
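As a concrete reference point for Equations 3.3 and 3.4, here is a minimal NumPy sketch of ISTA for the single-coil model Aω = PωF, assuming for simplicity that sparsity is imposed directly on a real-valued image rather than in a wavelet domain W; all names and parameter values are illustrative.

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of lam * ||z||_1 (element-wise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ista(y, mask, lam=0.01, alpha=1.0, n_iter=50):
    """ISTA for problem (3.1) with A_w = P_w F and R(x) = ||x||_1.
    `mask` is a boolean k-space mask and `y` the masked k-space data."""
    x = np.zeros_like(y)
    for _ in range(n_iter):
        # Gradient of the data-fidelity term: F^H P_w (P_w F x - y).
        grad = np.fft.ifft2(mask * (np.fft.fft2(x, norm="ortho") * mask - y), norm="ortho")
        # Proximal step; the real part is kept, assuming a real-valued image.
        x = soft_threshold(np.real(x - alpha * grad), lam)
    return x

mask = np.random.rand(64, 64) < 0.3
y = mask * np.fft.fft2(np.random.randn(64, 64), norm="ortho")
x_hat = ista(y, mask)
```

In an actual CS reconstruction, the thresholding would be applied to wavelet coefficients Wx rather than directly to the image.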
A flurry of approaches has been motivated by unrolling iterative algorithms, and we highlight a few examples hereafter; a more complete discussion of unrolled neural networks can be found in Liang et al. (2020). The parameters of the proximal mapping can be shared across iterations (Mardani et al., 2018) or different at each iteration (Schlemper et al., 2017). The unrolling can also leverage the structure of a different optimization algorithm, like the Alternating Direction Method of Multipliers (ADMM) in Sun et al. (2016), or the Primal-Dual Hybrid Gradient in Adler and Öktem (2018b). In the noiseless case, it is also common to replace the inner step x_t − αA†(Aω x_t − yω) with a step called Data Consistency (DC) (Schlemper et al., 2017; Zhang et al., 2019b), which replaces the reconstruction with the exact values at the observed locations. Essentially, the inner step becomes

\[ \mathrm{DC}(x_t, y_\omega) = F^\dagger (P_{\omega^C} F x_t + y_\omega) = F^\dagger (P_{\omega^C} F x_t + P_\omega F x). \tag{3.5} \]

We see that at the observed locations ω, we simply keep the observation, while at the non-observed locations ω^C, the values of the current reconstruction x_t are kept. Note that here, P_{ω^C} = I − P_ω.

These unrolled approaches require training the model fθ on a dataset of reference images and observations {x_i, y_i}_{i=1}^m by minimizing a loss ℓ(·, ·):

\[ \theta^* = \arg\min_{\theta} \ \frac{1}{m}\sum_{i=1}^{m} \ell\big(x_i, f(y_i; \theta)\big). \tag{3.6} \]

Various losses have been used in the literature. Initial works used the ℓ2 loss (Schlemper et al., 2017; Hammernik et al., 2018), but practitioners have lately turned to the ℓ1 loss, SSIM, or compounds of these (Muckley et al., 2021). ℓ1 was indeed found to outperform ℓ2 on vision tasks (Zhao et al., 2016). The structural similarity (SSIM) (Wang et al., 2004) is a metric developed to match human perception more closely, and recent works have proposed a weighted sum of the ℓ1 loss and the multiscale SSIM (MS-SSIM) as a compromise (Zhao et al., 2016).

Deep learning-based reconstruction methods further shift the burden of computation compared to compressed sensing approaches. While the latter enabled faster acquisition at the cost of slower, iterative reconstruction procedures, deep learning-based methods retain the benefit of sub-Nyquist sampling while also enabling fast reconstruction, on the order of 10^{-1} seconds per slice (Jin et al., 2017; Hammernik et al., 2018). However, the computational burden is only shifted one step further: these methods need to be trained on large, fully-sampled datasets (Zbontar et al., 2019). (In their recent work, Yaman et al. (2020) proposed approaches to train using only undersampled data, but having access to a dataset of undersampled images is still required.)

3.2.3 Deep image prior

In a spirit closer to dictionary learning and classical CS, Ulyanov et al. (2018) have shown that an untrained CNN can provide a sufficiently strong prior to enable image reconstruction, even without training data. Given an untrained CNN fθ and a single measurement instance yω, the idea in deep image prior (DIP) is simply to minimize

\[ \min_{\theta} \ \|y_\omega - A_\omega f_\theta(z)\|_2^2 \tag{3.7} \]

where z is a fixed, randomly sampled vector, and where the parameters θ are randomly initialized. Then, using an iterative optimization algorithm, fitting the weights to the observations enables the network to output a high-quality reconstructed image without supervision or training data. This method was successfully applied to MRI in (Darestani and Heckel, 2021; Yoo et al., 2021).
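A minimal PyTorch sketch of the DIP objective of Equation 3.7 follows; the tiny convolutional generator, the synthetic data and all hyperparameters are placeholders rather than the architectures used in the cited works.

```python
import torch

def A(x, mask):
    """Single-coil forward operator A_w = P_w F acting on a real 2D image."""
    return mask * torch.fft.fft2(x, norm="ortho")

net = torch.nn.Sequential(            # tiny stand-in for the CNN f_theta
    torch.nn.Conv2d(1, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, 3, padding=1),
)

n = 64
mask = (torch.rand(n, n) < 0.3).float()               # random k-space mask
x_true = torch.randn(n, n)                            # stand-in for the unknown image
y = A(x_true, mask)                                   # observed k-space data
z = torch.randn(1, 1, n, n)                           # fixed random input of DIP

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for it in range(200):                                 # early stopping acts as regularization
    opt.zero_grad()
    x_hat = net(z).squeeze()
    loss = (torch.abs(A(x_hat, mask) - y) ** 2).sum() # || y - A f_theta(z) ||^2
    loss.backward()
    opt.step()
```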
A common pitfall of DIP is that this flexibility comes at the price of increased inference time: the network weights must be retrained for each individual image. In addition, DIP can suffer from overfitting, and heuristics such as early stopping are used to counteract this (Ulyanov et al., 2018; Sun et al., 2021).

3.2.4 A Bayesian interpretation of reconstruction

So far, the discussion has been framed as moving from mathematically-defined to data-driven regularization in order to improve the recovery of Problem 3.1. However, this approach can also be framed from a Bayesian perspective, where the data consistency term ½∥Aω x − yω∥₂² is viewed as a likelihood and the regularization λR(x) as a prior. This was initially discussed by Ji et al. (2008) for compressed sensing. The idea revolves around Bayes' rule, where the posterior p(x|y) can be expressed as

\[ p(x|y) = \frac{p(y|x)\, p(x)}{p(y)} \propto p(y|x)\, p(x), \tag{3.8} \]

where p(y|x) is the likelihood, p(x) the prior, and p(y) the marginal. Under Gaussian noise, we can derive an explicit likelihood model for Equation 2.12: assuming ϵ ∼ N(0, σ²I), i.e. additive Gaussian noise, we get

\[ p(y|x) = \frac{1}{(2\pi\sigma^2)^{P/2}} \exp\!\left( -\frac{1}{2\sigma^2}\|y - A_\omega x\|_2^2 \right) \sim \mathcal{N}(y; P_\omega F x, \sigma^2 I). \]

The regularization term can then be viewed as a prior from which the ground truth x is sampled; in the case of classical CS, the prior can be specified as the Laplace density function p(x) = (λ/2)^P exp(−λ∥Wx∥₁) (Ji et al., 2008). Such an approach enables viewing Problem 3.1 as maximum a posteriori (MAP) estimation, where one aims to find the reconstruction x̂ that maximizes the posterior probability:

\[
\begin{aligned}
\hat{x} = \arg\max_x p(x|y) &= \arg\max_x \frac{p(y|x)p(x)}{p(y)} && \text{by definition} \\
&= \arg\max_x p(y|x)p(x) && \text{as the marginal } p(y) \text{ does not depend on } x \\
&= \arg\min_x -\log\big(p(y|x)p(x)\big) && \text{taking the negative logarithm} \\
&= \arg\min_x \frac{1}{2\sigma^2}\|y - A_\omega x\|_2^2 + \lambda\|W x\|_1 && \text{making the densities explicit, discarding constants}
\end{aligned}
\]

The Bayesian perspective gives a different interpretation of reconstruction and of the role of regularization as a prior, and lends itself naturally to an extension to data-driven models, which act as learned priors. The Bayesian framing of Equation 3.1 also has the advantage of naturally defining a criterion for acquisition: acquire the observations that have the largest uncertainty according to the model (Ji et al., 2008). We will discuss these implications in greater depth in the next chapter as well as in Chapter 7. For now, we turn to our last category of deep learning-based approaches, which make use of this Bayesian insight and attempt to explicitly learn meaningful, data-driven priors.

3.2.5 Generative approaches

The reconstruction methods discussed so far can be seen as learning a data-driven prior, which has proven to enable higher reconstruction quality than mathematically motivated models. While these methods embed the full iterative resolution of Problem 3.1 into a deep architecture, other approaches, relying on deep generative models, propose to explicitly parametrize the prior with a neural network (Bora et al., 2017; Jalal et al., 2021; Patel and Oberai, 2021). This has the advantage of keeping the optimization procedure more transparent, as it explicitly finds a sample from a deep generative prior that is consistent with the observations. The approach, proposed initially by Bora et al. (2017), relies on training a generative adversarial network (GAN) to capture the prior distribution. We first briefly introduce GANs.
Generative adversarial networks. GANs (Goodfellow et al., 2014) are a specific type of deep learning model in which a generator gθ(z) and a discriminator dϕ(x) are trained in a competitive fashion: starting from random noise z ∼ p(z), the generator tries to construct samples that imitate a distribution of reference images p(x), from which we are given a large set of samples {x_i}_{i=1}^m. The discriminator, on the other hand, tries to classify a sample based on how likely it is to originate from the reference distribution. The problem can formally be described as a game between the generator and the discriminator, where we solve

\[ \min_{\theta} \max_{\phi} \ \mathbb{E}_{p(x)}[d_\phi(x)] - \mathbb{E}_{p(z)}[d_\phi(g_\theta(z))]. \tag{3.9} \]

Generative priors. Given a trained generator gθ*(z) approximating the image distribution of interest p(x), the approach of Bora et al. (2017), also applied to MRI in Patel and Oberai (2021), aims at finding a sample from the generator consistent with the observations by solving

\[ \min_{z} \ \|A_\omega g_{\theta^*}(z) - y_\omega\|_2. \tag{3.10} \]

While this approach does not enable directly constructing a maximum a posteriori (MAP) estimate, since the GAN only yields samples and no probabilities, the low-dimensionality of the latent space z makes it possible to efficiently carry out Markov Chain Monte Carlo (MCMC) computations (Patel and Oberai, 2021) to construct posterior samples.
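The following PyTorch sketch illustrates the latent optimization of Equation 3.10; the "generator" is an untrained stand-in for a pretrained GAN gθ*, and the data are synthetic, so the example only shows the mechanics of the optimization.

```python
import torch

# Stand-in for a pretrained generator g_theta*: latent vector -> 64x64 image.
# In practice this would be a GAN trained on the image distribution of interest.
generator = torch.nn.Sequential(
    torch.nn.Linear(32, 64 * 64), torch.nn.Unflatten(1, (64, 64)),
)
for p in generator.parameters():
    p.requires_grad_(False)   # the generator is fixed; only z is optimized

mask = (torch.rand(64, 64) < 0.3).float()
y = mask * torch.fft.fft2(torch.randn(64, 64), norm="ortho")   # observed k-space (toy data)

z = torch.randn(1, 32, requires_grad=True)      # latent variable of Equation 3.10
opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    x_hat = generator(z).squeeze()
    # Squared data-fidelity || A_w g(z) - y_w ||^2 driving the latent search.
    loss = (torch.abs(mask * torch.fft.fft2(x_hat, norm="ortho") - y) ** 2).sum()
    loss.backward()
    opt.step()
```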
3.3 Trade-offs

While all the methods presented in this chapter enable reconstruction from sub-Nyquist sampling rates, they feature different advantages and drawbacks. The state of the art in terms of reconstruction performance is occupied by deep learning-based reconstruction methods, such as unrolled algorithms. They allow for the best quality and fastest inference among all methods presented here, but require large training datasets and are not robust to distribution shifts (Jalal et al., 2021; Darestani et al., 2021). Generative priors also require large amounts of data to train the generative model, and generally perform less well than trained reconstruction networks, while also having a slower inference time (Darestani et al., 2021). They have however the advantage of being robust to some distribution shifts, such as changes in the distribution of masks ω, and their inference is more interpretable than that of end-to-end deep learning methods, as the conditioning on the observations is explicitly computed from the prior rather than through a black box. Compressed sensing (CS), dictionary learning (DIL) and deep image prior (DIP) are generally slower at inference, as they all rely on iterative reconstruction and require the optimization of increasingly large sets of variables: from the image in CS, to the image and dictionary in DIL, to the parameters of a deep network in DIP. However, DIP is generally found to outperform CS-based approaches. The trade-offs between DIL, DIP and generative priors are however less clear, as these methods have not been extensively compared in the literature.

Overall, this survey highlights a few of the general trends observed in the literature and is not exhaustive. It does show a clear trend where different methods try to exploit the redundancy due to the structure in the data. Earlier works attempted to explicitly embed this structure by compactly representing the signal in a variety of different bases, such as the wavelet domain (Lustig et al., 2007), or by representing the signal as low-rank (Lingala et al., 2011). Later approaches aimed rather at learning how to leverage this structure from data, using the powerful tools of deep learning to represent the signal in complex ways, following a data-driven paradigm. All these methods rely however on the idea that the underlying structure of the signal allows alleviating the ill-posedness of the inverse problem and retrieving a signal from undersampled observations. In the next chapter, we will detail the evolution of the mask designs ω used to generate the observations, and how they have been optimized in order to provide the most informative observations and enable the best reconstruction downstream.

4 Optimizing sampling patterns

In the previous chapter, we discussed the evolution of reconstruction methods for accelerated MRI, where a shift from model-driven approaches to data-driven approaches was observed. A similar shift can be observed in the design of sampling patterns.

4.1 A taxonomy of mask design methods

While the evolution of reconstruction methods can generally be described along the axis of moving from model-based approaches towards learning-based ones, the picture is less clear for mask design methods. It is nonetheless possible to consider two axes of discussion that run in parallel to the evolution witnessed in reconstruction methods. These two axes pertain to the two main components needed to optimize sampling patterns (or masks, or trajectories). One needs i) a prior distribution from which candidate patterns are drawn, and ii) a criterion, i.e. a metric to quantify their performance. We will see that at the onset of compressed sensing, sampling patterns were drawn from heuristically designed parametric distributions, a setting widely known as variable-density sampling, and gradually moved towards distributions tailored to a specific dataset. Similarly, the criteria used to establish the quality of a mask were initially mathematical quantities such as coherence, but they became increasingly related to the reconstruction performance enabled by subsampling data using a given mask. One can clearly see a shift from i) model-based approaches towards learning-based ones, and from ii) model-driven criteria towards data-driven ones.

Remark. We consider the distribution from which candidate sampling patterns are drawn to be the base of the sampling optimization procedure, and the criterion to be what drives it.

In the rest of this chapter, we will survey the literature from the first application of compressed sensing to MRI onwards, to show how mask design evolved from model-based, model-driven approaches towards learning-based, data-driven approaches. We will conclude by formalizing the problems linked with optimizing mask design in this context.

4.2 Compressed sensing and MRI: Model-based, model-driven approaches

4.2.1 The contribution of Lustig et al. (2007)

With the onset of Compressed Sensing (CS), random sampling patterns were used, as they allow obtaining noise-like, incoherent aliasing, as illustrated in Figure 3.1. This was indeed theoretically motivated: random sampling easily achieves incoherent sampling, which in turn means that a small number of measurements is sufficient to exactly solve Problem 3.1 for R(x) = ∥Wx∥₁ with high probability.

Definition. Let (A, W) be two orthonormal bases of C^p, where A is the observation domain, and W is the sparsifying (representation) basis.
Then, the coherence between these bases is defined as (Candès and Wakin, 2008)

\[ \mu(A, W) = \sqrt{p} \cdot \max_{1 \le i, j \le p} |\langle a_i, w_j \rangle|. \tag{4.1} \]

The coherence measures the maximal correlation between any element of A and W. µ(A, W) ∈ [1, √p], where a coherence of 1 implies maximal incoherence between the bases, and √p maximal coherence. Coherence then allows bounding the minimal number of measurements N required for exactly solving 3.1 with high probability. Indeed, for a given signal x ∈ C^p observed in domain A, assuming that x is S-sparse in the transformed domain W, a typical compressive sensing result (Candes and Romberg, 2007) dictates that, given N random measurements, exact reconstruction will occur with high probability when

\[ N \ge C \cdot S \cdot \mu^2(A, W) \cdot \log p \tag{4.2} \]

for some positive constant C. While the theory of CS obtains results for random measurements, practitioners have quickly sought to optimize the measurements, and a natural initial criterion has been to minimize the coherence between measurements, i.e. to select a sampling pattern ω with |ω| = N that minimizes µ(Aω, W) (Lustig et al., 2007).

However, when applying compressed sensing to MRI, it was quickly observed that purely random undersampling of the Fourier space is not practical, for several reasons (Lustig et al., 2007, 2008): i) it does not satisfy hardware and physical constraints, which require smooth trajectories in Fourier space and trajectories robust to system imperfections; ii) it does not consider the energy distribution of the signal in k-space, which is very concentrated around the center.

For this reason, the seminal work of Lustig et al. (2007) proposed to use a variable-density sampling approach, where the k-space is less undersampled near the origin (low frequency, high energy region) and more towards the periphery (high frequency but low energy). (The name variable-density sampling originates from the very fact that k-space is not uniformly subsampled, but rather with a variable density; Tsai and Nishimura, 2000.) They used a distribution centered at the origin of k-space, where the probability of acquiring a location decays at a polynomial rate as a function of the distance to the origin, namely

\[ P(k_x, k_y) = \left(1 - \frac{\sqrt{k_x^2 + k_y^2}}{p}\right)^{d} \tag{4.3} \]

with decay d > 1, and −p/2 ≤ kx, ky ≤ p/2. This heuristic was proposed to imitate the decay in the spectrum observed on real data. The mask to be used is then estimated by a Monte-Carlo procedure: a set of masks is randomly sampled from Equation 4.3, and the mask with minimal coherence is retained.

The idea proposed by Lustig et al. (2007) of using variable-density sampling for MRI was not totally novel, as variable-density sampling had previously been used in the MRI community to design undersampling trajectories with incoherent aliasing when using linear reconstructions (Marseille et al., 1996; Tsai and Nishimura, 2000). However, Lustig et al. (2007) were the first to integrate this heuristic within the framework of CS. It should be clear from this presentation that the approach of Lustig et al. (2007) is a model-based approach, as the masks are sampled from a heuristic probability distribution (eq. 4.3), and also a model-driven approach, as the criterion used to select the masks is coherence, a mathematical structure prescribed by the theory of compressed sensing.

4.2.2 Model-based, model-driven sampling in the sequel

A similarly model-based, model-driven approach was followed by Gamper et al. (2008), which considers dynamic MRI, although starting from a different initial mask: they attempted to randomize a deterministic mask design proposed by Tsao et al. (2003) to achieve incoherent sampling. The prior in this case is given by a structured mask.
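A small NumPy sketch of the Monte-Carlo procedure of Section 4.2.1: candidate masks are drawn from the density of Equation 4.3 and ranked with a simple PSF-sidelobe surrogate standing in for the coherence criterion; the grid size, decay and number of candidates are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 64  # grid size; k-space coordinates range over [-p/2, p/2)

# Variable-density distribution of Equation 4.3 with polynomial decay d.
kx, ky = np.meshgrid(np.arange(p) - p // 2, np.arange(p) - p // 2, indexing="ij")
d = 4
density = (1 - np.sqrt(kx**2 + ky**2) / p) ** d
density /= density.sum()

def sample_mask(density, n_samples):
    """Draw a mask of n_samples k-space locations (without replacement) from the density."""
    idx = rng.choice(density.size, size=n_samples, replace=False, p=density.ravel())
    mask = np.zeros(density.size); mask[idx] = 1
    return mask.reshape(density.shape)

def max_sidelobe(mask):
    """Incoherence surrogate: largest PSF sidelobe relative to its main lobe,
    a stand-in for the coherence criterion used to rank candidate masks."""
    psf = np.abs(np.fft.ifft2(mask))
    return np.sort(psf.ravel())[-2] / psf.max()

# Monte-Carlo selection: draw candidate masks and keep the most incoherent one.
candidates = [sample_mask(density, n_samples=p * p // 4) for _ in range(20)]
best = min(candidates, key=max_sidelobe)
print("best relative sidelobe:", round(float(max_sidelobe(best)), 3))
```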
Some refinements to variable-density sampling (VDS) were developed for instance in (Wang et al., 2009), but in the sequel, two broad approaches were followed to improve the design of sampling patterns. The first leveraged learning-based approaches, and will be discussed in the next section, while the second attempted to develop theoretical models that more closely captured the structure of real-world signals.

The work of Wang and Arce (2009) is a first step in this direction, where improved VDS functions are proposed by exploiting a priori statistical information of real-world images in the wavelet domain. One can notice that this approach chose to tie the sampling pattern specifically to one type of regularization, namely sparsity in the wavelet domain, and obtains a better sampling density for this problem but loses in generality over competing approaches. The work of Puy et al. (2011) proposes a coherence-driven optimization of the variable-density sampling procedure. Instead of relying on a heuristic distribution such as Equation 4.3, the procedure proposes an objective that yields a VDS distribution minimizing the coherence between the measurement and representation bases. A large body of subsequent works (Krahmer and Ward, 2013; Roman et al., 2014; Adcock et al., 2015, 2017) relies on additional theoretical tools such as local coherence, asymptotic sparsity, and multilevel sampling to give solid ground to heuristics such as VDS that show good performance but lack theoretical justification.

4.3 Towards learning-based, data-driven approaches

In parallel to these works, there was a motivation to move towards learning-based approaches, due to some weaknesses in the VDS approach of Lustig et al. (2007). One of them is that the VDS distribution of Equation 4.3 is a heuristic that requires tuning to work well on different types of data, and heuristically tuned variable-density distributions generally performed better than the ones solely prescribed by CS theory (Chauffert et al., 2013).

Learning-based, model-driven sampling. A first work by Knoll et al. (2011b) aimed at solving this issue by constructing the VDS distribution from data rather than from a tuned heuristic. They simply averaged the spectra of several similar images, normalized the result, and sampled their mask from the resulting distribution, minimizing coherence to select the final mask, as in Lustig et al. (2007). This approach was then improved by Zhang et al. (2014); Vellagoundar and Machireddy (2015).

Early learning-based, data-driven approaches. Ravishankar and Bresler (2011a); Liu et al. (2012) proposed similar methods where they aimed at designing a sampling pattern that minimized the reconstruction error in Fourier space, rather than coherence. Both algorithms partition k-space into blocks, and iteratively design a mask by removing samples from low-error regions in order to re-assign them to high-error regions. Liu et al. (2012) made the important point that the design of a mask should be reconstruction-based and not observation-based.
Until then, approaches did not include the reconstruction algorithm into the design of the mask, but relied on model-based criteria relying on the observations, such as coherence. This insight proved to be impactful in the performance 34 4.4. A MATTER OF PERFORMANCE: HOW DO THESE METHODS COMPARE TO EACH OTHER? of the methods, and while no quantitative comparison was available in the paper, further works confirmed the impact of including the reconstruction algorithm in the mask design (Zijlstra et al., 2016). While these methods capture the paradigm that will be overwhelmingly dominant in the deep learning era, there was little follow-up work until much later, around 2018. The case of Seeger et al. (2010). The work of Seeger et al. (2010) lies in its own category. This approach builds on a sequential Bayesian experiment design perspective for sparse linear models (Seeger et al., 2007; Seeger and Nickisch, 2008). First a model of the posterior distribution p(x|yω ) is built, and then, a sampling pattern is iteratively built by successively acquiring the readouts that have the highest entropy. This approach is able to sequentially tailor the sampling pattern to the need of an unseen image, given the previously acquired measurements. However, the approach features a quite high computational cost for a single image. Liu et al. (2012) raise that is not practical, as the matrix inversion required to compute the next measurement that must be acquired is too slow to match the readout time of a scanner. While methods to optimize sequentially the sampling mask are common in the deep learning era, the approach of Seeger et al. (2010) is unique in its time, as all other works focused on optimizing the sampling pattern for a fixed sampling rate, and not in a sequential fashion. While more principled than other approaches, and using maximum entropy as the criterion to optimize the acquisition trajectory, this method remains model-based (using a Laplace prior) and model-driven (maximum entropy). 4.4 A matter of performance: how do these methods compare to each other? Very few of the works discussed until then feature an extensive comparison of performance of their proposed approaches. Most works rely on visual comparisons, or state the weaknesses of previous methods, but do not compare against them. In this regard, the contributions that bring a comparison of various approaches to mask design are particularly valuable, and in this section, we will discuss two of them, namely Chauffert et al. (2013) and Zijlstra et al. (2016). The work of Chauffert et al. (2013) aims at designing a better VDS density, while also providing a comparison between theoretically informed sampling patterns, and the ones obtained using the polynomial of Equation 4.3 tuned at various degrees d ∈ {1, 2, . . . , 6}, and a hybrid distribution mixing some theoretically prescribed components and heuristics. It is interesting to observe that in their data, the tuned hybrid distribution performs best, followed by the tuned polynomial and the optimal theoretically predicted distribution. This theoretically optimal distribution is based on the arguments from Rauhut (2010) uses 35 CHAPTER 4. OPTIMIZING SAMPLING PATTERNS rejection of already acquired locations (Puy et al., 2011) and minimizes coherence, while the polynomial was tuned on the data using directly the final evaluation metric, namely PSNR. 
The authors conclude that the theory would need to capture additional structure of real-world images for sampling patterns obtaining better practical performance. The article of Zijlstra et al. (2016) compares the performance of coherence-based VDS using Equation 4.3, of Knoll et al. (2011b), of Liu et al. (2012) and of VDS optimized to minimize normalized root mean squared error (NRMSE). The results show that coherence-based VDS is generally outperformed by Knoll et al. (2011b), which is matched by the optimized VDS. Both are outperformed by the method of Liu et al. (2012). The conclusion is however careful: When putting the presented results in perspective, it is important, to remember that data-driven optimization of an undersampling pattern for CS is just one aspect of the CS implementation. Data-driven approaches should be taken to optimize other aspects of the CS implementation, such as the sparse domain and the reconstruction algorithm. Overall, while these works do not exhaustively compare the existing methods in the literature, they give us reasons to view learning-based, data-driven methods as the most direct avenue towards improvement in mask design optimization. 4.5 The advent of deep learning: Learning-based, datadriven approaches Interest for learning-based, data-driven approaches to mask optimization was renewed around 2018, with the advent of the first deep learning-based reconstruction methods applied to MRI (Sun et al., 2016; Schlemper et al., 2017; Mardani et al., 2018). This enabled the possibility to scale up the complexity of mask design methods, as deep learning methods enable fast reconstruction times, which was not the case for CS methods. The first work that we discuss in this section comes slightly before this time and is the one of Baldassarre et al. (2016), where the authors propose a first principled, learning-based approach to optimizing a deterministic sampling pattern given a set of training signals. Note that in our taxonomy, their approach is both learning-based and data-driven. While the optimization only considers the energy of the signal in the case of Fourier sampling, and while the paper on linear reconstruction, the authors show improvement about the sampling patterns prescribed in Lustig et al. (2007); Roman et al. (2014). However, as they optimize only the sampling pattern to capture the energy of the signal and do not include the reconstruction algorithm in their optimization procedure, they fail to obtain significant improvement by turning to non-linear reconstruction methods. 36 4.5. THE ADVENT OF DEEP LEARNING: LEARNING-BASED, DATA-DRIVEN APPROACHES However, the authors published a sequel to this work with Gözcü et al. (2018), that introduced a sequential optimization of the sampling pattern in a fully data-driven fashion. The principle consists, at each step, to evaluate the improvement for a given performance metric of including a candidate location in the final sampling pattern. Once all candidate locations have been evaluated, the algorithms picks the one that brings the largest improvement and moves to the next step. This method was shown to outperform existing approaches, such as the one of Lustig et al. (2007) and Knoll et al. (2011b) by a large margin, due to its approach being both learning-based and data-driven. Part of the improvement is due also to not solely adding sampling locations based on the current error, but computing how adding a location would actually decrease the error in the reconstruction. 
The distinction is important as the method of Gözcü et al. (2018) is able to take into account how the newly acquired location affects the behavior of the non-linear reconstruction algorithm, rather than acquiring locations based on the current reconstruction error. We will describe the setting of Gözcü et al. (2018) in greater depth in Chapter 5 as this is the basis upon which this thesis was built. At roughly the same time as Gözcü et al. (2018), Haldar and Kim (2019) published another work aiming at optimizing sampling, based on an experiment design approach to minimize the Cramér-Rao bound (CRB). The CRB provides a lower-bound to the covariance matrix that is based on the measurement matrix Aω . This approach is independent on the reconstruction algorithm, and provides a different take on a learningbased, model-driven approach, aiming at minimizing the covariance of an estimator. In the context of MRI sampling, this amounts to finding the measurement matrix that will encode the most information about the original signal x. The last non-fully deep learning-based work that we will discuss is the one of Sherry et al. (2019), where a bilevel optimization procedure is proposed: the first level solves an empirical risk minimization problem for the mask, while the second level consists in an iterative algorithm to solve a Problem similar to 3.1. The work takes an original approach whereby relaxing the mask optimization problem into a continuous optimization problem, they are able to tackle the mask optimization and the reconstruction under a unified framework. While the reconstruction remains model-based, the mask stage is clearly learning-based and data-driven, as training data are used, and aim at minimizing some loss between the original image and the reconstruction. Due to the overwhelming performance of deep learning methods, most subsequent works focus exclusively on deep learning-based reconstruction methods, and all follow a learningbased, data-driven paradigm. All methods initialize the prior from which masks are drawn randomly, and aim at refining it throughout training in order to maximize some performance metric. There are however some trends that can be highlighted within these methods. Some of them aim at jointly train a reconstructor and a sampling pattern for some fixed 37 CHAPTER 4. OPTIMIZING SAMPLING PATTERNS sampling budget. Others tackle a problem of designing a mask sequentially, by adding new sampling locations based on the currently acquired information. Finally, some methods aim at training a model that is able to propose a patient-adaptive mask design. These methods often leverage tools from reinforcement learning (RL) and this is why we refer to such models as policy models. The works of Bahadir et al. (2019); Aggarwal and Jacob (2020); Weiss et al. (2020); Huijben et al. (2020) relax the problem of optimizing the mask to a continuous optimization problem, and train the sampling pattern along with the neural network using backpropagation. After training, the mask is then mapped back as a boolean vector. Other approaches use a pre-trained reconstruction model and train a policy model that performs patient-adaptive sequential mask optimization using reinforcement learning (Pineda et al., 2020; Bakker et al., 2020). The approach of Zhang et al. (2019b) is unique, as they aim at training an evaluator that tries to estimate the current reconstruction error for each location in k-space, while using this evaluator to inform the training of the reconstruction model. 
Finally, a fourth category of approaches tackles the challenging problem jointly learning a reconstruction model and a policy model for patient-adaptive sampling (Jin et al., 2019; Van Gorp et al., 2021; Yin et al., 2021). Remark (A short discussion of non-Cartesian sampling methods). The problem of optimizing sampling trajectories becomes more complex when moving to the non-Cartesian realm. The problem is inherently continuous, and there is a lot of freedom in designing non-Cartesian trajectories as long as hardware constraints such as bounded first and second derivatives of the trajectory (Boyer et al., 2016). As a result, there is great diversity in the methods proposed. Traditional approaches such as radial spokes (Lauterbur, 1973) and spirals (Meyer et al., 1992) are typically model-based, model-driven, but more recent space filling trajectories like SPARKLING (Lazarus et al., 2019) fall under this category. Weiss et al. (2019); Chaithya et al. (2022) attempted to train non-Cartesian sampling and reconstruction jointly, in an end-to-end fashion, yielding learning-based, data-driven approaches, while Wang et al. (2021) proposed to parametrize the trajectories with B-spline to facilitate learning, resulting in a model-based data-driven method. Finally, (Chaithya et al., 2021) also proposed to refine the SPARKLING algorithm by learning from data the underlying variable-density distribution, yielding a learning-based, model-driven approach. These approaches were compared in Chaithya and Ciuciu (2022), where additional challenges due to non-Cartesian sampling are highlighted. In this setting, it appears that the algorithm used to optimize the sampling trajectory greatly matters, as it is possible for trajectories to remain stuck close to initialization. As a result, merely being a learning-based, data-driven approach does not guarantee an improved performance over model-based model-driven alternatives. Nonetheless, the best 38 4.6. FORMALIZING LEARNING-BASED, DATA-DRIVEN MASK DESIGN performing method was found to be the hybrid learning approach of Chaithya et al. (2022). 4.6 Formalizing learning-based, data-driven mask design Recall the observation setting described in Section 2.2.1. We consider the problem of accelerated MRI, where we obtain partial information yω ∈ CP about a ground truth signal x ∈ CP yω = A ω x + ϵ = P ω F x + ϵ where ϵ a complex Gaussian noise vector. F is the Fourier transform operator, and Pω is our masking operator, implemented as a diagonal matrix with diagonal entries Pω,ii = 1 if i ∈ ω and 0 otherwise. ω denotes the sampling mask and |ω| = N denotes is cardinality. As described in Chapter 3, we construct an estimate x̂ of the ground truth x using a reconstruction algorithm fθ such that x̂ = fθ (y, ω), where θ denote the parameters of the reconstruction algorithm. This can correspond to the regularization parameter of a compressed sensing method, or denote all the parameters of a neural network. A learning-based approach requires access to a data distribution p(x) that can give us access to reference images, while a data-driven approach requires a way to quantify how close an estimation x̂ is from the ground truth x. This is done by using a performance metric η(·, ·) : CP × CP → R, which we will aim to maximize2 . The process is illustrated on Figure 4.1. 
An ideal sampling algorithm would tailor the mask to each instance of x ∼ p(x), solving

\[ \max_{\omega : |\omega| \le N} \ \eta\big(x, \hat{x}_\theta(y_\omega)\big), \qquad y_\omega = P_\omega F x. \tag{4.4} \]

However, such an approach is evidently infeasible in practice, because solving Equation 4.4 would require access to the ground truth image x at the time of evaluation. This is not the case in reality, where fully sampled signals are only available for training. We now turn to the two main approaches to this problem that have been discussed before. They rely on either i) using non-adaptive masks, designed offline on a training set and deployed at testing time, or ii) using adaptive masks, where a sampling policy or heuristic πϕ is trained offline and then takes patient-adaptive decisions during the acquisition at evaluation time. (A similar reasoning would hold if one considered a loss ℓ(·, ·) : C^P × C^P → R instead of a performance metric; in this case, one would need to find the sampling mask that minimizes it.)

Figure 4.1: Overview of sampling in accelerated MRI, using a knee image. Acquisition physically happens sequentially in Fourier space (full readout lines acquired at once), and a sampling policy can decide ahead of time what sampling mask to use (non-adaptive sampling), or integrate information from the current reconstruction to decide on the next location to acquire (adaptive sampling). Blue dashed lines indicate optional relations: not all policies and reconstruction methods rely on training data, and not all policies are patient-adaptive. The light grayed area indicates that we do not have access to the full ground truth and spectrum at evaluation time, but rather directly acquire undersampled data.

4.6.1 Non-adaptive sampling masks

In this first setting, given access to a set of training data {x₁, . . . , x_m} ∼ p(x), we aim at finding the mask that exhibits the best average performance, namely

\[ \max_{\omega : |\omega| \le N} \ \frac{1}{m}\sum_{i=1}^{m} \eta\big(x_i, f_\theta(y_{\omega,i}; \omega)\big) \quad \text{s.t.} \quad y_{\omega,i} = A_\omega x_i + \epsilon. \tag{4.5} \]

However, solving Equation 4.5 remains challenging, due to the combinatorial nature of the problem. It also raises questions of generalization, namely whether a mask ω that shows good performance on the training set will also show good performance on a testing set of unseen samples from the same distribution p(x). While standard learning-theoretic arguments can answer the generalization question (Gözcü et al., 2018), the combinatorial nature of the problem dooms any naive approach to failure, as the total number of different masks grows exponentially, with a complexity O(2^N). Two main types of approaches have been proposed for Problem 4.5, relying either on tackling the combinatorial problem in an approximate fashion (Gözcü et al., 2018; Zibetti et al., 2020), or on relaxing the mask to a continuous variable that is then optimized using gradient-based methods (Bahadir et al., 2019; Aggarwal and Jacob, 2020; Weiss et al., 2020; Huijben et al., 2020; Sherry et al., 2019). Combinatorial approaches typically tackle Problem 4.5 as a sequential problem, starting from some initial mask ω₀ and gradually growing the mask until the sampling budget N is exhausted. On the contrary, continuous approaches choose to directly optimize the mask at the maximal sampling budget N, a setting which we refer to as fixed sampling.
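For concreteness, the sketch below evaluates the empirical objective of Equation 4.5 for one candidate Cartesian mask, using a zero-filled inverse FFT as a stand-in reconstructor fθ and PSNR as the metric η; all names and values are illustrative.

```python
import numpy as np

def psnr(ref, rec):
    """Peak signal-to-noise ratio, one possible choice for the metric eta."""
    mse = np.mean(np.abs(ref - rec) ** 2)
    return 10 * np.log10(np.max(np.abs(ref)) ** 2 / mse)

def zero_filled_recon(y):
    """Stand-in for the reconstructor f_theta: zero-filled inverse FFT."""
    return np.fft.ifft2(y, norm="ortho")

def average_performance(mask, train_images):
    """Empirical objective of Equation 4.5 for a fixed mask (noiseless observations)."""
    scores = []
    for x in train_images:
        y = mask * np.fft.fft2(x, norm="ortho")      # y_omega = A_omega x
        scores.append(psnr(x, zero_filled_recon(y)))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
train_images = [rng.standard_normal((32, 32)) for _ in range(8)]   # toy training set
mask = np.zeros((32, 32)); mask[:, ::4] = 1                        # candidate Cartesian mask
print("average PSNR of this mask:", average_performance(mask, train_images))
```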
4.6.2 Adaptive sampling masks

In contrast to Equation 4.5, adaptive sampling generally relies on a two-step procedure, where one must first train a policy model πϕ and subsequently evaluate it on testing data. The problem of training the policy reads

\[ \max_{\phi} \ \frac{1}{m}\sum_{i=1}^{m} \eta\big(x_i, f_\theta(y_{\omega_T,i})\big), \quad \text{where } \begin{cases} \omega_t = \omega_{t-1} \cup v_t \\ v_t \sim \pi_\phi\big(f_\theta(y_{\omega_{t-1}})\big) \\ |\omega_t| \le N \end{cases} \quad t = 1, \ldots, T. \tag{4.6} \]

Here, T is the number of acquisition rounds that the policy performs. While this problem may look similar to Problem 4.5, there are two fundamental differences. First, the optimization is over the parameters ϕ of the policy model, rather than over the mask itself. Secondly, the problem is inherently sequential, as the mask is gradually built from atoms v_t, t = 1, . . . , T, which are all given by the policy πϕ at different stages of the acquisition. Solving this problem remains difficult, as gradient-based optimization of ϕ is hindered by the sampling operation v_t ∼ πϕ(fθ(y_{ω_{t−1}})), which prevents a direct computation of the gradients. As a result, two main approaches can be found to tackle this problem, relying either, again, on relaxing the mask to a continuous variable (Van Gorp et al., 2021; Yin et al., 2021), or on leveraging reinforcement learning (Bakker et al., 2020; Pineda et al., 2020; Jin et al., 2019), which is naturally suited to searching for the policy that yields the best performance in a changing environment. In our case, the environment is described by the unknown underlying ground truth.

The second stage of adaptive sampling is the inference step, where the trained policy πϕ is used to guide the acquisition of an unseen image x_test:

\[ v_{t+1} \in \arg\max_{v \in [P]} \ \big[\pi_\phi\big(f_\theta(y_{\omega_t,\mathrm{test}})\big)\big]_v \tag{4.7} \]

where [·]_v denotes the v-th entry of the vector, and ω_t = ω_{t−1} ∪ v_t for t = 1, . . . , T. At inference time, the test image is gradually sampled at the locations v_t provided by the policy, based on the reconstruction at the previous step.

Remark (Using heuristics). The inference of Equation 4.7 is not restricted to policies trained following Equation 4.6, but can be paired with different heuristics. For instance, Zhang et al. (2019b) trained an evaluator that estimates the error made by the reconstruction model at the different locations to be acquired. They then use this heuristic for sampling, by acquiring at each step the location that is estimated to have the highest reconstruction error. We refer to this approach as a heuristic rather than a policy because it is not specifically trained to estimate what would be the best location to acquire next; it only trains a model that estimates the current reconstruction error, and then uses it in a greedy fashion. We will expand upon the use of heuristics for adaptive sampling in Chapter 7, where we propose to use the variance of a generative adversarial network (Goodfellow et al., 2014) to perform adaptive sampling.

Remark (Non-sequential adaptive sampling). Although a sequential acquisition is a natural choice for adaptive sampling, it is possible to perform patient-adaptive sampling with a fixed (non-sequential) mask. This was done for instance by Bakker et al. (2021), where, in the case of multi-coil MRI, the authors used the auto-calibration signal (ACS), a small set of measurements that is systematically acquired, to design a patient-adaptive fixed mask in a one-shot fashion. They fed the measurements into a policy model that outputs a single mask for the rest of the acquisition. However, their results surprisingly show that their best performing sampling policies explicitly learn to be non-adaptive, and they thus provide a state-of-the-art non-adaptive sampling mask.
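The loop below sketches the sequential acquisition of Equation 4.7 with a simple heuristic in place of a trained policy πϕ: lines are scored by their average training-set energy, whereas a learned policy would score them from the current reconstruction fθ(y_{ω_t}), making the choice patient-adaptive. Everything here is a toy stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
n, budget = 64, 16

# Heuristic "policy": score each phase-encoding line by the average k-space energy
# observed on a training set. A learned policy pi_phi would instead condition on the
# current reconstruction of the patient being scanned.
train = [rng.standard_normal((n, n)) for _ in range(8)]
line_energy = np.mean(
    [np.abs(np.fft.fft2(x, norm="ortho")).sum(axis=0) for x in train], axis=0)

s_test = np.fft.fft2(rng.standard_normal((n, n)), norm="ortho")   # k-space of the test image

acquired = set()
for t in range(budget):                                   # sequential acquisition loop
    scores = line_energy.copy()
    scores[list(acquired)] = -np.inf                      # never re-acquire a line
    v = int(np.argmax(scores))                            # v_{t+1}: argmax of the policy scores
    acquired.add(v)                                       # omega_{t+1} = omega_t U {v}

mask = np.zeros((n, n)); mask[:, sorted(acquired)] = 1
x_hat = np.fft.ifft2(mask * s_test, norm="ortho")         # reconstruction from acquired lines
print(sorted(acquired))
```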
4.6.3 On the optimality of the discrete mask optimization problem

In the previous sections, we defined Problems 4.4 and 4.5 as optimization problems over masks. However, compressed sensing approaches (Puy et al., 2011; Chauffert et al., 2013) rather optimized the probability distribution from which masks are subsequently drawn. These arguments were also constructed in the context of model-based, model-driven methods, aiming at minimizing coherence. Can we relate the optimization of a probability distribution to the optimization of a discrete mask, as in Problem 4.5? How does the problem of finding an optimal probability distribution change when moving from a model-based, model-driven context to a learning-based, data-driven context? These are the questions that we answer in this section.

While in Chauffert et al. (2013) the optimization was done over the distribution π from which masks are subsequently sampled, in our case the performance metric is optimized directly for a fixed mask rather than a distribution. We will formally show below that in a data-driven context, we can construct an optimal sampling distribution π from an optimal mask: the optimal mask describes the support of an optimal probability distribution.

We model the mask design process as finding a probability mass function (PMF) π ∈ S^{P−1}, where S^{P−1} := {π ∈ [0, 1]^P : Σ_{i=1}^P π_i = 1} is the standard simplex in R^P. π assigns to each location i in k-space a probability π_i of being acquired. The mask is then constructed by drawing without replacement from π until the cardinality constraint |ω| = N is met. We denote this dependency as ω(π, N). The problem of finding the optimal sampling distribution is subsequently formulated as

\[ \max_{\pi \in S^{P-1}} \eta(\pi), \qquad \eta(\pi) := \mathbb{E}_{\substack{\omega(\pi,N) \\ x \sim p(x)}} \big[ \eta\big(x, f_\theta(y, \omega)\big) \big], \tag{4.8} \]

where the index set ω ⊂ [P] is generated from π and [P] := {1, . . . , P}. This problem corresponds to finding the probability distribution π that maximizes the expected performance metric with respect to the data p(x) and the masks drawn from this distribution. To ease the notation, we will use η(x, fθ(y, ω)) ≡ η(x; ω). In practice, we do not have access to E_{p(x)}[η(x; ω)] and instead have at hand the training images {x_i}_{i=1}^m drawn independently from p(x). We therefore maximize the empirical performance by solving

\[ \max_{\pi \in S^{P-1}} \eta_m(\pi), \qquad \eta_m(\pi) := \frac{1}{m}\sum_{i=1}^{m} \mathbb{E}_{\omega(\pi,N)} \big[ \eta(x_i; \omega) \big]. \tag{4.9} \]

Note that Problem 4.9, while being an empirical maximization problem, is distinct from Problem 4.5 above, as it seeks to optimize the distribution from which masks are drawn, and as a result takes an expectation over all the masks drawn from it. Given that Problem 4.9 looks for masks that are constructed by sampling N times without replacement from π, the following holds.

Proposition 1. There exists a maximizer of Problem 4.9 that is supported on an index set of size at most N.
Proof. Let the distribution π̂_N be a maximizer of Problem 4.9. We are interested in finding the support of π̂_N. Because Σ_{|ω|=N} Pr[ω] = 1, note that

max_{π∈S^{P−1}} η_m(π) := max_{π∈S^{P−1}} Σ_{|ω|=N} (1/m) Σ_{i=1}^m η(x_i; ω) · Pr[ω|π]
≤ max_{π∈S^{P−1}} max_{|ω|=N} (1/m) Σ_{i=1}^m η(x_i; ω)
= max_{|ω|=N} (1/m) Σ_{i=1}^m η(x_i; ω).

Let ω̂_N be an index set of size N that maximizes the last line above. The above holds with equality when Pr[ω̂_N] = 1 and Pr[ω] = 0 for ω ≠ ω̂_N, and π = π̂_N. This in turn happens when π̂_N is supported on ω̂_N. That is, there exists a maximizer of Problem 4.9 that is supported on an index set of size N. □

While this observation does not indicate how to find this maximizer, it nonetheless allows us to simplify Problem 4.9. More specifically, the observation that a distribution π̂_N has a compact support of size N implies the following:

Proposition 2.
Problem 4.9 ≡ max_{|ω|=N} (1/m) Σ_{i=1}^m η(x_i; ω). (4.10)

Proof. Proposition 1 tells us that a solution of Problem 4.9 is supported on a set of size at most N, which implies

Problem 4.9 ≡ max_{π∈S^{P−1} : |supp(π)|=N} η_m(π). (4.11)

That is, we only need to search over compactly supported distributions π. Let S_Γ denote the standard simplex on a support Γ ⊂ [P]. It holds that

Problem 4.11 ≡ max_{|Γ|=N} max_{π∈S_Γ} η_m(π)
= max_{|Γ|=N} max_{π∈S_Γ} (1/m) Σ_{i=1}^m η(x_i; Γ) · Pr[Γ|π]
= max_{|Γ|=N} max_{π∈S_Γ} (1/m) Σ_{i=1}^m η(x_i; Γ)
= max_{|Γ|=N} (1/m) Σ_{i=1}^m η(x_i; Γ). (4.12)

To obtain the second and third equalities, one observes that all masks have a common support Γ with N elements, i.e. π ∈ S_Γ allows only for a single mask ω with N elements, namely ω = Γ. □

The framework of Problem 4.9 captures most variable-density based approaches of the literature that are defined in a learning-based fashion (Knoll et al., 2011b; Ravishankar and Bresler, 2011a; Vellagoundar and Machireddy, 2015; Haldar and Kim, 2019; Bahadir et al., 2019; Sherry et al., 2019), and Proposition 2 shows that Problem 4.12, which we tackled in Gözcü et al. (2018); Gözcü et al. (2019); Sanchez et al. (2020a) and develop in Chapter 5, also aims at solving the same problem as these probabilistic approaches. Note that while the present argument considers sampling points in the Fourier space, it is readily applicable to the Cartesian case, where full lines are added to the mask at once.

Several additional observations and remarks can be made. The argument presented here is mainly a justification of the objective of Equation 4.5 that we use for a greedy mask optimization: when trying to find the best performing mask, we do not aim at solving a fundamentally different problem than the one of finding the optimal sampling density in a data-driven approach. Indeed, Proposition 2 shows that we do not need to find a density, but that finding the support of cardinality N of an optimal distribution is sufficient. And Proposition 1 shows the existence of an optimal distribution with such a support.

The argument does not formally prove the suboptimality of variable-density sampling, although one can argue for it from the intuition that these results bring. Indeed, any distribution that is not compactly supported on a set of N optimal locations leaves open the possibility of a suboptimal mask being drawn when sampling without replacement. The heuristic of picking sampling locations at random without replacement could be a cause of the worse practical performance of VDS compared to LBCS, as we will see in Section 5.4 of the next chapter.
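The mechanics behind Propositions 1 and 2 can be illustrated with a toy numerical check: once a PMF π is supported on exactly N locations, drawing N samples without replacement always returns the same mask, so the expectation over ω(π, N) in Problem 4.9 collapses to the purely combinatorial objective of Problem 4.10. The snippet below is only an illustration of this collapse; the probabilities and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
P, N = 8, 3
pi = np.zeros(P)
pi[[1, 4, 6]] = [0.2, 0.5, 0.3]      # PMF supported on exactly N locations

def draw_mask(pi, N, rng):
    # omega(pi, N): draw N locations without replacement from pi
    return frozenset(rng.choice(len(pi), size=N, replace=False, p=pi))

masks = {draw_mask(pi, N, rng) for _ in range(1000)}
print(masks)   # a single mask, {1, 4, 6}: the expectation in (4.9) is degenerate
```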
Finally, these propositions do not prescribe any particular algorithm that should be used to solve Problem 4.5. Rather, they are limited to establishing an equivalence between problems, and do not speak either about the expected generalization of these solutions. Generalization is however addressed by standard learning theoretic arguments (Gözcü et al., 2018).

Bibliographic note. Propositions 1 and 2 are due to Armin Eftekhari.

5 Scalable learning-based sampling optimization

In this chapter, we present our extensions to the framework of Gözcü et al. (2018). We primarily aim at drastically improving the scalability of their method. We begin by presenting in detail their learning-based CS (LBCS) framework in Section 5.1. In Section 5.2, we take a short detour through the field of submodular function maximization, from which Gözcü et al. (2018) took inspiration, and discuss how algorithms that have been proposed in this context could help improve the LBCS method. In Section 5.3, we present our two improvements over LBCS, namely lazy LBCS and stochastic LBCS. We then carry out extensive validation of these methods in Sections 5.4 and 5.5, and discuss the results in Section 5.6.¹

¹ The work in this chapter is based on the following publications: Gözcü, B., Sanchez, T., and Cevher, V. (2019). Rethinking sampling in parallel MRI: A data-driven approach. In 27th European Signal Processing Conference (EUSIPCO). Sanchez, T., Gözcü, B., van Heeswijk, R. B., Eftekhari, A., Ilıcak, E., Çukur, T., and Cevher, V. (2020a). Scalable learning-based sampling optimization for compressive dynamic MRI. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8584–8588.

5.1 Learning-based Compressive Sensing (LBCS)

LBCS aims at designing a non-adaptive mask in a sequential fashion, and tackles Problem 4.5. Solving this problem in a naive, brute-force manner would only be possible in very simple cases, as the total number of masks grows exponentially, at a rate O(2^P). Therefore, one must use approximate solutions to this problem. Gözcü et al. (2018) proposed a greedy, parameter-free approach, where a mask is grown sequentially by adding elements to it in an iterative fashion. This approach, called Learning-Based Compressed Sensing (LBCS), reduces the number of configurations evaluated to a complexity of O(P²). The simplified procedure is illustrated on Figure 5.1.

Figure 5.1: Illustration of the LBCS greedy approach. On the right, we start from an empty mask, and select the line that yields the best performance. Once it is acquired, we keep this mask, try adding to it the line that yields the best performance given what has been acquired, and continue until the final sampling budget is reached. On the left, we see how the performance is assessed: we use a set D of training data, which are subsampled according to the current mask, then reconstructed and evaluated using the performance metric η. The computation is repeated for each different mask.

A couple of elements must be introduced in order to formally define their algorithm. To implement the constraints of sampling in MRI, typically acquiring full lines in the Fourier space at once, we define a set of subsets S of {1, . . . , P} and assume that the final mask will be constructed as

ω = ∪_{j=1}^t S_j, s.t. S_j ∈ S, (5.1)

for some t > 0.
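For a 2D Cartesian acquisition, each element of S is simply the set of k-space indices covered by one phase-encoding line. The following sketch makes this concrete under the assumption of a row-major vectorization of an H × W k-space grid; the indexing convention is only illustrative and not tied to a specific implementation.

```python
def phase_encode_subset(row, W):
    """Indices in {0, ..., P-1} covered by one Cartesian phase-encoding line,
    assuming a row-major vectorization of the H x W k-space grid."""
    return {row * W + col for col in range(W)}

H, W = 4, 6
S = [phase_encode_subset(r, W) for r in range(H)]   # candidate subsets of Eq. (5.1)
omega = S[0] | S[2]                                 # a mask built from t = 2 lines
```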
Gözcü et al. (2018) also introduce a cost function c(ω) ≥ 0 and a maximal cost Γ that the final mask ω must not exceed. In the case of Problem 4.5, we have c(ω) = |ω| and Γ = N. With these two additions, the LBCS method (Gözcü et al., 2018) is displayed in Algorithm 1. The procedure is very simple: while budget is available (l.2), the algorithm tests all feasible candidates S ∈ S (ll.3-7) and permanently adds to the sampling mask the candidate that yields the largest performance gain (l.8).

Algorithm 1 LBCS
Input: Training data x_1, . . . , x_m, decoder f_θ, sampling subsets S, cost function c, maximum cost Γ
Output: Sampling pattern ω
1: ω ← ∅
2: while c(ω) ≤ Γ do
3:   for S ∈ S such that c(ω ∪ S) ≤ Γ do
4:     ω′ = ω ∪ S
5:     For each j, set y_j ← P_{ω′} F x_j, x̂_j ← f_θ(y_j, ω′)
6:     η(ω′) ← (1/m) Σ_{j=1}^m η(x_j, x̂_j)
7:   end for
8:   ω ← ω ∪ S*, S* = arg max_{S : c(ω∪S)≤Γ} η(ω ∪ S) − η(ω)
9: end while
10: return ω

The greedy approach of LBCS brings multiple advantages over previous heuristics such as variable-density sampling. First, the mask ω has a nested structure: by recording the order in which elements were added to the mask, it is possible to immediately adapt to different costs Γ. This feature is not available in many approaches that directly optimize the sampling mask for a given sampling rate. In addition, this approach does not have any parameter that requires tuning, making it particularly easy to implement in practice. It also does not rely on any probability distribution from which candidate masks are drawn, which makes it fully learning-based. Finally, the algorithm is not tied to a specific reconstruction method or performance metric, which makes it easy to integrate in novel approaches.

5.2 LBCS motivation: Submodularity

The idea of applying a greedy approach to the problem of mask design was initially motivated by techniques used in the field of submodular function maximization (Baldassarre et al., 2016; Krause and Golovin, 2014). We discuss these concepts in greater depth here, as our approaches for scaling up LBCS are also motivated by algorithms proposed in the context of submodular function maximization.

Submodular functions have been studied for a long time (Nemhauser et al., 1978; Minoux, 1978), and formalize among other things the idea of diminishing returns, i.e. the idea that adding an element to a smaller set increases the objective more than when it is added to a larger set. Formally:

Definition (Submodularity). A set function η(ω) mapping subsets ω ⊆ [P] to real numbers is said to be submodular if, for all ω_1, ω_2 ⊂ [P] with ω_1 ⊆ ω_2 and all i ∈ [P], it holds that
η(ω_1 ∪ {i}) − η(ω_1) ≥ η(ω_2 ∪ {i}) − η(ω_2). (5.2)
The function is said to be modular if the same holds true with equality in place of inequality.

An example of this is sensor placement. As one places more sensors to cover an area more fully, the areas covered by the sensors will often overlap, and so adding a new sensor will bring less information if a large set of sensors is already available. This is illustrated on Figure 5.2. This typically translates into problems of submodular function maximization, where a common constraint lies on the cardinality of the solution:

max_{ω⊆[P]} η(ω) subject to |ω| ≤ N. (5.3)
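To make the greedy selection of Algorithm 1 and the notion of diminishing returns concrete, the toy sketch below greedily maximizes a coverage objective, in the spirit of the sensor example of Figure 5.2, under the cardinality constraint of Problem 5.3. The coverage function merely stands in for η; in LBCS, evaluating a candidate instead requires reconstructing the training data.

```python
def coverage(omega, regions):
    # Toy submodular objective: number of distinct points covered by the
    # selected sensors (cf. Figure 5.2); a stand-in for eta(omega).
    return len(set().union(*(regions[i] for i in omega))) if omega else 0

def greedy(regions, N):
    """Greedy maximization under a cardinality constraint, mirroring Algorithm 1."""
    omega = set()
    while len(omega) < N:
        gains = {i: coverage(omega | {i}, regions) - coverage(omega, regions)
                 for i in range(len(regions)) if i not in omega}
        omega.add(max(gains, key=gains.get))     # largest marginal gain
    return omega

regions = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}]      # toy "sensor footprints"
print(greedy(regions, 2))                        # {0, 2}: covers 7 points
```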
In the case of linear reconstruction in MRI, it can be shown that the problem of finding the sampling mask that gives minimal mean squared error is a modular optimization problem: there are no diminishing returns, and each component can be optimized individually (Baldassarre et al., 2016). In the case of non-linear reconstruction in MRI, there is no formal proof that submodularity exactly holds, but empirical results show a diminishing return throughout the acquisition. This is what motivated Gözcü et al. (2018) to leverage tools from submodular optimization for mask design in MRI.

Figure 5.2: Illustration of the diminishing return property of submodular functions. If η quantifies the total coverage of the sensors, we see that adding sensor 3 to sensor 1 increases the total coverage more than adding it to the set {1, 2}, as there is more overlap in coverage in the latter case.

Indeed, greedy algorithms have a special place in submodular optimization because of the celebrated result by Nemhauser et al. (1978), who proved that the greedy algorithm provides a good approximation to the optimal solution of NP-hard instances of submodular function maximization (Krause and Golovin, 2014).

Theorem 1 (Nemhauser et al. (1978)). Let η : 2^{[P]} → R_+ be a nonnegative monotone submodular function. Let ω_t, t ≥ 0 be the sequence obtained by the greedy algorithm. Then, for all positive t, N,
η(ω_t) ≥ (1 − e^{−t/N}) max_{ω:|ω|≤N} η(ω). (5.4)
In particular, for t = N, η(ω_N) ≥ (1 − 1/e) max_{ω:|ω|≤N} η(ω).

For many classes of submodular functions, this result is the best that can be achieved with any efficient algorithm (Krause and Golovin, 2014). In particular, Nemhauser and Wolsey (1978) proved that any algorithm that can only evaluate η on a polynomial number of sets will not be able to obtain an approximation guarantee better than (1 − 1/e). This property could be key to explaining the strong empirical performance of the LBCS algorithm of Gözcü et al. (2018) in the context of MRI.

Although the approximation guarantee cannot be improved in general, many works have proposed more efficient approaches than the simple greedy algorithm. Two works are of particular interest to us, as they serve as the motivation for the algorithms that we will describe in the next section. These algorithms are the lazy greedy (Minoux, 1978) and the stochastic greedy, called lazier than lazy greedy (Mirzasoleiman et al., 2015).

The lazy greedy algorithm relies on the fact that if, at the i-th iteration of the greedy algorithm, an element S is expected to bring a marginal benefit ∆(S|ω_i) = η(ω_i ∪ S) − η(ω_i), then it holds that ∆(S|ω_j) ≤ ∆(S|ω_i) for all j ≥ i for a submodular function η. Exploiting this fact, one can keep a list of upper bounds on the marginal benefits of each element S, denoted ρ(S) and initialized at +∞. Then, at each iteration, the element S with the largest value ρ(S) is picked and updated as ρ(S) = ∆(S|ω). If ρ(S) ≥ ρ(S′) ∀S′ ∈ S, the marginal benefit of this element S is at least as large as the upper bounds of every other marginal contribution, and it is consequently added permanently to ω. This can speed up the algorithm by several orders of magnitude, since ρ(S) ≥ ρ(S′) can be satisfied after a few trials, instead of trying each available S at every iteration (Krause and Golovin, 2014).
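The bookkeeping described above, and applied to mask design in Algorithm 3 later on, can be sketched with a max-heap of upper bounds. In the sketch, `marginal_gain(S, omega)` is a placeholder for the expensive oracle ∆(S|ω) (in LBCS, one reconstruction pass over the training data); the function names are illustrative only.

```python
import heapq

def lazy_greedy(candidates, marginal_gain, N):
    """Lazy greedy selection (Minoux, 1978), sketched with a max-heap of bounds."""
    omega = []
    heap = [(-float("inf"), idx, S) for idx, S in enumerate(candidates)]
    heapq.heapify(heap)                                # rho(S) = +inf initially
    while len(omega) < N and heap:
        neg_rho, idx, S = heapq.heappop(heap)          # candidate with largest bound
        gain = marginal_gain(S, omega)                 # refresh rho(S)
        if not heap or -gain <= heap[0][0]:            # rho(S) >= every other bound
            omega.append(S)                            # accept S permanently
        else:
            heapq.heappush(heap, (-gain, idx, S))      # re-insert, bound tightened
    return omega
```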
The stochastic greedy algorithm (Mirzasoleiman et al., 2015) samples, at each step of the greedy algorithm, a subset of candidates S_iter ⊆ S with |S_iter| = k at line 3 of Algorithm 1. As a result, given a nonnegative monotone submodular function η : 2^{[P]} → R_+ and taking k = (P/N) log(1/ϵ), the stochastic greedy algorithm achieves a (1 − 1/e − ϵ) approximation guarantee in expectation to the optimum, using only O(P log(1/ϵ)) evaluations.

Both of these algorithms inspired the methods that we proposed to scale up LBCS, and their implementation in the context of MRI will be covered in Sections 5.3.3 and 5.3.2 respectively.

Remark (Submodular optimization under noise). Although the discussion so far mostly considered optimizing submodular functions without measurement noise, such a setting is not realistic for prospective MRI acquisitions, where repeated acquisitions of a line will have different noise realizations, and resampling can improve the average signal. Nonetheless, submodular optimization under noise has been studied, and algorithms that adapt to such settings have been proposed (Singla et al., 2016; Hassidim and Singer, 2017).

5.3 Scaling up LBCS

5.3.1 Limitations of LBCS

As we have hinted above, in many practical scenarios, evaluating O(P²) configurations remains very expensive. This is also due to the fact that each configuration requires reconstructing m data points. In particular, the LBCS approach becomes prohibitively expensive when any of the following occurs:
1. The number of candidate locations grows significantly. This is particularly the case in 3D MRI, where one needs to undersample entire volumes, or in dynamic MRI, where one undersamples multiple frames for a single 2D image.
2. The size of the dataset grows, making it impractical to evaluate LBCS on the whole training set at once.
3. The running time of the reconstruction method becomes large. This is particularly the case for multi-coil MRI, which is the setting most commonly used in practice. This issue also occurs when dealing with larger data, from 3D or dynamic MRI.

As a result, we wish to develop variants of LBCS that can tackle these different settings, while retaining the performance of LBCS as much as possible. We will discuss two variants of LBCS that were respectively proposed in Gözcü et al. (2019) and Sanchez et al. (2020a), and that mostly aim at tackling the issues of multi-coil and 3D MRI as well as dynamic MRI. We start by explaining how they differ from LBCS.

5.3.2 Stochastic LBCS

With stochastic LBCS (sLBCS), we aim at addressing two fundamental drawbacks of the approach of Gözcü et al. (2018), namely that 1. its scaling is quadratic with respect to the total number of readout lines, and 2. its scaling is linear with respect to the size of the dataset. With this approach, we also aim at integrating a specific constraint of dynamic MRI (dMRI), namely that the sampling mask should have a fixed number of readouts per temporal frame.
Scalability with respect to the size of the dataset is essential for the applicability of sLBCS to modern datasets, which can contain several thousands of images, making it impractical to go through the entire dataset at each inner loop of the algorithm.

Before describing the algorithm, let us briefly recall the dynamic MRI setting. The main difference with static MRI is that the signal is a vectorized video, i.e. x ∈ C^P, where P = H × W × T instead of H × W. As a result, the operator A_ω will be constructed differently, and in particular, the Fourier transform will be applied frame-wise. In addition, the mask ω will be constructed as a union of elements from a set S that represents lines at different time frames, described as S = S_1 ∪ . . . ∪ S_T. We also define the set L ⊆ {1, . . . , m} as a set of indices used to subsample the training data {x_i}_{i=1}^m.

Let us now detail how our proposed stochastic greedy method solves limitations 1. and 2. of LBCS (Algorithm 1). Issue 1. is solved by picking uniformly at random, at each iteration, a batch of possible readout lines S_iter of constant size k from a given frame S_t, instead of considering the full set of possible lines S (line 3 in Algorithm 2); this is inspired by the stochastic greedy of Mirzasoleiman et al. (2015). The number of candidate locations is further reduced by iterating through the lines to be added from each frame S_t sequentially (lines 1, 3 and 11 in Algorithm 2). Issue 2. is addressed by considering a fixed batch of training data L of size l instead of the whole training set of size m at each iteration (line 4 in Algorithm 2).

Remark. We will discuss the impact of introducing the batch size k for the readout lines and the subset L subsampling the training data in the results. However, we want to mention here that one has to be careful to pick the subset L before the for loop of line 5 in Algorithm 2. Failing to do so would result in the different candidates S being evaluated on different subsets of the data. As there is some variation within the data, for instance in terms of SNR, such an approach would yield performances that are not comparable among different candidates. In turn, this would skew the results towards picking the candidate location S with an associated subset L_S with high signal-to-noise ratio, at the expense of selecting a readout location for the performance improvement that it actually brings. This would be especially the case if |L| were small. Picking L before this for loop ensures that the readout line selected will indeed be the one that brings the largest performance gain among the candidates for the subset of data L.

The improvements of sLBCS reduce the computational complexity from Θ(mr(HT)²) to Θ(lrkHT), effectively speeding up the computation by a factor Θ((m/l) · (HT/k)), where r denotes the cost of a single reconstruction. Our results show that this is achieved without sacrificing any reconstruction quality. We will see that the improvement is particularly significant for large datasets, as the batch size l typically remains small, up to 100 samples, whereas MRI datasets can contain more than 15 000 data points (Zbontar et al., 2019).
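As an illustration of this factor, consider the cardiac setting used below (H = 152 phase-encoding lines per frame, T = 17 frames, m = 3 training volumes) with the batch sizes adopted later (k = 38, l = 1); the resulting theoretical speedups are the ones reported in parentheses in Table 5.1:

(m/l) · (HT/k) = 3 · (152 · 17 / 38) = 3 · 68 = 204 for SG-v2, and HT/k = 68 for SG-v1, where l = m.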
Algorithm 2 sLBCS: Greedy mask optimization algorithm for (d)MRI
Input: Training data {x_i}_{i=1}^m, decoder f_θ, sampling set S, max. cardinality n, sampling batch size k, training batch size l
Output: Sampling pattern ω
1: (dMRI) Initialize t = 1
2: while |ω| ≤ n do
3:   Pick S_iter at random, with |S_iter| = k, where S_iter ⊆ S (MRI) or S_iter ⊆ S_t (dMRI)
4:   Pick L ⊆ {1, . . . , m}, with |L| = l
5:   for S ∈ S_iter such that |ω ∪ S| ≤ n do
6:     ω′ = ω ∪ S
7:     For each ℓ ∈ L, set x̂_ℓ ← f_θ(P_{ω′} F x_ℓ, ω′)
8:     η(ω′) ← (1/|L|) Σ_{ℓ∈L} η(x_ℓ, x̂_ℓ)
9:   end for
10:  ω ← ω ∪ S*, where S* = arg max_{S∈S_iter : |ω∪S|≤n} η(ω ∪ S)
11:  (dMRI) t = (t mod T) + 1
12: end while
13: return ω

5.3.3 Lazy LBCS

In cases where the cardinality of the candidate set S grows extremely large, even Algorithm 2 can become impractical. In 2D MRI, LBCS deals with images of size H × W while having a candidate set of size H, with H different lines. In the case of 3D MRI, however, the number of candidate locations explodes, as we move to images of size P = H × W × S, where S is the number of slices, with a candidate set of size H × W, i.e., one is allowed to sample in both phase and frequency encoding directions. While in 2D dynamic MRI the candidate set is of size H × T, with T typically much smaller than H, in the case of 3D MRI we often have H ≈ W. In addition, we have less structure than in dMRI, as there is no option to cycle through frames. This implies that the batch of randomly sampled locations must be much larger to enable a good performance, which results in the cost remaining prohibitive overall. An additional reason for this increased cost is the reconstruction time of 3D methods. The number of slices can be much larger than the number of frames in dMRI, slowing down reconstruction considerably. Moving from single coil to multi-coil data further increases this cost, although this change does not increase the complexity of the sampling optimization problem.

Lazy LBCS proposes a solution to this limitation by leveraging another approach to speed up LBCS. It uses lazy evaluations (Minoux, 1978), which are equivalent to, but faster than, the greedy algorithm for submodular functions, and possess optimization guarantees for such functions (Nemhauser et al., 1978). While in our present setting verifying submodularity is difficult, because writing the performance of reconstructions in a closed form is generally not possible, we were motivated by the fact that Algorithm 3 works well in practice even if the objective function is not exactly submodular. Indeed, as we will observe in the experiments, lazy evaluations also perform well for mask optimization.

The implementation of lazy evaluations for LBCS is described in Algorithm 3. A list of upper bounds is initially computed (l.2), and we sequentially traverse this list until we find a candidate sample S that brings a larger improvement than any of the upper bounds in ρ (ll.8-9).

Algorithm 3 Lazy Learning-based Compressive Sensing (lLBCS)
Input: Training data x_1, . . . , x_m, decoder f_θ, sampling subsets S, cost function c, maximum cost Γ
Output: Sampling pattern ω
1: ω ← ∅
2: ρ(S) ← +∞ ∀S ∈ S s.t. c(ω ∪ S) ≤ Γ
3: while c(ω) ≤ Γ do
4:   ω′ ← ω ∪ S, where S = arg max_{S′∈S : c(ω∪S′)≤Γ} ρ(S′)
5:   For each j, set y_j ← P_{ω′} F x_j, x̂_j ← f_θ(y_j, ω′)
6:   η(ω′) ← (1/m) Σ_{j=1}^m η(x_j, x̂_j)
7:   ρ(S) ← η(ω′) − η(ω)
8:   if ρ(S) ≥ ρ(S′) ∀S′ ∈ S s.t. c(ω ∪ S′) ≤ Γ then
9:     ω = ω ∪ S
10:  end if
11: end while
12: return ω
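To summarize this section, the sketch below condenses the selection loop of the stochastic variant (Algorithm 2, in its static-MRI form without frame cycling); the lazy variant (Algorithm 3) instead replaces the random candidate batch with the upper-bound bookkeeping sketched earlier. Here, `reconstruct` and `metric` are placeholders for the decoder f_θ and the performance metric η, and the data handling is deliberately simplified.

```python
import numpy as np

def slbcs(train, candidates, reconstruct, metric, n, k, l, rng=None):
    """Condensed sketch of the sLBCS loop (Algorithm 2, static-MRI variant)."""
    rng = rng or np.random.default_rng()
    omega, remaining = set(), list(candidates)
    while len(omega) < n and remaining:
        # S_iter: a random batch of k candidate lines (line 3 of Algorithm 2)
        batch = rng.choice(len(remaining), size=min(k, len(remaining)), replace=False)
        # L: a random batch of l training images, drawn once per iteration (line 4)
        data = rng.choice(len(train), size=min(l, len(train)), replace=False)

        def score(line):  # empirical performance with the candidate line added (ll. 5-9)
            trial = omega | {line}
            return np.mean([metric(train[i], reconstruct(train[i], trial))
                            for i in data])

        best = max((remaining[j] for j in batch), key=score)   # line 10
        omega.add(best)
        remaining.remove(best)
    return omega
```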
5.4 Experiments - Stochastic LBCS

In this section, we validate the performance of sLBCS; lLBCS will be discussed in Section 5.5. We carry out extensive experiments to validate our stochastic LBCS method. We first compare sLBCS against the vanilla LBCS method of Gözcü et al. (2018), paying particular attention to how it compares to the original method in terms of performance and computational complexity. We study the influence of the batch sizes k and l on performance and on the design of the mask. Then, we carry out experiments on both single coil and multi-coil data, as well as noise-free and noisy data. Finally, we show the benefit of our method on large scale datasets, for which LBCS is prohibitively expensive.

5.4.1 Experimental setting

Datasets. Our experiments were carried out on three different datasets.

• Cardiac dataset. The dataset was acquired in seven healthy adult volunteers with a balanced steady-state free precession (bSSFP) pulse sequence on a whole-body Siemens 3T scanner using a 34-element matrix coil array. Several short-axis cine images were acquired during a breath-hold scan. Fully sampled Cartesian data were acquired using a 256 × 256 grid, with relevant imaging parameters including 320 × 320 mm field of view (FoV), 6 mm slice thickness, 1.37 × 1.37 mm spatial resolution, 42.38 ms temporal resolution, 1.63/3.26 ms TE/TR, 36° flip angle, and 1395 Hz/px readout bandwidth. There were 13 phase encodes acquired for a frame during one heartbeat, for a total of 25 frames after the scan. The Cartesian cardiac scans were then combined into single-coil data from the initial 256 × 256 × 25 × 34 size, using adaptive coil combination (Walsh et al., 2000; Griswold et al., 2002b). This single-coil image was then cropped to a 152 × 152 × 17 image. This was done because a large portion of the periphery of the images is static or void, and also to enable greater computational efficiency. In the experiments, we used three volumes for training and four for testing.

• Vocal dataset. The vocal dataset that we used in the experiments comprised 4 vocal-tract scans with a 2D HASTE sequence (T2-weighted single-shot turbo spin-echo) on a 3T Siemens Tim Trio using a 4-channel body matrix coil array. The study was approved by the local institutional review board, and informed consent was obtained from all subjects prior to imaging. Fully sampled Cartesian data were acquired using a 256 × 256 grid, with 256 × 256 mm field of view (FoV), 5 mm slice thickness, 1 × 1 mm spatial resolution, 98/1000 ms TE/TR, 150° flip angle, 391 Hz/px readout bandwidth, and 5.44 ms echo spacing (256 turbo factor). There was a total of 10 frames acquired, which were also recombined into single-coil data using adaptive coil combination (Walsh et al., 2000; Griswold et al., 2002b).

• fastMRI. The fastMRI dataset was obtained from the NYU fastMRI initiative (Zbontar et al., 2019). The anonymized dataset comprises raw k-space data from more than 1,500 fully sampled knee MRIs obtained on 3 and 1.5 Tesla magnets. The dataset includes coronal proton density-weighted images with and without fat suppression.

Reconstruction algorithms. We consider two reconstruction algorithms, namely k-t FOCUSS (KTF) (Jung et al., 2009) and ALOHA (Jin et al., 2016). Their parameters were selected to maintain a good empirical performance across all sampling rates considered. KTF aims at reconstructing a dynamic sequence by enforcing its sparsity in the x-f domain.
Formally, it aims at solving

min_z ∥z∥_1 subject to ∥y_ω − P_ω F F_t^{−1} z∥_2 ≤ ϵ,

where F_t^{−1} denotes the inverse temporal Fourier transform. It tackles the problem using a reweighted quadratic approach (Jung et al., 2007). ALOHA (Jin et al., 2016) is an annihilating filter based low-rank Hankel matrix approach. ALOHA exploits the duality between sparse signals and their transform domain representations, cast as a low-rank Hankel matrix, and effectively transforms CS reconstruction into a low-rank matrix completion problem. We refer the reader to the original work for a detailed derivation.

Mask selection baselines:
• Coherence-VD (Lustig et al., 2007): We consider a random variable-density sampling mask with Gaussian density and optimize its parameters to minimize coherence. This is an instance of a model-based, model-driven approach.
• Golden (Li et al., 2018): This approach designs a Cartesian mask by progressively adding lines according to the golden ratio. This is a model-based, model-driven approach.
• LB-VD (Gözcü et al., 2018; Gözcü et al., 2019): Instead of minimizing the coherence as in Coherence-VD, we perform a grid search on the parameters using the training set to optimize reconstruction according to the same performance metric as our method. This makes this approach model-based, data-driven.

We also consider several variants of LBCS and sLBCS that we will use throughout the experiments. We define abbreviations for conciseness:
• G-v1 is the original LBCS algorithm.
• G-v2 is a version of sLBCS that uses a batch of training data l and cycles through the frames.
• SG-v1 is a version of the stochastic LBCS algorithm that considers a batch of sampling locations k but uses all training data, i.e. l = m.
• SG-v2 is sLBCS, using a batch of sampling locations k, a batch of training data l and cycling through the frames.

Remark. In this work, we mainly focused on small scale datasets with a large set of candidate sampling locations. Our experiments are therefore carried out using SG-v1 and SG-v2. We did not observe any significant performance difference between these variants of sLBCS. However, we note that G-v2 might be of interest in deep learning based settings, where one deals with large datasets, such as fastMRI (Zbontar et al., 2019). This will be further discussed in Section 5.4.5.

5.4.2 Comparison of greedy algorithms

We first evaluate SG-v1 and SG-v2 against G-v1 to establish how sensitive our proposed method is to its parameters k and l. We carry out this experiment on the cardiac data, and report only the results on KTF, although similar trends can be observed with other reconstruction algorithms.

Figure 5.3: PSNR as a function of the sampling rate for KTF, comparing the effect of the batch size on the quality of the reconstruction for SG-v1. The result is averaged on 4 testing images of size 152 × 152 × 17.

Influence of the batch size k on the mask design. We first discuss the tuning of the batch size used in SG-v1, to specifically study the effect of different batch sizes. Figure 5.3 provides quantitative results of the performance, while Figure 5.4 shows the shapes of the masks obtained with various batch sizes. Unsurprisingly, small batch sizes yield poor results, as the algorithm has to choose from a very small set of randomly sampled candidates, and is likely to miss out on important frequencies, such as the center ones.
It is interesting to see, however, that the PSNR reaches the result of G-v1 with as few as 38 samples (out of the 152 candidates of a frame, and 152 × 17 = 2584 candidates overall). In some cases, such as k = 38, SG-v1 can even yield an improved performance over G-v1, with 60 times less computation than G-v1. We hypothesize that this is due to the noise introduced by sLBCS allowing it to avoid suboptimal configurations encountered by G-v1. Note that this might not be expected if the problem were exactly submodular, but this is not the case in practice.

Focusing on Figure 5.4, one can make several observations. First of all, as expected, taking a batch size of 1 yields a totally random mask, and taking a batch size of 5 yields a mask that is more centered towards low frequencies than the one with k = 1, but it still has a large variance. Then, as the batch size increases, the resulting masks seem to converge to very similar designs, although these are slightly different from the ones obtained with G-v1. Overall, our initial experiments suggest, as a rule of thumb, that sampling roughly 25% of a frame enables sLBCS to match the performance of G-v1.

Figure 5.4: Learning-based masks obtained with SG-v1 for different batch sizes k using KTF as a reconstruction algorithm, shown in the title of each column, for 15% and 25% sampling rate. The optimization used data of size 152 × 152 × 17, with a total of 2584 possible phase encoding lines for the masks to pick from.

Influence of the batch size l. We then include the effect of using a smaller data batch size l into the equation, comparing the performances of G-v1 with SG-v1 and SG-v2. The results are presented on Figure 5.5. Both SG-v1 and SG-v2 behave similarly, and for k = 38, both methods end up outperforming G-v1, with respectively 60 times and 180 times less computation. As seen before, using a batch size k that is too small (e.g. 10) yields a drop in performance. Using a batch of training images l (SG-v2) does not seem to significantly reduce the performance compared to SG-v1, while substantially reducing computations. As a result, in the sequel, we use k = 38 and l = 1 for SG-v2.

Figure 5.5: PSNR as a function of the sampling rate for KTF, comparing the different reconstruction methods as well as the effect of the batch size on the quality of the reconstruction for SG.

Computational costs. We now discuss the computational costs for the different variations of the greedy methods used in the single coil experiments. Table 5.1 provides the running times and empirically measured speedups for the greedy variations, and Table 5.2 provides the computational times required to obtain the learning-based variable-density (LB-VD) parameters through an extensive grid search. The empirical speedup is computed as

Speedup = (t_G-v1 · n_procs,G-v1) / (t_SG-v2 · n_procs,SG-v2), (5.5)

where t_G-v1, t_SG-v2 refer to the measured running times of the algorithms, and n_procs,G-v1, n_procs,SG-v2 to the number of CPU processes used to carry out the computations.
Algorithm | Setting | G-v1 (LBCS): Time | n_procs | SG-v1 (sLBCS-v1): Time | n_procs | Speedup | SG-v2 (sLBCS-v2): Time | n_procs | Speedup
KTF | 152×152×17×3 | 6d 23h | 152 | 11h 40 | 38 | 58 (68) | 3h 25 | 38 | 170 (204)
KTF | 256×256×10×2 | ∼ 7d 8h∗ | 256 | 12h 20 | 64 | 57 (68) | 5h 20 | 64 | 173 (204)
IST | 152×152×17×3 | 3d 11h | 152 | 5h 30 | 38 | 60 (68) | 1h 37 | 38 | 184 (204)
ALOHA | 152×152×17×3 | ∼ 25d 1h∗ | 152 | 1d 14h 25 | 38 | 62 (68) | 18h 13 | 38 | 133 (204)

Table 5.1: Running time of the greedy algorithms for different decoders and training data sizes. The setting corresponds to n_x × n_y × n_frames × n_train. The number n_procs denotes how many parallel processes are used by each simulation. ∗ means that the runtime was extrapolated from a few iterations. We used k = n_procs for SG-v1 and SG-v2, and l = 1 for SG-v2. The speedup columns contain the measured speedup and the theoretical speedup in parentheses.

Algo. | Setting | n_pars | n_procs | Time
KTF | 152×152×17×3 | 1200 | 38 | 6h 30
KTF | 256×256×10×2∗ | 2400 | 64 | 6h 45
IST | 152×152×17×3 | 1200 | 38 | 3h 20
ALOHA | 152×152×17×3 | 1200 | 38 | 1d 8h

Table 5.2: Comparison of the learning-based random variable-density Gaussian sampling optimization for different settings. n_pars denotes the size of the grid used to optimize the parameters. For each set of parameters, the results were averaged over 20 masks drawn at random from the distribution considered. The n_pars comprise a grid made of 12 sampling rates (uniformly spread in [0.025, 0.3]), 10 different numbers of low frequency phase encodes (from 2 to 18 lines), and different widths of the Gaussian density (uniformly spread in [0.05, 0.3]) – 10 for the images of size 152 × 152, 20 in the other case.

The main point of these tables is to show that the computational improvement is very significant in terms of resources, and that our approach greatly improves the efficiency of the method of Gözcü et al. (2018). The measured ratio might differ from the predicted speedup factor of Θ((m/l) · (HT/k)) due to computational considerations. Table 5.1 shows that there is roughly a factor 1.2 between the predicted and the measured speedup, mainly due to the communication between the multiple processes as well as I/O operations.

Moving to the grid search computations for LB-VD, displayed in Table 5.2, several remarks are in order. The number of parameters was chosen in order to ensure that the same order of magnitude of computations is carried out for both LB-VD and sLBCS. Note also that, in contrast to SG-v2, the grid search cannot be sped up by using a batch of training data: all evaluations must be done on the same data. In the case of SG-v2, however, one can draw a new batch of training data at each sampling round of the algorithm.

5.4.3 Single coil results

Main results. We see on Figures 5.6 and 5.7 that SG-v2 brings a consistent improvement over all baselines considered, even though some variable-density techniques are able to provide good results for some sampling rates and algorithms. Compared to Coherence-VD, there is always at least a 1 dB improvement at any sampling rate, and it can be as much as 6.7 dB at 5% sampling rate for ALOHA. For the rest, the improvement is over 0.5 dB. Figure 5.6 also clearly indicates that the benefits of our learning-based framework become more apparent towards higher sampling rates, where the performance improvement over LB-VD reaches up to 1 dB. Towards lower sampling rates, with much fewer degrees of freedom to achieve good mask designs, the greedy method and LB-VD yield similar performance, which is expected, since at low sampling rates learning-based approaches will prioritize low-frequency phase encoding lines, as those contain most of the energy at a given frame.
Figure 5.6: PSNR as a function of sampling rate, evaluating the different masks (Coherence-VD, LB-VD, SG-v2-KTF, SG-v2-ALOHA) on both reconstruction methods (KTF and ALOHA). The results are averaged over 4 images.

Focusing on Figure 5.7, we observe that comparing reconstruction algorithms using the model-based, model-driven approach Coherence-VD does not allow for a faithful comparison of their performance. In this case, the performance difference between KTF and ALOHA is marginal. However, when moving to the model-based, data-driven approach LB-VD, the difference becomes quantitatively clearer. Moving to the learning-based, data-driven approach SG-v2 makes the difference even more noticeable: ALOHA with its corresponding mask clearly outperforms KTF, and this conclusion could not be reached by looking solely at reconstructions with VD-based masks. This trend is also reflected on Figure 5.6.

This is an important observation, as most works have developed new reconstruction algorithms that were then tested using random masks designed with criteria such as minimal coherence. However, our results suggest that this approach does not reflect the true performance of a reconstruction method, and that one should look to optimize the mask and the reconstruction algorithm jointly.

Figure 5.7: Comparison of the different reconstruction masks and decoders, for a sampling rate of 15% on a single sample, with its PSNR/SSIM performances.

The results on Figure 5.6 also highlight an interesting asymmetry between the performance of the SG-v2 mask optimized on KTF (SG-v2-KTF) and on ALOHA (SG-v2-ALOHA): the mask obtained with KTF is more centered towards low frequencies than the one obtained with ALOHA, and it also performs well on ALOHA, better than LB-VD. However, the reverse is not true: the mask from ALOHA is very scattered throughout k-space and leads to a very poor performance when applied to KTF. This trend is also observed on multi-coil data, as highlighted on Table 5.3. This suggests that ALOHA, which does not rely explicitly on ℓ1 sparsity, might succeed in exploiting varied information: both the low-frequency content emphasized by masks obtained with sparsity-promoting models, and higher-frequency information.

Remark. This point does not hold directly for deep learning methods, which will be discussed in later chapters. Indeed, the reconstruction methods that we use in these experiments are all based on an underlying model, such as sparsity or low-rankness. They are not data-driven approaches to reconstruction. The question of what truly reflects the performance of a reconstruction method becomes even more complex in the case of deep learning-based reconstruction methods, where the model is trained on a distribution of different sampling masks, making reconstruction and sampling even more intertwined. We will come back to this question in Chapter 6.

Remark.
The results in the sequel are obtained with the SG-v1 algorithm, rather than SG-v2. Recall that SG-v1 already accelerated G-v1 by a factor of 60, and that we did not observe, in our experiments, a significant performance difference between SG-v1 and SG-v2.³

Cross-performances of performance measures. Up to here, we used PSNR as the performance measure, and we now compare it with the results of the greedy algorithm paired with SSIM, a metric that more closely reflects perceptual similarity (Wang et al., 2004). For brevity, we only consider ALOHA in this section. In the case where we optimized for SSIM, we noticed that unless a low-frequency initial mask is given, the reconstruction quality would mostly stagnate. This is why, in the SSIM case, we chose to start the greedy algorithm with 4 low-frequency phase encodes at each frame.

The reconstructions for PSNR and SSIM are shown on Figure 5.8, where we see that the learning-based masks outperform the baselines across all sampling rates, except at 2.5% in the SSIM case. The quality of the results is very close for both masks, but each tends to perform slightly better with the performance metric for which it was trained. The fact that the ALOHA-SSIM result at 2.5% has a very low SSIM is due to the fact that we impose 4 phase encodes across all frames, and the resulting sampling mask at 2.5% is a low-pass mask in this case. A visual reconstruction is provided in Figure 5.9, where we see that there is almost no difference in reconstruction quality, and that the masks remain very similar. Overall, we observe in this case that the performance metric selection does not have a dramatic effect on the

³ The main reason of having these experiments on SG-v1 is that they wer