E QUIVARIANT S CALAR F IELDS FOR M OLECULAR D OCKING WITH FAST F OURIER T RANSFORMS Bowen Jing1 , Tommi Jaakkola1 , Bonnie Berger1 2 CSAIL, Massachusetts Institute of Technology 2 Dept. of Mathematics, Massachusetts Institute of Technology bjing@mit.edu, {tommi, bab}@csail.mit.edu arXiv:2312.04323v1 [q-bio.BM] 7 Dec 2023 1 A BSTRACT Molecular docking is critical to structure-based virtual screening, yet the throughput of such workflows is limited by the expensive optimization of scoring functions involved in most docking algorithms. We explore how machine learning can accelerate this process by learning a scoring function with a functional form that allows for more rapid optimization. Specifically, we define the scoring function to be the cross-correlation of multi-channel ligand and protein scalar fields parameterized by equivariant graph neural networks, enabling rapid optimization over rigid-body degrees of freedom with fast Fourier transforms. The runtime of our approach can be amortized at several levels of abstraction, and is particularly favorable for virtual screening settings with a common binding pocket. We benchmark our scoring functions on two simplified docking-related tasks: decoy pose scoring and rigid conformer docking. Our method attains similar but faster performance on crystal structures compared to the widely-used Vina and Gnina scoring functions, and is more robust on computationally predicted structures. Code is available at https://github.com/bjing2016/scalar-fields. 1 I NTRODUCTION Proteins are the macromolecular machines that drive almost all biological processes, and much of early-stage drug discovery focuses on finding molecules which bind to and modulate their activity. Molecular docking—the computational task of predicting the binding pose of a small molecule to a protein target—is an important step in this pipeline. Traditionally, molecular docking has been formulated as an optimization problem over a scoring function designed to be a computational proxy for the free energy (Torres et al., 2019; Fan et al., 2019). Such scoring functions are typically a sum of pairwise interaction terms between atoms with physically-inspired functional forms and empirically tuned weights (Quiroga & Villarreal, 2016). While these terms are simple and hence fast to evaluate, exhaustive sampling or optimization over the space of ligand poses is difficult and leads to the significant runtime of docking software. ML-based scoring functions for docking have been an active area of research, ranging in sophistication from random forests to deep neural networks (Yang et al., 2022; Crampon et al., 2022). These efforts have largely sought to more accurately model the free energy based on a docked pose, which is important for downstream identification of binders versus non-binders (virtual screening). However, they have not addressed nor reduced the computational cost required to produce these poses in the first place. Hence, independently of the accuracy of these workflows, molecular docking for large-scale structure-based virtual screening remains computationally challenging, especially with the growing availability of large billion-compound databases such as ZINC (Tingle et al., 2023). In this work, we explore a different paradigm and motivation for machine learning scoring functions, with the specific aim of accelerating scoring and optimization of ligand poses for high-throughput molecular docking. To do so, we forego the physics-inspired functional form of commonly used scoring functions, and instead frame the problem as that of learning scalar fields independently associated with the 3D structure of the protein and ligand, respectively. We then define the score to be the cross-correlation between the overlapping scalar fields when oriented according to the ligand pose. While seemingly more complex than existing scoring functions, these cross-correlations can 1 A B ϕP ϕL ⋆ ϕP C ESF ⋆ ESF ϕL Figure 1: Overview of the scalar field-based scoring function and docking procedure. The translational FFT procedure is shown here; the rotational FFT is similar, albeit harder to visualize. (A) The protein pocket and ligand conformer are independently passed through equivariant scalar field networks (ESFs) to produce scalar fields. (B) The fields are cross-correlated to produce heatmaps over ligand translations. (C) The ligand coordinates are translated to the argmax of the heatmap. Additional scalar field visualizations are in Appendix C. be rapidly evaluated over a large number of ligand poses simultaneously using Fast Fourier Transforms (FFT) over both the translational space R3 and the rotational space SO(3). This property allows for significant speedups in the optimization over these degrees of freedom. Scalar fields representing molecules in 3D space have been previously parameterized as neural fields (Zhong et al., 2019), i.e., neural networks taking coordinates of a query point as input and producing field values as output. However, our scalar fields must be defined relative to the ligand (or protein) structure—represented as a graph embedded in 3D space—and be SE(3) equivariant in order to yield an invariant scoring function. To satisfy these requirements, we introduce equivariant scalar field networks (ESFs), which parameterize the scalar fields with message passing E3NNs (Thomas et al., 2018; Geiger & Smidt, 2022). Unlike neural fields, these networks yield a compact representation of the scalar field in a single forward pass and are automatically SE(3)-equivariant. Contrasting with existing ML scoring functions, the computational cost of our method can be amortized at several levels of abstraction, significantly accelerating runtimes for optimized workflows. For example, unlike methods that require one neural network forward pass per pose, our network is evaluated once per protein structure or ligand conformer independently. Post-amortization, we attain translational and rotational optimization runtimes as fast as 160 µs and 650 µs, respectively, with FFTs. Such throughputs, when combined with effective sampling and optimization, could make docking of very large compound libraries feasible with only modest resources. Empirically, we evaluate our method on two simplified docking-related tasks: (1) decoy pose scoring and (2) rigid conformer docking. On both tasks, our scoring function is competitive with—but faster than—the scoring functions of Gnina (Ragoza et al., 2017; McNutt et al., 2021) and Vina (Trott & Olson, 2010) on PDBBind crystal structures and is significantly better on ESMFold structures. We then demonstrate the further advantages of runtime amortization on the virtual screening-like setup of the PDE10A test set (Tosstorff et al., 2022), where—since there is only one unique protein structure—our method obtains a 50x speedup in total inference time at no loss of accuracy. To summarize, our contributions are: • We are the first to propose learning protein-ligand scoring functions based on crosscorrelations of scalar fields in order to accelerate pose optimization in molecular docking. • We formulate a neural network parameterization and training procedure for learning equivariant scalar fields for molecules. • We demonstrate that the performance and runtime of our scoring function holds promise when evaluated on docking-related tasks. 2 2 BACKGROUND Molecular docking. The two key components of a molecular docking algorithm are (1) one or more scoring functions for ligand poses, and (2) a search, sampling, or optimization procedure. There is considerable variation in the design of these components and how they interact with each other, ranging from exhaustive enumeration and filtering (Shoichet et al., 1992; Meng et al., 1992) to genetic, gradient-based, or MCMC optimization algorithms (Trott & Olson, 2010; Morris et al., 1998; McNutt et al., 2021). We refer to reviews elsewhere (Ferreira et al., 2015; Torres et al., 2019; Fan et al., 2019) for comprehensive details. These algorithms have undergone decades of development and have been given rise to well-established software packages in academia and industry, such as AutoDock (Morris & Lim-Wilby, 2008), Vina (Trott & Olson, 2010) and Glide (Halgren et al., 2004). In many of these, the scoring function is designed not only to identify the binding pose, but also to predict the binding affinity or activity of the ligand (Su et al., 2018). In this work, however, we focus on learning and evaluating scoring functions for the rapid prediction of binding poses. ML methods in docking. For over a decade, ML methods have been extensively explored to improve scoring functions for already-docked ligand poses, i.e., for prediction of activity and affinity in structural-based virtual screens (Li et al., 2021; Yang et al., 2022; Crampon et al., 2022). On the other hand, developing ML scoring functions as the direct optimization objective has required more care due the enormous number of function evaluations involved. MedusaNet (Jiang et al., 2020) and Gnina (Ragoza et al., 2017; McNutt et al., 2021) proposed to sparsely use CNNs for guidance and re-ranking (respectively) in combination with a traditional scoring function. DeepDock (MéndezLucio et al., 2021) used a hypernetwork to predict complex-specific parameters of a simple statistical potential. Recently, geometric deep learning models have explored entirely different paradigms for docking via direct prediction of the binding pose (Stärk et al., 2022; Zhang et al., 2022; Lu et al., 2022) or via a generative model over ligand poses (Corso et al., 2023). FFT methods in docking. Methods based on fast Fourier transforms have been widely applied for the related problem of protein-protein docking. Katchalski-Katzir et al. (1992) first proposed using FFTs over the translational space R3 to rapidly evaluate poses using scalar fields that encode the shape complementarity of the two proteins. Later works extended this method to rotational degrees of freedom (Ritchie & Kemp, 2000; Ritchie et al., 2008; Padhorny et al., 2016) and additional scoring terms, such as pairwise electrostatic potentials and solvent accessibility (Gabb et al., 1997; Mandell et al., 2001; Chen & Weng, 2002). Today, FFT methods are a routine step in established proteinprotein docking programs such as PIPER (Kozakov et al., 2006), ClusPro (Kozakov et al., 2017), and HDOCK (Yan et al., 2020), where they enable the evaluation of billions of poses, typically as an initial screening step before further evaluation and refinement with a more accurate scoring function. In contrast, FFT methods have been significantly less studied for protein-ligand docking. While a few works have explored this direction (Padhorny et al., 2018; Ding et al., 2020; Nguyen et al., 2018), these algorithms have not been widely adopted nor been incorporated into established docking software. A key limitation is that protein-ligand scoring functions are typically more complicated than protein-protein scoring functions and cannot be easily expressed as a cross-correlation between scalar fields (Ding et al., 2020). To our knowledge, no prior works have explored the possibility of overcoming this limitation by learning cross-correlation based scoring functions. 3 M ETHOD 3.1 E QUIVARIANT S CALAR F IELDS We consider the inputs to a molecular docking problem to be a pair of protein structure and ligand molecule, encoded as a featurized graphs GP , GL , and with the protein structure associated with P 3×NP alpha carbon coordinates XP = [xP . The molecular docking problem is to 1 , . . . xNP ] ∈ R L L L find the ligand atomic coordinates X = [x1 , . . . xNL ] ∈ R3×NL of the true binding pose. To this end, our aim is to parameterize and learn (multi-channel) scalar fields ϕP := ϕ(x; GP , XP ) and ϕL := ϕL (x; GL , XL ) associated with the protein and ligand structures, respectively, such that the scoring function evaluated on any pose XL ∈ R3NL is given by XZ P P L L L 3 E(XP , XL ) = ϕP (1) c (x; G , X )ϕc (x; G , X ) d x c R3 3 where ϕc refers to the cth channel of the scalar field. While neural fields that directly learn functions R3 → R have been previously developed as encodings of molecular structures (Zhong et al., 2019), such a formulation is unsuitable here as the field must be defined relative to the variable-sized structure graphs GP , GL and transform appropriately with rigid-body motions of their coordinates. Instead, we propose to parameterize the scalar field as a sum of contributions from each ligand atom or protein alpha-carbon, where each contribution is defined by its coefficients in a spherical harmonic expansion centered at that atom (or alpha-carbon) coordinate in 3D space. To do so, we choose a set Rj : R+ → R of radial basis functions (e.g., Gaussian RBFs) in 1D and let Ymℓ be the real spherical harmonics. Then we define   X x − xn ϕc (x; G, X) = Acnjℓm (G, X)Rj (∥x − xn ∥)Yℓm (2) ∥x − xn ∥ n,j,ℓ,m where here (and elsewhere) we drop the superscripts L, P for common definitions. Given some constraints on how the vector of coefficients Acnjℓm transforms under SE(3), this parameterization of the scalar field satisfies the following important properties: Proposition 1. Suppose the scoring function is parameterized as in Equation 2 and for any R ∈ P ℓ ℓ SO(3), t ∈ R3 we have Acnjℓm (G, R.X + t) = m′ Dmm ′ (R)Acnjℓm′ (G, X) where D (R) are the (real) Wigner D-matrices, i.e., irreducible representations of SO(3). Then for any g ∈ SE(3), 1. The scalar field transforms equivariantly: ϕc (x; G, g.X) = ϕc (g −1 .x; G, X). 2. The scoring function is invariant: E(g.XP , g.XL ) = E(XP , XL ). See Appendix A for the proof. We choose to parameterize Acnjℓm (G, R.X) with E3NN graph neural networks (Thomas et al., 2018; Geiger & Smidt, 2022), which are specifically designed to satisfy these equivariance properties and produce all coefficients in a single forward pass. The core of our method consists of the training of two such equivariant scalar field networks (ESFs), one for the ligand and one for the protein, which then parameterize their respective scalar fields. While the second property (invariance of the scoring function) is technically the only one required by the problem symmetries, the first property ensures that different ligand poses related by rigid-body transformations can be evaluated via transformations of the scalar field itself (without re-evaluating the neural network) and is thus essential to our method. Next, we show how this parameterization enables ligand poses related by rigid body motions to some reference pose to be rapidly evaluated with fast Fourier transforms (all derivations in Appendix A). There are actually two ways to do so: we can evaluate the score of all poses generated by translations of the reference pose, or via rotations around some fixed point (which we always choose to be the center of mass of ligand). These correspond to FFTs over R3 and SO(3), respectively. 3.2 FFT OVER T RANSLATIONS We first consider the space of poses generated by translations. Given some reference pose XL , the score as a function of the translation is just the cross-correlation of the fields ϕL and ϕP : XZ X P L L 3 P E(X , X + t) = ϕP (ϕL (3) c (x)ϕc (x − t) d x = c ⋆ ϕc )(t) c R3 c where we have dropped the dependence on G, X for cleaner notation and applied Proposition 1. By the convolution theorem, these cross-correlations may be evaluated using Fourier transforms: n  P o 1 P ϕL F −1 F [ϕL (4) c ⋆ ϕc = c ] · F ϕc 3/2 (2π) Hence, in order to simultaneously evaluate all possible translations of the reference pose, we need to compute the Fourier transforms of the protein and ligand scalar fields. One naive way of doing so would be to explicitly evaluate Equation 2 at an evenly-spaced grid of points spanning the structure and then apply a fast Fourier transform. However, this would be too costly, especially during training time. Instead, we observe that the functional form allows us to immediately obtain the Fourier transform via the expansion coefficients Acnjℓm : X X X (−i)ℓ Acnjℓm Hℓ [Rj ](∥k∥)Yℓm (k/∥k∥) (5) F [ϕc ] (k) = e−ik·xn n ℓ m,n 4 where now Yℓm must refer to the complex spherical harmonics and the coefficients must be transformed correspondingly, and r Z ∞ 2 Hℓ [Rj ](k) = jℓ (kr)Rj (r)r2 dr (6) π 0 is the ℓth order spherical Bessel transform of the radial basis functions. Conceptually, this expression corresponds to first evaluating the Fourier transform of each atom’s scalar field in a coordinate system centered at its location, and then translating it via multiplication with e−ik·xn in Fourier space. Importantly, Hℓ [Rj ] and Yℓm can be precomputed and cached at a grid of points independently of any specific structure, such that only the translation terms and expansion coefficients need to be computed for every new example. 3.3 FFT OVER ROTATIONS We next consider the space of poses generated by rotations. Suppose that given some reference pose XL , the protein and ligand scalar fields are both expanded around some common coordinate system origin using the complex spherical harmonics and a set of global radial basis functions Sj (r): X ϕc (x) = Bcjℓm Sj (∥x∥)Yℓm (x/∥x∥) (7) j,ℓ,m We seek to simultaneously evaluate the score of poses generated via rigid rotations of the ligand, which (thanks again to Proposition 1) is given by the rotational cross-correlation XZ L −1 E(XP , R.XL ) = ϕP x) d3 x (8) c (x)ϕc (R c R3 Cross-correlations of this form have been previously studied for rapid alignment of crystallographic densities (Kovacs & Wriggers, 2002) and of signals on the sphere in astrophysics (Wandelt & Górski, 2001). It turns out that they can also be evaluated in terms of Fourier sums: Z X L −1 ℓ ϕP x) d3 x = dℓmh dℓhn Imn ei(mξ+hη+nω) (9) c (x)ϕc (R R3 ℓ,m,h,n where ξ, η, ω are related to the the Euler angles of the rotation R, dℓ is the (constant) Wigner Dmatrix for a rotation of π/2 around the y-axis, and Z ∞ X ℓ P L Imn = Bcjℓm Bckℓn Gjk where Gjk = Sj (r)Sk (r)r2 dr (10) 0 j,k Thus the main task is to compute the complex coefficients Bcjℓm of the ligand and protein scalar fields, respectively. This is not immediate as the fields are defined using expansions in “local” radial and spherical harmonic bases, i.e., with respect to the individual atom positions as opposed to the coordinate system origin. Furthermore, since we cannot (in practice) use a complete set of radial or angular basis functions, it is generally not possible to express the ligand or protein scalar field as defined in Equation 2 using the form in Equation 7. Instead, we propose to find the coefficients Bcjℓm that give the best approximation to the true scalar fields, in the sense of least squared error. Specifically, suppose that R ∈ RNgrid ×Nlocal are the values of Nlocal real local basis functions (i.e., different origins, RBFs, and spherical harmonics) evaluated at Ngrid grid points and A ∈ RNlocal is the vector of coefficients defining the scalar field ϕc . Similarly define S ∈ RNgrid ×Nglobal using the real versions of the global basis functions. We seek to find the least-squares solution B ∈ RNglobal to the overdetermined system of equations RA = SB, which is given by B = (ST S)−1 ST RA (11) Notably, this is simply a linear transformation of the local coefficients Acnjℓm . Thus, if we can precompute the inverse Gram matrix of the global bases (ST S)−1 and the inner product of the global and local bases ST R, then for any new scalar field ϕc the real global coefficients are immediately available via a linear transformation. The desired complex coefficients can then be easily obtained via a change of bases. At first glance, this still appears challenging due to the continuous space of possible atomic or alpha-carbon positions, but an appropriate discretization makes the precomputation relatively inexpensive without a significant loss of fidelity. 5 3.4 T RAINING AND I NFERENCE We now study how the rapid cross-correlation procedures presented thus far are used in training and inference. For a given training example with protein structure XP , the scoring function E(XP , XL ) ⋆ should ideally attain a maximum at the true ligand pose XL = XL . We equate this task to that of learning an energy based the log-likelihood of the true pose under the model  model to maximize  likelihood p(XL ) ∝ exp E(XP , XL ) . However, as is typically the case for energy-based models, directly optimizing this objective is difficult due to the intractable partition function. Instead, following Corso et al. (2023), we conceptually decompose the ligand pose XL to be a tuple XL = (XC , R, t) consisting of a zero-mean conformer XC , a rotation R, and a translation t, from which the pose coordinates are obtained: XL = R.XC + t. Then consider the following conditional log-likelihoods: Z   log p(t | XC , R) = E(XP , XL ) − log exp E(XP , R.XC + t′ ) d3 t′ (12a) 3 ZR   log p(R | XC , t) = E(XP , XL ) − log exp E XP − t, R′ .XC dR′ (12b) SO(3) We observe that these integrands are precisely the cross-correlations in Equations 3 and 8, respectively, and can be quickly evaluated and summed for all values of t′ and R′ using fast Fourier transforms. Thus, the integrals—which are the marginal likelihoods p(XC , R) and p(XC , t)—are tractable and the conditional log-likelihoods can be directly optimized in order to train the neural network. Although neither technically corresponds to the joint log-likelihood of the pose, we find that these training objectives work well in practice and optimize their sum in our training procedure. At inference time, a rigid protein structure XP is given and the high-level task is to score or optimize candidate ligand poses XL . A large variety of possible workflows can be imagined; however, for proof of concept and for our experiments in Section 4 we describe and focus on the following relatively simple inference workflows (presented in greater detail in Appendix B): • Translational FFT (TF). Given a conformer XC , we conduct a grid-based search over R and use FFT to optimize t in order to find the best pose (XC , R, t). To do so, we compute the Fourier coefficients (Equation 5) of the protein XP once and for each possible ligand orientation R.XC . We then use translational cross-correlations (Equation 3) to find the best translation t for each R and return the highest scoring combination. • Rotational FFT (RF). Given a conformer XC , we conduct a grid-based search over t and P use FFT to optimize R. To do so, we compute the global expansion coefficients Bcjℓm of L the protein X − t relative to each possible ligand position t and once for the ligand XC relative to its (zero) center of mass (Equation 11). We then use rotational cross-correlations (Equation 8) to find the best orientation R for each t and return the highest scoring combination. • Translational scoring (TS). Here we instead are given a list of poses (XC , R, t) and wish to score them. Because the values of R nor t may not satisfy a grid structure, we cannot use the FFT methods. Nevertheless, we can compute the (translational) Fourier coefficients of the protein XP and for each unique oriented conformer R.XC of the ligand using Equation 5. We then evaluate XZ L C −ik·t 3 F[ϕP d k (13) E(XP , R.XC + t) = c ](k) · F[ϕc ( · ; R.X )](k) · e c R3 Since the Fourier transform is an orthogonal operator on functional space, this is equal to the real-space cross-correlation. • Rotational scoring (RS). Analogously, we can score a list of poses (XC , R, t) using the global spherical expansions Bcjℓm . We obtain the real expansion coefficients of the protein relative to each t and for each ligand conformer XC using Equation 11. The score for (XC , R, t) is then given by the rotational cross-correlation X P L ℓ E(XP , R.XC + t) = Bcjℓm (XP − t)Bckℓn (XC )Dmn (R)Gjk (14) c,j,k,ℓ,m,n ℓ where Gjk is as defined in Equation 10 and Dmn are the real Wigner D-matrices. 6 Table 1: Typical runtimes of the computations involved in inference-time scoring and optimization procedures, measured on PDBBind with one V100 GPU. The three sets of rows delineate computations that are protein-specific, ligand-specific, or involve both protein and ligand, respectively. Frequency Per protein structure ,→ Per translation Per ligand conformer ,→ Per rotation Per conformer × rotation Per conformer × translation Per pose Computation TF RF TS RS Runtime Coefficients Acnjℓm FFT coefficients Global expansion Bcjℓm ✓ ✓ ✓ ✓ ✓ ✓ 65 ms 7.0 ms 80 ms Coefficients Acnjℓm Global expansion Bcjℓm FFT coefficients ✓ Translational FFT Rotational FFT Translational scoring Rotational scoring ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ 4.3 ms 17 ms 1.6 ms 160 µs 650 µs 1.0 µs 8.2 µs The runtime of these workflows can vary significantly depending on the parameters, i.e., number of proteins, ligands, conformers, rotations, and translations, with amortizations possible at several levels. Table 1 provides a summary of the computations in each workflow, their frequencies, and typical runtimes. We highlight that the RF workflow is well-suited for virtual screening since the precomputations for the protein and ligand translations within a pocket can be amortized across all ligands. Furthermore, if the ligands are drawn from a shared library, their coefficients can also be precomputed independent of any protein, leaving only the rotational FFT as the cost per ligandprotein pair. See Appendix F for further discussion on runtime amortization. 4 E XPERIMENTS We train and test our model on the PDBBind dataset (Liu et al., 2017) with splits as defined by Stärk et al. (2022). We train two variants of our model: ESF and ESF-N, where the latter is trained with rotational and translational noise injected into the examples to increase model robustness. All other model and inference-time hyperparameters are discussed in Appendix E. For the test set, we consider both the co-crystal structures in the PDBBind test split and their counterpart ESMFold complexes as prepared by Corso et al. (2023). We also collect a test set of 77 crystal structures (none of which are in PDBBind) of phosphodiesterase 10A (PDE10A) with different ligands bound to the same pocket (Tosstorff et al., 2022). This industrially-sourced dataset is representative of a real-world use case for molecular docking and benchmarks the benefits of runtime amortization with our approach. To evaluate our method against baselines, we note that a scoring function by itself is not directly comparable to complete docking programs, which also include tightly integrated conformer search, pose clustering, and local refinement algorithms. Here, however, we focus on the development of the scoring function itself independently of these other components. Thus, we consider two simplified settings for evaluating our model: (1) scoring decoy poses with the aim of identifying the best pose among them, and (2) docking rigid conformers to a given pocket, similar to the re-docking setup in Stärk et al. (2022). The first setting focuses on evaluating only the quality of the scoring function itself, whereas the second is a simplified version of a typical docking setting that circumvents some of the confounding factors while still allowing the benchmarking of FFT-accelerated optimization. For the baselines, we select Gnina (McNutt et al., 2021) as the representative traditional docking software, which runs parallel MCMC chains to collect pose candidates that are then refined and re-ranked to produce the final prediction. Among the scoring functions supported by Gnina, we evaluate its namesake CNN (Ragoza et al., 2017) scoring function and the traditional scoring function of Vina (Trott & Olson, 2010). For recent deep learning baselines, we adapt TANKBind (Lu et al., 2022) and DiffDock (Corso et al., 2023) for decoy-pose scoring. Specifically, with DiffDock we use the provided confidence model—which is natively trained as an <2 Å RMSD classifier—to score all poses. With TANKBind, we use the predicted distance map and ground truth conformer interatomic distances to calculate Lgeneration as the pose score. We do not evaluate these methods on pocket-level conformer docking as they cannot be easily adapted for this task. 7 Table 2: Decoy scoring results (median over test complexes). All RMSDs are heavy-atom symmetry aware. The best results from our method (ESF) are underlined if not bolded. † These methods have been adapted for the pose scoring task; see main text. Crystal structures Method ESMFold structures Time per % % Top Top Top Top <2 Å <2 Å AUROC RMSD Rank <2 Å AUROC RMSD Rank <2 Å Pose Complex Vina Gnina 0.93 0.90 0.54 0.59 2 3 91 83 0.86 0.84 2.43 2.19 419 1110 43 46 3.4 ms 13.0 ms 110 s 426 s TANKBind† DiffDock† 0.69 0.96 4.01 0.66 6811 3 10 87 0.64 0.89 4.22 2.01 8538 143 9 50 1.1 µs 62 ms 62 ms 2041 s ESF-TS ESF-RS ESF-N-TS ESF-N-RS 0.87 0.87 0.92 0.92 0.59 0.63 0.69 0.75 3 3 4 5 87 85 81 80 0.82 0.82 0.87 0.87 1.38 1.75 1.64 1.74 24 22 22 26 57 53 54 53 1.0 µs 8.2 µs 1.0 µs 8.2 µs 3.2 s 5.7 s 3.2 s 5.7 s 4.1 S CORING D ECOYS For each PDBBind test complex, we generate 323 − 1 = 32767 decoy poses by sampling 31 translational, rotational, and torsional perturbations to the ground truth pose and considering all their possible combinations. On median, the RMSD of the closest decoy is 0.4 Å, and 1.6% of all poses (n = 526.5) are below 2 Å RMSD (Appendix D.1). We then score all poses using the baselines and with our method in both TS (Equation 13) and RS (Equation 14) modes. The quality of each scoring function is evaluated with the AUROC when used as a <2Å RMSD classifier, the RMSD of the top-ranked pose (Top RMSD), the rank of the lowest-RMSD pose (Top Rank), and the fraction of complexes for which the top-ranked pose is under 2 Å RMSD. As shown in Table 2, our method is competitive with the Gnina and Vina baselines on crystal structures and better on ESMFold structures. This improved robustness is expected since the interaction terms in traditional scoring functions are primarily mediated by sidechain atoms, which are imperfectly predicted by ESMFold, whereas our scalar fields only indirectly depend on the sidechains via residue-level coefficients. Our method matches and outperforms the DiffDock confidence model on crystal and ESMFold structures, respectively, in terms of identifying the best poses, although DiffDock obtains better <2 Å RMSD AUROC. Finally, our method substantially ourperforms TANKBind in all metrics. Overall, the performance of our method is superior in the TS mode relative to RS, likely due to the spatially coarser representation of the scalar fields in the global spherical harmonic expansion (i.e., Equation 7) relative to the grid-based Cartesian expansion. In terms of runtime per pose, our method is faster than Vina by several orders of magnitude, with even greater acceleration compared to the CNN-based Gnina and the E3NN-based DiffDock, both of which require one neural network evaluation for every pose. The runtime improvement per complex is more tempered since the different proteins and ligand in every complex limit the opportunity for amortization. In fact, of the total runtime per complex in Table 2, only 1% (TS) to 5% (RS) is due to the pose scoring itself, with the rest due to preprocessing that must be done for every new protein and ligand independently. Similar to our method, TANKBind requires only a single neural network evaluation per complex and thus has comparable per-pose runtime, but enjoys faster per-complex runtime due to the lack of such preprocessing. 4.2 D OCKING C ONFORMERS We consider the task of pocket-level docking where all methods are given as input the ground-truth conformer in a random orientation. Following common practice (McNutt et al., 2021), we aim to provide 4 Å of translational uncertainty around the true ligand pose in order to define the binding pocket. To do so, we provide Gnina with a bounding box with 4 Å of padding around the true pose, and provide our method with a cube of side length 8 Å as the search space for t (with a random grid offset). For PDE10A, we define the pocket using the pose of the first listed complex (PDB 5SFS) and cross-dock to that protein structure. In all docking runs, we deactivate all torsion angles so that Gnina docks the provided conformer to the pocket. 8 Table 3: Rigid conformer docking results (median over test complexes). All RMSDs are heavyatom symmetry aware. The median RMSD of our method (ESF) is lower-bounded at 0.5–0.6 Å by the resolution of the search grid (Appendix G.2). The runtime is shown as an average per complex, excluding / including pre-computations that can be amortized. PDBBind test Crystal Method % <2 Å Vina Gnina ESF-TF ESF-RF ESF-N-TF ESF-N-RF ESMFold PDE10A Med. RMSD % <2 Å Runtime % <2 Å Med. RMSD Med. RMSD Runtime 79 77 0.32 0.33 24 28 6.1 5.9 20 s 23 s 74 73 0.75 0.77 6.1 s 6.0 s 70 71 72 73 1.13 0.97 1.10 1.00 31 32 46 47 4.6 4.4 2.9 3.0 0.8 s / 8.3 s 0.5 s / 67 s 0.7 s / 8.2 s 0.5 s / 68 s 67 73 64 70 1.20 0.82 1.11 1.00 1.0 s / 7.1 s 0.5 s / 1.5 s 1.0 s / 7.2 s 0.5 s / 1.5 s As shown in Table 3, the baseline scoring functions obtain excellent success rates on the PDBBind crystal structures (79%). Our method is slightly weaker but also obtains high success rates (73%). The performance decrease in terms of Median RMSD is somewhat larger, likely due to the coarse search grid over non-FFT degrees of freedom (Appendix G.2) and the lack of any refinement steps (which are an integral part of Gnina) in our pipeline. On ESMFold structures, however, our method obtains nearly twice the success rate (47% vs 28%) of the baseline scoring functions. Unlike in decoy scoring, noisy training noticeably contributes to the performance on ESMFold structures, and the RF procedure generally outperforms TF, likely due to the relatively finer effective search grid in rotational cross-correlations (Appendix G.2). Because of the nature of the PDBBind workflow, the total runtime is comparable to or slower than the baselines when precomputations are taken into account. However, in terms of the pose optimization itself, our method is significantly faster than the Gnina baselines, despite performing a brute force search over the non-FFT degrees of freedom. While it is also possible to trade-off performance and runtime by changing various Gnina settings from their default values, our method expands the Pareto front currently available with the Gnina pipeline (Appendix G.3; Figure 8). This favorable tradeoff affirms the practical value-add of our method in the context of existing approaches. To more concretely demonstrate the runtime improvements of our method with amortization, we then dock the conformers in the PDE10A dataset. Our method again has similar accuracy to the baselines (Table 3); however, because of the common pocket, all protein-level quantities are computed only once and the total runtime is significantly accelerated. For the RF procedure in particular, the computation of protein global coefficients on the translational grid is by far the most expensive step (Table 9), and the remaining ligand precomputations are very cheap. The amortization of these coefficients leads to a 45x speedup in the overall runtime (67 s → 1.5 s). (The runtime for Gnina is also accelerated, although to a lesser extent, due to the smaller ligand size.) As the number of ligands increases further, the total runtime per complex of our method would further decrease. 5 C ONCLUSION We have proposed a machine-learned based scoring function for accelerating pose optimization in molecular docking. Different from existing scoring functions, the score is defined as a crosscorrelation between scalar fields, which enables the use of FFTs for rapid search and optimization. We have formulated a novel parameterization for such scalar fields with equivariant neural networks, as well as training and inference procedures with opportunities for significant runtime amortization. Our scoring function shows comparable performance but improved runtime on two simplified docking-related tasks relative to standard optimization procedures and scoring functions. Thus, our methodology holds promise when integrated with other components into a full docking pipeline. These integrations may include multi-resolution search, refinement with traditional scoring functions, and architectural adaptations for conformational (i.e., torsional) degrees of freedom—all potential directions of future work. 9 ACKNOWLEDGMENTS We thank Hannes Stärk, Samuel Sledzieski, Ruochi Zhang, Michael Brocidiacono, Gabriele Corso, Xiang Fu, Felix Faltings, Ameya Daigavane, and Mario Geiger for helpful feedback and discussions. This work was supported by the NIH NIGMS under grant #1R35GM141861 and a Department of Energy Computational Science Graduate Fellowship. R EFERENCES Rong Chen and Zhiping Weng. Docking unbound proteins using shape complementarity, desolvation, and electrostatics. Proteins: Structure, Function, and Bioinformatics, 47(3):281–294, 2002. Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, and Tommi S Jaakkola. Diffdock: Diffusion steps, twists, and turns for molecular docking. In The Eleventh International Conference on Learning Representations, 2023. Kevin Crampon, Alexis Giorkallos, Myrtille Deldossi, Stéphanie Baud, and Luiz Angelo Steffenel. Machine-learning methods for ligand–protein molecular docking. Drug discovery today, 27(1): 151–164, 2022. Xinqiang Ding, Yujin Wu, Yanming Wang, Jonah Z Vilseck, and Charles L Brooks III. Accelerated cdocker with gpus, parallel simulated annealing, and fast fourier transforms. Journal of chemical theory and computation, 16(6):3910–3919, 2020. Jiyu Fan, Ailing Fu, and Le Zhang. Progress in molecular docking. Quantitative Biology, 7:83–89, 2019. Leonardo G Ferreira, Ricardo N Dos Santos, Glaucius Oliva, and Adriano D Andricopulo. Molecular docking and structure-based drug design strategies. Molecules, 20(7):13384–13421, 2015. Henry A Gabb, Richard M Jackson, and Michael JE Sternberg. Modelling protein docking using shape complementarity, electrostatics and biochemical information. Journal of molecular biology, 272(1):106–120, 1997. Mario Geiger and Tess Smidt. e3nn: Euclidean neural networks. arXiv preprint arXiv:2207.09453, 2022. Thomas A Halgren, Robert B Murphy, Richard A Friesner, Hege S Beard, Leah L Frye, W Thomas Pollard, and Jay L Banks. Glide: a new approach for rapid, accurate docking and scoring. 2. enrichment factors in database screening. Journal of medicinal chemistry, 2004. Huaipan Jiang, Mengran Fan, Jian Wang, Anup Sarma, Shruti Mohanty, Nikolay V Dokholyan, Mehrdad Mahdavi, and Mahmut T Kandemir. Guiding conventional protein–ligand docking software with convolutional neural networks. Journal of Chemical Information and Modeling, 60 (10):4594–4602, 2020. Bowen Jing, Gabriele Corso, Jeffrey Chang, Regina Barzilay, and Tommi Jaakkola. Torsional diffusion for molecular conformer generation. arXiv preprint arXiv:2206.01729, 2022. Ephraim Katchalski-Katzir, Isaac Shariv, Miriam Eisenstein, Asher A Friesem, Claude Aflalo, and Ilya A Vakser. Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proceedings of the National Academy of Sciences, 89 (6):2195–2199, 1992. Julio A Kovacs and Willy Wriggers. Fast rotational matching. Acta Crystallographica Section D: Biological Crystallography, 58(8):1282–1286, 2002. Dima Kozakov, Ryan Brenke, Stephen R Comeau, and Sandor Vajda. Piper: an fft-based protein docking program with pairwise potentials. Proteins: Structure, Function, and Bioinformatics, 65 (2):392–406, 2006. 10 Dima Kozakov, David R Hall, Bing Xia, Kathryn A Porter, Dzmitry Padhorny, Christine Yueh, Dmitri Beglov, and Sandor Vajda. The cluspro web server for protein–protein docking. Nature protocols, 12(2):255–278, 2017. Hongjian Li, Kam-Heung Sze, Gang Lu, and Pedro J Ballester. Machine-learning scoring functions for structure-based virtual screening. Wiley Interdisciplinary Reviews: Computational Molecular Science, 11(1):e1478, 2021. Zhihai Liu, Minyi Su, Li Han, Jie Liu, Qifan Yang, Yan Li, and Renxiao Wang. Forging the basis for developing protein–ligand interaction scoring functions. Accounts of Chemical Research, 50 (2):302–309, 2017. Wei Lu, Qifeng Wu, Jixian Zhang, Jiahua Rao, Chengtao Li, and Shuangjia Zheng. Tankbind: Trigonometry-aware neural networks for drug-protein binding structure prediction. Advances in neural information processing systems, 2022. Jeffrey G Mandell, Victoria A Roberts, Michael E Pique, Vladimir Kotlovyi, Julie C Mitchell, Erik Nelson, Igor Tsigelny, and Lynn F Ten Eyck. Protein docking using continuum electrostatics and geometric fit. Protein engineering, 14(2):105–113, 2001. Andrew T McNutt, Paul Francoeur, Rishal Aggarwal, Tomohide Masuda, Rocco Meli, Matthew Ragoza, Jocelyn Sunseri, and David Ryan Koes. Gnina 1.0: molecular docking with deep learning. Journal of cheminformatics, 13(1):1–20, 2021. Oscar Méndez-Lucio, Mazen Ahmad, Ehecatl Antonio del Rio-Chanona, and Jörg Kurt Wegner. A geometric deep learning approach to predict binding conformations of bioactive molecules. Nature Machine Intelligence, 3(12):1033–1039, 2021. Elaine C Meng, Brian K Shoichet, and Irwin D Kuntz. Automated docking with grid-based energy evaluation. Journal of computational chemistry, 13(4):505–524, 1992. Garrett M Morris and Marguerita Lim-Wilby. Molecular docking. In Molecular modeling of proteins, pp. 365–382. Springer, 2008. Garrett M Morris, David S Goodsell, Robert S Halliday, Ruth Huey, William E Hart, Richard K Belew, and Arthur J Olson. Automated docking using a lamarckian genetic algorithm and an empirical binding free energy function. Journal of computational chemistry, 19(14):1639–1662, 1998. Trung Hai Nguyen, Huan-Xiang Zhou, and David DL Minh. Using the fast fourier transform in binding free energy calculations. Journal of computational chemistry, 39(11):621–636, 2018. Dzmitry Padhorny, Andrey Kazennov, Brandon S Zerbe, Kathryn A Porter, Bing Xia, Scott E Mottarella, Yaroslav Kholodov, David W Ritchie, Sandor Vajda, and Dima Kozakov. Protein– protein docking by fast generalized fourier transforms on 5d rotational manifolds. Proceedings of the National Academy of Sciences, 113(30):E4286–E4293, 2016. Dzmitry Padhorny, David R Hall, Hanieh Mirzaei, Artem B Mamonov, Mohammad Moghadasi, Andrey Alekseenko, Dmitri Beglov, and Dima Kozakov. Protein–ligand docking using fft based sampling: D3r case study. Journal of computer-aided molecular design, 32:225–230, 2018. Rodrigo Quiroga and Marcos A Villarreal. Vinardo: A scoring function based on autodock vina improves scoring, docking, and virtual screening. PloS one, 11(5):e0155183, 2016. Matthew Ragoza, Joshua Hochuli, Elisa Idrobo, Jocelyn Sunseri, and David Ryan Koes. Protein– ligand scoring with convolutional neural networks. Journal of chemical information and modeling, 57(4):942–957, 2017. David W Ritchie and Graham JL Kemp. Protein docking using spherical polar fourier correlations. Proteins: Structure, Function, and Bioinformatics, 39(2):178–194, 2000. David W Ritchie, Dima Kozakov, and Sandor Vajda. Accelerating and focusing protein–protein docking correlations using multi-dimensional rotational fft generating functions. Bioinformatics, 24(17):1865–1873, 2008. 11 Brian K Shoichet, Irwin D Kuntz, and Dale L Bodian. Molecular docking using shape descriptors. Journal of computational chemistry, 13(3):380–397, 1992. Hannes Stärk, Octavian Ganea, Lagnajit Pattanaik, Regina Barzilay, and Tommi Jaakkola. Equibind: Geometric deep learning for drug binding structure prediction. In International Conference on Machine Learning, pp. 20503–20521. PMLR, 2022. Minyi Su, Qifan Yang, Yu Du, Guoqin Feng, Zhihai Liu, Yan Li, and Renxiao Wang. Comparative assessment of scoring functions: the casf-2016 update. Journal of chemical information and modeling, 59(2):895–913, 2018. Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds. arXiv preprint, 2018. Benjamin I Tingle, Khanh G Tang, Mar Castanon, John J Gutierrez, Munkhzul Khurelbaatar, Chinzorig Dandarchuluun, Yurii S Moroz, and John J Irwin. Zinc-22– a free multi-billion-scale database of tangible compounds for ligand discovery. Journal of Chemical Information and Modeling, 63(4):1166–1176, 2023. Pedro HM Torres, Ana CR Sodero, Paula Jofily, and Floriano P Silva-Jr. Key topics in molecular docking for drug design. International journal of molecular sciences, 20(18):4574, 2019. Andreas Tosstorff, Markus G Rudolph, Jason C Cole, Michael Reutlinger, Christian Kramer, Hervé Schaffhauser, Agnès Nilly, Alexander Flohr, and Bernd Kuhn. A high quality, industrial data set for binding affinity prediction: performance comparison in different early drug discovery scenarios. Journal of Computer-Aided Molecular Design, 36(10):753–765, 2022. Oleg Trott and Arthur J Olson. Autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, 31(2):455–461, 2010. Benjamin D Wandelt and Krzysztof M Górski. Fast convolution on the sphere. Physical review D, 63(12):123002, 2001. Wikipedia. Hankel transform — Wikipedia, the free encyclopedia, 2023. URL https: //en.wikipedia.org/wiki/Hankel_transform#Fourier_transform_in_ three_dimensions. Yumeng Yan, Huanyu Tao, Jiahua He, and Sheng-You Huang. The hdock server for integrated protein–protein docking. Nature protocols, 15(5):1829–1852, 2020. Chao Yang, Eric Anthony Chen, and Yingkai Zhang. Protein–ligand docking in the machinelearning era. Molecules, 27(14):4568, 2022. Anna Yershova, Swati Jain, Steven M Lavalle, and Julie C Mitchell. Generating uniform incremental grids on so (3) using the hopf fibration. The International journal of robotics research, 29(7):801– 812, 2010. Yangtian Zhang, Huiyu Cai, Chence Shi, Bozitao Zhong, and Jian Tang. E3bind: An end-to-end equivariant network for protein-ligand docking. arXiv preprint arXiv:2210.06069, 2022. Ellen D Zhong, Tristan Bepler, Joseph H Davis, and Bonnie Berger. Reconstructing continuous distributions of 3d protein structure from cryo-em images. In International Conference on Learning Representations, 2019. 12 A M ATHEMATICAL D ETAILS A.1 P ROOF OF P ROPOSITION 1 Proposition 1. Suppose the scoring function is parameterized as in Equation 2 and for any R ∈ P ℓ ℓ SO(3), t ∈ R3 we have Acnjℓm (G, R.X + t) = m′ Dmm ′ (R)Acnjℓm′ (G, X) where D (R) are the (real) Wigner D-matrices, i.e., irreducible representations of SO(3). Then for any g ∈ SE(3), 1. The scalar field transforms equivariantly: ϕc (x; G, g.X) = ϕc (g −1 .x; G, X). 2. The scoring function is invariant: E(g.XP , g.XL ) = E(XP , XL ). Proof. Let the action of g = (R, t) ∈ SE(3) be written as g : x 7→ Rx + t and hence g −1 : x 7→ RT (x − t). We first note that ∥x − g.xn ∥ = ∥g −1 .x − xn ∥ and RT (x − g.xn ) = g −1 .x − xn . Then   X x − g.xn m ϕc (x; G, g.X) = Acnjℓm (G, R.X + t)Rj (∥x − g.xn ∥)Yℓ ∥x − g.xn ∥ n,j,ℓ,m   X X x − g.xn ℓ m = Acnjℓm′ (G, X)Rj (∥x − g.xn ∥) Dmm′ (R)Yℓ ∥x − g.xn ∥ m n,j,ℓ,m′   X ′ RT (x − g.xn ) = Acnjℓm′ (G, X)Rj (∥x − g.xn ∥)Yℓm ∥x − g.xn ∥ n,j,ℓ,m′   X g −1 .x − xn −1 m′ ′ = Acnjℓm (G, X)Rj (∥g .x − xn ∥)Yℓ ∥g −1 .x − xn ∥ ′ n,j,ℓ,m = ϕc (g −1 .x; G, X) Next, E(g.xP , g.xL ) = XZ c = XZ c = P P L L L 3 ϕP c (x; G , g.X )ϕc (x; G , g.X ) d x R3 −1 −1 ϕP x; GP , XP )ϕL x; GL , XL ) d3 x c (g c (g R3 XZ c ′ P P L ′ L L 3 ′ ϕP c (x ; G , X )ϕc (x ; G , X ) d x R3 where the last line has substitution x′ = g −1 x with g volume preserving on R3 . A.2 D ERIVATIONS In this section we describe the derivations for the various equations presented in the main text. We use the following convention for the (one-dimensional) Fourier transform and its inverse: Z 1 F[f ](k) = √ e−ikx f (x) dx (15a) 2π Z 1 F −1 [f ](x) = √ eikx f (k) dk (15b) 2π Equation 5 It is well known (Wikipedia, 2023) that given a function over R3 with complex spherical harmonic expansion X f (r) = fℓ,m (∥r∥)Yℓm (r/∥r∥) (16) ℓ,m its Fourier transform is given by f (k) = X (−i)ℓ Fℓ,m (∥k∥)Yℓm (k/∥k∥) ℓ,m 13 (17) where Z ∞ √ 1 Fℓ,m (k) = √ rfℓ,m (r)Jℓ+1/2 (kr) r dr (18) k 0 with Jℓ the ℓth -orderp Bessel function of the first kind. Relating these to the spherical Bessel functions jℓ via Jℓ+1/2 (x) = 2x/πjℓ (x), we obtain r Z ∞ 2 Fℓ,m (k) = fℓ,m (r)jℓ (kr) r2 dr := Hℓ [fℓ,m ](k) (19) π 0 which is the form of Equation 6. To apply this to our scalar fields, we define the translation operator Tr [f ](x) = f (x − r) and note its composition with the Fourier transform (F ◦ Tr )[f ] = e−ik·r F[f ] (20) We then decompose the form of our scalar fields (Equation 2) into contributions from zero-origin spherical harmonic expansions X ϕc (x) = Txn [ϕcn ](x) (21a) n ϕcn (x) = XX Acnjℓm Rj (∥x∥) Yℓm (x/∥x∥) (21b) j ℓ,m | {z } ϕcnℓm (∥x∥) Hence, the Fourier transform of each contribution is X F[ϕcn ](k/∥k∥) = (−i)ℓ Hℓ [ϕcnℓm ](∥k∥)Yℓm (k/∥k∥) (22) ℓ,m Equation 5 is then obtained via Equation 20 and the linearity of the Fourier and spherical Bessel transforms. Equation 9 We source (with some modifications) the derivation from Kovacs & Wriggers (2002). We consider the cross-correlation Z c(R) = ϕ(x)ψ(R−1 x) d3 x (23) R3 L L which is the same as Equation 8 with ϕ = ϕP c and ψ = ϕc since ϕc is a real field. Expanding in m complex spherical harmonics Yℓ and radial bases Sj : ϕ(x) = X Φjℓm Sj (∥x∥)Yℓm (x/∥x∥) X Ψjℓm Sj (∥x∥)Yℓm (x/∥x∥) (24) [Sj · Sj ′ ](∥x∥)[Yℓm · Yℓn′ ](x/∥x∥) d3 x (25a) ψ(x) = j,ℓ,m j,ℓ,m We then obtain X c(R) = ℓ Dnm ′ (R)Φjℓm Ψj ′ ℓ′ m′ Z R3 j,j ′ ,ℓ,ℓ′ ,m,n,m′ X = ℓ Dnm ′ (R)Φjℓm Ψj ′ ℓ′ m′ j,j ′ ,ℓ,ℓ′ ,m,n,m′ = X ℓ,m,m′ ℓ Dmm ′ (R) X Z ∞ |0 Z [Yℓm · Yℓn′ ](r̂) dr̂ [Sj · Sj ′ ](r) r2 dr S2 {z }| {z } Gjj ′ (25b) δℓℓ′ δmn Φjℓm Ψj ′ ℓm′ Gjj ′ (25c) j,j ′ | {z ℓ Imm ′ } Now to evaluate the complex Wigner D-matrix, we adopt the extrinsic zyz convention for Euler angles (applied right-to-left) and note that any rotation (ϕ, θ, ψ) can be decomposed as R(ϕ, θ, ψ) = Rz (ϕ − π/2)Ry (π/2)Rz (π − θ)Ry (π/2)Rz (ψ − π/2) (26) | {z } | {z } | {z } η ξ ω Next, one can easily check (using the standard spherical harmonics) that the Wigner D-matrix for a ℓ rotation about the z-axis is diagonal and given by Dmn (Rz (ω)) = δmn e−inω . Hence, ℓ Dmn (R(ϕ, θ, ψ)) = e−imξ dℓmh e−hη dℓhn e−iωn (27) ℓ ℓ where d = D (Ry (π/2)) are constant and real. Complex conjugation then gives Equation 9. 14 Equation 12 The conditional likelihood is log p(t | XC , R) = log p(XC , R, t) p(XC , R) (28a) = log p(XC , R, t) − log Z p(XC , R, t′ ) d3 t′ (28b)   exp E(XP , R.XC + t′ ) d3 t′ (28c) R3 = log E(XP , XL ) − log Z R3 Similarly, log p(R | XC , t) = log p(XC , R, t) p(XC , t) (29a) = log p(XC , R, t) − log Z p(XC , R′ , t) dR′ (29b)   exp E(XP , R.XC + t) dR′ (29c) SO(3) = log E(XP , XL ) − log Z SO(3) Finally, we move t to the protein coordinates (invoking the invariance of the score E) to obtain a form consistent with the rotational cross-correlations (Equation 8). Equation 13 Given a pose XL = R.XC + t, we evaluate XZ L C 3 E(XP , R.XC + t) = ϕP c (x)ϕc (x; R.X + t) d x (30) R3 c The functional inner product is equivalent in Fourier space: XZ L C 3 E(XP , R.XC + t) = F[ϕP c ](k) · F[ϕc ( · ; R.X + t)](k) d k (31) R3 c Then with the translation operator T defined previously, C C ϕL c (x; R.X + t) = Tt [ϕ( · ; R.X )](x) (32a) C −ik·t C F[ϕL F[ϕL c ( · ; R.X + t)](k) = e c ( · ; R.X )](k) (32b) We then substitute into Equation 31 to obtain Equation 13. P Equation 14 Given a pose XL = R.XC + t, we assume that the field ϕP c ( · ; X − t) and C L ϕc ( · ; X ) are written in the real global spherical harmonic expansion: X P P ϕP Bcjℓm Sj (∥x∥)Yℓm (x/∥x∥) (33a) c (x; X − t) = j,ℓ,m C ϕL c (x; X ) = X L Bcjℓm Sj (∥x∥)Yℓm (x/∥x∥) (33b) j,ℓ,m Then, analogously to Equation 25, E(XP , R.XC + t) = E(XP − t, R.XC ) XZ P L −1 = ϕP x; XC ) d3 x c (x; X − t)ϕc (R c = (34a) (34b) R3 X ℓ Dmm ′ (R) X P L Bcjℓm Bcj ′ ℓm′ Gjj ′ (34c) j,j ′ c,ℓ,m,m′ Complex conjugation has been omitted because the coefficients and D-functions are now real. 15 B A LGORITHMIC D ETAILS Below, we present in detail the four inference procedures introduced in Section 3.4. The three blocks of computations are color-coded corresponding to protein preprocessing (green), ligand preprocessing (blue), and the core computation (red) and labelled with typical runtimes from Table 1 (unlabelled lines have negligible runtime). The various loop levels make clear that depending on the workflow, the protein and ligand processing precomputations can be amortized and approaches a negligible fraction of the total runtime. Note, however, that for readability we have presented the algorithms assuming that all possible combinations (i.e., of proteins, ligand conformers, rotations, and translations) are of interest; if this is not true (for example in PDBBind, or in any typical pose-scoring setting), then the full benefits of amortization may not be fully realized. Algorithm 1: T RANSLATIONAL FFT L L P Input: Proteins {(GP i , Xi )}, conformers {(Gh , Xh )} P L Output: Docked poses (Xi , Xih ) ∀i, h P foreach (GP // protein preprocessing i , Xi ) do P P P Compute coefficients AP // 65 ms i = {Acjnℓm (Gi , Xi )} with neural network ; P Compute Fourier-space field values F[ϕP ]i using AP , x ; // 7.0 ms i i L foreach (GL // ligand preprocessing h , Xh ) do L L L Compute coefficients AL // 4.3 ms h = {Acjnℓm (Gh , Xh )} with neural network ; foreach Rk ∈ {R}grid ⊂ SO(3) do ℓ Compute rotated coefficients AL h,k using D (Rk ); L L Compute Fourier-space field values F[ϕ ]h,k using AL // 1.6 ms h,k , Rk Xh ; P foreach (GP i , Xi ) do L foreach (Gh , XL h ) do foreach Rk ∈ {R}grid ⊂ SO(3) do L Compute E(XP i , Rk Xh + t), ∀t using FFT; ⋆ ⋆ L Ek , tk ← {max, arg max}t E(XP i , Rk Xh + t) ; k ⋆ ← arg maxk Ek⋆ ; L ⋆ XL ih ← Rk⋆ Xh + tk⋆ ; // pose optimization // 160 µs Algorithm 2: ROTATIONAL FFT P L L Input: Proteins {(GP i , Xi )}, conformers {(Gh , Xh )} P L Output: Docked poses (Xi , Xih ) ∀i, h P foreach (GP // protein preprocessing i , Xi ) do P P P Compute coefficients AP // 65 ms i = {Acjnℓm (Gi , Xi )} with neural network ; for tk ∈ {t}grid ⊂ R3 do P P Compute global expansion BP // 80 ms i,k = {Bcjℓm } from Ai , Xi − tk ; L foreach (GL // ligand preprocessing h , Xh ) do L L L Compute coefficients AL = {A (G , X )} with neural network ; // 4.3 ms h cjnℓm h h L L Compute global expansion BL = {B } from A , X ; // 17 ms cjℓm h h h P foreach (GP // pose optimization i , Xi ) do L foreach (Gh , XL ) do h foreach tk ∈ {t}grid ⊂ R3 do L Compute E(XP // 650 µs i − tk , R.Xh ), ∀R using FFT ; ⋆ ⋆ L Ek , Rk ← {max, arg max}R E(XP − t , R.X + t) ; k i h k ⋆ ← arg maxk Ek⋆ ; ⋆ L XL ih ← Rk⋆ Xh + tk⋆ ; 16 Algorithm 3: T RANSLATIONAL S CORING P L L Input: Proteins {(GP i , Xi )}, conformers {(Gh , Xh )}, rotations {Rk }, translations {tℓ } P L Output: Scores E(Xi , Rk Xh + tℓ ) ∀i, h, k, ℓ P foreach (GP // protein preprocessing i , Xi ) do P P P // 65 ms Compute coefficients AP i = {Acjnℓm (Gi , Xi )} with neural network ; P Compute Fourier-space field values F[ϕP ]i using AP , x ; // 7.0 ms i i L foreach (GL // ligand preprocessing h , Xh ) do L L L Compute coefficients AL = {A (G , X )} with neural network ; // 4.3 ms h cjnℓm h h foreach Rk do ℓ Compute rotated coefficients AL h,k using D (Rk ); L Compute Fourier-space field values F[ϕL ]h,k using AL // 1.6 ms h,k , Rk Xh ; P foreach (GP i , Xi ) do L foreach (GL h , Xh ) do foreach Rk do foreach tℓ do L Compute E(XP i , Rk Xh + tℓ ) using Equation 13; // scoring // 1.0 µs Algorithm 4: ROTATIONAL S CORING P L L Input: Proteins {(GP i , Xi )}, conformers {(Gh , Xh )}, rotations {Rk }, translations {tℓ } P L Output: Scores E(Xi , Rk Xh + tℓ ) ∀i, h, k, ℓ P foreach (GP // protein preprocessing i , Xi ) do P P P Compute coefficients AP = {A (G , X )} with neural network ; // 65 ms i i i cjnℓm 3 for tk ∈ {t}grid ⊂ R do P P Compute global expansion BP // 80 ms i,k = {Bcjℓm } from Ai , Xi − tk ; L foreach (GL // ligand preprocessing h , Xh ) do L L L Compute coefficients AL // 4.3 ms h = {Acjnℓm (Gh , Xh )} with neural network ; L L Compute global expansion BL = {B } from A , X ; // 17 ms cjℓm h h h P foreach (GP i , Xi ) do L foreach (Gh , XL h ) do foreach Rk do foreach tℓ do L Compute E(XP i , Rk Xh + tℓ ) using Equation 14; 17 // scoring // 8.2 µs C L EARNED S CALAR F IELDS 6qqw 6jap 6np2 Figure 2: Visualizations of learned scalar fields. All five channels of the ESF-N learned scalar fields ϕL (top row) and ϕP (bottom row) are shown on the xy-plane passing through the center of mass of the ligand, with a box diameter of 20 Å. Positive values of the field are in blue and negative values in red. At left, the ligand and pocket structures are shown looking down the z-axis. Note that as the fields are only 2D slices, not all 3D features visible in the structures are visible in the fields. 18 6uvp 6oxq 6jsn 6moa Figure 3: Visualizations of learned scalar fields, continued. 19 D E XPERIMENTAL D ETAILS D.1 D ECOY S ET ⋆ Given a zero-mean ground-truth ligand pose XL , we generate 323 − 1 = 32767 decoy poses via the following procedure. • Sample 31 translational pertubations: ti ∼ N (0, I3 ), i = 1 . . . 31 and set t0 = 0, with units in Å. • Sample 31 rotational perturbations: Rj = FromRotvec(rj ), rj ∼ N (0, 0.5I3 ), j = 1 . . . 31 and set R0 = I3 . • Sample 31 noisy conformers XC k , k = 1 . . . 31 by sampling torsional updates ∆τk ∼ NT (0, (π/2)Im ) where NT is a wrapped normal distribution (Jing et al., 2022) and m is the number of torsion angles. The torsional updates are applied to the smaller side of the L⋆ molecule. Set XC . 0 =X ⋆ C L L • Set XL . ijk = Rj Xk + ti , i, j, k = 0 . . . 31 and discard X000 = X PDB ID 6A73 is excluded from the procedure due to the high level of graph symmetry and significant runtime for computing RMSDs for all decoys. Summary statistics for the decoy sets of the remaining 362 PDB IDs are presented in Figure 4. 1.0 0.8 0 5 10 15 20 CDF 0.6 0.4 0.002 0.2 0.000 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0 RMSD 5 10 0 2 15 20 Figure 4: Decoy set statistics. Top left: histogram of RMSDs across all decoys sets (12 M total). Bottom left: histogram of minimum RMSDs among the decoy sets. All sets have a pose less than RMSD <1 Å from the true pose. Right: cumulative density function of RMSDs in each decoy set. Bottom right inset: all decoy sets have at least 23 poses with RMSD <2 Å. D.2 RUNTIME M EASUREMENTS All runtime measurements were performed on a machine with 64 Intel Xeon Gold 6130 CPUs and 8 Nvidia Tesla V100 GPUs. Gnina was run with default thread count settings. All of our processes were run on a single V100 GPU. For our method, we performed runtime analysis using CUDA events to remove the effects of asynchronous CUDA execution. Script loading, model loading, and algorithmic-level precomputations (which, if necessary, can be cached on disk) were excluded from the analysis. For Gnina, we attempted to remove similar overhead by timing single-pose scoringonly runs as representative of constant overhead costs. We report conformer docking runtimes in Table 3 using the PDBBind crystal structures; ESMFold runtimes are marginally shorter. Typical 20 runtimes reported in Table 1 and Appendix B are obtained from timing runs with our method across the entire PDBBind test set. D.3 DATASETS As noted previously, we use train, validation, and test splits from Stärk et al. (2022). However, due to RDKit parsing issues with Gnina-docked poses, the following 30 complexes are excluded (leaving 333 remaining) from all rigid conformer docking comparisons against Gnina: 6HZB, 6E4C, 6PKA, 6E3P, 6OXT, 6OY0, 6HZA, 6E6W, 6OXX, 6HZD, 6K05, 6NRH, 6OXW, 6RTN, 6D3Z, 6HLE, 6PY0, 6OXS, 6E3O, 6HZC, 6Q38, 6E7M, 6OIE, 6D3Y, 6D40, 6UHU, 6CJP, 6E3N, 6Q4Q, 6D3X. Scoring comparisons include all test complexes except 6A73, for which decoy poses could not be generated. We download the 77 PDB IDs provided in Tosstorff et al. (2022) from the PDB to form the PDE10A dataset, keeping the A chain of each assymetric unit and the Ligand of Interest (LOI) interacting with it. We then align all ligands to the crystal structure of 5SFS using the procedure described in Corso et al. (2023) for aligning ESMFold structures, except transforming the ligand rather than the protein. This constitutes the construction of a cross-docking dataset due to the use of the same pocket for all ligands. Due to RDKit parsing errors with the Gnina-docked poses, the following 7 PDB IDs are excluded from all comparisons: 5SFA, 5SED, 5SFO, 5SEV, 5SF9, 5SDX, 5SFC. The remaining 70 ligands are shown superimposed on the 5SFS pocket in Figure 5. Figure 5: PDE10A ligands aligned on 5SFS 21 E H YPERPARAMETERS Model hyperparameters The main model hyperparameters are the aspects of the scalar field parameterization (Equation 2) which dictate the number of coefficients Acnjℓm predicted by the neural network. Intuitively, these determine the expressivity or “intrinsic resolution” of the scalar field which can be parameterized by the ESF neural network. These hyperparameters and their search spaces are detailed in Table 4, with full supporting experimental results in Appendix G.1. Our data featurization and E3NN neural network architecture are largely adapted from the default settings of Corso et al. (2023). Both the protein and ligand neural networks operate on all heavy atom nodes; however, in the protein network only the alpha-carbons nodes emit Acnjℓm coefficients that contribute to the scalar field. For simplicity, we also omit ESM features. Table 4: Training-time model hyperparameters and their search spaces. The parameters corresponding to experiments in the main text are in bold. Parameter Search Space # scalar field channels # radial basis functions RBF cutoff radius Order of spherical harmonics 3, 5, 8 3, 5, 8 3 Å, 5 Å, 8 Å 1, 2, 3 Inference hyperparameters The primary inference-time hyperparameters control the “search resolution” over the FFT and grid-search degrees of freedom. For the grid-search, the resolution is controlled directly by the selection of grid points. For the FFT, the resolution is effectively controlled by frequency domain representation of the scalar field. Specifically, • In the translational case, this resolution is expressed in terms of a sampling resolution which corresponds to the grid of frequency vectors at which the translational Fourier coefficients (Equation 5) are evaluated. To generate these frequency vectors, we use a 40 Å cubical domain with periodic boundary conditions. • In the rotational case, this resolution is given by the maximum order of spherical harmonics in the global spherical basis (Equation 7) used for evaluating rotational crosscorrelations. The search space for these resolution hyperpameters is detailed in Table 5. Increasing the search resolution improves accuracy but at the cost of core optimization runtime. The default settings chosen for the main experiments are based on a qualitative assessment of the best tradeoff. Figure 6 visualizes this tradeoff, with full supporting experimental results in Appendix G.2. Table 5: Inference-time model hyperparameters and their search spaces. The search grid points over SO(3) are generated according to Zhong et al. (2019); Yershova et al. (2010). The translational search grid points are defined by evenly spacing the stated number of points along each dimension of a 8 Å cube. The parameters corresponding to experiments in the main text are in bold. Procedure Parameter Search Space TF Translational signal resolution Rotataional search grid points 0.5 Å, 1 Å 576, 4608 RF Rotational signal order Translational search grid points 10, 25, 50 73 , 93 , 133 22 Rot. grid pts. Translational Optimization (TF) Median RMSD (Å) Ligand preprocessing runtime (ms) Cross-correlation runtime (ms) 4608 1.1 7196 715 576 1.53 1.5 100 122 1 0.5 1 0.5 1 Translational cross-correlation resolution (Å) 0.5 930 928 Trans. grid pts. Rotational Optimization (RF) Median RMSD (Å) Protein preprocessing runtime (seconds) Cross-correlation runtime (ms) 133 1.05 0.9 197 200 994 1430 93 1.16 1.0 0.98 64 67 63 332 476 1486 73 1.25 1.15 1.16 30 30 31 157 224 704 10 25 50 10 25 50 10 Rotational cross-correlation resolution ( max) 25 50 Figure 6: Inference-time hyperparameters and their effect on performance and average runtime. In the TF procedure (top), the protein preprocessing time is roughly constant and is not shown; similarly in the RF procedure (bottom) the ligand preprcessing time is constant. The selected default settings are boxed. All numbers are taken from full results on PDBBind in Appendix G.2 23 F RUNTIME A MORTIZATION In this section we discuss in greater detail the practical implications of runtime amortization for various common workflows in molecular docking, focusing on rigid conformer docking as a proxy task. Conceptually, the total runtime of a docking workflow is given by     # unique # unique ligand Cprot · + Clig + Cpair · (# complexes) (35) proteins conformers where Cprot , Clig , Cpair refer to the protein precomputations, ligand precomputations, and crosscorrelation (i.e., pose optimization) runtimes, respectively. If the number of complexes grows faster than either the number of unique proteins or ligand conformers individually, then the runtime contributions from the latter can be amortized across the complexes—meaning that the unit cost per complex decreases as their number increases. To illustrate this amortization, suppose that we have N protein pockets (for simplicity we assume each is on a seperate protein) and M rigid conformers and wish to dock all M N complexes. Then, following the exposition and typical runtimes provided in Section 3.4, we can compute (Table 6) the total docking cost of this workflow for the TF procedure: (72 ms) · M + (7400 ms) · N + (740 ms) · M N (36) Similarly, the total cost under the RF procedure is: (58400 ms) · M + (21 ms) · N + (470 ms) · M N (37) Note that Cprot ≪ Clig for the TF procedure, whereas Cprot ≫ Clig for the RF procedure. This is because that the grid search over rotations in TF involves a new set of translational Fourier coefficients for each rotation of the ligand, while the protein is kept fixed; on the other hand, the grid search over translations in RF requires a new global spherical expansion Bnjℓm of the protein around each grid point, but only once for the ligand around its center of mass. This discrepancy has significant implications for the suitability of each procedure depending on the behavior of M, N : • In the virtual screening-like setting, we keep a fixed protein (M = 1) and let the number of ligands increase (N → ∞). Hence, the TF procedure is unsuitable, as the cost per ligand (and thus per complex) is a constant ≈8 seconds, approximately on the same order as traditional docking methods. On the other hand, the cost per ligand under the RF procedure begins at nearly ≈60 s with a single ligand, but asymptotically approaches ≈500 ms as the number of ligands increases (Figure 7, left). • In the inverse-screening-like setting, we keep a fixed ligand (N = 1) and let the number of proteins increase (M → ∞). Now the RF procedure is unsuitable, as the cost per protein is a constant ≈60 s, which is far too high to be useful. On the under hand, the cost per protein under the TF procedure begins at ≈8 s and asymptotically approaches ≈800 ms as the number of proteins increases (Figure 7, right). • In the general setting, we divide both Equation 36 and 37 by M N to find that when M ≫ N (i.e., more proteins than ligands), the TF procedure is more efficient, and when M ≪ N (more ligands than proteins), the RF procedure is more efficient. However, both procedures converge to similar per-complex runtimes (740 ms vs 470 ms) in the limit of M, N → ∞. 24 Table 6: Computing the runtime of the TF and RF procedures as a function of the number of proteins M and ligands N . T and R are the number of points used in the grid search over rotations and translations for TF and RF, respectively; they are fixed constants on the order of ≈1000 (we use T = 729, R = 4608 by default). The rows in red contribute the greatest preprocessing runtime. Translational Optimization (TF) Computation Count Runtime Protein scalar field M× Protein FFT coeffs M× Ligand scalar field N× Ligand FFT coeffs RN × Cross-correlation + RM N × = Runtime per complex (s) Rotational Optimization (RF) Computation Count Runtime Protein scalar field M× Protein global coeffs TM× Ligand scalar field N× Ligand global coeffs N× Cross-correlation + TMN× 65 ms 7.0 ms 4.3 ms 1.6 ms 160 µs = (Equation 36) (Equation 37) Virtual screening (M = 1) Inverse screening (N = 1) 101 102 N (# ligands) 101 M (# proteins) 60 50 40 30 20 10 0 0 10 60 TF 50 RF 40 30 20 10 0 103 100 65 ms 80 ms 4.3 ms 17 ms 650 µs 102 Figure 7: Amortized conformer docking runtimes per protein-ligand complex in the virtual screening (left) and inverse screening (right) settings. 25 G A DDITIONAL R ESULTS G.1 M ODEL H YPERPARAMETERS In Table 7, we explore the impact of varying model hyperparameters that determine the expressivity of the scalar field from their default values. We use the ESF-N model and evaluate on the rigid conformer docking task on PDBBind crystal structures. As the results illustrate, increasing the expressivity of the scalar field can bring additional improvements in performance. On the other hand, decreasing the expressivity degrades the performance of our method to the extent that it is no longer competitive with the baselines (Table 3). Table 7: Rigid conformer docking results for varying model hyperparameters. Runtimes are shown averaged over complexes, excluding / including pre-computations that can be amortized. TF RF Method % <2 Å Med. RMSD Runtime % <2 Å Med. RMSD Runtime Baseline # channels = 3 # channels = 8 # RBFs = 3 # RBFs = 8 RBF cutoff = 3 Å RBF cutoff = 8 Å Spherical harmonics ℓmax = 1 Spherical harmonics ℓmax = 3 72 70 71 63 77 72 76 62 80 1.10 1.07 1.15 1.27 1.07 1.16 1.16 1.35 0.99 0.7 s / 8.2 s 0.8 s / 6.5 s 0.9 s / 12 s 0.9 s / 8.0 s 0.8 s / 9.3 s 0.8 s / 8.4 s 0.8 s / 8.4 s 0.9 s / 7.6 s 0.9 s / 10 s 73 71 70 64 76 72 71 62 77 1.00 0.98 1.09 1.13 0.95 1.00 1.08 1.23 0.87 0.5 s / 68 s 0.4 s / 70 s 0.7 s / 71 s 0.5 s / 70 s 0.5 s / 74 s 0.5 s / 68 s 0.5 s / 69 s 0.5 s / 62 s 0.5 s / 81 s G.2 I NFERENCE H YPERPARAMETERS In Tables 8–11 below, we explore the impact of inference-time hyperparameters on the performance and runtime of our method on the rigid conformer docking task. We use the ESF-N model variant and experiment with the PDBBind crystal test set and PDE10A test set. As discussed in Appendix E, for the TF procedure, we adjust (1) the number of grid points over SO(3) (576 or 4608); and (2) the spatial resolution for translational cross-correlation (either 1 Å or 0.5 Å). For the RF procedure, we adjust (1) the number of translational grid points in the 8 Å cube (73 , 93 , or 133 ); and (2) the resolution for the rotational cross-correlation, expressed in terms of the maximal spherical harmonic order (10, 25, or 50). The rows corresponding to results in the main Table 3 are bolded. In all rows, the effective number of poses searched over via both degrees of freedom is computed. To provide an idea of the impact of discretization, we compute the median RMSD of the closest grid point to the ground-truth pose (decomposed into rotational and translational contributions). This serves as a hard lower bound for the median RMSD of the output docked pose. In the TF procedure, increasing the resolution is memory-intensive; thus, the RF procedure is more effective at leveraging FFT to conduct fine-grained search over the accelerated degree of freedom. The default reported performance is attained with a translational offset of 0.4 Å and a rotational offset of 0.16 Å. While performance improves with smaller grid offsets, the returns are rapidly dimishing. The runtime of the method (averaged over 333 PDBBind complexes and 70 PDE10A complexes) is reported and color-coded according to Appendix B: protein preprocessing (green), ligand preprocessing (blue), and the pose optimization (red). The effect of the non-FFT grid resolution is also color-coded, i.e., in TF the explicit enumeration over SO(3) grid points directly scales the ligand preprocessing, whereas in RF the enumeration over R3 scales the protein preprocessing. As the tables show, the preprocessing of these explicit grid points contributes to the majority of the nonamortizeable runtime. In general, the SO(3) grid / ligand preprocessing in TF is less expensive, however, it cannot be amortized when moving from PDBBind to PDE10A (where the ligands are still distinct). On the other hand, the R3 grid / protein preprocessing time in RF is significantly reduced (very roughly on the order of 70-fold, as expected) in PDE10A compared to PDBBind. 26 Table 8: PDBBind TF Grid offset Runtime (ms) Trans. resol. SO(3) grid pts. Effective # poses Tr. Rot. All Med. RMSD % <2Å Prot. prep. Lig. prep. Opt. 1 Å 1 Å 0.5 Å 576 4608 576 420k 3.4M 2.8M 0.52 0.50 0.25 0.84 0.42 0.80 0.98 0.67 0.84 1.53 1.10 1.50 63 72 64 65 72 70 931 7196 928 100 715 123 Table 9: PDBBind RF Grid offset Runtime (ms) Trans. grid pts. SO(3) ℓmax Effective # poses Tr. Rot. All Med. RMSD % <2Å Prot. prep. Lig. prep. Opt. 73 73 73 93 93 93 133 133 10 25 50 10 25 50 10 25 3.2M 45M 353M 6.8M 97M 751M 20M 291M 0.65 0.67 0.65 0.49 0.50 0.51 0.33 0.33 0.38 0.15 0.08 0.36 0.15 0.08 0.37 0.15 0.80 0.70 0.67 0.64 0.53 0.52 0.51 0.37 1.25 1.15 1.16 1.16 1.00 0.98 1.05 0.90 70 69 70 73 73 74 74 72 30k 31k 32k 64k 67k 63k 198k 200k 85 87 85 85 87 84 85 86 158 225 704 333 476 1487 995 1430 Table 10: PDE10A TF Grid offset Runtime (ms) Trans. resol. SO(3) grid pts. Effective # poses Tr. Rot. All Med. RMSD % <2Å 1 Å 1 Å 0.5 Å 0.5 Å 576 4608 576 4608 420k 3.4M 2.8M 23M 0.51 0.50 0.26 0.26 0.88 0.48 0.89 0.44 1.00 0.69 0.93 0.51 1.85 1.11 2.05 1.00 56 64 50 73 Prot. prep. Lig. prep. Opt. 22 21 20 20 761 6159 756 6147 89 736 106 892 Table 11: PDE10A RF Grid offset Runtime (ms) Trans. grid pts. SO(3) ℓmax Effective # poses Tr. Rot. All Med. RMSD % <2Å Prot. prep. Lig. prep. Opt. 73 73 73 93 93 93 133 133 10 25 50 10 25 50 10 25 3.2M 45M 353M 6.8M 97M 751M 20M 291M 0.72 0.57 0.65 0.46 0.48 0.49 0.34 0.33 0.38 0.16 0.08 0.39 0.16 0.09 0.41 0.16 0.83 0.59 0.65 0.63 0.51 0.50 0.55 0.36 1.60 1.21 1.30 1.05 1.00 0.99 1.17 0.96 54 63 64 64 70 64 64 69 476 549 635 1014 946 943 2798 2912 44 42 59 42 43 42 42 45 161 227 718 327 465 1483 986 1469 27 G.3 P ERFORMANCE -RUNTIME T RADEOFF In Figure 8, we further investigate the tradeoff between speed and performance offered by our method compared to Gnina (with the Vina scoring function). While in the main results (Table 3) we run Gnina using all default settings, it is possible to reduce the runtime (and performance) by adjusting these settings. In particular, we explore setting --max_mc_steps and --minimize_iters to 5 independently and in combination. Together with the default runs and the --score_only runs, these trace out a Pareto frontier representing the tradeoff between runtime per complex and <2 Å RMSD success rate. With the default settings, Gnina outperforms all variants of our method on the PDBBind crystal and PDE10A test sets. However, Figure 8 shows that we can reach previously inaccessible regions in the accuracy v.s. runtime tradeoff landscape. % < 2 A RMSD 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 0.6 0.4 0.5 0.2 0.0 0.0 0 0 2 5 10 15 PDBBind Runtime (s) 4 20 0 2 4 PDE10A Runtime (s) 6 Figure 8: Tradeoff between speed and accuracy using our method compared to Gnina on PDBBind crystal structures (left) and PDE10A (right). In both cases, variants of our method (blue dots) enable possibilities not reachable with Gnina (orange curve). 28