gen.parRep: a first implementation of the Generalized Parallel Replica dynamics for the long time simulation of metastable biochemical systems

Florent Hédin, Tony Lelièvre

CERMICS, École des Ponts–ParisTech, 6 et 8 avenue Blaise Pascal, Cité Descartes – Champs-sur-Marne, F-77455 France.
Inria Centre de Recherche Paris Rocquencourt, 2 rue Simone Iff, F-75589 Paris, France


Metastability is one of the major encountered obstacle when performing long molecular dynamics simulations, and many methods were developed to address this challenge. The “Parallel Replica”(ParRep) dynamics is known for allowing to simulate very long trajectories of metastable Langevin dynamics in the materials science community, but it relies on assumptions that can hardly be transposed to the world of biochemical simulations. The later developed “Generalized ParRep” variant solves those issues, but it was not applied to significant systems of interest so far.

In this article, we present the program gen.parRep, the first publicly available implementation of the Generalized Parallel Replica method (BSD 3-Clause license), targeting frequently encountered metastable biochemical systems, such as conformational equilibria or dissociation of protein–ligand complexes. It will be shown that the resulting C++ implementation exhibits a strong linear scalability, providing up to 70% of the maximum possible speedup on several hundreds of CPUs.

1 Introduction

Molecular dynamics (MD) simulations are nowadays of a common use for simulating large and complex biological or chemical systems [1]: the continuous increase of the available computing power, together with the development of stable and accurate deterministic or stochastic sampling strategies, made possible the emergence of computer based, in silico drug design strategies [2–4]. However, a commonly encountered obstacle while performing MD simulation is the timescale separation between the fastest conformational changes — usually vibrations occurring at the femtoseconds (fs) level — and the slowest one, occurring from the nanosecond (ns) to second (or more) timescale; one may use various coarse grained [5, 6] approaches in order to reach such large simulation time, however this usually implies to sacrifice the accurate description of fast processes, such as non-bonded donor–acceptor interactions, playing a key role in biological interactions [7, 8]. The existence of metastable regions in the configurational space, separated by high potential energy or entropy barriers, is the main origin of this timescale separation, and the simulation time required for observing a transition from such a region to another one can quickly become intractable by the use of direct numerical simulation.

A large amount of methods were developed to address the challenge of metastability in MD simulations. When it is assumed that both the starting and the ending metastable regions (let us denote them as \( A \) and \( B \)) are known, one can consider that most of the methods fall within one of the two following categories: local search methods start from an initial guess path connecting \( A \) and \( B \), and will optimize it until convergence to an optimal path — for example characterized by a minimal potential or free energy profile — and one can cite the nudged elastic band method [9], the string method [10], the max flux approach [11] or the transition path sampling method [12] (which is actually a path sampling method starting from the initial guess, but not an optimization method); the second category consists in global search methods where the ensemble of all the possible paths between \( A \) and \( B \) is sampled without any initial guess, and it includes adaptive multilevel splitting methods [13–16], transition interface sampling [17], forward flux sampling [18] or milestoning techniques [19–23].

A. F. Voter and coworkers also proposed another class of methods, the Accelerated Dynamics methods [24–28]: the Parallel Replica (ParRep) method (and the derived parSplice algorithm) [29–31], the hyperdynamics method [32, 33], and the temperature accelerated dynamics method [34]. They all have in common to be derived from the Transition State theory and kinetic Monte Carlo models, and the aim of these algorithms is to efficiently generate a succession of jump processes between metastable regions, modeled as discrete states.

The ParRep method was later formalized [35], and it was shown that the notion of quasi-stationary distribution (QSD) [36, 37] is the mathematical foundation at its heart: this revealed one of the possible weaknesses of the algorithm, where it is assumed that the (user defined) time required for converging to the QSD is the same for all the metastable regions. While this assumption may be reasonable for materials science, this cannot be transposed to the chemical configurational space, where the large variety of possible interactions and steric exclusions usually results in a rough energy landscape, characterized by both an extremely large number of energy minima, and the presence of super basins of attraction (usually referred to as “funnels”). The Generalized Parallel Replica (Gen. ParRep ) [38] method addresses this issue by estimating during the simulation if convergence to the QSD is obtained; however, while this can possibly extend the range of application of ParRep to any biochemical system which can be studied via MD, no implementation has been designed and released so far.

This article describes the first publicly available Gen. ParRep based MD simulation software, specially targeting metastable biochemical systems; after a description of the methods in Section 2, the novelty of the software implementation is detailed in Section 3; two study cases are later investigated in Section 4, the conformational equilibrium of the alanine dipeptide (subsection 4.1), and the dissociation of the protein–ligand complex FKBP–DMSO (subsection 4.2). It will be shown in both cases that the Gen. ParRep algorithm can be used for accurately sampling the average state-to-state transition time; furthermore evidences that the software exhibits a strong linear scaling will be reported, as it will be shown that when running over hundreds of CPUs one gets speedups of up to 70 % of the maximum possible linear value.

2 Methods

2.1 Langevin dynamics

Let us consider a stochastic process \( X_t = (q_t,p_t)_{t \ge 0} \in \R ^{d \times d} \) (\( \R ^{d \times d} \) representing the phase space), where \( q \) and \( p \) denote the positions and momenta of the \( d/3 \) particles at time \( t \). The stochastic dynamics of \( X_t \) can be described by the Langevin equation:

(1) \begin{equation} \begin {cases} dq_t = M^{-1} p_t \mathop {dt} \\ dp_t = - \nabla V(q_t) \mathop {dt} -\gamma M^{-1} p_t \mathop {dt} + \sqrt {2\gamma \beta ^{-1}} \mathop {dW_t}
\end {cases} \label {eqn:langevinDynamics} \end{equation}

where \( \beta = \frac {1}{k_B T} \) is the inverse temperature, \( M \) is the mass matrix, \( V : \R ^d \rightarrow \R   \) is a function associating to a given configuration \( q \) a potential energy \( V(q) \), \( \gamma > 0 \) is the damping parameter (representing a friction term), and \( W_t \) a \( d \)-dimensional Brownian motion.

In the following we will refer to this stochastic dynamics of \( X_t \), evolving within the phase space \( \R ^{d \times d} \), as the Langevin dynamics.

The Langevin dynamics on the \( d \)-dimensional potential energy surface \( V(q) \) is likely to consist in a succession of “entry then exit” events from wells (or groups of wells) progressively discovered by the process \( X_t \), and one can expect that the time spent within a well before it hops to another one will be far more large that the discretization time step \( dt \): it is therefore necessary to design an alternative approach to the computationally expansive direct simulation in order to address this problem of metastability.

2.2 States and metastability

Let us introduce the ensemble of metastable states \( \mathcal S=\{S_1, ... , S_n\} \). These are typically defined in terms of positions only (and not velocities). In the original ParRep algorithm [29, 31], these states are defined as the basin of attraction of the local minima of \( V \) for the gradient descent \( \dot {q} = - \nabla V(q) \): this leads to a partition of the state space. One important output of the mathematical analysis performed in Ref. [35] is that (i) the metastable states can be defined arbitrarily, the only prerequisite being that for most of the visits in one of those, the exit time will be much larger than the convergence time to the local equilibrium within the state (the so-called Quasi Stationary Distribution), and (ii) the algorithm can be applied even if these metastable states do not define a partition of the state space: in this work we propose to define them as disjoint subsets, using collective variables or reaction coordinates [39, 40], modeled a priori in order to correspond to a few given metastable conformations of the molecular system of interest; the topological definition of the states will be discussed for each system of interest in the Section 4.

Let \( \Omega \in \mathcal {S} \) be a given state: we define

\[   \tau = \inf \set {t\geq 0\mid X_t \notin \Omega }   \]

to be the first exit time from \( \Omega   \) (for a given initial condition \( X_0 \in \Omega   \)), and

\[    X_\tau \in \partial \Omega   \]

to be the corresponding exit configuration (first hitting point on the boundary \( \partial \Omega \)): the goal of the various Parallel Replica (ParRep) [29, 31, 38] based methods (but also of other accelerated dynamics methods) is to sample efficiently the values \( (\tau ,X_\tau ) \) from the unknown exit distribution associated to each state \( \Omega   \).

2.3 Quasi-Stationary Distribution (QSD)

Recent mathematical analyses showed [35] that the quasi-stationary distribution (QSD) [36, 37] is an essential ingredient of the above mentioned accelerated dynamics methods. Let \( \nu   \) be a probability measure with support in \( \Omega   \): \( \nu   \) is a QSD if and only if, for any \( A \subset \Omega   \) and \( t \ge 0 \):

\[   \nu (A) = \prob ^{\nu } \left [ X_t \in A\mid t <\tau \right ]   \]

where \( \prob ^{\nu } \) indicates that the initial condition \( X_0 \) is distributed according to \( \nu   \). This means that \( \nu   \) is a QSD if, for all \( t \), when \( X_0 \) is distributed according to \( \nu   \), the law of \( X_t \) conditionally to the fact that \( (X_s)_{0 \le s \le t} \) remains in the state \( \Omega   \) is still \( \nu         \).

The QSD satisfies the following properties which will be of critical importance (see Refs. [35, 38] for detailed proofs):

2.4 The Generalized ParRep method

Having introduced the concepts of states and QSD, it is now possible to detail the Generalized Parallel Replica [38] (Gen. ParRep ) method. In the following, it is assumed that different metastable states \( \mathcal {S} = \{S_1,...,S_n\} \) are defined, either by partitioning the whole configuration space, or by defining disjoint subsets of \( \R ^d \), and \( \Omega   \) denotes any member of \( \mathcal {S} \). It is also assumed that at least \( N \) CPU cores are available in order to propagate simultaneously \( N \) replicas of the system in parallel.

As stated above, the aim of accelerated dynamics methods is to quickly sample values of \( (\tau ,X_\tau ) \) (respectively the first exit time from a metastable \( \Omega   \) and the first hitting point on the boundary \( \partial \Omega   \)): in the case of ParRep methods, detailed information about how the process evolves within each state \( \Omega   \) is discarded, and in return exit events can be generated \( N \) times faster (a linear speedup is achieved), which is of particular interest when considering computations performed on High Performance Computing (HPC) machines, where thousands of CPUs can be used at once by a single simulation.

Figure 1: Diagram view of the generalized ParRep [38] algorithm. After setup, the first step is to iterate the reference walker until \( X_t \) enters a defined state \( \Omega \) (denoted as Transient propagation, however if states define a partition of the configuration space, this first part of the algorithm is not required); then the Gen. ParRep procedure (right frame) is executed, starting with the Convergence step, until: (i) the reference walker exits before convergence of the G-R statistics is observed, or (ii) convergence of the G-R observables is obtained. In the later case simulation proceeds to the Parallel dynamics step, until an exit event from \( \Omega   \) is observed, generating a sample of \( (\tau ,X_\tau ) \). After the parallel phase (or if \( X_t \) exited \( \Omega   \) before convergence), the reference walker performs once again the Transient propagation procedure, until entering a valid state \( \Omega   \), and the Gen. ParRep procedure is iterated once again. This is repeated until the total simulation time \( \tsim   \) reaches a user defined value of \( t_{\rm max} \), where the program stops. The two frames colored in green are parts of the algorithm fully exploiting the \( N \) available CPU cores.

In the following, let \( \tsim \ge 0 \) be the simulation clock, corresponding to the physical time (i.e. a multiple of the time step \( dt \)), and let \( X^\refe _{\tsim } \) be the configuration of the system at time \( \tsim   \) (where \( \refe \) indicates the reference walker, the first replica). The method is implemented as a three steps procedure, repeated as the process diffuses from one state to another, until a total simulation time \( t_{\rm max} \) is reached (see Figure 1 for a diagram representation):

In the following the Convergence step and Parallel dynamics step will be detailed.

2.4.1 Convergence step

The Fleming-Viot (F-V) process [41, 42] is a branching and interacting particle process, used for simulating the law of the random variable \( X_t \) conditioned to \( \{ \tau > t \} \). As a consequence, an estimate of \( \tfv \) — the F-V convergence time — can be obtained by assessing the convergence to a stationary state of the F-V process, and when this convergence is observed, one obtains samples (approximately) distributed according to the QSD. For a detailed description with illustrations, we refer to the dedicated section from Ref. [38].

Let us first consider \( N \) i.i.d. initial conditions \( X_0^k \) (\( k \in \{1, \ldots ,N\} \)); the procedure summarizes as follows: a reference walker \( X^\refe _{t} \) (namely the replica numbered \( k=1 \)) explores \( \Omega   \) driven by the Langevin Equation (1): at the same time the other replicas (the F-V workers) perform the following tasks:

However, if at any moment the reference walker \( X^\refe   \) leaves \( \Omega \) before the F-V process has converged (convergence will be defined below using the G-R diagnostic), all the F-V walkers replicas are killed, and a new Convergence step will be initiated from a new state (after Transient propagation if required).

The observables are properties of interest which are expected to characterize the convergence to equilibrium of the F-V particle process within each state \( \Omega   \): they can have a physical meaning (e.g. based on the potential \( V \), or the momenta \( p \)), or either be any type of distance/topological measure (for instance derived from the collective variables used for designing the set \( \mathcal {S} \)).

The convergence of the observables is assessed using the Gelman-Rubin (G-R) statistics [43, 44]: let \( \obs : \Omega \to \R   \) be some observable, and let

(2) \{begin}{align} \begin {split} \bar {\obs }^k_{t} &\equiv t^{-1} \int _0^t \obs (X_s^k)~ds \\ \bar {\obs }_{t} &\equiv \frac {1}{N} \sum _{k=1}^N \bar {\obs }^k_{t} =
\frac {1}{N} \sum _{k=1}^N t^{-1} \int _0^t \obs (X_s^k)~ds \end {split} \label {eqn:GR_def_Obs} \{end}{align}

be the average of an observable along each trajectory (\( \bar {\obs }^k_{t} \)) and the average of the observable along all trajectories \( \bar {\obs }_{t} \): the statistic of interest for observable \( \obs   \) is defined as:

(3) \begin{equation} \hat {R}_{t}(\obs ) = \frac {\frac {1}{N}\sum _{k=1}^N t^{-1} \int _0^t(\obs (X_s^k) - \bar {\obs }_{t})^2 ds }{\frac {1}{N} \sum _{k=1}^N t^{-1} \int _0^t (\obs
(X_s^k) -\bar {\obs }_{t}^k )^2 ds } \label {eqn:GR_def_ratio} \end{equation}

Note that \( \hat R_t(\obs ) \geq 1 \), and as the F-V workers’ trajectories explore \( \Omega   \), \( \hat {R}_t(\obs ) \) converges to \( 1 \) as \( t \) goes to infinity.

The time required for the F-V particle process to converge is denoted \( \tfv   \) and defined by:

(4) \begin{equation} \tfv = \inf \set {t\geq 0\mid \hat {R}_t(\obs _j) < 1 + \tol ,~\forall j} \label {eqn:GR_tfv_formula} \end{equation}

i.e. it is the time required for obtaining a ratio \( \hat {R}_{t}(\obs ) \) less than \( 1 + \tol   \) for each of the observable \( \obs   \) (where \( \tol > 0 \) is a user defined stopping criterion) if the reference walker has not left the state \( \Omega       \).

After a successful Convergence step, the simulation clock is updated as follows:

\begin{equation*} \tsim \leftarrow \tsim + \tfv      \end{equation*}

and one proceeds to the Parallel dynamics step; in case the reference walker left \( \Omega   \) before convergence at time \( \tfv   \), the simulation clock time is updated as follows:

\begin{equation*} \tsim \leftarrow \tsim + t_{\refe } \end{equation*}

where \( t_{\refe } \) is the amount of simulation time the reference walker spent within \( \Omega    \) before an exit event was observed, and one proceeds to a new Transient propagation step.

Note that because of the small value of the timestep \( dt \), usually chosen between \( 0.5 \) and \( 2~\rm fs \) (in the later case a set of holonomic constraints is applied to chemical bonds involving hydrogens atoms in order to allow for the larger values to be used), one does not expects to observe large fluctuations of the observables between two consecutive times \( t \) and \( t + dt \): it therefore makes sense to accumulate the values of the observables less frequently, let us say with period \( \tgr   \), satisfying \( dt < \tgr \ll \tfv    \).

Likewise, the test to check whether an exit from \( \Omega   \) occurred is only performed with period \( \tcheck   \), with typically \( \tgr < \tcheck \ll \tfv   \).

2.4.2 Parallel dynamics step

The \( N \) i.i.d. samples obtained after the Convergence step are used as initial conditions; then the \( N \) replicas are propagated under Equation (1).

Let \( t_{\rm para} = 0 \) be the simulation time spent in the Parallel dynamic step, until the first exit event is observed; let \( \tcheck   \) be a simulation time interval (multiple of \( dt \)) at which one tests if an exit event occurred, and let \( M \) counts how many times this test was performed before an exit event occurred; finally let

\begin{equation*} k = \min \argmin _{n \in \{1,...,N\}} t_{\rm para}^n \end{equation*}

be the index of the first replica for which an exit event occurred: then it was shown [45] that the exit time \( \tau   \) can be sampled as:

(5) \begin{equation} \tau = \left [ N(M-1) + k \right ] \tcheck . \label {eqn:ParRep_CorrectedDiscreteTauEscape} \end{equation}

The simulation clock is updated as:

(6) \begin{equation} \tsim \leftarrow \tsim + \tau \label {eqn:ParRep_ClockUpdate} \end{equation}

A new Transient propagation can therefore be initiated, using as new initial condition the newfound exit point \( X_\tau   \).

2.4.3 Differences from the original ParRep algorithm

The Gen. ParRep algorithms differs from the original ParRep algorithm (as described in Refs. [29, 30]) on several points:

Those two limitations are addressed by the implementation of the Gen. ParRep algorithm described in this article: while permitted, partition of the configuration space is not enforced, and the user has total control on how to define the states; this allows for instance to merge multiple local minima together in order to define a metastable state accurately englobing a funnel of the PES.

Furthermore the use of the F-V particle process during the Convergence step releases the user from providing a priori estimates of the time required for converging to the QSD, as \( \tfv \) is estimated on the fly based on the convergence of the observables, the only requirements being to provide meaningful observables and a tolerance level.

3 Software implementation

In the following section 4 we will present results obtained with our current implementation of the Generalized ParRep algorithm: it consists in a newly written C++ program, gen.parRep, available free of charge (see and released under an open-source BSD 3-clause licensing. We aimed at providing an easy to use, versatile and performance oriented implementation, focusing on the study of metastability encountered when studying chemical and biochemical systems. Note that while the original ParRep method is also implemented and available in our new software, we will not present any result for it, as we focus on the novelty of Gen. ParRep .

In the following paragraphs, the critical requirements for developing such a code are detailed, together with details on the technical solutions adopted in order to address them.

3.1 Distributed computing capabilities

The replica-based approach of the ParRep algorithms naturally suggests that the parallelization is achieved by using a distributed computing approach: an obvious choice nowadays is to use the Message Passing Interface (MPI) [46] standardized protocol, for which various high performance computing (HPC) implementations are available [47, 48].

Each of the \( N \) replica corresponds to a MPI task: each task will use \( P \) CPU cores, \( P \) usually being at least \( 1 \) and at most all the cores available on a given machine (a MPI node). Therefore each computing node will execute \( 1 \) or more replicas, each performing the dynamics on \( P \) cores.

Regularly, messages of arbitrary size are exchanged between the replicas, which can be classified in two categories:

Furthermore, communications can either be blocking or non-blocking, the later allowing the developer to interleave communications and computations in order to hide latency. To provide an efficient Gen. ParRep implementation, the use of barriers and collective communications have been reduced to the minimum possible, and non-blocking variants of those were used whenever possible.

3.2 MD engine

One requires an efficient Molecular Dynamics (MD) engine, capable of performing the dynamics of Equation (1): the minimal requirement is to have access to one code block which, when executed, will realize one or more discretization steps of size \( dt \), and which internally takes care of the evaluation of the potential \( V(q) \) and its gradient (usually analytically calculated). A read and write access to the internal configuration \( X_t = (q_t,p_t) \) of each replica is also required for performing the exchanges.

In order to study large systems, one also expects: full support of commonly used force-fields, availability of modern optimizations such as the Particle Mesh Ewald [49], Reaction Field [50, 51], or Cell-Linked Lists [52–55] methods, for an efficient evaluation of non-bonded interactions. As mentioned in the previous paragraph one can decide to provide \( P \ge 1 \) CPU cores to each of the \( N \) replica, therefore a shared memory parallelization capability for the MD engine is encouraged.

For the current implementation it was decided to use the OpenMM 7 library; [56] OpenMM is a high performance, free of charge and open source toolkit for performing molecular simulations, which can be used either as a software library on which to build a program, or directly as an application (via python scripting): the later is used for preparing the molecular systems before simulation, accepting force-field and configuration files from various origins (CHARMM [57], AMBER [58], GROMACS [59],NAMD [60],...), while the library mode provides a direct and simplified access to the MD engine from within the C++ application.

3.3 Definition of the states \( \mathcal {S} \)

While technical aspects as parallelization and efficiency of the MD engine are important, the Generalized ParRep critically relies on an efficient definition of the set of states \( \mathcal {S} \). As stated in subsection 2.2 this implementation focuses on applications where the states are a priori defined using either atomic coordinates or more elaborated collective variables: it is thus necessary to provide a way to define the states online, e.g. using a scripting language interfaced with the core C++ methods in order to have access to atomics properties.

It was decided to use the Lua [61] language: it is a fast, lightweight, easy to learn, embeddable and dynamically typed scripting language. The user input required for running the ParRep algorithms is written to an input Lua file, together with all the code and variables for (i) defining the states, (ii) checking if an exit event is observed, and (iii) monitor the G-R statistics. The Sol2 [62] library (embedded within the C++ program’s source code) takes care of parsing the input file at initialization, and it dynamically maps the user-defined code to C++ functions. The core code is therefore state agnostic as it never exactly knows how a state has been defined: indeed the whole implementation will only call the following: (i) a function returning a true/false boolean value indicating if \( (X_t \notin \mathcal {S}) \) (always true in case of a partitioned configuration space, possibly false otherwise); (ii) another function returning a true/false boolean value indicating if \( (X_t \notin \Omega ) \) where \( \Omega \) is the last visited state, being called every time it is required to check if an exit event occurred; (iii) and a few functions (one per user defined observable) monitoring the G-R observables \( \obs \) returning a real value to be accumulated and used in Equations (2), (3), and (4). Figure 2 exemplifies the Lua code checking if an exit event has occurred.

Figure 2: Example of a Lua function written by the user within the input file (corresponding to the validation system presented in subsection 4.2 below), and called from the C++ program. This function is called every time the algorithm checks whether an exit event from the current metastable state \( \Omega \) happened, either during the Convergence Step or the Parallel dynamics step. The variables in- dex_dist_1 and index_dist_2 are simply tables of atomic indexes defined earlier in the input file, corresponding to atomic indexes used in the state definition (see Figure 9 (b) below). Functions get_coordinates(sele) and get_COM_idxs(sele) are bindings to the C++ code which respectively retrieve atomic coordinates, and calculate the center of mass, for a given atomic indexes selection sele: they provide to the user (together with a few others documented in the test input files) enough flexibility for defining the metastable states accurately and without requiring any modification of the compiled code.

For further increased performance it is possible to use the LuaJIT implementation [63] where the Lua code is compiled to machine code during parsing: this allows performance close to what would be obtained by defining the states based on compiled code

Finally, the Lua layer can act as an intermediate proxy between the C++ Gen. ParRep code and any other external library, providing the possibility to define states and observables using external software pieces: one can for example imagine to use tools such as Colvars [39] or PLUMED [40], providing access to an extensive ready to use collection of collective variables definitions.

3.4 On the choice of \( \tgr   \) and \( \tcheck    \)

As previously mentioned in subsection 2.4 it is not necessary to check at each integration of \( t \) if \( (X_t \notin \Omega ) \) (parallel step) or \( (X_t^\refe \notin \Omega ) \) (F-V step) as one expects that the exit time is much larger than \( dt \).

And likewise, while it is important to regularly gather the value of the G-R observables \( \obs \) in order to obtain convergence of Equation (3), it is expected that they will not differ that much between time \( t \) and \( t + dt \): hence it is interesting to choose \( \tgr > dt \).

While the values should be fine tuned for each system, based on our experience we ended up with the following rule of thumb: one can take \( \tgr   \) to be \( 5 \) to \( 100 \) times the value of \( dt \), and \( \tcheck   \) to be \( 500 \) to \( 2000 \) times \( dt \). This should be adjusted depending on: (i) the amount of calculations involved in the process of defining the state and the G-R observables in the input script, and (ii) the size of the system; for a large solvated protein, if the states and observables only involve distance measures on a few atoms, then the time required for performing the dynamics will be comparatively much larger and relatively small values of \( \tgr \approx 10 \) and \( \tcheck \approx 250 \) can be selected; however, if it involves tracking the length of several dozens of hydrogen distances, or counting native contacts, a wise approach would be to choose \( \tgr \approx 50 \) and \( \tcheck \approx 1000 \).

Furthermore, it should be emphasized that while Equation (5) is mathematically valid for any values of \( \tcheck   \) (in the sense that it indeed samples the exit time of the sub-sampled Markov chain \( (X_{k \tcheck })_{k\in \mathbb {N}} \)), if \( \tcheck   \) is taken too large, one may miss an exit event if the process re-enters the same state \( \Omega   \) during the time interval \( t \rightarrow t + \tcheck \); however one may argue that such cases may denote a poor definition of the states, and that for states exhibiting strong metastability this should not be an issue.

4 Results and discussion

Now that both the algorithm and the software implementation of the Generalized ParRep (in the following denoted as “Gen. ParRep ”) have been extensively discussed, let us consider two applications: the first validates the implementation and consists in a study of the conformational equilibrium of alanine dipeptide (subsection 4.1), while the second investigates the dissociation of the FKBP–DMSO protein–ligand complex (subsection 4.2).

In the following, when reporting estimated values of the average exit time \( \mathbb {E}(\tau ) \) from a metastable state we will consider the sample average \( \bar {\tau } \) over \( n \) samples \( \{\tau _1,...,\tau _n\} \) as

\[   \bar {\tau } = \dfrac {1}{n} \sum _{i=1}^{n} \tau _i . \]

Furthermore the \( 1-\alpha    \) confidence interval for those (close to) exponentially distributed samples is:

\[   \frac {2n\bar {\tau }}{\chi ^2_{1-\frac {\alpha }{2},2n}} < \mathbb {E}(\tau ) < \frac {2n\bar {\tau }}{\chi ^2_{\frac {\alpha }{2},2n}}   \]

where \( \chi ^2_{q,\nu } \) is the value of the quantile function of the \( \chi ^2 \) distribution with \( \nu   \) degrees of freedom at level \( q \); in the following we chose \( \alpha = 0.05 \) and therefore report the \( 95 \% \) confidence interval.

4.1 Conformational equilibrium of the alanine dipeptide

The blocked alanine dipeptide (Ac-Ala-N-H-Me) has been used as a validation system for computational studies of conformational equilibria, and energy landscape reconstruction and analysis [64–71]. The dipeptide contains several notable structural features, including the two \( (\phi ,\psi ) \) dihedral angles, NH- and CO-groups capable of H-bond formation, and a methyl group attached to the \( C_\alpha \) atom. One suitable way to visualize the conformations and the transitions between them is to draw an energy surface as a Ramachandran plot [72]: when studied in vacuo the following two metastable states are clearly identified: (i) \( C_{7 \rm eq} \) for \( (\phi ,\psi ) \sim (\ang {-75},\ang {100}) \), and (ii) \( C_{7 \rm ax} \) for \( (\phi ,\psi ) \sim (\ang {60},\ang {-60}) \).

In the following we estimate the mean first passage time \( \tauala \) between the two metastable states, using the Gen. ParRep algorithm; accuracy of the method is compared to one long serial Langevin dynamics, as the low complexity of this system allows direct numerical simulation of numerous transition events; finally the influence of some of the Gen. ParRep parameters is also evaluated.

4.1.1 MD setup

The initial configuration of the dipeptide is \( (\phi ,\psi ) = (\ang {-81.0},\ang {70.0}) \), i.e. within the most populated area of the \( C_{7 \rm eq} \) state (see the yellow mark on Figure 3 together with a representation of the corresponding conformation). The OpenMM system was configured as follows: the CHARMM 22 all-atoms for proteins and lipids force-field including CMAP corrections [73, 74] was used; dynamics was performed using a Langevin integrator (time-step of \( dt = 2 \rm ~fs \), friction of \( \gamma = 2 \rm ~ps^{-1} \)), thermostated at a temperature of \( T = 300 \rm ~K \); the non-bonded interactions were evaluated using a non-periodic cutoff scheme up to a distance of \( 1.6 \rm ~nm \); and bonds involving hydrogens are constrained to a value of \( \pm 10^{-3}~\% \) of their original distance.

4.1.2 Gen. ParRep setup

The procedure for defining the states may have to be adapted for each force-field and in the following we assume the use of the aforementioned CHARMM22 FF. Figure 3 is a Ramachandran plot based free energy surface built from preliminary serial Langevin MD simulation: it illustrates how the ParRep states were a priori defined.

One can see that in the upper left quarter of the plot two close stable conformations indeed coexist, separated by a low energetic barrier of \( 1 \) to \( 2 \) kcal/mol, which is only a few times the product \( k_BT \): hence it was decided to combine those two minima together, as they do not constitute alone a valid metastable target for applying the ParRep method (Refs. [65, 69, 71] indeed suggest that the transition between those two wells is of \( 2.7 \), \( 3.0 \) and \( 4.05~\rm ps \), respectively). Therefore the conformational equilibrium of the dipeptide is modeled using a two states definition:

Therefore this setup corresponds to a two states partition of the configuration space projected onto a Ramachandran plot.

Figure 3: Alanine dipeptide : definition of ParRep domains based on a free energy surface (color coded, in kcal/mol, dashed oblique lines correspond to unsampled areas), constructed from the long MD reference simulation. The red rectangle corresponding to \( \phi \in [0;120]~\text {and}~\psi \in [-170;0] \) is used as a threshold defining the \( C_{7 \rm ax} \) state (see subsection 4.1 for de- tails). The yellow cross corresponds to the starting configuration for either Gen. ParRep or serial MD simulations.

Concerning the Fleming-Viot procedure, four Gelman-Rubin observables \( \obs   \) are considered for tracking the convergence to the QSD: the total potential energy \( V(q) \), the kinetic energy \( K(p) = \frac {1}{2} p^T M^{-1} p \), and the value of the \( \phi   \) and \( \psi   \) dihedral angles acting here as collective variables. The tolerance criterion \( \tol   \) is fixed per simulation to a given value which is the same for each observable (the influence of \( \tol   \) is investigated below for a range of values). The value of \( \tgr   \) (accumulation of observables) was set to \( 10 \times dt \) (i.e. \( 20~\rm fs \)) as the observables are not computationally expansive to calculate, and the test \( (X_t^\refe \notin \Omega ) \) is performed at \( \tcheck = 250 \times dt \) (i.e. \( 0.5~\rm ps \)) during the Convergence step, but at \( \tcheck = 2500 \times dt \) during the Parallel dynamics step, which corresponds to \( 5~\rm ps \), in order to maximize the CPU time spent in the Langevin dynamics.

4.1.3 Discussion

In the following the distribution of the Gen. ParRep sampled values \( \tauala \) are compared to results obtained when performing a long reference dynamics (denoted as reference MD in the following), consisting in a Langevin dynamics simulation of a total length of \( 162~\mathrm {\mu s} \). As a result, \( 533 \) \( \tau _{C_{7 \rm eq} \rightarrow C_{7 \rm ax}} \) events were sampled, and \( \mathbb {E}(\tau _{\rm MD ref}) = 304.47~\rm ns \) is obtained, while the confidence interval is \( 280.19~\text {ns} < \mathbb {E}(\tau _{\rm MD ref}) < 332.07~\text {ns} \) (see Table 1 for a summary); in Ref. [69] the authors estimate \( \tauala   \) to be of \( 353 \rm ~ns \) (no provided error estimate), in agreement with this value.

First, the possibility to obtain an accurate estimate of \( \tauala   \) by generating a relatively small number \( n \) of samples is investigated: the number of Gen. ParRep replicas was set to \( N = 224 \), and the number \( n \) of samples of \( \tauala   \) generated was \( \{31,39,40\} \) for respective tolerance levels of \( \tol = \{0.01,0.025,0.05\} \) (red, green and blue solid lines; less samples were collected for \( \tol = 0.01 \) because of the higher computational effort required for lower tolerance values).

The convergence to the MD reference (black lines) can be visualized on Figure 4 where \( \mathbb {E}(\tauala ) \) (solid lines) and the corresponding confidence interval (dashed lines) are given for the three ParRep samples, when considering only the subset of the first \( m \) sampled values; the red line (\( \tol = 0.01 \)) quickly converges to the same distribution than the MD reference, while higher values of \( \tol   \) appear to converge to a different distribution underestimating \( \mathbb {E}(\tau ) \) (see also numerical values in Table 1): this suggests that convergence to the QSD is only obtained for a value of \( \tol = 0.01 \) when studying the \( C_{7 \rm eq} \rightarrow C_{7 \rm ax} \) transition.

Figure 4: Convergence of \( \mathbb {E}(\tauala ) \) (plain lines) to the MD reference (black lines), for Gen. Par- Rep sampled values (red, green and blue lines), when considering only the first \( m \) of the \( n = \{31,39,40\} \) samples (abscissa), for \( \tol   \) levels of respectively \( \{0.01,0.025,0.05\} \). Dashed lines correspond to the \( 95 \% \) confidence interval; The number of Gen. ParRep replicas was fixed at \( N = 224 \).

Now that a value of \( \tol = 0.01 \) appears to be accurate enough, one can collect more samples \( n \) in order to verify that the distribution of the exit times converges to the one obtained with the reference MD simulation. In Figure 5, the distribution of \( n = 350 \) samples generated for a level of \( \tol = 0.01 \) and using \( N = 224 \) is illustrated: this was done by building an empirical complementary cumulative distribution function (in the following referred to as \( ccdf \)) using the \( n \) samples, and it provides an estimate of the probability that \( \tauala   \) is higher than a given value \( t \) (i.e. \( \mathbb {P}(\tauala > t) \)), with by definition \( \mathbb {P}(\tauala > 0) = 1 \)). One can see from Figure 5 that the Gen. ParRep and MD distributions are in really good agreement for \( t \in [0;1500]~\rm ns \), where a quasi linear function \( t \mapsto \ln \mathbb {P}(\tauala > t) \)(i.e. exponential law) is observed (we ignore the area for \( t > 1500~\rm ns \), i.e. the low probability tail of the distribution corresponding to large values of \( \tauala \), where the MD simulation lacks samples for performing a meaningful comparison). This observation is confirmed by looking at Figure 6: the convergence of Gen. ParRep samples to the MD reference is observed, both for the mean value and the confidence interval, for increasing values of \( n \).

Figure 5: Distribution of Gen. ParRep sampled values of \( \tauala   \) for \( \tol = 0.01 \), for \( N = 224 \), and a number of \( \tau   \) values generated \( n = 350 \).

Figure 6: Convergence of \( \mathbb {E}(\tauala ) \) (solid lines) for Gen. ParRep sampled values from Figure 5, when considering only the first \( m \) sampled values among \( n = 350 \) (abscissa). Dashed lines correspond to the \( 95 \% \) confidence interval.

4.1.4 Distribution of the F-V estimated value \( \tfv      \)

One last interesting quantity to collect is the average time necessary for the convergence of the observables \( \obs   \); as detailed in subsection 2.4, this corresponds to the value of \( \tfv \) (see Equation (4)). Figure 7 provides the histogram distribution of \( \tfv \) for two of the datasets from Figure 4, together with a Kernel Density Estimate smoothing [75, 76] (dashed lines). For a tolerance of \( 0.05 \) one observes that \( \tfv \approx 40 \rm ~ps \) and appears to follow a normal distribution; for \( \tol = 0.01 \) it seems that the distribution is bimodal, with a major mode at \( \tfv \approx 180 \rm ~ps \) and a minor mode at \( \tfv \approx 240 \rm ~ps \). While it is difficult to argue how the distribution of \( \tfv   \) ideally looks like, one should remember that the \( C_{7 \rm eq} \) is defined as the large funnel on the left (\( \phi < \ang {0} \)) side of Figure 3, and that therefore it encompasses the whole range of the possible \( \psi   \) values, meaning that \( \psi   \) will be the slowest observable to converge; it is thus expected that the value of \( \tfv   \) will be large for conservative tolerance levels (\( \tol \to 0 \)), and that depending on how the F-V workers randomly diffused on the \( (\phi ,\psi ) \) surface, its distribution will be broad, possibly multimodal. Hence the dispersion for \( \tol = 0.01 \) in Figure 7 appears coherent, while the homogeneous distribution for \( \tol = 0.05 \) probably indicates that the F-V workers did not diffuse far enough from their starting point in \( C_{7 \rm eq} \): they are therefore still distant from what the QSD would be, and this explains why \( \mathbb {E}(\tauala ) \) never converged to the result obtained by direct numerical simulation when \( \tol = 0.05 \).

Figure 7 emphasizes one of the main advantages of the Gen. ParRep algorithm versus the original method, i.e. the fact that \( \tfv \) is calculated on the fly, and thus adjusted to the initial condition within the state, whereas the original algorithm required a fixed user defined value after which it was assumed that the QSD was reached: indeed, one can see that for \( \tol = 0.01 \) (which seems necessary to be sufficiently close to the exact QSD), the distribution of \( \tfv   \) is spread over an interval going from \( 120 \) to \( 300~\rm ps \); it is therefore obvious that choosing a priori a decorrelation time of \( 120 \rm ~ps \) would result in a bias as this value appears to be far below the time it takes to reach the QSD for some initial conditions; and on the contrary choosing a decorrelation time of \( 300 \rm ~ps \) would ensure quasi-convergence to the QSD for most of the initial conditions, but at the cost of an unnecessary long (and then costly) decorrelation step for some of the initial conditions.

Figure 7: Histogram distribution (vertical lines) of \( \tfv   \), obtained for two different tolerance levels of \( \tol = \{0.01,0.05\} \) (two datasets from Figure 4): \( \tfv \) corresponds to the simulation time before one assumes that the samples are dis- tributed according to the QSD (see subsection 2.4). The dashed lines correspond to a Kernel Density Estimation [75, 76] smoothing.

Table 1: Summary of the estimated value of \( \mathbb {E}(\tauala ) \) and of the corresponding \( 95 \% \) confidence interval for data presented in Figures 4 to 6. The Gen. ParRep results (\( N = 224 \)) appear to converge accurately for a value of \( \tol = 0.01 \).

Method \( n \) \( N \) \( \tol \) \( \mathbb {E}(\tau ) \) (ns) Confidence interval (ns)
MD ref. \( 533 \) \( \mathbf {304.47} \) \( (280.19,332.07) \)
Gen. ParRep \( 40 \) \( 224 \) \( 0.05 \) \( 248.72 \) \( (186.60,348.14) \)
Gen. ParRep \( 39 \) \( 224 \) \( 0.025 \) \( 257.37 \) \( (192.45,361.94) \)
Gen. ParRep \( 31 \) \( 224 \) \( 0.01 \) \( 321.26 \) \( (232.54,472.83) \)
Gen. ParRep \( 350 \) \( 224 \) \( 0.01 \) \( \mathbf {304.22} \) \( (274.70,338.78) \)
4.1.5 Performance

The last point to discuss concerns the performance of the Gen. ParRep method and particularly the speedup compared with the reference serial Langevin dynamics.

In Table 2 benchmarking data is reported for the simulations from Figure 4; the fifth column reports the calculated effective speedup which is compared to the maximum possible speedup \( N = 224 \); the sixth column reports the ratio between the effective speedup (see Table’s caption for methodology) and the maximum possible speedup (hence a value of \( 100 \% \) would indicate a perfect linear speedup).

One can see that for a large tolerance of \( 0.05 \) a value of \( 189 \) is obtained, i.e. \( 84~\% \) of the maximum possible value; and for a more conservative tolerance criterion of \( 0.01 \) this falls to \( 156 \) i.e. \( 70~\% \) of N: this illustrates the cost of an accurate convergence step which, as seen in the previous section, is the key for obtaining accurate results.

Considering the reduced size of the system (\( 22 \) atoms) and the fact that during the F-V procedure the MD engine code is interrupted every \( 10 \times dt \) for collecting the value of the G-R observables, this speedup is an impressive result; although slightly higher values may be obtained by tunning further the values of \( \tgr   \) and \( \tcheck \), there is probably little space for optimization for such a small test-case system: therefore a more detailed performance analysis will be performed in the next subsection for the protein–ligand system.

Table 2: Benchmarking data for the three datasets from Figure 4 (\( N = 224 \), \( n = \{31,39,40\} \) and \( \tol = \{0.01,0.025,0.05\} \)). Each replica runs on \( P = 1 \) CPU cores. The wall-clock time (column \( 2 \)) is taken as the time elapsed from the beginning to end of the execution of the program, it includes both computations and communications time. The speed (ns/day, column \( 4 \)) is obtained by dividing the total simulation clock \( \tsim   \) (column \( 3 \)) by the value of the wall-clock time (in days). The effective speedup (column \( 5 \)) corresponds to column \( 4 \) divided by the performance of a serial MD reference (evaluated as \( 921 \) ns/day by independent tests on the same architecture). The ratio between the effective and maximum possible speedup (by definition equal to \( N = 224 \)) is given as a percentage in column \( 6 \): a theoretical value of \( 100\% \) would correspond to the maximum possible speedup.

\( \tol \) WT(s) \( \tsim \)(ns) Speed(ns/day) Eff. speedup (Eff./Max.)
\( 0.01 \) \( 6015 \) \( 10008 \) \( 143752 \) \( 156 \) \( 70 \% \)
\( 0.025 \) \( 5239 \) \( 10103 \) \( 166609 \) \( 181 \) \( 80 \% \)
\( 0.05 \) \( 4973 \) \( 10032 \) \( 174296 \) \( 189 \) \( 84 \% \)
4.2 Dissociation of the FKBP–DMSO protein–ligand system

After validation of the Gen. ParRep algorithm on the alanine dipeptide, we would like to demonstrate its efficiency on a larger protein–ligand system: the aim is to sample the dissociation time \( \tauoff \) between the bound and unbound states of the FKBP–DMSO complex (see Figure 8). The FKBP protein (also known as the FK506 binding protein) have a role in the folding of other proteins containing proline residues [77]; in the human body the FKBP12 protein binds to the tacrolimus molecule (and derivatives), an immunosuppressant drug used to reduce organ rejection after an organ transplant [78]. Because of this important role, both experimental studies [79] and molecular dynamics simulations [80–82] were performed for evaluating the affinity of the FKBP protein to multiple ligands; these include the DMSO (Dimethyl-sulfoxide), a small molecule with anti-inflammatory, antioxidant and analgesic activities [83], and often used in topical treatments because of its membrane-penetrating ability, which enhances the diffusion of other substances through the skin [84].

(a) Bound state

(b) Unbound state

Figure 8: Illustration of the FKBP–DMSO complex, corresponding to the RCSB-PDB entry “1D7H” : on the left the undissociated (“bound”) state used as starting configuration for all the simulations; on the right the target dissociated (“unbound”) state characterized by a \( \tau _{off} \) dissociation time.

4.2.1 MD setup

The initial configuration was taken from the RCSB-PDB entry “1D7H”; the AmberTools17 [58] software suite was used for setting up an implicit solvent input configuration (using the OBC [85] model II): first, parameters for the DMSO ligand were retrieved from the GAFF [86] force-field using the antechamber program; then parameters for the protein are taken from the ff14SB [87] force-field; dynamics was performed using a Langevin integrator (time-step of \( dt = 2 \rm ~fs \), friction of \( \gamma = 2 \rm ~ps^{-1} \)), thermostated at a temperature of \( T = 310 \rm ~K \); the non-bonded interactions were evaluated using a non-periodic cutoff scheme up to a distance of \( 1.6 \rm ~nm \); and bonds involving hydrogens are constrained to a value of \( \pm 10^{-3}~\% \) of their original distance.

Before running ParRep simulations, the system was equilibrated for \( 1.0 \) ns, with the DMSO’s center of mass being position-constrained within \( 0.36 \) nm of its original crystallographic position (force constant of 50 kJ/mol/nm\( ^2 \)).

4.2.2 Gen. ParRep setup

For defining the ParRep states, we used the following procedure, inspired from Refs. [80, 82]: a closer view at the ligand binding cavity (see Figure 9 (a)) reveals a dense packing with only little available space around the ligand, and one expects that the sulfur and oxygen atoms will interact favorably via non-bonded interactions with the surface of the protein; when observing in details the residues surrounding the DMSO (see Figure 9 (b) corresponding to the RCSB-PDB structure obtained from X-ray diffraction [79]), one can see favorable interaction of the O atom with residue ILE-\( 56 \) and of the S atom with residue TRP-\( 59 \).

Hence we used for defining the Gen. ParRep metastable “bound state” (denoted by \( b \)) a criteria based on the distances d\( _1 \) and d\( _2 \) as illustrated in Figure 9 (b): d\( _1 \) corresponds to the distance between ligand’s oxygen and the hydrogen amide of residue ILE-\( 56 \); and d\( _2 \) corresponds to the distance between ligand’s sulfur and the center of mass of the carbons forming the ring of residue TRP-\( 59 \). The DMSO is considered to be in the \( b \) state when any of d\( _1 \) or d\( _2 \) is less than \( 1.2 \) nm, and the “unbound state” (denoted by \( u \)) is simply defined as configurations where both distances are \( \ge 1.2 \) nm.

(a) DMSO in its binding cavity

(b) Tracked distances

Figure 9: On the left a closer view of the DMSO ligand in its binding cavity: favorable interactions between the O or S atoms and surface residues, together with the little available space around the ligand, are responsible for metastability. On the right, surrounding residues within the cavity are represented: in order to detect the dissociation event two distances are tracked for defining the bound state (see the Gen. ParRep setup paragraph for details).

One may wonder whether this distance threshold \( d < 1.2~\rm nm \) has a physical meaning: Figure 10 shows a histogram distribution of the two distances d\( _1 \) and d\( _2 \), for a \( 30~\rm ns \) Langevin dynamics: it appears that the threshold of \( 1.2 \) nm corresponds to rarely sampled configurations, far enough from the top of the two distributions (\( \approx 0.25 \) and \( \approx 0.55 \) nm, respectively), while it also precedes higher probabilities at \( d \ge 1.4 \) nm, corresponding to unbound states. This threshold therefore appears to approximately correspond to a boundary between the \( b \) and \( u \) states.

Figure 10: Histogram distribution of the two d\( _1 \) and d\( _2 \) distances as defined in Figure 9 (b), for \( 30~\rm ns \) of plain Langevin dynamics.

Concerning the Fleming-Viot procedure, once again \( 4 \) observables were selected in order to track convergence to the QSD: the two first are the aforementioned distances d\( _1 \) and d\( _2 \); the third one is the distance between the center of mass of the DMSO and the center of mass of the protein; the last one is the root mean square velocity of the DMSO ligand. The levels of \( \tol \) were set to the same value for each of the observable, and this value will be the main Gen. ParRep parameter to be discussed below. The value of \( \tgr   \) was set to \( 50 \times dt \), and the test \( (X_t \notin \Omega ) \) is performed with a period \( \tcheck = 1000 \times dt \), both during the Convergence step and the Parallel dynamics step.

4.2.3 Discussion

In the following we will compare the Gen. ParRep results to Ref. [82], where the authors performed long explicit water MD simulations using the CHARMM 27 force-field and where a value of \( \mathbb {E}(\tauoff ) = 2.2~\rm ns \) is reported.

In Figure 11 the distribution of \( \tauoff   \) for tolerance values of \( \tol = \{0.1,0.075,0.05,0.025,0.01\} \) is shown (details are available in Table 3): first, a moderate number of transition events (\( n < 100 \)) was generated for each level of \( \tol   \), and a first estimate of \( \mathbb {E}(\tauoff ) \) was calculated: as the results for \( \tol = \{0.1,0.075\} \) appeared to be far from the results obtained for more stringent tolerances, they were not further considered; then for the three remaining tolerance levels (\( \tol = \{0.05,0.025,0.01\} \)), extended simulations were performed which permitted to obtain \( n = \{282,320,301\} \) samples of \( \tauoff   \), thus providing estimates \( \mathbb {E}(\tauoff ) \) of respectively \( 1.51 \), \( 1.32 \) and \( 1.34~\rm ns \) with a confidence interval of approximately \( \pm 0.3~\rm ns \) (see Table 3); the convergence of the corresponding simulations can be observed on Figure 12.

It should first be noted that levels of \( \tol   \) larger than \( 0.05 \) have to be avoided for this system (and, from our experience, for any application in general) as they will systematically produce biased results: it is indeed not possible to generate initial conditions distributed according to the QSD by using such a loose tolerance criterion, especially when the definition of the state \( \Omega     \) involves a large number of degrees of freedom.

For tolerance levels smaller or equal to 0.05, the confidence intervals mostly overlap, as one can see on Figure 12: for \( \tol = 0.01 \) or \( 0.025 \) (respectively the cyan and dark blue lines) the two estimated values of \( 1.32 \) and \( 1.34~\rm ns \) are almost identical, for \( \tol = 0.05 \) (the green line) the estimated value of \( \mathbb {E}(\tauoff ) \) is \( 1.51 \), a slightly higher value. A closer look at the upper and lower bounds of the confidence intervals shows a constant overlap — of decreasing width — around \( 1.35~\rm ns \), which is the value of \( \mathbb {E}(\tauoff ) \) for \( \tol = 0.01 \) or \( 0.025 \) (\( 1.34 \) and \( 1.32~\rm ns \)), therefore suggesting that, once again, strict tolerance criteria provide the most accurate estimate; this is confirmed by a qualitative (solid vs dashed lines) and quantitative (coefficient \( R^2 \)) look at Figure 11, where one can see that the two lowest values of \( \tol \) follow the more accurately the quasi-exponential distribution (however, one should remember that the distribution is not expected to be exactly exponential especially for small values of \( \tauoff \), as exit events of the reference walker happening before the end of the Convergence step are not guaranteed to be exponentially distributed as the QSD was not yet reached).

In the aforementioned Ref. [82] the estimate of \( \mathbb {E}(\tauoff ) \) is \( 2.2~\rm ns \) (no confidence interval provided): this can be considered to be in a reasonable agreement with our value of \( 1.34~\rm ns \) obtained for \( \tol = 0.01 \) and where the \( 95 \% \) confidence interval is \( (1.20,1.51)~\rm ns \), considering that the force-field was different, and that the current study uses an implicit solvent while the reference used explicit water molecules.

Figure 11: Distribution of the dissociation time \( \tauoff \) for the complex FKBP–DMSO, estimated using the Gen. ParRep method, at different levels of tolerance \( \tol = \{0.1,0.075,0.05,0.025,0.01\} \). The top-right inset is a zoom for \( t < 1.0~\rm ns \), and dashed straight lines (for which the \( R^2 \) coefficient of determination is given) denote the expected quasi-exponential distribution for large t: \( \ln \mathbb {P}(\tauoff >t) = -t/\mathbb {E}(\tauoff ) \). See Table 3 for quantitative values of: \( \mathbb {E}(\tauoff ) \), \( N \), \( n \) and the \( 95 \% \) confidence interval.

Figure 12: Convergence of \( \mathbb {E}(\tauoff ) \) for the datasets from Figure 11 and Table 3. Dashed lines correspond to the \( 95 \% \) confidence interval.

Table 3: Summary of the estimated value of \( \mathbb {E}(\tauoff ) \) and of the corresponding \( 95 \% \) confidence interval for data presented in Figures 11 and 12.

\( \tol \) \( N \) \( n \) \( \mathbb {E}(\tauoff ) \) (ns) confidence interval (ns)
\( 0.1 \) \( 112 \) \( 88 \) \( 0.92 \) \( (0.75,1.14) \)
\( 0.075 \) \( 112 \) \( 61 \) \( 0.63 \) \( (0.50,0.83) \)
\( 0.05 \) \( 140 \) \( 282 \) \( 1.51 \) \( (1.35,1.70) \)
\( 0.025 \) \( 140 \) \( 320 \) \( 1.32 \) \( (1.19,1.48) \)
\( 0.01 \) \( 140 \) \( 301 \) \( 1.34 \) \( (1.20,1.51) \)
4.2.4 Performance and convergence to the QSD

Because the FKBP–DMSO system is much more representative of a typical research application than the alanine dipeptide, it is of an utmost interest to provide an accurate estimate of the performances: for this, we provide in Table 4 benchmarking data (following the same methodology established for Table 2): the three datasets correspond to the samples for \( \tol = \{0.01,0.025,0.05\} \) from Figure 11; the number of replica \( N = 140 \) corresponds to the maximum speedup, while column \( 5 \) reports the calculated effective speedup; the ratio given in column \( 6 \) gives an idea of the efficiency of the implementation for studying a medium size protein–ligand system.

One can see that the effective speedup is close to \( 97 \) (i.e. \( \approx 69 \% \) of the maximum \( N = 140 \)) for tolerances of \( 0.025 \) and \( 0.05 \); for the stricter tolerance level of \( 0.01 \) the speedup falls to \( \approx 80 \) (i.e. \( \approx 57 \% \) of the maximum \( N = 140 \)) which illustrates the computational cost one has to pay for an increased accuracy. Once again the ability to obtain a speedup between \( 57 \) and \( 69~\% \) of the maximum possible denotes the parallel efficiency of the Gen. ParRep implementation on a production system, and is an extremely promising achievement towards future studies of larger biochemical systems.

The distribution of \( \tfv \) is illustrated in Figure 13 for all the considered levels of tolerance: all distributions appear to be multimodal, revealing that multiple sub-states are likely to be found within the surrounding cavity definition of the bound state (this was suggested in earlier studies such as Ref. [81]); for the larger levels of \( \tol   \) the multimodality is particularly visible with two well defined peaks, resulting in an average \( \tfv   \) falling between the peaks, around \( \approx 25~\rm ps \); however for low tolerance such as \( 0.01 \) a broad distribution is observed, and the average \( \tfv   \) goes to \( \approx 50~\rm ps \).

This emphasizes once again how difficult it would be to fix a priori a value for \( \tfv \) (as required by the original ParRep implementations), as this value would be inappropriate for some of the initial conditions; the ability of the Gen. ParRep algorithm to automatically determine a value of \( \tfv \) appropriate for the current initial condition therefore appears to be the major advantage of the method when investigating protein–ligand complexes’ dissociation.

Table 4: Benchmarking data for three of the datasets from Figure 11 and Table 3 (corresponding to \( N = 140 \), \( \tol = \{0.01,0.025,0.05\} \)): each replica used \( P = 4 \) CPUs cores, and the equivalent speed of a reference Langevin dynamics on those same \( 4 \) cores is \( 5.15 \) ns/day; see Table 2 for the methodology.

\( \tol \) WT(s) \( \tsim \)(ns) Speed (ns/day) Eff. speedup (Eff./Max.)
\( 0.01 \) \( 85142 \) \( 403.5 \) \( 409.4 \) \( 79.5 \) \( 56.8 \% \)
\( 0.025 \) \( 79574 \) \( 457.6 \) \( 496.8 \) \( 96.5 \) \( 68.9 \% \)
\( 0.05 \) \( 84455 \) \( 482.2 \) \( 493.4 \) \( 95.8 \) \( 68.4 \% \)

Figure 13: Kernel Density Estimation of the distribution of \( \tfv   \), obtained for tolerance levels of \( \tol = \{0.1,0.075,0.05,0.025,0.01\} \) (datasets from Figure 11 and Table 3).

5 Conclusion and outlook

In this article, we detailed a new implementation of the Gen. ParRep algorithm, developed with the aim of facilitating the study of biochemical systems exhibiting strong metastability. After detailing the methods and the software implementation in Sections 2 and 3, a validation (Section 4) with two systems of increasing complexity was discussed.

In subsection 4.1, it was shown that the Gen. ParRep method can accurately sample the transition time \( \tauala \) characterizing the conformational equilibrium of alanine dipeptide in vacuum: for sufficiently small levels of tolerance (e.g. \( \tol = 0.01 \)), the estimation converges to what was obtained from a long reference Langevin dynamics (see Table 1 for a summary, and Figures 3 to 7). Results also appeared to compare favorably to previously published studies [69]. Finally it was also shown that the implementation of the algorithm proves to be scalable as one can obtain \( \approx 80 \% \) of the maximum possible speedup (see Table 2).

The second application consisted in the study of the dissociation of the FKBP–DMSO complex (subsection 4.2), a protein–ligand system of larger size, much more representative of typical metastable problems encountered in computational biology or chemistry. The goal was to obtain an accurate estimate of the average time \( \tauoff \) required for observing a dissociation of the complex, with comparison to previous computational studies [80–82]. It was shown (see Table 3 and the associated Figures 8 to 13) that a simple definition of the bound and unbound states based on a two distances threshold can provide an accurate estimate of \( \tauoff   \), once again when tolerance levels \( \tol < 0.05 \) are used: a value of \( 1.32 < \mathbb {E}(\tauoff ) < 1.51 \) is found using the Gen. ParRep method, the value of \( 1.34~\rm ns \) for \( \tol = 0.01 \) appearing to be the most accurate, and this compares relatively well to Ref. [82] where a value of \( 2.2~\rm ns \) was found using a different forcefield and an explicit solvent. The algorithm was also benchmarked for the FKBP–DMSO system (see Table 4), and it was shown that one can maintain performances up to \( 60 \sim 70 \% \) of the maximum possible speedup while using 560 CPU cores, which definitely makes this new Gen. ParRep ready for production on large scale HPC machines.

From the two studies performed in this article it ought to be remembered that, beyond the accurate definition of the states \( \mathcal {S} \), the choice of the tolerance level is the main parameter influencing the accuracy of the results: a value of \( \tol = 0.01 \) appears to be the most reasonable choice, confirming previous observations on smaller systems [38]. Furthermore, the algorithm provides an accurate estimate of the time \( \tfv \) required for approximating the QSD depending on the initial condition within the state, and it was shown (see Figures 7 and 13) that \( \tfv \) is distributed over a large interval of time: in such a case the a priori choice of a fixed value decorrelation time approximating \( \tfv       \) (as it was done in the original ParRep implementations) is non-obvious, and the use of the Gen. ParRep method is justified.

As an outlook, it has to be emphasized that there is still, of course, place for improvement of the software implementation: the authors would like to make the program compatible with other MD engines; tests are currently being performed where replicas are distributed over General-Purpose computing units (GPGPUs) in order to consider applications to larger systems; and preliminary simulations are being performed on a HPC Cloud Computing platform, on which a user could easily imagine using thousands of replicas.

The authors are also currently studying more advanced biochemical metastable problems, including larger protein–ligand systems in explicit water where the states consist in a set of disjoint cavities inside the protein.