TLDR: We contribute an approach to simulate a chemical reaction with (1) a catalyst and (2) endothermicity in 3D.
Special thanks: This work was done in the QOSF Cohort 9 Quantum Mentorship Program. My gratitude goes to my mentor Danial Motlagh and to the organizer, QOSF.
Introduction
\(\text{NH}_3\) production uses 1% of the world’s energy. The most widely adopted synthesis method, the Haber-Bosch process, is
\[\begin{align}\text{N}_2 + 3\text{H}_2 \overset{\text{Fe}}{\longrightarrow} 2\text{NH}_3 \end{align}\]However, it is energy-intensive and relies on catalysts. Understanding the reaction pathway in the presence of a catalyst promises insights that could improve the process's efficiency. [1] simulated the reaction with VASP, using the Fe(211) surface as a catalyst, as shown in Fig. 1.
Since VASP is computationally expensive, what can we achieve with a cheaper, and hence faster, method? This post explores how we can use PennyLane to execute the first steps of the pathway with limited computing resources. Concretely, we replicate the first three steps of [1], shown in Fig. 2, which are the most complex reactions in the whole pathway, at a fraction of the computational cost of the method used in [1].
Method
- Vary the coordinates of the reactant: the translation along the \(x,y,z\) axes and the rotation angles around its center.
- Construct the Hamiltonian
- Measure and optimize: measure the expectation value of the Hamiltonian using the quantum circuit, and optimize the parameters with a classical optimizer to minimize this value, as sketched after this list.
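For concreteness, here is a minimal sketch of the measure-and-optimize step in PennyLane, using a toy \(\text{H}_2\) geometry as a stand-in for our actual system (all values here are illustrative):

```python
import pennylane as qml
from pennylane import numpy as np

# Toy stand-in geometry: a single H2 molecule (coordinates in Bohr).
symbols = ["H", "H"]
coordinates = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.4])

# Build the qubit Hamiltonian for this geometry.
H, n_qubits = qml.qchem.molecular_hamiltonian(symbols, coordinates, basis="sto-3g")
hf = qml.qchem.hf_state(electrons=2, orbitals=n_qubits)
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def energy(theta):
    # Hartree-Fock reference plus one double excitation as a toy ansatz.
    qml.BasisState(hf, wires=range(n_qubits))
    qml.DoubleExcitation(theta, wires=[0, 1, 2, 3])
    return qml.expval(H)

opt = qml.GradientDescentOptimizer(stepsize=0.2)
theta = np.array(0.0, requires_grad=True)
for _ in range(50):
    theta = opt.step(energy, theta)
print("estimated ground-state energy:", energy(theta))
```

In our setting, the outer loop additionally moves the reactant and rebuilds \(H\) at every new pose, which is what makes each step expensive.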
Motivation
To define a 3D rigid transformation for a molecule, we need three translation parameters \(t_x, t_y, t_z\) and three rotation parameters \(\theta_x,\theta_y,\theta_z\). Computing the gradient by central differences requires evaluating at \(t_\bullet \pm \Delta_t\) and \(\theta_\bullet \pm \Delta_\theta\), so each learning step requires constructing \(6 \times 2 = 12\) Hamiltonians, which is expensive.
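To make the bookkeeping concrete, here is a minimal sketch of the pose parametrization and the central-difference gradient. The function names are ours, and `energy_fn` stands for any map from a pose to the estimated ground-state energy:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def transform(coords, pose):
    """Rigid-body move of a reactant: rotate around its centroid by Euler
    angles (ax, ay, az), then translate by (tx, ty, tz)."""
    tx, ty, tz, ax, ay, az = pose
    center = coords.mean(axis=0)
    R = Rotation.from_euler("xyz", [ax, ay, az]).as_matrix()
    return (coords - center) @ R.T + center + np.array([tx, ty, tz])

def finite_diff_grad(energy_fn, pose, d_t=1e-2, d_theta=1e-2):
    """Central differences over the six pose parameters: each gradient
    costs 2 * 6 = 12 energy evaluations, i.e. 12 Hamiltonian builds."""
    steps = np.array([d_t] * 3 + [d_theta] * 3)
    grad = np.zeros(6)
    for i in range(6):
        e = np.zeros(6)
        e[i] = steps[i]
        grad[i] = (energy_fn(pose + e) - energy_fn(pose - e)) / (2 * steps[i])
    return grad
```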
Bayesian Optimization (BO)
BO, being a gradient-free method, does not have this requirement. Here is how it works. After the initial samples \(\pmb{X}=\{x_1, x_2, \ldots, x_n\}\), a Gaussian process (GP) assumes that
\[\pmb{X} \sim \mathscr{N}\begin{bmatrix}\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n\end{pmatrix},\begin{pmatrix}K_{11} & K_{12} & \cdots & K_{1n}\\ K_{21} & K_{22} & \cdots & K_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ K_{n1} & K_{n2} & \cdots & K_{nn} \end{pmatrix} \end{bmatrix}\]and fits a regression curve through these data points. Only the observed data points have absolute certainty (variance = 0). For an unknown point \(x'\), the GP returns a prediction from the conditional distribution \(\mathscr{N}(\mu',\Sigma'\mid\pmb{X})\).
A beautiful property of the Gaussian is that conditioning a Gaussian on a set of variables again yields a Gaussian, \(\mathscr{N}(\mu',\Sigma')\). In Fig. 2, the predicted value at the unknown point is \(\mu'\), whereas the confidence interval is given by \(\Sigma'\).
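Concretely, with observations \(\pmb{y}\) at \(\pmb{X}\), the GP posterior at \(x'\) has the standard closed form (see [3]):

\[\mu' = \mu(x') + K(x',\pmb{X})\,K(\pmb{X},\pmb{X})^{-1}\,(\pmb{y}-\mu(\pmb{X})), \qquad \Sigma' = K(x',x') - K(x',\pmb{X})\,K(\pmb{X},\pmb{X})^{-1}\,K(\pmb{X},x').\]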
Here, we face a dilemma. In \(\pmb{X}\) there is a point with the smallest observed value, \(x_\min\); the next minimum should therefore lie somewhere close to it. However, it is also possible that points inside a wide confidence interval have smaller values than the current minimum. To quantify this trade-off, we define an acquisition function \(\alpha\) that selects new positions to sample. There are several ways to define \(\alpha\); we use the Expected Improvement (EI) approach here.
EI works by generating random functions that lie inside the confidence intervals, as demonstrated in Figure 3. The intuition is that a larger confidence interval admits more diverse sample functions. Afterward, we sample the point at the extremum and update the prior. The process continues for a predefined number of steps.
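Instead of sampling functions explicitly, EI can also be evaluated in closed form from the GP posterior. A minimal sketch for our minimization setting (names are ours):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_y, xi=0.01):
    """Closed-form EI for minimization: the expected amount by which a
    candidate with GP posterior mean `mu` and std `sigma` improves on the
    best observed value `best_y`. `xi` nudges the balance toward exploration."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero variance at observed points
    z = (best_y - mu - xi) / sigma
    return (best_y - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```

The next pose to evaluate is the candidate that maximizes this quantity.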
Simplification
To facilitate the calculation with our limited computing resources, we made the following simplifications:
- Calculating the Hamiltonian with \(\text{STO-3G}\), a minimal orbital basis (sketched after this list).
- Simplifying the catalyst substrate.
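Both simplifications show up directly in the Hamiltonian construction. Here is a sketch in PennyLane with a rough \(\text{NH}_3\) geometry and illustrative active-space numbers; these are not our actual settings, which live in chem_config.yaml:

```python
import pennylane as qml
from pennylane import numpy as np

# Rough NH3 geometry in Bohr (illustrative only).
symbols = ["N", "H", "H", "H"]
coordinates = np.array([
    0.00,  0.00,  0.00,
    0.00,  1.77, -0.72,
    1.53, -0.88, -0.72,
   -1.53, -0.88, -0.72,
])

# STO-3G keeps the orbital basis minimal; the active space freezes the
# remaining electrons/orbitals so that the qubit count stays tractable.
H, n_qubits = qml.qchem.molecular_hamiltonian(
    symbols, coordinates,
    basis="sto-3g",
    active_electrons=4,   # out of NH3's 10 electrons
    active_orbitals=4,    # -> 2 * 4 = 8 qubits
)
print(n_qubits)  # 8
```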
Visualization
We will replicate the first three steps of Figure 2, whose Hamiltonians are the most complex in the whole pathway.
In the visualizations below, the atoms are color-coded as \(\text{Fe}\), \(\text{N}\), and \(\text{H}\). Please open the GIFs/videos in a new tab for a full-resolution version.
Step 1
We fix the coordinates of the catalyst substrate and use the gradient descent method to optimize the coordinates of the reactants. The result is as follows.
Even with the same learning rate, the reactants move much more slowly than at the beginning. We therefore conclude that the optimization has converged in this step.
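The outer loop for this step can be sketched as plain gradient descent on the pose, reusing `finite_diff_grad` from the Motivation sketch. Here `energy_at_pose` is a hypothetical stand-in for the pose-to-energy pipeline of the Method section, and the step size is illustrative:

```python
import numpy as np

# Plain gradient descent on the reactant pose; the substrate stays fixed.
# energy_at_pose (hypothetical) rebuilds the Hamiltonian at each pose via
# transform() and estimates its ground-state energy, as in the Method sketch.
pose = np.zeros(6)      # (tx, ty, tz, ax, ay, az), starting pose
learning_rate = 0.05    # illustrative value
for step in range(100):
    grad = finite_diff_grad(energy_at_pose, pose)
    pose = pose - learning_rate * grad
```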
Step 2
In this step, we add \(\text{H}_2\) as a reactant. To minimize the time needed to build the Hamiltonian, we fix the coordinates of all the reactants from the previous step. Since gradient descent worked before, let's apply the same method.
Note that the reactants move much more slowly than in the previous step. It is clear that we are stuck in a local minimum, and gradient descent would have a hard time escaping it without modeling the endothermicity of this reaction (the \(0.98\) below the arrow). Let us try BO to see if it can overcome this minimum.
Due to the nature of the acquisition function, the plot of the ground-state energy does not decrease as smoothly as with gradient descent. Most of the \(\text{H}\) atoms went too far from the catalyst substrate, due to our initial setup of the search margin, which is \(1\,\text{nm}\) larger than the catalyst substrate. Here we filter for only the samples that went inside the substrate region.
We conclude that the search boundary is of the utmost importance. Accordingly, we tighten the search space as shown below.
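In code, the tightened search might look like a box constraint on the pose, with BO proposing only candidates inside the box. A sketch using scikit-learn's GP and the `expected_improvement` function from the earlier sketch; the bounds and values are illustrative, not our actual configuration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Illustrative tightened box (translations in Bohr, angles in radians).
lower = np.array([-1.0, -1.0, 0.5, -np.pi, -np.pi, -np.pi])
upper = np.array([ 1.0,  1.0, 2.5,  np.pi,  np.pi,  np.pi])

def propose_next(X, y, n_candidates=2000, seed=0):
    """Fit a GP to the observed (pose, energy) pairs and return the
    in-box candidate that maximizes expected improvement."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(lower, upper, size=(n_candidates, 6))
    mu, sigma = gp.predict(candidates, return_std=True)
    ei = expected_improvement(mu, sigma, best_y=y.min())
    return candidates[np.argmax(ei)]
```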
Here is the optimization after setting the search boundary. We made these demonstration videos instead of GIFs so that readers can control the frame shown.
An \(\text{NH}_3\) molecule is produced in a total of 6 trials (namely the 12th, 13th, 25th, 26th, 27th, and 28th), which coincidentally matches the roughly 20% efficiency of the Haber-Bosch process. It also offers an explanation for that number: the coordinates that create \(\text{NH}_3\) have higher ground-state energy than the other configurations, in which the \(\text{H}\) atoms float away. Note that BO sometimes samples two very close points consecutively (the 12th and 13th frames, for example); this is due to the exploitation-exploration trade-off mentioned earlier.
Conclusion
This blog post provides an alternative method to optimize the geometry of the Haber-Bosch process. Due to the lack of computational resources, we had to reduce the catalyst platform and the numbers of active orbitals and electrons; these parameters are visible in chem_config.yaml. Even with a minimal orbital basis and reduced numbers of active electrons and orbitals, the experiments still take a lot of time:
- Gradient descent: 32 CPU, 256 GB RAM, 50h runtime
- BO: 8 CPU, 64 GB RAM, 60h runtime
The numbers of active electrons and active orbitals are the crucial parameters that make this project possible, as a full configuration interaction calculation can take up to terabytes of RAM. That being said, even this reduced setup can replicate the original paper. Now that we have obtained the desired result, where \(\text{NH}_3\) is produced, it would be interesting to see whether gradient descent can further fine-tune these positions.
Comments? Questions? Please let us know on the Issues page.
References
[1] Jon Fuller, Alessandro Fortunelli, William A. Goddard III, and Qi An. Reaction mechanism and kinetics for ammonia synthesis on the Fe(211) reconstructed surface. Physical Chemistry Chemical Physics, Issue 21, 2019.
[2] Markus Reiher, Nathan Wiebe, Krysta M. Svore, Dave Wecker, and Matthias Troyer. Elucidating reaction mechanisms on quantum computers. Proceedings of the National Academy of Sciences, 2017.
[3] Kevin Patrick Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.