Tutorial
This is a tutorial that comes with ProtoMS. When you have downloaded ProtoMS you should find all of files in the "tutorials" directory (and a copy of this web page is in ProtoMS.htm. While using these tutorials, you will be running ProtoMS, modifying input files, writing output files etc...I therefore strongly advise you to copy the tutorials directory and work on that copy (e.g, cp -a tutorials mytutorials). You will have to modify some paths in the scripts RETImaster.py for the tutorials to work. There are also some issues you need to check. For instance whether or not to use rsh or ssh in the script RETImaster.py (the feasible options will depend on the way your system administrator has setup your computers). In a number of tutorials I have left directories with (compressed) output files. You can uncompress them and see if you obtain similar results. In theory, with the same random number seed you should reproduce my results. However OS/python versions/may make things change.
Some of the tutorials to set compounds up require other modeling programs. You may not have AMBER or whatif available and you may have to find your own ways of setting systems up. Hopefully that won't prove too painful.
Table of contents
1.1 Perform some simple simulations
1.2 Setup a solute for a Monte Carlo simulation
1.3 Setup a solute for a free energy calculation with the single topology method
1.4 Setup a protein/protein scoop for a Monte Carlo simulation
1.5 Equilibrate a system
1.6 Calculate a relative solvation free energy
1.7 Calculate a relative binding free energy
1.8 Perform a Monte Carlo
minimisation in implicit solvent
1.9 Use additional restraints
Perform some simple simulations
- One of the simple, yet interesting system that ProtoMS can simulate consists in two atoms with no non-bonded terms and connected by a harmonic spring potential U(x) = K(r-req)**2. In this first example we will calculate the average potential energy of such a system. A Monte Carlo simulation will be performed in internal coordinates and a Monte Carlo move will consist in a random change of the bond length between the two atoms. Given the one dimensional nature of this system and the quadratic form of the potential energy function, it is easy to see that the average potential energy of the system can be predicted according to the equipartition theorem. It is simply 0.5kT. At a temperature of 25 degrees celsius this amounts to 0.2962(3) kcal.mol-1
- First let's examine the input files. What ProtoMS will do is dictated by the contents of the file input.cmd
# Parameter files
parfile1 ./oscillators.ff
# PDB Files
solute1 ./harm1.pdb
#set the output files
streamheader on
streaminfo on
streamresults on
# Simulation parameters
#lambda 0.0 1.00 0.00
ranseed 1234559
# Simulation chunks
chunk1 simulate 100000 solute=1
chunk2 results write
- The first command parfile1 lists where the parameter file that describe the topology and force field parameters of the system to study is located. The command solute1 indicates where is the pdb file that contains the atom names, residues and coordinates of the system to study. Then there are three streamXXXX commands which controls the output of the program. For intance streamheader on causes the header output of ProtoMS (the one that list license informations among other things) to be sent to the standard output. Latter you should try to change this to for instance myheader.txt and see that this will cause the output to be sent to a file called myheader.txt. If you do not wish to save the header file, to save disk space for instance, you can simply write streamheader off. The same applies to other kind of streams. There are many other streams that the three listed here. Refer to the manual for details. Next we move on to simulation parameters. For now ignore the command #lambda ... we will come back to this latter. The next command is ranseed which causes a random number seed to be specified at the start of the simulation, This usually ensure that two Monte Carlo simulations repeated with the same ranseed are strictly identical. In practice simulations run on different processors could behave differently. The last section describes chunks. A chunk can be thought of as one action taken by ProtoMS. So basically once the code has loaded all the input, set streams and simulation parameters etc...It will go on and perform one chunk after the other. The chunks therefore describe what you actually want to do on the system you have loaded. Here chunk1 causes 100 thousand MC moves to be performed on our solute and solute=1 indicates that only solute moves should be attempted (which is fine since there is no solvent or protein in this example). The second chunk2 will cause the results (energy terms and free energies) to be printed out. And that's all this simnulation does!
- Let's look at the file that solute1 will load
HEADER harm1
ATOM 1 A1 ha1 1 0.000 0.000 0.000
ATOM 2 A2 ha1 1 2.000 0.009 0.000
TER
END
- You can see that the contents of the file are very similar to a typical pdb file. The only thing important to notice is that the first line of this pdb must contain a HEADER line which lists the name of the solute. Next let's look at the file oscillators.ff
mode clj
# Energy is sum of coulomb and Lennard Jones terms
#parameter atm proton-num charge(|e|) sigma(A) epsilon(kcal mol-1)
par 1 OS 01 0.00 0.00 0.00
mode bond
# U(r) = k(r-r0)**2
#parameter k(kcal mol-1 A-2) r0(A) comment
par 1 20.0 2.0
par 2 10.0 2.0
mode template
solute harm1
info translate 0.0 rotate 0.0
#
# DM1--DM2
# |
# DM3--A1--A2
#
#
atom A1 HA1 1 1 DM3 DUM DM2 DUM DM1 DUM
atom A2 HA1 1 1 A1 HA1 DM3 DUM DM2 DUM
bond A2 HA1 A1 HA1 flex 0.2 param 1 1
mode template
solute harm2
info translate 0.0 rotate 0.0
#
# DM1--DM2
# |
# DM3--A1--A2
#
#
atom A1 HA2 1 1 DM3 DUM DM2 DUM DM1 DUM
atom A2 HA2 1 1 A1 HA2 DM3 DUM DM2 DUM
bond A2 HA2 A1 HA2 flex 0.2 param 2 2
mode template
solute harm1t2
info translate 0.0 rotate 0.0
#
# DM1--DM2
# |
# DM3--A1--A2
#
#
atom A1 1T2 1 1 DM3 DUM DM2 DUM DM1 DUM
atom A2 1T2 1 1 A1 1T2 DM3 DUM DM2 DUM
bond A2 1T2 A1 1T2 flex 0.2 param 1 2
- The section mode clj defines the non bonded parameters for one kind of atom. Because the charge and Lennard-Jones term are zero there is effectively no non bonded energy.
- The section mode bond defines bonded terms between pair of atoms. There are two such terms which differ only by their force constant, 20 or 10 kcal.mol-1.A-2 respectively
- The section mode template describes the topology and links force field parameters to atomic sites. Let's focus on the template that describe oscillator harm1 , the one we are currently simulating. First it is important that the line solute harm1 contains the name of the solute as specified in the pdb file. Otherwise the code will complains and crashes. The line info rotate X.XX translate X.XX specifies the extent by which rigid body rotations and translations can displace the whole solute after each MC move. These rigid body rotations/translations are irrelevant to our current problem and thus set to zero. The line atom A1 specifies that atom A1, in residue HA1 is bonded to dummy atom DM3, of residue DUM, has an angle to dummy atom DM2 of residue DUM and a torsion with dummy atom DM1 of residue DUM. This is a standard way of representing internal coordinates in a zmatrix. These three dummy atoms are necessary to start the zmatrix. The two numbers 1 1 listed before the dummy atoms indicate that atom A1 has initial non bonded parameters 1 and final non bonded parameters 1. In a free energy simulation the two numbers would be different. Because they do not change here it means the non bonded terms would be kept constant during a free energy simulation. Finally the line bond specifies that atoms A1 and A2 are bonded, flex 0.2 that such bond is flexible and hence will be sampled during a MC simulation and that each MC move will attempt to modify the bond length by +- 0.2 angstrom. The term param 1 1 indicates that the initial and final force field terms are parameter 1 (which consists in a force constant of 20 kcal.mol-1.A-2 and an equilibrium distance of 2.0 angstroms if you paid attention).
- So let's actually run a simulation by typing
./protoms2 input.cmd
- You will see printed out a HEADER section, a INFO section and a RESULTS section. You can play and turn them on/off if you want. In the results section you can locate the total energy of the system.
RESULTS Total Energy 0.29774268 ( 0.417 ) | 0.29774268 ( 0.417 ) 0.29774268 ( 0.417 )
- Note that the total energy with an associated standard deviation is listed three times. This is because ProtoMS2 is constructed to perform free energy calculations using double wide sampling (more of this latter). For now you should just know that unless you are performing a free energy simulation, all these terms should be identical. In this particular case our estimate is correct within 2 decimales. We can try to do better than this. Edit the file input.cmd so that you perform now 1 million Monte Carlo moves instead of 100 thousand MC moves. I then get
RESULTS Total Energy 0.29607301 ( 0.416 ) | 0.29607301 ( 0.416 ) 0.29607301 ( 0.416 )
- The new estimate is now closer to the analytical result. As the sample size increases, the results will converge towards the analytical result. Another experiment you can do is to replace ranseed by another number and see how much the results varies. Ideally, the computed average should not be too sensitive to the inital random number as this parameter is arbitrary.
- Now we will go on to perform a free energy simulation. We will attempt to calculate the free energy difference between two harmonic oscillators which have the same equilibrium distance but a different force constant of 20 and 10 kcal.mol-1.A-2 respectively. Again, classical statistical mechanics has worked out the answer for us and the relationship is:
A2 - A1 = (kT/2) ln (K2/K1)
- Which comes out as -0.2053 kcal.mol-1 at 25 degrees celsius. To compute such a quantity with ProtoMS edit the file input.cmd such that it looks like this
# Parameter files
parfile1 ./oscillators.ff
# PDB Files
solute1 ./harm1t2.pdb
#set the output files
streamheader on
streaminfo on
streamresults on
# Simulation Parameters
lambda 0.0 1.00 0.00
ranseed 1234557
# Simulation chunks
chunk1 simulate 100000 solute=1
chunk2 results write
- All we did is to change the solute we load to harm1t2.pdb and uncomment the parameter lambda. The first change allows us to load a solute setup to perform a single-topology free energy calculation. The second parameter specify the value of the coupling parameter for the, reference, forwards and backwards states. The reference state is the state at which the MC simulation is performed. So if lambda is set to 0, we are simulating the harmonic oscillator with its initial parameter. With forwards set to 1.00 we simulate the oscillator with its final parameter. And backwards is identical to reference in this case. Looking at harm1t2 above we see that the initial force field parameter of the bond is 1 and the final is 2. Going further above to check the contents of the mode bond section, we see that this correspond to changing the force constant from 20 to 10 kcal.mol-1.A-2.
- Next run protoms2 on this input.cmd file. This time in the RESULTS section you will notice that the forwards energies are different from the reference and backwards energies. In addition, a free energy is reported. I got
RESULTS Backwards Free Energy = 0.0000000000000000000 ( 0.000000000 )
RESULTS Forwards Free Energy = -0.21177880938162346736 ( .2146595169 )
- We see our free energy estimate is not too far off. You can now see how much improvement you get by increasing the number of MC moves to say 1 million.
- While simple, there are plenty of things you can try on this system. For instance, what happens when you increase the force constant from 20 to 2000 ? How accurate are your results ? Next try to perform the simulation at the final reference state by setting lambda 1.0 1.00 0.00 and hence perform the simulation in reverse order. Why are the results more accurate ? You will find answers to these questions in the litterature, by checking out for instance Kofke D.A. Mol. Phys. 2004, 102, 405-420 or Michel J. Verdonk M.L. Essex J. W. J. Chem. Theory Comput., submitted
- REFERENCES:
tutorials/protoms-setup-ligand/
Setup a solute for a Monte Carlo simulation
- First I get a pdb of the solute I am going to use. If necessary I add missing hydrogen atoms using
the command
babel -ipdb indole.pdb -opdb indoleh.pdb -h
this require you to have the program babel installed on your computer.
You can also use the Prodrg2 server (http://davapc1.bioch.dundee.ac.uk/programs/prodrg/) to protonate your solute.
- I make sure all the atoms in my solute are uniquely labeled. I add a line in the pdb file
which says HEADER <name_of_solute> and I use then the command :
antechamber -i indoleh.pdb -fi pdb -o indole.prepi -fo prepi -c bcc -pf yes
This will generate a amber prepi file with AM1/BCC atomic partial charges for the solute.
this require you to have antechamber installed somewhere. Antechamber comes with the program amber
or can be dowloaded at this URL http://amber.scripps.edu/antechamber/antechamber.html
- Then I check if the GAFF force field has all the parameters I need or if some additional ones must be derived
parmchk -i indole.prepi -f prepi -o indole.frcmod
parmchk comes with amber/antechamber.
- I then write a zmatrix for my solute. Find documentation about zmatrices before making your own.
The contents of my zmatrix file indole.zmat looks like :
C7 DU DU DU
C6 C7 DU DU
C5 C6 C7 DU
H6 C6 C5 C7
C4 C5 C6 C7
H5 C5 C6 C4
C9 C4 C5 C6
H4 C4 C5 C9
C8 C9 C4 C5
H7 C7 C8 C6
C3 C9 C4 C8
C2 C3 C9 C4
H3 C3 C2 C9
N1 C2 C3 C9
H2 C2 N1 C3
H1 N1 C2 C8
- At this stage I wrote a python script that read up all this information and build a template for the solute.
The script is called prepi2template.py (see links at the end of this section)
../../../tools/prepi2template.py -p indole.prepi -f indole.frcmod -z indole.zmat > indole.template
will give you a template file. There are a few things which needs to be done in this template file.
The line solute molecule must be replaced by solute <name_of_solute>
The DU IND entries must be replaced by DM3 DUM, DM2 DUM, DM1 DUM respectively.
The atomic partial charges on symetric atomic sites must be averaged (e.g, all the methyl hydrogens of a methyl group should have the same atomic partial charge). I usually use a visualisation program like rasmol or VMD to find
symmetric sites.
The degrees of freedom that will be sampled during a MC simulation must be added. For example
angle H6 IND C6 IND C5 IND flex 0.50
- Once I am happy with the template, I run some short MC simulations of the solute in vacuum to check if
if there are no problematic warnings.
-
- First ./protoms2 input.cmd and I check the warnings file. Some warnings are almost always generated by the code.
-
- If it is fine ./run.py 1.00 will run a short MC sim and generate some snapshots for visualisation. Check all the warning files as well.
- REFERENCES:
tutorials/protoms-setup-ligand/
contains input file, setup files for a solute at various stage of the
operations mentionned above. Copy everything in this directory to your home disk
and try to repeat this tutorial. Remember you need to have babel and amber or antechamber installed.
Setup a solute for a free energy calculation with the single topology method
- I setup the two solutes I am interested in, for example ethane and methanol following the procedure described above
- Here is a template file to simulate ethane
mode clj
# Energy is sum of coulomb and Lennard Jones terms
#parameter atm proton-num charge(|e|) sigma(A) epsilon(kcal mol-1)
par 3001 HC 01 +0.031 2.6495 0.0157
par 3002 CT 06 -0.093 3.3997 0.1094
mode template
solute ethane
info translate 0.0 rotate 5.0
#
# DM2--DM1
# |
# H2 DM3 H6
# | / |
# H3--C1-----C5-H7
# | |
# H4 H8
#
atom C01 ETH 3002 3002 DM3 DUM DM2 DUM DM1 DUM
atom C05 ETH 3002 3002 C01 ETH DM3 DUM DM2 DUM
atom H06 ETH 3001 3001 C05 ETH C01 ETH DM3 DUM
atom H07 ETH 3001 3001 C05 ETH C01 ETH H06 ETH
atom H08 ETH 3001 3001 C05 ETH C01 ETH H06 ETH
atom H02 ETH 3001 3001 C01 ETH C05 ETH H06 ETH
atom H03 ETH 3001 3001 C01 ETH C05 ETH H02 ETH
atom H04 ETH 3001 3001 C01 ETH C05 ETH H02 ETH
bond C05 ETH C01 ETH
bond H06 ETH C05 ETH
bond H07 ETH C05 ETH
bond H08 ETH C05 ETH
bond H02 ETH C01 ETH
bond H03 ETH C01 ETH
bond H04 ETH C01 ETH
angle H06 ETH C05 ETH C01 ETH flex 0.5
angle H07 ETH C05 ETH C01 ETH flex 0.5
angle H08 ETH C05 ETH C01 ETH flex 0.5
angle H02 ETH C01 ETH C05 ETH flex 0.5
angle H03 ETH C01 ETH C05 ETH flex 0.5
angle H04 ETH C01 ETH C05 ETH flex 0.5
dihedral H02 ETH C01 ETH C05 ETH H06 ETH flex 15.0
- And here is a template file to simulate methanol
mode clj
# Energy is sum of coulomb and Lennard Jones terms
#parameter atm proton-num charge(|e|) sigma(A) epsilon(kcal mol-1)
par 3010 CT 06 +0.117 3.3997 0.1094
par 3011 H1 01 +0.028 2.4714 0.0157
par 3012 OH 08 -0.598 3.0665 0.2104
par 3013 HO 01 +0.397 0.0000 0.0000
mode template
solute methanol
info translate 0.0 rotate 5.0
#
#
#
# DM1--DM2 H02 H06
# | | |
# DM3--C01--O05
# / \
# H03 H04
#
atom C01 MEO 3010 3010 DM3 DUM DM2 DUM DM1 DUM
atom H02 MEO 3011 3011 C01 MEO DM3 DUM DM2 DUM
atom O05 MEO 3012 3012 C01 MEO H02 MEO DM3 DUM
atom H03 MEO 3011 3011 C01 MEO O05 MEO H02 MEO
atom H04 MEO 3011 3011 C01 MEO O05 MEO H02 MEO
atom H06 MEO 3013 3013 O05 MEO C01 MEO H02 MEO
bond O05 MEO C01 MEO
bond H06 MEO O05 MEO
bond H02 MEO C01 MEO
bond H03 MEO C01 MEO
bond H04 MEO C01 MEO
angle H06 MEO O05 MEO C01 MEO flex 0.5
angle H02 MEO C01 MEO O05 MEO flex 0.5
angle H03 MEO C01 MEO O05 MEO flex 0.5
angle H04 MEO C01 MEO O05 MEO flex 0.5
dihedral H06 MEO O05 MEO C01 MEO H02 MEO flex 15.0
- I now create a template ethane2methanol.par which looks like this
mode clj
# Energy is sum of coulomb and Lennard Jones terms
#parameter atm proton-num charge(|e|) sigma(A) epsilon(kcal mol-1)
par 3001 HC 01 +0.031 2.6495 0.0157
par 3002 CT 06 -0.093 3.3997 0.1094
par 3010 CT 06 +0.117 3.3997 0.1094
par 3011 H1 01 +0.028 2.4714 0.0157
par 3012 OH 08 -0.598 3.0665 0.2104
par 3013 HO 01 +0.397 0.0000 0.0000
mode template
solute ethane2methanol
info translate 0.0 rotate 5.0
#
# DM2--DM1
# |
# H2 DM3 H6(H6)
# | / |
# H3--C1-----C5(O5)--H7(D7)
# | |
# H4 H8(D8)
#
atom C01 E2M 3002 3010 DM3 DUM DM2 DUM DM1 DUM
atom C05 E2M 3002 3012 C01 E2M DM3 DUM DM2 DUM
atom H06 E2M 3001 3013 C05 E2M C01 E2M DM3 DUM
atom H07 E2M 3001 100 C05 E2M C01 E2M H06 E2M
atom H08 E2M 3001 100 C05 E2M C01 E2M H06 E2M
atom H02 E2M 3001 3011 C01 E2M C05 E2M H06 E2M
atom H03 E2M 3001 3011 C01 E2M C05 E2M H02 E2M
atom H04 E2M 3001 3011 C01 E2M C05 E2M H02 E2M
bond C05 E2M C01 E2M param 0 0
bond H06 E2M C05 E2M param 0 0
bond H07 E2M C05 E2M param 0 0
bond H08 E2M C05 E2M param 0 0
angle H06 E2M C05 E2M C01 E2M flex 0.5
angle H07 E2M C05 E2M C01 E2M flex 0.5
angle H08 E2M C05 E2M C01 E2M flex 0.5
angle H02 E2M C01 E2M C05 E2M flex 0.5
angle H03 E2M C01 E2M C05 E2M flex 0.5
angle H04 E2M C01 E2M C05 E2M flex 0.5
dihedral H02 E2M C01 E2M C05 E2M H06 E2M flex 15.0
variable C05 E2M bond 1.526 1.410
variable H06 E2M bond 1.090 0.960
variable H07 E2M bond 1.090 0.200
variable H08 E2M bond 1.090 0.200
- The parameters in the atom line correspond to ethane (initial) or methanol (final). Two hydrogens atoms of ethane are converted into dummy atoms of parameter 100
- There is a variable section which change the value of the bond lengths with lambda. So initial C05--C01 bond length 1.526 (in ethane), final C05--C01 bond length 1.410 (in methanol). If you couple a degree of freedom to lambda, you cannot sample it during the simulation !
- Finally, I copy the pdb file ethane.pdb to ethane2methanol.pdb and rename the solute ethane2methanol and E2M to match the template.
REFERENCES:
tutorials/protoms-setup-single-toplogy/
contains pdb and templates for ethane, methanol and ethane2methanol
Setup a protein/protein scoop for a Monte Carlo simulation
- I obtain a pdb for the protein you want to work with. Strip funny things off it. For example co-factors, ligands etc...
- I add polar hydrogens to the protein using the program whatif and the
hbonds menu. Please see Whatif for more information. Other programs can do this for you (AMBER or reduce notably). - At this stage, I like to minimize the protein with a ligand bound to it. This is to get rid of steric clashes that translate into big energetic penalties when using a force field. Some persons would argue that such minimisation would bias the binding site to favour the ligand that was minimised in. So think about it.
- Get a frcmod, prepi and pdb file for your ligand. If you followed the instructions in the previous section you are all set.
- Get a pdb file of your protein. Insert into it the coordinates of the
ligand and any crystallographic waters. The protein must have the correct naming
convention for protein residues. I use a script to convert pdb between different
naming conventions ( and pray one day a single convention will be adopted). The
command below shows how to use the script (the files are the directory tleap).
../../../tools/convertproteinatoms.py -c ../../../tools/atomnamesmap.dat -f 184L-i4bh.pdb -i protoms -o amber > 184L-i4bh-amber.pdb
- Now I use tleap to setup a protein and ligand for sander (the MD program of amber). I create a file called leap.in which contains the following:
source leaprc.ff99
source leaprc.gaff
loadAmberParams i4bh.frcmod
loadAmberPrep i4bh.prepi
x = loadPdb 184L-i4bh-amber.pdb
saveAmberParm x prmtop prmcrd
- Then simply type tleap -f leap.in and check that tleap does not complain too much about your pdb. leap will add missing non polar hydrogen atoms so that you should end up with a proper all atom model of a protein-ligand complex. Quit leap. You will have two new files, prmtop and prmcrd.
- I setup sander to run a small minimisation that should remove bad steric contacts. I create a mdin file for sander. Refer to the amber documentation for the details of what these keywords means, they might not be what you want for your system !!
protein: minimisation
&cntrl
imin = 1,
maxcyc = 500,
ncyc = 100,
ntpr = 2,
ntb = 0,
igb = 1,
cut = 12
/
- Then I run sander with this command
sander -O -i mdin -o mdout -c prmcrd -p prmtop
- When the minimisation is finished, generate a pdb from the output of sander. I also like to generate a pdb from the starting prmcrd file so I can check how much the protein has changed after minimisation.
ambpdb -p prmtop < prmcrd > initial.pdb
ambpdb -p prmtop < restrt > 184L-i4bh-minimised.pdb
- Out of the minimised complex I make two pdbs, one with the coordinates of the ligand and one with the coordinates of the protein+crystallographic waters. I then rename the protein atoms to comply with the protoms standard
../../../tools/convertproteinatoms.py -c ../../../tools/atomnamesmap.dat
-f 184L-minimised.pdb -i amber -o protoms > 184L-mined-protoms.pdb
- For my work I often create a scoop of the protein (you may want to keep the whole protein intact) with another script. Check the help mode of the script or better, read the source code to understand what the options are.
../../../tools/generatescoop.py -l i4bh-minimised.pdb -p 184L-mined-protoms.pdb -o 15 -i 10 -f sidechain -g rigid > 184L-scoop1510.pdb
- ProtoMS will consider the first residue of your protein to be Nterm and
the last to be Cterm. Some manual editing for the Nterminal and Cterminal residues might be
needed for your work. If you do NOT want the Nterm and Cterm residues to be
protonated (because they are not protonated in the original pdb, they just
happen to be Nterm/Cterm because of the scoop), you must write in the HEADER
line of your protein "scoop". If you do want protonation of N and C term
residues, do not write "scoop".- Next, I need to convert the crystallographic waters into TIP4P molecules.
../../../tools/maket4p.py 184-scoop1510-neutral.pdb > xwat-t4p.pdb
- Then I remove the waters from the file 184L-scoop1510.pdb and
create the file 184L-scoop1510-mod.pdb. I now want to make sure the net
charge of my scoop is 0. The utility of this is debatable but nonetheless here
is the procedure. I load up the scoop model in protoms with the following
input.cmd file (located in the directory 'scoop-solvate').
# Parameter files
parfile1 ../../../parameter/amber99.ff
parfile2 ../../../parameter/solvents.ff
parfile3 ../../../parameter/amber99-residues.ff
parfile4 ../../../parameter/gborn.parameters
parfile5 ../../../parameter/surface.parameters
parfile6 ../../../parameter/gaff.ff
parfile7 i4bfree.template
# PDB Files
solute1 i4b.pdb
protein1 184L-scoop1510-mod.pdb
solvent1 waterT4P.pdb
#set the output files
streamwarning warning
streamheader off
streaminfo on
streamdetail off
streamaccept off
streamdebug on
streamresults on
cutoff 10.00
feather 0.5
#ranseed 12345678
chunk1 solvent cap 27.418 6.728 3.436 22.0
chunk2 pdb all solvent=all file=solvated.pdb- Where 27.418 6.728 3.436 at the line chunk1 are the coordinates of the closest ligand atom to the center of geometry of the ligand ( you can put the center of geometry if you have calculated it).
- This would give me an output file "solvated.pdb" which contains a ball of water surrouding my protein scoop, and is 22 angstrom radius wide, centered around the center of geometry of the ligand. However, I check what was the net charge of the scoop by looking at the output of the code. Here it was +5. So I decided to neutralise three lysine residues lying in the outer part of the scoop K124,K135,K147. Afterwards, I looked at the whole protein model and added two extra residues to the scoop Asp159 and Glu5. I did it by re-running the generatescoop.py script.
- You will note that you may have a net charge which is not quite zero. This is because of rouding errors in the template files for amber99.ff that affect Nterminal/Cterminal residues. If you are brave, you can fix the whole template files so that they work for all possible Nterminal/ Cterminal residues. If you are no as brave, you can manually edit a few lines of it so that it works for the particular protein you have. If you are lazy you can decide that a net charge of 0.002 is close to zero anyway.
- I then edit the file solvated.pdb that was output by protoms, I remove all the protein and solute atoms and I write at the first line of the pdb. I save the file as watercapraw.pdb
HEADER cap 27.418 6.728 3.436 22.0 1.5
- So that protoms knows the center, radius and force constant of the solvating sphere of water. At this stage, I have to insert the crystallographic waters that were present initialy. I use the script filterbulkwat.py. This script could really be improved, and you will have to edit a lot of the residue numbers in
the file watercapraw.pdb so that the script does not complain.
You can also use the script "renumber_t4p_residues.pl" to renumber the t4p residues in "watercapraw.pdb". "filterbulkwat.py" will not complain any longer.- Finally, to obtain a ball of water for the solvated ligand, just edit the input.cmd file and comment out the protein.
- REFERENCES:
tutorials/protoms-setup-protein/
contains input file, intermediate setup files and all the scripts that were described in this tutorial. To use many of the python scripts you will need a working knowledge of python and install the python module SimulationObjects.py on your computer. You will need to have a working version of whatif, amber and protoms.- In general the solvated protein-complex structure obtained at the end of the protein setup is not ready for simulation. This is because the protein was energy minimised and the ball of water around the protein has not yet relaxed. It is therefore advised to run a preequilibration of the system before performing free energy simulations
- The system we will prepare is the scoop of lyzozime bound to indole described in the previous section. You should be a little bit familiar with the input command file by now.
- The first step is to perform solvent only moves. The aim is to relax the solvent structure, alleviate bad overlaps etc..So go to the directory solvent and inspect the file input.cmd first to understand what we are going to load.
- This equilibration is done with the script equilibrate.py which can be split into two parts. The top part of the script sets up simulation parameters. In particular "nconf" is the number of MC moves in one block of simulation and "nblocks" the number of blocks of simulation. So with 10000 MC moves and 20 blocks we will perform 0.2 million MC moves on this system. This will be run at a given value of lambda (1,0) here but this parameter only makes sense if you perform a free energy calculation and does not have importance for this exercise. Obviously, variables which contains path must be changed if you run this script with the code/input files ...arranged differently.
- Towards the end of the script, there is a "Simulation specific parameters" section. This is where we specify chunks that will be performed by ProtoMS during one block of simulation. Some of the first chunks here involve fixing all the protein backbone and then fizing a number of protein residues. This sequence of residues was generated previously when the protein scoop was built. It is important to make sure that this sequence match the particular protein scoop you are working with. On line 93, the command "simulate %d solvent=1" % nconf will causes ProtoMS to perform nconf MC moves. Because no solute or protein move probabilities have been defined, only solvent molecules will be moved.
- The script is run simply by executing equilibrate.py and should run for a few minutes. As the simulation progress you will see "results-XXX" file appear in the output directory "out/lam-1.00"
- A simple mean to check whether or not the system is equilibrated is to plot the average total energy as a function of the number of blocks and see any systematic drift can be observed. This can be accomplished with the following commands
cd out
../../../../tools/processfdti.py -c- This will generate a standard output and file called win-0-totnrg.dat . Ignore the sfandard output for now and check the contents of the file using for instance xmgrace. You should see the total energy decreasing as the number of blocks increases.
- The next step is to equilibrate while allowing protein and solute moves. This is demonstrated in the protein directory. Inspect the example file. Here note that we are now loading the solvent file 184L-watercapbound-preeq.pdb. This file was obtained by editing the last pdb output from a simulation conducted with 20M solvent only moves.
- Inspect the file equilibrate.py . The only difference is that we now allow solute and protein moves when we make use of the simulate chunk. That is to say line 93 reads "simulate %d solute=1 protein=9 solvent=60 " % nconf, which means that the solute (indole) is moved 1 tout of 70 times on average, the protein 9 out of 70 and the solvent 60 out of 70 times on average.
- Run equilibrate.py. This will take a few minutes. The simulation is a bit longer than before because solute and protein moves are more expensive than solvent moves on average. When the script is completed, you can plot the total energy with the same procedure as before
cd out
../../../../tools/processfdti.py -c- This time the energy appears stable very rapidly. This behavior is by no means systematic and will depend on the quality of the initial scoop model as well as the fashion it was prepared. Depending on the kind of problem 10 - 30M moves of equilibration would be typically employed in a serious application.
- Note that at each stage you can compare what you have done to output files I saved in the directory saved, for both the solvent moves only and the protein+solute+solvent moves examples. As long as you have not modified the variable mastrand you should be getting identical results.
REFERENCES:
tutorials/protoms-equilibrate/Calculate a relative solvation free energy
- In this tutorial, the relative solvation free energy of ethane and methanol will be calculated using a single and a dual topology method. This tutorial will also introduce the use of a more complex python script to setup and launch a parrallel free energy calculation.
- We begin with the single topology method. In a previous tutorial, a template that convert ethane into methanol was devised. The input file that describes the simulation we will run with such system looks like this
# Parameter files
parfile1 ../../../parameteramber99.ff
parfile2 ../../../parameter/solvents.ff
parfile3 ../../../parameter/gaff.ff
parfile4 in/ethane2methanol.par
# PDB Files
solute1 in/ethane2methanol.pdb
solvent1 in/boxT4P.pdb
#set the output files
streamwarning warning
streamfatal fatal
streamheader off
streaminfo off
streamdetail off
streamaccept off
# simulation parameters
cutoff 10.00
feather 0.5
boundary solvent
nptsim on
pressure 1.0
prefsampling 1 200.0- You should be familiar with the first three sections now. Note that we load the AMBER force field, additional parameters for solvents, the gaff force field, and the parameter file that contains a template for the perturbation ethane to methanol. The starting structures are in solute1 and solvent1, where solvent1 is a box of 533 TIP4P water molecules.
- In the simulation parameters section, we set the cutoff/feather as usual. The next keywords are necessary for a periodic boundary NPT simulation. First boundary solvent tell ProtoMS that the box dimensions are located in the solvent file (see below). That we are running an NPT simulation (nptsim on), the pressure is 1.0 atmosphere. In addition we wii use preferential sampling so as to move more often solvent molecules closer to the solute. prefsampling 1 200.0 means the preferential sampling is performed around solute 1, and the last number controls the bias of selection of solvent molecules closer to the solute. Usually we almost always use this number. If you now inspect the beginning of the file boxT4P.pdb you will see this line
HEADER box -12.5 -12.5 -12.5 12.5 12.5 12.5
- This defines the box dimensions in angstrom. Do not forget this line in the pdb file of the solvent if you have specified boundary solvent in the input file !! Otherwise you will run a simulation in vacuum...
- You may have noticed that this command file does not contain any chunks section. Because the simulation will be rather long, it will be broken down into several execution of the program. As the chunks will change every time, we are going to use a script to write these chunks. This task, among other things, is done by the script RETImaster.py. If you open this file in a text editor, you should see at the top section something like this
#!/usr/bin/python
#
# This is a socket based implementation of RETI
# calculations for the program ProtoMS
# Work in pair with RETIslave.py
# Julien Michel, 2007
#########################################################
# SIMULATION PARAMETERS
# These global variables are all prefixed GLOB for clarity
# where the sim is being run
GLOBrundir = "/net/roundabout/pulsar/julien/protoms2.2/tutorials/protoms-ethanemethanol/single-wat/"
# where the binary is
GLOBprotoBin = "../../../protoms2.tutorial"
# If there is a initial restart
#GLOBinitrestart = ""
# list of lambas sims and where they should run
GLOBnodesFile = "nodes.dat"
# where is the cmd file that describe the constant parameters of the simulation
GLOBinputCmd = "input.cmd"
# where is the script the slaves should run...
GLOBslaveScript = "RETIslave.py"
# The number of blocks of equilibration before data collections starts
GLOBequilblocks = 10
# the number of iterations ( one iteration = one potential RETI move)
GLOBiterations = 20
# The number of blocks per iteration (Protoms will create it*blocks )
GLOBblocks = 2
# How often a PDB must be saved (every ? blocks )
GLOBpdbout = 10
# The temperature at which the simulation is run
GLOBtemperature = 25
# The master random number. If set to 0 a random number is generated,
# otherwise the one listed here is used.
GLOBmastrand = 1234567
# If running on iridis, then this parameter is defined. otherwise comment out !
# The jobs will be evenly spread on the number of nodes you requested
#GLOBiridis = [0.00, 0.06, 0.12, 0.19, 0.26, 0.33, 0.40, 0.47, 0.54, 0.61, 0.68, 0.75, 0.82, 0.88, 0.94, 1.00]
# This will cause each slave to copy all the input files to a /tmp dir
# and run on a local disk instead of a relying on an NFS share. At the
# end of the simulation, the file will be copied back to the NFS share
GLOBlocal = False
# The delta lambda value for the FDTI calculation ( normally 0.001 is fine )
GLOBdeltalam = 0.001
# Verbose mode, will generate a lof of output that can be used to check that
# the simulation run as it should
GLOBverbose = False
# the port RETIMaster will run on, change only if it causes problems
GLOBport = 43201
# The remote connection protocol
GLOBrshProtocol = "ssh -x"
#GLOBrshProtocol = "rsh "
(...)- To use RETImaster.py on your computer you will need to modify the absolute paths such that they point to where your input files are (GLOBrundir, GLODprotoBin and sys.path.append("...") ). You also need to make sure you can use python and network sockets on your
computer/cluster (usually the case on a linux system). Also, depending on your local settings you may not be able to let your script rsh between nodes in which case you may have to switch to ssh -x (GLOBrshprotocol).In addition the top of the file RETIslave.py must be modified so that the command sys.path("") points to where the code protoms2 is kept.
- The tandem of scripts RETImaster/RETIslave implements Replica exchange thermodynamic integration free energy calculations. RETImaster centralises most of the informations needed to run the simulation. For instance, how many blocks of equilibration (10 in this case). How many RETI moves are performed (20 iterations), how many times protoms is run during one iteration is performed (2), how often pdb are saved for analysys (every 20 blocks), how many values of the lambda parameter are used (defined in nodes.dat). The other keywords are more technical. For instance, if the system you are running on use firewall, make sure GLOBport is set to a socket value which is allowed by your firewall (see your system administrator if you are confused !).
- In this example, since the files nodes.dat list 11 different lambda values, we see that protoms2 will be executed 11*(10+20*2) times. What protoms2 will do every time will depend on, the lambda value it is running at, the iteration number, and the contents of the file RETIslave.py
- The file RETIslave.py implements subroutines that tell protoms what to do for the equilibration or data collection phase of the system you are interested in. In particular for our problem here we can find the line
sim.setChunk(cn,"equilibrate 50000 solute=19 solvent=980 volume=1")
- which shows that one block of equilibration will consist of 50000 MC moves where a solute move is performed 1.9% of the time, solvent moves 98% of the time and volume moves 0.1 % of the time (see the manual for details on the calculation of move probabilities). A similar line appears in the subroutine that implements the data collection phase.
- Assuming everything is setup correctly...You launch the script by typing
./RETImaster.py > RUN.dat &
- RETImaster reads the list of nodes and the lambda value that must be run on each node in the files nodes.dat. It then create a socket server on port GLOBport. For each lambda value, it connects to the associated nodes and launch there the script RETIslave.py. This script will initialise, run protoms2 a bit and then report to the socket server. Once all the jobs have completed one iteration, RETImaster will handle replica exchange moves and distribute restart files according to the outcome of a Monte Carlo test and authorise the slaves script to perform one more iteration. And so on..until the defined number of iterations is reached at which point the slaves will terminate, the socket server close and the master script exit.
- If you can, connect to one of the node you have setup for this job and verify that protoms2 is running as it should. You should see output files being written regularly in the 'out/lam-X.XX' directories, where X.XX denote the value of the lambda parameter.
- After about 1 hour (on my cluster) this simulation will terminate. The most interesting quantity to extract is the free energy change. We do this this way:
cd out ; ../../../../tools/processfdti.py > freenrg.dat
- The contents of the output file freenrg.dat is
#Gradients g+- g-+
# 0.000 -4.242 1.661 1.661
# 0.100 -1.519 1.354 1.354
# 0.200 -0.029 1.377 1.377
# 0.300 0.844 0.784 0.784
# 0.400 0.837 0.637 0.637
# 0.500 -0.415 0.756 0.756
# 0.600 -1.208 0.637 0.637
# 0.700 -5.126 0.637 0.637
# 0.800 -7.319 0.804 0.804
# 0.900 -9.163 0.971 0.971
# 1.000 -9.763 0.817 0.817
# Lambda PMF PMF+- PMF-+
0.00000 0.00000 0.00000 0.00000
0.10000 -0.28804 -0.13731 -0.43877
0.20000 -0.36547 -0.07820 -0.65274
0.30000 -0.32473 0.07058 -0.72005
0.40000 -0.24069 0.22568 -0.70706
0.50000 -0.21958 0.31646 -0.75562
0.60000 -0.30072 0.30498 -0.90643
0.70000 -0.61746 0.05194 -1.28686
0.80000 -1.23975 -0.49828 -1.98121
0.90000 -2.06390 -1.23369 -2.89410
1.00000 -3.01022 -2.09064 -3.92981
# Free Energy : -3.010 +- 0.920 kcal.mol-1- The first section list the average free energy gradients recorded
throughout the simulations. The second section lists the potential of mean force
(integral of the gradients along lambda). The final quantity you want is the
free energy listed on the last line. Note that the error estimate is a bit
high. This is because we have run a short simulation only. In production runs I
have typically done 5 M moves of equilibration and at least 20 M moves of
collection. Nonetheless it turns out that our estimate is not far from what
longer simulations yield. To complete our thermodynamic cycle, we need new to
perform the same perturbation in vacuum. -
The directory 'single-vac' contains everything ready to conduct the perturbation
in vacuum. Compare the different keywords in the file input.cmd. In addition, the simulation is much shorter to converge since we only have to deal with the solute isolated. The file RETIslave.py is edited so that each block only uses 1000 Monte Carlo moves. In addition, only one block of equilibration is performed. Youi can repeat the run as done previously and this time the simulation should take only a few minutes (mainly because the slave scripts are inefficiently waiting to do RETImoves). - The simulation output is analysed as before. This time I got a free energy change of +2.69 +- 0.03 kcal.mol-1. This time the result is much more precise. Now I can work out a relative hydration free energy of -3.01 - 2.69 = -5.7 kcal.mol-1. This can be compared with the experimental relative hydration free energy for this system, which stands at -6.9 kcal.mol-1.
- We will now use a completely different technique to calculate this free energy change. Go to the directory named dual and inspect the contsnts of the file input.cmd
# Parameter files
parfile1 ../../../parameter/amber99.ff
parfile2 ../../../parameter/solvents.ff
parfile3 ../../../parameter/gaff.ff
parfile4 ./in/ethane.par
parfile5 ./in/methanol.par
# PDB Files
solute1 ./in/ethane.pdb
solute2 ./in/methanol.pdb
solvent1 ./in/boxT4P.pdb
#set the output files
streamwarning warning
streamfatal fatal
streamheader off
streaminfo off
streamdetail off
streamaccept off
cutoff 10.00
feather 0.5
dualtopology1 1 2 synctrans syncrot
softcore1 solute 1
softcore2 solute 2
softcoreparams coul 0 delta 1.00
boundary solvent
nptsim on
pressure 1.0
prefsampling 1 200.0- The obvious difference is that we now load two solutes, and each one of them comes with his own parameter file. You can inspect the files ethane.par or methanol.par. You will see that they provide all the information to simulate these molecules independently, but nothing about how one will be morphed into the other. The fact that we are actually running a free energy simulation and not a simple MC simulation will be controlled by the following keyword :
dualtopology1 1 2 synctrans syncrot
- Which states that we have one pair of dual topology solutes (solute 1 and solute2), and that the translations and rotations of these solutes will be coupled together. If you check the templates of ethane.par and methanol.par you will see that only rigid body rotations are attempted in this simulation. In fact for a relative solvation free energy, it is not necessary to sample the translational and rotational degrees of freedom of the solutes. This is because the solvent is an isotropic medium. However, for the calculation of relative binding free energy these degrees of freedom are important.
- The next three new keywords let us use softened non bonded energy function to describe the intermolecular interactions of the two solutes. This helps converge free energies faster in a dual topology simulation. The actual parameters (coul and delta) have been optimised for this system, but they vary somewhat from one system to another.
- You can inspect the files RETImaster.py and RETIslave.py and you will see that they do not differ from the previous single topology example. All the information you need to run a single or dual topology calculation are contained in the pdbs, template and input.cmd files.
- The simulation will take about 1 hour to complete on 11 processors. Once it is completed you can run the script processfdti.py to get the free energy change. This time I got -5.5 +-1.0 kcal.mol-1. Because of the way the dual topology method works, there is no need to run a simulation in vacuum and thus this number is the relative solvation free energy. It is relatively close to our previous single topology result (-5.7 kcal.mol-1). They should be in principle identical to within statistical error. Long simulations have established that the free energy change computed by both techniques is closer to about -6.0 kcal.mol-1. As a final remark, the error estimate produced by the script processfdti.py should not necessarily be taken too seriously. It seems that the only reliable way to estimate the true statistical error is to repeat the same simulation a number of times, ensuring that sufficient equilibration time was performed.
- By the way, if you are surprised you got identical results to mine when trying this
tutorial, try to set the variable GLOBmastrand to 0 or another number in the
script RETImaster.py and then repeat the simulations...
REFERENCES:
tutorials/protoms-ethanemethanol/Calculate a relative binding free energy
- In this example, we will calculate the relative binding free energy of two congeneric inhibitors of the protein cyclo-oxygenase2. Thermodynamics states that the relative binding free energy between ligand A and B will be the difference between the free energy change when ligand A is converted into B in the protein binding site, and the free energy change when ligand A is converted into ligand B in solution.
- The two ligands of interest differ only in the replacememt of a hydroxyl group (ligand 8) by a methyl group (ligand 1).
We will use the single topology technique for this perturbation.
- We will start by running the simulation in solution. Go to the directory single-wat and visualise the pdbs lig8.pdb and lig1.pdb in the in directory. Next look at the pdb lig8t1.pdb. You can see how this pdb has been constructed from ligand 1 and some atoms have been renamed to reflect the fact that they would be dummy atoms or oxygen atoms when describing ligand 8. Next look at the template for the perturbation of ligand 8 into ligand 1 which is described in lig8t1free.template. The usual information (force field parameters, zmatrix connectivity, variable degrees of freedom ...) is present. Of particular interest are the following sections
(...)
bond C14 8T1 C13 8T1
bond O34 8T1 C14 8T1 param 0 0
bond H38 8T1 O34 8T1 param 0 0
bond D39 8T1 O34 8T1 param 0 0
bond D40 8T1 O34 8T1 param 0 0
bond C15 8T1 C14 8T1
bond H15 8T1 C15 8T1
(...)- The force field parameters of the bonds that connect atoms which change force field terms and/or geometry (see below) have been set to zero. This is because with this template the bond lengths are not sampled during the MC simulation. Therefore, any change in energy for these bonds would affect the free energy by a constant term which would cancel out in a thermodynamic cycle. I prefer therefore to turn these terms off. If you do sample these bonds you should not set their force field terms to zero.
(...)
dihedral C12 8T1 C11 8T1 C2 8T1 N1 8T1 flex 2.0
dihedral H38 8T1 O34 8T1 C14 8T1 C13 8T1 flex 5.0
variable O34 8T1 bond 1.40 1.52
variable H38 8T1 bond 0.95 1.09
variable D39 8T1 bond 0.20 1.09
variable D40 8T1 bond 0.20 1.09- The variable section at the end of the template allows you to change the geometry of the ligand with the coupling parameter lambda. Here for instance, at lambda 0.00 , atom D39 has a bond length of 0.20 angstrom (appropriate for ligand 8 as this is a dummy atom for this ligand and with such a small bond length it will be "hidden" inside the van der waals radius of the oxygen atom O34), and at lambda 1.00 atom D39 has a bond length of 1.09 angstrom which is fine for a hydrogen attached to a carbon atom.
- Let's turn to the file input.cmd
# Parameter files
parfile1 ../../../parameter/amber99.ff
parfile2 ../../../parameter/solvents.ff
parfile3 ../../../parameter/amber99-residues.ff
parfile4 ../../../parameter/gaff.ff
parfile5 in/lig8t1free.template
# PDB Files
solute1 in/lig8t1.pdb
solvent1 in/COX2-watercapfree.pdb
#set the output files
streamwarning warning
streamfatal fatal
streamheader off
streaminfo off
streamdetail off
streamaccept off
cutoff 10.00
feather 0.5
prefsampling 1 200.0- Most of the keywords should make sense to you if you have been through the previous tutorials. It is not obvious however here that we are running a non periodic boundary simulation. If you look at the top of the file in/COX2-watercapfree.pdb you will see the line
HEADER cap 24.32 22.28 15.37 22.00 1.50
- Which states we are running a watercap simulation. This means there are no periodic boundary conditions. A restraining harmonic potential of 1.5 kcal.mol-1.angstrom-2 is applied to any water molecule which drift more than 22 angstrom away from the point in space defined by the first three coordinates. These coordinates correspond roughly to the center of geometry of the ligand. The boundary could have been specified with the boundary keyword, but because it is a bite complex, it is best left in the HEADER file of the solvent pdb. To make it obvious you could have, alternatively added the keyword boundary solvent to the input file.
- Inspect the file RETIslave.py. Note that the move probabilities differ from the previous example with ethane and methanol. For instance it is not appropriate to attempt volume moves when using a watercap. Now inspect the file RETImaster.py. The contents should also be familiar to you. Note that the command GLOBinitrestart has been commented out. We will come back to this.
- In the 'single-wat' directory there is a 'saveres' directory. It contains an equilibrated restart for each value of the coupling parameter lambda. Copy the contents of this directory to the 'out' directory.
cp -a saveres/* out/
- Edit the file nodes.dat to list the hostname of the nodes you are going to run your simulation on (using the same lambda values as in the listed in this file). We are now ready to launch the simulation ! (If this doesn't work. remember to check if a firewall does not block the port number you requested )
./RETImaster.py > RUN.dat &
- Now you can sit back and relax because this is going to take a little while (less than one to a few hours depending on how fast your nodes are). Because there already was a file restarteq in each lam-X.XX directory, the script does not carry out an equilibration phase and starts calculating free energies immediately.
- Once this is over, you can extract the free energy change.
../../../../tools/processfdti.py > freenrg.dat
- I got a free energy change of +17.4 kcal.mol-1. This should be compared to the averaged result over multiple long simulations that has been reported in the litterature : 18.1 +- 0.1 kcal.mol-1.
- Now we are going to run the simulation a bit differently. Delete (or move) the contents of the out directory and the file retiplot.dat
mkdir testing1 ; mv out/* retiplot.dat testing1
- Then edit the file RETImaster.py and uncomment the line #GLOBinitrestart = "./restart-lig8t1-free-watercap" and launch the simulation one more time. Again it will take a little while so go and read McQuarry or some other interesting book.
- When the simulation completes, analyse the results. I personnally got +15.2 kcal.mol-1 which is quite far off the previous result ! Which one is correct ? If you look at the contents of the directory 'run-vacuum' you will see we got a free energy change of 13.5 kcal.mol-1. This is enough information to calculate a relative hydration free energy of +3.9 kcal.mol-1 with the first result and +1.7 kcal.mol-1 with the second result. Quite a difference. Any idea why the second results seem far off ?
- The second simulation does not give converged results because it was started from a restart file equilibrated at a single value of lambda set to 1.0. Because no further equilibration is carried out, the solvent does not have time to relax around the different ligands run at the other values of the coupling parameter. This introduces a bias into the results in favour of ligand 1. This example illustrates rather well what a lack of convergence can cause to your compute free energies. When you attempt to produce high quality results, you should repeat each simulation a few times and from different starting configurations, to find out if you get similar answers. Sadly it often takes more time than desired to get reasonably well reproducible results.
- So far we have established the free energy change for the perturbation of ligand 8 into ligand 1 in pure water only. To complete out thermodynamic cycle we now need to carry out this perturbation in the protein binding site. The calculation is setup in the directory 'single-prot'. Let's look at input.cmd first
# Parameter files
parfile1 ../../protoms2/parameter/amber99.ff
parfile2 ../../protoms2/parameter/solvents.ff
parfile3 ../../protoms2/parameter/amber99-residues.ff
parfile4 ../../protoms2//parameter/gaff.ff
parfile5 in/lig8t1bound.template
# PDB Files
solute1 in/lig8t1.pdb
solvent1 in/COX2-watercapbound.pdb
protein1 in/COX2-scoop1510-neutral.pdb
#set the output files
streamwarning warning
streamfatal fatal
streamheader off
streaminfo off
streamdetail off
streamaccept off
cutoff 10.00
feather 0.5
prefsampling 1 200.0- The only differences are that we now load a template file for the ligands bound to the protein. This file differ from the free template in only one line, where rigid body rotations and translations are no longer set to zero. In the isotropic medium of pure water an isolated ligand does not have a rotational and translational degrees of freedom, but these become relevant once bound to the protein. Also, note that we load a different solvent cap that is ready for the protein.
- RETImaster.py is for the most part identical to the previous simulation. However if you check RETIslave.py you will see that we are now doing a few more chunks during equilibration and collection. In particular around line 80 of the script
# The chunks
cn += 1
if x == start:
sim.setChunk(cn,"restart read %s " % self.params['restartstart'])
else:
sim.setChunk(cn,"restart read %s " % self.params['restart'] )
command = "fixbackbone 1 all"
cn += 1
sim.setChunk(cn,command)
command = "fixresidues 1 1 4 8 11-17 27-32 36 38 40-42 44-45 48-49 52-57 61 70 75-83 92-97 101-120 123 125 146-147 150-155"
cn += 1
sim.setChunk(cn,command)
cn += 1
sim.setChunk(cn,"simulate 50000 solute=1 protein=9 solvent=60")
cn += 1
sim.setChunk(cn,"results write %s/results-%s" % \
(self.params['out'],ps ) )- You can see we now use commands to fix the protein backbone and then additionally fix a bunch of residue side chains. This allows us to concentrate our sampling on the parts of the binding site that are in close contact with the ligand. Also note that the chunk simulate now includes protein moves.
- To run the simulation, edit the file nodes.dat and add 12 hosts nodes. Then copy the contents of 'saveres' to the 'out' directory and launch RETImaster.py. The simulation will take approximately twice as long as in the previous example.
- Once the simulation is completed, calculate the free energy change using the script processfdti.py . I got a free energy change of 15.0 kcal.mol-1. Multiple long runs reported in the litterature have established a free energy change of 15.1 +- 0.1 kcal.mol-1. So with the short runs I performed here I get a relative binding free energy of -2.4 kcal.mol-1, to compare to -3.0 +- 0.2 kcal.mol-1 for the (hopefully) converged result. The relative binding affinity measured by experiments should be greater than -4.6 kcal.mol-1 and the present simulations thus underestimate the relative affinities of ligand 8 and ligand 1. Even though the runs were short here, it is important to note that they were started from well equilibrated restart files. If you plug the free energy difference you obtained with the insufficiently equilibrated restart into the thermodynamyc cycle, you would have gotten a relative binding free energy of -0.2 kcalmol-1 and concluded that both inhibitors are equally potent. This should make it clear that you have to ensure the free energy changes are reasonably converged if you want your predictions to be reliable.
REFERENCES:
tutorials/protoms-cox2/Perform a Monte Carlo minimisation in implicit solvent
- Not everybody realise that the Monte Carlo method can be used to minimise a system. In metropolis MC, a move that converts state i into state j is accepted with probability exp(-beta (Ej-Ei) ) where beta = ( 1 / kT ) . The lower T is the higher beta is and the harder it is to accept a state of energy Ej greater than Ei. In the limit of T sufficiently close to 0K, only moves that get to a state j of lower energy Ej will be accepted. In practice this is a rather inefficient method to minimise systems and will also get trapped in the nearest local mimina fairly easily. However if that is what you want to do...
- In this example we use a model of the protein COX2 complexed to an inhibitor. Let's first look at the input files and simulation parameters we want to use
# Parameter files
parfile1 ./protoms2/parameter/amber99.ff
parfile2 ./protoms2/parameter/solvents.ff
parfile3 ./protoms2/parameter/amber99-residues.ff
parfile4 ./protoms2/parameter/gborn.parameters
parfile5 ./protoms2/parameter/surface.parameters
parfile6 ./protoms2/parameter/gaff.ff
parfile7 in/lig1bound.template
# PDB Files
solute1 in/lig1.pdb
protein1 in/COX2-scoop1510-neutral.pdb
#set the output files
streamwarning warning
#streamfatal fatal
streamheader off
streaminfo on
#streamenergy on
streamdetail off
streamaccept off
streamdebug on
streamresults results
#streammove stdout
#streamaccept stdout
cutoff 10.00
feather 0.5
born cut 20 threshold 0.005
surface probe 1.4 quality 3
#ranseed 12345678
temperature -273.15- So the par files tell us we want to use the amber force field for the
protein and the gaff force field for the ligand and a GBSA force field for the
solvent, hence solvents.ff is not strictly necessary because we are actually
doing a GBSA simulation (see below). The first parameter for the born keywords
is the cutoff for the Born radius calculations and threshold controls the number
of pair pair interactions that are skipped for the energy update after a MC
move. The values here should be considered as default values and you should use
something else only if you know what you are doing. Surface probe is the probe
size for the non polar energy calculation and should be set to 1.4 for
water. quality controls the accuracy of the surface area calculation. 3 is fine
in most cases. Note that the temperature is set to -273.15 celsius as explained
above. We also specify that the results of the simulation will be saved in a file called results.
- Now looking at the chunks
chunk1 pdb all file=initial.pdb
chunk2 simulate 100 solute=1 printmove=1
chunk3 averages reset
chunk4 fakesim
chunk5 results write
chunk6 pdb all file=final.pdb- The first and last operation is to save a pdb. As a result if you load them in vmd you will easily see how much the ligand has moved during the minimisation. We are also doing 100 MC trials, but only the solute is moved. Because we are not interested in running averages, but only the energy of the last configuration, it is necessary to reset the averages and run a fakesimulation where we pretend we have done a move. The energies are then printed in the results section.
REFERENCES:
tutorials/protoms-mcmin/- ProtoMS allows you to define extra restraining potentials. This can be
useful in a number of circumstances. Here I simply show how to add a distance harmonic
potential between a ligand and protein atom. This could be useful for instance
if you then decouple the ligand from the binding site and want to keep it
anchored there.
- Looking at the input.cmd file...
# Parameter files
parfile1 ../../parameter/amber99.ff
parfile2 ../../parameter/solvents.ff
parfile3 ../../parameter/amber99-residues.ff
parfile4 ../../parameter/gaff.ff
parfile5 in/lig8bound.template
# PDB Files
solute1 in/lig8.pdb
solvent1 in/COX2-watercapbound.pdb
protein1 in/COX2-scoop1510-neutral.pdb
#set the output files
streamwarning warning
streamfatal fatal
streamheader off
streamdebug on
streaminfo on
streamdetail off
streamaccept off
cutoff 10.00
feather 0.5
prefsampling 1 200.0
ranseed 1234550chunk1 id add 1 SOLUTE 1 N2 LI8
chunk2 id add 2 PROTEIN 1 O 318
chunk3 restraint add 1-2 bond harmonic 5.0 3.33
chunk4 pdb all solvent=all file=traj00.pdb
chunk5 simulate 10 solute=1 printmove=1
chunk6 pdb all solvent=all file=traj01.pdb
chunk7 results write- You see then that to define a restraint you must first add id tags to
specific solute, protein or solvent atoms (chunk1 and chunk2). In this example,
id1 will point to the atom name N2, in residue LI8 of solute 1. id2 points to
the atom named O, residue 318, in protein 1. Then a restraint is specified
(chunk3), that will link id1 and id2 with a bond of force constant 5
kcal.mol-1.A-2 and equilibrium distance of 3.33 angstroms.
- Run the script and 10 moves will be executed, the RESULTS section of
the output should show a BNDHAR energy term in the extra energies
section. That's it !