Phobrain

DNA Simulation

When you are viewing the Phobrain slideshow, this DNA simulation is running and influencing the picture choice. Here is some information about the simulation.

Simulation and reality. The DNA molecule used is a very short snippet, only a single strand, something you might get if you chemically digested DNA from a living organism or a virus. Specifically it is a strand of three bases: Cytosine, Adenine, and Thymine, or CAT as commonly abbreviated. It was chosen for its flexibility and the amount of computer power available to simulate it. A water bath is simulated by the Generalized Born continuum approach, also for speed.

A Molecular Dynamics program is run to simulate the movement of this molecule at a given temperature, and the web page shows a snapshot of this simulation once per second. The program calculates momentum and the sum of forces on each atom, calculates velocity, and moves the atom an infinitesimal step, and repeats — this could go on forever, at least until the computer variable representing time reaches its maximum value (then it can resume with time reset to 0). Meanwhile, here is a history of the simulation.

You can zoom and rotate (mouse buttons) to see different views. The speed the DNA atoms are moving is controlled to keep the temperature around 2/3 of the way between absolute zero and body temperature (300 Kelvin), i.e. 200K or -100 degrees F, since a warmer molecule resulted in too much jumpiness when viewed at 1 frame per second, and the forces introduced by user clicks in the slideshow page add heat temporarily.

From here on, I focus on setting up this simulation for yourself and viewing it on your own machine, then I give some tips on setting up a webserver to serve coordinates. Getting it on your own machine is rewarding because you can see as much detail as you want of the trajectory by saving more frames and viewing them more quickly. It could take you hours to a day or so to get to that point, using freely available software and your machine's package management system to solve installation problems and the like.

Fortunately no cats were digested to produce this fragment — it was generated by an online DNA structure generator from Rutgers: we pick a DNA conformation (BDNA fiber, but it doesn't matter for this short snippet), specify the sequence CAT, and get a 'PDB file', which contains coordinates for the atoms in the CAT residues, plus the complementary residues ATG (G is Guanine), which form the other strand of normal, two-stranded DNA. Not only are we stripping our CAT of its ATG partner, we are running our simulation in an approximation of real simulated water — so we will save calculation time, not see water molecules swirling around our molecule, and not need to strip them out to get a better look at things. Some might call that keeping it real, but not scientists who really want to study systems like these, for the most part. I like to call it hillbilly science; it's up to you to learn more from the links in this how-to.

Because of the cost for a web server that will simulate all 6 bases in real time, and moreover because there is something wild about a single DNA strand seemingly in a vacuum, we heartlessly edit out ATG from the PDB file. Some may shrink from this, and we respect their choice. (Within a week, if you get the bug, you might be running 3-day jobs with 100 base pairs, real simulated water, and graphics card.) No matter what, we delete the first 3 atoms from each strand because they don't work for the simulation model we are using. They look something like this:

ATOM    246  P     C A   1      -5.706  -7.249 -38.973
ATOM    247  O1P   C A   1      -6.417  -8.303 -39.730
ATOM    248  O2P   C A   1      -6.461  -6.623 -37.865
And if we didn't delete ATG,
ATOM    246  P     A B   4      -5.706  -7.249 -38.973
ATOM    247  O1P   A B   4      -6.417  -8.303 -39.730
ATOM    248  O2P   A B   4      -6.461  -6.623 -37.865
The Clue As To where one strand ends and another begins is the 5th field, for chain A or B.

This part is a little spooky: we've got to make the 'T' residue names 'DT' without changing the position of the 'T' on the line:

ATOM     63  P     T     3       3.249  -5.487   2.154  1.00  0.00           P  
becomes
ATOM     63  P    DT     3       3.249  -5.487   2.154  1.00  0.00           P  
If you know the editing program 'vi', it is:
:%s/ T /DT /g
Worst case you have to do it by hand to 64 atoms (32 if you deleted ATG).

Now we get the AMBERTools package and install it. This could be a long download; time to spend in gratitude for an incredible amount of work that has gone into it over the years. Follow the instructions to get it built. I have done it on Mac and Linux. You will need to know your operating system's package manager (e.g. apt-get, yum, brew, port). Building also takes a while.

Now you have AMBERTools installed and you know what AMBERHOME is. There is just one more thing to get before we can crank through the setup and start the simulation. It's not strictly necessary, because the goal is not to publish our results, but who knows? Let's get the ParmBSC1 Forcefield for DNA. Download it, don't be afraid, get the Library Parameters and the Frcmod Parameters, and put them with the pdb in a new directory. Now we have:

  • cat.pdb
  • parmBSC1.lib
  • parmBSC1.frcmod
Create a leaprc file to establish the setup:
# setup
source leaprc.ff14SB
addAtomTypes {
  { "C1" "C" "sp2" }
  { "C2" "C" "sp2" }
  { "CI" "C" "sp3" }
  { "CE" "C" "sp3" }
}
loadoff parmBSC1.lib
loadamberparams parmBSC1.frcmod
And a leap.build file for our action steps:
# run
mymol = loadpdb cat.pdb
saveamberparm mymol cat.top cat.crd
Now we have
  • cat.pdb
  • parmBSC1.lib
  • parmBSC1.frcmod
  • leaprc
  • leap.build
and we are ready to run a program, Leap.

There are two ways to handle the next step: without and with graphics. Both are good to know. Without:

% tleap -s -f leaprc -f leap.build
 .. lots of output ..
> quit
%
Or you can use the graphical editor, in which case you can
% xleap -s -f leaprc -f leap.build
 ...
> edit mymol
to see the molecule. (Middle button to rotate, middle and right together to zoom.) I will leave you to read and explore. Warning, use Unit--Close to exit the molecule editor, or xleap will exit altogether.

Basic setup is complete! Now we will make our computers warm up with some real number crunching. (Don't worry, your computer cannot be too fast for this step, but if it starts shaking it's best to simulate fear by running from the room.) If you want to go for the paid version of AMBER, you can run on a GPU and accomplish feats it took a supercomputer to do until GPUs came along.

Having come this far, let's celebrate by blowing through the remaining steps as tersely as possible. But first let's list the files we have achieved so far that are needed for the next step:

  • cat.top
  • cat.crd
Now is not yet the time to call our grandmothers with excitement, but we are getting close!

Create min.in:

min
 &cntrl
  IMIN=1, NCYC=250, MAXCYC=500,
  NTB=0, cut=20
/
Run the energy minimization step. Good Form though not so important for a small molecule without flaws:
% sander -O -i min.in -o cat_min.out -p cat.top -c cat.crd -r cat_min.rst
This gives us cat_min.rst for the next step (and cat_min.out for error and other interesting info). Create md.in:
md init
 &cntrl
  IMIN=0, 
  NTB=0, cut=99,
  NTT=1, igb=1,
  tempi=200, temp0=200,
  dt=0.002, ntc=2, ntf=2,
  ntxo=1, ntwr=1000, ntwx=1000,
  nstlim=1000000
/
Run dynamics:
% sander -O -i md.in -c cat_min.rst \
         -p cat.top -o cat_md.out \
         -r cat_md.rst -x cat_md.crd &
% cat mdinfo

If you keep catting mdinfo, you should see the progress of the simulation. At any point we can also see the molecule itself by

% ambpdb -p cat.top -c cat_md.rst > cat_snap.pdb
% xleap
> x = loadpdb cat_snap.pdb
> edit x

Cool, but not dynamic. To see an animation, get VMD, first load a cat_snap.pdb to define the molecule, then load cat_md.crd as an 'AMBER Coordinates' file, and call granny.

If you want to put it online, here is code for a web server Servlet for serving up the files to pages running NGL, a WebGL molecule viewer (NGL Viewer: a web application for molecular visualization). (Note that the Servlet is hard-coded for the number of atoms in my molecule.)

See the Amber site for docs and tutorials, notably for a more science-oriented DNA setup with a longer, two-strand molecule, TUTORIAL B1 Simulating a DNA polyA-polyT Decamer. And see Wikipedia for general info on Molecular Dynamics.

There is more background to Phobrain in the Gallery.


  
 


You're not doing it right!

What use to cry for Capricorn? it sails
Across the heart's red atlas: it is found
Only within the ribs, where all the tails
The tempest has are whisking it around.


— Mervyn Peake, Titus Alone

© 2015,2016,2017 Photoriot.