Molecular Structures

NMRView applications are all about studying molecules, so it's not surprising that you can use it to keep track of information about your molecule. But how do you get all those bonds and atoms into the program, where you can do something useful with them? Most NMRView users are studying molecules made up of amino acids and/or nucleic acids. Since these molecules are polymers built from a small number of residue types, all you need to do is give NMRView a list of the names of the monomers. The simplest format for this information is a text file containing the names of the monomers, one per line. Something like:

met
ala
asn
glu
lys

The entries in the file are the three-letter names of the amino acids and the four-letter names of the nucleotides. For nucleotides, the first letter is "d" or "r", depending on whether the monomer belongs to DNA or RNA, respectively.

To read the file, use the Molecules > Read Topology > Sequence File menu command. If you prefer typing to mousing, type the command "nv_sread seq fileName" and you will get the same result. In the current version of NMRView, the "fileName" argument needs to be the complete path to the sequence file. Both methods read the molecule in and set up some information within NMRView so that, when you save a "STAR" file, the molecular structure is saved as well. That way, you only need to explicitly read the molecule in once.
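Because "nv_sread" wants a complete path, it can help to build that path in the script itself. The sketch below is a hypothetical helper (writeSeqFile is not an NMRView command, and the file name in the usage comment is only illustrative): it writes one residue name per line and returns the absolute path produced by Tcl's "file normalize".

```tcl
# Hypothetical helper (not an NMRView command): writes one residue name per
# line and returns the complete path, which nv_sread currently requires.
proc writeSeqFile {fileName residues} {
    # file normalize turns a relative name into an absolute path
    set path [file normalize $fileName]
    set fh [open $path w]
    foreach res $residues {
        puts $fh $res
    }
    close $fh
    return $path
}

# Usage inside NMRView (file name is illustrative):
#   nv_sread seq [writeSeqFile myprotein.seq {met ala asn glu lys}]
```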

Lots of scientists are interested in studying the interactions of two molecules, so you may want to have more than one molecule loaded at once. There is no hard limit in NMRView on the number of molecules; your head will probably be swimming in molecules before NMRView complains. To keep track of all these molecules, each one needs a unique name.

By default, the molecule name used within NMRViewJ is derived from the name of the sequence file. What if you want your file named myfavoritemoleculein2006.seq but want to use a simpler name within the program, like Fred? Just put a name for the molecule in your sequence file. To do this, include a molecule line at the beginning of the sequence file. This line should have two fields: the first should read "-molecule" and the second should be the name of the molecule. Two additional special lines can also be added to further define the molecule. The "polymer" and "coordset" lines have the same format as the molecule line, that is, "-polymer" or "-coordset" followed by the name. NMRViewJ uses them to allow multiple polymer and coordset entries. Think of a polymer as a unique amino acid (or nucleic acid) sequence. A coordset corresponds to what X-ray crystallographers refer to as an asymmetric unit. A homodimer would have one polymer and two coordsets.
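To make the naming rules concrete, here is a minimal sketch of how a script might read such a file, for example one whose first line is "-molecule Fred" followed by residue names. It only mimics the rules described above and is not NMRViewJ's own parser. Two points are assumptions: that the default name is the file name minus its directory and extension, and that every "-keyword name" line other than "-molecule" can simply be kept in file order.

```tcl
# Sketch only, not NMRViewJ's parser. Returns a dict with the molecule name,
# the residue names, and other "-keyword name" lines in file order.
proc parseSeqFile {path} {
    # Assumption: the default name is the file name without directory or extension
    set name [file rootname [file tail $path]]
    set residues {}
    set directives {}
    set fh [open $path r]
    while {[gets $fh line] >= 0} {
        set fields [regexp -all -inline {\S+} $line]
        if {[llength $fields] == 0} continue
        set first [lindex $fields 0]
        if {[string index $first 0] eq "-"} {
            # keyword line such as "-molecule Fred": keyword, then a name
            set key [string range $first 1 end]
            set value [lindex $fields 1]
            if {$key eq "molecule"} {
                set name $value
            } else {
                # keep -polymer, -coordset, -sdfile, ... in file order
                lappend directives $key $value
            }
        } else {
            lappend residues $first
        }
    }
    close $fh
    return [dict create molecule $name residues $residues directives $directives]
}
```

With a file named myfavoritemoleculein2006.seq that has no "-molecule" line, the sketch reports the molecule as "myfavoritemoleculein2006"; adding "-molecule Fred" as the first line makes it report "Fred".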

Ligands can be specified in the ".seq" file with a line like "-sdfile fileName.sdf". The named file must be a ".mol" or ".sdf" file and must be in the same directory as the ".seq" file.
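The two rules for ligand files are easy to check before loading anything. The helper below is hypothetical (ligandFilePath is not part of NMRView). It rejects names that carry a directory or the wrong extension and looks for the file next to the ".seq" file. Treating the extension check as case-sensitive is an assumption.

```tcl
# Hypothetical helper: locate the file named on a "-sdfile" line, applying the
# two rules stated above (.mol/.sdf extension, same directory as the .seq file).
proc ligandFilePath {seqPath sdName} {
    if {[file tail $sdName] ne $sdName} {
        error "give only a file name, not a path: $sdName"
    }
    # Assumption: extension check is case-sensitive
    if {[file extension $sdName] ni {.mol .sdf}} {
        error "ligand file must be a .mol or .sdf file: $sdName"
    }
    set path [file join [file dirname [file normalize $seqPath]] $sdName]
    if {![file exists $path]} {
        error "ligand file not found next to the .seq file: $path"
    }
    return $path
}
```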

What if you already have a PDB file and don't want to be bothered writing a sequence file? Not to worry: you can read the PDB file directly with the Molecules > Read Topology (using library) > PDB File menu command. You're probably wondering what the parenthetical "using library" comment is all about. If you use this command, NMRView figures out the residue sequence from the PDB file and then loads the appropriate residue topologies from NMRView's own residue library. The resulting atoms and bonds should be the same as if you had read a sequence file with the same sequence as that in the PDB file. Because of this, the atom names may not be exactly the same as those in the PDB file. If instead you use the PDB File command that does not use the library, NMRView will use exactly the atoms that are in the PDB file. In this case NMRView figures out the bonding from inter-atomic distances, and it may not get the bonding exactly right.
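Why might distance-based bonding go wrong? A common generic approach, shown here as a sketch, bonds two atoms when their distance is below the sum of their covalent radii plus a tolerance. This is not NMRView's actual algorithm. The radii and the 0.4 Å tolerance are assumed values chosen for illustration.

```tcl
# Generic distance-cutoff bonding sketch, NOT NMRView's actual algorithm.
# atoms: list of {element x y z} (Angstroms); returns bonded index pairs.
proc guessBonds {atoms {tolerance 0.4}} {
    # Approximate covalent radii in Angstroms (assumed values)
    set radius {H 0.31 C 0.76 N 0.71 O 0.66 P 1.07 S 1.05}
    set bonds {}
    set n [llength $atoms]
    for {set i 0} {$i < $n} {incr i} {
        lassign [lindex $atoms $i] ei xi yi zi
        for {set j [expr {$i + 1}]} {$j < $n} {incr j} {
            lassign [lindex $atoms $j] ej xj yj zj
            set d [expr {sqrt(($xi-$xj)**2 + ($yi-$yj)**2 + ($zi-$zj)**2)}]
            set cutoff [expr {[dict get $radius $ei] + [dict get $radius $ej] + $tolerance}]
            if {$d < $cutoff} {
                lappend bonds [list $i $j]
            }
        }
    }
    return $bonds
}
```

For a C at the origin, an O 1.43 Å away and an H about 1.09 Å from the C, the sketch returns the pairs {0 1} and {0 2}. It would also join two hydrogens that sit only 1.0 Å apart in a poor model, which is the kind of mistake the text warns about.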

The Topology from Sequence commands just generate molecular topology information within NMRView. (What's topology? Just the names of the atoms, and which atoms are bonded to which other atoms.) There is no information about where in space (not to mention time) the atoms are. You can have your topology and coordinates too (if you know what they are). If you've already generated a topology and want to add coordinates, use the Molecules > Read Coordinates > PDB File menu command. The two PDB-based topology commands described above also read coordinates for every atom they can find in the PDB file.

When you use the above menu option to read coordinates, NMRViewJ reads all the "models" in the PDB file. Each model is stored in a structure accessible by its model number. For example, all the coordinates after "MODEL 4" will be in structure number "4" in NMRViewJ. All the models are used by the peak identification tools and when calculating atom RMSD values and molecular superpositions. When NMRViewJ first reads a sequence file, it automatically generates 3D coordinates for the atoms. These are for the molecule in an extended conformation and are stored in structure number "0". Structure number "0" is deselected if coordinates are explicitly read in. To specify which structures are currently active, use the "mol structures molName active ..." command.
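Since structure numbers follow the MODEL numbers, it can be handy to list those numbers before choosing which structures to activate. The sketch below reads a PDB file with plain Tcl and collects the serial number from each MODEL record. It does not touch NMRViewJ. The proc name is hypothetical, and a file with no MODEL records simply yields an empty list.

```tcl
# Sketch: list the MODEL serial numbers in a PDB file. Per the text above,
# coordinates after "MODEL n" end up in NMRViewJ structure n.
proc pdbModelNumbers {path} {
    set models {}
    set fh [open $path r]
    while {[gets $fh line] >= 0} {
        # MODEL records start in column 1; the serial number follows
        if {[regexp {^MODEL\s+(\d+)} $line -> num]} {
            # scan %d avoids treating numbers with leading zeros as octal
            lappend models [scan $num %d]
        }
    }
    close $fh
    return $models
}
```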

Within NMRViewJ, atoms can be specified using a nomenclature as follows:

coordSetName.entityName:residueName.atomName

The meaning of residueName and atomName should be obvious, but what are a coordSet and an entity? Imagine a dimer of two identical polypeptide chains. Each polypeptide chain, as represented by an amino-acid sequence, is an "entity" in the terminology of NMRViewJ. In this example there is only one entity. However, the entity is represented twice (once for each monomer) in the actual molecule. Each of these monomers is, in the terminology of NMRViewJ, a "coordSet" (so named because in the molecular structure each monomer is represented by a set of coordinates). A molecule that is a heterodimer would have two entities, one for each polymer. Similarly, a protein with a bound ligand would also have two entities, one for the polymer and one for the ligand. Multiple coordSets and entities can coexist: for example, a homodimer with a ligand bound to each chain would have two entities, each represented once in each coordSet.
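To see how the four parts of a specifier line up, here is a small sketch that splits a string of the form coordSetName.entityName:residueName.atomName. It illustrates the nomenclature only and is not how NMRViewJ parses specifiers. The proc name and the sample values (coordSet "A", entity "prot", residue "12", atom "HA") are hypothetical. An omitted entity or atom part comes back as an empty string, so a pattern like "*:*.H*" still splits cleanly.

```tcl
# Sketch: split coordSetName.entityName:residueName.atomName into its parts.
# Illustrates the nomenclature only; not NMRViewJ's own parser.
proc splitAtomSpec {spec} {
    # group 1: coordSet, 2: optional entity, 3: residue, 4: optional atom;
    # a group that does not participate in the match is set to ""
    if {![regexp {^([^.:]*)(?:\.([^:]*))?:([^.]*)(?:\.(.*))?$} $spec \
            -> coordSet entity residue atom]} {
        error "not an atom specifier: $spec"
    }
    return [dict create coordSet $coordSet entity $entity \
                residue $residue atom $atom]
}
```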

Use the "mol select" command to select a set of atoms, then use "mol listatoms" to return a list of them, and use "foreach" to loop over them. For example, this script prints the chemical shifts of all the protons in a molecule:

mol select atoms *:*.H*
foreach atom [mol listatoms] {
    # nv_atom elem ppm returns "" for atoms without an assigned shift
    set ppm [nv_atom elem ppm $atom]
    if {$ppm ne ""} {
        puts $ppm
    }
}

Note: the "nv_atom elem ppm" command now returns "" (an empty string) if the atom doesn't have an assigned chemical shift (which I think is a better design than returning some stupid large negative number).
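The empty-string convention also makes it easy to separate assigned from unassigned atoms. The sketch below is a hypothetical proc, collectShifts. It takes the shift lookup as a command prefix, so inside NMRView you would pass {nv_atom elem ppm}, while outside NMRView any stand-in command can be used to try it out.

```tcl
# Sketch: split atoms into assigned (dict atom -> ppm) and unassigned (list),
# relying on the "" return value described in the note above.
# getPpm is a command prefix, e.g. {nv_atom elem ppm} inside NMRView.
proc collectShifts {atoms getPpm} {
    set assigned [dict create]
    set unassigned {}
    foreach atom $atoms {
        set ppm [{*}$getPpm $atom]
        if {$ppm eq ""} {
            lappend unassigned $atom
        } else {
            dict set assigned $atom $ppm
        }
    }
    return [list $assigned $unassigned]
}

# Usage inside NMRView:
#   mol select atoms *:*.H*
#   lassign [collectShifts [mol listatoms] {nv_atom elem ppm}] shifts missing
```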
