[NOTE: This is the text of a manuscript that Cyrus Levinthal wrote and circulated shortly before his death in 1990. We are grateful to Francoise Levinthal for making this text available to us]
In the Fall of 1964, the research in my laboratory at MIT was directed towards understanding intra-cistronic complementation. We had studied pairs of mutant strains of E. coli, each of which produced an alkaline phosphatase protein with no enzymatic activity; each of these proteins was electrophoretically different from the wild type and from the other. Phosphatase dimers were dissociated, and the separated monomers were reassociated to produce some hybrid molecules which could be identified and separated electrophoretically. Several of these hybrid molecules were enzymatically active. Various efforts were being made in the lab to build physical models in an attempt to understand the specifics of the molecular interactions involved.
A few days before Christmas, in 1964, I talked to Bob Fano, who was the director of MIT's Project MAC, to ask whether we could get computer time to handle scintillation counter data, since we had recently acquired a multichannel counter and at that time the calculations involved were tedious. During our conversation, Fano told me about the "kludge" which had recently been developed at Project MAC by Tom Stotz and John Ward, with which one could produce, on a video screen, an image which gave the illusion of a three-dimensional object in rotation. In thinking about this over Christmas day, I became enthused at the idea of making protein models which I thought could be used for our complementation work and a variety of other macromolecular modeling problems. The following day I spoke further with Fano, got a password for the system, and immediately began learning programming from Richard Mills, who was the associate director of MAC, and from Tom Stockman, who was then one of the junior faculty members in electrical engineering. A few days after this, I called Bob Langridge, who had started using computers in 1956 in connection with x-ray diffraction studies of DNA, to ask whether he knew of anyone who was already using interactive computer graphics to model proteins or other molecular structures. He told me he did not know of anyone, but he became as enthused as I at the notion of being able to do this. Within about three weeks, I had learned enough programming in the language then used at Project MAC, called MAD (Michigan Algorithm Decoder), to generate the coordinates of a polypeptide chain which could be put into the form of an alpha-helix or other substructures.
As programs were being developed to generate protein coordinates so that displays were possible, it became clear that one of the major uses of computer graphics in this work was as an easy way of debugging programs. The first example of this was one in which a particular bond in the peptide chain changed as the chain grew, indicating that one of the rotation matrices which I had written was not orthogonal. As soon as this was clear, it was simple to find the problem and correct it.
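The kind of error described above can be illustrated with a small modern sketch. This is not the original MAD code; the function names, the 1.5 Angstrom bond length, and the deliberately corrupted matrix entry are all assumptions chosen for illustration. A rotation matrix R is orthogonal when R^T R equals the identity, and only then does it preserve bond lengths when applied repeatedly along a growing chain.

```python
import math

def rotation_z(theta):
    """Proper rotation about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def orthogonality_error(m):
    """Largest absolute entry of (M^T M - I); zero for an orthogonal matrix."""
    mt = [[m[j][i] for j in range(3)] for i in range(3)]
    p = matmul(mt, m)
    return max(abs(p[i][j] - (1.0 if i == j else 0.0))
               for i in range(3) for j in range(3))

def bond_lengths_along_chain(step, n_bonds, bond=(1.5, 0.0, 0.0)):
    """Accumulate the same step matrix along a chain and record the
    length of the bond vector after each step."""
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    lengths = []
    for _ in range(n_bonds):
        frame = matmul(frame, step)
        lengths.append(math.sqrt(sum(x * x for x in apply(frame, bond))))
    return lengths

good = rotation_z(1.0)
bad = [row[:] for row in good]
bad[0][0] *= 1.01  # hypothetical slip in one matrix entry

print("orthogonality error:", orthogonality_error(good), orthogonality_error(bad))
print("bond lengths (good):", bond_lengths_along_chain(good, 5))
print("bond lengths (bad): ", bond_lengths_along_chain(bad, 5))
```

With the correct matrix the bond length stays constant along the chain; with the corrupted one it drifts, which is the sort of symptom that is immediately visible on a display.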
Observing the display was extremely exciting at that time and it generated a great deal of enthusiasm on my part and on the part of one of my graduate students, Martin Zwick. Zwick then joined me in the project. He also learned to program quickly and together we went on to study proteins, protein crystals and whatever structural data we could get our hands on. At that time, it was difficult to obtain crystallographic coordinates although the results of the structural analysis had been published. To a considerable extent we deduced the general nature of the structure from published stereo-pair photographs of Kendrew brass models, and combined this information with the model building of alpha-helices.
About this time, I spoke further with Langridge and asked him to join me at Project MAC so that he could pursue his work on the x-ray structures and the mechanics of nucleic acids while I continued working on proteins. Langridge and Andy MacEwan, Martin Zwick and I then continued working on the problems of computer graphics; Zwick and I on proteins, and Langridge and MacEwan on nucleic acids. Except as an attempt to be funny in making movies, we made no effort to think about the interaction of proteins and nucleic acids.
I started the protein work with considerable hope that the use of the graphics would allow us to guide computer programs in a search for a minimum energy structure and thus aid in understanding the problem of protein folding. It quickly became clear that this was grossly over-optimistic, but trying to analyze the problem led to the notion that the folding of a protein has to be thought of at least as much in terms of the selection of potentially stable structures during biological evolution as a simple problem in physical chemistry. In order to speed the calculations for determining the energy of a protein conformation, I developed a procedure for "cubing" a protein in space with the help of a set of "list-processing" programs of Joe Weizenbaum, also at Project MAC. "Cubing" was one of the first examples of the use of a divide-and-conquer algorithm to simplify the process of determining relevant interactions in a protein. With it, each atom was located within a cube in space, and its interaction distance was tested with respect to atoms in the same cube and the 26 surrounding ones. With the storage limitations of the IBM 7094 then being used, this procedure saved considerable time, although with present generation computers it is not clear that it is very useful.
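The "cubing" idea can be sketched in a few lines of modern code. What follows is only an illustration under stated assumptions, not the original list-processing implementation: the function names are hypothetical, and the cube edge is assumed to equal the interaction cutoff, so that any two atoms within the cutoff must lie in the same cube or in one of the 26 cubes surrounding it.

```python
import math
from collections import defaultdict
from itertools import product

def build_cubes(coords, edge):
    """Assign each atom index to the cube (integer cell) containing it."""
    cubes = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        key = (math.floor(x / edge), math.floor(y / edge), math.floor(z / edge))
        cubes[key].append(i)
    return cubes

def close_pairs(coords, cutoff):
    """Return the set of index pairs (i, j), i < j, whose atoms lie within
    `cutoff` of each other, testing only the same cube and its 26 neighbours."""
    cubes = build_cubes(coords, cutoff)  # assumption: cube edge == cutoff
    offsets = list(product((-1, 0, 1), repeat=3))  # 27 cells: own cube + 26
    pairs = set()
    for (cx, cy, cz), members in cubes.items():
        for dx, dy, dz in offsets:
            neighbours = cubes.get((cx + dx, cy + dy, cz + dz))
            if not neighbours:
                continue
            for i in members:
                for j in neighbours:
                    if i < j and math.dist(coords[i], coords[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs

# Three hypothetical atoms on a line; only the first two are within 1.5 units.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(close_pairs(atoms, 1.5))
```

The saving comes from never computing distances between atoms in distant cubes, so the work grows roughly with the number of atoms rather than with its square, provided the atoms are spread fairly evenly in space.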
Some time after we were well along on the modeling work, I was approached by William Raub and Bruce Waxman, who were at NIH, to consider developing programs for small molecules which could be used by pharmacologists in attempting to design drugs which might interact with proteins. I became interested in this proposal as a useful application of what we had already done, so we initiated a contract with NIH when I decided to move to Columbia University in the Spring of 1967. Much of the work that we did on this small molecule project was done with Lou Katz and others at Columbia, but it was ultimately taken over in a more serious way by the Prophet system set up by NIH.
My own involvement in the work on protein structures waned somewhat at about the time I moved to Columbia because it seemed to me that most of the activities which were directed at predicting structure from sequence data were, at that time, more game playing than serious science. Writing a computer program to predict what is already known from crystallographic data has been too dangerous. However, in our laboratory, and even more in that of Langridge, the use of much improved interactive computer graphics programs for the docking of proteins and the interaction of small molecules with proteins has played a major role. In my laboratory we continued to improve the computational and interactive aspects of programs, so that we have had a steadily improved version of an interactive modeling package of programs called the PKG, which uses rotatable bonds as the variables, with energy minimization in torsional space. The interactive torsional minimizer is now running on a STAR-100 array processor with code written by Mr. Huajun Wang at our facility, so it is fast enough to be effectively interactive, and it can be used to feed coordinates to a conventional cartesian program like Amber, Charmm, or Discover, which also run on the STAR by virtue of the microcode written by Dr. Bernard Brooks at NIH. However, for interactive purposes where one wants to move parts of a protein through a substantial distance, the torsional program is much faster than a cartesian program, although the latter are much more efficient for local minimization and dynamics. My attitude on the usefulness of calculational predictions changed significantly when it became clear that techniques in recombinant DNA research made it possible to modify proteins or generate new proteins and thus provide a way of evaluating models which were generated computationally.
In addition, the use of computational methods to predict the structure of proteins which are homologous to others which have been solved crystallographically becomes more important as more amino acid sequences are deduced from DNA sequencing.
It seems likely that deducing the conformations of homologous proteins will require extensive calculations on high-speed computers as well as effective interactive graphics. The availability of time on super-computers, as well as the increased use of mini-supers, array processors and attached processors, has increased the computer power available for biophysical chemistry calculations. However, it is obvious that greater computational speed has also increased the demands on graphics systems. Homologies in conformation are most readily noted by human observation of 3-D structures, and frequently ambiguities in computational results can only be understood by studying the graphical outputs. Our initial hope that our "chemical insight" could be used to guide the programs has been superseded by a much more realistic hope that, although we may not have "chemical insight", there are more and more 3-D structures determined experimentally to aid in understanding which conformational results are reasonable and which are not, as long as we can look at them.