UMass Amherst Computer Scientist and International Team Offer Theoretical Solution to 36-year-Old Computation Problem

Barna Saha
Barna Saha

AMHERST, Mass. – University of Massachusetts Amherst computer science researcher Barna Saha, with colleagues at MIT and elsewhere, are reporting the theoretical solution to a 36-year-old problem in RNA folding predictions, which is widely used in biology for understanding genome sequences.

The authors presented preliminary results in an extended abstract at the Foundations of Computer Science conference in New Brunswick, N.J. Their final article will appear as one of the conferences’ select papers in an upcoming special issue expected in 2019 of the Society for Industrial and Applied Mathematics’s (SIAM) Journal of Computing.

As Saha, an expert in algorithms, explains, computational approaches to find the secondary structure of RNA molecules are used extensively in bioinformatics applications. Knowing more about RNA structure may reveal clues to its role in the origin and evolution of life on earth, but experimental approaches are difficult, expensive and time-consuming. Computational methods can be helpful, and when integrated with experimental data can add to knowledge, she adds.

Among early researchers to take on the problem were structural chemist Ruth Nussinov in Israel and microbiologist Ann Jacobson in Stony Brook, N.Y., who in 1980 published an algorithm for predicting the secondary structure of single-strand RNA, an accomplishment that “has been in the heart of many further developments in this basic problem” ever since, Saha says. “Over the past 36 years, the cubic running time for their algorithm has not been improved,” she adds, which made many to believe that it is the best running time possible for this problem.

Cubic running time refers to the length of time it will take a computer to do the calculations required, Saha explains. It is a function of the length of the RNA base pair string entered as data. For example, if the string has 1,000 base pairs, the running time for Nussinov and Jacobson’s algorithm will be 1,000 cubed, or 1,000 x 1,000 x 1,000.

However, Saha and colleagues Virginia Vassilevska Williams at MIT, Karl Bringmann at the Max Planck Institute, Saarbrücken, and Fabrizio Grandoni at the Istituto Dalle Molle di Studi sull’Intelligenza Artificiale, Switzerland, now say they have shown theoretically that there is a faster, subcubic algorithm possible for RNA folding computations.

Saha says, “Our algorithm is the first one that takes the Nussinov and Jacobson model and improves on it. We show that you can solve the problem faster than cubic running time.” She and colleagues developed a new faster algorithm for a special kind of matrix multiplication using which they reduced running time from 3 times the length of the base pair string to 2.82 times. “It may not be the fastest yet, there might be room for improvement,” she says. “This is the first work that breaks the barrier.”

Vassilevska adds, “Before our algorithm, it seemed quite plausible that there is a cubic barrier for RNA-folding. In fact, with co-authors I tried to prove this. We failed – the cubic barrier didn’t seem to follow from any known hypotheses. This failure later helped us break the cubic barrier – if a problem is not hard, it’s likely easy. One of the most fascinating things about our solution is that it is more general than RNA-folding. It breaks the cubic barrier for several other problems, and could potentially lead to further breakthroughs in our understanding for fundamental problems such as shortest paths in networks, pattern recognition and so on.”

Earlier this year, Saha was awarded a five-year Faculty Early Career Development (CAREER) grant from the National Science Foundation, its highest award in support of junior faculty, which supported her work on this project. She says that in future papers, she plans to show that running time can be improved further if the researcher is willing to allow the algorithm to yield “slightly suboptimal folding structures,” that is, the solution will be very close but not precisely correct.