Why genomics isn’t all that

As you may have gathered from this post, I am skeptical that genomics will have much impact on improving human health. An immediate objection to this skepticism would be that genomics-driven advances have simply not had enough time to work their way into tangible treatments. It has only been 15 years since the first human genome was published. Fair enough. But the obvious follow-on question is, how much time is enough time?

I’d say that 15 years is more than enough time for a truly powerful advance to make an impact. Consider these examples:

• The germ theory of infectious disease was formulated in the mid-1870s. The first scientific vaccines were developed in the 1880s.

• Blood types were first described in 1901; the first successful transfusions were made in 1907.

• Insulin was discovered in 1921; the first diabetics were treated with it in 1922.

• HIV was confirmed to be the cause of AIDS in 1984; the first anti-retroviral drug was approved in 1987.

• The role of LDL in hypercholesterolemia was described in 1974; the first statin was approved in 1987.

Let’s note that none of these discoveries and treatments are for niche diseases. Infectious diseases were the leading killers in every country until the mid-20th century. Heart disease has now taken their place, and diabetes is not far behind. Even in the age of elaborate clinical trials, big advances get translated into big cures in 15 years or less. By this criterion, genomics does not deserve to be labeled a big advance.

Why not? No one, certainly not me, disputes that genes are fundamental to the workings of life, and that DNA sequences are fundamental to the workings of genes. But there are two factors that interfere with our ability to draw straight lines between DNA sequences and health risks and outcomes: distance and chaos.

DNA, with few exceptions, plays no direct role in health and disease but instead is several steps removed. Our well-being is created and maintained by effector molecules that do all the work: proteins and metabolites. It is true that protein sequences are mostly determined by DNA sequences, and that protein levels are under genetic control. But there is a long series of steps required to translate DNA into a fully processed and localized protein, and every one of these steps is subject to feedback and modification by other proteins as well as by metabolites.

The large number of steps between gene and finished protein means that gene expression contains an element of chaos. Chaos does not mean randomness, although certainly there is a degree of randomness in gene expression. Chaos instead means that the number of interactions in a system gives rise to a combinatorial explosion in which the number of possible outcomes is so great that they cannot be predicted by models that are much simpler than the system itself.
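To get a feel for how quickly a combinatorial explosion takes off, here is a rough sketch in Python. It is only an illustration, not a model of real biology: the gene count of 6,000 is a round figure for yeast that I am assuming for the sake of the example, and treating each gene as simply "on" or "off" is a deliberate oversimplification. The function names are made up for this post.

```python
from math import comb, floor, log10

def pairwise_interactions(n_genes: int) -> int:
    """Number of distinct gene pairs that could, in principle, interact."""
    return comb(n_genes, 2)

def on_off_states(n_genes: int) -> int:
    """Number of whole-cell states if each gene is simply on or off."""
    return 2 ** n_genes

def state_digits(n_genes: int) -> int:
    """Number of decimal digits in 2**n_genes, without printing the number."""
    return floor(n_genes * log10(2)) + 1

# 6000 is an assumed round number for a yeast-sized genome.
for n in (10, 100, 6000):
    print(f"{n} genes: {pairwise_interactions(n):,} possible pairs, "
          f"on/off states = a {state_digits(n)}-digit number")
```

Even under these toy assumptions, the count of pairs grows with the square of the number of genes, and the count of on/off states doubles with every gene added. No model much simpler than the cell itself can enumerate that many outcomes.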

Weather, of course, is the classic example of a chaotic system, one in which butterflies cause blizzards. The number of potential genetic interactions in a cell is not as large as the number of interactions in the atmosphere (the number of air molecules is some 100 tredecillion, which is a 1 followed by 44 zeros), but it is large. In fact, the study of these interactions is its own “-omics” discipline — “interactomics”. A map of the genetic interactome of a single-celled yeast looks like this:

From Genetic Networks

To a first approximation, every gene is connected to and influenced by every other gene. Complex animals like humans have much more complex interactomes.

It gets worse. There is not one human genome, of course. Even though humans are remarkably uniform at the genetic level (we are about 99.5% identical), there is still an enormous number of possible variations. The number of single-nucleotide changes that are found in at least 1% of the population is estimated at 10 to 30 million. The number of unique changes across our population is about 420 billion (60 mutations per genome replication × 7 billion living humans). Only about 1% of these changes are meaningful. But 1% of a very large number is still a large number.
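The back-of-the-envelope arithmetic above can be spelled out in a few lines of Python. The inputs are just the rough estimates quoted in this post (60 mutations per genome replication, 7 billion people, 1% meaningful), and the function names are made up for illustration.

```python
MUTATIONS_PER_GENOME = 60        # estimate quoted in the text
LIVING_HUMANS = 7_000_000_000    # estimate quoted in the text
MEANINGFUL_PERCENT = 1           # "about 1%", as quoted in the text

def unique_changes(mutations_per_genome: int, population: int) -> int:
    """Total unique changes if every person carries that many new mutations."""
    return mutations_per_genome * population

def meaningful_changes(total_changes: int, percent: int) -> int:
    """The share of changes assumed to matter, using integer arithmetic."""
    return total_changes * percent // 100

total = unique_changes(MUTATIONS_PER_GENOME, LIVING_HUMANS)
print(f"Unique changes:     {total:,}")
print(f"Meaningful changes: {meaningful_changes(total, MEANINGFUL_PERCENT):,}")
```

With these inputs, the "small" 1% slice still comes to roughly 4.2 billion changes.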

And it keeps getting worse. We don’t know how to reliably recognize meaningful changes in DNA sequences. When we do recognize them, we usually can’t predict how those changes will play out. We can only do this retrospectively, not prospectively. We have to identify people with a medical condition, analyze their genomes and then try to sort the overwhelming number of meaningless changes from the few that are meaningful.

Our ability to create genetic information has far outstripped our ability to store and analyze it. Sure, it may cost only $1,000 to sequence a genome. But a conservative estimate puts the cost of analysis at over $100,000 per genome, and that is for an analysis that is not very informative or powerful.

Genomics is not a pseudoscience or a hoax. It is an immensely important discipline that has yielded many insights about the human condition and will continue to do so. But putting it to use is far harder than many of its enthusiasts have led the public to believe. Don’t hold your breath waiting for a new era of medical breakthroughs based on genomics. The most realistic expectation is that progress in genomics will resemble that of the other great science of complex phenomena, weather forecasting: slow, painful, incremental progress driven largely by improvements in computational power:

From The ABCs of NWP

Progress will be evident on the scale of decades, not years or months.