Disclosure up front: I helped found SomaLogic and was its first R&D director. I left the company in 2004, but exercised my options and still hold its stock. I am definitely not neutral about the value of proteomics as a diagnostic and prognostic platform.
Last Sunday’s NYT magazine had a pretty good article about the potential of proteomics, focusing largely on SomaLogic and its story. It conflated the origin stories of NeXstar and SomaLogic, but that’s no big deal.
Genomics has gotten most of the love from science journalism in the 21st century. And there is no question that it is one of the landmark advances in biology, one that has transformed our understanding of life at many levels – the cell, the organism, our environments and our evolutionary history.
But its value in medicine, both potential and realized, has been vastly oversold. I’ve argued this take repeatedly. But rather than repeat these arguments, I’ll steal one of Larry Gold’s favorite examples.
Here I am at age 8:
And here I am at age 58:
My choice of beverage has not changed. Nor has my genome.
But I have changed a lot. The health risks I faced at age 8 are not at all the ones I face at age 58. Nor would the treatments appropriate at one age be appropriate for the other.
Although I’ve undoubtedly picked up somatic mutations along the way, a genomic scan could tell me little about my health now that it could not have told me in 1964, and what it can tell me is not much to begin with. That seems to me like a pretty serious limitation.
In contrast, the suite of expressed proteins can tell me a lot. In fact, it can even tell me my age (although a calendar does that too, I suppose):
(Figure from “Plasma proteomic signature of age in healthy humans.”)
Genes, on their own, do nothing. Without a set of proteins to decode them and carry out their will, genes sit inert and helpless. DNA sequences are static and largely unresponsive to changes in the environment. Proteins respond to every change in the environment, both internal and external. The proteome – that is, the suite of proteins present in a tissue at any one time – comprises a near-complete description of what that tissue is and what it is doing and how it is doing it. That is way more dynamic and informative than DNA sequences could ever be.
We’ve been using proteins as markers for diagnosis and prognosis for a long time, of course. But single proteins just aren’t all that informative for most diseases. Prostate-specific antigen (PSA) for detecting prostate cancer is an example. Although PSA levels are elevated in cancer patients, the overlap in PSA levels between healthy men and men with cancer is substantial. Any level that we pick as a cutoff is going to lead to many false positives, many false negatives, or both. This isn’t a limitation of measurement accuracy; it’s biology.
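To make the cutoff problem concrete, here is a minimal sketch in Python. It assumes a single marker whose levels are normally distributed in both groups, with made-up means and spreads chosen only so that the two curves overlap. These numbers are illustrative assumptions, not real PSA data, and `error_rates` is a hypothetical helper.

```python
from statistics import NormalDist

# Hypothetical marker levels in arbitrary units. These are NOT real PSA
# values; they are chosen only so that the two groups overlap.
healthy = NormalDist(mu=3.0, sigma=1.5)
diseased = NormalDist(mu=5.0, sigma=1.5)


def error_rates(cutoff):
    """Return (false_positive_rate, false_negative_rate) for a cutoff.

    Anyone whose marker level is above the cutoff is called positive.
    """
    false_positive = 1.0 - healthy.cdf(cutoff)  # healthy people above the cutoff
    false_negative = diseased.cdf(cutoff)       # diseased people below the cutoff
    return false_positive, false_negative


for cutoff in (3.0, 4.0, 5.0):
    fp, fn = error_rates(cutoff)
    print(f"cutoff={cutoff:.1f}  false positives={fp:.1%}  false negatives={fn:.1%}")
```

Sliding the cutoff up trades false positives for false negatives, and sliding it down does the reverse. As long as the distributions overlap, no cutoff drives both to zero, however precisely the marker is measured.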
It stands to reason that measuring more proteins will uncover patterns of expression (call them signatures) that contain more information than is carried by any single protein.
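A back-of-the-envelope sketch of why that should be so, under deliberately idealized assumptions: every marker is independent, normally distributed with the same spread, and shifted by the same small amount (half a standard deviation here) in the disease group, and the two groups are the same size. Real proteins are correlated and unequal in effect, so the gain in practice would be smaller. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def signature_accuracy(n_markers, shift=0.5):
    """Accuracy of classifying on the average of n idealized markers.

    Assumptions (illustrative only): markers are independent, have unit
    standard deviation, and are each shifted by `shift` in disease.
    Averaging n of them shrinks the noise by sqrt(n), so a cutoff placed
    midway between the group means is correct with probability
    Phi(sqrt(n) * shift / 2).
    """
    return STANDARD_NORMAL.cdf(sqrt(n_markers) * shift / 2)


for n in (1, 4, 16, 64):
    print(f"{n:>3} markers: {signature_accuracy(n):.1%} correct")
```

Under these assumptions a single weak marker is barely better than a coin flip, while a few dozen weak markers combined separate the groups well. That is the whole argument for signatures in one formula.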
That is an uncontroversial idea, but the problem is that proteins are hard to measure. Nucleic acids are chemically very simple, consisting of only 4 components, all of similar size, solubility and charge. They are linked together in a one-dimensional array. Their information is easy to read (that is their biological role, after all). And so we have gone looking under the lamppost, collecting their information because we can. We are now approaching 10 trillion bases of sequence information. But we can’t predict or detect many diseases using gene sequences.
Proteins are much more complex than nucleic acids, having 20 different components that are chemically diverse. It is much easier to infer the sequence of proteins from the genes that encode them than it is to sequence the proteins themselves.
Detecting and quantifying proteins is correspondingly harder. The two main approaches are to chop them up into small segments that can be analyzed by physical methods such as mass spectrometry, or to develop affinity reagents that detect one specific protein to the exclusion of all others.
The usual platform for affinity reagents is antibodies. The technology for generating and using antibodies to detect proteins is mature and well-developed. Nearly all diagnostic assays that target proteins use antibodies. They work great for detecting one or a few proteins.
But if you want to detect hundreds or thousands in a single sample? Not so great. Once you mix a dozen or so antibodies into an assay, they begin to misbehave, binding to each other and to the wrong protein targets. They just don’t work that well in “omics”-scale applications.
SomaLogic, after many false starts (some of which I was responsible for), figured out how to use aptamers (short snippets of nucleic acid generated by an in vitro selection process) to recognize proteins with high sensitivity and specificity in a scalable format. Their current assay can measure some 5000 proteins in human plasma. That is a big deal.
Proteomics, for the first time ever, is no longer limited by protein detection technology. Significant advances in diagnosis and prognosis of disease are already being made: for infectious diseases, cancer, cardiovascular disease, Alzheimer’s and more. Nearly all diseases – and pre-disease states – are likely to leave a signature in the proteome.
With detection and measurement solved, the challenges now for proteomics are clinical and informational. Signature discovery to date has used a case-control model: proteomes from a group with a disease are compared to those from a group without it, and a model is constructed that differentiates the two groups. The limitation of this approach is that the model can be no more accurate than the diagnoses attached to the samples it relies upon. Even a small number of misdiagnosed samples can confound the results. And, particularly for high-prevalence diseases, a substantial fraction of the control group is likely to have undetected disease or pre-disease states, which muddies the comparison further. The next step in advancing this technology is to implement longitudinal studies, in which large cohorts are regularly monitored for years or decades. That, obviously, will take time.
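The ceiling that sample quality imposes can be put in numbers with a small sketch. It assumes a two-class (disease versus no disease) problem in which some fraction of the recorded diagnoses is simply wrong, and in which those labeling errors are unrelated to the signature’s own errors. Both the function name and the example rates are hypothetical.

```python
def apparent_accuracy(true_accuracy, label_error_rate):
    """Accuracy a signature appears to have when scored against noisy labels.

    Assumes two classes and labeling errors that are independent of the
    signature's own mistakes. The signature agrees with the recorded label
    when both are right, or when both are wrong.
    """
    both_right = true_accuracy * (1.0 - label_error_rate)
    both_wrong = (1.0 - true_accuracy) * label_error_rate
    return both_right + both_wrong


# Illustrative rates only: the fraction of samples carrying the wrong diagnosis.
for error_rate in (0.0, 0.05, 0.10, 0.20):
    perfect = apparent_accuracy(1.0, error_rate)
    good = apparent_accuracy(0.90, error_rate)
    print(f"mislabeled={error_rate:.0%}  perfect signature scores {perfect:.1%}, "
          f"90%-accurate signature scores {good:.1%}")
```

Under these assumptions, even a flawless signature scored against samples that are 5% misdiagnosed appears to be only about 95% accurate, and there is no way to tell from the case-control data alone whether the signature or the label was at fault.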
In the meantime, there is much that can be done, including predictions of who will respond to a given drug treatment and who will suffer adverse events; dissection and monitoring of immune responses, including transplant rejection; monitoring of overall fitness and health; diagnosis and classification of cancer and other diseases. In principle, just about any condition or response that changes or is changed by protein expression levels can be usefully studied with proteomics.
Genomics is essentially a record of the past. Proteins tell us what is going on now. We’ve already gotten most of the clinically useful information from genomics that we will ever get. But we are just getting started with the proteome.