I don't hate computational biology, but I've got my issues with the way the field is often practiced. Most of my complaints boil down to this: if a computational biologist is not contributing to our understanding of biology, and not contributing to fundamental computer science either, then what's the point? What are we learning from the research?
The problem usually crops up when computational biologists don't seem to care whether their computational results correspond with any biological reality. If a computer model or algorithm is able to (more or less) recapitulate existing data, then that's considered sufficient. But then what is your model contributing? We already knew the existing data, and chances are, your model hasn't contributed anything new to computer science.
Recently, I've moderated my stance on this a little: there is perhaps one legitimate niche, I think, for computational biologists who don't really care about testing their models with new experiments. It's conceivable that you can write an algorithm or develop a modeling approach that doesn't advance some fundamental computer science question, and that doesn't teach us anything new about biology, but nevertheless produces a new way of dealing with a computational problem.
An example: you might figure out a better way to incorporate prior information (like a model of how gene regulatory elements evolve) into a computational tool that searches for regulatory elements in genome sequence. Your improved method does better than others on the data you basically used to build your model (say, gene expression data from the cell cycle or Drosophila embryonic development), and so you're satisfied; it's time to publish. Don't deceive yourself - you have not shown that your method is now generally better at capturing real biology, because you haven't tested your model with new data in some new context. Checking for overfitting by training on only half your original data and testing on the rest doesn't count, because your success could just be due to some quirk of that particular data set. Your method might do significantly worse on data in a different context. (And yes, this happens frequently.)
In this case, you haven't advanced biology or addressed a deep question in computer science, but maybe you have developed a new tool that some other computational biologist can use to genuinely learn something about biology. This is a narrow niche, and not my idea of great science, but I can see how this could be useful. Unfortunately, a big chunk of the literature falls into this category.
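The earlier point about hold-out splits can be made concrete with a toy simulation. This is only a sketch under invented assumptions: the features `signal` and `quirk`, the helper names, and every number below are hypothetical and are not drawn from any real data set or published method. The "model" is a one-feature threshold classifier trained on half of a data set whose labels happen to track a data-set-specific artifact.

```python
import random

def make_context(n, quirk_strength, rng):
    """Simulate n candidate sites as (signal, quirk, label) tuples.

    signal: weakly informative feature, standing in for real biology.
    quirk:  tracks the label only when quirk_strength > 0, standing in
            for an artifact specific to one data set (an assumption).
    """
    rows = []
    for _ in range(n):
        label = rng.random() < 0.5
        signal = (1.0 if label else 0.0) + rng.gauss(0.0, 1.5)
        quirk = quirk_strength * (1.0 if label else 0.0) + rng.gauss(0.0, 0.5)
        rows.append((signal, quirk, label))
    return rows

def accuracy(rows, feature, threshold):
    """Fraction of rows where (value > threshold) matches the label."""
    hits = sum((row[feature] > threshold) == row[2] for row in rows)
    return hits / len(rows)

def fit_stump(train):
    """Pick the single feature and threshold with the best training accuracy."""
    candidates = [(f, row[f]) for f in (0, 1) for row in train]
    return max(candidates, key=lambda c: accuracy(train, c[0], c[1]))

def run_demo(seed=0):
    rng = random.Random(seed)
    original = make_context(400, quirk_strength=3.0, rng=rng)     # quirk present
    new_context = make_context(400, quirk_strength=0.0, rng=rng)  # quirk absent
    train, held_out = original[:200], original[200:]
    feature, threshold = fit_stump(train)
    return (accuracy(held_out, feature, threshold),
            accuracy(new_context, feature, threshold))

held_out_acc, new_context_acc = run_demo()
print(f"held-out half of original data: {held_out_acc:.2f}")
print(f"data from a new context:        {new_context_acc:.2f}")
```

In this toy setup the classifier latches onto the artifact, so it looks excellent on the held-out half of the original data and falls to roughly chance on data from a context where the artifact is absent. The split never had a chance of catching the problem, because both halves share the same quirk.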
While I grudgingly accept that there may be a niche for computational biology that doesn't produce new results about biology, I've had a hard time understanding how some computational biologists can be so passive about the issue. Don't they care whether their methods for aligning non-coding sequence/finding cis-regulatory elements/predicting protein-protein interactions/modeling gene regulatory networks are right? To know whether you're right or not, you must test your model on something new. And if you have two models that explain the data equally well, the next step is to devise some experiment that will distinguish between these models.
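As a rough illustration of that last step, here is a minimal sketch of choosing an experiment that separates two models. Both models, their parameter values, and the list of candidate conditions are hypothetical, picked only so that the two curves nearly coincide over the range already measured.

```python
def linear_model(x, slope=0.95):
    """Hypothetical model A: the response keeps growing with the input."""
    return slope * x

def saturating_model(x, vmax=10.0, k=10.0):
    """Hypothetical model B: the response levels off near vmax."""
    return vmax * x / (k + x)

def disagreement(x):
    """How far apart the two models' predictions are at condition x."""
    return abs(linear_model(x) - saturating_model(x))

def most_informative(candidates):
    """Pick the candidate condition where the models disagree most."""
    return max(candidates, key=disagreement)

# Conditions already measured (assumed): both models fit these about equally.
observed = [0.1 * i for i in range(11)]
worst_observed_gap = max(disagreement(x) for x in observed)

# Conditions we could test next (assumed): choose the most discriminating one.
candidates = [0.5, 1.0, 5.0, 20.0, 50.0]
best = most_informative(candidates)

print(f"largest gap on existing data: {worst_observed_gap:.3f}")
print(f"most informative new condition: x = {best}")
```

On the existing range the two models are nearly indistinguishable, so no amount of refitting will settle the matter. Only a measurement at a condition where their predictions diverge can do that.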
The philosophy I'm advocating here is captured in a quote by Richard Feynman that I've used before:
There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.
I believe strongly in this, but I've recently experienced an epiphany about computational biology. I understand now, I think, why some computational biologists don't agree with this philosophy of science. Their outlook is dramatically different, and it can explain why the field works the way it does. What is this outlook? It's this: the goal of computational scientists is to explain the existing data with models that produce a good fit to the data using the fewest parameters. If they can do so, even if it's just on data that was used to build the model, then, they argue, their model represents a better understanding of the biological system.
And so, if you have two models that explain the data equally well, instead of devising some experimental test that will distinguish between the two, you simply go with the model that has fewer parameters and assume, as a matter of philosophy, that this model is better.
I can't agree with this approach. If your goal is pure prediction, without understanding the underlying phenomena, then this approach is OK. But computational biologists don't limit themselves to pure predictions - they love to make claims about how their model shows what makes the cell cycle robust, or how their results disprove some prevailing idea about the evolution of transcription factor binding sites. They talk like they've gained "mechanistic insight" into gene regulation or some other biological phenomenon.
To make claims like that, about some biological phenomenon, it's simply not enough to have a model with fewer parameters. You have to go out and compare your claims with reality. Without that, you have no clue whether your ideas are right.

You were able to identify the mechanisms responsible for a phenomenon, or at least identify mechanisms that are equivalent to the ground-truth mechanisms. For example, say we know a good deal about the ecological interactions between six or seven species, and we think we know how they interact, so we build a model simulating that interaction, and it produces data similar to the "ground truth." Then that verifies that our ideas of how these species interact are correct - or, if not correct, are at least capable of generating the sort of behaviors we observe... in a sense we show that the processes are equivalent, even if our model isn't "true."
Such work doesn't contribute to computer science and doesn't produce anything "new," but it is sort of a "proof of concept" for the ideas we had governing species interactions. Does that make sense?