By Michael White | October 9th 2009 08:26 PM | 3 comments

About Michael White

Welcome to Adaptive Complexity, where I write about genomics, systems biology, evolution, and the connection between science and literature, government, and society.

I'm a biochemist ...

I don't hate computational biology, but I've got my issues with the way the field is often practiced. Most of my complaints boil down to this: if a computational biologist is not contributing to our understanding of biology, and not contributing to fundamental computer science either, then what's the point? What are we learning from the research?

The problem usually crops up when computational biologists don't seem to care whether their computational results correspond with any biological reality. If a computer model or algorithm is able to (more or less) recapitulate existing data, then that's considered sufficient. But then what is your model contributing? We already knew the existing data, and chances are, your model hasn't contributed anything new to computer science.

Recently, I've moderated my stance on this a little: there is perhaps one legitimate niche, I think, for computational biologists who don't really care about testing their models with new experiments. It's conceivable that you can write an algorithm or develop a modeling approach that doesn't advance some fundamental computer science question, and that doesn't teach us anything new about biology, but nevertheless produces a new way of dealing with a computational problem.

An example: you might figure out a better way to incorporate prior information (like a model of how gene regulatory elements evolve) into a computational tool that searches for regulatory elements in genome sequence. Your improved method does better than others on the data you basically used to build your model (say, gene expression data from the cell cycle or Drosophila embryonic development), and so you're satisfied; it's time to publish. Don't deceive yourself - you have not shown that your method is now generally better at capturing real biology, because you haven't tested your model with new data in some new context. Checking for overfitting by training on only half your original data and testing on the rest doesn't count, because your success could just be due to some quirk of that particular data set. Your method might do significantly worse on data in a different context. (And yes, this happens frequently.)
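To make the held-out-half point concrete, here is a minimal sketch on purely synthetic data. Everything in it is an assumption made up for illustration: the "true" saturating response, the dose ranges, and the straight-line model. It is not taken from any real expression data set or published method. The quirk of the original data set is that it only covers low doses, where the response happens to look nearly linear.

```python
# Toy illustration with synthetic data (hypothetical, not from a real experiment).
# Assumed "true biology": a saturating response y = x / (1 + x).

def true_response(x):
    return x / (1.0 + x)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of a fitted (slope, intercept) pair on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Original data set: only low doses (0.0 to 1.0), where the response looks nearly linear.
original_x = [i * 0.05 for i in range(21)]
original_y = [true_response(x) for x in original_x]

# Split the original data in half: even-indexed points train, odd-indexed points test.
train_x, train_y = original_x[0::2], original_y[0::2]
test_x, test_y = original_x[1::2], original_y[1::2]

# New context: high doses (4.0 to 5.0) that the original data never covered.
new_x = [4.0 + i * 0.1 for i in range(11)]
new_y = [true_response(x) for x in new_x]

model = fit_line(train_x, train_y)
heldout_error = mse(model, test_x, test_y)
new_context_error = mse(model, new_x, new_y)

print(f"held-out half of original data: MSE = {heldout_error:.5f}")
print(f"new context:                    MSE = {new_context_error:.5f}")
```

The straight line passes the held-out check comfortably, because the held-out half shares the quirk of the training half. It only fails once it is confronted with conditions the original data set never sampled.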

In this case, you haven't advanced biology or addressed a deep question in computer science, but maybe you have developed a new tool that some other computational biologist can use to genuinely learn something about biology. This is a narrow niche, and not my idea of great science, but I can see how this could be useful. Unfortunately, a big chunk of the literature falls into this category.

While I grudgingly accept that there may be a niche for computational biology that doesn't produce new results about biology, I've had a hard time understanding how some computational biologists can be so passive about the issue. Don't they care whether their methods for aligning non-coding sequence/finding cis-regulatory elements/predicting protein-protein interactions/modeling gene regulatory networks are right? To know whether you're right or not, you must test your model on something new. And if you have two models that explain the data equally well, the next step is to devise some experiment that will distinguish between these models.
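One way to think about devising such an experiment, sketched under stated assumptions: take two candidate models that agree on the conditions already measured, scan the conditions you could feasibly test, and pick the one where their predictions disagree most. The two models, their coefficients, and the dose values below are hypothetical placeholders, and "largest absolute difference" is only one simple criterion among many.

```python
# Two hypothetical models that agree closely on the doses already measured.
def linear_model(x):
    return 0.48 * x + 0.06

def saturating_model(x):
    return x / (1.0 + x)

def most_discriminating(candidates, model_a, model_b):
    """Return the candidate condition where the two predictions differ most."""
    return max(candidates, key=lambda x: abs(model_a(x) - model_b(x)))

observed_doses = [0.0, 0.25, 0.5, 0.75, 1.0]    # conditions already measured
candidate_doses = [0.5, 1.0, 2.0, 5.0, 10.0]    # conditions we could test next

# How far apart are the models on the data we already have?
largest_gap_seen = max(
    abs(linear_model(x) - saturating_model(x)) for x in observed_doses
)

# Which new experiment would separate them best?
best_dose = most_discriminating(candidate_doses, linear_model, saturating_model)
gap_at_best = abs(linear_model(best_dose) - saturating_model(best_dose))

print(f"largest disagreement on existing data: {largest_gap_seen:.3f}")
print(f"most informative new dose: {best_dose} (disagreement {gap_at_best:.3f})")
```

On the existing data the two models are nearly indistinguishable. The scan points to the condition where measuring the real system would most clearly favor one over the other.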

The philosophy I'm advocating here is captured in a quote by Richard Feynman that I've used before:

There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.

I believe strongly in this, but I've recently experienced an epiphany about computational biology. I understand now, I think, why some computational biologists don't agree with this philosophy of science. Their outlook is dramatically different, and it can explain why the field works the way it does. What is this outlook? It's this: the goal of computational scientists is to explain the existing data with models that produce a good fit to the data using the fewest parameters. If they can do so, even if it's just on data that was used to build the model, then, they argue, their model represents a better understanding of the biological system.

And so, if you have two models that explain the data equally well, instead of devising some experimental test that will distinguish between the two, you simply go with the model that has fewer parameters and assume, as a matter of philosophy, that this model is better.
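The preference for fewer parameters is often formalized with a penalized score. The sketch below uses the Akaike information criterion (AIC) in its least-squares form as one example; this post does not name a specific criterion, so treat AIC here as a stand-in, and the residual sum of squares, sample size, and parameter counts as made-up numbers.

```python
import math

def aic_least_squares(rss, n_points, n_params):
    """AIC for a least-squares fit, up to an additive constant: n * ln(RSS / n) + 2k."""
    return n_points * math.log(rss / n_points) + 2 * n_params

# Hypothetical case: two models fit the same 50 data points equally well
# (identical residual sum of squares) but differ in parameter count.
n_points = 50
rss = 4.0

aic_small = aic_least_squares(rss, n_points, n_params=3)
aic_large = aic_least_squares(rss, n_points, n_params=7)

# Lower AIC is preferred, so with equal fit the parameter penalty decides.
preferred = "3-parameter model" if aic_small < aic_large else "7-parameter model"

print(f"AIC, 3 parameters: {aic_small:.2f}")
print(f"AIC, 7 parameters: {aic_large:.2f}")
print(f"preferred by AIC:  {preferred}")
```

When the fits are identical, the score difference comes entirely from the parameter penalty. Nothing in the calculation involves a new measurement, so it cannot say which model describes the actual mechanism.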

I can't agree with this approach. If your goal is pure prediction, without understanding the underlying phenomena, then this approach is OK. But computational biologists don't limit themselves to pure predictions - they love to make claims about how their model shows what makes the cell cycle robust, or how their results disprove some prevailing idea about the evolution of transcription factor binding sites. They talk like they've gained "mechanistic insight" into gene regulation or some other biological phenomenon.

To make claims like that, about some biological phenomenon, it's simply not enough to have a model with fewer parameters. You have to go out and compare your claims with reality. Without that, you have no clue whether your ideas are right.


Comments

As an undergrad, I did some computational work for a biologist. My work actually was able to generate new hypotheses so it doesn't fall under the category you describe here, but I just wanted to address your question: if your work does nothing but recapitulate data we've already observed, you actually have contributed.

You were able to identify the mechanisms responsible for a phenom, or at least identify mechanisms that are equivalent to the ground-truth mechanisms. For example, say we know a good deal about the ecological interactions between six or seven species, we think we know how they interact, so we build a model simulating that interaction, and produce similar data to the "ground-truth." Then that verifies that our ideas of how these species interact are correct - or, if not correct, are capable of generating the sort of behaviors we observe... in a sense we show that the processes are equivalent, even if our model isn't "true."

Such work doesn't contribute to computer science, doesn't produce anything "new" but it is sort of a "proof of concept" for the ideas we had governing species interactions. Does that make sense?

adaptivecomplexity:
"You were able to identify the mechanisms responsible for a phenom, or at least identify mechanisms that are equivalent to the ground-truth mechanisms."

This gets to the heart of it - don't you want to know which mechanism is right? It's not enough to come up with a mechanism that explains the data as well as something else does.


If you have two models of how something works that explain the data equally well, the next step is to push your models - do they ever make divergent predictions about what we should observe? If they do, then you have a chance to really find out which model is (provisionally) right. Alternatively, if two models are truly in all respects equivalent (like, say, Heisenberg's and Schrödinger's formulations of quantum mechanics), then that should be shown formally.


I do think your example of species interactions sounds OK - my rant was inspired more by the kind of thing that goes on in the analysis of gene expression data or attempts to predict gene expression from sequence.  In your case, I think what you suggest is right: you have a system and you think, based on existing evidence, that you know how the components interact. You thus build a mathematical model to formalize your thinking, and to see if your mechanistic model can produce what's observed. I still think that at this point it's essential to test the model - in ecology, experiments aren't always feasible, but often nature does the experiment for you, and you can see if your model explains a slightly different situation.

As an addendum, in a sense, the work I described wouldn't tell us what exact mechanisms are happening in the cell, or ecosystem, or organ or what have you, but it *would* tell us that such knowledge isn't *absolutely necessary* to gain further understanding. It shows that the proposed mechanism is roughly equivalent to the ground-truth mechanism.
