By Michael White | June 20th 2007 05:07 PM

About Michael White

Welcome to Adaptive Complexity, where I write about genomics, systems biology, evolution, and the connection between science and literature, government, and society.

I'm a biochemist



What has the ENCODE project done, and how do its results change our understanding of the human genome? In "Time to Rethink the Gene?" I put this project into perspective by briefly outlining some past concepts of the gene and highlighting some of the ENCODE findings.

Now it's time to take a closer look at the results of the ENCODE project and their significance for our understanding of the human genome.

ENCODE's genome snapshot is unquestionably fascinating, and it suggests that some features of genome regulation that were previously viewed as exceptions to the norm are really quite common. But are these results revolutionary? Do they overturn any long-cherished notions about genes that scientists have heavily relied on in their understanding of gene regulation, as some have suggested? And do they support intelligent design? I don't think so.

What ENCODE Did

In one sense, the ENCODE project can be thought of as the third big Human Genome Project - the first project being the actual genome sequencing, and the second being the HapMap Project to extensively study genome variation in different human populations. The ENCODE project is an effort to find and study, on an encyclopedic scale, all of the functional elements in the human genome.

For the first phase of this project, the ENCODE researchers examined a small but reasonably representative chunk of the human genome (roughly 1%, or 30 million DNA bases) by running that chunk through a battery of experimental tests and computational analyses. Most of the experimental techniques and results are unfortunately beyond the scope of this little summary. This first round of the ENCODE project produced a big paper in Nature, and the journal Genome Research has devoted its entire June issue to papers from the ENCODE project. I'm going to winnow down this mass of material to two of the most interesting topics: transcription and evolution.

Transcription (if you don't know what transcription is, look here):

The researchers attempted to identify regions of DNA that were transcribed. Why? Because our presumption has generally been that most (note the qualifier!) transcripts contain some functional material, such as protein-coding genes or non-coding RNAs that have some functional role (such as miRNAs, snoRNAs, rRNAs, etc.). Therefore by looking for transcribed regions, we can find new functional portions of the genome.

The transcribed regions were identified using tiling arrays, which are DNA chips, or microarrays, that cover the entire genome and thus can detect transcription from any place in the genome. This is in contrast to more traditional microarrays that only detect the transcription of known genes. By using tiling arrays and a handful of other complementary techniques, the ENCODE researchers found that a large fraction of the genome region in the study was transcribed, including many places that have no recognizable genes. They estimate that up to 93% of the genome is transcribed, although the evidence for much of this is indirect and other explanations of the experimental results are possible. The actual transcribed fraction may be substantially lower, although it is still likely to be large.

The most interesting finding of these transcription studies is that a lot of strange stuff is ending up in these RNA transcripts. We have long known that different protein-coding regions (exons) from a single gene can be spliced together in various combinations to create many different proteins. The ENCODE researchers confirmed this (the protein-coding genes they studied produce on average 5.4 differently spliced forms), but they also found that chunks of other sequence end up in the transcripts, such as coding and non-coding portions of neighboring genes. Why this is happening is not yet clear, although part of the explanation is surely that the transcription and splicing machinery are noisier than we previously (and naively) appreciated.

Another major part of the ENCODE project is to find out just where transcription starts. Transcription start sites (TSSs) are important, because key regulatory events take place there. Regulatory sequences in the DNA, together with regulatory proteins, act at TSSs to control the protein machinery that carries out transcription; this control is critical for deciding which genes in the cell are 'on' or 'off'.

The ENCODE researchers found many new TSSs, sometimes very far away from known genes. Interestingly, the TSSs far away from known genes had different characteristics from those close to known genes, suggesting two distinct functional roles. One possible role for these distant TSSs is to control the higher-order structure (i.e., chromatin structure) of big regions of the genome, and thus to some degree regulate entire sets of genes. This work lays a good foundation for studying these control systems.

Evolution

The ENCODE researchers searched for regions of the human genome that have changed little throughout mammalian evolutionary history; these are the regions that have been constrained by natural selection. They compared portions of the human genome with the genomes of 14 other mammalian species, and found that 5% of the genome is under evolutionary constraint, a result that agrees with earlier studies.

The immediate question then is, how much of the 5% consists of known functional elements? The ENCODE researchers reported the following breakdown:

Of the 5% of the genome that is evolutionarily constrained:
- 40% consists of protein-coding genes
- 20% consists of known, functional, non-coding elements
- 40% consists of sequence with no known function
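
To make the breakdown concrete, the percentages above can be converted into shares of the genome as a whole. This is only a rough sketch: it assumes the 5% constraint figure and the 40/20/40 split exactly as listed, and the variable names are hypothetical, chosen for illustration.

```python
# Convert "percent of the constrained sequence" into "percent of the genome".
# Assumes the figures quoted in the list above; nothing here is new data.

CONSTRAINED_PCT = 5  # percent of the genome under evolutionary constraint

# Percent of the constrained sequence falling into each category
breakdown_pct = {
    "protein-coding genes": 40,
    "known functional non-coding elements": 20,
    "no known function": 40,
}

# Share of the whole genome, in percent, for each category
genome_share_pct = {
    category: CONSTRAINED_PCT * pct / 100
    for category, pct in breakdown_pct.items()
}

for category, share in genome_share_pct.items():
    print(f"{category}: {share:.1f}% of the genome")
```

On these numbers, roughly 2% of the genome is constrained sequence whose function is still unknown, the same amount as is taken up by constrained protein-coding sequence.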

The sequence with no known function is not too surprising. Functional DNA elements other than protein-coding genes are difficult to find, and in spite of many recent studies we know we're missing a lot. These results tell us roughly how much more functional, non-coding sequence we need to find, and where it is probably located.

The ENCODE researchers also looked at evolutionary conservation from another angle: how much of known, functional DNA falls into conserved regions? Protein-coding genes and their immediate flanking regions are generally well-conserved, while known, non-coding functional elements are less conserved. Again, this is nothing too surprising; non-coding elements tend to be very short and have what is called 'low information content', and they are more easily created and destroyed by random mutations.

Many potentially functional elements, picked up in the experimental data analyzed by the ENCODE groups, are not evolutionarily constrained - about 50%, when these elements are compared across all mammalian genomes in the study. This means that there are regions of the genome that are bound by regulatory proteins or that are transcribed, but which have not been constrained by natural selection.

Intelligently Designed Transcription?

I need to pause here and answer the obvious question that those of you who aren't molecular biologists are probably asking: So does this mean that evolution can't explain many of the functional parts of the genome? Intelligent design advocates are already on the web, misreading the ENCODE work and claiming that it somehow supports the fuzzy claims of intelligent design. My advice: don't believe what you hear about this from people who only have the vaguest understanding of how ENCODE's experiments and analyses work (and that includes biochemist Michael Behe).

The ENCODE results do not cast doubt on evolution. Here are some of the reasons why:

1. Just because something is transcribed or bound by a regulatory protein does not mean that it is actually functional. The machinery of the cell does not literally read the DNA sequence like you and I do - it reads DNA chemically, based on thermodynamics. As I mentioned before, DNA regulatory elements are short, and thus are likely to occur just by chance in the genome. An 8-base element is expected to show up by chance about once every 65,000 bases (4^8 = 65,536), and would occur randomly over 45,000 times in a 3 billion base pair genome. Nature does work with such small elements, but their random occurrence is hard to control. In a genome as large and complex as ours, we should expect that there is a significant amount of random, insignificant protein binding and transcription. Incidentally, such random biochemical events probably make it easier for currently non-functional events to be occasionally recruited for some novel function. We already know from earlier studies that this kind of thing does happen.

2. To say that something is truly functional requires a higher standard of evidence than the ENCODE research provides. The ENCODE researchers did a fine job detecting transcription and regulatory protein binding with state-of-the-art experimental and computational techniques, but confirming a functional role for these elements will require more experiments aimed at addressing that issue.

3. Some of the functional elements that don't appear to be conserved really are conserved. When you're comparing a small functional element in a stretch of DNA between, say, humans and mice, it is often difficult to find the corresponding region in each species. The mice and humans may have the same functional element, but in slightly different places. Thus conserved elements can be missed. The ENCODE researchers note this, and people like me who study these small elements know from experience that this happens frequently.

4. Despite what you may read, there is still a lot of junk DNA. The ENCODE project does not "sound the death-knell for junk DNA." Our genomes are filled with fossils of genetic parasites, inactive genes, and other low-complexity, very repetitive sequence, and it's extremely clear that most of this stuff has no functional role. Much of this sequence may be transcribed, but remember that the ENCODE evidence for most of this transcription is indirect - their direct measurements only detected transcripts for ~14% of the regions they studied. Even if much of it is transcribed, this mainly suggests that it is not worth expending energy to actively repress this transcription, since there are so many other controls in place to deal with unwanted transcripts in the cell.
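
The back-of-the-envelope estimate in point 1 can be reproduced with a short sketch. It assumes every base is independent and equally likely (real genomes are not that uniform), counts matches on one strand only, and uses hypothetical function names chosen for illustration.

```python
# Expected chance occurrences of one specific short DNA motif.
# Simplifying assumption: the four bases are independent and equally
# likely, so a given motif of length k matches with probability 1 / 4**k.

def expected_spacing(motif_length: int) -> int:
    """Average number of bases between chance matches of one specific motif."""
    return 4 ** motif_length

def expected_occurrences(motif_length: int, genome_size: int) -> float:
    """Expected number of chance matches on one strand of the genome."""
    return genome_size / expected_spacing(motif_length)

GENOME_SIZE = 3_000_000_000  # approximate genome size used in the text

spacing = expected_spacing(8)
count = expected_occurrences(8, GENOME_SIZE)

print(spacing)       # 65536
print(round(count))  # 45776
```

Shorter motifs turn up far more often under the same assumptions: each base removed from the motif multiplies the expected count by four.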

Enlightening but not revolutionary

Moving on from intelligent design, some people, around the web and in a few journals, are making the ENCODE results out to be more revolutionary than they really are. For example, writing in a Nature piece stuffed with exaggerated claims about what our "preconceptions" supposedly are (subscription required), John Greally states that "Now, on page 799 of this issue, the ENCODE Project Consortium shows through the analysis of 1% of the human genome that the humble, unpretentious non-gene sequences have essential regulatory roles," and "the researchers of the ENCODE consortium found that non-gene sequences have essential regulatory functions, and thus cannot be ignored."

Every biologist I know could have told you years ago, before ENCODE, that "non-gene sequences have essential regulatory roles." Larry Moran, over at Sandwalk, says that he hasn't "had a 'protein-centric' view of a gene since I learned about tRNA and ribosomal RNA genes as an undergraduate in 1967." Where has Greally been all this time? I'm not sure why he is so surprised.

Also, as I mentioned above, not all (or maybe not even most) of the transcribed, intergenic sequences found by ENCODE are believed to have "essential regulatory roles." Non-coding DNA regulatory elements have been the subject of intense study by many groups for many years now. To claim that we have not paid enough attention to them is wrong. None of the types of transcripts discovered by ENCODE are really novel; we've seen examples in earlier studies of what they found. What is significant about the ENCODE results is the extent of this unusual transcription; what were once thought to be exceptions are now seen to be much more common.

I'm happy to see the ENCODE results; many of us will use their results in our own research, and projects like this certainly help to make the human genome much less of a black box. But they haven't shattered any paradigms that weren't already on their way out, or revolutionized the field of genomics.

Comments

Hank
If you want to get a chuckle out of an anti-Creationism posting, see this: Various Proofs of the Theory of Evolution presented in original form by my uncle, the honorable Charles Darwin in the year 1859 and in subsequent years - by Derwin Darwin II. He cites Ouroboros and Genomicron and we love their stuff.

He's a Biological Anthropologist. I didn't even know such a thing existed, much less that they were so funny.

And here you are writing facts and stuff instead! :)

Jim
those articles always do well. how many hits did sarda's creation museum article get? 8 million? it single-handedly caused us to start re-writing the comment system

Most of the human genome coding is set to accommodate the salient (successful) 'human' features and functions (in balance) to (with) the environment. Assuming the driver for evolution is adaptation, then the environment must leave an imprint in human genetics insofar as the 'shape' of humanity is simply a biological reaction to environmental conditions. It seems obvious to me that a mechanism must exist to regulate this very important issue. Within the human genome is a kind of 'map' (white noise) of all the environments by which we have been created. The core specific to humanity is an abreaction process out of which is produced blue-eyed blonds :). Human intelligence is perhaps an interesting variation in the pattern of human specificity, but maybe it is insignificant to the big picture of speciation on the spaceball called earth.
(Logon Didonai)
