Track your comments!
[x]


When you register, comments on your articles and replies to your comments appear here. Register Now!

Sign in to your account
[x]

Not a Scientific Blogging member yet?

Register Now for a Free Scientificblogging.com Account

  • Customize your profile with pictures, banner, a blogroll and more.
  • Leave comments on articles, add other members to your friend lists, chat with people on the site.
  • Write blog posts that can be seen by hundreds of thousands of readers.

It's free and it only takes a minute!

Already a Scientific Blogging member?

Sign In Now

Fake Banner
By Gerhard Adam | October 9th 2009 05:21 PM | 9 comments | Print | E-mail | Track Comments
A recent LiveScience article 'Computers Faster Only for 75 More Years' has indicated that new research conducted by two physicists have placed a speed limit on what's attainable regardless of the size of the components.  

Moore's Law1 has often been touted as representing an infinite curve of progress, but this explanation clearly indicates that nothing proceeds indefinitely.  In addition, depending on technological developments in computer design and architecture, that limit may actually occur within 20 years according to Scott Aaronson, an assistant professor of electrical engineering and computer science at MIT.

One of the problems in today's technology is the creation of heat before speed is achieved.  As a result many manufacturers have resorted to using duo and quad-core processors to improve performance.  However, one statement in the article bears some additional consideration.
"Hence the recent trend in duo and quad-core processing; rather than build faster processors, manufacturers place them in tandem to keep the heat levels tolerable while computing speeds shoot up."

This perception of computer speed is a common misunderstanding.  While there is an argument that can be made for parallel processing, the majority of applications in use do not employ parallel programming techniques and consequently can derive little or no benefit, individually, from duo or quad-core designs.  In other words, a sequence of instructions can not run across processors, so the "speed" improvements are due to allowing increased levels of multiprogramming, where more tasks can run simultaneously, rather than improving the speed of a single task.  "Speed" gains are achieved by reducing processor queuing and minimizing competition for access to the CPU.

In large systems this becomes more problematic because of the systems architecture is having to provide speed-matching capabilities for high level caches.  The cost of ensuring integrity and coherency between these cache memories creates a performance penalty whereby each additional processor decreases the total computing power available to applications.  As a result, there are limits in the degree of parallelism that can be exploited by applications as well.  

While some problems lend themselves well to parallel programming techniques, many do not and may present another barrier to performance improvements.  In particular, a key problem is data movement since few applications are dependent only on computing speed, the ability to hold large amounts of data becomes a significant factor in considering systems architectures.  This presents a new hurdle as operating systems need to support and control larger and larger amounts of memory2, newer algorithms and techniques that must be deployed to maintain scalability.

As new technologies and developments emerge we may see further increases in raw computing speed, but regardless of what occurs, we now know that it can't continue forever.  The unrestricted growth envisioned in many quarters has now been shown to be wrong.

"If we view the exponential growth of computation in its proper perspective as one example of the pervasiveness of the exponential growth of information based technology, that is, as one example of many of the law of accelerating returns, then we can confidently predict its continuation."
http://www.kurzweilai.net/articles/art0134.html?printable=1




1
Moore's Law states that manufacturers could double computing speed every two(2) years by using smaller and smaller components.
2 It seems clear that accessing external devices over an I/O bus is unlikely to ever catch up to the processing speeds available in the core, so data access must utilize on-board memory to allow high speed processors to run without disrupting the instruction pipeline waiting for data to arrive.

Comments

kerrjac's picture
Nice article and interesting pulled quotes.

Regarding the ending quote, the author could've just as easily referred to the law of diminishing returns, which would have made the opposite point. It raises an interesting issue of when - or, for that matter, if - the productive benefits derived from increases in computer speed will top off.

Gerhard Adam's picture
Interestingly enough, the primary issue with increased speed is two-fold.  The first is to allow higher levels of multi-programming so that numerous unrelated tasks can run on a single operating system.  Therefore the faster the processor (or the more CPUs it has), the more work can run simultaneously with minimal delays (and thereby achieve good response times).

The second is to provide larger bandwidths for data access, especially as far as databases are concerned.  Many database systems maintain several Gigabytes of RAM as an on-board cache to ensure rapid access and in high-end systems there are processors that allow a type of "memory-sharing" so that multiple systems in a configuration can gain several orders of magnitude improvement in data sharing over conventional disk access.

All of these implementations benefit from increases in speed, but more importantly it is the improvements in architecture that enable such gains.  Despite the fascination with processor speeds, the sticking point is data bandwidth which is hopelessly slow, hence the need for larger memory sizes (and their attendant scalability problems).


Speed of computing requires concise research. Every so often, people will run across unexplained research.  Research that is to be given the weight of authority must stand up to peer review, and has to crib from and cite sources accurately and correctly.  Furthermore, unexplained research also tends to not stand up to the scrutiny of the scientific method, the methodology for research which has been a cornerstone of Western Civilization for a long time.  (Results must be replicable EXACTLY, and the same results have to be arrived at EVERY TIME, otherwise it's an accident, period.)   Often the term unexplained research is used to explain paranormal activity and paranormal studies – ghosts and so forth.  For the most part, don't put payday loans or any weight into unexplained research– no matter what anyone tells you.

Gerhard Adam's picture
No it doesn't.  Research is required to exploit technologies, but there is no concise research required when the limitation is already part of established theory.  There's no research required to indicate that any technology depending on exceeding the speed of light cannot work.  Similarly, it is blatantly obvious that one cannot make computers smaller than the size of a single atom (or pick whatever particle you're entranced with).  The point is, that regardless of how much improvement may still be possible, there is a finite limit.
(Results must be replicable EXACTLY, and the same results have to be arrived at EVERY TIME, otherwise it's an accident, period.)

Your emphasis on "exactly" is incorrect.  Results must agree with predictions.  To suggest "exactness" implying error-free measurement is a fallacy.  This is especially true when the event being examined is subject to imprecise initial conditions.  In a computer system, it is impossible to replicate any activity "exactly" when it involves assessing the speed of any running process.  Since no unit of work runs in isolation (i.e. it depends on operating system, architecture, system services, etc.), then it is impossible to run the same unit of work twice in a row (on the same machine) and obtain identical results.

All too often, the focus is on the raw speed of a particular chip.  This is largely irrelevant in determining the actual speed with which a process will run.  There are significantly more factors involved of which the actual chip speed is only one piece.  Architecture is much more important to realize the throughput implied by the chip speed (which represents the upper bound).  No computer will ever run as fast as its rated speed, but architecture can help it get close.

What is far more important in context of communication speed is what is possible to make parallel and what is not.

the steps to balance a checking account for example requires you to add up all additions and subtractions. this work can be split up into as many segments as needed and 'fostered off' into whatever cpu's you have. the problem is that there is a basic amount of 'record keeping' that is needed per unit of work as well as communication overhead that is required for this. current CPU's / architectures / languages (with few exceptions) are heavily geared towards serial work. the other issue is that some jobs can not be realistically parallelized. how do you know which of two algorithms to use until the result of a previous algorithm is finished? especially if the result will be used in the subsequent work.
There are ways to 'cheat' the system. for example, one way to get a speed up is to do 'dumb' work. ie, if you are not sure which algorithm should be used once this particular piece of work is done....do them both! once you know which 'path' should be followed, throw away the previous work! this methods are well known but we are basically making one trade off for another. there are fundamental limitation to computational power. the limitation does not reside in the actual speed of the computation but the speed of communication between components. it does me little good to calculate a part of the work at point A if communicating the work to point B takes longer then it would have for point B to do the work itself.
Communication is the fundamental limiter, not computation speed per say.

Gerhard Adam's picture
Communication is certainly a limit, but parallelism is much more readily talked about than it is achieved.  In the first place, general programming isn't as easy as people think. 

More importantly, many problems cannot be parallelized because they are dependent on the results of previous operations, so there is no benefit in splitting the work off to run on separate processors. 
if you are not sure which algorithm should be used once this particular
piece of work is done....do them both! once you know which 'path'
should be followed, throw away the previous work!

These methods are clearly used already in things like Branch History Tables where conditional results need to be evaluated during instruction interpretation.  The downside to this is that it also becomes increasingly resource intensive since more time has to be spent in coordinating activities and providing more operations that are ultimately "thrown away".

This underlies the basic difficulty, because there are many ways to exploit systems for specific applications.  However, specific solutions don't necessarily create a good generic model, so unless one is prepared to build computers for each specific problem to be solved, one has to come up with a general computational model that is broadly applicable and scales reasonably well.

Isn't this exactly what i said? i simply pointed out that we can make computation faster by instead being someone speculative and being willing to throw away unneeded work. we are trading power for performance instead of clock cycles. there is only so much we can do here but it is still possible.

the fundamental limit here though is how much of a problem is linear by nature X, how much is parallel-able Y, how much time is spent on overhead of the parallel part Z, how much is overhead of the linear component T, and how much is overhead of communication between the parallel components Q, and how much is communication between the linear segment and the parallel segment W.

the fastest we can do the calculation S is

S = X+Y+(Y * Z)+(X * T) +(Q * Y) + (W * X).

If X is significantly larger then the other factors then we are linear limited, we can do speculative work on the parallel part and have many many many possible answers all waiting for when we finish X (effectively turning the other factors into 0) but we can not cut past that linear factor.

If instead the linear factor is small (which may or may not be the case) then we have to look at the map part (the overhead Z), the execution of the mapping (Y), and especially the time of communication of Q, imagine that we have an infinitely fast super duper calculation machine. it does ANY single operation instantly. any instantly. we still have to transmit the information from point A to point B and that is limited by the speed of light. so even though we can do the calculation instantly, and we can do the recombining (or marshaling) of information instantly, we are STILL fundamentally limited by how long the communication between point A and point B is.

the next steps are to figure out proper topology systems for mapping the problem such that the communication pathways are as short as possible, ie tuning the general case to a specific case for as fast as execution as possible. if you have spare cpu's thats one job they can do. if you do the job once it's of little use, if you do it multiple times (and what do we only do once? honestly) then it's a perfectly good use.

future software will focus on mapping YOUR software to the fastest version of your hardware possible. mapping the way your software runs, the way it processes, and the way you communicate information as fast as possible. real time, dynamic, and cached previous behavior prediction. all of this is where we are going. we simply lack the tools currently.

> "This underlies the basic difficulty, because there are many ways to exploit systems for specific applications. However, specific solutions don't necessarily create a good generic model, so unless one is prepared to build computers for each specific problem to be solved, one has to come up with a general computational model that is broadly applicable and scales reasonably well."

Actually, no one doesn't actually. FPGA's, GPGPU, etc etc etc. the goal is to map general system specifications (such as declarative programming schemes) onto specific types of hardware. we are literally talking about building custom hardware implemented software. you write a program in some Intermediate language (IL) and it is converted into a FPGA board system or a GPGPU implementation in real time, and eventually dynamically as code and data behavior changes.

The limitation remains communication and overhead, with mapping for implicit overhead reduction if you are not interpreting what data MEANS and instead DOING what the data is you reduce overhead. hard to describe but if you read the paper Synthesis: An Efficient Implementation of Fundamental Operating System Services (1992) Henry (Alexia) Massalin and focus on the 'self traversing' data structures you will get what i mean.

there are plenty of directions to go, not in making cpu's more powerful (well there are directions still there) but also doing things smarter. that is the future. not doing work faster. doing less work when possible, more when needed, and all of it the smarter way.

bubble sort vs quicksort is a lot bigger of a change then which cpu you are running on. the day of stupid software development is over. we need to step up soon.

There certainly are limitations of the current computing paradigm, but technology is always marching forward in different areas. There could be a wide range of advances that moves us beyond the current limitations. One is the new materials research for transistors, another is nanotechnology and the study of exotic particles. Prior to fiber optics there was a bigger limitation for data transmission, so back then you could have stated that we'll never get faster than X download speeds because of the limitation of copper wires.

Gerhard Adam's picture
You're missing the essential point.  The speed limit is determined by the smallest unit of time that can be processed for a single quantum event.  There is no technology that can circumvent that limitation.  In other words, this is not a technologically based limitation, but a physical one.

Add a comment

The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <sup> <sub> <a> <em> <strong> <center> <cite> <code> <TH><ul> <ol> <li> <dl> <dt> <dd> <img> <br> <p> <blockquote> <strike> <object> <param> <embed> <del> <pre> <b> <i> <table> <tbody> <div> <tr> <td> <h1> <h2> <h3> <h4> <h5> <h6> <hr> <iframe>
  • Lines and paragraphs break automatically.
  • Web page addresses and e-mail addresses turn into links automatically.
CAPTCHA
If you register, you will never be bothered to prove you are human again. And you get a real editor toolbar to use instead of this HTML thing that wards off spam bots.