A common misconception in assessing computers is that parallel processing increases speed. This simply isn't true.
Parallel processing is intended to increase throughput by addressing queuing delays that may be experienced by "ready" units of work that are waiting for access to the processor. Each processor is essentially a hardware server for instructions to be processed. In modern computers there are actually multiple points of parallelism and overlap processing, but the primary point is to avoid delays.
Instruction processing relies on an architectural concept known as pipelining. It is like an assembly line: instructions are fetched from memory into the level one (L1) cache (see note 1), data operands are fetched, and the instructions are decoded, interpreted, and placed on a queue for actual execution.
In some designs the instruction sequence can be processed so that there is a small degree of parallelism when instructions are detected that don't conflict with each other. In those cases, instructions may be processed simultaneously (and potentially out of sequence) with the order of operations being preserved when the results of the operation are committed. This form of parallelism can occur without any programming effort and is used to improve throughput at the processor level.
There is also no throughput solution in creating longer instructions, because complex instructions must still perform the basic underlying operation, so there are no shortcuts in performing such functions. Complex instructions may be implemented by microcode or millicode, but the basic operation is still the same.
The only way an individual program can gain performance is if the software is written with the express purpose of dividing up its functions so that they run in parallel. The important distinction is that the speed hasn't increased; rather, we have an "effective" increase in the application's throughput from performing two or more operations at the same time.
By analogy, one can imagine traffic on a highway. Adding lanes increases the number of vehicles that can move simultaneously, but it cannot improve the speed of any individual vehicle except by removing obstacles or "competitors". If we imagine that an application is moving passengers, then we can envision a vehicle transporting one or more individuals and then returning to transport more. If we have more vehicles, then we can transfer more people simultaneously and gain a performance improvement. However, you cannot simply suggest something like a bus or train, because that would mean increasing the functionality of a single operation, and there is no corollary for that in computing.
While much has been made of parallelism as a way of achieving improved performance, the ability of applications to exploit parallel architectures can be a problem. Many computing problems simply don't lend themselves to parallelism, especially when operations within a program depend on results obtained from earlier operations. In those cases, it makes no sense to try to exploit parallelism, because the units of work must wait on other tasks to complete. In addition, many problems may have opportunities for one or even several parallel units of work, but this isn't nearly sufficient to capitalize on the processing power available. If a program can be rendered 100% parallel (which isn't likely), its performance will improve by at most a factor equal to the number of simultaneous tasks that can be executed. Therefore three (3) tasks would at best run the program three times as fast, a 200% improvement (see note 2). While this may sound significant, it is trivial when workloads are defined in tens of thousands of programs per hour. At present, excepting mathematically intensive problems (scientific programming), the most widely used applications that can exploit high degrees of parallelism are databases and those using graphics (see note 3).
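The limit described above follows from Amdahl's Law (note 2). As a rough sketch, assuming the affected fraction is sped up by running it on a number of equally fast processors, the arithmetic can be written in Python as follows. The function name `amdahl_speedup` and the sample fractions are illustrative assumptions, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Overall speed-up S = 1 / ((1 - x) + x / p) from Amdahl's Law.

    parallel_fraction (x): share of the run time that can be parallelized.
    processors (p): speed increase applied to that share, here assumed to
    equal the number of processors working on it.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical programs: fully parallel, mostly parallel, half parallel.
    for x in (1.0, 0.9, 0.5):
        for p in (3, 16, 1024):
            s = amdahl_speedup(x, p)
            print(f"x={x:.1f}  p={p:>4}  speed-up={s:6.2f}x")
```

Only the fully parallel case reaches a factor of three on three processors. With half of the program serial, the speed-up stays below two no matter how many processors are added, because the serial half still has to run at the original speed.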
While processor parallelism is already being heavily used to manage queuing for multiple units of work (multi-programming), the ability to apply it to individual applications has limited utility and will not likely change in any appreciable fashion in the future.
Another point often raised is that newer programming languages will resolve the issue of parallel computing, but this is also not likely to happen. In the first place, all programming languages must be resolved to actual machine language instruction streams. Therefore, whatever the programmer doesn't explicitly code must be generated by the language interpreter or compiler. What makes this approach less promising is that, despite the hype, higher level languages are never as efficient as their low level equivalents. While they are certainly easier for programmers and novices to use, they are significantly more resource intensive and rather heavy handed in the solutions they generate.
Parallel computing can certainly be a significant benefit for problems that can be programmed in that fashion. Unfortunately, much general computing doesn't lend itself to such techniques and consequently cannot derive much benefit from them.
1 This is a greatly simplified explanation of instruction processing and should be understood as a very rough approximation of what occurs.
2 Amdahl's Law
S = 1 / ((1 - x) + x/p)
S = Speed-up Factor
x = Fraction of Process Affected
p = Speed Increase of the Affected Fraction
3 Special effects, gaming, and simulators are prime candidates for parallel processing because of their need to perpetually calculate spatial coordinates.
Comments
Actually you're missing the point. It isn't semantics to point out that there is a huge difference between improvements in processor speed and improvements in an application's throughput. More importantly, "real computing workloads" are not primarily databases or graphics; they are transaction processing. Similarly, multiprogramming levels are not synonymous with application parallelism.
Most high level languages do NOT exploit parallel processing techniques, because parallelism is a function of the design and not of the language. Parallelism is achieved by the subsystem under whose control programs run and not the programming languages being used.
This is precisely where many data centers get themselves in trouble when determining capacity requirements: they equate processing speed with an application's throughput. To use a simple example, processor speed is often articulated in Millions of Instructions Per Second (MIPS). However, it would be foolish in the extreme to argue that a 1000 MIPS machine is the same as a machine with 4 x 250 MIPS processors.
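A minimal Python sketch makes the 1000 MIPS versus 4 x 250 MIPS comparison concrete. It assumes a deliberately idealized model: each task runs start to finish on one processor, task sizes are given in millions of instructions, and I/O, memory, and scheduling overhead are ignored. The function name `makespan_seconds` and the workloads are hypothetical.

```python
import heapq


def makespan_seconds(task_sizes_mi, processors, mips_each):
    """Time until the last task finishes, in seconds.

    task_sizes_mi: size of each task in millions of instructions.
    Tasks are not split; each goes to whichever processor frees up first.
    """
    # Each heap entry is the time at which one processor becomes free.
    free_at = [0.0] * processors
    heapq.heapify(free_at)
    finish = 0.0
    for size in task_sizes_mi:
        start = heapq.heappop(free_at)   # earliest available processor
        end = start + size / mips_each   # run the whole task on it
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


if __name__ == "__main__":
    one_task = [1000]        # a single serial task
    four_tasks = [1000] * 4  # four independent tasks
    for name, work in (("one task", one_task), ("four tasks", four_tasks)):
        fast = makespan_seconds(work, processors=1, mips_each=1000)
        wide = makespan_seconds(work, processors=4, mips_each=250)
        print(f"{name}: 1 x 1000 MIPS = {fast:.1f}s, 4 x 250 MIPS = {wide:.1f}s")
```

Under these assumptions the two machines only match when there are at least four independent tasks to keep every processor busy. A single serial task takes four times as long on the 4 x 250 MIPS machine, which is why aggregate MIPS alone says little about an individual application's throughput.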
Gerhard Adam | 10/14/09 | 19:12 PM
A true statement. But, which machine would be faster? Depends upon the task. And this is where you are missing the point, because faster depends upon the specific task to be performed, not on an arbitrary measure of computing speed. If I am able to have one processor waiting upon man-machine interface interactions while other processors are performing useful work, then the multiple-processor machine is apt to be faster.
"Parallelism is achieved by the subsystem under whose control programs run and not the programming languages being used. "
Not a true statement. Try looking at Ada, for instance. As a programming language, it does a fine job of task control.
"While processor parallelism is already being heavily used to manage queuing for multiple units of work (multi-programming), the ability to apply it to individual applications has limited utility and will not likely change in any appreciable fashion in the future."
Dead wrong, and an obvious "stuck in the past" mindset. Many commercial and aerospace programs already successfully apply multi-programming to achieve greater throughput, and greater productivity from their software teams.
Brian Salter (not verified) | 10/15/09 | 18:10 PM
You're looking only at specific solutions, of which I am well aware. I've said all along that if you wanted to customize a machine for a specific application, then by all means.
However, if you want to say that the throughput gains are subject to an "it depends" kind of scenario, I agree with that as well.
What remains, though, is that speed and parallelism are NOT the same things, and how they compare is very much dependent on the context. For a few processes, parallel programs should achieve the same performance as a faster machine, although this depends as much on the I/O profile and competition from other work as anything else. If the degree of parallelism increases significantly, however, then the work will run slower than on a correspondingly faster machine.
The point is that speed is independent of the workload, while parallelism is dependent on the workload (application). If the two happen to coincide then they may be comparable to each other.
Gerhard Adam | 10/15/09 | 20:24 PM
The way I explain it (this being my subfield) is that parallel computing isn't faster; instead, it lets you tackle larger problems.
And in practice this is true. A scientist doesn't suddenly do a 4-hour run in 1 hour; they either do four 1-hour runs (swarming, or exploring parameter space) or they do a 4-hour run that has 10x the resolution it had before. Even in desktop graphics, parallel graphics cards aren't used to run at more than the preferred 60 frames per second (fps) but to give higher detail and fidelity at the desired 60 fps.
Alex, next door at the Daytime Astronomer
Alex Antunes | 10/16/09 | 08:50 AM
Gerhard Adam | 10/16/09 | 12:00 PM
It simply isn't false, either. It simply isn't ANYTHING, because the situation isn't simple. But it's probably safe to say that your comment "the most widely used applications that can exploit high degrees of parallelism are databases and those using graphics" actually describes a majority of the real computing workload, and high level languages which exploit parallel processing techniques to perform those tasks do an incredible job of speeding up computation.
Parallel processing increases throughput, and that increases the amount of work performed in any given period of time. It is a silly exercise in semantics for this article to describe a program which performs, in the majority of situations, more work in less time as anything other than "faster".