New Paradigm of Software Programming?

 

By Clay Breshears (Intel) (63 posts) on November 9, 2006 at 5:50 pm

Sure, paradigm is one of those "business-speak" words that was popularized in the last decade, like the Macarena and Beanie Babies. It does have etymological roots that go way back, though. The roots of computing go way back, too. If you look up "computer" in dictionaries before the turn of the 20th century, it would be defined as a person who does computation. Whether calculating ballistic trajectories for the military, actuarial charts for insurance companies, or tide tables for sailors, computers were people working through a fixed set of formulas to derive their answers. Then, as now, computers were well suited to perform well-defined, repetitive calculations.

Computers have obviously changed over the last 100+ years. So have the methods used to program them. One of the first electronic computers, ENIAC, used patch cords and wires to direct stored data through the different components, tables, and computing engines. The realization that a computer's program could be treated and stored in the same way as data led to the idea and implementation of the "stored-program" computers. What seems common to us today was a major breakthrough back in 1948.

If programs could be stored in memory, like data, how would you encode the instructions of a program into the machine? At first, you used the machine language of the computer written and entered directly in binary. Then came assembly language, which is just a set of mnemonics used in place of the binary machine instructions. It is easier for a human programmer to understand the operation performed by an 'ADD' instruction than it is to remember that '01101110' will do the same thing. An assembler is the program that performs the well-defined and repetitive process of translating assembly instructions into machine language.

Engineers, physicists, chemists, and other scientists had always realized the advantages that computers brought to their research and work. However, assembly language must have seemed like the secret language of an underground brotherhood from the lost continent of Atlantis. Scientists work with mathematical formulas, and this led to the development of FORTRAN (FORmula TRANslation), the first high-level language. High-level languages require a compiler so that a computer can perform the well-defined and repetitive process of translating the (more) human-readable programs into machine language. Since FORTRAN, we've seen a plethora of high-level programming languages and techniques. Most recently, we've had object-oriented programming and managed run-time environments.

What does this history lesson have to do with anything? Well, each of these innovations in programming has been a paradigm shift. (There's another one of those '90s buzzwords.) From patch cords to binary machine instructions to assembly language to high-level languages to object-oriented programming. Now, with the advent of multi-core processors, the next paradigm shift in software is concurrent programming through threads.

Our bodies and brains do parallel processing all the time (heart beating, breathing, cogitating, walking, and chewing gum all at the same time). Thinking about things being done in parallel can still be pretty difficult, no matter how much we think we can multi-task. Yet, this is the skill that will be needed to succeed in this new programming paradigm. But, is it really all that new? No. Remember all those human computers that I mentioned? That was an example of parallel processing: each computer was assigned a portion of the whole job, each man worked at the same time as all the others, and the results were compiled together when complete. Sounds simple. Can it really be that easy to thread your applications?

(Parallel computations and high performance computing (HPC) have been used for many years now by engineers and scientists. In support of HPC, computer scientists have been developing their own paradigm shifts with parallel architectures, MPI, plus research into distributed and parallel algorithms. I'm not all that sure how much of this will be of use to programmers using threads, though.)

Multi-core processors are bringing parallel execution to the masses. So, while the paradigm of concurrent programming and parallel processing may not be new, it is going to be much more pervasive from here on out. Will you jump on this bandwagon and take advantage of dual- and quad-cores? A more relevant question might be, Do you really need to? Will the benefits outweigh the investment of time and effort to thread your codes? Just because everyone else in your office starts doing the Macarena during their lunch hours, that doesn't mean you need to start doing it, too, right?

--clay

Comments (7) 

By William on December 21st, 2006 at 7:54 pm

Will parallel processing be the result of parallel programming? IOW, does the person have to be knowledgeable about parallel programming in order to effect parallel processing, or will it be something the OS will automatically be doing in the background unbeknownst to the programmer?

With these quad-core processors, I could just continue programming the way I've always been doing (using threads when I need to, etc.) and the OS will automatically do the parallel processing for me. Right???

 

By Clay Breshears on December 21st, 2006 at 8:58 pm

William -

You are correct, sir. When you code with threads you are engaging in concurrent programming. The threaded code can be run on a single core processor and the OS will swap threads in and out to make it appear that things are executing in parallel.

If you have multiple cores, the OS can spread your threads among the cores to execute simultaneously (in parallel). I guess all of this could be considered "unbeknownst" to the user since there is no need to do more than begin execution of the application. Thus, if you know how to use threads (concurrent programming), you're all set to get parallel execution on quad-core processors.

Threads are a shared-memory programming paradigm. When you have a set of processors connected by a network (known as distributed-memory), you use a different model of programming to write code. In this model you need to explicitly coordinate the movement and sharing of data between processes. (With threads you only need to ensure mutual exclusion and that data to be shared is written before it is read.)
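A minimal sketch of that shared-memory model (in Python purely for illustration; the post doesn't assume any particular threading library): several threads update one shared variable, and a lock provides the mutual exclusion Clay mentions.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(increments):
    """Add to the shared counter; the lock ensures mutual exclusion."""
    global counter
    for _ in range(increments):
        with lock:  # only one thread may update counter at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000; without the lock, concurrent updates could be lost
```

On a multi-core machine the OS is free to run these threads on separate cores; on a single core it time-slices them, exactly as described above.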

--clay

 

By j2xs on February 5th, 2007 at 5:40 pm

For some types of applications, such as data-intensive processing (search, fraud detection & info surveillance, sort/group or otherwise analyze), the paradigm some use is dataflow or flow based programming (FBP).

This is not simply multi-threading, but rather thinking about your application in terms of the data flowing at high speeds through a directed graph -- each node in the graph is a very simple 'operator' on that data, or maybe even itself a complex sub-graph.
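A toy illustration of the idea (a hypothetical Python sketch, not DataRush itself): each node is a small operator that consumes and produces a stream, and the "graph" here is just their composition.

```python
def source(n):
    """Node 1: emit a stream of integers."""
    for i in range(n):
        yield i

def square(stream):
    """Node 2: a simple operator applied to each item flowing past."""
    for x in stream:
        yield x * x

def total(stream):
    """Node 3: a sink that reduces the stream to one value."""
    return sum(stream)

# Wire the nodes into a linear flow: source -> square -> total.
result = total(square(source(10)))
print(result)  # 285
```

A real FBP runtime would run each node concurrently and fan streams out across a full directed graph; this linear pipeline only shows the programming model.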

Here's a free Beta of the DataRush framework -- Yes, it's Java, but the pipeline, horizontal, and vertical parallelism injected into the architecture provide performance gains that far exceed any JVM overhead.

http://www.pervasivedatarush.com

 

By buckminster on February 7th, 2007 at 8:05 am

Several years ago I was involved in a couple of engineering projects that developed EDA software that could exploit parallel processing. The goal was to reduce the turnaround time for critical path design tasks like IC design rule checks and pattern generation. Since the design of an IC was stored as different layers in the commonly used IC CAD database files, the obvious approach was to assign each layer to a different processor and run the task in parallel. This coarse-grain approach was easy to implement since we didn't have to rework the basic algorithms. We just tweaked the job control to fan out each layer to a different processor.

The results were successful enough that a commercial version of the PG and ebeam CAD software was sold on dedicated hardware with eight CPU's. This hardware was similar to the blade servers used for running web applications in a parallel fashion.

In doing the above we did rediscover Amdahl's law of parallel execution. Computer design wizard Gene Amdahl was developing supercomputers with multiple processors when he realized that the serial portion of any program was the limiting factor, no matter how many processors you threw at the task. For example, assume the serial portion of some task was 50%. Now even if some really deep magic could reduce the time to run the parallel portion of the job to zero, the time required to execute the task can only improve by a factor of two. And any time reduction must include the additional overhead required to parcel the data out to each CPU or the extra time required to merge the results at the end of the job. Unfortunately, Amdahl's constraints on parallel processing are applicable even with the use of multi-threading techniques.
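Amdahl's bound is easy to state as a formula: with serial fraction s and n processors, speedup = 1 / (s + (1 - s)/n). A quick sketch (Python, for illustration) of the 50% example:

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Maximum speedup predicted by Amdahl's law (overheads ignored)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# With a 50% serial portion, even a million processors can't reach 2x.
print(amdahl_speedup(0.5, 2))          # 1.33x on two processors
print(amdahl_speedup(0.5, 1_000_000))  # approaches, but never reaches, 2.0
```

Note the formula ignores the parceling-out and merge overheads mentioned above, so real speedups land below even this bound.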

Some tasks, like database queries, at best are able to achieve a linear improvement using parallel execution. This was demonstrated by database software developed at Tandem Computers more than a decade ago. Search engines are another example where multiprocessors can be used to reduce the time required to process queries.

Our experience suggested that most applications won't see a big time reduction unless the algorithm developer is able to find a way to change his code to use fine-grain parallelism without incurring significant overhead. Another approach that was tried with some success was to have compilers look for code that can execute on separate processors much like they do when executing on the floating point unit in parallel with the main CPU. Parallel versions of run time libraries were also tried as yet another way to get a task to execute in parallel without serious algorithm hacks. The RTL approach had been used successfully on systems with floating point array hardware used for vector processing.

In the current trade literature, there are claims that new system architectures have repudiated Amdahl's law. Doing that simply requires a computer with the ability to make time run in reverse. So until entropy runs backward and the laws of causality are repealed, I remain extremely skeptical of these claims. As Richard Feynman pointed out:

"For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled."

buck

 

By Olav on February 8th, 2007 at 11:13 am

For problems where Amdahl's law is a huge limiting factor (like the 50% in your example), it's probably not worth adding multithreading support.

However, saying that "most" applications can't benefit from multi-core CPUs is wildly off the mark. Why? Batch processing. Any application that consists of mostly unparallelizable code, but where higher-level jobs/transactions can be run in parallel, has a good chance of getting linear increases in throughput on multicore systems. A lot of the problems that have issues with Amdahl's law can get linear throughput speedups this way. The individual transactions will not run faster, but that isn't always a showstopper.
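A minimal sketch of that batch-level parallelism (Python, with a hypothetical transaction workload; for CPU-bound work outside a managed runtime you'd use processes or native threads rather than Python's GIL-bound threads): each transaction runs serially on its own, but independent transactions are dispatched in parallel, so throughput scales even though latency per transaction does not.

```python
from concurrent.futures import ThreadPoolExecutor

def process_transaction(size):
    """One transaction: internally serial, but independent of the others."""
    return sum(i * i for i in range(size))

jobs = [10_000] * 8  # eight independent transactions in the batch

# Dispatch independent transactions across a pool of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_transaction, jobs))

print(len(results))  # 8: every transaction completed
```

No transaction finishes any sooner than it would alone; the win is that the batch as a whole drains roughly in proportion to the number of workers.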

It's also worth mentioning that there are many important application areas where a lot of problems are trivial to run in a parallel manner, such as signal processing, rendering, build systems, and some forms of simulation. Most commercial software in these areas supports multithreading and benefits from multi-core CPUs (or other parallel hardware).

However, arguing about the real-life impact of Amdahl's law is not that interesting; obviously some applications benefit more than others from a parallel approach. In the end it just boils down to exploiting the opportunities for software parallelism that give decent returns. They are there, they apply to many important problems, and they are certainly worth implementing even though you "just" get linear speedups (most people think a 4x speedup is huge in real life, even though it may be insignificant when looking at an exponential algorithm).

I imagine programmers will mostly use a coarse grained multithreading approach because this is easiest to do correctly. Over time I'm sure standard libraries will give applications automatic multithreading support for more and more problems.

I don't see fine-grained parallelism ever becoming the norm, though. At least not beyond what can be automated in a compiler. The potential for creating deadlocks is just too great when you start spreading synchronization primitives around randomly in your code, and the proofs required to guarantee correctness are far beyond your average programmer.
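The classic hazard here is two threads grabbing the same pair of locks in opposite orders. One standard discipline that sidesteps it, sketched in a small hypothetical Python example (the account names and amounts are made up): always acquire locks in a single global order.

```python
import threading

# Two shared accounts, each guarded by its own lock.
accounts = {"a": 100, "b": 100}
locks = {name: threading.Lock() for name in accounts}

def transfer(src, dst, amount):
    # Acquire locks in a fixed global order (here, by account name) so two
    # concurrent transfers can never each hold the lock the other needs.
    first, second = sorted((src, dst))
    with locks[first], locks[second]:
        accounts[src] -= amount
        accounts[dst] += amount

# Opposite-direction transfers: deadlock-prone without the ordering rule.
t1 = threading.Thread(target=transfer, args=("a", "b", 10))
t2 = threading.Thread(target=transfer, args=("b", "a", 10))
t1.start(); t2.start(); t1.join(); t2.join()

print(accounts["a"] + accounts["b"])  # 200: money is conserved
```

Even this tiny case needs a convention enforced by every thread in the program, which is exactly why scattering primitives around by hand scales so poorly.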

 

By Clay Breshears on February 8th, 2007 at 2:35 pm

j2xs -

I'm familiar with data flow, or flow-based, programming. It was a hot research topic about 10-15 years ago. While the idea is a good fit for parallel execution, I'm not sure if the execution model is going to catch on with the cache-based architectures that we have today. That is, if the next operation that can be executed is assigned to the next available core, the operands will need to traverse to that core. Data flow on multiple cores will need a much flatter memory model or some other specialized architecture to remove the latency of shifting operands around the system. Anyone heard of the MTA from Tera?

Intel processors already carry a simple form of data flow in out-of-order execution of instructions. I guess you might consider Hyper-Threading Technology another example of a form of data flow, too.

 

By Clay Breshears on February 8th, 2007 at 3:06 pm

I must say I would be extremely skeptical of (and even willing to bet a large sum of money against) claims on the repeal of Amdahl's Law. Even if I have an infinite number of processors that can execute the concurrent sections of my code in no time at all, I'm still left with the serial portions of the application needing to be run.

If I've only got 50% of my code that can be run in parallel, should I not even try to thread my code? It really depends on how much effort it is going to take. If I can put in a couple of OpenMP pragmas and recompile my code, then it probably is worth it. Another factor to consider is the scalability of the parallel sections. Will it run on 4, 8, and 16 cores and continue to gain some performance? It's all a trade-off: getting the most "bang" for the buck.
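OpenMP pragmas belong to C/C++ and Fortran, but the loop-splitting idea behind a parallel-for can be sketched in Python (purely illustrative): divide a loop's iterations among workers and combine the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Each worker handles its own share of the loop's iterations."""
    return sum(x * x for x in chunk)

data = list(range(100_000))
n_workers = 4
# Deal the iterations out round-robin, as a parallel-for schedule would.
chunks = [data[i::n_workers] for i in range(n_workers)]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    total = sum(pool.map(partial_sum, chunks))  # reduce the partial results

print(total == sum(x * x for x in data))  # True: same answer as the serial loop
```

The scalability question above is visible right in this sketch: the split and the final reduction are overhead that grows with the worker count, so whether 8 or 16 cores still pay off depends on how much work each chunk carries.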

In that respect, I agree with Olav that fine-grained threading is likely not going to be the most cost-effective way to pursue parallelism. (Of course, if I can parallelize 98% of my code that way in 2 months, it may well be worth it.) Coarse-grained approaches (e.g., thread at the level of groups of pictures rather than within a single video frame) are going to be the way to go for the time being. Tools, libraries, and compilers will make concurrent programming much easier, so we may see a shift in the near future. Besides cutting out the human effort, we'll need to see reductions in the overheads that multiple execution streams will entail.

Now that multi-core processors are the norm and developers are needing to confront the art and science of concurrent programming, we'll see more research effort devoted to making the process as easy as possible. We're at the "assembly language" stage, and I can see glimpses of high-level approaches on the horizon.
