Simultaneous multithreading

From Academic Kids

Simultaneous multithreading, often referred to as SMT, is a technique for improving the overall efficiency of the hardware that executes instructions in a computer. This hardware is typically called the CPU. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.

Conventional multithreading operating systems allow multiple processes and threads to use the processor one at a time, giving one thread exclusive ownership of it for a time slice on the order of milliseconds. Quite often, a thread will stall for hundreds of cycles while waiting for some external resource (for example, a load from RAM), thus wasting processor time.

A further improvement is super-threading, in which the processor can execute instructions from a different thread each cycle. Cycles left unused by one thread can thus be used by another thread that is ready to run.

Still, a given thread is almost surely not utilizing all the multiple execution units of a modern processor at the same time. Simultaneous multithreading allows multiple threads to execute different instructions in the same clock cycle, using the execution units that the first thread left spare. This is done without great changes to the basic processor architecture: the main additions needed are the ability to fetch instructions from multiple threads in a cycle, and a larger register file to hold data from multiple threads. The number of concurrent threads can be decided by the chip designers, but practical restrictions on chip complexity usually limit the number to 2, 4 or sometimes 8 concurrent threads.
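The difference between the three schemes described above can be illustrated with a toy model. The following Python sketch is only an illustration under stated assumptions, not a description of any real processor: the function name simulate, the 4-wide issue width, and the two instruction traces are all hypothetical. Each trace entry is the number of instructions a thread has ready to issue together, and a 0 stands for one cycle spent waiting (for example, on memory).

```python
from collections import deque

# Hypothetical traces for two threads (assumed for illustration only).
# n > 0: n instructions ready to issue together; 0: one cycle of waiting.
TRACES = [
    [2, 0, 0, 2, 1],
    [1, 2, 0, 0, 3],
]


def simulate(traces, policy, width=4):
    """Return (cycles, instructions_issued) for a toy core with `width` issue slots.

    policy:
      "exclusive" - one thread owns the core until its trace is finished
      "super"     - at most one thread issues per cycle, chosen among ready threads
      "smt"       - several ready threads may share the issue slots of one cycle
    """
    threads = [deque(t) for t in traces]
    cycles = issued = 0
    while any(threads):
        cycles += 1
        live = [i for i, t in enumerate(threads) if t]
        if policy == "exclusive":
            live = live[:1]  # lowest-numbered unfinished thread owns the core
        ready = []
        for i in live:
            if threads[i][0] == 0:
                threads[i].popleft()  # this thread spends the cycle waiting
            else:
                ready.append(i)
        if ready:
            k = cycles % len(ready)  # rotate the starting thread for fairness
            ready = ready[k:] + ready[:k]
        if policy != "smt":
            ready = ready[:1]  # only one thread may issue in this cycle
        free = width
        for i in ready:
            n = min(threads[i][0], free)
            threads[i][0] -= n
            if threads[i][0] == 0:
                threads[i].popleft()  # the whole group has been issued
            free -= n
            issued += n
            if free == 0:
                break
    return cycles, issued


if __name__ == "__main__":
    for policy in ("exclusive", "super", "smt"):
        cycles, issued = simulate(TRACES, policy)
        print(f"{policy:9s} cycles={cycles:2d} issued={issued} "
              f"slot utilization={issued / (cycles * 4):.2f}")
```

With these particular traces the model finishes the same 11 instructions in 10 cycles under exclusive ownership, 7 cycles with super-threading, and 5 cycles with SMT. In the SMT run there is still one cycle in which both threads are waiting and nothing is issued, so sharing the slots does not shorten any individual wait.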

This technique dates to the 1950s.

The Denelcor HEP is notable, as is the Stellar GS-1000, which may mark the point at which field installations of commercial SMT machines first reached the hundreds.

Notable early machines from the 1950s are the Bull Gamma 60 and the Honeywell 800.

Every decade seems to rediscover the technique and put a new spin on it. Although it is primarily a throughput enhancement technique, it has constantly been touted as "hiding latency", while not changing anything about the latency of any operation.
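The distinction between throughput and latency can be made concrete with a back-of-envelope model. The sketch below assumes an idealized core on which every thread alternates between busy cycles of work and stall cycles of waiting, with no contention between threads; the function name utilization and the numbers used are illustrative assumptions, not measurements.

```python
def utilization(threads, busy, stall):
    """Fraction of cycles an idealized core spends doing useful work.

    Assumes each thread repeats `busy` cycles of work followed by `stall`
    cycles of waiting, and that threads never compete for shared resources.
    """
    return min(1.0, threads * busy / (busy + stall))


if __name__ == "__main__":
    BUSY, STALL = 20, 100  # assumed cycle counts, chosen only for illustration
    for n in (1, 2, 4, 8):
        # The wait itself is STALL cycles no matter how many threads run;
        # only the share of cycles doing useful work changes.
        print(f"threads={n} utilization={utilization(n, BUSY, STALL):.2f} "
              f"wait per operation={STALL} cycles")
```

In this model the utilization rises with the number of threads until the core is saturated, while the wait experienced by each operation stays at the same number of cycles throughout.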

Since the technique is really an efficiency solution, and there is inevitable increased conflict on shared resources, measuring or agreeing on the "goodness" of the solution can be difficult. Some researchers have shown that the extra threads can be used to proactively seed a shared resource like a cache, to improve the performance of another single thread, and claim this shows that SMT is not just an efficiency solution. Others use SMT to provide redundant computation, for some level of error detection and recovery.

In most current cases, however, SMT is about efficiency: increased throughput of computation per amount of hardware used.

Efficiency is not the only goal of computers, however. Any engineer can appreciate this while staring from his cubicle at a parking lot of cars sitting idle from 9 to 5.


Commercial implementations

The Intel Pentium 4 was the first modern desktop processor to implement simultaneous multithreading, starting with the 3.06 GHz model; Intel has since introduced the feature into a number of its processors. Intel calls the technology hyper-threading (HT); it is a basic two-thread SMT engine. Intel claims up to a 30% speed improvement over an otherwise identical, non-SMT Pentium 4. The improvement actually seen is very application dependent, however, and some programs slow down slightly when HT is turned on. This happens because such programs have only one thread available to fetch from at any given time, while the pipeline is longer because of the processor changes needed to support SMT.

The DEC Alpha EV8 was to be equipped with an even more powerful 4-thread SMT engine, but Compaq, which by then owned DEC, terminated the project before it could be commercialized. This technology may eventually find its way into Tukwila.

The latest MIPS architecture designs include a two-thread SMT system known as MIPS MT.

The IBM POWER5, announced in May 2004, is a dual-core processor in which each core includes a two-thread SMT engine. IBM's implementation is more sophisticated than the earlier ones: it can assign different priorities to the threads, and the SMT engine can be turned on and off dynamically, to better execute workloads for which an SMT processor would not increase performance.

Sun Microsystems' forthcoming Niagara (~2005) and Rock (~2007) processors are implementations of SPARC focused almost entirely on exploiting SMT and CMP techniques. Sun refers to these combined approaches as "CMT", and the overall concept as "Throughput Computing".

External links


LE Shar and ES Davidson, "A Multiminiprocessor System Implemented through Pipelining", Computer Feb 1974

