Although the Itanium is capable of sustaining a theoretical maximum of 6 instructions and executing up to 11 instructions, and despite its massive register set, it uses fewer transistors for its core than all competitors. The main disadvantage is that it needs much more cache and instruction fetch width, but the disadvantage of needing more cache diminish as process technology gets better (smaller). To improve performance, the Itanium needs much bigger caches than its competitors, but this adds very little to the overall power consumption. As superscalar RISCs in x86 competitors increase their instruction execution width, they need to upgrade the Out-Of-Order buffers and more importantly, increase the complexity of the schedulers. This leads to a much higher complexity and power consumption.
As the focus shifts to Thread Level Parallellism, the Itanium's small cores make it easier to use more cores without increasing the power consumption too much. Montecito will be the living proof of this. The Itanium is also wider than the competition, which results in bigger benefits from threading techniques.
While Itanium may not be very popular in the hardware enthusiast community, it is definitely an architecture that, from an academic and technical point of view, deserves a lot more attention. We'll delve deeper in upcoming articles.