Monday, April 16, 2007

Core 2 Duo -- The Embarrassing Secrets

Although Core 2 Duo has been impressive since its introduction last year, a veil of secrecy has remained in place which has prevented a true understanding of the chip's capabilities. This has been reminiscent of The Wizard Of Oz with analysts and enthusiasts insisting we ignore what's behind the curtain. However, we can now see that some of C2D's prowess is just as imaginary as the giant flaming wizard.

The two things that Intel would rather you not know about Core 2 Duo are that it has been tweaked for benchmarks rather than for real code, and that at 2.93 GHz it is exceeding its thermal limits on the 65nm process. I'm sure both of these things will come as a surprise to many, but the evidence is at Tom's Hardware Guide, Xbitlabs, and Anandtech; although the information is very clear, no one has previously called attention to it. Core 2 Duo roughly doubles the SSE performance of K8, Core Duo, and P4D. This is no minor accomplishment and Intel deserves every bit of credit for it. For SSE-intensive applications, C2D is a grand slam home run. However, the great majority of consumer applications depend more on integer performance than on floating point performance, and this is where the smoke and mirrors have been in full force. There is no doubt that Core 2 Duo is faster than K8 at the same clock. The problem has been in finding out how much faster. Estimates have ranged from 5% to 40%, and unfortunately most of the hardware review sites have shown no desire to narrow that range.

Determining performance requires benchmarks, but today's benchmarks have been created haphazardly without any kind of standards or quality control. The fact that a benchmark claims to be a benchmark does not mean that it measures anything useful. When phrenology was believed to be a genuine science, one enterprising individual patented a device that automatically measured bumps on the head and produced a score that purportedly showed the person's mental characteristics. Unfortunately, many of today's benchmarks have the same lack of utility. When benchmark code is too small or is run in an environment far simpler than a real application environment, we get an artificial sensitivity to cache size. This is particularly true of a shared cache like C2D's. Under real conditions the use of the cache by both cores tends to fragment and divide up the L2, which limits how much gain each core gets. Yet typical testing by review sites carefully runs benchmark code on only one core, without the other common threads that would be running in the background. This makes such tests more of a theoretical upper limit than something that is actually attainable. K8, by contrast, has split caches that are immune to cross-core interference, so K8 should perform better under real conditions than typical testing would indicate. The routine testing that review sites do is a bit like testing gas mileage while driving downhill or testing the air conditioning on a 70°F day. Obviously, real driving is not always downhill and the air conditioning is more likely to run on an 85°F day.
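To make this concrete, here is a minimal sketch, not taken from any of the cited reviews, of the kind of test that would expose shared cache contention: time a cache-sensitive task on an otherwise idle machine, then time it again while a second process keeps the other core and the shared L2 busy. It assumes Python; the workload and buffer sizes are arbitrary placeholders rather than a calibrated benchmark, but the idle run corresponds to the "downhill" test and the loaded run to something closer to real conditions.

import array
import multiprocessing as mp
import time

def cache_heavy_task(size=2_000_000, passes=20):
    """Repeatedly walk a buffer large enough to stress the L2 cache."""
    data = array.array('l', range(size))
    total = 0
    for _ in range(passes):
        total += sum(data)
    return total

def background_load(stop_event, size=2_000_000):
    """Keep the second core (and the shared cache) busy until told to stop."""
    data = array.array('l', range(size))
    while not stop_event.is_set():
        _ = sum(data)

def timed_run(with_background=False):
    stop = mp.Event()
    worker = None
    if with_background:
        worker = mp.Process(target=background_load, args=(stop,))
        worker.start()
    start = time.perf_counter()
    cache_heavy_task()
    elapsed = time.perf_counter() - start
    if worker:
        stop.set()
        worker.join()
    return elapsed

if __name__ == '__main__':
    idle = timed_run(with_background=False)    # the typical review-site test
    loaded = timed_run(with_background=True)   # closer to real conditions
    print(f"idle core:   {idle:.2f}s")
    print(f"busy system: {loaded:.2f}s  ({loaded / idle:.2f}x slower)")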

Although C2D has always done well in the past with this type of artificial testing, more recent tests of the Celeron 400 versions of C2D with 512K of L2 cache give a much more realistic view of the processor's capabilities. Dailytech hides this fact as well as it can by doing very limited testing of the Celeron 440 in its Conroe-L (Celeron 400) article, but the large drops in performance can still be seen. Likewise, Xbitlabs doesn't make this process any easier when it puts the comparison with a stock Celeron 440 on page 3 of its Conroe-L review and the comparison with the Conroe E4300 on page 4. The charts are pictures so they can't be copied, and the relevant information is on two separate pages, so it is necessary to transcribe both charts to find out what is going on. In terms of scaling, Celeron 440 is good, with 94% scaling after a 50% clock increase. However, the comparison between the 2.0GHz Celeron 440 and the 1.8GHz E4300 is not so good. With a 10% greater clock speed, the lower cache C2D is actually 36% slower, a per-clock difference of roughly 42%. The tricky part is trying to figure out how much of this is due to cache and how much is due to dual core. Unfortunately, none of the review sites make any attempt to isolate the two. We can plainly see that for a very small number of benchmarks like Company of Heroes and Zip compression the cache makes a huge difference and artificially boosts the speed by more than 25%. For Fritz, POV-Ray, and Cinebench the boost is at least 10%. However, for most benchmarks the boost is probably about 5%. Still, considering that C2D is typically only 20% faster than K8, we would have to strongly question the speed suggested by these benchmark scores. It is unfortunate that review site testing commonly avoids real world conditions. Under real conditions, C2D probably has closer to 10% greater IPC with integer operations. However, the SSE performance should still be much higher.
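For anyone who wants to check the arithmetic, one way to reconcile the 10%, 36%, and 42% figures is to compare per-clock throughput. A quick sketch in Python; the 0.64 figure simply restates "36% slower" and stands in for the actual transcribed scores:

celeron_clock, e4300_clock = 2.0, 1.8             # GHz
clock_advantage = celeron_clock / e4300_clock     # ~1.11, i.e. roughly 10% faster clock
relative_score = 0.64                             # Celeron 440 roughly 36% slower overall
per_clock = relative_score / clock_advantage      # ~0.58 of the E4300's per-clock performance
print(f"per-clock deficit: {1 - per_clock:.0%}")  # ~42%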

The rumors last year before C2D was released were that Intel would release a 3.2GHz model before the end of the year. This didn't happen, and it now appears that it won't happen this year either. While some try to explain away the lack of higher clocks, they nevertheless insist that Intel could release higher clocks if it wanted to. The evidence for this is the supposed overclockability routinely reported by nearly every review site. It is now clear that this perception is wrong and that even at the stock 2.93GHz, X6800 is exceeding the factory's temperature limits. The factory limits and proper thermal testing procedures for C2D are spelled out quite nicely in the Core 2 Duo Temperature Guide. We will use this excellent reference to analyze Anandtech's testing of the Thermalright 120. The first important issue is what to run to thermally load the CPU. Anandtech simply looped the "Far Cry River demo for 30 minutes".

However, the Guide says that Intel "provides a test program, Thermal Analysis Tool (TAT), to simulate 100% Loads. Some users may not be aware that Prime95, Orthos, Everest and assorted others, may simulate loads which are intermittent, or less than TAT. These are ideal for stress testing CPU, memory and system stability over time, but aren't designed for testing the limits of CPU cooling efficiency." Since we know that the Anandtech testing did not reach maximum, we have to allow a greater margin when reviewing the temperature results. The Guide says that "Orthos Priority 9 Small FFT’s simulates 88% of TAT ~ 5c lower." Since the Far Cry demo will not load the processor even as much as Orthos, we'll allow an extra 2c for a margin of 7c altogether. Next we need to know what the maximum temperature can be.

According to the Guide, "Thermal Case Temperatures of 60c is hot, 55c is warm, and 50c is safe. Tcase Load should not exceed ~ 55c with TAT @ 100% Load." So 55c is the max, and since we are allowing a 7c margin for the less than 100% thermal loading, the maximum allowable temperature would be 48c. The second chart, Loaded CPU Temperature, lists the resulting temperatures. We note that the temperature of the X6800 at 2.93GHz with the stock HSF (heatsink and fan) is a shocking 56c, or 8c over maximum. We can see that even a Thermalright MST-6775 is inadequate. From these temperatures we can say that X6800 is not truly an X/EE/FX class chip. It is really a Special Edition chip since it requires something better than stock cooling just to run at its rated clock speed. This finally explains why Intel has not released anything faster. If the thermal limits can be exceeded with the stock HSF at stock speeds, then anything faster would be even riskier. Clearly, Intel is not willing to take that risk and repeat the 1.13GHz PIII fiasco. This explains why Intel is waiting until it has a suitable 45nm Penryn to increase clocks again. Presumably, with reduced power draw, Penryn could stay inside the factory thermal limits.
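The temperature bookkeeping, spelled out as a small sketch; note that the 7c allowance is this article's own estimate for a game loop versus TAT, not an Intel figure:

tcase_max_tat  = 55   # c, the Guide's Tcase limit with TAT at 100% load
load_margin    = 7    # c, allowance because the Far Cry loop loads the CPU less than TAT
adjusted_max   = tcase_max_tat - load_margin   # 48c
measured_x6800 = 56   # c, Anandtech's result with the stock HSF at 2.93GHz
print(f"allowable {adjusted_max}c, measured {measured_x6800}c, "
      f"over by {measured_x6800 - adjusted_max}c")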

We've now seen that Intel's best processor is somewhat less impressive than it has been portrayed. However, we still have the question of Intel's stance in the market. One possible way to judge is to look at Intel's past boom times. The first boom time starts with the introduction of the ATX motherboard standard in 1995 and the 440 series chipsets that followed. Intel made great strides from this point, taking away both brand value and money from Compaq and IBM. This boom time continued until Intel's PIII was surpassed by K7 in 1999. The second boom period began in early 2002 when Intel released the Northwood P4. This second boom time ended in early 2004 as K8 came up strong while Intel, because of power draw, was only able to release Prescott as a Celeron processor. The third boom time obviously began in the fourth quarter of 2006 after Intel's release of Core 2 Duo. The first two suggest that these boom times are getting shorter. If this is true and the third boom is shorter still, we may have something like:

1st boom time - 4 years.
2nd boom time - 2 years.
3rd boom time - 1 year ?

If this trend actually occurs then this would mean that Intel's third boom would end in the fourth quarter of this year. It seems this could be possible if AMD's K10 is competitive and its DTX form factor and 600 series chipsets are popular.

Friday, April 13, 2007

Intel's Chipsets -- The Roots Of Monopoly

I've been surprised to see so many analysts and board posters criticize AMD's purchase of ATI. Every time I've read this I've wondered what other option they thought would have been better, but not one article by any of these supposedly knowledgeable analysts has included any real alternative. My eventual conclusion was that most people, analysts included, don't understand the historical or current importance of chipsets. Intel's chipsets have everything to do with its current position as both a horizontal and nearly vertical monopoly.

Before 1995, Compaq and IBM were the top PC vendors. These two were essentially the Cadillacs of the PC business, with models commanding higher prices but being seen as higher quality. This quality wasn't hard to see when comparing a Compaq with a Packard Bell. Compaq and IBM had had a technological advantage because they were capable of designing their own motherboards and putting together their own chipsets. However, this advantage was lost in 1995. Intel had been dabbling with chipsets as far back as the 80486 and had gotten more serious about this with the Pentium. But this all came together for Intel starting in 1995 with its new ATX motherboard standard and the 440 series chipsets that followed. In one fell swoop Intel had leveled the playing field between Compaq, IBM, and other vendors. By 1997, ATX and the 440/450 chipsets had become firmly established and IBM and Compaq's position had eroded. Just four years later, the talk was about Compaq being bought by another company, and the following year it was acquired by HP. IBM had deeper pockets so it held out longer, but it is difficult to overstate the significance of the fact that the company that invented the PC; the XT, Baby AT, and AT motherboard standards; PS/2 ports; and VGA graphics finally had to divest of its own PC line.

The significant point is that when Intel became a genuine force as a chipset and motherboard supplier it took away both brand value and money from Compaq and IBM. The brand value and money shifted, naturally, to Intel. Intel deserved this success because ATX was what vendors needed. The ATX standard moved the processor out from under the expansion cards, where taller heatsinks and fans were becoming a problem. ATX replaced the XT, Baby AT, and AT standards which had been created by IBM, but it stayed close enough to Baby AT that existing cases didn't require much modification. ATX has since evolved into smaller form factors like mini-ATX, micro-ATX, and flex-ATX.

So it was something of a shock when Intel departed significantly from the idea of giving vendors what they needed and tried to force the four incompatible BTX standards (BTX, micro, nano, and pico BTX) back in 2004. According to Intel, BTX was supposed to have better cooling, and the two smallest sizes were supposed to reduce costs. However, BTX was not readily compatible with AMD's K8 processor because the northbridge was sitting where AMD needed the processor to go. Also, the cost savings were mostly unrealized because BTX had little compatibility with ATX. Whereas transition cases were made that fit both ATX and Baby AT motherboards, this was not possible for BTX. ATX is 305mm wide but BTX is narrower at 266mm. This made nano-BTX at 223 x 266mm about the same size as the existing micro-ATX at 244 x 244mm but a different width. This has led more than one person to suggest that BTX was both Intel's quick fix for the thermal problems of Pentium 4D and an obvious attempt to disenfranchise AMD. Whatever Intel's reasoning (or scheming) may have been, the BTX strategy has essentially failed and ATX remains the most popular standard. Even Intel has finally come to this realization and has halted any further development of BTX just three years after its introduction.

Intel's failure with BTX to provide vendors what they needed left a vacuum into which AMD has recently introduced the DTX and mini-DTX standards. These standards are compatible with existing standards and therefore allow leveraging of experience with existing motherboards. Specifically, the width of the DTX motherboard is the same as micro-ATX at 244mm. DTX is like a shorter version of micro-ATX, so micro-ATX/DTX transition cases should be fairly easy. However, the significant difference is that only two ATX (or micro-ATX) motherboards can be cut from one standard PCB (printed circuit board) panel, while four DTX motherboards can be cut from the same panel. The smaller mini-DTX standard allows six motherboards from one panel. DTX motherboards can also be made with as few as four layers. This provides an immediate and significant cost reduction for DTX motherboards with only a small transition cost from existing micro-ATX motherboards and cases. In contrast, the three largest BTX standards only allow two motherboards from one panel, while the smallest of the four BTX standards, pico-BTX, allows four. VIA's mini-ITX does allow six motherboards to be cut from one panel, but this standard is only used by VIA. It appears that AMD was giving a nod to VIA by making the width of mini-DTX the same 170mm as mini-ITX. This should mean that both micro-ATX/DTX and mini-ITX/mini-DTX transition cases should appear in short order.
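A rough sketch of the per-board panel economics these yields imply; the panel cost is a made-up placeholder, so only the relative figures mean anything:

panel_cost = 100.0  # arbitrary units for one standard PCB panel
boards_per_panel = {
    "ATX / micro-ATX":    2,
    "BTX / micro / nano": 2,
    "pico-BTX":           4,
    "DTX":                4,
    "mini-DTX":           6,
    "mini-ITX":           6,
}
for form_factor, per_panel in boards_per_panel.items():
    print(f"{form_factor:<20} {panel_cost / per_panel:>6.1f} per board")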

This move by AMD away from ATX is as significant as Athlon's move away from Socket 7. It remains to be seen how Intel will respond. With the failure of BTX, any attempt at yet another incompatible standard would be silly. This leaves Intel with three choices: it can do nothing and continue with the existing BTX standards for a while, it can support DTX and mini-DTX, or it can reach for the fig leaf of creating a new form factor with the same widths but different component placement. However, only pico-BTX can compete with DTX in terms of board cost, and even it lacks DTX's case advantages. So doing nothing means getting pushed out of the low range by a standard that was designed to be profitable even at a low price. On the other hand, supporting DTX or using the same width puts Intel back on a level playing field with AMD and other chipset makers, and this is where Intel has tried very hard not to be. The bottom line is that AMD has stolen Intel's thunder by choosing to supply what its vendor customers wanted, just as Intel did to IBM and Compaq back in 1995.

Although not quite as important as DTX, AMD is also pushing an extended ATX standard with Quad FX. This standard makes a lot of sense because there currently exists a huge gap between ATX and the massive WTX (workstation motherboard) standard, which is roughly twice the size of ATX. Theoretically, extended ATX keeps the same width as ATX but has greater depth, up to 13”. The problem is that the volume for extended ATX has been so low that this size has never truly been a standard in the same way that ATX and WTX have been. However, components have gotten smaller, making the old WTX standard overkill and unnecessarily expensive. Quad FX should bring some much needed volume to this market and pump up the number of available cases. This should be good news for companies like Alienware, VoodooPC, and Boxx, who can have trouble stuffing all of the high end components into a standard ATX motherboard and case.

With DTX, AMD's decision to buy ATI makes a great deal of sense. I suppose some self-styled experts would see it as AMD/ATI = AMD + ATI - (Intel's ATI orders). By this view, the combined company is worth less than the two companies were separately. This, of course, is wrong. Intel's self promotion reached a pinnacle with Centrino as this brand was pushed ahead of all vendor brands. AMD's ability to now deliver not only competitive mobile chipset solutions but all-in-one desktop solutions makes it a viable alternative for vendors tired of being pushed around by Intel. AMD benefits ATI by bringing both much needed development money and the AMD factory brand name to ATI's chipsets. While this distinction is not so important for discrete graphics cards, it will have a positive effect on demand for ATI chipsets. Anyone who doubts the importance of chipsets for AMD is clearly forgetting not only 1995 but the fact that without AMD's 760 and 8000 series chipsets, neither K7 nor K8 would have gotten off the ground. ATI's inclusion, however, means that AMD can now pursue the development of strategic areas that have been ignored by chipset vendors looking for nearer term profit. In other words, AMD/ATI not only fills in AMD's product line but breaks Intel's chipset position and makes another Centrino by Intel nearly impossible. The only question remaining is whether Intel has realized this yet and will go back to being a good supplier, or whether it will keep putting its efforts into self promotion and trying to gain additional market leverage.

Tuesday, April 03, 2007

Intel -- The Monopoly Under Siege

It hasn't happened yet, but there are now clear signs that Intel's monopoly is crumbling. It's a slow process, but from what has been announced there is no reason why AMD cannot finish the job by the end of 2008.

Back in 1998 AMD scooped Intel by releasing the powerful 3DNow! instruction set, which for the first time allowed SIMD floating point instructions on X86 processors. However, this was cold comfort to AMD since Intel simply ignored 3DNow! and released its own SSE instructions a year later. In some ways, such as the ability to add numbers within the same register (horizontal addition), Intel didn't catch up to 3DNow! until SSE3 was released nearly six years later. Intel's monopoly position was such that it could create standards even when it was trailing its competition.

In 2003, when AMD released the AMD64 64-bit extensions to the X86 instruction set, Intel had planned to simply introduce its own incompatible extensions, much as it had with SSE. However, Microsoft refused to support another set of 64-bit extensions, so Intel was forced to follow AMD's lead. It is significant that even four years later C2D's handling of AMD64 instructions is still lacking. This is a clear indication that Intel has not actually reinvented itself and is still following the old notion that it can do whatever it wants simply because it is Intel.

After all, being behind with Pentium M worked fine. Although Pentium M was actually less power efficient (even without AMD64 instructions) than AMD's Turion, the Centrino brand was marketed as being more advanced. The reason Intel was able to hide the fact that Pentium M was behind was that it produced its own mobile Centrino chipsets. These chipsets were so good that they more than made up for the higher power draw of Pentium M. AMD, in contrast, was left waiting and hoping for a proper mobile chipset from one of the third-party chipset suppliers like VIA, nVidia, or ATI. No such chipset was ever produced. However, now that AMD owns ATI this is no longer the case. AMD is fully capable of producing its own low power chipsets along with its mobile processors. In contrast, Intel's standardization of the C2D family prevented it from leaving the AMD64 instructions out of Merom, and this increased power draw past Intel's original target. The mobile market is now much more competitive.

There are other signs that things are changing. Intel tried to replace the ATX motherboard with BTX; however, ATX is still the standard and BTX is still not popular with OEMs. In contrast, AMD is introducing two new motherboard standards. For workstations, ATX is too small, so AMD introduced an extended ATX standard with Quad FX. A new standard is badly needed since there is no current standard larger than ATX. And while ATX is too small for workstations, it is too large for low cost desktop systems, so AMD is also introducing the DTX standard, which will allow smaller motherboards and cases. This standard seems particularly attractive for several reasons. The first is that the rise of USB as a common external port means that many ports like the parallel, serial, and even mouse and keyboard ports can be left off. This means much less competition for I/O port space at the back of the motherboard. Secondly, both Intel and AMD are talking about putting GPUs in the CPU package. With AMD's Integrated Memory Controller (IMC) this would completely eliminate the northbridge chip, which would similarly reduce competition for space on the motherboard itself. There is some indication that Intel is resisting this move for fear of losing chipset revenue. However, making the motherboard and case smaller while reducing the chip count and I/O ports will reduce costs for OEMs, so DTX should become very popular.

Another big problem for Intel is server support from IBM. Intel currently enjoys good server support from IBM because IBM spent a lot of money developing its Hurricane chipset for the Xeon FSB and is eager to sell as many systems with it as possible. The problem is that Intel's Nehalem processor has its own IMC and therefore no FSB, which makes it incompatible with Hurricane. It would of course be possible for IBM to create another chipset to support Nehalem, but this does not appear likely. The reason is that one of Intel's current directions is to use the same socket for both Nehalem and Itanium. This should reduce the cost of Itanium development, particularly for IBM's server competitor, HP. Apparently, in response to this, IBM is making its Power 7 family socket compatible with AMD's Opteron. So it appears that the strong server support that Intel enjoyed because of Hurricane is now over and that IBM will strongly support Opteron instead. With Itanium on the Xeon socket and Power on the Opteron socket, this leaves Sparc as the only competing server processor without leveraged support. It remains to be seen what Sun's response will be, but without similar cost reductions Sparc could be pushed out of the market.

Other problem areas have included compilers and DIMMs. With its IMC, AMD's K8 processors benefit from low memory latency, so AMD would have benefited from DDR speeds of 466 and 500MHz. However, Intel was able to push the development of DDR2, so the official DDR speeds stopped prematurely at only 400MHz. There is no doubt that this helped Intel while hurting AMD. However, there is currently some suggestion that AMD will be able to get DDR2 extended from 800MHz to 1066MHz. If this is true then it would be a real benefit for AMD, and combined with the lackluster support for Intel's FB-DIMM it would indicate a genuine shift in influence. Likewise, Intel has enjoyed optimal code speed from its own compiler, while AMD, without a compiler of its own, often saw its full processor power go unused. Now, however, the Portland Group has a compiler that produces good code for both Intel and AMD processors. This would appear to be the logical choice for software developers who want the maximum possible market for their products, and increasing use of this compiler is reducing another of Intel's artificial advantages. It isn't an overnight change, but all of these things should steadily erode Intel's monopoly standards position and allow AMD to create its own standards independently of Intel.

If AMD does indeed succeed in breaking the Intel monopoly by late 2008, then IBM's increased server support will simply strengthen AMD's position. I've heard a lot of talk about Intel's lead in process technology. However, this too is changing. AMD will reduce Intel's lead at 45nm to just six months. But Intel will continue with a 24 month cycle to 32nm while AMD is pursuing an 18 month schedule. This should mean that AMD will introduce 32nm at basically the same time as Intel in late 2009 or early 2010, a difference of perhaps a few weeks instead of months. It also appears that Intel's Simultaneous Multi-Threading strategy is going to be ineffective for some time. SMT should be of maximum benefit on dual cores but of decreasing benefit on quad cores or quad cores on dual sockets, because both Vista and Linux would be hard pressed to produce the 8 or 16 threads necessary to fully occupy these chips. This really only competes with the limited market currently being pursued by Sun's T1 chip. All in all, SMT is a good thing for processors and it is commendable if Intel can release a good version. However, it is ironic that this advance would put Intel in the same boat that AMD has been in since 2003, with its advanced 64-bit K8 hardware underutilized.

I think that Intel will eventually realize that it still needs a few philosophy changes. Essentially, I think that Intel's current strategy of low prices is going to fall flat in 2008 when faced with Fusion, chipsets without northbridges, and DTX, since this is exactly the kind of platform that can flourish under severe pricing pressure. In contrast, Intel seems to be holding out by continuing to use the old FSB on single socket Nehalems, and these will not fare so well at low prices. I would say that Intel is heading for another round of reorganization in late 2008 or early 2009. I also think Intel will end up scaling back some 45nm and 32nm FAB upgrades. I'm not sure if AMD will actually pursue the NY FAB option; if AMD is still pressed for cash it may resort to some type of expansion in Dresden, perhaps as it did with FAB 30 back in 2004. So, we should continue to see AMD and Intel moving closer together into 2010.