Core 2 Duo -- The Embarrassing Secrets
Although Core 2 Duo has been impressive since its introduction last year, a veil of secrecy has prevented a true understanding of the chip's capabilities. The situation has been reminiscent of The Wizard of Oz, with analysts and enthusiasts insisting that we ignore what's behind the curtain. However, we can now see that some of C2D's prowess is just as imaginary as the giant flaming wizard.
The two things that Intel would rather you not know about Core 2 Duo are that it has been tweaked for benchmarks rather than for real code, and that at 2.93 GHz it is exceeding its thermal limits on the 65nm process. I'm sure both of these things will come as a surprise to many, but the evidence is at Tom's Hardware Guide, Xbitlabs, and Anandtech. Although the information is quite clear, no one has previously called any attention to it.

Core 2 Duo roughly doubles the SSE performance of K8, Core Duo, and P4D. This is no minor accomplishment, and Intel deserves every bit of credit for it. For SSE-intensive applications, C2D is a grand slam home run. However, the great majority of consumer applications depend more on integer performance than on floating point performance, and this is where the smoke and mirrors have been in full force. There is no doubt that Core 2 Duo is faster than K8 at the same clock; the problem has been finding out how much faster. Estimates have ranged from 5% to 40%. Unfortunately, most of the hardware review sites have shown no desire to narrow this range.
Determining performance requires benchmarks, but today's benchmarks have been created haphazardly, without any standards or quality control. The fact that a program calls itself a benchmark does not mean that it measures anything useful. When phrenology was believed to be a genuine science, one enterprising individual patented a device that automatically measured the bumps on a person's head and produced a score that purportedly described his mental characteristics. Many of today's benchmarks have the same lack of utility. When benchmark code is too small, or is run in an environment far simpler than a real application environment, we get an artificial sensitivity to cache size. This is particularly true of a shared cache like C2D's. Under real conditions, use of the cache by both cores tends to fragment and divide up the L2, which limits how much benefit each core gets. Yet typical testing by review sites carefully runs benchmark code on only one core, without the common threads that would normally be running in the background. This makes such tests more of a theoretical upper limit than something actually attainable. The same testing is misleading in the comparison with K8, whose split caches are immune to cross-core interference: K8 gains nothing artificial from a quiet second core, so K8 should hold its performance under real conditions better than typical testing would indicate. The routine testing that review sites do is a bit like testing gas mileage while driving downhill, or testing the air conditioning on a 70 F day. Obviously, real driving is not always downhill, and the air conditioning is more likely to run on an 85 F day.
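To make the shared-versus-split argument concrete, here is a deliberately crude sketch in Python. The cache sizes and the even-split assumption are mine, for illustration only; real contention is messier than an even split, but the direction of the effect is the point.

```python
# Toy model of the shared vs. split L2 argument above. The cache sizes and
# the even-split assumption are illustrative, not measurements.

C2D_SHARED_L2 = 4096  # KB, e.g. a 4M Conroe with one L2 shared by two cores
K8_SPLIT_L2 = 1024    # KB per core, e.g. a K8 with dedicated 1M caches

def effective_l2_per_core(shared_kb: int, active_cores: int) -> float:
    """Crude assumption: cores competing for a shared cache split it evenly."""
    return shared_kb / active_cores

# Review-site style test: one busy core, the other idle.
print(effective_l2_per_core(C2D_SHARED_L2, 1))  # 4096.0 KB for the tested core
# Real-world load: both cores busy.
print(effective_l2_per_core(C2D_SHARED_L2, 2))  # 2048.0 KB per core
# K8's split cache is unaffected by what the other core does.
print(K8_SPLIT_L2)                              # 1024 KB either way
```

Under this toy model the single-threaded benchmark hands the tested C2D core twice the cache it could count on in real use, while the K8 numbers are the same in both cases.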
Although C2D has always done well in the past with this type of artificial testing, more recent tests of Celeron 400 versions of C2D with 512K of L2 cache give a much more realistic view of the processor's capabilities. Dailytech hides this fact as well as it can by doing very limited testing of Celeron 440 in its Conroe-L, Celeron 400 article, but the large drops in performance can still be seen. Likewise, Xbitlabs doesn't make things any easier: it puts the comparison with a stock Celeron 440 on page 3 of its Conroe-L review and the comparison with the Conroe E4300 on page 4. The charts are pictures, so they can't be copied, and the relevant information is split across two pages; it is necessary to transcribe both charts to find out what is going on. In terms of scaling, Celeron 440 does well, with 94% scaling after a 50% clock increase. The comparison between the 2.0GHz Celeron 440 and the 1.8GHz E4300 is not so good, however. With an 11% greater clock speed, the lower-cache C2D is actually 36% slower overall; adjusted for clock speed, that works out to a per-clock deficit of roughly 42%. The tricky part is figuring out how much of this is due to cache and how much is due to dual core, and unfortunately none of the review sites make any attempt to isolate the two. We can plainly see that for a small number of benchmarks like Company of Heroes and Zip compression the cache makes a huge difference, artificially boosting speed by more than 25%. For Fritz, POV-Ray, and Cinebench the boost is at least 10%. For most benchmarks, however, the boost is probably about 5%. Still, considering that C2D is typically shown as only 20% faster than K8, we have to strongly question the speeds suggested by these benchmark scores. It is unfortunate that review site testing commonly avoids real world conditions. Under real conditions, C2D probably has closer to 10% greater IPC on integer operations, though its SSE performance should still be much higher.
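For anyone who wants to check the arithmetic, here is the per-clock calculation in Python. It uses only the ratios quoted above (2.0GHz vs. 1.8GHz, 36% slower overall); no new measurements are assumed.

```python
# Back-of-the-envelope check of the Celeron 440 vs. E4300 comparison above.
# Only the ratios from the text are used.

celeron_clock = 2.0  # GHz, Celeron 440 (512K L2, single core)
e4300_clock = 1.8    # GHz, E4300 (2M shared L2, dual core)

clock_advantage = celeron_clock / e4300_clock - 1  # ~0.111, i.e. ~11%

# The text says the Celeron is 36% slower overall, so its relative score is 0.64.
celeron_relative_score = 1.0 - 0.36

# Per-clock comparison: divide each relative score by its clock speed.
per_clock_deficit = 1 - (celeron_relative_score / celeron_clock) / (1.0 / e4300_clock)

print(f"clock advantage:   {clock_advantage:.1%}")    # 11.1%
print(f"per-clock deficit: {per_clock_deficit:.1%}")  # 42.4%
```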
The rumors last year, before C2D was released, were that Intel would ship a 3.2GHz model before the end of the year. This didn't happen, and it now appears that it won't happen this year either. While some try to explain away the lack of higher clocks, they nevertheless insist that Intel could release higher clocks if it wanted to. The evidence for this is the supposed overclockability routinely reported by nearly every review site. It is now clear that this perception is wrong and that even at the stock 2.93GHz, X6800 is exceeding the factory's temperature limits. The factory limits and proper thermal testing procedures for C2D are spelled out quite nicely in the Core 2 Duo Temperature Guide. We will use this excellent reference to analyze Anandtech's testing of the Thermalright 120. The first important issue is what to run to thermally load the CPU. Anandtech simply looped the "Far Cry River demo for 30 minutes".
However, the Guide says that Intel "provides a test program, Thermal Analysis Tool (TAT), to simulate 100% Loads. Some users may not be aware that Prime95, Orthos, Everest and assorted others, may simulate loads which are intermittent, or less than TAT. These are ideal for stress testing CPU, memory and system stability over time, but aren't designed for testing the limits of CPU cooling efficiency." Since we know that the Anandtech testing did not reach maximum load, we have to allow a greater margin when reviewing the temperature results. The Guide says that "Orthos Priority 9 Small FFT’s simulates 88% of TAT ~ 5c lower." Since the Far Cry demo will not load the processor even as much as Orthos, we'll allow an extra 2c, for 7c altogether. Next we need to know what the maximum temperature can be.
According to the Guide, "Thermal Case Temperatures of 60c is hot, 55c is warm, and 50c is safe. Tcase Load should not exceed ~ 55c with TAT @ 100% Load." So 55c is the maximum, and since we are allowing 7c for the less-than-100% thermal load, the maximum allowable temperature in this test is 48c. The second chart, Loaded CPU Temperature, lists the resulting temperatures. The temperature of the X6800 at 2.93GHz with the stock HSF (heatsink and fan) is a shocking 56c, or 8c over maximum. We can see that even a Thermalright MST-6775 is inadequate. From these temperatures we can say that X6800 is not truly an X/EE/FX class chip. It is really a Special Edition chip, since it requires something better than stock cooling just to run at its rated clock speed. This finally explains why Intel has not released anything faster: if the thermal limits can be exceeded with the stock HSF at stock speeds, then anything faster would be even riskier. Clearly, Intel is not willing to take that risk and repeat the 1.13GHz PIII fiasco. It also explains why Intel is waiting until it has a suitable 45nm Penryn to increase clocks again; with reduced power draw, Penryn could presumably stay inside the factory thermal limits.
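The margin arithmetic, spelled out as a short Python sketch using only the Guide's numbers and the allowances chosen above:

```python
# The Guide's thermal numbers, applied exactly as in the text above.

TCASE_MAX_TAT = 55    # c, Tcase load limit at 100% (TAT) load, per the Guide
ORTHOS_ALLOWANCE = 5  # c, Orthos Small FFTs runs ~5c cooler than TAT
DEMO_ALLOWANCE = 2    # c, our extra margin: the Far Cry demo loads less than Orthos

adjusted_limit = TCASE_MAX_TAT - ORTHOS_ALLOWANCE - DEMO_ALLOWANCE  # 48c

measured = 56  # c, X6800 at 2.93GHz with the stock HSF, from Anandtech's chart

print(f"adjusted limit: {adjusted_limit}c")             # 48c
print(f"over limit by:  {measured - adjusted_limit}c")  # 8c
```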
We've now seen that Intel's best processor is somewhat less impressive than it has been portrayed. However, we still have the question of Intel's standing in the market. One way to judge is to look at Intel's past boom times. The first boom began with the introduction of the 440 chipset on the new ATX motherboard in 1995. Intel made great strides from that point, taking both brand value and money away from Compaq and IBM. This boom lasted until Intel's PIII was surpassed by K7 in 1999. The second boom began in early 2002 when Intel released the Northwood P4, and it ended in early 2004 as K8 came up strong while power draw forced Intel to release Prescott only as a Celeron processor. The third boom obviously began in the fourth quarter of 2006 with Intel's release of Core 2 Duo. The first two suggest that these boom times are getting shorter. If this is true and the 3rd boom is shorter still, we may have something like:
1st boom time - 4 years.
2nd boom time - 2 years.
3rd boom time - 1 year?
If this trend holds, Intel's third boom would end in the fourth quarter of this year. That seems possible if AMD's K10 is competitive and its DTX form factor and 600 series chipsets prove popular.