Monday, December 24, 2007

AMD K10 Tries To Grow Some Teeth

It has been a long, long wait for proper testing on K10. The previous hasty reviews generally showed K10 doing poorly with user applications and pretty well with server applications. Unfortunately, there never seemed to be enough information to determine if K10 were working properly or contained some serious design flaw. With the update review at Legit Reviews we now have at least some information.

It is clear that the initial teething problems with ATI have been taken care of. The new Spider platform looks to be first rate and a good win for AMD. However, this platform does need a good processor and AMD has been slow to respond. The Phenom is apparently shipping but only as 9500 (2.2Ghz) and 9600 (2.3Ghz). Things don't really get interesting until 9900 (2.6Ghz) but this is either not available until Q1 or may even be pushed back to Q2 as recently stated by Digitimes. Of course, the same source also says that Intel is delaying the Core 2 Quad Q9300, Q9450 and Q9550 until late Q1. The difference is that AMD has not commented while Intel has confirmed the delay. However, Charlie at The Inquirer says that Intel is still have teething problems of its own with its 45nm process. I have to say that this again brings up the question of how much FUD is in factory previews. We have Intel showing Penryns clocked to 3.33Ghz and AMD showing K10's clocked to 3.0Ghz with neither chip anywhere on the horizon. I guess in this smoke and mirrors contest Intel does seem to be closer to reality with a 3.2Ghz Penryn QX9770 scheduled maybe end of Q1 while AMD hasn't yet indicated when 2.6Ghz will be out. And, of course, with Intel's current lead the question of a delay in 45nm is moot unless AMD is planning to essentially skip 65nm versions and start producing 45nm in Q2 (delivery in Q3). That is about the only way I could see AMD getting back on track.

Okay, back to our original point. Assuming AMD manages to get real hardware out the door someday and make the 9900 more substantial than vaporware what do we have to look forward to? Is 9900 a contender or joke? This was the original Sandra XII scores showing very poor performance for K10 in both Integer and FP (the red bar). Such poor performance suggested a serious problem with K10. However, the new Sandra XII SP1 scores show double the previous scores removing any question of a serious design error in the memory section. The Sandra XII SP1 score of 10306 is fully 52% greater than Intel's QX9770. This explains why K10 does so well on memory limited benchmarks. Even more importantly however is the fact that it is 13% faster than 6400+ showing that K10's hardware changes to the memory controller do make a difference.

Next we need to revisit the two benchmark rumors that dogged K10 long before its release: Pov-Ray and Cinnebench. Early scores from these two benchmarks had Intel enthusiasts cackling and howling that K10 was no faster than K8. Neither was a proper benchmark but this didn't seem to stop the naysayers. We can see on the PovRay scores that 6400+ has to be 20% faster to get 2% better peformance than E6750. Thus C2D is about 18% faster than K8. However, in answer to the notion that Pov-Ray proves that K10 is no faster than K8 we see that there is no such proof. If we take the E6750 score and scale it to quad core at 2.4Ghz we can see that Q6600 scales 98% which is a very good score. However, K10 scales 102%. This suggest that K10 is at least 4% faster than K8. Since Q6600 shows no bandwidth limitations we know that the improvement is not due to memory. True, the change is small but for unoptimized code it is significant. The PovRay website also does not say what level of SSE is supported in the 64 bit version. Judging from the small increase from K8 to K10 and C2D it is clear that PovRay is not using 128 bit SSE operations. It does say that the 32 bit version only supports up to SSE2. It also does not say what compiler was used to create the executable. This too is significant since if Intel's compiler were used, K10 would take at least a 10% hit in speed. This could easily put K10 even or perhaps slightly ahead. Again, in terms of K10's speed relative to K8, the Pov-Ray Realtime scores are even more telling. Intel does manage perfect scaling with QX9770 versus E6750. However, K10 is fully 11% faster than 6400+ would be with perfect scaling. So, the Pov-Ray rumor is shelved.

Next we examine the contraversial Cinnebench scores. At the bottom we can see that K8 needs 20% more clock speed to run neck and neck with E6750. With this benchmark we see a problem though. K10 only scales 88% of 6400+. This isn't far off of Q6600's 90% scaling. However, when we look at the Penryn's we see something different. The 3.0 and 3.2Ghz Penryn's show perfect scaling from E6750. We know that this is not due to code tuning since Q6600 is slower. We know that it is not due to memory bandwidth since 9900 is slower. This pretty much only leaves cache size as the determining factor. Unfortunately, a benchmark that fits into Penryn's larger cache to run full speed is worthless for performance comparison. So, Cinnebench is still up in the air. However, given the improvements that we saw with Pov-Ray I think the Cinnebench scores can be ignored for now. When Shanghai is eventually released it will have a good deal more L3 cache than K10. If its Cinnebench scores perk up then we will know for certain that Cinnebench is useless due to cache tuning.

I could go over the rest of the benchmarks in the review but for the most part they are useless since they do not utilize all the cores. And, if we aren't using all the cores then why not just buy a dual core system? We can see that there are still questions of how benchmarks are compiled and that Intel may still be getting an edge due to the use of its compiler which is still heavily tilted in its favor. We can see that a lot of software still has not caught up to the use of 128 bit SSE codes which is the only way that C2D or K10 show their SSE strength. Older processors as far back as PIII work quite well with 64 bit SSE as does K8. And, there still remains the problem of multi-threading since a quad core is useless without quad threads. Nevertheless, this has answered a couple of questions. There is no big design defect in the K10 memory controller as earlier benchmarks suggested and K10 is clearly faster than K8 in terms of Pov-Ray. Unfortuately, we still have not turned up any real proof of whether K10's 128 bit SSE functions are on par with C2D's (and roughly twice as fast as K8's). Hopefully, this information will turn up eventually.

For now, all we can say is that assuming K10 is a signficant improvement on K8 AMD needs to get it out the door at speeds of 2.6 Ghz and above and sooner rather than later. For AMD's sake we also have to wonder if 45nm is truly still on track which should mean skipping some 65nm parts or whether this has slid as well. I suppose nothing but time will tell. Worst case for AMD would be Intel with 3.2Ghz and 3.33Ghz desktop Penryns and Nehalem launched to tackle the server market while AMD struggles to reach 2.8Ghz on 65nm with 45nm pushed back to 2009. Best case would probably be starting production of 45nm Shanghai parts in Q2 with delivery in Q3 with parts available in clock speeds of at least 3.0Ghz while still fitting within the normal TDP and Intel still at 3.2Ghz with Penryn. The sad part for AMD is that I could see either of these scenarios happening and neither one is particularly bad for Intel. We all know what AMD's New Year's resolution should be; we just don't know if AMD is capable of living up to it. C2D has some serious teeth and the fangs are just getting longer with Penryn; K10 with milk teeth is far less threatening.