Thursday, June 26, 2008

Top End Graphics

AMD's newest graphics card has definitely sent shock waves rolling across the graphics landscape.

It is clear from reviews such as Anandtech's "The Radeon HD 4850 & 4870: AMD Wins at $199 and $299" that AMD is doing much better in graphics. To be clear, nVidia's GT 280 is still at the top, but the HD 4870 has managed to surpass the GT 260 to take second place. Now, with performance midway between the $400 GT 260 and the $650 GT 280, the HD 4870 should be priced at about $525 if price tracked performance. Instead, the HD 4870 is a phenomenal bargain at just $300. It looks like nVidia's shiny new GT 200 series is now greatly overpriced. Based on this comparison, the GT 260 should probably be priced at about $280 and the GT 280 at perhaps $350. Nor is nVidia's last-minute entry, the GeForce 9800 GTX+, much help. At $230 it is not really better than the $200 HD 4850. Anandtech says:
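The $525 figure assumes price scales linearly with performance between the two nVidia cards. Here is a minimal Python sketch of that interpolation. The performance values are hypothetical normalized scores (GT 260 = 0, GT 280 = 1, HD 4870 midway), not measured results; only the prices come from the post.

```python
def implied_price(perf, perf_lo, price_lo, perf_hi, price_hi):
    """Price implied by linearly interpolating price against performance."""
    fraction = (perf - perf_lo) / (perf_hi - perf_lo)
    return price_lo + fraction * (price_hi - price_lo)

# Hypothetical normalized performance: GT 260 = 0.0, GT 280 = 1.0,
# HD 4870 "midway" = 0.5. Prices are the ones quoted in the post.
hd4870 = implied_price(0.5, 0.0, 400, 1.0, 650)
print(f"Implied HD 4870 price: ${hd4870:.0f}")
```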

The Radeon HD 4850 continues to be a better buy than NVIDIA's GeForce 9800 GTX, even if both are priced at $199. The overclocked, 55nm 9800 GTX+ manages to barely outperform the 4850 in a few titles, but loses by a larger margin in others, so for the most part it isn't competitive enough to justify the extra $30.

I have to say, though, that even when AMD does well there is still a subtle bias at Anandtech. For example, in the final comments they have to slip in this sour note:

You may have noticed better CrossFire scaling in Bioshock and the Witcher since our Radeon HD 4850 preview just a few days ago. The reason for the improved scaling is that AMD provided us with a new driver drop yesterday (and quietly made public) that enables CrossFire profiles for both of these games. The correlation between the timing of our review and AMD addressing poor CF scaling in those two games is supicious. If AMD is truly going to go the multi-GPU route for its high end parts, it needs to enable more consistent support for CF across the board - regardless of whether or not we feature those games in our reviews.

To be perfectly honest, I'm a bit floored by this comment. Last-minute driver fixes for products that won't even be available for another month shouldn't surprise anyone who claims experience with reviews. That is pretty much the nature of trying to get reviews out quickly in competition with other websites. Newspapers do the same thing when they chase a breaking story; sometimes they are editing copy right up until it goes to press. This negative comment is even more odd when contrasted with a comment earlier in the article:

It is worth noting that we are able to see these performance gains due to a late driver drop by AMD that enables CrossFire support in The Witcher. We do hope that AMD looks at enabling CrossFire in games other than those we test, but we do appreciate the quick turnaround in enabling support - at least once it was brought to their attention.

This seems to be Anandtech schizophrenia at its finest, when appreciation for "the quick turnaround in enabling support" turns into suspicion of the "correlation between the timing of our review and AMD addressing poor CF scaling". Guys, if you read your own text, it says that you brought the problem to AMD's attention and then they sent you an update. We can also see that Anandtech was careful not to end with this comment:

We've said it over and over again: while CrossFire doesn't scale as consistently as SLI, when it does, it has the potential to outscale SLI, and The Witcher is the perfect example of that. While the GeForce GTX 280 sees performance go up 55% from one to two cards, the Radeon HD 4870 sees a full 100% increase in performance.

I assume nVidia must be sweating a bit over a comment like that. It suggests that some more driver work could put dual HD 4870s ahead of nVidia's dual GT 280s in a lot of games. This seems especially true when we recall that, unlike the 3870 X2, the new 4000 series uses a special proprietary inter-processor link instead of using CrossFire. I think we can give credit for that to AMD, whose engineers no doubt have lots of experience doing the same thing with CPUs. We've certainly come a long way since AMD's 2900XT last year, which, although slower than nVidia's 8800 GT, still drew a lot more power.

This should also be very good news for Intel. Intel motherboards have by far the weakest integrated graphics compared with nVidia and AMD, so they really need the boost of an added discrete graphics card. However, Intel motherboards can't do SLI, so, although Intel is loath to admit it, the best graphics on Intel motherboards can only be had with AMD graphics cards using CrossFire. This means that AMD's huge leap in performance with the HD 4870 has also just given a big boost to Intel motherboards for gaming.

I'm also a bit doubtful about nVidia's current strategy. nVidia must have been feeling pretty smug last year when it faced the 2900XT, and even the recent shrink to the 3870 can't have been much of a concern. But the HD 4870 is a big problem because it performs better than the GT 260 with a much smaller die. This means that nVidia has just gained a huge liability for GT 200 and 8800 series inventory already out in the field. nVidia will probably have to give rebates to recent wholesale buyers to offset what will almost certainly be substantial reductions in list prices, and its own unsold inventory has taken a hit as well. Although the GT 200 series is due for a shrink, its die is so much larger than AMD's 4000 series that it is difficult to imagine nVidia turning a good profit even at 55nm. With the GT 280 designed for a price point above $600, nVidia is probably going to take a good revenue hit next quarter. I wouldn't be surprised to see nVidia change course and start thinking about reducing die size like AMD.
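The die-size argument can be made concrete with a rough gross-dies-per-wafer estimate. This sketch uses a common textbook approximation, dies ≈ πr²/S − πd/√(2S), for a 300 mm wafer of diameter d and die area S; it ignores defect yield, scribe lines and edge exclusion. The die areas (about 576 mm² for GT200 and about 256 mm² for RV770) are commonly reported approximations and should be treated as assumptions here, not figures from the reviews above.

```python
import math

def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation for candidate dies on a round wafer (ignores yield)."""
    radius = wafer_diameter_mm / 2.0
    wafer_area = math.pi * radius ** 2
    # Partial dies lost around the circular edge of the wafer.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

# Assumed die areas in mm^2 (commonly reported approximations).
dies = {"GT200": 576.0, "RV770": 256.0}
for chip, area in dies.items():
    print(chip, gross_dies_per_wafer(area))
```

Under these assumptions the smaller die gives roughly two and a half times as many candidate dies per wafer (234 versus 94), and defects only widen the gap, since a larger die is more likely to contain one.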

Finally, there should be no doubt at this point that the ATI purchase has turned ATI around, giving it both more resources for product development and access to a much larger sales force. A rising tide for ATI products is probably of some benefit to AMD, but obviously AMD has put a great deal of money and resources into the merger. It still appears that AMD had no real choice in the matter, but we'll probably have to wait to find out whether this relationship has indeed become greater than the sum of its parts.

Some apparently feel that I pick on Anandtech too much, so I'll link to some other reviews. Tech Report is considerably more upbeat in its conclusions:

The RV770 GPU looks to be an unequivocal success on almost every front. In its most affordable form, the Radeon HD 4850 delivers higher performance overall than the GeForce 9800 GTX and redefines GPU value at the ever-popular $199 price point. Meanwhile, the RV770's most potent form is even more impressive, in my view. Onboard the Radeon HD 4870, this GPU sets a new standard for architectural efficiency—in terms of performance per die area—due to two things: a broad-reaching rearchitecting and optimization the of R600 graphics core and the astounding amount of bandwidth GDDR5 memory can transfer over a 256-bit interface. Both of these things seem to work every bit as well as advertised. In practical terms, what all of this means is that the Radeon HD 4870, a $299 product, competes closely with the GeForce GTX 260, a $399 card based on a chip twice the size.

Tech Report also gives a somewhat better technical description of the architecture, and its game graphs cover multiple resolutions.

AMD decided a while back, after the R600 debacle, to stop building high-end GPUs as a cost-cutting measure and instead address the high end with multi-GPU solutions.

This is mostly incorrect. The shader count increased from 320 to 800, which is definitely a brute-force approach to scaling. It appears that what AMD actually did was revamp the memory bus and then spend those power savings on additional shaders. In other words, AMD's limit appears to be power draw (as borne out in a number of reviews) rather than an arbitrary stopping point. We also disagree about die size, since I believe that AMD has finally gotten dual GPU right by using a proprietary link.

The Legit Reviews article is not nearly as good. Their benchmarking is only a fraction of what TR and Anandtech do, and they don't have a GT 260 for comparison. However, their most bizarre statement concerns power consumption:

The GeForce GTX 280 has impressive power savings features as you can tell above. The HIS Radeon HD 4870 uses GDDR5 that is supposed to save energy, so they must have had to really increase the core voltage to reach 750MHz. Both the Radeon HD 4850 and HD 4870 use a little more power than we want to see.

From this description you would assume that the HD 4870 draws more power than the GT 280; this is not the case. In fact, the GT 280 draws 20 amps more than the HD 4870 under load. Those "impressive power savings" are only seen at idle. That makes the characterization debatable at best, since most computers drop into sleep or standby modes when they are idle.

The Hexus review is also missing a GT 260 comparison. I find it harder to get information out of the author's style of English. I'm not sure whether that is due to his UK heritage or just his heavily tongue-in-cheek and somewhat meandering writing style. It's probably a bit of both.

Then we have the HardOCP review, which thankfully does include a comparison with the GT 260. Their conclusions are:

AMD’s new Radeon HD 4870 and Radeon HD 4850 offer a gigantic performance improvement over their last generation of GPUs and offer great values when compared to NVIDIA new GTX 280 and GTX 260. These performance improvements translate into real-world gaming benefits being able to play at higher resolutions with higher in-game settings utilizing AA and AF.

We were surprised to see the Radeon HD 4870 competing well against the GeForce GTX 260. It not only matched the gameplay experience, but in Assassin’s Creed it edged out with faster framerates too. Count on drivers to mature in both camps as well.

The Radeon HD 4850 is the new sub-$200 video card to beat as it provides the best gameplay experience for the money. It provides a better experience than a GeForce 8800 GT and a GeForce 9800 GTX and is on par with GTX 260 while being less expensive.

While NVIDIA still ultimately holds the single-GPU performance crown with the GeForce GTX 280 it also holds the “Hard to Justify the Price” and “More Money Less Value” crowns with the GTX 280 as well. AMD is now offering the best value in video cards with its 4800 series GPU. And we also see AMD’s drivers maturing more for the 4800 series than we do for the new 200 series from NVIDIA so it is our guess that AMD’s performance will get even better with its new GPUs comparatively.

I pretty much agree with this, and here is yet another reference to maturing drivers. Jon Peddie of Jon Peddie Research even suggested, "Nvidia and ATI keep improving their drivers so they'll seesaw back and forth with their scores, almost from week to week". So, again, the notion of some kind of conspiracy as suggested by Anandtech seems to have no connection with reality. Peddie also seems to agree with the idea that nVidia may have to shift to smaller dies.

Friday, June 20, 2008

The Value Of Benchmarks

Chico Marx had a great line in Duck Soup where he asked, "Who are you going to believe, me or your own eyes?" Apparently Intel is now asking the same question.

Intel has on its website some graphs purporting to show "breakthrough performance and energy efficiency" for the Intel Xeon 7300 in virtualization. These are the vConsolidate benchmarks, one of which uses VMware. Intel's bar at 2.01 towers over AMD's at just 1.08. The problem with this comparison is that reality tends to get in the way. First, Intel is comparing four Quad-Core Intel Xeon X7350 2.93GHz processors against four AMD Dual-Core Opteron 8222SE 3.0GHz processors. Perhaps the fact that Intel is using twice as many cores explains why its score is nearly twice as high. Second, where did this vConsolidate benchmark come from? According to Intel:

vConsolidate is a benchmark developed by Intel Corporation to measure Server Consolidation performance.

So we are supposed to trust that Intel didn't massage its own benchmark a bit to favor its own processors? Right. Oddly enough, there is already a benchmark, VMmark, from VMware itself, the very software Intel claims to be testing. The problem for Intel is that:

VMmark software is agnostic towards individual hardware platforms and virtualization software systems so that users can get an objective measurement of virtualization performance.

The last thing Intel wants is an objective measurement of virtualization performance when the VMmark results show:

Dell 4x Quadcore AMD Opteron 2.5GHz 8360 SE R905 - 14.17
Dell 4x Quadcore Intel Xeon 2.93GHz X7350 R900 - 12.23

With 15% less clock speed the AMD system scores 16% higher.
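The core-count point about Intel's vConsolidate graph and the VMmark comparison can both be checked with a few lines of arithmetic. This Python sketch divides the vConsolidate scores by core count (4 × 4 = 16 Xeon cores against 4 × 2 = 8 Opteron cores, per the configurations above) and compares the VMmark clock and score gaps. It uses only the figures quoted in this post.

```python
def pct_change(new, base):
    """Percent by which `new` differs from `base` (positive = higher)."""
    return (new / base - 1.0) * 100.0

# vConsolidate: (score, total cores) from Intel's graph and configurations.
vcon = {"Xeon X7350": (2.01, 16), "Opteron 8222SE": (1.08, 8)}
for name, (score, cores) in vcon.items():
    print(f"{name}: {score / cores:.3f} per core")

# VMmark: scores and clock speeds (GHz) for the two Dell systems.
amd_vm, amd_clk = 14.17, 2.5
intel_vm, intel_clk = 12.23, 2.93
print(f"AMD clock vs Intel: {pct_change(amd_clk, intel_clk):+.1f}%")
print(f"AMD VMmark vs Intel: {pct_change(amd_vm, intel_vm):+.1f}%")
```

Per core, the dual-core Opterons come out slightly ahead in vConsolidate (0.135 versus about 0.126), which is consistent with the gap in Intel's graph being mostly a matter of core count.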

There are also the SPEC results listed by Heise Online which show:

Dell 4x Quad Opteron 2.5GHz 8360 SE R905 - SPECint_rate2006: 167
Bull 4x Quad Xeon 2.93GHz X7350 R480E1 - SPECint_rate2006: 177

AMD is 6% slower in SPECint_rate with a 15% slower clock.

Dell 4x Quad Opteron 2.5GHz 8360 SE R905 - SPECfp_rate2006: 152
Bull 4x Quad Xeon 2.93GHz X7350 R480E1 - SPECfp_rate2006: 108

AMD is 41% faster in SPECfp_rate with a 15% slower clock.

Not all of the server benchmarks are bad for Intel though. In SAP SD, Intel and AMD are much closer:

HP ProLiant BL685c G5, 4 CPUs/16 cores/16 threads, Quad-Core AMD Opteron 8356, 2.3 GHz: 3,524 SD users, 17,650 SAPS

HP ProLiant BL680c G5, 4 CPUs/16 cores/16 threads, Quad-Core Intel Xeon E7340 2.4 GHz: 3,500 SD users, 17,550 SAPS

HP ProLiant DL580 G5, 4 CPUs/16 cores/16 threads, Quad-Core Intel Xeon X7350 2.93 GHz: 3,705 SD users, 18,530 SAPS

With a 4% higher clock Intel ties AMD, and with a 27% higher clock it is 5% faster.

In SPECjbb2005, Intel wins with its higher clock speed:

HP ProLiant DL585 G5, 4 Opteron 2.3 GHz 8356s 4 × 4: 368,543

Sun Fire X4450, 4 Xeon 2.93 GHz X7350s 4 × 4: 464,355

With 27% more speed, Intel is 26% faster.
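The SPEC, SAP SD and SPECjbb comparisons follow the same pattern. This sketch computes the clock gap, the score gap, and a score-per-GHz gap for each pair quoted above. The per-GHz column assumes performance scales linearly with clock speed, which is only a rough approximation.

```python
def pct_change(new, base):
    """Percent by which `new` differs from `base` (positive = higher)."""
    return (new / base - 1.0) * 100.0

# (benchmark, (score_a, ghz_a), (score_b, ghz_b)); system "a" is compared
# against baseline "b", in the same direction as the text above.
comparisons = [
    ("SPECint_rate2006, Opteron 8360 SE vs Xeon X7350", (167, 2.5), (177, 2.93)),
    ("SPECfp_rate2006, Opteron 8360 SE vs Xeon X7350", (152, 2.5), (108, 2.93)),
    ("SAP SD (SAPS), Xeon E7340 vs Opteron 8356", (17550, 2.4), (17650, 2.3)),
    ("SAP SD (SAPS), Xeon X7350 vs Opteron 8356", (18530, 2.93), (17650, 2.3)),
    ("SPECjbb2005, Xeon X7350 vs Opteron 8356", (464355, 2.93), (368543, 2.3)),
]

for name, (score_a, ghz_a), (score_b, ghz_b) in comparisons:
    clock = pct_change(ghz_a, ghz_b)
    score = pct_change(score_a, score_b)
    # Score per GHz: how much of the gap is left after crediting clock speed.
    per_ghz = pct_change(score_a / ghz_a, score_b / ghz_b)
    print(f"{name}: clock {clock:+.1f}%, score {score:+.1f}%, per GHz {per_ghz:+.1f}%")
```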

So, if Intel had more integrity, it would show the benchmarks where it legitimately wins, like SPECint_rate, SPECjbb, and SAP SD, instead of creating its own skewed benchmarks. I'm sure Intel enthusiasts will leap in to say that the only reason Barcelona does so well is that each of the four processors has its own IMC, while Tigerton uses a quad-FSB northbridge and has to share the same memory. Interestingly, when I raised this same point 20 months ago in my October 2006 post "Tigerton or Kittenton?", many Intel enthusiasts said I didn't know what I was talking about and that memory bandwidth would not be an issue because the quad-FSB Caneland chipset would fix everything. I guess I can't be wrong all the time.

Intel proponents are correct to point out that Nehalem will solve this problem and finally deliver real 4-way performance for Intel. The problem is that this won't happen anytime soon. Today Intel is stuck with Tigerton, and later this year it will introduce the hex-core Dunnington, which will just make the memory bottlenecks worse. We won't see a 4-way version of Nehalem for more than a year, not until late 2009.

And, although Nehalem's robust triple-channel memory controller has been touted many times, the truth is that it isn't needed yet. I've already seen people suggesting that Nehalem's triple-channel IMC will increase your gaming performance. Don't hold your breath. Dropping the FSB and external northbridge does greatly reduce latency. However, in terms of actual bandwidth, two channels of DDR3 should be fine up to hex core. It really isn't until you move up to octal core that triple-channel memory begins to shine. Intel already has this with Nehalem, so it is ready for late 2009/early 2010, whereas AMD will have to finally get its much-anticipated G3MX technology out the door to avoid its own bandwidth issues when it goes above hex core in the same time frame.
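A back-of-the-envelope check on the channel-count argument: the theoretical peak bandwidth of a 64-bit DDR3 channel is its transfer rate times 8 bytes. This sketch assumes DDR3-1333, a speed grade the post does not specify, and reports peak (not sustained) bandwidth divided evenly among cores, so it only illustrates the trend and does not measure what any workload actually needs.

```python
def peak_bandwidth_gbs(mt_per_s, channels, bus_bytes=8):
    """Theoretical peak: transfers/s x bytes per transfer x channels, in GB/s."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9

SPEED = 1333  # assumed DDR3-1333 (MT/s); the post names no speed grade
for channels in (2, 3):
    total = peak_bandwidth_gbs(SPEED, channels)
    for cores in (4, 6, 8):
        print(f"{channels} ch, {cores} cores: {total / cores:.2f} GB/s per core")
```

Under that assumption, three channels feeding eight cores (about 4.0 GB/s per core) land close to two channels feeding six (about 3.6 GB/s per core), which fits the argument that dual channel holds up to hex core and a third channel matters mainly at octal core.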

Tuesday, June 17, 2008

A Disturbing Change In AMD's Routine

AMD has held two Analyst Days each year for the past three years. Apparently this year will be different.

2007: Technology Analyst Day - July 26; Financial Analyst Day - December 13
2006: Technology Analyst Day - June 1; Financial Analyst Day - December 14
2005: Technology Analyst Day - June 10; Financial Analyst Day - November 9

If you look at AMD's current schedule you'll notice something peculiar.

2008: Analyst Day - November 13

The late date corresponds to the time of year when AMD normally holds its Financial Analyst Day. If AMD were merely running a bit behind, it could have pushed the Technology Analyst Day back to August, but it didn't; the event seems to be missing from the schedule completely. Second, AMD doesn't call it the Financial Analyst Day, which would suggest that it covers both. The most generous explanation would be that AMD has decided to combine the two days as a cost-saving measure. This seems unlikely to me, though, since AMD has made no announcement to that effect. Presumably, if there were cost savings worth noting, AMD would be happy to say so.

The second possibility is that AMD is avoiding the Technology Analyst Day because it has nothing worth talking about. In other words, it has essentially escalated its policy of releasing no information. I'm not sure, though, that this is the best explanation. AMD could easily make a half-hearted attempt at a Technology Analyst Day, which would probably include Shanghai demos, either information or demos of the next generation GPUs, plus information about 45nm, HKMG, the expected hex-core chips, and DC 2.0. In other words, AMD could come up with a number of things to talk about if that were all it wanted to do.

I can think of a couple of other possibilities. One is that AMD is aware that its financial issues are the most pressing but probably has nothing substantial to say about them yet. The Q1 earnings report ducked the question of what Asset Smart is, and we'll probably see a repeat of this at the end of Q2. It would not surprise me if it took until November for AMD to have something substantial to say about finances and its earnings outlook. For example, the NY fab deal doesn't seem that pressing now, but by November the deadline for the option on it will be just a little over two quarters away. Perhaps AMD doesn't know today, but by November it should be able to say whether the deal is going through or whether it has to let it go. Second, if AMD really is going to make any headway in reducing losses, it will have done so by November and will have at least a starting track record to point to.

However, I doubt finances are the whole reason. Another reason probably has to do with the fact that the roadmaps from the last two Analyst Days have turned out to be fiction. There are currently no dual cores based on Barcelona. Neither Bobcat nor Bulldozer is scheduled for 2009, and in fact we currently have no idea when they might arrive. The contradictions pile up further with the introduction of the 12-core MCM built from two hex-core dies. Surely this would be a stopgap for the server market until Bulldozer. Yet we then have indications that Bulldozer will be made on 45nm technology. This seems strange to me, because delaying Bulldozer past mid-2010 would seem to rule out a 45nm release unless AMD's 32nm process has also been delayed, to 2011. This too seems unlikely, however, since we know that IBM is still talking about 32nm at the end of 2009. I guess confusion is bound to happen when you scrap two roadmaps in a row.

I have also been wondering whether AMD is going to put any improvements into the 2009 version of Shanghai or whether it is just a change to HKMG. In hindsight, it would have helped quite a bit if AMD had just put a couple of improvements into Revision F in mid-2006. And right now it certainly looks like Shanghai won't be changed much from Barcelona. Yet AMD had to know in 2007 that Barcelona had problems, and must also have known that Penryn was an improvement on C2D and that Nehalem was on the way. It is possible that AMD decided then to make some improvements before the design had to be locked by late 2008 or early 2009. It is arguable that AMD didn't know C2D would be that much of an improvement until it was too late to do anything more with Revision F, and that at that point all AMD could do was hope for something better with Barcelona. This time around, though, the plea of ignorance would not be valid. So the real question is whether AMD has enough spare bodies to work on a Shanghai bump, and it simply may not have the resources to spare.

If AMD really does have anything substantial on the burner, then I guess we'll hear about it in November. I certainly hope we don't see another two rounds of minor upgrades that let Intel move further and further ahead, leaving us hoping for a big jump with Bulldozer as we did for Barcelona. I suppose there is room for hope, but there is currently a lot more room for skepticism. Intel, on the other hand, is in a much better position. If Nehalem were not quite what was expected, Intel could simply extend Penryn until later in 2009, when the next upgrade of Nehalem would be due anyway. In fact, you could even argue that Intel has two safety nets, since even the 65nm C2Ds are ahead of AMD. However, I doubt Intel needs that much margin of safety, since there doesn't seem to be anything wrong with Penryn.

Thursday, June 05, 2008

Nehalem Appears But Anandtech Chokes While Tooting Intel's Horn

Hopefully Anand is telling the truth when he says that the Nehalem preview was done without Intel's knowledge, because Anandtech fumbled this introduction badly. Clearly Anand is not someone to trust to carry your fine crystal.

June 5, 2008

Taken at face value the above scores show an impressive 21% increase in speed for Nehalem.

April 23, 2008

However, note that while the Q6600 score goes up 98 points from March to April the score for Phenom 9750 goes down 54 points.

March 27, 2008

Stranger still, the score for the Q9450 was 980 points higher back in February. If that number is accurate, then Nehalem's increase shrinks to 10%.

February 4, 2008

However, even a 10% cumulative gain would be very nice on top of the gains we've seen for C2D and Penryn. Unfortunately, the single threaded scores are not quite as impressive.

June 5, 2008

This would be an increase of only 3% for Nehalem.

February 4, 2008

However, the score from February was 366 points higher. If this score is correct then Nehalem would be 9% slower.

Anand is now claiming that the reduction in speed is due to a Vista update. We can check this easily enough and see if the drop in speed is consistent.

3Dsmax 9 - 21% slower
Cinebench XCPU - 9% slower
Cinebench 1CPU - 11% slower
Pov-Ray 3.7 - 13% faster
DivX - 8% faster [version change from 5.13 to 5.03]

Well, this isn't exactly consistent. 3Dsmax shows twice the slowdown of Cinebench, while Pov-Ray gets faster. DivX gets faster too, although this is less definitive since Anand shifts to a slightly older version.

Conclusions (or perhaps Confusions):

Nehalem is 21% faster when using all threads, or . . . it is only 10% faster.

Nehalem is 3% faster with single threads, or . . . it is 9% slower.

I am looking forward to seeing some proper testing of Nehalem. I'm sure most are anxious to see Nehalem compared with AMD's 45nm Shanghai. I have no doubt that Nehalem is much stronger in multi-threading than C2D. I believe that Intel did this with the goal of stopping AMD from taking server share. However, this is a double-edged sword, since gains for Nehalem will surely be losses for Itanium.

It remains to be seen whether Nehalem fares better in benchmarking than Penryn. Penryn benefited from code that used lots of cache and was predictable enough for prefetching to work. Nehalem, however, has far less cache but is also much less dependent on successful prefetching because of its reduced memory latency. Nehalem is also much more like Barcelona in that its L3 has much higher latency than its L2, making the bulk of Nehalem's cache much slower than Penryn's. One would imagine that Nehalem's preferred program structure will be much closer to Barcelona's than Penryn's has been. Comparisons later this year should be interesting.