Monday, December 29, 2008

Still Waiting

I should have some spare money by the end of January and plan to buy a new computer. However, like most people I am still waiting on comparisons between AMD's 45nm Shanghai and Intel's new Nehalem. The silence is deafening.

Reports trickle in from places like XtremeSystems Forums and other scattered locations of people doing freelance testing on Shanghai. The suggestions are pretty good for AMD. It looks like Shanghai gets a fairly good boost at the same clock versus Barcelona while overclocking much better. It also appears that Nehalem is worse at overclocking than Penryn. That doesn't really surprise me since an IMC seems to be tougher to overclock than a FSB and putting the memory controller on the die increases the thermal load. Frankly I was impressed by how much smaller the HSF was for Penryn versus C2D and was equally surprised to see how much bigger the HSF was for Nehalem (larger than C2D). I don't think that Intel is in any danger of another Prescott but some of the blush is definitely off the Penryn rose. I'm also wondering if reports of Nehalem overheating with the stock HSF (even with the larger size) are true.

Still, this is going to be a problem for Anand Lal Shimpi and I'm not sure what he is going to do about it. See, back in late 2005 Intel was getting thumped hard by Athlon X2 and Anand decided to be value conscious and railed against X2's "insane price". Then as C2D was released in 2006 Anand conveniently dropped his previous objection to price. In fact these days you never even see him mentioning insane prices when talking about Intel's Skulltrail system. Nevertheless, he did try to excuse his change on prices by noting that even Intel's low end chips were much better at overclocking than AMD's.

I still have reservations about judging the value of a chip by overclocking because so few people do it. The truth is that most systems run at stock speed with stock cooling and integrated graphics. So, it is somewhat odd that Anandtech bases its value on discrete graphics with overclocking and premium cooling. Perhaps the fact that Intel excelled at overclocking with premium cooling but needed a graphics card to make up for its poor integrated graphics is just a coincidence . . . maybe. I would be happier if Anandtech did this in two parts, one for common users and another for performance users. But at any rate, I will readily admit that AMD wasn't even in the running for OC'ing until they released the B3 stepping with the SB750 southbridge. So, one has to wonder if Anandtech will suddenly change its stance on value based on overclocking now that Intel's once proud OC banner is lying on the ground in tatters.

It could be the fact that Anandtech does about three Intel articles for every AMD article that makes it appear biased. Or perhaps it is when Johan De Gelas admitted that their server testing was 18 months out of date. And, that coincidentally the things that they were behind on were the things that Intel's chips did poorly. I know that Anandtech took a big hit in credibility when they first criticized AMD's 4000 series for requiring a dual GPU card and then turned around and chose a dual nVidia card as the best value. Maybe I'm biased and just not giving Anandtech a fair shake. Maybe, but then the numbers agree with me. AMD's graphics sales are today mostly the new 4000 series while nVidia's are still the older 9000 series. In fact, demand for nVidia GT200 series is about as dismal as it was for AMD's 2000 series.

I know what the rumors are. The rumors are saying that Anandtech has already benched Shanghai and it does pretty well with Intel only holding onto the really expensive (overpriced) top slot. If prices don't change then Intel's i7 920 at 2.66Ghz is going to be going head to head at $300 with AMD's Phenom II 940 at 3.0Ghz. So, with equal price that would put Intel at a 13% clock disadvantage right off the bat. And, without the fig leaf of better overclocking it has been suggested that Anand is having a hard time spinning the comparison in Intel's favor. Time will tell but I certainly hope we see some real numbers over the next month.
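For anyone who wants to check the 13% figure, here is a minimal sketch of the arithmetic. It uses only the two clock speeds quoted above; the function name is my own, and clock speed is of course not the same thing as performance across two different architectures.

```python
def clock_gap(slower_ghz, faster_ghz):
    """Return (deficit of the slower chip, lead of the faster chip) in percent."""
    diff = faster_ghz - slower_ghz
    deficit = diff / faster_ghz * 100  # measured against the faster chip
    lead = diff / slower_ghz * 100     # measured against the slower chip
    return deficit, lead

# i7 920 at 2.66Ghz versus Phenom II 940 at 3.0Ghz
deficit, lead = clock_gap(2.66, 3.0)
print(f"Intel clock deficit: {deficit:.1f}%  AMD clock lead: {lead:.1f}%")
```

The roughly 13% number is the lead measured from the Intel side; measured from the AMD side the gap is a bit over 11%.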

Wednesday, August 13, 2008

Is Anandtech Any Help In Selecting A Computer?

In my last article I talked about trying to decide on a new computer system. My current plan is to put something together after I see how much difference Bloomfield and Deneb make. Information would be nice.

People keep asking me what is wrong with Anandtech's reviews. In all honesty I would love for Anandtech's reviews to be helpful and solid because this is not an easy time to be looking for a new system. The problem is that as I go over Anandtech's information I keep coming up empty.

According to Anandtech's graph above, the Q9300 draws less power than the E6750. This would be quite impressive if true. However, this graph is easily falsified because according to Intel the E6750 is 65 watts whereas the Q9300 is 95 watts. So either Intel's documentation is wrong or Anandtech is. I'm pretty sure that Intel knows more about their processors than Anandtech so I'm going to have to assume the chart is wrong. And, this means that the comparison with AMD's processors is equally unreliable.

Okay, so no help on power draw. How about prices? The easiest way to see where the good prices are is to graph price versus clock. Generally the curve rises slowly and then much more steeply at some point. Anandtech seems to graph everything except these price curves. Doing my own I can see that AMD's curve is incredibly flat so there is no reason to buy anything less than a 9950 BE unless you are putting together a low end system and shopping for a dual core.

Of course dual cores and integrated graphics go hand in hand on a bargain system since you can get plenty of each for under $100. Unfortunately, Anandtech's 780G Preview isn't much help because they tossed out common sense by using the 45 watt, 2.5Ghz 4850E instead of the perfectly acceptable 65 watt, 2.7Ghz 5200+ which is the same price. However, a far better value would be spending $10 more and getting the 65 watt, 2.8Ghz 5400+ Black Edition with unlocked multiplier. Also, they didn't bother doing any real testing, just video playback, so the article is nearly worthless. Well, how about the IGP Power Consumption article? No, this article is particularly daft because it should already be obvious to everyone that integrated graphics consume less power than discrete video cards. Secondly, the article has no performance information so it is impossible to determine any value. So, how about the NVIDIA 780a: Integrated Graphics article? This article is also lacking because they use a 125 watt 9850 quad core. This is really bizarre considering their earlier insistence on a 45 watt chip. Wouldn't common 65/95 watt chips make more sense? They also leave out any comparison with Intel systems. This is quite odd since the inclusion of nVidia should have allowed a cross comparison by normalizing based on nVidia/AMD with nVidia/Intel. So, this is no help in choosing between an AMD and Intel system. Finally, the Intel G45 and AMD 790GX articles are only token announcements with no actual testing. Clearly if we are looking for a low end system Anandtech is no help at all.

But since I'm not looking for a bargain system I'm really more interested in quads. Even though it is obvious that the 9950BE is the best value in AMD quads the value of the 750 southbridge can't be determined by a price graph. Anandtech does have a 750SB article which does show increased overclocking as well as an increase in northbridge clock but since they fail to do any actual testing you have no idea what value this might be. If we wanted to find out about Intel overclocking then there is more information. Well, sort of. In Anandtech's Overclocking Intel's New 45nm QX9650 article the author does overclock all the way up to 4.4Ghz. However, curiously missing are power draw and temperature graphs. In fact, besides synthetics there is almost nothing in the article. The article itself even admits this:

We hope to expand future testing to include real-world gaming results from some of the newest titles like Crysis, Call of Duty 4: Modern Warfare, Unreal Tournament 3, and Gears of War. Stay on the lookout for these results and others in our next look at the QX9650 when we pair this capable processor with the best motherboards companies like ASUS, Gigabyte, MSI, abit, DFI and Foxconn have to offer.

The problem is that this article is from December 19, 2007 and eight months later there is still no followup article. Anandtech does however have articles on Atom, Larrabee, and Nehalem which also have little value in choosing a system today. However, Anandtech's Nehalem article is quite odd because it says:

We've been told to expect a 20 - 30% overall advantage over Penryn and it looks like Intel is on track to delivering just that in Q4. At 2.66GHz, Nehalem is already faster than the fastest 3.2GHz Penryns on the market today.

If this is true then Intel hasn't just shot itself in the foot; it has taken a chainsaw to both legs. This would mean that the value of Intel's entire 45nm quad line has just dropped through the floor and the effect on the 65nm line would be even worse. Considering that today Intel is only at 1/3rd 45nm production by volume this would mean that Intel would have massively reduced the value of most of its line. And, if this is true then no one should buy anything higher than Q6600 until Bloomfield is released. The problem with this scenario is that Intel already went down this road with PII and PII Celeron so I doubt it will make this mistake again.

Graphics are also a bit strange at Anandtech. For example, in the HD 4870 article Anand says:

For now, the Radeon HD 4870 and 4850 are both solid values and cards we would absolutely recommend to readers looking for hardware at the $200 and $300 price points. The fact of the matter is that by NVIDIA's standards, the 4870 should be priced at $400 and the 4850 should be around $250. You can either look at it as AMD giving you a bargain or NVIDIA charging too much, either way it's healthy competition in the graphics industry once again (after far too long of a hiatus).

So, he likes the HD 4870. And, in the NVIDIA GeForce GTX 280 & 260 article he said:

the GeForce GTX 280 is simply overpriced for the performance it delivers. It is NVIDIA's fastest single-card, single-GPU solution, but for $150 less than a GTX 280 you get a faster graphics card with NVIDIA's own GeForce 9800 GX2. The obvious downside to the GX2 over the GTX 280 is that it is a multi-GPU card and there are going to be some situations where it doesn't scale well, but overall it is a far better buy than the GTX 280.

Keep in mind that the 9800 GX2 is dual GPU and cost $500 at the time of the article. The tone changes however in the Radeon HD 4870 X2 article. Anand admits:

The Radeon HD 4870 X2 is good, it continues to be the world's fastest single card solution

He also says:

But until we have shared framebuffers and real cooperation on rendering frames from a multi-GPU solution we just aren't going to see the kind of robust, consistent results most people will expect when spending over $550+ on graphics hardware.

So, he doesn't have a problem endorsing a $500, dual GPU card from nVidia yet he balks at endorsing a $550, dual GPU card from AMD that is much more powerful. This does seem more than a little arbitrary.

I might be interested in Linux but Anandtech hasn't done a Linux review since 2005. It is possible that they lost all of their Linux expertise when Kristopher Kubicki left. That's too bad. With a quad core I am also very interested in mixed performance, but Anandtech likewise has not done mixed testing since 2005, even under Windows. Yet this is what the processor would typically be doing for me: running two or three different applications. Under normal circumstances I wouldn't be running four copies of the same code nor would I be splitting up one application among four cores, but this is the only type of testing Anandtech does these days. This leaves a huge gap between the typical Anandtech testing, which is only suitable for single and to some extent dual cores, and the all out quad socket/quad core server benchmarks that they ran.

The bottom line is that Anandtech's testing is only dual core caliber (when it is accurate). However, the poor integrated graphics testing offers no help for a typical system that would be used with dual core. But even in the area of quad core/discrete graphics where Anandtech should be strong you have to deal with their schizophrenic attitude about multi-card/dual card graphics and their ambivalent attitude about power consumption. Pricing seems to get similar ill treatment at Anandtech mostly as a crutch when it happens to support their already drawn conclusions. So, my conclusion would have to be that Anandtech isn't much help in choosing a computer system.

Monday, August 11, 2008

A Difficult Season For PC Shopping

I'll probably buy a new computer within a few months. Since I'm not likely to buy another one for two or three years this is a difficult decision and I've begun the long process of figuring out what to buy.

I suppose shopping for a PC is never a truly easy task but it seems that right now this task has become twice as hard. I think Tom's Hardware Guide said it well:

If you already have a Core 2 system or a fast Athlon 64 X2 or Phenom, you shouldn't rush now. It makes more sense to wait for Intel’s Nehalem architecture with the X58 chipset, as well as AMD’s Socket AM3 platform.

The problem is that I don't have either a Core 2 or X2. I have a 1.8Ghz P4 desktop system and a 2.0Ghz Athlon 64 notebook. But, I'm also not really wanting to wait another full year to make a purchase. I guess there is nothing to do but try to slog through this methodically and see if some configuration makes sense.

I checked the prices of CPUs on NewEgg and others and put these into a spreadsheet graph. It is obvious that none of the processors above 2.8Ghz are cost effective; you get only a tiny bit more performance for a big increase in price. The AMD curve is quite odd because it actually gets flatter above 2.3Ghz. Consequently, the 2.6Ghz 9950 Black Edition at $235 ends up being the only real choice. However, I'm not sure I want a 140 watt processor. On the Intel side the venerable Q6600 doesn't seem to be as good a bargain as it used to be. The two best choices seem to be the 2.66Ghz Q6700 for $275 or the 2.83Ghz Q9550 for $340. These are both 95 watt processors. AMD's fastest 95 watt processor, the 2.4Ghz 9750, seems to be scarce these days. This is probably irrelevant though since you could always run the 9950 at 2.4Ghz and get the same wattage. So, again, on the AMD side, the 9950 BE seems to be the only current choice.
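The spreadsheet itself isn't reproduced here, but the same idea can be sketched in a few lines. This is only an illustration using the three price points quoted above; the function names are mine, and comparing clock across Intel and AMD says nothing about relative performance, only about how each vendor's price curve is shaped.

```python
# Price points quoted in this post (clock in Ghz, price in dollars).
CHIPS = {
    "Phenom 9950 BE": (2.60, 235),
    "Q6700": (2.66, 275),
    "Q9550": (2.83, 340),
}

def dollars_per_ghz(chips):
    """Average cost of each Ghz for every chip."""
    return {name: price / clock for name, (clock, price) in chips.items()}

def marginal_cost(chips, slower, faster):
    """Slope of the price curve between two chips, in dollars per extra Ghz."""
    (c0, p0), (c1, p1) = chips[slower], chips[faster]
    return (p1 - p0) / (c1 - c0)

for name, value in dollars_per_ghz(CHIPS).items():
    print(f"{name}: ${value:.2f} per Ghz")
print(f"Q6700 -> Q9550: ${marginal_cost(CHIPS, 'Q6700', 'Q9550'):.2f} per extra Ghz")
```

When the marginal figure is several times the average figure, that is the point where the curve has turned steep and the extra clock is no longer worth the money.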

For future choices the 2.66Ghz Bloomfield is expected to cost $284. This makes it comparable in price with Q6700. The X58 chipset is a bit of a wild card at the moment but if it were a reasonable price this would probably knock the Q6700 out of the running. The P45 chipset wasn't received as favorably so this would seem to make it a choice between Q9550 with P35/X48 and Bloomfield/X58. However, if X48 begins to look best I would have to compare with nVidia's chipsets as well.

I have no idea what AMD's 45nm Deneb might run but given the current prices it should have some room to increase. So, we're probably talking a 2.6 - 2.83Ghz chip in the $235-$340 range with both Intel and AMD having chips that fit. As far as I know, AMD wouldn't be changing chipsets so we should be looking at a 790GX or FX chipset with a 750 southbridge. There doesn't seem to be any good reason to look at the nVidia chipsets since they don't have the 750 southbridge ACC capability. I like this feature because not only should it allow higher overclocks it should allow less power draw at the same clock as well.

For graphics I'm looking at the $200-$400 range which should include nVidia's 9800X2, GTX 260 as well as AMD's HD 4850 and 4870. There will probably be others though by October. However, I suppose a PCIe 2.0 card would make it necessary to re-examine the Intel P45 chipset and drop the P35. Running a search it appears that if I require two PCIe 2.0 ports then the choice becomes X38/X48, P43/P45, or nVidia nForce 700. Overall I doubt the motherboard is going to be much of a factor unless the X58 is particularly high. I would probably use four hard drives in a RAID 5 which should mean a mid case or larger. I haven't decided how big of a power supply to get. Right off hand I would say 4-8GB of memory. I would probably dual boot Unix/Windows but I'm not sure which version of Unix. At the moment I'm leaning towards the Dragonfly version of BSD. I'm probably looking at a 22-26" LCD monitor but I can see right now I'm going to have to go over all the technical details of response time, contrast, and viewing angle to pick one. This is going to take some time so I suppose it is good to start now. Also, I imagine I'm going to have to revisit this topic after the release of the newer graphics cards and then Bloomfield and Deneb.

Monday, July 14, 2008

Reviews And Fairness Or How To Make Intel Look Good

I've had people complain that I've been too tough on Anand but in all honesty Anandtech is not the only website playing fast and loose with reviews.

Anand has made a lot of mistakes lately that he has had to correct. But, aside from mistakes Anand clearly favors Intel. This is not hard to see. Go to the Anandtech home page and there on the left just below the word "Galleries" is a hot button to the Intel Resource Center. Go to IT Computing and the button is still there. Click on the CPU/Chipset tab at the top and not only is the button still there on the left but a new quick tab to the Intel Resource Center has been added under the section title right next to All CPU & Chipset Articles. Click the Motherboards tab and the button disappears but there is the quick tab under the section title. There are no buttons or quick tabs for AMD. In fact there are no quick tabs to any company except Intel. Clearly Intel enjoys a favored status at Anandtech.

What we've seen since C2D was released is a general shift in benchmarks that favor Intel. In other words instead of shooting at the same target we have reviewers looking to see where Intel's arrows strike and then painting a bullseye that includes as many of them as possible. For example, encryption used to be used as a benchmark but AMD did too well on this so it was dropped and replaced with something more favorable to Intel. There has been a similar process for several other benchmarks. Of course now it isn't just processors. Reviewers have carefully avoided comparing the performance of high end video cards on AMD and Intel processors. Reviews are typically done only on high end Intel quad cores. The claim is that this is for fairness but it also avoids showing any advantage that AMD might have due to HyperTransport. It is a subtle but definite difference where review sites avoid testing power draw and graphic performance with just Integrated Graphics where AMD would have an advantage. They then test performance with only a mid range graphic card to avoid any FSB congestion which again might give AMD an advantage. Then high end graphics cards are tested on Intel platforms only which avoids showing any problems that might be particular to Intel. We are also now hearing about speed tests being done with AMD's Cool and Quiet turned on which by itself is good for a 5% hit. I suppose reviewers could try to argue that this is a stock configuration but these are the same reviewers who tout overclocking performance. So, by shifting the configuration with each test they carefully avoid showing any of Intel's weaknesses. This is actually quite clever in terms of deception.

As you can imagine the most fervent supporters of this system are those like Anand Lal Shimpi who strongly prefer Intel over AMD. I had one such Intel fan insist that Intel will show the world just how great it was when Nehalem is released. However, I have a counter prediction. I'm going to predict that we will see another round of benchmark shuffling when Nehalem is released. And, I believe we will see a concerted effort to not only make Nehalem look good versus AMD's Shanghai but also to make Nehalem look good compared to Intel's current Penryn processor. It would be a disaster for reviewers to compare Nehalem and conclude that no one should buy it because Penryn is still faster . . . so that isn't going to happen.

An example is that since AMD uses separate memory areas for each processor it needs an OS and applications that work with NUMA. In the past reviewers have run OS's and benchmarks alike oblivious to whether they worked with NUMA or not. If anything seems to be overly slow they just chalk it up to AMD's smaller size, lack of money, fewer engineers, etc. Nehalem however also has separate memory areas and needs NUMA as well. I predict that these reviewers will suddenly become very sensitive to whether or not a given benchmark is NUMA compatible and will be quick to dismiss any benchmark that isn't. This may extend so far as to purposefully run NUMA on Penryn to reduce its performance. This would easily be explained away as a necessary shift while ignoring that it wasn't done for K8 or Barcelona. That would be explained away as well by saying that the market wasn't ready for it yet when K8 was launched. That was what happened with 64 bit code which was mostly ignored. However, if Intel had made the shift to 64 bits reviewers would have fallen all over themselves to do 64 bit reviews and proclaim AMD as out of date just as they did every time Intel launched a new version of SSE.

We see this today with single threaded code. C2D and Penryn work great with single threaded code but have less of an advantage with multi-threaded code and no actual advantage with mixed code. It is a quirk of Intel's architecture that sharing is much more efficient when the same code is run multiple times. If you compared multi-tasking by running a different benchmark on each core Intel would lose its sharing advantage and have to deal with more L2 cache thrashing. Even though mixed code tests would be closer to what people actually do with processors reviewers avoid this type of testing like the plague. The last thing they want to do is have AMD match Intel in performance under heavy load or worse still actually have AMD beat a higher clocked Penryn. But Nehalem uses HyperThreading to get its performance so I predict that reviewers will suddenly decide that single threaded code (as they prefer today) is old fashioned and out of date and not so important after all. They will decide that the market is now ready for it (because Intel needs it of course).

Cache tuning is another issue. P4EE had a large cache as did Conroe. C2D doubled the amount of cache that Yonah used and Penryns have even more. However, reviewers carefully avoid the question of whether or not processors benefit from cache. This is because benchmark improvements due to cache tend to be paper improvements that don't show up on real application code. So, it is best to avoid comparing processors of different cache sizes to see whether benchmarks are getting artificial boosts from cache. I did have one person try to defend this by claiming that programmers would of course write code to match the cache size. That might sound good to the average person but I've been a programmer for more than 25 years. Try and guess what would happen on a real system if you ran several applications that were all tuned to use the whole cache. Disastrous is the word that comes to mind. But you can avoid this on paper by never doing mixed testing. A more realistic test for a quad core processor is to run something like Folding@Home on one core and graphic encoding on another while using the remaining two to run the operating system and perhaps a game. Since the tests have to be repeatable you can't run Folding@Home as a benchmark but that isn't a problem since it is the type of processing that needs to be simulated rather than the specific code. For example you could probably run two different levels of Prime95 tests in the background while running a game benchmark on the other two cores to have repeatable results. And, if you do run a game benchmark on all four cores then for heaven's sake use a high end graphics card like a 9800X2 instead of an outdated 8800.
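To show what I mean by mixed testing, here is a rough sketch of a harness that starts two different workloads at the same time and tries to pin each one to its own core. Everything in it is an assumption for illustration: the two toy expressions merely stand in for real benchmarks, the names are hypothetical, and the pinning relies on os.sched_setaffinity, which is only available on Linux (elsewhere the sketch simply runs unpinned).

```python
import os
import subprocess
import sys
import time

# Two deliberately different toy workloads (stand-ins, not real benchmarks).
WORKLOADS = {
    "integer": "sum(i * i for i in range(200_000))",
    "float": "sum((i * 0.5) ** 0.5 for i in range(200_000))",
}

def run_mixed(workloads):
    """Start every workload at once and return wall-clock seconds for each."""
    can_pin = hasattr(os, "sched_setaffinity")  # Linux only
    cores = sorted(os.sched_getaffinity(0)) if can_pin else []
    procs = {}
    for n, (name, expr) in enumerate(workloads.items()):
        started = time.perf_counter()
        proc = subprocess.Popen([sys.executable, "-c", expr])
        if can_pin:
            try:
                # Best effort: one workload per allowed core, wrapping around.
                os.sched_setaffinity(proc.pid, {cores[n % len(cores)]})
            except OSError:
                pass
        procs[name] = (proc, started)
    timings = {}
    # Poll so each workload's time is recorded when it actually finishes.
    while len(timings) < len(procs):
        for name, (proc, started) in procs.items():
            if name not in timings and proc.poll() is not None:
                timings[name] = time.perf_counter() - started
        time.sleep(0.001)
    return timings

if __name__ == "__main__":
    for name, seconds in run_mixed(WORKLOADS).items():
        print(f"{name}: {seconds:.3f}s")
```

The timings include interpreter startup, so they are only useful relative to each other. The interesting comparison is running each workload alone and then together; the difference is what the workloads cost each other in shared cache and memory bandwidth.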

Cache will be an issue for Nehalem because it not only has less than Penryn but it has less than Shanghai as well. It also loses most of its fast L2 in favor of much slower L3. My guess is that if any benchmarks are faster on Penryn due to unrealistic cache tuning this will be quickly dropped. That reviews shift with the Intel winds is not hard to see. Tom's Hardware Guide went out of its way to "prove" that AMD's higher memory bandwidth wasn't an advantage and that Kentsfield's four cores were not bottlenecked by memory. But now that Nehalem has 3 memory channels the advantage of more memory bandwidth is mentioned in every preview. We'll get the same thing when Intel's Quick Path is compared with AMD's HyperTransport. Reviewers will be quick to point to raw bandwidth and claim that Intel has much more. They of course will never mention that the standard was derated from the last generation of PCI-e and that in practice you won't get more bandwidth than you would with HyperTransport 3.0.

I could be wrong; maybe review sites won't shift benchmarks when Nehalem appears. Maybe they will stop giving Intel an advantage. I won't hold my breath though.


We can see where Ars Technica discovers a PCMark 2005 error. Strangely the memory score gets faster when PCMark thinks the processor is an Intel than when it thinks it is an AMD. Clearly the bias is entirely within the software since the processor is the same in all three tests:

I've had Intel fans claim that it doesn't matter if Anandtech cheats in Intel's favor because X-BitLabs cheats in AMD's favor. Yet, here is the X-Bit New Wolfdale Processor Stepping: Core 2 Duo E8600 Review from July 28, 2008. In the overclocking section it says:

At this voltage setting our CPU worked stably at 4.57GHz frequency. It passed a one-hour OCCT stability test as well as Prime95 in Small FFTs mode. During this stress-testing maximum processor temperature didn’t exceed 80ºC according to the readings from built-in thermal diodes.

This sounds good but the problem is that to test properly you have to run two separate copies of Prime95 with core affinity set so that one copy runs on each core. This article doesn't really say that they did that. There is a second problem as well dealing with both stability and power draw testing:

We measured the system power consumption in three states. Besides our standard measurements at CPU’s default speeds in idle mode and with maximum CPU utilization created by Prime95

This is actually wrong; the Prime95 test they performed was not maximum power draw; it was Prime95 in Small FFTs mode. That doesn't agree with Prime95 itself, which clearly states in the Torture Test options:

In-place large FFTs (maximum heat, power consumption, some RAM test)

So, the Intel processors were not tested properly. Using the small FFTs does not test either maximum power draw or temperature and therefore doesn't really test stability. If any cheating is taking place at X-Bit, it seems to be in Intel's favor.

Wednesday, July 02, 2008

Anand's Competence Reviewed: Crash And Burn

Anandtech's latest review, AMD's Phenom X4 9950, 9350e and 9150e: Lower Prices, Voltage Tricks and Strange Behavior shows a lot more about Anand's ability as a tester than anything about AMD's hardware.

My older brother used to work on aircraft avionics on the weekends while he was going to college at Purdue. Theoretically he was one of the junior technicians at the small, commuter airline. However, he had gotten his experience in the Marine Corps working on HAWK missile systems which required an elaborate set of five separate radar units to operate. On one occasion he accompanied a senior technician to another airport where one of their planes was down for maintenance. The other technician worked on the avionics for three hours without success and then quit to go to lunch. By the time he got back from lunch my brother had diagnosed and fixed the problem. I have to say that Mr. Lal Shimpi reminds me a lot of that senior technician. Maybe because I've never heard of anyone or any other review site wrecking systems like Anand does.

But it isn't just destroying systems that is troubling. In this article Anand says:

The first processor is the Phenom X4 9950 Black Edition. Clocked at 2.6GHz, the Black Edition moniker indicates that it ships completely unlocked. Unfortunately the unlocked nature doesn’t really help you too much as the 65nm Phenoms aren’t really able to scale much beyond 2.7GHz consistently

This is a strange claim indeed because everyone else seems to be able to clock these chips to 3.2Ghz with no trouble, and I've seen claims as high as 3.6Ghz. He also insists that Intel's quads will overclock to 4.0Ghz on air. But, in reality when you stress all four cores on an Intel quad they will overheat at 4.0Ghz without water cooling. So, the real difference between recent Intel quads and recent AMD quads when overclocking on air seems to be between 300Mhz and 500Mhz. This means that Anand has taken a probable 500Mhz advantage for Intel and stretched it to a completely fictitious 1.3Ghz advantage. No wonder Anand is seen as a minor deity among the Intel faithful.

This article shows Anand's preferences. He likes Intel. He likes powerful chips. He likes newer chips. And, he likes a low price but he doesn't like giving up anything to get it. This is easy to see from statements like:

The new $133 golden boy, Intel's Core 2 Duo E7200, is actually selling for $129 these days - making it the new value leader from the boys in blue.

He chooses this 2.53Ghz, 45nm chip because he is ignoring the lower priced Allendales that are older, 65nm chips and have less cache. The lower priced Allendales are good chips but since they have less cache and don't overclock as well they are not clearly a better value than AMD chips. The more expensive E7200 is a bargain if you do overclock because it can easily run as fast as Intel's 3.2Ghz E8500 which costs twice as much. Similarly, the Q6600 had enjoyed the status of being the best value quad since it was introduced in 2006. But, Anand doesn't heap the same praise on the 2.5Ghz Q9300 that he does on E7200 perhaps because while Q6600 has dropped to just $210, Q9300 is still running $270. Anand's biggest problem at this point is that AMD is eroding Intel's lead and with it his perceived value of these chips. Keep in mind though that this is mostly in Anand's head; the Q6600 is still a good chip as are the Allendales. The problem for him is that he doesn't want a good chip; he wants a chip that is clearly better than AMD's, and that smug feeling of Intel superiority is getting a lot harder to come by.

For example, you could match the $130 (2.53Ghz) E7200 with an AMD $126 (2.9Ghz) 5600X2. Both chips are 65 watt but even at 400Mhz slower the Intel chip will still be a bit faster. The problem is though that Intel motherboards have poor integrated graphics. An equivalent Intel motherboard would need at least a low end graphics card to match AMD and you could apply this savings to a $160 (3.2Ghz) 6400X2 which with its 26% faster clock is not markedly slower. In other words if you stay with integrated graphics and stock speeds then Intel has no advantage because you will pay more money to get a faster system. However, adding a robust discrete graphics card neutralizes AMD's superior integrated graphics motherboards. And the better overclocking on Intel duals tends to neutralize AMD's lower dual price. This is probably why Anand always pushes a system with overclocking and discrete graphics.

But, things are slowly changing and I think Anand is catching disturbing glimpses of the handwriting on the wall. As the price of AMD's tri-cores gets lower the power of three cores tends to remove Intel's higher clocking dual advantage. Secondly, the B3 stepping greatly improves the overclockability of both AMD's tri-cores and quads. For example, multiply the E7200's $130 by 1.5 for a third core and you get $195. This makes the slightly slower 2.4Ghz Phenom 8750 X3 arguably a good value at $175. And, with its higher price tag the Q9300 is not a bargain unless it can overclock significantly better than the 9950 X4. Perhaps this is why Anand is in such denial about how well the newer AMD B3 chips can overclock.
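The arithmetic behind these two paragraphs can be sketched in a few lines. The prices and clocks are the ones quoted above; the function names are mine, and the rule that price scales linearly with core count is only the rough rule of thumb used in this post, not a market model.

```python
# Prices and clocks are the figures quoted in the post.
# scaled_price() and clock_advantage_pct() are illustrative helper names.

def scaled_price(base_price, base_cores, target_cores):
    """Scale a chip's price linearly by core count (a rough rule of thumb)."""
    return base_price * target_cores / base_cores

def clock_advantage_pct(fast_ghz, slow_ghz):
    """Percent by which fast_ghz exceeds slow_ghz."""
    return (fast_ghz / slow_ghz - 1) * 100

e7200_price = 130        # dual core, 2.53Ghz
x3_8750_price = 175      # triple core, 2.4Ghz

three_core_equivalent = scaled_price(e7200_price, 2, 3)
print(three_core_equivalent)                   # 195.0
print(three_core_equivalent - x3_8750_price)   # 20.0 under the scaled price

# The 6400X2 (3.2Ghz) versus the E7200 (2.53Ghz)
print(round(clock_advantage_pct(3.2, 2.53)))   # 26
```

Nothing here says how the chips actually perform; it only shows where the $195 and 26% figures come from.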

I've also never heard of anyone wrecking so many motherboards during testing. This admission by Anand is a bit shocking:

Let's just say in the motherboard section of the labs that a halon fire extinguisher is now a standard item on the test bench. Call us unlucky, abusive, or having just dumb luck, but our results these past few weeks when overclocking IGP setups has not been good. In fact, it has been downright terrible as of this week.

You see, it is not every week when you can go through five boards in less than 48 hours while trying to make an article deadline.

However, we know that Anand has destroyed motherboards and systems before including systems that were working perfectly before he got his hands on them. This has been going on for a long time, not just this week as he implies. This explains why Anand does not work in a computer repair shop.

Finally, his power testing is typical Anand. He tests completely different motherboards with different integrated graphics and measures nothing but total system power draw. However, he then strangely claims that his results show only differences in CPU power draw:

With the exception of the Q9300, Intel's competing chips draw less power at idle than even the new energy efficient AMD chips

He then pretends he has a fair comparison because he is using top IGP boards from each:

The next set of tests is particularly interesting as we are comparing Intel's top integrated graphics platform (G35) to AMD's (780G). No external graphics card was used, this is strictly an IGP comparison

However, this comparison is laughable since Intel's G35 graphics are considerably less powerful than AMD's 780G and therefore probably draw less power. Anand then makes certain that he covers up Intel's weaker IGP by using discrete graphics cards for games testing. This bait and switch testing scheme is clearly and knowingly deceptive and shows that Anand is not merely incompetent but dishonest as well. Testing based on price, stock clock speeds, stock heatsinks, and integrated graphics would often favor AMD which is why these tests never end up in Anand's cherry basket.

Thursday, June 26, 2008

Top End Graphics

AMD's newest graphics card has definitely sent shock waves rolling across the graphics landscape.

It is clear from reviews such as Anandtech's The Radeon HD 4850 & 4870: AMD Wins at $199 and $299 that AMD is doing much better in terms of graphics. To be clear, nVidia's GT 280 is still at the top but HD 4870 has managed to surpass GT 260 to take second place. Now, with performance midway between the $400 GT 260 and $650 GT 280 the HD 4870 should be priced at $525. Instead the HD 4870 is a phenomenal bargain at just $300. It looks like nVidia's shiny new GT 200 series is now greatly overpriced. Based on the comparison GT 260 should probably be priced at about $280 and GT 280 at perhaps $350. Nor is nVidia's last minute entry of GeForce 9800 GTX+ much help. This product at $230 is not really better than the $200 HD 4850. Anandtech says:

The Radeon HD 4850 continues to be a better buy than NVIDIA's GeForce 9800 GTX, even if both are priced at $199. The overclocked, 55nm 9800 GTX+ manages to barely outperform the 4850 in a few titles, but loses by a larger margin in others, so for the most part it isn't competitive enough to justify the extra $30.
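The $525 figure above is just the midpoint between the two nVidia list prices. A minimal sketch, assuming price should track performance linearly between the two cards (my simplification, and the function name is illustrative):

```python
# List prices are the ones quoted in the post; linear interpolation between
# them is an assumption made purely for illustration.

def interpolated_price(low_price, high_price, position=0.5):
    """Linear interpolation between two prices; position 0.5 is the midpoint."""
    return low_price + (high_price - low_price) * position

gt260_price = 400
gt280_price = 650
hd4870_price = 300

implied = interpolated_price(gt260_price, gt280_price)
print(implied)                  # 525.0
print(implied - hd4870_price)   # 225.0 below the implied price
```

Changing `position` lets you test other guesses about where the HD 4870 sits between the two cards.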

I have to say though that even when AMD does well there is still subtle bias at Anandtech. For example in the final comments they have to get in this sour note:

You may have noticed better CrossFire scaling in Bioshock and the Witcher since our Radeon HD 4850 preview just a few days ago. The reason for the improved scaling is that AMD provided us with a new driver drop yesterday (and quietly made public) that enables CrossFire profiles for both of these games. The correlation between the timing of our review and AMD addressing poor CF scaling in those two games is supicious. If AMD is truly going to go the multi-GPU route for its high end parts, it needs to enable more consistent support for CF across the board - regardless of whether or not we feature those games in our reviews.

To be perfectly honest I'm a bit floored by this comment. Last minute driver fixes for products that won't even be available for another month shouldn't be too unusual for people who claim to have experience with reviews. That is pretty much the nature of trying to get your reviews out quickly in competition with other websites. Newspapers do the same thing when they chase a breaking story and sometimes they are editing copy right up until it goes to press. This negative comment is even more odd when contrasted with the comment further back in the article:

It is worth noting that we are able to see these performance gains due to a late driver drop by AMD that enables CrossFire support in The Witcher. We do hope that AMD looks at enabling CrossFire in games other than those we test, but we do appreciate the quick turnaround in enabling support - at least once it was brought to their attention.

This seems to be Anandtech schizophrenia at its finest when appreciation for "the quick turnaround in enabling support" turns into suspicion at the "correlation between the timing of our review and AMD addressing poor CF scaling". Guys, if you read your own text it says that you brought it to AMD's attention and then they sent you an update. We can also see that Anandtech was careful not to end with this comment:

We've said it over and over again: while CrossFire doesn't scale as consistently as SLI, when it does, it has the potential to outscale SLI, and The Witcher is the perfect example of that. While the GeForce GTX 280 sees performance go up 55% from one to two cards, the Radeon HD 4870 sees a full 100% increase in performance.

I assume nVidia must be sweating a bit over a comment like that. This suggests that some more driver work could put dual HD 4870 ahead of nVidia's dual GT 280 in a lot of games. This seems especially true when we recall that unlike 3870 X2 the new 4000 series uses a special proprietary inter-processor link instead of using Crossfire. I think we can give credit for that to AMD whose engineers no doubt have lots of experience doing the same thing with CPU's. We've certainly come a long way since AMD's 2900XT last year which although worse in performance than nVidia's 8800 GT still drew a lot more watts. This should also be very good news for Intel. Intel motherboards have by far the weakest integrated graphics compared to nVidia and AMD and really need a boost with an added discrete graphics card. However, Intel motherboards can't do SLI so although Intel is loath to admit it the best graphics on Intel motherboards can only be had with AMD graphics cards using Crossfire. This means that AMD's huge leap in performance with HD 4870 has also just given a big boost to Intel motherboards for gaming.

I'm also a bit doubtful about nVidia's current strategy. nVidia must have been feeling pretty smug last year compared to 2900XT and even the recent shrink to 3870 can't have been too much of a concern. But HD 4870 is a big problem because it performs better than GT 260 but has a much smaller die. This means that nVidia has now just gained a huge liability for 200 and 8800 series inventory out in the field. nVidia will probably have to give rebates to recent wholesale buyers to offset what will almost certainly be substantial reductions in list prices. This also means that nVidia's current inventory has taken a hit as well. Although the 200 series is due for a shrink, the die is so much larger than AMD's 4000 series that it is difficult to imagine that nVidia can turn a good profit even at 55nm. With GT 280 designed for a price point above $600 nVidia is probably going to take a good revenue hit in the next quarter. I wouldn't be surprised to see nVidia change course and start thinking about reducing die size like AMD.

Finally, there should be no doubt at this point that the ATI purchase has turned ATI around giving them both more resources for product development as well as access to a much larger sales force. A rising tide for ATI products is probably some benefit to AMD but obviously AMD has put a great deal of money and resources into the merger. It still appears that AMD had no real choice in the matter but we'll probably have to wait to find out if this relationship has indeed become greater than the sum of its parts.

Some apparently feel that I pick on Anandtech too much so I'll link to some other reviews. Tech Report is considerably more upbeat in their conclusions:

The RV770 GPU looks to be an unequivocal success on almost every front. In its most affordable form, the Radeon HD 4850 delivers higher performance overall than the GeForce 9800 GTX and redefines GPU value at the ever-popular $199 price point. Meanwhile, the RV770's most potent form is even more impressive, in my view. Onboard the Radeon HD 4870, this GPU sets a new standard for architectural efficiency—in terms of performance per die area—due to two things: a broad-reaching rearchitecting and optimization the of R600 graphics core and the astounding amount of bandwidth GDDR5 memory can transfer over a 256-bit interface. Both of these things seem to work every bit as well as advertised. In practical terms, what all of this means is that the Radeon HD 4870, a $299 product, competes closely with the GeForce GTX 260, a $399 card based on a chip twice the size.

You may also get a somewhat better technical description of the architecture at Tech Report and you get graphs of games at multiple resolutions.

AMD decided a while back, after the R600 debacle, to stop building high-end GPUs as a cost-cutting measure and instead address the high end with multi-GPU solutions.

This is mostly incorrect. The shaders were increased from 320 to 800 which is definitely a brute force approach to scaling. It appears that what AMD actually did was revamp the memory bus and then apply those power savings to additional shaders. In other words, it appears that AMD's limit is power draw (as borne out in a number of reviews) rather than an arbitrary stopping point. We also disagree in terms of die size as I believe that AMD has finally gotten dual GPU right by using a proprietary link.

The Legit Reviews article is not nearly as good. Their benchmarking is only a fraction of what TR and Anandtech do, nor do they have a GT 260 for comparison. However, their most bizarre statement concerns power consumption:

The GeForce GTX 280 has impressive power savings features as you can tell above. The HIS Radeon HD 4870 uses GDDR5 that is supposed to save energy, so they must have had to really increase the core voltage to reach 750MHz. Both the Radeon HD 4850 and HD 4870 use a little more power than we want to see.

From this description you would assume that HD 4870 draws more power than GT 280; this is not the case. In fact, GT 280 draws 20 amps more than HD 4870 under load. Those "impressive power savings" are only seen at idle. So, this characterization is certainly debatable since most computers have sleep or standby modes when they are idle.

The Hexus Review is also missing a GT 260 comparison. I find the author's style of English to be tougher to get information from. I'm not sure whether that is due to his UK heritage or just his heavy tongue in cheek and somewhat meandering writing style. It's probably a bit of both.

Then we have the HardOCP Review which thankfully does include a comparison with GT 260. Their conclusions are:

AMD’s new Radeon HD 4870 and Radeon HD 4850 offer a gigantic performance improvement over their last generation of GPUs and offer great values when compared to NVIDIA new GTX 280 and GTX 260. These performance improvements translate into real-world gaming benefits being able to play at higher resolutions with higher in-game settings utilizing AA and AF.

We were surprised to see the Radeon HD 4870 competing well against the GeForce GTX 260. It not only matched the gameplay experience, but in Assassin’s Creed it edged out with faster framerates too. Count on drivers to mature in both camps as well.

The Radeon HD 4850 is the new sub-$200 video card to beat as it provides the best gameplay experience for the money. It provides a better experience than a GeForce 8800 GT and a GeForce 9800 GTX and is on par with GTX 260 while being less expensive.

While NVIDIA still ultimately holds the single-GPU performance crown with the GeForce GTX 280 it also holds the “Hard to Justify the Price” and “More Money Less Value” crowns with the GTX 280 as well. AMD is now offering the best value in video cards with its 4800 series GPU. And we also see AMD’s drivers maturing more for the 4800 series than we do for the new 200 series from NVIDIA so it is our guess that AMD’s performance will get even better with its new GPUs comparatively.

I pretty much agree with this. We further have yet another reference to maturing drivers. Peddie of Jon Peddie Research even suggested, "Nvidia and ATI keep improving their drivers so they'll seesaw back and forth with their scores, almost from week to week". So, again the notion of some kind of conspiracy as suggested by Anandtech seems to have no connection with reality. Peddie also seems to agree with the idea that nVidia may have to shift to smaller dies.

Friday, June 20, 2008

The Value Of Benchmarks

Chico Marx had a great line in Duck Soup where he asked, "Who are you going to believe, me or your own eyes?" Apparently Intel is now asking the same question.

Intel has on its website some graphs purporting to show "breakthrough performance and energy efficiency" for Intel 7300 Xeon in virtualization. These are the vConsolidate benchmarks one of which uses VMware. Intel's graph at 2.01 towers over the AMD graph at just 1.08. The problem with this comparison is that reality tends to get in the way. First, Intel is comparing four Quad-Core Intel Xeon X7350 2.93GHz processors against four AMD Dual-Core Opteron 8222SE 3.0GHz processors. Perhaps the fact that Intel is using twice as many cores explains why its score is twice as high. Secondly, where did this vConsolidate benchmark come from? According to Intel:

vConsolidate is a benchmark developed by Intel Corporation to measure Server Consolidation performance.

So we are supposed to trust that Intel didn't massage its own benchmark a bit to favor its own processors? Right. Oddly enough there is a benchmark, VMmark, which is from the same people who make VMware which is what Intel claims to be testing. The problem for Intel is that:

VMmark software is agnostic towards individual hardware platforms and virtualization software systems so that users can get an objective measurement of virtualization performance.

The last thing Intel wants is an objective measurement of virtualization performance when the VMmark results show:

Dell 4x Quadcore AMD Opteron 2.5Ghz 8360 SE R905 - 14.17
Dell 4x Quadcore Intel Xeon 2.93Ghz X7350 R900 - 12.23

With 15% less clock speed the AMD system scores 16% higher.

There are also the SPEC results listed by Heise Online which show:

Dell 4x Quad Opteron 2.5GHz 8360 SE R905 - SPECint_rate2006: 167
Bull 4x Quad Xeon 2.93GHz X7350 R480E1 - SPECint_rate2006: 177

AMD is 6% slower in SPECint_rate with a 15% slower clock.

Dell 4x Quad Opteron 2.5GHz 8360 SE R905 - SPECfp_rate2006: 152
Bull 4x Quad Xeon 2.93GHz X7350 R480E1 - SPECfp_rate2006: 108

AMD is 41% faster in SPECfp_rate with a 15% slower clock.
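For anyone who wants to check the percentages, here is a small sketch that recomputes them from the scores listed above. The helper name and the sign convention (positive means AMD is ahead) are my own choices.

```python
# Scores are the VMmark and SPEC figures listed above (AMD first, Intel second).
# pct_diff() is an illustrative helper; positive values mean the first
# argument is higher than the second.

def pct_diff(a, b):
    """Percent by which a differs from b."""
    return (a / b - 1) * 100

results = {
    "VMmark": (14.17, 12.23),
    "SPECint_rate2006": (167, 177),
    "SPECfp_rate2006": (152, 108),
}

# Clock comparison: 2.5Ghz Opteron 8360 SE versus 2.93Ghz Xeon X7350
print(f"clock: {pct_diff(2.5, 2.93):+.0f}%")

for name, (amd, intel) in results.items():
    print(f"{name}: {pct_diff(amd, intel):+.0f}%")
```

Rounded to whole percentages this gives -15% for the clock, +16% for VMmark, -6% for SPECint_rate and +41% for SPECfp_rate, which matches the figures in the text.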

Not all of the server benchmarks are bad for Intel though. In SAP SD, Intel and AMD are much closer:

HP ProLiant BL685c G5, 4 cpu's/16 cores/16 threads, Quad-Core AMD Opteron 8356, 2.3 GHz: 3,524 SD, SAPS: 17,650

HP ProLiant BL680c G5, 4 cpu's/16 cores/16 threads, Quad-Core Intel Xeon E7340 2.4 GHz: 3,500 SD, SAPS: 17,550

HP ProLiant DL580 G5, 4 cpu's/16 cores/16 threads, Quad-Core Intel Xeon X7350 2.93 GHz: 3,705 SD, SAPS: 18,530

With 4% more speed Intel ties AMD and with 27% more cpu speed it is 5% faster.

While in SPECjbb2005 Intel wins with higher clock speed:

HP ProLiant DL585 G5, 4 Opteron 2.3 GHz 8356s 4 × 4: 368,543

Sun Fire X4450, 4 Xeon 2.93 GHz X7350s 4 × 4: 464,355

With 27% more speed, Intel is 26% faster.
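Another way to look at the SAP SD and SPECjbb2005 numbers is to divide each score by clock speed. This is only a crude sketch: score per Ghz is my own normalization, not something either benchmark reports, and it ignores every architectural difference other than clock.

```python
# Scores and clocks are taken from the listings above.
# per_ghz() is an illustrative helper, not a published metric.

def per_ghz(score, ghz):
    """Benchmark score divided by clock speed in Ghz."""
    return score / ghz

systems = [
    ("SAP SD", "Opteron 8356", 2.3, 3524),
    ("SAP SD", "Xeon E7340", 2.4, 3500),
    ("SAP SD", "Xeon X7350", 2.93, 3705),
    ("SPECjbb2005", "Opteron 8356", 2.3, 368543),
    ("SPECjbb2005", "Xeon X7350", 2.93, 464355),
]

for bench, chip, ghz, score in systems:
    print(f"{bench:12} {chip:13} {per_ghz(score, ghz):>10,.0f} per Ghz")
```

On this basis the Opteron leads per clock in SAP SD, and the two are within about 1% of each other in SPECjbb2005, which is another way of saying that Intel's wins here come from higher clock speed.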

So, if Intel had more integrity they would show the benchmarks where they legitimately win like SPECint_rate, SPECjbb, and SAP SD instead of creating their own skewed benchmarks. I'm sure Intel enthusiasts will leap in to say that the only reason Barcelona does so well is because each of the four processors has its own IMC while Tigerton uses a quad FSB northbridge and has to share the same memory. Interestingly, when I brought up this same point 20 months ago in October 2006 (Tigerton or Kittenton?), many Intel enthusiasts said I didn't know what I was talking about and that memory bandwidth would not be an issue because the quad FSB Caneland chipset would fix everything. I guess I can't be wrong all the time.

Intel proponents are correct to point out that Nehalem will solve this problem and finally deliver real 4-way performance to Intel. The problem is that this won't happen anytime soon. Today, Intel is stuck with Tigerton and later this year they will introduce the hex core Dunnington which will just make the memory bottlenecks worse. We won't see a 4-way version of Nehalem for more than a year until late 2009.

And, although Nehalem's robust triple channel memory controller has been touted many times the truth is that it isn't needed yet. I've already seen people suggesting that Nehalem's triple channel IMC will increase your gaming performance. Don't hold your breath. The truth is that dropping the FSB and external northbridge does greatly reduce latency. However, in terms of actual bandwidth DDR3 should be fine with just two channels up to hex core. It really isn't until you move up to octal core that triple channel memory begins to shine. Intel already has this with Nehalem so they are ready for late 2009/early 2010 whereas AMD is going to have to finally get the much anticipated G3MX technology out the door to avoid its own bandwidth issues when it goes above hex core in the same time frame.

Tuesday, June 17, 2008

A Disturbing Change In AMD's Routine

AMD has held two Analyst Days each year for the past three years. Apparently this year will be different.

2007: Technology Analyst Day - July 26; Financial Analyst Day - December 13
2006: Technology Analyst Day - June 1; Financial Analyst Day - December 14
2005: Technology Analyst Day - June 10; Financial Analyst Day - November 9

If you look at AMD's current schedule you'll notice something peculiar.

2008: Analyst Day - November 13

The late date would correspond to the time of year when AMD normally holds the Financial Analyst Day. If AMD were merely running a bit behind they could have pushed the Technology Analyst Day back to August but they didn't; it seems to be missing from the schedule completely. Secondly, AMD doesn't call it the Financial Analyst Day which would suggest that it includes both. The most generous explanation would be that AMD has decided to combine the two days as a cost saving measure. This seems unlikely to me though since AMD has made no announcement to that effect. Presumably if there were a cost savings worth noting then AMD would be happy to say so.

The second possibility is that AMD is avoiding the Technology Analyst Day because they have nothing worth talking about. In other words, they've essentially escalated a policy of no information. I'm not sure though that that explanation is the best. AMD could easily make a half hearted attempt at a Technology Analyst Day which would probably include Shanghai demos, either information or demos of the next generation GPU's, plus information about 45nm, HKMG, the expected hex core chips, and DC 2.0. In other words, AMD could come up with a number of things to talk about if that is all they wanted to do.

I can think of a couple of other possibilities. One is that AMD is aware that their financial issues are the most pressing but AMD probably has nothing substantial to mention. The Q1 Earnings report ducked the question of what Asset Smart is and we'll probably see a repeat of this at the end of Q2. It would not surprise me if it took until November for AMD to have something substantial to talk about in terms of finances and earnings outlook. For example, that NY FAB deal doesn't seem that pressing now but by November the deadline for the option on it will be just a little over two quarters away. Perhaps AMD doesn't know today but by November they should be able to say whether the deal is going through or they have to let it go. Secondly, if AMD really is going to make any headway in reducing losses they will have done it by November and have at least a starting track record to point to.

However, I doubt finances are the whole reason. Another reason probably has to do with the fact that the roadmaps from the last two Analyst Days have turned out to be fiction. There are currently no dual cores based on Barcelona. There is neither a Bobcat nor a Bulldozer scheduled for 2009 and in fact we currently have no idea when they might arrive. The contradictions seem to pile up with the introduction of the MCM hex core idea with 12 cores. Surely this would be a stopgap for the server market until Bulldozer. Yet we then have indications that Bulldozer will be made on 45nm technology. This seems strange to me because delaying Bulldozer past mid 2010 would seem to prevent a 45nm release unless AMD's 32nm has been delayed as well to 2011. This too however seems unlikely since we know that IBM is still talking about 32nm at the end of 2009. I guess confusion is bound to happen though when you scratch two roadmaps in a row.

I have also been wondering if AMD is going to put any improvements in the 2009 version of Shanghai or whether it is just a change to HKMG. In hindsight, it would have helped quite a bit if AMD had just put in a couple of improvements with Revision F in mid 2006. And, right now it certainly looks like Shanghai won't be changed much from Barcelona. Yet AMD had to know in 2007 that Barcelona had problems and also must have known that Penryn was an improvement on C2D and that Nehalem was on the way. It is possible that AMD decided then to make some improvements before the design had to be locked by late 2008 or early 2009. It is arguable that AMD didn't know that C2D would be that much of an improvement until it was too late to do anything more with Revision F and that at that point all AMD could do was hope for something better with Barcelona. This time around though the plea of ignorance would not be valid. So, the real question is whether AMD has enough spare bodies to work on a Shanghai bump. But AMD simply may not have the resources to spare.

If AMD really does have anything substantial on the burner then I guess we'll hear about it in November. I certainly hope we don't see another two rounds of minor upgrades that let Intel move further and further ahead and end up hoping for a big jump with Bulldozer as we did for Barcelona. I suppose there is room for hope but there is currently a lot more room for skepticism. Intel on the other hand is in a much better position. If Nehalem were not quite what was expected then Intel could simply extend Penryn until later in 2009 when the next upgrade of Nehalem would be due anyway. In fact, you could even argue that Intel has two safety nets since even the 65nm C2D's are ahead of AMD. However, I doubt Intel needs that much margin of safety since there doesn't seem to be anything wrong with Penryn.

Thursday, June 05, 2008

Nehalem Appears But Anandtech Chokes While Tooting Intel's Horn

Hopefully in the Nehalem preview Anand is telling the truth when he says that it was done without Intel's knowledge because Anandtech fumbled badly with this introduction. Clearly Anand is not someone to trust to carry your fine crystal.

June 5, 2008

Taken at face value the above scores show an impressive 21% increase in speed for Nehalem.

April 23, 2008

However, note that while the Q6600 score goes up 98 points from March to April the score for Phenom 9750 goes down 54 points.

March 27, 2008

And even more strange is that the score for Q9450 was 980 points higher back in February. If this number is accurate then Nehalem's increase is reduced to 10%.

February 4, 2008

However, even a 10% cumulative gain would be very nice on top of the gains we've seen for C2D and Penryn. Unfortunately, the single threaded scores are not quite as impressive.

June 5, 2008

This would be an increase of only 3% for Nehalem.

February 4, 2008

However, the score from February was 366 points higher. If this score is correct then Nehalem would be 9% slower.

Anand is now claiming that the reduction in speed is due to a Vista update. We can check this easily enough and see if the drop in speed is consistent.

3Dsmax 9 - 21% slower
Cinebench XCPU - 9% slower
Cinebench 1CPU - 11% slower
Pov-Ray 3.7 - 13% faster
DivX - 8% faster [version change from 5.13 to 5.03]

Well, this isn't exactly consistent. 3Dsmax shows twice the slowdown of Cinebench while Pov-Ray gets faster. DivX gets faster too although this is less definitive since Anand shifts to a slightly older version.
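The consistency check can be written down as a quick sketch. The percentages are the ones listed above; treating negative numbers as slowdowns is my own convention. If a single Vista update were the cause you would expect the changes to share a sign and fall in a narrow band.

```python
# Percent changes from the list above; negative means slower (my convention).
changes = {
    "3Dsmax 9": -21,
    "Cinebench XCPU": -9,
    "Cinebench 1CPU": -11,
    "Pov-Ray 3.7": 13,
    "DivX": 8,
}

values = list(changes.values())
spread = max(values) - min(values)
same_direction = all(v < 0 for v in values) or all(v > 0 for v in values)
slower = [name for name, v in changes.items() if v < 0]

print(f"spread: {spread} percentage points")    # spread: 34 percentage points
print(f"same direction: {same_direction}")      # same direction: False
print(f"slower: {', '.join(slower)}")
```

A 34 point spread with mixed signs is hard to square with one uniform cause, although this sketch cannot say what the actual cause is.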

Conclusions (or perhaps Confusions):

Nehalem is 21% faster when using all threads, or . . . it is only 10% faster.

Nehalem is 3% faster with single threads, or . . . it is 9% slower.

I am looking forward to seeing some proper testing of Nehalem. I'm sure most are anxious to see Nehalem compared with AMD's 45nm Shanghai. I have no doubt that Nehalem is much stronger in multi-threading than C2D. I believe that Intel did this with the goal of stopping AMD from taking server share. However, this is a double edged sword since gains for Nehalem will surely be losses for Itanium.

It remains to be seen if Nehalem fares better in benchmarking than Penryn. Penryn benefited from code that used lots of cache and was predictable enough for prefetching to work. Nehalem however has far less cache but is also much less sensitive to prediction because of the reduced memory latency. Nehalem is also much more like Barcelona in that L3 has much higher latency than L2 making the bulk of Nehalem's cache much slower than it was with Penryn. One would imagine that Nehalem's preference in program structure would be much closer to Barcelona's preference than Penryn's has been. Comparisons later this year should be interesting.

Wednesday, May 21, 2008

Approaching Midyear

Intel's lead remains clear, while the steps AMD might take to become profitable again remain a mystery. The layoffs might manage $25 Million a quarter while reducing capital expenditure might be good for another $100 Million but this is a long way from the $500 Million that AMD talked about.

Lately, I've been comparing Intel's and AMD's processors. A couple of things are noticeable. SOI is supposed to confer heat tolerance and AMD's processors now have a very respectable temperature limit of 70 C. Yet, Intel's processors running on bulk silicon now have a top end of 71.5 C. The voltage spread is also noteworthy. AMD's spread ranges from 1.05 to 1.25 volts. Intel's spread is much larger, ranging from 0.85 to 1.3625 volts. Intel's processors seem to run just fine on a much larger voltage spread again on bulk silicon. I also came across a comment in one of IBM's journal articles:

Chips with shorter channels typically run faster but use considerably more power because of higher leakage. In previous-generation processors, these parts would have been discarded because of excessive power dissipation but now are usable by operating at lowered voltages. In addition, chips with longer channels typically run slower, so some of these parts also would not have been used in earlier generation processors because of their low operating frequency, but now they also are made usable by increasing their operating voltages.

This piece was from an article about IBM's Power6 development. However, since this concerns IBM's 65nm process which presumably would be nearly identical to AMD's I have to wonder. I have to say that this reminded me a lot of AMD's initial Barcelona launch. I do wonder if the 2.3Ghz 130 watt chips would have been ones that would have been scrapped in previous generations. To be honest, the current B3 stepping looks a lot more like AMD's planned initial offering. And, with this delay it becomes more doubtful that AMD will hit its target of 2.8Ghz at 95 watts on 65nm. This too presumably would have allowed 3.0Ghz at 130 watts. It now looks like AMD won't get there until 45nm. At least we assume so; there have as yet been no 45nm ES chips reviewed. Hopefully, this won't be like the Barcelona release where AMD snuck the chips out on a Friday to delay review over the weekend. Hopefully 45nm will be something that AMD can finally be proud of.

I know some time ago Ed at Overclockers had suggested that Intel might have had an advantage due to RDR. I had doubted this at the time but have since changed my mind. I realized that Intel had considered their own production marginal without DFM and that AMD has now apparently become serious about using DFM at 45nm and below. It is perhaps possible that Intel's current process advantage at 65nm could be due to just a slight edge gained by making the chip layout easier to produce. If AMD really has made use of DFM at 45nm then the addition of immersion scanning should remove any limits they had at 65nm. This would be fine were it not for the fact that Intel is using high-K at 45nm which is looking more and more like the right choice. It has yet to be shown whether or not AMD can be competitive with low-K.

In fact, it is difficult to imagine how things could be better for Intel than they are right now. Fuad has been suggesting that Intel is having trouble with 45nm production. The only evidence I've seen of this is the lack of 45nm quads at NewEgg. You can find 45nm dual cores from 2.5 to 3.2Ghz but quads are still scarce. Of course, even if this is true Intel still has the excellent G0 65nm stepping which is still comfortably ahead of AMD. Considering that AMD won't produce any significant volume of 45nm until Q4 this gives Intel a good six months to tweak 45nm production. Even at that, AMD is unlikely to challenge Intel's current top speeds before 2009.

Intel's chips seem to have a fair amount of headroom as well. The dual core chips seem to be able to hit 4.0Ghz without much trouble. The quad cores might also be able to hit this but it looks like they are limited by socket wattage. If socket 775 were rated for 175 watts Intel could hit 3.5Ghz with their quads with room to spare. Of course, they could always follow AMD and introduce triple cores. This would allow a 175 watt quad to fit within the current socket limit. The problem with this idea is that Intel's marketing department already got up on their soapbox and proclaimed AMD's triple cores to be crippled chips. They would look a bit foolish to go back on this now.

Right now of course this isn't a problem because AMD's B3 chips don't have much headroom and the current triple cores are not being released at higher clocks. But, if triple cores were pushing faster than 3.33Ghz in, say, Q1 2009 what would Intel do? They could up the clock on their dual cores or eat crow and release triple cores of their own. Some suggest that Intel is secretly overrating the TDP limits of its 45nm chips. This presumably would mean that Intel would only have to change the labels to fix the problem. I'm not sure this has any truth to it because while the chips will indeed clock higher the quad TDP ratings seem to match the dual core TDP ratings. So, there is no evidence that Intel is derating its chips. Or perhaps I should clarify that by saying that there is no evidence that Intel is derating the quad chips; the duals do indeed seem capable of clocking higher while staying within the socket limits.

Of course, AMD may not be much competition in the near term. AMD might need high-K to actually pose any threat and by that time Intel will be close to or already producing 32nm. Then again, perhaps Nehalem is much better than has been suggested. I'm a bit divided about the direction that Intel has gone with Nehalem. The IMC is a great idea as is QP. And, putting the loop buffer after the decoding stage does seem like an improvement and probably a lot better than the trace cache on P4. I had been wondering how Intel was going to make SMT work without the old trace cache and this seems to be the answer. But, I still have doubts. The IMC is good but in terms of the benchmarks it seems like a lot of this will be traded off with less use of prefetching with the smaller cache. In terms of real performance this is an improvement but I wonder how it will be on the artificial benchmarks that Intel has been so fond of. Secondly, SMT seems to be one of those ideas that looks great on paper but not in practice. Most likely this was added to give Intel some greater hitting power versus Sun's strongly threaded Niagara. And, it probably will.

The problem is that as a programmer you can't really program for SMT. If you only have one application running then this can work. However, as soon as you run other applications the processor state changes and you can find yourself running slower with SMT instead of faster due to things like sharing buffers, cache, and branch prediction. And, these days with virtualization moving up it becomes more and more likely that servers will be running not just multiple applications but multiple OS's as well. This simply cannot be optimized at compile time. Without some method of changing the SMT profile at runtime I can't see how this would work. It seems more likely that SMT will end up being an option that most server managers leave turned off. Curiously, AMD seems to be the one moving towards dynamic load profiling but it has never given any indication that it is interested in SMT. Intel has a tough path to follow trying to make Xeon competitive with Opteron but not so good that it kills Itanium. Once Xeon has 256 bit SSE Itanium will be holding on by only an artificial RAS differentiation. But RAS will undoubtedly become more and more robust in x86 server processors. As bad as Nehalem might be for Itanium it is most certainly what Intel needs for Xeon. I'm eager to see the benchmarks to find out if it is good for the desktop as well.

Saturday, April 26, 2008

AMD's Asset Smart Explained

Although Hector Ruiz mentioned the term Asset Smart at the Q1 2008 Earnings Report he avoided explaining what it meant. There has been a lot of speculation in this vacuum but all of it that I have seen has been wrong. Most theories seem to focus on either the idea of selling all or part of a FAB to raise cash or on the idea of making graphic products at Dresden. It is actually something quite different.

Asset Smart deals with manufacturing. Starting from 1998 you have:

1998 - SPC
2000 - APC
2003 - APM
2005 - LEAN
2008 - SMART

The term Asset Light referred to AMD's changes within the Lean project (although some of these had actually started back with the APM project). The next project is SMART. Asset Smart simply refers to similar changes within the Smart project which begins in Q3 2008. Mostly, these projects deal with the process of making chips and trying to reduce manufacturing costs. The groundwork for this next phase was actually begun by AMD back in June of 2007 during SEMICON West when it hosted meetings of the Next Generation Factory (NGF) group. This continued during ISMI in October 2007 and later at SEMICON Japan. For example Semiconductor International covers the Austin meeting here:

In an effort to place a more intense focus on 300 mm fab productivity improvements, Advanced Micro Devices Inc. (AMD, Sunnyvale, Calif.) is hosting ~75 people at its Austin campus today for the second in a series of Next Generation Factory (NGF) meetings.

The day-long Austin meeting will include participants from six integrated device manufacturers (AMD, Freescale Semiconductor, IBM, Qimonda, Renesas Technologies and Spansion) and 16 semiconductor equipment vendors, said Gerald Goff, senior member of AMD’s technical staff. Six academic experts in fab productivity were also expected to attend, Goff said prior to the meeting.

“The suppliers and IDMs used to work more directly together on productivity issues,” Goff said, adding that the NGF group is intended to complement efforts within SEMI and the International Sematech Manufacturing Initiative (ISMI). “ISMI is not moving fast enough. We have to push this 300 mm fab efficiency issue harder as an industry,” Goff said, adding that ISMI project manager Denis Fandel will be among the attendees at today’s event. “In no way do we want the takeaway from this to be that we are against ISMI,” he said, adding, however, that the growing industry emphasis on 450 mm wafers is “concerning to us.”

Goff said that because AMD was “relatively late getting to 300 mm wafers,” it may have more interest in productivity gains at the 300 mm wafer size than its competitor, Intel Corp. (Santa Clara, Calif.), which seeks momentum behind the transition to 450 mm wafers in ~2012.

You can get more information about this directly from AMD by looking at Doug Grose's Keynote Presentation at the 4th ISMI Symposium.

Back in 1998 AMD was at 2.5 days per mask layer. After SPC, APC, and APM, FAB 30 was down to 1.5 days per mask layer. With Lean, by the time FAB 30 shut down in mid 2007 it was down to just 1 day per mask layer. What AMD wants to do is reduce cost by reducing cycle time just as it has been doing for the past 10 years. As a result of Lean, wafer starts per week have jumped 31%, while labor productivity (monthly activities per operator) has climbed 72%. Monthly wafer costs have dropped 26%, and the already mentioned cycle time per mask layer has been trimmed 23%. However, FAB 36 is still at 1.4 days per mask layer. AMD is hoping to reduce this down to 0.7 days per mask layer (a 50% reduction) by shifting to small lot manufacturing.

The basic strategy involves replacing batch tooling with single wafer tooling and reducing batch size. AMD wants to drop below the current batch size of 25 wafers. AMD figures that this will dramatically reduce Queue Time between process steps as well as reduce the actual raw process time. Overall AMD figures a 76% reduction in cycle time is possible so a 50% reduction should be reasonable. Today, running off a batch of 25 wafers is perhaps 6,000 dies. Reducing batch size would allow AMD to catch problems sooner and allow much easier manufacturing of smaller volume chips like server chips. Faster cycle time means more chips with the same tooling. It also means a smaller inventory because orders can be filled faster and smaller batches mean that AMD can make its supply chain leaner. All of these things reduce cost and this is exactly how AMD plans to get its financial house in order.
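The arithmetic behind these targets is simple enough to sketch in Python. This is only a rough illustration: the 40 mask layers used to turn days per layer into a total cycle time is my own placeholder, not a figure from AMD, and the function name is just for this example. The days-per-layer numbers and the 25 wafer, 6,000 die batch are the ones quoted above.

```python
def percent_reduction(before, after):
    """Percentage drop going from `before` to `after`."""
    return (before - after) / before * 100.0

# Days per mask layer for FAB 36, as quoted in the post.
FAB36_CURRENT = 1.4
FAB36_TARGET = 0.7

# ASSUMPTION: 40 mask layers is an illustrative placeholder, not an AMD figure.
MASK_LAYERS = 40

reduction = percent_reduction(FAB36_CURRENT, FAB36_TARGET)
current_cycle_days = FAB36_CURRENT * MASK_LAYERS
target_cycle_days = FAB36_TARGET * MASK_LAYERS

# Batch figures from the post: 25 wafers is perhaps 6,000 dies.
dies_per_wafer = 6000 / 25

print(f"Reduction per mask layer: {reduction:.0f}%")
print(f"Cycle time at {MASK_LAYERS} layers: "
      f"{current_cycle_days:.0f} -> {target_cycle_days:.0f} days")
print(f"Dies per wafer: {dies_per_wafer:.0f}")
```

Whatever the real layer count is, the total cycle time scales directly with days per mask layer, so halving one halves the other.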

Monday, April 14, 2008

Updates And Old Patterns

Amid AMD's torrent of bad news (the exit of Phil Hester, the reduced estimates for Q1 earnings, and the announced 10% layoffs) we can at least give AMD a small amount of praise for getting the B3 stepping out the door. It's a small step on a long road.

We can finally settle the question of whether AMD's 65nm process is broken. AMD's fastest 65 watt, 90nm K8 runs 2.6Ghz while the fastest 65 watt, 65nm K8 runs 2.7Ghz. So, the 65nm process is at least a little better than the old 90nm process. People still keep clamoring for Ruiz to step down. Frankly, I doubt Ruiz had any direct involvement with K10's design or development so I'm not sure what this would accomplish. I think a better strategy would be for AMD to get the 2.6Ghz 9950 out the door as soon as possible and try hard to deliver at least a 2.6Ghz Shanghai in Q3. Since Shanghai has slightly higher IPC a 2.6Ghz model should be as fast or faster than a 2.7Ghz Barcelona. I would say that AMD needs at least that this year although this would leave Intel with the top three slots.

AMD's current strategy seems to recognize that they are not competitive at the top and won't get there soon. The collection of quads without L3 cache, Tri-core processors, and the current crop of low priced quads including the 9850 Black Edition all point to a low end strategy. This is the same pattern AMD fell into back in 2002 when it devalued its K7 processors. Of course in 2002 AMD didn't have competitive mobile and its only server processors were Athlon MP. So perhaps Puma and a genuine volume of B3 Opterons will help. AMD's excellent 7xx series chipset should help as well but apparently not enough to get back into profitability without layoffs.

The faster B3 steppings are an improvement but you get the feeling they should have been here last year. You get a similar feeling when Intel talks about the next Itanium release. Although Itanium began with hope as a new generation architecture its perpetual delays keep that feeling well suppressed. And, one has to wonder how much of Itanium will be left standing when Intel implements AVX in 2010. We all know that HP is the only thing holding up Itanium at this point. Any reduction in support by HP will be the end of Itanium. And, we get a similar feeling about Intel's 4-way offerings which always seem to lag nearly a year behind everything else. For example, although Nehalem will be released in late 2008 the 4-way version of Nehalem won't come out until late 2009. Some still speculate that this difference is purely artificial and Intel's way of giving Itanium some breathing room.

However, as bad as AVX might be for Itanium it has to be a double shock for AMD coming not long after the announcement of SSE5. AVX seeks to copy SSE5's 3 and 4 operand instructions while bumping the data width all the way up to 256 bits. It looks like AMD only has two choices at this point. They can either drop SSE5 and adopt both SSE4 and AVX or they can continue with SSE5 and try to extend with GPU instructions. Following AVX would be safer but would put AMD behind since it is unlikely at this point that they could optimize Bulldozer for AVX. Sticking with SSE5 and adding GPU extensions would be a braver move but could work out better if AMD has its Fusion ducks in a row. Either way, Intel's decision is likely to fuel speculation that Larrabee's architecture isn't strong enough for its own Fusion type design. Really though it is tough to say at this point since stream type processing is just beginning to take off. However, GPU processing does demonstrate sheer brute power on Folding @ Home protein sampling. This makes one wonder why OC'ers in particular cling to the use of SuperPi which is based on quaint but outdated x87 instructions as a comparative benchmark.

There is also the question of where memory is headed. Intel is already running into this limitation with Nehalem where only the server and top end chips will get three memory channels. I'm sure Intel had intended that the mid desktop would get three as well, but without FBDIMM three channels would make the motherboards too expensive. This really doesn't leave Intel anywhere to go to get more bandwidth. Supposedly, AMD will begin shifting to G3MX which should easily allow it to match three or even four channels. However, it isn't clear at this point if AMD intends G3MX on the desktop or just the servers and high end like Intel. With extra speed from DDR3 this shift probably doesn't have to happen in 2009 but something like this seems inevitable by 2010.

Saturday, March 08, 2008

45nm, An Interesting Convergence In Design

AMD's K8 has been trailing C2D for the past 18 months. Making things worse, AMD has stumbled a bit with its K10 launch while Intel's Penryn seems to be on schedule. It is somewhat surprising then to discover that Intel's Nehalem and AMD's Shanghai are so similar.

Both are quad core, both use L3 cache, and both use a point to point interface (QuickPath for Intel and HyperTransport for AMD). And, according to Hans De Vries:

Nehalem
731 Million Transistors
246mm Die Size
7.1mm/MB L2 Cache Density

Shanghai
700 Million Transistors
243mm Die Size
7.5mm/MB L2 Cache Density

We don't really see much difference until we look at core size and L3 density:

Nehalem
24.4mm Core Size
5.7mm/MB L3 Cache Density

Shanghai
15.3mm Core Size
7.5mm/MB L3 Cache Density

AMD uses 2MB's of L2 + 6MB's of L3 for 8MB's total.
Intel uses 1MB of L2 + 8MB's of L3 for 9MB's total.

Along with similar total cache size the area devoted to cache is similar as well (nearly identical area for L3). However, the area devoted to core logic is quite different:

Nehalem
Core Area - 97.6mm
L2 Area - 7.1mm
L3 Area - 45.6mm

Shanghai
Core Area - 61.2mm
L2 Area - 15mm
L3 Area - 45mm

We see right away that Nehalem devotes 85% more die area to core logic than to cache whereas Shanghai devotes about the same die area to core logic and cache. With Nehalem's greater amount of core transistors it is almost a certainty that it will run faster than Shanghai. On the other hand it should also consume more power. If we assume that Intel gets a reduction in power draw due to having better 45nm transistors then this should offset some of Nehalem's higher power draw. However, with 60% more transistors devoted to core logic I don't believe that all of this could be offset. My guess is that at full load Nehalem will draw more than Shanghai but Nehalem should be closer at idle power. Actually, Nehalem's core ratio at 40% is almost the same as Penryn's at 41% which is only slightly less than Merom's at 44%. In contrast, Shanghai's core ratio has dropped to a tiny 25%, much smaller than Barcelona's 36%.
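For anyone who wants to check the ratios, here is a small Python sketch using the Hans De Vries area figures quoted above. The dictionary layout and function names are just my own illustration. Note that it works from die area, so treating the core area difference as a transistor difference assumes similar logic density on both chips.

```python
# Area figures as quoted in the post (die, core, L2 and L3 area per chip).
dies = {
    "Nehalem": {"die": 246.0, "core": 97.6, "l2": 7.1, "l3": 45.6},
    "Shanghai": {"die": 243.0, "core": 61.2, "l2": 15.0, "l3": 45.0},
}

def core_ratio(d):
    """Fraction of the whole die occupied by core logic."""
    return d["core"] / d["die"]

def core_vs_cache(d):
    """Core area divided by total (L2 + L3) cache area."""
    return d["core"] / (d["l2"] + d["l3"])

for name, d in dies.items():
    print(f"{name}: core ratio {core_ratio(d):.1%}, "
          f"core/cache {core_vs_cache(d):.2f}x")

# How much more area Nehalem spends on core logic than Shanghai does.
core_advantage = dies["Nehalem"]["core"] / dies["Shanghai"]["core"]
print(f"Nehalem core area vs Shanghai: {core_advantage:.2f}x")
```

The core/cache figure of roughly 1.85x for Nehalem is where the 85% comes from, and the roughly 1.6x core area ratio between the two chips is the basis for the 60% figure.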

Penryn has a density of 6.0mm/MB for L2. Therefore, I would expect Nehalem's L2 at a density of 7.1mm/MB to be faster than Penryn's L2. However, I would also expect Nehalem's L3 at 5.7mm/MB to be slightly slower than Penryn's L2. This is interesting because we know that Barcelona's L3 is a bit slow. However, with an L2 twice the size of Nehalem's this should be a closer match than Barcelona is to current Penryn with its massive 12MB L2. Essentially, Shanghai is Barcelona with 3X the L3 cache but Nehalem's cache structure is much more like Shanghai's than Penryn's. Shanghai is unlikely to have any significant core changes from Barcelona but there may be some minor changes in the decoding section. This is pretty much what I would expect. Since Barcelona only improved on the decoding speed of FP instructions compared to K8 I would expect AMD to tweak a few more Integer instructions in Shanghai to increase speed. Decreasing decoding time for commonly used instructions would also make a lot of sense given Penryn's advantage in decoders. AMD is not likely to get a boost of 10-15% doing this but a boost of, say, 3% is possible. AMD might see another 3% boost in speed due to larger L3 although this could possibly be higher if AMD could make L3 faster. A lot of people are wondering if AMD will bump the master L3 clock on Shanghai for better performance at speeds above 2.3Ghz. Just bumping the clock could be worth another 1%. So, I'm guessing a total speedup for Shanghai of perhaps 7% versus Barcelona.
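As a quick sanity check on that 7% guess, the three estimates can be combined either by simple addition or by compounding. This sketch assumes the gains are independent of one another, which is my simplification; the labels are only for illustration.

```python
# Estimated gains from the post, expressed as fractions.
gains = {
    "decode tweaks": 0.03,
    "larger L3": 0.03,
    "faster L3 clock": 0.01,
}

# Simple sum of the three estimates.
additive = sum(gains.values())

# Compounded: each gain multiplies the result of the previous ones.
compounded = 1.0
for g in gains.values():
    compounded *= 1.0 + g
compounded -= 1.0

print(f"Additive: {additive:.1%}, compounded: {compounded:.2%}")
```

At gains this small the two methods differ by only a fraction of a percent, so the rough total holds either way.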

There is one other obvious difference in the die layout. Shanghai has mirror image cores. This means that a master clock signal originating from the center of the Northbridge area should propagate out to each of the four cores at the same time and this is a standard way of doing timing. However, since Nehalem's cores are placed side by side it must be using a more sophisticated clocking scheme. In other words, there is not any place on the Nehalem die where a clock signal would reach all four cores at the same time. This would tend to suggest the possibility that each core actually has its own master clock allowing them to run nearly asynchronously. If Nehalem does indeed have this ability then this could be a big benefit in offsetting the higher power draw since cores that were less busy could simply be clock scaled downward. AMD could therefore be in for a tough fight in 2009 in spite of the similarities between Shanghai and Nehalem. The one big benefit to AMD from the similarity is that it makes it nearly impossible to cache tune the benchmarks in Intel's favor. This means that Intel is probably going to lose some artificial advantage that it has now with Penryn. However, given the difference in core transistors and Nehalem's greater memory bandwidth I can't see any reason why Nehalem couldn't make up for this loss and still have an overall performance lead at the same clock as Shanghai.

Friday, February 29, 2008

2008: AMD Still Trailing

Intel is still moving along about the same as it has been, making slow, incremental progress from the time that C2D was launched in 2006. It is clear that the increase in speed from Penryn doesn't match the early Intel hype but nevertheless any increase in speed is just that much more that it is faster than AMD. Likewise the tiny speed increases from 3.0 Ghz to 3.16 Ghz (quad core) and 3.4Ghz (dual core) are no doubt frustrating for Intel fans who would like more speed. On the other hand, AMD currently has nothing even close.

AMD will probably deliver 2.6 Ghz common chips in Q2. This chart at Computerbase claims AMD will release an FX chip in Q3. I'm not so sure about this because everything would suggest an FX of only 2.8 Ghz. This is probably the lowest clock that AMD could possibly get by with on the FX brand. There is no doubt that AMD needs a quad FX because people who bought FX in 2006 were promised upgrades and none have been forthcoming. Such a 2.8 Ghz FX would probably be clockable to 3.0 - 3.1 Ghz (with premium air cooling) based on what I've seen of the B3 stepping. This is probably the best AMD can do for now as I haven't seen anything that would suggest that B3 can deliver 2.8 Ghz as a common volume. This means that the poor man's version of FX, Black Edition, will probably bump up to 2.6 Ghz as well. Intel seems to be somewhat behind in terms of 45nm but this hardly matters since their G0 stepping of 65nm works so well. But there is no doubt that AMD will be facing more 45nm Penryns in Q2. The shortages of chips have shielded AMD somewhat from increased pressure from Intel during Q4 and Q1 (although with Barcelona delayed server share may take another hit in Q1). However, as Q2 is the lowest volume of the year AMD will have to be aggressive to avoid a volume share drop during that quarter.

Probably, Fuad is closer to the truth of FX saying Q3:

The Deneb FX and Deneb cores, both 45nm quad-cores, are the first on the list. If they execute well we should see Deneb quad-core with shared L3 cache and 95W TDP in Q3. If not, the new hope will slip into Q4.

The timeline for FX being Q3 or maybe Q4 is not surprising at all. What is surprising is the idea that AMD's first new FX chip would be 45nm. If this is true then this would support the notion that AMD has suspended development of 65nm. But it would be surprising if 45nm could ramp that quickly.

The question then is what will happen in Q3 as AMD faces a steadily increasing volume of Penryn chips. The rumors suggest that AMD will not try to release a 65nm 2.8 Ghz Phenom. I'm not sure if this would then indicate that the 65nm process would hit a ceiling or whether this is to suggest that AMD will pursue these speeds with 45nm Shanghai. Another question is what 9850 might be. 9750 is supposed to be 2.4 Ghz while 9950 has been suggested to be 2.6 Ghz. So, would 9850 be 2.5Ghz perhaps? The topping out of the naming scheme does lend some credibility to the idea that AMD will suspend 65nm development and try to move to Shanghai as quickly as possible. Nevertheless, there is a big, big question of whether AMD could really deliver a 2.8 Ghz 45nm Shanghai in, say, Q3. Ever since the release of 130nm SOI, AMD's initial clock speeds on the new process have always been lower so there is a lot of doubt that AMD could reach 2.8Ghz on 45nm any sooner than Q4 2008. Nehalem will almost certainly be too small of volume in Q4 to be much of a factor. So, it looks like AMD's goal is to somehow get clock speed up and this seems even less likely with a mid year switch in process unless with 45nm AMD exceeds all past SOI efforts.

Early 2009 looks pretty good for Intel since it will not only have Penryn and Dunnington but increasing volumes of Nehalem. It still remains to be seen if Intel really will give up its lucrative chipset business on the desktop with Nehalem. It certainly seems that it wouldn't take much effort to modify an AMD HT based chipset to work with Intel's CSI interface. That would seem to remove a lot of Intel's current proprietary FSB advantage. On the other hand, with ATI out of the way this would seem to be the best time for Intel to face more competition in chipsets. Still, this does leave things a bit up in the air. If it becomes easier and cheaper to design chipsets as it surely would be if CSI is similar to HT then VIA might become more competitive. For AMD's part there seems little they can do in 2009 except try to ramp the clock speeds on Shanghai.

We have three other issues to talk about: one immediate and two longterm. The immediate issue is Hester's interview at Hexus where he mentions the slow clock speeds of K10. Basically, Hester says that the 65nm process is fine; it is a matter of adjusting some critical paths. I've seen this statement heckled by some who insist that you can't separate process from design. Curiously, these are the same people who also insisted that Intel's 90nm process was fine and that it was only a poor design with Prescott that was the problem. Anyway, this statement by Hester actually seems quite accurate to me. It was my impression that AMD had intended K10 to run at lower voltage which would have allowed higher clocks. This again seems to fit what we've seen with K10's limited by TDP. The reason for the higher voltage seems to be that the transistors don't quite switch fast enough and this causes some of the "critical paths" that Hester talked about to get out of synch. You could fix this at a low level by improving the transistors to get them back into spec with the design. Or, you could relax the timing on these critical paths which would get the design on spec with the transistors. Because 45nm is right around the corner it appears that AMD has decided to not expend more resources on 65nm improvement and will instead relax the timing. AMD's work on 45nm transistors will theoretically migrate down to 65nm, at least this is the theory of AMD's CTI (Continuous Transistor Improvement) program. However, we may now be entering a new era where improvements are so specialized that they may be unable to cross process boundaries as they used to and we may see AMD following Intel's lead. This would mean tighter design at the beginning of each process node and less reliance on later improvements.

The two long term issues concern the possibility of a New York FAB for AMD and the announcement on EUV. There are three questions about a NY FAB: Does AMD need it? Can they afford it? And, why NY instead of Dresden where FAB 30 and 36 are now? Need is most obvious because without a new FAB AMD's capacity will top out by mid to late 2010 unless the FAB 38 ramp is slower than expected. Affording is a big question but one that AMD can leave aside for now hoping that their cash situation will improve. The question of location is a curious one. One suggestion was that NY simply offered more incentives than Dresden but this by itself seems unlikely. In every case in the past Germany has shown itself more than willing to contribute money for AMD's FABs. So, the real reason for the NY location may have more to do with other factors. In fact, we even seemed to have some evidence of this from the EUV announcement.

"The AMD test chip first went through processing at AMD’s Fab 36 in Dresden, Germany, using 193 nm immersion lithography, the most advanced lithography tools in high volume production today. The test chip wafers were then shipped to IBM’s Research Facility at the College of Nanoscale Science and Engineering (CNSE) in Albany, New York where AMD, IBM and their partners used an ASML EUV lithography scanner installed in Albany through a partnership with ASML, IBM and CNSE, to pattern the first layer of metal interconnects between the transistors built in Germany."

Secondly, we need to remember that AMD only fell behind on process technology when it moved to 130nm in 2002. Prior to this AMD was doing pretty well. Although things seemed to improve after AMD's rocky transition to 130nm SOI AMD now seems to be falling behind again at 45nm. AMD used to operate its Submicron Development Center (SDC) in Sunnyvale, California. This facility was leading edge back in 1999. It surely is not lost on AMD that they have now surpassed IBM. Back in 2002 AMD only had a 200mm FAB while IBM had a more modern 300mm FAB as well as more capacity. AMD today has caught up in terms of FAB technology but passed IBM in terms of capacity. The big question for AMD has to be how badly IBM needs leading edge process technology and for how long. Robust server and mainframe chips need reliability more than top speed. Secondly, IBM has been steadily divesting hardware so one has to wonder when the processor division might become a target. Notice that in the above announcement the wafers had to be flown from FAB 36 in Dresden to New York. Given these facts I think it is possible that AMD wants to create another research facility at New York. I think this could serve both to tweak processes faster and optimize them better for AMD's needs as well as pick up any slack if research at IBM falls off. There has been no indication of this but it does seem plausible.

The recent EUV announcement is incomplete however. If we look at an IBM article on EUV in EETimes from February 23, 2007 we see that IBM very much wanted EUV for 22nm but figured that it wouldn't be ready in time for early development work.

The industry hopes EUV will make it into production sooner than later, but the technology must reach certain milestones. "I think the next 9 to 12 months are very critical to achieve this," said George Gomba, IBM distinguished engineer and director of lithography technology development at the company.

Twelve months from February 2007 would be now. So, what is missing from the EUV announcement is whether or not this recent test puts EUV on track for IBM for 22nm or whether it will have to wait for 16nm. A second question is why the test wafer was made at Dresden by AMD. If IBM had already tested its own wafers then why didn't it announce earlier? This could mean that AMD has decided to try to hit the 22nm node for EUV but that IBM has decided to wait until 16nm. If this is a more aggressive stance for AMD then it could mean that AMD will rely less on IBM for process technology for 22nm. This again would support the idea that AMD wants a new design center in NY. I think it is entirely plausible that AMD could surpass IBM to become the senior partner in process development over the next few years.

Friday, February 15, 2008

Bandwagon Journalism (AMD is sooo last year)

Much like wearing Prada, it has become fashionable to attack AMD. Often it seems that web authors say negative things less because there is any valid reason and more because they simply want to be part of the crowd. A good example of this type of trash journalism is this Extreme Tech piece by Joel Durham. Durham and many others suggest that the evidence is everywhere that AMD became lazy and stopped innovating. The reality is that there is no such evidence.

Let's start with the argument that AMD has been generations behind Intel in terms of process technology. In early 2003, Intel's Northwood P4 was at 3.2Ghz while AMD's Barton K7 was at 2.2Ghz. Both were using a 130nm process.

Intel P4
2003, 3.2 Ghz - Q4 2006, 3.8 Ghz, 19% increase in 14 quarters

AMD K8
2003, 1.8 Ghz - Q4 2006, 2.8 Ghz, 56% increase in 14 quarters
Equal to K7, 2.0 Ghz - 2.8 Ghz, 40% increase
Equal to P4, 2.2 Ghz - 2.8 Ghz, 27% increase

After changing processes twice P4 topped out at 3.8 Ghz on 65nm, a very modest 19% increase in clock. AMD increased clock by 56% in the same period of time. Of course, it could be argued that K8's initial 1.8 Ghz was slower than the fastest 2.2 Ghz Barton at the time. However, looking at either the 2.0 Ghz point where K8 matched the fastest K7 or the 2.2 Ghz point where K8 was faster than the fastest P4 we still see that AMD increased clock more than Intel over the same period of time.

The second argument is that AMD has been doing much worse with 65nm than it has before with process technology and is way behind where it should be. This is not exactly true when compared to AMD's previous track record with 130nm SOI. It took AMD about a year to match K7's old process speed of 2.2 Ghz and deliver 2.2 Ghz K8's in volume in early 2004. We see an almost identical scenario with 2.8 ghz 65nm K8's now arriving about a year after their 90nm counterparts.

The third argument is that AMD's 65nm process is broken. The supposed evidence of this is that K8 hit 3.2 Ghz on 90nm while 65nm is only now at 2.8 Ghz. This may sound good but it isn't a fair comparison. AMD stopped developing the process used for K7 because K7 was on the old socket A and therefore had a limited lifespan. If K7 had continued to be developed it very likely too would have been at a higher speed in early 2004 when 2.2 Ghz K8 arrived. We could easily have been comparing 2.2 Ghz K8 to 2.4-2.6 Ghz K7 much as we see today. In fact, this very thing did happen to Intel with P4. Intel continued to develop the Northwood core on the old 130nm process and reached 3.46 Ghz which exceeded the initial 90nm clock speeds. In fact, if we use Intel's highest 130nm clock then Intel's efforts look particularly poor as we only then see a tiny 10% increase in clock speed to 3.8 Ghz on 65nm in the next 3 years. By the logic of the bandwagon analysts Intel's 90 and 65nm processes must have been broken.
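The clock speed percentages used in this comparison are easy to reproduce. Here is a short Python sketch with the start and end clocks taken from this post; the labels and function name are only my own shorthand.

```python
def pct_increase(start_ghz, end_ghz):
    """Percentage clock increase from start_ghz to end_ghz."""
    return (end_ghz - start_ghz) / start_ghz * 100.0

# (label, starting clock in Ghz, ending clock in Ghz), all from the post.
cases = [
    ("P4, 2003 to Q4 2006", 3.2, 3.8),
    ("K8, from initial clock", 1.8, 2.8),
    ("K8, from parity with fastest K7", 2.0, 2.8),
    ("K8, from the 2.2 Ghz point", 2.2, 2.8),
    ("P4, from fastest 130nm Northwood", 3.46, 3.8),
]

for label, start, end in cases:
    print(f"{label}: {start} -> {end} Ghz, "
          f"{pct_increase(start, end):.0f}% increase")
```

Rounded to whole numbers these come out to the 19%, 56%, 40%, 27%, and 10% figures given in the text.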

However, the reality is quite a bit different from such a superficial view. In between the period of 2003 and 2006 both companies shifted to dual core on P4 and K8 which slowed clock increases. We really can't compare one to one a dual core Tulsa at 3.8 Ghz to a single core Northwood Xeon at 3.46 Ghz. We clearly saw that even though AMD's 90nm process was mature by 2005 the initial clock speeds for X2 were 400 Mhz slower than the single core speeds. Adding in the core doubling factor we can see that the actual clock increases were greater than the apparent increases. Similarly today we see speeds being held back because of a shift from dual core to quad core.

It is clear then that K8's clock speeds advanced at a normal pace and that 65nm matches the rate of development of AMD's 130nm SOI process. This leaves the question of why the notion that AMD became arrogant and lazy has become so pervasive since 2006. There is no doubt that Intel scored a big win both by introducing an architecture with increased IPC and increasing clock at the same time. This is similar to Intel's introduction of Northwood P4 where the IPC increased over Willamette and the improved 130nm process allowed a faster clock. Compared to this AMD's necessary shift to revision F for DDR-2 seemed very disappointing. Thus at the end of 2006 Intel was at 3.0 Ghz on a 65nm process with quad core compared to AMD which was just introducing 65nm and was only at 2.8 Ghz with 90nm dual core. Some have tried to claim that AMD should have moved to 65nm earlier but FAB 30 was not capable of 65nm production and any money spent on outdated 200mm tooling for upgrades would have been wasted. AMD had to wait on FAB 36 and the 65nm ramp there seems in line with industry expectations.

So, in looking at AMD and Intel more closely we simply don't find the arrogance, laziness, or lack of innovation that it has become so chic lately to attribute (with an airy wave of the hand) to AMD. The change to a DDR-2 controller no doubt consumed development resources but added no speed to the processor itself. The bottom line was that Intel's 65nm process was mature when C2D arrived because it had already been wrung out with Presler and Yonah, and there was simply no possible way that K8 with internal 64 bit buses was going to compete with C2D with new 128 bit buses. Intel basically got lucky with quad core since the shared FSB architecture was adaptable to this. I saw a lot of people claim that AMD should have done an MCM type design with K8 but I still haven't figured out how well this would have worked with a single memory controller feeding two chips and the second chip being fed via an HT link. Presumably the performance would have been very similar to dual socket systems with only one chip having a connection to memory, and these only showed 50% performance for the second chip. I still have doubts that this at 2.8 Ghz would have had much effect in late 2006 and it seems that the memory bottleneck would simply have gotten worse as the speeds increased to 3.0 and 3.2 Ghz. Rather than laziness it is clear that 2006 found AMD with very few options to respond to Intel's successes.

I've seen comment after comment claiming that the purchase of ATI was a huge mistake. I'll admit that it cost a lot of money when AMD had none to spare but what exactly was the alternative? If AMD had not purchased ATI the five quarters worth of losses would have been the same. There was nothing about the ATI purchase that affected AMD's processor schedule. I've also seen claims that AMD overpaid for ATI and the supposed proof of this is the $1.5 billion loss of goodwill charged in Q4 07. The problem with this idea is that the purchase price had to include ATI's prospects, including business from Intel. Naturally, ATI lost this business when it was purchased by AMD. Since the loss of Intel business was a direct result of AMD's purchase, this loss of value at ATI was inevitable. However, on the positive side AMD acquired the 690G chipset, which remained the most popular chipset for AMD systems through 2007. Likewise it is a near certainty that the 790 chipset will be in 2008. AMD also gained a purpose designed mobile chipset. The lack of such a chipset prevented the superior Turion processor from matching Pentium M for the past several years. The importance of gaining this chipset is difficult to overestimate. This also puts AMD in a much more competitive position with Fusion. There is no doubt that AMD has troubles but I can't see any which were caused by the ATI purchase. Without the purchase AMD would have more money but its competitive position would be worse.

I've unfortunately found that when people state my position they usually only get it right about 1/3rd of the time. So, I'll try to state this clearly. It is obvious that AMD is behind and the clearest indication of this is the lack of FX chips. The Black Edition doesn't count since this is actually a volume chip, more like a poor man's version of FX. True FX chips are at the bin limits and therefore only available in limited quantities. The fact that FX chips have been replaced with Black Edition shows that even AMD knows that it is behind. AMD's official highest clock on 65nm for X2 is 2.8Ghz. This X2 4000+ review at OC Inside shows 2.93 Ghz at stock voltage. A margin of about 200Mhz is the difference between common and low volume. In other words, AMD should be capable of delivering FX 65nm X2 chips at 3.0Ghz. Of course, there would be no reason to since these would not be competitive. However, using the pattern of X2 we would assume that X4 would be 2.4Ghz common volume and 2.6Ghz FX volume. Again, a 2.6Ghz X4 would not be competitive as an FX chip so there are none.
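
The binning pattern above can be written out as a rough sketch. It assumes a flat 200Mhz gap between the common volume bin and the FX volume bin, which is my rule of thumb from the X2 numbers and not anything AMD has published. The name `fx_clock_mhz` is purely illustrative.

```python
# Assumed gap between the common-volume bin and the low-volume (FX) bin, in MHz.
BIN_MARGIN_MHZ = 200

def fx_clock_mhz(common_mhz, margin_mhz=BIN_MARGIN_MHZ):
    """Estimate the FX-volume clock from the common-volume clock."""
    return common_mhz + margin_mhz

x2_fx = fx_clock_mhz(2800)  # X2: 2.8 GHz common volume
x4_fx = fx_clock_mhz(2400)  # X4: 2.4 GHz common volume

print(f"X2 FX estimate: {x2_fx / 1000:.1f} GHz")
print(f"X4 FX estimate: {x4_fx / 1000:.1f} GHz")
```

The same rule applied to a 2.8Ghz common volume X4 would give 3.0Ghz at FX volumes.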

These clocks match closely with what we've actually seen. I have seen accounts of 2.7 and 2.8Ghz on Phenom X4 with stock voltage. This of course would contrast sharply with suggestions from places like Fudzilla that Phenom will top out at 2.6Ghz. One would assume that another quarter or so would give 2.8Ghz common volume for X4 in Q3, which would then seem to allow 3.0Ghz at FX volumes. These are both good questions: whether AMD could truly deliver 2.8Ghz in Q3 and whether AMD would consider 3.0Ghz fast enough for an FX branded chip. I have seen suggestions that AMD will abandon 65nm in favor of 45nm at mid year. However, this would not seem to match AMD's previous behavior since 65nm chips use the same socket and therefore would not be end of life as K7 was in 2003. It would seem more likely that AMD would continue to rely on 65nm during 2008 for the bulk of its chips and its highest speeds, and that 45nm, even if at reasonable volumes in Q4, will not reach competitive speeds until early 2009. In other words, barring a big process leap for 45nm I would expect AMD's best in 2008 to be 65nm. I don't suppose we will get any real idea of AMD's 45nm process until someone gets ahold of some 45nm ES chips and that probably won't happen any earlier than late Q2.