Showing posts with label Nehalem. Show all posts

Tuesday, November 10, 2009

Laying The Ground Work For Proper Testing

I've seen reviews of AMD's Phenom II and Intel's Nehalem. These reviews have varied a lot in quality, but none have really provided comprehensive results. It's time to find out for myself.

I originally bought an AMD Phenom II X3 720 Black Edition, mainly to have something to try out while I was deciding what quad core to buy. The 955 BE looked pretty good. The 965 had a higher base clock but was also more expensive and was rated at 140 watts. On the Intel side, the i7-920 was still much more expensive than the PII 965. But, recently, this all changed. AMD released a new C3 stepping of the PII 965 that is rated at 125 watts. Surprisingly, it was released at $200 instead of the $250 that most had been expecting. So, I ordered one, along with an Asus M4A79X motherboard, which is similar to my M4A785 board but without graphics. I also purchased an ATI HD 4650 and a couple of ATI HD 5770 graphics cards.

Then I noticed that the i5-750 was the same price, $200, and that I could get an Asus P55 motherboard without graphics that was almost identical to the 79X board at the same $120 price. I ordered those as well. This will give me two almost identical systems. Both systems will be native quad core with an onboard memory controller and two memory channels. This should be an excellent head-to-head, dollar-for-dollar test. I'll use DDR3-1600 memory rated at CL 8. This makes the most sense because CL 7 memory is still less common, and faster memory rated at 1800 - 2100 MHz tends to be twice the cost. I'm getting a couple of moderate-sized third party coolers to test overclocking, although I'm also interested in how much headroom there is with the stock HSF. Moderate-sized means close to 500 grams in weight, under 130 mm tall, and using 92 mm fans. This compares with the heavy coolers, which tend to be closer to 160 mm tall, weigh upwards of 700 grams, and use 120 mm fans.

The hardware is perfect; this is the closest match of AMD and Intel hardware that I've seen in a number of years. I don't think we've had this close a comparison since the K7/K8 single core days. The question now is how to test. I'm working on that. My game Dawn of War has a graphics check to see what level is playable. With my IGP 785 graphics the game is only playable at minimum settings. I can check this again with the HD 4650, HD 5770, and HD 5770 Crossfire. To be honest, I don't expect to see much difference between the i5 and PII 965 systems. I'll also compare with my X3 720 to see if having another core makes any difference. PassMark's PerformanceTest also includes both 2D and 3D graphics tests. I can say that my 785 graphics fail miserably on the last two tests. I'll give these a try, but I wouldn't be surprised if they are low enough stress that even the 4650 card passes. I'm hoping that Dawn of War will require a bit more, although it too may top out before reaching the level of Crossfire.

For integer testing I'm thinking about something based on GMP, since this library shouldn't be tuned for either AMD or Intel. Use of the Intel compiler is obviously out since this would contaminate the metrics. I have Visual Studio, but this compiler is only middle of the road in terms of what it produces. The next version is looking much better and it is just now available in beta, so we'll see. Better code would be nice, but bugs in the beta version could also contaminate the metrics. However, even the current version should be adequate for integer code; it is the SSE code that is more of a concern. SSE2 is getting a bit dated. SSE3 is about the minimum level that would be nice to test. Better would be SSE4a versus some or all of Intel's SSE4. I don't have a requirement for full SSE4 testing since a fair bit of this will be replaced with Intel's next upgrade, much as SSE became less important as wider SSE functions were added.

My operating system is 64 bit. I'm using 8 GB of memory and see no reason to waste time with 32 bits. Anything that I compile will be 64 bit. I do plan to test with both 2 DIMMs and 4 DIMMs to see if there is any difference. With my system I haven't seen any significant difference in timing or top speed. The two standard cases that I'll be using are both CoolerMaster cases with 200 mm fans in front and top and a 120 mm fan in the back. I do have a smaller case that has only a single 120 mm fan in the back which I could test with. Personally, the notion of putting a $200 processor in a $60 case seems a little goofy, but this could show what type of environment would be acceptable. Frankly, I wouldn't be surprised if the 125 watt PII 965 were too much for that case.

I'm also glad that I got the X3 720 first since it is rated at 95 watts just like the i5-750. This should give me a pretty good comparison of two 95 watt systems, although since the i5 is a quad core I would expect it to be more powerful. I suppose if the i5 turned in thermals similar to my 720 while matching the performance of the 965, that would be quite a feather in Intel's cap since it would mean that i5's could be used in smaller cases with less cooling. Overall, it wouldn't be any great victory though, since there is no price advantage. Of course, it has been suggested that Intel's power rating is bogus and is actually higher than they claim. Others have tested and insisted that Intel draws less power. Again, I don't really care about previous power draw tests since I have a 95 watt X3 and a 125 watt X4 to compare with. I suspect that since the i5 is on the lower end of the Nehalem range it will actually fall in between these two, but again I don't know without testing.

And, I have two graphics-free motherboards so I can test without the contaminating effect of integrated graphics. Given the huge gap between AMD and Intel integrated graphics there really is no way to directly compare them. The simplest solution is to discard the integrated graphics and use the same discrete graphics card. For lower level tests or small case tests I would use the HD 4650, which is a pretty solid, middle of the road card. I wouldn't expect a small case with one 120 mm fan to be able to handle one 5770, much less two; nor would I expect it to handle an overclocked 125 watt processor. I know that I haven't had any thermal problems with my X3 720, but that case is well ventilated and it is only a 95 watt processor with integrated graphics. If the i5 with the 4650 does not pass a small case test then I can still project how well the i5 would do with integrated graphics by comparing with the 720 and the 785 motherboard. And, if the tests are borderline, I picked up a heftier, 40 CFM, 120 mm Scythe fan which would boost cooling in the small case. This should allow a pretty good inference for cooling with two regular 120 mm fans. At any rate, thermal comparisons should settle any question of thermal issues for either Intel or AMD.

I still don't know if my thoughts about case testing are clear. I take issue with the open case, huge cooler testing that they do over at Anandtech. Likewise I take issue with the schizophrenic testing they do over at Toms Hardware Guide. I mean, who in their right mind would put four graphics cards in a small case? I don't really care that much about testing power draw. If electricity is really a concern you could always buy a lower power system with slower memory and a 65 or even 45 watt processor and use integrated graphics. But most people are not that concerned about it. Of more concern is whether or not a given case will work with a given system. Generally, everything that people do to increase speed also increases heat. Voltage is increased on the memory, on the CPU, and even on the graphics. Higher clock speeds, more memory, and faster graphics all use more power. We all know that, at various times in the past, thermals were an issue. AMD had K7's before Barton that ran hot, and Intel had Prescott, which ran very hot and begat the BTX case as a desperate solution. There had been rumors of higher clocked AMD dual core and Intel quad core 65nm processors running hot. Of course, now everyone is on 45nm, but the top end chips are still rated at 125 or even 140 watts. My results should have much more practical value to people who would like to build lower end and midrange systems rather than just people at the very top end.

Keep in mind that thermal testing is somewhat separate from performance testing. You can't really begin benchmarking unless you know a given system is reliable. Too often, it seems that reviewers arrive at a hasty estimate of maximum clock and then run their benchmarks without really knowing how stable the system is (unless it crashes during the tests). I suppose that and time pressure is why they take so many shortcuts. It takes hours just to run memory stability tests and hours more to run system and stress tests. And this has to be repeated when trying to find a maximum overclock. Lots of variables like memory voltage, northbridge speed, CPU voltage, and base clock versus multiplier all add up to hours and hours of added testing. And this is before any actual benchmarks are run. I am confident in the settings on my system. I run the CPU at 3.4 GHz. I've tested it much higher. I run with auto voltage on both the CPU and NB. I run with the base clock at 200 MHz and the NB at 2.6 GHz. I have the memory at 1333 MHz with CL 7 and 1.545 volts. Auto doesn't work with the memory since auto is 1.5 volts and it will get errors at 1.515 volts with these settings. I'm confident that this is maximum performance for my system. I've tested overclocking the graphics, but this isn't really worthwhile since you can get many times better performance without stressing the chipset just by putting in a moderate graphics card like the HD 4650. I expect both the i5 and the 965 to show improved performance over my current system.

Saturday, January 17, 2009

Seeking A New System

Finally, information is arriving on Phenom II (Shanghai) versus Penryn and i7 (Nehalem). It's time to start thinking about a new desktop.

Generally when I buy a new system I end up getting something 3X more powerful than my old system. However, the last time I bought a new computer I wanted a notebook, so I ended up getting a desktop replacement system with a 2.0GHz mobile Athlon 64, which is only about 67% faster than my old 1.8GHz P4 system. This has put me a little behind, so I would really like something about 9X faster than the old P4 to catch up. This requirement would be met with a tri-core at 3.0GHz or a quad at only 2.3GHz. There are no X3's available at 3.0GHz yet, but both Intel and AMD easily fulfill the quad requirement. However, I was reminded of Anandtech's past views about processor value after some back and forth with Johan De Gelas.
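As a sanity check on that 9X target, here's a rough sketch of the arithmetic. The 1.8x per-clock figure for a modern core versus the old P4 is my assumption, chosen so the stated examples work out; real scaling is never this linear:

```python
# Back-of-envelope speedup versus the old single-core 1.8GHz P4.
# IPC_RATIO is an assumed per-clock speedup of a modern core over the P4;
# linear scaling across cores is an idealization.
P4_CLOCK = 1.8   # GHz
IPC_RATIO = 1.8  # assumed, not measured

def multiple(cores, clock_ghz):
    """Overall speedup versus the single-core P4 baseline."""
    return cores * (clock_ghz / P4_CLOCK) * IPC_RATIO

print(round(multiple(3, 3.0), 1))  # tri-core at 3.0GHz: 9.0
print(round(multiple(4, 2.3), 1))  # quad at 2.3GHz: 9.2
```

Under those assumptions both the 3.0GHz tri-core and the 2.3GHz quad land right around the 9X mark.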

AMD's dual core Opteron & Athlon 64 X2 - Server/Desktop Performance Preview

"The real problem is that AMD has nothing cheaper than $530 that is available in dual core, and this is where Intel wins out. With dual core Pentium D CPUs starting at $241, Intel will be able to bring extremely solid multitasking performance to much lower price points than AMD"

Athlon Dual Core: Overclocking the 4200+

"The pricing, however, was a little hard to swallow with the range from just over $500 for the lowest-priced 4200+ to around $1000 for the top-line . . .

With the 4400+ sporting 1MB cache on each core, and only a few dollars more than the 512KB 4200+, we would suspect the 4400+ may well be the Dual-Core to buy"

Affordable Dual Core from AMD: Athlon 64 X2 3800+:

"and today, AMD is launching it - the $354 Athlon 64 X2 3800+; the first somewhat affordable dual core CPU from AMD.

The cheapest dual core Pentium D processor could be had for under $300, yet AMD's cheapest started at $537. Intel was effectively moving the market to dual core, while AMD was only catering to the wealthiest budgets.

The Pentium D 820, running at 2.8GHz and priced at $280, offered the most impressive value that we've seen in a processor in quite some time"


Of course, Anandtech has been leaning towards Intel for so long, I doubt they even remember what they said back in 2005. I suppose this could be why Anandtech never complains about price with Intel's Nehalem or Q9650 as they did (over and over) with AMD's X2. So, a $500 price tag is "catering to the wealthiest budgets", $354 is only "somewhat affordable", $280 is "impressive value", and more L2 cache is better at $44 because this is "only a few dollars more". I suppose someone could argue that inflation has raised the price point since 2005; however, this has been more than offset by the decrease in system prices. So, let's see what happens if we use Anandtech's past, vigorously defended (but now apparently forgotten) criteria with today's chips:

We immediately eliminate the Intel i7 940 at $565 and the Q9650 at $550, plus anything higher. At first glance, the i7 920 would seem to be okay at $295. However, by the time we allow for the extra money required for the more expensive X58 motherboard and DDR3 memory we are again back up to $500, so it gets eliminated as well. This leaves the Q9550 and below. However, Anandtech also preferred the 4400+ because it had twice as much L2 cache as the 4200+. This would make it very difficult to choose the $250 2.66GHz Q9400 with 6MB L2 over the faster $295 2.83GHz Q9550 with 12MB L2. In fact, the difference in price is almost identical to the difference between the 4400+ and 4200+. And, if double the cache is good at $44 then presumably triple the cache would be equally good at $66. But with only 1/3rd the cache the Q8300 is only $57 cheaper, so it gets eliminated too. For Intel this only leaves the $190 2.4GHz Q6600 with 8MB L2 and the $180 2.33GHz Q8200 with 4MB L2. So, we have a choice between the outdated B3 stepping Q6600 and the newer 45nm Q8200 with half as much cache.
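Applied mechanically, the first elimination pass can be sketched like this. The $500 cutoff and the $205 X58/DDR3 platform premium for the i7 920 are my reading of the numbers above, not precise figures:

```python
# Early-2009 quad prices from the text; the i7 920 carries an assumed
# platform premium for the X58 board and DDR3 memory.
quads = {
    "i7 940": 565,
    "Q9650": 550,
    "i7 920": 295 + 205,  # CPU plus assumed X58/DDR3 premium
    "Q9550": 295,
    "Q9400": 250,
    "Q8300": 238,         # $57 cheaper than the Q9550
    "Q6600": 190,
    "Q8200": 180,
}

CUTOFF = 500  # the 2005 "catering to the wealthiest budgets" line

survivors = sorted(name for name, price in quads.items() if price < CUTOFF)
print(survivors)  # ['Q6600', 'Q8200', 'Q8300', 'Q9400', 'Q9550']
```

The cache-based eliminations then whittle this list down further, as described above.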

1/20/2009: The prices changed since I wrote this so I'll amend my comments. The Intel 3.0GHz Q9650 is now "somewhat affordable" at $350 and therefore at the top of the midrange. It is not a great bargain since 20% more money only gets you 6% more performance. Or, at least it does as long as the code can run from the L2; the Q9650 shares the same FSB bottleneck as the Q9550. The price of the Q9550 has hardly changed and at $290 is just $5 cheaper than it was. The Q9400 still gets eliminated because of the $50 price difference and half the cache, but we retain the Q8300 at $205 because it is now $85 cheaper than the Q9550.
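The "20% more money for 6% more performance" claim is easy to check from the listed prices, using clock as a rough proxy for performance:

```python
# Q9650 versus Q9550, prices and clocks as quoted above.
q9650_price, q9650_clock = 350, 3.00
q9550_price, q9550_clock = 290, 2.83

extra_money = q9650_price / q9550_price - 1   # about 0.21
extra_clock = q9650_clock / q9550_clock - 1   # about 0.06

print(f"{extra_money:.0%} more money for {extra_clock:.0%} more clock")
```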

The problem is that AMD had price drops as well, cutting Phenom II by $40. This makes the Q9550 less of a bargain since the PII 940 is now $55 cheaper. This is enough for a nice upgrade on the video card, and to be honest I'd rather have a PII 940 with an HD 4850 than a Q9550 with an HD 4830. The drop in price of the PII 920 to just $195 now makes it quite a value and throws a big monkey wrench into Intel's prices. The Q9400, Q8300, Q6600, and Q8200 are all now about $45 too expensive. A choice between a cache-crippled 2.33GHz Q8200 for $170 and a PII 920 at 2.8GHz for just $25 more is a no-brainer. On the other hand, a Toliman system would be $50 cheaper, again allowing a nice video upgrade.


If I were looking for a stopgap system I might go with the Intel $190 Q6600 or $180 Q8200. The AMD Black Edition Phenoms at $150 these days are also a bargain, as is the $119 2.4GHz 8750 BE Toliman tri-core when matched up with a 750 southbridge. I would be comfortable with any of these chips in a bargain or stopgap system. However, since AMD tripled the L3 with its newer chips, we are again pushed towards the $235 Phenom II X4 920 at 2.8GHz or the $275 Phenom II X4 940 BE at 3.0GHz for a midrange system.

I don't usually bother overclocking but this is a pretty simple operation if you match a 750 southbridge with the AMD quads. If I were comparing with the 9950 BE or 8750 BE in terms of overclocking I might give the B3 Q6600 the benefit of the doubt. But in the company of the newer 45nm chips the vintage Q6600 is clearly out of its league in terms of overclocking. Given a choice between the 920 and 940 AMD quads I would have to choose the 940 because its Black Edition unlocked multiplier would make it easier to overclock.

The Intel Q9550 is technically faster; however, it gets choked by its slower 1333MHz FSB. To match the AMD 940 it would need 2133MHz. However, even the still slower 1600MHz FSB is only available on the insanely overpriced QX9770 for $1,460. Intel integrated graphics still lag behind both nVidia and AMD, so an nVidia motherboard is required for Intel unless I want to add a discrete graphics card from the start. The Dragon platform is the obvious choice for AMD, although nVidia options are only a little worse in terms of integrated graphics.

For a discrete graphics card the 4000 series would be the obvious choice for something from AMD (ATI). I would probably be looking at a $140 HD 4850 or the $100 HD 4830 (which has about 75% of the 4850's performance). The nVidia equivalent of the 4830 would be the 9800GT at $115, and the 4850 equivalent would be the 9800GTX+ at $165. The problem is that for $165 I can get a 1GB 4850 instead of a 512MB one. And, if I move up to the 1GB version of the 9800GTX+ I'm at $185, which is only $5 less than the much more powerful HD 4870. The GTX 260 is more expensive for less performance than the 4870, so it isn't worth considering.
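Those card prices reduce to rough dollars-per-performance numbers. The relative-performance values here are assumptions taken from the text (the 4830 and 9800GT at ~75% of a 4850, and the 9800GTX+ roughly equal to a 4850):

```python
# name: (price in dollars, performance relative to an HD 4850)
cards = {
    "HD 4830":  (100, 0.75),
    "HD 4850":  (140, 1.00),
    "9800GT":   (115, 0.75),
    "9800GTX+": (165, 1.00),
}

# Lower is better: dollars per unit of 4850-level performance.
ranked = sorted(cards, key=lambda n: cards[n][0] / cards[n][1])
for name in ranked:
    price, perf = cards[name]
    print(f"{name}: ${price / perf:.0f}")
```

On these assumed numbers both ATI cards come out ahead of their nVidia equivalents on a dollars-per-performance basis.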

I also wasn't exactly surprised by Anandtech's conclusion in AMD Phenom II X4 940 & 920: A True Return to Competition:

"Compared to the Core 2 Quad Q9400, the Phenom II X4 940 is clearly the better pick. While it's not faster across the board, more often than not the 940 is equal to or faster than the Q9400. If Intel can drop the price of the Core 2 Quad Q9550 to the same price as the Phenom II X4 940 then the recommendation goes back to Intel. The Q9550 is generally faster than the 940"

This conclusion is a bit out of whack since their own data shows that the 940 compares favorably with the Q9550. I would also guess that the 940 would compare even better if Anandtech stopped avoiding mixed testing (as they have for the past two years).

At any rate, I tend to agree with the guidelines that Anandtech used back in 2005. Today, I could easily recommend a Core 2 Quad Q9550 system with an Intel motherboard if someone were planning to add discrete graphics anyway, or with an nVidia motherboard for its better integrated graphics. An equally good choice would be a Phenom II 940 system with either discrete graphics or integrated graphics on an AMD Dragon platform. An nVidia motherboard would be adequate with Phenom II, but if an older Phenom or Toliman were chosen then an AMD motherboard with a 750 southbridge would be preferred.

1/20/2009: With the current prices the Q9550 is still acceptable but a worse bargain than a PII 940 system. In other words, at an equivalent price it would be nearly impossible to assemble a Q9550 system equal to the PII 940 system. The drop in the PII 920 price has also greatly reduced the value of the lower end Intel quads, and I could no longer recommend them even for a bargain or stopgap system. If you have money to burn, the Q9650 is less value for the money but would be an upgrade over a PII 940 or Q9550 system.

Monday, December 29, 2008

Still Waiting

I should have some spare money by the end of January and plan to buy a new computer. However, like most people I am still waiting on comparisons between AMD's 45nm Shanghai and Intel's new Nehalem. The silence is deafening.

Reports trickle in from places like the XtremeSystems Forums and other scattered locations of people doing freelance testing on Shanghai. The indications are pretty good for AMD. It looks like Shanghai gets a fairly good boost at the same clock versus Barcelona while overclocking much better. It also appears that Nehalem is worse at overclocking than Penryn. That doesn't really surprise me since an IMC seems to be tougher to overclock than an FSB, and putting the memory controller on the die increases the thermal load. Frankly, I was impressed by how much smaller the HSF was for Penryn versus C2D and was equally surprised to see how much bigger the HSF was for Nehalem (larger than C2D). I don't think that Intel is in any danger of another Prescott, but some of the blush is definitely off the Penryn rose. I'm also wondering if reports of Nehalem overheating with the stock HSF (even with the larger size) are true.

Still, this is going to be a problem for Anand Lal Shimpi and I'm not sure what he is going to do about it. See, back in late 2005 Intel was getting thumped hard by the Athlon X2, and Anand decided to be value conscious and railed against the X2's "insane price". Then as C2D was released in 2006, Anand conveniently dropped his previous objection to price. In fact, these days you never even see him mentioning insane prices when talking about Intel's Skulltrail system. Nevertheless, he did try to excuse his change on prices by noting that even Intel's low end chips were much better at overclocking than AMD's.

I still have reservations about judging the value of a chip by overclocking because so few people do it. The truth is that most systems run at stock speed with stock cooling and integrated graphics. So, it is somewhat odd that Anandtech bases its value on discrete graphics with overclocking and premium cooling. Perhaps the fact that Intel excelled at overclocking with premium cooling but needed a graphics card to make up for its poor integrated graphics is just a coincidence . . . maybe. I would be happier if Anandtech did this in two parts, one for common users and another for performance users. But at any rate, I will readily admit that AMD wasn't even in the running for OC'ing until they released the B3 stepping with the SB750 southbridge. So, one has to wonder if Anandtech will suddenly change its stance on value based on overclocking now that Intel's once proud OC banner is lying on the ground in tatters.

It could be the fact that Anandtech does about three Intel articles for every AMD article that makes it appear biased. Or perhaps it was when Johan De Gelas admitted that their server testing was 18 months out of date, and that, coincidentally, the things that they were behind on were the things that Intel's chips did poorly. I know that Anandtech took a big hit in credibility when they first criticized AMD's 4000 series for requiring a dual GPU card and then turned around and chose a dual nVidia card as the best value. Maybe I'm biased and just not giving Anandtech a fair shake. Maybe, but then the numbers agree with me. AMD's graphics sales today are mostly the new 4000 series while nVidia's are still the older 9000 series. In fact, demand for nVidia's GT200 series is about as dismal as it was for AMD's 2000 series.

I know what the rumors are. The rumors are saying that Anandtech has already benched Shanghai and it does pretty well, with Intel only holding onto the really expensive (overpriced) top slot. If prices don't change then Intel's i7 920 at 2.66GHz is going to be going head to head at $300 with AMD's Phenom II 940 at 3.0GHz. So, with equal price that would put Intel at a 13% clock disadvantage right off the bat. And, without the fig leaf of better overclocking, it has been suggested that Anand is having a hard time spinning the comparison in Intel's favor. Time will tell, but I certainly hope we see some real numbers over the next month.

Monday, July 14, 2008

Reviews And Fairness Or How To Make Intel Look Good

I've had people complain that I've been too tough on Anand but in all honesty Anandtech is not the only website playing fast and loose with reviews.

Anand has made a lot of mistakes lately that he has had to correct. But, aside from mistakes Anand clearly favors Intel. This is not hard to see. Go to the Anandtech home page and there on the left just below the word "Galleries" is a hot button to the Intel Resource Center. Go to IT Computing and the button is still there. Click on the CPU/Chipset tab at the top and not only is the button still there on the left but a new quick tab to the Intel Resource Center has been added under the section title right next to All CPU & Chipset Articles. Click the Motherboards tab and the button disappears but there is the quick tab under the section title. There are no buttons or quick tabs for AMD. In fact there are no quick tabs to any company except Intel. Clearly Intel enjoys a favored status at Anandtech.

What we've seen since C2D was released is a general shift in benchmarks that favor Intel. In other words, instead of shooting at the same target we have reviewers looking to see where Intel's arrows strike and then painting a bullseye that includes as many of them as possible. For example, encryption used to be used as a benchmark, but AMD did too well on this, so it was dropped and replaced with something more favorable to Intel. There has been a similar process for several other benchmarks. Of course, now it isn't just processors. Reviewers have carefully avoided comparing the performance of high end video cards on AMD and Intel processors. Reviews are typically done only on high end Intel quad cores. The claim is that this is for fairness, but it also avoids showing any advantage that AMD might have due to HyperTransport. It is a subtle but definite difference where review sites avoid testing power draw and graphics performance with just integrated graphics, where AMD would have an advantage. They then test performance with only a midrange graphics card to avoid any FSB congestion, which again might give AMD an advantage. Then high end graphics cards are tested on Intel platforms only, which avoids showing any problems that might be particular to Intel. We are also now hearing about speed tests being done with AMD's Cool'n'Quiet turned on, which by itself is good for a 5% hit. I suppose reviewers could try to argue that this is a stock configuration, but these are the same reviewers who tout overclocking performance. So, by shifting the configuration with each test they carefully avoid showing any of Intel's weaknesses. This is actually quite clever in terms of deception.

As you can imagine the most fervent supporters of this system are those like Anand Lal Shimpi who strongly prefer Intel over AMD. I had one such Intel fan insist that Intel will show the world just how great it was when Nehalem is released. However, I have a counter prediction. I'm going to predict that we will see another round of benchmark shuffling when Nehalem is released. And, I believe we will see a concerted effort to not only make Nehalem look good versus AMD's Shanghai but also to make Nehalem look good compared to Intel's current Penryn processor. It would be a disaster for reviewers to compare Nehalem and conclude that no one should buy it because Penryn is still faster . . . so that isn't going to happen.

An example is that since AMD uses separate memory areas for each processor, it needs an OS and applications that work with NUMA. In the past, reviewers have run OSes and benchmarks alike, oblivious to whether they worked with NUMA or not. If anything seems to be overly slow they just chalk it up to AMD's smaller size, lack of money, fewer engineers, etc. Nehalem, however, also has separate memory areas and needs NUMA as well. I predict that these reviewers will suddenly become very sensitive to whether or not a given benchmark is NUMA compatible and will be quick to dismiss any benchmark that isn't. This may extend so far as to purposely run NUMA on Penryn to reduce its performance. This would easily be explained away as a necessary shift while ignoring that it wasn't done for K8 or Barcelona. That would be explained away as well by saying that the market wasn't ready for it yet when K8 was launched. That was what happened with 64 bit code, which was mostly ignored. However, if Intel had made the shift to 64 bits, reviewers would have fallen all over themselves to do 64 bit reviews and proclaim AMD as out of date, just as they did every time Intel launched a new version of SSE.

We see this today with single threaded code. C2D and Penryn work great with single threaded code but have less of an advantage with multi-threaded code and no actual advantage with mixed code. It is a quirk of Intel's architecture that sharing is much more efficient when the same code is run multiple times. If you compared multi-tasking by running a different benchmark on each core, Intel would lose its sharing advantage and have to deal with more L2 cache thrashing. Even though mixed code tests would be closer to what people actually do with processors, reviewers avoid this type of testing like the plague. The last thing they want to do is have AMD match Intel in performance under heavy load, or worse still, actually have AMD beat a higher clocked Penryn. But Nehalem uses HyperThreading to get its performance, so I predict that reviewers will suddenly decide that single threaded code (which they prefer today) is old fashioned and out of date and not so important after all. They will decide that the market is now ready for it (because Intel needs it, of course).

Cache tuning is another issue. The P4EE had a large cache, as did Conroe. C2D doubled the amount of cache that Yonah used, and Penryn has even more. However, reviewers carefully avoid the question of whether or not processors benefit from cache. This is because benchmark improvements due to cache tend to be paper improvements that don't show up on real application code. So, it is best to avoid comparing processors of different cache sizes to see whether benchmarks are getting artificial boosts from cache. I did have one person try to defend this by claiming that programmers would of course write code to match the cache size. That might sound good to the average person, but I've been a programmer for more than 25 years. Try to guess what would happen on a real system if you ran several applications that were all tuned to use the whole cache. Disastrous is the word that comes to mind. But you can avoid this on paper by never doing mixed testing. A more realistic test for a quad core processor is to run something like Folding@Home on one core and graphics encoding on another while using the remaining two to run the operating system and perhaps a game. Since the tests have to be repeatable you can't run Folding@Home as a benchmark, but that isn't a problem since it is the type of processing that needs to be simulated rather than the specific code. For example, you could probably run two different levels of Prime95 tests in the background while running a game benchmark on the other two cores to have repeatable results. And, if you do run a game benchmark on all four cores, then for heaven's sake use a high end graphics card like a 9800 GX2 instead of an outdated 8800.
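The mixed-load idea can be sketched in a few lines: pin synthetic CPU burners (standing in for Prime95) to two cores and leave the rest free for the game benchmark. This is a minimal Linux sketch using `os.sched_setaffinity`; on Windows you would use `start /affinity` or a tool like psutil instead:

```python
import multiprocessing
import os
import time

def burner(core: int, seconds: float) -> None:
    """Busy loop pinned to a single core, standing in for a Prime95 worker."""
    os.sched_setaffinity(0, {core})  # 0 means "this process"
    deadline = time.monotonic() + seconds
    x = 1.0001
    while time.monotonic() < deadline:
        x = x * x % 1e9  # meaningless math, just load

if __name__ == "__main__":
    # Pin background load to the first two available cores; the game
    # benchmark would then run on the remaining cores.
    cores = sorted(os.sched_getaffinity(0))[:2]
    workers = [multiprocessing.Process(target=burner, args=(c, 0.5))
               for c in cores]
    for w in workers:
        w.start()
    # ... run the foreground benchmark here ...
    for w in workers:
        w.join()
    print("background load finished")
```

Because the background load is synthetic and fixed in duration, the foreground benchmark stays repeatable while the shared cache is under realistic contention.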

Cache will be an issue for Nehalem because it not only has less than Penryn but less than Shanghai as well. It also loses most of its fast L2 in favor of much slower L3. My guess is that if any benchmarks are faster on Penryn due to unrealistic cache tuning, these will be quickly dropped. That reviews shift with the Intel winds is not hard to see. Toms Hardware Guide went out of its way to "prove" that AMD's higher memory bandwidth wasn't an advantage and that Kentsfield's four cores were not bottlenecked by memory. But now that Nehalem has 3 memory channels, the advantage of more memory bandwidth is mentioned in every preview. We'll get the same thing when Intel's QuickPath is compared with AMD's HyperTransport. Reviewers will be quick to point to raw bandwidth and claim that Intel has much more. They of course will never mention that the standard was derated from the last generation of PCI-e and that in practice you won't get more bandwidth than you would with HyperTransport 3.0.

I could be wrong; maybe review sites won't shift benchmarks when Nehalem appears. Maybe they will stop giving Intel an advantage. I won't hold my breath though.

Addition:

We can see where Ars Technica discovered a PCMark 2005 error. Strangely, the memory score is faster when PCMark thinks the processor is an Intel than when it thinks it is an AMD. Clearly the bias is entirely within the software since the processor is the same in all three tests:



I've had Intel fans claim that it doesn't matter if Anandtech cheats in Intel's favor because X-BitLabs cheats in AMD's favor. Yet, here is the X-Bit New Wolfdale Processor Stepping: Core 2 Duo E8600 Review from July 28, 2008. In the overclocking section it says:

At this voltage setting our CPU worked stably at 4.57GHz frequency. It passed a one-hour OCCT stability test as well as Prime95 in Small FFTs mode. During this stress-testing maximum processor temperature didn’t exceed 80°C according to the readings from built-in thermal diodes.

This sounds good, but the problem is that to test properly you have to run two separate copies of Prime95 with core affinity set so that one copy runs on each core. This article doesn't really say that they did that. There is a second problem as well, dealing with both stability and power draw testing:

We measured the system power consumption in three states. Besides our standard measurements at CPU’s default speeds in idle mode and with maximum CPU utilization created by Prime95

This is actually wrong; the Prime95 test they performed was not a maximum power draw test; it was Prime95 in Small FFTs mode. That doesn't agree with Prime95 itself, which clearly states in the Torture Test options:

In-place large FFTs (maximum heat, power consumption, some RAM test)

So, the Intel processors were not tested properly. Using Small FFTs tests neither maximum power draw nor maximum temperature, and therefore doesn't really test stability. If any cheating is taking place at X-Bit, it seems to be in Intel's favor.
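The per-core affinity testing called for above can be sketched in a few lines. This is a minimal stand-in, not Prime95 itself: one worker is pinned to each core so that no core sits idle while another does all the work. The `os.sched_setaffinity` call is Linux-specific, and the arithmetic loop is only a placeholder for Prime95's FFT torture loop:

```python
# Sketch: one stress worker pinned per core, standing in for running
# two separate Prime95 copies with core affinity set.
import multiprocessing
import os

def pinned_burn(core, iters=200_000):
    os.sched_setaffinity(0, {core})     # pin this worker to a single core (Linux)
    acc = 0
    for i in range(iters):              # hot loop keeps the pinned core busy
        acc += i * i
    return sorted(os.sched_getaffinity(0))  # report where the worker actually ran

if __name__ == "__main__":
    cores = sorted(os.sched_getaffinity(0))[:2]  # first two available cores
    with multiprocessing.Pool(len(cores)) as pool:
        placements = pool.map(pinned_burn, cores)
    print(placements)                   # one single-core affinity set per worker
```

The point of pinning is repeatability: without affinity the scheduler may bounce both loads onto one core, which is exactly the gap in the X-Bit methodology described above.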

Thursday, June 05, 2008

Nehalem Appears But Anandtech Chokes While Tooting Intel's Horn

Hopefully Anand is telling the truth when he says the Nehalem preview was done without Intel's knowledge, because Anandtech fumbled badly with this introduction. Clearly Anand is not someone you would trust to carry your fine crystal.

June 5, 2008

Taken at face value the above scores show an impressive 21% increase in speed for Nehalem.

April 23, 2008

However, note that while the Q6600 score goes up 98 points from March to April the score for Phenom 9750 goes down 54 points.

March 27, 2008

And even more strange is that the score for Q9450 was 980 points higher back in February. If this number is accurate then Nehalem's increase is reduced to 10%.

February 4, 2008

However, even a 10% cumulative gain would be very nice on top of the gains we've seen for C2D and Penryn. Unfortunately, the single threaded scores are not quite as impressive.

June 5, 2008

This would be an increase of only 3% for Nehalem.

February 4, 2008

However, the score from February was 366 points higher. If this score is correct then Nehalem would be 9% slower.

Anand is now claiming that the reduction in speed is due to a Vista update. We can check this easily enough and see if the drop in speed is consistent.

3Dsmax 9 - 21% slower
Cinebench XCPU - 9% slower
Cinebench 1CPU - 11% slower
Pov-Ray 3.7 - 13% faster
DivX - 8% faster [version change from 5.13 to 5.03]

Well, this isn't exactly consistent. 3Dsmax shows twice the slowdown of Cinebench while Pov-Ray gets faster. DivX gets faster too, although this is less definitive since Anand shifted to a slightly older version.

Conclusions (or perhaps Confusions):

Penryn is 21% faster when using all threads, or . . . it is only 10% faster.

Penryn is 3% faster with single threads, or . . . it is 9% slower.
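The competing percentages above are all simple relative changes, (new − old) / old, and the whole dispute is about which baseline run you measure against. The scores below are hypothetical, chosen only to illustrate the arithmetic, not Anandtech's actual numbers:

```python
# Percent change between two benchmark scores; positive means faster.
def pct_change(old, new):
    return (new - old) / old * 100

# Hypothetical scores: the same "new" result can look like a 21% gain
# or a 10% gain depending on which earlier run is used as the baseline.
print(round(pct_change(10000, 12100)))  # 21  (against the lower baseline)
print(round(pct_change(11000, 12100)))  # 10  (against the higher, earlier baseline)
```

This is why the shifting baseline scores matter so much: the headline gain for Nehalem is entirely a function of which Penryn run you divide by.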


I am looking forward to seeing some proper testing of Nehalem. I'm sure most are anxious to see Nehalem compared with AMD's 45nm Shanghai. I have no doubt that Nehalem is much stronger in multi-threading than C2D. I believe that Intel did this with the goal of stopping AMD from taking server share. However, this is a double-edged sword since gains for Nehalem will surely be losses for Itanium.

It remains to be seen if Nehalem fares better in benchmarking than Penryn. Penryn benefited from code that used lots of cache and was predictable enough for prefetching to work. Nehalem, however, has far less cache but is also much less sensitive to prediction because of its reduced memory latency. Nehalem is also much more like Barcelona in that its L3 has much higher latency than L2, making the bulk of Nehalem's cache much slower than Penryn's. One would imagine that Nehalem's preference in program structure will be much closer to Barcelona's than Penryn's has been. Comparisons later this year should be interesting.

Monday, April 14, 2008

Updates And Old Patterns

Amid AMD's torrent of bad news (the exit of Phil Hester, the reduced estimates for Q1 earnings, and the announced 10% layoffs) we can at least give AMD a small amount of praise for getting the B3 stepping out the door. It's a small step on a long road.

We can finally settle the question of whether AMD's 65nm process is broken. AMD's fastest 65 watt, 90nm K8 runs 2.6Ghz while the fastest 65 watt, 65nm K8 runs 2.7Ghz. So, the 65nm process is at least a little better than the old 90nm process. People still keep clamoring for Ruiz to step down. Frankly, I doubt Ruiz had any direct involvement with K10's design or development so I'm not sure what this would accomplish. I think a better strategy would be for AMD to get the 2.6Ghz 9950 out the door as soon as possible and try hard to deliver at least a 2.6Ghz Shanghai in Q3. Since Shanghai has slightly higher IPC a 2.6Ghz model should be as fast or faster than a 2.7Ghz Barcelona. I would say that AMD needs at least that this year although this would leave Intel with the top three slots.

AMD's current strategy seems to recognize that they are not competitive at the top and won't get there soon. The collection of quads without L3 cache, Tri-core processors, and the current crop of low priced quads including the 9850 Black Edition all point to a low end strategy. This is the same pattern AMD fell into back in 2002 when it devalued its K7 processors. Of course in 2002 AMD didn't have competitive mobile and its only server processors were Athlon MP. So perhaps Puma and a genuine volume of B3 Opterons will help. AMD's excellent 7xx series chipset should help as well but apparently not enough to get back into profitability without layoffs.

The faster B3 steppings are an improvement, but you get the feeling they should have been here last year. You get a similar feeling when Intel talks about the next Itanium release. Although Itanium began with hope as a new-generation architecture, its perpetual delays keep that feeling well suppressed. And one has to wonder how much of Itanium will be left standing when Intel implements AVX in 2010. We all know that HP is the only thing holding up Itanium at this point; any reduction in support by HP will be the end of Itanium. And we get a similar feeling about Intel's 4-way offerings, which always seem to lag nearly a year behind everything else. For example, although Nehalem will be released in late 2008, the 4-way version won't come out until late 2009. Some still speculate that this difference is purely artificial and is Intel's way of giving Itanium some breathing room.

However, as bad as AVX might be for Itanium, it has to be a double shock for AMD, coming not long after the announcement of SSE5. AVX seeks to copy SSE5's 3- and 4-operand instructions while bumping the data width all the way up to 256 bits. It looks like AMD only has two choices at this point. They can either drop SSE5 and adopt both SSE4 and AVX, or they can continue with SSE5 and try to extend it with GPU instructions. Following AVX would be safer but would put AMD behind since it is unlikely at this point that they could optimize Bulldozer for AVX. Sticking with SSE5 and adding GPU extensions would be a braver move but could work out better if AMD has its Fusion ducks in a row. Either way, Intel's decision is likely to fuel speculation that Larrabee's architecture isn't strong enough for its own Fusion-type design. Really though, it is tough to say at this point since stream-type processing is just beginning to take off. However, GPU processing does demonstrate sheer brute power on Folding@Home protein sampling. This makes one wonder why OC'ers in particular cling to SuperPi, which is based on quaint but outdated x87 instructions, as a comparative benchmark.

There is also the question of where memory is headed. Intel is already running into this limitation with Nehalem, where only the server and top-end chips will get three memory channels. I'm sure Intel had intended that the mid-range desktop would get three as well, but without FBDIMM three channels would make the motherboards too expensive. This really doesn't leave Intel anywhere to go to get more bandwidth. Supposedly, AMD will begin shifting to G3MX, which should easily allow it to match three or even four channels. However, it isn't clear at this point if AMD intends G3MX for the desktop or just for servers and the high end like Intel. With extra speed from DDR3 this shift probably doesn't have to happen in 2009, but something like this seems inevitable by 2010.

Saturday, March 08, 2008

45nm, An Interesting Convergence In Design

AMD's K8 has been trailing C2D for the past 18 months. Making things worse, AMD has stumbled a bit with its K10 launch while Intel's Penryn seems to be on schedule. It is somewhat surprising, then, to discover that Intel's Nehalem and AMD's Shanghai are so similar.


Both are quad core, both use L3 cache, and both use a point to point interface (QuickPath for Intel and HyperTransport for AMD). And, according to Hans De Vries:

Nehalem
731 Million Transistors
246mm² Die Size
7.1mm²/MB L2 Cache Density

Shanghai
700 Million Transistors
243mm² Die Size
7.5mm²/MB L2 Cache Density

We don't really see much difference until we look at core size and L3 density:

Nehalem
24.4mm² Core Size
5.7mm²/MB L3 Cache Density

Shanghai
15.3mm² Core Size
7.5mm²/MB L3 Cache Density

AMD uses 2MB of L2 + 6MB of L3 for 8MB total.
Intel uses 1MB of L2 + 8MB of L3 for 9MB total.

Along with similar total cache size the area devoted to cache is similar as well (nearly identical area for L3). However, the area devoted to core logic is quite different:

Nehalem
Core Area - 97.6mm²
L2 Area - 7.1mm²
L3 Area - 45.6mm²

Shanghai
Core Area - 61.2mm²
L2 Area - 15mm²
L3 Area - 45mm²

We see right away that Nehalem devotes 85% more die area to core logic than to cache, whereas Shanghai devotes about the same die area to each. It is almost a certainty that, with its greater number of core transistors, Nehalem will run faster than Shanghai. On the other hand, it should also consume more power. If we assume that Intel gets a reduction in power draw due to having better 45nm transistors, then this should offset some of Nehalem's higher power draw. However, with 60% more transistors devoted to core logic I don't believe all of it could be offset. My guess is that at full load Nehalem will draw more than Shanghai, but Nehalem should be closer at idle. Actually, Nehalem's core ratio at 40% is almost the same as Penryn's at 41%, which is only slightly less than Merom's at 44%. In contrast, Shanghai's core ratio has dropped to a tiny 25%, much smaller than Barcelona's 36%.
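The core-ratio figures in this paragraph follow directly from the area numbers listed above (all in mm², as given by Hans De Vries):

```python
# Check the core-logic ratios against the quoted die figures (mm^2).
dies = {
    "Nehalem":  {"die": 246, "core": 97.6, "cache": 7.1 + 45.6},
    "Shanghai": {"die": 243, "core": 61.2, "cache": 15 + 45},
}
for name, d in dies.items():
    core_ratio = d["core"] / d["die"] * 100        # core logic share of the die
    vs_cache = (d["core"] / d["cache"] - 1) * 100  # core area relative to cache area
    print(f"{name}: {core_ratio:.0f}% of die is core logic, "
          f"{vs_cache:+.0f}% core vs cache area")
```

Nehalem works out to roughly 40% core and 85% more core area than cache area; Shanghai works out to roughly 25% core, with core and cache areas nearly equal, matching the figures quoted above.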

Penryn has a density of 6.0mm²/MB for L2. Therefore, I would expect Nehalem's L2, at a density of 7.1mm²/MB, to be faster than Penryn's L2. However, I would also expect Nehalem's L3, at 5.7mm²/MB, to be slightly slower than Penryn's L2. This is interesting because we know that Barcelona's L3 is a bit slow. However, with an L2 twice the size of Nehalem's, Shanghai should be a closer match to Nehalem than Barcelona is to current Penryn with its massive 12MB L2. Essentially, Shanghai is Barcelona with 3X the L3 cache, but Nehalem's cache structure is much more like Shanghai's than Penryn's. Shanghai is unlikely to have any significant core changes from Barcelona, but there may be some minor changes in the decoding section. This is pretty much what I would expect. Since Barcelona only improved on the decoding speed of FP instructions compared to K8, I would expect AMD to tweak a few more integer instructions in Shanghai to increase speed. Decreasing decoding time for commonly used instructions would also make a lot of sense given Penryn's advantage in decoders. AMD is not likely to get a boost of 10-15% doing this, but a boost of, say, 3% is possible. AMD might see another 3% boost in speed due to the larger L3, although this could possibly be higher if AMD could make L3 faster. A lot of people are wondering if AMD will bump the master L3 clock on Shanghai for better performance at speeds above 2.3Ghz. Just bumping the clock could be worth another 1%. So, I'm guessing a total speedup for Shanghai of perhaps 7% versus Barcelona.

There is one other obvious difference in the die layout. Shanghai has mirror image cores. This means that a master clock signal originating from the center of the Northbridge area should propagate out to each of the four cores at the same time and this is a standard way of doing timing. However, since Nehalem's cores are placed side by side it must be using a more sophisticated clocking scheme. In other words, there is not any place on the Nehalem die where a clock signal would reach all four cores at the same time. This would tend to suggest the possibility that each core actually has its own master clock allowing them to run nearly asynchronously. If Nehalem does indeed have this ability then this could be a big benefit in offsetting the higher power draw since cores that were less busy could simply be clock scaled downward. AMD could therefore be in for a tough fight in 2009 in spite of the similarities between Shanghai and Nehalem. The one big benefit to AMD from the similarity is that it makes it nearly impossible to cache tune the benchmarks in Intel's favor. This means that Intel is probably going to lose some artificial advantage that it has now with Penryn. However, given the difference in core transistors and Nehalem's greater memory bandwidth I can't see any reason why Nehalem couldn't make up for this loss and still have an overall performance lead at the same clock as Shanghai.

Friday, February 29, 2008

2008: AMD Still Trailing

Intel is still moving along about the same as it has been, making slow, incremental progress since C2D launched in 2006. It is clear that the increase in speed from Penryn doesn't match the early Intel hype, but nevertheless any increase in speed is just that much more that it is faster than AMD. Likewise, the small speed increases from 3.0Ghz to 3.16Ghz (quad core) and 3.4Ghz (dual core) are no doubt frustrating for Intel fans who would like more speed. On the other hand, AMD currently has nothing even close.

AMD will probably deliver 2.6Ghz common chips in Q2. This chart at Computerbase claims AMD will release an FX chip in Q3. I'm not so sure about this because everything would suggest an FX of only 2.8Ghz. This is probably the lowest clock that AMD could possibly get by with on the FX brand. There is no doubt that AMD needs a quad FX because people who bought FX in 2006 were promised upgrades and none have been forthcoming. Such a 2.8Ghz FX would probably be clockable to 3.0 - 3.1Ghz (with premium air cooling) based on what I've seen of the B3 stepping. This is probably the best AMD can do for now, as I haven't seen anything that would suggest that B3 can deliver 2.8Ghz in volume. This means that the poor man's version of FX, the Black Edition, will probably bump up to 2.6Ghz as well. Intel seems to be somewhat behind in terms of 45nm, but this hardly matters since their G0 stepping of 65nm works so well. But there is no doubt that AMD will be facing more 45nm Penryns in Q2. The shortage of chips has shielded AMD somewhat from increased pressure from Intel during Q4 and Q1 (although with Barcelona delayed, server share may take another hit in Q1). However, as Q2 has the lowest volume of the year, AMD will have to be aggressive to avoid a volume share drop during that quarter.

Fuad is probably closer to the truth on FX in saying Q3:

The Deneb FX and Deneb cores, both 45nm quad-cores, are the first on the list. If they execute well we should see Deneb quad-core with shared L3 cache and 95W TDB in Q3. If not, the new hope will slip into Q4.

The timeline for FX being Q3 or maybe Q4 is not surprising at all. What is surprising is the idea that AMD's first new FX chip would be 45nm. If this is true then this would support the notion that AMD has suspended development of 65nm. But it would be surprising if 45nm could ramp that quickly.

The question then is what will happen in Q3 as AMD faces a steadily increasing volume of Penryn chips. The rumors suggest that AMD will not try to release a 65nm 2.8Ghz Phenom. I'm not sure whether this indicates that the 65nm process has hit a ceiling or that AMD will pursue these speeds with 45nm Shanghai. Another question is what the 9850 might be. The 9750 is supposed to be 2.4Ghz while the 9950 has been suggested to be 2.6Ghz. So, would 9850 be 2.5Ghz perhaps? The topping out of the naming scheme does lend some credibility to the idea that AMD will suspend 65nm development and try to move to Shanghai as quickly as possible. Nevertheless, there is a big, big question of whether AMD could really deliver a 2.8Ghz 45nm Shanghai in, say, Q3. Ever since the release of 130nm SOI, AMD's initial clock speeds on a new process have always been lower, so there is a lot of doubt that AMD could reach 2.8Ghz on 45nm any sooner than Q4 2008. Nehalem will almost certainly be in too small a volume in Q4 to be much of a factor. So, it looks like AMD's goal is to somehow get clock speed up, and this seems even less likely with a mid-year switch in process unless AMD's 45nm exceeds all of its past SOI efforts.

Early 2009 looks pretty good for Intel since it will not only have Penryn and Dunnington but increasing volumes of Nehalem. It still remains to be seen if Intel really will give up its lucrative chipset business on the desktop with Nehalem. It certainly seems that it wouldn't take much effort to modify an AMD HT based chipset to work with Intel's CSI interface. That would seem to remove a lot of Intel's current proprietary FSB advantage. On the other hand, with ATI out of the way this would seem to be the best time for Intel to face more competition in chipsets. Still, this does leave things a bit up in the air. If it becomes easier and cheaper to design chipsets as it surely would be if CSI is similar to HT then VIA might become more competitive. For AMD's part there seems little they can do in 2009 except try to ramp the clock speeds on Shanghai.

We have three other issues to talk about: one immediate and two long-term. The immediate issue is Hester's interview at Hexus where he mentions the slow clock speeds of K10. Basically, Hester says that the 65nm process is fine; it is a matter of adjusting some critical paths. I've seen this statement heckled by some who insist that you can't separate process from design. Curiously, these are the same people who also insisted that Intel's 90nm process was fine and that the problem was only a poor design with Prescott. Anyway, this statement by Hester actually seems quite accurate to me. It was my impression that AMD had intended K10 to run at lower voltage, which would have allowed higher clocks. This again seems to fit what we've seen with K10 being limited by TDP. The reason for the higher voltage seems to be that the transistors don't quite switch fast enough, and this causes some of the "critical paths" that Hester talked about to get out of synch. You could fix this at a low level by improving the transistors to get them back into spec with the design. Or, you could relax the timing on these critical paths, which would get the design in spec with the transistors. Because 45nm is right around the corner, it appears that AMD has decided not to expend more resources on 65nm improvement and will instead relax the timing. AMD's work on 45nm transistors should theoretically migrate down to 65nm; at least, this is the theory behind AMD's CTI (Continuous Transistor Improvement) program. However, we may now be entering a new era where improvements are so specialized that they cannot cross process boundaries as they used to, and we may see AMD following Intel's lead. This would mean tighter design at the beginning of each process node and less reliance on later improvements.

The two long-term issues concern the possibility of a New York FAB for AMD and the announcement on EUV. There are three questions about a NY FAB: Does AMD need it? Can they afford it? And why NY instead of Dresden, where FABs 30 and 36 are now? Need is the most obvious because without a new FAB AMD's capacity will top out by mid to late 2010 unless the FAB 38 ramp is slower than expected. Affording it is a big question, but one that AMD can leave aside for now while hoping that its cash situation improves. The question of location is a curious one. One suggestion was that NY simply offered more incentives than Dresden, but this by itself seems unlikely. In every case in the past, Germany has shown itself more than willing to contribute money for AMD's FABs. So, the real reason for the NY location may have more to do with other factors. In fact, we even seem to have some evidence of this from the EUV announcement.

"The AMD test chip first went through processing at AMD’s Fab 36 in Dresden, Germany, using 193 nm immersion lithography, the most advanced lithography tools in high volume production today. The test chip wafers were then shipped to IBM’s Research Facility at the College of Nanoscale Science and Engineering (CNSE) in Albany, New York where AMD, IBM and their partners used an ASML EUV lithography scanner installed in Albany through a partnership with ASML, IBM and CNSE, to pattern the first layer of metal interconnects between the transistors built in Germany."

Secondly, we need to remember that AMD only fell behind on process technology when it moved to 130nm in 2002. Prior to this AMD was doing pretty well. Although things seemed to improve after AMD's rocky transition to 130nm SOI, AMD now seems to be falling behind again at 45nm. AMD used to operate its Submicron Development Center (SDC) in Sunnyvale, California; this facility was leading edge back in 1999. It surely is not lost on AMD that they have now surpassed IBM. Back in 2002 AMD only had a 200mm FAB while IBM had a more modern 300mm FAB as well as more capacity. AMD today has caught up with IBM in terms of FAB technology and passed it in terms of capacity. The big question for AMD has to be how badly IBM needs leading-edge process technology and for how long. Robust server and mainframe chips need reliability more than top speed. And IBM has been steadily divesting hardware, so one has to wonder when the processor division might become a target. Notice that in the above announcement the wafers had to be flown from FAB 36 in Dresden to New York. Given these facts I think it is possible that AMD wants to create another research facility in New York. I think this could serve both to tweak processes faster and optimize them better for AMD's needs, as well as to pick up any slack if research at IBM falls off. There has been no indication of this but it does seem plausible.

The recent EUV announcement is incomplete however. If we look at an IBM article on EUV in EETimes from February 23, 2007 we see that IBM very much wanted EUV for 22nm but figured that it wouldn't be ready in time for early development work.

The industry hopes EUV will make it into production sooner than latter, but the technology must reach certain milestones. ''I think the next 9 to 12 months are very critical to achieve this,'' said George Gomba, IBM distinguished engineer and director of lithography technology development at the company.

Twelve months from February 2007 would be now. So, what is missing from the EUV announcement is whether or not this recent test puts EUV on track for IBM for 22nm or whether it will have to wait for 16nm. A second question is why the test wafer was made at Dresden by AMD. If IBM had already tested its own wafers then why didn't it announce earlier? This could mean that AMD has decided to try to hit the 22nm node for EUV but that IBM has decided to wait until 16nm. If this is a more aggressive stance for AMD then it could mean that AMD will rely less on IBM for process technology for 22nm. This again would support the idea that AMD wants a new design center in NY. I think it is entirely plausible that AMD could surpass IBM to become the senior partner in process development over the next few years.

Sunday, January 20, 2008

AMD/Intel Q4 2007 Earnings – Where Are We?

AMD's recent performance has been a study in disappointment. Earlier, SSE5 and microbuffered memory had looked so promising for 2009. Now, it appears that these may not arrive even in 2010. Consecutive quarterly losses, low clocks, and the TLB bug have added to AMD's pain. In contrast, Intel has rebuilt its revenues, upgraded C2D to the G0 stepping, and delivered the 4-way capable Caneland platform. Intel has had some difficulty with 45nm Penryn but this hasn't mattered much with the G0 65nm quad cores available.

Sometimes things are a bit counter-intuitive. For example, I've seen many assume that AMD's fortunes began to boom in 2003 with the introduction of K8. In reality, 2003 was worse for AMD than 2002. 2004 was better but AMD suffered from volume limitations due to the larger K8 die. AMD's volume share stayed at about 16% from 2002 all through 2004. It wasn't until 2005, fully two years after the introduction of K8, that AMD began making strides in the market. We see the same pattern repeated by Intel where 2006 was far worse than 2005 in spite of the introduction of C2D. To understand just how well Intel is doing we need to compare with 2005, not 2006. The numbers show that Intel is doing well but hasn't quite recovered to what it had in 2005:

2005:
Processor Earnings – $28.1 Billion
4th Quarter Cash and Short Term Investments - $12.8 Billion
Average Gross Margin – 59.3%

2007:
Processor Earnings – $25.9 Billion
4th Quarter Cash and Short Term Investments - $12.8 Billion
Average Gross Margin – 51.9%

2008 Outlook:
Predicted Average Gross Margin - 57%

Intel's Outlook makes it clear that they do not expect to return to the Gross Margins of 2005. In fact, 57% doesn't even match the 58% Average Gross Margin of 2004. That point is quite puzzling. The common reasoning has been that as Intel switches to 45nm its costs will drop dramatically. This should easily be enough to boost the Gross Margin up from the current 58% to above the 2005 levels. But Intel says this isn't going to happen, and Intel's prediction of an Average Gross Margin of 52% for 2007 was right on the mark. Apparently, Intel expects either that ASP's will drop or that costs will rise enough to counter the decrease in costs from 45nm. This again goes against common reasoning, which assumes that Intel's higher clock speeds insulate them from pricing pressure from AMD. It further goes against Intel's own statements on reorganization, which were supposed to result in another round of layoffs in 2008. I think we now have to assume that Intel's reorganization has halted halfway through and that nothing further will be done. This however disagrees sharply with the inclination by many to label Intel as “lean”. Intel cannot be lean with only half of the reorg in place; Intel still has some love handles.

If we look at the facts instead of simply talking from our gut, this does make sense. AMD had gained a lot of server share in 2006. However, Intel took most of this share back at the end of 2006, dropping AMD's share by 40%. AMD's server share average for 2007 was only 14.2% compared to the 23.5% average for 2006. Intel has held onto this share due both to the value of C2D dual core and to having a monopoly on quad core. Unfortunately, the problem with being on top is that you can only go down. And Intel now faces substantial quad core server pressure from AMD. AMD moved some 130K Barcelona server chips in 2007 and this will only increase. Secondly, AMD has substantially undercut Intel with lower clocked quad core offerings. These lower clocked quads are also much less affected by Intel's lower 45nm power draw, an advantage which grows as clocks increase. According to AMD, the high volume range for servers is 2.3Ghz. If this is true then Intel is not at all immune to server pricing pressure, and we know that servers carry Intel's highest ASP. I would expect AMD to move back up to its 2006 server share in 2008. We also know that AMD's Griffin/Puma mobile platform will be out soon, and this should put pressure on Intel's mobile segment, which has the second highest ASP. It then begins to make some sense when we realize that the segment where Intel will still dominate is the desktop, which has the lowest ASP. Even so, AMD did report some increase in desktop ASP in Q4.

The talk about an AMD takeover or bankruptcy is never-ending. A takeover is technically and financially possible, except that I can't think of a single company that wants to get into the frontline processor business and slug it out with Intel. Motorola was the last challenger, but they divested their processor division as Freescale in 2004. Some might point to VIA. Well, VIA's parent company could pull off the finances, but they've never had a frontline processor. Although Cyrix was initially competitive in 1995, Cyrix's position had slipped a generation by the time they were acquired by VIA in 1999. Secondly, VIA never manufactured Cyrix processors; they instead chose the Centaur architecture from IDT, which was never frontline. Presumably if VIA had wanted to be competitive they would have worked on a new generation of the Cyrix design, which would have been released around 2002 or 2003. This never happened, so VIA as a buyer is very unlikely.

Some have suggested nVidia or Samsung as a buyer, but neither of these companies has any experience with processors. Samsung does memory products, which are a long way from the logic-intensive design of CPUs. These don't overlap at all in terms of manufacturing or design methodology. NVidia does chipsets and graphics, and nVidia would be an obvious monopoly conflict due to AMD's ATI holdings. No doubt someone will mention IBM. However, IBM could never take over AMD's markets since IBM's use of AMD server processors would make it a competitor with its own customers. This would mean giving up the most profitable segment of AMD's line and would probably impair volume with customers like HP, Gateway, and Dell that also produce servers. In other words, IBM would lose its current R&D support from AMD (the largest of all its partners), lose the technical support that AMD provides to Chartered, and lose AMD support for APM. The loss of volume would also make AMD processors cost more. Thus, IBM would turn a supporting partner and Intel hedge into a money-losing niche processor division. Economically, this makes no sense at all. So, an AMD purchase is probably somewhat less likely than, say, striking oil in your backyard.

AMD managed to increase its revenues every quarter in 2007, as well as its Gross Margins. AMD finally managed to trim the quarterly loss to $164 Million. Now, I have to be honest and mention that AMD's current value per share of stock is worse than it was at the end of 2003. Adjusting for the change in the number of shares, AMD had been comfortably ahead by about $1.2 Billion, but the $1.6 Billion goodwill charge has now brought this to about $400 Million under. This is cause for concern if AMD's loss increases again in Q1. Supposedly AMD will break even in Q2. AMD's outlook, though, is oddly different from Intel's. Whereas Intel predicts a stagnant Gross Margin at a lower value than even 2004, AMD predicts that the current level of 44% will hold during the first half of 2008 but increase to 50% in the second half. So, why would this be? First of all, by Q3 AMD should finally have at least 2.8Ghz chips. This would cover nearly the entire range since it looks like Intel won't go higher than 3.16Ghz. There is also the question of 45nm from AMD. Statements made by AMD during the Earnings Call include:

Ruiz
“look forward to being able to ramp 45 nanometer aggressively in the second half of this year.”

Meyer
“Followed on in the second half of the year by 45 nanometer.”

Meyer
“we’ve got internal samples of our 45 nanometer microprocessors, we’re putting them through their paces currently and we’re on track to, the plans we talked about in the past which is to start our ramp in the first half of this year and ship revenue product in the second half of this year.”

I have seen some attempts to spin these statements to mean that AMD will begin producing 45nm in the second half with actual delivery in 2009. This, however, matches neither what AMD said nor common sense. Someone might mistakenly take Ruiz's comment to mean initial production, but this is completely countered by Meyer's statement. Secondly, AMD first produced Brisbane samples in Q2 2006 and delivered product in Q4. If AMD has 45nm samples in Q1, then product delivery should be by Q3. So, the most reasonable assumption is that AMD will begin production in Q2 and deliver some small amount in Q3. This amount is unlikely to have any effect on revenues, but the Q4 volume should be more significant. Likewise, I doubt Intel's volume of Nehalem in Q4 will have any effect on revenues. So, adding in AMD's mobile and higher-clocked quad cores in Q3 along with some 45nm in Q4, I could see an increase in Gross Margin in the second half. Finally, this begins to make sense. Right now, AMD overlaps the highest-volume price range and should increase its overlap during 2008. This would put pricing pressure on Intel, most likely with some loss of ASP, and this would offset the 45nm savings. On the other hand, AMD's ASPs are very low, so they can only move up. It also makes sense because 50% Gross Margin would still be significantly below Intel's 57%. This would be consistent with lower ASPs for AMD and higher costs due to a lower ratio of 45nm production. Some might also have noticed that AMD has scaled back its expected 2008 volume from 100 million units to 80-90 million units. This is undoubtedly due to the scaled-back FAB 38 ramp, which in turn has undoubtedly reduced AMD's target of 30% share for 2008.
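The sample-to-ship reasoning above is just quarter arithmetic: Brisbane went from samples to shipping product in two quarters (Q2 2006 to Q4 2006), so Q1 2008 samples would imply shipment by Q3 2008. A toy Python sketch of that inference, assuming the two-quarter Brisbane lag generalizes:

```python
def add_quarters(year, quarter, lag):
    """Advance a (year, quarter) pair by `lag` quarters."""
    total = (year * 4 + (quarter - 1)) + lag
    return total // 4, total % 4 + 1

# Brisbane precedent: samples Q2 2006, product Q4 2006 -> a two-quarter lag
lag = 2

# Applying the same lag to 45nm samples in Q1 2008
print(add_quarters(2008, 1, lag))  # -> (2008, 3), i.e. shipment by Q3 2008
```

This is only an extrapolation from a single prior data point, of course, but it is the same back-of-the-envelope logic the paragraph uses.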

I guess the bottom line is that as long as AMD can reach break-even in Q2, it should be able to start gaining asset value again. And it looks like AMD will finally get back to the processor revenues that it had in 2006. Judging from the current unit volume, I would say that AMD intends to start gaining share again, especially in servers, and to get back to what it had in 2006. The really surprising thing is a comparison with 2001. AMD's volume share had been about 16%, but this increased to 20% in 2001. With the release of the Northwood P4 and AMD's delay in getting to 130nm, AMD's share tumbled in early 2002. AMD left 2002 with roughly the same 16% volume share that it had in 2000. It stayed at this level all through 2004, then crept up to 18% in 2005. AMD's volume gain in 2006 was much more dramatic, rising to 23%. If the 2002 pattern had been followed (as many predicted), then AMD should have lost this share again in 2007. When AMD's volume share tumbled in Q1 2007, many of these armchair experts gave themselves big pats on the back. However, they couldn't have been more wrong. AMD's volume share promptly rebounded, and AMD has held onto its 2006 average for the last three quarters. What these self-styled experts failed to take into consideration was that when the crash occurred in 2002, AMD had only been at its high point for two quarters, whereas the dip in 2007 followed five quarters of good volume. In other words, the two are opposites: the high in 2001 was temporary, and the dip in early 2007 was temporary. According to IDC, AMD lost 0.4% unit share and is now at 23.1%, about equal to its 2006 average. Overall, I expect AMD to gain back its 2006 server share and go above its previous mobile share while holding onto and increasing its current desktop share. This would mean an increase in unit share from the current 23% to perhaps 26% in 2008. That seems doable but falls quite a bit short of AMD's previous target of 30%.

Intel in 2008 should finally pull ahead of its previous processor revenue high of 2005. Its Gross Margins may not reach what they once were, but they will still be quite good, and I'm sure Intel will continue building its cash position and doing stock buybacks. By any account this should be good performance and a reasonable rate of growth. Pricing pressure in the second half of 2008, as AMD moves above 2.6GHz, should produce some good values for buyers. We can look forward to seeing how well K10 scales and whether 45nm Shanghai produces any change in speed or reduction in power draw. We should also be getting reviews of Nehalem in Q4.