Scientia's Blog: May 2007

Sunday, May 27, 2007

AMD's Outlook

I showed in my last article that bankruptcy for AMD is unlikely in 2007. I also showed that benchmark code should use the PGI compiler since it is faster for Intel as well. But, let's concentrate just on where AMD is and where it needs to be.

AMD's luck has not been good lately. They've seen a sharp drop in revenue in Q1 that will probably be matched in Q2 and that will almost certainly make three quarters in a row with an average loss of $500 Million (for $1.5 Billion total). And, whereas AMD had good success with Nexgen, DEC, and IBM, the ATI merger seems a bit different. While it isn't a total failure like AMD's later collaboration with Motorola and UMC, it does seem more of a work in progress. It also appears that AMD is going to need another revision of Barcelona before launch.

For discrete graphics, R600 is not exactly a success. The delay to May is now followed by more delays and less than stellar performance for the 2900. Fuad has suggested that not only is AMD going to move to the 65nm process with R650 but that ATI is going to make some tweaks to the architecture as well. Basically, this means three things. It means that R650 is going to solve the current 2900 heating and power problems. It also means that R650 is much more than a simple shrink of R600. However, this means that R650 is what R600 should be right now so ATI has to be seen as trailing by about half a generation. R650 should be able to finally move beyond the current barrier of 8800GTS performance and get up to 8800GTX where it needs to be. Obviously, nVidia is going to tweak its own designs so there will be something faster by then. However, there is no doubt that R650 will close up some of the current large gap in performance. Beyond this it remains to be seen if AMD will still be trailing when R700 is released or whether these delays will delay it as well. This does indeed bring into question whether the ATI merger is helping or hurting ATI. My guess is that these problems were all in place at ATI before AMD even considered the merger and I would say that AMD is currently scrambling to give ATI some very needed assistance. AMD does seem to know where the problems are so it's a question of how quickly they can get them fixed.

Barcelona now appears to be delayed further into Q3. Most likely, another revision was needed to fix some remaining bugs. Another revision means at least a two month delay and this seems to match with the latest release statements. The INQ suggested that the current revision was ready to go but it looks to me like they got it wrong. I'm also inclined to believe another revision was needed because it looks like AMD is trying to compress the transition from Opteron server chips to Phenom desktop chips to be able to hit the December buying season. AMD will also have to do this in the face of another round of price cuts from Intel, the faster 3.0Ghz Kentsfield and Clovertown, and the knowledge that Penryn could remove a lot of Clovertown's deficiencies. The launch clock speeds are very much up in the air though. AMD had originally said 2.3 in Q3 and 2.5 in Q4 for quad core. However, that seems to be when they believed that Intel would stop at 2.66 on quad core. There is no doubt that Intel's 3.0Ghz speed puts pressure on AMD to move higher. However, what AMD might be capable of delivering is anyone's guess. There are some pretty good indications that AMD could launch 100Mhz higher with 2.4Ghz instead of 2.3 and then go to 2.6Ghz in Q4. However, Intel could match this with a 3.0Ghz Penryn and then followup in Q1 or Q2 with 3.33Ghz. So, even if AMD bumped the speed to 2.8Ghz in Q1 or Q2 they could still end up trailing.
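To put the clock speed scenarios above in perspective, here is a minimal sketch of how much more work per clock AMD would need to break even at each pairing. It assumes performance is simply clock speed times per-clock throughput, which ignores memory, cache, and bus effects; the function name is hypothetical and the clock figures are the ones quoted above.

```python
def required_per_clock_advantage(own_ghz, rival_ghz):
    """Per-clock throughput ratio needed to match a rival running at rival_ghz.

    Assumes performance = clock * work-per-clock (a deliberate simplification).
    """
    if own_ghz <= 0 or rival_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return rival_ghz / own_ghz


# (label, AMD clock in GHz, Intel clock in GHz) from the scenarios in the text
scenarios = [
    ("launch, 2.4 vs 3.0", 2.4, 3.0),
    ("Q4, 2.6 vs 3.0", 2.6, 3.0),
    ("Q1/Q2 2008, 2.8 vs 3.33", 2.8, 3.33),
]

for label, amd_ghz, intel_ghz in scenarios:
    ratio = required_per_clock_advantage(amd_ghz, intel_ghz)
    print(f"{label}: AMD needs {100 * (ratio - 1):.1f}% more work per clock")
```

Under that simplification the break-even margin is about 25% at launch, about 15% in Q4, and about 19% in the 2.8 versus 3.33 case, which is why a clock bump alone may not be enough.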

AMD doesn't have many bright spots but they do have some. Their integrated chipsets seem to be working okay and their mobile chipsets seem reasonably competitive. AMD should be able to do a lot of volume in the mid to low end desktop with both DTX and its chipsets and with mini-DTX. However, the desktop has lower margins than either mobile or server and this is the low half of the desktop. This does seem reminiscent of 2002 when AMD was hanging on by selling K7 against the lower performing Celeron. In fact, mini-DTX versus mini-ITX is exactly this kind of mismatch with mini-DTX having more memory and memory bandwidth, more expansion capability, and more cpu power. This should at least help AMD hold on in the 2nd half of 2007 if they can do modest mobile sales and gain back some position in servers. Obviously though, AMD has no real chance of challenging Intel on the desktop until they can deliver some real volume in 2008 and by that time they will be up against Intel's 45nm Penryn. AMD should be fine in 4-way and higher servers after Barcelona is finally released. AMD can probably claim the fastest desktop with QFX if Intel doesn't match with its own dual socket system. Intel could very well retain the title of fastest single socket system.

While Intel seems on track and only has to make small steps with Penryn and Nehalem, it looks like AMD is going to have to work a lot harder to get back into the game. There are potential gains for AMD but none of them are going to be easy. If AMD works hard and stays on track they could be in a more competitive position with mobile, graphics, and high end server by mid 2008. However, since Nehalem's performance is an unknown, AMD could find itself with a much tougher high end server competitor in 2009 without ever having gained significantly in either high end desktop or low end servers. Time will tell.

Wednesday, May 23, 2007

AMD Q1 2007 Outlook

AMD's Q1 2007 results were certainly disappointing. This has caused the rumors about bankruptcy for AMD to start popping up like mushrooms. Beyond bankruptcy there is the question of AMD's Q1 sales and after that there is the question of outlook for the rest of 2007 and into 2008. Time to dig in.

The simplest way to handle the question of bankruptcy is to look at AMD's history and assume that AMD can survive whatever it has already survived. The most definitive ratio is the stockholders' equity to assets ratio. This essentially tells what percentage of the company is not in debt. AMD's ratio in Q4 06 was 44% but it has now dropped to 41%. In spite of the better than $600 Million loss in Q1 this really isn't too bad since AMD's lowest percentage was 35% back in 2003. This suggests that AMD could lose another $1.4 Billion to hit 35% and still survive (since it did in 2003). Therefore, bankruptcy does not seem to be an immediate problem.
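The headroom argument above can be sketched as a small calculation. This is only an illustration: it assumes a loss reduces stockholders' equity and total assets by the same amount while liabilities stay fixed, and the balance sheet figures are hypothetical (scaled to 100 units of assets), not AMD's reported numbers. The function name is also an assumption.

```python
def loss_to_reach_ratio(equity, assets, target_ratio):
    """Loss L that brings (equity - L) / (assets - L) down to target_ratio.

    Assumes the loss comes out of equity and assets equally, with
    liabilities unchanged.
    """
    if assets <= 0:
        raise ValueError("assets must be positive")
    if not 0 <= target_ratio < 1:
        raise ValueError("target_ratio must be in [0, 1)")
    return (equity - target_ratio * assets) / (1 - target_ratio)


# Hypothetical balance sheet: 100 units of assets at a 41% equity ratio
assets = 100.0
equity = 41.0

loss = loss_to_reach_ratio(equity, assets, 0.35)
new_ratio = (equity - loss) / (assets - loss)
print(f"loss absorbed before hitting 35%: {loss:.2f} units")
print(f"ratio after that loss: {new_ratio:.2%}")
```

Plugging in the actual equity and asset totals from the quarterly report would give the dollar figure; the point of the sketch is only that the headroom depends on the size of the balance sheet as well as on the two percentages.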

Figuring out AMD's processor position is a bit tougher since they've now started combining processor, embedded, and chipset revenues. However, by any measure it is way down from Q4 06 (about 30% lower). As far as I can tell the dropoff is way above the usual seasonal decline, but the only other comparison is Intel. So, we have two possibilities. The first possibility is that Intel's numbers are indicative of the quarterly demand and demand for AMD has fallen sharply. The second possibility is that general demand has fallen sharply but Intel was buoyed by pent-up demand. It doesn't seem reasonable though to assume that general demand has fallen off as sharply as AMD's numbers so there must be another reason.

However, no other reason is obvious. AMD is now 100% 65nm on FAB36 and is producing more than half of its volume as 65nm. This is very confusing because AMD's inventory has not increased enough to swallow 30% of the production yet it is difficult to imagine that they've cut production that drastically after being maxed out just one quarter ago. The only other explanation I've seen is that AMD has unreported income due to payments next quarter from government sales. While Net 60 or 90 terms are not unusual, one would nevertheless think that if AMD had large revenues due next quarter that they would have mentioned this. Since AMD didn't mention this, I have to discount this line of reasoning. The only other possibility I can think of is that AMD's cpu sales fell off because of delays in delivering chipsets. About the only bright spot in this is that almost every possible reason for the sharp drop in both volume and revenue will get better in the next two quarters.

To see where AMD really stands it is necessary to dispense with the typical shell game that plagues most attempts to characterize AMD. For example, people compare AMD with nVidia for discrete graphics and then compare AMD with Intel for processors. People often dodge back and forth comparing AMD with either Intel or nVidia for integrated graphics and chipsets using whichever seems a tougher competitor. The truth is that Intel doesn't yet make discrete graphics and nVidia doesn't make processors. AMD is currently the only company that makes processors, chipsets, and discrete graphics yet I can't recall a single editorial or analysis that mentions this fact. And, the double standards for Intel and AMD are still very much alive. For example, I've seen comment after comment about AMD's R600 delay yet no mention of Intel's 965 chipset being two months late or the proper drivers being a full year late. Sadly for Intel, it looks like the finished drivers may arrive just in time to discourage upgrade to Intel's newer chipsets for Nehalem. I've now seen people proclaiming that R600 is a total failure when it has already been suggested in the trades that AMD's R600 orders will max out the capacity at foundry giant TSMC.

Without doubt though, the good news for AMD is mini-DTX. Again, the trades have suggested that mini-DTX will shove aside all other contenders to become the dominant small form factor standard on the desktop. Having compared AMD mini-DTX boards with Intel's latest, underpowered mini-ITX offerings, it is difficult for me to argue with this assessment. While Intel will offer cheap, Celeron class solutions on mini-ITX, AMD's only slightly larger mini-DTX boards will deliver real desktop power with double the cores and double the memory bandwidth and memory capacity. AMD can do this while delivering both PCI and PCI-e while Intel is only able to deliver PCI. There is little doubt that Intel is guilty of a major oversight in low cost desktop architecture but it still isn't clear what Intel can do to dig itself out. Mini-ITX is low cost but underpowered while pico-BTX has enough power but can't begin to compete in cost. This leaves AMD mini-DTX with no competitor and Intel with no sandbags to plug the gap in the levee.

Likewise, Intel's Geneseo has now shown itself to be not at all a competitor for AMD's Torrenza as was first claimed. First, while Torrenza products are already available, users will have to wait until 2009 to get any Geneseo products. And, these delayed Geneseo products will be little more than upgrades to existing PCI-e products. There is no doubt that this will hurt Intel both in terms of high powered servers and HPC. HPC accelerators benefit from Torrenza's cache coherency which is missing on Geneseo while high powered servers benefit from the much greater bandwidth of HTX for things like networking cards. In simple terms, while Torrenza is designed for maximum power with lowest latency and greatest bandwidth, Geneseo is designed for low cost and overlap with existing PCI-e. This is quite a handicap because Torrenza too is designed to leverage existing HyperTransport technology to minimize development costs. But since PCI-e was already far behind HyperTransport, Intel's boost with Geneseo does nothing to close the gap with Torrenza. Essentially, Geneseo is roughly competitive with HyperTransport 1.0 as AMD moves on to the considerably more powerful HyperTransport 3.0. Consequently, an HTX powered network will run rings around a Geneseo powered network. So, in spite of the new Nehalem architecture we see Intel giving ground in high end X86 server architecture and once again optimizing for low end single and dual socket systems. One certainly has to question Intel's direction given a renewed X86 attack from AMD with K10 and a very aggressive attack from IBM with much faster Power speeds.

So, while Geneseo's delay brings into question when an actual CSI implementation will be ready, we also have to wonder about mobile. Intel took most of the mobile share by delivering the specialized Pentium M processor with a good mobile chipset and wireless capability in the Centrino platform. However, the current situation is ironic indeed. While Intel had a separate architecture for mobile, AMD used the same K8 architecture for mobile, desktop, and server. Now, Intel is using the single C2D architecture for everything while AMD is splitting off a separate mobile architecture. Assuming that AMD can deliver a quality mobile chipset by mid 2008, Intel could find itself facing a much tougher mobile competitor with both a specialized mobile cpu and chipset. A reversal in battery life with AMD notebooks outlasting Intel notebooks could be a serious blow to Intel's mobile share. We will have to see if C2D will truly be able to cover all bases with just a chipset change or whether Intel will have to scramble to try to differentiate the mobile architecture once again.

We can also see that the battle for fair benchmark testing is still raging. Most people don't seem to be aware that, with the dominance of the Intel Compiler, benchmark code is often poor quality on AMD processors. This poor quality code gives Intel processors an artificial boost in testing. Yet, there has been no call to have testing code compiled on the PGI compiler even when this code gives a boost to Intel as well.

Looking at this graph we can see that Core 2 Duo runs only 95% of its normal Intel Compiler speed when running PGI Opteron optimized code. However, it runs about 109% of its normal speed when using the PGI Unified Binary. One would think that getting a 9% boost in speed would be enough to make testers want to switch to PGI.
However, here we can see the main problem. While Intel Xeon only falls off slightly while running good Opteron code, Opterons take a much bigger hit while trying to chew through the poorly optimized Xeon code generated by the Intel Compiler. So, anyone who wishes to give Intel an advantage would not want to trade Intel's current artificial 19% advantage for a genuine 9% advantage. There doesn't seem to be any valid reason to continue to use the Intel Compiler to compile benchmarks unless the reason is indeed to tip the scales in Intel's favor. PGI code is 9% faster for Intel than Intel Compiler generated code and fully 19% faster for AMD than Intel Compiler generated code. It's time to stop the sham and make PGI the standard comparison compiler for all benchmark code. There is no doubt that Intel would oppose this since it would be losing not only its current artificial advantage but revenue from its compiler sales as well. However, one does wonder what would happen if all the popular review sites grew backbones and insisted on PGI benchmark code. Until this happens I'm not sure how we will ever know how Intel and AMD processors actually compare.
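One way to read the percentages quoted above is to ask how far the head-to-head comparison would move if reviewers switched compilers. The sketch below is only an illustration: it normalizes each processor to its own speed on Intel Compiler generated code (1.00) and uses the 9% and 19% PGI gains from the text. The variable names are hypothetical.

```python
# Speed on PGI generated code relative to each chip's own speed on
# Intel Compiler generated code (figures quoted in the text).
intel_gain_with_pgi = 1.09
amd_gain_with_pgi = 1.19

# How much AMD's result would move relative to Intel's in a head-to-head
# comparison if the same benchmark were rebuilt with PGI.
relative_shift = amd_gain_with_pgi / intel_gain_with_pgi

print(f"Intel speedup from switching: {100 * (intel_gain_with_pgi - 1):.0f}%")
print(f"AMD speedup from switching:   {100 * (amd_gain_with_pgi - 1):.0f}%")
print(f"Shift in AMD's favor head to head: {100 * (relative_shift - 1):.1f}%")
```

Both processors get faster, but because AMD gains more, the comparison between them shifts by roughly 9% in AMD's favor under these figures.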

In conclusion, AMD's current position is terrible. There is no doubt that it would be better if 65nm had been released sooner or the new chipset had been out sooner or K10 had been out sooner. So, AMD is just going to have to bite the bullet until Q3 when things should improve. The chipset and graphic sales should be up by then and AMD should be fully anchored on the desktop with mini-DTX and DTX. These two should prevent any further erosion of the desktop although Intel will still control the top range. Barcelona should begin to take back at least the top end server sales (which are now going to Clovertown) and re-establish AMD's server chip ASP's. However, it will take until Q4 for AMD to be able to start hitting back at the upper desktop range and to re-establish its desktop ASP's. It doesn't look like AMD will drown before Q4 but it won't be much fun getting there. I've also seen suggestions from the Intel hopeful that Penryn will prevent an AMD comeback. I'm sorry but suggestions that Penryn is 30% faster are not exactly accurate. Penryn can be 30% faster than Clovertown in some circumstances because Clovertown bogs down badly with four cores. Penryn however is not 30% faster than Conroe (possibly 10%). And, in spite of Penryn's improvements, native quad is still more efficient than MCM at managing bus access. Unless Penryn gets a substantial boost in clock speed (which is possible) AMD will take back the lead on both desktop and server. Recent tests certainly seem to suggest that AMD will bump its K10 clocks by 100Mhz. However, other demonstrations suggest that AMD could hit 2.97Ghz in Q4 while Intel could pull 3.33Ghz Penryn into Q4. So, it looks like the actual leader in Q4 is going to depend on deliverable clock speed and that is a huge question at the moment since AMD has only admitted to 2.5Ghz while Intel has only admitted to 3.0Ghz.

AMD's position does seem fairly good going into 2008 since it still insists that 45nm is on track and will be ready six months after Intel's. Meanwhile, Intel has severely downgraded expectations of Geneseo while also announcing that desktop versions of Nehalem will not use an IMC. However, even with an IMC, Nehalem is not going to be able to match K10's DC 2.0 connectivity. There is also little doubt that until Nehalem does arrive, Intel is going to get hit hard in power consumption with its monstrous quad FSB northbridge for 4-way. Even with Penryn's lower power consumption Intel is going to have a tough time matching AMD. This is doubly true with Intel still using FBDIMM. I think it is safe to say that Intel is going to lose some of its server position in late 2007 and into 2008 and presumably is then going to start pushing Nehalem with its IMC advantages. Likewise, Intel is going to lose some of its current desktop position and it remains to be seen what they will do about mini-DTX. Intel may have to create a new small form standard to compete. My guess is that Intel will fare best in the upper desktop and lower server ranges. AMD may have a tough time taking back single socket and dual socket server share although having a real quad core to compete will certainly help. I would say that AMD's strongest position comes from the ability of quad K10 to be a drop-in replacement for dual core on socket F and AM2. So, AMD could potentially improve its volume and revenue position substantially by end of 2008. However, Intel could block this if it can deliver much greater clock speeds with Penryn or possibly if it can get Nehalem hardware out sooner than 2009.

Friday, May 11, 2007

How We Got Here -- The X86 Industry In Perspective

In my last two articles I argued both sides of the Intel/AMD aisle to get a measure of the people who comment. With that perspective in hand it's time to get back to real posting. I'm going to look in detail at X86 development over the past twenty years to see how we got here and where we can go.

AMD has a very noticeable four year development cycle. AMD released the RISC 29050 processor in 1990. AMD then reorganized the RISC team and started working on a new X86 processor. It appears that mostly what AMD did was take the 29050, strip down the register set, and put a micro-code X86 instruction translator on the front end. This effort took five years and the K5 was released in 1995. The performance was not spectacular but for AMD's first completely new X86 design it was a reasonable result. In fact, the K5's biggest problem was not its performance but its inability to achieve higher clock rates. This appears to be more related to process technology than chip architecture. Four years later in 1999 the K7 was released and four years after that in 2003, the K8 was launched. We are now looking at the K10 four years after K8. Clearly, this is a four year development cycle. K6 is an exception in 1997 but that was because it was created by the separate Nexgen design team. The pattern since K5 is a four year major architecture cycle with a smaller team doing updates in between.
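The cadence claimed above is easy to check from the release years given in the text. The sketch below simply lists the gap before each major AMD architecture; K6 is left out because, as noted, it came from the separate Nexgen team. The variable names are hypothetical.

```python
# Release years as stated in the text (29050 is the RISC starting point).
releases = [("29050", 1990), ("K5", 1995), ("K7", 1999), ("K8", 2003), ("K10", 2007)]

# Years elapsed between each design and its predecessor.
gaps = [
    (later_name, later_year - earlier_year)
    for (_, earlier_year), (later_name, later_year) in zip(releases, releases[1:])
]

for name, years in gaps:
    print(f"{name}: {years} years after its predecessor")
```

Apart from the five year first step from the 29050 to the K5, every gap in the list is four years.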

It only took Intel three years to get from 80386 to 80486 but the bar since then has also been four years. It was four years from 80486 in 1989 to Pentium in 1993. However, the extraordinary thing about Intel is that it was running not one but three separate microprocessor design teams. The second team was working on the iAPX 432/i960 line and the third team was working on the VLIW i860. There is no doubt that the i860 team was put on Itanium development. Presumably, the i960 RISC team was reassembled into the Pentium Pro team. We know that Pentium Pro had a RISC core and the development would have had to start in 1991 which was during the time that Pentium was still under development. The four year pattern at Intel picks up again with Willamette. It is four years from Willamette in late 1999 to Prescott in late 2003 but then five years to Nehalem in late 2008.

Willamette was an exception in that it appeared five years after Pentium Pro but, curiously, it wasn't really complete. Willamette's performance suggests that it was rushed, yet there should have been more than ample time in the five years to get the design right. Since Northwood was a good design we can probably work backwards four years to late 1997 which is fully two years after Pentium Pro was finished. The best guess is that Intel expected Itanium to take over the top end on the desktop. When they realized in 1997 that Itanium was going to be late they hastily assembled the Willamette team and rushed it out. Since Northwood's launch probably coincided with the 130nm process it is probable that the Willamette team was assembled in early 1997 and that Intel decided to skip 180nm for Northwood. Northwood just two years after Willamette

The Willamette team went on to create the Northwood and Prescott designs but then presumably abandoned two years' worth of work on Tejas in 2005. This cancellation would explain the extra year to Nehalem which presumably would have been 2007 if Tejas had not been canceled. I have a very strong suspicion though that any of the Tejas work that was useful was applied to Nehalem II. Obviously this wouldn't be the original Nehalem but this could indeed mean that the C2D version of Nehalem could be more than just a two year refresh of the Conroe design. Apparently, a third (or fourth) design team was assembled in Israel in 1999 to work on Timna which was a cost reduced version of PIII. This was a good design but was sunk by the onboard RDRAM controller in 2001. This team apparently then began working on Banias which was released in 2003 as Pentium M. Banias too has a look of being rushed. The most logical conclusion is that the Haifa team was able to use a lot of the Timna information to put together a reasonably good design. There is no doubt that Banias is an upgrade of PIII but it reached a better state of polish with Dothan.

However, Conroe is a mystery. The quality of the design suggests the same four year cycle but this would imply that work began on C2D in 2001 when Timna was canceled. I've seen some suggest that Conroe was an upgrade of Yonah but this is clearly not possible; the gap between Yonah and Conroe is far too large and the time between releases is far too short. I've seen others credit the Conroe design as a derivative of Banias but this too is suspect because both Willamette and Banias had deficiencies when rushed and nothing about the Conroe design stands out as a deficiency. There is however one clue in Intel's history that is peculiar. We know that Intel specifically avoided patenting all of the tricks that went into Banias. Looking back at this fact I'm skeptical that the successor to the mobile versions of PIII would cause this much concern for Intel. It is a certainty that Intel was aware of Prescott's problems by 2002. However, it seems more likely that after design work on Prescott had been going on for more than two years Intel was having doubts in 2001 and that the work on Banias was folded into a larger design project including Conroe. We could assume that Intel realized the problem with Prescott in 2002, was impressed with the completed Banias design, and then began working on Conroe. However, this wouldn't seem to explain Intel's unusual move in hiding the patentable characteristics and, again, the Conroe design seems far too polished.

Since the separate Pentium M line has now been abandoned in favor of the C2D Merom, we would seem to need one less design team. However, it is hard to believe that Intel which has always had three design teams is now down to just two: the Itanium and the C2D line. We either have to assume that the added items like CSI are taking an entire extra team and two teams are working on C2D or perhaps that the third team was put on special projects like the TeraFlop chip. I find it unlikely that Intel cut out the third team as a cost saving measure since dropping this many engineers would surely have been noticed. If the third team is indeed working on special projects this could mean that Intel will have new surprises waiting in the wings to be added to X86 designs in the next few years. Also, as I've already mentioned, Nehalem may indeed have extra tricks in its design that have been inherited from the aborted Tejas work. An onboard memory controller shouldn't be a problem since Intel has already worked on this for Timna and obviously has been making full blown memory controllers for years in its chipsets.

In addition to the single main design team that AMD has maintained, they have had another design team doing chipsets. It was four years from the 750 chipset to the 8000 chipset with the upgraded 760 and 760MP chipsets in between. The importance of chipsets for AMD is difficult to overstate. AMD's K6 processor was fully capable of dual socket operation. However, no chipset existed that would allow K6 to do this. It is also clear that without AMD's 750 chipset it would have been nearly impossible for AMD to get support for K7's Slot A. It is also clear that without the 760MP chipset Athlon MP's dual socket capability would have gone for nothing as it did on K6. And, there is no doubt that support for Opteron was nonexistent without the 8000 chipset. I think it is clear though that the effort of designing additional chipsets exceeded AMD's abilities and this ultimately led to the decision to purchase ATI. We know that a lack of a proper chipset prevented dual socket motherboards for K6 and that it was only AMD's in-house 750, 760MP, and 8000 chipsets that allowed proper support for K7, Athlon MP, and Opteron. AMD could have skipped the ATI purchase but we would have to wonder if AMD would be able to keep up with new chipsets to keep future chip capabilities from falling by the wayside. I think a similar argument can be made that Intel's primary motive in switching to an onboard memory controller is the geometrically increasing difficulty of designing the Caneland quad FSB chipset.

Finally, we have to wonder just how well it can work for both AMD and Intel to switch to two year design schedules instead of four. For every design that was newly created in the past there was a period of upgrade. Obviously one team had to be working on the new "clean" design while another smaller team concentrated on the upgrade. There is no doubt that for engineers the relative clean slate of a new design was more exciting and offered fewer limitations. It is also the case that a new architecture will demonstrate the largest improvement. In this regard it can be seen that the upgrade team is the perpetual underdog with far greater restrictions and less improvement. The two year design cycle requires dropping two separate teams and blending the upgrade and new design teams together. This makes the upgrade cycle more rewarding because there will be larger improvements. This does create greater restrictions for the new design team because they both have to work within the existing architecture and because not all ideas can be incorporated in the shorter cycle time. However, no really good idea needs to die because each good idea can potentially be included in the next upgrade or the one after. Apparently Intel has already moved in this direction while AMD has responded by making the CPU core modular. AMD's modular die allows some "black box" modification. In other words, AMD can always assign engineers to trouble areas and these engineers can make the necessary changes to that module without worrying about the rest of the die. This ability to isolate problem areas saves a great deal of iterative design work. There is no indication yet of modular design from Intel but then Intel probably has the manpower to deal with this and may prefer the greater flexibility of a non-modular design.

How we got here is one of the important ways of seeing where we are going. I have confidence in the K10 design because it matches AMD's historic four year design pattern. Likewise Nehalem at five years beyond Prescott matches a similar pattern. Obviously though, until Nehalem appears we won't know if any of the Tejas design work made it into Nehalem, nor do we know much about it other than the Integrated Memory Controller and CSI. However, given Timna and Intel's great experience with chipset memory controllers I can't imagine that the IMC would be a problem. Likewise, Intel's PCI-e initiative in 2006 should generate the necessary hardware to make CSI a reality. I suppose we will have to wait until AMD's K11 to see just how well the new modular approach is working.

My final thought is that the doubling of SSE speed by both C2D and K10 can't be very good news for the Itanium design team. Clock speeds shot upward from K6's 166Mhz to K7's 1000 MHz in just two years while Intel matched pace with Pentium II and then Pentium III. I believe there is no doubt that this more than anything caused Intel to miss its initial performance objectives with Merced Itanium. One has to wonder if a Nehalem with SSE4, IMC, and CSI might be a serious threat to the next generation Itanium. Nor is Itanium alone. Eventually, even IBM's mighty Power will come under siege from powerful but cheaper X86's.

Thursday, May 10, 2007

Intel Is Doomed!

I constantly get accused of being slanted/biased/pro/fan in favor of AMD. So, I'm going to try to see the world through the eyes of an unabashed AMD enthusiast. This is not an easy point of view to maintain though because Common Sense keeps getting in the way.

Intel is an evil company!

CS: There is no doubt that Intel has taken advantage of market conditions and even stepped a bit over legal boundaries in their struggle with AMD. However, you should be aware that Hallmark and Coke have also done this. Hallmark reps have been known to threaten to pull entire Hallmark displays if the stores didn't stop selling competitor's cards. Coke was well known for creating exclusive contracts not only with customers but with vending machine manufacturers as well. I can't exactly give Pepsi a pass on this either since, when Coke settled with them out of court, Pepsi readily joined in a dual monopoly targeting smaller bottlers like Royal Palm, Faygo, Shasta, and Canada Dry.

Just wait though, Intel will end up having to pay Billions to AMD in compensation when the lawsuit is resolved.

CS: The truth is that this is a very difficult argument to make. Not only was Northwood in 2002 a much more polished design than Willamette in 2001 but it successfully used the 130nm process. Some of AMD's revenue drop in 2002 should be related to its slower transition to 130nm. Also, it isn't clear that AMD's numbers represented genuine gains in marketshare. There is good evidence that AMD's share bounced up and down during 2001 and that the high in Q4 2001 was just temporary. There were also changes in purchasing patterns caused by the preparation for Y2K and reactions to the attacks on the World Trade Center. After you separate out temporary and external factors you are probably looking at a potential damage award of between $200 and $500 Million. Intel could pay this out of available cash without batting an eye.

But look at all the companies that used to be Intel exclusive and are now selling AMD!

CS: Yes, but these companies go where the money is. For example, although Gateway and Dell now sell AMD, their product offerings are still dominated by Intel. HP has 50/50 Intel and AMD but HP has always sold AMD. In contrast, Tiger Direct, which sold mostly AMD from 2002-2005, now offers mostly Intel. Likewise, Sun is now going to offer some Intel products. There is no evidence of any major backlash against Intel, merely some flexibility in vendors' product offerings.

Yes, but it is so liberating to know that SOI is less susceptible to overheating than bulk silicon like Intel uses.

CS: True but Intel countered this by putting in thermal throttling. There has been no evidence of heat related failure of Intel chips under normal conditions so this isn't much of an advantage for AMD.

Yes, but Intel's chipset monopoly is now broken.

CS: It is true that AMD's ability to offer factory branded chipsets and to target neglected areas will be more advantageous over time. However, it isn't clear how much time this will take. Nor is Intel likely to suddenly forget how to make its own chipsets. Intel will adapt to changing market conditions but is still likely to hold onto the biggest share of the integrated graphics market.

Intel was stupid to keep pushing its overheating hyperpipelined netburst chips.

CS: The Northwood Pentium 4's were the undisputed performance leaders during all of 2002 and most of 2003 with no problems of overheating. The P4 design however did not prove readily adaptable to 64 bit extensions or to higher IPC's. It seems likely that Intel could have avoided the overheating problems of Prescott if it had been able to leave out the 64 bit extensions. Remember that K8 was designed around a 64 bit architecture while Prescott had them poorly grafted onto the Northwood design. Finally, it should be noted that it took Intel about the same amount of time to replace Prescott as it took AMD to replace K5.

Intel has alienated a lot of customers.

CS: This is true; however, after seeing a big revenue drop in 2006 Intel has become much nicer to both potential and existing customers. This is no longer likely to be a big factor.

K10 will destroy Intel!

CS: If AMD experienced the same growth that it saw during 2004 and 2005 when K8 was leading it would end up around 30% marketshare. It is both foolish and impractical to suggest that AMD's total manufacturing capacity could make up for Intel's share. A transfer of 7% to AMD is possible but a transfer of, say, 15% is not.

But, Intel's expenses are much higher than AMD's; Intel couldn't survive a significant drop in revenue.

CS: Intel employs nearly 100,000 people, runs multiple design teams, and owns several world class 300mm FABs. If Intel actually did lose significant volume share there would be nothing that would prevent them from selling off one FAB and reducing the employee count by the same amount to cut expenses. It is simply not conceivable that any amount of volume share that AMD could take even with a third FAB could cause Intel to go bankrupt or drop below 50% volume share. One therefore has to conclude that Intel's position is secure through at least 2012. However, with this much lead time Intel could probably almost painlessly just cut back on future FAB upgrades to 32nm and 22nm and phase out older facilities without having to do anything so drastic as selling off a FAB.

AMD already has SOI and will move to immersion scanning before Intel. AMD's FAB was rated #1 in the world for the past five years and AMD uses the much more sophisticated APM production control software. Clearly Intel is trailing AMD.

CS: AMD's SOI has not yet created any real gap in performance between AMD and Intel. And, although AMD is moving to immersion first it is doubtful that this will slow Intel much when it does decide to move to immersion for 32nm production. And, while it is true that FAB 30 was rated #1 it is likely that Intel's FABs were also among the world's best. Again, AMD has not yet been able to show any real advantage in terms of more sophisticated production, higher yields, or significantly faster product cycles. There is also reason to believe that if APM does become an advantage for AMD, Intel will create its own version. It is wrong to view either AMD or Intel as somehow technologically backward; both companies represent what is likely the most leading edge and sophisticated chip production in the world. It could also be argued that it is far easier to have the top rated FAB when all of your efforts go into a single FAB versus many production FABs as Intel has. Finally, although AMD and Intel do approach manufacturing differently there has yet to be demonstrated any significant difference in results.

Intel will take a big loss on its inventory which is piling up.

CS: Intel's inventory has stayed pretty steady for the last four quarters. Unfortunately we don't know what Intel's inventory mix is or how much it should be. If we base inventories on cpu revenues alone then it is a bit higher than AMD's; however, if we base inventory on total revenue then it is actually lower than AMD's. And, it is likely that during the last four quarters older inventory has been replaced with newer.

Intel has fallen way behind in HPC.

CS: While it is flattering to have your chips powering the world's fastest supercomputer these sales are relatively small volume. It is far more important in terms of revenue to see HPC wins translate into server sales. It remains to be seen if this will happen for AMD during 2007.

I prefer to support the underdog.

CS: Personal preferences aside, AMD has to produce competitive products to keep making money and expanding share. Very few buyers argue for spending more money and getting less. Being the underdog is not likely to be enough.

Tuesday, May 08, 2007

AMD Is Doomed!

I constantly get accused of being slanted/biased/pro/fan in favor of AMD. So, I'm going to try to see the world through the eyes of an unabashed Intel enthusiast. This is not an easy point of view to maintain though because Common Sense keeps getting in the way.

It's so wonderful to be able to rely on a manufacturer with so much volume capacity and a steady production supply.

CS: AMD is now making processors in three different FABs (FAB 30, FAB 36, and Chartered) plus TSMC and UMC for Chipsets.

Yes, but it is so liberating to know that I can overclock the stuffing out of my wonderful C2D chips.

CS: Yes, this is great for the 1/10th of 1% of people who do this but it doesn't affect the vast majority of computer purchasers; factory stock clock speed remains the most important. Also, it has been suggested by die-hard overclocking Intel enthusiast Ed Stroligo that Penryn's 1600MHz FSB speed will put the brakes on overclocking.

Sure, but AMD is drowning in debt; they were stupid to buy ATI.

CS: Buying ATI was the only way to compete with Intel's in-house chipset advantage and AMD's debt was recently refinanced. Of course the debt isn't good and AMD needs to start making money but it isn't going to drive the company out of business in the next year.

Intel has been so open about C2D and their 45nm process; AMD has said almost nothing so they must be hiding something.

CS: AMD was just as secretive about K8. What reason does AMD have to sabotage current sales by promoting a chip that isn't available yet? Also, AMD doesn't have anywhere near Intel's advertising budget. Making a big splash at launch couldn't hurt.

But Intel has already demonstrated 45nm; AMD must be having a problem.

CS: AMD produced its own 45nm SRAM test wafers just 3 months after Intel. If AMD is going to release 45nm mid 2008 then these chips would tape out about mid 2007. How can AMD demonstrate a chip before it is taped out?

Intel is a full process generation ahead of AMD.

CS: Isn't a process generation 2 years? If Intel leads AMD by six months wouldn't this be ¼ of a process generation? Also, AMD has indicated that it is going to move to 32nm in just 18 months while Intel is still looking at 24 months. This should make AMD and Intel about equal at 32nm.

AMD is going to be delayed because they are using immersion technology. Intel was smart to stick to the old dry process.

CS: IBM and AMD have been using a pre-production prototype immersion scanner for quite some time and are reporting the same defect rate as dry; there is no reason to assume a delay.

You can't believe AMD. Intel is much more reliable.

CS: You are suggesting that the company that lied about developing 64 bit technology for X86 (when it was already on the Prescott die) is more honest?

Intel is a far better engineering company; they will win just by being better.

CS: If Intel is so much better at engineering then why was Willamette uncompetitive with K7? Why did it take the Northwood revision for P4 to take the lead? And, why did they cancel Tejas, the original Nehalem, and Whitefield? Also, if Intel can win just by being better why did they bully motherboard manufacturers to delay supporting both K7 and K8?

You can't prove Intel did that.

CS: The K8 launch was not attended by a single motherboard manufacturer; why do you think that was?

Maybe they just knew how poor K8 was.

CS: You mean the architecture that would go on to dominate Intel's products for two years?

Well, if AMD is so great why don't they have quad core?

CS: This isn't really that much different than when AMD released the excellent X2's and all Intel could offer in response were the Smithfields which ran slow and hot. It took Intel about the same amount of time to reply with C2D as it will take AMD to reply with K10.

Yes, but Intel's MCM approach is brilliant; it is much more flexible and economical than AMD's plan to put four cores on one die.

CS: True but MCM also has poor scaling. With four cores, Kentsfield is really only getting the equivalent of three. This is one of the reasons why AMD is confident that they can match Intel in quad core.

AMD will never be able to keep up. Intel has tons of unused clock ceiling.

CS: Then why is Intel bumping the thermal limits on Kentsfield at 3.0GHz with a stock HSF? And, why is Intel waiting until 45nm to release anything faster?

That's . . . just a coincidence. Intel has always had better process technology.

CS: You mean like back in 2000 when their PIII couldn't keep up with K7?

That was a fluke; just look at history!

CS: Okay, Intel's first boom period started when they introduced the ATX motherboard standard. This ended when AMD released K7 four years later. Intel's second boom period began at the beginning of 2002 with the release of Northwood on 130nm and ended at the beginning of 2004 when Prescott had to be released as Celeron D. AMD's first boom period lasted just one year in 2001 when PIII was unable to clock and Willamette wasn't quite competitive with K7. However, AMD's second boom period started in 2004 and ended when Intel released C2D in 2006. So, Intel's booms have a downward trend from 4 years to 2 years while AMD's have an upward trend from 1 year to 2 years. It seems likely now that neither company will be able to maintain a lead for more than 1 year in the future. This would suggest that Intel's current boom will end with the introduction of Barcelona but will probably return with the release of Nehalem.

Hah! Intel will hold onto the lead with faster clocks; AMD can't keep up with 2.5GHz.

CS: The Inquirer suggests that AMD will release quad core at 2.9GHz.

That's just FUD.

CS: Well, 2.5GHz would have been fine when Intel's fastest quad core speed was 2.66GHz. Isn't it likely that Intel's recent boost of quad core to 3.0GHz also encouraged AMD to match with 2.9GHz?

Even if it's true, it won't matter; Intel will bury K10 with Penryn.

CS: Aside from SSE4, isn't Penryn just a Merom with more cache and a faster FSB?

Maybe, but Nehalem is a whole new architecture.

CS: I've already said that Intel may regain the lead with Nehalem but isn't it true that this isn't quite a new architecture in the old sense?

Of course it is; what do you mean?

CS: Well, it used to take four years to design a new architecture. Intel is moving to Nehalem just 2 ½ years after C2D so it doesn't seem likely that it is a whole new core but more of an upgrade. Also, won't Intel encounter the same upgrade hurdle with Nehalem with a new socket that AMD is running into now with socket 939 versus AM2? And, with its modular core design AMD should be able to deliver its own upgrade to K11 in 2009.

Oh, come on! Everyone knows that people prefer C2D. Intel's ASP is almost double that of AMD's.

CS: Yes, but if you look at HP's website there doesn't seem to be any real difference in the prices for actual systems. I remember back in 2002 and 2003 when Intel systems commanded a price 10-20% higher than AMD systems. And, HP offered more Intel systems than AMD. Today, the price looks about the same on both desktop and mobile, and the number of systems offered is identical on both desktop and mobile. Most likely the sharp drop in AMD's ASP was due to competition from discounted P4's and AMD's desire to hold onto its volume share.

Yeah, like that worked. AMD lost tons of marketshare. They lost everything they gained in 2006.

CS: Well, AMD took a big hit in revenue, no doubt. But, I haven't yet seen volume numbers. The only comment I've seen suggests that AMD's volume drop was only about 1/3rd of the revenue drop.

Intel will bury AMD with low prices.

CS: AMD's new DTX motherboard standard seems to be pretty popular and it was designed to reduce costs on desktop systems.

AMD needs more layers to make its chips; Intel's chips will still be cheaper.

CS: But perhaps not at 45nm. Doesn't Intel's use of the old dry process require multiple passes that make up for AMD's extra layers?

AMD is losing money now and it will just get worse as Intel ramps C2D.

CS: There is no doubt that AMD was at a big disadvantage with half of its production still at 90nm on a 200mm FAB but now as AMD ramps both 65nm and 300mm production its costs should drop. Also, as Intel phases out P4 how can it maintain lower prices without dropping its ASP?

Intel has quad core and 45nm production is a lot cheaper.

CS: Yes, quad core was an advantage but that advantage is lost once Barcelona is released. Also, the last estimate I saw was that 45nm production would account for less than 1% of Intel's volume in 2007; how much could that help with costs?

Yes, but Intel has Apple and their marketshare isn't even included in the PC numbers.

CS: Apple's desktop sales have been stagnant for the past three years. And, they seem to be charging a premium price for no additional features. For example, for the price of a MacBook with 15” display you can get an HP notebook with a 17” display, twice the harddrive capacity and still save $500. You could easily save $1,000 over the price of a MacBook with a 17” display. At these prices I would seriously doubt that very many parents are buying these for their kids. This seems more like the market for middle aged professionals who have been using Macs for a while and don't want to switch. However, I'm positive that there is going to be increased pricing pressure on desktop systems and even on notebooks now that Intel doesn't have the market cornered with Centrino. I just don't see the average buyer continuing to shell out a premium price for Apple computer systems over the next two years. My guess is that Apple has peaked as a computer company and is now shifting to consumer electronics.

AMD is just a one trick pony!

CS: Intel had mobile 486 chips. AMD didn't get into mobile until K6 but caught up to Intel completely with Turion. Intel had the powerful Pentium-Pro server chip. AMD didn't get started in servers until Athlon MP but had caught up in servers by Opteron. On the desktop, AMD was way behind Pentium with K5, was briefly faster than PII with K6, outpaced both PIII and Willamette for a year with K7, led Prescott P4 for 2 full years, and now seems poised to regain the lead with K10. This seems like more than one trick.

AMD was so arrogant; they just sat on their hands for four years.

CS: Northwood P4 was introduced in 2002; that's four years to C2D's launch in 2006. AMD released K8 in 2003 and that is also four years to K10 in 2007. Was Intel just sitting on its hands for four years? It seems clear that AMD made a mistake or two during those four years and lost about a year while it shifted direction to K10. Intel made mistake after mistake with Tejas and then Whitefield but got lucky because the Israeli team had an architecture that could be beefed up into C2D. The problem is though that while Intel used to run with the completely separate Pentium M architecture and the substantially different Xeon architecture they have now cut back to just a single architecture. Woodcrest today is almost identical to Conroe just as Opteron is to Athlon 64. Intel has lost its former advantage of having multiple research projects to reduce risk. Today, in this regard, it is equal with AMD.

But, everyone knows that Intel is the best.

CS: If Intel is the best then HPC is very curious. Intel was dominant in HPC with its 32 bit Xeon servers. However, it has lost ground with 64 bits until, today, Intel only has a fraction of AMD's presence in the most powerful supercomputers. One has to wonder why there hasn't been a flood of Woodcrest based supercomputers announced.

But, but, Intel has to win because . . . they are Intel!

CS: It seems that may no longer be enough.

Wednesday, May 02, 2007

K10 -- What Hasn't Been Said

AMD has released a few hints about K10 (Barcelona). This has left people like Ed Stroligo and George Ou wailing and waving their arms trying to figure out what the numbers mean. The shrieking from Mr. Ou has been particularly shrill as he desperately tries to rationalize his belief that Intel will stay in the lead. Unfortunately, what George doesn't seem to realize is that these numbers are correct but not based on K10.

AMD has estimated benchmarks on its website. These include two graphs; let's look at SPECfp_rate2006.

What are most interesting are the quad core Opteron numbers. One would think that these would refer to K10 (since it is quad core) but they don't. The QC Opteron score is obtained by taking the Opteron 2222 SE score and doubling it (twice as many cores) and then adjusting it from 3.0Ghz to 2.6Ghz. Xeon 5355 (Clovertown) scales very poorly at only 75%. This is why the theoretical QC Opteron has such a lead. The lead is in fact a 50% higher SPECfp_rate2006 score at the same clock. Yes, this is the same 50% number that George desperately describes as "inflated" and which makes Ed so uncomfortable that he cuts it in half. Sorry, but this number is real; you just have to know where it came from. Back in January, AMD knew that the initial clock for K10 would be 2.5Ghz and they figured that Intel would have a 2.66Ghz Xeon. So, if QC Opteron is 50% faster at the same clock then this drops to 40% for a 2.5Ghz QC Opteron versus a 2.66Ghz Xeon. The SPECint_rate2006 score is 20% at the same clock. These are the numbers that AMD executives have mentioned in interviews. None of these are K10.
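As a rough sketch of the projection described above: double the dual core score, then scale it linearly with clock. The starting score of 100.0 is a made-up placeholder, not a published SPEC result, and the function names are only illustrative. The second function shows how a 50% same-clock lead shrinks to about 40% at 2.5Ghz versus 2.66Ghz.

```python
def project_quad_core_score(dual_core_score, dual_core_ghz, target_ghz):
    """Double a dual core rate score (twice the cores), then scale linearly with clock."""
    return dual_core_score * 2 * (target_ghz / dual_core_ghz)


def lead_at_clocks(same_clock_ratio, amd_ghz, intel_ghz):
    """Convert a same-clock performance ratio into a ratio at two different clocks."""
    return same_clock_ratio * amd_ghz / intel_ghz


# Placeholder score for a 3.0Ghz dual core part (assumption, not a real SPEC number).
placeholder_score = 100.0
projected = project_quad_core_score(placeholder_score, 3.0, 2.6)
print(round(projected, 1))  # 173.3

# 50% faster at the same clock, 2.5Ghz QC Opteron versus 2.66Ghz Xeon.
print(round(lead_at_clocks(1.5, 2.5, 2.66), 2))  # 1.41, i.e. roughly 40% faster
```

Linear scaling with clock is itself an assumption; real scores rarely scale perfectly, which is part of why these are only estimates.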

This is not complicated at all if you just look at it with a little common sense, but this seems to be what George and Ed have so much trouble with. Back in January, the estimated clock for Xeon Clovertown was 2.66Ghz but recently Intel announced 3.0Ghz. So, AMD changed their statement from 40% faster than the fastest to 50% faster at the same clock. This is neither inflation nor reduction; it is exactly the same number. The numbers for fp_rate are pretty good because they were all done with SUSE Linux. Intel's scores are even using its own compiler. The int_rate numbers are not as good because the Intel scores use Windows while AMD's still use Linux. We'll assume for the moment that these are accurate but the Intel numbers could actually be higher. Let's adjust this based on what we know now.

fp
1.5X faster at the same clock, K10 at 2.5Ghz and Xeon at 3.0Ghz.
1.5X * 2.5Ghz /3.0Ghz = 1.25 (25% faster)

integer
1.2X faster at the same clock, K10 at 2.5Ghz and Xeon at 3.0Ghz.
1.2X * 2.5Ghz /3.0Ghz = 1 (0% faster)

This sounds very much like what Ed was saying and we might be tempted to give him some credit except that he never realized that the scores are for projected QC K8 rather than K10. I also don't feel like giving George any credit either even though he does make some correct calculations in his article. It is particularly annoying that George makes sure to get in the phrases "Barcelona hype" and "outrageous promises" at the beginning of his article and only gets around to saying "AMD's Barcelona might end up with a slight lead" down toward the bottom. I notice too that George makes sure to pump up Intel's numbers for higher clock and FSB speed but he too fails miserably to notice that AMD's numbers are for K8 and not K10.

So, let's continue where Ed and George got lost. K10 should be faster than K8. There are a lot of changes but the tricky part is to figure out what would allow four cores to not bog down and what would actually increase IPC. We should assume that the split memory bus, enhanced bus scheduling, increased cache, shared cache, and changes to load/store logic simply allow four cores to not get bogged down. After all, we are trying to stuff two more cores on the same bus. However, we know that K8 is not currently using all of its bandwidth and these things should give it a bit more room. Unfortunately, Clovertown cannot use enhanced scheduling because the two dies are separate. We also know that even the Revision F K8's can use as high as DDR2-800 and (from what AMD has said) this will probably be bumped to DDR2-1066 with K10. In other words, while Clovertown will be at 1333Mhz, K8 is already at the equivalent of 1600Mhz and should go to 2133Mhz when Penryn is at 1600Mhz. However, K10 has more improvements than this. We can probably assume that cache bus doubling, prefetch doubling, increased fast path instructions, improved branch prediction, and dedicated stack hardware will actually increase the speed per core. We also know based on what Intel has said that Penryn is 10% faster in integer at the same clock.

fp
1.5X faster at the same clock, K10 at 2.5Ghz and Xeon at 3.0Ghz. We allow 10% for Xeon with higher FSB speed and 10% for K10. The two 10% cancel so no change.
1.5X * 2.5Ghz *1.1 /(1.1 * 3.0Ghz) = 1.25 (25% faster)

1.5X faster at the same clock, K10 at 2.6Ghz and Penryn at 3.2Ghz. We allow 30% for Penryn with SSE4 and 10% for K10.
1.5X * 2.6Ghz * 1.1 / (1.3 * 3.2Ghz) = 1.03 (3% faster)

integer
1.2X faster at the same clock, K10 at 2.5Ghz and Xeon at 3.0Ghz. We allow 5% for Xeon with higher FSB speed and 10% for K10.
1.2X * 2.5Ghz * 1.1 / (1.05 * 3.0Ghz) = 1.048 (5% faster)

1.2X faster at the same clock, K10 at 2.6Ghz and Penryn at 3.2Ghz. We allow 10% for Penryn and 10% for K10.
1.2X * 2.6Ghz * 1.1 / (1.1 * 3.2Ghz) = 0.975 (3% slower)
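The four estimates above all follow the same formula, so they can be checked with a small script. This is only a sketch: the function name and the scenario labels are mine, and the uplift percentages are the same guesses used in the text rather than measured values.

```python
def relative_speed(same_clock_ratio, amd_ghz, intel_ghz,
                   amd_uplift=1.0, intel_uplift=1.0):
    """Estimated AMD/Intel performance ratio.

    same_clock_ratio: AMD's projected advantage at equal clock (e.g. 1.5 for fp).
    amd_uplift / intel_uplift: assumed per-clock gains (1.1 means 10%).
    """
    return (same_clock_ratio * amd_ghz * amd_uplift) / (intel_uplift * intel_ghz)


# (same-clock ratio, AMD Ghz, Intel Ghz, AMD uplift, Intel uplift)
scenarios = {
    "fp:  K10 2.5 vs Xeon 3.0": (1.5, 2.5, 3.0, 1.1, 1.1),
    "fp:  K10 2.6 vs Penryn 3.2": (1.5, 2.6, 3.2, 1.1, 1.3),
    "int: K10 2.5 vs Xeon 3.0": (1.2, 2.5, 3.0, 1.1, 1.05),
    "int: K10 2.6 vs Penryn 3.2": (1.2, 2.6, 3.2, 1.1, 1.1),
}

for label, args in scenarios.items():
    print(f"{label}: {relative_speed(*args):.3f}")
```

Leaving both uplifts at their default of 1.0 reproduces the earlier, unadjusted figures (1.25 for fp and 1.0 for integer at 2.5Ghz versus 3.0Ghz).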

If K10 gets the same 10% boost in speed then we come up with some interesting numbers. Basically, in integer K10 at 2.5Ghz would be perhaps 5% faster than Xeon at 3.0Ghz and K10 at 2.6Ghz would be 3% slower than Penryn at 3.2Ghz. In fp, K10 at 2.5Ghz would beat Xeon at 3.0Ghz by a pretty fair margin. However, at 2.6Ghz K10 is still a tiny 3% faster than Penryn at 3.2Ghz even if Penryn uses SSE4. So, taking us all the way up to about mid 2008, it looks like we have mostly a tie although it looks like Opteron may have a temporary lead in fp. These numbers are because of reduced scaling with quad core Clovertown but dual core would be better. On the other hand, K10 will clock to 2.9Ghz with dual core. This should put K10 dual core just a little bit more ahead but not much. I assume Intel will bump clock speeds again around mid 2008 but then AMD also moves to 45nm. This is probably about the best educated guess of speeds we can come up with before AMD releases some real benchmarks.
