Thursday, September 28, 2006

IDF, AMD And Intel In Perspective

Intel currently has the leading performance in desktop hardware. AMD is woefully uncompetitive against Intel's E6600, E6700 and X6800. The Core 2 Duo architecture knocks AMD's top FX chip down from a front-line enthusiast processor to one merely competitive with Intel's third-highest speed grade. Essentially, AMD's best processor is mid-range by Core 2 Duo standards. Being not one but two full speed grades ahead puts Intel in the performance lead more solidly than it has been since 2002. If this were the only thing that mattered, AMD's position would be rather desperate today. However, AMD has chips that are competitive with the E6300 and E6400, and it has the X2 3800+, which is cheaper than the E6300. Likewise, Intel has no advantage in the Celeron range, which is matched by Sempron. In servers, Intel's Woodcrest is roughly a match for Opteron in single or dual socket systems. Woodcrest cannot do 4-way, and in this area the older Prescott-based cores just don't match up to Opteron.

The information from IDF is mixed. When Intel feels pressed, it tends to toss out future technologies with abandon, and the show ends up looking like Futurama II from the 1964 New York World's Fair. Unfortunately, people may see these technologies as real rather than as the very forward-looking demonstrations that they are, and forget that Intel has a habit of canceling and scaling back far-reaching projects. It takes a special type of mental contortion or extreme forgetfulness to suggest that the same company that recently delivered a scaled-back Montecito late will be delivering processors based on photonics anytime soon.

Although Core 2 Duo is a success, it also has to be seen as an admission of failure by Intel. Until recently, AMD was too small to work on multiple processor projects. So, K8 was a one-size-fits-all strategy, using the same core for servers, desktops, and notebooks, while Intel supported three entirely different architectures. That Intel has switched to a similar one-size-fits-all strategy, now using C2D for everything while also scaling back work on Itanium, shows that Intel was unable to adequately manage all of those projects. Its project management load has essentially been cut in half: it has scaled back from three separate architectures to about one and a half. This is even true of server chips, where Xeon used to have many modifications compared to P4: 36-bit addressing extensions, L3 cache, and multi-chip support. Woodcrest in servers today, however, is much more like Conroe on the desktop than Xeon was like P4. This again suggests that Intel's success with C2D is primarily a function of putting most of its engineering effort into a single project. Curiously, this happens at a time when AMD is splitting its line into two separate cores, one for notebooks and one for desktops/servers. If AMD truly gains an advantage from this, Intel will have no choice but to match it with a specialized notebook core of its own to protect its current Centrino market. Unless Intel can do something to overhaul its bureaucracy, this would put it back into the same sinking boat it just jumped out of. Similarly, Intel would like to avoid mentioning that AMD's new instructions like POPCNT are aimed specifically at competing with Itanium in the 4-way and up market. Scaling back Itanium design doesn't really fit with this reality, especially at a time when Woodcrest is limited to 2-way and Presler-cored Xeons are outdated. This means that Intel's entire current server strategy is due for either a scale-back (Itanium) or end of life (Presler and the 5000 chipset).

Apparently, lightning can strike twice: Intel has essentially just repeated the failure of RDRAM with FBDIMM. With Intel's massive pullback from FBDIMM and AMD's cancellation of its future plans to support it, FBDIMM is essentially end of life upon release. In some ways this is good, because it is primarily FBDIMM that is holding Woodcrest back in power draw and performance versus Opteron. Without FBDIMM, Woodcrest should pull somewhat ahead. However, this means scrapping Intel's brand-new 5000 series chipset, and Intel will have to scramble to come up with a replacement. Apparently, Intel's real strategy will start in Q2 07 with Bearlake, the replacement chipset for the 5000. This will put a dent in Intel's bottom line, because it has spent a lot of money on this technology and will have to continue subsidizing it somewhat for the customers who are buying into it now.

Intel's information at this IDF was significant both for what was said and for what wasn't. There was no mention of CSI (Common Systems Interface), which was supposed to be Intel's answer to AMD's HyperTransport. It had been suggested that CSI would be released in 2008, or even as early as 2H 07. Given the extremely long-range technologies that were mentioned, like photonics, the silence on CSI is surprising to say the least. If we combine the lack of information about a CSI release with the new initiative to license the Intel FSB (Front Side Bus) and the Geneseo upgrade to PCI-E, we wind up with a picture without CSI. This suggests that Intel's next core release in 2008 will use the current FSB and will not have CSI. More than anything else, this leaves Intel without a solid architectural foundation.

It is true that, on the face of it, the FSB licensing and the Geneseo PCI-E standard look similar to AMD's Torrenza, and the proposed speed for PCI-E 2.0 is similar to that of HT 3.0 and HTX. However, the two approaches are quite different. If a coprocessor plugs into an AMD socket, it communicates with the processor using the same protocol, HT, that it would use if it were plugged into an HTX slot. There is no such similarity between Intel's FSB and PCI-E: manufacturers can create enhanced PCI-E products, but these would not be adaptable to the FSB. It also appears that PCI-E will not include a cache coherency protocol, whereas HTX could. PCI-E also suffers added latency because it has to go through a PCI-E hub of some sort, whereas HTX can connect directly to the processor. Another big difference is that HT 3.0 and HTX will be in systems before Geneseo is even finalized. Also, if Intel were releasing CSI in 2008, it would have made sense to fold CSI into the Geneseo initiative and skip the FSB licensing. Geneseo with CSI would be very competitive with Torrenza; Geneseo without CSI is little more than a PCI-E upgrade and not true competition for Torrenza.

It is clear that Intel will not abandon the FSB, because there would be no point in licensing an FSB that was going to be dropped, and no reason for anyone to spend money developing products for an FSB that would be gone by 2008. Therefore, we must assume that Intel is not dropping the FSB. This also has to mean that Intel is not releasing CSI in 2008 and is not following AMD's lead to an onboard memory controller. In fact, the announcement of a GPU built into the Northbridge seems to further bolster this assumption. All of the evidence points to the conclusion that Intel's FSB will be around for several more years. The big question is why. Obviously, if Intel can build a memory controller into the Northbridge, it could certainly put one on the CPU die. In fact, Intel's Bearlake chipset shows that it has a DDR3 controller ready to go. Using an on-die memory controller requires a separate bus for I/O and interprocessor communication. Presumably, given its experience with PCI-E, Intel already has most of the necessary protocol worked out, since CSI is primarily an upgrade of PCI-E. Intel would also need to use a coherency protocol like MOESI instead of the MESI it uses now. However, AMD made this change in 2002 on the Athlon MP, and there is no reason why Intel could not follow. That leaves a few possible explanations. First, IBM made a substantial investment in a scalable Northbridge for Intel processors, and Intel may want to maintain compatibility with it. Second, Intel blundered with RDRAM and has now blundered with FBDIMM, but it can fix this fairly soon by releasing another chipset; if the memory controller were on the die, the fix would take longer. The biggest reason, though, may be that Intel sees a speed limitation in an onboard memory controller. If two channels are the practical limit for an onboard controller, then bandwidth could potentially be increased by raising the FSB speed and using more than two controllers on the Northbridge.
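To make the MESI versus MOESI point a little more concrete, here is a deliberately simplified Python sketch of what happens to a cached line when another processor reads it. It is a textbook model, not Intel's or AMD's actual implementation, and the function and state names are made up for illustration.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    OWNED = "O"  # MOESI only: dirty copy that other caches may also share
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def snoop_remote_read(state, protocol):
    """Next state of a cached line when another processor reads it.

    Returns (new_state, memory_writeback_needed). Deliberately simplified:
    only the remote-read case is modeled, not writes, upgrades or timing.
    """
    if protocol not in ("MESI", "MOESI"):
        raise ValueError(f"unknown protocol: {protocol}")
    if state is State.OWNED and protocol == "MESI":
        raise ValueError("MESI has no Owned state")
    if state is State.MODIFIED:
        if protocol == "MESI":
            # The dirty data is written back to memory as the line becomes Shared.
            return State.SHARED, True
        # MOESI: keep the dirty line as Owned and pass it cache to cache;
        # the owner stays responsible for the eventual writeback.
        return State.OWNED, False
    if state is State.EXCLUSIVE:
        # Clean line, so memory is already up to date.
        return State.SHARED, False
    # SHARED, INVALID and OWNED lines keep their state on a remote read.
    return state, False


# A line written by one processor and then read by another:
for protocol in ("MESI", "MOESI"):
    new_state, writeback = snoop_remote_read(State.MODIFIED, protocol)
    print(f"{protocol}: M -> {new_state.value}, memory writeback: {writeback}")
```

The difference is the Owned state: under MOESI a modified line can be shared between caches without first updating memory, which saves memory traffic when processors pass data back and forth.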

The TeraFlop chip was not particularly impressive. Its cores were essentially lightweight processors, and IBM has already seen the limits of that approach with Cell. A machine based solely on Cell would be woefully inadequate for general processing, as well as for any type of computation that exceeded the small memory space contained in Cell. Consequently, IBM used a hybrid design with both Cell and Opteron, so that Opteron could handle both the general processing and the complex computation loads, while Cell is used for small, parallel computations. Pursuing a lightweight processor design that is even lighter than Cell is unlikely to create anything beyond specialized coprocessor technology. It is also not clear who Intel would partner with for this technology, since it no longer builds supercomputers of its own. The only two obvious partners, IBM and Cray, are currently pursuing other directions. It is possible that Intel only intends this to be used as a coprocessor with its own processors. Even so, the time frame to deliver seems too long, as there will likely be several AMD-compatible coprocessors in use by then.

What was not explicitly mentioned in the photonics description is that the standard way to carry multiple channels over one fiber is to give each channel its own laser color (wavelength). This means that eight channels would require eight separate lasers. There are problems with discriminating the channels while maintaining enough light intensity to carry the signal, as well as problems with getting truly monochromatic light from diode lasers. This is not really a near-term technology.
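For a rough sense of scale, here is a small Python sketch that places eight channels on an evenly spaced frequency grid. The 193.1 THz starting point and 100 GHz spacing are assumptions borrowed from the common telecom DWDM grid (ITU-T G.694.1), not anything Intel described. With that spacing, neighboring lasers end up only about 0.8 nm apart, which is why each laser has to be very narrowband and stable and the receiver has to separate them cleanly.

```python
# Speed of light in a vacuum, in m/s.
C = 299_792_458.0


def channel_wavelengths_nm(channels, start_thz=193.1, spacing_ghz=100.0):
    """Wavelength (nm) of each channel on an evenly spaced frequency grid."""
    wavelengths = []
    for n in range(channels):
        freq_hz = start_thz * 1e12 + n * spacing_ghz * 1e9
        wavelengths.append(C / freq_hz * 1e9)
    return wavelengths


# Eight channels means eight lasers, each on its own wavelength.
lasers = channel_wavelengths_nm(8)
# Higher frequency means shorter wavelength, so each gap comes out positive.
gaps = [a - b for a, b in zip(lasers, lasers[1:])]
for i, wavelength in enumerate(lasers):
    print(f"laser {i}: {wavelength:.2f} nm")
print(f"average gap between neighbors: {sum(gaps) / len(gaps):.2f} nm")
```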

Intel showed technology that will not be ready for two or more years. Meanwhile, it seems to be abandoning any attempt to create an onboard memory controller with distributed memory and a separate point-to-point interface. AMD plans to move to Direct Connect Architecture 2.0, which will enhance its current lead in 4-way and larger configurations. Intel's direction is very puzzling, as it appears to leave Intel with no hope of catching up to AMD in the 4-way and above market. IBM does have a linkable Northbridge that can use the current Intel FSB, but this would seem to leave Dell and Gateway with no choice but to use more AMD servers for 4-way and higher. However, the current memory situation is not ideal for AMD. AMD needs low-latency memory, and unfortunately latencies have increased with DDR2 and will increase again with DDR3. This does not hurt Intel as much, since a large cache can partly offset the effect of increased latency.

The current outlook for desktops is unchanged. The Intel FSB has enough capacity to handle single-processor systems, and even dual-processor systems using a dual-FSB Northbridge. It is not clear that Intel has a path to adequately reach quad-processor systems; however, these will probably not be a factor on the desktop for several years. AMD's desktop outlook is good once K8L is released in 2007, and its 4-way and higher outlook is much better than Intel's due to the switch to Direct Connect Architecture 2.0 in 2008. It is not clear whether Intel is simply giving up on the 4-way and higher market or only plans to pursue it with the Itanium family. For single and 2-way systems, there does not appear to be any current way for AMD to gain a lead over Intel. DIMM speed seems to be the limiting factor. For example, AMD's memory controller on Revision F is capable of handling DDR2 800 memory; however, memory this fast is not yet available. It may be in AMD's interest to create its own DIMM initiative, such as using TTRAM or TTRAM caching on top of DDR DIMMs. It would also be helpful for AMD to create a new DIMM standard such as the HTDIMM that I wrote about earlier. These two technologies are not mutually exclusive. HTDIMM is a configuration for communication and fanout on DIMMs, whereas TTRAM or TTRAM caching would be the underlying storage technology. In other words, TTRAM would be technology on the DIMM for storing and retrieving data faster, whereas HTDIMM would be technology for communicating with the processor faster.
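To put rough numbers on why DIMM speed matters, this Python sketch computes the theoretical peak bandwidth of a dual-channel DDR2 controller like Revision F's, assuming standard 64-bit DIMMs. These are paper peaks only; real throughput is always lower, and latency is a separate issue.

```python
def peak_bandwidth_gbs(megatransfers, channels=2, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8
    return megatransfers * 1e6 * bytes_per_transfer * channels / 1e9


for name, rate in [("DDR2-533", 533), ("DDR2-667", 667), ("DDR2-800", 800)]:
    print(f"{name}: {peak_bandwidth_gbs(rate):.1f} GB/s peak, dual channel")
```

Moving from DDR2-667 to DDR2-800 raises the peak by about 20%, which is headroom the Revision F controller can only use once the faster DIMMs actually ship.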

It appears that AMD is likely to catch Intel with K8L in both dual and quad core. In the near term, 4x4 should make AMD competitive again in the FX range. In the longer term, AMD does not seem to be able to gain an advantage over Intel as long as DIMM speed is a factor. For servers, however, it appears that AMD will simply leave Intel behind on 4-way and higher. This IDF had to be very disappointing to anyone looking to Intel for future x86-based server technology.

Monday, September 25, 2006

Anandtech Melts Down

Anandtech used to be a good and honest website. However, since 2000 the opinions of Anand Lal Shimpi have changed nearly 180 degrees. This change in viewpoint has been accompanied by a similar change in quality and integrity.

It is difficult to place any credibility in a website that says two completely different things. For example, Tom Pabst of Tom's Hardware Guide said in 2000, in Intel Admits Problems With PIII:

  • On Intel's VC820 platform Sysmark 2000 crashed consistently. I was unable to finish even one run of Sysmark with this CPU and I certainly tried about 20 times. As soon as I plugged a Pentium III 1 GHz into the system the benchmark would run all the way through.

  • The most consistent error I got however was with my timed Linux kernel compilation. Even on the VC820 the Pentium III 1.13 GHz was utterly unable to finish the compilation even once. All other CPUs I used finished the compilation without the slightest flaw.

  • Interestingly, stress tests as Prime95 or CPUburn under Windows98 would not get my 1.13 GHz processor to fail on the VC820.
Today, however, in Green Machine Test THG says:

This also lets us to limit the scope of this article to measuring power consumption at maximum and minimum CPU load, using our Prime95 torture test.

This is why Tom's Hardware Guide has little credibility today. Unfortunately, Anand has done the same thing. For example, Anand also used to criticize Intel for its paper launches.

Prior to Intel’s downward spiral, AMD would be the one we would accuse of “paper launching” processors, since you could never find a newly “released” AMD CPU until after its launch. Intel’s policy was exactly the opposite, upon the introduction of a new CPU, systems based on that CPU would be available the very same day.

Since the release of AMD’s Athlon, things have changed. Slowly but surely the roles of the two companies have reversed, now, Intel is the one being accused of “paper launching” processors while AMD CPUs are readily available and definitely affordable. These “paper launches” were at their worst with the release of the 1GHz Pentium III (March 2000) before the 850, 866 and 933MHz Pentium IIIs in an attempt to compete with AMD’s 1GHz Athlon that was released just days before. What began to make the community characterize Intel’s CPU releases as “paper launches” was the fact that you couldn’t actually go out and buy a 1GHz Pentium III whereas, by the end of the month, the Athlon was already available in speeds from 500MHz up to 1GHz in 50MHz increments.

Yet Anand's criticism of this had entirely vanished when Intel sent out its P4 EE for review in 2003. The P4 EE was sent out specifically to compete with AMD's FX review, but Intel didn't actually deliver it until months later in 2004, while the FX was available shortly after. Today, AMD's chips are always available on the day of release, while Intel's don't show up for as much as three months.

Anand's point of view has completely switched. Today he no longer criticizes Intel for delivering chips late after "release". He began showing bias against AMD some time in 2002, when he started complaining about the late release of K8. When the Athlon 64 was released on September 23, 2003, he said:

Fast forward to almost two years and the Hammer is just finally being released on the desktop as the Athlon 64 and the Athlon 64 FX. AMD has lost a lot of face in the community and in the industry as a whole, but can the 64 elevate them back to a position of leadership?

AMD has also priced the Athlon 64 and Athlon 64 FX very much like the Pentium 4s they compete with, which is a mistake for a company that has lost so much credibility. AMD needed to significantly undercut Intel (but not as much as they did with the Athlon XP) in order to offer users a compelling reason to switch from Intel. However, given the incredible costs of production (SOI wafers are more expensive as well) and AMD's financial status, AMD had very little option with the pricing of their new chips.

What Anand is complaining about is that AMD originally had K8 listed on its unofficial roadmaps for release in the first half of 2002. Just one month later, K8 had moved to the second half of 2002, and it was actually released in Q2 2003. These roadmaps are not official documents, so complaining about changes seems a bit silly. It is also remarkable that these complaints would be made at all, because both Tom and Anand had the opposite view of Intel in 2000: both said that it would have been better for Intel to delay the release of the PIII 1.13 GHz chip than to ship a defective product. Yet when AMD delayed the launch of K8 to ensure quality and availability on the new SOI process, Anand was critical.

Anand was also critical because AMD had planned to release the desktop Clawhammer first and the server Sledgehammer later. Anand didn't like it when AMD switched and released the server version first. This is why he makes a point of saying "on the desktop" when Opteron had already been out for months. However, this contradicts what he said on May 14, 2002, after the release of Athlon XP(4):

The MP server market is a very lucrative business for AMD to get into since the profit margins are so high, just look at the profit margins off of Intel's Pentium II Xeon and Pentium III Xeon parts to see the potential for AMD there. However the Athlon 4 will only be a stepping stone for AMD into this market; AMD's 64-bit solutions will truly be the ones to lead the company in this area.

His criticism on price didn't make any sense either. Just a few years earlier, he was concerned that AMD would be hurt by too low a price, as he said on October 17, 2000:

we were afraid at the end of 1999 that Intel would begin to compete with the Athlon in a price war, something which AMD, being a smaller company than Intel would have some serious problems with.

His criticism is even more ludicrous considering that AMD had problems with profitability throughout 2002 and into the beginning of 2003. Yet presumably he wanted AMD to sell its best, still low-volume K8 at a bargain price. K7s were still the main chip even two quarters after this article was written.

In contrast, by 2003 his views had become much more optimistic about Intel, and they don't change even when his optimism is unwarranted. For example, both he and Tom believed that Itanium would become a desktop processor and compete directly with Opteron. There was no criticism when this never materialized, no criticism when Tejas was canceled, and no criticism when Whitefield was canceled.

Anand was optimistic about Prescott. On February 1, 2004, he said:

Prescott becomes interesting after 3.6GHz; in other words, after it has completely left Northwood’s clock speeds behind.

Yet he took it in stride when Prescott topped out at 3.8 GHz.

I have to admit that I find this one particularly interesting, because nearly a year earlier, in 2003, I had said that I didn't believe Intel could add another generation onto P4. At that time, everyone I knew of was saying the same thing as Anand: that Prescott would be great, that it would clock as high as 5.0 GHz and put Intel back into the lead. I don't recall anyone besides me who doubted Prescott before its release. My crystal ball has been pretty good since 2003, and none of the big websites has had a track record anywhere near mine. That often amazes me, because the big websites should have much more information than I do. I don't know what the reason for this would be unless a pervasive bias leads them to consistently overestimate Intel.

Anand Lal Shimpi reached his personal low when he put his name on Spring IDF 2006 Conroe Preview: Intel Regains the Performance Crown. In this article he tosses away whatever ethics he had remaining and essentially becomes a spokesperson for Intel. Both the Intel Conroe system and the AMD FX-60 system were built by Intel. Intel would not allow Anand to look inside the case or even look at the BIOS settings. Intel would not allow him to bring any of his own benchmarks and only let him use what it had installed. Yet, based on this entirely Intel-controlled testing, Anand nevertheless proclaims, "Intel Regains the Performance Crown". Gone is Anand's once strong criticism of Intel processors that were not available four months after review. Instead, Anand cheerfully comments, "keep in mind that we are over six months away from the actual launch of Conroe, performance can go up from where it is today."

Anandtech's credibility as a whole has continued to deteriorate since 2002. Today, it too uses THG's highly unethical technique of comparing overclocked Intel chips against stock AMD chips, as in Intel Core 2 Duo E6300 & E6400: Tremendous Value Through Overclocking. A fair comparison would have included overclocked X2 3800+ and 4200+ chips, but this was not done.

However, the latest sad chapter in Anandtech's increasing bias and incompetence was this comparison of Woodcrest, Opteron, and Sparc.

The Intel Woodcrest system used the excellent Intel Server Board S5000. This motherboard uses the robust Intel 5000 dual bus chipset. This chipset gives each processor its own Front Side Bus to the Northbridge. This allows each processor to have excellent memory bandwidth.

The AMD Opteron system, however, used the MSI K8N Master2-FAR. This choice of motherboard for AMD shows either extreme incompetence or an outright attempt to cheat in Intel's favor. Since Opteron's release in 2003, it has always had an independent memory bus for each processor. This is a new thing for Intel and has only been available since late 2005. It makes sense that Woodcrest would use Intel's best dual-bus chipset. However, the motherboard chosen for Opteron was not (and still isn't) approved by AMD for use in servers. In spite of the fact that this board has two sockets for Opterons, it has only a single memory bus. In other words, Intel's chips got the newest and best Intel dual-bus motherboard, whereas AMD's chips, which have always had dual buses, were put into a stripped-down, single-bus motherboard. This forced the two Opterons to share a single memory bus and greatly reduced the speed of the second processor. This comparison pretty much stripped Anandtech of whatever shreds of credibility it had left after Anand's participation in Intel's promotion of Conroe.
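A back-of-the-envelope Python sketch shows why the board matters. It assumes each Opteron's on-die controller runs dual-channel DDR-400 (6.4 GB/s peak) and that both processors are loaded equally; the memory configuration in the actual review is not given here, so the figures are illustrative only. It also ignores the extra latency the second Opteron pays to reach memory through the first one over HyperTransport.

```python
def memory_bandwidth(sockets, populated_controllers, per_controller_gbs):
    """Return (total, per_socket) peak bandwidth in GB/s.

    Assumes every socket streams memory at the same time and the populated
    controllers' bandwidth is split evenly among all sockets.
    """
    total = populated_controllers * per_controller_gbs
    return total, total / sockets


DDR400_DUAL_CHANNEL = 6.4  # GB/s per controller; an assumed configuration

for label, populated in [("memory on both sockets", 2), ("memory on one socket", 1)]:
    total, per_socket = memory_bandwidth(2, populated, DDR400_DUAL_CHANNEL)
    print(f"{label}: {total:.1f} GB/s total, {per_socket:.1f} GB/s per socket")
```

Under these assumptions, leaving one socket without its own memory halves the system's peak bandwidth, which is the handicap described above.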

Friday, September 22, 2006

Tom's Hardware Guide Sells Its Soul

The professionalism and objectivity of a website is reflected in its general tone. If you find that a website tends to treat various manufacturers equally then it is reasonable to assume that their testing is equal as well. However, if a website always seems to have a positive outlook for one company regardless of their actual products then that website is probably also putting the same positive spin on testing.

It is always a surprise to think that a once respected source has changed. I used to think very highly of Compute! magazine. However, Compute! was bought by ABC Publishing, and its reviews changed to reflect its desire for advertising revenue. This was made very clear when Compute! reviewed a new word processing and layout application called "Outrageous Pages". Its review was positive and everything seemed to be fine. However, Info magazine reviewed the same software and had a completely different view: it said the software was too slow and hard to work with and could not recommend it. Which magazine was being honest became clear when the software release was canceled. Somehow, Compute! had stopped being a good, objective source of information and had lost its credibility. Unfortunately, today, this is also true of Tom's Hardware Guide.

The differences are not hard to see. There was no sign of bias in Tom's review of the Athlon in 1999. New Athlon Processor

There is no sign of bias in Tom's comments about the Intel 1.13 GHz PIII.

The very worst thing in terms of prestige damage happened back in Spring 2000, when AMD was the first x86-processor maker to introduce a CPU that runs at 1 GHz = 1000 MHz clock speed. Big Chipzilla countered with the release of the Pentium III at 1 GHz two days later, but this CPU was so unavailable that not even the press was equipped with any samples. Today, some four months later, the Giga-Pentium III is still hardly available anywhere

While the normal users out there might not know about this, people in the hardware reviewing scene are well aware of the fact that AMD has shipped their 1.1 GHz Thunderbird samples to publications already weeks ago, while Intel was just able to get the rare samples of the Pentium III 1.133 GHz to the reviewers in the second half of last week. AMD is planning to launch their Thunderbird-Athlon 1.1 GHz in late August, giving us the chance to review the sample with ample time. Intel however shipped out their samples in the last minute, which proves who of the two companies is really able to actually produce 'Beyond-Giga-Processors' right now.

Apparently, Tom was so fed up with "releases" from Intel with no chips available that he titled the review Intel's Next Paper Release: The Pentium III at 1133 MHz.

The contrast today is obvious. For the last three years Intel has been doing paper releases while AMD's chips are available the day of the release. Tom's criticism for Intel's paper releases, however, has vanished.

Another THG flaw is positional marketing. The concept of association is well known in marketing, and surely the people at THG know it. It is interesting that they always seem to make sure that an Intel chip is on top regardless of what is being compared. For example, consider this 2002 review of a Celeron: The New Generation

The presence of the Athlon XP 1600, 1800, and 1900 is reasonable, since by this time AMD had dropped the Duron line and these are comparable in price to the Celerons. To try to make Intel look better, however, THG first cheats by overclocking the Celeron from 2.0 GHz to 3.0 GHz. When this still isn't enough to move the Celeron to the top of the chart, THG cheats again by including the Pentium 4 2.26 GHz. The presence of the P4 in this chart makes no sense because it is much more expensive than the other chips; the only reason it was put in seems to be to keep AMD from having the top spot. By using both the severely overclocked Celeron and the Pentium 4, THG ensured that Intel had the top spot in most of the tests. This distracts from the real purpose of comparing a standard Celeron at 1.7, 1.8, and 2.0 GHz with the comparable Athlon XP 1600, 1800, and 1900 from AMD. The ethical way of handling overclocking is to do a separate article. The bottom line is that if you have chips from AMD and Intel, they either both have to be overclocked or both run at base clock, so that you get a genuine comparison. Having only one overclocked creates a false association.

We know that THG is perfectly capable of following these ethical guidelines when it wants to as it does in these two examples:

This is a proper overclocking comparison, only Intel's chips are shown.
Pentium D

This is a proper competitive comparison, no chips are overclocked.
Athlon 64 FX

However, THG cannot seem to stay on an ethical track. Here an overclocked P4EE 955 is compared with AMD chips, December 2005. Extreme Edition

The question at this point is whether THG shows a general bias and a lack of professional reviewing after December 2005.

Here we have a good review of the AMD X2 in May 2005. X2 is compared with the AMD 4000+ and Intel 840 and 660 processors. All are stock speeds. AMD X2

Then in January 2006 we have the FX-60 review, with another confusing mass of 28 different processors, including OCs, against the FX-60. For example, in the DivX test the OCs do manage to steal the top position from the FX-60. FX 60

The Pentium D 900 review contains no OCs but is another jumble of 22 different processors, January 2006. Pentium D 900

This review of Core Duo is much better. It compares Core Duo with a Pentium M and a Turion, January 2006. Core Duo

The AM2 review is straightforward, comparing AM2 with 939. Socket AM2

The P4 EE 3.73 review is proper because it overclocks both; however, it is another confusing jumble of 27 processors. Extreme Edition 965

Then we get to the Core 2 Duo review, and THG's ethics plunge again: we have overclocked Core 2 Duos up against stock AMD chips. This isn't as badly cluttered as the previous articles. However, the article would make a lot more sense if we dropped the OCs and eliminated all of the lower-clocked comparison processors. We don't really need the 4800+, 840, and FX-60 when the 5000+, 960, and FX-62 are there. Core 2 Duo

It is clear that THG is cheating, and knows that it is cheating, because it never puts overclocked AMD chips up against stock Intel chips in its reviews but frequently puts overclocked Intel chips up against stock AMD chips. There are also some troubling lapses in technical knowledge, such as:

There is the issue of memory coherency, but e.g. the Opteron is smart enough to deal with it at up to four processors.

Nearly three years after Opteron's release, Mr. Schmid really should know that Opterons can handle 8-way.

In spite of THG's problems with technical aspects and its obvious bias toward Intel, there are many people who will suggest that THG's testing can still be relied on. So let's look at the actual tests, starting again with the recent overclocking comparison between the X6800 and FX-62: Overclocked X6800 and FX-62. In this review, for a change, THG puts the overclocks in a separate article, as it should. This suggests that THG will give an honest and fair comparison. However, let's look in detail.

I'm not so sure that the top clock for AMD's FX-62 chip is fair. THG claimed it could only reach 3048 MHz with an HTT of 254 MHz, whereas Neoseeker's FX-62 review says: I was quite pleased by reaching 3.1GHz air cooled; and the 345MHz HT speed was very impressive as well.
Given THG's bias it certainly raises the question of whether they put as much effort into overclocking the AMD chip.

The memory speed is also questionable. The Intel memory is clocked to 555 MHz, whereas the AMD memory is only clocked to 508 MHz. It is not clear why THG didn't use a divider of 11 instead of 12 and clock the AMD memory to 554 MHz (nearly identical to Intel's). Even with the handicap of slower-clocked DIMMs, AMD still manages to outdo Intel:
This increases memory throughput from 9.2 GB/s to 10.7 GB/s - a noticeable improvement over what Intel can deliver. This is where the built-in memory controller really pays big dividends for AMD.
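The divider arithmetic can be checked with a few lines of Python. This sketch assumes the reported memory speed is the CPU clock divided by an integer divider and then doubled for DDR, which is the relationship the 508 MHz and 554 MHz figures above imply; it is not THG's documented method, and the real K8 divider rules may differ in detail.

```python
def ddr_memory_speed(cpu_mhz, divider):
    """Reported DDR memory speed: CPU clock / integer divider, doubled for DDR.

    Assumed relationship, reverse-engineered from the figures in the text.
    """
    return cpu_mhz / divider * 2


cpu_mhz = 254 * 12  # 254 MHz HTT x 12 = 3048 MHz, the figures THG reported
for divider in (12, 11):
    print(f"divider {divider}: {ddr_memory_speed(cpu_mhz, divider):.1f} MHz")
```

Under that assumption, a divider of 11 gives about 554 MHz, within roughly 1 MHz of the Intel system's 555 MHz.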

In the temperature and load tests, THG does not state what it did to load the systems. I've found that the term "under high load" can vary quite a bit from tester to tester. Since some testing procedures have reduced the measured power draw of the higher-drawing systems, we need to know the actual procedure before giving the results any credibility.

Now, we'll look at the benchmarks themselves. Benchmarks are only useful if they show a significant spread among processors of varying speeds, and if faster processors are always faster than slower processors of the same model. Benchmarks can also be skewed by large caches, but this is not always easy to detect.
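The two criteria above (faster chips of the same model must always score higher, and the score spread should track the clock spread) can be written as simple checks. This is a sketch under the assumption that every result in a list comes from the same core and cache configuration; the chip names and scores are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Result:
    name: str
    clock_ghz: float
    score: float  # higher is better, e.g. frames per second


def ordering_anomalies(results: list[Result]) -> list[tuple[str, str]]:
    """Pairs (slower, faster) where the higher-clocked chip scored lower.

    Assumes clock speed is the only difference between the chips.
    """
    ordered = sorted(results, key=lambda r: r.clock_ghz)
    return [
        (slow.name, fast.name)
        for slow, fast in combinations(ordered, 2)
        if fast.clock_ghz > slow.clock_ghz and fast.score < slow.score
    ]


def spread_ratio(results: list[Result]) -> float:
    """Score gain divided by clock gain across the family (1.0 = proportional)."""
    slowest = min(results, key=lambda r: r.clock_ghz)
    fastest = max(results, key=lambda r: r.clock_ghz)
    score_gain = fastest.score / slowest.score - 1
    clock_gain = fastest.clock_ghz / slowest.clock_ghz - 1
    return score_gain / clock_gain


# Hypothetical scores for three chips that differ only in clock speed.
family = [
    Result("A", 2.0, 80.0),
    Result("B", 2.4, 96.0),
    Result("C", 2.6, 94.0),
]
print(ordering_anomalies(family))
print(f"{spread_ratio(family):.2f}")
```

Checking every pair rather than only neighbours catches an anomaly even when several chips sit between the two that are out of order.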

Call of Duty 2 - We see an anomaly where a 4800+ beats an FX-60. Both are dual core, both are socket 939, and both have 2 x 1MB L2 cache. However, the 4800+ is clocked at 2.4 GHz while the FX-60 is clocked at 2.6 GHz. This article doesn't say which cores these two chips use, but in another review just one month earlier both were listed as Toledo cores, so presumably they still are. Since an identical but lower-clocked chip cannot truly be faster, we have to conclude that this is a symptom of a faulty benchmark, improper testing, or sloppy test records. Any of these would invalidate the tests; however, if the rest of the data is good, we can assume that the benchmark itself is not the cause. We can also see that the score spread is not proportionate for either AMD or Intel. This makes the benchmark itself faulty regardless of procedure.

Quake 4 – same anomaly. Disproportionate spread for AMD and Intel. This benchmark is faulty.

Unreal Tournament 2004 – same anomaly. The scores also show other anomalies on the midrange AMD scores. The top and bottom scores are good and the scores for Intel are good. Therefore, this is indicative of sloppy testing but the benchmark is good.

Serious Sam – same anomaly. Disproportionate spread for AMD and Intel. This benchmark is faulty.
Fear min. – same anomaly. Disproportionate spread for AMD and Intel. This benchmark is faulty.
Fear av. - same anomaly. Disproportionate spread for AMD and Intel. This benchmark is faulty.

Xvid 1.1.0 – same anomaly.
Divx 6.22 – the data looks good.
Main Concept H.264 Encoder – the data looks good.
Windows Media Encoder – the data looks good.
Pinnacle Studio DV to MPEG-2 – Disproportionate spread for AMD and Intel. This benchmark is faulty.

Premiere Pro 2.0 – same anomaly.
Clone DVD 2.8 - the data looks good.
Lame 3.97 - the data looks good.
Ogg Vorbis - the data looks good.
Windows Media Encoder 9 - the data looks good.
iTunes - the data looks good.
WinRAR 3.60 – same anomaly.
Photoshop CS 2 rendering 5 pictures – the middle AMD scores are anomalous. However, the highest and lowest scores appear to be good so the anomalies are probably due to sloppy testing.

Photoshop CS 2 converting 150 pictures - the data looks good.
3D Studio Max - the data looks good.
MS Word 2003 PDF – same anomaly.
MS PowerPoint PDF – same anomaly.
AVG Anti-Virus - the data looks good.
Multitasking 1 - same anomaly. However, the use of AVG is also improper, as this software is severely I/O-bound and will not properly load the second core. This benchmark was poorly designed and is not useful.

Multitasking 2 – Given that there are anomalies for both Intel and AMD processors this benchmark appears to have been poorly designed and executed. This benchmark is not useful.

Sandra arithmetic ALU – the data looks good.
Sandra arithmetic MFLOPS – the Pentium EE 965 score shows an anomalous 17% increase. This could be due to the additional memory bandwidth. The rest of the scores seem good.
Sandra Multimedia Integer – the C2D scores are amazingly high.
Sandra Multimedia FP – the data looks good.

It is not clear why the Multimedia Integer scores are so high. It can't be due to the 4-wide instruction issue or it would have appeared in the ALU test. It can't be due to the faster SSE because it's an integer test. This really only leaves the faster cache bus speed as an explanation. Unfortunately, if this benchmark is faster because of the cache bus, then the benchmark is useless. Also, real benchmarks show clustering of scores for similar workloads. The other benchmarks for MP3, MPEG, and DVD conversion all show similar patterns for C2D. The Multimedia Integer benchmark, however, has to be considered faulty.

We'll skip the PC and 3D Mark tests.

The conclusions are not accurate. If we take the test data at face value, then an increase of 16.8% for X6800 would actually be very poor compared to the FX's 7.2%. If the data were correct, FX would be showing 100% scaling while X6800 would only be showing 67%. If the data were correct, the analysis would be atrocious. For example, XviD is listed as a 20.3% increase when in reality the increase is nearly 25%. However, some of the benchmark scores for things like Call of Duty are faulty and should not be used. If you drop the bad benchmarks, then X6800 shows at least 99% scaling.
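Two pieces of arithmetic sit behind this paragraph: scaling (performance gain as a share of clock gain) and, for benchmarks scored by elapsed time, the gap between "time reduced by X%" and "speed increased by Y%". This sketch assumes a time-scored benchmark; whether that mix-up explains THG's XviD figure is an assumption, and the 25% clock gain in the scaling line is hypothetical.

```python
def speed_increase_pct(old_seconds: float, new_seconds: float) -> float:
    """Speed increase for a benchmark scored by run time (lower is better)."""
    return (old_seconds / new_seconds - 1) * 100


def time_reduction_pct(old_seconds: float, new_seconds: float) -> float:
    """How much shorter the run became, as a share of the original time."""
    return (1 - new_seconds / old_seconds) * 100


def scaling_pct(perf_gain_pct: float, clock_gain_pct: float) -> float:
    """Performance gain as a percentage of the clock gain (100 = perfect scaling)."""
    return perf_gain_pct / clock_gain_pct * 100


# Hypothetical encode that drops from 100 s to 80 s after overclocking.
print(f"time reduced by {time_reduction_pct(100, 80):.1f}%")    # 20.0%
print(f"speed increased by {speed_increase_pct(100, 80):.1f}%")  # 25.0%
print(f"scaling: {scaling_pct(16.8, 25.0):.0f}%")  # hypothetical 25% overclock
```

The same 20-second saving reads as a 20% or a 25% improvement depending on which baseline is used, which is why the two must not be mixed in one results table.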

Tom's Hardware Guide used to be a website with quality and integrity. Today, the testing is sloppy, the conclusions may not even match their own data, and they use benchmarks that are clearly faulty. Not all benchmarks are good, and the benchmarks themselves would have to be tested to determine their quality. However, when THG is willing to use clearly faulty benchmarks, we can have no confidence that they've done any testing to sort the good benchmarks from the bad. Finally, THG's clear bias toward Intel greatly reduces the credibility of the website. Tom's Hardware Guide today is merely a shadow of the ethical, high-quality website it used to be.

Faded Glory

I've been asked for details about unfairness in reviews. This would tend to show both Tom's Hardware Guide and Anandtech in an unfavorable light, which is unfortunate because there was a time when both of these websites had integrity and honesty and weren't afraid to say what was true. I'd like to look back at what was probably their finest hour.

When AMD launched its K7 processor, Intel had genuine competition for the first time. During 1999 both AMD and Intel pushed their chips to be faster than the competition. This reached a breaking point when AMD announced a 1.1 GHz K7 and Intel followed with the announcement of a 1.13 GHz PIII. What followed was both impressive and commendable.

July, 2000. Anand Lal Shimpi at Anandtech had received a 1.13 GHz sample PIII that appeared to work fine, but Thomas Pabst at Tom's Hardware Guide had not received the special motherboard with microcode patches and found that the processor wouldn't run on any other board. However, Tom found that the chip would run at 850 MHz.

Intel's Next Paper Release: The Pentium III at 1133 MHz

As it turns out the new Pentium III at 1133 MHz is utterly unable to run reliably at this speed without a brand new micro code update, while it performs fine if you 'underclock' it to e.g. 850 MHz. So if anyone wants to make you believe that Intel was easily able to bring the 'Coppermine' to 1.133 GHz then this guy is either incompetent or a liar.

Tom was surprised to see that Anand's chip ran just fine. Tom tried updating the BIOSes but still had problems. Intel did tell him that Kyle Bennett at HardOCP also had problems but suggested that their two faulty chips were a fluke.

Revisiting Intel's New Pentium III at 1.13 GHz

Intel said that they didn't have another chip to send to Tom. So, Anand sent his own sample chip to Tom. Tom tested the chip and then sent both his and Anand's to Kyle, where an Intel engineer was going to observe. Tom sent along a hard drive with Linux to run a compiler test that he had found made even Anand's more stable chip crash.

Latest Update On Intel's 1.13 GHz Pentium III

Intel admitted that there were problems with the chips and pulled all of them that had been sent out. They had to delay release for another quarter until the problems were fixed.

Intel Admits Problems With Pentium III 1.13 GHz: Production and Shipments Halted

Tom received a lot of criticism when he was the only one who had a bad review. Intel brushed aside the problems and even retaliated by not giving Tom information about the next P4 release. It would have been easy for Tom to just give in and move on. But, Tom had backbone and persisted. He kept going and looking for the answers. He organized cooperation among three different review sites and got the problems documented. He really went out of his way to find out the truth.

Anand said, " The latest shock came in the complete recall of the 1.13GHz Pentium III processors which was almost single handedly inspired by Dr. Thomas Pabst of Tom’s Hardware. "

Tom deserves praise for his actions along with Kyle and Anand. This was a time when review sites showed what real review sites do. This was when truth was more important than anything else. This was what once was.

Sunday, September 17, 2006

Could 2007 be a Repeat of 2002 for AMD? Part II.

In part I we looked at the overall events of AMD's revenue drop in 2002. In part II we'll look at the current financial outlook of both companies.

We saw that AMD's very large drop in revenue share to 5.5% was only a temporary event and that the actual shift was from about 9.3% to 8.2%. However, during this time, AMD was much less profitable than Intel. There are many reasons for this. Intel had been well established in servers since the Pentium Pro while AMD did not begin to make server chips until the Athlon MP in late 2001. Intel also was more involved with mobile, manufactured its own chipsets, and obtained lower costs from its 300mm FAB. Today, AMD has a good track record and established base of server chips, is more diversified into mobile, has an operating 300mm FAB (FAB 36), and with ATI's purchase is moving into chipsets. AMD is much more diversified than it was in 2002 and this tends to make AMD more profitable.

Intel was also ahead in the shift to smaller manufacturing processes. Intel did have some difficulty in following AMD to copper interconnects, which led to some of its problems in 2001; however, Intel was arguably ahead on process technology at the 130nm level. AMD had had a process partnership with Motorola and then one with UMC, but it found a true partner in IBM. IBM's process experience finally enabled AMD to begin manufacturing the K8. Although Intel is still moving to smaller processes more quickly, it no longer has more advanced process technology. However, even if AMD's process is slightly more complex, the two processes are still very much alike. The first real test of process leadership won't occur until Intel begins using high-k materials for gates while AMD stays with silicon dioxide. AMD has a small lead in process control software with APM. It takes Intel two months to move a chip from test to production with its Copy Exact method, whereas AMD can go to production the same day with APM. This shows a fundamental difference in design style, as Intel has dedicated FABs (D1D and D1C) specifically for design while AMD runs tests in its production FABs. We saw in the graphs that when one company had production difficulty the other company temporarily benefited; however, there is currently no indication of problems from either company.

There is no doubt that Conroe is faster on the desktop than X2. Even the fastest FX-62 processor is only a bit faster than the E6600. If AMD were to release a 3.0 GHz FX-64, its speed would be just slightly less than an E6700 but at a much greater price tag. Assuming that the 4X4 plan covers X6800, AMD would need to price match:

E6700 - no current match, would require 3.0 GHz clock, 5800+
E6600 - no current match, would require 2.8 GHz clock, 5400+, similar to FX-62
***** - 5000+ AMD's current fastest common chip, slower than E6600 but priced higher
E6400 - 4600+, currently matches in price
E6300 - 4200+, currently matches in price
***** - 3800+ should be comparable to a proposed lower model by Intel

AMD might skip E6700 since it would need a 3.0 GHz chip, but it should probably address E6600. This would still require producing a 2.8 GHz 5400+. AMD will either have to address this or leave this upper segment, along with its higher margins, entirely to Intel.

Overall, things are not as bad for AMD as it would appear. In servers, the Woodcrest version of Core 2 Duo doesn't share the same advantages that the desktop chip has. This is true for several reasons. Opterons are capable of up to 8-way configurations but 4-way is currently the point of maximum benefit. Woodcrest is only capable of 2-way, which leaves the 4-way market very open to AMD. Intel's older Prescott based Xeon 4-way chips are much less competitive, with both lower performance and higher power draw. Further, Intel's newest 5000 chipset is designed for FBDIMM, which has three disadvantages. FBDIMMs draw more power than regular RDIMMs, which nullifies Woodcrest's lower power draw. FBDIMM also slows down substantially when the memory slots are fully populated, as they normally are in servers. This tends to nullify Woodcrest's higher speed. Finally, Intel announced a very large scale back to its expected use of FBDIMM in 2008 while AMD canceled its plans for FBDIMM entirely. This could make Intel's 5000 series chipsets much less desirable. Intel could solve all of the problems except the 2-way limitation by releasing a new chipset for Woodcrest, but this will take time. These factors tend to remove any immediate advantages that Woodcrest may have had. In the mobile area Merom has much less of a lead because the fastest Conroe chips draw more power. However, the 65nm Merom chips are ahead of AMD's 90nm Turion chips. AMD should be reasonably competitive again once Turion is moved down to the 65nm process.

It is clear that for dual core Intel will be fastest no matter what in 2006. However, this is not so clear when we move up from dual core. Intel will release the Kentsfield quad core equivalent of Conroe later this year and probably the Clovertown quad core version of Woodcrest in 2007. These chips, however, are far less ambitious and leading edge than Conroe. In fact, they follow essentially the same strategy Intel used, to lackluster effect, with Smithfield. If Smithfield is any indication, Kentsfield is destined to be the quad core version of Celeron by end of 2007. Smithfield was an MCM, which means just putting two dies in the same package rather than a truly integrated design like Core Duo, Core 2 Duo, and K8. Further, Smithfield was manufactured on the same 90nm process as the single core Prescott, which led to high power draw. Kentsfield seems to have all of the same disadvantages. Kentsfield's power draw basically doubles compared to Conroe, while using two dies in the same package means that two FSB controllers share the same bus. Although a desktop unit can tolerate a higher power draw for a performance chip, this factor tends to make Clovertown much worse than Woodcrest. It remains to be seen if anyone will elect to upgrade from Woodcrest to Clovertown on large scale servers at the cost of doubling the CPU power draw. This is important, as was shown by the upgrade of an older, Opteron based supercomputer at Oak Ridge National Laboratory. That upgrade was possible because AMD managed to stay within its TDP when moving to dual core, and AMD has stated that it will do so again with quad core. This could be why Opteron now has contracts for two TeraFlops supercomputers. Intel seems to be similarly capable of doing this with integrated designs (like Yonah (Core Duo) and Woodcrest) but not with MCM designs. In this light, Kentsfield and Clovertown have to be seen as a stopgap until Intel creates an integrated quad core design.

So, if Intel's quad core offerings are stopgap, then what is AMD's 4X4? To find out if 4X4 is a clever new idea or just a stopgap reaction to Conroe and Kentsfield, we have to see how it compares to current dual socket systems. You can pick up a pair of 1.8 GHz Opteron dual cores for $560. Since only the lowest of the 4X4 processors was stated to be under $1,000, this indicates that 4X4 is a more upscale offering. It would appear that AMD is trying to span two markets with 4X4. Essentially, we have two chips replacing what was one FX chip at the same price range. In this light, it would be an enthusiast offering and match up with companies like Alienware and Voodoo PC. This seems consistent, as Alienware has already stated support. So, dual core FX replaced single core and now dual chip is replacing single chip. However, making use of all four cores means either having a game with that much threading (which none currently have) or running things in the background while playing a game. Perhaps in view of this trend, multithreading will accelerate. The other area that may be spanned by 4X4 would be workstations. However, this is mostly a question of what memory is supported, as RDIMMs are required to create workstation or server class systems. If 4X4 can successfully span both enthusiast and workstation markets, it could be a clever idea.

C2D is definitely a boost for Intel but the volume in 2006 is not high enough to drive the overall market. By year's end, C2D will only be at 30% of Intel's capacity which is what AMD's 65nm production will be. Simply shrinking the die to 65nm and reducing power consumption should be sufficient in 2006 for the server and notebook markets. It has been suggested that Intel will take back share by lowering prices. If 4X4 will replace one chip with two then it appears that AMD is at least somewhat prepared to match Intel in terms of price. However, Intel currently has a large overstock of Prescott and Presler based P4 chips. Releasing Conroe at low prices does put pressure on AMD but it puts even more pressure on Intel's P4 line which will still make up around 80% of the total chip volume in the last two quarters. Since this will dominate Intel's revenues in 2006 I would expect Intel to see around a 12% drop versus 2005 while I would expect AMD's revenues to increase around 10% versus 2005. To put this another way, Conroe simply will not be produced in large enough volumes to turn the market around for Intel. It will bring new revenues but these will not be enough to offset the P4 declines.

Things do look different in 2007. As C2D continues to ramp up this will move things in Intel's favor and at some point C2D will begin to dominate Intel's sales. This is where it gets more interesting. Intel's ramp of C2D would be very favorable by the end of the first quarter. The only other thing in AMD's pocket to counter would be K8L, AMD's fully integrated quad core design. This chip includes a shared L3 cache, double the prefetch size, double the L1 cache bus width, and double the FP pipelines. It boosts its FP performance similar to the way C2D was boosted over Yonah. This chip has advanced power management and additional AMD64 instructions. And, because it is manufactured on 65nm it draws no more power than AMD's current dual core 90nm chips. The interesting part is that AMD officially says mid year 2007 for release. However, this estimate is similar to the current end of year estimate for 65nm chips. In both cases, AMD allows itself some padding so that if problems occur it can still meet its estimated delivery. If things go well, it can always move its release forward. This is why it seems likely that 65nm will appear in October rather than in December. Likewise, K8L could arrive in the 2nd quarter of 2007. K8L is likely to surpass Conroe in FP/SSE performance at the same clock and probably cut the current integer performance gap in half. With higher SSE performance but lower Integer performance I would say that the dual core version of K8L (without shared L3) will be about equal to Conroe. And, Intel will need a better chipset to prevent Woodcrest from being left behind. This won't actually have much effect on notebooks since K8L won't be used for mobile but it will outclass Intel's stopgap quad core offerings just as X2 did with Smithfield and the Presler die shrink. Overall, K8L should deliver more performance and lower power draw than Kentsfield/Clovertown. 
However, Intel is likely to respond with an integrated quad core design in 2008 as it did with C2D.

It isn't just K8L though that would make AMD competitive in 2007. Whereas Intel will be almost entirely converted to 65nm by end of 2006, AMD will only be at 30% conversion on one FAB. This means that AMD will continue to gain from lower cost as it ramps 65nm production toward mid year in 2007. Similarly, Intel will be at full 300mm production whereas FAB36 will only be at 50%. AMD again gains from lower costs as FAB36 reaches 100% capacity by end of 2007 and 125% (with a cleanroom addition currently under construction) by mid 2008. Intel's biggest benefit in 2007 will be conversion to the Conroe die. Conroe's die is only about half the size of the Presler die which will lower costs for Intel. However, the die size for K8 is actually smaller than Conroe which means that AMD will get a little more benefit. AMD's falling costs due to 300mm and 65nm ramping in 2007 puts it into a very good position to resist pricing pressures from Intel in 2007. This means that the argument for gaining share by reducing prices will essentially be gone by year's end as AMD's costs will drop faster than Intel's during 2007.
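The cost argument rests on two well-known effects: a 300mm wafer has 2.25 times the area of a 200mm wafer, and an ideal shrink from 90nm to 65nm cuts die area to (65/90)², roughly 52%. This sketch uses the common gross-dies-per-wafer approximation; the 200 mm² die is hypothetical, and defects, yield, and imperfect shrinks are all ignored.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Wafer area / die area, minus an edge-loss term (common approximation, no defects)."""
    radius = wafer_diameter_mm / 2
    whole_wafer = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(whole_wafer - edge_loss)


def ideal_shrink_area(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Die area after a perfect linear shrink; real shrinks usually fall short of this."""
    return area_mm2 * (new_nm / old_nm) ** 2


DIE_90NM_MM2 = 200.0  # hypothetical die size, not a real AMD or Intel figure
die_65nm = ideal_shrink_area(DIE_90NM_MM2, 90, 65)

for wafer_mm in (200, 300):
    print(
        f"{wafer_mm}mm wafer: "
        f"{gross_dies_per_wafer(wafer_mm, DIE_90NM_MM2)} dies at 90nm, "
        f"{gross_dies_per_wafer(wafer_mm, die_65nm)} dies at 65nm"
    )
```

Because edge loss matters less on a larger wafer, moving to 300mm yields somewhat more than 2.25 times as many whole dies, and the two effects multiply when a company ramps both at once.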

There are some incidental areas like Intel is currently taking losses in non-computing areas and is spending down its cash reserves. AMD's cash reserves are depleted and it incurs some debt with the ATI purchase. The interest on the debt is mostly offset by the cost savings of merging AMD and ATI so the debt itself is the biggest factor. However, Intel's unprofitable sections and its current large inventories are likely to hit its stock harder than AMD's. This would particularly be true if Intel has to reduce stock buyback to save money. With Dell's increased AMD offerings combined with the increased server offerings from IBM, Sun, Dell, and HP it does seem unlikely that AMD will see any large reversals soon. AMD seems to have mostly short term problems while Intel's appear to be longer term. AMD needs to address E6400 and E6600. However, AMD seems to have power management and performance under control with K8L and its 4X4 plan appears to create a new price structure in the former FX/EE range. Although Intel looks very good at the moment it will need to create an integrated quad core solution and address 4X4, HTX, and HT 3.0 in 2008 to stay competitive. Short of either Intel or AMD's having production problems that reduce volume there are unlikely to be any large changes in share. However, AMD's current 18% revenue share may just be temporary and it could fall back to 15% where it was two quarters ago. Overall, I would expect Intel to do worse this year than 2005 and AMD to do better than 2005.

Wednesday, September 13, 2006

Could 2007 be a Repeat of 2002 for AMD? Part I.

The 2002 AMD revenue crash is often talked about but does not seem to be clearly understood. Understanding it requires information about AMD's and Intel's finances as well as familiarity with the history of the markets and technology. I've seen a number of articles and posts on this subject lately, but most seem to lack a thorough perspective on the actions and causes of the event. A typical example is this entry: "Deja vu 2002 all over again."

The general view is that AMD did well when their K7 outperformed the Intel PIII. However, Intel fought back with the original Willamette P4 and then pulled back into the lead with the Northwood P4 in 2002, whereupon AMD's revenues crashed. This is typically compared to events today, where it is said that AMD began doing well when K8 outperformed P4. However, Intel now has Core 2 Duo (Conroe for the desktop version), which outperforms K8. Therefore, another revenue crash for AMD is expected. However, I'm going to try to show that the original event was not as simple as that and that it is unrelated to today's events.

It is a bit too simple to talk about a single drop in revenue for AMD in early 2002. The actual picture was that the entire market turned down. The market actually dropped in 2001, stayed down in 2002, began to recover in 2003, and was finally back up to 2000 levels in 2004. The revenues for both Intel and AMD were down during these three years as is easily seen from the revenues for both companies (the amounts are in Millions).

AMD
2000 - $4,644
2001 - $3,892
2002 - $2,697
2003 - $3,519
2004 - $5,000

Intel
2000 - $33,726
2001 - $26,539
2002 - $26,764
2003 - $30,141
2004 - $34,209
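Normalizing each year to 2000 makes the shape of the two declines easier to compare than raw dollar figures. A short sketch using the annual figures above (AMD is the smaller series):

```python
# Annual revenue in millions of dollars, from the figures above.
AMD = {2000: 4644, 2001: 3892, 2002: 2697, 2003: 3519, 2004: 5000}
INTEL = {2000: 33726, 2001: 26539, 2002: 26764, 2003: 30141, 2004: 34209}


def relative_to_base(revenue: dict[int, int], base_year: int = 2000) -> dict[int, float]:
    """Each year's revenue as a percentage of the base year's revenue."""
    base = revenue[base_year]
    return {year: 100 * value / base for year, value in revenue.items()}


for name, revenue in (("AMD", AMD), ("Intel", INTEL)):
    row = ", ".join(f"{year}: {pct:.1f}%" for year, pct in relative_to_base(revenue).items())
    print(f"{name:6}{row}")
```

Both series stay below their 2000 level for 2001 through 2003 and are back above it in 2004.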

We don't see the often described pattern in which AMD's revenues were up until 2002, when they lost share, and Intel's revenues increased in 2002 as they took share back from AMD. Both companies had declines that lasted three years (although 2002 was the worst year for AMD). This can also be seen in a quarterly comparison graph. Because Intel's revenues are so much larger than AMD's, I had to exaggerate the AMD graph 4X so that the detail could be seen. AMD's real revenues are much smaller.

It can be seen that the two revenue graphs are fairly similar. The biggest differences are in Q1 01, when Intel was falling while AMD rose, and in Q3 02, when Intel was rising as AMD fell. In particular, it can be seen that in Q3 01 AMD's graph was heading down and would have reached the point at Q2 02 except that there is a bump in between. This bump is an example of a temporary interference. But the overall trend is clear. Intel shows a trough from Q1 01 to about Q3 03. AMD is similar except that its trough starts one quarter later, ends one quarter later, and has a more noticeable bump in Q4 01 and Q1 02. Intel has the same bump but not quite as large. A 4th quarter bump is not unusual, but apparently AMD got some boost because Intel didn't increase the speed on the Tualatin version of PIII and Willamette was having some difficulties. In contrast, AMD falls more sharply in Q2 02 and Q3 02 because Intel had successfully released Northwood and AMD was having difficulties with the 130nm transition for the Thoroughbred version of K7.

However, in spite of the bumps and dips the drop in revenue by both companies was actually quite similar. We can more easily ignore the small bumps and dips if we take averages over several quarters. We take a four quarter average of both AMD and Intel before the drop to see how much revenue was typical and then take a ten quarter average after the drop to cover the span of the trough.

Intel average revenue per quarter before decline $ 6,825
Intel average revenue per quarter after decline $ 5,536
Drop 19%

AMD average revenue per quarter before decline $ 600
AMD average revenue per quarter after decline $ 483
Drop 20%
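The rounded 19% and 20% drops follow directly from the averages. Computing them to one decimal shows the gap between the two companies is even narrower than the rounded figures suggest.

```python
def drop_pct(before: float, after: float) -> float:
    """Percentage fall from the average before the decline to the average after it."""
    return 100 * (before - after) / before


# Average quarterly revenue in millions of dollars, from the figures above.
print(f"Intel: {drop_pct(6825, 5536):.1f}%")  # 18.9%
print(f"AMD:   {drop_pct(600, 483):.1f}%")    # 19.5%
```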

So, we can see that AMD didn't really crash more than Intel overall. However, AMD had a big rise followed by a sharp fall, which made for a relatively greater fall (and hence a crash). The bump just before the crash should be seen as temporary, caused by Intel's production problems, and the drop that followed was also temporary, caused by AMD's production problems. Usually this is treated as though AMD gained a lot of marketshare from Intel and then lost it. The actual change in revenue share was fairly small. AMD had a 9.3% average before Intel's drop and then an 8.2% average after its own. This is not a large shift. However, during this time, AMD was much less profitable than Intel, so even this small loss was felt. During the lowest point in Q3 02, AMD's revenue share was only 5.5%; however, this had recovered to 8.8% just two quarters later. AMD would probably have been doing fairly well at this point, but the next quarter saw reduced yields as AMD began the SOI process for K8. This can be seen as a dip in the graph for Q2 03. However, after this dip AMD's revenues began rising, as did its marketshare, as the yields for SOI improved.

Genuine transfers of marketshare don't happen within one quarter. The market tends to be somewhat elastic over one or two quarters. When AMD twice gained over 12% marketshare in 2001, the market snapped back within two quarters. And, when AMD dropped sharply in Q3 02, the market again snapped back within two quarters. Real shifts in marketshare take longer to develop and last much longer. So, we've seen that the crash in 2002 was a bit different from the way it is usually described. In part II, we'll look at the current market and see if AMD could nevertheless expect a large drop soon.

Saturday, September 02, 2006


It is no secret that as processors have gotten faster, they've outpaced memory speed. There was a time when memory was actually faster than the processor but this has slowly changed. Over time, processor designs began including Level 1 (L1) cache and then Level 2 (L2). And, now it appears that most processors will soon need L3 as well. These caches are smaller and faster with each level. So, L3 is smaller than main memory but faster; L2 is smaller, but faster than L3, and L1 is smaller again but faster than L2. Without these caches, the processor would spend a lot of time waiting on main memory.
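The benefit of each cache level can be put in numbers with the standard average-memory-access-time model: each level costs its hit time plus its miss rate times the cost of the next level out. The latencies and miss rates below are hypothetical, chosen only to show the shape of the effect.

```python
def average_access_time(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time for a cache hierarchy.

    `levels` lists (hit_time, miss_rate) pairs from L1 outward, where each miss
    rate is the fraction of accesses reaching that level which miss in it.
    Works inward from main memory: a level's cost is its hit time plus its
    miss rate times the cost of the next level out.
    """
    time = memory_latency
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# Hypothetical latencies in CPU cycles and hypothetical local miss rates.
hierarchy = [(3, 0.05), (14, 0.20)]  # L1, then L2
print(average_access_time([], 200))         # no caches: every access pays 200
print(average_access_time(hierarchy, 200))  # about 5.7 cycles
```

Adding an L3 is just one more tuple before main memory, which is why it helps most when the gap between L2 and memory latency keeps widening.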

Memory has valiantly tried to keep up. It has gone from regular SDRAM to DDR to DDR2 and soon DDR3. However, as speed has gone up, fanout has decreased. Therefore, it is not possible to put as many DDR3 DIMMs on a board as DDR DIMMs. The number of usable DIMMs has been traded for speed. The only reason why this is somewhat acceptable is that memory chip capacity is going up as well, so the greater memory on each DIMM somewhat makes up for having fewer DIMMs.

Intel has introduced what it feels is a way to solve this problem, FBDIMM. Unfortunately, FBDIMM creates just as many problems as it solves. Although FBDIMM is fairly fast and has good fanout, it introduces very large latencies and draws much more power. An FBDIMM is basically a DDR2 DIMM with an extra chip on it, an AMB (Advanced Memory Buffer). This AMB chip provides two serial ports that allow an upstream channel and a downstream channel. These connect from the memory controller to the first FBDIMM. Additional FBDIMMs are simply daisy-chained to the first one. The problem is that each FBDIMM in the chain introduces latency. It is difficult to imagine how poor the performance of the 8th FBDIMM in a chain would be after the data request has had to hop over the previous seven FBDIMMs and then the data has to make the same seven hops going back up. This can drastically slow down the memory speed and make FBDIMMs impractical at maximum fanout. Therefore, at large fanout, we've simply come full circle and traded away speed again to increase capacity. Secondly, the AMB chip draws lots of power. This chip all by itself draws as much power as the rest of the DIMM combined, thereby doubling the power draw. This is not a good tradeoff, especially at a time when large scale computers are becoming sensitive to the high cost of electric power for these systems. Processors themselves have become much more energy efficient, and Intel likes to brag about the low power draw of its newer Core 2 Duo designs versus its older Pentium D designs. However, these gains will easily be lost with the extra power needed for Intel's FBDIMM design. So, power draw for the processor is simply replaced by power draw from memory and no real advancement is made.
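The chain penalty described above grows linearly with a DIMM's position. This is a simplified model of that hop-by-hop behaviour, assuming a fixed delay per AMB in each direction; the nanosecond values are placeholders, not measured FBDIMM figures.

```python
def chain_round_trip_ns(position: int, base_ns: float, hop_ns: float) -> float:
    """Latency to the DIMM at `position` (1 = nearest the controller) in a daisy chain.

    The request crosses every AMB ahead of the target on the way out, and the
    data crosses the same AMBs again on the way back.
    """
    hops = position - 1
    return base_ns + 2 * hops * hop_ns


BASE_NS, HOP_NS = 60.0, 4.0  # hypothetical values, not measured FBDIMM figures
for position in (1, 4, 8):
    print(position, chain_round_trip_ns(position, BASE_NS, HOP_NS))
```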

Currently, there is nothing better, but there could be. Suppose we replace the AMB chip with a HyperTransport chip with three 16 bit links. We'll call this HTDIMM or just HTD for short. One link goes back to the processor and the other two provide a fanout tree. The first HTD connects to two HTDs. These two then each connect to two more, making seven total (1 + 2 + 4). A fanout of seven would be almost as good as the maximum eight fanout of FBDIMM. However, whereas FBDIMM would need seven hops to reach the last DIMM, HTD would only need two hops. Power draw and cost could be an issue, but the last four HTDs only need a single link, so they could be cheaper and draw less power. Cost is also not likely to be an issue since HT was designed for low cost and current FBDIMMs cost about the same as regular DDR2 DIMMs. Three links might seem complicated compared to FBDIMM, but all of these combined would be narrower than the current DIMM bus, so it would reduce motherboard circuit trace complexity. Also, each 16 bit path is independent, so this greatly reduces the complexity of the current very wide serpentine circuit traces.

A fanout of seven would be a good compromise for most server configurations; however, it is not an actual limit. There is no reason you couldn't add another hop with another fanout of two. That adds eight more DIMMs for a total of fifteen (1 + 2 + 4 + 8), which might be necessary for some high-powered servers. Even with three hops we would need less than half of FBDIMM's seven-hop maximum while offering about double the fanout. And, again, the last eight HTDs would need only a single link.
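The same reasoning extends to any depth: each extra hop doubles the number of HTDs at the bottom of the tree, so a tree that is h hops deep holds 2^(h+1) - 1 DIMMs, while a daisy chain holding that many DIMMs would need one hop fewer than its DIMM count to reach the end. A short Python sketch of the comparison, using hypothetical function names and the same simplified hop model:

```python
def tree_capacity(hops: int) -> int:
    """DIMMs in a full binary HTD tree whose deepest DIMM is `hops` hops away."""
    return 2 ** (hops + 1) - 1


def tree_hops_needed(dimms: int) -> int:
    """Smallest tree depth that holds at least `dimms` DIMMs."""
    hops = 0
    while tree_capacity(hops) < dimms:
        hops += 1
    return hops


for hops in range(4):
    dimms = tree_capacity(hops)
    print(f"tree, {hops} hops: {dimms} DIMMs ({2 ** hops} single-link); "
          f"a chain of {dimms} would need {dimms - 1} hops")

# FBDIMM's maximum chain of 8 needs 7 hops; a tree fits 8 DIMMs within 3.
print(tree_hops_needed(8))
```

Hop count grows with the logarithm of the DIMM count in a tree but linearly in a chain, which is the whole advantage of the layout.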

AMD's K8 currently uses an on-die dual-channel memory controller. The processor could instead use four HT channels for memory, which would greatly reduce both the pin count and the number of circuit traces needed for memory access. This would reduce both motherboard and processor complexity.

HT would work better than FBDIMM for several reasons. HT was designed years ago for use on motherboards and is rooted in technology from the DEC Alpha. As an open standard, it was designed for low cost, low complexity, low latency, and high bandwidth. It has been used successfully by AMD, IBM, Apple, and third-party chipset makers like Nvidia, ATI, and VIA. In fact, there are motherboards today for Intel processors that use HT to connect the Northbridge and Southbridge chips. HT is so robust that it is a superset of PCI, USB, AGP, PCI-X, PCI-E, and even Intel's proposed CSI standard: it is capable of transporting data from all of these protocols. HT also has an advantage over the AMB chip in flexibility and speed. HT can handle a tree structure, whereas AMB is limited to a daisy chain. HT is also much faster: whereas FBDIMM can move 6.4 GB/sec, HTDIMM would be capable of moving 20.8 GB/sec. This is about as fast as the fastest proposed DDR3 memory, but HTDIMM would also be able to handle multiple simultaneous DIMM accesses and data transfers. With four HT channels, the processor would have a peak memory bandwidth of 83 GB/sec. Since this is well beyond what current processors can use, desktop systems would probably use only two channels and budget systems one. HT owes this speed to its roots in the very fast DEC Alpha, and the technology has continued to be developed since it was used on DEC's Alpha and AMD's Athlon MP processors. Today, HyperTransport is a mature protocol at version 3.0, and its speed has roughly doubled with each version.
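The bandwidth figures follow from simple arithmetic. The sketch below assumes HyperTransport 3.0's maximum 2.6 GHz link clock, double-data-rate signalling (two transfers per clock), and a 16-bit link, and it counts both directions together, giving the aggregate figure. Under those assumptions one link works out to 20.8 GB/sec and four links to about 83 GB/sec; the 6.4 GB/sec FBDIMM figure is simply taken from the text. Function and constant names are illustrative.

```python
def ht_link_mb_per_s(clock_mhz: int, width_bits: int, directions: int = 2) -> int:
    """Peak HyperTransport link bandwidth in MB/s (decimal megabytes).

    DDR signalling moves data on both clock edges, so transfers = 2 x clock.
    """
    megatransfers_per_s = clock_mhz * 2
    bytes_per_transfer = width_bits // 8
    return megatransfers_per_s * bytes_per_transfer * directions


HT3_CLOCK_MHZ = 2600   # assumed HT 3.0 maximum link clock
FBDIMM_GB_S = 6.4      # figure quoted in the text

per_link = ht_link_mb_per_s(HT3_CLOCK_MHZ, 16)
for channels in (1, 2, 4):  # budget, desktop, and server configurations
    print(f"{channels} channel(s): {per_link * channels / 1000:.1f} GB/sec")
print(f"one HT link vs FBDIMM: {per_link / 1000 / FBDIMM_GB_S:.2f}x")
```

Setting `directions=1` gives the 10.4 GB/sec available in each direction, which matters because reads and writes do not always split evenly.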

This type of memory would work for Intel as well. HyperTransport is an open protocol and can be used royalty-free. Replacing the current FBDIMM ports on the 5000 Northbridge with HT ports would be fairly simple for Intel, and it could pick up additional desktop chipsets from Nvidia, since Nvidia already uses HT. This would require no change to the processor or FSB design if Intel wanted to retain them. Using HyperTransport with the current DIMM design would be the best way to reverse the lag in memory speed and increase fanout. It would also reverse the trend toward more and more cache. The DIMM design itself would be superior to FBDIMM in every way. HTDIMM is what the next JEDEC standard should be.