Wednesday, September 19, 2007

The Top Developments Of 2007

It looks like both AMD and Intel have been as forthcoming as they are likely to be for a while about their long-range plans. The most significant items, however, have little to do with clock speeds or process size.

The two most significant developments have without doubt been SSE5 and motherboard-buffered DIMM access. AMD has already announced its plan to handle motherboard-buffered DIMMs with G3MX. This is significant because it means the end of registered DIMMs for AMD. With G3MX, AMD can use the fastest available desktop DIMMs with its server products. This is great for AMD and server vendors because desktop DIMMs tend to be both faster and cheaper than registered DIMMs. This is also good news for DIMM makers because it would relieve them of making registered DIMMs for a small market segment and allow them to concentrate on desktop products. Intel may have the same thing in mind for Nehalem. There have been hints by Intel but nothing firm. I suppose Intel has reason to keep this secret since this would also mean the end of FBDIMM in Intel's long-term plans. If Intel is too open about this it could make customers think twice about buying Intel's current server products, which all use FBDIMM. So, whether this happens with Nehalem or perhaps not until later, it is clear that both FBDIMMs and registered DIMMs are on their way out. This will be a fundamental boost for servers since their average DIMM speed will increase. However, this could also be a boost for desktops since adding the server volume to desktop DIMMs should make them cheaper to develop. This also avoids splitting the engineering resources at memory manufacturers, so we could see better desktop memory as well.

SSE5 is also remarkable. Some have been comparing this with SSE4 but this is a mistake. SSE4 is just another SSE upgrade like SSE2 and SSE3. However, SSE5 is an actual extension to the x86 ISA. If AMD had been thinking more clearly they might have called it AMD64-2. A good indication of how serious AMD is about SSE5 is that they will drop 3DNow support in Bulldozer. This frees up some opcode space that can be used for other things (like perhaps matching SSE4). Intel has already stated that it would not support SSE5. On the other hand, Intel's statement means very little. We know that Intel executives openly lied about their intentions to support AMD64 right up until they did. And Intel has every reason to lie about SSE5. The 3-way instructions can easily steal Itanium's thunder, and Intel is still hoping (and praying) that Itanium will not get gobbled up by x86. Intel is also stuck in terms of competitiveness because it is too late to add SSE5 to Nehalem. This means that Intel would have to try to include it in the 32nm shrink, which is difficult without making core changes. This could easily mean that Intel is behind on SSE5 until 2010. So, it wouldn't help Intel to announce support until it has to, since supporting SSE5 now would only encourage development for an ISA extension that it will be behind in. Intel is taking the somewhat deceptive approach of working on a solution quietly while claiming not to be. Intel can hope that SSE5 won't become popular enough that it has to support it. However, if it does, then Intel can always claim to be giving in to popular demand. It's dishonest, but it is understandable for a company that has been painted into a corner.
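To make the 3-way instruction point concrete, here is a minimal scalar C sketch of the idea behind SSE5's proposed FMADDPS (dest = a*b + c in one instruction, with no source operand destroyed). This only emulates the semantics; real SSE5 would be a single vector instruction, while today's two-operand SSE has to copy a register and split the work into separate multiply and add steps:

#include <stdio.h>

/* Scalar emulation of the SSE5 FMADDPS idea: dest = a*b + c in one
 * instruction with no source clobbered. Two-operand SSE needs roughly:
 *   movaps tmp, a ; mulps tmp, b ; addps tmp, c
 * i.e. three instructions and a scratch register. */
typedef struct { float v[4]; } vec4;

static vec4 fmadd(vec4 a, vec4 b, vec4 c) {
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a.v[i] * b.v[i] + c.v[i]; /* sources a, b, c all survive */
    return r;
}

int main(void) {
    vec4 a = {{1, 2, 3, 4}}, b = {{5, 6, 7, 8}}, c = {{9, 9, 9, 9}};
    vec4 r = fmadd(a, b, c);
    printf("%g %g %g %g\n", r.v[0], r.v[1], r.v[2], r.v[3]);
    return 0;
}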

AMD understands about being painted into a corner. Intel has had the advantage with MCM quad cores since separate dies mean both higher yields and higher clock speeds. For example, on a monolithic quad die you can only bin as high as the slowest core, whereas Intel can pick individual dies and put the highest-binning ones together. Also, Intel can always pawn off a dual-core die with a bad core as a lowly Conroe-L, but it would be a much bigger loss for AMD to sell a quad die as a dual core. AMD's creative solution was the Triple Core announcement. This means that any quads with a bad core will be sold as X3's instead of X4's. This does make AMD's ASP look a bit better. I doubt Intel will follow suit on this, but then it doesn't have to. For AMD, having an X4 knocked down to an X2 is a big loss, but for Intel it just means having a Conroe knocked down to Conroe-L, which is not so big. Simply put, AMD needs triple cores but Intel doesn't. On the other hand, just as AMD was forced to release a faster FX chip on the older 90nm process, so too it seems Intel has been forced to deliver Tigerton not with the shiny new Penryn core but with the older Clovertown core. Tigerton is basically just Clovertown on a quad-FSB chipset. This does suggest at least a bit of desperation since, after working on this chipset for over a year, Intel will be lucky if it breaks even on sales. To understand what a stumble Tigerton is you only have to consider the tortured upgrade path. In 2006 and most of 2007, Intel's 4-way platform meant Tulsa. Now we get Tigerton, which uses the completely incompatible Caneland chipset. No upgrades from Tulsa. And, for anyone who buys a Tigerton system, oops, no upgrade to Nehalem either. In contrast, 4-way Opteron systems should be upgradable to 4-way Barcelona with just a BIOS update. And, if attractive, these should be upgradable to Shanghai as well. After Nehalem, though, things become more even as AMD introduces Bulldozer on an incompatible platform. 2009 will without doubt be the year of new sockets.
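A toy simulation makes the binning argument clearer. The per-core speed distribution below is invented purely for illustration; the point is only that a monolithic quad bins at the minimum of four cores on one die, while an MCM vendor can sort its dual-core dies and pair the fast ones together:

#include <stdio.h>
#include <stdlib.h>

#define DIES 100000

/* invented distribution: each core bins somewhere from 2000 to 3000 MHz */
static int core_speed(void) { return 2000 + rand() % 1001; }
static int min2(int a, int b) { return a < b ? a : b; }
static int cmp(const void *a, const void *b) { return *(const int *)a - *(const int *)b; }

static int duals[2 * DIES];

int main(void) {
    long mono = 0, mcm = 0;
    srand(42);
    /* monolithic quad: the die bins at its slowest core */
    for (int i = 0; i < DIES; i++)
        mono += min2(min2(core_speed(), core_speed()),
                     min2(core_speed(), core_speed()));
    /* MCM: bin each dual die, sort, then pair similar dies together */
    for (int i = 0; i < 2 * DIES; i++)
        duals[i] = min2(core_speed(), core_speed());
    qsort(duals, 2 * DIES, sizeof duals[0], cmp);
    for (int i = 0; i < 2 * DIES; i += 2)
        mcm += min2(duals[i], duals[i + 1]);
    printf("avg monolithic quad bin: %ld MHz\n", mono / DIES);
    printf("avg MCM quad bin:        %ld MHz\n", mcm / DIES);
    return 0;
}

With these made-up numbers the MCM quads bin well over 100MHz higher on average, which is the whole point of pairing sorted dies.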

For the first time in quite a while we see Intel hitting its limits. Intel's 3.33GHz demo had created the expectation of cool-running 3.33GHz desktop chips with 1600MHz FSBs. It now appears that Intel will only release a single 45nm desktop chip in 2007, and it will only be clocked at 3.0GHz. The chip has only a 1333MHz FSB and draws a whopping 130 watts. Thus we clearly see Intel straining to deliver something faster, much as AMD did recently with its 3.2GHz FX. However, Intel is not straining because of AMD's 3.2GHz FX chip (which clearly is no competition). Intel is straining because of AMD's server volume share. In the past year, AMD's server volume has dropped from about 25% to only 13%. Now with Barcelona, AMD stands to start taking share back. There really isn't much Intel can do to prevent this now that Barcelona is finally out. But any server chip share that is lost is a double blow because server chips are worth about three times as much as desktop chips. This means that any losses will hurt Intel's ASP and boost AMD's by much more than a similar change in desktop volume would. So, Intel is taking its best and brightest 45nm Penryn chips and allocating them all to the server market to try to hold the line against Barcelona. Of the 12 points that Intel has gained, it is almost certain to lose half back to AMD in the next quarter or two, but if it digs in, then it might hold onto the other half. This means that the desktop gets the short end of the stick in Q1 2008. However, by Q2 2008, Intel should be producing enough 45nm chips to pay attention to the desktop again. I have to admit that this is worse than I was expecting since I assumed Intel could do a 3.33GHz desktop chip by Q1. But now it looks like 3.33GHz will have to wait until Q2.
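Here is the blended-ASP arithmetic behind the "double blow" claim, with invented prices (only the 3x server-to-desktop ratio comes from the paragraph above):

#include <stdio.h>

/* Invented prices: desktop chip $150, server chip 3x that. Watch how
 * a few points of server mix move the blended ASP. */
int main(void) {
    const double desktop = 150.0, server = 3 * desktop;
    for (int s = 5; s <= 25; s += 5) {
        double srv = s / 100.0;
        double asp = srv * server + (1.0 - srv) * desktop;
        printf("server mix %2d%% -> blended ASP $%.0f\n", s, asp);
    }
    return 0;
}

Each five points of server mix shifted is worth $15 of blended ASP in this toy model, which is why a server share swing hurts far more than the same unit swing on the desktop.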

AMD is still a bit of a wild card. It doesn't appear that they will have anything faster than 2.5GHz in Q4, but 3.0GHz might be doable by Q1. Certainly, AMD's demo would suggest 3.0GHz in Q1, but as we've just seen, demos are not always a good indicator. Intel's announcement that Nehalem has taped out is also a reminder that AMD has made no such announcement for Shanghai. AMD originally claimed mid-2008 for Shanghai, and since chips normally appear about 12 months after tapeout, we really should be seeing a tapeout announcement very soon if AMD is going to release by Q3 2008. There is little doubt that AMD needs 45nm as soon as possible to match Intel's costs as Penryn ramps up. A delay would seem odd since Shanghai seems to have fewer architecture changes than Penryn. AMD needs a tapeout announcement soon to avoid rumors of problems with its immersion lithography process.

331 comments:

Scientia from AMDZone said...

InTheKnow

"Just one more reason I think Silverthorn is one of the year's top developments."

Perhaps if we were discussing embedded applications.

" It has the computing power, thermals, and power envelope for x86 to start to seriously compete with ARM."

ARM isn't used on the desktop.

Ho Ho said...

scientia
"G3MX will benefit the desktop even if it is never used there because it will reduce the cost and complexity of designing consumer DIMMs."

Have you looked at the RAM prices lately? A year ago you paid more than twice as much for a 667MHz 512MB DDR2 stick as you pay for an 800MHz 1GB stick today. Any ideas what could make prices even cheaper for the desktop besides the natural development that will occur no matter what?


"Software should begin using [SSE5] right away."

And lightweight threading will not need any software support, perhaps only an updated OS or a few libraries. How long did it take for software to start using any other SIMD instruction set?


"An Intel version would trail by at least a year."

Unless it appears already with Larrabee.


"It is doubtful that Intel will be able to get MS to support a second version."

Why so? MS has supported all sorts of funky stuff before and I doubt they would make an exception this time.



"ARM isn't used on the desktop."

Same with G3MX. The price difference it could make will be insignificant, especially considering how long it will take for other types of RAM to phase out. Basically any change will be spread out over several years and nobody will notice.

Btw, why are you only considering desktop at the moment and discarding everything else, at least the stuff coming from Intel?


I have a new candidate for top development of 2007: Silicon Photonics.

Scientia from AMDZone said...

Ho Ho

"Have you looked at the RAM prices lately?"

I don't see how this argument is related to what I said. If DIMM makers can stop design work on registered and FBDIMMs then this will reduce the cost of development. Are you disagreeing with this?

"And lightweighted threading will not need any software support, perhaps only updated OS or a few libraries."

Lightweight threading is not robust enough for virtualization. Lightweight threading is also very little different from what IBM has already been doing with Cell. An improvement over Cell? Yes. A major breakthrough? No.

"Unless it appears already with Larrabee."

I can safely guarantee that Larrabee has nothing remotely like SSE5. I'm baffled that you would think that it would.

"Why so? MS has supported all sorts of funky stuff before and I doubt they would make an exception this time."

False. Intel tried to release its own version of AMD64 and was stopped cold by MS. I'd say this will happen again with SSE5.

"Same with G3MX. The price difference that it could make will be insignificant."

No. This is a fundamental shift in AMD's position relative to JEDEC. I'm sorry you can't see it.

"Btw, why are you only considering desktop at the moment and discarding everything else, at least the stuff coming from Intel?"

What stuff are you referring to? I think it is ludicrous to be comparing palmtop chips to desktop chips although workstation technology does overlap more.

"I have a new candidate for top development of 2007: Silicon Photonics."

Sure, when Intel announces a release date.

Mo said...

"Mo, have you stopped beating your wife?"

nice, resorting to personal attacks. God forbid you don't delete your OWN post.


"Why is it so difficult for you to repeat things correctly? AMD made a forecast that they felt they could break even by Q4. This was not my prediction as you've tried to claim. I've talked about AMD's statement before and never said that I agreed with it. What I'm saying now is that I doubt that AMD can hit break even in Q4."

I am repeating it correctly. You wholeheartedly agreed with AMD that it can break even, and you even went further to show what they could do to meet that criterion.
Did you call them out and say NOPE, they can't do it?
Did you not make the chart stating that they can reduce the loss by $100 million each quarter and be even by year's end? Did you not say that?

So it seems like AMD doesn't even know when it can break even.



"Try repeating what I say instead of what you make up. That is the only way you will ever be correct."

and of course our dear Sci. is NEVER wrong.

Again, thanks for the personal insult and no I don't beat my wife.

Ho Ho said...

scientia
"I don't see how this argument is related to what I said."

I can't see how it would have any effect on desktop RAM. Prices are already so low that they cannot drop significantly simply because some type of RAM is removed from development.

Are RAM developers funding development of server RAM with desktop sales? If so then why are FBDIMMs still so much more expensive than regular RAM?


"Lightweight threading is not robust enough for virtualization."

Why so? It's simply the same instructions and caching mechanisms that have been used for threading for quite some time, only highly optimized and probably with some additional tweaks added. The maximum potential won't be reached before those instructions are used, but other things will start benefiting from day one.


"Lightweight threading is also very little different from what IBM has already been doing with Cell"

Are you talking about software threading/using fibres in SPUs? If yes then you are way off. If not, then give me a couple of keywords about what IBM is doing.


"I can safely guarantee that Larrabee has nothing remotely like SSE5. I'm baffled that you would think that it would."

Larrabee is basically a GPU. From far away it looks somewhat similar to GMA 3000, but based on x86 and with other modifications. What makes you think it doesn't have all those instructions that GPUs have had since the day they evolved beyond register combiners, including most of the stuff you praised so much in SSE5?


"False. Intel tried to release its own version of AMD64 and was stopped cold by MS"

Are you talking about IA64 or something else? Was the stopper MS, or did Intel drop the plans itself?


"No. This is a fundamental shift in AMD's position relative to JEDEC. I'm sorry you can't see it."

That "No" was about what exactly?
I know what G3MX brings for AMD. Just that your talk about it bringing down RAM prices is rather weird.


"What stuff are you referring to?"

The stuff intheknow wrote and you deleted. Bringing x86 to the (ultra) low power market is a big deal. I remember you praising Bobcat quite a bit.


"Sure, when Intel announces a release date."

At least they have a working prototype, something that cannot be said about SSE5 or G3MX. What makes the latter two so special that they won't need one? Also, as you yourself said, QuickPath will likely be using it in the future as its protocol seems to be built with it in mind. I wouldn't be surprised if even USB 3.0 or its updates were connected with it in some way.

abinstein said...

"How are SSE5 and G3MX any different from that? When will they be on desktop? When will programs start benefiting from SSE5? Will SSE5 be important if Intel makes up its own new version of it?"

G3MX can reduce the cost of processor packaging and motherboard design, with the addition of a G3MX chip. It is something that's actually being developed. OTOH, "fine-grained parallelism on multi-core" is nothing new; it's been talked about for decades in the academic world. It can't be "developed" without software support.

And SSE5 is just a proposal, but not a development yet. Here I'd agree with you and disagree with scientia.

abinstein said...

Ho Ho -
"There was also Prescott on February 1, 2004 that was rather big update to Northwood, about as big as k8->k10, if not greater."

It is a perfect example of how Intel squanders its R&D resources on things that benefit few customers.

"Only problem was that the CPU was designed to run at much greater frequency than the manufacturing technologies allowed."

In other words the processor was poorly designed.

abinstein said...

Ho Ho -
"Any theories why does ATI GPUs win in almost everywhere but still NV sells around three times as many GPUs and earns massive profits unlike ATI who had 200M revenue with 50M losses?"

Ho Ho, first you already recognize the fact that ATI has superior products, don't you?

Second, what you asked is because of marketing and fanboyism. Like people who'd gladly pay for a Pentium 4 and claim it's "more bang for the buck." You don't need to retell the story; just remember that when you bought your P4, I was able to get an A64 at comparable pricing running cooler and faster (I remember you asked me not to brag about my good deal, but you miss the point that my good deal actually negates your claim that you got "more bang for the buck").

abinstein said...

"And lightweighted threading will not need any software support, perhaps only updated OS or a few libraries."

Seriously, when was the last time you saw that happen, where "only an updated OS or a few libraries" was enough without extensive software support?

abinstein said...

Ho Ho -
"I can't see how it would have any effect on desktop RAM. Prices are already so low that they cannot drop significantly simply because of removing some type of RAM from deveopment."

I'm sorry but you make errors all over the place. DRAM prices are never too low. They have been declining pretty much exponentially for the past 15 years. Right now the RAM manufacturers make money off a tiny profit margin which, when multiplied by the volume, becomes huge revenue.

By using a cheaper and more effective interconnect, even a few bucks per module, DRAM manufacturers can make huge profits - which of course have to be spread over chip designers, chip fabs, chip packaging, and sales. Nevertheless, it's a good move for the industry.

"Larrabee is basically a GPU. From far away it looks somewhat similar to GMX3000 but based on x86 and has other modifications. What makes you think it doesn't have all those instructions that GPUs have had since the day they evolved further from registry combiners, including most of the stuff you praised so much in SSE5?"

Now here you're being silly. GPU and SSE5 are very different. They have different characteristics and target different workloads. You should know better, shouldn't you?

abinstein said...

gutterrat -
"How can you "safely guarantee" this? Do you have "insider information"? Are you on the Larrabee design team? Please tell how you can "safely guarantee" this?"

I think it's pretty much how anyone with some knowledge can "safely guarantee" NASA is not going to build a dam on the moon when they actually return to it - because it doesn't make sense.

Of course being the Gutterrat you wouldn't know/believe that, would you? ;)

Ho Ho said...

abinstein
"G3MX can reduce cost of processor packaging and motherboard design"

... by needing additional HT links from the CPU and lanes on the motherboard? I admit that it makes it easier to attach more RAM to a CPU but it surely won't make things simpler than they already are.

I know that FBDIMM will be discontinued but it really did lower motherboard and CPU complexity while offering more bandwidth. Too bad that the rest of the platform wasn't good enough.


"OTOH, "fine-grained parallelism on multi-core" is nothing new; it's been talked about for decades in the academic world"

Yes it has been, and at last there is a chance that it might actually be beneficial to create separate threads that would run for just a few hundred, if not a few tens, of CPU cycles. Transactional memory is no small thing either.


"It can't be "developed" without software support"

For one thing all the locking and mutex mechanisms will get a lot more efficient than they are today.


"In other words the processor was poorly designed."

I'd say they overestimated how fast they could reduce leakage. They gambled and they lost. I wonder how well Netburst could work on 45nm. I wouldn't be surprised if, at 10GHz, there wasn't a CPU today that could compete with a theoretical high-clocked Netburst...


"Ho Ho, first you already recognize the fact that ATI has superior products, don't you?"

It depends on the definition of superiority. If you compared the list with previous months you'd see that pretty much the only thing that has changed is that ATI has lowered prices, giving them more bang for the buck. They were already bleeding a lot of money last quarter; I wonder what will happen this quarter.

For an average customer they do offer a cheaper solution in most segments at the moment. For an investor things look a whole lot different.

If you want to see how customers choose then just look at some statistics. You should already know the Steam survey, and you can find something about enthusiasts on the B3D forums.

For me personally ATI products will not work thanks to their really bad Linux support. I've tried and I've compared. It will take years until the released specs can be turned into competitive drivers; just see how far the old reverse-engineered driver got. NV isn't open source but at least it works and works well.


"... just remember the time you bought your P4 I was able to get A64 with comparable pricing running cooler and faster"

I might have as well if I had lived in the US. Another thing that would probably have helped is if AMD compensated resellers when prices drop the same way Intel does. Intel price drops arrive in Estonia in a day or two whereas it can take weeks for AMD CPUs.


"Seriously, when is that last time that you see that happen, that "only a updated OS or a few libraries" does not require extensive software support?"


If you update a library then all the programs using it will benefit from the updates. AMD made a fix for XP to properly support dual-core CPUs and allow proper RDTSC usage; no other software needed fixing. With lightweight threading the biggest part will be done at the OS level, e.g. pthreads and the kernel. After that lots of things will just work.


"GPU and SSE5 are very different"

One runs on the CPU and the other on the GPU. They usually target different workloads, but as Larrabee is x86-compliant it should be quite simple to port code from it to another x86 CPU or vice versa. It will be a great target for accelerating HPC.


"DRAM prices are never too low"

Apparently you haven't seen the profitability of DRAM producers during the last couple of years. As a consumer I couldn't be happier, but unless something changes things can get really ugly.


"Now here you're being silly."

Why so? I see Larrabee as a kind of testing platform for Intel. With it they can test and evaluate different kinds of things and incorporate them into future CPUs if something works well. Ring topology, special-purpose units and 512-bit SIMD are just a few things on Larrabee that may work well on a regular CPU.


"They have different characteristics and target different workloads"

How about GPGPU? NVidia seems to be earning a lot from that market, Intel seems to want to become the biggest player in that field, and AMD has Bulldozer and Fusion. How different are the markets they are all targeting?


As for "quaranteeing Larrabee features" I was speechless when I read what scientia wrote. It almost sounded as if he has no idea about what Larrabee has in common with GPUs.

Unknown said...

Hoho, apparently you don't understand finance. Scientia's and abinstein's points both address the problem that the RAM industry has today: excessive development and manufacturing costs relative to the amount of money made on the product. By diversifying manufacturing and development to support ECC, registered memory, and FBDIMM, they have inherently made their business less efficient and more expensive than if they just manufactured regular memory.

There is no argument against the above, btw. It's a simple truth exercised by anyone with any form of business sense or education.

Mo, nice spin. Saying AMD WILL break even in Q4 is way different than saying it can, as scientia did.

Ho ho, abinstein has a point. The markets that need SSE5 are the markets that won't be using Nvidia's or ATI's cards as GPGPUs, and thus would not be using Larrabee as a GPGPU (the core is still flaunted as a GPU, no?).

Scientia from AMDZone said...

Mo

"nice, resorting to personal attacks."

Mo, my apologies for assuming you were familiar with an icon like Groucho Marx. The link will explain it.

" You fully heartedly agreed with AMD that it can break even and you even went further to show what they could do to meet that criteria."

No, your statement is again false. Saying that AMD's statement is possible or giving a projection showing that it could be possible is not the same as agreeing.

"and ofcourse our dear Sci. is NEVER wrong."

Mo, you are a real comedian; I've been wrong plenty of times.

"Again, thanks for the personal insult and no I don't beat my wife."

Again, I'm sorry you've never heard of Groucho Marx; I'll try to make fewer assumptions about your knowledge in the future.

Scientia from AMDZone said...

abinstein

"And SSE5 is just a proposal, but not a development yet. Here I'd agree with you and disagree with scientia."

The configuration is not finalized yet. However, at least 75% of the current SSE5 spec will not change, including the most advanced features like 3- and 4-way instructions and conditional constructs.

Scientia from AMDZone said...

Ho Ho

"I can't see how it would have any effect on desktop RAM."

Because it increases the volume of standard memory and reduces the cost of R&D; this is basic economics.

"It simply the same instructions and caching mechanism that have been used for threading for quite some time"

Are you quite certain that Intel's lightweight threading contains all of the current Penryn instructions or just a subset?

"Larrabee is basically a GPU... What makes you think it doesn't have all those instructions that GPUs have had since the day they evolved further from registry combiners, including most of the stuff you praised so much in SSE5?"

Because Intel did not develop SSE5 nor does Intel have any reason to cannibalize Itanium sales. Are you somehow confusing SSE5 with SSE4? These two are very different. The only Intel architecture that uses 3 way instructions is Itanium.

"Are you talking about IA64 or something else?"

Obviously I wasn't talking about Intel's version of AMD64, EM64T. IA64 is the Itanium instruction set. Intel tried to create a set of 64 bit x86 extensions that were not compatible with AMD64.

" Was the stopper MS or Intel who dropped the plans themselves?"

I'll repeat myself; it was stopped by MS, not Intel. Would you like to ask the same question a third time?

"Bringing x86 to (ultra) low power market is a big deal."

Geode already brought x86 to low power. Intel's offering is just a step further as is Bobcat.

" I remember you praising bobcat quite a bit."

Apparently you have trouble with perspective. Bobcat is good for AMD specifically because it combines the current MIPS, Geode, and Turion efforts into a single line. This is good for AMD because it reduces engineering costs and allows greater focus. However, this does not translate to a revolution in computing as you seem to claim for Silverthorne.

"At least they have working prototype, something that cannot be said about SSE5 or G3MX."

You are foolish indeed. AMD has had prototypes of G3MX for a while now. The SSE5 work is already in hardware prototyping (but not final silicon). These will show up with Bulldozer. Again, when will Intel's work be released?

InTheKnow said...

ARM isn't used on the desktop.

Odd, I don't see any restriction to the desktop in the title of your article.

In any event, what Intel is trying to do is push mobile (i.e. laptop) levels of computing power down the device stack. Imagine laptop computing power in your cell phone. I contend that is a significant shift since it would spell the end of the embedded market as we know it.

Right now I see the big problem as the lack of a useful interface in such a small form factor. But one step at a time. First these chips have to prove they can displace the existing architecture in this space.

Incidentally, I see this as a big deal for both Intel and AMD as this is a big market if the x86 architecture can get a foot in the door.

Scientia from AMDZone said...

Ho Ho

"... by needing additional HT links from CPU and lanes from motherboard? I admit that it makes it easier to attatch more RAM to a CPU but it surely won't make things simplier than they already are."

Actually it does make things simpler in several ways.

"I know that FBDIMM will be discontinued but that really did lower the motherboard and CPU complexity while offering more bandwidth."

Now you are being silly when you claim that FBDIMM reduced complexity but that G3MX won't. G3MX has a very similar reduction in memory port width which of course makes motherboard layout easier. However, it also has twice the fanout per buffer chip of FBDIMM which reduces both cost and power draw below that achievable by FBDIMM. Better still, it works with ordinary ECC DIMMs which means a reduction in cost and better availability of faster DIMMs.

"Yes it has been and at last there is a chance that it might actually be benefitial to create separate threads that would run just a few hundred, if not few tens of CPU cycles."

I guess I'm still baffled why you don't seem to understand that lightweight threads are less efficient rather than more efficient. You have to use more threads to make up for the losses and even more to gain any advantage. This larger number of threads also has to be reflected in software design or else you have nothing to run.
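A back-of-the-envelope model of why thread granularity matters (the overhead figure is invented; real spawn and switch costs vary): each thread carries a fixed overhead, so the speedup from splitting W cycles of work across N threads is W / (W/N + overhead), and tiny work items gain little or nothing:

#include <stdio.h>

/* Toy model: splitting W cycles of work across N threads when each
 * thread costs a fixed OVERHEAD cycles to spawn and schedule.
 * speedup = W / (W/N + OVERHEAD) */
int main(void) {
    const double OVERHEAD = 200.0; /* assumed cycles per thread */
    const double work[] = {100.0, 1000.0, 10000.0, 100000.0};
    for (int i = 0; i < 4; i++) {
        printf("W=%6.0f:", work[i]);
        for (int n = 1; n <= 16; n *= 2)
            printf("  N=%2d %5.2fx", n, work[i] / (work[i] / n + OVERHEAD));
        printf("\n");
    }
    return 0;
}

With these assumed numbers a 100-cycle task runs slower on any number of threads, while a 100,000-cycle task scales almost linearly; the hardware can lower the overhead constant, but the software still has to supply enough work per thread.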

"For one thing all the locking and mutex mechanisms will get a lot more efficient than they are today."

Cray has a massively threaded chip design. Do you think they will drop it in favor of Intel's design?

"I wonder how well could Netburst work on 45nm."

Just like Penryn the change would be small.

"I wouldn't be surprised if at 10GHz there wouldn't be a CPU today that could compete with the theoretical high-clocked Netburst ..."

Okay, I'll say something equally ridiculous: I wouldn't be surprised if, at 7GHz, there wasn't a CPU today that could compete with a theoretical high-clocked K8. Why are you talking about unachievable speeds?

"If you update a library then all the programs using it will benefit from the updates. AMD made a fix for XP to properly support dualcore CPUs and allow proper RDTSC usage, no other sofware needed fixing."

And yet, software did not magically change to be able to take advantage of two cores. Some benchmarks are still only able to use one thread. Nice theory though.

" Witl lighweight threading the biggest part will be done on OS level e.g pthreads and kernel. After that lots of things will just work."

But still won't take advantage of all those additional threads.

"One runs on the CPU and other on GPU. They usually target different workloads but as Larrabee is x86-compliant it should be quite simple to port code from it to another x86 CPU or vica versa. It will be a great target for accelerating HPC."

Which again has absolutely nothing to do with SSE5. Why do you keep making the same ridiculous assumption that just because Larrabee is x86-derived it can do SSE5? This is nonsense. I suppose Intel could add support for SSE4 but this is not SSE5.

"I see Larrabee as a kind of a testing platform for Intel. With it they can test and evaluate different kinds of things and incorporate them to the future CPUs if something works well."

So, no release date?

"Ring topology,"

Which is already used by ATI.

" special purpose units"

Like Cell?

" and 512bit SIMD"

Similar to Stream?

" are just a few things there are on Larrabee that may work well on a regular CPU."

Are a few areas where Intel might need to catch up.

"How about GPGPU? NVidia seems to be earning a lot from that market, Intel seems to want to become the biggest player in that field, AMD has Bulldozer and Fusion. How different is the market they are all targeting?"

AMD is not doing massive lightweight threading, nor is nVidia.

"As for "quaranteeing Larrabee features" I was speechless when I read what scientia wrote. It almost sounded as if he has no idea about what Larrabee has in common with GPUs."

I'll say it again: Larrabee does not have SSE5. Intel will only move in that direction if it becomes obvious that AMD will pull too far ahead. And, when Intel does move in that direction they will first try to create a non-compatible standard.

Unknown said...

In the know, I'd really like to see your reasoning as to how mobile phones could somehow kill the embedded market.

Also, while 1 watt is a pretty amazing power envelope for everything they've packed in there, that still means a very good mobile phone would have a battery life of 2-3 hours of actual use, which would be fairly useless. Yes, Silverthorne is a step in that direction, but it's still not technically a shift.

Scientia from AMDZone said...

InTheKnow

"Odd, I don't see any restriction to the desktop in the title of your article. "

Odd that you don't realize that I've never written an article about embedded or palmtop chips.

"In any event, what Intel is trying to do is push mobile (i.e. laptop) levels of computing power down the device stack."

Which has been the norm; nothing unusual in what Intel is doing.

" Imagine laptop computing power in your cell phone."

I'm trying to imagine both where you would carry the battery and where you would put the keyboard. In reality, neither Silverthorne nor Bobcat offers laptop computing power unless you are talking about low-end emerging markets.

" I contend that is a significant shift since it would spell the end of the embedded market as we know it."

Which would follow AMD's lead with Geode. Where is the shift? x86 embedded will replace both ARM and MIPS for low-end applications because the development is easier. AMD already has plans for this in its ATI consumer products section.

Ho Ho said...

greg
"There is no argument against the above, btw. It's a simple truth excersized by anyone with any form of business sense or education."

I never said it makes no difference. I only said it will take years for other types of RAM to phase out and customers won't even notice the tiny change in prices. FBDIMM will be discontinued no matter what AMD does.


"Ho ho, abinstein has a point. The markets that need sse5 are the markets that wont be using Nvidia's or ATI's cards as GPGPUs, and thus would not be using Larrabee as a GPGPU (the core is still flaunted as a GPU, no?)."

It is exactly because they won't be using regular GPUs for their stuff that they may very likely use Larrabee, as it is basically a regular x86 CPU with a lot more computational power and memory bandwidth. Or can you give a reason why they wouldn't want to use it?

Ho Ho said...

scientia
"Because it increases the volume of standard memory and reduces the cost of R&D this is basic economics."

In how many years can we see the results of that?


"Are you quite certain that Intel's lightweight threading contains all of the current Penryn instructions or just a subset?"

What do you mean by that? In their paper they describe ways to offload software task scheduling to hardware and accelerate switching between tasks. The CPU will contain all the other x86 instructions its predecessor did. Did you think that LWT will not be available on regular CPUs?


"Because Intel did not develop SSE5 nor does Intel have any reason to cannibalize Itanium sales"

Are you saying that they will create Larrabee and remove all the regular multi-operand instructions from it? You do know that the GMA 3000 (page 6) is actually quite similar to what Larrabee is, only it isn't x86 compatible? It also contains most of the nice functions SSE5 does, like MADD and vector conditionals, and many others SSE5 doesn't have. Intel would have to be insane to remove those functions from Larrabee.


"Are you somehow confusing SSE5 with SSE4?"

No, I am not. I'm saying that Larrabee will contain most of the SSE5 instructions because they have existed in GPUs for years. I doubt it will be called SSE5 compatible but I'm sure it will contain the most important instructions, notably the multi-operand and vector conditional ones.


"The only Intel architecture that uses 3 way instructions is Itanium."

... and IGP's.


"Obviously I wasn't talking about Intel's version of AMD64, EM64T. IA64 is the Itanium instruction set. Intel tried to create a set of 64 bit x86 extensions that were not compatible with AMD64."

So there was something else besides EM64T and IA64? Where can I read about it?

If you are talking about MS not accepting IA64 then I hope you do know that Windows Server supports Itanium. I know they scrapped XP support, probably because the CPU cost a "bit" too much and the ROI would have been too small.


"AMD has prototypes of G3MX and has had for awhile now"

Any links? I couldn't find any with quick googling.


"The SSE5 work is already in hardware prototyping (but not final silicon). These will show up with Bulldozer."

How do you know it has silicon prototypes if we haven't even heard about Shanghai, which should be released at least a year before Bulldozer? When will Bulldozer be released? I know they planned it for around 2009, but considering how AMD has delayed pretty much everything lately I wouldn't be surprised if it was pushed way back as well.


"Again, when will Intel's work be released?"

My guess would be around 2010-2012. Btw, wasn't HT3 on Opterons pushed back into 2009 or something? I'm not sure if the things I read were correct or not.

Ho Ho said...

scientia
"G3MX has a very similar reduction in memory port width which of course makes motherboard layout easier"

No. G3MX will have the same memory interface to the DIMM slots as regular DDR slots today. In addition it has HT3 lanes to the CPU socket. FBDIMM had far fewer lanes running straight from the DIMM slots to the memory controller in the northbridge or CPU.

See this. Just imagine that regular DDR2 picture has G3MX instead of CPU and there are additional lanes for HT connection to CPU.

Of course a reduction in memory cost could offset the motherboard complexity cost a bit.


"I guess I'm still baffled why you don't seem to understand that lightweight threads are less efficient rather than more efficient. You have to use more threads to make up for the losses and even more to gain any advantage."

Less efficient compared to what? What losses? Do you have any theories on how one could get decent scaling from future >=16-core CPUs?


"This larger number of threads also has to be reflected in software design or else you have nothing to run."

Yes, but it will also show on today's threaded applications by helping with scalability.


"Cray has a massively threaded chip design. Do you think they will drop it in favor of Intel's design?"

What exactly does Cray have to do with anything?


"Just like Penryn the change would be small.

Considering Intel will sell 80W 3GHz quad cores in 1.5 months I wouldn't call the change too small. Certainly a lot bigger than what AMD has achieved so far with 65nm. Netburst saw a huge reduction in power usage moving to 65nm, and 45nm would have helped more. I'm not sure if it would have been enough for 10GHz or not.


"Why are you talking about nonachievable speeds?"

I'm just theorizing on what could have happened.


"And yet, software did not magically change to be able to take advantage of two cores."

Well, video drivers today are multithreaded and it has helped quite a lot in some cases.


"Some benchmarks are still only able to use one thread. Nice theory though."

It will be much worse with SSE5 adoption. Dual core had the potential to double performance and paved the way to "free speedups" with future multicore CPUs. How big a performance increase will SSE5 give, and how much will it help with upcoming CPUs with additional changes? It is a one-time speed bump.


"Which again has absolutely nothing to do with SSE5"

... besides that, as I said, Larrabee will contain those praised multi-operand instructions.


"Why do you keep making the same ridiculous assumption that just because Larrabee is x86 derived that it can do SSE5?"

I have never said that. I have only said that it has most of the instructions SSE5 does, but I doubt Intel would call it SSE5 compatible.


"So, no release date?"

Its release date is currently speculated to be in the late 2008 to early 2009 timeframe.


"Which is already used by ATI."

... in a GPU, as an extension to the memory controller, yes. Though it doesn't seem to help too much, as NVidia can deliver the same performance with much less.


"Like Cell?"

Cell has SPEs and these aren't exactly special purpose. Toshiba's SpursEngine does have some special-purpose units in addition to SPEs, though. I was thinking of stuff like texture filtering and the accelerators found in current GPUs. With that Intel can see how it can incorporate the use of out-of-core accelerators into the x86 instruction set.


"Similar to Stream?"

No, the R580 is a lot different from that.


"Are a few areas where Intel might need to catch up."

Not if you only consider x86 CPUs.


"AMD is not doing massive lightweight threading, nor is nVidia."

Perhaps they should if they want their massively threaded processors to scale decently?


"I'll say it again: Larrabee does not have SSE5."

I'm not sure how it will be labelled, but as I said numerous times it will have the most important instructions from it.

Ho Ho said...

greg
"Also, while 1 watt is a pretty amazing power envelope for everything they've packed in there, that still means a very good mobile phone would have a battery life of 2-3 hours of actual use, which would be fairly useless."

Why do you think the CPU will use 1W when idle?

Ho Ho said...

scientia, I'm still waiting for answers to some of my questions.

1) When will SSE5 and G3MX start affecting the desktop?
2) How long did it take for software to start using any SIMD instruction set?
3) Why isn't lightweight threading robust enough for virtualization?
4) With what exactly were you comparing LWT when you asked if it is the same thing as in Cell?
5) What makes you think that Larrabee won't contain multi-operand and vector conditional instructions if pretty much every programmable GPU has contained them since day one?

Scientia from AMDZone said...

mo

Groucho Marx link fixed.

Scientia from AMDZone said...

ho ho

No. There is no link to Intel's alternative version of AMD64. This predated the initial AMD64-derived version in Prescott (which was itself incomplete).

No. There is no link showing that MS rejected Intel's alternative version. Internal negotiations are not typically pasted on the corporate website.

No. There is no link showing AMD's highly secretive internal research. The SSE5 research has moved beyond software simulation and is at least in initial hardware prototyping.

No. There is no link showing AMD's G3MX prototypes. I believe they are doing work with Micron on the prototypes and are fairly far along. In all honesty, AMD would be capable of G3MX boards in 2008 if it had a processor ready that used them. I guess you will have to wait and see if what I said is true: that AMD will begin bypassing JEDEC.

As for Larrabee having the full set of x86 instructions, you are wrong. From Ars Technica:

The cores will implement a subset of the x86 ISA that includes some GPU-specific extensions.

You can also look at the update on Ars Technica.

I'm still baffled why you (and other dreamers) keep getting glassy-eyed over Larrabee. Larrabee does not contain SSE5, nor is it likely to implement anything like SSE5. Larrabee is a stripped-down x86 ISA with a lightweight architecture. IBM used lightweight architectures in its embedded version of Power and in Cell for HPC. Both of these designs have limited general computing functionality. This is equally true of Larrabee.

Frankly, I don't know how else to explain Larrabee's limitations to you. Each core has only 32K of L1 cache and 256K of L2. The cache bandwidth is low and the ring bus design has a lot of latency. Concepts like this go all the way back to the Transputer boards of twenty years ago. Do you genuinely not understand this?

Gesher might be better but I don't know yet if it is a subset ISA like Larrabee.

Scientia from AMDZone said...

ho ho

Your reference to hardware with only a 32 bit FP unit is nothing like proof of Larrabee's prowess. If you want to have a discussion about the Larrabee ISA you're going to have to wait until it is published. I'm not going to argue with you about your misconceptions about SSE5 versus an unknown ISA.

However, I will say that your talk about Larrabee is almost identical to what was said about Cell. When the initial theoretical FP speed for Cell was published there were lots of people who claimed that this was a revolution in computing and that Cell would replace other CPUs on the desktop. It only took me about 5 minutes of examining the Cell architecture to see that this was not the case. History shows that I was indeed right, with Cell only being used in video games and HPC. Even its application in HPC has limitations.

Larrabee has Cell's racetrack memory layout and a similar lightweight design. It takes multiple Larrabee cores to match one modern x86 core in terms of general computing. You might see amazing speed in some specialized areas, but this will be matched in x86 designs with things like Fusion anyway. Larrabee is the same kind of empty revolution in computing as Cell.

Unknown said...

Ho ho, even if the phone could get its sleep-state power down to 1/12th of its active-state usage, it would still only last for a day and a half. It needs to be at least twice that to make it into a much higher-end phone, and three times that to make it into mid-range phones (realize that I'm speaking in terms of the Asian markets, where high end is above $800 while mid range is above $500).

This completely ignores the power needs of the screen, memory system, and other losses, so Silverthorne is even less useful in this sense.
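For what it's worth, here is the rough arithmetic behind those figures; the battery capacity and the non-CPU draw are both assumed (a 2007-era phone battery of roughly 900mAh at 3.7V is about 3.3Wh):

#include <stdio.h>

/* Rough battery life estimate; all figures assumed for illustration. */
int main(void) {
    const double battery_wh = 3.3; /* ~900mAh at 3.7V */
    const double cpu_w = 1.0;      /* Silverthorne-class active draw */
    const double rest_w = 0.5;     /* assumed screen/radio/memory draw */
    printf("CPU alone:    %.1f hours\n", battery_wh / cpu_w);
    printf("whole device: %.1f hours\n", battery_wh / (cpu_w + rest_w));
    return 0;
}

That lands at roughly 3.3 hours for the CPU alone and about 2.2 hours for the whole device, consistent with the 2-3 hour estimate above.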

Yes, we're a lot closer to cell phones with the same power as a laptop, but it's going to require either a couple of process shrinks and a new battery technology, or several process shrinks, which means we will have to wait a while.

Embedded boards with the same power as a laptop have been here for quite some time. In fact, the laptop I'm typing this on is less powerful than any Geode or C5 system, though it probably consumes less power.

Aguia said...

In my country:
SAPPHIRE 2900 PRO 1GB: 280€
SAPPHIRE 2900 PRO 512MB: 230€
XFX GeForce 8800GTS 320MB: 310€

When is NVIDIA going to update their prices?
ATI or their partners must be losing money manufacturing those cards. How can such a complex design cost so little? Nvidia, on the contrary, is making millions for sure.
I'm starting to agree with Ho ho on this one.

Unknown said...

Mo, the link is definitely not broken; I don't know what you're talking about. Either way, the reference is not actually to a specific comment, but instead to how you've phrased your questions in a way that can't be answered negatively.

Also, freedom of speech means you can get your own blog and complain about scientia's, not that you can write whatever you want on his.

Aguia, AMD is manufacturing all of their new cards on 65nm, and will be moving to 55nm for all of RV670. Thus, their manufacturing cost per die should be lower than Nvidia's unless there's a drastic disparity in die size in terms of transistors and basic geometry.

It'll be interesting to see exactly how this all plays out, as the article that was linked to makes the very good point that both nvidia and ATI are moving in very different ways than the graphics industry normally does.

abinstein said...

"I like AMD, but now I want them to fail for ONE sole reason. I want you to choke on your own words."

This is ridiculous - both as a concept by itself and as an excuse for your (anti-)fanboyism.

BTW, freedom of speech is, as Greg says, that you can write whatever you like and scientia can delete it on his blog. :)

abinstein said...

Ho Ho -
"It is exactly that they won't be using regular GPUs for their stuff that they may very likely use Larrabee as it is basically a regular x86 CPU with a lot more computational power and memory bandwidth. Or can you give a reason why they wouldn't want to use it?"

Frankly I think your idea of Larrabee is pretty vague and impractical. Larrabee is not an x86 upgrade; it is not an x86 extension (like a SIMD extension) or expansion (like additional GPU cores - Fusion).

Larrabee is a great number of small x86 subset cores implemented in a multi-core way. The processor has massive parallelism like a GPU, yet it executes general-purpose instructions from an x86 subset.

This makes Larrabee more different from AMD's SSE5 or Fusion than from Sun's Niagara and Rock or IBM's Cell. Larrabee will target totally different problem areas than both SSE and Fusion.

"1) When will SSE5 and G3MX start effecting desktop?"

Both are planned for 2009. G3MX is planned for Opteron. Whether it will propagate to desktops depends on how successful it is. (For example, FB-DIMM is almost a complete failure, and thus it's not likely to be used on desktops.)

"2) How long did it take for software to start using any SIMD instruction set?"

It was pretty quick for Intel's SSSE3 and SSE4 to show "dramatic performance improvements," at least on Intel-designated benchmarks, wasn't it?

You get the benefit from SSE5 immediately once it is implemented and you care to use those instructions.

"3) Why isn't lightweighted threading robust enough for virtualization?"

They are two different things. Lightweight threading attempts to break a single application into multiple small threads. Virtualization attempts to run multiple systems on a single system. In order to do virtualization efficiently, system image states must be efficiently saved and loaded; having more software-exposed threads can actually make this harder and slower.

"4) With what exactly were you comparing LWT when you asked if it is the same thing as in Cell?"

Have you actually read the paper that you quote? Honestly it's nothing newer or more novel than what's been proposed in academia (I thought I told you this before), and scientia has a point that in terms of implementation it's no more than either Cell or even AMD's CTM.

The only thing that Intel could possibly brag about is probably its general-purpose nature and x86 "friendliness." Still, for the idea to work it applies only to applications that can be completely parallelized, and frankly that's not any big percentage of the programs out there.

"5) What makes you think that Larrabee won't be containing multi-operand and vector conditionals if pretty much every programmable GPU has contained them since day one?"

Multi-operand is nothing special - you can do that with a combination of instructions from any ISA. It is "multi-operand in a single instruction" that makes SSE5 (and Altivec, for that matter) different. Larrabee, being implemented with a subset of x86, certainly does not have this.

Vector conditional doesn't make sense for GPUs. Do you have an example of it?
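As an aside, here is what a vector conditional (select) looks like when emulated with existing SSE intrinsics; a single conditional-move instruction, like the PCMOV in the SSE5 proposal or the select in GPU shader ISAs, would collapse the three logic ops into one. The helper names here are mine, not from any spec:

#include <stdio.h>
#include <xmmintrin.h>

/* result[i] = mask[i] ? a[i] : b[i], where each mask lane is all-ones
 * or all-zeros. Plain SSE needs three instructions; one vector
 * conditional move would do the same job in a single instruction. */
static __m128 select_ps(__m128 mask, __m128 a, __m128 b) {
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

/* usage: clamp negative lanes of x to zero */
static __m128 max0_ps(__m128 x) {
    __m128 zero = _mm_setzero_ps();
    __m128 pos = _mm_cmpgt_ps(x, zero); /* per-lane mask: x > 0 */
    return select_ps(pos, x, zero);
}

int main(void) {
    union { __m128 v; float f[4]; } u;
    u.v = max0_ps(_mm_set_ps(-1.0f, 2.0f, -3.0f, 4.0f));
    printf("%g %g %g %g\n", u.f[0], u.f[1], u.f[2], u.f[3]);
    return 0;
}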

InTheKnow said...

Odd that you don't realize that I've never written an article about embedded or palmtop chips.

No, but you have written an article about the music industry. Am I to assume that is the focus of this blog? Of course not. But I'm talking about full laptop levels of computing here, which you have written about. The embedded discussion comes up because that is what Intel is setting itself up to compete against.

Which has been the norm; nothing unusual in what Intel is doing.

No, this is different. Intel tried to get into this space with XScale and failed miserably. They sold that part of the business off at a loss. Now they are trying to get into this space with x86.

AMD entered this space by buying the Geode line from National. National got it from Cyrix. AMD's contribution to Geode was to add more PC computing power in the form of L1 cache and SSE support. Incidentally this raised the power envelope from ~1W to ~9W.

I'm trying to imagine both where you would carry the battery and where you would put the keyboard.

Which is why I said "Right now I see the big problem as the lack of a useful interface in such a small form factor." Please give me credit for at least acknowledging the obvious issues.

The only really workable solution I see is to get voice recognition to work, though there are folding screens and virtual keyboards out there. I'll dig up links if you need me to.

In reality, neither Silverthorne nor Bobcat offers laptop computing power unless you are talking about low-end emerging markets.

Not true. Silverthorne is reported to have the computing power of a Pentium M. There are plenty of Pentium M laptops still in use and they function well enough.

InTheKnow said...

Greg said...
In the know, I'd really like to see your reasoning as to how mobile phones could somehow kill the embedded market.

First the phones, tomorrow the world. Muhahaha!!!

Seriously, you are right: phones alone won't be the end of embedded. But I agree with the author of the piece from Ars Technica that the sheer volume of legacy x86 code makes it a significant threat. If the x86 architecture can be squeezed into the right form factor and power envelope to work in cell phones, it is only a small step to move that computing power into other traditional embedded spaces.

Also, while 1 watt is a pretty amazing power envelope for everything they've packed in there, that still means a very good mobile phone would have a battery life of 2-3 hours of actual use, which would be fairly useless. Yes, Silverthorne is a step in that direction, but it's still not technically a shift.

The trick is to see the shift before it arrives and be waiting for it. :) If the rumors are true, having this in cell phones is closer than you think.

I don't think Intel has proven to be especially good at this, but they have the financial resources to throw stuff at the wall and see what sticks. AMD can't afford that. They have to guess right.

Aguia said...

Silverthorne is reported to have the computing power of a Pentium M.

Which means absolutely nothing.

Scientia from AMDZone said...

intheknow

"AMDs contribution to Geode was to add more PC computing power in the form of L1 cache and SSE support."

This needs correction. The Geode FX that AMD got was not x86 compatible. AMD's biggest actual contribution was to have the Geode team create a new GX version that was compatible with 32-bit x86 code.

"Incidentally this raised the power envelope from ~1W to ~9W."

Incorrect. You are confusing the GX and LX Geode lines. AMD's new GX version used less power, not more. The LX Geode, however, in spite of the name, was actually a stripped-down Turion. This is why it had a completely different power consumption level.

AMD's true low power chip was Alchemy which was also not x86 compatible.

Scientia from AMDZone said...

Silverthorne is Intel's attempt to get into the consumer embedded market. AMD was already ahead of Intel with Geode and is now greatly ahead with the ATI purchase. Half of ATI's income is from consumer electronics.

It has been estimated that Silverthorne will be a loss leader for Intel for at least a year until Intel begins to get established. This is good for a company with no real consumer electronics presence but this also means that Intel won't make much headway before it has to face Bobcat. Neither Bobcat nor Silverthorne is a revolution in computing. They are not particularly significant except that they will both continue what AMD began with Geode: the replacement of embedded ARM, MIPS, and Power chips by x86-based chips. I have no doubt that AMD and Intel will quickly displace these other chips.

Unknown said...

How does it mean nothing? We know that the Pentium M performed very well for a laptop. If they've managed to reduce the Pentium M's 27W TDP down to 0.5W in Silverthorne, that's very impressive indeed.

anonymous said...

Scientia wrote:
This is good for a company with no real consumer electronics presence but this also means that Intel won't make much headway before it has to face Bobcat.


But isn't the StrongArm (PXA***) family of processors dominant in the smart phone market (which is consumer electronics)? Blackberry sold 3M devices last quarter, and those all have Intel chips. That's 12M processors a year; even if those sell for only $20/chip, $240M/year in revenue is real money.

And there are others that use them, even if they aren't dominant in the space. So I wouldn't call it no real presence.

enumae said...

Scientia
...Half of ATI's income is from consumer electronics.


Prior to AMD's acquisition (Q3 2006)

PC segment (GPU & Chip sets) = 77%
Consumer segment = 23%

You must mean after losing the Intel chipset business...

Looking at AMD's Q2 2007, the Consumer segment only equates to about 43% of ATI's combined income for Graphics and the Consumer segment; this does not include ATI chipsets.

You may want to revise your numbers because "half" is wrong.

Scientia from AMDZone said...

I'll reference an Information Week article.

Intel's current position:

Intel's full-frontal assault on the cell phone chip market failed; perhaps that legacy has inspired an alternate tack

ATI already makes two SOCs. This is new for Intel:

Intel's dipping of its corporate toes into the SoC waters is significant because, traditionally, SoCs have often been aimed at market segments where there wasn't enough volume to justify the launch of a one-size-fits-all product. Intel, which lives on large volumes, always has stuck with the big stuff.

What market is this for?

In opting to try SoC, Intel may be engaging in something of a loss-leader strategy. That is, perhaps it is signifying that it is content to make less money in the UMPC market (with Silverthorne) to begin with, if it can help nurture that market. (Remember that the ultra-mobile segment doesn't really exist today).

So again, why would I see this as significant?

Scientia from AMDZone said...

Here's another Information Week article that tells what Intel's position was in Cell phones as of June 2006:

Intel, the world's largest semiconductor company, on Tuesday finally owned up to one of the most colossal failures in that industry's history when it unloaded its communications and applications processor business to Marvell for $600 million.

During the course of the past decade Intel invested between $3 billion and $5 billion in the assets it sold to Marvell, says Will Strauss, an analyst for Forward Concepts. Intel spent nearly $2 billion on a single acquisition to bolster those communications chip efforts. It was a major rat hole of unparalleled magnitude.

In the baseband processor market for cell phones, which totals $6.5 billion, the story was even worse. Last year Intel managed less than 1% market share, Strauss says.



This article from Palm Infocenter shows what was sold in the Marvell sale:

Intel’s communications and application processor business develops and sells processors for handheld devices including smart phones and personal digital assistants. The business’ processors, based on Intel XScale technology, include the Intel PXA9xx communications processor, codenamed “Hermon,” which powers Research in Motion’s (RIM) Blackberry* 8700 device. The Intel PXA27x applications processor, codenamed “Bulverde,” is used in the Palm Treo smart phone, the Motorola Q and other devices.


So, whatever limited presence Intel had in Cell phones and palmtops with StrongArm (XScale) is now Marvell's presence.

Unknown said...

I've just realised....

....45nm high-k doesn't make this list?

oh wow

Unknown said...

During the course of the past decade Intel invested between $3 billion and $5 billion in the assets it sold to Marvell

Almost as bad as investing $5bn in 1 hit eh?

You forget, Intel can burn money. Silverthorne is on the way. They had to get rid of this. What's the point of producing uncompetitive products against yourself? Sell it on while you can.

abinstein said...

lex... I'm not sure whether you're trying to be funny or to look silly. You apparently don't know about high-K development. Intel is probably the first to bring high-K to production, but not the only one doing so.

Also, I still fail to see how Penryn is going to change anything when it's just marginally (~7%) faster than Clovertown. It's probably a good thing for Intel that it finally has competitive floating-point capability, but the overall architecture is still inefficient.

Then there's your notion that "Intel has too much cache to lose the war." That, if you didn't know, only means Intel has more brute force; the fact that the company is still trailing its much smaller rival in a few crucial aspects, such as HPC performance, performance per watt, system architecture, and scalability, clearly shows that brute force is not enough, and Intel, as I said, is primarily a manufacturing & marketing facility.

enumae said...

Abinstein
...but not the only one doing so.


Maybe it is my interpretation of your comments, but is there a reason you are downplaying Intel's high-K/metal gate?

Is High-K/Metal gate not a big deal?

I am not trying to imply that I understand the complexities of manufacturing processors or process nodes, so any help would be appreciated.

...but the overall architecture is still inefficient.

Do you mean system (platform) architecture or processor architecture?

...such as HPC performance

Due to the FSB, right?

...performance per watt

Depends on workloads, right?

...system architecture

Depends on workloads, right?

...scalability

Could you elaborate a little more? It would seem that some workloads scale well while others don't.

Or are you talking about performance scaling with clock speeds which would then fall back to the FSB?

...clearly shows that brute force is not enough...

Looking at current market conditions it would appear it is working just fine :)

Unknown said...

http://www.anandtech.com/mobile/showdoc.aspx?i=3117&p=1

AMD still miles behind Intel in the mobile front.

Sci, let's have AMD put their money where their mouth is. With all the technical advances they have, you'd think they would be making BILLIONS, not losing them lol.

abinstein said...

"AMD still miles behind Intel in the mobile front."

It's been shown multiple times that AnandTech is nothing but an advertising arm (or maybe finger) of Intel.

abinstein said...

enumae -
"Maybe it is my interpretation of your comments, but is their a reason you are down playing Intel's High-k/Metal gate?"

Where did you see me down-playing Intel's high-K? Let me repeat what I said to you here: "Intel is probably the first that brings high-K to production, but not the only one doing so." Is this down-playing?

There have been breakthroughs in high-K R&D from all camps, definitely not Intel alone. Yet being the largest manufacturer, Intel was able to be the first to put it into production.

If you actually read the IEEE Spectrum article then you would've noticed that it says "we and others" in all statements but the last, when it talks about Intel's gate-first approach. What this means is that lex didn't know the topic he was talking about and had a completely wrong idea of how high-K R&D has been done.

"Looking at current market conditions it would appear it is working just fine :)"

Are you blind or what? Didn't I just speak of all the places that Intel's platform is not "working just fine"?

It's not competitive in HPC, where computational power is needed most; it doesn't have better performance per watt and is less competitive for data centers, where again processing power is needed most; it doesn't scale performance well to either high clock rates or high socket counts; and the system architecture (regardless of workload) is so poor that it can hardly be extended with specialized hardware: AMD's CTM is in production while Intel's EXOCHI is still in the lab.

Manufacturing brute force, or the ability to sell crap and make a lot of money, doesn't solve real-world problems.

enumae said...

Abinstein
Where did you see me down-playing Intel's high-K?


Thanks for taking the time.

My quote could have been misleading, but in the context of this blog, microprocessors from Intel and AMD, "Intel is probably the first...", where do you get "probably"?

Aren't they the first?

But like I said, it could have been my interpretation.

...you would've noticed that it says "we and others" in all statements...

I am not questioning that others are doing it, but you (in my opinion) don't seem to give Intel the amount of credit that others do, so I am wondering why. Is it not as important as others have portrayed it?

Are you blind or what?...

Once again... Thanks anyways.

Ho Ho said...

scientia
"No. There is no link to Intel's alternative version of AMD64. This predated the initial AMD64 derived version in Prescott (which was itself incomplete)."

I would be happy to get a link that would say there was anything besides IA64 and EM64T. So far I've seen nothing. If no such link can be found how can you say it existed?


"No. There is no link showing that MS rejected Intel's alternative version. Internal negotiations are not typically pasted on the corporate website."

Again, if no information was released where did you learn about it?


"No. There is no link showing AMD's highly secretive internal research. The SSE5 research has moved beyond software simulation and is at least in initial hardware prototyping."

First you say there is no data, and then you say they have the first HW prototypes.


"No. There is no link showing AMD's G3MX prototypes. I believe they are doing work with Micron on the prototypes and are fairly far along."

Yet again no information, but you are still quite sure about what you say.

Basically you have absolutely no data to back your claims but you are still claiming I'm wrong. Why so?

Unknown said...

Lex, please don't repeat yourself...

Andy,
a) high-k is not a recent development
b) high-k is merely a progression in manufacturing, much like aluminum to copper interconnects, SOI, or process shrinks. It's apparently very useful now, but may not be later, and was certainly not before, due to the obvious hurdles.

Also, andy, if you didn't notice, the products Intel was producing that went into the BlackBerry were very competitive. Even when you consider the amazing success the iPhone has been, BlackBerrys are one of the most amazing successes in the cell-phone market currently. I mean, their users need help getting treatment for being addicted to them! I don't see how that's an inferior product.

The problem was, their development scheme was too ineffective and wasteful for the returns that market generated. They had a great product, but they were also spending way too much to make it great.

Ho Ho said...

scientia
"As far as Larrabee having the full set of x86 instructions you are wrong."

I haven't said "full set". My guess is x86-64 with no 32-bit backwards compatibility.


"Larrabee does not contain SSE5 nor is it likely to implement anything like SSE5. Larrabee is a stripped down x86 ISA with a lightweight architecture"

Are you honestly claiming there won't be any kind of MADD-like things done in a single instruction? Do you know of any GPU that doesn't have it?


"Both of these designs have limited general computing functionality. This is equally true of Larrabee."

Do you remember the slide that got censored out of a presentation, the one that showed a 4P Larrabee-only system? That would mean Larrabee is good enough to run a full OS. If you've forgotten that, I can give you a link.


"Each core only has 32K of L1 cache and 256K of L2. The cache bandwidth is low and the ring bus design has a lot of latency."

Yes, L1 is as big as on Core2 and L2 is a lot smaller, but thanks to 4-way SMT, latency won't be too big a problem.

How much do you know about cache bandwidth, and what tells you it is low? I wouldn't be surprised to see the bandwidth being much more than in Core2, unless Intel goes crazy and adds >64 512-bit SIMD registers.

Also, I'd like to know how much latency the ring bus will add, in your opinion. At most the data has to travel through half the links in the ring bus, as Larrabee will have memory controllers at at least two ends of the ring bus, possibly more if it has more than two memory controllers. With a GPU setup the GDDR chips will likely be connected at regular intervals (16 32-bit/8 64-bit channels with a 2.5GHz effective memory clock), lowering the latency even more. It would probably be somewhat similar to what ATI has used in the R520 and upwards.


"Gesher might be better but I don't know yet if it is a subset ISA like Larrabee."

Gesher is a full-blown x86 CPU capable of achieving 28 double-precision GFLOPS per core at 4GHz when SSE is used (7 per cycle), ~100-200 with 4-8 cores.

Larrabee will be doing 14-40 DP GFLOPS per core with SSE (8-16 per cycle) and 200-1000 for the entire chip, depending on clock speed and core count. Single precision will likely be higher.

So, with up to 16 64-bit floating point calculations per cycle on 512-bit wide SIMD, Larrabee must be able to execute at least two SIMD instructions per cycle. FMADD, anyone?

Also, you were guessing that Gesher, with its 200GFLOPS DP performance per CPU, would need to be using a GPU to achieve such performance. Well, it doesn't, and it also doesn't have 16 cores but 8. My guess is that it'll also have FMADD-like single-cycle instructions in addition to wider SIMD units.
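To make the arithmetic behind those estimates concrete, here is a minimal C sketch (the helper name is mine; the per-cycle and core-count figures are the estimates quoted above, and the 2GHz Larrabee clock is an assumption, not a confirmed spec):

    #include <stdio.h>

    /* Peak GFLOPS = flops per cycle per core x clock (GHz) x core count. */
    static double peak_gflops(double flops_per_cycle, double ghz, int cores)
    {
        return flops_per_cycle * ghz * cores;
    }

    int main(void)
    {
        /* Gesher estimate from above: 7 DP flops/cycle, 4GHz, 8 cores */
        printf("Gesher:   %.0f DP GFLOPS\n", peak_gflops(7, 4.0, 8));
        /* Larrabee estimate: 16 DP flops/cycle, 24 cores, 2GHz assumed */
        printf("Larrabee: %.0f DP GFLOPS\n", peak_gflops(16, 2.0, 24));
        return 0;
    }

Both results, 224 and 768 DP GFLOPS, land inside the ranges quoted above.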


"Your reference to hardware with only a 32 bit FP unit is nothing like proof of Larrabee's prowess"

Excuse me, but did you miss the point of why I showed it? I gave the link to show that Intel has already implemented somewhat similar things (relatively general-purpose cores with special accelerators). It has nothing to do with what kind of ISA the cores run or what their performance is.


"However, I will say that your talk about Larrabee is almost identical to what was said about Cell."

Cell is very different from Larrabee. Pretty much the only things in common are the big core count and massive computation throughput. Larrabee is a lot easier to code for than Cell and has far fewer limitations.


"Larrabee has Cell's racetrack memory layout and a similar lightweight design."

Yes, they both have a ring bus, but the memory layout is very different. Cell SPUs basically have a relatively small local store that is faster than the L2 in most CPUs. SPUs have huge latency to RAM (~3x higher than on Intel), but that is not because of being attached to the ring bus; it is because they have no direct access to it.

Larrabee likely has lower latency to RAM than Cell, and automatic prefetching will also help a lot. Cache latencies are said to be 1 cycle for L1 and 10 for L2, and the ring bus transmits up to 256 bytes per cycle, 2.67x more than Cell.


"It takes multiple Larrabee cores to match one modern x86 core in terms of general computing."

No doubt about that. The only "problem" is that while x86 probably won't have more than 8 cores by then, Larrabee will start at around 24 real cores and execute 96 threads in parallel. Sure, it probably won't speed up internet browsing, but anything that has been half-decently parallelized should be flying on it, especially if it needs a lot of computational power.


"You might amazing speed in some specialized areas but this will be matched in x86 designs with things like Fusion anyway."

Are you saying that we will be seeing some CPU instructions offloaded to the GPU with Fusion? Any wild guesses about the latency of those instructions? I don't see Fusion being more than a slightly better CTM in the near future. Also, that would look quite similar to Cell, only the GPU is even less usable than the SPUs.

Ho Ho said...

aguia
"In my country:
SAPPHIRE 2900 PRO 1GB 280€
SAPPHIRE 2900 PRO 512MB 230€
XFX GeForce 8800GTS 320MB 320MB 310€"


Estonian resellers have changed NV prices. I can get a brand new 320MB GTS for around €240. Though no 2900 PROs are anywhere to be seen.

Ho Ho said...

abinstein
"Larrabee is not an x86 upgrade; it is not an x86 extension (like SIMD extension) or expansion (like additional GPU cores - Fusion)."

Well, it does have SIMD extensions and special HW accelerator extensions.


"Larrabee will target a totally different problem areas than both SSE and Fusion."

In addition to GPU workloads Intel is also talking a lot about HPC and RMS.


"I think it's pretty soon for Intel's SSSE3 and SSE4 to show "dramatic performance improvements", at least on Intel designated benchmarks, wasn't it?"

Yes, in a couple of benchmarks. The majority of code is not exactly well optimized.


"They are two different things. Light-weight threading attempts to break a single application to multiple small threads. Virtualization attempts to run multiple system on a single system. In order to do virtualization efficiently, system image states must be efficiently saved and loaded; having more software-exposed threads can actually make this harder and slower."

Are you claiming that LWT will make saving and restoring system states harder? This is hardly a problem, as I doubt anyone saves and restores their VM state very often, and even then the majority of the time goes to saving RAM contents.

If you are saying that managing more threads is more expensive than managing fewer threads, then I remind you that LWT is trying to solve this exact problem and make it as efficient as possible. From the paper I could conclude that they have made great progress.


"Have you actually read the paper that you quote yourself?"

I have, but I have my doubts about whether you understood it correctly.


"Honestly it's nothing newer or more novel than what's been proposed in the academia, and scientia has a point that in terms of implementation it's not more than either Cell or even AMD's CTM."

Cell doesn't have very good thread-switching capability unless you count software-based fibers. CTM runs on the GPU, and later GPUs have a HW task scheduler similar to what LWT will bring to CPUs.


"Still, in order for the idea, it looks at only applications that can be completely parallelized. Frankly it's not any big percentage of programs out there."

So single-threaded performance is still more important? What programs are slow on your PC that are not or cannot be parallelized?


"It is "multi-operand in a single instruction" that makes SSE5 (and Altivec, for the matter) different. Larrabee, being implemented with a subset of x86, certainly does not have this."

Read my previous comment about Larrabee DP throughput and give your own explanation of the numbers.


"Vector conditional doesn't make sense for GPUs. Do you have an example of it?"

You do know that GPUs have been supporting conditional branches for quite some time, don't you? Also, ray tracing can benefit quite a lot from them.

Ho Ho said...

abinstein
"It's been shown multiple times that AnandTech is nothing but an advertising arm (or maybe finger) of Intel."

Do you know of anyone better who has been benchmarking mobile CPUs lately? Also, are you claiming that AnandTech did something wrong with their review when they showed that an Intel laptop with lower battery capacity and a bigger screen had longer battery life than the AMD one while offering more performance? To me that says performance-per-watt leadership for Intel, at least in mobile solutions.


"Didn't I just speak of all the places that Intel's platform is not "working just fine"?"

Their earnings reports disagree with you. If something were not working it would show, as it did with Netburst. The FSB is not the best thing out there, but it is good enough for 1P and decent for 2P. Basically, Intel's platforms are good enough for the markets it earns most of its money from. Having a product doesn't mean much if you are losing money on it.


"AMD's CTM are in production while Intel's EXOCHI is still in the lab."

So how widely used is it, and how much money does it earn them? Last quarter they had around $200M in revenue and $50M in losses from the GPU business.

Ho Ho said...

Btw, scientia, you still haven't answered my questions.

abinstein said...

Ho Ho -
"Their earning reports disagree with you."

Again, you, like most Intel fanbois, are mistaking the ability to make money for the ability to make good products.

The most profitable car maker does not make the best or most fuel efficient car; the most profitable software maker does not write the most reliable program or operating system.

When I show you where Intel's processors have evidently been lagging behind AMD's, and where people have chosen AMD for the strength of its products, it helps none of your argument to show Intel's earnings. Just because there are lots of fools buying Intel's products doesn't make those products any better than they are.

abinstein said...

Ho Ho -

I have no desire to argue "Larrabee" with you because you obviously do not have enough background knowledge; how can you even make claims like "anyone saves and restores their VM state too often" and "even then the majority of the time goes to saving RAM contents" when you don't actually know what goes on during VM switches (and there is a lot going on)?

Just believe whatever you want about Larrabee. I can tell you one thing for sure: the 80-core dream is not much different from the 10GHz processor one; its x86 compatibility is also not going to be much better than Itanium's.

Ho Ho said...

abinstein
"I have no desire to argue "Larrabee" with you because you obviously do not have enough knowledge background"

Tell me in (at least) one word whether you agree with me that Larrabee will have multi-operand SIMD instructions that are executed in a single cycle (FMADD), or not.



Now I understand what you meant about LWT and virtualization. Your talk about system image states confused me at first. From what I know, it makes almost no difference to the CPU whether the threads it runs are from a native or a virtualized OS, especially with further advances in virtualization.

One thing is for sure: massive threading is here to stay, and others will have to adapt. LWT will only make it easier. Arguing as if it were not worth mentioning would, by the same standard, make many AMD advances worthless.


"I can tell you one thing for sure; the 80-core dream is not much different from the 10GHz processor one"

Well, considering we will have 48 physical cores in Larrabee around 2009/2010, I'd say that 80 is not that far away.

Commenting more on EXOCHI, I can say that Intel has demonstrated working products, though I'm not sure whether they actually sell anything yet. My guess is not.



Any comments on the GPU vector conditionals?
What about programs that would be too slow on a Larrabee-like architecture? I want to hear some specific examples, not just "anything that isn't threaded".

Ho Ho said...

gutterrat
"I laugh when people on this blog make claims about knowing what "Larrabee" is or what it has or does not have"

Everything I've said is based on what information Intel has released. I'm not sure where others get their information.

Scientia from AMDZone said...

andy

"Almost as bad as investing $5bn in 1 hit eh?"

No, not unless AMD has to sell ATI for $600 Million.

Intel started in 2000 and plowed money into XScale trying to gain market share. When it flopped, they sold at a loss. In contrast, the ATI purchase should be profitable for AMD. So, not alike at all.

Scientia from AMDZone said...

mo

"AMD still miles behind Intel in the mobile front."

Not exactly miles but AMD will stay behind Intel until AMD releases their new mobile chipset in early 2008.

Scientia from AMDZone said...

ho ho

"My guess is x86-64 with no 32bit backwards compability."

Then we can discuss it when you have an actual Larrabee instruction set reference. I do have an SSE5 reference from AMD.

"Are you honestly claiming there won't be any kind of MADD-like things done in single instruction?"

No. I'm saying that on the whole, Larrabee, like Cell, will not be a robust general purpose processor. K10 most certainly is.

"Do you remember the slide that got cencored in a presentation that showed 4P Larrabee-only system?"

Why wouldn't I remember it? It was from a link that I gave. It is here at Ars Technica.

"That would mean Larrabee is good enough to run a full OS."

You need to read more closely. First of all, the application is for HPC, not the desktop. I've already mentioned that embedded Power and Cell were used in HPC, yet neither of these is useful on the desktop.

a later slide gives a block diagram of a higher-end, HPC-oriented variant of Larrabee that uses Intel's forthcoming common systems interconnect (CSI) to gang together four 24-core processors.

Secondly, IBM's Cell based design was very poor in terms of general computing so it had to be combined with Opteron processors. Curiously, the Ars Technica article says the same thing about Larrabee:

"It's fairly clear from the block diagram that this layout shows a four-socket server design where all four sockets contain Larrabee parts. Such a design would be one node in a compute cluster that would almost certainly contain general-purpose CPUs from Intel as well."

"If you've forgotten that I can give you a link."

I assume you've forgotten that I already gave the link.

"Gesher is a full blown x86 CPU capable of achieving 28 double-percision GFLOPS at 4GHz per core when SSE is used (7 per cycle), ~100-200 with 4-8 cores."

What is your reference for Gesher?

"Larrabee is a lot easier to code for than Cell and has far less limitations."

Where are your instruction and architectural references for Larrabee?

"Are you saying that we will be seeing some CPU instructions offloaded to GPU with Fusion? Any wild guesses about the latency of those instructions? I don't see Fusion being more than a bit better CTM in the near future."

The entire Fusion process is 3 phases. The third phase is low latency.

" Also that would look quite similar to what Cell is only that GPU is even less usable than SPUs."

The third phase is nothing like Cell.

Ho Ho said...

scientia
"I assume you've forgotten that I already gave the link."

No I have not. You've forgotten that I've got (and have shown here) the presentation that includes the slides that got censored out a few hours after publication.


"What is your reference for Gesher?"

The un-censored presentation. See page 31 for performance numbers and page 17 for the 4P Larrabee machine. Do you need me to look up the previously linked pages talking about the 48-core Larrabee chip as well, or do you remember that one?


"Where are your instruction and architectural references for Larrabee?"

It doesn't take a genius to figure out that not having to manage memory manually and having a unified instruction set across all the cores makes a world of difference in terms of simplicity.


"The entire Fusion process is 3 phases. The third phase is low latency."

Roughly when will the third phase be ready? In what phase are they at the moment?

Ho Ho said...

I just noticed that most of the information I gave about Gesher is in the Ars article you linked to. Didn't you read it yourself, since you had to ask where I got my information?

abinstein said...

Ho Ho -
"From what I know it makes almost no difference for CPU if the threads it runs are from native or virtualized OS, especially with more advances in virtualization."

Then what you know is wrong. According to VMware, the nested paging added in Barcelona makes almost an 80% difference in performance. Note that nested paging helps nothing but (VM) context switches.


"One thing is for sure: massive threading is coming to stay and others have to adapt."

This is not entirely correct. Massive threading will become prevalent once GPGPU hardware takes off (note that it is already possible to do massive threading on Sun Niagara/Rock processors). However, there are and will be applications that neither need nor benefit from "massive threading."


"LWT will only make it easier. Arguing as if it is something not worth mentioning would in comparison make many AMD advances as worthless."

You are mistaking Intel's LWT demo with the LWT R&D in general.


"Well, considering we will have 48 physical cores in Larrabee around 2009/2010 I'd say that 80 is not that far away."

Considering we had a 3.8GHz Pentium 4 around 2004/2005, how far away is any 4.8GHz processor from Intel?


"Commenting more on EXOCHI I can say that Intel has demonstrated working products, though I'm not sure if they actually sell anything already or not. My guess is not yet."

No, it is not, because the "simple OS & library update" you talked about actually requires extensive software modification to take advantage of it.

Unknown said...

Sci,

You keep asking Ho Ho for links to prove his arguments, but you yourself have not provided any of the links Ho Ho asked for....

It would be nice to back up what you say.

Ho Ho said...

To greg about Silverthorne power usage:

http://forum.beyond3d.com/showpost.php?p=985363

From the numbers it seems to be about as good as my ancient Jornada 720 with its 207MHz ARM CPU. That can also work for around 7-8h while mostly idling. The difference with Silverthorne will be around 7 years and several orders of magnitude in performance while keeping roughly the same battery life. I wouldn't mind at all having such performance in my palm.

abinstein said...

Ho Ho -
"There is some more information about Larrabee on B3D. With that information I confirmed that Larrabee should be able to do 32 single-precision floating point calculations per clock per core with FMADD. This is 8x more than Core2 or Barcelona can do."

Again this shows that you don't read the information you link very well. Honestly, it limits your grasp of reality.

First, the "Throughput Core" does not do 32 FP per cycle, but 4 threads of 16 FP every two cycles. There is a subtle but significant difference between the two.

Second, as I said, the "Throughput Core" is already there in Sun's Niagara and Rock. It seems to me Intel is running out of tricks to impress its believers, who keep being impressed nonetheless.

Third, the non-independent Intel source does not mention FMADD at all. It does not mention anything remotely similar to SSE5 either.

I'm sorry, but you truly make yourself look like a serious fanboi. It is not others' job to prove SSE5 is not in Larrabee; it is your job to show that it is, and obviously you can't except with primitive speculation. Such stupidity even carried you so far as to "keep waiting" for others' anti-proof; seriously, you need to review your elementary logic.

And in case you are "still waiting" for my response, I'm not going to respond to your comments about thread management/EXOCHI/LWT, which you said don't make sense at all.

GutterRat said...

We're still waiting for the Shanghai tapeout announcement. Do you have any information on whether AMD is on track with Shanghai? Is anyone expecting AMD to say that Shanghai has taped out in their October 18 earnings conference call?

Also, to answer the 2nd comment in this post by scientia: you should give credit to Microsoft, in addition to AMD, for designing the 64-bit extensions to AMD's x86 product.

Ho Ho said...

abinstein
"First, the "Throughput Core" does not do 32 FP per cycle, but 4 threads of 16 FP every two cycles

I highly doubt that. My guess is that the SMT there is similar to what was in Netbursts.


"Second, as I said, the "Throughput Core" is already there in Sun's Niagara and Rock"

... and FMADD and other SSE5 stuff is present in other CPUs. What does that mean?


"It is not others job to prove SSE5 not in Larrabee; it is your job to show it is"

Why should I prove something I've never said? I repeat for at least the fifth time that I'm quite sure Larrabee won't be SSE5 compatible, but I'm quite sure it has FMADD and other similar functions that have been in GPUs for ages.

Axel said...

Proof that the reason for AMD's silence on K10 all year is that the new product is too little, too late for anything but high-bandwidth server apps, as has been evident since the first benchmarks started leaking months ago. Not sure what AMD has been smoking for the last couple of years in the R&D department, but in single-threaded apps K10 is faster than K8 by only:

7% IPC in base int
10% IPC in peak int
8% IPC in fp (base & peak)

Per clock, Clovertown at 2.0 GHz is faster than K10 in all those measures:

+35% IPC in base int
+31% IPC in peak int
+29% IPC in base fp
+43% IPC in peak fp

Penryn is only a month away and will only widen this lead, at lower power usage and smaller die size, plus with SSE4. Barcelona and Phenom are exactly the same processor, so we already know how Phenom will perform. Penryn's higher IPC and clocks than K10's make it obvious to anyone but the wantonly oblivious that Penryn will simply crush K10 in the workstation & desktop spaces, at substantially lower production cost.

Intel will thus control pricing in those markets and will easily force Phenom X4/X2 to be priced lower than Yorkfield/Wolfdale at the same clock, as I predicted in the comments for Scientia's blog entry "AMD: Limited Options". To which Scientia replied with his own prediction: "No. A 2.5Ghz Kuma is likely to be priced the same as a 2.66Ghz Wolfdale." Wrong (again).

Since AMD's revenue base is mostly dependent on desktop & mobile, they will continue to sustain great losses through 2008 unless they drastically reduce costs.

Unknown said...

I don't think they are finished, but they are sure as hell having one hell of a coaster ride......... down.

I can see AMD changing its market. It will become a company more geared toward HPC and mega datacenters. It may still have a very small presence in the low-end consumer market or the embedded market.

But as far as growing vastly in the consumer market, mobile and desktop.... along with 1P-2P servers, I don't think it will stay in there.

Scientia from AMDZone said...

ho ho

"doesn't take a genius to figure out that not having to manage memory manually and having unified instructionset over all the cores makes a world of difference in terms of simplicity."

Which means nothing in terms of the instruction set.

"Roughly when will the third phase be ready? In what phase are they at the moment?"

Phase 1.

Scientia from AMDZone said...

mo

"You keep asking Ho HO for links to prove his arguments but you, yourself have not provided any of the links Ho HO asked for...."

No, these two are not the same. I've made general statements without links. This is similar to when I said that AMD was doing 2.4Ghz in their labs with Barcelona. This statement by me without links was blasted by some here. However, it then turned out to actually be an understatement when AMD demonstrated 3.0Ghz.

Ho ho's comments are not general. He is attempting to make specific technical points about which he is almost certainly incorrect. However, I've given up trying to show him that he is incorrect until we have an actual Larrabee reference. It is possible (although unlikely) that such a reference could prove Ho ho correct.

"would be nice to back up what you say."

I've mentioned things that are sensitive; there won't be any links because it is not public information. However, you are free to disbelieve it just like you did the 2.4Ghz speeds.

abinstein said...

Ho Ho -
"... and FMADD and other SSE5 stuff is present in other CPUs. What does that mean?"

I don't understand your logic. Multiply-add is a universal instruction and it's nothing special. However, SSE5 doesn't exist in any other processors. You may say Altivec offers similar functionality to SSE5, but Altivec does not offer SSE5, just as SPARC64 does not offer x86-64.

When people praise SSE5, they/we are definitely not praising its SIMD nature or multiply-add capability; instead, we praise its orthogonal design that's added neatly to the messy x86 ISA. Just compare SSE5 with SSE3/4 and you get an idea of how much better AMD's designers are than Intel's at making sensible extensions.

OTOH, Intel's Larrabee is touted by the company for nothing but "adding massive multi-core to CPU", which unfortunately is nothing but very old news. And you praise it... for what? I mean, if you have actual knowledge about how neatly the multi-core design is integrated with (full) x86 compatibility and how much more efficient it is than prior implementations, feel free to contribute; otherwise you're just a fanboi cheering whatever you hear from Intel.

abinstein said...

Ho Ho -
"I'm quite sure there Larrabee won't be SSE5 compatible but I'm quite sure it has FMADD and other similar functions [in GPUs]"

Well, it's time for you to prove it. What say you?

abinstein said...

Ho Ho -
"My guess is that the SMT there is similar to what was in Netbursts."

Again, had you actually read the link you referenced yourself you would have known this is definitely wrong.

Larrabee's SMT will be more like Sun Niagara's than Netburst's, because Larrabee and Niagara are both in-order designs.

Mo said...

Sci, have you seen the latest SPECint and SPECfp results for 1P and 2P servers that AMD submitted?

Isn't that a huge server market (1P and 2P)?

What are your thoughts on the results?

SPECint2006 (base/peak) - 2P system
AMD 1.9GHz - 9.97/11.3 (K10)
AMD 3.2GHz - 14.1/15.2 (K8)
Intel 2GHz - 14.2/15.6
Intel 3GHz - 18.9/20.8

SPECfp2006 (base/peak) - 2P system
AMD 1.9GHz - 10.7/11.2 (K10)
AMD 3.2GHz - 14.2/14.5 (K8)
Intel 2GHz - 14.5/16.9
Intel 3GHz - 18.4/21.4
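To put those on a per-clock basis, a quick C sketch (the helper name is mine; scores and clocks are copied from the int table above, purely illustrative):

    #include <stdio.h>

    /* Per-clock rate = SPEC base score / clock in GHz. */
    static void per_clock(const char *name, double score, double ghz)
    {
        printf("%-18s %.2f base points per GHz\n", name, score / ghz);
    }

    int main(void)
    {
        per_clock("K10 1.9GHz int",   9.97, 1.9);  /* ~5.2 */
        per_clock("K8 3.2GHz int",   14.1,  3.2);  /* ~4.4 */
        per_clock("Intel 2GHz int",  14.2,  2.0);  /* ~7.1 */
        per_clock("Intel 3GHz int",  18.9,  3.0);  /* ~6.3 */
        return 0;
    }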

Scientia from AMDZone said...

GutterRat

"We're still waiting for the Shanghai tapeout announcement."

Yes, I would have expected this already.

" Do you have any information to give whether AMD is on track with Shanghai?"

If we consider "on track" to be a Q3 08 release then I'd have to say it depends. At the moment I don't know whether AMD has positive Shanghai tests in hand or is waiting on the next batch. They could stay on track with the next batch but not likely if it takes one after that.

Scientia from AMDZone said...

Mo

"I can see AMD changing it's market. It will become a company more geared toward the HPC and mega datacenters."

Not likely.

" It may still have a very small presence in the low end consumer market or embedded market."

This statement is delusional.

"But as far as growing vastly in consumer market, mobile and desktop.... along with 1p-2P server, I dont think it will stay in there."

Desktop, mobile, and server. AMD is shooting for 30% share by end of 2008. They should be close to this.

Scientia from AMDZone said...

mo

Even an obviously biased Intel fan like Ou claimed that Barcelona would compare worst against Intel at 2.0GHz. I'll wait for faster clocks before I start saying the sky is falling (as Axel and Giant have been doing for the past year).

ho ho

What questions do you think I am ignoring? I've already told you that I will not discuss technical points on Larrabee without an actual instruction reference. Your view of Larrabee is way too optimistic.

Scientia from AMDZone said...

Also, I see no reason to talk about Sandy Bridge (Gesher) since the specs for Nehalem are not even completely known yet. However, I'm sure Sandy Bridge will improve on Nehalem. The big question to me is whether Intel will have had to cave in on SSE5 by then.

Aguia said...

It may still have a very small presence in the low-end consumer market or the embedded market.

Do you call it the low-end consumer market when there are ready-for-launch platforms with three and four PCIe 16x slots?
What do you call Intel's 1 PCIe 16x slot + 1 PCIe 4x slot then, extremely low end or the cell phone market?
LOL


Scientia, you are delusional if you believe that AMD is going to have 30% market share by the end of 08.

Just as AMD expected to break even by the end of 07.

Why not? Do you know what date it is today?

Ho Ho said...

scientia
"What questions do you think I am ignoring?"

Now where should I start.
1) When will G3MX and SSE5 start having an effect on desktop? +/- one year should be good enough.
2) Why isn't LWT robust enough for virtualization?
3) What makes LWT so similar to Cell?
4) Do you think that when every single programmable GPU has had FMADD instruction Larrabee won't have it?
5) How does G3MX make motherboard design less complex compared to current designs?
6) Where did you learn about Intel's "secret" 64-bit architecture that MS didn't accept and Intel stopped developing?
7) How much do you know about Larrabee cache bandwidth that you can say it is low?
8) What must be done for Fusion to accelerate common programs? Write FP-heavy stuff to be run on GPU?
9) Roughly when will the third phase of Fusion be launched (+/- 1 year)?



"No, these two are not the same. I've made general statements without links."

As I said, my claims are based on analysis of released data. You have given no links to data, just some theories.

For example, I said that GPUs have had FMADD for ages. Larrabee is a GPU (probably with the possibility of running as a CPU). Intel's presentation numbers show it must run at least two 512-bit DP SIMD instructions per cycle. Plain logic says there must at least be some 3-operand instruction.
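To make that concrete: a fused multiply-add is a 3-operand operation that computes a*b + c in one step. C99's standard fma() expresses the scalar form of it; this is purely illustrative and says nothing about Larrabee's actual ISA:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double a = 1.5, b = 2.0, c = 0.25;
        /* fma() computes a*b + c with a single rounding: the scalar
           analogue of the 3-operand FMADD instructions in GPUs. */
        printf("%g\n", fma(a, b, c));  /* prints 3.25 */
        return 0;
    }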


"He is attempting to make specific technical points about which he is almost certainly incorrect."

What makes my claims more incorrect than your claims of LWT not being robust enough, G3MX simplifying motherboard design, Intel having a secret 64-bit architecture, and AMD having prototypes of G3MX?

Basically it seems as if you are trying to make similar technical points, but for some reason you don't need to prove them. We should simply believe that what you say is plausible.


"there won't be any links because it is not public information"

So what sources did you use to get that information?


"Also, I see no reason to talk about Sandy Bridge (Gesher) since the specs for Nehalem are not even completely known yet"

Ho Ho said...

abinstein
"OTOH, Intel's Larrabee is touted by the company for nothing but "adding massive multi-core to CPU", which unfortunately is nothing but very old news. And you praise it... for what?"

I praise it for bringing a whole new level of performance to commodity PCs. I've been researching real-time ray tracing for years, and at last I see something that can bring it to reality.


"and how more efficient it is than other prior implementations"

What prior implementations? GPUs?


"Well, it's time for you to prove it. What say you?"

Nothing new; I've said it already multiple times. I'd still like to know why Intel should remove GPU functionality from Larrabee.

Unknown said...

Hoho, read the fine print. I challenge you to find any normal cell-phone battery capable of actually storing 6 Wh. Most cell-phone manufacturers are happy with just 1 or 2 Wh.

Also, I'm not saying having Silverthorne in a cell phone wouldn't be cool, but I am saying that since other companies that produce mobile products will also continue to develop new designs, Intel's challenge is not just developing the product further, but out-developing everyone else by a considerable margin.

abinstein said...

Ho Ho -

One more example of your terrible logic...

"Nothing new I've said already multiple times."

So in other words you have no knowledge or proof of which SSE5-like or GPU-like instructions will be present in Larrabee.


"I'd still like to know why should Intel remove GPU functionality from Larrabee."

If you don't even know whether any such instruction is present, then how/why are you talking about "removing" it? How do you remove something that's probably not even there in the first place?

Or, as I said, do you have any reason or proof that Intel is adding some special GPU/SSE5-like instructions? You don't, do you?

InTheKnow said...

Greg said ....
Also, I'm not saying having Silverthorne in a cell phone wouldn't be cool, but I am saying that since other companies that produce mobile products will also continue to develop new designs, Intel's challenge is not just developing the product further, but out-developing everyone else by a considerable margin.

First off, I agree with you that Silverthorne isn't where it needs to be to get into a cell phone. Even the rumors of putting an Intel processor in the iPhone assume it will be with the follow-on product to Silverthorne based on the Nehalem processor design.

However, I don't think Intel has to run that much faster to catch the competition. It is a case of diminishing returns. Let's make up some numbers to illustrate my point.

Say for example the current solution (we'll call it ARM for ease of reference) uses 0.1W when active and 0.01W when idle. Intel has an offering that uses 1.0W when active and 0.5W when idle.

Now let's say the ARM solution is able to achieve a massive breakthrough and cut the numbers by 50%. Now they run at 0.05W and idle at 0.005W. However, if Intel hits its goal to do the same thing (a 50% reduction) with their next-gen device, they are at 0.5W under load and 0.25W when idle.

The gap was 0.9W under load and 0.49W at idle. With the next-gen devices from Intel and ARM, the gap has narrowed to 0.45W under load and 0.245W when idle. So the difference has narrowed even though both devices achieved the same proportional improvement. As the power gap narrows, the power of the x86 architecture gets closer to offsetting the shorter battery life when looked at as a value proposition.

This oversimplified example doesn't even begin to cover the cost of that 50% power drop as you get closer and closer to zero. So for the existing architecture to cut its power in half as listed above will take a lot more time, money, and effort than it will for Intel, which still has a lot further to go.
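A minimal C sketch of that arithmetic, using the fictional wattages from the example (variable names are mine):

    #include <stdio.h>

    int main(void)
    {
        /* Fictional active-power figures from the example above. */
        double arm = 0.1, intel = 1.0;
        for (int gen = 0; gen < 4; gen++) {
            /* Both vendors halve power each generation, so the
               absolute gap halves too: 0.9, 0.45, 0.225, ... watts */
            printf("gen %d: gap = %.4f W\n", gen, intel - arm);
            arm *= 0.5;
            intel *= 0.5;
        }
        return 0;
    }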

Scientia from AMDZone said...

Ho Ho

"1) When will G3MX and SSE5 start having an effect on desktop? +/- one year should be good enough."

The PGI compiler should support SSE5 before Bulldozer is released, so software supporting it could appear soon afterwards. Naturally, the Intel compiler is not likely to support it. G3MX should give a boost to servers and workstations as soon as Bulldozer is released. To be honest, this should affect desktops in less than a year as production is shifted to simpler DIMMs; however, it should have positive effects for about two years.

"2) Why isn't LWT robust enough for virtualization?"

Virtualization requires a full context switch. LWT by definition runs in the same address space.
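To illustrate the distinction, a minimal pthreads sketch (a generic illustration of a shared address space, not anything specific to LWT): threads share one address space, so switching among them never swaps page tables, while a VM switch must swap that whole context:

    #include <pthread.h>
    #include <stdio.h>

    static int shared = 0;  /* one address space: visible to every thread */

    static void *worker(void *arg)
    {
        (void)arg;
        shared = 42;        /* no page-table or TLB switch needed */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);
        printf("shared = %d\n", shared);  /* prints 42 */
        return 0;
    }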

"3) What makes LWT so similar to Cell?"

You are confusing Lightweight Threading with Lightweight architecture. Any processor can use LWT. However, standard processors are not lightweight. I meant that Larrabee cores are similar to other lightweight architectures like embedded Power and Cell. These have limited application.

"4) Do you think that when every single programmable GPU has had FMADD instruction Larrabee won't have it?"

It doesn't make any difference. For the last time: Larrabee does not contain SSE5, nor is it likely to have a similarly robust instruction set. When you get an actual Larrabee reference we can discuss it further.

"5) How does G3MX make motherboad design less complex compared to current designs?"

This one should be obvious. Instead of running serpentine traces (of equal length) for the full width of the memory channel, you only need a much narrower serial path. Even if the total traces are of similar number, they are still treated as separate units and are therefore exempt from having to have the same trace length. In other words, using serial paths reduces problems with signal skew. It is true that the data path is the same width from the buffer chip to the DIMM socket, but this is much shorter than from the processor to the DIMM socket. This can often allow a reduction in board layers.

Overall, the reduction in trace complexity is less than you would get with FBDIMM but since you only need half as many buffer chips you do get a separate reduction in cost, chip count, and power draw. This tends to be better since power draw is a recurring cost while board cost is fixed.

"6)"

Skipped.

"7) How much do you know about Larrabee cache bandwidth that you can say it is low?"

It is low compared to C2D or K10.

"8) What must be done for Fusion to accelerate common programs? Write FP-heavy stuff to be run on GPU?"

It depends on the code. Fusion can do it fairly easily if you rely on library calls. Tight integration would require additional instructions.

"9) Roughly when will the third phase of Fusion be launched (+/- 1 year)?"

It is looking like early 2010.

"Larrabee is GPU (probably with the possibility of running as a CPU)."

This means nothing for general purpose code.

" Intel presentation numbers show it must run at least two 512bit DP SIMD instructions per cycle."

Again, this has nothing to do with general purpose code.

" Plain logic sais there must at least be some 3-operand instruction."

You are talking about a completely different architectural context. This context has its own limitations (regardless of peak SIMD speed). Getting back to the original point was your attempt to compare Larrabee to Bulldozer with SSE5 and I still find this to be laughable. Again, and for the last time, Larrabee is not comparable to Bulldozer unless you believe it will be used on the desktop.

Scientia from AMDZone said...

intheknow

"As the power gap narrows, the power of the x86 architecture gets closer to offsetting the shorter battery life when looked at as a value proposition"

This is absurd. A cell-phone-like device is useless unless it has a battery life of at least a day. If you cut the power draw in half and extend the battery life from 1 hour to 2 hours, you still have something worthless. The fact is that you cannot trade processing power for battery life in a handheld device.

Mo said...

You gladly deleted the Spec benchmarks I linked to... I hope you are replying to them.

You also gladly deleted the VR-Zone link..... I hope you are replying to that as well.....

Scientia from AMDZone said...

mo

See if you can make a post without trolling.

VR Zone has never been reliable for roadmaps. Feel free to believe it if you like.

Mo said...

They are more reliable than Fudzilla and the INQ.

Assuming it's true, what are your feelings about the roadmap? Can we still expect 3.0GHz in Q1, as you have been claiming since we saw AMD demo a 3GHz system?

Ho Ho said...

greg
" I challenge you to find any normal cell-phone battery capable of actually storing 6 Whr"

Where exactly was I talking about cellphone? I was comparing it to my PDA.

Ho Ho said...

abinstein
"How do you remove something that's probably not even there in the first place."

It is your word against mine. I say that functionality has been in GPUs for ages, you say they will remove it. We shall see who is correct.


"Or, as I said, do you have any reason or proof that Intel is adding some special GPU/SSE5-like instructions?"

If you define "GPU instructions" as texture filtering and perhaps AA resolve then yes, it will have the functionality, though in out-of-core accelerators. If you think of SSE5-like functions then I repeat for once more, this functionality has been in GPUs for ages. Also the papers I've linked to seem to say there are similar functions.

Ho Ho said...

scientia
"To be honest, this should effect desktops in less than a year as production is shifted to simpler DIMMs however it should have positive effects for about two years."

Wouldn't memory manufacturers have to support older standards for several years after they aren't used in newer machines? For example how long do you think DDR1 will be produced? I doubt we could see any effects before at least three to four years has passed and that is assuming that Intel and all others also ditch their server RAMs.


"Virtualization requires a full context switch. LWT by definition runs in the same address space."

No it does not. Where did you get that idea?


"You are confusing Lightweight Threading with Lightweight architecture."


And you are confusing LWT with fibers. They are not the same.


"For the last time; Larrabee does not contain SSE5 nor is it likely to have a similar robust instruction set."

I've never said it contains SSE5. I've only said it has FMADD and other instructions found in GPUs that are meant for increasing FP throughput.


"It is true that the data path is the same width from the buffer chip to the DIMM socket but this is much shorter than from the processor to DIMM socket. This can often allow a reduction in board layers."

You cannot reduce PCB layers with shorter traces; only having fewer traces can do that.


Why did you skip #6? You seemed to be so sure of it. Don't say you have some internal sources and cannot talk about it. That would make your claims pretty much as believable as Sharikou's.


"It is low compared to C2D or K10."

You didn't answer the question: how can you say it is lower than in C2D and K10? Wouldn't you think that, having massive FP throughput and 4-way SMT, it would likely have much greater bandwidth than current CPUs?


"Again, this has nothing to do with general purpose code."

Then the same could be said about most FP instructions in SSE5.

InTheKnow said...

This is absurd. A cell phone like device is useless unless it has a battery life of at least a day.

First, the numbers are fictional, so drawing a direct conclusion from those numbers is bogus.

Second, you missed the entire point. As power requirements approach zero, proportional changes in efficiency produce smaller absolute change. And they are harder to achieve. Is my point clear now?

Scientia from AMDZone said...

mo

If AMD can't release a 3.0Ghz quad in Q1 08 then there is a problem.

Scientia from AMDZone said...

ho ho

"If you think of SSE5-like functions then I repeat for once more, this functionality has been in GPUs for ages."

This is nonsense. Limited SIMD processing is nowhere near robust logic processing. If you are talking about one tiny aspect of the Bulldozer ISA then you might be correct but this is a long way from matching Bulldozer on the desktop.

" Also the papers I've linked to seem to say there are similar functions."

Again, this has nothing to do with the use of Larrabee on the desktop.

Why is this concept so hard for you to understand? In terms of SIMD, all GPUs (Larrabee included) have enormous processing capability. However, a real general-purpose processor has to be capable of robust logic processing. I don't know of any GPU that has ever been designed for that.

The Connection Machine had similarly massive claimed theoretical processing capability because of its enormous 64K line width. It could process 64K bits per cycle. Yet when it came to running real programs it was far behind other architectures. Don't waste your time (and mine) quoting papers unless you have one that shows that regular applications run better on GPUs than CPUs.

Scientia from AMDZone said...

InTheKnow

"Second, you missed the entire point. As power requirements approach zero, proportional changes in efficiency produce less change. And they are harder to achieve."

Let's talk about reality for a change. There are three general classes of battery-powered devices: extremely low power, moderate power, and high power.

Low power includes things like my wristwatch. I expect it to run for years without changing the battery. Small, light, inexpensive. The interfaces are usually very limited, with text-only displays and single-level menuing.

Moderate power includes things like my digital camera, PDAs, and cell phones. These are moderately portable but use much more power than the previous category. These items are typically rechargeable. The interfaces are medium, with more graphics ability and more buttons. Three-level menuing is common. However, text entry is slow and tedious because of the need to reuse buttons.

High power is almost exclusively notebooks. These have much larger batteries than cell phones or digital cameras. These are not as portable. Notebooks have full interfaces like keyboards, large displays, and other ports for card reading or peripherals. Text entry is fast.

What Intel is attempting to do with Silverthorne is create a category in between the moderate and high power categories. This would have less processing power and performance than a notebook but more than a cell phone. I don't really see how this would be possible because of the additional weight and the lack of greater interface capability.

Axel said...

Scientia

If AMD can't release a 3.0Ghz quad in Q1 08 then there is a problem.

Even if AMD could release Agena at 3.0 GHz in Q1 08 in volume, I doubt this would restore them to profitability. Based on AnandTech, Tech Report, and the recent SPEC single-thread disclosures by AMD, we now know that a 3.0 GHz Agena will roughly compare in desktop performance to a 2.66 GHz Yorkfield. The latter is predicted to be priced at $316. Therefore AMD's entire quad-core Agena line would have to be priced at $316 and below to have fair market value. This will not raise AMD's ASPs sufficiently to counteract their furious cash burn rate.

Unknown said...

InTheKnow, even following your off-track line of thought, you eventually run into the fact that the companies that already ship processors with very low wattage requirements can now spend a vast amount of R&D on greater processing power.

Intel would have to focus most of its future R&D on matching what already exists in power usage, while those other companies could grow their current share and reputation by increasing their processing power. This will substantially weaken Intel's ability to break into this market.

I agree with scientia's analysis that this is simply Intel's move to try to create a new class of processor that's more useful to the handheld-PC market. However, I think what Intel fails to realize is that this market is stagnant for reasons other than performance and battery life.

What highlights this is that the strongest contender in this market (at least, as far as I know), the Nokia N80, doesn't even have a keyboard, has a fairly weak processor, but still gets decent battery life, and doesn't run Windows.

What becomes obvious is that the device is mainly desired by the technologically savvy for its ability to use open-source software and for its open platform that allows its purpose to be completely redefined by each user. As such, it's actually generally not used as a handheld PC but as whatever device the user has reconfigured it to be.

To summarize, the reason handheld PCs sell poorly is that they're PCs, which an x86 processor would only emphasize.

Hoho, while I realize you weren't talking about a phone, a PDA generally has a battery that is no larger.

Mo, you should quit whining. Scientia deleted my post summarizing what you were saying and how you approach your arguments, so it's not like he only moderates one side of the argument (though I'd actually argue that I don't sit on a side, which you'd surely disagree with).

Axel, not to sound ungrateful for the wonderful analysis you've provided us, but why do your sources literally and specifically contradict you?

abinstein said...

Axel, intheknow, and the rest of the lot...

A few things you Intel-lovers don't understand. First, nobody with a sane mind would care whether Agena X4 is faster or slower than Penryn QC on single-threaded applications (i.e., SPECint or SuperPi). If you really want single-threaded performance, then buy a cheaper dual-core with faster memory and a better video card instead.

What really matters for these multi-core processors is SPECint_rate and SPECfp_rate, and we've seen the 2.0GHz/2.6GHz Barcelona comparable to the 2.33GHz/3.0GHz Clovertown. Penryn brings somewhat better floating point, but with SSE4 that's probably due more to compiler acceleration than to hardware. In other words, unless server managers recompile their applications they're not going to benefit much from Penryn's high fp score.

Another thing is your belief that Silverthorne aims for cellphones. This is totally false information purposely spread by Otellini. First, a cellphone battery usually holds less than 10Wh, so even if the Silverthorne takes just 0.6W it's not going to survive a day in a cellphone that does nothing else. Second, even in terms of computation Silverthorne by itself is nothing; you still need the NB, SB, memory module, and ADC/DAC, to say the least, for any functionality.
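
To make the arithmetic explicit, here is a minimal Python sketch using only the figures above (the numbers are the ones quoted, not measurements):

    # Battery-life sketch: 10Wh battery, 0.6W for the CPU alone.
    battery_wh = 10.0              # upper bound quoted above for a cellphone battery
    cpu_watts = 0.6                # Silverthorne figure quoted above
    print(battery_wh / cpu_watts)  # ~16.7 hours on the CPU alone -- under a day

And that is before the radio, display, and chipset draw anything at all.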

In summary, what matters to the cellphone/smartphone is not processing power but SoC capability, which Intel terribly lacks. Intel is and has been trying to leverage its commodity processor manufacturing to penetrate the consumer mobile market, but it has always failed miserably. Ironically, the larger the consumer mobile market gets, the less relative advantage Intel has over other competing manufacturers.

Christian H. said...


Even if AMD could release Agena at 3.0 GHz in Q1 08 in volume, I doubt this would restore them to profitability. Based on Anandtech, Tech Reports, and the recent SPEC single-thread disclosures by AMD, we know now that Agena 3.0 GHz will roughly compare in desktop performance to a 2.66 GHz Yorkfield. The latter is predicted to be priced at $316. Therefore AMD's entire quadcore Agena line would have to be priced at $316 and below to have fair market value. This will not raise AMD's ASPs sufficiently to counteract their furious cash burn rate.

First, the preview from Anand is not realistic because a server mobo is designed for load-balanced continuous processing, not higher-speed burst processing.

Also, the server they used didn't have HT3 or 1066 RAM. I would say that you can add 10-15% or perhaps more to Phenom, as it will run at 2.2 and 2.4GHz on enthusiast mobos.

And that's initially. The 9700 due in December will run at 2.6GHz with a 4GHz HT3 bus. Just as Intel uses increased bus speeds for higher performance, so does AMD. The increased speed will also lower latency to the L3, probably significantly.
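
For scale, here is a rough Python sketch of the link bandwidth that implies; reading the quoted "4GHz" as an effective 4 GT/s on a full 16-bit link is my assumption:

    # Rough HyperTransport bandwidth sketch (one direction).
    transfers_per_sec = 4e9  # reading "4GHz HT3" as 4 GT/s (assumption)
    link_width_bytes = 2     # 16-bit link
    print(transfers_per_sec * link_width_bytes / 1e9)  # 8.0 GB/s each way

That would be double the 4 GB/s per direction of a 2 GT/s HT1 link.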

Hopefully it will cause the server OEMs to want new HT3 boards for Opteron. Turning 16-bit HT3 back on should be trivial.

Unknown said...

Abinstein... here is the problem with what you said.

1) We do not have results for a single-socket quad-core AMD chip.

2) The majority of consumer apps are still single-threaded.

Why wouldn't anyone care about single-threaded performance when the majority of software is still single-threaded?
Why can't I have both single-threaded and multi-threaded performance in one chip?

We will need to see how Opteron/Phenom does in a single-socket situation, as most of the consumer market will be single socket.

We know that Intel does not scale as well as Opteron in multi-chip situations and becomes bottlenecked by its FSB.
How about single-socket situations where the FSB is not AS MUCH of a bottleneck?

Would you be surprised if a similarly clocked C2Q got close to the same SPECint_rate as a similarly clocked Agena?

I think you may be in for a surprise.

Do you have SPECint_rate results for a single-socket quad-core Opteron?

Scientia from AMDZone said...

mo

There is nothing wrong with showing single socket performance. However, a quad core chip still requires multi-threaded testing even with just one socket. If most applications are indeed single threaded then you don't need a quad core. However, if you are going to compare quad cores it is ridiculous to compare on the basis of single threaded performance.

Unknown said...

Sci, I'm trying to say that single-threaded performance is still an important factor even in a quad-core chip.
Not all software is multi-threaded.

I would like to have a quad-core chip so I can have performance in both single-threaded and multi-threaded workloads.
Do you not agree with that?

What I am saying, and I thought I was very clear, is that the lead K10 gets from multi-chip scaling might diminish because the FSB won't be as much of a bottleneck in a single-socket system as it is in a 2-4 socket system.

Do you also not agree that most consumers will own a single-socket system?

Any lead that AMD got from its good scaling in multi-socket systems will be reduced in a single-socket system when compared to Intel's single-socket system.

That is not a hard concept to grasp.

Ho Ho said...

scientia, it seems you have missed the whole point of Larrabee being in between GPUs and CPUs. It gets its massive throughput from GPUs and its flexibility from CPUs. It has the massive SIMD throughput of GPUs but it is also flexible enough to do relatively general purpose computing. Of course it won't show nearly as big an IPC with stuff like XML processing as with ray tracing or rasterizing.

It is quite sad you cannot further clarify your answers. You did try to answer those nine questions but the answers were mostly lacking in the important details.


greg
"Hoho, while I realize you weren't talking about a phone, a pda generally has a battery that is not larger."

If Silverthorne tries to target smaller PDAs (<250g) then perhaps yes. With bigger ones (500g+) it could do very well.

For example, my old Nokia 3510i has a 950mAh battery and weighs ~100g. My even older Jornada 720 has a 1500mAh battery by default, and I'm upgrading it to a 2000mAh one; total weight is ~500g. My cellphone has a battery life of around 5-6 days with light usage (~10-15min per day). The Jornada has a special very deep sleep mode where it can live on one charge for nearly a month. When it is turned on I can listen to music for around 7-8h or read books for about as long. That 7-8h is plenty for me, and I'd be glad to have similar battery life in a much more powerful PDA.


abinstein
"First, a cellphone battery usually holds less than 10Wh, so even if the Silverthorne takes just 0.6W it's not going to survive a day in a cell that doesn't do anything else."

You can fit much more into a bigger-than-phone PDA or a smaller tablet PC. The Nokia N90, a smartphone, has a relatively small battery at only 950mAh, and it doesn't last more than ~4.5h of talking or around 18h of actual use. Sure, it is nice to show long standby times, but with a PDA I doubt anyone cares about those much.

A somewhat newer PDA-type device, the Nokia N800, has a 1500mAh battery and lasts only around 3h while browsing the net. If Silverthorne can offer at least as much battery life with significantly higher performance, it can become a great tool for many people.

For example, we had to use a relatively big tablet PC to get our ship navigation software working, as no PDA was fast enough to draw all the graphics that were needed. That thing has a 3430mAh battery with ~2h of life in it. A Silverthorne-based device should be able to handle it fine and put much less strain on the ship's batteries and generators.


I don't know where you all got your information about PDA battery life and what people expect from them, but it does seem that you have vastly overrated what current PDAs can do and how long they can live on their batteries.



"Second, even in terms of computation Silverthorne by itself is nothing; you still need the NB, SB, memory module, and ADC/DAC to say the least for any functionality."

Wasn't that 1W for all those chips combined?


scientia
"If most applications are indeed single threaded then you don't need a quad core."

Yes, but what if I have several different kinds of applications I need to run, with half being single-threaded and half multi-threaded? Should I then get a quad-core to speed up those multi-threaded apps or a dual-core to speed up the others? The world is not black and white, you know.

Sure, you could say I should get a faster-clocked triple-core, but first information seems to indicate that they are in fact clocked much lower than quad-cores.

Pop Catalin Sever said...

Scientia, can you tell us something?

Are you or are you not an AMD insider?

Scientia from AMDZone said...

ho ho

You claim Larrabee is in between a GPU and CPU. Your argument stops there until you have an actual ISA reference for Larrabee.

Also, I can't imagine what piece of navigation equipment you believe is going to drain a ship's batteries. I'm assuming you mean a small boat which only has a limited generator on the outboard (for electric start). This is not a ship.

Finally, I take it you've never seen a Timex/Sinclair. This computer defined the practical limitations of a small form factor.

mo

Even the cheapest Celeron/Sempron system today will run Microsoft/Open Office applications adequately. This covers probably 75% of everyone who will buy a computer.

You must be talking about something more specialized like high frame rates on a particular game or running a heavy duty application like CAD.

As soon as you go to dual socket you begin making compromises like using slower FBDIMM or registered memory. In this context I don't see the point of comparing game demos or toy benchmarks.

If you have enough spare money that you can buy a quad core just for the fun of it then that is fine. However, that doesn't describe most buyers. If you really don't need four threads then it makes more sense to compare single or dual core processors which are still much cheaper.

Ideally you would be able to upgrade later if you wanted but we know that that is seldom the reality for Intel. Anyone buying a processor now will be unable to upgrade to Nehalem. AMD users get a slightly longer run to Bulldozer but it too uses a different layout on the socket. At least, that is what has been suggested by G3MX. I suppose it could be possible for AMD to run a version with IMC and another with serial ports but I'm not counting on that.

pop

I don't claim to be anything.

Unknown said...

You keep failing to address my questions. So for the sake of this argument, I'll take the single/dual core chips.

Same argument applies.

We have seen that Opterons shine when they are in a multi-chip system due to their greater memory bandwidth.
Intel chips bog down due to the FSB with 2-4 chips.

We also know that a 1333 FSB is well beyond sufficient for a dual core.

So when Intel does not have the FSB bottleneck, how do you think it will perform against the K10 dual core?
Same goes for quads.

Sure, K10 can scale well in a multi-chip system, but from the results we have seen, it doesn't seem like it would do as well in a single-socket system.

Again, I will ask you, since you seem to keep avoiding this...
How do you think K10 will fare in a single-socket situation where the FSB won't be much of a bottleneck for Intel duals and quads?

Unknown said...

I have had my P5B Deluxe since early August 2006. I popped in a C2D, no problem at all.
It will also accept a Core 2 Quad.

It will also accept the upcoming Penryn chips.

It will EASILY get me through 2008 and into 2009, because Penryn should suffice until the summer of '09.

abinstein said...

Mo -

We do not have single-socket Barcelona results precisely because few of its intended customers care. But in case you really want to know, you can just divide the dual-socket results by 1.1x to be conservative or 1.05x to be optimistic.
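
To spell out the arithmetic under one literal reading of that rule of thumb, a Python sketch; the halving is my assumption (SPECint_rate sums over both sockets) and the 80.0 score is purely hypothetical:

    # Estimating a 1P score from a 2P SPECint_rate result (one reading).
    dual_socket_rate = 80.0                    # hypothetical 2P result
    conservative = dual_socket_rate / 2 / 1.1  # ~36.4
    optimistic = dual_socket_rate / 2 / 1.05   # ~38.1
    print(conservative, optimistic)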

If you really believe today's computers do not take advantage of multi-processing or multi-threading, then you should buy single-core or dual-core processors only and, as I said, spend the money on a better video card or other components. Apparently, quad-core is a waste for your single-threaded usage.

Ho Ho -
"A bit newer PDA kind of thing, Nokia N800, has 1500mAh battery and has life of only around 3h while browsing the net. If Silverthrone can have at least as much battery life and has significantly higher performance it can become a great tool for many people"

Wrong, and I wonder if you ever read properly? First, 1500mAh is just about 5Wh, or half of my estimate.

Second, browsing the net takes very little power for computation and a lot of power for things like wireless signaling and the display; Silverthorne would be consuming power to no useful purpose in such a scenario.

Third, for Silverthorne to function at all, a lot more power is needed for the supporting chipset, memory, and I/O. In other words, Intel has done little to no SoC integration in its chips, and that makes them very poor candidates for consumer mobile devices.

Unknown said...

Hoho, you still make my point. Abinstein was being too general; a PDA will have at most 4Wh in its battery, which still makes Silverthorne fairly worthless: not only is a PDA likely to have a bigger screen, it is also then less likely to switch to OLED soon, which was my main assumption for power reduction in future products. So even at this level, it's pointless. Also, the larger "PDAs" in this area are the handheld PCs I was talking about (the Nokia N800, which I earlier called an N80), so my previous point about usability applies and further nullifies Silverthorne's purpose.

Mo, what part of HT3, DDR2-1066, and better scalability at higher clocks don't you get?

Aguia said...

I have had my P5B Deluxe since early August 2006. I popped in a C2D, no problem at all.
It will also accept a Core 2 Quad.


Another ho ho…
Where did you get that info?
It's right there on the Asus web site:

P5B Deluxe
-Support Intel® next generation 45nm multi-core CPU
- Intel LGA775 Platform
- Intel® Quad-core CPU Ready?!?!
- Intel® Core™2 Extreme?!?!? / Core™2 Duo Ready
- Intel® Pentium® Extreme / Pentium® D / Pentium® 4 / Celeron® D Ready
- Dual-channel DDR2 800/667/533
- Compatible with all FSB1333/1066/800/533MHz CPUs except Quad Core ?!?!?

Is it or is it not?

Also from the same place:
P5B Deluxe Intel P965/ICH8R 1333(O.C)*
* O.C means overclocking mode.

So in order to be able to use a 1333MHz FSB you have to overclock. That is only achieved with that Asus board, that specific model, and through overclocking, which means complete stability is not achievable.

Aguia said...

Estonian resellers have changed NV prices. I can get brand new 320M GTS for around €240. Though no 2900PROs are anywhere to be seen.

Just to remind Ho Ho, the cheapest priced Nvidia 8800 GTS 320 is still $289.99

While the Radeon HD 2900PRO 512MB is $279.99

Where is the Nvidia price drop?

Unknown said...

Abinstein, please do not twist what I have said.

I never said today's computers don't take advantage of multi-threading. I said they don't take ENOUGH advantage of it.
What part of "most of the software in the consumer market is still single-threaded" don't you understand?

What part of "multi-threaded performance is just as important as single-threaded performance" don't you guys understand?

I would like to future-proof myself by purchasing a quad-core, so future multi-threaded software can take advantage of it. I would also like to have performance on the current software that is single-threaded. Do you expect me to switch processors between doing single-threaded and multi-threaded work??? A single chip should do both in the consumer space.

I don't understand why you guys can't grasp that simple little point I'm trying to make.

Greg: I'm not forgetting HT3 or DDR2-1066... I just don't think it'll be enough.
I do not need to go to a website that is probably hosted on a DSL line (Asus has one of the slowest sites I've been on) where everything is translated from some foreign language. The site is full of typos.

For everyone and their grandma who has tried a quad core in a P5B Deluxe, it has worked with a simple BIOS update. It is a KNOWN fact that the P5B Dlx supports quad-core chips.
It has also been tested to support 45nm Penryns. These are results from real users of these products. Deny it all you want, but a quad-core chip drops right into a P5B Deluxe and it works.
People have put in Penryn ES samples, and it works...

You can say whatever you want; it doesn't change the facts and the real results that people have been posting.

I have had my P5B at 1840 FSB rock solid for over a year.
This board has been proven to have a very high stability rate even at higher FSBs. 1333 is a piece-of-cake FSB for this board, and no, you don't need to overclock. It's a simple BIOS flash.

I would love to see how well a Phenom drops into AM2 boards, as AMD has touted its backward compatibility.

Ho Ho said...

scientia
"You claim Larrabee is in between a GPU and CPU. Your argument stops there until you have an actual ISA reference for Larrabee."

Then your articles should be at least half as long as they are.


"Also, I can't imagine what piece of navigation equipment you believe is going to drain a ship's batteries. I'm assuming you mean a small boat which only has a limited generator on the outboard (for electric start). This is not a ship."

There are dredgers and tugboats. They aren't anything too big; tugboats are roughly 20-30m in length without a barge. All have onboard generators and batteries. With those batteries they feed the GPSes, sonars, and the PCs running the software we wrote. These ships are at work 24/7 and rarely come to harbour. The more time they can spend doing actual work instead of charging batteries, the more money they make.

Long story short, less power usage translates to more money, but there is a minimum limit of needed performance and no current PDA is good enough.


"Finally, I take it you've never seen a Timex/Sinclair. This computer defined the practical limitations of a small form factor."

As a matter of fact I have. What similar thing do we have today?


"Even the cheapest Celeron/Sempron system today will run Microsoft/Open Office applications adequately."

So wouldn't one core of Larrabee be able to do something similar? Intel itself said in its presentation that Larrabee will have around 30% of the per-core performance of a full CPU.


"AMD users get a slightly longer run to Bulldozer but it too uses a different layout on the socket"

Yes, slightly, assuming that Bulldozer won't be delayed too much. Socket AM2 lives on the desktop from Q3 2006 to mid-2009; not exactly a long run. I wouldn't call AM2 particularly good in terms of lifetime. Of course, when we apply a little logic, seeing Bulldozer in late 2009/early 2010 would make it better, in a way.


"I don't claim to be anything."

But you do make a whole lot of claims you don't (want to) back up.


abinstein
"Wrong, and I wonder do you ever read properly?"

What exactly was wrong?


"First, 1500mAh is just about 5Wh, or half of my estimate."

I wonder what kind of formula you used to get that number. Care to show it to us? My formula says the battery would have to run at over 3.33 volts to hold 5Wh. What voltage did you use?


"Second, browsing the net takes very little power for computation and a lot power for things like wireless signaling and display;"

Yes, it does take only a little power. The screen and wireless take a whole lot of power no matter what. Say we add a fully loaded CPU to that; how much more power would the machine take then?

OLED is nice but the price is HUGE. By the time it gets into PDAs and the like, we will have gone through several generations of CPUs.


"Third, for Silverthorne to function at all a lot more power is needed for the supporting chipset, memory, and I/O."

More compared to what?


greg
"Mo, what part of HT3, ddr 1066, and better scalability at higher clocks don't you get?"

To get any benefit from them you'd have to buy a whole new motherboard and most likely RAM too. How long a lifetime would such a platform have? More than 1.5 years?


aguia
"Where did you get that info?
It’s right there in Asus web site"


Then let's take my motherboard, which was released before the C2D was: the P5W DH Deluxe. It supports all of Intel's current Core 2s and will likely support 45nm too. How long will it take until it makes sense to upgrade from an AM2 motherboard?



"Where is the Nvidia price drop?"

How should I know what your resellers do? It is not my fault they want to earn such big profits :)

Unknown said...

Scientia, I have a question for you:

When would you expect AMD to fix Barcelona's cache bug which is causing such low scores against K8 on some apps?

Thanks in advance!

Aguia said...

How should I know what your resellers do? It is not my fault they want to earn such big profits :)

But you said this:

Estonian resellers have changed NV prices. I can get brand new 320M GTS for around €240. Though no 2900PROs are anywhere to be seen.

So what you mean, for example, is that resellers have the Core 2 Quad 6600 at $130 where it costs us $266?

Because on no web site does that card cost €240, unless you're talking about the reseller's purchase price, which with 20% margins gets to a €290 street price.

Mo said...

Aguia,

240 euros is roughly $340. I can get you a 320M GTS for $340 ANY DAY.

240 euros for an 8800 GTS 320M sounds about right.

Unknown said...

Mo, if you don't back up what you say with specific reasoning, then it's not that you don't "think" it will be enough but that you "won't believe" it could possibly be enough. This is a sure sign of both bias and a flaw in reasoning.

hoho said:

Yes, it does take only a little power. The screen and wireless take a whole lot of power no matter what. Say we add a fully loaded CPU to that; how much more power would the machine take then?

OLED is nice but the price is HUGE. By the time it gets into PDAs and the like, we will have gone through several generations of CPUs.


which means your points are even more moot (if it's even possible for something to be "more moot").

hoho also said:

To get any benefit from them you'd have to buy a whole new motherboard and most likely RAM too. How long a lifetime would such a platform have? More than 1.5 years?


And to get any benefit from Penryn's increased bus speed, or from faster RAM without modifying settings in the BIOS (which most users won't do), you would have to buy a newer, if not higher-end, motherboard. So, again, your point is moot.

Unknown said...

Also, hoho, you don't need a better motherboard for DDR2-1066 or the higher scalability, so even more mootness.

Unknown said...

Is it or is it not?

I have had the P5B Deluxe for well over a year. It's the best board I've ever used.


For everyone and their grandma who has tried a quad core in a P5B Deluxe, it has worked with a simple BIOS update. It is a KNOWN fact that the P5B Dlx supports quad-core chips.
It has also been tested to support 45nm Penryns. These are results from real users of these products. Deny it all you want, but a quad-core chip drops right into a P5B Deluxe and it works.
People have put in Penryn ES samples, and it works...


That's 100% correct. I had an E6600 in this board. A BIOS upgrade enabled full 1333MHz FSB support and quad-core support. Then I put a Q6600 in it. There are BIOS updates now that enable full 45nm Penryn compatibility.

I run this quad at 3GHz (that's a 3GHz quad on the B3 stepping) 24/7 with zero stability issues. This board runs and performs awesomely.

Unknown said...

Giant, regardless, we're down to arguing how much of an effect HT3 will have on the overall performance of Barcelona. My bet is maybe some, but not much. Just looking at the amount of bandwidth absorbed by CPU traffic to anything other than RAM seems to point to this in most situations. If not, I think Intel would already be running into bandwidth issues, since the standard goes both ways.

abinstein said...

Ho Ho -
"I wonder what kind of formulae you used to get that number. Care to show it to us?"

Actually I don't care. It's not my job to teach you basic electrical engineering and if you don't know where you are wrong I'm sorry, you just don't.

"My formulae sais the battery should be loaded with over 3.33 volts to have 5Wh of life. What voltage did you use?"

So what voltage do you use? 3.3V is the norm for most mobile devices. It doesn't matter what voltage your Silverthorne uses; the amount of energy is fixed for a given battery.

Please also note that Intel's LV processors consume less power by lowering voltage, not amperage.

I can hardly believe that you can argue with others about mobile device power consumption when all you have looked at is the power taken by a PC processor! Apparently you don't know what's important for a mobile device, and even after I told you the importance of SoC you keep using the term the wrong way.

In short, an SoC with processor, NIC, and display interface will be a lot more efficient than a low-power processor alone, even if it takes 3x the amount of power. And it doesn't matter how much computational power the processor has, because nobody cares when even "surfing the web" is technically challenging.

Unknown said...

Mo, if you don't back up what you say with specific reasoning, then it's not that you don't "think" it will be enough but that you "won't believe" it could possibly be enough. This is a sure sign of both bias and a flaw in reasoning.


I'll hold myself to the same standards as others and not provide links.

In order for Phenom to gain any performance from going to DDR2-1066, we would have to assume that the current Barcelona is bottlenecked by memory bandwidth, and thus that 1066 may relieve that bottleneck.
But it seems Barcelona is not facing that much of a bottleneck. What the Phenom will like is the lower latency (AMD has always been latency-sensitive) that we will get from faster DDR2...
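
For reference, the raw bandwidth step in question, as a quick Python sketch (standard dual-channel figures, 8 bytes per transfer per channel):

    # Peak theoretical bandwidth of dual-channel DDR2 (sketch).
    def dual_channel_gb_s(mt_per_s):
        return 2 * mt_per_s * 8 / 1000.0  # 2 channels x 8 bytes per transfer

    print(dual_channel_gb_s(800))   # DDR2-800:  12.8 GB/s
    print(dual_channel_gb_s(1066))  # DDR2-1066: ~17.1 GB/s

Whether that extra ~4 GB/s matters depends on whether the cores are actually bandwidth-bound, which is exactly the point in dispute.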

Remember when AMD went from DDR to DDR2? The difference was VERY minimal... Even though K10 is a whole new arch, it still does not suffer from a memory bandwidth bottleneck the way Intel's FSB does in high-throughput situations (2-4 sockets).

You yourself have said HT3 won't bring much of an improvement, and that's what I said, but you want links from me when you practically said the same thing?

You don't have to believe what I believe; I can guess too, like Sci does all the time. So come November, we'll have to see.

At this point, I'm just going for "I told you so"... and to be frank, I would love to be wrong, because I don't have brand loyalty; I'll run whatever is fast, and faster is better. My last system was an Opteron 165. The switch to C2D cost me $0 (I sold my AMD stuff for the same price I paid for the Intel stuff), so I gained some performance without spending any money. If I had to throw in some cash, I would have stuck with AMD.

What really ticks me off is some people just won't acknowledge certain things...

Scientia from AMDZone said...

mo

I would expect dual-core K10 to be perhaps 5% slower than dual-core Penryn. For example, say a 3.0GHz Penryn dual core would be as fast as a 3.2GHz K10 dual core.
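
As a quick Python check of what that example implies (just the ratio of the quoted clocks):

    # Per-clock gap implied by "3.0GHz Penryn == 3.2GHz K10".
    print(1 - 3.0 / 3.2)  # 0.0625, i.e. roughly a 6% per-clock deficit

which is in the same ballpark as the "perhaps 5%" estimate.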

However, I'm still not seeing what I would expect to see with the benchmark scores.
