Thursday, October 11, 2007

Waiting For The Wind To Blow

We seem to be trapped in the Doldrums lately, waiting. We wait for faster clocks and desktop versions of K10 from AMD and 45nm desktop chips from Intel. We wait to find out just how things really stack up.

I'm familiar with the changes in architecture between K8 and K10. These changes are quite good but, strangely, we haven't seen them reflected in the Barcelona reviews. We are well past the point of pretending that everything is normal at AMD. It seems to me that there are only four possibilities:

  1. The benchmarks are not compiled properly for K10.

  2. The reviews were not very high quality.

  3. There is a bug in the revisions that have been tested.

  4. There is a major flaw in the K10 architecture.

It wouldn't surprise me if a lot of the standard benchmarks we see claiming to compare Clovertown with Barcelona actually use the Intel compiler, which has been known for quite some time to handicap AMD processors. There has also been some suggestion that Intel has spent time tuning their code specifically for these benchmarks.
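To make the mechanism concrete, here is a minimal sketch of my own (not taken from any actual compiler runtime) of how a dispatcher can key off the CPUID vendor string rather than off the feature flags the chip actually reports:

#include <stdio.h>
#include <string.h>

/* Read the 12-byte CPU vendor string ("GenuineIntel", "AuthenticAMD", ...)
   returned by CPUID leaf 0 in EBX, EDX, ECX.  GCC-style inline asm on x86/x86-64. */
static void cpu_vendor(char vendor[13])
{
    unsigned int eax, ebx, ecx, edx;
    __asm__ volatile ("cpuid"
                      : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                      : "a"(0));
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
}

int main(void)
{
    char vendor[13];
    cpu_vendor(vendor);
    /* A vendor-keyed dispatcher: non-Intel chips fall back to the slow path
       even if they support the same SSE extensions. */
    if (strcmp(vendor, "GenuineIntel") == 0)
        puts("using tuned SSE2/SSE3 code path");
    else
        puts("using baseline x86 code path");
    return 0;
}

A feature-based dispatcher would instead test the SSE bits from CPUID leaf 1 and take the fast path on any CPU that reports them.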

There is also little doubt that the Barcelona reviews were both rushed and sloppy, much sloppier than the usual testing we see from places like Anandtech. However, even the Tech Report review was disappointing since the author didn't seem to have much grasp of the technical aspects of K10.

However, I've also seen benchmarks at SPEC. These are somewhat difficult to compare because the IBM-sponsored scores change both operating system and compiler when moving from K8 Opteron to K10 Opteron. Taken at face value these would show an increase in core speed of 11% for FP and 14% for Integer for K10. A 14% increase in Integer is good enough that I can't say categorically that it is incorrect. However, the FP score is a problem since the FP pipeline for K10 should be nearly 100% faster (the SSE datapaths were widened from 64 bits to 128 bits). It's hard to claim that it wasn't compiled correctly because it used Portland Group's compiler, which should fully support K10. In fact, AMD has been working closely with Portland Group to ensure that it does. We also can't claim test bias because the testing was performed by AMD itself.

Given the SPEC scores created by AMD with the PGI compiler, I think we have to assume that K10 either has a bug that hasn't been fixed yet or has a serious architectural flaw that is preventing full FP speed. Earlier I had wondered if K10 had a bug in the L1 cache, and this still seems to be a strong possibility. The first problem is that this type of bug would not appear in the list of errata. Secondly, as strapped for cash as AMD is these days, they would have no reason to make this public since it would likely mean fewer K10 sales. The only real difference between a bug and a major flaw is how long it takes to fix it. In terms of architecture, something like an L1 cache bug should be fixable in six months.

A minor bug could be fixed in as little as eight weeks (if rushed) but it would take another month for the new chips to get into circulation. For a standard run, however, the fix would take fifteen weeks and at least another month to ship. So, four and a half to six months would be typical. The problem, however, is that the 45nm version, Shanghai, uses the same circuitry, so a flaw in K10 pushes Shanghai back until the flaw is fixed. The fact that no tapeout announcement for Shanghai has been made yet supports this scenario. AMD originally planned to release Shanghai at midyear, so tapeout should have occurred in July. AMD has maintained that the 45nm immersion process is on track, so this would seem to leave only a design problem.

If this is indeed the case then it is both good and bad. It is clearly bad because the last thing AMD needs at this point is another problem getting in the way of competitive production. However, in an odd way I suppose it would be good if the test scores we've seen do not represent the actual potential of K10. For now there is nothing to do but wait.

112 comments:

Mo said...

I am happy to see that you finally have an article on this subject, and that it's finally been acknowledged that all is not peachy at AMD.

If AMD knows that its SPEC scores are lower than expected, why would they release them? Do they have to according to some signed guideline with SPEC? I'm trying to understand why they released not-so-stellar numbers.

Each delay is causing AMD to lose money they really don't have.

What sort of steps should AMD take at this point?

abinstein said...

Scientia,

In terms of single-threaded performance, Barcelona is about in the middle between K8 and Core 2. There is nothing strange about this. If you compare the core size estimated by Chip Architect, then you see that a K10 core (~26mm^2) is somewhat larger than a K8 core (~21mm^2) but nowhere near the size of a Core 2 (~32mm^2).

If we assume both AMD and Intel designers did their homework and made efficient use of die area, then the core size should at least indicate the relative IPC of these cores.
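As a rough sanity check on those numbers (taking the estimates at face value): 26 mm^2 sits a little under halfway between 21 mm^2 and 32 mm^2, since (26 - 21) / (32 - 21) is about 0.45, which at least lines up with "about in the middle" for single-threaded performance.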

The strength of Barcelona/K10 versus Core 2 is in multi-core performance, where the number of cores is greater than 2, or in low power, where the IMC, HT3, and advanced clock gating bring better efficiency.

Scientia from AMDZone said...

mo

Obviously if AMD is selling Barcelonas they are going to be benchmarked. The Integer score is believable but the FP score seems far too low; it should be at least as good as Clovertown's.

Axel said...

I think we can safely assume that the new stepping supposedly used for Phenom will not show significant gains in IPC over the current Barcelona. The only tangible improvement will be from the use of unbuffered DDR2, and even that would be 5% at most. HT3: Absolutely no benefit to single socket performance.

So Scientia the next critical issue to tackle in order to get a true forecast for AMD's hopes of competing is all the roadmaps that VR-Zone has supposedly "confirmed" with AMD themselves:

1. Phenom X4 will not exceed 2.6 GHz until Q2 08. We already know that on the desktop, Phenom at 2.6 GHz is roughly equivalent to a 2.3 to 2.4 GHz Kentsfield. Hence, Phenom X4 is completely outgunned by Kentsfields in Q4 07 and Yorkfield in Q1 08 onwards, so Intel will reign uncontested in the high ASP quad-core desktop space.

2. Phenom X3 will not launch until March 2008. In addition, the same link shows that Phenom X2 (Kuma) will not launch until Q2 08. This means that Intel will effectively reign completely uncontested in the high volume dual-core $100 to ~$250 range for the next six months with Conroe and Wolfdale, especially as the mass rollout of Wolfdale in Q1 08 at these prices will push most Athlon 64 X2s into the sub $120 range.

3. Finally, rather than using valuable Fab 36 capacity to crank out dual-core K10 Kumas, AMD will be making faster K8 Brisbanes. The only logical explanation for this is that AMD cannot make fast K10s yet, even dual-core. There's a wall at around 2.6 GHz that they won't be able to really breach until at least Q2 08.

So the big question is, if these VR-Zone roadmaps are for real, does AMD have a prayer of surviving 2008?

P.S. The reason I home in so much on AMD's finances in my comments is because most of the rosy-outlook commentators on this blog seem to be largely blind to AMD's problems. AMD are only a couple of quarters away from running out of cash at their current burn rate if they're unable to raise more capital.

Aguia said...

HT3: Absolutely no benefit to single socket performance.

The same s*** all over again?

HT3 is the biggest development in years regarding Bus technologies and motherboard longevity!

Let me explain it the easiest way for you all to understand (especially the Intel guys).
Just imagine Intel motherboards being FSB independent, with the FSB clock relying entirely on the CPU! No more 400, 533, 667, 800, 1066, 1333, 1600, ... stupid updates that require a new motherboard each time the CPU clock speed is increased.

Aguia said...

axel,

If you can explain this road map from VR zone, then maybe I can understand those roadmaps you posted:

Intel Wolfdale-L

GutterRat said...

abinstein, wrote

If we assume both AMD and Intel designers did their homework and made efficient use of die area, then the core size should at least indicate the relative IPC of these cores.

Excuse me? Since when is core size an indication of relative IPC of AMD and Intel cores?

Why, that is one of the most far-fetched things I've ever read.

In fact, it may be the most far-fetched thing I've ever read.

Christian H. said...

SPEC is an interesting benchmark because I have seen the same chip run 30% faster in a different system.

I'd suggest all of you scan through the whole results page and compare chips from different vendors and models.

As far as Shanghai goes, you can actually contrast it with Brisbane, which didn't tape out a year before release but more like six months before.

I've long said that the tightness of AMD's purse strings may have caused them to delay certain runs to get more Brisbanes and 65nm Turions out.

Even with SSO, 128-bit L1/L2, Core 2 does 4 IPC in perfect conditions, so AMD WON'T catch up to Intel in single threading; but another possibility is that Barcelona is suffering from being on HT 1.1.

I'd also say that optimization is not yet perfected in the PGI compilers. They only announced K10 support a month or so before the release.

And then there's the fact that notables are saying that K10 won't stretch its legs until 2.6GHz and above.

I don't think there is a serious bug, but there could be an issue. I mean, even the mighty Intel had to respin C2D before volume release.

The word is out about newer steppings like B2F and B3 which should contribute to speeding up the arch or lowering the ACP. They have gotten Brisbane to 2.9GHz in a 65W envelope.

Either way the fact remains that K10 at 2GHz is killing K8 at 3GHz in nearly every test.

That's the point of a new arch: to be faster than the old one. And considering that Barcelona scaled almost 100% with clock speed and the IMC speed goes up with core speed, it's easy to see that a 2.6GHz Phenom with 1066 RAM and 3.2GHz HT3 will show the true potential of the arch. You have to remember that AMD crippled the Barcelona core by cutting the HT links back to 8 bits and even removed one for K8 compatibility in current mobos.


With that I think we can make a better analysis of K10 in general with the chips that have the links functioning as HT3.

Christian H. said...

1. Phenom X4 will not exceed 2.6 GHz until Q2 08. We already know that on the desktop, Phenom at 2.6 GHz is roughly equivalent to a 2.3 to 2.4 GHz Kentsfield. Hence, Phenom X4 is completely outgunned by Kentsfields in Q4 07 and Yorkfield in Q1 08 onwards, so Intel will reign uncontested in the high ASP quad-core desktop space.

And how exactly do we KNOW that? Phenom HAS NOT been benchmarked.

Why is it that Intel increasing their FSB speed does wonders, but it won't for AMD?

Rahul Sood of HP said PUBLICLY that Phenom at 3GHz will blow EVERYTHING out of the water (the post is called "benchmarks are wickedy-wack").


This implies that a 2.2GHz 9500 will be more than competitive with the mainstream quad models.

As far as finances, the ATI acquisition is starting to bear fruit: FireGL is testing as MUCH FASTER than Quadro, the 2600 is cleaning up in OEM orders, and there are more handset and cellphone deals.

Times are hard but suffering builds character. When you're in with an 800-lb gorilla whose practices are questionable at best, every little misstep will be considered indicative of being out of your league. AMD handles it well.

I'm not sure if I would look to price cuts as an answer, but the regaining of share for Q2 seems to bear out Hector's "share at any cost" direction.

Again, if AMD has reached mature yields on Kuma/Rana SKUs, things get much easier, since that alleviates the "Barcelona is too big" worry.

And of course no one acknowledges that Barcelona has gained a lot of friends and that AMD's server-first strategy is kicking in.

AMD harped on perf/watt, and Anand's tests showed that the 2347 is the most efficient CPU around right now.

That means more than conjecture and/or speculation. They have always stressed the "green" approach, and Barcelona is good prep for Bulldozer.

As Skulltrail shows, even high-k metal gates can't currently alleviate the effect of 3GHz+ clock speeds, hence perhaps the current unvocalized "core wars."

But then maybe there is a problem that will need multiple-layer respins. Not being privy to AMD's or Intel's confidential numbers, it's really impossible to say whether I've strayed off-topic or not.

Unknown said...



Why is it that Intel increasing their FSB speed does wonders but it won't for AMD.


It doesn't do much at all. 2.66GHz with a 1066 vs. 1333 FSB shows only a 2-3% difference.

Unknown said...


The same s*** all over again?

HT3 is the biggest development in years regarding Bus technologies and motherboard longevity!

Let me explain the easiest way for you all understand (specially the Intel guys).
Just imagine Intel motherboards FSB independent. FSB clock relied entirely on the CPU! No more 400, 533, 667, 800, 1066, 1333, 1600, ... stupid updates that require a new motherboard each time the CPU clock speed is increased.


He wasn't talking about longevity. He was talking about performance. The difference in performance will only be a few %.

Azmount Aryl said...

I very much agree with your deduction, Sci. I think the first indication of a problem came long ago, when they first ran Cinebench 9.5 on the 1.6GHz Opteron X4. Some people actually noticed that Cinebench 9.5 is FP heavy and in the past favored K8; just look at FX-74 vs. Core 2 QX6700 results.

I now wonder if AMD is going to use the Phenom launch as the point at which they fix the problem and begin selling 'good' chips. Benchmarks of desktop CPUs get way more attention than those of server parts.

abinstein said...

Gutterrat -
"Excuse me? Since when is core size an indication of relative IPC of AMD and Intel cores?

Why, that is one of the most far-fetched things I've ever read."


Maybe because you really lack proper education in circuit design?

Suit yourself.

GutterRat said...

abinstein wrote,

Maybe because you really lack proper education in circuit design?

Maybe you lack common sense.

I'll Fed-Ex you a copy of 'Circuit Design for Dummies', free of charge.

You are wrong.

Did you study circuit design under Bozo the clown?

Aguia said...

The difference in performance will only be a few %

But since the L3 cache is clocked according to the HT3 bus speed, I think the % will be higher than many of you think (at least in cache-sensitive applications).

The current Barcelona has a slow NB clock.
The desktop parts will get one 2X faster.

Intel Fanboi said...

Abinstein said:
"In terms of single-threaded performance, Barcelona is about in the middle between K8 and Core 2. There is nothing strange about this. If you compare the core size estimated by Chip Architect, then you see that a K10 core (~26mm^2) is somewhat larger than a K8 core (~21mm^2) but nowhere near the size of a Core 2 (~32mm^2)."

According to the link you provided, Core 2 is 31.5mm^2. The link does not provide an estimate for the K10 core size. However, Ace's Hardware estimated the K10 core size at 29.6mm^2. Thus they are not far apart.

"If we assume both AMD and Intel designers did their homeworks and made efficient use of die area, then the core size should at least indicate the relative IPC of these cores."

There are far more design trade-offs in chip architecture than size vs. speed for a statement like this to be valid. Routing, power, features, etc., are also considered. Also, we cannot disassociate the core from the effects of Intel's large cache or Opteron's IMC in the performance equation. If your statement were true, we wouldn't need to run benchmarks; we could just measure the size of the die.

GutterRat said...

Where's abinstein?

I know he's around and posting on AMDZone and seems to be embroiled in a skirmish.

I would like to hear the argument behind his "core size is an indication of relative IPC of AMD and Intel cores" remark.

intel fanboi's response was right on the money.

If you have any modicum of an engineering education you would not make such silly remarks as the one abinstein made.

We're waiting.

Unknown said...

When things don't go your way, you begin to avoid the topics/arguments altogether.

I have been questioning the availability, but some people INSIST that they can be bought anywhere, that they have wide availability.

Yet all major manufacturers and vendors have "coming soon" on their sites.

You probably won't see the chips for another month, whether that's because of supply or another reason like, oh I don't know, underperforming chips?

Sci is MIA... Abin is MIA... the rest of the people are MIA...

AMD will announce their losses on Thursday, and with K10 making no impact on them whatsoever, they'll be worse than projected.

Nothing seems to be going AMD's way these days and that's a sad thing for us all. I had expected more from AMD.

Scientia from AMDZone said...

Christian M. Howell

"As afr as Shanghai, you can actually contrast it with Brisbane which didn't tape out a year before release but more like six months."

Here is what AMD did with FAB 36:

Q1 06 - 90nm testing
Q2 06 - 65nm testing
Q3 06 - 65nm production
Q4 06 - 65nm delivery

It is true that by this schedule AMD could complete Shanghai in Q1 and deliver in Q3 08. However, the 65nm ramp was record-setting and the volume was pretty low for Q4. I'm not sure we can count on this fast a pace with 45nm on the immersion process.

We know that AMD is running 45nm wafers now, so let's split the difference and say that AMD probably needs Shanghai ready in Q4 for Q3 08 delivery.

AndyW35 said...

I have to agree it's still very much up in the air on final performance until the next month or so passes.

Is there a bug? I don't know, but AMD have been claiming 40% over Clovertown and you have to assume that was in FP rather than Int. I seem to recall that this was a projected figure for a hypothetical 2.6GHz CPU, so it makes me wonder, if there is a bug, when they knew about it. The 40% figure was floated in spring, I think.

Another question somebody might be able to answer: the FX Phenoms seem to all be AM2+ socket. What happened to Socket F, so that people who bought 4x4 boards can upgrade? That was the only good selling point for those systems, the ability to plonk in a K10 when it arrived. I hope they will not be left out to dry.

Axel said...

andyw35

I don't know but AMD have been claiming 40% over Clovertown and you have to assume that was in FP rather than Int.

Also AMD, to my knowledge, never clarified whether this 40% referred to a per clock or per watt advantage over Clovertown. It was probably the latter, and even that claim only has a shred of truth in the HPC space. When Randy Allen claimed that this performance advantage would apply "across a wide variety of workloads", he clearly flat out lied.

LeeCooper said...

I don't understand what you are all talking about.

Here are the benchmarks!!!

http://www.google.com/translate?u=http%3A%2F%2Frblog-tech.japan.cnet.com%2F0061%2F2007%2F09%2Fquadcore_optero_4666.html&langpair=ja%7Cen&hl=en&ie=UTF8

Barcelona at 2.0GHz is as fast on SPECint as Intel at 2.0GHz, and on SPECfp it is 40% faster than Intel.

Barcelona at 2.0GHz is faster on SPECfp than the fastest Intel (3GHz)!!!

Scientia from AMDZone said...

Aguia

"But since the L3 cache is clocked according to the HT3 bus speed. I think the % will be higher than may of you think (At least in cache sensitive applications)."

HT speed forms the core speed in K8 since this determines the crossbar speed where most transactions occur. However, K10 doesn't have this limitation. L3 speed on K10 is determined by the clock speed itself rather than HT clock. Therefore HT 3.0 adds little speed to single sockets on K10.

Scientia from AMDZone said...

I wrote about AMD's 40% claim some time ago. It was for speed, not power consumption and was determined by projecting K8 to quad core at 2.6Ghz based on SPEC.

The really funny part was that I wrote about this and then Kubicki at DailyTech thought he was being clever when he figured out the same thing two months later. Kubicki is not the brightest crayon in the box.

abinstein said...

scientia: "HT speed forms the core speed in K8 since this determines the crossbar speed where most transactions occur."

True. And neither HT nor L3 cache affects single-threaded benchmarks of Barcelona.

Barcelona has a very balanced architecture. It doesn't "improve" any single aspect of the processor alone while leaving the others starved.

gutterrat -

I'm of course reading as always, but not interested in teaching you basic circuit design concepts. intheknow got nothing right except his hand-waving arguments.

- Barcelona core size is estimated there on Chip Architect's Pretty Pictures page (search for 25.5 mm2). I'm always surprised at how bad the Intel fanbois' reading skills are.

- Routing affects clock rate, not IPC.

- Barcelona has more advanced power gating than Conroe, and thus should use even more transistors for the same ILP.

- Intel's large cache helps especially single-threaded benchmark performance more than Opteron's IMC.

My statement is simple: if two compatible cores with the same area have different average IPC, then the lower-IPC core has a poorer design.

Intel Fanboi said...

Abinstein made the following comments:

“If we assume both AMD and Intel designers did their homework and made efficient use of die area, then the core size should at least indicate the relative IPC of these cores.”

And then added the following:

“If two compatible cores with the same area have different average IPC, then the lower-IPC core has a poorer design.”

These two statements are far too vague. In the first statement you compare Intel and AMD designs. Then in the second you retreat to “two compatible cores”. Thus the second statement contradicts the veracity of the first by applying a more general case to the argument. As to the second statement, how do we determine that two cores are “compatible”? For instance, if they had been designed to interact with different cache sizes and an internal vs. external MC, are they compatible? As another example, consider the case where each is manufactured on different processes with different characteristics. Wouldn't the architects target appropriate designs for said processes? How about if one company adds additional RAS features at an added cost of die size? Hundreds of questions like this could be asked when comparing two "compatible" designs.

“Routing affects clock rate, not IPC”

I am arguing that your conclusion is wrong because your basis is wrong. Thus I am focusing on the definition of "efficient use of die area", not "core size indicative of IPC". Your basis, “efficient use of die area”, is far too vague. Two competent design teams (AMD and Intel), with two different fab technologies, with two different sets of engineering methodologies, with two different marketing goals, etc., can both have different yet correct interpretations of “efficient use of die area”. Let's discuss routing as an example. While both companies would seek to reduce routing to reduce die size (sure), each company may put a different weight on other routing issues such as cost (example: using fewer layers of metal), robustness (example: avoiding coupling), ease of re-spin, ease of ECO, time to compile, and the list goes on and on. Neither company would be incorrect, but each would follow its own priorities as they target speed, reliability, cost, etc.

“Barcelona has more advanced power gating than Conroe, and thus should use even more transistors for the same ILP.”

This statement by you is a fine example of why your conclusion is incorrect. You have provided an excellent example of an engineering trade-off. More gates, same IPC, better power numbers. AMD made the correct decision for their needs.

“Intel's large cache helps especially single-threaded benchmark performance more than Opteron's IMC.”

I thought you were talking about IPC. Now you are talking about single-threaded benchmark performance. This is not the same thing. Again, which design is better? AMD and Intel both have excellent designs targeted at different needs.

As for the size estimate, you are correct that the link you provided did have a Barcelona estimate that I did not see. My apologies good sir.

Unknown said...

aguia
But since the L3 cache is clocked according to the HT3 bus speed. I think the % will be higher than may of you think (At least in cache sensitive applications).

Actually, you're incorrect. Even if the L3 cache is clocked the same as HT3, it still needs a mediator to actually pass the information to the cores, and this takes time.

As I explained to Baron before, HT3 won't help with the situation because the issue here is latency related, not bandwidth related. As many people have pointed out before, bumping Intel's FSB only results in a single-digit percentage increase in performance.

Will HT3 be beneficial to AMD in MP server? Sure it will. Will it be beneficial in desktop? Not likely. Desktop applications are latency sensitive, not bandwidth sensitive.

Unknown said...

abinstein
In terms of single-threaded performance, Barcelona is about in the middle between K8 and Core 2. There is nothing strange about this. If you compare the core size estimated by Chip Architect, then you see that a K10 core (~26mm^2) is somewhat larger than a K8 core (~21mm^2) but nowhere near the size of a Core 2 (~32mm^2).

If we assume both AMD and Intel designers did their homework and made efficient use of die area, then the core size should at least indicate the relative IPC of these cores.


But that is based on the assumption that both architectures are the same. However, they're not.

Yes, Core 2 is larger per core, but it is also faster in IPC than K10. Aside from that, Core 2 is also easier to manufacture than K10.

I have to agree with you though, that K10's main advantage is in the MP server segment, where scalability is crucial to performance. On single socket, or DT / Mobile, K10 does not have the raw processing power to dethrone Core 2.

GutterRat said...

abinstein wrote,

I'm of course reading as always, but not interested in teaching you basic circuit design concepts.

What? Why would I be interested in learning circuit design from someone who does not know it? :) (yes, this 'someone' means you).

What do the pictures at Chip Architect have to do with anything?

There is no such thing as 'two compatible cores' when discussing AMD and Intel. There may be ISA compatibility to a point, but to try and extrapolate beyond that is beyond your pay grade.

Scientia from AMDZone said...

I'll give credit to George Ou for calling and trying to find out about server availability but his conclusions are a bit lacking.

Apparently you can't get a Tigerton or Barcelona system at any of the big server vendors like HP or IBM right now. George wonders why.

The big clue to this situation is that Tigerton 2.93Ghz is available on NewEgg as are Barcelona 1.7, 1.8, and 1.9Ghz. Since NewEgg has a lower customer priority than, say, HP or IBM we can safely say that these vendors also have stock. If HP has stock but isn't shipping then the issue is not availability from a soft launch as George implies. The issue is certification.

Tigerton is a new processor with an all-new chipset so it doesn't surprise me a bit if it takes some extra time for certification. Larger vendors will always certify before shipping. However, we know that Barcelona will drop into current dual socket Opteron boards so certification would theoretically be much faster. But, since Barcelona isn't available either, we know that there has to be some issue. My best guess would be the BIOS. This suggests too that Barcelona requires a substantially different BIOS than Opteron and is also a likely indicator of more than typical errata. This again seems to be supported by the fact that Appro is shipping Barcelona systems, since certification is typically either not as stringent for smaller vendors and/or faster because of fewer motherboards and configurations.

If Barcelona does indeed have an inordinate number of errata this too could suggest higher performance eventually.

JumpingJack said...

Scientia -- while I do not disagree with your assessment, and I have even used the validation argument in other debates ... the channel appears disgruntled at Barcelona availability as well, and is not getting any explanations from AMD:

http://www.crn.com/white-box/202402138

This points to a supply issue more than a validation issue....

Jack

abinstein said...

gutterrat -

It doesn't matter whether you want to learn, you are simply wrong. While I understand that you lack the knowledge and training to understand the similarities and differences between Intel's and AMD's processor microarchitectures, the Chip Architect page makes plain & good estimates on Barcelona and Penryn core sizes - yet you still refused to read.

Maybe there are people who lack not only knowledge but also the ability to learn (and yes, that "people" means you). :)

yomamafor2 -
"But that is based on the assumption of, both architectures are the same. However, they're not."

In terms of processor cores, Intel and AMD processor architectures (i.e., same ISA, same operation modes, and similar pipelines & ILP techniques) are 99% the same. Their microarchitectures are different, but that's precisely why we can compare how good a design their cores have with respect to their different sizes.

"Yes, Core 2 is larger per core, but it is also faster in IPC than K10."

Wrong. Core 2 does not have better IPC when multiple processes are running. It's likely to have less IPC than K10 when 3 or more threads are running and communicating with each other.

Core 2 only has better IPC with single-process, single-threaded applications, which is mostly due to its larger cache and larger core size. The price it has to pay is sacrifice of multi-process & multi-socket scalability (precisely what's important for multi-core systems).

Intel Fanboi said...

Yomamafor2 said:
"Yes, Core 2 is larger per core, but it is also faster in IPC than K10. Aside from that, Core 2s are also easier to manufacture, than K10."

Precisely. Connecting your comment to my comment on routing, Barcelona uses 11 metal layers, K8 uses 9 and Core 2 uses 8. This will affect ease of manufacture. However, there is a cost Intel pays for having fewer layers: routing congestion and size. Simple engineering trade-offs.

Intel Fanboi said...

Scientia said:
"If Barcelona does indeed have an inordinate number of errata this too could suggest higher performance eventually."

Though I do not think we can dismiss issues with AMD's fabrication process, I do think there is the possibility of much truth to your statement. My theory is that the issue is not a collection of errata, but more of some piece of critical micro-arch not performing as predicted. Consider that both AMD and Intel plan future arch based on extrapolating from Moore's law on performance, so AMD had an idea of where they needed to be in 2007 when they designed K10. Thus K10 should be performing better at clock than it currently does.

Note: I have absolutely no supporting data for my statements.

Unknown said...

abinstein
In terms of processor cores, Intel and AMD processor architectures (i.e., same ISA, same operation modes, and similar pipelines & ILP techniques) are 99% the same. Their microarchitectures are different, but that's precisely why we can compare how good a design their cores have with respect to their different sizes.

The architectures of K8 and Core 2 are similar in the sense that they're both based on x86. Then they depart. They have distinctly different architectures; therefore, you cannot compare them. Core 2's architecture is based on CISC, while K10's is based on RISC.


Wrong. Core 2 does not have better IPC when multiple processes are running. It's likely to have less IPC than K10 when 3 or more threads are running and communicating with each other.

This tells me how much you know about circuitry. IPC is the number of instructions per clock a CPU can perform. It has nothing to do with multiple threads. That would be scaling, and yes, K10 has better scalability than Core 2. However, on a CPU-per-CPU basis, Core 2's raw performance simply surpasses K10's.


Core 2 only has better IPC with single-process, single-threaded applications, which is mostly due to its larger cache and larger core size. The price it has to pay is sacrifice of multi-process & multi-socket scalability (precisely what's important for multi-core systems).

Again, I agree with you on multi-socket, but not multi-core. Xeon only shows its limits when scaled above 4 cores/threads, and this is due to the MCM approach. Below that, K10's IPC simply cannot match up with Core 2's.

Scientia from AMDZone said...

jumpingjack

I have no doubt that the channel is lower priority but the quotes from the article you linked to don't really support the idea of a soft launch.

"Right now it's running at a week or so lead-time on orders, which is to be expected. We can place orders and get product when we need it. From that perspective, just a few weeks after the launch, it's been a pleasant surprise," said Shah Gautam, the president of Sunnyvale, Calif.-based custom server builder, Colfax.

-----

Let's say that Barcelona is running at about 5% of volume right now. I think AMD is doing about 1.6 million chips a week. That would be about 80K Barcelonas a week. From what we've seen, the channel would soak up 80K a week without ever noticing. So, I'm thinking Barcelona is at least 10% of production, and that isn't bad for an initial launch. This notion is also supported by the same article:

"Customers have just been amping for this, holding their breath. So people weren't making purchases of existing products, because they were waiting for Barcelona, like people will wait for a big movie premiere," said Appro's Lee.

----

I'm thinking that AMD is ramping normally but that the demand is high enough that there are still shortages. The descriptions that I've seen suggest that AMD is ramping aggressively. The other thing you don't get from that article is what specific parts they want. If customers are only interested in the highest speed parts then perhaps there are shortages.

GutterRat said...

abinstein,

It's pointless to engage in a debate where you don't have the tooling or the knowledge, so please stop while you can.

To deduce that I lack the knowledge and training to understand the similarities and differences between Intel's and AMD's processor microarchitectures based on what I've *chosen* to post (or not post) is naive.

In terms of processor cores, Intel and AMD processor architectures (i.e., same ISA, same operation modes, and similar pipelines & ILP techniques) are 99% the same.

Please, stop... You are embarrassing yourself and CPU/system architects the world over! Same ISA? You must not read the instruction set references for AMD and Intel, I gather.

Why jump on gilboa on AMDZone due to AMD's lack of execution?

Here's something you wrote which exemplifies your 'unbiased' POV

Extinction? You mean the Penryn architecture, right? The hundreds $$$ power-hungry FB-DIMM modules are going to be useless once Nehalem is released?

You know what's a Penryn system really like next year this time? Chicken bones. You try to enjoy it but there's not much meat; you wanna throw them away but it's a pity (or ,maybe not).


Please get help soon.

abinstein said...

gutterrat -
"You are embarassing yourself and CPU/system architects the world over! same ISA? You must not read the instruction set references for AMD and Intel, I gather."

So I guess you can't install FreeBSD AMD64 on a Conroe machine, or a Linux EM64T build on an Opteron? I wonder how I did both over the past several years?

You are the one who seriously needs some help.

abinstein said...

yomamafor2 -
"The architecture of K8 and Core 2 are similar in the sense that they're all based on x86. Then they depart."

I have to disagree. You failed to distinguish between architecture and microarchitecture. The architectures of both Core 2 and K8/K10 are very similar - same CISC ISA, similar RISC backend with OOO, superscalar execution & implicit register renaming (inside the reorder buffer, unlike that used by Pentium 4), about the same number (12-14) of pipeline stages, same memory access modes.

Their common architecture is in stark contrast to Pentium 4's Netburst, Transmeta's Efficeon, and even Via's x86 processors.

On the other hand, their different internal micro-op representations, different instruction decoding (Intel's 4-1-1-1 vs. AMD's 2-2-2) & issue (Intel's 5 ports of mixed IEU & FPU vs. AMD's separate integer & floating-point schedulers) are microarchitecture details. There are certainly some differences, unless one is a clone of the other (in which case there would be no point in comparing them at all).

"They have distinctly different architectures; therefore, you cannot compare them. Core 2's architecture is based on CISC, while K10's is based on RISC."

Sorry, but you are terribly wrong. Both are based on RISC internally.

"This tells me how much you know about circuitry. IPC is instruction per clock a CPU can perform. It has nothing to do with multiple threads."

The way people calculate IPC is just to divide the number of instructions by the number of clock cycles. It's a meaningful & accurate measurement as long as it consistently generates the same/similar results.
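For example (numbers invented purely for illustration), a program that retires 6 billion instructions over 4 billion core clock cycles has an average IPC of 6/4 = 1.5, regardless of how many threads contributed those instructions.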

abinstein said...

Intel fanboi -
"Connecting your comment to my comment on routing, Barcelona uses 11 metal layers, K8 uses 9 and Core 2 uses 8."

I don't know how you can connect his comment with yours in any meaningful way, because routing affects clock rate but not IPC.

At any rate, your assumption that Barcelona has complex routing might explain why it is not running at high clock frequencies now. Still, for single-threaded benchmarks Barcelona is understandably slower due to its smaller core size; however, its superior chip-level (i.e., outside & connecting the cores) architecture makes it a better performer, clock-for-clock, at quad-core.

Scientia from AMDZone said...

yomamafor2

"Core 2's architecture is based on CISC, while K10's based on RISC."

As Abinstein said, you are incorrect about this. Intel has used a RISC-based x86 core since the Pentium Pro while AMD has used one since the K5.

abinstein

"In any rate, your assumption that Barcelona has complex routing"

That's not what Intel Fanboi said. He said that because Intel uses fewer metal layers than Barcelona they have lower manufacturing cost but more complex routing. This is correct. However, I would agree with you that complex routing is more likely to limit clock rate.

Scientia from AMDZone said...

On the notion of die size being equal to IPC, I'm not so sure. There are some substantial differences in micro-architecture that make this difficult to compare. For example, Intel uses a combined FP/Integer pipeline while AMD uses separate FP and Integer pipelines. Intel uses more rename registers than AMD. The cache is inclusive for Intel but exclusive for AMD. The branch prediction is different. The OoO methods are different. There are differences in how loads and stores are performed including how addresses are calculated. And, we have differences in pipeline length.

I would say that in general, die size should be related to IPC but I'm not sure we can make that general statement here. In fact, I'm not even sure we can define what a standard instruction mix would be to determine IPC.

core2dude said...

The whole notion of IPC being proportional to die size is ridiculous. What IPC? On what benchmark? You could massively soup up your SSE unit, and get great IPC on SSE workloads, while sucking at everything else. It all depends on what you optimize your CPU for.

abinstein said...

scientia -
"I would say that in general, die size should be related to IPC but I'm not sure we can make that general statement here."

I guess what you mean is that die size should be related to IPC, but the relation is not definite in the general case.

However, what we're looking at is a very specific case: IPC of single-threaded benchmarks running on the core, where 1) memory is definitely not the bottleneck due to the large cache, 2) any I/O is purposely avoided (SPEC CPU), 3) very different workloads are averaged (i.e., probably the least one-sided benchmark suite out there).

In this case, the core size should somewhat correlate to IPC. Actually you can find evidence of this in the history of K6/K7/K8 and P-pro/Yonah/Conroe SPEC scores.

"In fact, I'm not even sure we can define what a standard instruction mix would be to determine IPC."

You are right; there is no standard. But I believe SPEC is a good approximation - better than anything else for the average case, that is.

Or maybe we should look at the problem from the reverse direction. If Barcelona with its 16% smaller core size reaches higher average IPC than Core 2, then we know that Intel designers didn't do a good job on Core 2.

Nylonox said...

--Abinstein, you couldn't be more wrong about this IPC vs. die size thing. I propose to you the following extreme example:

Company X designs an x86 CPU with a maximum clock frequency of 1KHz. Due to much looser timing constraints, the CPU can be designed for massively high IPC. The overall performance will be lousy due to its low frequency, but the die size will be quite small, again because of the relaxed timing.

IPC isn't everything. You cannot neglect operating frequency when you talk about die size.

lex said...

Q3 results for INTEL are in... huge pop in units, stability of price, huge increase quarter to quarter and YoY in profits. Gross margin forecast for Q4 of 57%. Simply amazing how AMD has completely disappeared.

It's clear how the wind is blowing for INTEL. 45nm on track, the Penryn launch around the corner and Nehalem right after that.

Tick tock tick tock, AMD is seeing its limited life down to its last tock.

AMD will have another huge loss for the quarter. Their business model is simply unsustainable.

The winds are blowing right up Hector's ass and he doesn't even know it. He can't compete!

Scientia, how is AMD going to compete? Invest in manufacturing, invest in silicon technology to support cost-effective and performance-competitive products to justify the billions spent on factories and the billions spent to design new CPUs. Simply said, that can't be done by a little company; only one the size of INTEL can.

Size matters and AMD simply isn't big enough

Unknown said...

*shakes head at Abinstein's comments*

I can't believe someone has so much passion for a company that he would 1. flame a respected user who happens to support the same company, and 2. come up with illogical comments like this.

Now, as I said, Barcelona's IPC is nowhere near Core 2's IPC. Barcelona can only prevail in multi-threaded (4 threads +) environments.

So it happens that Barcelona's core size is slightly smaller than Core 2's core size. It also happens that Core 2 only has 7 metal layers, as opposed to K10's 11 metal layers. Those two are not directly comparable. Therefore, your "die size = IPC" theory is not correct, and your "Intel's engineers are not doing a good job" statement is even more laughable.

GutterRat said...

abinstein,

You still want to 'teach us' something about circuit design?

So I guess you can't install FreeBSD AMD64 on an Conroe machine, or a Linux EMT64 on an Opteron? I wonder how I did both over the past several years?

Simple. It's called abstraction and the least-common-denominator principle. Heard of them? They do wonders when properly applied to software such as operating system code.

I won't bother pointing out others disagreeing with your premise.

Professors at CMU, Michigan, Stanford, and Berkeley have all had a good laugh at your expense.

P.S. Must suck to see Intel with great Q3 earnings, doesn't it?

LOL

core2dude said...


If Barcelona with its 16% smaller core size reaches higher average IPC than Core 2, then we know that Intel designers didn't do a good job on Core 2.

Which it does not. If the benchmark fits in cache, Core 2 eats Barcelona for breakfast on almost any type of instruction mix.

Now before you start blabbering about how AMD has a better platform solution, remember you are the one who started comparing core-to-core performance. IMC/LLC/HT/FSB are not a part of the core.

abinstein said...

"Now before you start blabbering about how AMD has a better platform solution, remember you are the one who started comparing core-to-core performance."

I'm not the one who started the core-to-core comparison - I'm the one who says it's pointless to make single-core comparisons between two quad-core chips.

Above, I only pointed out the simple fact that it's not surprising for Barcelona to have lower single-core performance, because its focus is on quad-core scalability. Barcelona performs better than Clovertown clock-for-clock as a quad-core processor.

abinstein said...

Gutterrat -
"Simple. It's called abstraction and the least common denominator principle."

Please.. you want to make yourself look more ridiculous? What "abstraction" is there between an AMD64 FreeBSD running on a Conroe vs. on an Opteron? Do you actually know what abstraction means? Have you actually looked at the FreeBSD kernel source for AMD64?

I bet you haven't. And let me tell you: there's no "abstraction" in the AMD64-specific kernel code. Intel simply cloned AMD64 to 99.9% compatibility (with a few missing instructions & status bits).

"Professors at CMU, Michigan, Stanford, and Berkeley have all had a good laugh at your expense."

Who? And laugh at what? Given your apparent lack of understanding in these matters, I even doubt you have the ability to retell what I said truthfully.

Anyone with a bit of understanding of OSes would know that Conroe and Opteron share the same processor architecture; their differences are in microarchitecture. Is this simple thing so difficult for you?

abinstein said...

"Therefore, your "die size = IPC" theory is not correct, and your "Intel's engineers are not doing a good job" statement is even more laughable."

I have never said "die size = IPC". I don't know where you got that. If you (or anyone else) can't read or repeat what I said accurately, you are not worth serious discussion.

I also have never said "Intel's engineers are not doing a good job". Again, don't put silly words of yours into my mouth.

And I really don't care who the guy is when I correct his false statements. There is no flaming nor passion for any company here, only passion for the truth. I know this is too much for you to get, considering you have the habit of putting wrong words into others' mouths. Then so be it. I don't care.

abinstein said...

nylonox -
"Company X designs an X86 CPU with a maximum clock frequency of 1Khz. Due to much lower timing constraints, the CPU can be designed for massively high IPC. The overall performance will be lousy due to its low frequency, but the die size will be quite small, again because of relaxed timing."

That's a cute little hypothetical story, but unfortunately it does not hold between the Barcelona core and the Clovertown core. As I said, both have a very similar number of pipeline stages, use similar OOO and superscalar techniques, and adopt almost identical ISAs & operational modes. It's not like one is aiming at IPC while the other aims at clock rate, e.g., Niagara vs. Power6, or Efficeon vs. Netburst.

Besides, making a chip run at 1KHz with "massively high" IPC probably won't save you much die size. You might be able to reduce the pipeline depth to as few as 3 (decode, execute, retire) but the number of transistors will almost certainly increase due to the need to increase IPC.

abinstein said...

Intel fanboi -

Sorry I missed your colorful reply earlier.

"In the first statement you compare Intel and AMD designs. Then in the second you retreat to “two compatible cores”."

There is no "retreat" at all because Core 2 and Opteron/Barcelona are architecturally compatible. In fact my second statement is actually too general - Core 2 and Opteron are not only architecturally compatible but also microarchitecturally similar. The die size would be a better indication of IPC for the same type of microarchitectures.

"Hundreds of questions like this could be asked when comparing two "compatible" designs."

With the exception of the IMC & cache, the other differences you listed do not actually apply between Core 2 and K8/K10.

The large cache favors Core 2 in terms of single-core IPC. The IMC favors K10 in terms of multi-core scalability. Thus even if Core 2 and K10 had the same core size & circuit design quality, I'd expect Core 2 to still have slightly better IPC due to the large inclusive cache.

"Your basis “efficient use of die area” is far too vague. ... but would follow their own priorities as they target speed, reliability, cost, etc."

Wow... you managed to type a lot of words without actually saying much.

It seems to me you are the one who's trying to be vague and ambiguous. You talk about process technology difference - so in your opinion which company has the better process technology, Intel or AMD? Will the company with better process technology attain better or worse die area efficiency?

Also, which processor, Barcelona or Clovertown, has better RAS features? (Hint: look at which is used by more top supercomputers.) Will the processor with more RAS features spend more or less transistors for the same IPC otherwise?

In the end you can't escape the simple conclusion: Barcelona's smaller core size, more clock gating & RAS features, and arguably less advanced process technology make its lower single-core IPC not so surprising. Nevertheless, AMD's designers made excellent chip-level tradeoffs, such that Barcelona outperforms Clovertown clock-for-clock on multi-core.


"I thought you were talking about IPC. Now you are talking about single-threaded benchmark performance. This is not the same thing."

Why not? Single-threaded benchmarks reveal single-core IPC, and single-core IPC is what we're comparing here.

Pop Catalin Sever said...

"Why not? Single-threaded benchmarks reveal single-core IPC, and single-core IPC is what we're comparing here."

What's the point of comparing single-core IPC on a quad-core processor? This is just as superficial as comparing pure SSE throughput outside the context of real-life applications. It's not like someone buying a quad core will make use of a single core and leave the other 3 idle for the most part; that simply doesn't happen in real life, especially in the server space.

Nylonox said...

abinstein, wrote

If we assume both AMD and Intel designers did their homework and made efficient use of die area, then the core size should at least indicate the relative IPC of these cores.

--I'm sorry you can't just admit when you are wrong, Abinstein. Die size has too many contributing factors to say that die size indicates relative IPC. The example I gave illustrates a point, which apparently you refuse to accept. I won't waste my time further arguing with a know-it-all who can't admit mistakes.

Axel said...

Copied for posterity from Scientia's blog:

lex

Q3 results for INTEL are in... huge pop in units, stability of price, huge increase quarter to quarter and YoY in profts.

The astounding thing is that Intel claim to have sold over 2 million quadcores in Q3 alone. I don't know what the average ASP of an Intel quadcore is but I imagine somewhere around $400. That means that Intel realized nearly $1 billion in revenue from quadcore alone for Q3.

And there were people claiming that quadcore wouldn't make up much of Intel's revenue base this year? I hate to say I told you so, but in the AMD: Limited Options entry on August 17, Abinstein claimed:

"Quad-core is very small percentage in 2007 desktop, anyway. With the introduction of Yorkfield this is expected to change."

And I responded with:

"No, Intel's guidance on quad-core adoption in 2007 is sure to change in their next earnings call. Already 204 reviews of Q6600 on Newegg, these are selling like mad."

Now the majority of this 2 million was probably server, but the sheer size of the number clearly indicates that Intel's strategy is to push quadcore into the mainstream in order to strangle AMD on capacity. Intel evidently have the capacity to do it and are succeeding quite well at this strategy.

Scientia from AMDZone said...

GutterRat

"Simple. It's called abstraction and the least common denominator principle. Heard of them? It does wonders when properly applied to software such as Operating System code."

Hardware abstraction takes place above the compiler level. This is commonly used for driver interface code, but your reference is presumably to OS kernel hardware abstraction. This is commonly necessary with both NT and Unix to allow common kernel code to run with varying types of memory management. It has nothing to do with an abstraction of the machine code itself.

Abinstein is talking about running the same compiled version rather than something recompiled. None of your references to abstraction had anything to do with this. I can't say for certain whether or not Abinstein was correct about the machine code version because I don't know how it was done. For example, I think the standard installer is sophisticated enough to do some kernel compilation during the install itself.

Getting back to your earlier reference to the least common denominator principle, this again would not apply. This basically involves running a lower version of the code until you identify the processor. For example, you could be running 80386 code until you find out the specific CPU version. I don't think you will find an installer that runs in 64-bit mode. After the CPU is identified you can compile a new kernel image and then run it after rebooting. Least common denominator in reference to machine code is again a compiler (or assembler) level function.

"Professors at CMU, Michigan, Stanford, and Berkeley have all had a good laugh at your expense."

If they were laughing in agreement with your comments about abstraction then you must have done a poor job in describing the topic.

GutterRat said...

abinstein spewed again,

What % of FreeBSD is assembly vs portable C code, toolbait?

Traditionally, BSD make flags for gcc have used -O which generates portable code. And since I can use the Intel compiler to build FreeBSD, I can turn on specific flags to target use of instructions that AMD does not support.
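For example (flag names roughly from memory, so check the docs): a stock build with something like gcc -O2 produces generic code any x86 chip can run, while the Intel compiler's -x processor-targeting switches let it emit paths using newer SSE extensions (SSSE3 and the like) that the AMD chips of the day don't implement.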

As far as assembly is concerned: same principle applies: common denominator = lowest common denominator instructions.

Why you continue to embarrass yourself in this manner is beyond me...

ROFLMAO

Scientia from AMDZone said...

lex

AMD increased their estimated earnings for Q3. I'm not going to get into this too much right now because AMD will post its earnings in detail tomorrow and we can start a new discussion.

Scientia from AMDZone said...

gutterrat

"What % of FreeBSD is assembly vs portable C code"

The original unix was about 10% assembler. With increases in the size of the OS code I would assume that the percentage of assembler is less today.

"Traditionally, BSD make flags for gcc have used -O which generates portable code."

Which has nothing to do with kernel code.

"And since I can use the Intel compiler to build FreeBSD, I can turn on specific flags to target use of instructions that AMD does not support."

You don't seem to realize that you are now contradicting your earlier claims.

"As far as assembly is concerned: same principle applies: common denominator = lowest common denominator instructions."

Again, this only matters during installation. Once you've determined what CPU you are running on, you can use different machine code versions. Even inline assembler can have multiple versions.
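As a toy sketch (my own, with made-up names) of what "multiple versions" looks like in practice: identify the CPU once, then route every call through a pointer to the chosen implementation.

#include <stddef.h>
#include <stdio.h>

/* Baseline version: plain C, runs on any x86. */
static void copy_generic(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Stand-in for an SSE2/inline-asm version; same behaviour here, just a
   placeholder for the faster machine code. */
static void copy_fast(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Selected once at startup; defaults to the lowest common denominator. */
static void (*copy_best)(char *, const char *, size_t) = copy_generic;

int main(void)
{
    int cpu_has_sse2 = 1;  /* assume: result of a CPUID feature check */
    copy_best = cpu_has_sse2 ? copy_fast : copy_generic;

    char out[6];
    copy_best(out, "hello", 6);
    puts(out);
    return 0;
}

An installer that compiles a tuned kernel image is doing the same selection, just at build time instead of run time.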

Scientia from AMDZone said...

axel

"The astounding thing is that Intel claim to have sold over 2 million quadcores in Q3 alone."

This is not particularly astounding to me since it matches very closely to what I've talked about before. This is an increase in quad core volume from about 1% to about 3% of Intel volume.

"I don't know what the average ASP of an Intel quadcore is but I imagine somewhere around $400."

No. The average would be lower:

Right now we see quad tending to populate the price points for microprocessors that are sort of 150 and above

"That means that Intel realized nearly $1 billion in revenue from quadcore alone for Q3."

No; probably closer to half a billion.
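To put rough numbers on it: 2 million units at a $400 average would be $800 million, but at an average nearer $250 (consistent with quads populating the $150-and-up price points) the same 2 million units works out to only about $500 million.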

"Now the majority of this 2 million was probably server"

Probably not:

One of the things that Andy talked about in his margin comments was that our mix in desktop processors towards quad-core over the course of the quarter actually improved

"Intel evidently have the capacity to do it"

Or not:

The level of inventories is lower than I would like given the current outlook for demand, capacity utilization and the ramp of the 45 nanometer process

Assuming we meet the midpoint of our revenue range we will be below our comfort level.

After Q4, below the comfort level of inventories?

Yes.


" and are succeeding quite well at this strategy."

Or not:

we saw a little bit of bad news offsetting that with the write-offs on the 45-nanometer products that aren't yet qualified for shipment.

Unknown said...


AMD increased their estimated earnings for Q3. I'm not going to get into this too much right now because AMD will post its earnings in detail tomorrow and we can start a new discussion.


Do you have a link for this?

I couldn't find anything in AMD's press section here:
http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543,00.html

Thanks.

Re: 45nm processors, I doubt that Intel will be able to meet demand fully until FAB 28 in Israel is online next year. I predict D1D and FAB 32 will not be enough to meet demand.

Unknown said...

Additionally, if the PC market is doing well then AMD may report a smaller loss than was expected, perhaps around 300M. Even a loss of that size would be cause for concern, though; AMD needs to cut costs further and badly needs to raise more cash.

The analysts aren't predicting that AMD will return to profitability until 2009.

I still see it as a possibility that AMD will sell FAB 30 or possibly one of its business units to raise cash if things become really dire.

core2dude said...


Please.. you want to make yourself look more ridiculous? What "abstraction" is there between an AMD64 FreeBSD running on a Conroe vs. on an Opteron? Do you actually know what abstraction means? Have you actually looked at the FreeBSD kernel source for AMD64?

Have you ever heard of HAL--hardware abstraction layer? Every OS has it--even Windows.

Scientia from AMDZone said...

giant

The link for AMD's earnings release?

Financial Earnings Release for Third Quarter 2007

AMD announces its Q3-07 earnings after market close on Thursday, October 18, 2007.

----

With AMD's results we should be able to have a full discussion.

Scientia from AMDZone said...

core2dude

"Have you ever heard of HAL--hardware abstraction layer? Every OS has it--even Windows."

I've already talked about this. The kernel hardware abstraction in both Windows NT and Unix is primarily to handle different memory management schemes. This does not mean that the machine code itself is abstracted.

Once again, Abinstein implied that he had run the same kernel on both AMD and Intel hardware. I can't comment on this without knowing the details of how he did it. However, kernel-level hardware abstraction has nothing to do with this since AMD and Intel CPUs do memory management in the same way.

core2dude said...


However, kernel-level hardware abstraction has nothing to do with this since AMD and Intel CPUs do memory management in the same way.


Not quite! The MSRs could be different. The number of MTRRs supported could be different. Memory types could be different. Depending on whether it is hyperthreading or true multi-core, the restrictions on MTRRs could be different. If it is an older CPU, it does not support SYSENTER/SYSEXIT. This differs between AMD and Intel. Intel started supporting PAE long before AMD did. All these differences between the CPUs have to be sorted out, and that is the responsibility of the HAL.

You can run the same kernel, no doubt. But depending on the CPU, it takes different paths.

In fact, on Mac, you can execute the same universal binary on Intel-based Mac and Power-based Mac. But depending on the CPU, the execution path is completely different.

When it comes to system programming, AMD and Intel CPUs are not as identical as most people consider them to be. And when it comes to VMM programming, they are significantly different. But the same Xen VMM build runs on both of them.
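
To make the "different paths" point concrete, here is a minimal user-space sketch (my own illustration, assuming GCC inline assembly): the SEP bit in CPUID leaf 1 is roughly what an OS consults before relying on SYSENTER/SYSEXIT.

    #include <stdio.h>

    /* CPUID leaf 1: EDX bit 11 (SEP) reports SYSENTER/SYSEXIT support. */
    static int has_sysenter(void)
    {
        unsigned a, b, c, d;
        __asm__ volatile("cpuid"
                         : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                         : "a"(1));
        return (d >> 11) & 1;
    }

    int main(void)
    {
        puts(has_sysenter() ? "SYSENTER/SYSEXIT available"
                            : "fall back to the INT-based system call path");
        return 0;
    }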

enumae said...

I, and I believe Giant, are interested in your comment that AMD increased their estimated earnings for Q3. Do you have a link for this?

abinstein said...

"Have you ever heard of HAL--hardware abstraction layer? Every OS has it--even Windows."

The kernel uses the same HAL code to run on both AMD64 and EMT64. The difference between the two is a few instructions and status bits. 99.9% of the code runs interchangeably - and that is why there are no separate OS versions for them in ALL major OSes.

Again, the Barcelona and Clovertown cores are 99.9% compatible; their differences are in the microarchitectures.

Unknown said...

I would also like to see a link where AMD raised their Q3 estimates.

GutterRat said...

scientia wrote

"Traditionally, BSD make flags for gcc have used -O which generates portable code."

Which has nothing to do with kernel code.


What makes you think 'kernel' code has to be written in assembly? Hello?

I've already talked about this. The kernel hardware abstraction in both Windows NT and Unix is primarily to handle different memory management schemes.

Are you saying that the primary purpose in the Windows HAL is to handle different memory management schemes? That's bogus.

"And since I can use the Intel compiler to build FreeBSD, I can turn on specific flags to target use of instructions that AMD does not support."

You don't seem to realize that you are now contradicting your earlier claims.


What contradiction?

"As far as assembly is concerned: same principle applies: common denominator = lowest common denominator instructions."

Again, this only matters during installation. Once you've determined what CPU you are using, you can use different machine code versions. Even inline assembler can have multiple versions.


Wrong you are. I can compile a program to statically depend on a minimum version of an ISA, use specific instructions based on CPUID at runtime, or whatever combination I'd like.
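
A small sketch of that combination, assuming GCC with an SSE2 static baseline (e.g. built with -msse2) and a runtime CPUID probe for SSE3 before the optional path is taken; the structure, not the particular flags, is the point:

    #include <stdio.h>

    /* Compile-time side: refuse to build unless the assumed minimum ISA was
     * requested (gcc -msse2 defines __SSE2__). */
    #ifndef __SSE2__
    #error "build with at least -msse2; SSE2 is the static baseline here"
    #endif

    /* Runtime side: CPUID leaf 1, ECX bit 0 reports SSE3. */
    static int has_sse3(void)
    {
        unsigned a, b, c, d;
        __asm__ volatile("cpuid"
                         : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                         : "a"(1));
        return c & 1;
    }

    int main(void)
    {
        puts(has_sse3() ? "taking the optional SSE3 path"
                        : "staying on the SSE2 baseline path");
        return 0;
    }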

Unknown said...

Considering that AMD has refused to give earnings estimates for years, I would also like to see where they raised their estimates.

abinstein said...

gutterrat -
"What makes you think 'kernel' code has to be written in assembly? Hello?"

Which kernel code does not have assembly?

"Are you saying that the primary purpose in the Windows HAL is to handle different memory management schemes? That's bogus."

HAL can be used to handle different memory management as well as I/O access schemes.

At any rate, the HAL is a *much* higher layer than the architectural (non-)difference between Core 2 and Opteron. The fact that you'd bring it up to "show" that Core 2 and Opteron can be architecturally different only shows that you don't know what we (and probably not even you yourself) were talking about.

"And since I can use the Intel compiler to build FreeBSD, I can turn on specific flags to target use of instructions that AMD does not support."

It has nothing to do with processor architecture difference. I can compile my code to run on some specific video card, too. Does that also imply my AMD64 with one video card is architecturally different from my AMD64 with another video card?

Again, there is practically no difference between Core 2 and Opteron architectures. I'm sorry there are people whose minds are so thick that they simply can't get it.

"Wrong you are. I can compile a program to statically depend on a minimum version of an ISA, use specific instructions based on CPUid at runtime, or whatever combination I'd like."

You can do that, but you don't need to do that to make your OS run on both AMD64 and EMT64.

Also, kernels generally do not call cpuid to choose different code paths at runtime; they use static ifdef-else macros. In the case of AMD64, there is so little difference between Core 2 and Opteron that such ifdef-else macros are rarely used to distinguish between the two.
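
A minimal sketch of the kind of compile-time selection I mean (the CONFIG_ names here are illustrative placeholders, not taken from any particular kernel tree):

    #include <stdio.h>

    /* Hypothetical build-time options standing in for a kernel's processor
     * selection; exactly one would be defined by the build configuration. */
    #if defined(CONFIG_CPU_K8)
    static const char tuned_for[] = "AMD K8/Opteron";
    #elif defined(CONFIG_CPU_CORE2)
    static const char tuned_for[] = "Intel Core 2";
    #else
    static const char tuned_for[] = "generic x86-64";
    #endif

    int main(void)
    {
        /* The choice was made by the preprocessor; no cpuid call at runtime. */
        printf("kernel image tuned for: %s\n", tuned_for);
        return 0;
    }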

lex said...

The reason AMD doesn't give earnings estimates is that they have no clue and really have no control over their own destiny.

A few years ago they owned INTEL and controlled their own destiny. Instead of taking that once-in-a-lifetime opportunity to carve out a sustainable business plan, they instead chose to be arrogant and lazy like their hated blue men. Then the blue men promptly got their mojo back due to their infinite bank account and infinite engineering resources. Now there is nothing AMD can do, since there is no secret design AMD can unleash, no secret manufacturing edge, no secret technology edge. In fact, the manufacturing edge and the technology edge ALL belong to INTEL.

It's a wimp who can only come out after the fact and tell me whether things are good or bad. Actually, anyone who thinks AMD's Q3 results and Q4 guidance will change the larger way the wind is blowing is an IDIOT. Yes, an IDIOT who knows not what it takes to do leading edge CPUs across the board.

It is funny to keep yanking you limp AMD fanbois. AMD is finished; don't let a slight uptick or another disastrous multi-hundred-million loss with a good Hector/Dirk talk about a bright future get you a booner. You can rub it as hard as you like; it won't change the fact that you are still going to go limp. AMD has no mojo, plain and simple.

GutterRat said...

abinstein again,

The kernel uses the same HAL code to run on both AMD64 and EMT64. The difference between the two is a few instructions and status bits. 99.9% of the code runs interchangeably - and that is why there are no separate OS versions for them in ALL major OSes.

First, there is no such thing as EMT64. Get your nomenclature correct.

Second, you are generalizing on the HAL and the percentage of code that is 'interchangeable' between Intel and AMD. Prove it.

Can you prove that the same rhetoric holds water with the custom HALs that OEMs like IBM, HP, Unisys, deliver?

You think it's just a matter of a few instructions and a few status bits being different?

Have you been exposed to the Windows sources? Maybe I have. You know, they let real universities have Windows source access :)

*snicker*

LOL

GutterRat said...

core2dude,

It's pointless trying to talk to people that don't have a clue about how to develop a real operating system to support mainstream CPU features like Intel's.

Yeah, I'd like to know how pre "rev 10h" AMD CPUs could have used monitor/mwait in the idle loop instead of hlt.

ROFLMAO.

abinstein said...

core2dude -

Stop the nonsense, will you?
When is the HAL used to handle MSR/MTRR differences in the kernel? I don't know how Windows does it, but at least it's not the case for Linux/BSD. The kernel deals with different processor architectures in just one way: it executes different code explicitly. No abstraction can help you there because the architectural differences lie right below the kernel.

"The MSRs could be different. The number of MTRRs supported could be different. Memory types could be different."

These have nothing to do with processor core architectures. They just add or remove features. MSR means model-specific registers. When did a different model mean a different architecture? MTRR concerns memory typing, and is obviously outside the processor core architecture.

abinstein said...

gutterrat -

I don't know what you are trying to do, but asking pointless questions cannot make what you said correct. The fact is Intel cloned AMD64 and there is practically no architectural difference between the cores of Conroe and K8.

A HAL is a design made by software vendors. If they only needed to care about AMD64 (and Core 2), they wouldn't need a HAL at the processor level at all.

"You think it's just a matter of a few instructions and a few status bits being different?"

Again, you have the bad reading skills of a typical Intel fanboi. The difference between AMD64 and EM64T is just a few instructions and special registers. The difference that the HAL must deal with is of course much larger.

You try to dodge the fact that you are wrong, but you can't hide from truth.

"Have you been exposed to the Windows sources? Maybe I have. You know, they let real universities have Windows source access :)"

What does that have to do with the core difference between Conroe and K8? Just because you may have been "exposed" to Windows source code, Conroe miraculously becomes architecturally different from K8?

Unknown said...

Would like to see where AMD raised their Q3 estimates.

Don't keep us hanging Scientia.

abinstein said...

nylonox -
"The example I gave illustrates a point, which apparently you refuse to accept."

The example you gave is wrong, and I told you why. I'm sorry, but the one who can't accept a mistake is you.

I have never said bigger core size absolutely has better IPC. My original statement is this:

If we assume both AMD and Intel designers did their homework and made efficient use of die area, then the core size should at least indicate the relative IPC of these cores.

The argument here is that at the core level, Clovertown and Barcelona have almost identical (instruction set) architectures and use similar microarchitectural techniques; thus, if each microarchitecture is as good as the other, then their core sizes should indicate relative IPC.

In other words, if a smaller core achieves better IPC, then the smaller core designer did a better job.

This is what I said. Now I understand there are people who can't understand subtle concepts, but saying something that I never said and then asking me to admit I was wrong in saying it is not just ridiculous but absurd.

GutterRat said...

abinstein

The fact is Intel cloned AMD64 and there is practically no architectural difference between the cores of Conroe and K8.

You are giving opinions.

AMD cloned Intel's 386 with its am386.
AMD has a cross-license with Intel.
Microsoft developed x86-64.
AMD didn't develop 64 bit computing.
AMD didn't invent IMC.

Get over it.

Unknown said...

GutterRat, how is that an opinion?

Lex, quit being so general and obnoxious. I'm very disappointed in the fact that I actually bothered to read your comment.

Jack, just because scientia hasn't posted since 3:00 doesn't mean he's avoiding giving proof. Chances are he actually has a life, or is busy. Just calm down and wait.

GutterRat, I'm fairly certain that the nature of your argument actually places the burden of proof on you. So try avoiding fallacy for a second and make a decent argument.

GutterRat said...

greg,

Bzzt! I don't think you have editorial color here. Back at you.

core2dude said...


These have nothing to do with processor core architectures. They just add or remove features. MSR means model-specific registers. When did a different model mean a different architecture? MTRR concerns memory typing, and is obviously outside the processor core architecture.

Huh? So ISA is outside core architecture?

First you claim that there is no abstraction--and the example you give is you can execute the same kernel on both processors. I point out ISA differences, and then you claim it is outside core architecture???

Are you really that dumb, or do you just like annoying people?

And BTW, the fact that you spell out the acronym for MSR just indicates that you googled it at the last minute to figure out what it was. No one in the OS world would ever spell that out. In fact, half of the system programmers wouldn't even remember the acronym--they would just know what it is.

And what about PAE? That is outside core architecture as well?

Retard!!!

Intel Fanboi said...

Abinstein first said:

"If we assume both AMD and Intel designers did their homeworks and made efficient use of die area, then the core size should at least indicate the relative IPC of these cores."

Which he reaffirmed with the following statements:

"If two compatible cores with the same area have different average IPC, then the lower-IPC core has a poorer design."

"We can compare how good a design their cores have with respect to their different sizes."

"Barcelona is understandably slower due to its smaller core size."

"Core size should somewhat correlate to IPC"

"If Barcelona with its 16% smaller core size reaches higher average IPC than Core 2, then we know that Intel designers didn't do a good job on Core 2."


But then made the following contradictory statements:

"I'm not the one who started core-to-core comparison"

"I have never said "die size = IPC"."

I also have never said "Intel's engineers are not doing a good job".

"I have never said bigger core size absolutely has better IPC"


For the love of Pete, just admit you are wrong!

And I would like to add the following. You say "There is no flame nor passion for any company, but passion for truth." This is why I will read Scientia's blog and not yours. He is an AMD fanboi and will present the AMD perspective, but at least he ADMITS it (as I do with my nom de plume). Thus I am happy to read this blog, which serves my purpose as a "Devil's advocate". Thank you Scientia. Of all your claims, the idea that you are not a fanboi is the most unbelievable. NO ONE believes you are not an AMD fanboi.

Unknown said...

hey Intel Fanboy... just leave Abinstein alone.

First he lost his credibility by flaming an individual from AMDZone who, as it turned out, is more respected than Abinstein.

Secondly, his theory that core size is directly related to IPC is not endorsed, and/or understood, by most individuals here. Heck, even Scientia wouldn't agree with him. IMO, Scientia is more respected than Abinstein.

What can I say, Abinstein? You brought it upon yourself. Maybe you should befriend Sharikou instead. I guess you guys not only share the same passion, but also the same view.

Scientia from AMDZone said...

enumae

I didn't phrase that quite right. Rather than saying that AMD increased their earnings estimate, it should have been that estimates for AMD's earnings have increased. The numbers I've seen are half the previous loss of $600 million with only a small increase in revenue.

As I said before, we get the real numbers today.

Scientia from AMDZone said...

intel fanboi

"NO ONE believes you are not an AMD fanboi."

People can believe as they like. If it makes you feel better to believe I am heavily biased in AMD's favor then let nothing dissuade you. If you need to think that I post cherry picked news items favorable to AMD and unfavorable to Intel then don't let the absence of such posts slow you down.

Do your best to visit the sites that do post cherry picked items that favor Intel and make yourself believe that these are in fact the unbiased points of view. If this is what it takes for you to be happy and well adjusted then don't let reality get in your way.

Scientia from AMDZone said...

Finally, I hope anyone who wants to post has gotten the personal attacks out of their system; they won't be in the next thread.

Ho Ho said...

Couldn't resist posting here, sorry. It's just way out of control with all sorts of questionable claims coming from people.


scientia
"People can believe as they like."

I wonder what made me so special that you banned me once for much less ...


Btw, abinstein, VIA CPUs are also compatible with Core 2/K10 as they also run regular x86 code without changes. By your theory they are relatively close to other x86 CPUs and die size should show IPC quite nicely.

I once actually thought you had a slight idea of what you were talking about.

AndyW35 said...

I would hope that with extra income from HD 2900 cards and some money from K10 the values would be a lot closer to 0 than -600m this time around.

My guess would be in the 300-400M loss range; if it's still around a 600M loss then that is worrying.

Nylonox said...

abinstein
The example you gave is wrong, and I told you why. I'm sorry, but the one who can't accept a mistake is you.

Where did you show I was wrong? Do you deny that a smaller core can be designed with higher IPC if clock frequency is disregarded?

Beyond that simple example, there are MANY other factors.

Do RAS features contribute to die size? - yes.

Do DFx features contribute to die size? - yes.

Does clocking strategy contribute to die size? - yes.

Do PM features contribute to die size? - yes.

Do virtualization features contribute to die size? - yes.

Do specific process technology characteristics (and I don't mean process node) contribute to die size? - yes

Do cell library differences contribute to die size? - yes

Does designing to meet certain SER goals contribute to die size? - yes

Does register file and other memory array design contribute to die size? - yes

Do ANY of these contribute to IPC? NO. I could go on...

Stop being stubborn and admit you are wrong.

Scientia from AMDZone said...

ho ho

There was a lot of name calling in this thread by a lot of posters. It won't be in the next thread.

andy

Yes, that is what I said: a loss of about half, $300 Million instead of $600 Million. We'll see soon.

nylonox

I think you would agree that increasing IPC within a given processor family does require more transistors. This is what I said above about IPC and die area being generally related.

However, I agree with you that this is not the only thing affecting die area. Often, you need more transistors to increase clock speed as well (unless you reverse your example and seek clock speed without regard to IPC).

Even as similar as Core 2 and Barcelona are, I'm doubtful that we could expect a one-to-one die-to-IPC ratio even if we ignore the cache size.

abinstein said...

"Where did you show I was wrong? Do you deny that a smaller core can be designed with higher IPC if clock frequency is disregarded?"

You actually gave two examples: one compares a clock-centric system to an IPC-centric system; the other talks about a 1kHz processor with massively large IPC.

The first example is wrong, as I have said, because it does not reflect the core comparison between Barcelona and Clovertown.

The second example is wrong because even making a 1kHz processor with "massively large" IPC is going to cost you a lot of transistors. If n represents the degree of ILP in your processor, the number of pipeline registers you save from the lower clock rate shrinks as O(n), but the complexity and number of transistors you need to add to raise IPC grow as O(n^2).

Finally (and this goes to the other guys, too), yes, die size is affected by many things other than IPC, such as cache size, IMC, scalability, routing, and additional features (clock gating, SIMD, ...). But when Barcelona has a smaller core, it is no surprise that it reaches lower single-threaded IPC on average.

OTOH, much of Barcelona's die area is spent on things outside the core, and not incidentally, a 2.5GHz Barcelona scales up to 50% better than a 2.93GHz Tigerton on multi-processor performance.

abinstein said...

"Btw, abinstein, VIA CPUs are also compatible with Core2/K10 as they also run regular x86 code without changing. By your theory they are relatively close to other x86 CPUs and die size should show IPC quite nicely."

Do you know the core size of any VIA processor? I'm quite sure a VIA processor with a smaller core size will reach lower average IPC as well.

BTW, none of VIA's processors is compatible with x86-64 at this moment.

Also, note that it is relatively easy to design a processor with high IPC on some particular benchmark. However, the point here is an average IPC over a balanced mix of (single-threaded CPU) benchmarks.

Until people get the subtlety in dividing & manipulating complex concepts... there's little room for higher understanding.

abinstein said...

yomamafor2 -
"Secondly, his theory of core size is directly related to IPC is not endorsed, and / or understood, by most individuals here. "

Your poor reading skills and comprehension again lead you to say things that I've never said and blame me for them.

I have never said core size is directly related to IPC. I said core size is indicative of relative IPC, given some quite strong conditions.

I'm sure you very much want to spread FUD (like how you "quote" me above) to paint me black. That unfortunately only shows your own level of ignorance.

Ben said...

Abinstein, that's fp_rate not fp, although the Xeon does have 2*the memory and a faster disk subsystem...

abinstein said...

Intel fanboi -
"For the love of Pete, just admit you are wrong!"

What I said is all there; just go check and read. What I "reaffirm" and what I "deny" are plainly different, and it's not my fault if you can't reach a more subtle understanding of the topic. OTOH, you keep putting words into my mouth and asking me to admit I was wrong in saying them, yet the only person saying the wrong things is you.

"Of all your claims, the idea that you are not a fanboi is the most unbelievable. "

I don't ask you to believe it, I only ask you to get a better understanding (which apparently you have none) of the thing under discussion.

BTW, you probably don't want to read my blog because you don't want to face the truth. If you think there's anything that's not true, then speak up in the comment area, rather than making false accusation somewhere in the corner.

OTOH, if you can't, or don't understand my blog (and I don't blame you), then you can't and you don't. It has nothing to do with anyone being any fanboi or not.

abinstein said...

ben -
"Abinstein, that's fp_rate not fp, although the Xeon does have 2*the memory and a faster disk subsystem..."

Yes, and that's why it's a multi-process benchmark, and the "fp_rate" is helped by things outside the Barcelona core, i.e., HT, IMC, etc.

And yes, the faster disk will help Xeon system, but probably just marginally observable.

abinstein said...

core2dude -
"First you claim that there is no abstraction--and the example you give is you can execute the same kernel on both processors. I point out ISA differences, and then you claim it is outside core architecture???"

I'm tired... please just go read the kernel code of a freely available OS like Linux or FreeBSD, and tell me: where does the kernel use abstraction (a HAL?) when talking to the processor or accessing special registers?

You can't find any because the kernel does not use abstraction when talking to the processor. There is no room to abstract to.

BTW, different MSRs don't mean a different ISA. There is a thing called a model and a thing called an architecture. They are just different. Unless you believe P-Pro, P-II, P-III, P4, Core, Core 2, and Penryn all have different ISAs?


"And BTW, the fact that you spell out the acronym for MSR just indicates that you googled it last minute to figure out what it was. No one in the OS world would ever spell that out."

Gosh... no one spells out MSR? Just take a look at page 29 of AMD64 APMv2. Oh... I forgot, being such an Intel-only lover you must think AMD is "no one", too. Please then look at page 9-9 of Intel 64 SPGv3A.

I spelled out MSR just to show you were wrong. MSRs exist specifically to cover model differences under the same architecture.
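
And for what it's worth, this is roughly what direct, unabstracted MSR access looks like in kernel-style C with GCC inline assembly (a sketch only; RDMSR is privileged, so this only runs in ring 0, and IA32_APIC_BASE at 0x1B is used simply because it exists on both AMD and Intel x86-64 parts):

    #include <stdint.h>

    /* RDMSR reads the model-specific register selected by ECX into EDX:EAX.
     * It is a privileged instruction, so this only works in kernel context. */
    static inline uint64_t rdmsr(uint32_t msr)
    {
        uint32_t lo, hi;
        __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
        return ((uint64_t)hi << 32) | lo;
    }

    /* IA32_APIC_BASE (0x1B) is present on both AMD and Intel x86-64 processors. */
    #define MSR_APIC_BASE 0x1B

    uint64_t read_apic_base(void)
    {
        /* Nothing sits between this call and the hardware: no HAL, no dispatch. */
        return rdmsr(MSR_APIC_BASE);
    }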

lex said...

Q3 numbers are in!

Costs are up and margins are down. What a disaster for AMD!

They ramped 65nm, which should have enabled double the die output as well as an opportunity for higher-performance, higher-margin products. Instead, AMD delivered higher costs with lower margins while ramping next-generation technology.

Products are not competitive, the technology is not competitive, resulting in dropping margins as ASPs continue to fall to garage-sale levels.

AMD has no chance of recovering here as 45nm is a couple of years away. Barcelona and Phenom really have no chance to recover margins to the high 50s that are required just for AMD to break even. Penryn will force AMD pricing far lower than they originally planned. Pretty much, the roadmap they put in place two years ago never accounted for C2D performance/success, and now AMD has no plan.

How is AMD going to pay for the R&D it needs for 45nm silicon and products, let alone the two billion plus for a small 45nm manufacturing facility, on skinny margins of 40% or so for the next couple of years?


AMD can't make money during booming growth; how are they going to make money when things slow down in 2008, when oil hits 100 bucks, the Chinese stock market collapses, and we get a recession next year?

The direction the wind is blowing is clear. INTEL is sailing with the wind! AMD is getting blown away in a perfect storm.

Unknown said...

Let's not be so negative.
It's better than what I had expected (450-500Mil)...
Still a whopping amount, but not as bad as last quarter.

This puts AMD at around a $2 billion loss in one year. This is still a massive annual loss.

Axel said...

lex

Products are not competitive, the technology is not competitive, resulting in dropping margins as ASPs continue to fall to garage-sale levels.

The other big news of course is that AMD are shuttering Fab 30 until they can afford to upgrade it. In other words, they're down to one fab for the indefinite future. On the bright side, this will greatly reduce fixed operating costs.

Unknown said...

It's being shut down to BE UPGRADED.... not to wait.
It will open up as FAB38 once all the tools have been changed and it's converted from a 200mm to a 300mm facility.

enumae said...

Jack
...It will open up as FAB38 once all the tools have been changed and it's converted from a 200mm to a 300mm facility.


In listening to Hector in the conference call he says...

"But on the other side is as you heard today, is we have shut down fab 30, we are transitioning out to fab 38, and we're doing it in such a way that fab 38 is going to be like a race car idling in the pit stop. I mean, we will be prepared to ramp that quickly, should we need extra capacity in that space; which frankly, we hope we do.

But at this point in time, we're planning to have fab 38 at modest activity in 2008.
"

I would interpret that as meaning the 200mm tools will be sold and they will slowly install tool sets as capacity is needed, which would allow them to save money in the early quarters of 2008.

Scientia from AMDZone said...

lex

I assume you were joking when you posted (either that or you had been huffing paint thinner all day) :)

"Costs are up and margins are down."

When of course in reality costs are down and margins are up.

"AMD has no chance of recoverying here as 45nm is a couple years away."

And actually, 45nm is scheduled for the first half of 2008, which means late Q2. Naturally, nine months is a lot shorter than 24 months.

Scientia from AMDZone said...

axel

AMD is not shuttering FAB 30. The situation is exactly as I described it months ago.

The original plan was to leave FAB 30 in operation as it converted to 300mm. However, since FAB 36 is projected to be able to handle the volume, FAB 30 will indeed be taken all the way down. In other words, any outdated tooling will be sold and only pilot 300mm tooling will be installed.

This means FAB 38 will be operational at a low volume and can be incrementally expanded as needed.

Aguia said...

And actually, 45nm is scheduled for the first half of 2008, which means late Q2.

Want to make a bet with me that its TSMC 45nm and not AMD?
TSMC 45nm now ready for production

And Meyer said first half 2008 not late Q2:
Meyer said AMD is already building 45 nanometre microprocessors, and he claimed that it will start its production ramp of 45 nanometre processors in the first half of next year.

45 nanometre microprocessors

Scientia from AMDZone said...

Aguia

"Want to make a bet with me that its TSMC 45nm and not AMD?"

Absolutely. The statement about 45nm processors has nothing to do with TSMC. AMD does not refer to its TSMC products as microprocessors.

AMD is currently running test dies on its 45nm Immersion equipment. These test dies would be Shanghai. I would expect a tapeout announcement later this quarter. This is consistent with both the already building 45nm processor statement and the 45nm ramp statement.

Scientia from AMDZone said...

I've now posted a new article. We can continue discussions there.

Axel said...

jumpingjack
It's being shut down to BE UPGRADED.... not to wait.

Scientia
However, since FAB 36 is projected to be able to handle the volume, FAB 30 will indeed be taken all the way down.

No, I'm sorry but you gents are failing to read between the lines. If AMD had anticipated substantial continuing market share gains into 2008, they would not shut down Fab 30 entirely but instead would upgrade piecemeal while continuing 90-nm output in order to try to take more market share.

The bottom line is threefold:
- There isn't enough market demand for K8 to warrant the high fixed costs of operating Fab 30.
- AMD do not anticipate enough market demand for K10 in 2008 to warrant upgrading Fab 30 starting today.
- This returns AMD to one fab, with not much more die output capacity than they had in 2005 before the Fab 36 ramp, due to the rapidly increasing quadcore mix we will see in 2008.

In other words, AMD cannot afford to operate two fabs anymore and are doing this purely to save cost and wait and buy time for Bulldozer because K10 is not good enough to bring in the desired revenues. Yes they will slowly upgrade the tooling, the pace of which will mostly be dependent on the success of K10. I believe they are overly optimistic about the pricing they can command for K10 and in reality will not be able to bring Fab 38 on-line in 2008. I believe that AMD will be limited to a single fab throughout 2008, and will become severely capacity constrained as Intel drives demand for quadcore into the mainstream and AMD's product mix shifts towards 283 mm^2 die production.

And also, AMD said during the conference call that the details of "asset light" would not be revealed until they "do it". Info may be shared during December Analyst Day. I'm disappointed. The shareholders deserve to know what AMD plans in such a major restructuring.