I don't mind Apple products and the various tradeoffs that come with them, but if you hate them, I have good news for you. The reviews are coming in and at long last it seems like Qualcomm is making chips that can actually stand their ground against Apple's: if you get a laptop with a Snapdragon X Plus/Elite chip in it, you'll finally get the performance + ludicrous battery life that the Apple people have been lording over you since 2020.
It's so good. I genuinely have been waiting forever for a good ARM laptop, to the extent that I presently own a bad ARM laptop.
Re AMD and Intel: they both have ARM divisions that I expect to see doing more server stuff, although they are in a much worse position than if this had happened 10 years ago. They haven't kept up with some of the more dedicated ARM designers, and much more business software is open source now and will port trivially to ARM compared to back then. Also, I expect we'll see Nvidia's ARM chips cropping up more and more. One of my predictions is that in 1-2 games console generations we'll see an ARM Xbox/PlayStation, perhaps even with Nvidia graphics, now that they could OEM the whole package very easily, which used to be a big AMD strength.
I do think that this is going to basically eat the laptop market once it trickles down though, and Qualcomm has finally actually tried to mainline these chips so even Linux will work. They've been mainlining the 7c for a while, which is what's in my bad laptop, and it's mostly all there (although video decoding does not work very well :( )
Why does ARM have these advantages over x86? What does it actually do differently?
So, if you casually search for the answer to this you unfortunately mostly get platitudes and horseshit. I am not an experienced processor designer so I can only offer you one step up from platitudes and horseshit. With that said:
The basic processor you learn about in school is a very simple machine, the idea is that you have memory somewhere, and at t=0 you start reading the first instruction in memory, do what it tells you to do, and then the next one, and then the next one. Sometimes an instruction will tell you to read a value out of memory, or write a new one into memory, or jump the execution to a new point, or do something with the values you've been manipulating.
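To make that concrete, here's a minimal sketch of that school-textbook machine in Python. The instruction names (LOAD, ADD, STORE, JUMP_IF_NONZERO, HALT), the single accumulator register, and the named memory cells are all made up for illustration; no real instruction set looks exactly like this.

```python
# Toy machine: the program is a list standing in for instruction memory.
# All instruction names here are invented for this sketch.

def run(program, memory):
    """Execute `program` one instruction at a time and return the final memory."""
    acc = 0  # a single working register
    pc = 0   # program counter: index of the next instruction to read
    while True:
        op, arg = program[pc]
        pc += 1  # by default, carry on to the next instruction
        if op == "LOAD":                 # read a value out of memory
            acc = memory[arg]
        elif op == "STORE":              # write a new value into memory
            memory[arg] = acc
        elif op == "ADD":                # do something with the value we hold
            acc += memory[arg]
        elif op == "JUMP_IF_NONZERO":    # jump execution to a new point
            if acc != 0:
                pc = arg
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction {op!r}")


# Add `step` to `total` once for every count in `n`.
program = [
    ("LOAD", "total"),
    ("ADD", "step"),
    ("STORE", "total"),
    ("LOAD", "n"),
    ("ADD", "minus_one"),
    ("STORE", "n"),
    ("JUMP_IF_NONZERO", 0),  # loop back to the start while n != 0
    ("HALT", None),
]
memory = {"n": 3, "total": 0, "step": 5, "minus_one": -1}
result = run(program, memory)
print(result["total"])
```

Everything after this point in the post is, one way or another, about making that `while True` loop go faster.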
x86 and ARM represent Complex and Reduced Instruction Set Computing respectively, CISC and RISC. RISC and CISC are somewhat distinguished by how many instructions they have, with RISC generally having simple, composable instructions, whereas CISC has more, special-purpose instructions, especially when you count "mandatory" or "core" instructions.
These are not hard-and-fast categories but for all but the most niche architectures most people agree on which are which, for reasons which might become apparent if I do a good job here.
Most instruction sets, even RISC ones, have a ton of performance-enhancing extensions that can operate on wide vectors and do other accelerated operations, so the overall instruction counts can be closer than you'd expect. There are still noticeable differences though: for example, a minimal ARM processor has to implement a couple hundred instructions, while a minimal x64 processor has to implement well over a thousand. A minimal to-spec 32-bit RISC-V (RV32I) processor has only 47 base instructions.
In your toy baby processor, you execute one instruction at a time. The school version of this is the Fetch/Decode/Execute/Store cycle. You fetch an instruction from memory, decode it, execute it in your logic units, and then store back the results. In practice what this means is that at any moment, three of the four major components of your system are sitting idle, so we start pipelining. While one instruction is being executed, the one after it can be decoded, and the one after that can be fetched. If it turns out that the current instruction sends execution somewhere else (a jump, say), the instructions we already started on behind it are the wrong ones, so we stall the pipeline until we catch up, losing some performance in the moment but still gaining overall.
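A back-of-the-envelope way to see the gain is to just count cycles. This is a rough sketch with assumptions baked in: every stage takes exactly one cycle, and each taken jump wastes two cycles (the two instructions already fetched and decoded behind it get thrown away). Real pipelines are deeper and their penalties vary.

```python
STAGES = 4  # Fetch, Decode, Execute, Store


def cycles_unpipelined(n_instructions):
    """Each instruction walks through all four stages before the next one starts."""
    return n_instructions * STAGES


def cycles_pipelined(n_instructions, taken_jumps=0, flush_penalty=2):
    """One instruction finishes per cycle once the pipeline is full.

    `flush_penalty` is an assumed cost per taken jump, not a measured figure.
    """
    if n_instructions == 0:
        return 0
    fill = STAGES - 1  # cycles spent before the first instruction completes
    return fill + n_instructions + taken_jumps * flush_penalty


slow = cycles_unpipelined(100)
fast = cycles_pipelined(100, taken_jumps=10)
print(slow, fast)
```

Even with a stall on every tenth instruction, the pipelined count stays far below the one-at-a-time count under these assumptions.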
This is a good start, but there's another issue, which is that individual instructions do not all take the same amount of time to execute. CISC systems often begin with a dream of having really, really fast useful instructions that can be deployed by just using the right instruction call, you know, oh this popular encryption method uses an operation where we fold a number back on itself eight times and XOR it, so let's just do that in hardware. You can also think of it as a way to shrink code, if you have a microcontroller that can do a multi-step operation in one instruction.
Unfortunately very quickly you run up against limitations: implementing hardware logic for all your thousands of instructions uses up valuable silicon space, and a bigger chip is more expensive for several reasons (greater susceptibility to flaws and errors, lower yield per wafer, etc.)
So we do microcode. Microcode is when your processor contains a second, secret processor to do the actual work. When an instruction like "fold a number and XOR it" comes along, a smaller, simpler processor that is maybe more like a pile of state machines and ALUs can be driven by a sequencer built into the chip to execute the effective results of the intended instruction in a couple of cycles, generally (but not always) faster than a naive implementation but much slower than dedicated hardware.
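Here's a toy sketch of the idea in Python. The FOLD_XOR instruction, the three micro-op names, and the lookup table standing in for a microcode ROM are all invented for illustration; real microcode is far more involved and this is not how any particular chip lays it out.

```python
def micro_execute(uops, regs):
    """The simple inner engine: it only understands three tiny operations."""
    for op, dst, a, b in uops:
        if op == "SHR":
            regs[dst] = regs[a] >> b        # b is an immediate shift amount
        elif op == "AND":
            regs[dst] = regs[a] & b         # b is an immediate mask
        elif op == "XOR":
            regs[dst] = regs[a] ^ regs[b]   # b is a register name
        else:
            raise ValueError(f"unknown micro-op {op!r}")
    return regs


# Hypothetical "microcode ROM": one big instruction becomes several micro-ops.
MICROCODE_ROM = {
    # Fold a 16-bit value in half and XOR the two halves together.
    "FOLD_XOR": [
        ("SHR", "t0", "r0", 8),       # t0 = high byte
        ("AND", "t1", "r0", 0xFF),    # t1 = low byte
        ("XOR", "r0", "t0", "t1"),    # r0 = high ^ low
    ],
}


def execute(instruction, regs):
    """What the programmer sees as one instruction runs as a micro-op sequence."""
    return micro_execute(MICROCODE_ROM[instruction], regs)


regs = execute("FOLD_XOR", {"r0": 0xABCD})
print(hex(regs["r0"]))
```

The programmer wrote one instruction; the hardware did three small things with the same shifter and XOR unit it uses for everything else.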
If you want to know more about microcode, check out this C3 talk that goes into great detail about the x86 microcode, especially the front half.
This is slower than if you had huge expensive dedicated hardware, but you can shrink your dies by reusing hardware for multiple operations, and you can now pipeline microarchitecture-based operations to try and make up for it, squeezing multiple operations into your multi-purpose hardware. Intel started using microcode-like designs with the P6 architecture in the mid-90s; I mentioned this in my list of cool chips ("What are some of the coolest computer chips ever, in your opinion?").
This has some weaknesses though. If your CISC instructions are too complicated, they can start to become highly non-deterministic and take multiple cycles to execute, and now you need to dedicate a lot of processor design and effort to optimizing the pipelining and microarchitecture of these instructions to make more efficient use of your limited silicon. You leave yourself open to weird stalling and wasted compute cycles, and you spend a lot of silicon tracking all of this state and shuffling microoperations to try and make them run smoothly. Eventually you're spending so much silicon on managing microarchitecture that you lose any of the benefits you got by doing it.
RISC systems generally don't do microcode in the conventional sense, because their instructions are so limited that you can kind of (do not say this to a processor designer) think of them as being microcode already. You compose the operations together into the end result you want at compile-time, rather than fussing around with stupendously complicated CPU designs.
Because of this, the individual instructions are much more deterministic and break down into pipelineable operations much more cleanly, so when combined with your compile-time optimizations you don't have as much work to do at runtime, you can use relatively simple pipelines and still get high utilization and lower waste on less silicon.
The end result is that you can shrink your silicon, reduce your complexity, while still maintaining performance parity with the CISC processors, because you're both doing the same kind of thing. Less silicon and higher utilization means better performance within a smaller power envelope, you physically have fewer transistors to switch and they're not being left idle for as long.
So why didn't we just start with RISC? Well, historical reasons mostly, I think. CISC makes programming in assembly easy, because it hands the programmer nice "functions" built right into the silicon, and it trims program size by packing long operations into a single instruction. CISC comes from an era where optimizing your CPU to make assembly optimization easier and shaving some bytes off code by packing operations together made sense, and in hindsight that wasn't the best idea. This article roughly agrees with me.
Okay, so far I think most of this is pretty rock solid. This is where I talk out of my ass for a bit: The way I think about this is as the benefit of having insight into the purpose of code up and down the execution stack.
In CISC, you write some code, and your optimizing compiler will assemble it as best as it can to use these really big instructions. Those then get executed on a CPU that has no idea what that code was, or why it is laid out this way. Your CPU now has to try and use lookahead and other tricks to just guess what you were trying to do, and then pipeline the microoperations that make up your compiled code. If it guesses wrong, the pipeline stalls, stuff has to bubble through, branches fall apart, and you waste a bunch of time computing microoperations that were unnecessary.
In a RISC, you do a lot of this ahead of time. You write your code, and your optimizing compiler can strip out huge chunks of your operations at compile time, if it determines that half of those "microoperations (not that)" are not actually needed. It knows where return values end up, what gets written to memory, and so on. When the code gets to the CPU, it has already been corralled into a nice format for that CPU, which you know a fair amount about: you know what instructions it has and you know how it pipelines, because your instructions are simpler, so you can feed it really nice, easy-to-optimize code. The CPU, in turn, has a pretty good idea about what all those instructions do because they're very simple. If it didn't need to do them they would have been stripped out at compile time, so it can focus on just executing what it does get very efficiently.
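As a sketch of what "stripping out operations at compile time" can look like, here is a tiny dead-code elimination pass in Python. The instruction format (destination, operation, two sources) and the register names are made up, and the pass assumes straight-line code with no side effects; real compilers do this over much richer representations.

```python
def eliminate_dead_code(instructions, live_out):
    """Drop instructions whose results nothing ever reads.

    instructions: list of (dst, op, src1, src2); string sources are register
                  names, anything else is treated as an immediate value.
    live_out:     register names that must still be correct afterwards.
    """
    live = set(live_out)
    kept = []
    # Walk backwards: an instruction matters only if something later needs its result.
    for dst, op, a, b in reversed(instructions):
        if dst in live:
            live.discard(dst)
            live.update(s for s in (a, b) if isinstance(s, str))
            kept.append((dst, op, a, b))
        # Otherwise nothing later reads dst, so the instruction is dropped.
    kept.reverse()
    return kept


program = [
    ("t0", "add", "x", "y"),
    ("t1", "mul", "x", 4),        # only feeds t3
    ("t2", "sub", "t0", 1),
    ("t3", "add", "t1", "t1"),    # never read by anything
    ("ret", "add", "t2", "y"),
]
optimized = eliminate_dead_code(program, live_out={"ret"})
print([dst for dst, *_ in optimized])
```

Two of the five simple operations vanish before the CPU ever sees them, which is much harder to arrange when the unneeded work is buried inside one big instruction.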
The result of this is that RISCs are very popular in the modern day, where compilers are good, no one really writes much assembly, and a few dozen extra kilobytes of instructions in a program is fine. You can save power and silicon space while still getting high performance by optimizing your pipelining with simpler operations that increase processor utilization without requiring complex babysitting hardware to oversee the pipeline.
I will note that there's another reason not-quite-correlated with the CISC-vs-RISC designation here which is that x86 is old.
It's not impossible to design an efficient, easy-to-implement CISC processor! In fact, because of the nature of modern computers (where CPUs are relatively fast but memory is slow), saving instruction count can be a fairly important part of coding! Some people even argue that a well-optimized CISC design might be even faster than a RISC one.
x86 is... not that. In fact, it's sometimes difficult to argue that it's meaningfully a CISC instruction set in the way it's actually used. For example, the Intel optimization reference manual notes:
It is generally a good starting point to select instructions by considering the number of micro-ops associated with each instruction, favoring in the order of: single micro-op instructions, simple instruction with less than 4 micro-ops, and last instruction requiring microsequencer ROM (micro-ops which are executed out of the microsequencer involve extra overhead).
Which is to say "all these CISC instructions suck, please stop using them". If you look at code generated with an old compiler, like (ex) Turbo C or something, you'll note that the code looks very different - they use ENTER to set up stack frames, they use REPNE SCASB* to measure string length! Some even use LOOP** to (get this) write loops!
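For anyone who hasn't met it, here is roughly what the REPNE SCASB string-length idiom does, emulated in Python: zero AL, set ECX to all ones, let the instruction scan forward from EDI until it finds a byte equal to AL, then recover the length from how far ECX counted down. The register names are the real x86 ones, but the function name and the `bytes` object standing in for memory are just assumptions for this sketch.

```python
MASK32 = 0xFFFFFFFF


def strlen_repne_scasb(memory, start):
    """Emulate: XOR AL,AL / MOV ECX,-1 / REPNE SCASB / NOT ECX / DEC ECX."""
    al = 0          # the byte we are searching for (the NUL terminator)
    ecx = MASK32    # -1 as an unsigned 32-bit counter
    edi = start     # pointer to the current byte

    # REPNE SCASB: repeat while ECX != 0 and the last compare did not match.
    while ecx != 0:
        matched = memory[edi] == al   # SCASB compares AL with the byte at [EDI]
        edi += 1                      # ...and advances EDI
        ecx = (ecx - 1) & MASK32      # REP decrements ECX on every iteration
        if matched:
            break

    # ECX counted down once per byte scanned, terminator included.
    return (~ecx & MASK32) - 1


memory = b"hi\x00there\x00"
print(strlen_repne_scasb(memory, 0), strlen_repne_scasb(memory, 3))
```

One instruction hides an entire loop with a compare, a pointer bump, and a counter, which is exactly the kind of thing that is pleasant to write by hand and awkward to make fast in hardware.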
Turns out, all these nice instructions were great for writing assembly and some types of compilers, but were difficult to implement in hardware, so by the time of the original Pentium, they were on their way out. Lack of usage of these instructions (because they were slow) also meant that there was no impetus to make them run faster (because no programs used them) and so all of those fancy CISC instructions have basically been kept around only for legacy reasons, and a whole bunch of them were scrapped when the move to x86_64 came along. Some were already slower on the 386! The original x86 instruction set contains a truly insane number of legacy instructions that practically no one uses.
However, when instructions fall out of favor, there's still a massive body of programs that still use them! As much as operating systems love to keep backwards compatibility (Windows in particular being pretty famous for this in the win95 days), even more extreme are CPUs. After all - any code compiled for a previous x86 processor should just work(tm) in a new one! Which means...
It doesn't matter that probably a single-digit number of people have included the instruction "ASCII adjust AL after addition" (AAA) unironically in a program since the 286 came out, you still have to support it (in 16/32-bit mode, at least), eating into your precious, precious opcode budget, bloating up the instruction set and increasing code cruft.
All of this means that a modern x86_64 processor is basically almost always used in a fairly RISC-like fashion (lots of registers that are basically interchangeable in most cases, simple instructions, etc.), with maybe one exception being all the addressing modes, but honestly those might be comparable to just a macrofusion of a load/store and an operation. It just comes with lots of cruft. For example: on a modern x86_64 processor, the "first 4" registers (the ones that map to r0...r3, if you ever want to use those) are... rax, rcx, rdx, rbx. Why those names? Because of ease of code porting from the Intel 8080, a processor from 1974. Modern computer registers have funny names because of a 50-year-old processor.***
ARM, on the other hand, is a fairly blank slate! The registers in ARM are called... r0-r15. The last 3 can be sp, lr, pc, if you want, all registers with fairly well-defined names. (No more A for accumulator!) Because the ARM instruction set was designed later, drawing on academic work, it benefits from having a much more sane instruction set design, leaving behind the weird CISC cruft that even x86 has outgrown these days. Of course, because nothing in this world is cut and dried, even ARM and RISC-V aren't true RISC! If you write a load to a register and then a load to the high half of it, they will happily fuse them together and pretend you wrote a longer instruction - i.e. pretend it's a variable-length one!
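A sketch of that fusion trick, written as a little peephole pass in Python. MOVW and MOVT are real 32-bit ARM instructions (load the low 16 bits, then the high 16 bits, of a register), but the internal MOV32 operation and the tuple format are invented here, and which pairs a given core actually fuses is an assumption that varies by design.

```python
def fuse_constant_loads(instructions):
    """Merge MOVW rd, #lo followed by MOVT rd, #hi into one internal operation."""
    fused = []
    i = 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if (
            nxt is not None
            and cur[0] == "MOVW"
            and nxt[0] == "MOVT"
            and cur[1] == nxt[1]      # both halves target the same register
        ):
            _, rd, low = cur
            _, _, high = nxt
            # Pretend the program contained one wider "load 32-bit constant".
            fused.append(("MOV32", rd, (high << 16) | low))
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused


program = [
    ("MOVW", "r0", 0xBEEF),           # r0 = 0x0000BEEF
    ("MOVT", "r0", 0xDEAD),           # r0 = 0xDEADBEEF
    ("ADD", "r1", "r1", "r0"),
]
fused = fuse_constant_loads(program)
print(fused)
```

Two fixed-width instructions go in, one wider operation comes out, which is the "pretend it's a variable-length instruction" behaviour in miniature.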
In short, ARM winning over x86 isn't just a "RISC winning over CISC" thing - modern x86 is plenty RISC already, and compilers (and processors) are always looking for more freedom to do CISC-like things anyways, even in a RISC environment****. x86 is just an incredibly old instruction set architecture that slowly metastasized out of a cute instruction set in the 1970s to become the world-consuming monster it is today. RISC does make more sense to use in a modern format! Compilers are really good at optimizing code and building a new CISC ISA would get you labeled a bit insane anyways! But x86's main weakness isn't the fact that it was originally CISC! It's the fact that it's a legacy, poorly thought-out design.