Learning Assembly: Arithmetic

If you’re in coding at any level, you have some familiarity with arithmetic. Most of the arithmetic you’re likely to do in assembly will be familiar, and that should be no surprise at all. Some operations may be unfamiliar and some familiar operations may have nuances that feel funny at first, but trust me that the nuance is there for a reason.

Fundamentally, there are three models of arithmetic in assembly language and the differences all have to do with a pair of simple questions:

  1. Where do the parameter(s) come from?
  2. Where does the result go?

There are three variations and each is a logical extension of the one before it.

Accumulator Model

In the accumulator model, there is a dedicated register (or even two) which is used as the result for (nearly) all arithmetic and often as one of the parameters. A simple example is basic addition. You might see some code like this:

LDA Position
ADD #$5
STA Position

In this example, we’re reading the value of Position into register A (LoaD A). If this is unfamiliar to you, see the previous post on moving memory. Then we add 5 to it and store it back into Position. So, in essence, this is equivalent to:

Position += 5;

You’ll notice that we had to store the result explicitly. This is because the result ends up in register A. The accumulator model is very simple, but your code feels particularly wordy. That’s perfectly fine. It’s just the lay of the land.
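If it helps to see the load/add/store sequence run, here is a rough sketch of a toy accumulator machine in Python. The class and its method names are made up for illustration, and the 8-bit wraparound is an assumption rather than a property of any particular CPU.

```python
# Toy accumulator machine. Names are illustrative, not a real CPU's API.
class AccumulatorCPU:
    def __init__(self, memory):
        self.a = 0            # the single accumulator register
        self.memory = memory  # named memory locations

    def lda(self, name):
        # LDA name: copy a memory location into A
        self.a = self.memory[name]

    def add_imm(self, value):
        # ADD #value: A = A + value, wrapped to 8 bits (assumed width)
        self.a = (self.a + value) & 0xFF

    def sta(self, name):
        # STA name: copy A back out to memory
        self.memory[name] = self.a


memory = {"Position": 10}
cpu = AccumulatorCPU(memory)
cpu.lda("Position")
cpu.add_imm(5)
cpu.sta("Position")
print(memory["Position"])  # prints 15
```

Until the final store runs, only the register has changed, which is why the explicit STA is needed.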

Second Is Result

In this model, the second parameter is also the result. If I rewrite the previous example in this way it will look like this:

ADD #$5, Position

Second Is Result differs from the accumulator model in that no register gets directly modified, which is a nice convenience. It’s similar in that the CPU is going to do nearly an identical amount of work in both cases. Second Is Result is very common in current CPUs.

Third Is Result

In this model, the first and second elements are the parameters and the third is the result:

ADD #$5, Position, Position

Again, this is equivalent to both previous models and on the face of it doesn’t look particularly useful, since in this example the result is the same as one of the parameters. But what if instead we started from C code like this:

newPosition = oldPosition + 5;

In this case, the assembly would look like this:

ADD #$5, oldPosition, newPosition

From this point of view, it looks like a very handy format, but there are problems: the places where you would actually use all three operands aren’t that common, and the instruction decoding that the CPU needs to do starts to get more complicated. You can see that you could write this using Second Is Result with the following:

MOV oldPosition, newPosition
ADD #$5, newPosition

Or with Accumulator Model like this:

LDA oldPosition
ADD #$5
STA newPosition

Note that the complexity of the accumulator model has not changed at all. The important takeaway is that there are usually very direct mechanisms for implementing basic arithmetic as expressed in programming languages. This shouldn’t be a surprise, since assembly languages have influenced high level languages, which in turn informed assembly languages.
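To make the equivalence concrete, here is a small Python sketch of the two-operand and three-operand forms working on a dictionary that stands in for memory. The function names add2, add3 and mov are hypothetical helpers for this illustration only.

```python
def mov(memory, src, dest):
    # MOV src, dest: copy one location to another
    memory[dest] = memory[src]

def add2(memory, value, dest):
    # Second Is Result: ADD #value, dest
    memory[dest] = memory[dest] + value

def add3(memory, value, src, dest):
    # Third Is Result: ADD #value, src, dest
    memory[dest] = memory[src] + value

# newPosition = oldPosition + 5, done both ways
two = {"oldPosition": 10, "newPosition": 0}
mov(two, "oldPosition", "newPosition")
add2(two, 5, "newPosition")

three = {"oldPosition": 10, "newPosition": 0}
add3(three, 5, "oldPosition", "newPosition")

print(two == three)  # prints True
```

Both versions end with the same memory contents. The three-operand form simply folds the move into the add.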

Besides addition, what’s available? It depends on the CPU, but you can expect to see instructions that do the following on lower end CPUs:

  • addition
  • subtraction
  • logical and
  • logical or
  • logical exclusive or
  • bit shifting left or right
  • bit rotation
  • complement
  • negation

On higher end CPUs, you would expect to see all of the previous as well as the following:

  • multiplication
  • division
  • floating point versions of arithmetic
  • sign extension
  • conversion to/from floating point

Where Things Get Funny (not funny ha-ha)

When you’re working with a budget CPU, you might not always have multiplication or division. Sucks to be you. You can implement them yourself, and the algorithms are straightforward, but multiplication is slow and division is slower.

The CPU designers also knew that it was likely that the CPUs would have to interact with people, and people tend to be much happier with base 10 than base 16, so they included a means of doing base 10 called Binary Coded Decimal, or BCD. In BCD, each 4-bit nibble of a byte represents one decimal digit, so you are limited to 0-9 in each nibble. Unfortunately, BCD tends to be pretty clunky. On the 6502, you first had to put the CPU in a special mode with the instruction SED for “SEt Decimal”. This sets a bit in the processor, and whenever you did addition or subtraction, if the operation would leave a value greater than 9 in a nibble, the CPU would adjust the result. Once done, you had to undo the SED instruction with the CLD instruction (CLear Decimal), otherwise you would affect your later code. The 6800 processor instead had a special bit (the half-carry flag) that recorded a carry out of the low nibble, and you would use a special instruction afterward (DAA, decimal adjust) to fix the result. Also gross. Both of these methods, however, were far cheaper to implement than multiplication and division.
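The adjustment itself is easy to sketch. The Python below adds two packed BCD bytes one nibble at a time, adding 6 whenever a digit goes past 9 so that it skips the six unused codes A-F. The function name bcd_add is made up for this example, and it is a model of the idea rather than of any specific CPU's exact flag behavior.

```python
def bcd_add(a, b, carry_in=0):
    """Add two packed BCD bytes (two decimal digits each).

    Returns (result_byte, carry_out). Assumes both inputs are valid BCD.
    """
    # Low digit first
    low = (a & 0x0F) + (b & 0x0F) + carry_in
    half_carry = 0
    if low > 9:
        low += 6          # skip the unused codes A-F
        half_carry = 1
    low &= 0x0F

    # High digit, including the carry out of the low digit
    high = (a >> 4) + (b >> 4) + half_carry
    carry_out = 0
    if high > 9:
        high += 6
        carry_out = 1
    high &= 0x0F

    return (high << 4) | low, carry_out


result, carry = bcd_add(0x19, 0x28)   # decimal 19 + 28
print(hex(result), carry)             # prints 0x47 0
print(hex((0x19 + 0x28) & 0xFF))      # plain binary add prints 0x41
```

The last line shows why the adjustment is needed: a plain binary add of the same two bytes gives 0x41, which does not read as 47.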

Most CPUs have the ability to tell when things may have gone wrong or need attention. For example, after most operations there are flags that get set if the result was 0, if the result was negative, or if the result might have caused an overflow or underflow. I’ll talk about that more in a different post.
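As a preview, here is a hedged Python sketch of how those flags could be derived from an 8-bit addition. The function add8_flags and the flag names are assumptions for illustration. Real CPUs differ in which flags exist and which instructions update them.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags)."""
    total = a + b
    result = total & 0xFF
    flags = {
        "zero": result == 0,
        # top bit set means negative in two's complement
        "negative": bool(result & 0x80),
        # the unsigned result did not fit in 8 bits
        "carry": total > 0xFF,
        # both inputs had the same sign and the result's sign differs
        "overflow": bool((a ^ result) & (b ^ result) & 0x80),
    }
    return result, flags


print(add8_flags(0x7F, 0x01))  # 127 + 1 wraps to -128 as a signed value
print(add8_flags(0xFF, 0x01))  # 255 + 1 wraps to 0 as an unsigned value
```

The first call sets negative and overflow, the second sets zero and carry, so the same add instruction can signal different problems depending on whether you think of the bytes as signed or unsigned.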

Most higher end CPUs perform math on several different sizes of input. For example, an x64 processor can perform math operations on 8 bit, 16 bit, 32 bit and 64 bit inputs.

Finally, one thing you see in assembly that you rarely see in high level languages is the ability to cheaply do arbitrary precision arithmetic. The way this is managed is that the CPU has a flag in it that represents binary carry. If you add two values and the result is bigger than what fits in the machine word, then the carry flag gets set. If you subtract two values and the result would go below zero, the flag records the borrow (whether a borrow shows up as carry set or carry clear depends on the CPU).
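Here is a minimal Python sketch of that idea, assuming numbers are stored as lists of bytes with the least significant byte first. The function name add_bytes is hypothetical. The point is only that the carry out of one byte becomes the carry in to the next.

```python
def add_bytes(a, b):
    """Add two equal-length little-endian byte lists.

    Returns (sum_bytes, final_carry).
    """
    if len(a) != len(b):
        raise ValueError("operands must be the same length")
    carry = 0
    out = []
    for x, y in zip(a, b):
        total = x + y + carry      # add with carry
        out.append(total & 0xFF)   # keep the low 8 bits
        carry = total >> 8         # 1 if the byte overflowed, else 0
    return out, carry


# 0x01FF + 0x0001 = 0x0200, stored low byte first
print(add_bytes([0xFF, 0x01], [0x01, 0x00]))  # prints ([0, 2], 0)
```

Longer numbers work the same way: add more bytes to the lists and the loop keeps chaining the carry.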

Most CPUs have a separate instruction for addition with carry and without carry (or subtraction with borrow and without borrow). Why? Because you don’t always want to add with carry. Sometimes you just want to add numbers and ignore the carry. The 6502 was not one of those CPUs. It didn’t have separate add and add with carry instructions. Instead, it only had add with carry (ADC). This was irritating because if you ever wanted to do plain addition, you had to make sure that you cleared the carry flag first (CLC). If you forgot to do this, then your result might be off by one depending on what happened beforehand. The 6502 was the first CPU I coded for and I didn’t understand this at first, which led to some very perplexing bugs.
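The off-by-one is easy to reproduce in a small Python model. The adc function here is a hypothetical stand-in for a 6502-style add with carry in binary mode only. Decimal mode is ignored.

```python
def adc(a, operand, carry):
    """Add with carry on 8-bit values: returns (new_a, new_carry)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


# Carry left set by some earlier operation, and we forgot to clear it
stale_result, _ = adc(10, 5, 1)

# Carry cleared first, as a plain addition requires
clean_result, _ = adc(10, 5, 0)

print(stale_result, clean_result)  # prints 16 15
```

The same inputs give two different answers, and the only difference is a flag that some unrelated earlier instruction happened to leave behind.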

Why would you leave out multiplication and division? For the most part, it comes down to cost and limitations in the registers. In a typical 8-bit processor, you’re not likely to see either because your registers are typically 8 bit. When you add two 8 bit numbers, the result is at most 9 bits (8 bits plus the carry). When you multiply two 8 bit numbers, the result is at most 16 bits. Where does the result go? In the accumulator model, if you only have 1 accumulator (like the 6502), where do you put the other 8 bits? Furthermore, the 6502 implemented its math through dedicated silicon called an Arithmetic Logic Unit or ALU. All of the math in the 6502 is 8 bit plus carry. Having two arithmetic operations that need 16 bits is suddenly expensive. We’ll cover the ALU
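For a sense of what doing the multiplication yourself looks like, here is a minimal shift-and-add sketch in Python. The function name mul8 is made up, and returning the 16-bit product as two separate bytes is an assumption meant to mirror a machine whose registers only hold 8 bits each.

```python
def mul8(a, b):
    """Multiply two unsigned 8-bit values by shift-and-add.

    Returns (high_byte, low_byte) of the 16-bit product.
    """
    product = 0
    for bit in range(8):
        if b & (1 << bit):         # is this bit of the multiplier set?
            product += a << bit    # add the shifted multiplicand
    return (product >> 8) & 0xFF, product & 0xFF


print(mul8(255, 255))  # prints (254, 1), i.e. 0xFE01 = 65025
print(mul8(12, 10))    # prints (0, 120)
```

Eight loop iterations, each with a test, a shift and possibly an add, is why software multiplication is slow next to a single add instruction, and the two-byte return value is exactly the "where do the other 8 bits go" problem.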

Specialty Math

Some processors put in specialty math operations. For example, it’s extremely common to want to add/subtract 1 from a register or from memory. You’ll sometimes see special instructions for doing this named INCrement and DECrement. Other processors make special “fast” versions of add and subtract that will only add or subtract a small constant value.

We see that performing arithmetic in assembly language is conceptually straightforward, but in practice you need to be aware of the quirks in any given CPU to ensure that you get the result that you intended.