About Assembly Language

Assembly Language - Division

The Reduced Instruction Set of all chips in the ARM family - from the ARM2 to the StrongARM - includes weird and wonderful instructions like MLA (Multiply with Accumulate: multiply two registers and add the contents of a third to the result) and ASL (Arithmetic Shift Left: absolutely identical to the Logical Shift Left instruction). Later chips contain a few additional instructions. However, so far as I am aware, room has never been found in the instruction set for something that would have been very useful - a DIV instruction.

Principles of division

Division in machine code is exactly the same as division by any other method - it is simply a matter of repeated subtraction. When we divide twelve by four, what we really want to know is how many times the number 4 will fit into the number 12, in other words how many times we can subtract the smaller number from the bigger number until the bigger number reaches zero. The answer in this case is, of course, three times:

12 - 4 = 8
8 - 4 = 4
4 - 4 = 0

When I was very little, I was allowed to play with an old-fashioned mechanical calculating machine. The front of the machine had an array of vertical dials, like the ones in a combination lock, on which you set up the numbers you wanted to calculate, and there was a handle on one side which was wound away from you in order to add the current number to the total on the display or towards you in order to subtract it. In order to carry out division, it was necessary to set up the first number and then to subtract the second number from it repeatedly (counting how many times you turned the handle!), just as we did in the sum above. Obviously however, this could be extremely slow if you were doing a sum like 128÷4 since the answer is 32, this is the number of times you would have to turn the handle!

It would be quite possible to carry out division in ARM code using this simple technique, by constructing a loop like this:

 MOV     R1,#128   ;divide R1
 MOV     R2,#4     ;by R2
 MOV     R0,#0     ;initialise counter
.subtract
 SUBS    R1,R1,R2  ;subtract R2 from
                   ;R1 and store 
                   ;result back in
                   ;R1 setting flags
 ADD     R0,R0,#1  ;add 1 to counter,
                   ;NOT setting flags
 BHI     subtract  ;branch to start of
                   ;loop on condition
                   ;Higher, i.e. R1 is
                   ;still greater than
                   ;R2. Answer now in R0

Since even the slowest ARM processor is far faster than a child turning a handle attached to a stiff set of gears, this might even be an acceptable solution. But there is a trick you normally use to save work on a calculating machine - and you can use it on a computer, too.

Shifting along

In practice, when dividing a large number by a much smaller number, you would shift the smaller number one or more places to the left by setting it up on the dials to the left of where it would normally be, leaving the right-hand dials set to zero. The degree to which you shift it would be dictated by common sense and mental arithmetic - if you can't subtract it even once from the other number, then you have gone too far! You would then subtract this shifted number as many times as possible to obtain the first digit of the result, reset all your dials to shift it back to the right, repeatedly subtract again to find the next digit, and continue this process until you reach your original number again and can no longer subtract even that.

Using our example of 128÷4, you would end up doing this:

Shift 4 one place to the left - it becomes 40.
Shift 4 another place to the left - it becomes 400. This is bigger than 128, so we have gone too far. We shall start with the previous result, which was 40. So:
128 - 40 = 88 (turning the handle once)
88 - 40 = 48 (turning it twice)
48 - 40 = 8 (turning it three times)
We can no longer subtract 40, so we have now found the first digit of the answer to be 3.
Now we shift our number back to the right so that it becomes 4 again, and once more start counting turns of the handle.
8 - 4 = 4 (turning the handle once)
4 - 4 = 0 (turning it twice)

We can no longer subtract 4 (our original number), so we have found the last digit of the answer to be 2. In other words, the answer is 32, the result we got before - but we have had to wind the handle round only five times in order to obtain it, not thirty-two times!

You will have noticed that this is almost exactly the method one uses when doing division using pen and paper... "four into one won't go... four into twelve goes three times... four into eight goes twice... answer is thirty-two." The difference is that we know, through experience, that 12 is 4 × 3 and 8 is 4 × 2. Machines, even electronic ones like computers, don't have this advantage - they don't contain a reference set of multiplication tables, so they have to do it the hard way.

Binary division

Computers use binary numbers. Believe it or not, this actually makes life much easier when it comes to writing a machine-code routine to carry out the division of one register by another! Since the value of each binary digit of the answer can only be 0 or 1, we can avoid the 'multiplication table problem' mentioned above.

Every time we shift our number one place to the right to obtain the next digit of the answer, we know that we will be able to subtract it either exactly once or not at all. We only have to carry out one subtraction per shift; the disadvantage is that since binary numbers have many more digits than their decimal equivalents we have to carry out many more shifts.

Dividing by powers of two

At this point, I shall use a new example. The reason for this is that, in binary, dividing by four or by any other power of two is extremely easy; no programmer in his senses would write a complicated division routine to do so when all he has to do is to use a single instruction to shift the register in question to the right, any more than he would get out pen and paper to calculate the answer to the sum '460 ÷ 10' instead of just mentally knocking off the zero to get the correct answer. A sensible program to divide 128 by 4 would look like this:

 MOV  R1,#128
 MOV  R0,R1,LSR#2 ;shift R1 2 places
                  ;to the right &
                  ;store in R0
                  ;answer now in R0

Since 4 is 2 × 2, all we have to do to divide by 4 in binary is to shift the register two places to the right, just as all we have to do to divide by 100 (10 × 10) in decimal is to shift two places to the right - e.g. from 600 pence we get 6 pounds.

Dividing by other numbers

However, this approach only works when we wish to divide by a fixed value which we know to be a multiple of two. For more general use, we need to take the subtraction-based approach outlined above.

To divide 50 (%110010) by 10 (%1010) in binary:

We shift 10 as far as we can (two places) to the left until it becomes %101000 and subtract this from %110010 to get the first digit.
%110010- %101000 = %1010 (First digit is 1)
Now we shift %101000 back one place to the right and try to subtract %10100 from what remains of the 25 we started off with.
%1010 - %10100 (Next digit is 0 - it 'won't go'!)
Shift right, get %1010 and try again:
%1010 - %1010 = %0 (Successful subtraction - next digit is 1)

Our '10' has now been shifted back two places to the right, returning it to its original value, which is our signal to stop and count up the digits in our answer - %101 in binary or '5' in decimal, w ich is of course the correct answer.

Implementing the routine in machine code

Having shown that we have a working algorithm for binary division, we now need to translate it into actual assembler instructions. I am going to divide R1 by R2; we shall also need to use registers R0 and R4. Before we start, there is just one essential check that must be made....

 CMP             R2, #0
 BEQ divide_end
 ;check for divide by zero!

Setting up for division

In order to divide R1 by R2, the first thing we need to do is to shift R2 left by the necessary number of places. The easiest method of doing this is simply by trial and error - shift until we discover that R2 has become too big, then stop.

 MOV      R0,#0     ;clear R0 to accumulate result
 MOV      R3,#1     ;set bit 0 in R3, which will be
                    ;shifted left then right
.start
 CMP      R2,R1
 MOVLS    R2,R2,LSL#1
 MOVLS    R3,R3,LSL#1
 BLS      start
 ;shift R2 left until it is about to
 ;be bigger than R1
 ;shift R3 left in parallel in order
 ;to flag how far we have to go

R0 will be used to hold the result. The role of R3 is more complicated.

In effect, we are using R3 to mark where the right-hand end of R2 has got to - if we shift R2 three places left, this will be indicated by a value of %1000 in R3. However, we also add it to R0 every time we manage a successful subtraction, since it marks the position of the digit currently being calculated in the answer. In the binary example (50 ÷ 10) above, we shifted the '10' two places left, so at the time of the first subtraction, R3 would have been %100, at the time of the second (which failed) it would have been %10, and at the time of the third %1. Adding it to R0 after each successful subtraction would have given us, once again, the answer of %101!

ARM code doesn't have the specific shift and rotate instructions present in non-RISC instruction sets. Instead it has the concept of the 'barrel shifter' which can be used to modify the value specified by the right-hand register for almost any instruction, without altering that register itself. For example, the instruction ADD R0, R1, R2, LSL#2 will add together R1 and (R2<<2) and load the result into R0, without affecting the value of R2 in any way. This can be very useful, but it means that if we actually want to change the value of R2 by shifting it, as we do here, we have to resort to moving it into itself via the shifter: MOV R2,R2,LSL#1.

The subtraction loop

Now for the loop that actually does the work:

.next
 CMP       R1,R2      ;carry set if R1>R2 (don't ask why)
 SUBCS     R1,R1,R2   ;subtract R2 from R1 if this would
                      ;give a positive answer
 ADDCS     R0,R0,R3   ;and add the current bit in R3 to
                      ;the accumulating answer in R0

In ARM code subtraction (a CMP instruction simulates a subtraction in order to set the flags), if R1 - R2 gives a positive answer and no 'borrow' is required, the carry flag is set. This is required in order to make SBC (Subtract with Carry) work properly when used to carry out a 64-bit subtraction, but it is confusing!

In this case, we are turning it to our advantage. The carry flag is set to indicate that a successful subtraction is possible, i.e. one that doesn't generate a negative result, and the two following instructions are carried out only when the condition Carry Set applies. Note that the 'S' on the end of these instructions is part of the 'CS' condition code and does not mean that they set the flags!

 MOVS      R3,R3,LSR#1     ;Shift R3 right into carry flag
 MOVCC     R2,R2,LSR#1     ;and if bit 0 of R3 was zero, also
                           ;shift R2 right
 BCC       next            ;If carry not clear, R3 has shifted
                           ;back to where it started, and we
                           ;can end

.divide_end
 MOV       R25, R24        ;exit routine

The next two instructions shift right R3, the 'counter' register, and R2, which holds the number we are dividing by. We specifically set the flags by using the 'S' suffix when shifting R3 since we want to know when the bit held in this register reaches the right-hand side. During a shift to the right, bit 0 is transferred to the carry flag while the rest of the bits move along. Since only one bit of R3 is set (it was originally loaded with %1 before being shifted left and then right), when the carry flag is set it indicates that, before the shift, the value of R3 was %1, i.e. we have shifted back to where we started and R0 should now hold the right answer.

Summary

At the end of the routine, R1 holds the remainder, if any, R2 has returned to the value it held on entry to the routine, R0 holds the result and R3 holds zero. Both zero and carry flags are set. This routine won't work for negative values of R1 or R2.

As with the results of integer division in Basic, the value in R0 will always be rounded to the next lowest whole number rather than to the nearest number. For example, 1156 ÷ 19 gives a result of '60 remainder 16' which is actually closer to 61 than 60.

Source:	Archive Magazine 14.2
Publication:	Archive Magazine
Contributor:	Harriet Bazley