Debugging Assembly Code



Figure 1 : Bugs

In SYS1 bugs have googly eyes and are green :). Debugging any code is tricky, but debugging assembly code is even more fun :). Testing any code requires that the behaviour of the program can be observed, controlled and then compared against its expected behaviour. The low-level nature of assembly code makes this difficult, i.e. each instruction doesn't do a lot, and it may be interacting with other hardware components (peripheral devices) that have their own internal state and thread of control, e.g. a serial port (UART). Printing "Got here" in a high-level language such as Python takes one line of code; on an embedded processor programmed in a low-level language it can take 10s to 1000s of lines of code, depending on which peripheral device you are using, e.g. a serial port or HDMI. Therefore, finding errors in these very long instruction traces is tricky. As always, it's a question of a program's testability, which is determined by its controllability (how easily we can put the program into a known state and drive it) and its observability (how easily we can see what it is doing).

Designing a system with good controllability and good observability is difficult, sooo, to help solve this problem we can use simulation. This is not a magic bullet, we still have a lot of issues to solve, but it can be a good first step to help identify those "simple" bugs. Next, if required, we can use test equipment such as oscilloscopes and logic analysers to check that our assumptions about how the processor works, and how real-world signals are changing, are correct :).

The material below is focused on debugging assembly code for the simpleCPU family of processors, but some of these ideas could be applied to other processors. To illustrate these techniques consider the simpleCPUv1a and simpleCPUv1d test code below. These programs write the values 0-199 and 0-999 respectively to memory, then accumulate this data in TOTAL. The purpose of this code is to produce a lot of "noise", i.e. a lot of instructions and a long instruction trace that can be analysed. Using these example programs we will now consider how you would go about testing these programs, and look at how we can simplify different debugging scenarios.

Note, these examples assume you have completed SYS1 labs 5 and 6, so they are moderately complex :)

##########################
# SimpleCPUv1a Test Code #
##########################

start:
  move 0               
  store CNT            # zero CNT

write_loop:
  load CNT             # calc write address
  add DATA
  store 5 write_data
  load CNT             # read data

write_data:
  store start          # write CNT to DATA+CNT
  add 1                # inc CNT
  store CNT      
  subm ra N            # is CNT = N?  
  jumpnz write_loop    # no repeat

accum:
  move 0  
  store TOTAL          # zero TOTAL             
  store CNT            # zero CNT

accum_loop:
  load CNT             # calc write address
  add DATA
  store 4 accum_read

accum_read:
  load start           # read DATA+CNT
  addm TOTAL
  store TOTAL

  load CNT
  add 1                # inc CNT
  store CNT      
  subm ra N            # is CNT = N?  
  jumpnz accum_loop    # no repeat

trap:
  jump trap            # stop

N:
  .data 200            # size of array
TOTAL:
  .data 0              # accumulator total 
CNT:
  .data 0              # loop counter
DATA:
  .data 0              # first address of DATA array 

Figure 2 : testcode_v1a.asm

The code is split into two blocks: write (initialise memory) and accum (calculate the result). To overcome the limited addressing modes available on the simpleCPUv1a, both blocks use self-modifying code, i.e. the store 5 write_data and store 4 accum_read instructions. These modify the operand bit fields within other instructions, allowing the processor to iterate through memory / the array. The end of the program is signalled by entering an infinite loop, i.e. there is no OS to return to and there is no "sleep" instruction.
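The self-modifying stores are easier to follow once you see the bit fields. The dump file extract shown later on this page lists store 5 6 as the word 0x5506 and reports M[6]:20509 (0x501D) after it executes with ACC = 29. A minimal sketch, assuming a layout inferred from that dump rather than taken from an ISA document: bits 15-12 hold the opcode, bits 11-8 of a store supply the upper nibble of the word written, and bits 7-0 hold the address/operand. encode() and execute_store() are hypothetical helper names:

```python
# Sketch of the self-modifying stores in testcode_v1a, assuming a 16-bit
# instruction word laid out as: [15:12] opcode, [11:8] extra field, [7:0] operand.
# This layout is inferred from the ISS memory dump, not from an official spec.

STORE = 0x5  # opcode nibble of "store" (store 28 appears as 0x501C in the dump)
LOAD = 0x4   # opcode nibble of "load"  (load 28 appears as 0x401C in the dump)


def encode(opcode, field, operand):
    """Pack an instruction word from its three bit fields."""
    return (opcode << 12) | ((field & 0xF) << 8) | (operand & 0xFF)


def execute_store(instr, acc):
    """Return (address, value) written by a store: the extra field becomes
    the upper nibble of the stored word and ACC becomes the low byte."""
    field = (instr >> 8) & 0xF
    address = instr & 0xFF
    return address, (field << 12) | (acc & 0xFF)


# "store 5 write_data" (write_data = 6) on the first pass: ACC = CNT + DATA = 0 + 29
addr, word = execute_store(encode(STORE, 5, 6), 29)
print(addr, hex(word))        # 6 0x501d -> address 6 now holds "store 29"

# "store 4 accum_read" (accum_read = 17) on the last pass: ACC = 29 + 199 = 228
addr, word = execute_store(encode(STORE, 4, 17), 29 + 199)
print(addr, hex(word), word >> 12 == LOAD)   # 17 0x40e4 True -> "load 228"
```

Both results match the final memory dump, where addresses 6 and 17 end up holding 50e4 and 40e4.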

##########################
# SimpleCPUv1d Test Code #
##########################

start:
  call write           # init DATA array, 0..999
  call accum           # accumulate DATA array, 0+1+2+3+ ... + 999
trap:
  jump trap            # stop

write:
  move rc 0            # rc = data value
  move rb DATA         # rb = data address

write_loop:
  store rc (rb)        # write data to memory
  add rc 1             # inc data value
  add rb 1             # inc data address

  move ra rc           # is value = N? 
  subm ra N
  jumpnz write_loop    # no repeat
  ret                  # yes exit
  
accum:
  move ra 0            # zero TOTAL
  store ra TOTAL
  move rc 0            # rc = loop counter
  move rb DATA         # rb = data address
accum_loop:
  load ra (rb)         # read data value
  addm ra TOTAL        # add TOTAL
  store ra TOTAL       # update TOTAL
  add rc 1             # inc loop counter
  add rb 1             # inc data address

  move ra rc           # is loop counter = N
  subm ra N
  jumpnz accum_loop    # no repeat  
  ret                  # yes exit

N:
  .data 1000           # size of array
TOTAL:
  .data 0              # accumulator total 
DATA:
  .data 0              # first address of DATA array 

Figure 3 : testcode_v1d.asm

As the simpleCPUv1d supports the register indirect addressing mode we do not need to use self-modifying code. Also, as it supports subroutines, we can implement the write and accum blocks as subroutines. Finally, as my adding skills are limited, I wrote the Python code below just to double check what the totals should be :)

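total = 0
for i in range(1000):
    total += i

# simpleCPUv1d (16-bit), prints: ACC 0-999 = 499500 0x79f2c 0x9f2c
print( "ACC 0-999 = " + str(total) + " " + hex(total) + " " + hex(total&0xFFFF) )

total = 0
for i in range(200):
    total += i

# simpleCPUv1a (8-bit), prints: ACC 0-199 = 19900 0x4dbc 0xbc
print( "ACC 0-199 = " + str(total) + " " + hex(total) + " " + hex(total&0xFF) )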

Figure 4 : check.py code (top), results (bottom)

Note, the simpleCPUv1a is an 8-bit machine, sooo we are looking for a result of 0x00BC; the simpleCPUv1d is a 16-bit machine, sooo we are looking for a result of 0x9F2C.

ISS

To debug standalone fragments of code, perhaps the easiest solution is to use the Instruction Set Simulator (ISS). The test programs above are good examples of such code, i.e. short sections of code / subroutines processing a known data set. Where the ISS has problems is when your code needs to interact with peripheral devices. Simple hardware components such as GPIO can be approximated in the ISS as a mix of "special" read-write and read-only memory locations. However, for more complex components such as UARTs or HDMI controllers this becomes a little tricky. These types of components have their own state and thread of control, therefore, it's difficult to produce hardware-accurate models of them, i.e. to model how a serial terminal or a monitor behaves etc.
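As a rough illustration of how an ISS can fake a GPIO peripheral with "special" memory, the sketch below models the simpleCPUv1a memory map printed at the top of the dump file further down this page: 0xFC-0xFF are the Port A/B input and output registers and everything below them is RAM. The class and attribute names are hypothetical; this is not the ISS's actual code. The inputs dictionary is the test's controllability and the outputs dictionary its observability:

```python
class GpioMemory:
    """Hypothetical ISS memory model: RAM plus two memory-mapped GPIO ports.

    Address map (from the simulator's printed MEMORY MAP):
      0xFC read: Port A input pins   write: Port A output register
      0xFD read: Port A output reg   write: Port A output register
      0xFE read: Port B input pins   write: Port B output register
      0xFF read: Port B output reg   write: Port B output register
    """

    def __init__(self, size=256):
        self.ram = [0] * size
        self.inputs = {"A": 0, "B": 0}   # driven by the test harness
        self.outputs = {"A": 0, "B": 0}  # driven by the program

    @staticmethod
    def _port(addr):
        return "A" if addr in (0xFC, 0xFD) else "B"

    def read(self, addr):
        if addr in (0xFC, 0xFE):
            return self.inputs[self._port(addr)]
        if addr in (0xFD, 0xFF):
            return self.outputs[self._port(addr)]
        return self.ram[addr]

    def write(self, addr, value):
        if addr >= 0xFC:
            self.outputs[self._port(addr)] = value
        else:
            self.ram[addr] = value


mem = GpioMemory()
mem.inputs["A"] = 0x3   # the test "presses" two switches
mem.write(0xFD, 0x5)    # the program writes to Port A output
print(mem.read(0xFC), mem.read(0xFD), mem.outputs)   # 3 5 {'A': 5, 'B': 0}
```

A GPIO port has no behaviour of its own, so a dictionary is enough; a UART would need its own clock, buffers and state machine, which is exactly where this approach gets tricky.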

The development of the simpleCPU ISS is discussed in more detail on these webpages: (Link), (Link). This simulator can be launched via a GUI, but it's easier to launch it via the command line. Before we can simulate this test code we need to assemble it, i.e. generate the .dat file. Assuming you are in the ISS folder described in the previous link, to assemble the simpleCPUv1a test program we can execute the command below:

python3 bin/simpleCPUv1a_as.py -i code/testcode_v1a -o code/testcode_v1a

This command generates the required .dat file used by the ISS. To simulate the simpleCPUv1a test program running on the processor we can next run this command:

time python3 bin/simple_cpu_v1a_simulator.py -s2 -r1 -i code/testcode_v1a > dmp

Figure 5 : command (top), results (bottom)

The -s2 switch sets full speed, -r1 runs the program to completion (i.e. disables user interaction) and -i gives the name of the .dat file to be simulated; all output is redirected to the file dmp. This simulation took 0.03 s. The dump file contains a full trace of the instructions executed, plus a dump of the processor's memory taken when the program executes the "jump trap" instruction, i.e. when it enters the infinite loop at the end of the program. Extracts of this file are shown below:

MEMORY MAP
----------
   ADDR           READ                WRITE     
0xFF (255)   Port B DATA OUT    Port B DATA OUT 
0xFE (254)   Port B DATA IN     Port B DATA OUT 
0xFD (253)   Port A DATA OUT    Port A DATA OUT 
0xFC (252)   Port A DATA IN     Port A DATA OUT 
0xFB (251)   RAM                RAM             
...  (...)   ...                ...             
0x00 (000)   RAM                RAM             

WARNING : this simulator is not guaranteed to be functionally accurate when compared to the HW
Press CTRL-C to stop simulation run
000 : move 0 -> acc:0
001 : store 28 -> M[28]:0
002 : load 28 -> acc:0
003 : add 29 -> acc:29 z:0
004 : store 5 6 -> M[6]:20509
005 : load 28 -> acc:0
006 : store 29 -> M[29]:0
007 : add 1 -> acc:1 z:0

... 

017 : load 228 -> acc:199
018 : addm 27 -> acc:188 z:0
019 : store 27 -> M[27]:188
020 : load 28 -> acc:199
021 : add 1 -> acc:200 z:0
022 : store 28 -> M[28]:200
023 : subm 26 -> acc:0 z:1
024 : jumpnz 14 -> False z:1
025 : jumpu 25 -> True
025 : jumpu 25 -> True
    : k=keys, r=run, s=step, v=registers, x=read, w=write, i=input, o=output, q=quit

acc:0 z:1
00: 0000 501c 401c 101d 5506 401c 50e4 1001 501c 701a a002 0000 501b 501c 401c 101d
10: 5411 40e4 601b 501b 401c 1001 501c 701a a00e 8019 00c8 00bc 00c8 0000 0001 0002
20: 0003 0004 0005 0006 0007 0008 0009 000a 000b 000c 000d 000e 000f 0010 0011 0012
30: 0013 0014 0015 0016 0017 0018 0019 001a 001b 001c 001d 001e 001f 0020 0021 0022
40: 0023 0024 0025 0026 0027 0028 0029 002a 002b 002c 002d 002e 002f 0030 0031 0032
50: 0033 0034 0035 0036 0037 0038 0039 003a 003b 003c 003d 003e 003f 0040 0041 0042
60: 0043 0044 0045 0046 0047 0048 0049 004a 004b 004c 004d 004e 004f 0050 0051 0052
70: 0053 0054 0055 0056 0057 0058 0059 005a 005b 005c 005d 005e 005f 0060 0061 0062
80: 0063 0064 0065 0066 0067 0068 0069 006a 006b 006c 006d 006e 006f 0070 0071 0072
90: 0073 0074 0075 0076 0077 0078 0079 007a 007b 007c 007d 007e 007f 0080 0081 0082
a0: 0083 0084 0085 0086 0087 0088 0089 008a 008b 008c 008d 008e 008f 0090 0091 0092
b0: 0093 0094 0095 0096 0097 0098 0099 009a 009b 009c 009d 009e 009f 00a0 00a1 00a2
c0: 00a3 00a4 00a5 00a6 00a7 00a8 00a9 00aa 00ab 00ac 00ad 00ae 00af 00b0 00b1 00b2
d0: 00b3 00b4 00b5 00b6 00b7 00b8 00b9 00ba 00bb 00bc 00bd 00be 00bf 00c0 00c1 00c2
e0: 00c3 00c4 00c5 00c6 00c7 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

This is big-bang testing at its best, i.e. run everything and hope for the best: if all has gone well, one of the memory locations will contain our result, TOTAL, with the value 0x00BC. The user can check for this value in the final memory dump, which prints the contents of all memory locations to the screen. The first column is the address in hex, i.e. 00, 10, 20 ..., followed by sixteen 16-bit hex values. Rows containing all zeros, i.e. 16 x 0x0000, are not printed. To help the user find the variables they are after, the address of each variable can be identified in the var.txt, label.txt or tmp.asm files that are generated by the assembler.

Note, the files var.txt and label.txt contain the variable/label names and their decimal addresses in memory. The file tmp.asm contains the assembly language program: the first column is the instruction/variable address, the second is the instruction with any labels replaced by their addresses.

n 26
total 27
cnt 28
data 29
start 0
write_loop 2
write_data 6
accum 11
accum_loop 14
accum_read 17
trap 25
n 26
total 27
cnt 28
data 29
000  move 0
001  store 28 # zero 28
002  load 28 # calc write address
003  add 29
004  store 5 6
005  load 28 # read 29
006  store 0 # write 28 to data+cnt
007  add 1 # inc 28
008  store 28
009  subm ra 26 # is 28 = n?
010  jumpnz 2 # no repeat
011  move 0
012  store 27 # zero 27
013  store 28 # zero 28
014  load 28 # calc write address
015  add 29
016  store 4 17
017  load 0 # read data+cnt
018  addm 27
019  store 27
020  load 28
021  add 1 # inc 28
022  store 28
023  subm ra 26 # is 28 = n?
024  jumpnz 14 # no repeat
025  jump 25 # stop
026  .data 200 # size of array
027  .data 0 # accumulator 27
028  .data 0 # accumulator 27
029  .data 0 # first address of 29 array

Figure 6 : var.txt (top), label.txt (middle), tmp.asm (bottom)

Sooo, TOTAL is stored at address 27 (0x1B), and if we look at this memory location in the final dump file it does indeed contain the value 0x00bc, i.e. the correct 8-bit accumulated result. This type of testing is ok if your lucky charms are working; however, normally we need to break our testing down into smaller blocks.
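Searching the dump by eye works for one variable, but it is easy to script. A minimal sketch, assuming the dump row format described above (hex base address, then sixteen 4-digit hex words, all-zero rows omitted) and the "name decimal-address" layout of var.txt; parse_dump, parse_symbols, read_variable and lookup are hypothetical helpers, not part of the simpleCPU tools:

```python
import re

# Dump rows look like "10: 5411 40e4 601b ...": a hex base address followed by
# up to sixteen 4-digit hex words. Trace lines ("017 : load 228 ..."), the
# memory map and register lines ("acc:0 z:1") do not match and are skipped.
DUMP_ROW = re.compile(r"^([0-9a-fA-F]+):\s+((?:[0-9a-fA-F]{4}\s*)+)$")


def parse_dump(lines):
    """Return {address: value} for every word in the final memory dump."""
    memory = {}
    for line in lines:
        match = DUMP_ROW.match(line.strip())
        if match is None:
            continue
        base = int(match.group(1), 16)
        for offset, word in enumerate(match.group(2).split()):
            memory[base + offset] = int(word, 16)
    return memory


def parse_symbols(lines):
    """Parse var.txt / label.txt lines of the form '<name> <decimal address>'."""
    symbols = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            symbols[parts[0].lower()] = int(parts[1])
    return symbols


def read_variable(memory, symbols, name):
    # All-zero rows are not printed, so an address missing from the dump holds 0.
    return memory.get(symbols[name.lower()], 0)


def lookup(dump_path, var_path, name):
    """Convenience wrapper: read both files and return one variable's value."""
    with open(dump_path) as dump, open(var_path) as var:
        return read_variable(parse_dump(dump), parse_symbols(var), name)
```

For the run above, lookup("dmp", "var.txt", "TOTAL") should return 0xbc (188); adjust the var.txt path to wherever your assembler writes it.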

ISS Breakpoints

The example test program is constructed from two distinct blocks of code: one that initialises data in memory and one that accumulates these values. Therefore, it would be a good idea to unit test these blocks, or at the least test that the first block is writing the correct values to memory. To do this we can use breakpoints, i.e. we can pause the simulation and then look at what is stored in memory. A breakpoint is defined using the -b switch, which allows the user to specify an instruction address that, if fetched, will cause the simulator to pause. Again, the address of each label can be identified in the label.txt file produced by the assembler. To test the first block of code in the test program we can run this command:

time python3 bin/simple_cpu_v1a_simulator.py -s1 -b11 -i code/testcode_v1a 

This will cause the simulator to pause at address 11, i.e. the instruction with the label accum, the first instruction of the second block of code. Note, you can define multiple breakpoints by using multiple -b switches. The sequence of steps used to test whether the first block of code is writing the correct values into memory is shown in figure 7 below.










Figure 7 : using breakpoints steps 1-5
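To show what -b11 is doing under the hood, the sketch below is a toy Python interpreter for only the instructions that testcode_v1a uses. It is not the simpleCPU ISS: the opcode values are read off the dump file extract and the tmp.asm listing (figure 6), and the 8-bit ACC / Z flag behaviour is an assumption that happens to reproduce that dump (TOTAL = 0xbc, M[6] = 0x50e4, M[17] = 0x40e4). ToyCPU, step() and run() are hypothetical names. Passing breakpoints={11} plays the role of -b11: the run stops just before the instruction at accum (address 11) is fetched, so the write block's output can be checked before the accum block runs:

```python
# Toy model of a simpleCPUv1a ISS with fetch-address breakpoints (a sketch,
# not the real simulator). Opcode nibbles are inferred from the memory dump:
# move=0 add=1 load=4 store=5 addm=6 subm=7 jumpu=8 jumpnz=A.
# ACC is 8 bits wide and the Z flag is updated by add, addm and subm.

# Initial memory image of testcode_v1a, hand-encoded from the tmp.asm listing.
PROGRAM = [0x0000, 0x501C, 0x401C, 0x101D, 0x5506, 0x401C, 0x5000, 0x1001,
           0x501C, 0x701A, 0xA002, 0x0000, 0x501B, 0x501C, 0x401C, 0x101D,
           0x5411, 0x4000, 0x601B, 0x501B, 0x401C, 0x1001, 0x501C, 0x701A,
           0xA00E, 0x8019, 0x00C8, 0x0000, 0x0000, 0x0000]


class ToyCPU:
    def __init__(self, image):
        self.mem = image + [0] * (256 - len(image))   # 256-word memory
        self.pc, self.acc, self.z = 0, 0, False

    def step(self):
        """Fetch, decode and execute one instruction."""
        instr = self.mem[self.pc]
        op, field, k = instr >> 12, (instr >> 8) & 0xF, instr & 0xFF
        next_pc = self.pc + 1
        if op == 0x0:                                  # move K
            self.acc = k
        elif op in (0x1, 0x6, 0x7):                    # add K, addm A, subm A
            operand = k if op == 0x1 else self.mem[k] & 0xFF
            result = self.acc - operand if op == 0x7 else self.acc + operand
            self.acc = result & 0xFF
            self.z = self.acc == 0
        elif op == 0x4:                                # load A
            self.acc = self.mem[k] & 0xFF
        elif op == 0x5:                                # store [field] A
            self.mem[k] = (field << 12) | self.acc
        elif op == 0x8:                                # jumpu A
            next_pc = k
        elif op == 0xA and not self.z:                 # jumpnz A (taken)
            next_pc = k
        self.pc = next_pc

    def run(self, breakpoints=(), max_steps=100_000):
        """Run until an address in `breakpoints` is about to be fetched or a
        jump-to-self trap is reached. The breakpoint test is skipped on the
        first step so that calling run() again resumes from a breakpoint."""
        for count in range(max_steps):
            if count > 0 and self.pc in breakpoints:
                return "breakpoint"
            if self.mem[self.pc] == (0x8 << 12) | self.pc:
                return "trap"
            self.step()
        return "timeout"


cpu = ToyCPU(PROGRAM)
print(cpu.run(breakpoints={11}), cpu.pc)            # breakpoint 11
print(cpu.mem[29:33], cpu.mem[228])                 # [0, 1, 2, 3] 199
print(cpu.run(breakpoints={11}), hex(cpu.mem[27]))  # trap 0xbc
```

Pausing at a block boundary means the whole DATA array can be checked in one comparison, rather than by scrolling through the instruction trace.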

ISS Test Data

To test the second block of code we need data to accumulate, sooo, testing this in isolation is a little more difficult. The simpleCPU simulators do allow you to initialise memory, i.e. load an image (.ppm file) and data (.asc file) into memory. However, before we can load a data file we first need to create it :). This test data can be created manually in any text editor, but to simplify this process we can automate it using Python.

#######################
# Test data generator #
#######################

def generate_asc(filename, start_addr, value_start, value_end):
    """Write an .asc file that stores the value v at address start_addr + v,
    for v = value_start .. value_end. Each line holds a 4-digit hex start
    address followed by up to 16 4-digit hex data values."""
    with open(filename, "w") as f:
        value = value_start

        while value <= value_end:
            line = f"{start_addr + value:04X}"   # address of first value on this line

            for i in range(16):
                current_value = value + i
                if current_value > value_end:
                    break
                line += f" {current_value:04X}"

            f.write(line + "\n")
            value += 16

# DATA array starts at address 50 (0x32) and holds the values 0..99
generate_asc("output.asc", start_addr=50, value_start=0, value_end=99)

As we are initialising memory we need to know the addresses of our variables, e.g. DATA. Again, this can be obtained from the var.txt file; alternatively, you can hard-code it using the assembler directive .addr. However, buyer beware: you do need to make sure this memory space is free, as the assembler trusts that you know what you are doing. To set the DATA variable to start at address 50 we would modify the assembler code as shown below (the contents of the generated output.asc file follow the code), then run the generate_asc.py program to produce our test data.

start:

accum:
  move 0  
  store TOTAL          # zero TOTAL             
  store CNT            # zero CNT

accum_loop:
  load CNT             # calc write address
  add DATA
  store 4 accum_read

accum_read:
  load start           # read DATA+CNT
  addm TOTAL
  store TOTAL

  load CNT
  add 1                # inc CNT
  store CNT      
  subm ra N            # is CNT = N?  
  jumpnz accum_loop    # no repeat

trap:
  jump trap            # stop

N:
  .data 100            # size of array
TOTAL:
  .data 0              # accumulator total 
CNT:
  .data 0              # loop counter
.addr 50
DATA:
  .data 0              # first address of DATA array 

0032 0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F
0042 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B 001C 001D 001E 001F
0052 0020 0021 0022 0023 0024 0025 0026 0027 0028 0029 002A 002B 002C 002D 002E 002F
0062 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 003A 003B 003C 003D 003E 003F
0072 0040 0041 0042 0043 0044 0045 0046 0047 0048 0049 004A 004B 004C 004D 004E 004F
0082 0050 0051 0052 0053 0054 0055 0056 0057 0058 0059 005A 005B 005C 005D 005E 005F
0092 0060 0061 0062 0063

To load and simulate this code and data we can next run this command:

time python3 bin/simple_cpu_v1a_simulator.py -s2 -r1 -i code/testcode_v1a2 -l code/output.asc 

Figure 8 : simulation output

This example only accumulates 100 values, so, as shown in figure 8, the final TOTAL is 0x0056 (0 + 1 + ... + 99 = 4950 = 0x1356, truncated to 8 bits). Note also that DATA is offset to address 50, i.e. 0x32, so we see a few more unused 0x0000 memory locations before the DATA values 0000, 0001, 0002, 0003 etc.
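Before running the ISS it can help to work out the expected answer from the same data file. A small sketch, assuming the .asc layout shown above (each line is a hex start address followed by hex data words); parse_asc and expected_total are hypothetical helpers:

```python
def parse_asc(lines):
    """Parse .asc text into {address: value}; each line is a hex start
    address followed by consecutive hex data words."""
    memory = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        base = int(fields[0], 16)
        for offset, word in enumerate(fields[1:]):
            memory[base + offset] = int(word, 16)
    return memory


def expected_total(memory, start, count, bits=8):
    """Sum `count` words from `start`, wrapped to the CPU's data width."""
    total = sum(memory.get(start + i, 0) for i in range(count))
    return total & ((1 << bits) - 1)
```

For example, with open("code/output.asc") as f: print(hex(expected_total(parse_asc(f), 50, 100))) should print 0x56 for the 8-bit simpleCPUv1a; pass bits=16 when checking a simpleCPUv1d result.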

In the simpleCPUv1d simulator you also have the option of loading a ppm image into memory. This processor has a 12bit address bus, sooo, only having 4096 x 16-bits of memory space does put some limitations on what we can do, consider the examples shown in figure 9.

Figure 9 : Image resolution

As you can see, if we were to use a 250 x 219 pixel image, which today would be considered extremely small, we would need to increase the computer's memory size by a factor of 20-ish, or use some sort of paging/expanded memory. Therefore, the images processed in the test cases will tend to be 30 x 30 pixels or smaller. This still allows you to process a simple image, but more importantly for lab work, these smaller images reduce simulation times, i.e. you are processing less data, helping reduce debugging times etc.

The image file format used is the Portable PixMap (PPM) format, originally designed to send images within plain-text emails. An example 3 x 3 pixel image using this format is shown in figure 10. The reason for selecting this format is that pixel data is stored as simple text, i.e. ASCII characters, and can therefore be edited / viewed using a text editor, e.g. Notepad, allowing the user to see how each pixel is represented. Storing data in this format is not very efficient from a file size point of view, i.e. it would take a lot less space if a binary format was used; however, as the images are only 24 x 24 pixels, this inefficiency is not a significant disadvantage.

Figure 10 : PPM image file (left), displayed image (right)

The first line of the PPM image file identifies the image format to the viewer by its "magic" number, P3. This is followed by a comment/description of the file, indicated by the leading # symbol. The next line defines the image size: columns (width) and rows (height). Lastly, the header gives the maximum value of each colour component (255). The remainder of the file defines the RGB value of each pixel, starting at the top-left position within the image, a row at a time. These RGB values are listed sequentially in this example, but they could be split across different lines, or have multiple RGB values on the same line. However, some image viewers do seem to insist that you have a blank line at the end.

Figure 11 : Bob 24 x 24 PPM image

Note, you can download this ppm image here: bob.ppm
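The PPM layout described above is simple enough to generate or read with a few lines of Python, which is handy for making small test images. A minimal sketch, assuming 8-bit colour components (maximum value 255) and a single comment line; write_p3 and read_p3 are hypothetical helpers, not part of the simpleCPU tools:

```python
def write_p3(width, height, pixels, comment="example image"):
    """Return a plain-text (P3) PPM image as a string.

    pixels is a row-major list of (r, g, b) tuples, top-left pixel first."""
    if len(pixels) != width * height:
        raise ValueError("expected width * height pixels")
    lines = ["P3", f"# {comment}", f"{width} {height}", "255"]
    lines += [f"{r} {g} {b}" for r, g, b in pixels]
    return "\n".join(lines) + "\n"   # end with a newline; some viewers expect it


def read_p3(text):
    """Parse P3 text into (width, height, maxval, pixels), ignoring # comments."""
    tokens = []
    for line in text.splitlines():
        tokens += line.split("#", 1)[0].split()   # drop comments, split on whitespace
    if not tokens or tokens[0] != "P3":
        raise ValueError("not a plain-text PPM (P3) file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:4 + 3 * width * height]]
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return width, height, maxval, pixels


# A 3 x 3 test image: a red row, a green row and a blue row
image = write_p3(3, 3, [(255, 0, 0)] * 3 + [(0, 255, 0)] * 3 + [(0, 0, 255)] * 3)
print(image.splitlines()[2])          # 3 3
w, h, maxval, px = read_p3(image)
print(w, h, maxval, px[0], px[8])     # 3 3 255 (255, 0, 0) (0, 0, 255)
```

Because read_p3 works on whitespace-separated tokens, it accepts any of the line arrangements described above, i.e. one RGB triple per line or several per line.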

As the simpleCPU's memory can only store 16bit values, the 24bit RGB data has to be reduced in size, as shown in figure 12. The lower three bits of the RED and BLUE pixel values, and the lower two bits of the GREEN pixel value are removed, to produce a 16bit packed data type. This is done automatically by the simulator when the PPM image is loaded at the start of each simulation i.e. each pixel within the image is loaded into a separate memory location. This again has the advantage of reducing memory usage i.e. a 24 x 24 pixel image can be stored in 576 memory locations.

Figure 12 : packed RGB data format

To process this RGB data the simpleCPU processor will need to extract the original R, G and B values, then left shift these to regenerate the original 8-bit values, as shown in figure 13. The lower bits of these values will be lost, but this should not significantly affect the displayed colour.

Figure 13 : packed RGB data format
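The packing and unpacking steps can be sketched in a few lines. This assumes a 5-6-5 layout with red in the top five bits and blue in the bottom five, which agrees with the COLOUR value 0xF800 used for red in the program below; pack_rgb565 and unpack_rgb565 are hypothetical names:

```python
def pack_rgb565(r, g, b):
    """Drop the low 3 bits of R and B and the low 2 bits of G, then pack
    the fields as RRRRRGGGGGGBBBBB in one 16-bit word."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def unpack_rgb565(word):
    """Extract each field and left shift it back to an 8-bit value.
    The discarded low bits come back as zeros."""
    r = ((word >> 11) & 0x1F) << 3
    g = ((word >> 5) & 0x3F) << 2
    b = (word & 0x1F) << 3
    return r, g, b


print(hex(pack_rgb565(255, 0, 0)))                # 0xf800
print(unpack_rgb565(pack_rgb565(200, 100, 50)))   # (200, 100, 48)
```

The 50 to 48 change in the blue channel is the "lower bits will be lost" effect: small enough that the displayed colour barely changes.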

To illustrate how these types of images could be processed on the simpleCPUv1d consider the code below:

start:
  move rb PIXELS     # rb = pixel address pointer
  load ra COLOUR     # ra = RED (0xF800)
  move rc ra         # rc = RED
  move rd 4          # rd = loop count
loop:
  load ra (rb)       # read pixel address
  store rc (ra)      # update with RED pixel
  add rb 1           # inc pixel address pointer
  sub rd 1           # decrement loop counter
  jumpnz loop        # repeat ?
    
trap:
  jump trap          # no finish

# 1024 + (14 x 24) + 10  = 1370
# 1024 + (14 x 24) + 11  = 1371
# 1024 + (15 x 24) + 10  = 1394
# 1024 + (15 x 24) + 11  = 1395

PIXELS:
  .data 1370
  .data 1371
  .data 1394
  .data 1395

COLOUR:
  .data 0xF800

This program draws a red nose on the image of Bob shown in figure 11. To load an image into memory you can use the command below:

time python3 bin/simple_cpu_v1d_simulator.py -s2 -r1 -i code/image -p bob

The image of Bob (bob.ppm) is loaded into the simulator's memory starting at the base address defined by the variable INPUT_IMAGE_BASE_ADDR, one pixel stored in each memory location using the packed RGB image format shown in figure 12. When the simulation has finished, an output image is written to the file dump_bob.ppm, i.e. the input image name preceded by "dump_". The data stored in this file is read from the base address defined by the variable OUTPUT_IMAGE_BASE_ADDR, using the same image parameters, size etc, as the original input image. The output image for this code is shown in figure 14.

Figure 14 : Red nose Bob 24 x 24 PPM image
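The PIXELS table in the red-nose program is row-major address arithmetic, as the comments above it show: base + (row x 24) + column, rows first because PPM data is stored a row at a time. A sketch, taking the base address 1024 from those comments (the actual INPUT_IMAGE_BASE_ADDR / OUTPUT_IMAGE_BASE_ADDR values are not given here); pixel_address is a hypothetical helper:

```python
BASE_ADDR = 1024   # assumption: base address used in the program's comments
WIDTH = 24         # Bob is a 24 x 24 image


def pixel_address(row, col, base=BASE_ADDR, width=WIDTH):
    """Row-major address of pixel (row, col): base + row * width + col."""
    return base + row * width + col


nose = [pixel_address(r, c) for r in (14, 15) for c in (10, 11)]
print(nose)   # [1370, 1371, 1394, 1395]
```

Generating the table this way makes it easy to move or resize the nose without redoing the sums by hand.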

ISS Single Step

The final method to debug a program in the ISS is to single step, or "slow" step, through your code, i.e. simulate one instruction at a time, allowing the user to see how internal registers and external memory locations are updated. To do this we can launch the simulator using this command:

time python3 bin/simple_cpu_v1a_simulator.py -s0 -i code/testcode_v1a 

This will start the simulator as shown in step 1 of figure 7. Then, each time the user presses the s key, i.e. step, one instruction is simulated, allowing the user to incrementally step through their program. The command switch -s0 sets the simulation speed to one instruction every 0.25 seconds. This may sound fast, but at this speed you can read each instruction as it executes, sooo, if you press the r key, i.e. run, the simulator will step through the code for you. If at any point you wish to pause the simulator you can press CTRL-C, allowing you to go back to manual single stepping. This method can be combined with breakpoints if needed, so that you can examine how a specific section of the program is working, i.e. rather than single stepping through a section of code you are not interested in.
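Some quick arithmetic shows why single stepping is best combined with breakpoints. The sketch below estimates how long testcode_v1a takes at -s0 (one instruction every 0.25 s); the per-block instruction counts are read off the tmp.asm addresses in figure 6 and stop at the first fetch of jump trap:

```python
# Instruction counts per block, read off the tmp.asm listing for testcode_v1a.
N = 200              # size of array
write_setup = 2      # addresses 0-1   (move, store)
write_loop = 9       # addresses 2-10  (executed N times)
accum_setup = 3      # addresses 11-13
accum_loop = 11      # addresses 14-24 (executed N times)

to_accum = write_setup + N * write_loop
to_trap = to_accum + accum_setup + N * accum_loop
seconds = to_trap * 0.25      # -s0: one instruction every 0.25 s

print(to_accum, to_trap, seconds / 60)   # 1802 4005 16.6875
```

That is roughly 17 minutes just to watch the whole program run, and 1802 instructions (about 7.5 minutes) before the accum block even starts, which is exactly the stretch a -b11 breakpoint lets you skip.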

ISim

The default simulator used in Xilinx ISE 14.7 is ISim, which supports VHDL, Verilog and mixed-language simulations, i.e. it's a Hardware Description Language (HDL) simulator. That's all good, but in SYS1 most of the hardware we designed is defined using schematics, so how are these simulated? The answer is that behind the scenes each schematic is automatically converted into a .vhf file, i.e. a VHDL representation, a textual description of the wires and components used in that schematic. These files are then passed to ISim, allowing it to perform behavioural, functional and timing verification, i.e. your normal FPGA workflow, as shown in figure 15.


Figure 15 : Typical FPGA workflow

Note, I still prefer to use schematics for teaching as these help visualise / illustrate how primitive components, i.e. logic gates and flip-flops, are connected together to form higher-level functionality, e.g. a counter. Yes, everything today is HDL, but that doesn't make it the best way to learn hardware design :).

Soooo, to restate the obvious, ISim is a hardware simulator; it's not a software simulator like the Instruction Set Simulators (ISS) we looked at previously (Link). However, to test how a program will function on our games console we need to test how it works with different hardware components. We are not running a program in isolation: to function correctly it will need to interact with different peripheral devices, e.g. GPIO, UART, HDMI etc. Yes, we could add this functionality to our ISS to turn it into an emulator (Link), allowing it to simulate these peripheral devices, but that's not practical when students are adding their own custom hardware and instructions to the processor. Therefore, we do need to use a hardware simulator, i.e. ISim, to test some of our programs.

Note, a good introduction to ISim can be found here: (ug682.pdf)

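Before we look at specific software debugging techniques that can be used with ISim, I want to quickly go through how ISim is used in the typical FPGA workflow shown in figure 15. In a normal workflow we use different types of simulation to test different things: behavioural simulation checks that the design's logic does what we intended, functional simulation checks the design after it has been synthesised, and timing simulation checks it again once real gate and routing delays have been added.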

In SYS1 the processor hardware is designed by me, sooo it is perfect, well, it should be ok until I discover the next bug :). Therefore, in the majority of cases students will be using existing hardware and simulating how their code runs on it, sooo should only need to perform behavioural simulations. This is good, as these are the quickest simulations. By default the signals shown in the waveform diagram are the top-level IO markers; however, you can drag and drop any signal into the waveform diagram, i.e. you have full observability of every signal in your design.

Note, in functional/timing simulations dragging signals into the waveform diagram can be tricky, as the signal names used in the design are normally lost when it is synthesised, e.g. if a logic circuit is minimised.

Consider the top level schematic from lab 6 shown in figure 16. To simulate this hardware ISim uses the VHDL testbench shown in figure 17.


Figure 16 : top level schematic - computer.sch

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.ALL;
USE ieee.numeric_std.ALL;

LIBRARY UNISIM;
USE UNISIM.Vcomponents.ALL;

ENTITY computer_TB IS
END computer_TB;
ARCHITECTURE computer_TB_arch OF computer_TB IS 

   -- Top level schematic

   COMPONENT computer 
   PORT( 
     CLK : IN  STD_LOGIC; 
     RST : IN  STD_LOGIC; 
     GPI : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);			
     GPO : OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
     R : OUT STD_LOGIC;
     G : OUT STD_LOGIC;
     B : OUT STD_LOGIC );
   END COMPONENT;

   -- Wires and Buses

   SIGNAL CLK :	STD_LOGIC;
   SIGNAL RST :	STD_LOGIC;
   SIGNAL GPI : STD_LOGIC_VECTOR(3 DOWNTO 0);			
   SIGNAL GPO : STD_LOGIC_VECTOR(3 DOWNTO 0);
   SIGNAL R : STD_LOGIC;
   SIGNAL G : STD_LOGIC;	
   SIGNAL B : STD_LOGIC;
	
BEGIN

   -- Unit Under Test

   UUT: computer PORT MAP(
      CLK => CLK, 
      RST => RST, 
      GPI => GPI, 		
      GPO => GPO,
      R => R,
      G => G,
      B => B );

   -- 10MHz Clock generator

   clock : PROCESS
   BEGIN
      CLK <= '0'; wait for 50 ns;
      CLK <= '1'; wait for 50 ns;		
   END PROCESS;
	
   -- Reset signal

   clear : PROCESS
   BEGIN
      RST <= '1'; wait for 150 ns;
      RST <= '0'; wait;		
   END PROCESS;
  
   -- Inputs

   tb : PROCESS
   BEGIN
     GPI <= "0000"; wait for 1000 ns;
     GPI <= "0001"; wait for 1000 ns;
     GPI <= "0011"; wait for 1000 ns;
     GPI <= "0111"; wait for 1000 ns;
     GPI <= "1111"; wait for 1000 ns;
   END PROCESS;

END;

Figure 17 : testbench - computer_tb.vhd

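A testbench can be broken down into the following sections: the component declaration of the design under test (here the top-level schematic), the signal declarations (wires and buses) used to connect to it, the Unit Under Test (UUT) instance that maps these signals onto its ports, and the processes that generate its stimulus: the clock, the reset signal and the inputs.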

Note, the ";" symbol marks the end of a statement in a process block, therefore, the assignment and wait statements could be placed on different lines. They were placed on the same line in this example to improve readability. Remember that the statements inside a process run sequentially.

The UUT and each process run concurrently: consider them to be hardware components and the testbench the PCB that connects them together. As shown in figure 18, when we simulate this schematic the only signals displayed in the waveform diagram are the top-level IO markers: CLK, RST, GPI, GPO and R, G, B. This is not that useful when you are trying to debug the hardware within the processor, or your code.


Figure 18 : top level schematic

Fortunately we can manually drag and drop components into the waveform diagram; the only issue is that you need to identify the component's name and its location in the hierarchy. As an example, suppose we would like to look at what the ADDSUB component in the ALU is doing. In ISE we can push down the hierarchy, from the CPU into the ALU, where we can identify the auto-assigned name that was given to the ADDSUB component, i.e. XLXI_39. Then, in ISim, drag and drop this component from the "Instance and Process Name" column into the "Name" column of the waveform diagram; this will add all the IO ports and internal signals associated with this component. To view these signals, click on the restart button and simulate for the desired time interval.








Figure 19 : top level schematic (top), simulation (bottom)

When debugging code we are more interested in the state of internal registers, e.g. IR, PC, ACC etc, and perhaps processor bus activity. Again, these can be dragged and dropped into the waveform diagram, as shown in figure 20. You can also add dividers (purple bars) by right-clicking in the Name column and selecting New Divider. These help to break up the waveform diagram, making it easier to find / view the correct signals.


Figure 20 : adding processor registers


ISim - Breakpoints

Like the ISS, we can observe a program's instruction trace and see how the processor's state is changing. This is ok for small programs, but you can quickly get lost with all the green squiggles on the screen. Unfortunately there is not a lot of support in ISim to help debug our programs; however, we can add VHDL assert and report statements to the testbench to display data in the simulator:

assert TEST report MESSAGE severity LEVEL;

TEST should produce a boolean result; typically a bus or signal is compared to a value, e.g. GPO = 0. If the result is FALSE, the MESSAGE string is printed in the ISim terminal along with its severity LEVEL: note, warning, error or failure. This level is selected by the user; a severity level of failure will halt the simulator, i.e. act as a "breakpoint". These statements can be added to the testbench after the ARCHITECTURE BEGIN statement:

ARCHITECTURE computer_TB_arch OF computer_TB IS 
   ...
	
BEGIN
   assert to_integer(unsigned(GPO)) = 0 report "GPO = " & integer'image(to_integer(unsigned(GPO))) severity note;

   -- Unit Under Test
   ...
END;

Note, ISim will stop if the assert statement is tagged with a severity of failure; unfortunately you cannot then restart ISim :(

GPO is the output bus of the GPIO component controlled by the processor. The assert test first casts GPO's STD_LOGIC_VECTOR bus to UNSIGNED, then converts this to an INTEGER, which is compared to the value 0. If GPO is not zero, i.e. no longer its initial value, the REPORT string is printed to the ISim terminal. The string "GPO = " is joined, using the "&" concatenation operator, to the GPO bus value converted into a string, as shown in figure 21.


Figure 21 : Assert messages

Assert messages are a useful tool to monitor signals and busses in the top-level schematic, but they can't be used to monitor values within a schematic/component e.g. the processor's address bus, or the ACC register inside the processor. ISim does support breakpoints in VHDL, but as the simpleCPU processor is implemented using schematics there is no direct way to use this feature. However, we can add a new breakpoint component, an address monitor PROCESS that is triggered when a breakpoint address is detected on the processor's address bus i.e. the same as the -b switch in the ISS. Consider the breakpoint component below:


LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;   -- provides unsigned() and to_integer()

entity breakpoint is
port ( 
  A : in std_logic_vector(7 downto 0);
  Y : out std_logic );
end breakpoint;

architecture breakpoint_arch of breakpoint is
begin
   breakpoint_monitor : PROCESS ( A )
   BEGIN
     if to_integer(unsigned(A)) = 11
     then
       Y <= '1';   -- ADD BREAKPOINT ON THIS LINE
     else
       Y <= '0';
     end if;
   END PROCESS;
END breakpoint_arch;

Figure 22 : Breakpoint component (top), VHDL implementation (bottom)

The breakpoint_monitor PROCESS is triggered whenever the A port, i.e. the processor's address bus, changes. In this example, within the process this value is converted to an integer and then compared to the value 11, i.e. the address of the accum label in figure 2. Within ISim the breakpoint component can be selected from the "Instance and Process Name" column and double-clicked to open it. You can then set a breakpoint by right-clicking on the Y <= '1'; statement and selecting Toggle Breakpoint; the simulator will pause if this statement is executed. The statement is marked with a RED circle to indicate that a breakpoint has been set, as shown in figure 23.


Figure 23 : Adding a breakpoint

The ISim simulation is run as normal. When address 11 is accessed the simulation will be paused, i.e. the VHDL for the breakpoint component is opened and a YELLOW arrow indicates where the breakpoint has occurred. The user can then examine the instruction trace and register values displayed in the waveform diagram. To restart the simulation the user simply clicks on the RUN FOR icon. The breakpoint position can also be shown in the waveform diagram for later reference by dragging and dropping the breakpoint component's Y output into the waveform diagram.








Figure 24 : Triggering breakpoint

ISim - Displaying Bus and Register data

A similar technique can be used to monitor bus and register values within the processor. To do this we will need to create another new component, the bus_monitor component shown in figure 25.


LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.STD_LOGIC_UNSIGNED.ALL;

use std.textio.all;
use ieee.std_logic_textio.all;

entity bus_monitor is
generic (
  NAME : string := "ACC";
  SIZE : natural := 8 );
port ( 
  A : in std_logic_vector(SIZE-1 downto 0) );
end bus_monitor;

architecture bus_monitor_arch of bus_monitor is
  begin
    process( A )
      variable L : line;
    begin
      write(L, now);
      write(L, string'(" : "));		
      write(L, NAME);
      write(L, string'(" = "));		
      write(L, A);	
      writeline(output, L);		
    end process;
end bus_monitor_arch;

Figure 25 : bus_monitor component (top), VHDL implementation (below)


Figure 26 : Bus_monitor terminal output

The process in the bus_monitor component is triggered when a signal/bus in its sensitivity list, i.e. in the brackets after the keyword PROCESS, changes. VHDL write statements are then used to construct a text string, i.e. a line, that is written to the ISim output terminal. The generics can be used to adjust the text NAME and bus SIZE for each instance of the bus_monitor created. However, I confess I'm not sure how you set these in schematics, I've only used generics in VHDL components :).
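Once a few bus_monitor instances are running, the ISim console fills up quickly, so it can help to copy the output into a text file and filter it. A sketch, assuming each line has the shape built by the write() calls above, i.e. time, " : ", NAME, " = ", bits; the exact time formatting ISim produces for now may differ, so the pattern is deliberately loose, and parse_monitor_log is a hypothetical helper:

```python
import re

# Assumed line format, built by the bus_monitor write() calls:
#   "<time> <unit> : <NAME> = <bits>", e.g. "1250 ns : ACC = 00011101"
LINE = re.compile(
    r"^\s*([\d.]+)\s*(fs|ps|ns|us|ms|sec)\s*:\s*(\w+)\s*=\s*([01UXZWLH-]+)\s*$")


def parse_monitor_log(lines, name=None):
    """Yield (time, unit, name, value) for each bus_monitor line.

    value is an int when every bit is 0/1, otherwise the raw string
    (e.g. 'UUUUUUUU' before reset)."""
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue                      # other ISim console output
        time, unit, bus, bits = m.groups()
        if name is not None and bus != name:
            continue
        value = int(bits, 2) if set(bits) <= {"0", "1"} else bits
        yield float(time), unit, bus, value


log = ["0 ns : ACC = UUUUUUUU", "(other console output)",
       "250 ns : ACC = 00000000", "950 ns : PC = 00000011",
       "1250 ns : ACC = 00011101"]
for t, unit, bus, value in parse_monitor_log(log, name="ACC"):
    print(f"{t:g} {unit}: {bus} = {value}")
```

Non-binary values such as UUUUUUUU are kept as strings so they stand out, rather than being silently converted to a number.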

ISim - Debug component


Figure 27 : debug component

ISim - Visualiser

WORK IN PROGRESS

Creative Commons Licence

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Contact email: mike@simplecpudesign.com
