A Hardware Description Language SimpleCPUv1a

I finally decided to implement a Hardware Description Language (HDL) version of the simpleCPU version 1a. I confess i don't have a good explanation for why i did not do this years ago, as my background is in electronics and most of my research career was spent designing custom hardware for FPGAs using HDLs. I guess, as always, there is never enough time to do the fun things :). Back when i actually designed hardware, my HDL of choice was VHDL: Very High Speed Integrated Circuit Hardware Description Language (Link). The reason for this choice was that back then the Department of Comp. Sci. at York was famous for its Ada compiler, therefore, i had done a lot of software work in Ada (Link). VHDL is based on Ada, as both languages were commissioned by the United States Department of Defense (DoD), so switching from software to hardware, putting my electronics engineer's hat back on, i found VHDL a very easy language to pick up. An HDL i never really looked at was Verilog (Link), for no good reason other than it wasn't VHDL :). I guess you could also add SystemVerilog to this list. As Verilog is now the more popular HDL on the market i thought it was time to learn the basics, and as it's always easier to learn stuff whilst applying it to a practical problem, why not implement the simpleCPU. Below is a guide to building the simpleCPU version 1a in both VHDL and Verilog. Note, it will be interesting at the end to see which language produces the "best" hardware i.e. smallest / fastest implementation :).

SimpleCPU v1a
NOR gate
Multiplexers
Arithmetic and Logic Unit
Registers and counters
Control logic
Memory
Computer and testing
FPGA board

SimpleCPU version 1a

Figure 1 : simpleCPU version 1a block diagram

The VHDL and Verilog implementations of the simpleCPUv1a block diagram shown in figure 1 follow the same design approach as the previous schematic implementations i.e. functionality is broken down into a series of sub-components, which are then used to build larger components, which in turn form the key building blocks of the processor's architecture. Note, well that was the plan, but i confess that looking back i did use a couple of higher level, more abstract descriptions to save some time. A brief intro to the VHDL and Verilog languages can be found here: (Link) (Link).

WARNING : the discussion below is my opinion, not necessarily fact :). I am not a Verilog programmer, my background is in VHDL, so these are my observations on the differences between the two HDLs. This is not a tutorial on how to use VHDL or Verilog, rather these are notes and examples for me so i don't forget stuff later :).

NOR gate

So to start this journey we will start simple: basic logic gates. To detect if the ACC is zero the simpleCPU uses an eight input NOR gate, one possible VHDL and Verilog implementation is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity NOR_8 is
port( 
  A : in  STD_LOGIC_VECTOR( 7 downto 0 );
  Z : out STD_LOGIC);
end entity;
architecture NOR_8_ARCH of NOR_8 is
begin
  Z <= NOT( A(7) OR A(6) OR A(5) OR A(4) OR A(3) OR A(2) OR A(1) OR A(0) );
end NOR_8_ARCH;

VERILOG
-------

STYLE 1
-------
module NOR_8( A, Z );
  input [7:0] A;
  output Z;
  assign Z = ~( A[7] | A[6] | A[5] | A[4] | A[3] | A[2] | A[1] | A[0] );
endmodule

STYLE 2
-------
module NOR_8( 
  input [7:0] A,
  output Z );
  assign Z = ~( A[7] | A[6] | A[5] | A[4] | A[3] | A[2] | A[1] | A[0] );
endmodule

STYLE 3
-------
module NOR_8( 
  input [7:0] A,
  output reg Z );

  always @(*) begin
    Z = ~( A[7] | A[6] | A[5] | A[4] | A[3] | A[2] | A[1] | A[0] );
  end
endmodule

VHDL supports the Boolean operators: NOT, AND, NAND, OR, NOR, XOR, XNOR. Verilog's bitwise equivalents are: ~, &, |, ^, ~^ (XNOR). Note, "!" is the logical NOT used in conditions, not the bitwise NOT, and "~&" and "~|" only exist as unary reduction operators i.e. a two input NAND or NOR has to be written as ~(a & b) or ~(a | b). Apart from syntax the first difference i noticed is that VHDL uses an additional library to support multi-valued logic i.e. VHDL natively supports the type bit: {0,1}, but to support other states such as uninitialised, high-impedance, or weak signals etc, you need to import the STD_LOGIC library. This supports 9 distinct logic states {U,X,0,1,Z,W,L,H,-}. Verilog supports multi-valued logic built-in, but only uses 4 distinct logic states {0,1,X,Z}, which do cover the vast majority of basic scenarios.

Verilog has a few different "styles", as shown by the three examples above. Confess, not sure why you need style-1, as far as i can tell it is the original Verilog-1995 port declaration syntax, with style-2 being the more compact ANSI style added in Verilog-2001. I would describe these styles of HDL as behavioural i.e. not structural. However, this is a low level description, you are defining functionality in terms of logic gates, so these will always look similar, i guess you could say these examples use a dataflow style. A key point to note in these Verilog descriptions is the "assign" keyword, this alters the meaning of the "=" symbol: it defines a continuous assignment, with no dependency on a clock or sequential behaviour i.e. it is not a blocking assignment, it is just logic. Similarly, in style-3 the "reg" declaration in the output definition does not mean "registered". A "reg" is simply something that can be assigned within an always block, and as the sensitivity list here is "*" i.e. any input, rather than a clock edge, the result is still just logic. I don't like this inconsistency :(. Finally, why isn't there a ';' after endmodule?
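
As an aside, Verilog also has unary reduction operators that collapse a bus down to a single bit, which gives a more compact way of writing the same NOR gate. The sketch below is a hypothetical alternative (the module name NOR_8_RED is made up for this example, it is not one of the project files) and should behave the same as the three styles above.

```verilog
// Sketch: 8 input NOR using the reduction-NOR operator.
// NOR_8_RED is a hypothetical name, not one of the project files.
module NOR_8_RED(
  input  [7:0] A,
  output       Z );

  // ~|A ORs all eight bits of A together, then inverts the result
  assign Z = ~|A;
endmodule
```

Z is only 1 when every bit of A is 0, which is exactly the ACC zero-detect function.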

You can download copies of these files as an ISE project with a testbench here: (Link). Note, the .vhd and .v files need to be manually added to the project one at a time for testing as they use the same name. From these HDL descriptions the Xilinx ISE tools can create an RTL schematic, as shown in figure 2. A waveform diagram of this component in action is shown in figure 3 below.

Figure 2 : NOR gate schematic

Figure 3 : NOR gate simulation

Multiplexers

Rather than implementing this component from basic logic gates i.e. AND, OR and NOT (Link), i decided to build it using a higher level, abstract HDL description. One possible VHDL and Verilog implementation is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity MUX_2_8 is
Port (
    A, B : IN  STD_LOGIC_VECTOR(7 downto 0); 
    SEL  : IN  STD_LOGIC;                    
    Y    : OUT STD_LOGIC_VECTOR(7 downto 0));
end MUX_2_8;

architecture MUX_2_8_ARCH of MUX_2_8 is
begin
    Y <= B when SEL = '1' else A;
end MUX_2_8_ARCH;

VERILOG
-------
 
module MUX_2_8 ( A, B, SEL, Y );
  input [7:0] A;     
  input [7:0] B;     
  input SEL;         
  output [7:0] Y;     
  assign Y = SEL ? B : A; 
endmodule

Unlike the previous examples the HDL style used here is definitely not structural, as we are not defining the hardware's functionality in terms of Boolean logic. Most people would describe this style as dataflow, but i tend to use the more general term behavioural. Confess, i don't normally use the syntax shown here i.e. conditional assignments (WHEN-ELSE in VHDL, "?:" in Verilog), i would use a PROCESS in VHDL i.e. IF-THEN-ELSE, we will see these in a bit. However, this does highlight another difference between VHDL and Verilog i.e. assignments and the use of the different "=" symbols.

In VHDL "<=" is read as driven e.g. Y <= A, would read as Y is driven by A. These are signals i.e. A and Y would be "wires" in the real world. In VHDL the "=" symbol is a relational operator i.e. equals, returning a Boolean. You do have the concept of variables in VHDL and assignments to these use ":=" e.g. Y := A would read as Y is assigned A. These are variables i.e. Y and A are used in abstract high-level hardware descriptions. These focus on describing the hardware's function, not its implementation, therefore, a variable could represent integers, floats, arrays etc, they don't have to be a "hardware" specific data type, they don't have to directly model real-world hardware.

In Verilog "=" is a blocking assignment and "<=" is a non-blocking assignment. These are used within a procedural block, which we will look at later. However, as we have already seen the "assign" keyword alters the reading of the "=" assignment to mean continuous i.e. it is logic. So using the "assign" keyword means it is neither blocking nor non-blocking, which is not at all confusing :). The term "blocking" refers to how an assignment is performed. A blocking assignment must complete before the next HDL statement is executed e.g. if you had the statement Y = 10 and the next statement was A = Y, then A = 10. This would be the same for a VHDL variable. That seems logical, but that's how software behaves, not hardware: signals do not update instantaneously, there are wire and propagation delays to consider. Therefore, in Verilog we have non-blocking assignments i.e. the "<=" operator. These allow multiple assignments to operate in parallel: every right-hand side is evaluated first, then the updates are scheduled for the end of the simulation time step. The "==" symbol is a relational operator i.e. equals, returning a Boolean.

Note, the concept of blocking and non-blocking assignments is confusing :). In VHDL we would now start to talk about delta cycles, start to think about how the simulator actually works. Consider what happens in a simple combinatorial logic circuit i.e. all logic gates will be working in parallel. The order these gates are simulated in the simulator must not affect the simulation result. Yes there is a "sequential" behaviour as signals propagate through a circuit, but each gate could have different propagation delays. Updates to their outputs and the "wires" they are connected to need to be scheduled in the simulator i.e. at different times. Therefore, a signal could go through multiple transitory states before settling down to a stable value. A blocking assignment does not consider this, it will be performed instantaneously within the simulator, which is not what will happen in a real-world circuit. Therefore, like VHDL, blocking assignments i.e. variables, should only be used in a PROCESS, a high-level sequential description of what a piece of hardware does, or in simple logic circuits such as the previous NOR gate example i.e. HDL descriptions with a single assignment, or logic where the output of one assignment is NOT used as the input of another. Yes, there are always exceptions to these rules, BUT, i'm always very careful where and how i use blocking assignments i.e. i try to avoid them, as you could end up with a hardware description that does not describe the hardware behaviour you want :(.
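
A small simulation-only sketch may make the difference concrete. The module below (ASSIGN_DEMO and its signal names are made up for this example) tries to swap two values on a rising clock edge, once with non-blocking and once with blocking assignments. With "<=" both right-hand sides are sampled before either value updates, so a and b swap. With "=" the first assignment completes before the second is evaluated, so c and d both end up holding the old value of d.

```verilog
// Sketch: swapping two values with non-blocking and blocking assignments.
// ASSIGN_DEMO and its signals are hypothetical names, simulation only.
module ASSIGN_DEMO;
  reg clk = 0;
  reg [7:0] a = 8'd1, b = 8'd2;   // updated with non-blocking "<="
  reg [7:0] c = 8'd1, d = 8'd2;   // updated with blocking "="

  // Non-blocking: b and a are both read first, then both updates happen
  always @(posedge clk) begin
    a <= b;
    b <= a;
  end

  // Blocking: c is overwritten before the second statement reads it
  always @(posedge clk) begin
    c = d;
    d = c;
  end

  initial begin
    #1 clk = 1;   // a single rising edge
    #1 $display("a=%0d b=%0d c=%0d d=%0d", a, b, c, d);
  end
endmodule
```

Run in a simulator this should print a=2 b=1 c=2 d=2 i.e. only the non-blocking version behaves like a pair of registers exchanging their contents.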

Figure 4 : MUX schematic

Figure 5 : MUX simulation

Arithmetic and Logic Unit

Basically a direct copy of the original design (Link), just implemented as a higher level, abstract HDL description. The first component constructed is the adder / subtract unit, one possible VHDL and Verilog implementation is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity ADDSUB_8 is
port (
  A, B : in  STD_LOGIC_VECTOR(7 downto 0); 
  SEL  : in  STD_LOGIC; 
  Y    : out STD_LOGIC_VECTOR(7 downto 0); 
  C    : out STD_LOGIC );
end ADDSUB_8;

architecture ADDSUB_8_ARCH of ADDSUB_8 is
    signal b_int : STD_LOGIC_VECTOR(7 downto 0); 
    signal y_int : STD_LOGIC_VECTOR(8 downto 0);   
begin
    b_int <= B when SEL = '0' else not B; 
    y_int <= ("0" & A) + ("0" & b_int) + Sel; 

    Y <= y_int(7 downto 0); 
    C <= y_int(8);      
end ADDSUB_8_ARCH;

VERILOG
-------

module ADDSUB_8( A, B, SEL, Y, C );
  input [7:0] A;
  input [7:0] B;
  input SEL;
  output [7:0] Y;
  output C;

  wire [7:0] b_int;  
  wire [8:0] y_int;    

  assign b_int = SEL ? ~B : B;
  assign y_int = A + b_int + SEL;

  assign Y = y_int[7:0]; 
  assign C = y_int[8];
endmodule

I tried to follow on from the MUX example and use a conditional assignment to switch between the inverted and non-inverted input i.e. the 2s complement conversion used for subtraction (invert B and add 1 via SEL). When performing addition the result of two n-bit numbers will be n+1 bits i.e. there could be a carry. In VHDL bus bit-widths must match, therefore, you need to pad the buses with a leading zero to make room for the carry bit. This is achieved in VHDL using the concatenation operator "&". Note, in Verilog concatenation is performed using "{ }" e.g. {1'b0, A}, the constant must be given a size. In Verilog, if bus sizes do not match, the shorter operand is automatically extended to the width of the widest operand or the assignment target: zero-extended for unsigned values, sign-extended only if every operand is declared signed. Note, interestingly in Verilog you do not have to state how the ADD is performed: it is unsigned unless the operands are declared as signed. From these HDL descriptions the Xilinx ISE tools can create an RTL schematic, as shown in figure 6. A waveform diagram of this component in action is shown in figure 7 below.

Figure 6 : ADDSUB schematic

Figure 7 : ADDSUB simulation
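
To make the width handling in the ADDSUB explicit, the carry can also be captured by padding the operands by hand, mirroring the VHDL version. This is only a sketch, assuming all values are unsigned; ADDSUB_8_PAD is a hypothetical name, not one of the project files.

```verilog
// Sketch: add / subtract with explicit zero padding (unsigned operands assumed).
// ADDSUB_8_PAD is a hypothetical name, not one of the project files.
module ADDSUB_8_PAD(
  input  [7:0] A,
  input  [7:0] B,
  input        SEL,   // 0 = add, 1 = subtract
  output [7:0] Y,
  output       C );

  // For subtraction invert B, the +1 of the 2s complement comes from SEL
  wire [7:0] b_int = SEL ? ~B : B;

  // Pad both operands to 9 bits so the carry out lands in bit 8
  wire [8:0] y_int = {1'b0, A} + {1'b0, b_int} + SEL;

  assign Y = y_int[7:0];
  assign C = y_int[8];
endmodule
```

For example A=5, B=3, SEL=1 gives 5 + 252 + 1 = 258 i.e. Y=2 with C=1.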

In addition to the ADDSUB component the ALU also has a bit-wise AND, one possible VHDL and Verilog implementation is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity AND_2_8 is
port( 
  A : IN  STD_LOGIC_VECTOR( 7 downto 0 );
  B : IN  STD_LOGIC_VECTOR( 7 downto 0 );
  Z : OUT STD_LOGIC_VECTOR( 7 downto 0 ) );
end AND_2_8;

architecture AND_2_8_ARCH of AND_2_8 is
begin

  Z(7) <= A(7) AND B(7);
  Z(6) <= A(6) AND B(6);
  Z(5) <= A(5) AND B(5);
  Z(4) <= A(4) AND B(4);
  Z(3) <= A(3) AND B(3);
  Z(2) <= A(2) AND B(2);
  Z(1) <= A(1) AND B(1);
  Z(0) <= A(0) AND B(0);

end AND_2_8_ARCH;

VERILOG
-------
 
module AND_2_8( A, B, Z );
  input [7:0] A;
  input [7:0] B;
  output [7:0] Z;

  assign Z[7] = A[7] & B[7];
  assign Z[6] = A[6] & B[6];
  assign Z[5] = A[5] & B[5];
  assign Z[4] = A[4] & B[4];
  assign Z[3] = A[3] & B[3];
  assign Z[2] = A[2] & B[2];
  assign Z[1] = A[1] & B[1];
  assign Z[0] = A[0] & B[0];
  
endmodule

No surprises here, the AND_2_8 component uses the same style / code as the NOR_8. From these HDL descriptions the Xilinx ISE tools can create an RTL schematic, as shown in figure 8. A waveform diagram of this component in action is shown in figure 9 below.

Figure 8 : AND schematic

Figure 9 : AND simulations
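
As the bit-wise operators also work on whole buses, the eight AND assignments could be collapsed into one (in VHDL the equivalent would be Z <= A AND B). A minimal sketch, where AND_2_8_VEC is a hypothetical name, which should describe the same eight AND gates:

```verilog
// Sketch: bus-wide version of the bit-wise AND.
// AND_2_8_VEC is a hypothetical name, not one of the project files.
module AND_2_8_VEC(
  input  [7:0] A,
  input  [7:0] B,
  output [7:0] Z );

  // & applied to two 8 bit buses ANDs them bit by bit
  assign Z = A & B;
endmodule
```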

These two components plus the previous MUX can then be used to implement the ALU, one possible VHDL and Verilog implementation is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity ALU is
port (
  A, B : in  STD_LOGIC_VECTOR(7 downto 0); 
  CTL  : in  STD_LOGIC_VECTOR(2 downto 0);                  
  Y    : out STD_LOGIC_VECTOR(7 downto 0) );
end ALU;

architecture ALU_ARCH of ALU is

  component ADDSUB_8
  port (
    A, B : in  STD_LOGIC_VECTOR(7 downto 0); 
    SEL  : in  STD_LOGIC;                    
    Y    : out STD_LOGIC_VECTOR(7 downto 0); 
    C    : out STD_LOGIC );
  end component;

  component AND_2_8 
  port( 
    A : IN  STD_LOGIC_VECTOR( 7 downto 0 );
    B : IN  STD_LOGIC_VECTOR( 7 downto 0 );
    Z : OUT STD_LOGIC_VECTOR( 7 downto 0 ) );
  end component;

  component MUX_2_8 
  port (
    A, B : in  STD_LOGIC_VECTOR(7 downto 0); 
    SEL  : in  STD_LOGIC;                    
    Y    : out STD_LOGIC_VECTOR(7 downto 0) );
  end component; 

  signal mux_int : STD_LOGIC_VECTOR(7 downto 0); 
  signal addsub_int : STD_LOGIC_VECTOR(7 downto 0); 
  signal and_int : STD_LOGIC_VECTOR(7 downto 0);
 
  signal carry : STD_LOGIC; 

begin

  mux_a : MUX_2_8 port map(
    A   => mux_int,
    B   => B,
    SEL => CTL(2),
    Y   => Y );

  mux_b : MUX_2_8 port map(
    A   => addsub_int,
    B   => and_int,
    SEL => CTL(1),
    Y   => mux_int );

  adder : ADDSUB_8 port map(
    A   => A, 
    B   => B,
    SEL => CTL(0),                  
    Y   => addsub_int,
    C   => carry );

  bitwiseAND : AND_2_8 port map( 
    A => A,
    B => B,
    Z => and_int );

end ALU_ARCH;

VERILOG
-------

module ALU (
  input [7:0] A, 
  input [7:0] B,
  input [2:0] CTL,                  
  output [7:0] Y );

  wire [7:0] mux_int;
  wire [7:0] addsub_int;
  wire [7:0] and_int;
  wire carry;  // unused carry out, declared rather than left as an implicit net

  MUX_2_8 mux_a (
    .A(mux_int), 
    .B(B),  
    .SEL(CTL[2]),               
    .Y(Y) );

  MUX_2_8 mux_b (
    .A(addsub_int), 
    .B(and_int),  
    .SEL(CTL[1]),               
    .Y(mux_int) );

  ADDSUB_8 add_sub (
    .A(A),
    .B(B), 
    .SEL(CTL[0]),                    
    .Y(addsub_int),   
    .C(carry) );

  AND_2_8 bitwiseAND ( 
    .A(A),
    .B(B), 
    .Z(and_int) );

endmodule

Figure 10 : ALU schematic

Figure 11 : ALU simulation

Note, ADD="000", SUB="001", AND="010", PASS="100".
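
These control codes can be cross-checked against a purely behavioural description of the same function. The sketch below is not the structural ALU above, just a hypothetical reference model (ALU_REF is a made-up name) that follows the same multiplexer priority: CTL[2] selects the B pass-through, otherwise CTL[1] selects AND, otherwise CTL[0] selects subtract or add.

```verilog
// Sketch: behavioural reference model of the ALU function.
// ALU_REF is a hypothetical name, not one of the project files.
module ALU_REF(
  input  [7:0] A,
  input  [7:0] B,
  input  [2:0] CTL,
  output reg [7:0] Y );

  always @(*) begin
    if (CTL[2])
      Y = B;        // PASS "100" : mux_a selects input B
    else if (CTL[1])
      Y = A & B;    // AND  "010" : mux_b selects the AND output
    else if (CTL[0])
      Y = A - B;    // SUB  "001" : same low 8 bits as A + ~B + 1
    else
      Y = A + B;    // ADD  "000"
  end
endmodule
```

In a testbench both descriptions could be driven with the same inputs and their Y outputs compared.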

Registers and counters

Again, i used the same basic architecture as the previous FPGA implementation (Link), building a 4bit register from D-type flip-flops, then using this component to produce the 8bit and 16bit registers. The aim here was to show component reuse and hierarchical design.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity REG_4 is
port (
  D   : in STD_LOGIC_VECTOR(3 downto 0);         
  Q   : out STD_LOGIC_VECTOR(3 downto 0);
  CLK : in STD_LOGIC;  
  CLR : in STD_LOGIC;
  CE  : in STD_LOGIC);
end REG_4;

architecture REG_4_ARCH of REG_4 is
begin

  process (CLK, CLR)
  begin
    if CLR = '1' 
    then
      Q <= (others => '0'); 
    elsif CLK='1' and CLK'event
    then
      if CE = '1' 
      then
        Q <= D; 
      end if;
    end if;
  end process;
  
end REG_4_ARCH;

VERILOG
-------

module REG_4 ( D, Q, CLK, CLR, CE );
  input [3:0] D;  
  input CLK;    
  input CLR;
  input CE;
  output reg [3:0] Q;

always @(posedge CLK or posedge CLR) 
begin
  if (CLR) 
    Q <= 4'b0000;            
  else if (CE) 
    Q <= D;                 
end
endmodule

In Verilog the "always" keyword defines a procedural block, and the @ symbol is used to define the sensitivity list for that block. A sensitivity list specifies the signals / conditions that must change for the block to be "executed". In VHDL the comparable construct is the PROCESS, with its sensitivity list defined in its ( ). Note, as there is only one assignment in each branch of the IF-ELSE structure, the branches do not need their own "begin-end" block. A block is only needed if a branch contains multiple statements. The IF condition must be in ( ), using == for bus or variable comparisons. This can also be used for single bit inputs e.g. CLR and CE, but this is not required, see example above. As the Q output is assigned within the always block it must be declared using the "reg" output type, the default type is wire. Here the block is clocked so Q really is registered, and assignments are made using non-blocking assignments "<=".

To construct an 8bit register we can use two REG_4 components, one possible VHDL and Verilog implementation is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity REG_8 is
port (
  D   : in STD_LOGIC_VECTOR(7 downto 0);         
  Q   : out STD_LOGIC_VECTOR(7 downto 0);
  CLK : in STD_LOGIC;  
  CLR : in STD_LOGIC;
  CE  : in STD_LOGIC );
end REG_8;

architecture REG_8_ARCH of REG_8 is

  component REG_4
  port (
    D   : in STD_LOGIC_VECTOR(3 downto 0);         
    Q   : out STD_LOGIC_VECTOR(3 downto 0);
    CLK : in STD_LOGIC;  
    CLR : in STD_LOGIC;
    CE  : in STD_LOGIC);
  end component;

begin
  reg_low: REG_4 port map (
    D   => D(3 downto 0),      
    Q   => Q(3 downto 0),
    CLK => CLK,
    CLR => CLR,
    CE  => CE );

  reg_high: REG_4 port map (
    D   => D(7 downto 4),      
    Q   => Q(7 downto 4),
    CLK => CLK,
    CLR => CLR,
    CE  => CE );

end REG_8_ARCH;

VERILOG
-------

module REG_8 (
  input [7:0] D,  
  input CLK, 
  input CLR,  
  input CE,
  output [7:0] Q );

  REG_4 reg_low (
    .D(D[3:0]),  
    .CLK(CLK),
    .CLR(CLR),
    .CE(CE),
    .Q(Q[3:0]) );

  REG_4 reg_high (
    .D(D[7:4]),  
    .CLK(CLK),
    .CLR(CLR),
    .CE(CE),
    .Q(Q[7:4]) );

endmodule

In Verilog "." defines a port, the associated "()" defines the wire it is connected to. To construct a 16bit register we can use two REG_8 components, one possible VHDL and Verilog implementation is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity REG_16 is
port (
  D   : in STD_LOGIC_VECTOR(15 downto 0);         
  Q   : out STD_LOGIC_VECTOR(15 downto 0);
  CLK : in STD_LOGIC;  
  CLR : in STD_LOGIC;
  CE  : in STD_LOGIC );
end REG_16;

architecture REG_16_ARCH of REG_16 is

  component REG_8
  port (
    D   : in STD_LOGIC_VECTOR(7 downto 0);         
    Q   : out STD_LOGIC_VECTOR(7 downto 0);
    CLK : in STD_LOGIC;  
    CLR : in STD_LOGIC;
    CE  : in STD_LOGIC);
  end component;

begin
  reg_low: REG_8 port map (
    D   => D(7 downto 0),      
    Q   => Q(7 downto 0),
    CLK => CLK,
    CLR => CLR,
    CE  => CE );

  reg_high: REG_8 port map (
    D   => D(15 downto 8),      
    Q   => Q(15 downto 8),
    CLK => CLK,
    CLR => CLR,
    CE  => CE );

end REG_16_ARCH;

VERILOG
-------

module REG_16 (
  input [15:0] D,  
  input CLK, 
  input CLR,  
  input CE,
  output [15:0] Q );

  REG_8 reg_low (
    .D(D[7:0]),  
    .CLK(CLK),
    .CLR(CLR),
    .CE(CE),
    .Q(Q[7:0]) );

  REG_8 reg_high (
    .D(D[15:8]),  
    .CLK(CLK),
    .CLR(CLR),
    .CE(CE),
    .Q(Q[15:8]) );

endmodule

Figure 12 : REG_16 schematic

Figure 13 : REG_16 simulation

The program counter (PC) and the ring counter are also based on the same designs as the FPGA implementation (Link), however, i decided to build these using behavioural descriptions. One possible VHDL and Verilog implementation of a ring counter is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity RING_COUNTER_3 is
port (
  CLK : in STD_LOGIC;                 
  RST : in STD_LOGIC;           
  Q   : out STD_LOGIC_VECTOR(2 downto 0));
end RING_COUNTER_3;

architecture RING_COUNTER_3_ARCH of RING_COUNTER_3 is
  signal q_int : STD_LOGIC_VECTOR(2 downto 0);
begin
  process (CLK, RST)
  begin
    if RST = '1' 
    then
      q_int <= "001";          
    elsif CLK='1' and CLK'event
    then
      q_int <= q_int(1 downto 0) & q_int(2); 
    end if;
  end process;

  Q <= q_int;

end RING_COUNTER_3_ARCH;

VERILOG
-------

module RING_COUNTER_3 (
  input wire CLK,            
  input wire RST,            
  output reg [2:0] Q );

always @(posedge CLK or posedge RST) begin
  if (RST) 
    Q <= 3'b001;          
  else 
    Q <= {Q[1:0], Q[2]};    
end
endmodule

From these HDL descriptions the Xilinx ISE tools can create an RTL schematic, as shown in figure 14. A waveform diagram of this component in action is shown in figure 15 below.

Figure 14 : RING_COUNTER_3 schematic

Figure 15 : RING_COUNTER_3 simulation

One possible VHDL and Verilog implementation of a loadable binary counter is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity COUNTER_8 is
Port (
  CLK : in  STD_LOGIC;                 
  CLR : in  STD_LOGIC;                
  LD  : in  STD_LOGIC;  
  CE  : in  STD_LOGIC;   
  D   : in  STD_LOGIC_VECTOR(7 downto 0); 
  Q   : out STD_LOGIC_VECTOR(7 downto 0) );
end COUNTER_8;

architecture COUNTER_8_ARCH of COUNTER_8 is
  signal q_int : STD_LOGIC_VECTOR(7 downto 0) := (others => '0');
begin

  process (CLK, CLR)
  begin
    if CLR='1' 
    then
      q_int <= (others => '0'); 
    elsif CLK='1' and CLK'event
    then
      if CE='1'
      then 
        if LD='1' 
        then
          q_int <= D;        
        else
          q_int <= q_int + 1;       
        end if;
      end if;
    end if;		
  end process;

  Q <= q_int;
  
end COUNTER_8_ARCH;

VERILOG
-------

module COUNTER_8 (
  input wire CLK,            
  input wire CLR,              
  input wire LD, 
  input wire CE, 
  input wire [7:0] D,        
  output reg [7:0] Q );

always @(posedge CLK or posedge CLR) begin
  if (CLR) 
    Q <= 8'b00000000;  
  else if (CE)	 
    if (LD)
      Q <= D;               
    else 
      Q <= Q + 1;         
end

endmodule

A difference between VHDL and Verilog is that a VHDL process cannot read an output port, as it's an output i.e. you can only read inputs (at least before VHDL-2008, which relaxed this rule), therefore you need an internal signal as a temp buffer. In VHDL this is represented using the signal "q_int". You can download copies of these files as an ISE project with testbench here: (Link). Note, the .vhd and .v files need to be manually added to the project one at a time for testing as they use the same name. From these HDL descriptions the Xilinx ISE tools can create an RTL schematic, as shown in figure 16. A waveform diagram of this component in action is shown in figure 17 below.

Figure 16 : COUNTER_8 schematic

Figure 17 : COUNTER_8 simulation

Control logic

The control logic is the same as used by the Logisim implementation (Link). A key component is a 4bit onehot decoder, one possible VHDL and Verilog implementation is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ONEHOT_DECODER_16 is
port (
  A : in  STD_LOGIC_VECTOR(3 downto 0); 
  Y : out STD_LOGIC_VECTOR(15 downto 0) );
end ONEHOT_DECODER_16;

architecture ONEHOT_DECODER_16_ARCH of ONEHOT_DECODER_16 is
begin
  process (A)
  begin
    case A is
      when "0000" => Y <= "0000000000000001";
      when "0001" => Y <= "0000000000000010"; 
      when "0010" => Y <= "0000000000000100";
      when "0011" => Y <= "0000000000001000"; 
      when "0100" => Y <= "0000000000010000"; 
      when "0101" => Y <= "0000000000100000"; 
      when "0110" => Y <= "0000000001000000"; 
      when "0111" => Y <= "0000000010000000"; 
      when "1000" => Y <= "0000000100000000"; 
      when "1001" => Y <= "0000001000000000"; 
      when "1010" => Y <= "0000010000000000"; 
      when "1011" => Y <= "0000100000000000"; 
      when "1100" => Y <= "0001000000000000"; 
      when "1101" => Y <= "0010000000000000"; 
      when "1110" => Y <= "0100000000000000"; 
      when "1111" => Y <= "1000000000000000"; 
      when OTHERS => Y <= (OTHERS => '0');
    end case;
  end process;
end ONEHOT_DECODER_16_ARCH;

VERILOG
-------

module ONEHOT_DECODER_16 (
  input  wire [3:0] A,     
  output reg [15:0] Y );

always @(*) begin
  case (A)
    4'b0000: Y = 16'b0000000000000001;
    4'b0001: Y = 16'b0000000000000010;
    4'b0010: Y = 16'b0000000000000100;
    4'b0011: Y = 16'b0000000000001000;
    4'b0100: Y = 16'b0000000000010000;
    4'b0101: Y = 16'b0000000000100000;
    4'b0110: Y = 16'b0000000001000000;
    4'b0111: Y = 16'b0000000010000000;
    4'b1000: Y = 16'b0000000100000000;
    4'b1001: Y = 16'b0000001000000000;
    4'b1010: Y = 16'b0000010000000000;
    4'b1011: Y = 16'b0000100000000000;
    4'b1100: Y = 16'b0001000000000000;
    4'b1101: Y = 16'b0010000000000000;
    4'b1110: Y = 16'b0100000000000000;
    4'b1111: Y = 16'b1000000000000000;
    default: Y = 16'b0000000000000000;
  endcase
end

endmodule

Personally i find the way Verilog defines stuff a little odd. In this example the output Y is declared as "reg" because it's being assigned inside the always block. However, since the block is combinatorial (always @(*)), rather than clocked (always @(posedge clk or posedge rst)), the synthesized hardware will be combinational, not sequential. If you don't use "reg" you will get a compilation error. This feels odd: the "reg" keyword is used to declare variables that can hold values, not necessarily that these signals are registered i.e. driven by flip-flops. Also not sure about how constants / binary strings are declared i.e. size'bvalue, hmmm. From these HDL descriptions the Xilinx ISE tools can create an RTL schematic, as shown in figure 18. A waveform diagram of this component in action is shown in figure 19 below.

Figure 18 : ONEHOT_DECODER_16 schematic

Figure 19 : ONEHOT_DECODER_16 simulation
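
A more compact way to describe the same decoder is to shift a single 1 left by the input value. This is just a sketch (ONEHOT_DECODER_16_SHIFT is a hypothetical name, not one of the project files), and whether the synthesis tools reduce it to the same logic as the case statement would need to be checked.

```verilog
// Sketch: onehot decoder written as a shift.
// ONEHOT_DECODER_16_SHIFT is a hypothetical name, not one of the project files.
module ONEHOT_DECODER_16_SHIFT(
  input  [3:0]  A,
  output [15:0] Y );

  // Start with bit 0 set and move it left by A places: A=0 -> bit 0, A=15 -> bit 15
  assign Y = 16'b1 << A;
endmodule
```

As this uses "assign", Y stays a plain wire and the "reg" declaration is not needed.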

In addition to a bin-to-onehot decoder the control logic also contains the logic needed to produce the required control signals. Again, these are taken from the original designs, but now implemented using HDLs, one possible VHDL and Verilog implementation is shown below.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity DECODER is
port (
  FETCH       : in  STD_LOGIC;                 
  DECODE      : in  STD_LOGIC;                
  EXECUTE     : in  STD_LOGIC;   
  MOVE        : in  STD_LOGIC; 
  ADD         : in  STD_LOGIC;  
  SUB         : in  STD_LOGIC; 
  BITWISE_AND : in  STD_LOGIC; 
  LOAD        : in  STD_LOGIC; 
  ADDM        : in  STD_LOGIC; 
  SUBM        : in  STD_LOGIC; 
  STORE       : in  STD_LOGIC; 
  JUMPU       : in  STD_LOGIC; 
  JUMPZ       : in  STD_LOGIC; 
  JUMPNZ      : in  STD_LOGIC;  
  Z           : in  STD_LOGIC; 
  ROM_EN      : OUT STD_LOGIC;  
  RAM_EN      : OUT STD_LOGIC;  
  RAM_WR      : OUT STD_LOGIC;   
  ADDR_SEL    : OUT STD_LOGIC;  
  DATA_SEL    : OUT STD_LOGIC; 
  ALU_CTL0    : OUT STD_LOGIC; 
  ALU_CTL1    : OUT STD_LOGIC;  
  ALU_CTL2    : OUT STD_LOGIC;  
  ACC_EN      : OUT STD_LOGIC;  
  IR_EN       : OUT STD_LOGIC; 
  PC_LD       : OUT STD_LOGIC;  
  PC_EN       : OUT STD_LOGIC );  
end DECODER;

architecture DECODER_ARCH of DECODER is
begin
  process ( FETCH, DECODE, EXECUTE,
            MOVE, ADD, SUB, BITWISE_AND,
            LOAD, ADDM, SUBM, STORE,
            JUMPU, JUMPZ, JUMPNZ, Z )
  begin

    ROM_EN   <= FETCH;
    RAM_EN   <= (DECODE or EXECUTE) and (LOAD or STORE or ADDM or SUBM);
    RAM_WR   <= EXECUTE and STORE;
    ADDR_SEL <= (DECODE or EXECUTE) and (LOAD or STORE or ADDM or SUBM);
    DATA_SEL <= LOAD or ADDM or SUBM;
    ALU_CTL0 <= SUB or SUBM;
    ALU_CTL1 <= BITWISE_AND;
    ALU_CTL2 <= MOVE or LOAD;
    ACC_EN   <= (MOVE or ADD or SUB or BITWISE_AND or LOAD or ADDM or SUBM) and EXECUTE;
    IR_EN    <= FETCH;
    PC_LD    <= DECODE and (JUMPU or (JUMPZ and Z) or (JUMPNZ and not Z));
    PC_EN    <= DECODE;

  end process;
end DECODER_ARCH;

VERILOG
-------

module DECODER (
  input  wire FETCH,          
  input  wire DECODE,            
  input  wire EXECUTE,  
  input  wire MOVE,
  input  wire ADD, 
  input  wire SUB,
  input  wire BITWISE_AND,
  input  wire LOAD,
  input  wire ADDM,
  input  wire SUBM, 
  input  wire STORE,
  input  wire JUMPU,
  input  wire JUMPZ,
  input  wire JUMPNZ,
  input  wire Z,
  output wire ROM_EN, 
  output wire RAM_EN,
  output wire RAM_WR,
  output wire ADDR_SEL, 
  output wire DATA_SEL,
  output wire ALU_CTL0,
  output wire ALU_CTL1,
  output wire ALU_CTL2,  
  output wire ACC_EN,
  output wire IR_EN, 
  output wire PC_LD,
  output wire PC_EN );

  assign ROM_EN   = FETCH;
  assign RAM_EN   = (DECODE | EXECUTE) & (LOAD | STORE | ADDM | SUBM);
  assign RAM_WR   = EXECUTE & STORE;
  assign ADDR_SEL = (DECODE | EXECUTE) & (LOAD | STORE | ADDM | SUBM);
  assign DATA_SEL = LOAD | ADDM | SUBM;
  assign ALU_CTL0 = SUB | SUBM;
  assign ALU_CTL1 = BITWISE_AND;
  assign ALU_CTL2 = MOVE | LOAD;
  assign ACC_EN   = (MOVE | ADD | SUB | BITWISE_AND | LOAD | ADDM | SUBM) & EXECUTE;
  assign IR_EN    = FETCH;
  assign PC_LD    = DECODE & (JUMPU | (JUMPZ & Z) | (JUMPNZ & ~Z));
  assign PC_EN    = DECODE;

endmodule

From these HDL descriptions the Xilinx ISE tools can create an RTL schematic, as shown in figure 20. To fully test this logic would be tricky, so i didn't :). You can download copies of these files as an ISE project with testbench here: (Link). Note, the .vhd and .v files need to be manually added to the project one at a time for testing as they use the same name. For the moment i just constructed a testbench that applied a constant logic-0 on each input, to confirm that there were no syntax errors and that all outputs also produced a logic-0. The final testing of this circuit will be done when it is integrated into the final processor i.e. tested through the execution of the test program.

Figure 20 : DECODER schematic

Figure 21 : DECODER simulation

Memory

From one point of view memory is just an array and luckily both VHDL and Verilog support this data type, one possible VHDL and Verilog implementation is shown below. Next, it's just a question of how we initialise this array with the required machine code and data.

VHDL
----

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity RAM is
port (
  CLK  : in  STD_LOGIC;         
  WE   : in  STD_LOGIC;                
  ADDR : in  STD_LOGIC_VECTOR(7 downto 0);
  DIN  : in  STD_LOGIC_VECTOR(15 downto 0); 
  DOUT : out STD_LOGIC_VECTOR(15 downto 0) );
end RAM;

architecture RAM_ARCH of RAM is
  type memory_type is array (0 to 2**8 - 1) of STD_LOGIC_VECTOR(15 downto 0);
  signal memory : memory_type := (
    0 => x"FFFF",
    OTHERS => (OTHERS => '0'));
begin
  process (clk)
  begin
    if CLK='1' and CLK'event 
    then
      if WE = '1' 
      then
        memory(to_integer(unsigned(ADDR))) <= DIN;
      end if;
      DOUT <= memory(to_integer(unsigned(ADDR)));
    end if;
  end process;
end RAM_ARCH;

VERILOG
-------
module RAM (
  input wire CLK,
  input wire WE,
  input wire [7:0] ADDR,
  input wire [15:0] DIN,
  output reg [15:0] DOUT );

  reg [15:0] memory [0:255];

  initial 
    begin
      memory[0] = 16'hFFFF;
    end

  always @(posedge CLK) 
    begin
      if (WE) 
        memory[ADDR] <= DIN; 
      DOUT <= memory[ADDR]; // Read from memory
    end

endmodule

You can download copies of these files as an ISE project with testbench here: (Link). Note, the .vhd and .v files need to be manually added to the project one at a time for testing as they use the same name. From these HDL descriptions the Xilinx ISE tools can create an RTL schematic, as shown in figure 22. Hmmmmm, not sure why this produced a registered output, will double check what primitives this was built from i.e. LUT or BlockRAM memory? However, should be fine for simulations. Again, going to test these implementations using the processor running the test program.

Figure 22 : RAM schematic
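
To make the timing of these RAM descriptions a little more concrete, below is a small Python sketch of how i read them. In both versions DOUT is assigned inside the clocked process / always block, so the read is synchronous: the data for an address only appears on DOUT after the next rising clock edge, and a write to the same address returns the old contents on that edge. The class name SyncRAM and its method names are made up for illustration, they are not part of the project files.

```python
# Hypothetical behavioural model of the RAM descriptions above.
# Assumes 256 locations of 16 bits, synchronous read-before-write.

class SyncRAM:
    def __init__(self, init=None, depth=256, width=16):
        self.mask = (1 << width) - 1
        self.mem = [0] * depth
        # init maps address -> value, like the "0 => x"FFFF"" entry
        for addr, value in (init or {}).items():
            self.mem[addr] = value & self.mask
        self.dout = None  # undefined until the first clock edge

    def clock(self, we, addr, din):
        """Model one rising edge of CLK and return the new DOUT."""
        old = self.mem[addr]           # value seen by the read on this edge
        if we:
            self.mem[addr] = din & self.mask
        self.dout = old                # DOUT is a register, updated per edge
        return self.dout
```

For example, after ram = SyncRAM({0: 0xFFFF}) a call to ram.clock(0, 0, 0) returns 0xFFFF, while a write followed by a read of the same address needs two clock edges before the new value is seen on DOUT.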

To load the required machine-code and data into these memory components i decided to go simple and manually cut and paste i.e. write a Bash script to convert the assembler's object files into the VHDL (0 => x"FFFF") and Verilog (memory[0] = 16'hFFFF;) assignments needed for each memory location.

VHDL
----
#!/bin/sh

echo -n > vhdlData
cat code.dat | while read line
do
  addr=`echo $line | cut -d' ' -f1`
  data=`echo $line | cut -d' ' -f2`
  echo "    $addr => \"$data\"," >> vhdlData
done

VERILOG
-------
#!/bin/sh

echo -n > verilogData
cat code.dat | while read line
do
  addr=`echo $line | cut -d' ' -f1`
  data=`echo $line | cut -d' ' -f2`
  echo "      memory[$addr] = 16'b$data;" >> verilogData
done
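
For comparison, the same conversion can be sketched in Python, generating both sets of assignments in one pass. This assumes, as the scripts above do, that each line of code.dat holds an address and a data value separated by a space, and that the data value is a 16-bit binary string (hence the 16'b prefix). The function name convert is hypothetical.

```python
# Hypothetical Python equivalent of the two shell scripts above.
# Assumes each input line is "<address> <16-bit binary data>".

def convert(lines):
    """Return (vhdl_lines, verilog_lines) for an iterable of code.dat lines."""
    vhdl, verilog = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        addr, data = fields[0], fields[1]
        vhdl.append(f'    {addr} => "{data}",')
        verilog.append(f"      memory[{addr}] = 16'b{data};")
    return vhdl, verilog


if __name__ == "__main__":
    import os
    if os.path.exists("code.dat"):
        with open("code.dat") as src:
            vhdl_out, verilog_out = convert(src)
        with open("vhdlData", "w") as out:
            out.write("\n".join(vhdl_out) + "\n")
        with open("verilogData", "w") as out:
            out.write("\n".join(verilog_out) + "\n")
```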

To test these RAM models i used the normal test code shown below. This was assembled using the Python-based assembler to produce the code.dat object file, which is then converted into the required data values using the above scripts. You can download these here: Link.

###################
# INSTRUCTION-SET #
###################

# INSTR   IR15 IR14 IR13 IR12 IR11 IR10 IR09 IR08 IR07 IR06 IR05 IR04 IR03 IR02 IR01 IR00
# MOVE    0    0    0    0    X    X    X    X    K    K    K    K    K    K    K    K
# ADD     0    0    0    1    X    X    X    X    K    K    K    K    K    K    K    K
# SUB     0    0    1    0    X    X    X    X    K    K    K    K    K    K    K    K
# AND     0    0    1    1    X    X    X    X    K    K    K    K    K    K    K    K

# LOAD    0    1    0    0    X    X    X    X    A    A    A    A    A    A    A    A
# STORE   0    1    0    1    X    X    X    X    A    A    A    A    A    A    A    A
# ADDM    0    1    1    0    X    X    X    X    A    A    A    A    A    A    A    A
# SUBM    0    1    1    1    X    X    X    X    A    A    A    A    A    A    A    A

# JUMPU   1    0    0    0    X    X    X    X    A    A    A    A    A    A    A    A
# JUMPZ   1    0    0    1    X    X    X    X    A    A    A    A    A    A    A    A
# JUMPNZ  1    0    1    0    X    X    X    X    A    A    A    A    A    A    A    A
# JUMPC   1    0    1    1    X    X    X    X    A    A    A    A    A    A    A    A        -- NOT IMPLEMENTED

########
# CODE #
########

start:
  move 1            # acc = 1
  move 3            # acc = 3
  move 7            # acc = 7
  move 15           # acc = 15    F
  move 31           # acc = 31   1F
  move 63           # acc = 63   3F
  move 127          # acc = 127  7F
  move 255          # acc = 255  FF 

  add 1             # acc = 0     0
  add 3             # acc = 3     3
  add 7             # acc = 10    A
  add 15            # acc = 25   19
  add 31            # acc = 56   38
  add 63            # acc = 119  77  
  add 127           # acc = 246  F6
  add 255           # acc = 245  F5

  sub 1             # acc = 244  F4 
  sub 3             # acc = 241  F1
  sub 7             # acc = 234  EA
  sub 15            # acc = 219  DB
  sub 31            # acc = 188  BC
  sub 63            # acc = 125  7D
  sub 127           # acc = 254  FE
  sub 255           # acc = 255  FF

  and 255           # acc = 255  FF
  and 127           # acc = 127  7F
  and 63            # acc = 63   3F
  and 31            # acc = 31   1F
  and 15            # acc = 15    F
  and 7             # acc = 7     7
  and 3             # acc = 3     3
  and 1             # acc = 1     1

  move 1            # acc = 1     1
  store A           # M[87] = 1
  move 3            # acc = 3     3
  store B           # M[88] = 3
  move 7            # acc = 7     7
  store C           # M[89] = 7
  move 15           # acc = 15    F
  store D           # M[90] = 15
  move 31           # acc = 31    1F
  store E           # M[91] = 31
  move 63           # acc = 63    3F
  store F           # M[92] = 63
  move 127          # acc = 127   7F      
  store G           # M[93] = 127
  move 255          # acc = 255   FF
  store H           # M[94] = 255

  load A            # acc = M[87] = 1     1
  load B            # acc = M[88] = 3     3
  load C            # acc = M[89] = 7     7
  load D            # acc = M[90] = 15    F
  load E            # acc = M[91] = 31   1F
  load F            # acc = M[92] = 63   3F
  load G            # acc = M[93] = 127  7F
  load H            # acc = M[94] = 255  FF

  addm A            # acc = 0     0
  addm B            # acc = 3     3
  addm C            # acc = 10    A
  addm D            # acc = 25   19 
  addm E            # acc = 56   38
  addm F            # acc = 119  77
  addm G            # acc = 246  F6
  addm H            # acc = 245  F5

  subm A            # acc = 244  F4
  subm B            # acc = 241  F1
  subm C            # acc = 234  EA
  subm D            # acc = 219  DB
  subm E            # acc = 188  BC
  subm F            # acc = 125  7D
  subm G            # acc = 254  FE
  subm H            # acc = 255  FF

  and 0             # acc = 0
  jumpz b1          # TAKEN
  move 255          # set acc to 255 if error

b1:
  add 1             # acc = 1
  jumpnz b2         # TAKEN
  move 255          # set acc to 255 if error

b2:
  and 0             # acc = 0
  jumpnz b3         # FALSE
  jumpu b4          # unconditional jump
b3:
  move 255          # set acc to 255 if error

b4:
  add 1             # acc = 1
  jumpz b5          # FALSE
  jumpu b6          # unconditional jump
b5:
  move 255          # set acc to 255 if error

b6:
  jumpu start       # jump back to start

A:
  .data 0
B:
  .data 0
C:
  .data 0
D:
  .data 0
E:
  .data 0
F:
  .data 0
G:
  .data 0
H:
  .data 0

Computer

The final simpleCPU implementations are shown in figures 23 and 24. A simulation of this processor running the test program is shown in figure 25. You can download copies of these files as an ISE project with testbench here: (Link), (Link). Note, the .vhd and .v files need to be manually added to the project one at a time for testing as they use the same name.

Figure 23 : simpleCPU verilog schematic

Figure 24 : simpleCPU vhdl schematic

Figure 25 : simpleCPU simulation

To confirm that this simulation is working correctly you can do a visual inspection, but it's quicker just to print changes to the DATA_OUT bus to the screen. This is easy to do in the VHDL testbench. A public confession, in the past i remember writing a function to convert STD_LOGIC_VECTOR to a HEX string, but the thought of searching through a lot of old backups was a little depressing, so to my shame i used copilot, which gave me:

function to_hex_string(vec: std_logic_vector) return string is
  variable hex_string : string(1 to (vec'length / 4));
  variable nibble     : std_logic_vector(3 downto 0);
begin
  for i in 0 to (vec'length / 4 - 1) loop
    nibble := vec(vec'length - 1 - i * 4 downto vec'length - 4 - i * 4);
    case nibble is
      when "0000" => hex_string(i + 1) := '0';
      when "0001" => hex_string(i + 1) := '1';
      when "0010" => hex_string(i + 1) := '2';
      when "0011" => hex_string(i + 1) := '3';
      when "0100" => hex_string(i + 1) := '4';
      when "0101" => hex_string(i + 1) := '5';
      when "0110" => hex_string(i + 1) := '6';
      when "0111" => hex_string(i + 1) := '7';
      when "1000" => hex_string(i + 1) := '8';
      when "1001" => hex_string(i + 1) := '9';
      when "1010" => hex_string(i + 1) := 'A';
      when "1011" => hex_string(i + 1) := 'B';
      when "1100" => hex_string(i + 1) := 'C';
      when "1101" => hex_string(i + 1) := 'D';
      when "1110" => hex_string(i + 1) := 'E';
      when "1111" => hex_string(i + 1) := 'F';
      when others => hex_string(i + 1) := 'X';
    end case;
  end loop;
  return hex_string;
end to_hex_string;

This function can then be used in the PROCESS below, within the testbench:

debug: process
  variable data : line;
begin
  wait until DOUT'event; 
	  
  write(data, now);
  write(data, string'(" : DOUT = "));
  write(data, DOUT);
  write(data, string'(" = "));
  write(data, to_hex_string(DOUT));
  writeline(output, data); 
end process;

This process prints to the screen the time, binary value and hex value of the DOUT bus each time it changes, as shown below:

0 ns   : DOUT = 00000000UUUUUUUU = 00XX
0 ns   : DOUT = 0000000000000000 = 0000
55 ns  : DOUT = 0000000000000001 = 0001
85 ns  : DOUT = 0000000000000011 = 0003
115 ns : DOUT = 0000000000000111 = 0007
145 ns : DOUT = 0000000000001111 = 000F
175 ns : DOUT = 0000000000011111 = 001F
205 ns : DOUT = 0000000000111111 = 003F
235 ns : DOUT = 0000000001111111 = 007F
265 ns : DOUT = 0000000011111111 = 00FF
295 ns : DOUT = 0000000000000000 = 0000
325 ns : DOUT = 0000000000000011 = 0003
355 ns : DOUT = 0000000000001010 = 000A
385 ns : DOUT = 0000000000011001 = 0019
415 ns : DOUT = 0000000000111000 = 0038
445 ns : DOUT = 0000000001110111 = 0077
475 ns : DOUT = 0000000011110110 = 00F6
505 ns : DOUT = 0000000011110101 = 00F5
535 ns : DOUT = 0000000011110100 = 00F4
565 ns : DOUT = 0000000011110001 = 00F1
595 ns : DOUT = 0000000011101010 = 00EA
625 ns : DOUT = 0000000011011011 = 00DB
655 ns : DOUT = 0000000010111100 = 00BC
685 ns : DOUT = 0000000001111101 = 007D
715 ns : DOUT = 0000000011111110 = 00FE
745 ns : DOUT = 0000000011111111 = 00FF
805 ns : DOUT = 0000000001111111 = 007F
835 ns : DOUT = 0000000000111111 = 003F
865 ns : DOUT = 0000000000011111 = 001F
895 ns : DOUT = 0000000000001111 = 000F
925 ns : DOUT = 0000000000000111 = 0007
955 ns : DOUT = 0000000000000011 = 0003
985 ns : DOUT = 0000000000000001 = 0001
ISim> 
# run 1.00us
1075 ns : DOUT = 0000000000000011 = 0003
1135 ns : DOUT = 0000000000000111 = 0007
1195 ns : DOUT = 0000000000001111 = 000F
1255 ns : DOUT = 0000000000011111 = 001F
1315 ns : DOUT = 0000000000111111 = 003F
1375 ns : DOUT = 0000000001111111 = 007F
1435 ns : DOUT = 0000000011111111 = 00FF
1495 ns : DOUT = 0000000000000001 = 0001
1525 ns : DOUT = 0000000000000011 = 0003
1555 ns : DOUT = 0000000000000111 = 0007
1585 ns : DOUT = 0000000000001111 = 000F
1615 ns : DOUT = 0000000000011111 = 001F
1645 ns : DOUT = 0000000000111111 = 003F
1675 ns : DOUT = 0000000001111111 = 007F
1705 ns : DOUT = 0000000011111111 = 00FF
1735 ns : DOUT = 0000000000000000 = 0000
1765 ns : DOUT = 0000000000000011 = 0003
1795 ns : DOUT = 0000000000001010 = 000A
1825 ns : DOUT = 0000000000011001 = 0019
1855 ns : DOUT = 0000000000111000 = 0038
1885 ns : DOUT = 0000000001110111 = 0077
1915 ns : DOUT = 0000000011110110 = 00F6
1945 ns : DOUT = 0000000011110101 = 00F5
1975 ns : DOUT = 0000000011110100 = 00F4
ISim> 
# run 1.00us
2005 ns : DOUT = 0000000011110001 = 00F1
2035 ns : DOUT = 0000000011101010 = 00EA
2065 ns : DOUT = 0000000011011011 = 00DB
2095 ns : DOUT = 0000000010111100 = 00BC
2125 ns : DOUT = 0000000001111101 = 007D
2155 ns : DOUT = 0000000011111110 = 00FE
2185 ns : DOUT = 0000000011111111 = 00FF
2215 ns : DOUT = 0000000000000000 = 0000
2275 ns : DOUT = 0000000000000001 = 0001
2335 ns : DOUT = 0000000000000000 = 0000
2425 ns : DOUT = 0000000000000001 = 0001
ISim>

This can then be compared to the expected ACC value shown in the test program comments. Doing this i did discover some errors in the control logic, but i went back and corrected these, so job done maybe?
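
Rather than comparing against the comments by eye, the expected ACC values for the immediate-mode part of the test program can be regenerated with a few lines of Python. This is a sketch that assumes an 8-bit accumulator that simply wraps modulo 256, which is what the test program comments imply (255 + 1 gives 0). The function name run_alu_sequence is made up for illustration.

```python
# Hypothetical reference model for the move / add / sub / and test sequence.
# Assumes an 8-bit accumulator with wrap-around arithmetic.

def run_alu_sequence(ops, acc=0):
    """Apply (mnemonic, constant) pairs and return the ACC after each one."""
    trace = []
    for op, k in ops:
        if op == "move":
            acc = k
        elif op == "add":
            acc = acc + k
        elif op == "sub":
            acc = acc - k
        elif op == "and":
            acc = acc & k
        else:
            raise ValueError(f"unsupported instruction: {op}")
        acc &= 0xFF  # keep the result to 8 bits
        trace.append(acc)
    return trace


KS = [1, 3, 7, 15, 31, 63, 127, 255]
PROGRAM = ([("move", k) for k in KS] + [("add", k) for k in KS]
           + [("sub", k) for k in KS] + [("and", k) for k in reversed(KS)])
EXPECTED = run_alu_sequence(PROGRAM)

if __name__ == "__main__":
    print(" ".join(f"{v:02X}" for v in EXPECTED))
```

The hex values in EXPECTED can then be compared against the right-hand column of the simulation log, remembering that the log only prints a line when DOUT actually changes, so repeated values do not appear twice.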

FPGA Board

Figure 26 : FPGA Board

As always the only true test to see if something works is to build it, in this case to implement these HDL versions of the processor on an FPGA. The FPGA chosen was an old Spartan 3E board, as shown in figure 26, no good reason to select this board except that i had one on my desk :). To test out this processor we need to add some General Purpose Input Output (GPIO), some means for the processor to interact with the outside world, something we can connect a scope to. One possible VHDL implementation is shown below:

Note, we do also need to add some other system stuff as well e.g. reset and clock components, we will look at these later.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity GPIO_8 is
port (
  CLK  : in  STD_LOGIC; 
  CLR  : in  STD_LOGIC; 
  WE   : in  STD_LOGIC;                
  CE   : in  STD_LOGIC;
  ADDR : in  STD_LOGIC;
  DIN  : in  STD_LOGIC_VECTOR(15 downto 0); 
  DOUT : out STD_LOGIC_VECTOR(15 downto 0);
  GPI  : in  STD_LOGIC_VECTOR(7 downto 0); 
  GPO  : out STD_LOGIC_VECTOR(7 downto 0) );
end GPIO_8;

architecture GPIO_8_ARCH of GPIO_8 is

  component REG_8
  port (
    D   : in STD_LOGIC_VECTOR(7 downto 0);         
    Q   : out STD_LOGIC_VECTOR(7 downto 0);
    CLK : in STD_LOGIC;  
    CLR : in STD_LOGIC;
    CE  : in STD_LOGIC);
  end component;
  
  component MUX_2_8 
  port (
    A, B : IN  STD_LOGIC_VECTOR(7 downto 0); 
    SEL  : IN  STD_LOGIC;                    
    Y    : OUT STD_LOGIC_VECTOR(7 downto 0));
  end component;
  
  signal en : STD_LOGIC;
  signal gpo_int : STD_LOGIC_VECTOR(7 downto 0); 
  signal dout_int : STD_LOGIC_VECTOR(7 downto 0); 

begin
  
  en <= CE AND WE;
  GPO <= gpo_int;
  DOUT <= "00000000" & dout_int;

  data_reg: REG_8 port map (
    D   => DIN(7 downto 0),      
    Q   => gpo_int,
    CLK => CLK,
    CLR => CLR,
    CE  => en ); 
	 
  data_mux : MUX_2_8 port map(
    A   => gpo_int,
    B   => GPI,
    SEL => ADDR,                    
    Y   => dout_int );

end GPIO_8_ARCH;

This is a very simple GPIO peripheral device, 8-bits of inputs and 8-bits of outputs. From this HDL description the Xilinx ISE tools can create an RTL schematic, as shown in figure 27. This peripheral device is memory mapped into the processor's address space as shown in figure 28. Therefore, when the processor writes to address 0xFF or 0xFE the 8-bit value stored in the ACC is written to the GPIO's internal register, the Q output pins of the associated flip-flops driving signals / pins in the real world. When the processor reads address 0xFE it will read the current value stored in this output register i.e. using the GPIO's MUX. When the processor reads address 0xFF it will read the signals connected to the GPIO's input port. The lower 4 bits of this input port are connected to IO header J2, the higher 4 bits are connected to the four toggle/slide switches.

Figure 27 : GPIO port

Figure 28 : simpleCPU memory map
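
The read and write behaviour described above can be summarised in a short Python sketch. It assumes that MUX_2_8 passes input A when SEL is 0 and input B when SEL is 1, which matches the description (address 0xFE reads back the output register, address 0xFF reads the input pins). The class name GPIO8 and its method names are made up for illustration, and the CLR input is left out to keep things short.

```python
# Hypothetical behavioural model of the GPIO_8 component above.
# Assumes MUX_2_8 selects A (output register) for SEL=0 and B (GPI) for SEL=1.

class GPIO8:
    def __init__(self):
        self.gpo = 0  # contents of the 8-bit output register

    def clock(self, ce, we, din):
        """Model one rising clock edge: the register loads when CE AND WE."""
        if ce and we:
            self.gpo = din & 0xFF  # only DIN(7 downto 0) is stored

    def dout(self, addr, gpi):
        """Return the 16-bit DOUT value; the upper byte is always zero."""
        low = gpi if addr else self.gpo
        return low & 0xFF
```

Here addr is the single address bit connected to the GPIO (bit 0 of the processor's address bus), so addr = 0 corresponds to address 0xFE and addr = 1 to address 0xFF.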

To connect this peripheral device to the processor data bus we need to add a DATA MUX: MUX_2_16 i.e. we now have two components wanting to connect to the processor's data-in bus (memory and GPIO). The selection between these two sources is made by the address decoder logic, a seven input AND gate: AND_7. This AND gate is connected to the top seven bits of the address bus, so that when the processor accesses addresses 0xFF or 0xFE the GPIO's clock enable (CE) line is activated i.e. set to a logic-1. From this new top-level HDL description the Xilinx ISE tools can create an RTL schematic, as shown in figure 29.

Figure 29 : top level schematic simpleCPU + GPIO + memory
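
The address decoding is simple enough to check exhaustively. The Python sketch below models the AND_7 gate on address bits 7 to 1 and the data-in multiplexer, assuming MUX_2_16 passes input B (the GPIO) when SEL is 1 and input A (the memory) otherwise. The function names are hypothetical.

```python
# Hypothetical model of the address decoder (AND_7) and data-in MUX.
# Assumes MUX_2_16 selects A (memory) for SEL=0 and B (GPIO) for SEL=1.

def gpio_cs(addr):
    """Logic-1 only when address bits 7 down to 1 are all set."""
    return int(((addr >> 1) & 0x7F) == 0x7F)


def data_in(addr, mem_data_out, gpio_data_out):
    """Value seen on the processor's data-in bus for a given address."""
    return gpio_data_out if gpio_cs(addr) else mem_data_out


# Every 8-bit address that selects the GPIO.
GPIO_ADDRESSES = [a for a in range(256) if gpio_cs(a)]
```

Looping over all 256 addresses confirms that only 0xFE and 0xFF select the GPIO, the remaining 254 addresses reading from memory.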

In addition to this GPIO we also need a clock generator and a reset circuit. The clock generator can be implemented using the Xilinx IP-core tools, allowing us to implement a Digital Clock Manager (DCM) i.e. a hard-core component inside the FPGA that can be used to divide an external clock source down to a lower frequency, in this case the external 50MHz clock down to a 10MHz clock. The software tools automatically generate this component, providing a VHDL template that we add to our HDL description; an auto-generated RTL schematic symbol of this component is shown in figure 30.

Figure 30 : Digital Clock Manager (DCM)

To ensure that the processor only starts working when the DCM clock is stable and to ensure that contact bounce from the RESET button does not cause the processor to be repeatedly restarted i implemented the async_reset.vhd component below. This component uses the DCM's lock pin to hold the processor in reset until the internal 10MHz clock is stable i.e. locked. The RESET button is debounced by a simple shift register circuit, i confess not the most robust debounce circuit, but it seems to do its job ok :).

-- =============================================================================================================
-- *
-- * Copyright (c) Mike
-- *
-- * File Name: async_reset.vhd
-- *
-- * Version: V1.0
-- *
-- * Release Date:
-- *
-- * Author(s): M.Freeman
-- *
-- * Description: async to sync reset circuit
-- *
-- * Conditions of Use: THIS CODE IS COPYRIGHT AND IS SUPPLIED "AS IS" WITHOUT WARRANTY OF ANY KIND, INCLUDING,
-- *                    BUT NOT LIMITED TO, ANY IMPLIED WARRANTY OF MERCHANTABILITY AND FITNESS FOR A
-- *                    PARTICULAR PURPOSE.
-- *
-- * Notes:
-- *
-- =============================================================================================================

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity async_reset is
Port ( clk         : in  STD_LOGIC;
       locked      : in  STD_LOGIC;
       reset_in    : in  STD_LOGIC;
       reset_out   : out STD_LOGIC;
       reset_out_n : out STD_LOGIC );
end async_reset;

architecture async_reset_arch of async_reset is

    constant SIZE : integer := 4;
    
    signal reset_internal : std_logic_vector(SIZE downto 0);
    signal full : std_logic_vector(SIZE downto 0);    
     
begin

    full <= (others => '1');

    reset_shift_reg : process ( clk, reset_in )
    begin
        if reset_in = '1'
        then
            reset_internal <= (others =>'0');
        elsif clk='1' and clk'event
        then
            if locked = '1'
            then 
                reset_internal(0) <= '1';
     
                for I in 0 to SIZE-1
                loop
                    reset_internal(SIZE-I) <= reset_internal(SIZE-I-1);
                end loop;
            else
                reset_internal <= (others =>'0');
            end if;
        end if;
    end process;
    
    
    reset_gen : process( clk )
    begin
        if clk='1' and clk'event
        then
            if reset_internal /= full
            then
                reset_out   <= '1';
                reset_out_n <= '0';                
            else
                reset_out   <= '0';
                reset_out_n <= '1';             
            end if;
        end if;
    end process;    
   
end async_reset_arch;
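
To see how long reset_out stays high after the DCM locks, the two processes above can be stepped through in Python. This is a cycle-level sketch only: the inputs are sampled once per rising clock edge, so a reset_in pulse shorter than a clock period is not modelled, and the function name simulate_reset is made up for illustration.

```python
# Hypothetical cycle-level model of async_reset.vhd.
# Assumes reset_in and locked are sampled once per rising clock edge.

SIZE = 4
FULL = (1 << (SIZE + 1)) - 1  # reset_internal is (SIZE downto 0), i.e. 5 bits


def simulate_reset(inputs):
    """inputs: iterable of (reset_in, locked) pairs, one per clock edge.

    Returns the value of reset_out after each edge.
    """
    shift = 0
    trace = []
    for reset_in, locked in inputs:
        if reset_in:
            shift = 0  # asynchronous clear acts before the clock edge
        # reset_gen compares the shift register value from before this edge
        reset_out = 0 if shift == FULL else 1
        if not reset_in:
            # shift a 1 in at bit 0 while locked, otherwise clear
            shift = ((shift << 1) | 1) & FULL if locked else 0
        trace.append(reset_out)
    return trace
```

With the button released and the DCM locked, simulate_reset([(0, 1)] * 8) gives five clock edges with reset_out at 1 followed by 0 from the sixth edge onwards. Any bounce on reset_in, or a loss of lock, clears the shift register and restarts this count.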

The final top-level HDL description is shown below, "wiring" up all of our top level components to produce a final simpleCPUv1a computer.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity computer is
port(
  CLK : in STD_LOGIC;
  CLR : in STD_LOGIC;
  GPI : in  STD_LOGIC_VECTOR(7 downto 0); 
  GPO : out STD_LOGIC_VECTOR(7 downto 0) );
end computer;

architecture computer_arch of computer is
  
  component clock_divider_1
  port(
    CLKIN_IN        : IN std_logic;
    RST_IN          : IN std_logic;          
    CLKDV_OUT       : OUT std_logic;
    CLKIN_IBUFG_OUT : OUT std_logic;
    CLK0_OUT        : OUT std_logic;
    LOCKED_OUT      : OUT std_logic );
  end component;

  component async_reset 
  port( 
    clk         : in  STD_LOGIC;
    locked      : in  STD_LOGIC;
    reset_in    : in  STD_LOGIC;
    reset_out   : out STD_LOGIC;
    reset_out_n : out STD_LOGIC );
  end component;
  
  component simplecpu_v1a 
  port(
    CLK    : in  STD_LOGIC;
    CLR    : in  STD_LOGIC;
    ROM_EN : out STD_LOGIC;   
    RAM_EN : out STD_LOGIC;
    RAM_WR : out STD_LOGIC;
    ADDR   : out STD_LOGIC_VECTOR(7 downto 0); 
    DIN    : in  STD_LOGIC_VECTOR(15 downto 0); 
    DOUT   : out STD_LOGIC_VECTOR(15 downto 0) ); 
  end component;
  
  component RAM 
  port (
    CLK  : in  STD_LOGIC;         
    WE   : in  STD_LOGIC;                
    ADDR : in  STD_LOGIC_VECTOR(7 downto 0);
    DIN  : in  STD_LOGIC_VECTOR(15 downto 0); 
    DOUT : out STD_LOGIC_VECTOR(15 downto 0) );
  end component;
  
  component GPIO_8 
  port (
    CLK  : in  STD_LOGIC; 
    CLR  : in  STD_LOGIC; 
    WE   : in  STD_LOGIC; 
    CE   : in  STD_LOGIC; 
    ADDR : in  STD_LOGIC;
    DIN  : in  STD_LOGIC_VECTOR(15 downto 0); 
    DOUT : out STD_LOGIC_VECTOR(15 downto 0);
    GPI  : in  STD_LOGIC_VECTOR(7 downto 0); 
    GPO  : out STD_LOGIC_VECTOR(7 downto 0) );
  end component;
  
  component MUX_2_16 
  port (
    A, B : IN  STD_LOGIC_VECTOR(15 downto 0); 
    SEL  : IN  STD_LOGIC;                    
    Y    : OUT STD_LOGIC_VECTOR(15 downto 0));
  end component;
  
  component AND_7 
  port(
    A : IN STD_LOGIC_VECTOR( 6 downto 0 );
    Z : OUT STD_LOGIC);
  end component;
  
  signal clk_int   : STD_LOGIC;
  signal clr_int   : STD_LOGIC;
  signal clr_n_int : STD_LOGIC;  
  signal locked    : STD_LOGIC;
  signal clk_int_bufg : STD_LOGIC; 
  signal clk_int_50MHz : STD_LOGIC;
  
  signal addr     : STD_LOGIC_VECTOR(7 downto 0);
  signal data_in  : STD_LOGIC_VECTOR(15 downto 0);
  signal data_out : STD_LOGIC_VECTOR(15 downto 0);
  
  signal mem_data_out : STD_LOGIC_VECTOR(15 downto 0);
  
  signal gpio_data_out : STD_LOGIC_VECTOR(15 downto 0);
  signal gpio_cs   : STD_LOGIC;

  signal ram_wr   : STD_LOGIC;
  signal row_en   : STD_LOGIC; 
  signal ram_en   : STD_LOGIC;
  
begin

  clock : clock_divider_1 port map(
    CLKIN_IN        => CLK,
    RST_IN          => CLR,         
    CLKDV_OUT       => clk_int,
    CLKIN_IBUFG_OUT => clk_int_bufg,
    CLK0_OUT        => clk_int_50MHz,
    LOCKED_OUT      => locked ); 
  
  reset : async_reset port map( 
    clk         => clk_int,        
    locked      => locked,   
    reset_in    => CLR,
    reset_out   => clr_int, 
    reset_out_n => clr_n_int );

  cpu : simplecpu_v1a port map(
    CLK    => clk_int,    
    CLR    => clr_int,
    ROM_EN => row_en, 
    RAM_EN => ram_en,
    RAM_WR => ram_wr,
    ADDR   => addr,
    DIN    => data_in,
    DOUT   => data_out ); 
  
  mem : RAM port map(
    CLK  => clk_int,      
    WE   => ram_wr,            
    ADDR => addr,
    DIN  => data_out,
    DOUT => mem_data_out );
	 
  gpio : GPIO_8 port map(
    CLK  => clk_int,
    CLR  => clr_int, 
    WE   => ram_wr,               
    CE   => gpio_cs,
    ADDR => addr(0),
    DIN  => data_out,
    DOUT => gpio_data_out,
    GPI  => GPI,  
    GPO  => GPO );
	 
  addr_decoder : AND_7 port map(
    A => addr(7 downto 1),
    Z => gpio_cs );	 
 
  mux : MUX_2_16 port map(
    A   => mem_data_out,
    B   => gpio_data_out,
    SEL => gpio_cs,               
    Y   => data_in );
	 
end computer_arch;

To test out this new design the test code below is used. What we have is an incrementing COUNT variable that is written to the output port. We can then look at these 8-bits using a scope, the LSB will have the highest frequency, the MSB the lowest. To integrate the input port into this test code the 8-bit input value is bitwise ANDed with this COUNT variable i.e. if a specific input bit is 0 then that output bit is masked, set to zero. This code is a bit of a cut and paste solution as we do not have an absolute addressing mode bitwise-AND instruction, but it works ok :).

Note, in the Xilinx User Constraints File (UCF) the inputs connected to the IO header are configured to use pull-ups, so these default to a logic-1, the other four input bits are controlled via switches, allowing the user to enable and disable each output bit.

########
# CODE #
########

start:
  load count        # load count variable
  add 1             # inc
  store count       # update
  store tmp

test0:
  load 0xFF         # test input port bits
  and 0x01
  jumpnz test1
  load tmp
  and 0xFE          # 1111 1110
  store tmp

test1:
  load 0xFF         
  and 0x02
  jumpnz test2
  load tmp
  and 0xFD          # 1111 1101
  store tmp

test2:
  load 0xFF         
  and 0x04
  jumpnz test3
  load tmp
  and 0xFB          # 1111 1011
  store tmp

test3:
  load 0xFF         
  and 0x08
  jumpnz test4
  load tmp
  and 0xF7          # 1111 0111
  store tmp

test4:
  load 0xFF         
  and 0x10
  jumpnz test5
  load tmp
  and 0xEF          # 1110 1111
  store tmp

test5:
  load 0xFF         
  and 0x20
  jumpnz test6
  load tmp
  and 0xDF          # 1101 1111
  store tmp

test6:
  load 0xFF         
  and 0x40
  jumpnz test7
  load tmp
  and 0xBF          # 1011 1111
  store tmp

test7:
  load 0xFF         
  and 0x80
  jumpnz update 
  load tmp
  and 0x7F          # 0111 1111
  store tmp

update:
  load tmp
  store 0xFF        # store to output port
  jump start        # repeat
  
count:
    .data 0
tmp:
    .data 0
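
The eight test blocks above clear one output bit at a time, which should give the same result as a single bitwise AND of COUNT with the input port. The Python sketch below follows the same bit-by-bit steps so that this can be checked for every combination of values. The function name masked_output is made up for illustration.

```python
# Hypothetical model of the test0..test7 blocks in the program above.

def masked_output(count, gpi):
    """Value written to the output port for a given COUNT and input port."""
    tmp = count & 0xFF
    for bit in range(8):
        if (gpi & (1 << bit)) == 0:      # jumpnz not taken: input bit is 0
            tmp &= ~(1 << bit) & 0xFF    # e.g. and 0xFE clears bit 0
    return tmp


def check_all():
    """True if the bit-by-bit version matches count AND gpi everywhere."""
    return all(masked_output(c, g) == (c & g)
               for c in range(256) for g in range(256))
```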

All seems to work OK, not a complete test, but a reasonable test. The MSB has a period of approx 2.4ms, the LSB has a period of approx 18.6us, as shown in figure 31. You can download copies of these files as an ISE project with testbench here: (Link).

Figure 31 : output signals, MSB (left) and LSB (right)
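
These measured periods can be sanity-checked with a little arithmetic, sketched in Python below. Two assumptions are made: each instruction takes three clock cycles (one each for the fetch, decode and execute phases, which agrees with the 30 ns steps seen in the simulation log above), and every input bit reads as a logic-1 so that all eight jumpnz instructions are taken. The function name loop_timing is hypothetical.

```python
# Hypothetical timing estimate for the GPIO test program.
# Assumes 3 clock cycles per instruction and all jumpnz branches taken.

def loop_timing(clock_hz=10_000_000, cycles_per_instruction=3):
    """Return (loop_time, lsb_period, msb_period) in seconds."""
    # start block: 4 instructions, 8 tests of load/and/jumpnz, update block: 3
    instructions_per_loop = 4 + 8 * 3 + 3
    loop_time = instructions_per_loop * cycles_per_instruction / clock_hz
    lsb_period = 2 * loop_time     # bit 0 toggles once per loop
    msb_period = 256 * loop_time   # bit 7 repeats every 256 counts
    return loop_time, lsb_period, msb_period


if __name__ == "__main__":
    loop, lsb, msb = loop_timing()
    print(f"loop = {loop * 1e6:.1f} us, LSB = {lsb * 1e6:.1f} us, "
          f"MSB = {msb * 1e3:.2f} ms")
```

Under these assumptions the loop takes 31 instructions, giving an LSB period of 18.6us and an MSB period of about 2.38ms, in line with the scope measurements.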

WORK IN PROGRESS

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Contact email: mike@simplecpudesign.com
