This lesson starts at commit 8e8cf23ea6f3e34ae18a7bddce8bd375d43a85dd.

2. Fetch stage

With our main project structure set up, we can start working on the core itself, specifically on the fetch stage. This stage does nothing more than load opcodes from memory and pass them on to the next stage in the pipeline.

[Figure: fetch stage]

Loading opcodes from memory? I hear you ask. Erm, yeah... At this point we have not implemented any memory yet. Doing this properly is a bit of work, and for now I just want to get our CPU into a state where it can do something, so that we can get our dopamine hit. So, for now, we're going to use a quick hack and implement our memory as a simple array of 32-bit vectors.

First, we'll declare a type instruction_memory_t for holding the instructions. In RISC-V, instructions are 32 bits wide (if you are not using the "C" extension for compressed instructions, which we are not). I chose to make our memory 16 instructions big, because that seemed large enough to hold very simple assembly programs without becoming unwieldy. We can simply increase this number if we want to execute longer programs, and we'll eventually switch to a better memory implementation anyway.

For now, I'll just fill the instruction memory with zero bits.

src/core/fetch.vhd

@@ -15,6 +15,12 @@ end fetch;
 architecture rtl of fetch is
 begin
 	process (clk)

 architecture rtl of fetch is
+	type instruction_memory_t is array(0 to 15) of std_logic_vector(31 downto 0);
+	signal imem: instruction_memory_t := (
+		X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000",
+		X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000"
+	);
 begin
 	process (clk)
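
As a side note, an all-zero memory can also be written more compactly with a nested "others" aggregate. The sketch below is a hypothetical standalone testbench (the entity name imem_init_demo is made up for illustration and is not part of the project) that checks the compact form gives the same contents as the explicit list above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical standalone testbench, not part of the project sources.
entity imem_init_demo is
end entity imem_init_demo;

architecture sim of imem_init_demo is
	type instruction_memory_t is array(0 to 15) of std_logic_vector(31 downto 0);
	-- Nested "others": every word, and every bit in each word, is '0'.
	signal imem: instruction_memory_t := (others => (others => '0'));
begin
	process
	begin
		-- Each of the 16 words should read back as all zeros.
		for i in instruction_memory_t'range loop
			assert imem(i) = X"00000000"
				report "word is not zero" severity failure;
		end loop;
		wait;
	end process;
end architecture sim;
```

The explicit list is still handy when we want to put distinct values in individual words, so which form to use is mostly a matter of taste.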

Now, we'll also want a signal to keep track of the number (or address) of the current instruction. This is typically called the program counter and abbreviated to "PC". I'll make our program counter of type unsigned so that we can increase it without having to do too many casts. In RISC-V, the program counter is 32 bits wide, so I'll use 32 bits as well.

src/core/fetch.vhd

@@ -21,6 +21,8 @@ architecture rtl of fetch is
 		X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000"
 	);
 begin
 	process (clk)

 		X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000"
 	);
+	signal pc: unsigned(31 downto 0) := (others => '0');
 begin
 	process (clk)
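
To illustrate what "too many casts" means here, the following sketch compares incrementing an unsigned value with incrementing a std_logic_vector. It is a hypothetical standalone testbench (pc_type_demo, pc_u and pc_slv are names made up for this example) and assumes ieee.numeric_std is used for the arithmetic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical standalone testbench, not part of the project sources.
entity pc_type_demo is
end entity pc_type_demo;

architecture sim of pc_type_demo is
begin
	process
		variable pc_u: unsigned(31 downto 0) := (others => '0');
		variable pc_slv: std_logic_vector(31 downto 0) := (others => '0');
	begin
		-- With unsigned, numeric_std provides "+" directly.
		pc_u := pc_u + 4;
		-- With std_logic_vector, we have to convert back and forth.
		pc_slv := std_logic_vector(unsigned(pc_slv) + 4);

		assert pc_u = 4
			report "unexpected unsigned result" severity failure;
		assert pc_slv = X"00000004"
			report "unexpected std_logic_vector result" severity failure;
		wait;
	end process;
end architecture sim;
```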

After passing the instruction at the current program counter to the next stage, we'll want to increase the program counter so that it points to the next instruction. The program counter is a byte address, and since our instructions are 32 bits (4 bytes) wide, we need to increase it by 4 every cycle. For now we'll just assume we can output an instruction every cycle.

src/core/fetch.vhd

@@ -28,7 +28,7 @@ begin
 	process (clk)
 	begin
 		if rising_edge(clk) then
-			-- TODO: implement
 		end if;
 	end process;

 	process (clk)
 	begin
 		if rising_edge(clk) then
+			pc <= pc + 4;
 		end if;
 	end process;

So far so good. After simulating this we can see it works as expected.

Now, we'll also want to fetch the instruction (or opcode) at the address that the program counter points to, and pass it to the next stage. So, let's add an output for it.

src/core/constants.vhd

@@ -6,7 +6,7 @@ use work.core_types.all;
 package core_constants is
 	constant DEFAULT_FETCH_OUTPUT: fetch_output_t := (
-		placeholder => '0'
 	);
 	constant DEFAULT_DECODE_OUTPUT: decode_output_t := (

 package core_constants is
 	constant DEFAULT_FETCH_OUTPUT: fetch_output_t := (
+		instr => (others => '0')
 	);
 	constant DEFAULT_DECODE_OUTPUT: decode_output_t := (

src/core/types.vhd

@@ -4,7 +4,7 @@ use ieee.std_logic_1164.all;
 package core_types is
 	type fetch_output_t is record
-		placeholder: std_logic;
 	end record fetch_output_t;
 	type decode_output_t is record

 package core_types is
 	type fetch_output_t is record
+		instr: std_logic_vector(31 downto 0);
 	end record fetch_output_t;
 	type decode_output_t is record

Let's actually fetch the opcode from the instruction memory and set it in the output.

src/core/fetch.vhd

@@ -29,6 +29,7 @@ begin
 	begin
 		if rising_edge(clk) then
 			pc <= pc + 4;
 		end if;
 	end process;

 	begin
 		if rising_edge(clk) then
 			pc <= pc + 4;
+			output.instr <= imem(to_integer(pc(5 downto 2)));
 		end if;
 	end process;
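
The slice pc(5 downto 2) deserves a short explanation. Dropping the two lowest bits divides the byte address by 4, and keeping only four bits limits the result to the 16 entries of our memory. The sketch below is a hypothetical standalone testbench (the entity name pc_index_demo is made up for illustration) that checks this mapping, including the fact that the index wraps back to 0 once the program counter runs past the last instruction.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical standalone testbench, not part of the project sources.
entity pc_index_demo is
end entity pc_index_demo;

architecture sim of pc_index_demo is
begin
	process
		variable pc: unsigned(31 downto 0);
	begin
		-- Byte address 0 maps to word 0, byte address 4 to word 1, and so on.
		for i in 0 to 15 loop
			pc := to_unsigned(4 * i, 32);
			assert to_integer(pc(5 downto 2)) = i
				report "unexpected memory index" severity failure;
		end loop;

		-- Byte address 64 is one word past the end of the 16-word memory.
		-- Bits 5 downto 2 are zero again, so the index wraps around to 0.
		pc := to_unsigned(64, 32);
		assert to_integer(pc(5 downto 2)) = 0
			report "expected the index to wrap around" severity failure;
		wait;
	end process;
end architecture sim;
```

If we make the memory larger later on, the upper bound of the slice has to grow with it.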

To test this, let's fill the instruction memory with a counter that starts at one (to be able to distinguish between an "empty" opcode and the first opcode).

src/core/fetch.vhd

@@ -17,8 +17,8 @@ end fetch;
 architecture rtl of fetch is
 	type instruction_memory_t is array(0 to 15) of std_logic_vector(31 downto 0);
 	signal imem: instruction_memory_t := (
-		X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000",
-		X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000", X"00000000"
 	);
 	signal pc: unsigned(31 downto 0) := (others => '0');

 architecture rtl of fetch is
 	type instruction_memory_t is array(0 to 15) of std_logic_vector(31 downto 0);
 	signal imem: instruction_memory_t := (
+		X"00000001", X"00000002", X"00000003", X"00000004", X"00000005", X"00000006", X"00000007", X"00000008",
+		X"00000009", X"0000000A", X"0000000B", X"0000000C", X"0000000D", X"0000000E", X"0000000F", X"00000010"
 	);
 	signal pc: unsigned(31 downto 0) := (others => '0');

If we now simulate the clk, pc, and output for 50 ns, we get the following waveforms:

[Figure: simulation waveforms]

We can see that:

1. The value of the opcode lags behind the value of the pc register by one cycle.
2. For the first cycle, the opcode in the output is empty.

Both are fairly typical problems you run into when doing hardware design.

The issue of different values being out of sync is not a real problem, but it can be a bit confusing. It happens because the value of output.instr is set based on the value of pc. Both are updated simultaneously on a rising edge of the clock, so output.instr is computed from the old value of pc. This is completely fine, so we don't need to make any changes at this point. The reason I am bringing it up is that it's very important to be aware of things like this when we try to reason about what's going on in our CPU.
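
The one-cycle lag can be reproduced in isolation. The sketch below is a hypothetical standalone testbench (lag_demo and pc_seen are names made up for this example, and the 10 ns clock period is an assumption). It registers a copy of pc in the same clocked process that increments it, and checks that the copy holds the value pc had before the edge.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical standalone testbench, not part of the project sources.
entity lag_demo is
end entity lag_demo;

architecture sim of lag_demo is
	signal clk: std_logic := '0';
	signal pc: unsigned(31 downto 0) := (others => '0');
	signal pc_seen: unsigned(31 downto 0) := (others => '0');
begin
	-- Four clock cycles with an assumed 10 ns period, then stop.
	clock: process
	begin
		for i in 1 to 4 loop
			clk <= '0';
			wait for 5 ns;
			clk <= '1';
			wait for 5 ns;
		end loop;
		wait;
	end process clock;

	registers: process (clk)
	begin
		if rising_edge(clk) then
			pc <= pc + 4;
			-- Reads the value pc had before this edge: the assignment
			-- above only takes effect once the process suspends.
			pc_seen <= pc;
		end if;
	end process registers;

	check: process
	begin
		-- After the second rising edge, pc is 8 but the copy is still 4.
		wait until rising_edge(clk);
		wait until rising_edge(clk);
		wait for 1 ns;
		assert pc = 8 and pc_seen = 4
			report "expected pc_seen to lag pc by one cycle" severity failure;
		wait;
	end process check;
end architecture sim;
```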

The second point is actually a problem, and we'll address it by adding an is_active flag in the output of the fetch stage and setting it to zero in the default value. If this flag is zero, the output is empty and should be ignored by other stages. We'll have to add similar flags to the output of the other stages later.

src/core/constants.vhd

@@ -6,6 +6,7 @@ use work.core_types.all;
 package core_constants is
 	constant DEFAULT_FETCH_OUTPUT: fetch_output_t := (
 		instr => (others => '0')
 	);

 package core_constants is
 	constant DEFAULT_FETCH_OUTPUT: fetch_output_t := (
+		is_active => '0',
 		instr => (others => '0')
 	);

src/core/types.vhd

@@ -4,6 +4,7 @@ use ieee.std_logic_1164.all;
 package core_types is
 	type fetch_output_t is record
 		instr: std_logic_vector(31 downto 0);
 	end record fetch_output_t;

 package core_types is
 	type fetch_output_t is record
+		is_active: std_logic;
 		instr: std_logic_vector(31 downto 0);
 	end record fetch_output_t;

We have to set the flag to one whenever we output an opcode.

src/core/fetch.vhd

@@ -29,6 +29,8 @@ begin
 	begin
 		if rising_edge(clk) then
 			pc <= pc + 4;
 			output.instr <= imem(to_integer(pc(5 downto 2)));
 		end if;
 	end process;

 	begin
 		if rising_edge(clk) then
 			pc <= pc + 4;
+			output.is_active <= '1';
 			output.instr <= imem(to_integer(pc(5 downto 2)));
 		end if;
 	end process;
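
To give an idea of how a later stage might use this flag, here is a minimal sketch. It is a hypothetical standalone testbench: the entity is_active_demo and the helper should_process are made up for illustration, and the record and default constant are copied locally so the example is self-contained. How the real decode stage will consume the flag is not decided here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical standalone testbench, not part of the project sources.
entity is_active_demo is
end entity is_active_demo;

architecture sim of is_active_demo is
	-- Local copies of the declarations in types.vhd and constants.vhd.
	type fetch_output_t is record
		is_active: std_logic;
		instr: std_logic_vector(31 downto 0);
	end record fetch_output_t;

	constant DEFAULT_FETCH_OUTPUT: fetch_output_t := (
		is_active => '0',
		instr => (others => '0')
	);

	-- Hypothetical helper: a consumer only acts on active outputs.
	function should_process(input: fetch_output_t) return boolean is
	begin
		return input.is_active = '1';
	end function;
begin
	process
		variable v: fetch_output_t := DEFAULT_FETCH_OUTPUT;
	begin
		-- The default (empty) output must be ignored.
		assert not should_process(v)
			report "default output should be inactive" severity failure;

		-- Once fetch has produced an opcode, the flag is set.
		v := (is_active => '1', instr => X"00000001");
		assert should_process(v)
			report "active output should be processed" severity failure;
		wait;
	end process;
end architecture sim;
```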

Now, let's simulate the waveforms again to verify that the issue is fixed:

[Figure: simulation waveforms]

This looks good: the first time is_active is 1, the instr field is 00000001. The output.instr field still lags one cycle behind pc, but as noted, this is not a problem.