This is the second part of the ‘Anatomy of a Hack assembly program’ series.



In the first part, we covered the details of the Hack hardware platform. Now is a good time to dive deep into the Hack assembly language, before we look at how binary instructions flow through the CPU.

Hack Assembly

The Hack assembly language is minimal. It consists of two types of instructions: the A-Instruction (addressing instruction) and the C-Instruction (computation instruction). It also allows the declaration of symbols.

A-Instruction

Sets the contents of the A register to the specified value. The value is either a non-negative number (e.g. 3) or a Symbol. If the value is a Symbol, the A register is set to the address that the Symbol refers to, not to the data stored at that register or memory location.

Syntax

@value, where value is either a non-negative decimal number or a Symbol. For example:

  • @3
  • @R3
  • @SCREEN

Binary Translation

0xxxxxxxxxxxxxxx, where each x is a bit, either 0 or 1. A-Instructions are 16 bits wide and always have their MSB set to 0. The remaining 15 bits hold the value.

  • 0000000000001010
  • 0111111111111111
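
To make the translation concrete, here is a minimal Python sketch of how an assembler might encode a numeric A-Instruction. The function name encode_a_instruction is hypothetical and used only for illustration. It assumes the value has already been resolved to a number (symbols are covered further down).

```python
def encode_a_instruction(value: int) -> str:
    """Return the 16-bit binary string for the A-Instruction @value.

    Hypothetical helper for illustration; not part of any official toolchain.
    """
    # The MSB is reserved as the A-Instruction marker (0),
    # so only 15 bits are available for the value.
    if not 0 <= value <= 0x7FFF:
        raise ValueError("A-Instruction value must be in the range 0..32767")
    # Zero-pad to 16 bits; the leading bit is therefore always 0.
    return format(value, "016b")


print(encode_a_instruction(3))      # 0000000000000011
print(encode_a_instruction(16384))  # 0100000000000000 (the SCREEN base address)
```

Because the value is limited to 15 bits, the zero-padding alone guarantees that the MSB is 0.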

C-Instruction

Performs a computation on the CPU, optionally stores the result in a register or in the currently addressed memory location, and then either jumps to another instruction, whose address is usually given by a symbol, or continues with the next instruction.
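
In Hack assembly, a C-Instruction is written as dest=comp;jump, where the dest and jump parts are optional. As a rough sketch, the three fields could be separated like this in Python. The function name parse_c_instruction is hypothetical, and the sketch only splits the text; it does not validate the mnemonics or produce binary.

```python
def parse_c_instruction(line: str) -> dict:
    """Split a C-Instruction of the form dest=comp;jump into its fields.

    Hypothetical helper for illustration; dest and jump may be empty.
    """
    # Everything before '=' is the destination; it is absent in e.g. "0;JMP".
    dest, _, rest = line.partition("=") if "=" in line else ("", "", line)
    # Everything after ';' is the jump condition; it is absent in e.g. "D=M".
    comp, _, jump = rest.partition(";")
    return {"dest": dest.strip(), "comp": comp.strip(), "jump": jump.strip()}


print(parse_c_instruction("D=M"))    # {'dest': 'D', 'comp': 'M', 'jump': ''}
print(parse_c_instruction("0;JMP"))  # {'dest': '', 'comp': '0', 'jump': 'JMP'}
```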

Symbols

Symbols are symbolic names for addresses. Variables are symbolic names for memory addresses that make accessing these addresses easier. Labels are symbolic names for instruction addresses that make jumps in the program easier to handle. There are three ways to introduce symbols into an assembly program: predefined symbols, label symbols, and variable symbols.

Predefined Symbols

A special subset of RAM addresses can be referred to by any assembly program through the following predefined symbols:

  • SP: RAM address 0
  • LCL: RAM address 1
  • ARG: RAM address 2
  • THIS: RAM address 3
  • THAT: RAM address 4
  • R0-R15: Addresses of 16 RAM Registers, mapped from 0 to 15
  • SCREEN: Base address of the Screen Map in Main Memory, which is equal to 16384
  • KBD: Keyboard Register address in Main Memory, which is equal to 24576
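
The list above can be written down as a lookup table, which is roughly what an assembler would start from. This is a minimal Python sketch; the name PREDEFINED_SYMBOLS is an assumption made for illustration.

```python
# Predefined Hack symbols and the addresses they stand for.
# The table name is hypothetical; the values come from the list above.
PREDEFINED_SYMBOLS = {
    "SP": 0,
    "LCL": 1,
    "ARG": 2,
    "THIS": 3,
    "THAT": 4,
    "SCREEN": 16384,
    "KBD": 24576,
}
# R0..R15 map to RAM addresses 0..15.
PREDEFINED_SYMBOLS.update({f"R{i}": i for i in range(16)})

print(PREDEFINED_SYMBOLS["R3"])    # 3
print(PREDEFINED_SYMBOLS["THIS"])  # 3
```

Note that several symbols share an address: R3 and THIS both refer to RAM address 3, so @R3 and @THIS load the same value into the A register.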

Label Symbols

To declare a label, we use the command (LABEL_NAME), where LABEL_NAME can be any name we want for the label, as long as it’s wrapped in parentheses.

(LOOP)
// instruction 1
// instruction 2
// instruction 3
@LOOP
0;JMP

(LOOP) declares a new label called LOOP, which is resolved to the address of the instruction that follows it. The instruction @LOOP is an A-Instruction that sets the contents of the A register to the instruction address the label refers to.
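
One way to picture label resolution is as a pass over the program that counts instructions and records the current count whenever a label declaration appears. The Python sketch below assumes instruction addresses start at 0 and that a label declaration does not occupy an address of its own. The function name resolve_labels is hypothetical.

```python
def resolve_labels(lines):
    """Map each (LABEL) declaration to the address of the next instruction.

    Hypothetical helper for illustration; assumes addresses start at 0.
    """
    labels = {}
    address = 0  # address of the next real instruction
    for raw in lines:
        line = raw.split("//")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("(") and line.endswith(")"):
            # A label takes no space; it names the upcoming instruction.
            labels[line[1:-1]] = address
        else:
            address += 1
    return labels


program = ["@i", "M=0", "(LOOP)", "@i", "M=M+1", "@LOOP", "0;JMP"]
print(resolve_labels(program))  # {'LOOP': 2}
```

Here @i and M=0 occupy addresses 0 and 1, so LOOP resolves to 2, and @LOOP is then equivalent to @2.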

Variable Symbols

Any symbol used as @variable that is neither predefined nor declared elsewhere as a label with the (variable) command is treated as a variable, and is assigned a unique memory address, starting at RAM address 16 (0x0010).

@i
M=0

The instruction @i declares a variable i and loads its address into the A register. The instruction M=0 then sets the RAM location of i to 0.
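
The allocation rule can be sketched as a small Python function: every unknown symbol seen in an A-Instruction gets the next free RAM address, starting at 16. The function name resolve_variables and its known parameter (a table of predefined and label symbols) are assumptions made for illustration.

```python
def resolve_variables(lines, known):
    """Assign RAM addresses, starting at 16, to symbols not already known.

    Hypothetical helper for illustration; `known` holds predefined and
    label symbols that must not be treated as variables.
    """
    table = dict(known)
    next_address = 16  # first RAM address available for variables
    for raw in lines:
        line = raw.split("//")[0].strip()  # drop comments and whitespace
        if line.startswith("@"):
            symbol = line[1:]
            # Plain numbers such as @3 are not symbols.
            if not symbol.isdigit() and symbol not in table:
                table[symbol] = next_address
                next_address += 1
    return table


program = ["@i", "M=0", "@sum", "M=0", "@i", "D=M", "@R1", "D=D+M"]
table = resolve_variables(program, {"R1": 1})
print(table["i"], table["sum"])  # 16 17
```

The second @i reuses address 16 rather than allocating a new one, and @R1 is left alone because it is already in the table.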


That’s it for the second part. In the next part I will explain how the CU (control unit) decodes an instruction and how the decoded instruction flows through the CPU.