Extremely random blog posts from Onat

Let’s implement a Bloom Filter

2020-08-10T00:00:00+00:00

I am planning to create a series of blog posts that will include some literature research, implementations of various data structures and our journey of creating a distributed datastore at distrentic.io.

You might be wondering why I start with a blog post explaining the Bloom Filter when I don’t have a single clue about how to create a distributed datastore. My answer is simple: “I like the idea behind it”.

Before I get into the details of the Bloom filters, I want to give our backstory that will help you understand why we started building something we’d enjoy during our spare time that will never be production ready.

The backstory

My friend Ibrahim and I have always been fascinated by complex software and distributed systems. We’ve been working together for more than 5 years (we got old, dude) and we were lucky enough to work for the largest e-commerce company in Europe. We battled our way through many of the problems that distributed systems can offer. We both moved to Cambridge, UK and are still fighting against the villains of the distributed world.

Let’s explore the mystic land of probabilistic data structures by implementing a Bloom Filter.

What the hell is a Bloom Filter

You might also want to read Bloom filters debunked.

A Bloom filter is a method for representing a set $A = \{a_1, a_2, \ldots, a_n\}$ of $n$ elements (also called keys) to support membership queries. It was invented by Burton Bloom in 1970 and was proposed for use in the web context by Marais and Bharat as a mechanism for identifying which pages have associated comments stored within a CommonKnowledge server. ¹

It is a space-efficient probabilistic data structure that is used to answer a very simple question: is this element a member of a set? A Bloom filter does not store the actual elements; it only stores their membership.

False positive matches are possible, but false negatives are not – in other words, a query returns either “possibly in set” or “definitely not in set”. ² Unfortunately, this also means items cannot be removed from the Bloom Filter (Some other element or group of elements may be hashed to the same indices).
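To see why removal is unsafe, here is a minimal sketch with a 10-bit filter. The index sets for the two elements are made up for illustration (they do not come from real hash functions), and naive_remove is a hypothetical operation that a standard Bloom filter does not offer.

```rust
/// A toy 10-bit filter; each element is represented directly by the
/// indices its hash functions would produce (hand-picked for this sketch).
pub type Bits = [bool; 10];

/// Set every bit that belongs to an element.
pub fn insert(bits: &mut Bits, indices: &[usize]) {
    for &i in indices {
        bits[i] = true;
    }
}

/// Hypothetical "removal": clear every bit that belongs to an element.
pub fn naive_remove(bits: &mut Bits, indices: &[usize]) {
    for &i in indices {
        bits[i] = false;
    }
}

/// An element is possibly present only if all of its bits are set.
pub fn might_contain(bits: &Bits, indices: &[usize]) -> bool {
    indices.iter().all(|&i| bits[i])
}
```

If element a maps to indices 1, 4, 7 and element b maps to 4, 8, 9, then clearing the bits of a also clears bit 4, and a later query for b wrongly answers “definitely not in set”. That is a false negative, which the data structure promises never to produce.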

Because it is probabilistic, the Bloom Filter trades accuracy for space and performance. This is much like the trade-offs in the CAP theorem: we choose performance over accuracy.

Bloom filters have some interesting use cases. For example, they can be placed on top of a datastore. When a key is queried for its existence and the filter does not have it, we can skip querying the datastore entirely.

Figure 1: Example usage of a Bloom filter.
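The datastore use case can be sketched roughly as follows. Everything here is an assumption made for illustration: TinyBloom is a toy filter (not the implementation built later in this post), GuardedStore is a hypothetical wrapper, and a HashMap stands in for the real datastore.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy Bloom filter: `k` seeded hashes over a fixed-size vector of bits.
pub struct TinyBloom {
    bits: Vec<bool>,
    k: u64,
}

impl TinyBloom {
    pub fn new(m: usize, k: u64) -> Self {
        TinyBloom { bits: vec![false; m], k }
    }

    // Hash the pair (seed, key) so each seed behaves like a different hash function.
    fn index(&self, key: &str, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    pub fn insert(&mut self, key: &str) {
        for seed in 0..self.k {
            let i = self.index(key, seed);
            self.bits[i] = true;
        }
    }

    pub fn might_contain(&self, key: &str) -> bool {
        (0..self.k).all(|seed| self.bits[self.index(key, seed)])
    }
}

/// Hypothetical datastore wrapper that consults the filter before every read.
pub struct GuardedStore {
    filter: TinyBloom,
    store: HashMap<String, String>,
    /// Number of reads that actually reached the underlying store.
    pub store_lookups: usize,
}

impl GuardedStore {
    pub fn new() -> Self {
        GuardedStore {
            filter: TinyBloom::new(1024, 3),
            store: HashMap::new(),
            store_lookups: 0,
        }
    }

    pub fn put(&mut self, key: &str, value: &str) {
        self.filter.insert(key);
        self.store.insert(key.to_string(), value.to_string());
    }

    pub fn get(&mut self, key: &str) -> Option<&String> {
        if !self.filter.might_contain(key) {
            return None; // "definitely not in set": the store is never touched
        }
        self.store_lookups += 1; // "possibly in set": pay for the real lookup
        self.store.get(key)
    }
}
```

A false positive only costs one wasted lookup, because the store still has the final say. A key that was written with put can never be skipped, since the filter has no false negatives.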

How does it work

The idea behind the Bloom filter is very simple: allocate an array $v$ of $m$ bits, with each bit initially set to $0$, and then choose $k$ independent hash functions $h_1, h_2, \ldots, h_k$, each with range $\{1, \ldots, m\}$.

The Bloom filter has two operations just like a standard set:

Insertion

When an element $a \in A$ is added to the filter, the bits at positions $h_1(a), h_2(a), …, h_k(a)$ in $v$ are set to $1$. In simpler words, the new element is hashed by $k$ number of functions and modded by $m$, resulting in $k$ indices into the bit array. Each bit at the respective index is set.

Figure 2: Adding elements to a Bloom filter ($m = 10$, $k = 3$).

Query

To query the membership of an element $b$, we check the bits at indices $h_1(b), h_2(b), …, h_k(b)$ in $v$. If any of them is $0$, then certainly $b$ is not in the set $A$. Otherwise, we assume that $b$ is in the set although it’s possible that some other element or group of elements hashed to the same indices. This is called a false positive. We can target a specific probability of false positives by selecting an optimal value of $m$ and $k$ for up to $n$ insertions.

A Bloom filter eventually reaches a point where all bits are set, which means every query will indicate membership, effectively making the probability of false positives $1$. The problem with this is it requires a priori knowledge of the data set in order to select optimal parameters and avoid “overfilling”. ³
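To get a feeling for how quickly an overfilled filter degrades, we can use the standard approximation of the false-positive probability, $(1 - e^{-kn/m})^k$. This formula is not stated in this post; it is the usual textbook approximation. The function below is just a small sketch of it.

```rust
/// Approximate false-positive probability after `n` insertions into a filter
/// with `m` bits and `k` hash functions: (1 - e^(-k*n/m))^k.
pub fn false_positive_rate(n: u64, m: u64, k: u32) -> f64 {
    let exponent = -(k as f64) * (n as f64) / (m as f64);
    (1.0 - exponent.exp()).powi(k as i32)
}
```

For a filter with $m = 9585059$ bits and $k = 7$, the approximation gives roughly $0.01$ after one million insertions, but more than $0.99$ after ten million, at which point almost every query reports membership.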

Finding optimal $k$ and $m$

We can derive optimal $k$ and $m$ based on $n$ and a chosen probability of false positives $P_{FP}$.

\[k = -\frac{\ln{P_{FP}}}{\ln{2}} , m = -\frac{n\ln{P_{FP}}}{(\ln2)^2}\]

If you want to learn about how the above formulae are derived, you might want to pay a visit here.

Rust implementation

You can find the full implementation here. Huge thanks to @xfix for fixing hashers initialization with the same seed and @dkales for spotting an issue with the ordering of operations in index calculation.

Finally! It is time to write some Rust :heart_eyes:. I am implementing the Bloom filter while writing this blog post. If you don’t believe me, check the command below:

cargo new --lib plum

Let’s continue with the dependencies. There is only one dependency and we will use it to create $v$.

[dependencies]
bit-vec = "0.6"

We will declare a new struct StandardBloomFilter to encapsulate the required fields: $k$ (optimal number of hash functions), $m$ (optimal size of the bit array), $v$ (the bit array), the hash functions and a marker to tell the Rust compiler that our struct “owns” a T.

extern crate bit_vec;

use bit_vec::BitVec;
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hash, Hasher};
use std::marker::PhantomData;

pub struct StandardBloomFilter<T: ?Sized> {
    bitmap: BitVec,
    optimal_m: u64,
    optimal_k: u32,
    hashers: [DefaultHasher; 2],
    _marker: PhantomData<T>,
}

Careful readers will think that I made a mistake in the declaration of the hashers array because of the requirement of $k$ independent hash functions. It was indeed intentional; here’s why:

Why two hash functions? Kirsch and Mitzenmacher demonstrated in their paper that using two hash functions $h_1(x)$ and $h_2(x)$ to simulate additional hash functions of the form $g_i(x) = h_1(x) + {i}{h_2(x)}$ can be usefully applied to Bloom filters. This leads to less computation and potentially less need for randomness in practice. ⁴ This formula may appear similar to the use of pairwise independent hash functions. Unfortunately, there is no formal connection between the two techniques.

I mentioned earlier that the Bloom Filter has two operations like a standard set: insert and query. We will implement those two operations along with a constructor-like new method.

impl<T: ?Sized> StandardBloomFilter<T> {
    pub fn new(items_count: usize, fp_rate: f64) -> Self {
        // ...snip
    }
}

new calculates the size of the bitmap ($v$) and optimal_k ($k$) and then instantiates a StandardBloomFilter.

impl<T: ?Sized> StandardBloomFilter<T> {
    pub fn new(items_count: usize, fp_rate: f64) -> Self {
        let optimal_m = Self::bitmap_size(items_count, fp_rate);
        let optimal_k = Self::optimal_k(fp_rate);
        let hashers = [
            RandomState::new().build_hasher(),
            RandomState::new().build_hasher(),
        ];
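        StandardBloomFilter {
            bitmap: BitVec::from_elem(optimal_m, false),
            // bitmap_size returns usize, but the field is u64 so that the
            // index arithmetic in get_index stays in u64
            optimal_m: optimal_m as u64,
            optimal_k,
            hashers,
            _marker: PhantomData,
        }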
    }

    // ...snip

    fn bitmap_size(items_count: usize, fp_rate: f64) -> usize {
        let ln2_2 = core::f64::consts::LN_2 * core::f64::consts::LN_2;
        ((-1.0f64 * items_count as f64 * fp_rate.ln()) / ln2_2).ceil() as usize
    }

    fn optimal_k(fp_rate: f64) -> u32 {
        ((-1.0f64 * fp_rate.ln()) / core::f64::consts::LN_2).ceil() as u32
    }

    // ...snip
}

Let’s run these calculations on Rust Playground.

bitmap_size: 9585059
optimal_k: 7

This looks really promising! A Bloom Filter that represents a set of $1$ million items with a false-positive rate of $0.01$ requires only $9585059$ bits ($~1.14\mathrm{MB}$) and 7 hash functions.

We managed to construct a Bloom Filter so far and it is time to implement insert and contains methods. Their implementations are dead simple and they share the same code to calculate indexes of the bit array.

impl<T: ?Sized> StandardBloomFilter<T> {
    // ...snip

    pub fn insert(&mut self, item: &T)
    where
        T: Hash,
    {
        let (h1, h2) = self.hash_kernel(item);

        for k_i in 0..self.optimal_k {
            let index = self.get_index(h1, h2, k_i as u64);

            self.bitmap.set(index, true);
        }
    }

    pub fn contains(&self, item: &T) -> bool
    where
        T: Hash,
    {
        let (h1, h2) = self.hash_kernel(item);

        for k_i in 0..self.optimal_k {
            let index = self.get_index(h1, h2, k_i as u64);

            if !self.bitmap.get(index).unwrap() {
                return false;
            }
        }

        true
    }
}

The above methods depend on two other methods that we haven’t implemented yet: hash_kernel and get_index. hash_kernel is going to be the one where the actual “hashing” happens. It will return the hash values of $h_1(x)$ and $h_2(x)$.

impl<T: ?Sized> StandardBloomFilter<T> {
    // ...snip

    fn hash_kernel(&self, item: &T) -> (u64, u64)
    where
        T: Hash,
    {
        // Clone the seeded hashers so that every call starts from the same state.
        let mut hasher1 = self.hashers[0].clone();
        let mut hasher2 = self.hashers[1].clone();

        item.hash(&mut hasher1);
        item.hash(&mut hasher2);

        (hasher1.finish(), hasher2.finish())
    }
}

We could’ve used the $128$-bit MurmurHash3 and returned the upper $64$ bits as hash1 and the lower $64$ bits as hash2 (this is how the Google Guava Bloom Filter implementation currently works). However, to keep this implementation even simpler and not to rely on any additional dependencies, I decided to continue with DefaultHasher - see SipHash.

Now, it is time to make the final touch. We are going to implement get_index by using $g_i(x) = h_1(x) + {i}{h_2(x)}$ to simulate more than two hash functions.

impl<T: ?Sized> StandardBloomFilter<T> {
    // ...snip

    fn get_index(&self, h1: u64, h2: u64, k_i: u64) -> usize {
        // wrapping arithmetic keeps h1 + k_i * h2 from panicking on overflow
        (h1.wrapping_add(k_i.wrapping_mul(h2)) % self.optimal_m) as usize
    }

    // ...snip
}
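Here is a standalone sketch of the same index calculation with small, hand-picked numbers. It is a free function that takes $m$ as a parameter instead of reading it from the struct, and the indices helper is a name I made up for this example.

```rust
/// Index of the i-th simulated hash function: (h1 + i * h2) mod m,
/// computed with wrapping arithmetic so that overflow cannot panic.
pub fn get_index(h1: u64, h2: u64, i: u64, m: u64) -> usize {
    (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize
}

/// All `k` indices for one item, given its two base hashes.
pub fn indices(h1: u64, h2: u64, k: u64, m: u64) -> Vec<usize> {
    (0..k).map(|i| get_index(h1, h2, i, m)).collect()
}
```

With $h_1 = 10$, $h_2 = 3$, $k = 3$ and $m = 7$ the indices are $10 \bmod 7 = 3$, $13 \bmod 7 = 6$ and $16 \bmod 7 = 2$. The wrapping calls matter because real 64-bit hash values are large: a plain h1 + k_i * h2 can overflow, which panics in debug builds.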

We are finally there

:tada: :tada: :tada: We’ve just finished implementing a fast variant of a standard Bloom Filter but there is still one thing missing - We didn’t write any tests.

Let’s add two simple test cases and validate our implementation.

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn insert() {
        let mut bloom = StandardBloomFilter::new(100, 0.01);
        bloom.insert("item");
        assert!(bloom.contains("item"));
    }

    #[test]
    fn check_and_insert() {
        let mut bloom = StandardBloomFilter::new(100, 0.01);
        assert!(!bloom.contains("item_1"));
        assert!(!bloom.contains("item_2"));
        bloom.insert("item_1");
        assert!(bloom.contains("item_1"));
    }
}

❯ cargo test
   Compiling plum v0.1.2 (/Users/onat.mercan/dev/distrentic/plum)
    Finished test [unoptimized + debuginfo] target(s) in 0.99s
     Running target/debug/deps/plum-6fc161db530d5b36

running 2 tests
test tests::insert ... ok
test tests::check_and_insert ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

I hope you’ve enjoyed reading this post as much as I enjoyed writing it!

If you find anything wrong with the code, you can file an issue or, even better, submit a pull request.


Resources

What I learned from my failed attempt of writing baremetal android in Rust

2019-04-22T12:35:34+00:00

This post is focused mostly on the tools that I used while failing to write a bootable kernel image in Rust.

Every year I define a super ambitious goal for my learning process to keep myself motivated on the way. This year I defined my goal as writing a bootable kernel image for my old HTC One X android smartphone. I knew it was going to be hard but I never thought I’d fail in the end. It was clearly the Dunning–Kruger effect that made me think that I can achieve what I want to do with my limited knowledge/experience on the subject.

Prior Work

Let’s start by looking into the projects that have been done to run baremetal code on android smartphones. Unfortunately, I managed to find only two projects out in the wild.

The first project (nexus7-baremetal) made me really excited because I thought nobody would ever care about writing baremetal android, and it was the only resource I had found until I gave up. The project contains some code from raspberrypi/bootloader05. This is because the Raspberry Pi 2, the Nexus 7 and the HTC One X all use 32-bit ARM cores from the same ARMv7-A family (a Cortex-A7 in the case of the Raspberry Pi 2).

The second project is lktris. The only thing that makes this project interesting is that it is built on top of littlekernel.

I wanted to try the nexus7-baremetal project before diving into writing my own code in Rust but I couldn’t manage to run it successfully, even though I told the author of the project the opposite. I thought it would be rude to make him waste his time on a project that he wrote 6 years ago and I wanted to do more research to understand the issue without any hand-holding.

I spent some time refreshing my knowledge of android-ndk and android-sdk to be able to compile and unsuccessfully run nexus7-baremetal. It’s a bit of a pain to install the standalone android toolchain on macOS, and installing platforms, platform tools and emulators is a whole other story that I don’t want to talk about. The command below shows how badly the android sdkmanager cli is designed:

sdkmanager "system-images;android-19;google_apis;armeabi-v7a"

If you really want to use android standalone toolchain on macOS, you can run the following:

# install android standalone toolchain
brew install intel-haxm
brew install android-sdk
brew install android-ndk

# update env vars
export ANDROID_HOME=/usr/local/share/android-sdk
export ANDROID_NDK_HOME=/usr/local/share/android-ndk

# update path
export PATH=$ANDROID_HOME/tools:$PATH
export PATH=$ANDROID_HOME/platform-tools:$PATH

What I learned

Little Kernel

LK (Little Kernel) is a tiny operating system suited for small embedded devices, bootloaders, and other environments where OS primitives like threads, mutexes, and timers are needed. It also initializes the most important hardware such as MMU and UART.

LK is the Android bootloader and is also used in Android Trusted Execution Environment - “Trusty TEE” Operating System.

Android bootloader supports specially packed Android Boot Images only. These files contain the kernel, a ramdisk (root filesystem) and some metadata. The file header of these images includes sizes of all packaged files and the loading address of the kernel.

This header has a size of 0x8000 bytes followed by the kernel image. That’s why the loading address needs to be set to KERNEL_LOADING_ADDRESS - 0x8000 to get LK to the right place.

0x8000

0x8000 (32K) is in fact the size of an offset that leaves space for the parameter block in ARM architecture.

According to the ARM booting procedures:

Despite the ability to place zImage anywhere within memory, convention has it that it is loaded at the base of physical RAM plus an offset of 0x8000 (32K). This leaves space for the parameter block usually placed at offset 0x100, zero page exception vectors and page tables. This convention is very common.
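The address arithmetic above is small enough to sketch in a few lines. The function names and the RAM base used in the usage note are made up for illustration; they do not come from LK or from any real board.

```rust
/// Conventional offset of the kernel image from the base of physical RAM.
pub const TEXT_OFFSET: u32 = 0x8000;

/// Address at which zImage is conventionally loaded for a given RAM base.
/// Returns None if the addition would overflow a 32-bit address.
pub fn kernel_load_address(ram_base: u32) -> Option<u32> {
    ram_base.checked_add(TEXT_OFFSET)
}

/// Loading address to put in the boot image header so that the payload,
/// which follows the 0x8000-byte header, ends up at `kernel_loading_address`.
/// Returns None if the kernel address is below 0x8000.
pub fn header_load_address(kernel_loading_address: u32) -> Option<u32> {
    kernel_loading_address.checked_sub(TEXT_OFFSET)
}
```

Assuming a hypothetical RAM base of 0x80000000, the kernel would conventionally sit at 0x80008000, and the header loading address for that kernel address is 0x80000000 again.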

Rust Cross Compilation

It’s a bit complicated to cross-compile Rust binaries on macOS for armv7 and you probably knew it already. However, I am ignorant and stubborn and I battled my way to get a proper armv7 toolchain for my macbook. All I wanted to do was to compile my project for the armv7-unknown-linux-gnueabihf target.

The first thing I did was madly download all the packages I could find for Homebrew because I didn’t want to deal with crosstool-ng. Nevertheless, I ended up installing it, and after many failed attempts at building armv7-rpi2-linux-gnueabihf, I realized that macOS is no longer supported by crosstool-ng.

I decided to do what any sane person would do and fired up a vagrant machine, installed all the toolchains needed and finally, the dysfunctional kernel image was compiled and linked successfully.

Why would I use a VM just to compile a binary? We are in 2019, right? I would have been OK if it was a container but this is a HUGE VM!

I went straight back to the list of Homebrew packages and figured out that the only way to compile and link my kernel image was targeting armv7-unknown-linux-musleabihf by installing arm-linux-gnueabihf-binutils. Some would disagree with my decision to use the musl toolchain considering that baremetal code doesn’t need libc, but it was the only viable way for me at that time. If you know a better way (you probably do), please let me know because I don’t have much knowledge about cross-compilation of low-level languages.

Rust targets

There is a list of all the available supported platforms and you can easily add any of them by using rustup.

rustup target add armv7-unknown-linux-musleabihf

To compile your program for a specific target you can either use cargo with the --target flag:

cargo build --target=armv7-unknown-linux-musleabihf

or create a .cargo/config file:

[build]
target = "armv7-unknown-linux-musleabihf"

cargo-binutils

cargo-binutils is a pretty handy plugin if you need to use LLVM tools for binary inspection and manipulation. It simply proxies the LLVM tools in the llvm-tools-preview rustup component and provides subcommands to invoke any of the tools.

Most of the tools in llvm-tools-preview are LLVM alternatives to GNU binutils. The main advantage of these LLVM tools is that they support all the architectures that the Rust compiler supports.

Rust inline assembly

Currently, there are two feature-gated ways to write assembly: the asm! (requires #![feature(asm)]) and global_asm! (requires #![feature(global_asm)]) macros.

asm

asm! uses the same basic format as GCC uses for its own inline asm and restricts your inline assembly to fn bodies only. The syntax isn’t the best:

asm!(assembly template
   : output operands
   : input operands
   : clobbers
   : options
   );

The assembly template is the only required parameter and must be a literal string. Here’s an example (taken from the Rust book):

#![feature(asm)]

fn foo() {
    unsafe {
        asm!("NOP");
    }
}

fn main() {
    // ...
    foo();
    // ...
}

global_asm

global_asm! gives you the ability to write arbitrary assembly without the restriction of fn bodies.

A simple usage looks like this:

global_asm!(include_str!("boot.S"));

.section ".text.boot"

.globl _boot

_boot:
    bl      not_main

.section .text

.globl _put32

_put32:
    str     r1,[r0]
    bx      lr

Using `extern` Functions to Call Assembly Code

The extern keyword facilitates the creation and use of a Foreign Function Interface (FFI). The example below demonstrates how to set up an integration with the _put32 function in boot.S.

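extern "C" {
    // Defined in boot.S: stores `value` at the address `addr` (str r1,[r0]).
    // The arguments are passed by value so that r0 holds the target address
    // and r1 holds the word to store.
    fn _put32(addr: *mut u32, value: u32);
}

fn main() -> ! {
    unsafe {
        _put32(0xFF00_2000 as *mut u32, 72);
    }
    loop {}
}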

Calling Rust Functions from Assembly Code

extern also has another usage that allows us to create an interface for other languages to call Rust functions. You need to add the extern keyword and specify the ABI to use just before the fn keyword. We also need to add a #[no_mangle] annotation to tell the Rust compiler not to mangle the name of this function.

In the example below, we make the not_main function accessible from the boot.S file:

#[no_mangle]
pub unsafe extern "C" fn not_main() -> ! {
    // a diverging function must never return, so spin forever
    loop {}
}

Epilogue

That’s it! I consider this work a huge win even though I failed to write a functional bootable image. I learnt to use some quite useful tools on the way and now I have a better understanding of cross-compilation.

Anatomy of a Hack assembly program - Part 2

2019-04-07T18:52:18+00:00

This is the second part of the ‘Anatomy of a Hack assembly program’ series.

First Part
Second Part

In the first part, we learnt the details of the Hack hardware platform. Now, it is a good time to take a deep dive into the Hack assembly language before we look at how binary instructions flow through the CPU.

Hack Assembly

The Hack Assembly Language is minimal. It consists of 2 types of instructions: A-Instructions (addressing instructions) and C-Instructions (computation instructions). It also allows the declaration of symbols.

A-Instruction

Sets the contents of the A register to the specified value. The value is either a non-negative number (e.g. 3) or a Symbol. If the value is a Symbol, then the A register is set to the address that the Symbol refers to, not to the data stored at that register or memory location.

Syntax

@value, where value is either a decimal non-negative number or a Symbol.

@3
@R3
@SCREEN

Binary Translation

0xxxxxxxxxxxxxxx, where x is a bit, either 0 or 1. A-Instructions always have their MSB set to 0.

0000000000001010
0111111111111111
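As a rough sketch, the binary translation of an A-Instruction with a numeric value can be written as a one-line formatter. The function name is my own, and symbols are assumed to have been resolved to numbers already.

```rust
/// Encode `@value` as a 16-bit Hack A-Instruction.
/// Only 15 bits are available for the value because the MSB must stay 0,
/// so anything above 32767 is rejected.
pub fn encode_a_instruction(value: u16) -> Option<String> {
    if value > 0x7FFF {
        return None;
    }
    // {:016b} pads the binary representation with leading zeros to 16 digits
    Some(format!("{:016b}", value))
}
```

For example, @10 translates to 0000000000001010 and @32767 (the largest possible value) translates to 0111111111111111.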

C-Instruction

Performs a computation on the CPU and stores the output in a register or memory address, and then either jumps to an instruction location that is usually addressed by a symbol or continues with the next instruction.

Symbols

Symbols can be either variables or labels. Variables are symbolic names for memory addresses that make accessing these addresses easier. Labels are symbolic names for instruction addresses that make jumps in the program easier to handle. There are three ways to introduce symbols into an assembly program: predefined symbols, label symbols, and variable symbols.

Predefined Symbols

A special subset of RAM addresses can be referred to by any assembly program.

SP: RAM address 0
LCL: RAM address 1
ARG: RAM address 2
THIS: RAM address 3
THAT: RAM address 4
R0-R15: Addresses of 16 RAM Registers, mapped from 0 to 15
SCREEN: Base address of the Screen Map in Main Memory, which is equal to 16384
KBD: Keyboard Register address in Main Memory, which is equal to 24576
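An assembler would typically seed its symbol table with these names before it reads the program. Here is a minimal sketch of that step; the function name and the choice of a HashMap are mine.

```rust
use std::collections::HashMap;

/// Build a symbol table that contains only the predefined Hack symbols.
pub fn predefined_symbols() -> HashMap<String, u16> {
    let mut table = HashMap::new();

    // Named pointers that live at the start of RAM
    for (name, address) in [("SP", 0u16), ("LCL", 1), ("ARG", 2), ("THIS", 3), ("THAT", 4)] {
        table.insert(name.to_string(), address);
    }

    // R0..R15 map to RAM addresses 0..15
    for i in 0..16u16 {
        table.insert(format!("R{}", i), i);
    }

    // Memory-mapped I/O
    table.insert("SCREEN".to_string(), 16384);
    table.insert("KBD".to_string(), 24576);

    table
}
```

Note that several names share an address: SP and R0 both refer to RAM address 0, LCL and R1 to address 1, and so on.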

Label Symbols

To declare a label we need to use the command (LABEL_NAME), where LABEL_NAME can be any name we desire to have for the label, as long as it’s wrapped in parentheses.

(LOOP)
// instruction 1
// instruction 2
// instruction 3
@LOOP
0;JMP

(LOOP) declares a new label called LOOP. It resolves to the address of the instruction that follows the declaration. The instruction @LOOP is an A-Instruction that sets the contents of the A register to the instruction address the label refers to.

Variable Symbols

Any user-defined symbol that appears in an @variable instruction, is not predefined and is not declared anywhere as a label with the (variable) command is treated as a variable. Variables are assigned unique memory addresses, starting at RAM address 16 (0x0010).

@i
M=0

The instruction @i declares a variable i and loads its address into the A register. The instruction M=0 then sets the RAM location of i to 0.
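Label and variable symbols are usually resolved in two passes over the program. The sketch below assumes one instruction per line, no trailing comments on instruction lines, and it leaves out the predefined symbols to stay short. The function name resolve_symbols is made up for this example.

```rust
use std::collections::HashMap;

/// Resolve the label and variable symbols of a Hack assembly program.
pub fn resolve_symbols(lines: &[&str]) -> HashMap<String, u16> {
    let mut table: HashMap<String, u16> = HashMap::new();

    // First pass: a label resolves to the address of the next real instruction.
    let mut instruction_address: u16 = 0;
    for line in lines {
        let line = line.trim();
        if line.is_empty() || line.starts_with("//") {
            continue; // blank lines and comments occupy no address
        }
        if let Some(label) = line.strip_prefix('(').and_then(|rest| rest.strip_suffix(')')) {
            table.insert(label.to_string(), instruction_address);
        } else {
            instruction_address += 1;
        }
    }

    // Second pass: every other non-numeric @symbol is a variable,
    // allocated in order of first appearance starting at RAM address 16.
    let mut next_variable: u16 = 16;
    for line in lines {
        if let Some(symbol) = line.trim().strip_prefix('@') {
            let is_number = symbol.parse::<u16>().is_ok();
            if !is_number && !table.contains_key(symbol) {
                table.insert(symbol.to_string(), next_variable);
                next_variable += 1;
            }
        }
    }

    table
}
```

The labels have to be collected first because a program may jump forward to a label that is declared later. Without the first pass, such a forward reference would be mistaken for a variable.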

That’s it for the second part. In the next part I will explain how the CU (control unit) decodes an instruction and how the decoded instruction flows through the CPU.

Anatomy of a Hack assembly program - Part 1

2019-04-05T15:16:27+00:00

This blog series is based on the nand2tetris book.

I don’t have comprehensive knowledge of hardware or low-level programming. However, I have been learning about this long-mysterious part of computers since last summer. I will do my best to explain how a Hack assembly program is translated into binary instructions and how the Hack machine processes a single instruction in a fetch and execute loop.

First Part
Second Part

I was always fascinated by how the operating system orchestrates all the components on a computer but I’ve never previously had the chance to learn the low-level details of this hidden world. Since last summer, I’ve started to explore and uncover the details of this beautiful yet complex landscape and I want to share what I learnt so far from the books I read.

The first book I started to read was The Elements of Computing Systems (AKA nand2tetris), which has amazing content that uncovers most of the topics I always wanted to learn. In order to reinforce what I learnt from the book, I decided to write about how a Hack assembly program flows through hardware. I will do my best to explain the details and emphasize the parts that I think are really crucial.

Before we dive into a Hack assembly program, let’s look into the specification of the Hack hardware platform.

The Hack Hardware Platform Specification

The Hack platform is a 16-bit von Neumann machine, designed to execute programs written in the Hack machine language. In order to do so, the Hack platform consists of a CPU, two separate memory modules serving as instruction memory and data memory, and two memory-mapped I/O devices: a screen and a keyboard.

The Hack CPU consists of the ALU and three registers called data register (D), address register (A), and program counter (PC). While the D-register is used solely for storing data values, the A-register serves three different purposes, depending on the context in which it is used: storing a data value (just like the D-register), pointing at an address in the instruction memory, or pointing at an address in the data memory.

CPU - Parts

In order to implement the Hack CPU, we need an ALU chip capable of computing arithmetic/logical functions, a set of registers, a program counter, and some additional gates (Control Unit) designed to help fetch, decode, and execute instructions.

ALU (Arithmetic Logic Unit)

This is the part where actual processing (or the magic) happens. The Hack ALU computes a fixed set of functions out = fi(x, y) where x and y are the chip’s two 16-bit inputs, out is the chip’s 16-bit output, and fi is an arithmetic or logical function selected from 18 possible functions. We instruct the ALU which function to compute by setting six input bits, called control bits. The ALU can potentially compute 64 (2^6) different functions.

Two’s complement is used as the method of signed number representation. It allows computing of operations such as x-1 with ease: When zy and ny bits are 1, the y input is first zeroed, and then negated bit-wise. Bit-wise negation of zero gives the 2’s complement binary value of -1.

Specification

Chip name: ALU

Inputs:    x[16], y[16],                    // Two 16-bit data inputs
           zx,                              // Zero the x input
           nx,                              // Negate the x input
           zy,                              // Zero the y input
           ny,                              // Negate the y input
           f,                               // Function code: 1 for Add, 0 for And
           no                               // Negate the out output

Outputs:   out[16],                         // 16-bit output
           zr,                              // True iff out=0
           ng                               // True iff out<0

Function:  if zx then x = 0                 // 16-bit zero constant
           if nx then x = !x                // Bit-wise negation
           if zy then y = 0                 // 16-bit zero constant
           if ny then y = !y                // Bit-wise negation
           if f then out = x + y            // Integer 2's complement addition
                else out = x & y            // Bit-wise And
           if no then out = !out            // Bit-wise negation
           if out=0 then zr = 1 else zr = 0 // 16-bit eq. comparison
           if out<0 then ng = 1 else ng = 0 // 16-bit neg. comparison

Comment:   Overflow is neither detected nor handled.

The above specification gives a clear idea of the implementation of the ALU. We only need a 16-bit Adder chip and a couple of logic gates including 16-bit Multiplexor, 16-bit NOT, 16-bit AND, 8-way OR, OR, and NOT.

Figure 1: Arithmetic Logic Unit. (Taken from The Elements of Computing Systems, Chapter 2)

ALU computes one of the following instructions: x+y, x-y, y-x, 0, 1, -1, x, y, -x, -y, !x, !y, x+1, y+1, x-1, y-1, x&y, x|y on two 16-bit inputs, according to 6 input bits denoted by zx, nx, zy, ny, f, no. In addition, ALU computes two 1-bit outputs: if ALU output is 0 then zr is set to 1, otherwise zr is set to 0; if out<0 then ng is set to 1 otherwise ng is set to 0.
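If HDL is not your thing, the same specification can be simulated in a few lines of Rust. This is only a behavioural sketch: it models the 16-bit buses as i16 values, and the names alu and AluOutput are my own.

```rust
/// The three outputs of the Hack ALU.
pub struct AluOutput {
    pub out: i16,
    pub zr: bool, // true iff out == 0
    pub ng: bool, // true iff out < 0
}

/// Behavioural model of the Hack ALU, following the specification step by step.
pub fn alu(x: i16, y: i16, zx: bool, nx: bool, zy: bool, ny: bool, f: bool, no: bool) -> AluOutput {
    // Pre-process x: zero it, then negate it bit-wise
    let mut x = if zx { 0 } else { x };
    if nx {
        x = !x;
    }

    // Pre-process y the same way
    let mut y = if zy { 0 } else { y };
    if ny {
        y = !y;
    }

    // f selects addition (overflow wraps and is not reported) or bit-wise And
    let mut out = if f { x.wrapping_add(y) } else { x & y };

    // Post-process the output
    if no {
        out = !out;
    }

    AluOutput { out, zr: out == 0, ng: out < 0 }
}
```

For example, zy=1, ny=1, f=1 turns y into -1 and adds it to x, which computes x-1. Setting nx=1, f=1, no=1 computes !(!x + y), which is x-y in two’s complement.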

The below is an example implementation of the ALU in HDL (hardware description language).

// This file is part of www.nand2tetris.org
// and the book "The Elements of Computing Systems"
// by Nisan and Schocken, MIT Press.
// File name: projects/02/ALU.hdl

// Implementation: the ALU manipulates the x and y
// inputs and then operates on the resulting values,
// as follows:
// if (zx==1) set x = 0        // 16-bit constant
// if (nx==1) set x = ~x       // bitwise "not"
// if (zy==1) set y = 0        // 16-bit constant
// if (ny==1) set y = ~y       // bitwise "not"
// if (f==1)  set out = x + y  // integer 2's complement addition
// if (f==0)  set out = x & y  // bitwise "and"
// if (no==1) set out = ~out   // bitwise "not"
// if (out==0) set zr = 1
// if (out<0) set ng = 1

CHIP ALU {
    IN  
        x[16], y[16],  // 16-bit inputs
        zx, // zero the x input
        nx, // negate the x input
        zy, // zero the y input
        ny, // negate the y input
        f,  // compute  out = x + y (if 1) or out = x & y (if 0)
        no; // negate the out output

    OUT
        out[16], // 16-bit output
        zr, // 1 if (out==0), 0 otherwise
        ng; // 1 if (out<0),  0 otherwise

    PARTS:

    // if (zx==1) set x = 0
    Mux16(a=x,b=false,sel=zx,out=zxout);

    // if (zy==1) set y = 0
    Mux16(a=y,b=false,sel=zy,out=zyout);

    // if (nx==1) set x = ~x
    // if (ny==1) set y = ~y  
    Not16(in=zxout,out=notx);
    Not16(in=zyout,out=noty);
    Mux16(a=zxout,b=notx,sel=nx,out=nxout);
    Mux16(a=zyout,b=noty,sel=ny,out=nyout);

    // if (f==1)  set out = x + y
    // if (f==0)  set out = x & y
    Add16(a=nxout,b=nyout,out=addout);
    And16(a=nxout,b=nyout,out=andout);
    Mux16(a=andout,b=addout,sel=f,out=fout);

    // if (no==1) set out = ~out
    // 1 if (out<0),  0 otherwise
    Not16(in=fout,out=nfout);
    Mux16(a=fout,b=nfout,sel=no,out=out,out[0..7]=zr1,out[8..15]=zr2,out[15]=ng);

    // 1 if (out==0), 0 otherwise
    Or8Way(in=zr1,out=or1);
    Or8Way(in=zr2,out=or2);
    Or(a=or1,b=or2,out=or3);
    Not(in=or3,out=zr);
}

Registers

I am going to skip the specification and the implementation of the registers since our subject is only the computational part of the Hack platform. However, it is still useful to know about the types of registers that physically reside inside the CPU.

Data Register

Data Register holds the contents of the memory which are to be transferred from the immediate access storage to other components or vice versa.

Addressing Register

Addressing Register holds the memory address of data that needs to be accessed. When reading from memory, data addressed by addressing register is fed into the data register and then used by the CPU.

Program Counter (Instruction Pointer)

Program Counter holds the memory address of the next instruction that would be executed.

Control Unit

The Control Unit controls the flow of data between the CPU and other components. It is contained within the CPU and is responsible for decoding the instructions and figuring out which instruction to fetch and execute next.

CPU - Specification

Hack platform’s CPU is designed to execute 16-bit instructions according to the Hack machine language specification. The CPU should be connected to two separate memory modules: Instruction memory (ROM) and data memory (RAM).

Chip Name: CPU              // Central Processing Unit
Inputs:    inM[16],         // M value input (M = contents of RAM[A])
           instruction[16], // Instruction for execution
           reset            // Signals whether to restart the current
                            // program (reset=1) or continue executing
                            // the current program (reset=0)
Outputs:   outM[16],        // M value output
           writeM,          // Write to M?
           addressM[15],    // Address of M in data memory
           pc[15]           // Address of next instruction

The figure below shows the proposed CPU implementation. It does not show the control logic, except for the inputs and outputs of the control bits, which are labeled with a circled “c”.

Figure 2: Central Processing Unit. (Taken from The Elements of Computing Systems, Chapter 5)

The CPU executes the given instruction according to the Hack assembly language specification. D and A refer to CPU-resident registers, while M refers to the external memory location addressed by A, i.e. to RAM[A]. inM holds the value of this location. If the current instruction needs to write a value to M, the value is placed in outM, the address of the target location is placed in the addressM output, and the writeM control bit is asserted.

The outM and writeM outputs are combinational: they are affected instantaneously by the execution of the current instruction. The addressM and pc outputs are clocked: they commit to their new values only in the next time unit. If reset=1, the CPU jumps to address 0 (i.e. sets pc to 0 in the next time unit) rather than to the address resulting from executing the current instruction.
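
The way reset overrides everything else can be sketched as a tiny behavioural model of the program counter. This is only an illustration in Python: the class name `ProgramCounter` and its `tick` method are made up here, and the priority order (reset first, then load, then increment) is my assumption based on the standard nand2tetris PC chip.

```python
class ProgramCounter:
    """Behavioural model of a counter with reset, load and increment inputs."""

    def __init__(self):
        self.out = 0  # value visible on the pc output

    def tick(self, inp=0, load=0, inc=1, reset=0):
        """Advance one time unit and return the new pc value."""
        if reset:
            self.out = 0                          # restart the program
        elif load:
            self.out = inp & 0x7FFF               # jump: take the address from A
        elif inc:
            self.out = (self.out + 1) & 0x7FFF    # fall through to the next instruction
        return self.out


pc = ProgramCounter()
pc.tick()                          # pc is now 1
pc.tick(inp=100, load=1)           # jump, pc is now 100
pc.tick(inp=100, load=1, reset=1)  # reset wins over the jump, pc is now 0
```

The new value only becomes visible after `tick` returns, which is the software analogue of "commits in the next time unit".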

This is an example implementation of the CPU in HDL:

// This file is part of www.nand2tetris.org
// and the book "The Elements of Computing Systems"
// by Nisan and Schocken, MIT Press.
// File name: projects/05/CPU.hdl

CHIP CPU {

    IN  inM[16],         // M value input  (M = contents of RAM[A])
        instruction[16], // Instruction for execution
        reset;           // Signals whether to re-start the current
                         // program (reset=1) or continue executing
                         // the current program (reset=0).

    OUT outM[16],        // M value output
        writeM,          // Write into M?
        addressM[15],    // Address in data memory (of M)
        pc[15];          // address of next instruction

    PARTS:
    Mux16(a=instruction,b=ALUout,sel=instruction[15],out=Ain);

    Not(in=instruction[15],out=notinstruction);

    // A register
    // When instruction[15] = 0 (an A-instruction, @value), A loads the value.
    // A C-instruction loads A only when the d1 bit (instruction[5]) is set.
    Or(a=notinstruction,b=instruction[5],out=loadA);//d1
    ARegister(in=Ain,load=loadA,out=Aout,out[0..14]=addressM);

    Mux16(a=Aout,b=inM,sel=instruction[12],out=AMout);

    // Prepare the ALU control bits. For an A-instruction they are forced
    // to values that make the ALU simply output D.
    And(a=instruction[11],b=instruction[15],out=zx);//c1
    And(a=instruction[10],b=instruction[15],out=nx);//c2
    Or(a=instruction[9],b=notinstruction,out=zy);//c3
    Or(a=instruction[8],b=notinstruction,out=ny);//c4
    And(a=instruction[7],b=instruction[15],out=f);//c5
    And(a=instruction[6],b=instruction[15],out=no);//c6

    ALU(x=Dout,y=AMout,zx=zx,nx=nx,zy=zy,ny=ny,f=f,no=no,out=outM,out=ALUout,zr=zero,ng=neg);

    // Write to M only for a C-instruction with the d3 bit set
    And(a=instruction[15],b=instruction[3],out=writeM);//d3

    // D register: load only for a C-instruction with the d2 bit set
    And(a=instruction[15],b=instruction[4],out=loadD);//d2
    DRegister(in=ALUout,load=loadD,out=Dout);

    //Prepare for jump
    //get positive
    Or(a=zero,b=neg,out=notpos);
    Not(in=notpos,out=pos);

    And(a=instruction[0],b=pos,out=j3);//j3
    And(a=instruction[1],b=zero,out=j2);//j2
    And(a=instruction[2],b=neg,out=j1);//j1

    Or(a=j1,b=j2,out=j12);
    Or(a=j12,b=j3,out=j123);

    And(a=j123,b=instruction[15],out=jump);

    //when jump,load Aout
    PC(in=Aout,load=jump,reset=reset,inc=true,out[0..14]=pc);
}
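
A rough behavioural sketch in Python can help to check the wiring above. It is not part of the nand2tetris toolchain: the names `HackCPU`, `alu`, `bit` and `run` are made up for this illustration, RAM is modelled as a plain dict, and the ALU is a simplified software version rather than a gate-level one.

```python
MASK16 = 0xFFFF  # Hack words are 16 bits wide
MASK15 = 0x7FFF  # addressM and pc are 15 bits wide


def bit(word, i):
    """Return bit i of word (bit 0 is the least significant)."""
    return (word >> i) & 1


def alu(x, y, zx, nx, zy, ny, f, no):
    """Behavioural Hack ALU: returns (out, zr, ng)."""
    if zx:
        x = 0
    if nx:
        x = ~x & MASK16
    if zy:
        y = 0
    if ny:
        y = ~y & MASK16
    out = (x + y) & MASK16 if f else x & y
    if no:
        out = ~out & MASK16
    return out, int(out == 0), bit(out, 15)


class HackCPU:
    def __init__(self):
        self.a = 0   # A register
        self.d = 0   # D register
        self.pc = 0  # program counter

    def step(self, instruction, in_m, reset=0):
        """Execute one instruction; return (out_m, write_m, address_m)."""
        ins = instruction
        is_c = bit(ins, 15)  # 1 for a C-instruction, 0 for @value
        y = in_m if bit(ins, 12) else self.a

        # Control bits, gated like the And/Or gates in the HDL.
        zx, nx = bit(ins, 11) & is_c, bit(ins, 10) & is_c
        zy, ny = bit(ins, 9) | (1 - is_c), bit(ins, 8) | (1 - is_c)
        f, no = bit(ins, 7) & is_c, bit(ins, 6) & is_c
        out, zr, ng = alu(self.d, y, zx, nx, zy, ny, f, no)

        # addressM reflects A as it was before this instruction updates it.
        old_a = self.a
        write_m = is_c & bit(ins, 3)
        address_m = old_a & MASK15

        pos = 1 - (zr | ng)
        jump = is_c & ((bit(ins, 2) & ng) | (bit(ins, 1) & zr) | (bit(ins, 0) & pos))

        # Clocked updates: registers and pc commit their new values.
        if not is_c:
            self.a = ins
        elif bit(ins, 5):
            self.a = out
        if is_c & bit(ins, 4):
            self.d = out
        if reset:
            self.pc = 0
        elif jump:
            self.pc = old_a & MASK15
        else:
            self.pc = (self.pc + 1) & MASK15
        return out, write_m, address_m


def run(rom, ram, max_steps=100):
    """Run until pc leaves the ROM or max_steps is reached; ram is a dict."""
    cpu = HackCPU()
    for _ in range(max_steps):
        if cpu.pc >= len(rom):
            break
        out_m, write_m, address_m = cpu.step(rom[cpu.pc], ram.get(cpu.a & MASK15, 0))
        if write_m:
            ram[address_m] = out_m
    return cpu


# @2, D=A, @3, D=D+A, @0, M=D  (stores 2 + 3 in RAM[0])
program = [0x0002, 0xEC10, 0x0003, 0xE090, 0x0000, 0xE308]
ram = {}
cpu = run(program, ram)
print(ram[0], cpu.d)  # prints: 5 5
```

The six hexadecimal words are the hand-assembled binary forms of the instructions in the comment, so the sketch doubles as a small preview of what an assembler has to produce.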

That’s it for the first part! We’ve done a great job so far. I know it was overwhelming, but all of the above information was necessary to understand how the binary instructions flow through the control unit. In the next part I will explain the Hack assembly language and how its instructions are translated into binary.

Back to the blogging

2019-04-04T14:40:14+00:00

I finally convinced myself to start writing a blog. It’s been years since I last invested time in keeping a proper log of my personal learning journey.

I’m planning to write mostly about low-level programming and my beloved programming language, Rust. I’ve already got a series of blog posts waiting in line, dedicated to the Hack assembly language.

hexo and netlify played a big role in my decision to start writing a blog. Although hexo is written in JavaScript, I found it quite suitable for a “tech” blog thanks to its immense variety of themes. It also supports GitHub Flavored Markdown, which makes it quite easy to move my notes from Boostnote to this blog.

Peace out!