<h1>Let’s implement a Bloom Filter</h1>
<p>Onat Yigit Mercan, 2020-08-10, <a href="https://onatm.dev/2020/08/10/let-s-implement-a-bloom-filter">https://onatm.dev/2020/08/10/let-s-implement-a-bloom-filter</a></p>
<p>I am planning to create a series of blog posts that include some literature research, implementations of various data structures and our journey of creating a distributed datastore in <a href="https://distrentic.io">distrentic.io</a>.</p>
<p>You might be wondering why I am starting with a blog post explaining the Bloom Filter when I don’t have a single clue about how to create a distributed datastore. My answer is simple: “I like the idea behind it”.</p>
<hr />
<p>Before I get into the details of the Bloom filters, I want to give our backstory that will help you understand <strong>why we started building something we’d enjoy during our spare time that will never be production ready</strong>.</p>
<h4 id="the-backstory">The backstory</h4>
<p>My friend Ibrahim and I have always been fascinated by complex software and distributed systems. We’ve been working together for more than 5 years (we got old, dude) and we were lucky enough to work for the largest e-commerce company in Europe. We battled our way through many of the different problems that distributed systems can offer. We both moved to Cambridge, UK and are still fighting against distributed-world villains.</p>
<hr />
<p>Let’s explore the mystic land of probabilistic data structures by implementing a Bloom Filter.</p>
<h2 id="what-the-hell-is-a-bloom-filter">What the hell is a Bloom Filter</h2>
<blockquote>
<p>You might also want to read <a href="https://gopiandcode.uk/logs/log-bloomfilters-debunked.html">Bloom filters debunked</a>.</p>
</blockquote>
<p>A Bloom filter is a method for representing a set $A = \{a_1, a_2, \ldots, a_n\}$ of $n$ elements (also called keys) to support membership queries. It was invented by <strong>Burton Bloom</strong> in 1970 and was proposed for use in the web context by Marais and Bharat as a mechanism for identifying which pages have associated comments stored within a <em>CommonKnowledge</em> server. <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote">1</a></sup></p>
<p>It is a space-efficient probabilistic data structure that is used to answer a very simple question: <strong>is this element a member of a set?</strong> A Bloom filter does not store the actual elements; it only stores their <strong>membership</strong>.</p>
<p>False positive matches are possible, but false negatives are not – in other words, a query returns either “possibly in set” or “definitely not in set”. <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote">2</a></sup> Unfortunately, <strong>items cannot be removed from the Bloom Filter</strong>: clearing an element’s bits could also clear bits that some other element or group of elements hashed to.</p>
<p>Because it is probabilistic, the Bloom Filter trades accuracy for space and performance. Much like the trade-offs described by the CAP theorem, we give something up on purpose: here we choose performance over accuracy.</p>
<p>Bloom filters have some interesting use cases. For example, they can be placed on top of a datastore. When a key is queried for its existence and the filter does not have it, we can skip querying the datastore entirely.</p>
<p><img src="/assets/images/bloom_filter_example.png" alt="Example usage of a Bloom filter" />
Figure 1: Example usage of a Bloom filter.</p>
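<p>The datastore guard described above can be sketched roughly as follows. Everything here is hypothetical and only for illustration: the <code class="language-plaintext highlighter-rouge">MembershipFilter</code> trait, the <code class="language-plaintext highlighter-rouge">GuardedStore</code> type and the <code class="language-plaintext highlighter-rouge">store_reads</code> counter are made-up names, the datastore is a plain <code class="language-plaintext highlighter-rouge">HashMap</code>, and an exact <code class="language-plaintext highlighter-rouge">HashSet</code> stands in for the filter so that the sketch stays self-contained.</p>

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical interface: answers "possibly in set" (true)
/// or "definitely not in set" (false).
trait MembershipFilter {
    fn might_contain(&self, key: &str) -> bool;
}

// Assumption: a HashSet is exact rather than probabilistic. It is used
// here only as a stand-in so that the example needs no Bloom filter.
impl MembershipFilter for HashSet<String> {
    fn might_contain(&self, key: &str) -> bool {
        self.contains(key)
    }
}

/// A datastore (here just a HashMap) with a filter placed in front of it.
struct GuardedStore<F: MembershipFilter> {
    filter: F,
    store: HashMap<String, String>,
    /// Counts how many lookups actually reached the datastore.
    store_reads: usize,
}

impl<F: MembershipFilter> GuardedStore<F> {
    fn get(&mut self, key: &str) -> Option<String> {
        if !self.filter.might_contain(key) {
            // Definitely absent: skip the datastore entirely.
            return None;
        }
        // Possibly present: only now pay for the real lookup.
        self.store_reads += 1;
        self.store.get(key).cloned()
    }
}
```

<p>With a real Bloom filter in place of the <code class="language-plaintext highlighter-rouge">HashSet</code>, a false positive would only cost one unnecessary datastore read; it would never return wrong data.</p>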
<h3 id="how-does-it-work">How does it work</h3>
<p>The idea behind a Bloom filter is very simple: allocate an array $v$ of $m$ bits, each initially set to $0$, and then choose $k$ independent hash functions $h_1, h_2, \ldots, h_k$, each with range $\{1, \ldots, m\}$.</p>
<p>The Bloom filter has two operations just like a standard set:</p>
<h4 id="insertion">Insertion</h4>
<p>When an element $a \in A$ is added to the filter, the bits at positions $h_1(a), h_2(a), \ldots, h_k(a)$ in $v$ are set to $1$. In simpler words, the new element is hashed by $k$ functions and each hash is reduced modulo $m$, resulting in $k$ indices into the bit array. The bit at each of those indices is set.</p>
<p><img src="/assets/images/bloom_filter_add.png" alt="Adding elements to a Bloom filter" />
Figure 2: Adding elements to a Bloom filter ($m = 10$, $k = 3$).</p>
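<p>As a minimal sketch of the insertion step with $m = 10$ and $k = 3$: the three hash functions below are toy functions invented for this example (they are neither independent nor good hashes, and they are not the ones used later in this post), and the names <code class="language-plaintext highlighter-rouge">toy_indices</code> and <code class="language-plaintext highlighter-rouge">toy_insert</code> are hypothetical.</p>

```rust
/// Toy hash family for k = 3 (hypothetical, for illustration only).
/// Each function maps an item to an index in 0..m.
fn toy_indices(item: u64, m: u64) -> [usize; 3] {
    [
        (item % m) as usize,
        ((3 * item + 1) % m) as usize,
        ((7 * item + 2) % m) as usize,
    ]
}

/// Insertion: set the bit at each of the k indices.
fn toy_insert(bits: &mut [bool], item: u64) {
    let m = bits.len() as u64;
    for &index in toy_indices(item, m).iter() {
        bits[index] = true;
    }
}
```

<p>With a ten-bit array, <code class="language-plaintext highlighter-rouge">toy_insert(&mut bits, 4)</code> sets bits 4, 3 and 0, and a following <code class="language-plaintext highlighter-rouge">toy_insert(&mut bits, 11)</code> sets bits 1, 4 and 9. Bit 4 is shared by both elements, which is exactly why a single element cannot be removed safely.</p>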
<h4 id="query">Query</h4>
<p>To query the membership of an element $b$, we check the bits at indices $h_1(b), h_2(b), \ldots, h_k(b)$ in $v$. If any of them is $0$, then $b$ is certainly not in the set $A$. Otherwise, we assume that $b$ is in the set, although it’s possible that some other element or group of elements hashed to the same indices. This is called a <strong>false positive</strong>. We can target a specific probability of false positives by selecting optimal values of $m$ and $k$ for up to $n$ insertions.</p>
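<p>The query rule, and how a false positive can arise, can be sketched on a plain slice of booleans. The helper names <code class="language-plaintext highlighter-rouge">set_bits</code> and <code class="language-plaintext highlighter-rouge">maybe_contains</code> are hypothetical, and the indices are passed in directly instead of being computed by hash functions, so this only illustrates the rule itself.</p>

```rust
/// Insertion step: set the bit at every given index.
fn set_bits(bits: &mut [bool], indices: &[usize]) {
    for &i in indices {
        bits[i] = true;
    }
}

/// Query step: returns false if any probed bit is 0
/// ("definitely not in set"), true otherwise ("possibly in set").
fn maybe_contains(bits: &[bool], indices: &[usize]) -> bool {
    indices.iter().all(|&i| bits[i])
}
```

<p>Suppose two elements were inserted whose indices are <code class="language-plaintext highlighter-rouge">[4, 3, 0]</code> and <code class="language-plaintext highlighter-rouge">[1, 4, 9]</code>. Probing <code class="language-plaintext highlighter-rouge">[5, 6, 7]</code> returns <code class="language-plaintext highlighter-rouge">false</code> because bit 5 is still $0$. Probing <code class="language-plaintext highlighter-rouge">[3, 0, 9]</code> returns <code class="language-plaintext highlighter-rouge">true</code> even though no inserted element had those indices: the bits were set by two different elements, which is a false positive.</p>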
<hr />
<p>If elements keep being inserted, a Bloom filter eventually reaches a point where all bits are set, which means every query will indicate membership, effectively making the probability of false positives $1$. Avoiding this requires a priori knowledge of the size of the data set in order to select optimal parameters and avoid “overfilling”. <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote">3</a></sup></p>
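<p>To get a feeling for how quickly an overfilled filter degrades, here is a small sketch based on the commonly used approximation $P_{FP} \approx (1 - e^{-kn/m})^k$. The approximation assumes independent, uniformly distributed hash functions and is not taken from the implementation in this post; the function name <code class="language-plaintext highlighter-rouge">expected_fp_rate</code> is hypothetical.</p>

```rust
/// Approximate false positive probability of a Bloom filter with
/// `m` bits and `k` hash functions after `n` insertions.
/// Assumption: independent, uniformly distributed hash functions.
fn expected_fp_rate(m: u64, k: u32, n: u64) -> f64 {
    // Probability that one particular bit is still 0 after n insertions.
    let zero_bit = (-(k as f64) * (n as f64) / (m as f64)).exp();
    // A false positive needs all k probed bits to be 1.
    (1.0 - zero_bit).powi(k as i32)
}
```

<p>For a filter with $m = 9585059$ bits and $k = 7$, the estimate is about $0.01$ after $1$ million insertions, and it keeps climbing towards $1$ as more elements are added beyond the planned capacity.</p>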
<h3 id="finding-optimal-k-and-m">Finding optimal $k$ and $m$</h3>
<p>We can derive optimal $k$ and $m$ based on $n$ and a chosen probability of false positives $P_{FP}$.</p>
\[k = -\frac{\ln{P_{FP}}}{\ln{2}}
,
m = -\frac{n\ln{P_{FP}}}{(\ln2)^2}\]
<p>If you want to learn about how the above formulae are derived, you might want to pay a visit <a href="https://sagi.io/bloom-filters-for-the-perplexed/#appendix">here</a>.</p>
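<p>For readers who only want the gist, here is a brief outline of where these formulae come from. It is an addition to the post rather than a summary of the linked appendix, and it relies on the usual assumption that the $k$ hash functions are independent and uniform. After $n$ insertions the false positive probability is approximately</p>
\[P_{FP} \approx \left(1 - e^{-kn/m}\right)^{k}\]
<p>For fixed $m$ and $n$ this is minimised at $k = \frac{m}{n}\ln 2$, where each bit is set with probability $\frac{1}{2}$ and therefore $P_{FP} = 2^{-k}$. Solving first for $k$ and then for $m$ gives</p>
\[k = -\log_2 P_{FP} = -\frac{\ln P_{FP}}{\ln 2}, \qquad m = \frac{kn}{\ln 2} = -\frac{n \ln P_{FP}}{(\ln 2)^2}\]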
<h2 id="rust-implementation">Rust implementation</h2>
<blockquote>
<p>You can find the full implementation <a href="https://github.com/distrentic/plum">here</a>.
Huge thanks to <a href="https://github.com/xfix">@xfix</a> for <a href="https://github.com/distrentic/plum/pull/1">fixing <code class="language-plaintext highlighter-rouge">hashers</code> initialization with the same seed</a> and <a href="https://github.com/dkales">@dkales</a> for spotting <a href="https://github.com/distrentic/plum/issues/2">an issue with the ordering of operations in index calculation</a>.</p>
</blockquote>
<p>Finally! It is time to write some <code class="language-plaintext highlighter-rouge">rust</code> 😍. I am implementing the Bloom filter while writing this blog post. If you don’t believe me, check the command below:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cargo new <span class="nt">--lib</span> plum
</code></pre></div></div>
<p>Let’s continue with the dependencies. There is only one dependency and we will use it to create $v$.</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[dependencies]</span>
<span class="py">bit-vec</span> <span class="p">=</span> <span class="s">"0.6"</span>
</code></pre></div></div>
<p>We will declare a new <code class="language-plaintext highlighter-rouge">struct</code> <code class="language-plaintext highlighter-rouge">StandardBloomFilter</code> to encapsulate the required fields: $k$ (optimal number of hash functions), $m$ (optimal size of the bit array), $v$ (the bit array), the hash functions and a marker to tell the Rust compiler that our <code class="language-plaintext highlighter-rouge">struct</code> “owns” a <code class="language-plaintext highlighter-rouge">T</code>.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">extern</span> <span class="n">crate</span> <span class="n">bit_vec</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">bit_vec</span><span class="p">::</span><span class="n">BitVec</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">collections</span><span class="p">::</span><span class="nn">hash_map</span><span class="p">::{</span><span class="n">DefaultHasher</span><span class="p">,</span> <span class="n">RandomState</span><span class="p">};</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">hash</span><span class="p">::{</span><span class="n">BuildHasher</span><span class="p">,</span> <span class="n">Hash</span><span class="p">,</span> <span class="n">Hasher</span><span class="p">};</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">marker</span><span class="p">::</span><span class="n">PhantomData</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="n">StandardBloomFilter</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="p">{</span>
<span class="n">bitmap</span><span class="p">:</span> <span class="n">BitVec</span><span class="p">,</span>
<span class="n">optimal_m</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span>
<span class="n">optimal_k</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span>
<span class="n">hashers</span><span class="p">:</span> <span class="p">[</span><span class="n">DefaultHasher</span><span class="p">;</span> <span class="mi">2</span><span class="p">],</span>
<span class="mi">_</span><span class="n">marker</span><span class="p">:</span> <span class="n">PhantomData</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Careful readers will think that I made a mistake in the declaration of the <code class="language-plaintext highlighter-rouge">hashers</code> array because of the requirement of $k$ independent hash functions. It was intentional, and here’s why:</p>
<blockquote>
<p><strong>Why two hash functions?</strong> Kirsch and Mitzenmacher demonstrated in their paper that using two hash functions $h_1(x)$ and $h_2(x)$ to simulate additional hash functions of the form $g_i(x) = h_1(x) + {i}{h_2(x)}$ can be usefully applied to Bloom filters. This leads to less computation and potentially less need for randomness in practice. <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote">4</a></sup> This formula may appear similar to the use of pairwise independent hash functions. Unfortunately, there is no formal connection between the two techniques.</p>
</blockquote>
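<p>To make the formula concrete, here is a small sketch that derives $k$ indices from two already computed hash values. The function name <code class="language-plaintext highlighter-rouge">double_hash_indices</code> is hypothetical, and wrapping arithmetic is an assumption made here to avoid overflow panics on 64-bit values.</p>

```rust
/// Simulates k hash functions g_i(x) = h1(x) + i * h2(x), for i in 0..k,
/// and reduces each result modulo m to get an index into the bit array.
fn double_hash_indices(h1: u64, h2: u64, k: u32, m: u64) -> Vec<usize> {
    (0..k as u64)
        .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize)
        .collect()
}
```

<p>For example, with $h_1(x) = 7$, $h_2(x) = 3$, $k = 4$ and $m = 10$ the indices are $7$, $0$, $3$ and $6$. Only two real hash computations are needed no matter how large $k$ is.</p>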
<p>I mentioned earlier that the Bloom Filter has two operations like a standard set: insert and query. We will implement those two operations along with a constructor-like <code class="language-plaintext highlighter-rouge">new</code> method.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">StandardBloomFilter</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">items_count</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">fp_rate</span><span class="p">:</span> <span class="nb">f64</span><span class="p">)</span> <span class="k">-></span> <span class="n">Self</span> <span class="p">{</span>
<span class="c">// ...snip</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">new</code> calculates the size of the <code class="language-plaintext highlighter-rouge">bitmap</code> ($m$) and <code class="language-plaintext highlighter-rouge">optimal_k</code> ($k$) and then instantiates a <code class="language-plaintext highlighter-rouge">StandardBloomFilter</code>.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">StandardBloomFilter</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">items_count</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">fp_rate</span><span class="p">:</span> <span class="nb">f64</span><span class="p">)</span> <span class="k">-></span> <span class="n">Self</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">optimal_m</span> <span class="o">=</span> <span class="nn">Self</span><span class="p">::</span><span class="nf">bitmap_size</span><span class="p">(</span><span class="n">items_count</span><span class="p">,</span> <span class="n">fp_rate</span><span class="p">);</span>
<span class="k">let</span> <span class="n">optimal_k</span> <span class="o">=</span> <span class="nn">Self</span><span class="p">::</span><span class="nf">optimal_k</span><span class="p">(</span><span class="n">fp_rate</span><span class="p">);</span>
<span class="k">let</span> <span class="n">hashers</span> <span class="o">=</span> <span class="p">[</span>
<span class="nn">RandomState</span><span class="p">::</span><span class="nf">new</span><span class="p">()</span><span class="nf">.build_hasher</span><span class="p">(),</span>
<span class="nn">RandomState</span><span class="p">::</span><span class="nf">new</span><span class="p">()</span><span class="nf">.build_hasher</span><span class="p">(),</span>
<span class="p">];</span>
<span class="n">StandardBloomFilter</span> <span class="p">{</span>
<span class="n">bitmap</span><span class="p">:</span> <span class="nn">BitVec</span><span class="p">::</span><span class="nf">from_elem</span><span class="p">(</span><span class="n">optimal_m</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">false</span><span class="p">),</span>
<span class="n">optimal_m</span><span class="p">,</span>
<span class="n">optimal_k</span><span class="p">,</span>
<span class="n">hashers</span><span class="p">,</span>
<span class="mi">_</span><span class="n">marker</span><span class="p">:</span> <span class="n">PhantomData</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c">// ...snip</span>
<span class="k">fn</span> <span class="nf">bitmap_size</span><span class="p">(</span><span class="n">items_count</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">fp_rate</span><span class="p">:</span> <span class="nb">f64</span><span class="p">)</span> <span class="k">-></span> <span class="nb">u64</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">ln2_2</span> <span class="o">=</span> <span class="nn">core</span><span class="p">::</span><span class="nn">f64</span><span class="p">::</span><span class="nn">consts</span><span class="p">::</span><span class="n">LN_2</span> <span class="o">*</span> <span class="nn">core</span><span class="p">::</span><span class="nn">f64</span><span class="p">::</span><span class="nn">consts</span><span class="p">::</span><span class="n">LN_2</span><span class="p">;</span>
<span class="p">((</span><span class="o">-</span><span class="mf">1.0f64</span> <span class="o">*</span> <span class="n">items_count</span> <span class="k">as</span> <span class="nb">f64</span> <span class="o">*</span> <span class="n">fp_rate</span><span class="nf">.ln</span><span class="p">())</span> <span class="o">/</span> <span class="n">ln2_2</span><span class="p">)</span><span class="nf">.ceil</span><span class="p">()</span> <span class="k">as</span> <span class="nb">u64</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">optimal_k</span><span class="p">(</span><span class="n">fp_rate</span><span class="p">:</span> <span class="nb">f64</span><span class="p">)</span> <span class="k">-></span> <span class="nb">u32</span> <span class="p">{</span>
<span class="p">((</span><span class="o">-</span><span class="mf">1.0f64</span> <span class="o">*</span> <span class="n">fp_rate</span><span class="nf">.ln</span><span class="p">())</span> <span class="o">/</span> <span class="nn">core</span><span class="p">::</span><span class="nn">f64</span><span class="p">::</span><span class="nn">consts</span><span class="p">::</span><span class="n">LN_2</span><span class="p">)</span><span class="nf">.ceil</span><span class="p">()</span> <span class="k">as</span> <span class="nb">u32</span>
<span class="p">}</span>
<span class="c">// ...snip</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Let’s run these calculations on <a href="https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f83331e5f3fa5be8ec52545d03afa00f">Rust Playground</a>.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bitmap_size: 9585059
optimal_k: 7
</code></pre></div></div>
<p>This looks really promising! A Bloom Filter that represents a set of $1$ million items with a false-positive rate of $0.01$ requires only $9585059$ bits ($\approx 1.14\,\mathrm{MiB}$) and $7$ hash functions.</p>
<p>So far we have managed to construct a Bloom Filter, and now it is time to implement the <code class="language-plaintext highlighter-rouge">insert</code> and <code class="language-plaintext highlighter-rouge">contains</code> methods. Their implementations are dead simple and they share the same code to calculate indexes of the bit array.</p>
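<p>One way to read these numbers: since $m$ grows linearly with $n$, the cost per stored item depends only on the false positive rate. The sketch below computes that cost using the same formula for $m$; the helper names <code class="language-plaintext highlighter-rouge">bits_per_item</code> and <code class="language-plaintext highlighter-rouge">size_in_bytes</code> are hypothetical and not part of the implementation.</p>

```rust
/// Bits needed per stored item for a target false positive rate,
/// i.e. m / n = -ln(p) / (ln 2)^2.
fn bits_per_item(fp_rate: f64) -> f64 {
    let ln2 = std::f64::consts::LN_2;
    -fp_rate.ln() / (ln2 * ln2)
}

/// Whole bytes needed to hold a bitmap sized for `items_count` items.
fn size_in_bytes(items_count: usize, fp_rate: f64) -> u64 {
    let bits = (items_count as f64 * bits_per_item(fp_rate)).ceil();
    (bits / 8.0).ceil() as u64
}
```

<p>A false positive rate of $0.01$ costs roughly $9.6$ bits per item and $0.001$ costs roughly $14.4$ bits per item, regardless of how large the items themselves are.</p>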
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">StandardBloomFilter</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="c">// ...snip</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">insert</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">item</span><span class="p">:</span> <span class="o">&</span><span class="n">T</span><span class="p">)</span>
<span class="k">where</span>
<span class="n">T</span><span class="p">:</span> <span class="n">Hash</span><span class="p">,</span>
<span class="p">{</span>
<span class="k">let</span> <span class="p">(</span><span class="n">h1</span><span class="p">,</span> <span class="n">h2</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="nf">.hash_kernel</span><span class="p">(</span><span class="n">item</span><span class="p">);</span>
<span class="k">for</span> <span class="n">k_i</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="k">self</span><span class="py">.optimal_k</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">index</span> <span class="o">=</span> <span class="k">self</span><span class="nf">.get_index</span><span class="p">(</span><span class="n">h1</span><span class="p">,</span> <span class="n">h2</span><span class="p">,</span> <span class="n">k_i</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">);</span>
<span class="k">self</span><span class="py">.bitmap</span><span class="nf">.set</span><span class="p">(</span><span class="n">index</span><span class="p">,</span> <span class="k">true</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">contains</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">,</span> <span class="n">item</span><span class="p">:</span> <span class="o">&</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="nb">bool</span>
<span class="k">where</span>
<span class="n">T</span><span class="p">:</span> <span class="n">Hash</span><span class="p">,</span>
<span class="p">{</span>
<span class="k">let</span> <span class="p">(</span><span class="n">h1</span><span class="p">,</span> <span class="n">h2</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="nf">.hash_kernel</span><span class="p">(</span><span class="n">item</span><span class="p">);</span>
<span class="k">for</span> <span class="n">k_i</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="k">self</span><span class="py">.optimal_k</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">index</span> <span class="o">=</span> <span class="k">self</span><span class="nf">.get_index</span><span class="p">(</span><span class="n">h1</span><span class="p">,</span> <span class="n">h2</span><span class="p">,</span> <span class="n">k_i</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">);</span>
<span class="k">if</span> <span class="o">!</span><span class="k">self</span><span class="py">.bitmap</span><span class="nf">.get</span><span class="p">(</span><span class="n">index</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">true</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The above methods depend on two other methods that we haven’t implemented yet: <code class="language-plaintext highlighter-rouge">hash_kernel</code> and <code class="language-plaintext highlighter-rouge">get_index</code>. <code class="language-plaintext highlighter-rouge">hash_kernel</code> is going to be the one where the actual “hashing” happens. It will return the hash values of $h_1(x)$ and $h_2(x)$.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">StandardBloomFilter</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="c">// ...snip</span>
<span class="k">fn</span> <span class="nf">hash_kernel</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">,</span> <span class="n">item</span><span class="p">:</span> <span class="o">&</span><span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="p">(</span><span class="nb">u64</span><span class="p">,</span> <span class="nb">u64</span><span class="p">)</span>
<span class="k">where</span>
<span class="n">T</span><span class="p">:</span> <span class="n">Hash</span><span class="p">,</span>
<span class="p">{</span>
<span class="k">let</span> <span class="n">hasher1</span> <span class="o">=</span> <span class="o">&</span><span class="k">mut</span> <span class="k">self</span><span class="py">.hashers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="nf">.clone</span><span class="p">();</span>
<span class="k">let</span> <span class="n">hasher2</span> <span class="o">=</span> <span class="o">&</span><span class="k">mut</span> <span class="k">self</span><span class="py">.hashers</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="nf">.clone</span><span class="p">();</span>
<span class="n">item</span><span class="nf">.hash</span><span class="p">(</span><span class="n">hasher1</span><span class="p">);</span>
<span class="n">item</span><span class="nf">.hash</span><span class="p">(</span><span class="n">hasher2</span><span class="p">);</span>
<span class="k">let</span> <span class="n">hash1</span> <span class="o">=</span> <span class="n">hasher1</span><span class="nf">.finish</span><span class="p">();</span>
<span class="k">let</span> <span class="n">hash2</span> <span class="o">=</span> <span class="n">hasher2</span><span class="nf">.finish</span><span class="p">();</span>
<span class="p">(</span><span class="n">hash1</span><span class="p">,</span> <span class="n">hash2</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>We could’ve used the $128$-bit <a href="https://en.wikipedia.org/wiki/MurmurHash#MurmurHash3">MurmurHash3</a> and returned the upper $64$ bits as <code class="language-plaintext highlighter-rouge">hash1</code> and the lower $64$ bits as <code class="language-plaintext highlighter-rouge">hash2</code> (this is how the <a href="https://github.com/google/guava/blob/master/guava/src/com/google/common/hash/BloomFilterStrategies.java">Google Guava Bloom Filter implementation</a> currently works), but to keep this implementation simple and to avoid additional dependencies I decided to continue with <code class="language-plaintext highlighter-rouge">DefaultHasher</code> (see <a href="https://en.wikipedia.org/wiki/SipHash">SipHash</a>).</p>
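<p>For comparison, the alternative mentioned above could look roughly like this. The sketch assumes a $128$-bit hash value has already been produced by some hash function; it does not implement MurmurHash3, and the name <code class="language-plaintext highlighter-rouge">split_hash</code> is hypothetical.</p>

```rust
/// Splits one 128-bit hash value into two 64-bit values:
/// the upper half becomes hash1 and the lower half becomes hash2.
fn split_hash(hash: u128) -> (u64, u64) {
    let hash1 = (hash >> 64) as u64; // upper 64 bits
    let hash2 = hash as u64; // lower 64 bits (the cast truncates)
    (hash1, hash2)
}
```

<p>The appeal of this approach is that a single pass over the input yields both values, whereas the <code class="language-plaintext highlighter-rouge">DefaultHasher</code> version hashes every item twice.</p>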
<p>Now, it is time to make the final touch. We are going to implement <code class="language-plaintext highlighter-rouge">get_index</code> by using $g_i(x) = h_1(x) + {i}{h_2(x)}$ to simulate more than two hash functions.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">StandardBloomFilter</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="c">// ...snip</span>
<span class="k">fn</span> <span class="nf">get_index</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">,</span> <span class="n">h1</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span> <span class="n">h2</span><span class="p">:</span> <span class="nb">u64</span><span class="p">,</span> <span class="n">k_i</span><span class="p">:</span> <span class="nb">u64</span><span class="p">)</span> <span class="k">-></span> <span class="nb">usize</span> <span class="p">{</span>
<span class="p">(</span><span class="n">h1</span><span class="nf">.wrapping_add</span><span class="p">((</span><span class="n">k_i</span><span class="p">)</span><span class="nf">.wrapping_mul</span><span class="p">(</span><span class="n">h2</span><span class="p">))</span> <span class="o">%</span> <span class="k">self</span><span class="py">.optimal_m</span><span class="p">)</span> <span class="k">as</span> <span class="nb">usize</span>
<span class="p">}</span>
<span class="c">// ...snip</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="we-are-finally-there">We are finally there</h3>
<p>🎉 🎉 🎉 We’ve just finished implementing <strong>a fast variant of a standard Bloom Filter</strong>, but there is still one thing missing: we didn’t write any tests.</p>
<p>Let’s add two simple test cases and validate our implementation.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[cfg(test)]</span>
<span class="k">mod</span> <span class="n">tests</span> <span class="p">{</span>
<span class="k">use</span> <span class="nn">super</span><span class="p">::</span><span class="o">*</span><span class="p">;</span>
<span class="nd">#[test]</span>
<span class="k">fn</span> <span class="nf">insert</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">bloom</span> <span class="o">=</span> <span class="nn">StandardBloomFilter</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">);</span>
<span class="n">bloom</span><span class="nf">.insert</span><span class="p">(</span><span class="s">"item"</span><span class="p">);</span>
<span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="n">bloom</span><span class="nf">.contains</span><span class="p">(</span><span class="s">"item"</span><span class="p">));</span>
<span class="p">}</span>
<span class="nd">#[test]</span>
<span class="k">fn</span> <span class="nf">check_and_insert</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">bloom</span> <span class="o">=</span> <span class="nn">StandardBloomFilter</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">);</span>
<span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="o">!</span><span class="n">bloom</span><span class="nf">.contains</span><span class="p">(</span><span class="s">"item_1"</span><span class="p">));</span>
<span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="o">!</span><span class="n">bloom</span><span class="nf">.contains</span><span class="p">(</span><span class="s">"item_2"</span><span class="p">));</span>
<span class="n">bloom</span><span class="nf">.insert</span><span class="p">(</span><span class="s">"item_1"</span><span class="p">);</span>
<span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="n">bloom</span><span class="nf">.contains</span><span class="p">(</span><span class="s">"item_1"</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>❯ cargo <span class="nb">test
</span>Compiling plum v0.1.2 <span class="o">(</span>/Users/onat.mercan/dev/distrentic/plum<span class="o">)</span>
Finished <span class="nb">test</span> <span class="o">[</span>unoptimized + debuginfo] target<span class="o">(</span>s<span class="o">)</span> <span class="k">in </span>0.99s
Running target/debug/deps/plum-6fc161db530d5b36
running 2 tests
<span class="nb">test </span>tests::insert ... ok
<span class="nb">test </span>tests::check_and_insert ... ok
<span class="nb">test </span>result: ok. 2 passed<span class="p">;</span> 0 failed<span class="p">;</span> 0 ignored<span class="p">;</span> 0 measured<span class="p">;</span> 0 filtered out
</code></pre></div></div>
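<p>These two tests only check that inserted items are found. A further check worth sketching is whether the measured false positive rate lands near the requested one. The code below is not the <code class="language-plaintext highlighter-rouge">plum</code> implementation: <code class="language-plaintext highlighter-rouge">MiniBloom</code> and <code class="language-plaintext highlighter-rouge">measured_fp_rate</code> are hypothetical names, a <code class="language-plaintext highlighter-rouge">Vec&lt;bool&gt;</code> replaces <code class="language-plaintext highlighter-rouge">BitVec</code> so that only the standard library is needed, and the two hash values come from <code class="language-plaintext highlighter-rouge">DefaultHasher::new()</code> with a different prefix byte instead of two <code class="language-plaintext highlighter-rouge">RandomState</code> hashers.</p>

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Std-only miniature filter, used here for measurement only.
struct MiniBloom {
    bits: Vec<bool>,
    k: u64,
}

impl MiniBloom {
    fn new(items_count: usize, fp_rate: f64) -> Self {
        let ln2 = std::f64::consts::LN_2;
        let m = (-(items_count as f64) * fp_rate.ln() / (ln2 * ln2)).ceil() as usize;
        let k = (-fp_rate.ln() / ln2).ceil() as u64;
        MiniBloom { bits: vec![false; m], k }
    }

    /// Assumption: two hash values are obtained by feeding a different
    /// prefix byte into two hashers before hashing the item.
    fn hash_pair<T: Hash + ?Sized>(item: &T) -> (u64, u64) {
        let mut first = DefaultHasher::new();
        0u8.hash(&mut first);
        item.hash(&mut first);
        let mut second = DefaultHasher::new();
        1u8.hash(&mut second);
        item.hash(&mut second);
        (first.finish(), second.finish())
    }

    /// g_i(x) = h1(x) + i * h2(x), reduced modulo the number of bits.
    fn index(&self, h1: u64, h2: u64, i: u64) -> usize {
        (h1.wrapping_add(i.wrapping_mul(h2)) % self.bits.len() as u64) as usize
    }

    fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        let (h1, h2) = Self::hash_pair(item);
        for i in 0..self.k {
            let idx = self.index(h1, h2, i);
            self.bits[idx] = true;
        }
    }

    fn contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        let (h1, h2) = Self::hash_pair(item);
        (0..self.k).all(|i| self.bits[self.index(h1, h2, i)])
    }
}

/// Inserts the numbers 0..inserted, then probes `probes` numbers that
/// were never inserted and returns the fraction reported as present.
fn measured_fp_rate(inserted: u64, probes: u64) -> f64 {
    let mut bloom = MiniBloom::new(inserted as usize, 0.01);
    for x in 0..inserted {
        bloom.insert(&x);
    }
    let false_positives = (inserted..inserted + probes)
        .filter(|x| bloom.contains(x))
        .count();
    false_positives as f64 / probes as f64
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">measured_fp_rate(1000, 10000)</code> should give a value in the neighbourhood of the requested $0.01$. The exact figure depends on the hash function, so a test built on this should assert a generous upper bound rather than an exact rate.</p>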
<p>I hope you’ve enjoyed reading this post as much as I enjoyed writing it!</p>
<p>If you find anything wrong with <a href="https://github.com/distrentic/plum">the code</a>, you can <a href="https://github.com/distrentic/plum/issues">file an issue</a> or, even better, <a href="https://github.com/distrentic/plum/pulls">submit a pull request</a>.</p>
<p><a href="https://news.ycombinator.com/item?id=24102617">Discuss it on HN</a></p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://www.stavros.io/posts/bloom-filter-search-engine/">Writing a full-text search engine using Bloom filters - Stavros’ Stuff</a></li>
<li><a href="https://llimllib.github.io/bloomfilter-tutorial/">Bloom Filters by Example</a></li>
<li><a href="https://engineering.indeedblog.com/blog/2013/10/serving-over-1-billion-documents-per-day-with-docstore-v2/">Serving over 1 billion documents per day with Docstore v2 - Indeed Engineering Blog</a></li>
<li><a href="https://github.com/wiredtiger/wiredtiger/wiki/LSMTrees-Bloom">LSMTrees+Bloom - wiredtiger/wiredtiger</a></li>
<li><a href="https://www.sciencedirect.com/science/article/pii/S2212017316301591">SLSM - A Scalable Log Structured Merge Tree with Bloom Filters for Low Latency Analytics - ScienceDirect</a></li>
<li><a href="https://nivdayan.github.io/monkey-journal.pdf">Optimal Bloom Filters and Adaptive Merging for LSM-Trees</a></li>
<li><a href="http://webdocs.cs.ualberta.ca/~drafiei/papers/DupDet06Sigmod.pdf">Approximately Detecting Duplicates for Streaming Data
using Stable Bloom Filters</a></li>
</ul>
<h2 id="resources">Resources</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p><a href="http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html">Bloom Filters - the math</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="https://en.wikipedia.org/wiki/Bloom_filter">Bloom filter - Wikipedia</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p><a href="https://bravenewgeek.com/stream-processing-and-probabilistic-methods/">Stream Processing and Probabilistic Methods: Data at Scale – Brave New Geek</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p><a href="https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf">Less Hashing, Same Performance: Building a Better Bloom Filter</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Onat Yigit MercanI am planning to create a series of blog posts that includes some literature research, implementation of various data structures and our journey of creating a distributed datastore in distrentic.io.What I learned from my failed attempt of writing baremetal android in Rust2019-04-22T12:35:34+00:002019-04-22T12:35:34+00:00https://onatm.dev/2019/04/22/what-i-learned-from-my-failed-attempt-of-writing-baremetal-android-in-rust<blockquote>
<p>This post is focused mostly on the tools that I use while I failed to write a bootable kernel image in <code class="language-plaintext highlighter-rouge">rust</code>.</p>
</blockquote>
<p>Every year I define a super ambitious goal for my learning process to keep myself motivated on the way. This year I defined my goal as <strong>writing a bootable kernel image for my old HTC One X android smartphone</strong>. I knew it was going to be hard but I never thought I’d fail in the end. It was clearly the <a href="https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect">Dunning–Kruger effect</a> that made me think that I can achieve what I want to do with my limited knowledge/experience on the subject.</p>
<h2 id="prior-work">Prior Work</h2>
<p>Let’s start by looking into the projects that have been done to run <code class="language-plaintext highlighter-rouge">baremetal</code> code on android smartphones. Unfortunately, I managed to find only <strong>two</strong> projects out in the wild.</p>
<p>The first project (<a href="https://github.com/zhuowei/nexus7-baremetal">nexus7-baremetal</a>) made me really excited because I thought nobody would ever care about writing baremetal android and also it was the only resource I had found until I gave up. The project contains some code from <a href="https://github.com/dwelch67/raspberrypi/tree/master/bootloader05">raspberrypi/bootloader05</a>. This is because of the shared type of <strong>CPU</strong> family between <strong>Raspberry Pi 2</strong> and <strong>Nexus 7</strong> (and HTC One X as well) which happens to be <a href="https://en.wikipedia.org/wiki/ARM_Cortex-A7">ARM Cortex-A7</a>.</p>
<p>The second project is <a href="https://github.com/M1cha/lktris"><code class="language-plaintext highlighter-rouge">lktris</code></a>. The only thing that makes this project interesting is that it is built on top of <a href="https://github.com/littlekernel/lk"><code class="language-plaintext highlighter-rouge">littlekernel</code></a>.</p>
<p>I wanted to try <code class="language-plaintext highlighter-rouge">nexus7-baremetal</code> project before I dive into writing my own code in <code class="language-plaintext highlighter-rouge">rust</code> but I couldn’t manage to run it successfully even though <a href="https://github.com/zhuowei/nexus7-baremetal/issues/1#issuecomment-476751437">I told the author of the project the opposite</a>. I thought it would be rude to make him waste his time on a project that he wrote 6 years ago and I wanted to do more research to understand the issue without any hand-holding.</p>
<p>I spent some time refreshing my knowledge of <code class="language-plaintext highlighter-rouge">android-ndk</code> and <code class="language-plaintext highlighter-rouge">android-sdk</code> to be able to compile and <em>unsuccessfully</em> run <code class="language-plaintext highlighter-rouge">nexus7-baremetal</code>. It’s a bit of a pain to install the standalone android toolchain on macOS, and installing platforms, platform tools and emulators is a whole other story that I don’t want to talk about. The command below shows how badly the android <code class="language-plaintext highlighter-rouge">sdkmanager</code> cli is designed:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sdkmanager <span class="s2">"system-images;android-19;google_apis;armeabi-v7a"</span>
</code></pre></div></div>
<p>If you really want to use android standalone toolchain on macOS, you can run the following:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># install android standalone toolchain</span>
brew <span class="nb">install </span>intel-haxm
brew <span class="nb">install </span>android-sdk
brew <span class="nb">install </span>android-ndk
<span class="c"># update env vars</span>
<span class="nb">export </span><span class="nv">ANDROID_HOME</span><span class="o">=</span>/usr/local/share/android-sdk
<span class="nb">export </span><span class="nv">ANDROID_NDK_HOME</span><span class="o">=</span>/usr/local/share/android-ndk
<span class="c"># update path</span>
<span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="nv">$ANDROID_HOME</span>/tools:<span class="nv">$PATH</span>
<span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="nv">$ANDROID_HOME</span>/platform-tools:<span class="nv">$PATH</span>
</code></pre></div></div>
<h1 id="what-i-learned">What I learned</h1>
<h2 id="little-kernel">Little Kernel</h2>
<p>LK (Little Kernel) is a tiny operating system suited for small embedded devices, bootloaders, and other environments where OS primitives like threads, mutexes, and timers are needed. It also initializes the most important hardware such as MMU and UART.</p>
<p>LK is the Android bootloader and is also used in <strong>Android Trusted Execution Environment</strong> - “Trusty TEE” Operating System.</p>
<p>Android bootloader supports specially packed <strong>Android Boot Images</strong> only. These files contain the <strong>kernel</strong>, a <strong>ramdisk</strong> (root filesystem) and some metadata. The file header of these images includes sizes of all packaged files and the loading address of the kernel.</p>
<p>This header has a size of <code class="language-plaintext highlighter-rouge">0x8000</code> bytes followed by the kernel image. That’s why the loading address needs to be set to <code class="language-plaintext highlighter-rouge">KERNEL_LOADING_ADDRESS</code> - <code class="language-plaintext highlighter-rouge">0x8000</code> to get LK to the right place.</p>
<h3 id="0x8000">0x8000</h3>
<p><code class="language-plaintext highlighter-rouge">0x8000</code> (<code class="language-plaintext highlighter-rouge">32K</code>) is in fact the size of an offset that leaves space for the parameter block in ARM architecture.</p>
<p>According to <a href="http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html">the ARM booting procedures</a>:</p>
<blockquote>
<p>Despite the ability to place zImage anywhere within memory, convention has it that it is loaded at the base of physical RAM plus an offset of 0x8000 (32K). This leaves space for the parameter block usually placed at offset 0x100, zero page exception vectors and page tables. This convention is very common.</p>
</blockquote>
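<p>To make the arithmetic concrete, here is a minimal sketch of the convention quoted above. The RAM base address is a made-up value for illustration only (it is not the HTC One X’s real memory map), and the function names are mine.</p>

```rust
/// Conventional offset of zImage from the base of physical RAM (32K).
const KERNEL_OFFSET: u32 = 0x8000;
/// Usual offset of the parameter block, per the quoted article.
const PARAMS_OFFSET: u32 = 0x100;

/// Conventional kernel load address for a given base of physical RAM.
fn kernel_load_address(ram_base: u32) -> u32 {
    ram_base + KERNEL_OFFSET
}

/// Conventional parameter block address for a given base of physical RAM.
fn params_address(ram_base: u32) -> u32 {
    ram_base + PARAMS_OFFSET
}

fn main() {
    // Hypothetical RAM base, chosen only for illustration.
    let ram_base: u32 = 0x8000_0000;
    println!("kernel at {:#010x}", kernel_load_address(ram_base));
    println!("params at {:#010x}", params_address(ram_base));
    println!("offset is {} KiB", KERNEL_OFFSET / 1024);
}
```

<p>Everything between the parameter block and the kernel is left free for the exception vectors and page tables mentioned in the quote.</p>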
<h2 id="rust-cross-compilation">Rust Cross Compilation</h2>
<p>It’s a bit complicated to cross-compile rust binaries on macOS for <code class="language-plaintext highlighter-rouge">armv7</code> and you probably knew it already. However, I am ignorant and stubborn and I battled my way to get a proper armv7 toolchain for my macbook. All I wanted to do was just to compile my project to <code class="language-plaintext highlighter-rouge">armv7-unknown-linux-gnueabihf</code> platform.</p>
<p>The first thing I did was madly download all the packages I could find for Homebrew because I didn’t want to deal with <a href="https://github.com/crosstool-ng/crosstool-ng">crosstool-ng</a>. Nevertheless, I ended up installing it and, after many failed attempts at building <code class="language-plaintext highlighter-rouge">armv7-rpi2-linux-gnueabihf</code>, I realized that <a href="http://crosstool-ng.github.io/docs/os-setup/#macos-aka-mac-os-x-os-x">macOS is no longer supported by crosstool-ng</a>.</p>
<p>I decided to do what any <em>sane</em> person would do and fired up a <code class="language-plaintext highlighter-rouge">vagrant</code> machine, installed all the toolchains needed and finally, the dysfunctional kernel image was compiled and linked successfully.</p>
<p>Why would I use a VM just to compile a binary? We are in 2019, right? I would have been OK if it was a container but this is a <strong>HUGE</strong> VM!</p>
<p>I went straight back to the list of Homebrew packages and figured out that the only way to compile and link my kernel image was targeting <code class="language-plaintext highlighter-rouge">armv7-unknown-linux-musleabihf</code> by installing <code class="language-plaintext highlighter-rouge">arm-linux-gnueabihf-binutils</code>. Some would disagree with my decision to use the <code class="language-plaintext highlighter-rouge">musl</code> toolchain, considering that baremetal code doesn’t need <code class="language-plaintext highlighter-rouge">libc</code>, but it was the only viable way for me at that time. If you know a better way (you probably do), please let me know, because I don’t have much knowledge about cross-compilation of low-level languages.</p>
<h3 id="rust-targets">Rust targets</h3>
<p>There is a list of all the <a href="https://forge.rust-lang.org/platform-support.html">available supported platforms</a> and you can easily add any of them by using <code class="language-plaintext highlighter-rouge">rustup</code>.</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rustup target add armv7-unknown-linux-musleabihf
</code></pre></div></div>
<p>To compile your program for a specific target you can either use <code class="language-plaintext highlighter-rouge">cargo</code> with <code class="language-plaintext highlighter-rouge">--target</code> flag:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cargo build <span class="nt">--target</span><span class="o">=</span>armv7-unknown-linux-musleabihf
</code></pre></div></div>
<p>or create <code class="language-plaintext highlighter-rouge">.cargo/config</code> file:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>build]
target <span class="o">=</span> <span class="s2">"armv7-unknown-linux-musleabihf"</span>
</code></pre></div></div>
<h3 id="cargo-binutils">cargo-binutils</h3>
<p><a href="https://github.com/rust-embedded/cargo-binutils"><code class="language-plaintext highlighter-rouge">cargo-binutils</code></a> is a pretty handy plugin if you need to use <code class="language-plaintext highlighter-rouge">LLVM</code> tools for binary inspection and manipulation. It simply proxies the LLVM tools in the <code class="language-plaintext highlighter-rouge">llvm-tools-preview</code> <code class="language-plaintext highlighter-rouge">rustup</code> component and provides subcommands to invoke any of the tools.</p>
<p>Most of the tools in <code class="language-plaintext highlighter-rouge">llvm-tools-preview</code> are LLVM alternatives to GNU <code class="language-plaintext highlighter-rouge">binutils</code>. The main advantage of these LLVM tools is that they support all the architectures that the Rust compiler supports.</p>
<h3 id="rust-inline-assembly">Rust inline assembly</h3>
<p>Currently, there are two feature-gated ways to write assembly: the <code class="language-plaintext highlighter-rouge">asm!</code> (requires <code class="language-plaintext highlighter-rouge">#![feature(asm)]</code>) and <code class="language-plaintext highlighter-rouge">global_asm!</code> (requires <code class="language-plaintext highlighter-rouge">#![feature(global_asm)]</code>) macros.</p>
<h4 id="asm">asm</h4>
<p><a href="https://doc.rust-lang.org/1.8.0/book/inline-assembly.html"><code class="language-plaintext highlighter-rouge">asm!</code></a> uses the same basic format as <code class="language-plaintext highlighter-rouge">GCC</code> uses for its own inline <code class="language-plaintext highlighter-rouge">asm</code> and restricts your inline assembly to <code class="language-plaintext highlighter-rouge">fn</code> bodies only. The syntax isn’t the best:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">asm!</span><span class="p">(</span><span class="n">assembly</span> <span class="n">template</span>
<span class="p">:</span> <span class="n">output</span> <span class="n">operands</span>
<span class="p">:</span> <span class="n">input</span> <span class="n">operands</span>
<span class="p">:</span> <span class="n">clobbers</span>
<span class="p">:</span> <span class="n">options</span>
<span class="p">);</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">assembly template</code> is the only required parameter and must be a literal string. Here’s an example (taken from the rust book):</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#![feature(asm)]</span>
<span class="k">fn</span> <span class="nf">foo</span><span class="p">()</span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="p">{</span>
<span class="nd">asm!</span><span class="p">(</span><span class="s">"NOP"</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// ...</span>
<span class="nf">foo</span><span class="p">();</span>
<span class="c">// ...</span>
<span class="p">}</span>
</code></pre></div></div>
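<p>The example above needs a nightly toolchain from around the time of writing. As a hedged aside, and assuming Rust 1.59 or later (where <code class="language-plaintext highlighter-rouge">asm!</code> is stable and uses a different operand syntax than the one shown above), roughly the same thing can be sketched without any feature gate:</p>

```rust
use std::arch::asm;

fn foo() {
    // SAFETY: a single `nop` touches no registers or memory.
    unsafe {
        asm!("nop");
    }
}

fn main() {
    // ...
    foo();
    // ...
}
```

<p>In a <code class="language-plaintext highlighter-rouge">#![no_std]</code> kernel the import would presumably be <code class="language-plaintext highlighter-rouge">core::arch::asm</code> instead.</p>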
<h4 id="global_asm">global_asm</h4>
<p><a href="https://doc.rust-lang.org/unstable-book/language-features/global-asm.html"><code class="language-plaintext highlighter-rouge">global_asm!</code></a> gives you the ability to write arbitrary assembly without the restriction of <code class="language-plaintext highlighter-rouge">fn</code> bodies.</p>
<p>A simple usage looks like this:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">global_asm!</span><span class="p">(</span><span class="nd">include_str!</span><span class="p">(</span><span class="s">"boot.S"</span><span class="p">));</span>
</code></pre></div></div>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">.section</span> <span class="s">".text.boot"</span>
<span class="nf">.globl</span> <span class="nv">_boot</span>
<span class="nl">_boot:</span>
<span class="nf">bl</span> <span class="nv">not_main</span>
<span class="nf">.section</span> <span class="nv">.text</span>
<span class="nf">.globl</span> <span class="nv">_put32</span>
<span class="nl">_put32:</span>
<span class="nf">str</span> <span class="nv">r1</span><span class="p">,[</span><span class="nv">r0</span><span class="p">]</span>
<span class="nf">bx</span> <span class="nv">lr</span>
</code></pre></div></div>
<h3 id="using-extern-functions-to-call-assembly-code">Using <code class="language-plaintext highlighter-rouge">extern</code> Functions to Call Assembly Code</h3>
<p><code class="language-plaintext highlighter-rouge">extern</code> keyword facilitates the creation and use of a <strong>Foreign Function Interface (FFI)</strong>. The below example demonstrates how to set up an integration with <code class="language-plaintext highlighter-rouge">_put32</code> function in <code class="language-plaintext highlighter-rouge">boot.S</code>.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">extern</span> <span class="s">"C"</span> <span class="p">{</span>
<span class="c">// `str r1,[r0]` stores the value in r1 at the address in r0,</span>
<span class="c">// so pass a raw address and a value rather than references.</span>
<span class="k">fn</span> <span class="mi">_</span><span class="nf">put32</span><span class="p">(</span><span class="n">addr</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">value</span><span class="p">:</span> <span class="nb">u32</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-></span> <span class="o">!</span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="p">{</span>
<span class="mi">_</span><span class="nf">put32</span><span class="p">(</span><span class="mi">0xFF002000usize</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u32</span><span class="p">,</span> <span class="mi">72</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">loop</span> <span class="p">{}</span>
<span class="p">}</span>
</code></pre></div></div>
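<p>For comparison, the single store that <code class="language-plaintext highlighter-rouge">_put32</code> performs can also be sketched in plain Rust with a volatile write. This is only an illustration: <code class="language-plaintext highlighter-rouge">put32</code> and <code class="language-plaintext highlighter-rouge">fake_register</code> are names I made up, and a local variable stands in for the memory-mapped register so the snippet runs on a normal machine.</p>

```rust
use core::ptr;

/// Stores `value` at `addr`, mirroring `str r1,[r0]` from boot.S.
///
/// # Safety
/// `addr` must be valid and properly aligned for a 4-byte write.
unsafe fn put32(addr: *mut u32, value: u32) {
    // A volatile write is never elided or reordered away by the compiler,
    // which is what we want when the target is a hardware register.
    unsafe { ptr::write_volatile(addr, value) }
}

fn main() {
    // Stand-in for a memory-mapped register.
    let mut fake_register: u32 = 0;
    unsafe { put32(&mut fake_register, 72) };
    println!("register now holds {}", fake_register);
}
```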
<h3 id="calling-rust-functions-from-assembly-code">Calling Rust Functions from Assembly Code</h3>
<p><code class="language-plaintext highlighter-rouge">extern</code> also has another usage that allows us to create an interface for other languages to call Rust functions. You need to add the <code class="language-plaintext highlighter-rouge">extern</code> keyword and specify the ABI to use just before the <code class="language-plaintext highlighter-rouge">fn</code> keyword. We also need to add a <code class="language-plaintext highlighter-rouge">#[no_mangle]</code> annotation to tell the Rust compiler not to mangle the name of this function.</p>
<p>In the below example, we make <code class="language-plaintext highlighter-rouge">not_main</code> function accessible from <code class="language-plaintext highlighter-rouge">boot.S</code> file:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[no_mangle]</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">extern</span> <span class="s">"C"</span> <span class="k">fn</span> <span class="nf">not_main</span><span class="p">()</span> <span class="k">-></span> <span class="o">!</span> <span class="p">{</span> <span class="k">loop</span> <span class="p">{}</span> <span class="p">}</span>
</code></pre></div></div>
<h2 id="epilogue">Epilogue</h2>
<p>That’s it! I consider this work as a huge win even though I failed to write a functional bootable image. I learnt to use quite useful tools on the way and now I have a better understanding around cross compilation.</p>Onat Yigit MercanThis post is focused mostly on the tools that I use while I failed to write a bootable kernel image in rust.Anatomy of a Hack assembly program - Part 22019-04-07T18:52:18+00:002019-04-07T18:52:18+00:00https://onatm.dev/2019/04/07/anatomy-of-a-hack-assembly-program-part-2<p><em>This is the second part of ‘Anatomy of a Hack assembly program’ series.</em></p>
<hr />
<ul>
<li><a href="/2019/04/05/Anatomy-of-a-Hack-assembly-program-Part-1/">First Part</a></li>
<li>Second Part</li>
</ul>
<hr />
<p>In the first part, we learnt the details of the Hack hardware platform. Now is a good time to dive deep into the <strong>Hack assembly</strong> language before we look at how binary instructions flow through the CPU.</p>
<h1 id="hack-assembly">Hack Assembly</h1>
<p>The Hack Assembly Language is minimal: it consists of two types of instructions, the <strong>A-Instruction</strong> (addressing instruction) and the <strong>C-Instruction</strong> (computation instruction). It also allows the declaration of symbols.</p>
<h2 id="a-instruction">A-Instruction</h2>
<p>Sets the contents of the A register to the specified value. The value is either a non-negative number (e.g. 3) or a Symbol. If the value is a Symbol, then the A register is set to the value that the Symbol refers to (an address), not to the data stored in that Register or Memory Location.</p>
<h3 id="syntax">Syntax</h3>
<p><code class="language-plaintext highlighter-rouge">@value</code>, where value is either a decimal non-negative number or a Symbol.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">@3</code></li>
<li><code class="language-plaintext highlighter-rouge">@R3</code></li>
<li><code class="language-plaintext highlighter-rouge">@SCREEN</code></li>
</ul>
<h3 id="binary-translation">Binary Translation</h3>
<p><code class="language-plaintext highlighter-rouge">0xxxxxxxxxxxxxxx</code>, where <code class="language-plaintext highlighter-rouge">x</code> is a bit, either 0 or 1. A-Instructions always have their MSB set to 0.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">0000000000001010</code></li>
<li><code class="language-plaintext highlighter-rouge">0111111111111111</code></li>
</ul>
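<p>The translation above can be sketched in a few lines. This is a minimal illustration and not the assembler from the book; <code class="language-plaintext highlighter-rouge">encode_a_instruction</code> is a name I made up, and it only handles numeric values (symbols would have to be resolved to numbers first).</p>

```rust
/// Encodes the A-instruction `@value` as a 16-character binary string.
/// Returns `None` when the value does not fit in the 15 available bits,
/// since the most significant bit must stay 0.
fn encode_a_instruction(value: u16) -> Option<String> {
    if value > 0x7FFF {
        return None;
    }
    Some(format!("{:016b}", value))
}

fn main() {
    println!("{:?}", encode_a_instruction(10));
    println!("{:?}", encode_a_instruction(32767));
    println!("{:?}", encode_a_instruction(32768));
}
```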
<h2 id="c-instruction">C-Instruction</h2>
<p>Performs a computation on the CPU and stores the output in a register or memory address, and then either jumps to an instruction location that is usually addressed by a symbol or continues with the next instruction.</p>
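<p>The post does not spell out the binary layout of a C-Instruction, so the sketch below assumes the layout from the nand2tetris book: <code class="language-plaintext highlighter-rouge">111a cccc ccdd djjj</code>, i.e. one <code class="language-plaintext highlighter-rouge">a</code> bit, six computation bits, three destination bits and three jump bits. The struct and function names are hypothetical.</p>

```rust
/// Fields of a C-instruction, assuming the layout `111a cccc ccdd djjj`
/// (most significant bit first).
#[derive(Debug, PartialEq)]
struct CInstruction {
    a: bool,  // false: ALU reads the A register, true: ALU reads M (RAM[A])
    comp: u8, // six ALU control bits
    dest: u8, // three destination bits (A, D, M)
    jump: u8, // three jump condition bits
}

/// Splits a 16-bit word into C-instruction fields.
/// Returns `None` for A-instructions (MSB = 0).
fn decode_c_instruction(word: u16) -> Option<CInstruction> {
    if word & 0x8000 == 0 {
        return None;
    }
    Some(CInstruction {
        a: word & 0x1000 != 0,
        comp: ((word >> 6) & 0x3F) as u8,
        dest: ((word >> 3) & 0x07) as u8,
        jump: (word & 0x07) as u8,
    })
}

fn main() {
    // `0;JMP` assembles to 1110101010000111 under the assumed layout.
    println!("{:?}", decode_c_instruction(0b1110_1010_1000_0111));
}
```

<p>Here the jump bits <code class="language-plaintext highlighter-rouge">111</code> mean an unconditional jump, and the destination bits <code class="language-plaintext highlighter-rouge">000</code> mean the result is not stored anywhere.</p>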
<h2 id="symbols">Symbols</h2>
<p>Symbols can be either variables or labels. Variables are symbolic names for memory addresses that make accessing these addresses easier. Labels are symbolic names for instruction addresses that make jumps in the program easier to handle. There are three ways to introduce symbols into an assembly program: predefined symbols, label symbols, and variable symbols.</p>
<h3 id="predefined-symbols">Predefined Symbols</h3>
<p>A special subset of <strong>RAM</strong> addresses can be referred to by any assembly program.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">SP</code>: RAM address 0</li>
<li><code class="language-plaintext highlighter-rouge">LCL</code>: RAM address 1</li>
<li><code class="language-plaintext highlighter-rouge">ARG</code>: RAM address 2</li>
<li><code class="language-plaintext highlighter-rouge">THIS</code>: RAM address 3</li>
<li><code class="language-plaintext highlighter-rouge">THAT</code>: RAM address 4</li>
<li><code class="language-plaintext highlighter-rouge">R0</code>-<code class="language-plaintext highlighter-rouge">R15</code>: Addresses of 16 RAM Registers, mapped from 0 to 15</li>
<li><code class="language-plaintext highlighter-rouge">SCREEN</code>: Base address of the Screen Map in Main Memory, which is equal to 16384</li>
<li><code class="language-plaintext highlighter-rouge">KBD</code>: Keyboard Register address in Main Memory, which is equal to 24576</li>
</ul>
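<p>An assembler would typically start from a symbol table that already contains the entries listed above. A minimal sketch, where <code class="language-plaintext highlighter-rouge">predefined_symbols</code> is a hypothetical helper name:</p>

```rust
use std::collections::HashMap;

/// Builds a symbol table pre-populated with the predefined Hack symbols.
fn predefined_symbols() -> HashMap<String, u16> {
    let mut table: HashMap<String, u16> = HashMap::new();
    let named: [(&str, u16); 7] = [
        ("SP", 0),
        ("LCL", 1),
        ("ARG", 2),
        ("THIS", 3),
        ("THAT", 4),
        ("SCREEN", 16384),
        ("KBD", 24576),
    ];
    for (name, address) in named {
        table.insert(name.to_string(), address);
    }
    // R0..R15 map to RAM addresses 0..15.
    for i in 0..16u16 {
        table.insert(format!("R{}", i), i);
    }
    table
}

fn main() {
    let table = predefined_symbols();
    println!("SCREEN -> {}", table["SCREEN"]);
    println!("R3 -> {}", table["R3"]);
}
```

<p>Note that <code class="language-plaintext highlighter-rouge">SP</code> and <code class="language-plaintext highlighter-rouge">R0</code> (and so on up to <code class="language-plaintext highlighter-rouge">THAT</code> and <code class="language-plaintext highlighter-rouge">R4</code>) are two names for the same address.</p>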
<h3 id="label-symbols">Label Symbols</h3>
<p>To declare a label we need to use the command <code class="language-plaintext highlighter-rouge">(LABEL_NAME)</code>, where <strong>LABEL_NAME</strong> can be any name we desire to have for the label, as long as it’s wrapped between parentheses.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(LOOP)
// instruction 1
// instruction 2
// instruction 3
@LOOP
0;JMP
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">(LOOP)</code> declares a new label called <strong>LOOP</strong>, which resolves to the address of the next instruction on the following line. The instruction <code class="language-plaintext highlighter-rouge">@LOOP</code> is an <strong>A-Instruction</strong> that sets the contents of the A register to the instruction address the label refers to.</p>
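<p>Resolving labels is usually done in a first pass over the source, before any instruction is translated. The sketch below is my own simplified take (the function name and the comment handling are assumptions): a label line takes no space in instruction memory, so it simply records the address of whatever instruction comes next.</p>

```rust
use std::collections::HashMap;

/// First pass: maps every `(LABEL)` to the address of the next instruction.
fn resolve_labels(source: &str) -> HashMap<String, u16> {
    let mut labels = HashMap::new();
    let mut address: u16 = 0; // address of the next real instruction
    for line in source.lines() {
        // Drop `//` comments and surrounding whitespace.
        let code = line.split("//").next().unwrap_or("").trim();
        if code.is_empty() {
            continue;
        }
        let label = code
            .strip_prefix('(')
            .and_then(|rest| rest.strip_suffix(')'));
        match label {
            // A label declaration occupies no instruction slot.
            Some(name) => {
                labels.insert(name.to_string(), address);
            }
            // Any other line is an A- or C-instruction.
            None => address += 1,
        }
    }
    labels
}

fn main() {
    let program = "@i\nM=0\n(LOOP)\n@i\nM=M+1\n@LOOP\n0;JMP\n";
    let labels = resolve_labels(program);
    println!("LOOP -> {}", labels["LOOP"]);
}
```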
<h3 id="variable-symbols">Variable Symbols</h3>
<p>Any user-defined symbol <code class="language-plaintext highlighter-rouge">@variable</code> that is neither a predefined symbol nor declared elsewhere as a label with the <code class="language-plaintext highlighter-rouge">(variable)</code> command is treated as a variable, and is assigned a unique memory address, starting at <strong>RAM</strong> address 16 (0x0010).</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@i
M=0
</code></pre></div></div>
<p>The instruction <code class="language-plaintext highlighter-rouge">@i</code> declares a variable <strong>i</strong> and stores its address in the <strong>A register</strong>. The instruction <code class="language-plaintext highlighter-rouge">M=0</code> then sets the memory location of <strong>i</strong> in RAM to 0.</p>
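<p>Variable allocation can be sketched as a lookup that hands out the next free address on first use. The <code class="language-plaintext highlighter-rouge">SymbolTable</code> type below is hypothetical; in a real assembler the map would presumably already hold the predefined symbols and the labels before any variable is allocated.</p>

```rust
use std::collections::HashMap;

/// Symbol table that allocates variables from RAM address 16 upwards.
struct SymbolTable {
    symbols: HashMap<String, u16>,
    next_variable: u16,
}

impl SymbolTable {
    fn new() -> Self {
        SymbolTable {
            symbols: HashMap::new(),
            next_variable: 16,
        }
    }

    /// Returns the address of `name`, allocating a new one on first use.
    fn address_of(&mut self, name: &str) -> u16 {
        if let Some(&address) = self.symbols.get(name) {
            return address;
        }
        let address = self.next_variable;
        self.symbols.insert(name.to_string(), address);
        self.next_variable += 1;
        address
    }
}

fn main() {
    let mut table = SymbolTable::new();
    println!("i   -> {}", table.address_of("i"));
    println!("sum -> {}", table.address_of("sum"));
    println!("i   -> {}", table.address_of("i"));
}
```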
<hr />
<p>That’s it for the second part. In the next part I will explain how the CU (<em>control unit</em>) decodes an instruction and how the decoded instruction flows through the CPU.</p>Onat Yigit MercanThis is the second part of ‘Anatomy of a Hack assembly program’ series.Anatomy of a Hack assembly program - Part 12019-04-05T15:16:27+00:002019-04-05T15:16:27+00:00https://onatm.dev/2019/04/05/anatomy-of-a-hack-assembly-program-part-1<p><em>This blog series is based on nand2tetris book.</em></p>
<blockquote>
<p>I don’t have comprehensive knowledge of hardware or low-level programming. However, I have been learning about this long-mysterious part of computers since last summer. I will do my best to explain how a Hack assembly program is translated into binary instructions and how the Hack machine processes a single instruction in a <strong>fetch and execute</strong> loop.</p>
</blockquote>
<hr />
<ul>
<li>First Part</li>
<li><a href="/2019/04/07/Anatomy-of-a-Hack-assembly-program-Part-2">Second Part</a></li>
</ul>
<hr />
<p>I was always fascinated by how the operating system orchestrates all the components on a computer but I’ve never previously had the chance to learn the low-level details of this hidden world. Since last summer, I’ve started to explore and uncover the details of this beautiful yet complex landscape and I want to share what I learnt so far from the books I read.</p>
<p>The first book I started to read was <strong>the Elements of Computing Systems</strong> (AKA <em>nand2tetris</em>) which has amazing content that uncovers most of the topics I always wanted to learn. In order to reinforce what I learnt from the book, I decided to write about how a Hack assembly program flows through hardware. I will try to do my best to explain the details and emphasize the parts that I think are really crucial.</p>
<p>Before we dive into a Hack assembly program, let’s look into the specification of the Hack hardware platform.</p>
<h1 id="the-hack-hardware-platform-specification">The Hack Hardware Platform Specification</h1>
<p>The Hack platform is a <strong>16-bit von Neumann machine</strong>, designed to execute programs written in the Hack machine language. In order to do so, the Hack platform consists of a <strong>CPU</strong>, two separate memory modules serving as <strong>instruction memory</strong> and <strong>data memory</strong>, and <strong>two memory-mapped I/O devices</strong>: a screen and a keyboard.</p>
<p>The Hack CPU consists of the <strong>ALU</strong> and three registers called <strong>data register (D), address register (A), and program counter (PC)</strong>. While the <strong>D-register</strong> is used solely for storing data values, the <strong>A-register</strong> serves three different purposes, depending on the context in which it is used: storing a data value (just like the D-register), pointing at an address in the instruction memory, or pointing at an address in the data memory.</p>
<h2 id="cpu---parts">CPU - Parts</h2>
<p>In order to implement the Hack CPU, we need an ALU chip capable of computing arithmetic/logical functions, a set of registers, a program counter, and some additional gates (Control Unit) designed to help decode, execute, and fetch instructions.</p>
<h3 id="alu-arithmetic-logic-unit">ALU (Arithmetic Logic Unit)</h3>
<p>This is the part where actual processing (or the magic) happens. The Hack ALU computes a fixed set of functions <code class="language-plaintext highlighter-rouge">out = fi(x, y)</code> where x and y are the chip’s two 16-bit inputs, <code class="language-plaintext highlighter-rouge">out</code> is the chip’s 16-bit output, and <code class="language-plaintext highlighter-rouge">fi</code> is an arithmetic or logical function selected from 18 possible functions. We instruct the ALU which function to compute by setting six input bits, called control bits. The ALU can potentially compute 64 (2^6) different functions.</p>
<p><a href="https://en.wikipedia.org/wiki/Two%27s_complement">Two’s complement</a> is used as the method of signed number representation. It allows computing of operations such as <code class="language-plaintext highlighter-rouge">x-1</code> with ease: When <code class="language-plaintext highlighter-rouge">zy</code> and <code class="language-plaintext highlighter-rouge">ny</code> bits are <code class="language-plaintext highlighter-rouge">1</code>, the <code class="language-plaintext highlighter-rouge">y</code> input is first zeroed, and then negated bit-wise. Bit-wise negation of zero gives the 2’s complement binary value of <code class="language-plaintext highlighter-rouge">-1</code>.</p>
<h4 id="specification">Specification</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Chip name: ALU
Inputs: x[16], y[16], // Two 16-bit data inputs
zx, // Zero the x input
nx, // Negate the x input
zy, // Zero the y input
ny, // Negate the y input
f, // Function code: 1 for Add, 0 for And
no // Negate the out output
Outputs: out[16], // 16-bit output
zr, // True iff out=0
ng // True iff out<0
Function: if zx then x = 0 // 16-bit zero constant
if nx then x = !x // Bit-wise negation
if zy then y = 0 // 16-bit zero constant
if ny then y = !y // Bit-wise negation
if f then out = x + y // Integer 2's complement addition
else out = x & y // Bit-wise And
if no then out = !out // Bit-wise negation
if out=0 then zr = 1 else zr = 0 // 16-bit eq. comparison
if out<0 then ng = 1 else ng = 0 // 16-bit neg. comparison
Comment: Overflow is neither detected nor handled.
</code></pre></div></div>
<p>The above specification gives a clear idea of the implementation of the ALU. We only need a 16-bit Adder chip and a couple of logic gates including 16-bit Multiplexor, 16-bit NOT, 16-bit AND, 8-way OR, OR, and NOT.</p>
<p><img src="/assets/images/hack_alu.png" alt="hack alu" /></p>
<p>Figure 1: Arithmetic Logic Unit. (Taken from The Elements of Computing Systems, <a href="https://docs.wixstatic.com/ugd/44046b_f0eaab042ba042dcb58f3e08b46bb4d7.pdf">Chapter 2</a>)</p>
<p>The ALU computes one of the following functions: <code class="language-plaintext highlighter-rouge">x+y</code>, <code class="language-plaintext highlighter-rouge">x-y</code>, <code class="language-plaintext highlighter-rouge">y-x</code>, <code class="language-plaintext highlighter-rouge">0</code>, <code class="language-plaintext highlighter-rouge">1</code>, <code class="language-plaintext highlighter-rouge">-1</code>, <code class="language-plaintext highlighter-rouge">x</code>, <code class="language-plaintext highlighter-rouge">y</code>, <code class="language-plaintext highlighter-rouge">-x</code>, <code class="language-plaintext highlighter-rouge">-y</code>, <code class="language-plaintext highlighter-rouge">!x</code>, <code class="language-plaintext highlighter-rouge">!y</code>, <code class="language-plaintext highlighter-rouge">x+1</code>, <code class="language-plaintext highlighter-rouge">y+1</code>, <code class="language-plaintext highlighter-rouge">x-1</code>, <code class="language-plaintext highlighter-rouge">y-1</code>, <code class="language-plaintext highlighter-rouge">x&y</code>, <code class="language-plaintext highlighter-rouge">x|y</code> on two 16-bit inputs, according to 6 input bits denoted by <code class="language-plaintext highlighter-rouge">zx</code>, <code class="language-plaintext highlighter-rouge">nx</code>, <code class="language-plaintext highlighter-rouge">zy</code>, <code class="language-plaintext highlighter-rouge">ny</code>, <code class="language-plaintext highlighter-rouge">f</code>, <code class="language-plaintext highlighter-rouge">no</code>.
In addition, ALU computes two 1-bit outputs: if ALU output is <code class="language-plaintext highlighter-rouge">0</code> then <code class="language-plaintext highlighter-rouge">zr</code> is set to <code class="language-plaintext highlighter-rouge">1</code>, otherwise <code class="language-plaintext highlighter-rouge">zr</code> is set to <code class="language-plaintext highlighter-rouge">0</code>; if <code class="language-plaintext highlighter-rouge">out<0</code> then <code class="language-plaintext highlighter-rouge">ng</code> is set to <code class="language-plaintext highlighter-rouge">1</code> otherwise <code class="language-plaintext highlighter-rouge">ng</code> is set to <code class="language-plaintext highlighter-rouge">0</code>.</p>
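<p>To get a feel for how six control bits can produce an arithmetic function, here is a small worked example for <code class="language-plaintext highlighter-rouge">x-y</code>. It assumes the control-bit combination from the book's ALU truth table (<code class="language-plaintext highlighter-rouge">zx=0</code>, <code class="language-plaintext highlighter-rouge">nx=1</code>, <code class="language-plaintext highlighter-rouge">zy=0</code>, <code class="language-plaintext highlighter-rouge">ny=0</code>, <code class="language-plaintext highlighter-rouge">f=1</code>, <code class="language-plaintext highlighter-rouge">no=1</code>), which is not reproduced in this post, and the 2's complement identity $!a = -a - 1$:</p>

$$!(!x + y) = -(!x + y) - 1 = -((-x - 1) + y) - 1 = x - y$$

<p>In other words, the ALU negates <code class="language-plaintext highlighter-rouge">x</code> bitwise, adds <code class="language-plaintext highlighter-rouge">y</code>, and negates the result bitwise, so no dedicated subtractor is needed.</p>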
<p>Below is an example implementation of the ALU in <a href="https://en.wikipedia.org/wiki/Hardware_description_language">HDL (hardware description language)</a>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// This file is part of www.nand2tetris.org
// and the book "The Elements of Computing Systems"
// by Nisan and Schocken, MIT Press.
// File name: projects/02/ALU.hdl
// Implementation: the ALU manipulates the x and y
// inputs and then operates on the resulting values,
// as follows:
// if (zx==1) set x = 0 // 16-bit constant
// if (nx==1) set x = ~x // bitwise "not"
// if (zy==1) set y = 0 // 16-bit constant
// if (ny==1) set y = ~y // bitwise "not"
// if (f==1) set out = x + y // integer 2's complement addition
// if (f==0) set out = x & y // bitwise "and"
// if (no==1) set out = ~out // bitwise "not"
// if (out==0) set zr = 1
// if (out<0) set ng = 1
CHIP ALU {
IN
x[16], y[16], // 16-bit inputs
zx, // zero the x input
nx, // negate the x input
zy, // zero the y input
ny, // negate the y input
f, // compute out = x + y (if 1) or out = x & y (if 0)
no; // negate the out output
OUT
out[16], // 16-bit output
zr, // 1 if (out==0), 0 otherwise
ng; // 1 if (out<0), 0 otherwise
PARTS:
// if (zx==1) set x = 0
Mux16(a=x,b=false,sel=zx,out=zxout);
// if (zy==1) set y = 0
Mux16(a=y,b=false,sel=zy,out=zyout);
// if (nx==1) set x = ~x
// if (ny==1) set y = ~y
Not16(in=zxout,out=notx);
Not16(in=zyout,out=noty);
Mux16(a=zxout,b=notx,sel=nx,out=nxout);
Mux16(a=zyout,b=noty,sel=ny,out=nyout);
// if (f==1) set out = x + y
// if (f==0) set out = x & y
Add16(a=nxout,b=nyout,out=addout);
And16(a=nxout,b=nyout,out=andout);
Mux16(a=andout,b=addout,sel=f,out=fout);
// if (no==1) set out = ~out
// 1 if (out<0), 0 otherwise
Not16(in=fout,out=nfout);
Mux16(a=fout,b=nfout,sel=no,out=out,out[0..7]=zr1,out[8..15]=zr2,out[15]=ng);
// 1 if (out==0), 0 otherwise
Or8Way(in=zr1,out=or1);
Or8Way(in=zr2,out=or2);
Or(a=or1,b=or2,out=or3);
Not(in=or3,out=zr);
}
</code></pre></div></div>
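<p>If you want to play with the ALU without the hardware simulator, its behaviour can be sketched in a few lines of Python. This is only a rough software model of the steps listed in the HDL comments above, not a replacement for the chip; the names <code class="language-plaintext highlighter-rouge">alu</code>, <code class="language-plaintext highlighter-rouge">to_signed</code> and <code class="language-plaintext highlighter-rouge">MASK</code> are my own and do not come from the book.</p>

```python
MASK = 0xFFFF  # keep every value within 16 bits


def alu(x, y, zx, nx, zy, ny, f, no):
    """Return (out, zr, ng) for two 16-bit inputs and six control bits."""
    if zx:
        x = 0                # zero the x input
    if nx:
        x = ~x & MASK        # bitwise not of x
    if zy:
        y = 0                # zero the y input
    if ny:
        y = ~y & MASK        # bitwise not of y
    # f selects between 2's complement addition and bitwise and
    out = (x + y) & MASK if f else x & y
    if no:
        out = ~out & MASK    # bitwise not of the output
    zr = 1 if out == 0 else 0
    ng = out >> 15           # the most significant bit is the sign bit
    return out, zr, ng


def to_signed(value):
    """Interpret a 16-bit pattern as a 2's complement integer."""
    return value - 0x10000 if value >> 15 else value


# x - y is selected by zx=0, nx=1, zy=0, ny=0, f=1, no=1
out, zr, ng = alu(5, 7, 0, 1, 0, 0, 1, 1)
print(to_signed(out), zr, ng)
```

<p>Masking with <code class="language-plaintext highlighter-rouge">0xFFFF</code> after every step stands in for the fixed 16-bit width of the real buses, since Python integers are unbounded.</p>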
<h3 id="registers">Registers</h3>
<p>I am going to skip the specification and the implementation of the registers, since this post only covers the computational part of the Hack platform. However, it is still useful to know about the types of registers that physically reside inside the CPU.</p>
<h4 id="data-register">Data Register</h4>
<p>The Data Register holds memory contents that are being transferred from the immediate access storage to other components, or vice versa.</p>
<h4 id="addressing-register">Addressing Register</h4>
<p>The Addressing Register holds the memory address of the data that needs to be accessed. When reading from memory, the data addressed by the addressing register is fed into the data register and then used by the CPU.</p>
<h4 id="program-counter-instruction-pointer">Program Counter (Instruction Pointer)</h4>
<p>The Program Counter holds the memory address of the next instruction to be executed.</p>
<h3 id="control-unit">Control Unit</h3>
<p>The Control Unit controls the flow of data between the CPU and other components. It is contained within the CPU and is responsible for decoding the instructions and figuring out which instruction to fetch and execute next.</p>
<h2 id="cpu---specification">CPU - Specification</h2>
<p>The Hack platform’s CPU is designed to execute 16-bit instructions according to the Hack machine language specification. The CPU should be connected to two separate memory modules: instruction memory (ROM) and data memory (RAM).</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Chip Name: CPU // Central Processing Unit
Inputs: inM[16], // M value input (M = contents of RAM[A])
instruction[16], // Instruction for execution
reset // Signals whether to restart the current
// program (reset=1) or continue executing
// the current program (reset=0)
Outputs: outM[16], // M value output
writeM, // Write to M?
addressM[15], // Address of M in data memory
pc[15] // Address of next instruction
</code></pre></div></div>
<p>The figure below shows the proposed CPU implementation. It does not show the <em>control logic</em>, except for the inputs and outputs of control bits, labeled with a circled “c”.</p>
<p><img src="/assets/images/hack_cpu.png" alt="hack cpu" /></p>
<p>Figure 2: Central Processing Unit. (Taken from The Elements of Computing Systems, <a href="https://docs.wixstatic.com/ugd/44046b_b2cad2eea33847869b86c541683551a7.pdf">Chapter 5</a>)</p>
<p>The CPU executes the given instruction according to the Hack assembly language specification. <code class="language-plaintext highlighter-rouge">D</code> and <code class="language-plaintext highlighter-rouge">A</code> refer to CPU-resident registers while <code class="language-plaintext highlighter-rouge">M</code> refers to the external memory location addressed by <code class="language-plaintext highlighter-rouge">A</code>, i.e. to <code class="language-plaintext highlighter-rouge">RAM[A]</code>. <code class="language-plaintext highlighter-rouge">inM</code> holds the value of this location. If the current instruction needs to write a value to M, the value is placed in <code class="language-plaintext highlighter-rouge">outM</code>, the address of the target location is placed in the <code class="language-plaintext highlighter-rouge">addressM</code> output, and the <code class="language-plaintext highlighter-rouge">writeM</code> control bit is asserted.</p>
<p>The <code class="language-plaintext highlighter-rouge">outM</code> and <code class="language-plaintext highlighter-rouge">writeM</code> outputs are combinational: they are affected instantaneously by the execution of the current instruction. The <code class="language-plaintext highlighter-rouge">addressM</code> and <code class="language-plaintext highlighter-rouge">pc</code> outputs are clocked: they commit to their new values only in the next time unit. If <code class="language-plaintext highlighter-rouge">reset=1</code> then the CPU jumps to address 0 (i.e. sets <code class="language-plaintext highlighter-rouge">pc</code> to 0 in the next time unit) rather than to the address resulting from executing the current instruction.</p>
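<p>The rule for the <code class="language-plaintext highlighter-rouge">pc</code> output can be sketched as a tiny next-state function in Python. This is a rough model under my own assumptions: the helper name <code class="language-plaintext highlighter-rouge">pc_next</code> and its parameters are hypothetical, and the "otherwise increment" branch assumes the counter always advances when it neither resets nor jumps.</p>

```python
ADDRESS_MASK = 0x7FFF  # the pc output is 15 bits wide


def pc_next(current, a_value, jump, reset):
    """Value the program counter commits to in the next time unit."""
    if reset:
        return 0                         # reset wins over everything else
    if jump:
        return a_value & ADDRESS_MASK    # jump to the address held in A
    return (current + 1) & ADDRESS_MASK  # otherwise fetch the next instruction


print(pc_next(10, 42, jump=0, reset=0))
print(pc_next(10, 42, jump=1, reset=0))
print(pc_next(10, 42, jump=1, reset=1))
```

<p>The order of the checks matters: <code class="language-plaintext highlighter-rouge">reset</code> is tested first so that it overrides a jump requested by the current instruction.</p>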
<p>This is an example implementation of the CPU in HDL:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// This file is part of www.nand2tetris.org
// and the book "The Elements of Computing Systems"
// by Nisan and Schocken, MIT Press.
// File name: projects/05/CPU.hdl
CHIP CPU {
IN inM[16], // M value input (M = contents of RAM[A])
instruction[16], // Instruction for execution
reset; // Signals whether to re-start the current
// program (reset=1) or continue executing
// the current program (reset=0).
OUT outM[16], // M value output
writeM, // Write into M?
addressM[15], // Address in data memory (of M)
pc[15]; // address of next instruction
PARTS:
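//A-instruction (instruction[15]=0): feed the instruction itself into A
//C-instruction (instruction[15]=1): feed the ALU output into A
Mux16(a=instruction,b=ALUout,sel=instruction[15],out=Ain);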
Not(in=instruction[15],out=notinstruction);
//RegisterA
//when instruction[15] = 0 it is an A-instruction (@value), so A should load the value;
//a C-instruction loads A only when its d1 bit (instruction[5]) is set
Or(a=notinstruction,b=instruction[5],out=loadA);//d1
ARegister(in=Ain,load=loadA,out=Aout,out[0..14]=addressM);
Mux16(a=Aout,b=inM,sel=instruction[12],out=AMout);
//Prepare for ALU: for an A-instruction the control bits are forced to
//zx=0,nx=0,zy=1,ny=1,f=0,no=0, so the ALU just returns D
And(a=instruction[11],b=instruction[15],out=zx);//c1
And(a=instruction[10],b=instruction[15],out=nx);//c2
Or(a=instruction[9],b=notinstruction,out=zy);//c3
Or(a=instruction[8],b=notinstruction,out=ny);//c4
And(a=instruction[7],b=instruction[15],out=f);//c5
And(a=instruction[6],b=instruction[15],out=no);//c6
ALU(x=Dout,y=AMout,zx=zx,nx=nx,zy=zy,ny=ny,f=f,no=no,out=outM,out=ALUout,zr=zero,ng=neg);
//when it is a C-instruction and d3 is set, write M
And(a=instruction[15],b=instruction[3],out=writeM);//d3
//RegisterD: when it is a C-instruction and d2 is set, load D
And(a=instruction[15],b=instruction[4],out=loadD);//d2
DRegister(in=ALUout,load=loadD,out=Dout);
//Prepare for jump
//the ALU output is positive when it is neither zero nor negative
Or(a=zero,b=neg,out=notpos);
Not(in=notpos,out=pos);
And(a=instruction[0],b=pos,out=j3);//j3
And(a=instruction[1],b=zero,out=j2);//j2
And(a=instruction[2],b=neg,out=j1);//j1
Or(a=j1,b=j2,out=j12);
Or(a=j12,b=j3,out=j123);
And(a=j123,b=instruction[15],out=jump);
//when jumping, load Aout into the PC; otherwise the PC increments
PC(in=Aout,load=jump,reset=reset,inc=true,out[0..14]=pc);
}
</code></pre></div></div>
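<p>The bit positions used in the HDL above (<code class="language-plaintext highlighter-rouge">instruction[15]</code>, <code class="language-plaintext highlighter-rouge">instruction[12]</code>, <code class="language-plaintext highlighter-rouge">instruction[6..11]</code>, <code class="language-plaintext highlighter-rouge">instruction[3..5]</code>, <code class="language-plaintext highlighter-rouge">instruction[0..2]</code>) are easier to follow with a software decoder next to them. The Python sketch below mirrors the same field layout and the same jump gates. The helper names <code class="language-plaintext highlighter-rouge">decode</code> and <code class="language-plaintext highlighter-rouge">should_jump</code> are hypothetical, and the sample encoding of <code class="language-plaintext highlighter-rouge">D=D+A</code> is taken from the Hack machine language tables rather than from this post.</p>

```python
def decode(instruction):
    """Split a 16-bit Hack instruction into its fields."""
    if (instruction >> 15) & 1 == 0:
        # A-instruction: the lower 15 bits are the value loaded into A
        return {"type": "A", "value": instruction & 0x7FFF}
    return {
        "type": "C",
        "a": (instruction >> 12) & 1,       # ALU y input: A (0) or M (1)
        "comp": (instruction >> 6) & 0x3F,  # c1..c6 map to zx, nx, zy, ny, f, no
        "dest": (instruction >> 3) & 0x7,   # d1 d2 d3 map to A, D, M
        "jump": instruction & 0x7,          # j1 j2 j3: negative, zero, positive
    }


def should_jump(jump, zr, ng):
    """Mirror the j1/j2/j3 gates for a C-instruction."""
    pos = not (zr or ng)  # positive means neither zero nor negative
    j1, j2, j3 = (jump >> 2) & 1, (jump >> 1) & 1, jump & 1
    return bool((j1 and ng) or (j2 and zr) or (j3 and pos))


print(decode(0b1110000010010000))  # D=D+A
print(decode(0b0000000000010101))  # @21
print(should_jump(0b001, zr=0, ng=0))
```

<p>Note that <code class="language-plaintext highlighter-rouge">should_jump</code> only covers C-instructions; the HDL additionally ANDs the result with <code class="language-plaintext highlighter-rouge">instruction[15]</code> so that an A-instruction can never trigger a jump.</p>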
<p>That’s it for the first part! We’ve done a great job so far. I know it was overwhelming, but all of the above information was necessary to understand how the binary instructions flow through the <em>control unit</em>. In the next part I will explain the Hack assembly language and how its instructions are translated into binary.</p>
Onat Yigit Mercan
This blog series is based on the nand2tetris book.
Back to the blogging
2019-04-04T14:40:14+00:00
https://onatm.dev/2019/04/04/back-to-the-blogging
<p>I finally convinced myself to start writing a blog. It’s been years since I last invested time in keeping a proper log of my personal learning journey.</p>
<p>I’m planning to write mostly about <code class="language-plaintext highlighter-rouge">low-level programming</code> and my beloved programming language <code class="language-plaintext highlighter-rouge">Rust</code>. I’ve already got a series of blog posts waiting in line, dedicated to the <code class="language-plaintext highlighter-rouge">Hack</code> assembly language.</p>
<p><a href="https://hexo.io/"><code class="language-plaintext highlighter-rouge">hexo</code></a> and <a href="https://www.netlify.com/"><code class="language-plaintext highlighter-rouge">netlify</code></a> played a big role in my decision to start writing a blog. Although <code class="language-plaintext highlighter-rouge">hexo</code> is written in <code class="language-plaintext highlighter-rouge">javascript</code>, I found it quite suitable for a “tech” blog thanks to its immense variety of themes. It also supports <strong>GitHub Flavored Markdown</strong>, which makes it quite easy to move my notes from <a href="https://boostnote.io/"><code class="language-plaintext highlighter-rouge">Boostnote</code></a> to this blog.</p>
<p>Peace out!</p>
Onat Yigit Mercan