This document describes the Hakeva core design for the Verilog implementation to be configured in an FPGA.

{{>toc}}


h1. Internal RAM

*Design subject to change:
This module currently implements asynchronous reads, so synthesis cannot infer internal BRAM blocks. The module would have to be ported to synchronous reads, and its testbench updated, to make it compatible with internal BRAM(Block RAM) and with higher clock frequencies.*

This module is a DP(Dual-Port)-SC(Single-Clock)-SW(Synchronous Write)-AR(Asynchronous Read) RAM. It can be preloaded with an image file and used as a ROM (if WE0 and WE1 are both forced low).

The read output is continuously assigned, so after a write the read port immediately returns the new value. The module interface has to remain compatible with the external RAM (SDRAM, DDR2, DDR3, ...) design.
If both ports write to the same address in the same clock cycle, the stored value is undetermined. *NEVER do that!*
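
The behavior above can be captured in a small cycle-level reference model, for example as a golden model for the testbench. This is a Python sketch, not the Verilog module: the class name @DualPortRam@, its @read@/@clock@ methods, and the choice to raise an error on a same-address write collision (instead of storing an undetermined value) are all assumptions.

```python
class DualPortRam:
    """Cycle-level model of a DP-SC-SW-AR RAM (hypothetical reference model)."""

    def __init__(self, addr_width, data_width, image=None):
        self.depth = 1 << addr_width
        self.data_mask = (1 << data_width) - 1
        self.mem = [0] * self.depth
        # Optional preload, like loading an image file at configuration time.
        for addr, word in enumerate((image or [])[: self.depth]):
            self.mem[addr] = word & self.data_mask

    def read(self, addr):
        # Asynchronous read: returns the current contents with no clock delay,
        # so a word written on the last clock edge is visible immediately.
        # The modulo mimics dropping address bits above addr_width.
        return self.mem[addr % self.depth]

    def clock(self, we0=False, addr0=0, din0=0, we1=False, addr1=0, din1=0):
        # Synchronous write: both ports are sampled on the same rising edge.
        a0, a1 = addr0 % self.depth, addr1 % self.depth
        if we0 and we1 and a0 == a1:
            # The hardware stores an undetermined value here; the model refuses.
            raise ValueError("both ports write the same address in one cycle")
        if we0:
            self.mem[a0] = din0 & self.data_mask
        if we1:
            self.mem[a1] = din1 & self.data_mask
```

Using the model as a ROM simply means never calling @clock@ with a write enable set, mirroring WE0 and WE1 forced low.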

Inspired by:
* http://www.asic-world.com/examples/verilog/memories.html
* https://www.verilog.pro/memories.html
* http://techmasterplus.com/verilog/verilog-ram.php
* https://www.chipverify.com/verilog/verilog-arrays-memories
* https://opencores.org/projects?expanded=Memory%20core&language=Verilog



h1. FIFO

The FIFO reuses the internal RAM module and has the same size limitation: the depth must be a power of 2. Furthermore, only @(2**ADDR_WIDTH)-1@ entries can be used, because one slot stays empty to distinguish a full FIFO from an empty one. When the FIFO is empty, it cannot support a read and a write in the same clock cycle: the read returns an undetermined value.
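
The pointer arithmetic behind these limitations can be illustrated with a cycle-level Python model. It assumes a read pointer and a write pointer of ADDR_WIDTH bits over the RAM, the usual way to end up with a capacity of @(2**ADDR_WIDTH)-1@. The class name, method names, and the choice to return @None@ (rather than an undetermined value) for a read on an empty FIFO are illustrative assumptions.

```python
class SyncFifo:
    """Cycle-level FIFO model on top of a 2**addr_width-word RAM (hypothetical)."""

    def __init__(self, addr_width):
        self.depth = 1 << addr_width
        self.mem = [0] * self.depth
        self.rptr = 0  # next word to read
        self.wptr = 0  # next word to write

    @property
    def empty(self):
        return self.rptr == self.wptr

    @property
    def full(self):
        # One slot always stays unused, so the capacity is depth - 1.
        return (self.wptr + 1) % self.depth == self.rptr

    def clock(self, wr=False, din=0, rd=False):
        """Apply one clock edge; return the word popped, or None if none was."""
        was_empty, was_full = self.empty, self.full
        popped = False
        dout = None
        if rd and not was_empty:
            dout = self.mem[self.rptr]
            self.rptr = (self.rptr + 1) % self.depth
            popped = True
        # rd while empty (even with a simultaneous wr) pops nothing: this is the
        # unsupported case, for which the hardware read value is undetermined.
        if wr:
            if was_full and not popped:
                raise OverflowError("write to a full FIFO")
            self.mem[self.wptr] = din
            self.wptr = (self.wptr + 1) % self.depth
        return dout
```

With @addr_width=2@ the model holds three words; a fourth write without a simultaneous read raises @OverflowError@.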


FIFO inspirations:
* https://www.intel.com/content/www/us/en/docs/programmable/683082/22-1/inferring-fifos-in-hdl-code.html
* https://www.fpga4student.com/2017/01/verilog-code-for-fifo-memory.html
* https://www.verilog.pro/fifo.html
* https://www.instructables.com/Designing-a-Synchronous-FIFO-in-RTL/
* https://github.com/iammituraj/FIFOs
* http://www.asic-world.com/examples/verilog/memories.html



h1. UART

The UART does not need to be complete. It only serves as a test interface to send commands to the core and receive replies until a network layer is implemented.
It has to expose received data, and consume data to send, through an interface compatible with the TCP/IP stack implementation.

A first draft is implemented with Nandland's sample code (https://nandland.com/uart-serial-port-module/) and two FIFOs:

* Received data
* Data to send
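
To illustrate what the serial side of such a module does, here is a Python reference model of 8N1 framing (one start bit, eight data bits sent LSB first, one stop bit, idle line high), sampled once per system clock. @clks_per_bit@ is assumed to be the system clock frequency divided by the baud rate; the function names are hypothetical, and the receiver samples each bit in its middle, a common choice rather than a transcription of Nandland's code.

```python
def uart_tx_samples(byte, clks_per_bit):
    """Line level for each clock cycle while sending one 8N1 frame."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]  # start, data LSB first, stop
    return [bit for bit in bits for _ in range(clks_per_bit)]


def uart_rx_samples(samples, clks_per_bit):
    """Decode the 8N1 frames found in per-clock line samples."""
    received = []
    i = 0
    while i < len(samples):
        if samples[i] == 1:          # idle line
            i += 1
            continue
        mid = i + clks_per_bit // 2  # middle of the start bit
        stop = mid + 9 * clks_per_bit
        if stop >= len(samples):
            break                    # truncated frame
        if samples[mid] != 0:        # glitch, not a real start bit
            i += 1
            continue
        byte = 0
        for n in range(8):
            byte |= samples[mid + (n + 1) * clks_per_bit] << n
        if samples[stop] == 1:       # drop frames with a framing error
            received.append(byte)
        i = stop                     # resume hunting from the stop bit's middle
    return received
```

In the design, each decoded byte would be pushed into the received-data FIFO, and the transmitter would pop bytes from the data-to-send FIFO.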

UART inspirations:
* https://www.verilog.pro/micro_uart.html



h1. External RAM

h2. SDRAM

Inspiration:
* https://www.fpga4fun.com/SDRAM.html

h2. DDR



h1. Cached external RAM

The cache is stored either in logic elements or in BRAM(Block RAM). At reset, it is preloaded with a region of external memory (SDRAM, SRAM, DDR, ...). It supports both reads and writes, with write-behind logic. It needs a "drain" signal to flush dirty data, so that the FPGA can be reset without resetting the RAM and without losing data.
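
A minimal Python sketch of the write-behind and drain behavior, assuming the cache mirrors one contiguous region of external memory word for word (no tags or eviction, since the text only describes a region preloaded at reset). The class and method names are hypothetical, and a hardware implementation could also flush dirty words opportunistically in the background rather than only on @drain()@.

```python
class WriteBehindCache:
    """Cache mirroring one external-memory region (hypothetical model)."""

    def __init__(self, backing, base, size):
        self.backing = backing  # external memory, e.g. a list of words
        self.base = base
        self.words = list(backing[base:base + size])  # preload at reset
        self.dirty = [False] * size

    def _index(self, addr):
        idx = addr - self.base
        if not 0 <= idx < len(self.words):
            raise IndexError("address outside the cached region")
        return idx

    def read(self, addr):
        return self.words[self._index(addr)]  # always served by the cache

    def write(self, addr, value):
        idx = self._index(addr)
        self.words[idx] = value
        self.dirty[idx] = True  # external memory is updated later

    def drain(self):
        """Flush every dirty word to external memory; return how many."""
        flushed = 0
        for idx, is_dirty in enumerate(self.dirty):
            if is_dirty:
                self.backing[self.base + idx] = self.words[idx]
                self.dirty[idx] = False
                flushed += 1
        return flushed
```

After @drain()@ returns, external memory holds every write, so the FPGA can be reset while the RAM keeps its contents.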



h1. Memory manager

*TODO: Rewrite this chapter so that the memory map is stored at the beginning of each memory chip and cached in BRAM. This will allow full FPGA reconfiguration (partial reconfiguration is not supported in this design) for zero-downtime, zero-data-loss bitstream upgrades.*

The RAM needs to be defragmented from time to time, if not continuously. Defragmentation has to move data blocks that are currently in use by other modules. To avoid any kind of locking, other modules never store the actual RAM block address; they store a pointer to the entry holding that address in the RAM block allocation table.
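
The indirection can be sketched in Python as follows (all names hypothetical): modules keep only a slot number, and a compaction pass moves blocks and rewrites their table entries, so a module that resolves its slot after the move still finds its data. This stop-the-world copy is only a model; it does not show how the hardware would order a move against concurrent accesses.

```python
class BlockTable:
    """Allocation table seen through handles (slot numbers), hypothetical model."""

    def __init__(self):
        self.entries = {}  # slot -> (address, size)

    def add(self, slot, address, size):
        self.entries[slot] = (address, size)

    def resolve(self, slot):
        """What a module does on every access: look up the current address."""
        return self.entries[slot][0]

    def compact(self, ram, base):
        """Slide all blocks down to start at `base`; return the first free address."""
        next_addr = base
        # Visit blocks in address order so a move never overwrites an unmoved block.
        for slot, (addr, size) in sorted(self.entries.items(), key=lambda kv: kv[1][0]):
            if addr != next_addr:
                ram[next_addr:next_addr + size] = ram[addr:addr + size]
                self.entries[slot] = (next_addr, size)  # only the table changes
            next_addr += size
        return next_addr
```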

The data is stored in internal RAM blocks (M9K) but could also be stored in external RAM (DDR2, DDR3, ...). The RAM block allocation table is stored in internal RAM to minimize the access latency caused by inter-chip communication.

The RAM block allocation table contains 512 entries of 96 bits (49152 bits, i.e. 6 KiB) and is stored at the very beginning of the RAM, at address 0. The maximum number of entries is a compile-time parameter that can be tuned for the actual hardware. Each entry contains the following fixed fields:

* RAM block address (48 bits, enough to address up to 256TB)
* RAM block size (48 bits, for a maximum size of 256TB)

Unused (free or available) slots reference address 0, which can never be a valid block address, and use their size field for available RAM. The table is initialized as follows, with slot 0 holding the whole available RAM:

|_.Slot# |_.Address       |_.Size    |
|<.0     |>.0000          |>.1TB     |
|<.1     |>.0000          |>.0000    |
|<....   |>.0000          |>.0000    |
|<.511   |>.0000          |>.0000    |
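
A sketch of the reset contents described by this table, in Python with hypothetical helper names. It assumes sizes are in bytes and that 1TB means 2**40 bytes, and it treats every slot whose address is 0 as free, with the available RAM given by the sum of the free slots' sizes.

```python
ENTRIES = 512                            # compile-time parameter in the design
ENTRY_BITS = 96
TABLE_BYTES = ENTRIES * ENTRY_BITS // 8  # 6144 bytes, stored at address 0
ONE_TB = 1 << 40                         # assumption: sizes in bytes, 1TB = 2**40


def init_table(ram_size=ONE_TB, entries=ENTRIES):
    """Reset contents: slot 0 advertises all available RAM, the rest are empty."""
    table = [(0, 0) for _ in range(entries)]  # (address, size)
    table[0] = (0, ram_size)
    return table


def is_free(entry):
    # Address 0 is where the table itself lives, so no block can start there.
    return entry[0] == 0


def find_free_slot(table):
    """Return the first free slot number, or None if every slot is in use."""
    for slot, entry in enumerate(table):
        if is_free(entry):
            return slot
    return None


def available_bytes(table):
    return sum(size for address, size in table if address == 0)
```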

h2. Operation list (to be designed)

h3. Allocate a block

h3. Write a block

h3. Allocate and Write a block

h3. Read a block

h3. Free a block

h3. Read and Free a block

h3. Read and Free a block, then Defragment

h3. Move a block

h3. Resize a block

h3. Defragment



h1. RESP decode
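
As a reference point for the decoder, here is a Python sketch of the RESP2 wire format it has to parse: simple strings (@+@), errors (@-@), integers (@:@), bulk strings (@$@, with a length of -1 for null) and arrays (@*@). It is a software golden model, not the hardware design; the @decode@ and @RespError@ names are hypothetical, and it raises @ValueError@ on malformed or incomplete input instead of waiting for more bytes.

```python
class RespError(str):
    """An error reply, kept as data rather than raised."""


def decode(buf, pos=0):
    """Decode one RESP2 value starting at buf[pos]; return (value, next_pos)."""
    end = buf.index(b"\r\n", pos)   # ValueError if the line is incomplete
    kind, line = buf[pos:pos + 1], buf[pos + 1:end]
    nxt = end + 2
    if kind == b"+":                # simple string
        return line.decode(), nxt
    if kind == b"-":                # error
        return RespError(line.decode()), nxt
    if kind == b":":                # integer
        return int(line), nxt
    if kind == b"$":                # bulk string: length line, payload, CRLF
        n = int(line)
        if n == -1:
            return None, nxt        # null bulk string
        if buf[nxt + n:nxt + n + 2] != b"\r\n":
            raise ValueError("bulk string not terminated by CRLF")
        return buf[nxt:nxt + n], nxt + n + 2
    if kind == b"*":                # array: count line, then that many values
        n = int(line)
        if n == -1:
            return None, nxt        # null array
        items = []
        for _ in range(n):
            item, nxt = decode(buf, nxt)
            items.append(item)
        return items, nxt
    raise ValueError(f"unknown RESP type byte {kind!r}")
```

Clients send each command as an array of bulk strings, so @*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n@ decodes to @[b"GET", b"key"]@.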

h2. RESP Simple Strings

h2. RESP Errors

h2. RESP Integers

h2. RESP Bulk Strings

h2. RESP Arrays



h1. Internal structures

h2. Strings

h2. Numbers

h2. Dictionaries
 


h1. CPU-like core

Inspiration:
* https://www.fpga4student.com/2017/04/verilog-code-for-16-bit-risc-processor.html
* https://www.fpga4student.com/2017/06/Verilog-code-for-ALU.html
* https://www.fpga4student.com/2017/06/32-bit-pipelined-mips-processor-in-verilog-1.html


* Redis command decoding: commands are stored in a table with their arity and the branch to trigger through a MUX.
* ALU
* Bus
* Command pipelining
* ATSHA-like crypto to encrypt data at rest
* ATSHA-like crypto to sign/authenticate the firmwares/bitstreams
* Replace B-trees with n-ary trees and CAM-like searches; any key should be found in fewer than log(n) steps
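
The command table mentioned above (arity plus a branch selected through a MUX) could look like the following Python sketch. Arity follows the Redis convention: a positive arity is the exact argument count including the command name, and a negative arity -N means at least N arguments. @COMMAND_TABLE@, @dispatch@ and the branch numbers are hypothetical; the commands are the ones listed under Operations.

```python
# name -> (arity, branch); branch stands for the MUX input selecting the handler.
COMMAND_TABLE = {
    b"PING": (-1, 0),
    b"INFO": (-1, 1),
    b"SET": (-3, 2),
    b"GET": (2, 3),
    b"INCR": (2, 4),
    b"INCRBY": (3, 5),
}


def dispatch(argv):
    """Return the branch index for argv, or raise on unknown command / bad arity."""
    entry = COMMAND_TABLE.get(argv[0].upper())
    if entry is None:
        raise KeyError(f"unknown command {argv[0]!r}")
    arity, branch = entry
    argc = len(argv)  # includes the command name itself
    if (arity > 0 and argc != arity) or (arity < 0 and argc < -arity):
        raise ValueError(f"wrong number of arguments for {argv[0]!r}")
    return branch
```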

* No Redis module support (neither preconfigured nor dynamically configured).
* Only a subset of Redis commands at first.
* One extra command returning internal statistics in a Prometheus-friendly format.
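
Prometheus' text exposition format is line based (@# HELP@ and @# TYPE@ comment lines followed by @name value@ samples), so the statistics reply could be produced as in this Python sketch. The function name and the metric names are hypothetical examples, not counters defined by this design.

```python
def prometheus_text(stats):
    """Render {name: (type, help, value)} in Prometheus' text exposition format."""
    lines = []
    for name, (kind, help_text, value) in sorted(stats.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```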

h2. decoder

h2. Operations

h3. PING

h3. INFO

h3. SET

h3. GET

h3. INCR/INCRBY