This document is the Hakeva core design for the Verilog implementation to be configured in an FPGA.

{{>toc}}

h1. Internal RAM

*Design subject to change: this module currently implements asynchronous reads, so synthesis cannot infer internal BRAM blocks. The module has to be ported to synchronous reads, and the testbench updated, to make it compatible with internal BRAM(Block RAM) and with higher clock frequencies.*

This module is a DP(Dual-Port)-SC(Single-Clock)-SW(Synchronous Write)-AR(Asynchronous Read) RAM.
It can be preloaded with an image file and used as a ROM (if WE0 and WE1 are both forced low).
The read value is continuously assigned, so it reflects the new value as soon as a write has been performed.
Its interface has to remain compatible with the external RAM (SDRAM, DDR2, DDR3, ...) design.

If both ports write to the same address concurrently, the value actually stored is undetermined. *NEVER do that!*

Inspired by:
* http://www.asic-world.com/examples/verilog/memories.html
* https://www.verilog.pro/memories.html
* http://techmasterplus.com/verilog/verilog-ram.php
* https://www.chipverify.com/verilog/verilog-arrays-memories
* https://opencores.org/projects?expanded=Memory%20core&language=Verilog

h1. FIFO

The FIFO uses the internal RAM module design and has the same size limitation: the depth must be a power of 2. Furthermore, only ((2**ADDR_WIDTH)-1) entries can be used.

If the FIFO is empty, it cannot support a read and a write during the same clock cycle: the read returns an undetermined value.

FIFO inspirations:
* https://www.intel.com/content/www/us/en/docs/programmable/683082/22-1/inferring-fifos-in-hdl-code.html
* https://www.fpga4student.com/2017/01/verilog-code-for-fifo-memory.html
* https://www.verilog.pro/fifo.html
* https://www.instructables.com/Designing-a-Synchronous-FIFO-in-RTL/
* https://github.com/iammituraj/FIFOs
* http://www.asic-world.com/examples/verilog/memories.html

h1. UART

UART does not need to be complete.
It only serves as a test interface to send commands to the core and receive replies until a network layer is implemented.
It has to expose received data, and consume data to send, through an interface compatible with the TCP/IP stack implementation.

A first draft implementation is based on Nandland's sample code (https://nandland.com/uart-serial-port-module/) with two FIFOs:
* Received data
* Data to send

UART inspirations:
* https://www.verilog.pro/micro_uart.html

h1. External RAM

h2. SDRAM

Inspiration:
* https://www.fpga4fun.com/SDRAM.html

h2. DDR

h1. Cached external RAM

The cache is stored either in logic elements or in BRAM(Block RAM).
It is preloaded at reset from a region of external memory (SDRAM, SRAM, DDR, ...).
It supports both reads and writes, with write-behind logic.
It needs a "drain" signal to flush the dirty data (when the FPGA has to reset but the RAM does not, so that no data is lost).

h1. Memory manager

*TODO: Rewrite this chapter to have the memory map stored at the beginning of each memory chip and cached in BRAM. This will allow full FPGA reconfiguration (partial reconfiguration is not supported in this design) for zero-downtime/zero-data-loss bitstream upgrades.*

The RAM needs to be defragmented from time to time, if not continuously. Defragmentation has to move data blocks that are currently in use by other modules. To avoid any kind of locking, the other modules never store the actual RAM block address, only a pointer to an entry in a RAM block allocation table.

The data is stored in internal RAM blocks (M9K) but could also be stored in external RAM (DDR2/DDR3...).
The RAM block allocation table is stored in internal RAM to minimize the access latency caused by communication between chips.

The RAM block allocation table contains 512 entries x 96 bits (49152 bits = 6 KiB, less than 64KB) and is stored at the very beginning of the RAM, at address 0.
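
The indirection described above (client modules keep a slot index and the table resolves it to the current block address) could be sketched as follows. This is only an illustrative sketch, not the actual design: the module name @block_table@, all port names and the default field widths are assumptions, and the read is made synchronous on the assumption that the table should be inferable as BRAM.

<pre><code class="verilog">
// Hypothetical sketch: handle-to-address indirection through the
// RAM block allocation table. Names and widths are assumptions.
module block_table #(
    parameter ENTRIES      = 512,              // number of slots
    parameter ADDR_WIDTH   = 64,               // assumed address field width
    parameter SIZE_WIDTH   = 64,               // assumed size field width
    parameter HANDLE_WIDTH = $clog2(ENTRIES)   // width of a slot index
) (
    input  wire                    clk,
    // Lookup port, used by client modules that only hold a handle
    input  wire [HANDLE_WIDTH-1:0] handle,
    output reg  [ADDR_WIDTH-1:0]   block_addr,
    output reg  [SIZE_WIDTH-1:0]   block_size,
    // Update port, used by the allocator / defragmenter
    input  wire                    update_en,
    input  wire [HANDLE_WIDTH-1:0] update_handle,
    input  wire [ADDR_WIDTH-1:0]   update_addr,
    input  wire [SIZE_WIDTH-1:0]   update_size
);
    // One entry per slot: {address, size}
    reg [ADDR_WIDTH+SIZE_WIDTH-1:0] table_mem [0:ENTRIES-1];

    always @(posedge clk) begin
        // Moving a block only rewrites its table entry;
        // the handles held by client modules stay valid.
        if (update_en)
            table_mem[update_handle] <= {update_addr, update_size};
        // Synchronous read: the result is available one cycle later.
        // A lookup of the slot being updated returns the old entry.
        {block_addr, block_size} <= table_mem[handle];
    end
endmodule
</code></pre>

With this arrangement the defragmenter can relocate a block by copying its data and then issuing a single update on the corresponding slot. How clients are kept from using a stale address during the copy is left open here.
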
The maximum number of entries is a compilation parameter that can be tuned depending on the actual hardware.

Each entry contains the following fixed fields:
* RAM block address (64 bits, to address up to 1TB)
* RAM block size (64 bits, for a maximum size of 1TB)

Unused (free / available) slots reference address 0 (which is an invalid address for an allocated block) and have the available RAM as their size. The table is initialized as follows.

|_.Slot# |_.Address |_.Size |
|<.0 |>.0000 |>.1TB |
|<.1 |>.0000 |>.0000 |
|<.... |>.0000 |>.0000 |
|<.511 |>.0000 |>.0000 |

h2. Operation list (to be designed)

h3. Allocate a block

h3. Write a block

h3. Allocate and Write a block

h3. Read a block

h3. Free a block

h3. Read and Free a block

h3. Read and Free a block, then Defragment

h3. Move a block

h3. Resize a block

h3. Defragment

h1. RESP decode

h2. RESP String

h2. RESP Errors

h2. RESP Integers

h2. RESP Bulkstrings

h2. RESP Array

h1. Internal structures

h2. Strings

h2. Numbers

h2. Dictionaries

h1. CPU-like core

Inspiration:
* https://www.fpga4student.com/2017/04/verilog-code-for-16-bit-risc-processor.html
* https://www.fpga4student.com/2017/06/Verilog-code-for-ALU.html
* https://www.fpga4student.com/2017/06/32-bit-pipelined-mips-processor-in-verilog-1.html

Design notes:
* Redis command decoding: commands are stored in a table with their arity and the branch to trigger through a MUX.
* ALU
* Bus
* Command pipelining
* ATSHA-like crypto to encrypt data at rest
* ATSHA-like crypto to sign/authenticate the firmwares/bitstreams
* Replace b-trees with n-trees and CAM-like searches; this should find any key in less than logn(x)
* No module support (neither preconfigured nor dynamically configured).
* Subset of Redis commands first.
* One extra command returning internal statistics in a Prometheus-friendly format.

h2. decoder

h2. Operations

h3. PING

h3. INFO

h3. SET

h3. GET

h3. INCR/INCRBY
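
The command table mentioned in the CPU-like core notes (commands stored with their arity and the branch to trigger through a MUX) could cover the operations listed above along the following lines. This is only a sketch under stated assumptions: the module name @cmd_table@, the port names, the 48-bit packed name encoding and the branch codes are all hypothetical. The arity values follow the Redis convention, where the count includes the command name and a negative value means "at least that many".

<pre><code class="verilog">
// Hypothetical sketch: command name -> (arity, branch selector).
// Encoding and names are assumptions, not the actual design.
module cmd_table (
    // Up to 6 ASCII characters, upper-cased, left-aligned and
    // zero-padded by an earlier stage (assumed).
    input  wire        [47:0] name,
    output reg                valid,   // 1 when the command is known
    output reg  signed [3:0]  arity,   // Redis-style arity
    output reg         [2:0]  branch   // selector for the execution MUX
);
    localparam BR_PING   = 3'd0,
               BR_INFO   = 3'd1,
               BR_SET    = 3'd2,
               BR_GET    = 3'd3,
               BR_INCR   = 3'd4,
               BR_INCRBY = 3'd5;

    always @* begin
        // Defaults avoid inferred latches.
        valid  = 1'b1;
        arity  = 4'sd0;
        branch = BR_PING;
        case (name)
            {"PING", 16'h0000}:  begin arity = -4'sd1; branch = BR_PING;   end
            {"INFO", 16'h0000}:  begin arity = -4'sd1; branch = BR_INFO;   end
            {"SET", 24'h000000}: begin arity = -4'sd3; branch = BR_SET;    end
            {"GET", 24'h000000}: begin arity =  4'sd2; branch = BR_GET;    end
            {"INCR", 16'h0000}:  begin arity =  4'sd2; branch = BR_INCR;   end
            "INCRBY":            begin arity =  4'sd3; branch = BR_INCRBY; end
            default:             valid = 1'b0;   // unknown command
        endcase
    end
endmodule
</code></pre>

A later stage would compare the element count of the decoded RESP array against @arity@ before letting @branch@ select the operation. A larger command set would probably move from a @case@ statement to a ROM or CAM lookup.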