Project

General

Profile

This document describes the design of the Hakeva core, which is implemented in Verilog and meant to be configured in an FPGA.

Internal RAM

Design subject to changes:
Currently, this module implements asynchronous reads, so synthesis cannot infer internal BRAM blocks. To be compatible with internal BRAM and with higher clock frequencies, the module would have to be ported to synchronous reads and the testbench updated accordingly.

This module is a DP-SC-SW-AR RAM. It can be preloaded with an image file and used as a ROM (if WE0 and WE1 are both forced low).

The read value is continuously assigned, so after a write it returns the newly written value. Its interface has to remain compatible with the external RAM (SDRAM, DDR2, DDR3, ...) design.
If both ports write to the same address concurrently, the value actually stored is undetermined. NEVER do that!
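
The behavior described above can be sketched as a small behavioral model. This is Python, not the Verilog RTL, and it is only an illustration: the class name, the method names and the decision to raise an error on a write collision are assumptions made for this sketch (the real module simply stores an undetermined value).

```python
class DualPortRam:
    """Behavioral model: two write ports, one clock, asynchronous read."""

    def __init__(self, addr_width, image=None):
        self.depth = 1 << addr_width
        self.mem = [0] * self.depth
        # Optional preload, mirroring the image-file preload used for ROM mode.
        for addr, word in enumerate((image or [])[:self.depth]):
            self.mem[addr] = word

    def read(self, addr):
        # Asynchronous read: the output always reflects the current contents,
        # so a read issued after a write edge sees the new value.
        return self.mem[addr % self.depth]

    def clock(self, we0=False, addr0=0, din0=0, we1=False, addr1=0, din1=0):
        """Apply one rising clock edge with the given write requests."""
        if we0 and we1 and addr0 % self.depth == addr1 % self.depth:
            # The design leaves the result undetermined; the model refuses instead.
            raise ValueError("both ports write to the same address")
        if we0:
            self.mem[addr0 % self.depth] = din0
        if we1:
            self.mem[addr1 % self.depth] = din1
```

Holding both write enables low on every clock edge makes the model behave as a ROM preloaded from `image`.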

Inspired by :

FIFO

The FIFO uses the internal RAM module design and has the same size limitation: the depth must be a power of 2. Furthermore, only (2**ADDR_WIDTH)-1 entries can be used. If the FIFO is empty, it cannot support a Read and a Write during the same clock cycle: the Read will return an undetermined value.
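
One common way to end up with exactly (2**ADDR_WIDTH)-1 usable entries is to keep one slot free, so that equal read and write pointers always mean "empty". The source does not state that the FIFO is built this way, so the Python model below is an assumption-based sketch; the class and method names are hypothetical.

```python
class Fifo:
    """Circular FIFO over a power-of-2 memory, keeping one slot always free."""

    def __init__(self, addr_width):
        self.depth = 1 << addr_width
        self.mask = self.depth - 1      # wrap-around works because depth is 2**n
        self.mem = [None] * self.depth
        self.rd = 0                     # read pointer
        self.wr = 0                     # write pointer

    def is_empty(self):
        return self.rd == self.wr

    def is_full(self):
        # Full when advancing the write pointer would make it equal to the
        # read pointer, which would be indistinguishable from "empty".
        return ((self.wr + 1) & self.mask) == self.rd

    def count(self):
        return (self.wr - self.rd) & self.mask

    def push(self, value):
        if self.is_full():
            return False
        self.mem[self.wr] = value
        self.wr = (self.wr + 1) & self.mask
        return True

    def pop(self):
        if self.is_empty():
            return None
        value = self.mem[self.rd]
        self.rd = (self.rd + 1) & self.mask
        return value
```

With `addr_width = 3` the memory has 8 words but the eighth `push` is rejected, which matches the usable capacity stated above.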

FIFO inspirations :

UART

The UART does not need to be complete. It only serves as a test interface to send commands to the core and receive replies, until a network layer is implemented.
It has to expose received data and consume data to send through an interface compatible with the TCP/IP stack implementation.

A first draft implementation is based on Nandland's sample code (https://nandland.com/uart-serial-port-module/) with two FIFOs:

  • Received data
  • Data to send

UART Inspirations :

External RAM

SDRAM

Inspiration :

DDR

Cached external RAM

The cache is stored either in logic elements or in BRAM. At reset, it is preloaded from a region of external memory (SDRAM, SRAM, DDR, ...). It supports both reads and writes, using write-behind logic. It needs a "drain" signal to flush the dirty data (for when the FPGA has to reset but the RAM does not, without losing data).
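
A minimal Python model of the preload, write-behind and drain behavior described above. It is a sketch under assumptions: the external memory is represented by a plain list, the names are hypothetical, and the background write-back that a real write-behind cache performs between drains is left out.

```python
class WriteBehindCache:
    """Caches one region of an external memory; writes are deferred until drain."""

    def __init__(self, backing, base, size):
        self.backing = backing                          # stands in for SDRAM/DDR
        self.base = base
        self.lines = list(backing[base:base + size])    # preload at reset
        self.dirty = [False] * size

    def read(self, offset):
        return self.lines[offset]

    def write(self, offset, value):
        # The write lands in the cache only; the external memory is now stale.
        self.lines[offset] = value
        self.dirty[offset] = True

    def drain(self):
        """Flush every dirty entry to the backing memory; return how many."""
        flushed = 0
        for offset, is_dirty in enumerate(self.dirty):
            if is_dirty:
                self.backing[self.base + offset] = self.lines[offset]
                self.dirty[offset] = False
                flushed += 1
        return flushed
```

Asserting the drain signal before an FPGA reset corresponds to calling `drain()` here: afterwards the external memory holds every value written through the cache.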

Memory manager

TODO : Rewrite this chapter so that the memory map is stored at the beginning of each memory chip and cached in BRAM. This will allow full FPGA reconfiguration (partial reconfiguration is not supported in this design) for zero-downtime/zero-data-loss bitstream upgrades.

The RAM needs to be defragmented from time to time, if not continuously. Defragmentation has to move data blocks that are currently in use by other modules. To avoid any kind of locking, the other modules never store the actual RAM block address, but a pointer to the entry holding that address in a RAM block allocation table.

The data is stored in internal RAM blocks (M9K) but could also be stored in external RAM (DDR2, DDR3, ...). The RAM block allocation table is stored in internal RAM to avoid the access latency of chip-to-chip communication.

The RAM block allocation table contains 512 entries x 96 bits (49152 bits = 6144 bytes, well under 64 KB) and is stored at the very beginning of the RAM, at address 0. The maximum number of entries is a compile-time parameter that can be tuned to the actual hardware. Each entry contains the following fixed fields:

  • RAM block address (64 bits to address up to 1 TB)
  • RAM block size (64 bits for a maximum size of 1 TB)
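
The sizing figures can be checked with a few lines of arithmetic. The sketch below assumes byte addressing and 1 TB = 2**40 bytes; neither is stated in the text. It also shows that the two 64-bit fields listed above add up to 128 bits rather than the 96 bits quoted for an entry, so the entry width is best treated as a parameter still to be settled.

```python
ENTRIES = 512
ENTRY_BITS = 96                     # entry width quoted in the text

table_bits = ENTRIES * ENTRY_BITS   # 49152
table_bytes = table_bits // 8       # 6144 bytes, i.e. 6 KB

# Bits needed to address every byte of 1 TB (assumed to be 2**40 bytes).
ONE_TB = 2 ** 40
bits_for_1tb = (ONE_TB - 1).bit_length()    # 40

# Width of an entry made of the two 64-bit fields listed above.
listed_fields_bits = 64 + 64                # 128
listed_table_bytes = ENTRIES * listed_fields_bits // 8   # 8192 bytes, i.e. 8 KB
```

Either way the table stays small enough for internal RAM; the open point is only which field widths the design finally uses.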

Unused (free / available) slots reference address 0 (which is never a valid address for an allocated block, since the table itself is stored there) and have the available RAM as a size. The table is initialized as follows:

Slot#   Address   Size
0       0000      1TB
1       0000      0000
...     0000      0000
511     0000      0000
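
The indirection through the allocation table can be illustrated with a simplified Python model. Since the operations are still to be designed, everything here is an assumption: the bump allocator, the method names, the small default sizes, and the choice to mark a free slot as (0, 0) instead of tracking the remaining RAM in slot 0. The point is only to show that a block can be moved without its users noticing, because they hold a slot number and not an address.

```python
class BlockTable:
    """Modules hold a slot number (handle); only the table knows the address."""

    def __init__(self, ram_size, slots=8, reserved=16):
        self.ram = bytearray(ram_size)
        self.entries = [(0, 0)] * slots   # (address, size); address 0 = free slot
        self.reserved = reserved          # low addresses set aside for the table
        self.next_free = reserved

    def allocate(self, size):
        if size <= 0 or self.next_free + size > len(self.ram):
            raise MemoryError("no room for the block")
        for handle, (addr, _) in enumerate(self.entries):
            if addr == 0:
                self.entries[handle] = (self.next_free, size)
                self.next_free += size
                return handle
        raise MemoryError("no free slot")

    def write(self, handle, data):
        addr, size = self.entries[handle]
        if addr == 0 or len(data) > size:
            raise ValueError("invalid handle or data too large")
        self.ram[addr:addr + len(data)] = data

    def read(self, handle):
        addr, size = self.entries[handle]
        return bytes(self.ram[addr:addr + size])

    def free(self, handle):
        self.entries[handle] = (0, 0)

    def defragment(self):
        """Compact live blocks toward low addresses; handles stay valid."""
        cursor = self.reserved
        live = sorted((addr, size, handle)
                      for handle, (addr, size) in enumerate(self.entries)
                      if addr != 0)
        for addr, size, handle in live:
            if addr != cursor:
                # The right-hand slice is copied first, so overlap is safe.
                self.ram[cursor:cursor + size] = self.ram[addr:addr + size]
                self.entries[handle] = (cursor, size)
            cursor += size
        self.next_free = cursor
```

After `defragment()`, a module that kept only its handle reads the same data even though the block now lives at a different address.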

Operation list (to be designed)

Allocate a block

Write a block

Allocate and Write a block

Read a block

Free a block

Read and Free a block

Read and Free a block, then Defragment

Move a block

Resize a block

Defragment

RESP decode

RESP String

RESP Errors

RESP Integers

RESP Bulkstrings

RESP Array
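
As a software reference for the five RESP types listed above, here is a minimal recursive decoder in Python. It is only a sketch to check a hardware decoder against: it assumes the complete message is already in the buffer, covers the RESP2 types only, and the names `decode` and `RespError` are hypothetical.

```python
class RespError(Exception):
    """Value returned for a RESP error reply (type byte '-')."""


def decode(buf, pos=0):
    """Decode one RESP value from bytes at pos; return (value, next_pos)."""
    end = buf.index(b"\r\n", pos)             # every header line ends with CRLF
    kind = buf[pos:pos + 1]
    line = buf[pos + 1:end].decode()
    nxt = end + 2
    if kind == b"+":                           # simple string
        return line, nxt
    if kind == b"-":                           # error
        return RespError(line), nxt
    if kind == b":":                           # integer
        return int(line), nxt
    if kind == b"$":                           # bulk string, length-prefixed
        length = int(line)
        if length < 0:
            return None, nxt                   # null bulk string
        return buf[nxt:nxt + length], nxt + length + 2
    if kind == b"*":                           # array of nested values
        count = int(line)
        if count < 0:
            return None, nxt                   # null array
        items = []
        for _ in range(count):
            item, nxt = decode(buf, nxt)
            items.append(item)
        return items, nxt
    raise ValueError("unknown RESP type byte")
```

A client request such as `GET key` arrives as an array of bulk strings, so `decode(b"*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n")` yields the list `[b"GET", b"key"]` together with the position just past the message.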

Internal structures

Strings

Numbers

Dictionaries

CPU-like core

Inspiration:

Redis command decoding
Commands are stored in a table with their arity and the branch to trigger through a MUX.
ALU
Bus
Command pipelining
ATSHA-like crypto to encrypt data at rest
ATSHA-like crypto to sign/authenticate the firmwares/bitstreams
Replace b-trees with n-trees and CAM-like searches, should find any key in less than logn(x)
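
The command table with arity and branch selection can be modeled in Python as a dictionary lookup followed by an arity check, with the handler call standing in for the MUX. This is a sketch, not the core's decoder: the handler names and the in-memory `store` are hypothetical, INFO is left out, and the arity values follow the Redis convention (a positive value is an exact argument count including the command name, a negative value is a minimum).

```python
store = {}   # hypothetical key/value storage, strings only


def cmd_ping(args):
    return args[0] if args else "PONG"


def cmd_set(args):
    store[args[0]] = args[1]
    return "OK"


def cmd_get(args):
    return store.get(args[0])


def cmd_incrby(args):
    value = int(store.get(args[0], "0")) + int(args[1])
    store[args[0]] = str(value)
    return value


def cmd_incr(args):
    return cmd_incrby([args[0], "1"])


# name -> (arity, branch); the tuple plays the role of one table row.
COMMANDS = {
    "PING": (-1, cmd_ping),
    "SET": (-3, cmd_set),
    "GET": (2, cmd_get),
    "INCR": (2, cmd_incr),
    "INCRBY": (3, cmd_incrby),
}


def dispatch(request):
    """request is a list of strings: command name followed by its arguments."""
    entry = COMMANDS.get(request[0].upper())
    if entry is None:
        return "ERR unknown command"
    arity, handler = entry
    valid = len(request) == arity if arity > 0 else len(request) >= -arity
    if not valid:
        return "ERR wrong number of arguments"
    return handler(request[1:])      # branch selection, the MUX in hardware
```

For example, `dispatch(["SET", "k", "10"])` followed by `dispatch(["INCRBY", "k", "5"])` leaves the string "15" under key `k`, while `dispatch(["GET"])` is rejected by the arity check before any handler runs.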

No module support (neither preconfigured nor dynamically configured).
A subset of Redis commands first.
One extra command returning internal statistics in a Prometheus-friendly format.

Decoder

Operations

PING

INFO

SET

GET

INCR/INCRBY
