E3DDS Specifications for the Test Cluster

From neicext

Single Node Type

Here we specify a single node type in order to buy five (5) identical servers.

Generic Nodes

Minimum server motherboard specifications

Dual-socket AMD EPYC 7002 series processors.

2 x PCIe4 x16 (for the dual-port 100 Gb/s NIC cards)

2 x PCIe4 x8 for SSD cards.

OCP 2.0 port or PCIe4 x16 for acceleration (FPGA or GPU).

2 x M.2 slots for SSDs

Example Configuration

RAM: All RAM sockets populated with best value RAM/price.

1 TB HDD

1 TB SSD (2 x M.2 500GB)

2 x dual 100 Gb/s NIC
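As a back-of-the-envelope check that each PCIe4 x16 slot can actually feed a dual-port 100 Gb/s NIC, the sketch below compares the slot's usable bandwidth against the NIC's line rate. The PCIe figures (16 GT/s per lane, 128b/130b encoding) are standard PCIe 4.0 parameters, not stated in the spec itself:

```python
# Sanity check: can one PCIe4 x16 slot feed a dual-port 100 Gb/s NIC?
# Assumptions (standard PCIe 4.0 figures, not from the spec above):
# 16 GT/s per lane with 128b/130b encoding; NIC line rate taken as raw rate.

PCIE4_GT_PER_LANE = 16        # GT/s per lane, PCIe 4.0
ENCODING = 128 / 130          # 128b/130b encoding efficiency
LANES = 16

slot_gbps = PCIE4_GT_PER_LANE * ENCODING * LANES   # per direction
nic_gbps = 2 * 100                                 # dual-port 100 Gb/s

print(f"PCIe4 x16: {slot_gbps:.1f} Gb/s per direction")  # ~252 Gb/s
print(f"Dual 100 Gb/s NIC:  {nic_gbps} Gb/s")

# ~252 Gb/s > 200 Gb/s, so the slot covers both ports with modest headroom.
assert slot_gbps > nic_gbps
```

This is why the spec asks for x16 slots: a x8 slot (~126 Gb/s) could serve one 100 Gb/s port but not both at full speed.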

Two Node Types

Here we specify two node types to reflect that the Ring Buffer and Prompt Computing nodes have different tasks. The RB nodes are designed for high-memory throughput. The prompt computing nodes need to perform more compute-intensive work. Here we anticipate three (3) RB nodes and two (2) Prompt Computing nodes.

Ring Buffer Nodes

Minimum server motherboard specifications

Single-socket AMD EPYC 7002 series processor.

4 x PCIe4 x16 (for the quad-port 25 Gb/s NIC cards and the IB card)

1 x PCIe4 x8 for the SSD card.

2 x M.2 slots for SSDs

Example Configuration

RAM: 256 GB expandable to 4 TB.

1 TB HDD

1 TB SSD (2 x M.2 500GB)

3 x quad-port 25 Gb/s NICs

1 x InfiniBand HDR NIC with a 200 Gb/s port
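The RB nodes are designed for high-memory throughput, so it is worth checking that the aggregate NIC traffic fits within a single socket's memory bandwidth. The sketch below assumes all 8 DDR4-3200 memory channels of an EPYC 7002 part are populated (consistent with "all RAM sockets populated", but the DDR4-3200 figure is an assumption, not part of the spec):

```python
# Sketch: aggregate I/O on a Ring Buffer node vs. single-socket EPYC 7002
# memory bandwidth. Assumption (not in the spec): 8 channels of DDR4-3200,
# i.e. 8 channels * 3.2 GT/s * 8 bytes = 204.8 GB/s theoretical peak.

ingest_gbps = 3 * 4 * 25      # 3 quad-port 25 Gb/s NICs -> 300 Gb/s in
egress_gbps = 200             # one HDR InfiniBand 200 Gb/s port out
io_gbs = (ingest_gbps + egress_gbps) / 8   # Gb/s -> GB/s = 62.5 GB/s

mem_gbs = 8 * 3.2 * 8         # channels * GT/s * bytes per transfer

print(f"NIC ingest + IB egress: {io_gbs:.1f} GB/s")
print(f"DDR4-3200 peak (8 ch):  {mem_gbs:.1f} GB/s")

# Even if every buffered byte crosses the memory bus twice (DMA write in,
# read out: ~125 GB/s of traffic), the memory system retains headroom.
assert 2 * io_gbs < mem_gbs
```

This supports the single-socket choice for RB nodes: the workload is bandwidth-bound, not compute-bound, and one socket's memory channels suffice.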

Prompt Computing Nodes

Minimum server motherboard specifications

Dual-socket AMD EPYC 7002 series processors.

2 x OCP 2.0 or PCIe3 x16 for acceleration (FPGA or GPU).

Example Configuration

RAM: 256 GB expandable to 4 TB.

1 TB HDD

1 x InfiniBand HDR NIC with a 200 Gb/s port

One of the nodes should be equipped with 2 x GPUs for evaluation. We need to check compatibility between RDMA on the IB card and the GPU.

Network Switch

  • A network switch that can run the 100 Gb/s and 25 Gb/s ports at full speed, with no bottlenecks between different port groups within the switch, e.g. 4 x 100G + 18 x 25G Ethernet.
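To make the "no bottlenecks" requirement concrete, the sketch below totals the port bandwidth implied by the example configuration; a non-blocking switch must provide at least this much internal fabric capacity:

```python
# Aggregate port bandwidth of the example switch configuration
# (4 x 100G uplink-class ports + 18 x 25G ports).
ports_100g = 4
ports_25g = 18

total_gbps = ports_100g * 100 + ports_25g * 25
print(f"Aggregate port bandwidth: {total_gbps} Gb/s")  # 850 Gb/s

# A non-blocking switch needs internal capacity >= the sum of all ports,
# so any port group can talk to any other at full line rate.
assert total_gbps == 850
```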

Diagram

[Image: E3D Test Cluster.png]