Tier1 Hardware Procurements

Introduction

This page collects experience with the various hardware related to the NDGF Tier1 activity.

Disk pools

HPC2N - Dell R740xd2 - LFF chassis - purchase 1

8 machines for Tier1 (11 machines for Swestore) delivered October 2020, in service from December 2020.

  • Dell R740xd2, LFF chassis "2U double-deep"
  • 2 x Intel(R) Xeon(R) Silver 4210R CPU, each 10 cores @ 2.40GHz
    • Two sockets populated due to PCIe slot assignment
  • 96G RAM
  • Data storage
    • H730P RAID controller, 2G non-volatile (flash-backed) cache
    • In total 26 16T SAS LFF HDDs
      • 12 x 16T SAS LFF hot-swap HDDs in the front visible
      • 12 x 16T SAS LFF hot-swap HDDs in the front behind visible HDDs
      • 2 x 16T SAS LFF hot-swap HDDs in the back
    • RAID 60 with two parity groups, 1024k strip-size
      • perccli /c0 add vd r60 size=all name=dCacheVD drives=32:0-25 pdperarray=13 wb strip=1024
    • XFS, created aligned with -d su=1024k,sw=11 directly on the disk device (see the sketch after this list).
  • OS storage
    • Dell BOSS - two mirrored M.2 cards in a PCIe slot providing an AHCI device.
  • QLogic FastLinQ 41262 Dual Port 10/25GbE SFP28 Adapter (1 port used)
  • iDRAC (management) with dedicated network port
  • Extra tuning applied
    • ATTR{bdi/read_ahead_kb}="16384"
    • ATTR{queue/scheduler}="deadline"
    • ATTR{queue/nr_requests}="128"
    • ATTR{queue/iosched/writes_starved}="10"
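
A minimal sketch of the corresponding XFS creation, assuming the RAID60 VD shows up as /dev/sdb (the device name is an assumption; the su/sw values are the ones quoted above):

# XFS directly on the VD, aligned to the 1024k strip (11 data drives per parity group)
mkfs.xfs -d su=1024k,sw=11 /dev/sdb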

HPC2N notes:

  • No cable arm needed!
  • Sits on shelves, not rack rails, so two people are needed to handle a machine due to its weight.
  • Had one machine with slow CPUs; after a system board replacement it seems to work OK.
  • RAID controller or SAS expander cabling/setup seems a bit underprovisioned, but that was expected with this many HDDs in the machine.
    • Meets performance requirements set when procuring, so more of a notice on machine design and what to expect.

NDGF shakedown notes:

  • Bulk transfer performance 1-2GByte/s on each node. Sometimes reaching close to 25GbE link speed.
    • When filling for the first time, there is a noticeable throughput difference between the start and the end.
    • Theoretical evacuation time estimated to be about 2 days (rough arithmetic below).
  • Mixed load about 1GByte/s in each direction, slightly favoring reads.
  • Pool startup when full with ATLAS files (2M files) takes about 3-4 minutes.
  • File removal rate is about 800k/hour.
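
A rough sanity check on the evacuation estimate: with RAID60 over two 13-drive parity groups, 22 of the 26 16T drives hold data, roughly 350 TB; at a sustained ~2 GByte/s that is about 175 000 s, i.e. close to 2 days.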

HPC2N - Dell R740xd2 - LFF chassis - purchase 2

7 machines for Tier1 delivered January 2022, in service from XXXX 2022.

  • Dell R740xd2, LFF chassis "2U double-deep"
  • 2 x Intel(R) Xeon(R) Silver 4215 CPU, each 8 cores @ 2.50GHz
    • Two sockets populated due to PCIe slot assignment
  • 96G RAM
  • Data storage
    • H730P RAID controller, 2G non-volatile (flash-backed) cache
      • We wanted the better H740 RAID controller, but Dell couldn't deliver that
    • In total 26 18T SAS LFF HDDs
      • 12 x 18T SAS LFF hot-swap HDDs in the front visible
      • 12 x 18T SAS LFF hot-swap HDDs in the front behind visible HDDs
      • 2 x 18T SAS LFF hot-swap HDDs in the back
    • RAID 60 with two parity groups, 1024k strip-size
      • perccli /c0 add vd r60 size=all name=dCacheVD drives=32:0-25 pdperarray=13 wb strip=1024
    • XFS, created aligned with -d su=1024k,sw=11 directly on the disk device.
      • Label set with -L labelname and mounted using LABEL=labelname in fstab
  • OS storage
    • Dell BOSS - two mirrored M.2 cards in a PCIe slot providing an AHCI device.
  • Intel XXV710 25G Ethernet NIC (1 port used)
    • Long delivery times due to component shortages; NICs will arrive sometime later...
    • 2 machines initially deployed with spare 25G NICs, 5 machines with borrowed 10G NICs.
    • NICs arrived the first week of February; swapping was a painless operation (same interface name, so no changes needed to the UEFI and OS setup other than changing the MAC addresses for network-boot DHCP).
  • iDRAC (management) with dedicated network port
  • Extra tuning applied via udev rules (also see Operations Tuning Linux); a sketch of such a rules file follows this list
    • Increase read-ahead to match workload:
      • ATTR{bdi/read_ahead_kb}="16384"
    • Tunings for mq-deadline scheduler (default in recent kernels), carried over from our previous deadline scheduler tunings:
      • ATTR{queue/nr_requests}="128"
      • ATTR{queue/iosched/writes_starved}="10"
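
A minimal sketch of what such a udev rules file could look like, assuming the data VD is matched on its SCSI model string (the file name and the "PERC H730P*" match are assumptions; the attribute values are the ones listed above):

# /etc/udev/rules.d/99-dcache-pool.rules (file name is an assumption)
ACTION=="add|change", SUBSYSTEM=="block", ATTRS{model}=="PERC H730P*", ATTR{bdi/read_ahead_kb}="16384", ATTR{queue/nr_requests}="128", ATTR{queue/iosched/writes_starved}="10"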

HPC2N notes:

  • No cable arm needed!
  • Sits on shelves, not rack rails, so two people are needed to handle a machine due to its weight.
  • H730P RAID controller or SAS expander cabling/setup seems a bit underprovisioned, but we knew that from the previous purchase
    • Really recommend going for the better H740 RAID controller if possible.
    • Meets performance requirements set when procuring, so more of a notice on machine design and what to expect.

NDGF shakedown notes:

  • See HPC2N R740xd2 purchase 1.
  • Pure read load with a concurrency of 10-100 in the 2.3-2.5GB/s range (but on a partially filled pool)

HPC2N - Dell R730xd - LFF chassis

8 machines for Tier1 delivered and taken in service during March 2017.

  • As above, with the following exceptions
    • 1 x Intel(R) Xeon(R) E5-2620 v4 CPU @ 2.10GHz.
    • 2 x 120G SATA SFF hot-swap Boot-grade SSDs in the back

HPC2N experience: good build quality, decent performance, and it doesn't take forever to reboot.

A single-stream write/read benchmark (dd bs=256k) clocks in at approx 2000 MB/s at the beginning of the storage area and approx 1000 MB/s at the end, which is in line with the general performance of capacity-oriented HDDs.
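
A minimal sketch of that kind of single-stream read test, assuming the data VD shows up as /dev/sdb and using a roughly 40 GiB sample at each end of the device (the device name, sample size and SKIP_BLOCKS offset are assumptions, not the exact commands used):

# sequential read at the start of the device
dd if=/dev/sdb of=/dev/null bs=256k count=160000 iflag=direct
# sequential read near the end; set SKIP_BLOCKS (in 256k units) to just short of the device size
dd if=/dev/sdb of=/dev/null bs=256k count=160000 skip=$SKIP_BLOCKS iflag=direct
# the write direction can be tested analogously with if=/dev/zero of=/dev/sdb, but that is destructive and only safe before the machine holds data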

CSC - HP Apollo XL450 SL4510 Gen9

Purchased by HIP in December 2015.

Judgment: (to be added by NDGF)

3 chassis for ALICE:

  • 68 LFF x 4TB 7200RPM SAS in each 4U chassis
  • 1x server per chassis with 128GB RAM, 2x Xeon E5-2640v3 @ 2.6GHz and 2x 500GB SFF 7200RPM SATA for the OS behind a B140i controller (UEFI boot is needed for the RAID features of the card)
  • 8 extra disks in an extra drive cage at the back. Replacing that cage requires taking all 68 disks out of the chassis
  • 2x P440 4GB cache - half of the disks are behind each controller. Extra drive cage at the back is also split between the controllers
  • 40GbE
  • Having the dCache pool's meta/ on the OS disks via a symlink is bad for performance - better to keep it on the RAID6 arrays.

Array config is 2 RAID6 per controller plus a shared spare between the arrays. The disks at the back of the chassis are warmer than the ones in front and tend to fail more often (seen in a Gen8 Apollo), so each array has disks from the front all the way to the back.

Each server has: 2 x ((14+2) + (15+2) + 1) disks, i.e. per controller one 14+2 RAID6 array, one 15+2 RAID6 array, and one shared spare.

# controller in slot 1: two RAID6 logical drives, each mixing front and back drive bays
hpssacli ctrl slot=1 create type=ld drives=1I:1:2-1I:1:3,1I:1:7-1I:1:9,1I:1:13-1I:1:15,1I:1:19-1I:1:21,1I:1:25-1I:1:27,1I:1:65-1I:1:66 raid=6
hpssacli ctrl slot=1 create type=ld drives=1I:1:4-1I:1:6,1I:1:10-1I:1:12,1I:1:16-1I:1:18,1I:1:22-1I:1:24,1I:1:28-1I:1:30,1I:1:67-1I:1:68 raid=6
# controller in slot 2: same layout for the other half of the disks
hpssacli ctrl slot=2 create type=ld drives=1I:1:32-1I:1:33,1I:1:37-1I:1:39,1I:1:43-1I:1:45,1I:1:49-1I:1:51,1I:1:55-1I:1:57,1I:1:61-1I:1:62 raid=6
hpssacli ctrl slot=2 create type=ld drives=1I:1:34-1I:1:36,1I:1:40-1I:1:42,1I:1:46-1I:1:48,1I:1:52-1I:1:54,1I:1:58-1I:1:60,1I:1:63-1I:1:64 raid=6

Parted was used to create one partition (0% - 100%) per logical drive. XFS was created without any tuning (only setting a label). A rebuild takes 25h at high rebuild priority with no other load. A reboot takes ~250s.
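
A minimal sketch of that partitioning and filesystem creation, assuming a logical drive at /dev/sdb and the label "pool1" (both are assumptions):

# one GPT partition spanning the whole logical drive
parted -s /dev/sdb mklabel gpt mkpart primary 0% 100%
# plain XFS, only setting a label
mkfs.xfs -L pool1 /dev/sdb1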

On a single sample of a system in use, with an inlet temperature of 22-23C, the disk temperatures range from 25C (front) to 38C (back).

Tape pools

HPC2N - Dell R740 - SFF chassis

  • Delivered 2019-11-18, tested during December, taken into production late 2019/early 2020.
  • 25 GigE networking
  • 10 x 1.6TB SAS Mixed Use SSDs, 3 DWPD, 8760 TBW each
    • OS storage is a 50G LUN (VD)
    • Dedicated VD for dCache
  • PERC H740P RAID-controller
  • One Intel Xeon Silver 4215 CPU, 8 cores @ 2.5GHz
  • 48G RAM

Using hardware RAID, the VD was created with

storcli /c0 add vd r5 size=all name=dCacheVD drives=64:0-9 wt ra strip=64

which gives us a balanced 3 GB/s read performance while doing 3 GB/s of writes at the same time. Other cache settings gave us uneven read/write performance. Endurance at the filesystem level is 9 * 8760 = 78840 TBW minus filesystem overhead.

The CPU is a bit underpowered according to the sizing guidelines (it was sized to be more than enough for 10GigE, but we revised the target to 25GigE during the procurement process...)

ARC Cache

HPC2N - Dell R730xd - SFF chassis

14 machines delivered during February 2016, taken into production during March 2016. Taken out of production November 2021.

Non-uniform disk sizes because the vendor optimized the bid against the evaluation criteria.

  • Dell R730xd, SFF chassis
  • 1 x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
  • 64G RAM
  • Data storage
    • RAID10 over 8 x 2T SAS SFF hot-swap HDDs in the front, 64k strip size
      • xfs created aligned with -d su=64k,sw=4
    • RAID10 over 16 x 1T SAS SFF hot-swap HDDs in the front, 64k strip size
      • xfs created aligned with -d su=64k,sw=8
  • OS storage
    • RAID1
      • 2 x 250G SATA SFF hot-swap HDDs in the back
  • H730P RAID controller, 2G non-volatile (flash-backed) cache
  • 2 x 10GBASE-T network ports
    • One port on LHC OPN
    • One port on cluster network
  • iDRAC (management) with dedicated network port

HPC2N - HPE DL325

4 machines delivered during July 2021, partly taken into production during October 2021

  • HPE DL325 Gen10+
  • 1 x AMD® EPYC 7420P
  • 128G RAM
  • Data Storage
    • 8 × 1.6 TB NVMe U.3
    • ZFS RAID0
      • ashift=12, recordsize=1M
      • 11.6 TB usable space per machine
      • split into two datasets, cache and session
      • session has lz4 compression enabled, cache does not (see the sketch after this list)
  • OS storage
    • HW RAID1
      • 2 × 480 GB NVMe M.2
    • xfs
  • 2 × 25 Gb
    • 1 port on LHC OPN
  • 200 Gb Mellanox
    • Used at 40Gb on cluster network
  • ILO5 (management) with dedicated network port
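
A minimal sketch of the ZFS layout, assuming the pool is called "arc" and the NVMe devices are nvme0n1 through nvme7n1 (pool name and device names are assumptions; ashift, recordsize and compression settings are the ones listed above):

# striped pool ("RAID0") over the eight NVMe drives
zpool create -o ashift=12 arc nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1 nvme6n1 nvme7n1
# two datasets: cache without compression, session with lz4
zfs create -o recordsize=1M -o compression=off arc/cache
zfs create -o recordsize=1M -o compression=lz4 arc/session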

HPC2N notes