Talk:DCache Pool installation

Automatic tarpool machine reboots

Goal

Automate reboots, reducing the manual work/coordination needed by site admins and central operators.

Items to consider:

  • Sites need to be able to signal that a reboot is required, whether the trigger is a cron job or a manual action such as a firmware upgrade.
    • Should we have only one kind of reboot? How do we handle firmware upgrades, which might be slower, or hardware service interventions?
  • Central operations needs to be able to detect that a reboot is wanted.
  • There should be a machine-detectable mechanism to only do reboots when site personnel are present (i.e. office hours).
  • It is assumed that central operations shuts down dCache etc. and then requests that the machine reboot be performed.
  • Should this care/cater for people logged in to the machine? I.e. a site admin doing admin stuff...
  • How to handle site Icinga alerts etc.?
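
To make the items above concrete for discussion, here is a minimal sketch of the decision central operations could make before asking for a reboot. Everything in it is an assumption and not an agreed design: the function name reboot_decision, the default staffed hours (09:00 to 16:00, Monday to Friday) and the idea that the site's reboot request and the list of logged-in users are passed in as plain values. How the site signals the request (flag file, service, etc.) is deliberately left open.

```python
from datetime import datetime, time

# Hypothetical defaults; each site would publish its own staffed hours.
DEFAULT_START = time(9, 0)
DEFAULT_END = time(16, 0)
DEFAULT_WORKDAYS = frozenset({0, 1, 2, 3, 4})  # Monday=0 .. Friday=4


def staffed(now, start=DEFAULT_START, end=DEFAULT_END, workdays=DEFAULT_WORKDAYS):
    """Return True if 'now' falls inside the site's declared staffed hours."""
    return now.weekday() in workdays and start <= now.time() < end


def reboot_decision(now, reboot_requested, interactive_users):
    """Return (ok, reason) for a proposed reboot.

    now               -- datetime in the site's local time
    reboot_requested  -- True if the site has signalled that a reboot is wanted
    interactive_users -- names of users currently logged in (may be empty)
    """
    if not reboot_requested:
        return False, "no reboot requested"
    if not staffed(now):
        return False, "outside staffed hours"
    if interactive_users:
        return False, "users logged in: " + ", ".join(sorted(interactive_users))
    return True, "ok"


if __name__ == "__main__":
    # Wednesday morning, reboot requested, nobody logged in.
    print(reboot_decision(datetime(2024, 1, 10, 10, 0), True, []))
```

Returning a reason together with the verdict would let central operations log why a reboot was postponed. Whether logged-in users should block a reboot at all is one of the open questions above.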

Proposal

Updates to requirements

Ssh logins with public keys allowed, preferably from the world. An authorized_keys file for the dCache user will be provided.

We should restrict access to specific hosts/networks rather than requiring access from the world.

Suggestion of changes:

  • List which networks to allow access from (NDGF ipv4 and ipv6)
  • Replace "will be provided" with "will be provided by central operators"
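
As an illustration of what a network restriction could look like, the sketch below checks a source address against an allow list and builds an authorized_keys line using the OpenSSH from="pattern-list" option, which accepts CIDR notation. The networks shown are documentation placeholder ranges, not the real NDGF networks, and the function names and the key string are made up for the example.

```python
import ipaddress

# Placeholder documentation ranges, NOT the real NDGF networks.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def source_allowed(addr, networks=ALLOWED_NETWORKS):
    """Return True if addr (IPv4 or IPv6 string) is inside an allowed network."""
    ip = ipaddress.ip_address(addr)
    # Compare only against networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in networks)


def authorized_keys_line(pubkey, networks=ALLOWED_NETWORKS):
    """Prefix a public key with a from= option listing the allowed networks."""
    patterns = ",".join(str(net) for net in networks)
    return 'from="{}" {}'.format(patterns, pubkey)


if __name__ == "__main__":
    print(source_allowed("192.0.2.17"))
    # "KEYDATA" stands in for the key material central operators would supply.
    print(authorized_keys_line("ssh-ed25519 KEYDATA dcache"))
```

Sites that filter in a firewall instead of in authorized_keys would only need the list of networks itself, which is why the requirement text should state it explicitly.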

Symlink or copy the ~/config/92-dcache.conf file to /etc/security/limits.d/

This should document the actual requirement, and tarpool ansible should just verify that the requirements are fulfilled.

Rationale: Some sites use configuration management systems (e.g. Puppet, Ansible), and automatically copying files from an unprivileged user to a system directory to keep settings up to date shouldn't be a requirement.
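
A verify-only check along these lines could be sketched as follows. It reads the resource limits of the calling process (so it would need to run as the dCache user) and compares them with required minimums. The required values are not stated on this page, so the figure in the example is purely hypothetical, as are the function names; the actual tarpool ansible check may look quite different.

```python
import resource


def limit_ok(current, required):
    """Return True if a current limit value satisfies the required minimum."""
    if current == resource.RLIM_INFINITY:
        return True  # unlimited always satisfies a finite requirement
    return current >= required


def check_limits(required):
    """Check soft limits of the calling process.

    required maps names from the resource module (e.g. "RLIMIT_NOFILE")
    to the minimum soft limit wanted. Returns a dict of name -> bool.
    """
    results = {}
    for name, minimum in required.items():
        soft, _hard = resource.getrlimit(getattr(resource, name))
        results[name] = limit_ok(soft, minimum)
    return results


if __name__ == "__main__":
    # Hypothetical requirement; the real values belong in the documentation.
    print(check_limits({"RLIMIT_NOFILE": 65536}))
```

With a check like this, sites are free to set the limits through whatever configuration management they already use.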

Tape tarpools

Document handover point from site ops to tarpool ops.

Suggestion from AHM 2019-1: Site operators manage ENDIT install/upgrade and endit.conf, and communicate the location of the ENDIT logs to central operators.