Talk:DCache Pool installation
Automatic tarpool machine reboots
Goal
Automate reboots, reducing the manual work/coordination needed by site admins and central operators.
Items to consider:
- Sites (cron job, manual actions like firmware upgrades) need to be able to signal that a reboot needs to be performed
- Should we have only one kind of reboot? How should we handle firmware upgrades, which might be slower, or hardware service interventions?
- Central operations needs to be able to detect that a reboot is wanted.
- There should be a machine-detectable mechanism to ensure reboots are only done when site personnel are present (i.e. office hours).
- It is assumed that central operations shuts down dCache etc. and then requests a machine reboot once the shutdown is complete.
- Should this cater for people logged in to the machine, i.e. a site admin doing admin work?
- How should site Icinga alerts etc. be handled?
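As an illustration only of how the signalling and office-hours items above could fit together, here is a minimal Python sketch. Nothing in it is decided: the flag file path, the office hours and the function names are all hypothetical, and it ignores logged-in users, slow firmware upgrades and Icinga downtimes.

```python
from datetime import datetime, time
from pathlib import Path

# Hypothetical flag file that a site cron job or admin creates to request a reboot.
REBOOT_FLAG = Path("/var/run/tarpool-reboot-wanted")

# Assumed office hours; each site would publish its own.
OFFICE_START = time(9, 0)
OFFICE_END = time(16, 0)


def reboot_requested(flag: Path = REBOOT_FLAG) -> bool:
    """True if the site has signalled that a reboot is wanted."""
    return flag.exists()


def within_office_hours(now: datetime) -> bool:
    """True on Monday to Friday between OFFICE_START and OFFICE_END."""
    return now.weekday() < 5 and OFFICE_START <= now.time() < OFFICE_END


def may_reboot(now: datetime, flag: Path = REBOOT_FLAG) -> bool:
    """Central operations would proceed only when both conditions hold."""
    return reboot_requested(flag) and within_office_hours(now)
```

Central operations could poll something like `may_reboot(datetime.now())` before shutting down dCache. A plain flag file is used here only because it is easy for both cron jobs and manual interventions to create.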
Proposal
Updates to requirements
Ssh logins with public keys allowed, preferably from the world. An authorized_keys file for the dCache user will be provided.
We should require access only from specific hosts/networks, not from the world.
Suggestion of changes:
- List which networks to allow access from (NDGF ipv4 and ipv6)
- Replace "will be provided" with "will be provided by central operators"
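One way the network restriction could be expressed is with the OpenSSH `from=` option in authorized_keys. The Python sketch below builds such a line and checks whether a source address would be allowed. The networks are placeholders from the documentation ranges, not the real NDGF networks, and the function names are made up for this example.

```python
import ipaddress

# Placeholder networks (documentation ranges); the real NDGF IPv4 and IPv6
# networks are not listed here and would have to be filled in.
ALLOWED_NETWORKS = ["192.0.2.0/24", "2001:db8::/32"]


def restricted_key_line(public_key: str, networks=ALLOWED_NETWORKS) -> str:
    """Prefix a public key with a from= option listing the allowed networks."""
    # Parsing validates each entry; a malformed network raises ValueError.
    cidrs = [str(ipaddress.ip_network(n)) for n in networks]
    return 'from="{}" {}'.format(",".join(cidrs), public_key.strip())


def source_allowed(address: str, networks=ALLOWED_NETWORKS) -> bool:
    """True if the address falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_network(n) for n in networks)
```

Sites that prefer to enforce the restriction in a firewall instead could still use the same network list.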
Symlink or copy the ~/config/92-dcache.conf file to /etc/security/limits.d/
This should document the actual requirement, and tarpool ansible should just verify that the requirements are fulfilled.
Rationale: Some sites use configuration management systems (e.g. Puppet, Ansible), and automatically copying files from an unprivileged user to a system directory to keep settings up to date shouldn't be a requirement.
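A verify-only check along these lines could look like the Python sketch below, run as the dCache user. The required value is hypothetical (the real numbers are whatever 92-dcache.conf specifies), and only the open-files limit is checked as an example.

```python
import resource

# Hypothetical requirement; the real value comes from 92-dcache.conf.
REQUIRED_NOFILE = 65536


def limit_ok(current: int, required: int) -> bool:
    """True if a limit meets the requirement; an unlimited value always does."""
    return current == resource.RLIM_INFINITY or current >= required


def nofile_fulfilled(required: int = REQUIRED_NOFILE) -> bool:
    """Check the soft and hard open-file limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return limit_ok(soft, required) and limit_ok(hard, required)
```

The check only reports whether the requirement is met and changes nothing, so sites remain free to set the limits with whatever tooling they already use.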
Tape tarpools
Document handover point from site ops to tarpool ops.
Suggestion from AHM 2019-1: Site operators manage ENDIT install/upgrade and endit.conf, and communicate the location of the ENDIT logs to central operators.