Site Admin Operations basics

From neicext

Prerequisites for administrators of an NDGF-T1 site

  1. Get an eScience certificate. Sectigo is the first stop (DK, SE, FI, possibly NO). If your institution is not registered there, use an alternative such as NorduGrid or the CERN CA. Request a "GÉANT Personal Authentication - RSA" certificate.
  2. Access GOCDB, authenticate with your new certificate and apply for the Site Administrator role. NDGF will approve your request.
  3. Register in GGUS with your certificate and request the Support role. When stating the reason for the request, write that you are a member of the local staff and need to be able to answer tickets concerning your site.
  4. Wait. After approval, it takes several hours for GOCDB roles to propagate to the services.
  5. Subscribe to the NDGF-sysadmins mailing list. General information that NDGF distributes to the local staff is sent through this list.
  6. Optional but recommended: subscribe to the wlcg-arc-ce-discuss@cern.ch mailing list. It is the place to share information and discuss issues related to running ARC CE for WLCG; ARC developers and experiment representatives are present on the list. If you have a CERN account, subscribe through e-groups; if you don't, send an email to wlcg-middleware-officer@cern.ch.
  7. Get access to both the external and internal wiki by sending the DN of your certificate to urkedal@ndgf.org.
  8. NDGF day-to-day communication goes through Rocket.Chat on [1]. You are not required to be present there at all times (though it greatly reduces communication delays and is strongly encouraged), but on some occasions your presence is required, so make sure you are able to connect. See chat info (on the internal wiki, so complete the previous step first to get access).
  9. If you want to view your resources in NDGF's Ganglia/Nagios/Dashboard, send the DN of your certificate to support@ndgf.org to get access to the monitoring tools.
  10. If you want to access the SGAS (Accounting) views (https://count.ndgf.org:6143/sgas/view), send your DN to erik.edelmann@csc.fi.
  11. Once you have access to the internal wiki, register your contact details on this page so NDGF knows how to reach you if there is a problem at your site. In addition to your personal contacts, you can add the site's general contact points if they are not listed there yet. Also, do not hesitate to correct, remove or expand erroneous contact information for your site.
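
Steps 7, 9 and 10 ask for the DN (distinguished name) of your certificate. One way to print it is with the openssl command-line tool. The sketch below is a minimal helper, assuming your certificate is available either as a PEM file or as a PKCS#12 (.p12/.pfx) bundle exported from the browser; the function name print_dn and the file names are hypothetical, and the traditional slash-separated DN format is assumed to be what the recipients expect.

```shell
#!/bin/sh
# Hypothetical helper: print the subject DN of a grid user certificate in
# the slash-separated form (/DC=.../CN=...) used by many grid services.
# Assumption: the argument is either a PEM certificate or a PKCS#12 bundle
# (.p12/.pfx) exported from the browser.
print_dn() {
    cert="$1"
    case "$cert" in
        *.p12|*.pfx)
            # Extract only the client certificate (no private key);
            # openssl prompts for the export password on the terminal.
            openssl pkcs12 -in "$cert" -clcerts -nokeys |
                openssl x509 -noout -subject -nameopt compat
            ;;
        *)
            openssl x509 -in "$cert" -noout -subject -nameopt compat
            ;;
    esac | sed 's/^subject= *//'   # strip the "subject=" label
}
```

For example, print_dn ~/.globus/usercert.pem (a conventional location for grid user certificates; yours may differ) prints a single line starting with a slash, which is the string to send.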

Alternative way to install CA certificates

As an alternative to downloading and selectively installing CA certificates, you can run this script to install all of them at once. It works for both Firefox and Chrome. The origin of this script is unknown, so use it at your own risk. Making a backup of your browser profile before running it is advisable.

File:Grid-security-importer.sh.gz

Examples:

  • ./grid-security-importer.sh -v .mozilla/firefox/isnnsbc6.default
  • ./grid-security-importer.sh -C
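
Since the origin of the importer script is unknown, backing up the profile first is worth the effort. A minimal sketch of such a backup, assuming a Firefox profile directory like the one in the first example; the function name backup_profile and the timestamped backup naming are hypothetical choices, not part of the script. Closing the browser before copying is probably safest, so the profile is not being written to during the copy.

```shell
#!/bin/sh
# Hypothetical helper: copy a browser profile directory to a timestamped
# sibling directory before running grid-security-importer.sh on it.
# Prints the path of the backup on success; returns 1 if the profile
# directory does not exist or the copy fails.
backup_profile() {
    profile="${1%/}"                      # drop a trailing slash, if any
    if [ ! -d "$profile" ]; then
        echo "no such profile directory: $profile" >&2
        return 1
    fi
    backup="$profile.backup-$(date +%Y%m%d%H%M%S)"
    # -R copies recursively, -p preserves modes and timestamps
    cp -Rp "$profile" "$backup" || return 1
    echo "$backup"
}

# Example (profile path from the usage examples; yours will differ):
#   backup_profile .mozilla/firefox/isnnsbc6.default &&
#       ./grid-security-importer.sh -v .mozilla/firefox/isnnsbc6.default
```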

Local staff responsibilities

A formal document describing what NDGF requires from local site staff is still to come. In the meantime, these are our general expectations of the local staff:

  1. The site should have staff who are available during office hours and can answer correspondence in a timely manner, even during vacation periods.
  2. The staff should also have sufficient knowledge of the systems involved and the means necessary to fix problems.
  3. NDGF should be informed of changes in the site staff (in that case, remember to update the site contacts, as described in the prerequisites).

Operations

  1. You may want to check your resources in the NDGF monitoring tools from time to time to see if there are any problems. The most important are the dashboard, Ganglia, Nagios, the dCache monitors (only for sites with dCache storage) and the accounting statistics. The dCache monitors are open to everybody with a valid grid certificate loaded in the browser; for the others you have to apply for access, as described in the prerequisites.
  2. Every Friday there is a weekly meeting at 10:10 CET, mostly dedicated to discussing what has happened during the week and what will happen in the next. The meeting takes place in our chat and follows the discussion points outlined on a dedicated wiki page. To find the meeting page, go here (for 2017) and find the page for this week's Friday. Local staff are asked to update at least the 'Site report' section of the page before the meeting, so NDGF can track activities at the site. Write "Nothing to report" if nothing notable has happened at your site during the week. If you think some of this information needs to be discussed by NDGF staff during the meeting, make it bold. Also feel free to read, update, and bold any information in other sections if you need a subject to be discussed. Local site staff are of course more than welcome to participate in the meeting itself, but it is not required.
  3. Sometimes presence in the NDGF chat is required, usually during major upgrade rounds. In that case the local staff is notified in advance by e-mail, and we expect them to appear in the chat on the specified day and time.
  4. You may get notified when there is a ticket in the GGUS ticket system concerning your site. The notification can come either from NDGF's Operator-on-Duty (OoD) contacting you directly, or from GGUS when the ticket is assigned to your site, or directly to you, by the submitter or the OoD. In that case we expect you to communicate in the ticket directly (through the 'Public diary' field), letting us know that you are aware of the problem, are working on it, and how you intend to solve it. If the fix takes more than a couple of days, we also expect regular progress updates. When you reply to the ticket for the first time, feel free to change its status to 'In Progress' (if the OoD hasn't done so already); this notifies both us and the submitter that the problem is being addressed. Update the ticket when the problem is solved, so the OoD or submitter can check that the problem has disappeared and mark the ticket as solved.
  5. Staff should announce planned service interruptions to the OoD at least 24 hours in advance, by writing to support@ndgf.org or in the chat. The OoD may ask you to move the interruption to another day or time if it risks leaving the Nordic Tier-1 center as a whole under capacity. Planned interruptions must be registered as downtimes in GOCDB. Usually the OoD takes care of that, but you can create a downtime yourself if you prefer. The first time you use GOCDB, you may want to ask the OoD how to create a downtime there and which type it should be. Notifying the OoD is mandatory even if you create the downtime yourself.
  6. When NDGF requests software upgrades or configuration changes on the site, we expect them to be done within a week, unless otherwise specified in the request e-mail.
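
Item 5 requires planned interruptions to be announced at least 24 hours in advance. If you schedule maintenance with scripts, a check like the following can flag a too-short notice before you write to support@ndgf.org. This is a hypothetical helper (enough_notice is not an NDGF tool) and it assumes GNU date, whose -d option parses free-form times such as "2025-03-07 08:00 UTC" or "+2 days".

```shell
#!/bin/sh
# Hypothetical helper: succeed (status 0) if the planned interruption
# starts at least 24 hours from now, fail (1) otherwise, and return 2 if
# the start time cannot be parsed. Requires GNU date for the -d option.
enough_notice() {
    start=$(date -d "$1" +%s 2>/dev/null) || return 2
    now=$(date +%s)
    min_notice=$((24 * 60 * 60))          # 24 hours in seconds
    [ $((start - now)) -ge "$min_notice" ]
}

# Example:
#   if enough_notice "2025-03-07 08:00 UTC"; then
#       echo "at least 24 hours of notice"
#   else
#       echo "less than 24 hours of notice"
#   fi
```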

Site setup

If you need help setting up a site, please take a look at the documentation available in the wiki. If you are still unsure, join the chat; there will likely be someone around who can help you.