NT1/SG-Meeting-2020-05-20
NT1 meeting 2020-05-20, 09:30 - 11:30
Online: for the link, see the nt1-sg chat
Invited: Mattias Wadenstein, HPC2N; Oxana Smirnova, HEP, LU; Michaela Barth, PDC-HPC
Agenda
- Presence and approval of agenda
- Report outcome NT1 update to NeIC board
- Report highlights
- Staff changes
- Open action points
- NORDUnet network contract followup
- EGI membership and potential consequences of Sweden leaving
- Onboarding new communities
- Ideal manpower and fall budget
- NT1 openness considerations
- AOB
Minutes (approved)
Presence and approval of agenda
Mattias, Michaela, Oxana
Decision: We have quorum. The agenda is approved.
Report outcome NT1 update to NeIC board
Citing https://wiki.neic.no/wiki/Category:Board_meeting_minutes
Presentation given: https://www.dropbox.com/s/pyf2cku5l9n2hma/20200418-NT1BoardUpdate-1.pdf?dl=0
20-07.Nordic Tier-1 Status Report
The NT1 Project Manager and Project Owner presented the NT1 status report. Please find the presentation here. They noted the need for sustainability and sufficient levels of FTEs allocated to the activity. They also presented plans for onboarding new communities and CMS integration, which the Board commended. In response to questions from the Board, the Project Manager said that scaling up to HL-LHC demands would probably not increase manpower needs. Future funding for NT1 will be handled in the normal budgeting process when the 2021 budget is prepared in the fall. Final approval will be given by the Board at its December meeting. The Director noted that NT1 should also be part of the 2023 long term funding plan.
The Board received the NT1 Status Report.
Not in the minutes but discussed:
- What are our options if ARC and dCache stop getting funded? There are other computing and storage elements, but they are worse fits for our existing environment (WLCG computing at scientific computing centers, not dedicated resources in a physics institute), so a switch would come at a cost.
Since then:
- Presentation and actual status update opened up to staff at https://indico.neic.no/event/134/contributions/498/
- Collaboration Agreement (CA) on NT1 signed: It will be attached (together with Expressions of Intent from other communities) to the Research Council of Norway (RCN) funding proposal to show NeIC's long-term plans; a tentative proposal is due already on the 25th (to ask for permission to submit the actual proposal in the fall).
Upcoming: Discussion here and with NLCG on manpower to prepare for 2021 budget discussion.
Report highlights
- more ALICE disk in Bergen
- CPU in Oslo now running in production
- ALICE disk: very noticeable that 2020 pledges have not been reached yet as of 1st of April
- the report is backward-looking, though: Q1 is still about 2019 pledges
- ALICE disk: what are the plans for next procurements?
- ALICE disk used bar in DK low due to maintenance
- DK pools getting re-installed
- Writing to tape is slowing down because ALICE and ATLAS are reaching their capacity limits. Region-aware dCache will solve a different problem: reading from tape should speed up.
- ATLAS tape carousel outcome: we are one of the slower sites because of Oslo and NSC
- We should do fine with the suggested adaptations for Run 3 (starting next year)
- Run 4 is scheduled for 2026/2027 and needs a lot more: going from 1 to 8 exabytes
- ALICE Tape: Norway and Finland far behind pledges, no compensation negotiated yet
- ATLAS disk still saved by Sweden’s good procurement in the past
- Suggestion for DK to prioritize ALICE CPU as long as NO covers ATLAS so well.
Availability is not exactly 100%, only rounded to it; the threshold for green is 98%.
Other Board reporting colours: Yellow, but we could nominally shift Swedish ATLAS tape and disk resources to ALICE and be green. However, during the last run of the LHC the heavy ion campaign got de-prioritized, which means that the underlying storage needs of ALICE are proportionally lower. This shift happened after experiment resource requests and site pledges were already finalized. As a result, while we are formally delivering below pledged levels, ALICE is not as unsatisfied with our site resource delivery as the traffic light reporting would indicate.
Cost goal on target if recruitment goes through in fall, otherwise we will be underspending.
Staff changes
The Steering group read through the open position documents in preparation.
The Storage Developer one is pretty much ready to go, apart from putting in a newer version of the first-page blurb and an updated receiving email address. Mattias will also consult the leads of the dCache.org team for comments.
Ideally we want to publish both in one go. Timewise: publish at least one by the end of next week.
Open action points
New:
- AP Mattias to bring up NeIC openness policy and indico event restrictions in one of the next staff weeklies.
- AP Michaela to bring up NeIC openness policy and indico event restrictions during the NLCG meeting.
- AP Michaela to create a mailing list for receiving open position applications
- AP Mattias to run the dCache open position past the dCache team
- AP Mattias to make Kine aware of our open position announcement timeline
- AP Oxana to find a link to WLCG’s strategy "Evolution of Scientific Computing" (might have to wait until after the review)
- AP Oxana to provide the first full draft of the project plan by June 11.
https://docs.google.com/document/d/1iyTaHZGua3hkjd_2YYxrYOKF8nn6BzmH6BenGPkh6xw/edit#
Old:
- AP Mattias: Make sure the banner (WLCG workshop) gets transferred (obsolete until another physical workshop happens, up to the organising committee)
- AP Mattias: Start again from scratch and think what he wants to go in there. (done)
- AP Mattias to draft the Job announcement for the dCache storage and development at 75% engagement (1 out of 2: in progress)
- AP Michaela to clean up her first thoughts until end of November (done)
- AP Mattias to sketch NT1-update content still in December (done)
- AP Oxana to go through onboarding document
- discussion during AHM + 50000 NOK in the budget for setting up the testbed
- https://drive.google.com/drive/folders/1dIroErIRQreOuqiD33Zn0mpLYO8JccrI
- AP Mattias to create Tier-1 pledged hardware tracking website (no update)
- AP Oxana: Advertising pamphlet (no update)
- AP Mattias to make the operational procedures public again and up to date so they can be referred to. (no update)
NORDUnet network contract followup
We have to confirm that NORDUnet adheres to the new guidelines for the billing details.
EGI membership and potential consequences of Sweden leaving
SNIC no longer plans to pay for EGI membership and we have no good arguments for keeping it.
CSC in Finland: High energy physics program is no longer paying for it.
Oxana and Mattias did some impact assessment and we should be fine without it.
GGUS and accounting are not critical services.
WLCG is a customer of these services, not NT1.
GOCDB is different: it is used for announcing downtimes, and was supposed to be replaced by CRIC. CRIC has already been used for accounting since February.
The three EGI services that WLCG uses (GGUS, GOCDB, APEL) are run at WLCG Tier-1 sites and have long-term sustainability beyond EGI funding.
Are the fees EGI charges proportionate to the services they provide and the share of them we use?
Security responsibility: EOSC is expected to gradually take over.
Estonia is part of NeIC now, and they are still a member within EGI.
NeIC won’t pay for EGI membership. (It is worth 0.5 FTE.)
Onboarding new communities
Idea was presented to XT yesterday. Will bring it up again in two weeks.
- https://wiki.neic.no/int/XT_meeting_2020-05-19#NT1_onboarding_of_new_communities_.28Michaela.29
- especially open canvas: https://docs.google.com/presentation/d/1RNgeYC7EizOVq6fDfj2cJq5GX4gsqwIxVBxxWfvXJos/edit#
Instant feedback received:
- Has bringing a second domain into a domain-specific group been done before? Has this been successful in any other community?
- NT1 has been set up for a rather specific domain
- (We are already offering services towards two different CERN communities: ALICE and ATLAS, we want to concentrate mainly on storage)
- Most other Tier-1s have many other communities
- The XT noted that even if there are clear advantages in terms of synergies and saving money, there might still be obstacles on a political level. This should be taken into account and tracked as well (e.g. have a survey question along the lines of "Will you have to consider another option even if we are 100 times cheaper, due to external circumstances you can't influence?")
- Maybe better if we only focus on parts, like long-term data archives.
- According to their funding guidelines, the funding agencies (with the possible exception of RFI in Sweden) usually don't fund operations, only discovery and development projects; NT1 is a special exception. So we likely won't be able to request in-kind contributions to cover those additional communities; they would have to bear the full additional operational cost. Cost savings have to come from synergies and effective processes only.
- We should have a clear idea for the cost of operation. One suggestion was to start with one pilot community.
- ESS was added to the suggested list of communities. We expect to be rejected, though, because they have to have their resources in DK.
- XT wanted to know more about WLCG strategy
- Might still be work in progress at this stage
- Still particle physics but not WLCG
Discussion:
EISCAT_3D data archive: using our services should be cheaper for them than hiring their own dCache experts.
Code might have to be adjusted before running on our resources? Containerize and package.
IceCube already runs on our resources pro bono.
EISCAT_3D data archive might be the safest choice for the first pilot.
- AP Oxana to provide the first full draft of the project plan by June 11.
https://docs.google.com/document/d/1iyTaHZGua3hkjd_2YYxrYOKF8nn6BzmH6BenGPkh6xw/edit#
Ideal manpower and full budget
The project manager shared his Excel sheets. In-kind operational manpower should be kept separate.
NT1 openness considerations
Why are NT1 All Hands meetings restricted events on Indico? How does this relate to NeIC's openness policy?
Michaela and Kine will have a meeting with NordForsk’s data protection officer. The major concern was publishing a list of participants; this is off by default in our Indico, though. Only managers can see the list of participants.
One more question to the data protection officer: Is it fine to publish the list of participants, or do we need the participants' consent for that? (Normally you should be asked whether you want to be listed; this has to be tested. But it is off by default.)
A participant list is given for the first NDGF AHM event, but not for the following ones.
The bottleneck of sending open position applications to a single email address should also be discussed with the data protection officer.