Nordic accounting workshop 2013


Time: 2013-11-04 09:00-16:00
Location: Radisson Blu SkyCity, Arlanda (Stockholm Airport), Sweden
Registration deadline: 2013-10-28


Consolidated and consistent accounting within the Nordic computing centres is currently considered one of the top-priority topics within Nordic e-infrastructure.

Current lack of transparent resource utilisation information prevents further planning and optimisation of resource acquisition, as well as assessment of efficiency of used solutions. (Oxana Smirnova in the NeIC LinkedIn discussion group)

NeIC, as facilitator of Nordic solutions in areas of importance to Nordic researchers, is therefore arranging a Nordic workshop on accounting and would like to invite everyone interested to be part of a technical discussion in this important area.

The goals of the Nordic accounting workshop 2013 are: to assess the status of accounting within the Nordics; to initiate knowledge exchange and discussions on interoperability and comparison to other available international solutions (within e.g. PRACE and EGI) while considering user group needs and use cases; to collect some first input on the road map discussions for the SGAS project; and to brainstorm on the technical implications of Nordic resource sharing (technical adoption when applying for each other's resources, possible billing models, usage of the same or comparable formats based on UR standards). Besides the SGAS project, NeIC would like to encourage further Nordic collaboration projects on accounting. This workshop should be used to discuss new project ideas at any stage of maturity and to shape them towards concrete proposals. Projects should have a 6-36 month duration, involve two or preferably more Nordic countries, and lead to concrete, measurable e-infrastructure improvements in the Nordic accounting landscape.


Suggested Agenda

Intended timeframe: starting at 9:00, closing at 16:00.

Please feel free to add suggestions for the agenda below:

  • Presenting NeIC
  • Assessment of the status in the different Nordic countries:
- Each country represents itself; NeIC will provide some questions to be considered in advance (how to apply for applications, system information, interoperability considerations, ...)
  • The SGAS project: presentation and a first roadmap discussion
  • Broadening the view: Looking at accounting solutions used within PRACE, EGI and EUDAT
  • User group needs: (BILS, NCoE)
  • Brainstorming session: Nordic resource sharing needs on accounting
  • Accounting of storage?
  • Possible Nordic collaboration projects in accounting
  • The NorStore Accounting Solution

Final Agenda

Before 9:00: Breakfast buffet and coffee

09:00: Welcome and Introduction to NeIC project handling with the example of the SGAS maintenance project (Michaela Barth)
09:15: SGAS: Current status and road-map discussion items (Magnus Jonsson)
09:30: A Nordic view on SAMS (Björn Torkelsson)
09:45: SUPR: SNIC User and Project Repository (Daniel Nilsson)
10:00: Accounting in Finnish Scientific Computing (Kalle Happonen)
10:25: Norwegian requirements on SGAS (Hanne Moa)
10:30: Workshop session: SGAS Road-map discussion

Please collect and prepare your needs in advance, in as much detail and as clearly as possible, so we can concentrate on the prioritisation.

11:30: Lunch.

12:45: APEL and EGI accounting (Michaela Barth for John Gordon)
13:00: Standardisation of Usage Record Formats (Jon Kerr Nilsen)
13:30: User group needs and cross-border resource allocation use cases (Michaela Barth)

13:15: Workshop session: Technical accounting solutions to enable Nordic resource sharing

14:30: Coffee

15:00: Workshop session: Further possible accounting projects
15:45: Summary (Michaela Barth)
16:00: Meeting Closed

Registration and Participants

Please register by email to the NeIC Generic Area Coordinator Michaela Barth or add yourself to the list below.

Registration ends on 2013-10-28.

The workshop includes lunch and coffee, so please state any special dietary requirements.


Name               Organisation        Country  Dietary Needs
Michaela Barth     NeIC                Sweden   -
Erik Edelmann      NeIC / CSC          Finland  -
Magnus Jonsson     NeIC / HPC2N        Sweden   !nuts, !almonds, !mushrooms, !shellfish
Bjørn-Helge Mevik  UiO                 Norway   -
Hanne Moa          UNINETT Sigma       Norway   !hazelnuts, !mushrooms, !mussels
Kalle Happonen     CSC                 Finland  -
Björn Torkelsson   HPC2N               Sweden   -
Daniel Nilsson     SUPR/C3SE Chalmers  Sweden   -
Mats Nylén         HPC2N               Sweden   no food
Jon Kerr Nilsen    NeIC / UiO          Norway   -

Minutes

Welcome and Introduction to NeIC project handling with the example of the SGAS maintenance project (Michaela Barth)

(see attached slides: File:131104-caela-NeIC_project-handling.pptx)

SGAS: Current status and road-map discussion items (Magnus Jonsson)

(see attached slides: File:Magnus_Jonsson_NeIC_Sgas_Arlanda_2014-11-04.pdf)

Comments: Could we use GitHub to make it a more open project? There is a Trac site for SGAS (with trouble tickets and everything), but it is currently not used.

A Nordic view on SAMS (Björn Torkelsson)

(see attached slides: File:Bjorn_Torkelsson-SAMS_-_NeIC_Workshop_20131104_v2.pdf)

Comments: SAMS is the SNIC Accounting and Metrics Service and is intended as middleware for collecting statistics from all SNIC centres. Manpower is 8 PMs for 2 years from 2012 (an extension for 2014-2016 has been applied for). All clusters within SNIC report to SAMS, and work is ongoing to also collect storage statistics from SweStore. So far only usage accounting is done, but it is intended to be a full-scale up-time reporting tool including availability and maintenance info. One of the questions is how to define availability. Could GOCDB be of interest for SAMS? It is not yet decided what tools to use.

SUPR: SNIC User and Project Repository (Daniel Nilsson)

(see attached slides: File:NeIC_Accounting_Workshop_20131104.pdf)

Comments: SUPR is a user and project database including a new application system. Old large allocation project data has been imported and is accessible. SUPR and SAMS are in contact with each other about e.g. reporting views.

How does the project membership management functionality that the PI has compare to the Resource Entitlement Management System REMS? It is lower level and just for one project. REMS or something similar could possibly be interesting for the next, higher step.

The many similarities to the Norwegian solution, where MAS is also used for user management, were remarked on. The requirements concerning collecting statistics, accounting and project management are the same. These systems don't have to run together, but they could be developed together.

Accounting in Finnish Scientific Computing (Kalle Happonen)

(see attached slides: File:2013-11-04-NEIC-accountign.pdf)

Comments: Integrating many resources into one system is less of a problem in Finland. There is a new accounting system called Reppu, integrated with SLURM. CSC is already accounting for software. Some effort is going into streamlining the open stack applications. CSC just has to take care of user/group account names, not certificates. For the Grid, FGI is not accounted against CSC usage; here SGAS is used instead. For PRACE, SLURM data is harvested and pushed into the PRACE accounting system. How do we define what a job is? More detailed time reporting is wanted, trying to get at efficiency (wall time vs. CPU time).
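
The "wall time vs. CPU time" efficiency mentioned above can be illustrated with a minimal sketch. The definition used here (CPU time divided by wall time times allocated cores) is a common convention and an assumption; the minutes do not say which definition CSC uses, and the function name and example job are hypothetical.

```python
def cpu_efficiency(cpu_seconds: float, wall_seconds: float, cores: int = 1) -> float:
    """Fraction of the allocated core time that was actually spent on the CPU.

    Assumed definition (not taken from the minutes):
        efficiency = cpu_seconds / (wall_seconds * cores)
    """
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return cpu_seconds / (wall_seconds * cores)


# Hypothetical job: 16 cores allocated for one hour, 12 core-hours of CPU time used.
efficiency = cpu_efficiency(cpu_seconds=12 * 3600, wall_seconds=3600, cores=16)
print(f"{efficiency:.0%}")  # 75%
```

Whatever definition is chosen, it only gives comparable numbers across sites if "job", wall time and core count are defined the same way everywhere, which is the open question raised in the discussion.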

Norwegian requirements on SGAS (Hanne Moa)

Comments: MAS compares to SUPR and SAMS. It is taking care of the SGAS server, project membership and statistics. It is written in Python as well and also uses a PostgreSQL Database. Also the same distributed nature of the system: The Norwegian sites included are Bergen, Trondheim, Tromsø, Oslo. SGAS is also used to account for storage records taken from NorStore.

Until this year everybody used different systems to report on accounting; this was consolidated so that all sites and systems now use the same accounting solution. Schedulers/batch systems used are SLURM and MAUI. Most systems use MAUI. For SLURM you have to decide between usage limits OR fair share (it can't do both). This is part of the reason why UiO started using Gold for usage limits (the other part being that the rest of the sites were already using Gold). Only Oslo uses SLURM, but it doesn't use the SGAS SLURM plugin. The Gold module was initially developed for MAUI, but it is not developed any more. Its big advantage is that it provides you with an interface to the queuing system. SGAS and Gold conflict locally on a machine. An SGAS Gold module (SGAS plugin for Gold) has been written, so Gold can report directly to SGAS. This ensures that accounting in SGAS will be exactly the same as the accounting on the site, so Notur centrally and users locally see the same numbers. Gold is perhaps a dead end, but at least it is now the same thing everywhere.

Hanne is the (one and only) developer for MAS.

What is needed from SGAS: UNINETT Sigma needs an easy way for project leaders and members to see clearly what has been spent. In SUPR, splitting on user level is missing. Users tend to spend their allocations too late. (It was remarked that users spending their allocations too late is rather a policy problem. Umeå for example is running fair share. There are more than 2000 enabled accounts in SUPR.)

MAS also allows local systems and local users, and gives out usernames and UIDs, so it is coordinating user IDs as well! Different university policies make it difficult for one user to have the same UID on each system. Tromsø has only users in MAS, no local users. Most places use LDAP locally, so MAS is developing LDAP integration (planned for next year). MAS does not yet handle groups. FEIDE-ID (only one per person!) is used for identification; some institutions don't have FEIDE-ID, so IDs that look the same (email address) are used.

Discussion on federated identity: Interest in Project Moonshot? In SWAMID, IDs are only unique on site level. The institutions are responsible for the level of assurance of identities; SWAMID-2 needs photo identity. It would be possible for all to go one level higher: to support all of Kalmar2 instead of just FEIDE-ID.

Workshop session: SGAS Road-map discussion

(For the outcome, see below.) A follow-up discussion will be held at a NeIC workshop collocated with the EGI Community Forum in May in Helsinki.

Comments:

  • Keeping the core part bug free and working!
  • Problem with long running, large jobs: overflowing integers in the database (it is storing seconds), on the server side
  • Extend the reporting views:
    • Better log messages with level of severity (so they could be fed into Nagios)
      • There is a monitoring service in the SGAS server, a Nagios plugin exists!
    • Norway is not using SGAS reporting views; MAS is doing its own reporting views.
    • In a separate module, an interface to easily create reporting views.
      • This would also enable not running the reporting on the same machine, but on a separate one.
    • JSON API: anything could be put on top of it.
      • SGAS: custom query interface for SUPR with JSON: we have this in SAMS, not in the SGAS core code, but run on the SGAS server, not in any released version. Keep this separate from SGAS? One collector, one querier? --> Migrate SAMS custom query interface into mainstream SGAS
  • Migrate NeIC APEL reporting into mainstream SGAS: might be a requirement, for future European collaborations.
    • We actually have a script for SSM2 but it is still untested.
  • Not easy to make SGAS plugins. Need for LRMS plugin?
  • New feature for correcting / pulling back bad records or at least remove and resend them.
    • on the server side: delete everything from this resource from that date on
    • Currently the first record wins (no duplicate)
  • Time-zone support: the server just takes records from the clients as they are. Timestamps come without time zone ("naive date times"). Should add timestamps to local job ID. Would be easier to enforce UTC everywhere. UTC everywhere as a NeIC project? :-)
  • Having a NeIC repository with all projects having GitHub?
  • The open license is explicitly mentioned as desirable in the funding agreement contract. SGAS is under the Apache license.
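
The time-zone item above can be illustrated with a small sketch of why naive timestamps are a problem and what "UTC everywhere" would look like on the client side. This is not SGAS code: the helper `to_utc` and the fixed per-site UTC offsets are assumptions made for illustration (a real client would also need to handle daylight saving time).

```python
from datetime import datetime, timedelta, timezone


def to_utc(naive_local: datetime, utc_offset_hours: int) -> datetime:
    """Attach a site's (assumed fixed) UTC offset to a naive timestamp and convert to UTC."""
    if naive_local.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=local_tz).astimezone(timezone.utc)


# The same instant as written down by two hypothetical sites in November:
stockholm = datetime(2013, 11, 4, 10, 0)  # local time, UTC+1
helsinki = datetime(2013, 11, 4, 11, 0)   # local time, UTC+2

print(stockholm == helsinki)                        # False: naive values disagree
print(to_utc(stockholm, 1) == to_utc(helsinki, 2))  # True: both are 09:00 UTC
```

Converting on the client before sending keeps the server simple, since it can continue to store whatever it receives.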

APEL and EGI accounting (Michaela Barth for John Gordon)

(see attached slides File:APEL-NeIC-041113.pptx)

Comments: (see also Michaela's comments as attached slides File:131104-caela-accounting-considerations.pptx). APEL regional servers did not exist when SGAS was started. APEL is not the only reason we do reporting; we will also have to report separately to other levels, e.g. NeIC, SNIC, etc. So it makes sense to continue to use SGAS. It is interesting to use APEL for any VO that requires this to be done, e.g. particle physics.

Standardisation of Usage Record Formats (Jon Kerr Nilsen)

(see attached slides File:Standardization_of_UR_Formats_-_NeIC_Accounting_Workshop_Nov_2013.pdf)

Comments: For SGAS it makes sense to upgrade to the newest version of StAR first, while waiting for OGF UR2 to be implemented.

User group needs and cross-border resource allocation use cases (Michaela Barth)

(see attached slides File:131104-caela-cross-border-resource-allocation.pptx)

Comments: Michaela will meet in person with NCoE users at the end of this month to extract better-defined use cases.

Workshop session: Technical accounting solutions to enable Nordic resource sharing

Comments: Some discussion on the experiences with the NDGF Pilot project. We really need clearly defined use cases. Accounting should not be a big technical problem.

Workshop session: Further possible accounting projects

Discussed project ideas:

  • Support Federated Identity throughout
    • Participation in the MoonShot project (start with asking Leif Nixon)
  • What is common in SUPR and MAS
    • They will stay in contact now!
  • Common repository on NeIC level
  • Collaboration on open application stacks
    • package them differently (e.g. for Cloud)
  • Collaboration on Advanced Support: Programming, Performance Analysis
  • What software is actually used?
    • Monitoring, on resource and project level
  • Monitoring in General (Nagios, GOCDB)
  • Cloud accounting in SGAS
  • UR2 in SGAS
    • Deciding on a Profile
  • more Job details
    • get it out from the batch system

Summary (Michaela Barth)

Thank you for the meeting!

SGAS Road-map

Common

  • (M1) Adding/Managing Bug trac.
  • (M1) Plugin store (also collecting unsupported, community-provided plugins)
  • user group meetings together with EGI community forum (19-23 May)
  • New release in January 2014
  • (M1) Setup "new" source code repository (GitHub?)

Bugs

  • (M1) Problem with long running, large jobs
  • (M1) UTF-multi byte chars (are currently rejected).
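
The long-running, large job problem (overflowing integers when storing seconds) can be shown with simple arithmetic. The minutes do not say which column or integer width is affected; the sketch below assumes a signed 32-bit integer holding core-seconds, and the job size is hypothetical.

```python
# Assumption: the affected field is a signed 32-bit integer (not stated in the minutes).
INT32_MAX = 2**31 - 1  # 2147483647


def core_seconds(cores: int, wall_seconds: int) -> int:
    """Total charged time for a job, in core-seconds."""
    return cores * wall_seconds


def fits_int32(value: int) -> bool:
    """True if the value can be stored in a signed 32-bit integer."""
    return -(2**31) <= value <= INT32_MAX


# Hypothetical job: 4096 cores running for 7 days.
usage = core_seconds(cores=4096, wall_seconds=7 * 24 * 3600)
print(usage)              # 2477260800
print(fits_int32(usage))  # False
```

Under this assumption a single wide, week-long job already exceeds the limit, so a 64-bit field would be one way to remove the problem.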

Improvements

  • (M2) Improve error reporting from SGAS
    • BART does not exit with an error code if something goes wrong
    • Level of log status
  • Extend the reporting views
    • separate module(s)?
    • removing existing views???
  • (M2) Update current storage implementation from StAR draft to 1.2 final.
  • (M2) Refactor parts of the code for easier maintenance
  • BART
    • (M2) Client side verification (Add verification step into bart-reporter & storage-reporter)
      • bad files
      • empty files
      • bad records within files
      • Missing hostname?
      • tool for verifying StAR records.
    • Change bart archive directory
    • Lots of files problem
    • Support of LRMS? Slurm, Moab, Gold
      • Not supported, community provided plugin.
  • make idtimestamp default!
  • Cloud accounting (UR-2)
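
The client-side verification step listed under BART (bad files, empty files, bad records, missing hostname) could look roughly like the sketch below. It is not BART code: the function name is hypothetical, and the record layout (`JobUsageRecord` elements with a `MachineName` child, no XML namespaces) is a simplified stand-in for the real usage record schema.

```python
import xml.etree.ElementTree as ET


def verify_record_file(content: str) -> list[str]:
    """Return a list of problems found; an empty list means the file passed.

    Assumes a simplified, namespace-free layout for illustration only.
    """
    if not content.strip():
        return ["empty file"]
    try:
        root = ET.fromstring(content)
    except ET.ParseError as exc:
        return [f"bad file: {exc}"]
    problems = []
    for index, record in enumerate(root.iter("JobUsageRecord")):
        host = record.findtext("MachineName", default="").strip()
        if not host:
            problems.append(f"record {index}: missing hostname")
    return problems


good = "<UsageRecords><JobUsageRecord><MachineName>node1</MachineName></JobUsageRecord></UsageRecords>"
no_host = "<UsageRecords><JobUsageRecord><MachineName/></JobUsageRecord></UsageRecords>"

print(verify_record_file(good))     # []
print(verify_record_file(no_host))  # ['record 0: missing hostname']
print(verify_record_file(""))       # ['empty file']
```

Returning a list of problems rather than raising on the first one lets a reporter log everything that is wrong with a file and then exit with a non-zero code, which also addresses the error-reporting item above.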

Extension

  • Migrate SAMS custom query interface into mainstream SGAS
  • Migrate NeIC APEL reporting into mainstream SGAS
    • SSM2?
  • (M2) nagios plugins
  • Command for removing records for a cluster in a safe way
  • Host scale factors.
    • Put WLCG views into separate module??
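
The "first record wins" behaviour and the requested safe removal ("delete everything from this resource from that date on") can be sketched with an in-memory model. This is not the SGAS database or API: the class, its method names and the dry-run default are assumptions made to illustrate one reading of "in a safe way".

```python
from datetime import date


class RecordStore:
    """Toy in-memory stand-in for an accounting server (hypothetical)."""

    def __init__(self) -> None:
        self._records: dict[str, tuple[str, date]] = {}  # record_id -> (resource, end_date)

    def insert(self, record_id: str, resource: str, end_date: date) -> bool:
        """First record wins: a duplicate ID is ignored and False is returned."""
        if record_id in self._records:
            return False
        self._records[record_id] = (resource, end_date)
        return True

    def remove_from(self, resource: str, start: date, dry_run: bool = True) -> list[str]:
        """List (and, unless dry_run, delete) records of a resource ending on or after start."""
        doomed = sorted(
            rid for rid, (res, end) in self._records.items()
            if res == resource and end >= start
        )
        if not dry_run:
            for rid in doomed:
                del self._records[rid]
        return doomed


store = RecordStore()
store.insert("a1", "cluster-a", date(2013, 10, 1))
store.insert("a2", "cluster-a", date(2013, 11, 1))
store.insert("b1", "cluster-b", date(2013, 11, 1))
print(store.insert("a2", "cluster-a", date(2013, 11, 2)))                 # False (duplicate)
print(store.remove_from("cluster-a", date(2013, 10, 15)))                 # ['a2'] (dry run)
print(store.remove_from("cluster-a", date(2013, 10, 15), dry_run=False))  # ['a2'] (deleted)
print(store.insert("a2", "cluster-a", date(2013, 11, 2)))                 # True (resent)
```

Because duplicates are silently dropped, a corrected record can only be accepted after the bad one has been removed, which is why removal and resending belong together.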

Attachments