Nordic accounting workshop 2014
Time: 2014-10-16 lunch - 2014-10-17 lunch
Location: Blåsenhus, room 13:137, Blåsenhus campus
Registration deadline: 2014-10-08
Description
NeIC continues its bi-annual meetings to provide a forum for those interested in the state of accounting in the Nordic countries. These meetings serve as a platform for knowledge transfer and discussion, and also as a reference for the roadmap discussion of the ongoing NeIC SGAS maintenance project. Further Nordic projects related to accounting are highly encouraged.
Previous meetings in this series:
- Nordic accounting workshop session during the EGI Community Forum 2014 in Helsinki
- Nordic accounting workshop 2013
Upcoming meetings in this series:
- Workshop session (180 mins) on Nordic accounting on Tuesday 2015-05-05 in conjunction with the NeIC 2015 conference near Helsinki
Agenda
Room bookings: 16/10 12:45-18:00 and 17/10 8:00-12:00.
- Background and links for and during the discussion on Energy accounting:
- Presentation on energetic Fair Share held at the SLURM User Group Meeting, September 2014 in Lugano
- Yiannis Georgiou: Energy Accounting and Control on HPC clusters
- Publication on Energy Accounting and Control with SLURM
- https://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/vampirtrace, especially https://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/vampirtrace/vtplugincntr/index_html/document_view?set_language=en
- Here are some articles on power usage in HPC:
- Links during the Cloud accounting discussion:
- Link during the Brainstorming:
- EGI has an application accounting project, see https://indico.egi.eu/indico/contributionDisplay.py?sessionId=43&contribId=295&confId=1994
Time | Thursday, October 16 | Friday, October 17 |
---|---|---|
09:00 - 09:30 | | Status of reporting services at CSC, Marco Passerini |
09:30 - 10:00 | | Workshop discussion: Cloud accounting opportunities |
10:00 - 10:30 | | Coffee |
10:30 - 11:00 | | Workshop discussion: Accounting on GPU and accelerator level |
11:00 - 11:30 | | Brainstorming further accounting related projects within NeIC |
11:30 - 12:00 | | Next meeting at the NeIC2015, Michaela Barth |
12:00 - 12:50 | Lunch at Restaurang Feiroz i Blåsenhus | Lunch at Restaurang Feiroz i Blåsenhus (from 12:00) |
13:00 - 13:15 | Welcome and overview ongoing events and projects at NeIC, Michaela Barth | |
13:15 - 13:45 | Status recap | |
13:45 - 14:15 | Accounting on the NHPC Pilot Gardar, Anil Thapa | |
14:15 - 15:00 | SGAS project status and SGAS roadmap discussion, Magnus Jonsson | |
15:00 - 15:30 | Coffee | |
15:30 - 16:00 | GLUE, StAR and dCache; a random walk in storage accounting, Paul Millar | |
16:00 - 17:00 | Workshop discussion: Energy accounting | |
19:00 | Dinner at PepparPeppar | |
Remote connection
We will try to provide you with some remote connection on https://connect.sunet.se/nordicaccounting/ so you can listen in. No promises given, though.
You don't need to register if you only want to connect remotely.
If you have never attended an Adobe Connect meeting before:
- Test your connection: https://connect.sunet.se/common/help/en/support/meeting_test.htm
- Get a quick overview: http://www.adobe.com/products/adobeconnect.html
Registration and Participants
Please register by email to the NeIC Generic Area Coordinator Michaela Barth or add yourself to the list below.
Registration ends on 2014-10-08.
The workshop includes two lunches plus coffee, so please state any special dietary requirements.
Lunch place: Restaurang Feiroz i Blåsenhus
Name | Organisation | Country | Dietary Needs/just one lunch |
---|---|---|---|
Michaela Barth | NeIC | Sweden | - |
Magnus Jonsson | NeIC / HPC2N | Sweden | no nuts, no almonds, no mushrooms, no shellfish |
Erik Edelmann | NeIC / CSC | Finland | no shellfish |
Björn Torkelsson | SNIC / HPC2N | Sweden | - |
Jon Kerr Nilsen | NeIC / UIO | Norway | - |
Marco Passerini | CSC | Finland | - |
Hanne Moa | UNINETT | Norway | no hazelnuts, no blue mussels, no mushrooms |
Paul Millar | DESY | Germany | preferably veg. |
Dejan Vitlacil | NeIC | Sweden | - |
Daniel Nilsson | SUPR/C3SE | Sweden | - |
Anil Thapa | NHPC | Iceland | arrives after first lunch |
Dan Still | NeIC / CSC | Finland | lunch on Thursday |
Hotels
Please book your accommodation yourself.
- For inspiration, a nice preselection of hotels in Uppsala has been made at:
https://events.nordu.net/display/NORDU2014/Hotels
Alternatives:
Minutes
Presence
As registered.
Remote:
- Andreas Bach
- Ebba Hvannberg
- Ásgeir Ögmundarson
- John Gordon
- Gudmund Høst
Welcome and overview ongoing events and projects at NeIC, Michaela Barth
(Presentation according to slides)
SAMS, SUPR, Björn Torkelsson and Daniel Nilsson
- Data from SAMS is used to produce the half-year reports for SNIC. It would be nice to create these with a single click.
- The newly introduced possibility to create mailing lists within SUPR is now used for the application expert mailing list.
- SUPR: UPPNEX storage is integrated (just waiting for them to announce that users can create applications within their projects). Next would be SweStore: its data is currently collected in SAMS, but not yet published in SUPR.
- Aggregated usage data is collected each morning from the SGAS server into SUPR
- Future goals: aiming for general user agreement in order to have general account management
- Comparison to REMS (Tryggve is using REMS)?: SUPR doesn't know enough of REMS to be able to make a comparison
Accounting in Norway (Demo), Hanne Moa
- Decisions on how fine grained reports and graphical representations should be: Difference between management and real work oriented purposes
- There are different procedures for different resources.
- Generally MAS (www.metacenter.no) and SUPR are quite similar and the differences are small: e.g. instead of just having a PI, projects also have an "actual responsible person", the PhD student doing the real work.
- Report views also used by NorStore (e.g. Number of projects vs year).
- One main problem currently is getting NorduGrid certificates in time (the wait is 3-4 weeks). However, SGAS does not need an official root CA, so the plan is to try using own certificates instead, as in METADOC.
- Project usage vs time: is not easy to get out directly from SGAS, it has to be filtered first. It is calculated twice a day, fetching numbers from SGAS twice an hour.
- Inserted records per day: also used as test if SGAS is working
- Looking for another intern for next year (possibly a local student)
- Sigma2 will start from a blank sheet on 1 January 2015. All legal liabilities of the old Sigma will end. One of the things to be improved is the financing, which previously was not suitable for making decisions on storage solutions. In the new model Sigma2 will exist for at least 10 years, and after 5 years there will be a decision on whether it will exist for another 10 years. This finally creates a long-term perspective that is sufficient for buying storage. Sigma2 will no longer do Grid work.
- People in Norway want more GPU resources.
- Problems with the database and SGAS are expected by 2016 at the latest: there is a 15 GB limit with the PostgreSQL database, and the way backups are done needs to be changed (pg_dump can no longer be used).
- User mapping is done via a reserved UID pool, so the same person gets the same UID everywhere. Luckily UIDs are no longer as short as they used to be. Password hashes are shared.
- There is a separate system for applying for quotas. Currently the aim is to combine this old quota application system with MAS, while avoiding big changes in MAS.
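The "project usage vs time" and "inserted records per day" views mentioned above both come down to filtering and aggregating job records. A minimal sketch of that aggregation follows. The record keys (`project`, `end_time`, `insert_time`, `cores`, `wall_seconds`) and the sample data are assumptions made for illustration; they are not the SGAS schema or the MAS implementation.

```python
from collections import defaultdict
from datetime import datetime


def usage_per_project_per_day(records):
    """Sum core-hours per (project, day) from job records.

    Each record is a dict with the hypothetical keys "project",
    "end_time" (ISO 8601 string), "cores" and "wall_seconds".
    """
    totals = defaultdict(float)
    for rec in records:
        day = datetime.fromisoformat(rec["end_time"]).date().isoformat()
        totals[(rec["project"], day)] += rec["cores"] * rec["wall_seconds"] / 3600.0
    return dict(totals)


def inserted_records_per_day(records):
    """Count records per insertion day (hypothetical key "insert_time").

    A day without any inserted records hints that the feed is broken.
    """
    counts = defaultdict(int)
    for rec in records:
        day = datetime.fromisoformat(rec["insert_time"]).date().isoformat()
        counts[day] += 1
    return dict(counts)


# Made-up sample records, for illustration only.
SAMPLE = [
    {"project": "projA", "end_time": "2014-10-15T10:00:00",
     "insert_time": "2014-10-16T06:00:00", "cores": 16, "wall_seconds": 7200},
    {"project": "projA", "end_time": "2014-10-15T23:30:00",
     "insert_time": "2014-10-16T06:00:00", "cores": 4, "wall_seconds": 1800},
    {"project": "projB", "end_time": "2014-10-16T01:00:00",
     "insert_time": "2014-10-16T06:30:00", "cores": 32, "wall_seconds": 3600},
]

if __name__ == "__main__":
    for key, hours in sorted(usage_per_project_per_day(SAMPLE).items()):
        print(key, hours)
    print(inserted_records_per_day(SAMPLE))
```

Keying the totals on the day a job ended is one possible choice; long-running jobs could instead be split across the days they span.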
SGAS project status and SGAS roadmap discussion, Magnus Jonsson
- Storage records problem in Norway fixed last Tuesday
- Reading from the REST API is already implemented in SAMS.
- Milestone 3 is scheduled for late this year or early next year.
- Magnus will provide Iceland with the latest patches for the query interface as currently used within SAMS.
- Should there be a master page for all plug-ins in SGAS? SGAS is already on GitHub.
- The milestones as defined in the roadmap on the NeIC wiki are not in the GitHub repository. The repository only contains concrete issues, not necessarily issues concerning the whole SGAS project.
GLUE, StAR and dCache; a random walk in storage accounting, Paul Millar
- Logical Block Addressing (LBA) mode followed sector-based addressing.
- Random access vs streaming access compares to SSDs vs spinning disks.
- There are separate types of pools: flush, write, read and stage pools, plus tape. E.g. stage pools are very fast at delivering data. Pools can be combined, but that might lead to performance loss. You have to be careful not to fill up flush pools; likewise you are limited by stage pools.
- Space reservation is the answer to the "free space" problem. StAR was careful not to speak of "total" or "free".
- GLUE version 2.1 is being planned.
- CDMI not currently supported by dCache.
- The importance of the feedback loop: Get people to report as soon as possible, not just at the end of the year.
- Is accounting used for charging? The intention was that this would be possible. There is a "billing file" within dCache. DESY can't legally charge users, since that would mean a state-funded organisation competing with commercial providers, which is not allowed. But some commercial providers are using dCache as well. Accounting might be of interest for the different Research Councils.
- Controversy: "If it is cheaper to do it locally, Amazon is doing something wrong."
- It is very labour-intensive to try to check the accounting numbers. CPU accounting can be corrected afterwards, but with storage accounting that is more complicated.
Accounting on the NHPC Pilot Gardar, Anil Thapa
- Different prefixes are used to ensure fair sharing.
- For reports, no portal is used; the data is just assembled.
- Iceland (IS) is building a national e-infrastructure and may adopt SGAS as its accounting system. There was a rumor that SGAS is difficult, though. Also every site in Norway has to use Gold (e.g. Oslo uses Gold and Slurm, with Gold sending accounting information to SGAS). Common request from Norway and Iceland: add Gold support to SGAS.
- Action point (AP) on Anil to send the current plug-in hacks to Magnus so he can integrate them. We could call it a contrib.
Workshop discussion: Energy accounting
- The presentation on energetic Fair Share was used to start the discussion: Should users who are using less energy get higher priority? What about users using only one CPU per node? People with a finite amount of time are penalized this way. "Is there a circumstance, where a job is more energy efficient, but takes longer?"
- If you run with the CPU clocked down you will take longer but use less energy. Memory is consuming power as well. What about jobs that are using more GPU? -> It is difficult to find the sweet spot, since that varies from system to system.
- Already when doing the procurement, we assume that the lifetime of a machine is limited. We don't want jobs that run longer. As we do it now, we use time as a proxy for (total) energy usage, and we try to optimize on time: "I want to maximize the scientific output of a system." The best way to use less energy is to run a more efficient job. In the end this might not lead to actually saving energy, but to getting more usage out of the system. The users also want to finish their jobs as fast as possible.
- We could start collecting the data anyway. How should we measure the actual consumption of energy? So far only historic data is used. A plot of energy consumption against wall clock time would be useful. LSDMA is funding some effort into energy monitoring of applications. It would be interesting to see power consumption over time.
- There is nothing for this yet in the standardized usage records, nothing in UR2. That needs to be added. However, it would not need to be a UR3; it should be easy to add to UR2.
- For doing fair-share you don't need to collect the data, since it is done at the site.
- In Norway, with its four big HPC centers, data on which application runs most efficiently on which resource would be nice!
- Creating a unified billing model for different systems (GPUs etc.) becomes difficult; energy might be that metric (also relevant for any resource sharing project).
- If you had a way to make the users prefer a system where the code uses less energy, you would have to redefine the cost of the system. The grant would not be expressed in CPU hours, but in total cost including energy. Currently the user has no incentive to switch systems.
- Are GPUs more energy efficient or are they faster? That might depend on your application.
- The University of Hamburg has worked on similar things (see links added to the discussion background).
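The trade-off discussed above (a clocked-down job runs longer but uses less energy) and the idea of a grant expressed as total cost including energy can be illustrated with simple arithmetic: energy is average power times wall clock time. The sketch below uses made-up power figures and prices; none of the numbers come from the workshop.

```python
def energy_kwh(avg_power_watts, wall_hours):
    """Energy in kWh for a job drawing avg_power_watts for wall_hours."""
    return avg_power_watts * wall_hours / 1000.0


def job_cost(node_hours, energy, node_hour_price, kwh_price):
    """Cost of a job when the grant is charged for node time plus energy."""
    return node_hours * node_hour_price + energy * kwh_price


# Illustrative numbers only: the same single-node job at full clock and clocked down.
full_clock = energy_kwh(avg_power_watts=300, wall_hours=10)
clocked_down = energy_kwh(avg_power_watts=220, wall_hours=12)

# Hypothetical prices: 1.0 per node-hour, 0.1 per kWh.
cost_full = job_cost(10, full_clock, node_hour_price=1.0, kwh_price=0.1)
cost_down = job_cost(12, clocked_down, node_hour_price=1.0, kwh_price=0.1)

if __name__ == "__main__":
    print("energy (kWh):", full_clock, clocked_down)
    print("cost:", cost_full, cost_down)
```

With these assumed prices the clocked-down job uses less energy but costs more overall, because node time dominates the bill. This matches the point that time currently serves as the proxy for total energy usage; a different price ratio would shift the balance.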
Status of reporting services at CSC, Marco Passerini
- CSC is a private company owned by the government, not aiming at profit. Therefore accounting data is really important.
- The resource usage follow-up project, Reppu (Finnish for "backpack"), is mainly used for billing and accounting. Changing the database didn't mean that the code had to be changed, thanks to the library; it might also work on Hadoop or JSON. Role-based authentication is possible in Jaspersoft.
- On Sisu reservation is done per node.
- Telemetry is used to identify the users who don't use the resources perfectly. Linux process accounting runs on every compute node, so we know how much an application has been used. The share of "unknown" is about half.
- Saiku is open source and can be used by others as well: it is very useful.
- Billing is done per project.
Workshop discussion: Cloud accounting opportunities
- Currently those that do cloud accounting are basically doing the same thing as for the classical resources: just node accounting.
- Are we thinking about overcommitting? We don't know if people get the resources they are requesting.
- What services are currently running on the Cloud?
- Chipster (CSC development, correspondent to Galaxy)
- CMS in Finland is moving towards the cloud
- Galaxy in Sweden at KTH, otherwise SNIC-Cloud not for production yet
- Pricing models:
- https://www.joyent.com/products/manta/pricing
- use different pricing models: overcommitted vs not
- "selling half a CPU": could be a good trade-off for some applications.
- billing for the guaranteed quarter, or the two that are actually used?
- -> getting very complicated.
- not only CPU is limiting: Bandwidth to disk, bandwidth to network are also limited resources. It might be harder to set a price for that.
- Telemetry for OpenStack: how does it work?
- telemetry module https://wiki.openstack.org/wiki/Ceilometer
- Norway is also using OpenStack in their UH-Sky (see e.g. https://wiki.neic.no/w/ext/img_auth.php/d/d8/140825NordicCloud_detailed_input.pdf)
- Knowledge sharing and OpenStack configuration management are on the roadmap for the Glenna Nordic Cloud project in WP1 [1]
- http://docs.openstack.org/developer/ceilometer/measurements.html
- Is CSC doing dynamic allocation of cluster resources when the cloud runs on the same HPC cluster? -> The cloud is just running on the same physical environment as the traditional cluster. Allocation is a manual task for now, but there is discussion about automating it in the future.
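The pricing question raised above ("billing for the guaranteed quarter, or the two that are actually used?") can be made concrete with a small sketch that compares the two billing models for an overcommitted virtual machine. The function name, the model labels and the price are hypothetical; this is not a pricing scheme used by any of the sites.

```python
def bill(guaranteed_cores, used_cores, hours, price_per_core_hour, model):
    """Bill a VM either for its guaranteed share or for its actual usage.

    model is "guaranteed" (charge the reserved fraction of a core)
    or "usage" (charge the cores that were actually consumed).
    """
    if model == "guaranteed":
        billable_cores = guaranteed_cores
    elif model == "usage":
        billable_cores = used_cores
    else:
        raise ValueError("unknown billing model: %s" % model)
    return billable_cores * hours * price_per_core_hour


# Illustrative numbers only: a quarter core guaranteed, two cores used for 10 hours.
by_guarantee = bill(0.25, 2.0, 10, price_per_core_hour=0.04, model="guaranteed")
by_usage = bill(0.25, 2.0, 10, price_per_core_hour=0.04, model="usage")

if __name__ == "__main__":
    print("guaranteed model:", by_guarantee)
    print("usage model:", by_usage)
```

The factor of eight between the two results shows why the choice of model matters, even before disk and network bandwidth are priced in.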
Workshop discussion: Accounting on GPU and accelerator level
Round the table, who is currently accounting for GPUs:
- FI: nodes have 12 cores and 12 GPUs, and you can mix them freely when reserving. In Slurm there is the gres field, where you can see how many GPUs have been reserved, but we don't record it yet, since we don't have a model for it. Gres is used to allocate the GPUs to the job; you don't get individual GPU cores, you get the whole GPU board.
- SE:
- PDC and Lund: is there any accounting done on GPU level? Both are still in pilot mode.
- Chalmers: just a few groups, no allocation.
- NO: Major problem is that people don't know how to use GPUs.
- IS: There are a few GPUs integrated in our cluster, but no accounting so far; a separate queue was created, like in FI.
- -> We need more training sessions for people to tell them how to best use GPUs, so we have something to account for.
- SNIC is currently sourcing this out to the users: http://www.snic.vr.se/apply-for-resources/snic-gpu-pilot-allocation-information
- The actual value may be two times the CPUs
- Currently there is no possibility to share a GPU board between different jobs. Do we need it? It might get very popular, so we need to do something. But how would you then split the bill? The user doesn't want to get charged for the other person's idle cores.
- What do you want to use the accounting for?
- Just aiding the software running on them?
- Could also be some kind of energy accounting. -> If you know what has been used, you should have that automatically.
- Reporting for the funding parties: how well do you fulfill the project budget
- Application accounting on what software is used on GPUs: It might be hard to see if software was run with GPUs or not; it might be a big challenge, to find out what the users are actually running.
- OpenStack: now taken care of in the Glenna Cloud project
- Application accounting: very much desired from the application experts, as also discussed in the Application Expert meeting the day before, but difficult.
- difficult to map dependencies on libraries: depending on version.
- In Sweden Åke Sandgren tried to map usage last year: that was a lot of manual work
- The application accounting project in Norway is just building process lists from what is already installed.
- Administrators need to know what software they need to apply
- Within EGI a German university did an application accounting project (see e.g. https://indico.egi.eu/indico/contributionDisplay.py?sessionId=43&contribId=295&confId=1994)
- Start with specific application packages and control access to software, by group and check access to software by group. Within EGI VOMS is used to see the use across different sites by people of the same group.
- Also relevant for license commission: we use a lot of open source software, but for software you have to pay for, accounting is important. E.g. in Norway VASP is used a lot, and they want group support in the MAS meta center, at least to measure how many people need access to it and who they are.
- There are no guarantees when automating the process lists. E.g. in Finland, process tracking shows that 55% of the used binaries (in number of core-hours) do not match the supported applications.
- Track how long users are logged in
- useful for project leader
- also useful to identify users that need help
- Accounting on over-usage under-usage:
- but that is done locally at the sites, right?
- In Norway the local quota is often sufficient, so users don't have to apply for a national quota. Norway is hoping to have fewer local projects once Sigma2 starts; it will also be interesting to see how many sites Norway will have.
- Network accounting:
- free upload, then you pay for download.
- However, that is out of our league: that is in the scope of NORDUnet.
- What to account for in the Nordic Cloud?
- Norway has a national cloud that hosts not only HPC workloads but also web pages; because of legal ramifications it also contains business secrets and personal data. We can run a nice distributed cloud with fail-over. (There are several coastal cities in Norway; a big storm washed away Trondheim in 1893 and killed 300 people.)
- A common repository:
- You can't put everything on GitHub: e.g. patches to commercial software need to go elsewhere.
- -> using GitLab for private stuff.
- The French have a system called assembler
- That was also discussed within the Swedish application expert group. They planned to set something up at PDC. AP on NeIC to check on that with Radovan and possibly also Åke.
- a repository for sharing Nagios and other monitoring tools and local scripts.
- At least some kind of repository collecting information on what is going on locally, e.g. the Slurm plug-in that allows defining a private user; this piece of software is already available on GitHub.
- We could use the NeIC wiki page to collect the bits of code that are available out there (e.g. when it comes to Gold to SGAS). This could also go into the "Common Data Base for Knowledge and Excellence" wiki suggested at the Application Expert meeting (NeIC_related_discussions_at_the_Application_Expert_meeting_in_Uppsala_on_14-10-16#Suggestions_for_a_wiki).
Next meeting at the NeIC2015, Michaela Barth
- Are we doing the right thing, is this the right way? Such small dedicated meetings are very useful in getting the right people to talk to each other. This meeting should possibly have been combined/collocated with the NORDUnet conference one month ago instead.
Ideas for the next workshop (only 3 hours) at the NeIC2015 conference on Tuesday 2015-05-05 afternoon near Helsinki [2]:
- If the new plug-in architecture is in place, a hands-on session or demo would be interesting.
- Recaps are not that interesting for a broader audience; more demos would be better.
- Discussing SGAS as such, and also the future of SGAS, since the SGAS maintenance project is scheduled until October 2015.
- Invite representatives of the end-users? Is the end-user aware of accounting? Do they need it? They might have different needs.
- Comparison to how accounting is handled within PRACE.
Links