2nd Nordic Data Services Workshop
The second Nordic Data Services Workshop will focus on services operated by the National e-Infrastructure Providers and National Data Service providers. Which parts of the research data life cycle are covered by which services, and how? What are the future strategies and trends?
- This is a two-day event, 18.05.2016 - 19.05.2016
- Steningevik Konferens AB is located 10 km from Arlanda Airport and is reachable by taxi.
- Full address: Steningevik Konferens AB, Steninge allé 111, 195 91 Märsta
- Live minutes: https://docs.google.com/document/d/1KkPYmLNe2oOUSwPwL_8w-efNJgObT_-yib_P1s3pBdc/edit
- Minute-taking (secretary to the workshop) ...
Day 1 (18.05.2016)
10:00 - 10:30 Arrival and Coffee
10:30 - 11:00 Welcome
- 10:30 Welcome (Michaela Barth, NeIC) slides
- 10:40 Pooling Competences and resources (Rob Pennington, NeIC) slides
11:00 - 12:30 Session 1
- 11:00 CSC Storage Service Overview (Anssi Kainulainen, CSC) slides
- 11:30 SNIC Storage Service Overview (Dejan Vitlacil, SNIC) slides
- 12:00 KTH Records Management Office (KTH IT Archivist) slides
- 12:10 Research Data Management in NBIS (Niclas Jareborg, NBIS) slides
12:30 - 13:30 Lunch
13:30 - 15:00 Session 2
- 13:30 Welcome (Gudmund Høst, NeIC)
- 13:40 UNINETT Sigma2 Storage Service Overview (Andreas O Jaunsen, UNINETT Sigma2) slides
- 14:10 DeIC Storage Service Overview (Henrik Pedersen, DeIC) slides
- 14:40 Embedded librarians - Working in a research infrastructure (Monica Lassi, Lund University Library) slides
15:00 - 15:30 Coffee break
16:00 - 17:00 Session 3
- 16:00 NSD and Research Data (Atle Alvheim, NSD) slides
- 16:30 DDA and Research Data (Anne Sofie Fink Kjeldgaard, DDA) slides
17:00 - 18:30 Social activity
19:00 - Dinner
Day 2 (19.05.2016)
09:00 - 10:30 Session 4
- 09:00 SND and Research Data [Data Pilot Activity] (Johan Fihn, SND) slides
- 09:45 FSD and Research Data (Helena Laaksonen, FSD) slides
10:30 - 11:00 Coffee break
11:00 - 12:30 Session 5
- 11:00 Ransomware and other threats to data storage and data services (Leif Nixon, Security Officer)
12:30 - 13:30 Lunch
13:30 - 15:00 Session 6
- 13:30 NDS and Research Data (Christine Kirkpatrick, National Data Service, USA) slides
- 14:00 National e-Infrastructure providers and National Data Service providers: Do we provide complementary or competing services? Is there room for collaboration? Building and designing services together?
15:00 Closing
Registration and Participants
The workshop is by invitation only. Please register by email with the NeIC coordinator, Dejan Vitlacil.
|Name||Organisation||Country||Steningevik room reservation confirmed for the night of 18 May||Special requirements|
|Dejan Vitlacil||NeIC/SNIC||Norden||Yes (2)||No|
|Rob Pennington||NeIC||USA||Yes (2)||No|
|Gudmund Høst||NeIC||Norden||Yes (2)||No|
|Michaela Barth||NeIC||Norden||No need for the room||No|
|Andreas Jaunsen||UNINETT Sigma2||Norway||Yes||No|
|Leif Nixon||Security Analyst||Sweden||No need for the room||No|
|Christine Kirkpatrick||NDS||USA||No need for the room||No|
|Monica Lassi||Lund Library||Sweden||Yes||No|
|Anne Sofie Fink Kjeldgaard||DDA||Denmark||Yes||No|
National e-Infrastructure Providers
- SNIC - Swedish National Infrastructure for Computing
- CSC - IT Center for Science
- DeIC - Danish e-Infrastructure Cooperation
- UNINETT Sigma2
National Data Services
- SND - Swedish National Data Service
- FSD - Finnish Social Science Data Archive
- DDA - Danish Data Archive
- NSD - Norwegian Centre for Research Data
- NDS - The National Data Service, USA
- Retraction Watch Leaderboard
- LHC Processing: What to record?
- CERN Open Data
- Research Data Netherlands Course
- Second year report on RDA Europe analysis programme
- BBMRI FAQ on new EU data protection law
- Useful cartoon illustrating the problems of open research data: Data Sharing and Management Snafu in 3 Short Acts
Day 1 (18.05.2016)
Welcome
(Michaela Barth, NeIC) slides
Round of presentations (clockwise around the table):
- Dejan Vitlacil, SNIC storage lead
- Jens Larsson, NSC, Swestore coordinator
- Mikael Borg, NBIS technical coordinator, providing bio data services for Sweden, part of ELIXIR
- Niclas Jareborg, NBIS data manager
- Michaela Barth, NeIC XT, Pool competencies focus area lead
- Monica Lassi, Lund University library
- Rob Pennington, NCSA, NeIC special advisor on data and HPC, Share Resources focus area lead
- Atle Alvheim, Norwegian Centre for Research Data (formerly Norwegian Social Science Data Services)
- Johan Fihn, Swedish National Data Service (SND)
- Andreas Jaunsen, Sigma2 data services lead.
- Anne Sofie Fink Kjeldgaard, Danish National Archives (DDA)
- Chris Ariyo, CSC, EUDAT service manager FI
- Anssi Kainulainen, CSC, application architect in the Information Infrastructure Services group
- Christine Kirkpatrick, NDS, US
- Joel Hedlund, NeIC XO, Tryggve project owner, Strengthen stakeholder dialogue focus area lead
- Jessica Persson, KTH data archivist.
- Henrik Pedersen, chair of the Danish forum for data management
- Gudmund Høst, NeIC director
- Jacko Koster, SNIC coordinator for international activities (previous director)
- Helena Laaksonen, acting director of the Finnish Social Science Data Archive (FSD), information officer
Pooling Competences and resources
(Rob Pennington, NeIC) slides
- NeIC does not operate any facilities of its own
- 26 million people live in the Nordic region
- National budgets are not getting better
The NeIC training policy draft is accessible internally until approved, but can be made available on request to:
Sharing application experts' expertise for advanced user support: people do not scale!
CSC Storage Service Overview
(Anssi Kainulainen, CSC) Presentation slides: https://kannu.csc.fi/index.php/s/3BIq80ovkYfrqgO
The main goal of IDA is to keep data secure; the focus is less on sharing data.
- Is IDA also used for active data?
- The metadata profile needs to declare whether the data is open to others. Everyday storage and sharing are in the future plans for the new IDA.
- Who is the data owner? Researchers or universities?
- The project is the data owner, since the actual people involved can change. There are policies for what happens when the project ends.
There is an ongoing effort to match metadata between Etsin and EUDAT B2FIND to make data migration easier.
- Contract with the ministry: the paying customer is the ministry, with agreements made with the users. Services are ordered by the ministry and defined in working groups.
The Open Science and Research Initiative hosts forum discussions.
- What is the total budget for the whole life cycle of the data services?
- What percentage of users use IDA?
- Compared to Taito with 1600 users yearly, 500 users would correspond to about 30%.
SNIC Storage Service Overview
(Dejan Vitlacil, SNIC) slides
Major challenge: better communication with universities.
KTH Records Management Office
(KTH IT Archivist) slides
(This item was tabled.)
Research Data Management in NBIS
(Niclas Jareborg, NBIS) slides
- Gothenburg, Lund and Umeå universities becoming part of SciLifeLab during 2016.
- NBIS offers weekly drop-in sessions; check the calendar at http://nbis.se.
- SNIC-SENS (Bianca): first users expected in August 2016.
Welcome
(Gudmund Høst, NeIC)
UNINETT Sigma2 Storage Service Overview
(Andreas O Jaunsen, UNINETT Sigma2) slides
Data should not be duplicated unnecessarily; this requirement reduces the number of potential vendors in the ongoing procurement.
The final solution will be operated by Sigma2 itself: technical operation and advanced user support will be provided in conjunction with the HPC services in the Metacenter (about 40 people altogether). Researchers who are not at Norwegian institutions but who collaborate with Norwegian researchers can get access to the data infrastructure. It has not been designed as an international data sharing environment.
DeIC Storage Service Overview
(Henrik Pedersen, DeIC) slides
An app store for data apps intended for use by researchers. Four apps are currently under consideration.
Embedded librarians - Working in a research infrastructure
(Monica Lassi, Lund University Library) slides
NSD and Research Data
(Atle Alvheim, NSD) slides
The project’s home institutions (not the project) own the data and are legally responsible.
According to the law, projects are not allowed to keep the data they have worked on for more than 3 or 5 years, respectively. However, the data needs to be kept for 10 years, so at the researcher's end it must be deleted or otherwise anonymized.
NSD runs a data privacy ombudsman service, a service offered to their “member” institutions.
DDA and Research Data
(Anne Sofie Fink Kjeldgaard, DDA) slides
Researchers and students must first register by signing a paper form before they can use the service.
Keywords are provided by the archive; classification is based on a social-science-oriented thesaurus.
A multilingual European thesaurus based in London enables searching across several languages. The output is an XML file.
A secure data distribution system and environment will be needed in the future.
Day 2 (19.05.2016)
SND and Research Data [Data Pilot Activity]
(Johan Fihn, SND) slides
Atle Alvheim on our fatalistic view: we are not building a place for data to die. It is all about the services, and we should focus on what function we want the data to fulfil.
FSD and Research Data
(Helena Laaksonen, FSD) slides
Usage of Aila by universities outside Finland is just below 10%.
Ransomware and other threats to data storage and data services
(Leif Nixon, Security Officer)
- Phalanx 2.0 is still not publicly available (Phalanx 0.6 has leaked). It is not illegal to write malware.
- Ebury has not leaked yet either: 2-factor authentication could have made Ebury attacks impossible.
- Cdorked.A + Boaxxe + Glupteba + Ebury -> all about porn site ads
Money flow investigation: worth a Lamborghini in Russia
- The FBI needs venue (jurisdiction) to get involved: the crime has to happen in its geographic area.
The criminal has now been extradited to the US and is awaiting sentencing.
Don’t run java applets in your browser!
What can you do? The defense remains the same:
- Keep backups and make sure they are protected: don't keep your backup mounted all the time (e.g. use write-once storage, introduce delays in replication, and take snapshots if possible)
- Control authority: Nobody should be able to delete all your data at once
- Be up to date: They can’t exploit a patched vulnerability
Ransomware on a colleague’s computer can encrypt shared documents.
Threats to different OSes are typically proportional to market share.
The billion-dollar question: could the kernel on all Linux systems be compromised?
Some browser exploits can grab passwords out of, e.g., WinSCP profile storage. Don't use it. <- Seriously: it saves passwords unencrypted in the profile.
- (No, not browser exploits; the whole computer was compromised.)
- You sure? That's a Windows machine.
- You can also log in remotely to Windows machines and have a password...
- Actually, yes, the Windows machine was rooted, but a major contributing factor in this case is that WinSCP saves passwords unencrypted. Don't use it.
- Compiler corruption to trojan kernel builds?
- Not likely.
- Aren't DNS queries to random servers unusual?
- Amateurs using sophisticated tools made by someone else? Exfil servers have more than one consumer?
- How do you replace an exfil host?
- If the ISP and law enforcement are involved, can't you just shut down the compromised hosts?
- What was the browser exploit?
- Is 2-factor authentication enough to dodge Ebury etc.? What about the future?
- Do you need to *buy* an exploit kit? Can't you just browse ads until you get one?
- How to protect backups?
- Enterprise backup systems with limited access.
- Don't keep backup mounted all the time.
- Write-once solutions, writing to random filenames.
- Delay in replication.
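The backup advice above (write-once storage, random filenames, delayed replication) can be illustrated with a minimal Python sketch. This was not a tool presented at the workshop. The function names `write_once_backup` and `due_for_replication`, the `.bak` suffix and the one-hour delay are hypothetical choices made for illustration. Read-only file permissions only protect against overwrites by the same unprivileged user. Real protection comes from a backup system with limited access, as noted above.

```python
import os
import secrets
import shutil
import tempfile
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical suffix for backup files written by this sketch.
BACKUP_SUFFIX = ".bak"


def write_once_backup(src: Path, backup_dir: Path) -> Path:
    """Copy src into backup_dir under a fresh random name and make it read-only.

    O_CREAT | O_EXCL makes the open fail if the name already exists, so this
    function never overwrites an earlier backup. The random name means an
    attacker cannot predict which file to target.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / (secrets.token_hex(16) + BACKUP_SUFFIX)
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)
    # Drop write permission once the copy is complete (owner read-only).
    os.chmod(target, 0o400)
    return target


def due_for_replication(backup_dir: Path, delay_s: float,
                        now: Optional[float] = None) -> List[Path]:
    """Return backups that are old enough to replicate.

    Holding back recent files gives operators time to notice ransomware
    before freshly written (possibly encrypted) backups reach the replica.
    """
    now = time.time() if now is None else now
    return sorted(p for p in backup_dir.glob("*" + BACKUP_SUFFIX)
                  if now - p.stat().st_mtime >= delay_s)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "results.csv"
        src.write_text("sample,value\na,1\n")
        backups = root / "backups"
        first = write_once_backup(src, backups)
        second = write_once_backup(src, backups)
        # Two distinct files exist; the first backup was not overwritten.
        print(first != second)
        # With a one-hour delay, the fresh backups are not yet replicated.
        print(due_for_replication(backups, delay_s=3600))
```

In this sketch, replication is a separate, delayed step rather than a live mirror. A live mirror would propagate an encrypted or deleted file immediately. The delay only helps if someone monitors the source during that window.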
NDS and Research Data
(Christine Kirkpatrick, National Data Service, USA) slides
- NDS Workbench Demo: https://www.youtube.com/watch?v=Wz-45JGk4tg
National e-Infrastructure providers and National Data Service providers
Possible discussion topics:
- Do we provide complementary or competing services?
- Is there room for collaboration?
- Building and designing services together?
- Do we have effective and useful national data management policies? Is there a case for collaboratively drafting a proposal to influence the adoption of effective and useful national data management policies? Directed to ministries? Funding councils? (Joel)
- Is there a need for (internationally?) consistent metadata for research data? Do we have any mechanisms for establishing this? (Joel)
- Are there people who should be invited to this type of meeting?
- Should security aspects be worked on together or discussed more?
- Is there an opportunity for another information exchange at NeIC 2017?
- Very useful to attend the forum. It might be good to bring in university libraries. Is there someone from a university who can describe what they need? Someone at a senior level who can describe what they need?
- It was suggested to describe or define the roles together with university libraries or users: how do they handle the data?
- Follow up on the lack of formulated policies stating that this is a valuable activity in a heterogeneous environment. Such policies should include both the academic side and the technology side. There is enormous diversification in the field, and no single institution manages the services. The major problem for users is finding out which services they need in order to deposit data or to find data to use. The organization of the data services should be decentralized. The technologies currently in use are a tiny fraction of what is available.
- Possibly have representatives from funding agencies at the next meeting so that they can hear about the problems.
- Data management should be advocated as a policy because it is a profitable activity and the value in the data should be understood.
- How do you engage the range of stakeholders in all of the countries? Each country is different - DK has RDM Forum. There was a recent workshop in NO that brought a number of stakeholders together.
- Should we meet again in this environment and make it slightly larger?
- Perhaps this type of meeting could be held at the NeIC conference as a co-located activity to expose it to the wider community.
- With the funding agencies, it might be challenging to get people who can speak authoritatively to attend.
- They should hear what is being done and the challenges in presenting these to the funding agencies.
- The funders want to hear from the researchers who will benefit, not necessarily from the e-infrastructure providers.
- US Big Data Hubs: four regions, each focusing on specific topics. The funding is for bringing people together. Could a group that includes the domain users be brought together in the Nordic region?
- The e-science action plan 2.0 has five relevant tasks: create a forum within NeIC that includes the national data services and e-infrastructures, and produce a position paper in response.
- Engage with the university CIOs as well. How many universities in the Nordics might be engaged?
- Could representatives from this group take an initiative and have an impact?
- Do some background work and then plan to have a workshop in June 2017 to produce the position paper?
- Example: do a survey on the view of the RC of DM, or check the websites for each country? Draw an outline of what is available or relevant in each country?
- There is nothing in common in the Nordics on the topic of data. RIs might be quantified in terms of productivity.
- NeIC would be happy to facilitate a new activity.
- Compile a list of capabilities and talents in each of the Nordic countries - benchmarking effort? Can NeIC work with the group on this?
- Including the CIOs may be important if they have to provide research computing support.
- Try to organize another event in 12 months.
- Move forward with the idea of a Nordic Forum for Research Data Management.
- Training: is there something that might be done across the different countries? Information exchange on different training resources, such as YouTube videos?
- Finnish training material is generally in Finnish.
- Train the trainer idea - can this be described better or more extensively?
- Any topics that weren’t covered here that could be included? Two data initiatives from each country at the meeting.
- Publishers need to attach data to publications, which helps reproducibility.
- Since this meeting was general, topics for the next meeting might focus on more specific issues.
- For NeIC 2017 - select topics and open for submissions by participants? How long should this be?
- Are several initiatives developing silos in the different countries because institutions need to take responsibility? Is building ERICs creating silos?
- Could have an open brainstorming meeting in a month or two to decide on what to do next?
- How to give the users a good offer for solving their problems?
- National research councils are usually financing single infrastructures that may be topic or domain specific or nationally focused and these aren’t being shared.
- Should be collaborating more on sharing the services or technologies.
- EUDAT has a good concept, and the services are still in development. Their strength should be that they are part of one integrated infrastructure, but the links between them are still to be implemented, which will be interesting.
- A generic API that can use any of the necessary services; the API is still being worked on within EUDAT.
- Lund has people who act as the bridge between data librarians and researchers, to make data more discoverable and available.
- A common topic this year was Data Management Plans, so the group is moving along similar paths.
- IASSIST 2016 conference in Bergen, end of May / first days of June.
- Perhaps these types of activities should be considered for dissemination in the same way that training is disseminated.
- Data librarians attend this conference.