Best practices for big data handling in biobanking workshop 2015
Workshop name: Best practices for big data handling in biobanking workshop 2015
Organizer: Davit Bzhalava, NIASC, KI
Expected output: Recommendations on how to build efficient and sustainable e-infrastructures for biobanking data
When: 9-11 November 2015
Where: Hässelby slot, Stockholm, Sweden
This workshop will bring together Nordic national biobankers and e-infrastructure service providers for sensitive data processing to share their experiences of building and using efficient and sustainable e-infrastructures for biobank research, data management and storage, and of setting up and maintaining the associated ecosystem of workflows, pipelines, and bioinformatics software.
The event is funded by NeIC.
Agenda
Day 1 - From freezers and biospecimen storage to computation and data storage - Nordic biobank needs
(11:30 - 17:00)
- 11:00-11:30 Arrival
- 11:30-12:30 Lunch
Session 1: Nordic biobank infrastructures
- 12:45-13:15 Joakim Dillner - Open access biobanks
- 13:15-13:45 Mads Melbye - Digitalization at scale
- 13:45-14:15 Kristian Hveem - From freezers to computers
- 14:15-14:30 Coffee Break
Session 2: Genomic Data - Bioinformatics analysis pipelines/tools and resources needed (users' perspectives)
- 14:30-14:45 Oddgeir Lingaas Holmen - Human genomics and proteomics
- 14:45-15:00 Davit Bzhalava - Microbial metagenomics
Session 3: Large scale computational infrastructures
- 15:00-15:30 Piotr Bała - The Polish Grid Infrastructure
- 15:30-16:00 Ann-Charlotte Berglund Sonnhammer - Swedish National Infrastructure for Computing
- 16:00-17:00 Discussions
- 18:00 Dinner
Day 2 - Big data infrastructure, emerging technologies and Nordic biobank needs
(09:00 - 17:00)
- 09:00-09:10 Davit Bzhalava - Welcome, summary of day 1 and goals for day 2
- 09:10-09:25 Sigurd Gartmann - Introduction to some of today's technologies
Session 1: Emerging technologies
- 09:25-09:45 Einar Ryeng - Strategical consideration in system designs
- 09:45-10:05 Jim Dowling - Hadoop
- 10:05-10:25 Jaakko Leinonen - OpenStack
- 10:25-10:50 Coffee Break
- 10:50-11:05 Karan Singh - Ceph Storage
- 11:05-11:30 Discussions - Configurations and deployments (Ansible/Chef/Puppet)
- 11:30-12:30 Lunch
Session 2: Computer infrastructures and scalable big data infrastructure for biobanks
- 12:30-12:50 Bengt Persson - Bioinformatics Infrastructure for Life Science (BILS)
- 12:50-13:10 Gard Thomassen - TSD infrastructure
- 13:10-13:30 Niclas Jareborg - Mosler infrastructure
- 13:30-13:50 Jaakko Leinonen - ePouta infrastructure
- 13:50-14:10 Coffee Break
- 14:10-14:40 Discussion
- 14:40-15:00 Jim Dowling - BioBankCloud
- 15:00-15:20 Victor Yakimov - Danish solution
- 15:20-15:40 Kimmo Pääkkönen - FIMM solution
- 15:40-16:00 Oddgeir Lingaas Holmen - HUNT Computer Cloud
- 16:00-17:00 Discussion (including Buying hardware, security, network, agreements)
- 18:00 Dinner
Day 3 - Integration of biobanks and IT-infrastructure
(09:00 - 12:00)
- 09:00-09:25 Anu Jalanko - Finnish Biobank Infrastructure and IT-solutions
- 09:25-09:50 Joel Hedlund - NeIC and biobanks
- 09:50-10:15 Antti Pursula - Tryggve and biobanks
- 10:15-10:40 Joakim Dillner - Experience of knowledge exchange among Nordic biobanks and how we can use this experience to develop better IT solutions. Summary of 2 days and next step
- 10:40-10:50 Coffee Break
- 10:50-11:30 Discussions and planning for the larger workshop (place, dates, topics, participants, type: open/closed etc).
- 11:30-12:30 Lunch and Departure
Report
The workshop titled "Best practices for big data handling in biobanking" took place in Stockholm on 9-11 November 2015. It consisted of presentations and discussion sessions and was attended by 23 people from the following countries:
Country | Participants |
---|---|
Sweden | 9 |
Finland | 5 |
Norway | 5 |
Denmark | 3 |
Poland | 1 |
Participants:
Name | Institution | Country |
---|---|---|
Ann-Charlotte Sonnhammer | it.uu | se |
Antti Pursula | csc | fi |
Anu Jalanko | thl | fi |
Bartlomiej Wilkowski | ssi | dk |
Bengt Persson | icm.uu | se |
Einar Ryeng | ntnu | no |
Fredrik Tingstedt | ki | se |
Gard Thomassen | usit.uio | no |
Jaakko Leinonen | csc | fi |
Jim Dowling | kth | se |
Joakim Dillner | ki | se |
Joel Hedlund | nsc.liu | se |
Karan Singh | csc | fi |
Kimmo Pääkkönen | helsinki | fi |
Kristian Hveem | ntnu | no |
Mads Melbye | ssi | dk |
Niclas Jareborg | bils | se |
Oddgeir Lingaas Holmen | ntnu | no |
Piotr Bała | icm.edu | pl |
Sigurd Gartmann | ntnu | no |
Suyesh Amatya | ki | se |
Victor Yakimov | ssi | dk |
Zurab Bzhalava | ki | se |
Findings
The workshop found that:
- The workshop gave the participants a good overview of the needs and challenges for IT solutions for Nordic National Biobanks & National Biobank infrastructures.
- The community of Nordic national biobanks, national biobank infrastructures and e-infrastructure service providers has many common interests, a good spirit of collaboration and, most importantly, an understanding that there is much to gain from collaborating in these areas of common interest.
- The challenges in this area vary in scope and complexity and call for different levels of engagement. They can, for example, be addressed by:
  - Getting the necessary service directly from one of the Nordic national e-infrastructure providers. In this case, NeIC can assist either by simply putting the parties in contact or by more active collaboration, as necessary.
  - NeIC Tryggve taking them on as pilot use cases. Tryggve pilot use cases are limited in time and are of reasonable size. Costs for resource use in Tryggve pilot use cases are covered by Tryggve.
  - Setting up NeIC collaborative projects, under the NeIC co-funding principles.
- To provide a road map for closer collaboration between Nordic national biobanks, national biobank infrastructures and e-infrastructure service providers, three working groups were established:
  - A group to formulate a joint use case on IT needs for the Nordic National Biobanks & National Biobank infrastructures. This group will be coordinated by the leader of one of the BBMRI nodes in the Nordic countries.
  - A group for open-source development to optimize the most commonly used genomic data analysis and biobanking software so that it uses parallel computing hardware efficiently and delivers results in a reasonable time. This working group will be coordinated by Davit Bzhalava.
  - A group on Registromics, to develop a platform and IT solutions for connecting multiple outcome registries with different biomedical data sources and for conducting reversed data mining. This working group will be coordinated by Bartlomiej Wilkowski.
- These working groups will be composed of researchers and IT staff, who will together ensure that the use cases are founded in real researcher needs and are technically feasible with the resources available.
- The progress of these working groups will be followed up at the NIASC annual meeting on January 12 in Copenhagen.
- NeIC will collaborate with NIASC to organize a larger, open workshop on these topics later in the spring.