Investigating options for future NT1 operations project directive



NeIC NT1
Investigating options for future NT1 operations
Project directive

Background

CERN and the Large Hadron Collider (LHC)

The European Organization for Nuclear Research, known as CERN, is an international organization whose purpose is to operate the world's largest particle physics laboratory. The Nordic members of CERN are Denmark, Finland, Norway and Sweden, with contributions together totalling 92 million CHF out of a total budget of 1108.5 million CHF (2014).

The Large Hadron Collider (LHC) is the world's largest and highest-energy particle accelerator. It was built by CERN from 1998 to 2008, in collaboration with over 10,000 scientists and engineers from over 100 countries, as well as hundreds of universities and laboratories. It lies in a tunnel 27 kilometres (17 mi) in circumference, as deep as 175 metres (574 ft) beneath the Franco-Swiss border near Geneva, Switzerland.

The aim of the LHC is to allow physicists to test the predictions of different theories of particle physics and high-energy physics, and in particular to prove the existence of the Higgs boson and to investigate traces of the large family of new particles predicted by supersymmetric theories. The LHC addresses some of the still unsolved questions of physics, advancing human understanding of physical laws. It contains six detectors, each designed for specific kinds of exploration. Nordic research groups participate in the experiments ATLAS, ALICE and CMS at the LHC.

The Worldwide LHC Computing Grid (WLCG)

The LHC produces unprecedented quantities of collision data requiring analysis, so advanced computing facilities were needed to process the data. The CERN Council approved the LHC Computing Grid in 2001, based on a survey of the needs of the experiments. WLCG is, as of 2014, an international e-Infrastructure consisting of a grid-based computer network infrastructure incorporating over 170 computing centres in 40 countries.

The data stream from the detectors provides approximately 300 GByte/s of data, which, after filtering for "interesting events", results in a "raw data" stream of about 300 MByte/s. The LHC project thus generates 27 TB of raw data per day, plus 10 TB of “event summary data”, which represents the output of calculations done by the CPU farm at the CERN data centre. By 2012, data from over 300 trillion (3 × 10^14) LHC proton-proton collisions had been analysed, and LHC collision data was being produced at approximately 25 petabytes per year. This data rate is expected to more than double after the LHC starts up again in 2015 following upgrades.
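As a rough consistency check of the figures above, the daily volume implied by a sustained raw-data rate can be computed directly. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a constant rate over a full 24 hours; both are simplifying assumptions, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mbyte_per_s: float) -> float:
    """Return the volume in TB/day for a sustained rate given in MByte/s.

    Assumes decimal units: 1 MByte = 1e6 bytes, 1 TB = 1e12 bytes.
    """
    bytes_per_day = rate_mbyte_per_s * 1e6 * SECONDS_PER_DAY
    return bytes_per_day / 1e12


def filter_reduction_factor(detector_gbyte_per_s: float,
                            raw_mbyte_per_s: float) -> float:
    """Return how many times smaller the raw stream is than the detector stream."""
    return (detector_gbyte_per_s * 1e9) / (raw_mbyte_per_s * 1e6)


if __name__ == "__main__":
    # Rates quoted in the text: 300 GByte/s from the detectors, 300 MByte/s raw.
    print(f"Raw data per day: {daily_volume_tb(300):.2f} TB")
    print(f"Reduction by filtering: {filter_reduction_factor(300, 300):.0f}x")
```

Under these assumptions a constant 300 MByte/s corresponds to roughly 26 TB per day, which is in line with the approximate daily figure quoted above.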

This data is sent out from CERN to eleven Tier-1 academic institutions in Europe, Asia, and North America via dedicated 10 Gbit/s links, known as the LHC Optical Private Network. The Tier-2 institutions are connected to the Tier-1 institutions by general-purpose national research and education networks. The data produced by the LHC on its entire distributed computing grid adds up to 10–15 PB of data each year. In total, the four main detectors at the LHC produced 13 petabytes of data in 2010. The Tier-1 institutions receive specific subsets of the raw data, for which they serve as a backup repository for CERN. They also perform reprocessing when recalibration is necessary.

The participation in the WLCG is governed by a “Memorandum of Understanding for Collaboration in the Deployment and Exploitation of the Worldwide LHC Computing Grid” between CERN and all the Institutions participating in the provision of the Worldwide LHC Computing Grid with a Tier-1 and/or Tier-2 Computing Centre (including federations of such Institutions with computer centres that together form a Tier-1 or Tier-2 Centre), as the case may be, represented by their Funding Agencies. The MoU is extended automatically for successive 5-year increments and has been signed by the Danish Natural Science Research Council, Helsinki Institute of Physics, The Research Council of Norway and The Swedish Research Council.

Nordic Implementation of the WLCG Memorandum of Understanding

The Nordic countries collaborate through NeIC on a distributed and virtual Tier-1 centre, called the Nordic Data Grid Facility (NDGF Tier-1). NeIC (and NDGF) is funded by the research agencies of Denmark, Finland, Norway and Sweden. Through the provision of resources and services, the NDGF Tier-1 contributes to the integrated WLCG effort to process and store the experimental data from the LHC. NeIC owns no hardware, so the services provided by NDGF depend on leveraging national hardware resources. The computing and storage hardware is made accessible to NDGF by the national research funding agencies through subcontracts to the appropriate national e-Infrastructure providers. There are currently seven centres in the Nordics contributing such hardware.

Resources and users of the Nordic WLCG Tier-1

The users of the WLCG Tier-1 (and Tier-2) resources are the experiments as a whole. Since the data sets are enormous and distributed all over the world, user analysis jobs are sent to wherever the data is located. From this point of view, the physicists in, for example, Oslo have no more or less direct use of the hard drives or processors installed at the University of Oslo than of those at Brookhaven National Laboratory in the USA. In practice, the big scientific collaborations also use informal mechanisms to motivate the member institutes to contribute resources. In this indirect way, the Nordic physicists report that they benefit from the Nordic WLCG participation through increased scientific influence, more prominent roles and more co-authorships than they would otherwise have experienced. From this point of view, a Nordic Tier-1 also gives greater benefit than four national Tier-2 sites of an equivalent total size.

The four Nordic partners to the WLCG MoU have agreed that the total resources pledged for the Nordic WLCG Tier-1 should be 6% of total ATLAS needs and 9% of ALICE needs. The split between the Nordic parties is made according to an author key.
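A proportional split of this kind can be sketched as follows. This is an illustration only: it assumes the author key is a simple count of authors per partner, and the partner labels, author counts and pledge total are hypothetical placeholders rather than actual figures. The real key may be defined differently.

```python
from typing import Dict


def split_by_author_key(total_pledge: float,
                        authors: Dict[str, int]) -> Dict[str, float]:
    """Split a total pledge between partners in proportion to their author counts.

    `authors` maps a partner label to its number of authors (hypothetical key).
    """
    total_authors = sum(authors.values())
    if total_authors <= 0:
        raise ValueError("author key must contain at least one author")
    return {
        partner: total_pledge * count / total_authors
        for partner, count in authors.items()
    }


if __name__ == "__main__":
    # Hypothetical author counts for four partners; not real data.
    hypothetical_authors = {"Partner A": 10, "Partner B": 5,
                            "Partner C": 15, "Partner D": 20}
    # Hypothetical total pledge in arbitrary units.
    shares = split_by_author_key(1000.0, hypothetical_authors)
    for partner, share in shares.items():
        print(f"{partner}: {share:.1f}")
```

The shares always sum to the total pledge, so changing one partner's author count shifts the burden among the others without altering the overall commitment.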

Currently, storage and computing are in production use at:

  • NBI, University of Copenhagen
  • NSC, Linköping University
  • USIT, University of Oslo
  • CSC, Espoo
  • BCCS, University of Bergen
  • HPC2N, Umeå University

A dedicated 10 Gbit/s network connects the sites to central services located at a facility in Copenhagen, where the incoming network from CERN connects.

History and investigation of current setup

The Nordic Tier-1 was created in 2005 to provide a common WLCG Tier-1 site in the Nordics, where no single country is large enough to host one. For reasons of Nordic cooperation, and because of the difficulty of settling on a single institute and sending money across borders, a distributed solution was set up with NORDUnet A/S hosting the project. The NDGF Tier-1 site was implemented as a distributed one where the resources (storage servers, tape libraries, compute clusters) are funded nationally and run by HPC sites in all the participating countries. In 2012 NeIC took over the Nordic Tier-1 as the NT1 area and has continued to run it in a distributed fashion to reflect the distributed nature of financing and resources.

The NT1 activity is the glue that presents distributed resources in the Nordics as a single coherent Tier-1 site to WLCG, and as such spends considerable time communicating with the users as well as the resource providers at the individual HPC centres. It is speculated that this communication overhead would be smaller for a single site and, furthermore, that technical simplifications possible with a single site could lead to lower total operational costs. However, it is also relevant to assess indirect benefits resulting from this cross-border collaboration, such as competence sharing, community building within the e-Infrastructure provider organizations, user community engagement and robustness.

This project has been requested by the NeIC board; see the board minutes of 4 December 2014, Item NeIC 14-46 WLCG funding streams: “NeIC should examine how much it would cost to run the Nordic Tier 1 as a single Nordic entity. If a Nordic solution is clearly more cost-efficient there are grounds for developing this idea further.”

Project idea

This project will contribute to stable Nordic WLCG Tier-1 operations and financing by evaluating different options for operations, in particular whether a centralized mode is clearly more cost-efficient than the current distributed setup.

Expected benefit

Either a more cost-efficient mode of operation will be found, or the uncertainty about whether the current setup is as cost-efficient as possible will be cleared up. Together with a risk assessment and an evaluation of the non-monetary aspects of the different modes of operation, this evaluation would be an important basis for a discussion on whether to reorganize the Nordic WLCG Tier-1 into a single site.

In addition, opportunities for efficiency improvements in the ongoing operations might be found when evaluating the activity. These would apply whatever decision is made with regard to reorganization, since the current mode of operations will be in place for several years in either case.

Basis

  • WLCG MoU
  • HPC / HTC cost studies
  • Comparisons with other WLCG sites

Contact persons

< List all persons having knowledge or information and that could/should participate in the preparation phase. Specify whether the person was already contacted, is interested in the project or would be willing to provide more details, or can be considered as official National contact point. Especially, note contacts in the national e-infrastructures. Attach a list of potential stakeholders. >

Name Email Role

Timeframe and estimates for the preparations

< Enter the dates and estimates for the work up to DP2/DP3 (cf. annex 1). >

Commitment up to Date Estimated effort Estimated expenditures
DP2
DP3

Project goals

Result goals

A report on the cost-efficiency of both the current configuration and other possible configurations of a Nordic WLCG Tier-1. This report is intended as input to the NeIC Board’s discussion on the future configuration of the Nordic Tier-1 collaboration.

The report should accurately estimate the effects of the different modes of operation, and any proposed changes should also be assessed for transition costs and risks.

Optionally, the report could also suggest efficiency improvements that could be applied even if the savings are not sufficient to motivate a major reorganization.

Time goals

The report is to be presented to the board during the winter of 2015/2016.

Cost goals

The project has funding for 3 person-months available, plus some travel funds. This is intended to cover a single person acting as project leader, investigator, and report writer.

Project objective priority

Priority
Result   Time   Cost
0.5      0.1    0.4

Financing

It is suggested that the project be funded from NeIC core funds, after approval from the NeIC board at the March 2015 meeting.

Other

Steering group

  • NeIC NT1 area coordinator: Mattias Wadenstein (chair, project owner representative)
  • NeIC Tier-1 CERN liaison: Oxana Smirnova
  • Nordic experiment representative: Anna Lipniacka (University of Bergen)

Minutes

Known risks

Obtaining an accurate view of the resources actually spent on the current solution can be problematic; previous attempts have not produced a full view of costs.

Finding a competent yet unprejudiced evaluator could prove challenging.

References

< Refer to any additional information that can form the basis for the project preparation. >

Ref.no. Document name, document designation Edition, date
1
2


Annex 1 – Terminology

1. Decision points

During the life span of the project, from startup to termination, a number of formal decisions must be made by the steering group. These fall into eight different types, which are numbered in the chronological order in which they are typically made.
DP1 – Decision point type 1; steering group decision to start the project, based on the project directive.
DP2 – Decision point type 2; steering group decision to continue, change or interrupt the project based on findings during the preparation phase. A project may have multiple DP2.
DP3 – Decision point type 3; steering group decision to approve the project plan developed during the preparation phase. Typically this is tied to a DP4 decision to start the execution phase.
DP4 – Decision point type 4; steering group decision to start the execution phase.
DP5 – Decision point type 5; steering group decision to continue, change or interrupt the project based on findings during the execution phase. A project may have multiple DP5.
DP6 – Decision point type 6; steering group decision to approve the result of a delivery, for example to end users. A project may have multiple DP6.
DP7 – Decision point type 7; steering group decision to transfer the responsibility for a delivery, typically to operations in a receiving organization.
DP8 – Decision point type 8; steering group decision to approve the final report and terminate the project.