You can submit data in two ways: as results summary statistics (calculated and formatted according to the analysis plan) or as individual-level data.
We prefer you submit individual-level data because they can be used beyond the few analyses that are described in the analysis plan.
Results summary statistics
Information on how to upload results summary statistics is given in the analysis plan, in the section “Results upload instructions”.
If you are not from the US:
If you are from the US:
Researchers can access individual-level data in two ways, depending on whether they are within the initiative (i.e. researchers who are registered with the initiative and have also deposited data) or outside the initiative.
Researchers outside the initiative
Access to individual-level data/datasets by external researchers is controlled by a Data Access Committee (DAC), which must be registered as part of the submission process. A DAC may consist of one or several committee members who are responsible for making data access decisions in response to applications from individuals wishing to access data. A DAC may be responsible for approving access to a single dataset or to multiple datasets. Only those who have successfully applied for access via the DAC will receive access to the dataset(s) archived at the EGA and AnVIL.
Researchers within the initiative
Researchers within the initiative who have deposited data or results summary statistics, or who are part of established analysis groups, will have fast-track access to the initiative's data deposited on EGA and AnVIL. The DAC, which is composed of the PIs of the studies that have deposited the data, will facilitate access to the full data pool. We are currently discussing which procedures to implement to give these groups of researchers fast access. All researchers are required to follow the code of conduct outlined in https://www.covid19hg.org/about/.
The EGA is working with the ELIXIR network to establish the EGA Federation network, which will enable data to be deposited within national jurisdictions. We expect to launch the first nodes in mid-to-late 2020. In the meantime, we suggest you contact your country's ELIXIR head of node to find out about the current status for your country.
The EGA is managed by EMBL-EBI and the Centre for Genomic Regulation, Barcelona (CRG). At EMBL, data protection is enacted by Internal Policy 68 on general data protection (IP 68). IP 68 resembles the GDPR, but is adapted to the intergovernmental nature of EMBL and to the need to enable free scientific research across national borders. CRG is subject to the GDPR and implements it fully. The EGA GDPR notices can be found here.
Both EGA and AnVIL recommend using open standards and formats that are maintained by the Global Alliance for Genomics and Health (GA4GH), published in the GA4GH Genomic Data Toolkit. For genome sequencing data this includes FASTQ, BAM, CRAM, and VCF. All array-based technologies are accepted, which may include the raw data, intensity and analysis files, and there are no restrictions on data formats accepted.
Clinical data should be included as part of the study submission. We suggest formatting the data following the initiative’s data dictionary (tab FREEZE_1). Not all the variables listed in the data dictionary are required. If you want to submit variables that are not listed in the data dictionary, please contact firstname.lastname@example.org.
Yes, this is entirely possible. We suggest creating one dataset for submission for every 500 samples.
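If it helps, the suggested batching can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_datasets` and the sample identifiers are hypothetical and not part of any EGA or AnVIL tool. It assumes you already have a list of sample IDs and want to group them into datasets of at most 500.

```python
from typing import Iterator, List, Sequence

# Suggested number of samples per submitted dataset (from the guidance above).
BATCH_SIZE = 500


def split_into_datasets(
    sample_ids: Sequence[str], batch_size: int = BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` sample IDs.

    Hypothetical helper for illustration only; the last group may be
    smaller than `batch_size`.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(sample_ids), batch_size):
        yield list(sample_ids[start:start + batch_size])


if __name__ == "__main__":
    # Made-up sample IDs, used only to demonstrate the grouping.
    samples = [f"SAMPLE_{i:04d}" for i in range(1, 1201)]
    for number, batch in enumerate(split_into_datasets(samples), start=1):
        print(f"dataset {number}: {len(batch)} samples")
```

With 1,200 samples this produces three datasets of 500, 500 and 200 samples, each of which could then be submitted in turn.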