
Using GEL data for publications and reports, September 2024

Description

Our participants’ privacy is absolutely paramount, which is why we heavily restrict export of data from the Research Environment (RE). In this training session, we will look at what you are and are not allowed to export, and how you can ensure your exports are compliant. We will look at the tools available in the RE for compiling and composing your data. We will cover using the Airlock tool for export.

Where publications or theses require supporting evidence that is potentially identifiable, we will look at how you can enable access to this supporting data for reviewers or examiners without exporting it from the RE.

Timetable

13.30 Introduction and admin
13.35 Rules for data export from the RE
13.45 How to ensure your exports are compliant with rules
14.00 Working with LibreOffice in the RE
14.15 Using the Airlock
14.20 Sharing data with reviewers and examiners within the RE
14.45 Getting help and questions

Learning objectives

After this training you will know:

  • what you can and can’t export from the RE
  • how to compile your data for export
  • how to export data using the Airlock

Target audience

This training is aimed at researchers:

  • working with the Genomics England Research Environment
  • hoping to publish a paper, report, dissertation or thesis, or deliver a poster or presentation using Genomics England data

Date

10th September 2024

Materials

You can access the redacted slides and video below. All sensitive data has been censored.

Slides

Download the slides

Video

Give us feedback on this tutorial

Q&A


The Airlock form still says 3 months

live answered


Are we permitted to export data to generate Oncoprints for mutated genes, as long as the data specifies only general categories such as amplification or loss, without revealing the specific origin or nature of the events affecting the gene?

Oncoplots/oncoprints generated inside the RE are allowed for export where they show only somatic variant data at gene level. If they also show phenotypic information about individual participants, we would review the identification risk of the information shown to decide whether it can be exported as is or whether the information would need to be adjusted first.

Hope that answers the question: please let me know if anything else needs to be clarified!

Thank you, this is very helpful and answers my question. I plan to include clinical metadata of the samples, such as treatment status. I'll review everything closely before publication and reach out if needed.
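
To illustrate what "gene level" could mean in practice for the oncoprint question above, here is a minimal Python sketch that collapses variant-level events into one set of general categories per gene and sample. The field names, sample labels and categories are hypothetical and not taken from any RE dataset or tool. Whether a given plot can be exported is still decided by Airlock review.

```python
def to_gene_level(events):
    """Collapse variant-level events into {gene: {sample: [categories]}}.

    Only the gene, an arbitrary sample label and a general category are kept;
    any finer-grained detail on the event is dropped.
    """
    matrix = {}
    for event in events:
        cell = matrix.setdefault(event["gene"], {}).setdefault(event["sample"], set())
        cell.add(event["category"])
    # Sort categories so the output is stable and easy to compare
    return {
        gene: {sample: sorted(cats) for sample, cats in samples.items()}
        for gene, samples in matrix.items()
    }

# Hypothetical, made-up events (not real data); "detail" stands in for
# variant-level information that should not appear in the exported plot.
events = [
    {"sample": "S1", "gene": "TP53", "category": "loss", "detail": "dropped"},
    {"sample": "S1", "gene": "TP53", "category": "loss", "detail": "dropped"},
    {"sample": "S2", "gene": "TP53", "category": "amplification", "detail": "dropped"},
    {"sample": "S2", "gene": "MYC", "category": "amplification", "detail": "dropped"},
]

gene_level = to_gene_level(events)
print(gene_level)
```

Adding clinical metadata such as treatment status to this matrix would make it phenotypic as well, which is the case the answer above says needs an identification-risk review.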


How about short variants 1-10bp?

Probably OK for export, provided that: the situation justifies a need for variant-level data; the data shown abides by the rule that data exported via Airlock must always be the minimum needed for the given purpose; and any phenotypic data shown about the individual variant carriers is low-detail enough to be safe for export via Airlock. (There will be details later in the presentation about how to use the Clinical Collaboration system under CRI to consult clinicians and participants directly in cases where a higher level of detail is required than can be allowed via Airlock.)


If I am submitting an abstract to a conference which includes GEL data, do I need to get the data through Airlock before the abstract is submitted, or just before the data is published?

You need to get Airlock approval before communicating any information originating from the RE/NGRL to the outside world, so yes, you would need the data for the abstract approved via Airlock before sending it in.


Is it just phenotype that relies on the 5 rule? Does it relate to patients with rare variants (for example if less than 5 patients under a normalised disease group have rare variants in the same gene) - would that be identifiable?

The <5 rule for counts applies only to phenotypic data under the current rules: this is on the basis that a person's genetic information is very unlikely to be public knowledge, whereas their phenotypic information is much more likely to be public knowledge and thus has a much higher identification risk
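
As a rough illustration of the <5 rule for phenotypic counts, the Python sketch below masks any count between 1 and 4 before a table is compiled for export. The function name, the "<5" marker and the example labels are assumptions made for this sketch. They are not part of any Airlock tooling, and the source does not say how zero counts should be treated, so they are left unchanged here.

```python
from collections import Counter

THRESHOLD = 5  # counts below this are masked, following the "<5 rule" above

def suppress_small_counts(counts, threshold=THRESHOLD):
    """Return a copy of counts with values from 1 to threshold-1 replaced by a marker."""
    masked = {}
    for category, n in counts.items():
        if 0 < n < threshold:
            masked[category] = f"<{threshold}"
        else:
            # Zero counts are left as-is (an assumption of this sketch)
            masked[category] = n
    return masked

# Hypothetical phenotype labels, one per participant (made up, not real data)
phenotypes = ["asthma"] * 12 + ["epilepsy"] * 3 + ["diabetes"] * 7

table = suppress_small_counts(Counter(phenotypes))
print(table)  # {'asthma': 12, 'epilepsy': '<5', 'diabetes': 7}
```

Note that a masked cell can sometimes be recovered from row or column totals, so totals shown alongside masked counts may need checking as well.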


Is there an expectation that the airlock team will read publications or see presentation slides beforehand if GEL data is used so that they can approve the text/graphs?

Hope I'm understanding this question correctly: Emily has now gone over some details on this in the presentation but basically you as a researcher are very welcome to write a publication or make a presentation inside the RE, but you're also very welcome to use exported data to make a publication/presentation/similar outside the RE.

To put it another way, it's fine to take any exported data and restructure it, or add to it with general information from the literature, to make a presentation etc.; it's really up to you how you want to work. What is very important, as is noted in every Airlock approval message, is that you must NOT add any additional GEL data that has not been approved for export unless and until it has also been approved. For example, if you had exported some graphs about a specific cohort and you then wanted to write a description of that cohort in your paper, you would need to export all of the descriptive data via Airlock before writing that description (so e.g. if the cohort consists of 130 males and 200 females, all with lung cancer, you would need to export this information in some format before writing it into the paper).


Is there any intention in the future to add things like reference managers (Mendeley, Endnote etc) or stats software (e.g. SPSS etc) to make writing entire papers inside the RE more feasible?

live answered


So we need to mention the name of the journal where the paper is going to be published? What if they reject it?

That is a good question. To be clear, the publication ID field is optional and has no impact on the review of the request. It is there for our own interest, so that we can collect publication IDs for requests where the ID is known and the journal is confirmed. Where this is not determined at the time of making the Airlock request, it's fine to leave that field blank.


If we first export a graph via Airlock saying we want to publish it and then remove it at a later stage of manuscript preparation, do we need to tell anyone? In other words, do we need to publish 100% of the graphs that we export saying we want to publish them?

No, if you just end up not publishing something approved for export via Airlock, it's fine. Unless there is some specific caveat on the approval, Airlock approval can be taken as essentially approval to do whatever you want with the data, which can include doing nothing with it!


In order to use external tools or pipelines to perform a bulk data analysis, is it possible to export raw data like RNA-seq data without any GE IDs?

It depends on exactly what the situation is and exactly what the data is, but if it is raw data then the answer is probably no. There is a procedure to allow an exemption to the usual rule that all analysis needs to happen inside the RE in circumstances where the work cannot feasibly be done inside the RE. In such circumstances the researcher would need to agree to a list of conditions, including deleting all of the data once the work is complete.

Even where this exemption is granted, though, the data still needs to be relatively summarised. It is a judgment call by the Airlock team and Committee what circumstances justify this exemption and what data can be allowed for export under it. Hope that answers the question; please let me know if there are any follow-up queries or clarifications.


Am I right in thinking that the publication policy for student thesis (e.g. PhD) is still to send for review before the actual examination and not before initial submission?

live answered


If we make any changes to a publication after it has been approved, do we have to put it back through the approval process? (So not adding new data, maybe just changing the wording / adding a new paragraph to the intro/discussion etc?)

live answered


There was a mention of making presentations within the RE and presenting directly via this platform. Does this not go against the airlock rules? Or is it only an option if everyone you're presenting to is an RE member?

You can present to another approved RE user directly from the RE if you would like to (but not to non-RE users!). However, the reference to making the presentation inside the RE was about creating the presentation there, then exporting it, then presenting it after export. Hope that makes sense; please let me know if there are any follow-up queries.


Can I change the x/y axis labels post-airlock approval?

It depends: are you just renaming the same concept or meaningfully changing the information shown?

just rewording the same concept

I think I'm happy to say that that's fine and can be filed as "just re-structuring the same data"


I know genetic variants are treated more "leniently" than phenotypic data, does this extend to germline variants and not just somatic?

live answered

Yep no worries - realise it's a difficult question to give a blanket answer to


I am working within a collaborative group and I have had three emails regarding my lack of activity in our project. It seems that the message is that my “permission” ends and then I respond immediately and it is reactivated. Is there another method to keep me active so that I’m not “scared” that I will be discontinued from our project? Thanks for a response! (from the feedback survey)

It is part of our agreement with the NHS that only active researchers are authorised to access the clinical datasets they have provided, which is why we are obliged to remove a user's access if they do not log in to the RE for over 6 months.

To keep your access, ensure you log in to the RE on a regular basis. The reminder emails will always arrive before you reach the limit, so if you log in whenever you receive them, you're covered.