Welcome! The following statistics provide some visual insights into the ORCHESTRA Public Data Set. The Public Data Set comprises patient data from the ORCHESTRA cohort after a data cleaning process and includes data from patients documented until November 29, 2024.
The ORCHESTRA Public Data Set originates from the central ORCHESTRA database. The data anonymisation pipeline is described by Jakob et al. in "Design and evaluation of a data anonymisation pipeline to promote Open Science on COVID-19" and "Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients". The public data is anonymised according to our data protection concept. The anonymisation process was carried out with the ARX software.
Copyright: This work is licensed under the Creative Commons Attribution Non-Commercial 4.0 License. With the use of this data you agree to include a proper acknowledgement of the ORCHESTRA study group in any work based on the data set. By working with this notebook you agree to maintain the confidentiality of the data set at all times and to not attempt to compromise or otherwise violate the privacy of the patients described. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/.
If you have any comments on the notebook, please drop us a message at support@orchestra-cohort.eu.
Here we provide information on the basic structure of the ORCHESTRA Public Data Set.
The data set consists of 4598 patients before anonymisation, 3982 patients after anonymisation, and 38 variables.
Each row represents the anonymised data of a single patient.
*The Clinical Phases are defined according to the WHO clinical progression scale:
To get to know the Public Data Set better, the values that each variable takes in this data set are shown below. Please be aware that the Public Data Set is only a part of the complete ORCHESTRA data set. Anonymisation may leave a variable with fewer distinct values than in the complete ORCHESTRA data set. For example, the variable 'gender' can also have the value 'diverse', but there is no patient with this gender in the Public Data Set.
- age: 18 - 39 years, 40 - 59 years, 60 - 79 years, >= 80 years, nan
- gender: Female, Male
- quarter_of_diagnosis: Q1-2020, Q2-2020, Q3-2020, Q4-2020, Q1-2021, Q2-2021, Q3-2021, Q4-2021, Q1-2022, Q2-2022, Q3-2022, Q4-2022, Q1-2023, Q2-2023
- chronic_heart_disease: No, Unknown, Yes
- chronic_lung_disease: No, Unknown, Yes
- chronic_liver_disease: No, Unknown, Yes
- chronic_kidney_disease: No, Unknown, Yes
- active_tumor_cancer: No, Unknown, Yes
- auto_inflammatory_disease: No, Yes
- diabetes: No, Yes
- neurological_psychiatric_disease: No, Yes
- transplant: No, Unknown, Yes
- cigarette_abusus: Former, No, Unknown, Yes
- covid_vaccination: No, Unknown, Yes
- covid_therapy: Unknown, Yes
- dialysis: No, Unknown, Yes
- intensive_care_treatment: No, Unknown, Yes
- events_embolic: No, Yes
- events_pulmonary_embolism: No, Yes
- events_neurological: No, Yes
- events_cardiac: No, Yes
- events_bacterial_pneumonia: No, Yes
- highest_level_respiratory_support: High flow, Invasive ventilation, Mask or nasal prongs, No oxygen, Non-invasive ventilation, None
- most_severe_stage_acute: Mild, Moderate, Severe, Unknown
- hospitalisation: No, Unknown, Yes
- any_symptoms_acute: No, Unknown, Yes
- general_symptoms_acute: No, Yes
- neurological_symptoms_acute: No, Yes
- respiratory_symptoms_acute: No, Yes
- gastrointestinal_symptoms_acute: No, Yes
- systolic_blood_pressure: 100-119 mmHg, 120-139 mmHg, 140-159 mmHg, 160-179 mmHg, 80-99 mmHg, < 80 mmHg, > 179 mmHg, Unknown
- diastolic_blood_pressure: 110-119 mmHg, 40-59 mmHg, 60-89 mmHg, 90-109 mmHg, < 40 mmHg, > 119 mmHg, Unknown
- heart_frequency: 60-100/min, < 60/min, > 100/min, Unknown
- peripheral_oxygen_saturation: 60-69 %, 70-79 %, 80-89 %, 90-95 %, 96-100 %, < 60 %, Unknown
- respiratory_frequency: 16-20/min, 21-29/min, < 16/min, > 29/min, Unknown
- type_of_discharge_acute: Alive, Ambulant, Death, Referral to another insitution, Unknown
- availability_6month_followup: Yes
- any_symptom_6month_followup: No, Unknown, Yes
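A per-variable listing of this kind can be sketched with pandas. This is only an illustration: the helper `list_category_values` and the three-row `toy` frame are hypothetical stand-ins, not the notebook's actual code or data, and the sketch assumes the real data set would be loaded into a DataFrame with one column per variable.

```python
import pandas as pd


def list_category_values(df: pd.DataFrame) -> dict[str, list[str]]:
    """Map each column name to its sorted distinct values as strings.

    Missing entries are reported as a trailing 'nan' entry.
    (Hypothetical helper, written for illustration only.)
    """
    result = {}
    for col in df.columns:
        # Distinct non-missing values, converted to plain strings and sorted.
        values = sorted(str(v) for v in df[col].dropna().unique())
        # Append a marker if the column contains any missing entries.
        if df[col].isna().any():
            values.append("nan")
        result[col] = values
    return result


# Toy stand-in for the Public Data Set (assumption: NOT real patient data).
toy = pd.DataFrame(
    {
        "gender": ["Female", "Male", "Female"],
        "diabetes": ["No", "Yes", "No"],
        "age": ["18 - 39 years", None, "40 - 59 years"],
    }
)

for name, values in list_category_values(toy).items():
    print(f"{name}: {', '.join(values)}")
```

Sorting the values as strings puts labels such as `< 80 mmHg` after the numeric ranges, which is one plausible explanation for the ordering of the blood pressure categories in the listing.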
The following descriptive statistics are computed in this section:
The number of patients before anonymisation is 4598.
The number of patients after anonymisation is 3982.
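To put the two counts in relation, the short sketch below computes how many patient records are no longer present after anonymisation and what share remains. Only the two counts come from the text; the variable names are chosen for illustration.

```python
# Patient counts as stated in this notebook.
n_before = 4598  # patients before anonymisation
n_after = 3982   # patients after anonymisation

# Records that do not appear in the anonymised data set.
n_difference = n_before - n_after

# Shares relative to the pre-anonymisation count, in percent.
retained_pct = 100 * n_after / n_before
difference_pct = 100 * n_difference / n_before

print(f"Records not in the anonymised set: {n_difference}")
print(f"Retained: {retained_pct:.1f} %, not retained: {difference_pct:.1f} %")
```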
The following descriptive statistics on the health status at the end of medical consultation are computed in this section:
Note that we will use a filtered data set for computing the rates, which we describe below.
| | Before Anonymisation | After Anonymisation |
|---|---|---|
| Alive | 3373 | 3034 |
| Ambulant | 1028 | 838 |
| Unknown | 139 | 60 |
| Referral to another insitution | 46 | 38 |
| Death | 12 | 12 |
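As a consistency check, the counts in the table can be re-entered by hand and compared with the patient totals stated earlier (4598 and 3982). The sketch below also derives the per-category difference and percentage shares. The column names `before`, `after`, `difference`, `share_before_pct` and `share_after_pct` are assumptions made for this example; the category labels are copied exactly as they appear in the data set.

```python
import pandas as pd

# Discharge counts copied from the table above.
# The labels are kept exactly as spelled in the data set.
discharge = pd.DataFrame(
    {
        "before": [3373, 1028, 139, 46, 12],
        "after": [3034, 838, 60, 38, 12],
    },
    index=[
        "Alive",
        "Ambulant",
        "Unknown",
        "Referral to another insitution",
        "Death",
    ],
)

# Column totals should match the patient counts given in the text.
total_before = int(discharge["before"].sum())
total_after = int(discharge["after"].sum())

# Absolute change per category and share of each category in percent.
discharge["difference"] = discharge["before"] - discharge["after"]
discharge["share_before_pct"] = (100 * discharge["before"] / total_before).round(1)
discharge["share_after_pct"] = (100 * discharge["after"] / total_after).round(1)

print(discharge)
print(f"Totals: {total_before} before, {total_after} after")
```

The column sums reproduce the stated totals, so every patient falls into exactly one discharge category both before and after anonymisation.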
From here on we will indicate the three clinical phases as
In the following we will plot the: