Welcome! The following statistics provide some visual insights into the ORCHESTRA Public Data Set. The Public Data Set comprises patient data from the ORCHESTRA cohort after data cleaning and includes patients documented until January 17, 2023.
The ORCHESTRA Public Data Set originates from the central ORCHESTRA database. The data anonymisation pipeline is described by Jakob et al. in "Design and evaluation of a data anonymisation pipeline to promote Open Science on COVID-19" and "Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients". The public data is anonymised according to our data protection concept. The anonymisation was carried out with the ARX software.
Copyright: This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. By using this data, you agree to include a proper acknowledgement of the ORCHESTRA study group in any work based on the data set. By working with this notebook, you agree to maintain the confidentiality of the data set at all times and not to attempt to compromise or otherwise violate the privacy of the patients described. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/.
If you have any comments on the notebook, please drop us a message at support@orchestra-cohort.eu.
Here we provide information on the basic structure of the ORCHESTRA Public Data Set.
The data set consists of 3396 patients before anonymisation, 3026 patients after anonymisation, and 38 variables.
Each row represents the anonymised data of a single patient.
*The Clinical Phases are defined according to the WHO clinical progression scale:
To help you get to know the Public Data Set, the values that each variable takes in this data set are shown below. Please be aware that the Public Data Set is only a part of the complete ORCHESTRA data set. Anonymisation may leave a variable with fewer distinct values than in the complete ORCHESTRA data set. For example, the variable 'gender' can also have the value 'diverse', but no patient with this gender is in the Public Data Set.
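As a minimal sketch of how such a value overview could be produced, the snippet below counts the observed values of one variable using only the Python standard library. The toy records, the `age_group` column and the helper `observed_values` are hypothetical and serve only as illustration; they are not taken from the Public Data Set, and the notebook itself may use a different tool.

```python
from collections import Counter

# Toy records standing in for rows of the data set (all values are made up).
rows = [
    {"gender": "female", "age_group": "40-49"},
    {"gender": "male", "age_group": "60-69"},
    {"gender": "female", "age_group": "60-69"},
]


def observed_values(records, variable):
    """Count how often each value of `variable` occurs in `records`."""
    return Counter(record[variable] for record in records)


gender_counts = observed_values(rows, "gender")
print(dict(gender_counts))

# A value that the coding scheme allows but that no record carries
# simply has a count of zero.
print(gender_counts["diverse"])
```

A `Counter` returns 0 for keys it has never seen, which makes it easy to check for values such as 'diverse' that are permitted but not present.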
The following descriptive statistics are computed in this section:
The number of patients before anonymisation is 3396.
The number of patients after anonymisation is 3026.
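The two counts above can be put in relation to each other with a few lines of arithmetic. This is only an illustrative sketch; the variable names are chosen here and do not come from the notebook.

```python
# Patient counts quoted in the text above.
n_before = 3396
n_after = 3026

# Difference between the two counts and the share of patients retained.
n_difference = n_before - n_after
share_retained = n_after / n_before

print(f"Difference: {n_difference} patients")
print(f"Share retained after anonymisation: {share_retained:.1%}")
```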
The following descriptive statistics on the health status at the end of medical consultation are computed in this section:
Note that we will use a filtered data set for computing the rates, which we describe below.
From here on we will indicate the three clinical phases as
In the following we will plot the: