Paving the road for data science in ORCHESTRA
– now and for the future
Work Package 7 (WP7) handles data management in ORCHESTRA, a particularly challenging task: ORCHESTRA has 26 beneficiaries plus many third parties and collaborators from about 15 European and non-European countries, and it works with cohorts located in different countries. A cohort is a large group of patients who share certain characteristics. For example, ORCHESTRA’s Healthcare Worker Cohort studies the immunisation status of healthcare workers after vaccination or COVID-19 infection. Cohort research relies on data extraction and data analysis. To draw correct information from the data, ORCHESTRA depends on common standards and codes. Yet medical standardisation and harmonisation is a worldwide challenge and still a work in progress. Prof. Dr. Sylvia Thun, Eugenia Rinaldi and Caroline Stellmach from the partner Charité are tackling this important task and speak openly about challenges, motivation and visions.
You work for ORCHESTRA’s WP7 “Data Management” and your specific task is data harmonisation and standardisation. Most people do not have a very clear picture of what this is about. How would you explain what you do to friends from a totally different background?
Sylvia Thun: First of all, we have an overall vision: a single ecosystem in which we can share data, so that we can do science on this data and practise more data-driven medicine and therapy. Behind this mission there is a huge community which supports it with standards. There are over 300 different IT standards in healthcare at ISO level and, besides that, there are several standardisation organisations which are very important. We work a lot with a new standard called Fast Healthcare Interoperability Resources (FHIR), an interface standard that allows data to be exchanged in an interoperable way. This means that all users can interpret the data correctly and very precisely.
What is the specific challenge in ORCHESTRA for you?
Eugenia Rinaldi: The challenge is indeed that ORCHESTRA is such a complex project, with so many different partners and so many different cohorts. Some cohorts are already collecting data, while others are just starting to work together on new prospective studies. So we have a variety of scenarios, and that makes the standardisation and harmonisation process quite challenging. Our approach is to introduce terminology standards within ORCHESTRA.
How do you find these common terminologies?
Eugenia Rinaldi: We need to identify all the healthcare concepts that are being collected in the studies, both prospective and retrospective, in a unique and unambiguous way. This is also useful for identifying common elements. For example, Caroline and I have been working hard since the beginning to look at all the work packages dealing with data. We have been looking at all the data, and we still are, associating concepts with international codes and then identifying common elements. This is quite challenging work because there are really a lot of variables. Sometimes you even need to find out what they mean. Without the code, it is sometimes really hard to know what was meant, what is behind that description or that question, and so we need to go back to the subject matter experts and ask them: do you really mean this or that? This is also part of the work we are doing: associating the variable with the concept. This is very important because it is how we are trying to derive a core data set of elements that could be used within ORCHESTRA. As I said, this is a work in progress: we are almost done with Work Packages 2, 4 and 6, but we still have to tackle 3 and 5, and there are still some changes in variables.
Caroline, how would you describe your job?
Caroline Stellmach: Sylvia and Eugenia already gave a very comprehensive overview, so maybe just one addition. With this core data set, we want to give the partners and ORCHESTRA the possibility, when they launch new studies in the future, to draw from this pool of existing concepts and use them for their studies. Merging the information for analysis in ORCHESTRA is then so much easier, because the coding has already been done.
This means your work sets ORCHESTRA’s partners up for the future, right?
Caroline Stellmach: Yes, this is a great motivation for us.
Please give us some more insight. How do you look into the data, and how do you draw conclusions from it?
Eugenia Rinaldi: We work with the work packages. We ask them to send us their data sets, that is, the data that they want to collect or are already collecting, and then we analyse every single element of that list, both the question and the answers. First we try to understand what is actually meant by that element and by its answers. Then we select the most appropriate standard for that particular variable. We look up the code for that concept and associate it with the variable. We repeat this procedure for each of the data elements.
We select the standards according to the field of use: whether it is a test, a questionnaire or a broader concept. Sometimes, when more than one international terminology is possible, we add more codes to the list. This way, we produce a file with all these variables.
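The annotation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WP7 tooling: the variable names, the lookup table and the codes (`A-0001` and so on) are hypothetical placeholders, not real LOINC or SNOMED CT codes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Coding:
    system: str  # name of a terminology (placeholder names used below)
    code: str    # placeholder code, NOT a real terminology code


@dataclass
class Variable:
    name: str
    question: str
    answers: list
    codings: list = field(default_factory=list)


def annotate(variable, lookup):
    """Attach every coding the lookup table holds for this variable."""
    variable.codings = list(lookup.get(variable.name, []))
    return variable


# Hypothetical result of the manual look-up work: one variable may carry
# codes from more than one terminology.
LOOKUP = {
    "sex": [Coding("TERMINOLOGY-A", "A-0001")],
    "pcr_result": [
        Coding("TERMINOLOGY-A", "A-0002"),
        Coding("TERMINOLOGY-B", "B-0417"),
    ],
}

data_set = [
    Variable("sex", "What is the participant's sex?", ["female", "male"]),
    Variable("pcr_result", "Result of the PCR test?", ["positive", "negative"]),
    Variable("free_text_note", "Any other remarks?", []),
]

annotated = [annotate(v, LOOKUP) for v in data_set]

# Variables without a code are the ones to take back to the experts.
unmapped = [v.name for v in annotated if not v.codings]
print(unmapped)
```

In this sketch the variables left in `unmapped` correspond to the cases where the team has to ask the subject matter experts what was meant, or where a new code may need to be requested.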
What happens next?
We go on to the next data set that we receive and do the same thing. We then check whether any codes are the same, so that we know that a variable appears in more than one work package or in more than one study. We then know that the variable belongs to both studies.
This sounds like very detailed, step-by-step work. Does this mean that colleagues from other work packages are sometimes already working with a certain coding, and you then have to step in and ask them to change it to match common standards?
Eugenia Rinaldi: Oh, yes! This is the greatest challenge, given that some data sets have already been defined. That is hard! Sometimes we have to make them converge into one ORCHESTRA data set, and that can be really, really hard. First of all, we need to convince a group to change a little bit. We build a template for this core data set, and then everybody needs to adapt to it a little. And, of course, there is always a little resistance to this. Sometimes that is because they have already started collecting data. Take a trivial example: if you ask for the sex, and one study is collecting only female and male while another is collecting female, male and unknown, you already have a difference, a third element. This is the type of problem that can occur if data has already been collected.
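Comparing codes across data sets, as described here, might look like the following minimal Python sketch. The study names, variable names and codes are invented placeholders; the only point is that two differently named variables are recognised as the same concept because they carry the same code.

```python
from collections import defaultdict


def shared_codes(data_sets):
    """Return each (system, code) pair that occurs in more than one data set,
    together with the sorted names of the data sets that use it."""
    seen = defaultdict(set)
    for ds_name, variables in data_sets.items():
        for codings in variables.values():
            for coding in codings:
                seen[coding].add(ds_name)
    return {c: sorted(names) for c, names in seen.items() if len(names) > 1}


# Hypothetical annotated data sets: variable name -> list of (system, code).
DATA_SETS = {
    "study_a": {
        "sex": [("TERMINOLOGY-A", "A-0001")],
        "age": [("TERMINOLOGY-A", "A-0002")],
    },
    "study_b": {
        "gender_at_birth": [("TERMINOLOGY-A", "A-0001")],
        "bmi": [("TERMINOLOGY-A", "A-0003")],
    },
}

common = shared_codes(DATA_SETS)
print(common)
```

Here `sex` in one study and `gender_at_birth` in the other share the placeholder code `A-0001`, so that code would be a candidate for the core data set.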
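The female/male/unknown mismatch can be made concrete with a short Python sketch. It assumes, purely for illustration, that the core data set allows the three values `female`, `male` and `unknown`, and that missing or unrecognised entries are mapped to `unknown`; the interview does not say which value set ORCHESTRA actually chose.

```python
# Assumed core value set for this illustration only.
CORE_SEX_VALUES = {"female", "male", "unknown"}


def harmonise_sex(raw):
    """Map a raw study value onto the assumed core value set.
    Missing or unrecognised values become "unknown"."""
    if raw is None:
        return "unknown"
    value = raw.strip().lower()
    return value if value in CORE_SEX_VALUES else "unknown"


# Hypothetical records: study A offered two options, study B offered three.
study_a = ["female", "male", "male"]
study_b = ["Female", "unknown", None, "male"]

merged = [harmonise_sex(v) for v in study_a + study_b]
print(merged)
```

Note the limit of any such mapping: a study that only offered two options cannot tell afterwards which participants would have chosen the third, which is why agreeing on the value set before collection starts is so much easier.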
How do you convince your colleagues to adapt to your data coding?
Basically, the effort is, first of all, to produce a good template that is useful for all the ongoing studies. We then emphasise that harmonised data is the only way to use data for analysis. In the end, we want to produce a definition of the data that is the same for everybody and that serves the process of analysis, the production of research and, finally, the creation of knowledge.
But the devil is always in the details, right? How do you foresee every single detail? Or does it happen that you establish a data set and then the researchers find out that there were variables missing?
Eugenia Rinaldi: Absolutely, yes! These details are sometimes very tricky, and it can be a tough call. We first have to understand the researchers, because they are the experts. We need to make sure that we do not change the research questions with our coding. We are just trying to identify the common elements and the ones that could be the same, and we also look for differences that could be important.
Caroline Stellmach: Another interesting aspect is that, of course, not all codes exist yet. Many of the codes we need are still missing. In particular, as the COVID-19 disease evolves, so do the codes. We have been submitting a lot of codes to international health data standards organisations such as LOINC, for laboratory tests but also for questionnaires. As a rough guess, I would think that we have submitted more than 50 concepts.
Sylvia Thun: We are not just consuming terminologies and standards; we are improving them. We have had the opportunity to work together with those huge international organisations and to create new codes and concepts. Just as Caroline said, we try to find ways to improve the terminologies.
Sylvia, you can look back on more than 20 years in data operations, and you are a physician and an engineer. What made you go into this field back then?
Sylvia Thun: I decided to go into medical informatics when I was a young physician. I had understood that healthcare was a global and not a local issue. Here in Germany, we only had local projects, many, many projects and a lot of funding, but these projects did not work together. So my idea was, first, to have a national initiative. That is now the so-called Interoperability Forum, where we have a growing community and where we meet regularly in person. Second, and just as important, is to join a global community and to work together globally. The World Health Organisation (WHO), for example, also has a clear strategy towards common standards.
And now, in the pandemic era and with our ORCHESTRA project (and the German federal projects within the Netzwerk Universitätsmedizin), we see that data harmonisation and standardisation should have been the goal before, but we have not quite achieved it yet.
How are you linked to the international community that is working on data harmonisation?
Sylvia Thun: We have been part of the community for more than 20 years. We build standards together, and we have written implementation guides with the international community. We are involved in some connected structures where we test the data so that it really works, not only on paper: the vendors work together there to prove that they comply with these standards. Besides that, I am the chair of HL7 Germany, which is an affiliate of HL7 International, and I am the chair of our community in Germany which is called “The Tick”, an umbrella organisation for standardisation in healthcare. We work together with many other organisations, for example the EMA, the European Medicines Agency. We have a very strategic view because we know what is coming, including from the regulatory side. In our opinion, everything in clinical trials should be, and will be, standardised in just a few years. We are not the very first to do this, but we are at the beginning of a huge new era in healthcare.
What are the most important lessons you would like to share with people who are not familiar with the subject?
Eugenia Rinaldi: From my side, I would say that this project shows how using the right standards from the very beginning would make research so much easier, and easier at a global level, too. If you start with the available standardised codes from the beginning, this work is already done. I think the lesson is that standards should be used more in research, because that would really facilitate it.
The implementation of global data standards is just as revolutionary as the implementation of the metric system, right?
Eugenia Rinaldi: Very true!
There always has to be a first time, and since you have to create new codes, what you are doing is truly pioneering work, right?
Eugenia Rinaldi: Yes, at this level I would say so because there are few examples of such use of standards across countries and with so many participating cohorts.
Sylvia Thun, member of WP7, professor at BIH, Charité and North Rhine-Westphalia, director of Core Unit e-Health and Interoperability, physician and engineer
Eugenia Rinaldi, research fellow in Core Unit e-Health and Interoperability, background in physics
Caroline Stellmach, research fellow in Core Unit e-Health and Interoperability, background in biochemistry and international business management
Interview by Marlene Nunnendorf, ORCHESTRA Communication