Federated Machine Learning for Patient-Centered Electronic Health Records

Project members: Felix Kluge

Start date: 1 April 2020

End date: 30 September 2022

Funding source: Deutscher Akademischer Austauschdienst (DAAD)

Project Partner Website: Refinio GmbH

Abstract

Electronic health records (EHRs) are commonly institution-specific: hospitals, insurance companies, and other institutions provide them to fulfill their own objectives. As a result, stored health information is isolated, fragmented, and duplicated across providers, and patients may lack complete access to their medical histories. As a solution, countries such as Denmark and Israel have long adopted nationwide EHRs for their health care systems^a, in which health information is well managed and digitally connected to avoid duplicate records and to improve the quality and cost-effectiveness of medical care as well as patient safety. In Germany, the “Appointment Service and Supply Act”, adopted on 14 March 2019, requires the German statutory health insurance to provide EHRs for all insured persons from 1 January 2021 onwards^b. As specified by the German Health Care Information Technology Infrastructure in accordance with section 291a SGB V, the new EHR should store the complete medical histories of patients, such as previous diagnoses, therapeutic decisions, treatment reports, and self-measurement values. Among other benefits, patients should be able to choose freely between providers, hold data sovereignty over their EHR, and withdraw access rights at any time.

In line with those guidelines, OnePatient^c is a patient-centered EHR system that stores data locally under the sovereignty of individual device owners, thereby enabling patients to take control of their health information, providing offline access to medical data, supporting privacy management, and avoiding a single point of failure. The OnePatient EHR system can be provisioned on any of a patient's devices; patients therefore technically own their medical data, while the device and software manage it. On the one hand, these developments simplify the technical and organizational challenges of implementing data regulations such as the General Data Protection Regulation (GDPR) of the European Union. On the other hand, the data will reside in isolated, heterogeneous, and distributed environments, which challenges the conventional data transaction procedures employed in machine learning (ML) today [4]. Traditional procedures for acquiring big data in ML involve several parties that collect the data, transfer it to a central data repository, and fuse it to build a model. Data owners may be unclear about these procedures and about the future use cases of the model, so the procedures may violate laws such as the GDPR.

To address these challenges, federated learning (FL) approaches can be leveraged to build ML models that are sent to the devices where the data is located and trained there locally. In this manner, only the model updates, not the underlying patient records, are returned to the central data repository. Building on existing FL studies [1; 2; 3], which do not focus on emerging EHR architectures such as OnePatient, we pursue four objectives. First, we aim to investigate and design novel FL frameworks that enable local systems to collaboratively train an ML model that patients can benefit from without divulging their medical information to a central entity; moreover, medical practitioners will be able to access the training process of the FL frameworks to adjust the diagnostic criteria of the model, and thereby increase the trust in and accuracy of the model outcome. Second, we aim to compare the accuracy and performance of a model trained in a centralized way with those of models trained using the proposed FL frameworks. Third, to protect the data during training from potentially malicious models and participants, we aim to use countermeasures such as differential privacy and multi-party computation to provide privacy guarantees. Finally, as a proof of concept, we aim to demonstrate the effectiveness of the FL frameworks using existing databases and suitable ML tasks for that data.
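The training scheme described above can be illustrated with a minimal sketch of federated averaging. This is not the project's framework: the logistic-regression model, the synthetic one-feature data, and all function names (local_update, clip_and_noise, federated_round, demo) are assumptions made for illustration only. Each simulated client trains on its own data and returns only a weight update, which is norm-clipped and perturbed with Gaussian noise in the spirit of differential privacy. The noise level here is arbitrary and is not calibrated to any formal privacy guarantee, and multi-party computation is not shown.

```python
import math
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run SGD for logistic regression on one client's private data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def clip_and_noise(delta, clip_norm, noise_std, rng):
    """Bound the update's L2 norm, then add Gaussian noise per coordinate."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [d * scale + rng.gauss(0.0, noise_std) for d in delta]


def federated_round(global_w, clients, clip_norm=1.0, noise_std=0.0, rng=None):
    """One round: clients train locally; the server sees only their updates."""
    rng = rng or random.Random(0)
    total = sum(len(data) for data in clients)
    aggregate = [0.0] * len(global_w)
    for data in clients:
        local_w = local_update(global_w, data)
        delta = [lw - gw for lw, gw in zip(local_w, global_w)]
        delta = clip_and_noise(delta, clip_norm, noise_std, rng)
        for i, d in enumerate(delta):
            # Weight each client's update by its share of the training samples.
            aggregate[i] += d * len(data) / total
    return [gw + a for gw, a in zip(global_w, aggregate)]


def accuracy(weights, data):
    """Fraction of samples whose predicted label matches the true label."""
    correct = 0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(weights, x))
        correct += int((1 if z > 0 else 0) == y)
    return correct / len(data)


def make_client(rng, n=40):
    """Synthetic stand-in for one device's records: label is 1 when x1 > 0."""
    data = []
    for _ in range(n):
        x1 = rng.uniform(-1.0, 1.0)
        data.append(([1.0, x1], 1 if x1 > 0 else 0))  # x[0] is a bias term
    return data


def demo(rounds=20, seed=0):
    """Train across three simulated clients and score on held-out data."""
    rng = random.Random(seed)
    clients = [make_client(rng) for _ in range(3)]
    w = [0.0, 0.0]
    for _ in range(rounds):
        w = federated_round(w, clients, clip_norm=1.0, noise_std=0.01, rng=rng)
    return accuracy(w, make_client(rng, n=200))


if __name__ == "__main__":
    print("held-out accuracy:", demo())
```

In this sketch the raw samples never leave the client lists; the server-side code in federated_round only ever handles the clipped, noised weight differences. A centralized baseline for the second objective could be approximated by calling local_update once on the concatenation of all client data and comparing the resulting accuracy.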

1. Brisimi, T. S., Chen, R., Mela, T., Olshevsky, A., Paschalidis, I. C., & Shi, W. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inf. 2018; 112: 59-67.

2. Roy, A. G., Siddiqui, S., Pölsterl, S., Navab, N., & Wachinger, C. BrainTorrent: A peer-to-peer environment for decentralized federated learning. arXiv preprint arXiv:1905.06731 2019.

3. Xu, J., & Wang, F. Federated learning for healthcare informatics. arXiv preprint arXiv:1911.06270 2019.

4. Yang, Q., Liu, Y., Chen, T., & Tong, Y. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 2019; 10: 1-19.

a. https://www.gesundheitsindustrie-bw.de/en/article/news/ehr-and-phr-digital-records-in-the-german-healthcare-system
b. https://www.gematik.de/anwendungen/e-patientenakte/#
c. https://refinio.net/software.html