Researchers at Penn Medicine and Intel Corporation led the largest global machine learning effort to date to securely aggregate knowledge from brain scans of 6,314 glioblastoma (GBM) patients at 71 facilities around the world and develop a model that can improve the identification and prediction of boundaries in three tumor subcompartments without compromising patient privacy. Their findings were published today in Nature Communications.
“This is the largest and most diverse dataset of glioblastoma patients ever considered in the literature, and it was made possible through federated learning,” said senior author Spyridon Bakas, an assistant professor of Pathology and Laboratory Medicine, and Radiology, at the Perelman School of Medicine at the University of Pennsylvania. “The more data we can feed into machine learning models, the more accurate they become, which could improve our ability to more precisely understand, treat and remove glioblastoma in patients.”
Researchers studying rare conditions such as GBM, an aggressive type of brain tumor, often have patient populations limited by their institution or geographic location. Because of privacy protection legislation such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the United States and the General Data Protection Regulation (GDPR) in Europe, collaborating across institutions to share data without compromising patient privacy is a major obstacle for many healthcare organizations.
A newer machine learning approach called federated learning offers a solution to these barriers by bringing the machine learning algorithm to the data rather than following the existing paradigm of centralizing the data where the algorithm runs. Federated learning, an approach first implemented by Google for the autocorrect functionality of keyboards, trains a machine learning algorithm across multiple decentralized devices or servers (in this case, institutions) that hold local data samples, without actually exchanging them. It has previously been shown to allow clinicians at institutions in different countries to collaborate on research without sharing any private patient data.
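To make the idea concrete, here is a minimal sketch of one common aggregation rule, federated averaging, in plain Python. It is an illustration only: the article does not name the aggregation method used in the study, and the toy linear model, the `local_update`, `federated_average` and `train` helpers, and the site names are all hypothetical.

```python
from typing import Dict, List, Tuple

Case = Tuple[float, float]  # (input x, target y); a stand-in for a scan and its label


def local_update(weights: List[float], cases: List[Case], lr: float = 0.1) -> List[float]:
    """Train on one site's private cases; only the resulting weights leave the site."""
    w, b = weights
    for x, y in cases:
        err = (w * x + b) - y  # prediction error of the toy model y = w*x + b
        w -= lr * err * x
        b -= lr * err
    return [w, b]


def federated_average(site_weights: List[List[float]], site_sizes: List[int]) -> List[float]:
    """Combine site models, weighting each by the number of cases it trained on."""
    total = sum(site_sizes)
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(len(site_weights[0]))
    ]


def train(sites: Dict[str, List[Case]], rounds: int = 200) -> List[float]:
    """Repeat: send the shared model out, train locally, average the results."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, cases) for cases in sites.values()]
        sizes = [len(cases) for cases in sites.values()]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Hypothetical sites whose private cases all follow y = 2x + 1.
SITES = {
    "site_a": [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    "site_b": [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
}
```

In this sketch, `train(SITES)` should bring the shared weights close to a slope of 2 and an intercept of 1, even though the aggregation step only ever sees model weights and case counts, never the cases themselves.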
Bakas led this major collaborative study along with first authors Sarthak Pati, MS, a senior software developer at Penn’s Center for Biomedical Image Computing & Analytics (CBICA); Ujjwal Baid, PhD, a postdoctoral fellow at CBICA; and Micah Sheller, a research scientist at Intel Labs.
“Data helps to drive discovery, especially in rare cancers where available data can be scarce. The federated approach we have outlined provides access to the maximum amount of data while reducing the institutional burden of data sharing,” said Jill Barnholtz-Sloan, PhD, an adjunct professor at the Case Western Reserve University School of Medicine.
The model followed a phased approach. The first stage, called the public initial model, was pre-trained using publicly available data from the International Brain Tumor Segmentation (BraTS) challenge. The model was tasked with delineating the three GBM tumor subcompartments: the “enhancing tumor” (ET), which represents the breakdown of the vascular blood-brain barrier within the tumor; the “tumor core” (TC), which includes the ET and the part that kills tissue and represents the portion of the tumor relevant to the surgeons who remove it; and the “whole tumor” (WT), defined as the union of the TC and the infiltrated tissue, which is the entire area that would be treated with radiation.
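The nesting of the three subcompartments (ET inside TC, TC inside WT) can be sketched in a few lines of Python. The integer label values below follow a convention used in BraTS releases (1 for the tissue-killing core, 2 for infiltrated tissue, 4 for enhancing tumor) and are an assumption here; the article does not say how the study encoded its labels, and `subcompartment_masks` is a hypothetical helper.

```python
from typing import Dict, List

# Assumed per-voxel label values; not stated in the article.
NECROTIC, INFILTRATED, ENHANCING = 1, 2, 4


def subcompartment_masks(labels: List[int]) -> Dict[str, List[bool]]:
    """Turn a flat list of voxel labels into the three nested region masks."""
    return {
        "ET": [v == ENHANCING for v in labels],
        "TC": [v in (ENHANCING, NECROTIC) for v in labels],
        "WT": [v in (ENHANCING, NECROTIC, INFILTRATED) for v in labels],
    }


# Toy "scan" of six voxels; 0 means healthy tissue.
masks = subcompartment_masks([0, 1, 2, 4, 4, 0])
```

Each ET voxel is also a TC voxel, and each TC voxel is also a WT voxel, which is why the three regions are evaluated as separate but overlapping targets.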
This initial data comprised 231 patient cases from 16 sites, and the resulting model was validated against local data at each site. The second stage, called the preliminary consensus model, started from the public initial model and increased its accuracy by incorporating more data from 2,471 patient cases from 35 sites. The final stage, or final consensus model, started from the updated model and incorporated the largest amount of data, from 6,314 patient cases (3,914,680 images) at 71 facilities across 6 continents, to further optimize the model and test its generalizability to unseen data.
As a control for each stage, the researchers withheld 20 percent of the total cases contributed by each participating site from the model training process and used them as “local validation data.” This allowed them to measure the accuracy of the collaborative method. To further assess the generalizability of the models, six sites were excluded from all of the training stages to represent a completely unseen “out of sample” data population of 590 cases. One of these sites, the American College of Radiology, validated the model using data from a national clinical trial study.
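The per-site hold-out described above can be sketched as follows. This is a hypothetical illustration, not the study's code: the `split_site_cases` helper, the random shuffling, the fixed seed and the site names are all assumptions, since the article does not describe how cases were selected for validation.

```python
import random
from typing import Dict, List, Set, Tuple

Split = Dict[str, List[str]]


def split_site_cases(
    cases_by_site: Dict[str, List[str]],
    out_of_sample_sites: Set[str],
    holdout_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[Split, Split, Split]:
    """Return (training, local validation, out-of-sample) case IDs per site."""
    rng = random.Random(seed)
    training: Split = {}
    validation: Split = {}
    out_of_sample: Split = {}
    for site, cases in cases_by_site.items():
        if site in out_of_sample_sites:
            # These sites never take part in training at any stage.
            out_of_sample[site] = list(cases)
            continue
        shuffled = list(cases)
        rng.shuffle(shuffled)
        n_holdout = round(len(shuffled) * holdout_fraction)
        validation[site] = shuffled[:n_holdout]  # kept back as local validation data
        training[site] = shuffled[n_holdout:]    # used for federated training
    return training, validation, out_of_sample


# Hypothetical sites and case IDs.
CASES = {
    "site_a": [f"a{i}" for i in range(10)],
    "site_b": [f"b{i}" for i in range(50)],
    "site_c": [f"c{i}" for i in range(5)],
}
train_set, val_set, unseen_set = split_site_cases(CASES, out_of_sample_sites={"site_c"})
```

Splitting within each site, rather than across the pooled cases, keeps every contributor represented in both the training and the validation data.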
After model training, the final consensus model delivered significant performance improvements against the collaborators’ local validation data. It showed a 27% improvement in detecting ET boundaries, 33% in detecting TC boundaries, and 16% in detecting WT boundaries. The improved results are a clear indication of the benefit that can be gained from access to more cases, not only to improve the model but also to validate it.
Looking ahead, the authors hope that, because the methodology is generic, the applications of federated learning in medical research could be far-reaching, extending not only to other cancers but also to other conditions such as neurodegeneration and beyond. They also anticipate further research to show that federated learning can comply with security and privacy protocols around the world.
Funding for this research was provided by the National Institutes of Health (U01CA242871, R01NS042645, U24CA189523, U24CA215109, U01CA248226, P30CA510081231, R50CA211270, UL1TR001433, R21EB0302091232, R37CA21491232, R37CA214948, CA2374180102, CA237401080102, CA 018001, 080012001, CA23740108010208, 00801080112001, U24CA23740108010208, 20080108, 2012, U24CA237401080112001, CA180794, CA180820, 1236CA180822, CA180868) and the National Science Foundation (2040532, 2040462).
Intel Corporation provided software engineering staff and privacy protection expertise to the project during the development of the software used.