Two key points:
- Predictive AI is only useful if its predictions are correlated with real patient outcomes.
- Predictive AI requires at least 10,000 cases in a dataset, and 100,000 is far more effective.
A medical exam yields, at first, an educated guess.
Take a patient whose X-ray shows signs of Covid pneumonia, for instance—a radiologist might flag that possibility for the attending physician. Now imagine that X-ray goes on to be used as a training tool to help with future diagnoses for other patients. Without additional data—like a Covid lab-test result, a genomic sequence, or a down-the-line update on how that patient ultimately fared—the X-ray is of limited use. Since it’s not tied to an outcome, it doesn’t offer a complete picture. There’s no way to know if the identified signals actually correspond to Covid pneumonia.
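The role of outcome labels can be illustrated with a small sketch (all field names here are hypothetical, for illustration only): an imaging finding only becomes usable supervised training data once it is paired with a confirmed result.

```python
# Hypothetical records: an imaging finding is only a suggestion
# until it is linked to a confirmed outcome (e.g., a lab result).
scans = [
    {"scan_id": 1, "finding": "possible covid pneumonia", "lab_result": "positive"},
    {"scan_id": 2, "finding": "possible covid pneumonia", "lab_result": None},
]

# Only outcome-linked scans can serve as labeled training examples.
training_set = [s for s in scans if s["lab_result"] is not None]
print(len(training_set))  # 1 of 2 scans is usable
```

The second scan may look identical to the first, but without a lab result tied to it, there is no ground truth to learn from.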
Nightingale Open Science, a new research resource, wants to make those educated guesses smarter by making high-quality, outcomes-based datasets widely available for researchers building AI tools for health care.
The group allows anyone conducting nonprofit research to access 40 terabytes of medical data for free—a resource that could shed light on medical mysteries and promote earlier diagnoses of high-risk conditions. Nightingale’s records span serious medical problems, like sudden cardiac death, cancer metastasis, and maternal mortality, and were collected from patients in the US and Taiwan before being vetted and de-identified. Currently, most similar datasets are kept for internal use at medical institutions or at tech companies developing health products, Nightingale cofounder Ziad Obermeyer said.
“A particular strength of [this] data collection at this scale—in terms of volume but also in terms of time—is that it has a bird’s-eye view of what happened to the patients,” Dr. Howard Chen, chief imaging informatics officer at Cleveland Clinic, which is not involved with Nightingale, told us. He added, “You’re using a historical record, and then going back to use that historical record as the future of the scan. So I know what really happened to the patient.”
The nonprofit debuted in December with $6 million in funding; Schmidt Futures, former Google CEO Eric Schmidt’s philanthropic organization, is a key backer. Its founders, Ziad Obermeyer and Sendhil Mullainathan, are both professors and researchers specializing in machine learning and medicine.
For health care researchers, the current landscape of medical data can be described in one word: siloed.
Most of the time, the pre-vetted, high-quality data is either owned by medical institutions—which largely limit data access to their own researchers—or bought up by big tech companies, which can pay a premium for data to inform product development and other projects. That can make research tough for anyone in the middle, like PhD students, junior faculty members, and medtech startups.

“When I do my own research projects, I just spend a ton of time that I should be spending on research negotiating for access to data—and there are so many frictions, like getting either myself into the hospital system to access the data or getting the data out, it’s like a multi-year process,” Obermeyer told us, noting that his affiliations with legacy institutions give him a leg up over many others’ requests. He added, “On a high level, there’s a whole field of study, there’s a ton of products that are just not happening because the data are so hard to access.”
That was a key driver behind Nightingale, which also tries to help fill in notorious gaps in available medical datasets, like what ultimately happened to a patient.
The project does this via linkages, which are made just before the data is de-identified. When Nightingale partners with a health institution, like a hospital or health system, it’s given access to raw patient data inside the partner’s records infrastructure. Nightingale processes the linkages then and there to chart a health progression, merging everything from electronic health records (like lab results, vital signs, height, and weight), to cancer registry information, to Social Security data (to see if and when a patient passed away).
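Conceptually, the linkage step is a join across data sources on a shared patient identifier, performed before that identifier is stripped. A minimal sketch, with made-up data sources and field names (Nightingale’s actual pipeline is not public):

```python
# Toy record linkage: merge EHR data, cancer-registry entries, and
# death records on a shared patient ID *before* de-identification.
ehr = {"p01": {"weight_kg": 70, "lab": "troponin elevated"}}
registry = {"p01": {"cancer_stage": None}}
deaths = {"p01": {"died": "2020-03-01"}}

def link(pid):
    """Merge all sources into one longitudinal record for one patient."""
    record = {"patient_id": pid}
    for source in (ehr, registry, deaths):
        record.update(source.get(pid, {}))
    return record

print(link("p01")["died"])  # prints 2020-03-01
```

The merged record is what gives researchers the “bird’s-eye view” Chen describes: clinical measurements and the eventual outcome sit side by side.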
The linkages can “help researchers triangulate what actually happened to the patient” despite discrepancies that could interfere with data collection, like physician bias, Obermeyer said. For instance, while a doctor in one country may cite “old age” as a cause of death, a doctor in another may cite “sudden cardiac death” for the same case.
From there, Nightingale runs point on de-identifying the data, applying HIPAA Safe Harbor criteria to remove patient names, Social Security numbers, and all other protected categories of identifiers. The partner health organization—or a third party the partner enlists—works with Nightingale on this step before the data is moved onto Nightingale’s cloud platform. There, researchers of all kinds can access the data, as long as they’ve been verified as conducting nonprofit research and have signed a data use agreement.
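HIPAA’s Safe Harbor method enumerates 18 categories of identifiers that must be removed. A minimal sketch of that stripping step (identifier list abbreviated and field names hypothetical; real de-identification also covers dates, geographic data, and more):

```python
# Abbreviated subset of HIPAA Safe Harbor identifier categories;
# the actual rule lists 18 (names, SSNs, dates, addresses, etc.).
PROTECTED_FIELDS = {"name", "ssn", "address", "phone", "email"}

def deidentify(record):
    """Drop protected identifiers, keeping clinical fields."""
    return {k: v for k, v in record.items() if k not in PROTECTED_FIELDS}

raw = {"name": "Jane Doe", "ssn": "123-45-6789", "lab": "troponin elevated"}
print(deidentify(raw))  # {'lab': 'troponin elevated'}
```

Note that this happens only after linkage, so the outcome information survives even though the identifiers do not.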
Approved researchers aren’t required to share their plans for the data, but Nightingale says it takes additional steps to secure it, including keeping anyone from downloading, removing, or exporting data or models, and keeping track of what researchers do. “When you’re on the platform, you’re doing research in a surveillance state, where every line of code that you write is being stored and looked at and can be audited,” Obermeyer added.
Still, Chen told us that, under the right circumstances, virtually any anonymized data could be re-identified.
“One thing I worry [about] is the potential re-identification of the patient as you accumulate enough complexity and enough volume of data,” he said.
“Every time there’s data sharing occurring, there’s always that question: ‘Where is the patient in this entire conversation?’” Chen said. “Does the patient have the right to say, ‘I don’t want my data shared [with] any initiative—I understand it’s helpful to people, but that’s a picture of my body, and this is what I want being done or not being done to a picture of my body.’”
UK Biobank, a similar biomedical research database with data from more than half a million UK participants, had to get explicit consent from everyone included.
Obermeyer calls data privacy the project’s “biggest risk” and says the team is “very, very in tune to it.” Besides the data use agreement, the on-platform precautions, and HIPAA guidelines for patient privacy, Nightingale also opts to upload only the data relevant to each specific medical problem, rather than a full data dump.
For its part, Nightingale does not ask patients point-blank if they’d like to be included—it goes through the hospital—but since the data has been de-identified, Obermeyer said they don’t need to. “It’s all HIPAA compliant.”
“Given that we’ve got hundreds of thousands of patients, it would be impossible to do anything that we’re doing if we needed to go back and individually ask every person for consent,” Obermeyer said. He added, “There’s, of course, the deeper question of, ‘Man, is this the right thing to do?’”
But the project’s basic guiding principles are similar to those used in other research, Obermeyer said: “Never compromise patient privacy,” “never do any harm to patients,” and data should be used for purposes that are “broadly in patients’ best interests.” He added that if anyone were to use data for anything other than nonprofit research—research conducted with the goal of publishing for the medical community—it would be grounds to kick them off the platform.
“It doesn’t mean that nobody from a for-profit company can access the data, but that they have to be doing nonprofit research,” Obermeyer said. “This is not about making products. This is purely about creating knowledge.”
Dr. Alan Karthikesalingam, a clinician research lead on Google’s Health AI team, told us Nightingale could help enable better comparison of healthcare AI systems.
“The nice thing about initiatives like the Nightingale Initiative, and there’s a few others that are really noteworthy…is that they provide a kind of scientifically rigorous benchmark that then multiple researchers, AI developers, even medical device manufacturers can potentially use…[which] then makes it possible to fairly and rigorously compare the properties, like compare the performance of AI systems, in a consistent way,” Karthikesalingam said.
Chen’s biggest hope for Nightingale: Allowing the tool to “cut through the hype of AI in medicine.” He added, “Without the relevant data, without people asking the right questions of the data, it’s really hard to think of AI in medicine as a problem-solving tool, something that actually can be used in practice.”
He thinks this project could change that, by taking advantage of machine learning algorithms’ aptitude for picking up on patterns and connections that humans may not see.
“Algorithms are very good at approximating something,” Chen said. “If you make [an algorithm] approximate the human judgment, it’s always going to chase behind the human judgment—and get really, really close, but…It’s always going to be inferior, never meeting or exceeding the human challenge. But if you train it on the future—which the human doesn’t have access to, but an algorithm can…then it can potentially start picking up some details that we never knew existed in the first place.”