Machine learning model from the largest US COVID-19 dataset predicts disease severity
A centralized repository of COVID-19 health records built last year is beginning to show results, starting with a new paper published today. The repository is the largest set of COVID-19 records to date, and a team of researchers and data experts assembled it to help make sense of the disease.
The study, published in the journal JAMA Network Open, looked at risk factors for severe cases of COVID-19 and traced the progression of the disease over time. The authors built machine learning models to predict which hospitalized patients would develop severe disease based on information collected on their first day in a hospital.
Using the centralized database, called the National COVID Cohort Collaborative Data Enclave, or N3C, meant the research team was able to include hundreds of thousands of patients’ records in its analysis. The study used data from 34 medical centers and included about 1.3 million adults: 174,568 who tested positive for COVID-19 and 1,133,848 who tested negative. The records stretch from January 2020 to December 2020.
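For readers who want to check the cohort arithmetic, here is a quick sketch in Python. The two counts come from the figures above; the variable names are only illustrative.

```python
# Cohort counts as reported for the study.
positive = 174_568    # adults who tested positive for COVID-19
negative = 1_133_848  # adults who tested negative

total = positive + negative
share_positive = positive / total

print(total)                             # 1308416
print(round(100 * share_positive, 1))    # 13.3
```

In other words, roughly one in eight adults in the dataset had a positive test.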
The analysis shows how treatment for COVID-19 changed over the course of 2020, as doctors tried new treatments and gained more experience with the condition. The percentage of patients who were treated with the anti-malaria drug hydroxychloroquine, which was promoted by former President Donald Trump before proving to be ineffective, dropped off to nearly zero by May 2020. Use of the steroid dexamethasone ticked up in June, after studies showed it could improve survival rates.
It also confirmed that survival rates for patients with COVID-19 improved over the course of 2020. In March and April, 16 percent of people admitted to the hospital with COVID-19 died. In September and October, that dropped to just under 9 percent.
People who had higher heart rates, breathing rates, and temperatures when they arrived at the hospital were more likely to need drastic interventions like ventilation. They were also more likely to die. Abnormal measures of white blood cell count, inflammation, blood acidity, and kidney function were also linked to more severe cases. The research team built machine learning models that used those and other data points to predict which patients would get seriously ill. With additional testing, the models could eventually serve as the basis for decision-making tools, the authors wrote.
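To give a rough sense of what a model like this does, here is a minimal sketch in Python. It is not the authors' method: the article does not say which kind of model they used, so plain logistic regression stands in as a simple example. The patients are synthetic and randomly generated, and the feature averages, spreads, and "true" risk weights are all made-up assumptions chosen only so the example runs.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_patients(n: int, seed: int = 0):
    """Generate fake first-day vitals and a fake 'severe' label (assumption, not real data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        heart_rate = rng.gauss(90, 15)   # beats per minute (made-up distribution)
        resp_rate = rng.gauss(20, 4)     # breaths per minute (made-up distribution)
        temp_c = rng.gauss(37.5, 0.8)    # degrees Celsius (made-up distribution)
        # Standardize each feature so training is numerically stable.
        x = [(heart_rate - 90) / 15, (resp_rate - 20) / 4, (temp_c - 37.5) / 0.8]
        # Hypothetical ground truth: higher vitals raise the chance of severe disease.
        p_severe = sigmoid(-2.0 + 0.8 * x[0] + 0.8 * x[1] + 0.5 * x[2])
        y = 1 if rng.random() < p_severe else 0
        rows.append((x, y))
    return rows


def predict_risk(weights, bias, x) -> float:
    """Predicted probability of severe disease for one patient."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(rows, epochs: int = 200, lr: float = 0.5):
    """Fit logistic regression with batch gradient descent."""
    weights = [0.0, 0.0, 0.0]
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0, 0.0, 0.0]
        grad_b = 0.0
        for x, y in rows:
            err = predict_risk(weights, bias, x) - y
            for j in range(3):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


patients = make_synthetic_patients(1000)
weights, bias = train(patients)

# Compare a patient with elevated vitals to one with low vitals (standardized units).
high_vitals_risk = predict_risk(weights, bias, [2.0, 2.0, 2.0])
low_vitals_risk = predict_risk(weights, bias, [-2.0, -2.0, -2.0])
print(high_vitals_risk > low_vitals_risk)
```

Because the fake data was generated so that higher vitals raise risk, the trained model scores the first patient as more likely to become severely ill. A real tool would need far more inputs, such as the lab values mentioned above, and validation on actual patients.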
Researchers have been analyzing the trajectory of COVID-19 since the very start of the pandemic. This study has the advantage of pulling from a large and diverse dataset — it’s not restricted to one hospital or one state. In the US, researchers are often limited to studying the medical records from patients at the institutions where they work. That means the number of records they’re able to include in studies can be limited, and they’re not able to easily check if their conclusions would apply in other places.
A resource like N3C, which pulls together records from dozens of institutions, sidesteps those limitations. By now, N3C includes data from 73 health institutions and has records from over 2 million COVID-19 patients. More than 200 research projects using the data are underway, including studies examining risk factors for COVID-19 re-infection and the disease’s impact on pregnancy. It’s not perfect — standardizing information across hospitals is hard, and there may not be complete data on many patients.
Still, having such a large set of data is invaluable. Researchers are using the resource to run studies that they may not have been able to tackle with just their own institution’s resources, Elaine Hill, a health economist at the University of Rochester working on pregnancy research, told The Verge last fall. “It makes it possible to shed light on things we wouldn’t be able to,” she said.