Our team is dedicated to empowering individuals to save lives and safeguard communities against public health threats. By leveraging advanced analytics and generating forecasts, we strive to enhance responses to public health emergencies. This involves recognizing historical patterns in signals that may indicate changes in human behavior. Such insights can assist decision-makers in anticipating substantial shifts in healthcare usage.
We analyze in real time the way in which the general public searches for disease-specific information on the Internet. Specifically, we monitor anonymized Google search activity for terms such as “causes of fever” or “flu virus”. Our real-time systems take advantage of this Internet search-based approach and have been able to identify future surges in hospitalizations 2 to 6 weeks before they are reported by traditional disease surveillance systems.
With Internet penetration of about 90% in the United States, Internet search queries capture the interests and concerns of a high proportion of the population. For example, Internet search activity related to respiratory diseases has been shown to closely mirror a population's health status. We frequently observe that when a high proportion of the population is affected by a respiratory disease (for example, influenza), Internet searches related to that disease also tend to increase. When disease levels are very low, we observe very little related search activity. An increase in searches for symptoms associated with respiratory infections in a particular region could thus indicate the beginning of a disease outbreak.
There are several advantages to using timely Google searches in respiratory disease surveillance systems. First, Google search trends are available in near-real time, allowing our surveillance efforts to monitor the population’s awareness of an imminent rise in infection cases without the delays that affect sources of traditional surveillance data. Second, search data can capture important changes in human behavior that can differ across demographic and geographic groups, allowing for the development of responses that target populations who need the most support. Finally, this search data is widely accessible and can therefore be used by researchers, public health officials, and other stakeholders simultaneously.
When someone begins exhibiting symptoms, such as a fever, cough, or body aches, they often turn to the internet for answers. They may search for information about their symptoms to determine what illness they have. These individual searches, when aggregated with countless others, form a collective pool of data. This accumulation of searches serves as an early indicator or "signal" of respiratory disease activity within a population. This digital footprint can act as a proxy for real disease activity, offering insights into the spread of a disease before official hospital reports are submitted.
Respiratory disease-related search volumes complement existing surveillance systems and have been shown to be most effective when combined with other types of data. We are currently developing early warning systems that leverage internet search data, along with official health reports on respiratory diseases (such as influenza, RSV, and COVID-19), to address the following tasks:
1. Anticipating the start of potential epidemic surges
2. Estimating the timing of a surge’s peak
Our proposed methods leverage information from multiple internet-based data sources, commonly called digital traces, as they are collected when humans navigate the internet and serve as proxies of human behavior.
The early warning system (EWS) framework is designed to anticipate sharp increases in the transmission of respiratory diseases (such as influenza, RSV, and COVID-19), as identified by changes in the effective reproduction number (Rt), an outbreak indicator preferred by the community of epidemiologists. We have worked to extend and validate its functionality for other target signals. The early warning system operates in the following way. First, we gather raw data from respiratory disease-related internet searches along with the currently available hospital admission reports from the CDC.
Next, we pre-process this raw data to remove noise and extraneous information and then quantify the trend of each proxy over the past six weeks.
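As a rough illustration of this step, the sketch below smooths a weekly series and estimates its trend over the most recent six weeks. The trailing moving average, the least-squares slope, the function names, and the sample numbers are all assumptions made for illustration; the actual filtering and trend measures used by the system are not specified here.

```python
from statistics import mean

def moving_average(values, window=3):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed

def trend_slope(values, weeks=6):
    """Least-squares slope (units per week) over the last `weeks` points."""
    recent = list(values[-weeks:])
    n = len(recent)
    if n < 2:
        raise ValueError("need at least two points to estimate a trend")
    xs = list(range(n))
    x_bar = mean(xs)
    y_bar = mean(recent)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, recent))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator

# Hypothetical weekly search volumes for one proxy (made-up numbers).
weekly_search_volume = [10, 12, 11, 13, 15, 18, 22, 27, 33]
smoothed = moving_average(weekly_search_volume)
slope = trend_slope(smoothed)  # a positive slope means the proxy is rising
```

A positive slope over the six-week window suggests that the proxy is trending upward; how that trend feeds into the alerting model is a separate step.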
Once the trends are established, we pinpoint the moments when exponential growth in respiratory disease activity begins. These critical points signal the potential onset of an outbreak. Using these insights, we construct a data-driven model that projects the likelihood of the start of an epidemic surge in a specific location within the upcoming six weeks, providing a timely alert for preventive actions.
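One simple way to locate the start of exponential growth is to look for a run of consecutive weeks in which the week-over-week log growth rate stays above a threshold. The sketch below takes that approach. The thresholds (`min_rate`, `min_weeks`), the function names, and the sample series are hypothetical, and this is not necessarily the criterion the system applies; it also does not cover the probabilistic model that turns such onsets into a likelihood.

```python
import math

def log_growth_rates(values):
    """Week-over-week log growth rate, ln(v[t+1] / v[t]); values must be positive."""
    return [math.log(curr / prev) for prev, curr in zip(values, values[1:])]

def find_growth_onset(values, min_rate=0.1, min_weeks=3):
    """Return the index of the week from which sustained growth starts.

    Sustained growth is a run of at least `min_weeks` consecutive weekly
    log growth rates that are all >= `min_rate`. Returns None if no such
    run exists.
    """
    rates = log_growth_rates(values)
    run = 0
    for i, rate in enumerate(rates):
        run = run + 1 if rate >= min_rate else 0
        if run == min_weeks:
            # rates[j] compares week j+1 with week j, so the run that ends
            # at rate index i began at week i - min_weeks + 1.
            return i - min_weeks + 1
    return None

# Hypothetical weekly hospital admissions: flat, then growing.
admissions = [100, 100, 100, 100, 120, 150, 190, 240]
onset_week = find_growth_onset(admissions)
```

Requiring several consecutive weeks of growth is a design choice that trades earlier detection for fewer false alarms caused by one-week spikes.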
Similarly, we are currently working on an early warning system to detect peaks of an outbreak. We start by using an algorithm designed to identify peaks within our historical proxy data. Then, we utilize a data-driven methodology to determine which proxies consistently preceded the historical peaks of our target respiratory disease. Finally, we create a model that can provide estimates of the likelihood of the peak of a surge in a specific location within the upcoming six weeks.
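The first two steps can be sketched as follows, assuming a simple local-maximum rule for peak detection and a lead-time check against the target's peaks. The window size, the six-week maximum lead, the function names, and the sample series are illustrative assumptions, not the algorithm or data used by the system.

```python
def find_peaks(values, half_window=2, min_height=0.0):
    """Indices whose value is the unique maximum within +/- half_window weeks."""
    peaks = []
    for i in range(half_window, len(values) - half_window):
        neighborhood = values[i - half_window:i + half_window + 1]
        is_unique_max = (values[i] == max(neighborhood)
                         and neighborhood.count(values[i]) == 1)
        if values[i] >= min_height and is_unique_max:
            peaks.append(i)
    return peaks

def lead_times(proxy_peaks, target_peaks, max_lead=6):
    """For each target peak, weeks by which the closest earlier (or same-week)
    proxy peak preceded it, or None if none fell within max_lead weeks."""
    leads = []
    for target_week in target_peaks:
        candidates = [target_week - p for p in proxy_peaks
                      if 0 <= target_week - p <= max_lead]
        leads.append(min(candidates) if candidates else None)
    return leads

# Hypothetical weekly series: a search proxy and a target disease signal.
proxy = [1, 2, 5, 9, 6, 3, 2, 1, 1, 1]
target = [1, 1, 2, 4, 7, 10, 6, 3, 1, 1]

proxy_peaks = find_peaks(proxy)
target_peaks = find_peaks(target)
leads = lead_times(proxy_peaks, target_peaks)
```

A proxy whose peaks precede the target's peaks by a similar lead across many historical seasons would be a candidate input for the peak-likelihood model.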
A retrospective analysis of the 2023-2024 season, our most recent, shows that the system can anticipate national and state-level influenza activity surges: it detected the start of the national epidemic and of the state-level epidemics in 47 of the 50 states up to 6 weeks in advance.
Additionally, a retrospective analysis using data from 2010 to 2024 shows that our early warning system accurately detected 80% of state and national influenza surges up to 6 weeks in advance.