Assistant Professor, Computer Science & Engineering and Biostats/Epidemiology
New York University, Tandon School of Engineering and
College of Global Public Health
My mission is to develop computational and statistical approaches for acquiring, integrating and using data to improve population-level public health. Considering health from a comprehensive perspective means the data comes from inside and outside the clinic.
To accomplish this goal, my lab makes an impact in computer science by addressing a range of technical questions inspired by the opportunity of new data sources in health. The main challenge is that the data are unstructured and the relevant features must be identified. This prompts questions about how to most efficiently gather human-sourced data (PLoS Currents 2015, BMC 2017) and how to best address the complex challenges inherent in areas where data are sourced by humans (ISDS 2016). Examples include addressing missing data in location timelines by using prior knowledge of human mobility patterns (SigSpatial 2018) and generating useful features of human behavior from passively available online social media using natural language processing and time series analyses (CSCW 2017, CSCW 2018a, CSCW 2018b). A recent line of work has also unpacked the data-generating process for human-sourced data in order to better understand causal mechanisms (JPH 2020, IJERPH 2020). Other work has focused on identifying whom the data represent and maintaining fairness even under data shift (AJPM 2016, ICWSM 2018, NeurIPS FairML 2019). Given the realistic challenges of data collected in different environments, we are also interested in new methods for domain adaptation that use local information as necessary and can account for differences in the population in each environment (NeurIPS ML4H 2018, CHIL 2020).
Through these new computational methods, we also advance public health through a better understanding of the multi-level factors related to the health of populations. The work has advanced 1) prediction of infectious diseases by gathering and understanding community-sourced data for influenza (PLoS Currents 2015, ISDS 2016, BMC 2017, NeurIPS ML4H 2018, JPH 2020, CHIL 2020) and dengue (CVPR CV4GC 2019, PLOSNTD 2020) and 2) understanding of antecedents and risk factors for non-communicable diseases spanning substance use, diabetes and discrimination (CSCW 2017, CSCW 2018, ICWSM 2018, ICWSM 2019, ICWSM 2020, IJERPH 2020).
Overall, my deep immersion in both computer science and public health enables me to make unique contributions to research in both fields, as well as to offer insights into how computer science and public health can interface.
I am teaching a short course on Data Science and Machine Learning for Public Health Research and Practice at NYU in August 2020.
I designed a Machine Learning for Public Health course which will be taught for the first time in Spring 2021 at NYU.
I am running an international, monthly Machine Learning in Public Health reading group. Please get in touch with me if you're interested.