Curriculum Vitae

Dr. Yijun Zhao

Associate Professor
Director of the MS in Data Science (MSDS) Program
Computer and Information Sciences Department
Fordham University

Research Interests

Dr. Zhao’s research focuses on machine learning (ML), an essential branch of Data Science. Data Science draws heavily on the empirical exploration of data to solve complex problems. However, the world around us constantly challenges practitioners with new information and previously unhandled exceptions. As a result, theoretical, off-the-shelf ML models often perform poorly in practice. These failures typically arise from incomplete data, data contaminated with errors, or violations of the models’ underlying assumptions. Dr. Zhao and her team strive to approach real-life problems from new perspectives that make them amenable to ML methods and that leverage experts’ domain knowledge. These new methods creatively exploit the unique circumstances of a given task to establish a sound theoretical footing and allow for more effective learning. The resulting approaches can often be extended to other domains with similar learning needs. In the course of her work, Dr. Zhao has established robust, long-term collaborations with prestigious research institutions, including Harvard Medical School, the NYU Langone Comprehensive Epilepsy Center, and Fordham researchers in the CIS, Psychology, and Chemistry departments.

I am actively looking for motivated students to work on applications of machine/deep learning in various domains, including healthcare, education, finance, and economics. If you would like to explore an area of research that is driving innovation in every sector of daily life, please contact me to discuss available opportunities.

Past and Current Projects

Fordham students are marked in red in the listed publications.

  • Predicting Disease Course of Multiple Sclerosis Patients
    Multiple Sclerosis (MS) is the leading medical cause of neurological disability among young adults in the U.S., affecting an estimated 400,000 people. Most cases present with relapses involving neurological deficits such as blurred or lost vision, weakness, numbness, imbalance, or cognitive impairment. In our research, we work closely with doctors from Harvard Medical School and Brigham and Women’s Hospital (BWH) in Boston, Massachusetts, to predict the disability level of MS patients at the five-year mark using longitudinal data from their first two years. Our clinical data are collected as part of the CLIMB (Comprehensive Longitudinal Investigation of Multiple Sclerosis at Brigham and Women’s Hospital) study at BWH, a large-scale, long-term study of patients with MS designed to investigate the course of the disease in the current era of treatment. The main goals of the study are to identify predictors of future disease course when patients are at the beginning of their illness and to determine the effects of treatment on disease progression and the accumulation of disability.
    1. Y. Zhao and T. Chitnis, "Dirichlet Mixture of Gaussian Processes with Split-kernel: An Application to Predicting Disease Course in Multiple Sclerosis Patients," The International Joint Conference on Neural Networks (IJCNN), 2022   [PDF]

    2. Y. Zhao, T. Wang, R. Bove, B. Cree, R. Henry, H. Lokhande, M. Polgar-Turcsanyi, M. Anderson, R. Bakshi, H. Weiner, T. Chitnis, and SUMMIT Investigators. "Ensemble learning predicts multiple sclerosis disease course in the SUMMIT study," npj Digital Medicine, 2020   [Online]

    3. Y. Zhao, M. Berretta, T. Wang, and T. Chitnis, "A Temporal Model with Dynamic Imputation for Missing Target Values in Longitudinal Patient Data," IEEE International Conference on Healthcare Informatics (ICHI), 2020   [Online]

    4. Y. Zhao, B. Healy, D. Rotstein, C. Guttmann, R. Bakshi, H. Weiner, C. Brodley, and T. Chitnis, "Exploration of Machine Learning Techniques in Predicting Multiple Sclerosis Disease Course," PLOS ONE, 2017   [PDF]

    5. Y. Zhao, T. Chitnis, B. Healy, J. Dy, and C. Brodley, "Domain Induced Dirichlet Mixture of Gaussian Processes: An Application to Predicting Disease Progression in Multiple Sclerosis Patients," IEEE International Conference on Data Mining (ICDM), 2015   [PDF]

    6. Y. Zhao, C. Brodley, T. Chitnis, and B. Healy, "Addressing Human Subjectivity via Transfer Learning: An Application to Predicting Disease Outcome in Multiple Sclerosis Patients," SIAM International Conference on Data Mining, 2014   [PDF]

  • Deep Learning for Detecting and Reducing Motion Artifacts in Brain MRI Images
    Modern neuroimaging is central to the assessment of patients with epilepsy. However, in-scanner head motion degrades the quality of brain MRI scans and thereby reduces their utility for detecting clinically relevant neuroanatomical abnormalities. This research aims to address the motion artifacts associated with brain MRI scans using recent advances in computer vision. Working closely with doctors and researchers from NYU Langone’s Comprehensive Epilepsy Center, we develop deep learning models for detecting and reducing motion artifacts in brain MRI scans.

    1. S. Li and Y. Zhao, "Addressing Motion Blurs in Brain MRI Scans Using Conditional Adversarial Networks and Simulated Curvilinear Motions," Journal of Imaging, 2022   [Online]

    2. H. Pardoe, S. Martin, Y. Zhao, A. George, H. Yuan, J. Zhou, W. Liu, and O. Devinsky, "Estimation of in-scanner head pose changes during structural MRI using a convolutional neural network trained on eye tracker video," Journal of Magnetic Resonance Imaging, 2021   [Online]

    3. Y. Zhao, J. Ossowski, X. Wang, S. Li, O. Devinsky, S. Martin, and H. Pardoe, "Localized Motion Artifact Reduction on Brain MRI Using Deep Learning with Effective Data Augmentation Techniques," The International Joint Conference on Neural Networks (IJCNN), 2021   [Online]

    4. Y. Zhao, B. Ahmed, T. Thesen, K. E. Blackmon, J. Dy, and C. Brodley, "A Non-parametric Approach to Detect Epileptogenic Lesions using Restricted Boltzmann Machines," 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2016   [PDF]

    5. B. Ahmed, T. Thesen, K. Blackmon, Y. Zhao, O. Devinsky, R. Kuzniecky, and C. Brodley, "Hierarchical Conditional Random Fields for Outlier Detection: An Application to Detecting Epileptogenic Cortical Malformations," The 31st International Conference on Machine Learning (ICML), 2014   [PDF]

  • Machine Learning to Monitor and Predict Lupus Disease Course
    Systemic lupus erythematosus (SLE) is a heterogeneous disease associated with premature morbidity and mortality. Its clinical course is characterized by disease flares that can range from mild to life-threatening and can affect various organ systems. The objective of our research is to use electronic health record (EHR) data and machine learning methods to classify real-world SLE flares and to identify predictors of SLE flares. My team’s contributions in this area include machine learning models that predict hospitalizations of SLE patients and a personalized, patient-centered approach to optimizing lupus care.

    1. Y. Zhao, D. Smith, and A. Jorge, "Comparing Two Machine Learning Approaches in Predicting Lupus Hospitalization Using Longitudinal Data," Scientific Reports, 2022   [Online]

    2. Y. Zhao, M. Qin, and A. Jorge, "A Calibrated Ensemble Algorithm to Address Data Heterogeneity in Machine Learning: An Application to Identify Severe SLE Flares in Lupus Patients," IEEE Access, 2022   [Online]

    3. A. Jorge, D. Smith, Z. Wu, T. Chowdhury, K. Costenbader, Y. Zhang, H. Choi, C. Feldman, and Y. Zhao, "Exploration of Machine Learning Methods to Predict Systemic Lupus Erythematosus Hospitalizations," Lupus, 2022   [Online]

  • Hospital Readmissions Analysis
    Reducing unnecessary admissions and readmissions to acute care facilities has been a focus of healthcare quality improvement efforts. The Agency for Healthcare Research and Quality’s (AHRQ) Healthcare Cost and Utilization Project (HCUP) estimated that in 2011 there were approximately 3.3 million adult 30-day all-cause hospital readmissions in the United States. Avoidable admissions and readmissions not only cause patients prolonged illness and pain but also burden the healthcare system with unnecessary costs: HCUP estimated that in 2011, 30-day adult all-cause readmissions were associated with about $41.3 billion in hospital costs. In our research, we collaborate with researchers from the University of Florida and apply state-of-the-art machine learning techniques to predict individual 30-day readmission probabilities for soon-to-be-discharged inpatients. In addition, we examine trends in length of stay, hospital charges, and in-hospital mortality associated with different causes, and we identify patient-level risk factors associated with 30-day readmissions.

    1. Y. Zhao, W. Wu, Y. Jin, S. Gu, H. Wu, J. Wang, X. Jiang, and H. Xiao, "Predicting 30-Day Hospital Readmissions for Patients with Diabetes," International Conference on Health Informatics (HIMS), 2019   [PDF]

  • Educational Data Mining
    Dr. Zhao is a co-director (with Dr. Weiss and Dr. Leeds from CIS) of Fordham’s Educational Data Mining Laboratory. Our research uses data mining and other analytical techniques to learn about student academic performance and the educational process. Our studies include descriptive data mining of course sequencing and the relationships between courses, and predictive data mining of instructor effectiveness, future student performance, and suitable majors for a student. Additionally, we are developing predictive models and a free software tool to combat systemic racism, gender bias, and cultural bias in academic letters of recommendation (LORs). To this end, we will leverage natural language processing (NLP) and machine learning methods to identify language in LORs that may be associated with bias. The software tool will aid recommendation writers by highlighting potential instances of bias and suggesting alternatives.

    1. D. Leeds, C. Chen, Y. Zhao, F. Metla, J. Guest, and G. Weiss, "Generalized Sequential Pattern Mining of Undergraduate Courses," International Conference on Educational Data Mining (EDM), 2022   [PDF]

    2. Y. Zhao, B. Lackaye, J. Dy, and C. Brodley, "A Quantitative Machine Learning Approach to Master Students Admission for Professional Institutions," International Conference on Educational Data Mining (EDM), 2020   [PDF]

    3. Y. Zhao, Q. Xu, M. Chen, and G. Weiss, "Predicting Student Performance in a Master’s Program in Data Science using Admissions Data," International Conference on Educational Data Mining (EDM), 2020   [PDF]

  • A Study of Ways of Coping, Stress, and School Adjustment for College Students During the COVID-19 Pandemic
    The COVID-19 pandemic has resulted in a devastating loss of human life worldwide and extraordinary challenges to public health. University students have faced particular challenges during the pandemic. This research leverages a proprietary dataset our team collected after the onset of the pandemic to develop machine learning models that predict college students’ ways of coping, stress, and school adjustment in the face of unprecedented challenges. Because the COVID-19 outbreak was first reported in China, anti-Asian attitudes and rhetoric have increased sharply, both in the media and in daily interactions. We further study racial/ethnic subgroups (i.e., Asian, non-Asian, and international) with customized predictive models and compare their coping strategies and stress levels.

    1. Y. Zhao, Y. Ding, H. Chekerid, and Y. Wang, "Student Adaptation to College and Coping in Relation to Adjustment During COVID-19: A Machine Learning Approach," Under Revision

    2. Y. Zhao, Y. Ding, Y. Shen, S. Failing, and J. Hwang, "Different coping patterns among U.S. graduate and undergraduate students during COVID-19 pandemic: A machine learning approach," International Journal of Environmental Research and Public Health, 2022   [Online]

    3. Y. Zhao, Y. Ding, Y. Shen, and W. Liu, "Gender Difference in Psychological, Cognitive, and Behavioral Patterns Among University Students During COVID-19: A machine learning approach," Frontiers in Psychology, 2022   [Online]

  • Deep Learning for Statistical Arbitrage
    This study explores the utility of deep learning (DL) approaches in statistical arbitrage under the “generalized pairs-trading” paradigm. Stock returns are regressed on a set of risk factors derived using Principal Component Analysis (PCA), and long short-term memory (LSTM) networks are employed to forecast the directions of the idiosyncratic residuals. Daily market-neutral trades are then constructed from the predicted signals. We compare the results with the influential relative value model of Avellaneda and Lee on the universe of S&P 500 stocks.

    1. Y. Zhao, S. Xu, and J. Ossowski, "Deep Learning Meets Statistical Arbitrage: An Application of Long Short-Term Memory Networks to Algorithmic Trading," Journal of Financial Data Science, 2022   [Online]