Research Interests


  • Machine Learning, Artificial Intelligence, and Natural Language Processing
    • Theory-Guided Machine Learning
    • Deep Ensemble Algorithms
    • Efficient Learning Algorithms
    • Model Pruning Algorithms
    • Spatial-Temporal Data Analysis
    • Generalization of Large Language Models (LLMs)
  • Applications and Domains
    • Recommender Systems
    • User Behavior and Network Analysis in Social Media
    • Epidemiological Modeling and Disease Forecasting
    • Healthcare Analytics with Electronic Health Records (EHR)

Grants


  • FY25 Faculty Seed Grant Award
    • Principal Investigator: Shuai Zhang
    • Department: Data Science
    • Project Title: Provable Efficient Learning with Foundation Models
    • Co-Principal Investigator(s): Lijing Wang
    • Funded Amount: $10,000
  • FY24 Faculty Seed Grant Award
    • Principal Investigator: Lijing Wang
    • Department: Data Science
    • Project Title: Towards Improving the Generalization and Robustness of Large Pretrained Language Models
    • Co-Principal Investigator(s): Mengnan Du (Data Science)
    • Funded Amount: $10,000

Current Research Projects


  • Improving the Generalization, Consistency, and Robustness of Large Pretrained Language Models.

    Key words: generalization, consistency, robustness, LLMs
    In this project, we focus on designing robust and consistent fine-tuning methods for LLMs in general domains. This includes examining the effects of random seed initialization on performance variability and addressing challenges related to domain adaptation. The goal is to make LLMs more reliable and versatile for diverse applications.
  • Resource-Efficient Deep Recommendation Systems.

    Key words: deep learning, model pruning, feature selection, efficient learning
    In this project, we are developing efficient deep neural network (DNN)-based recommendation systems by employing model pruning and feature selection techniques. This research addresses computational challenges, enhancing scalability while maintaining performance in large-scale recommendation frameworks.
    • Ching-Hao Fan, Yue Ning, and Lijing Wang. Optimizing Recommender Systems: A Structured Pruning Approach with Pretraining for Enhanced Efficiency and Accuracy. Under review at ACM KDD 2025.

  • Integrating GNNs and NLP for Social Science Applications.

    Key words: deep learning, GNN, NLP, social science, network analysis, behavior analysis
    In this project, we explore the application of GNNs and NLP to analyze user behavior and network dynamics in crowdsourcing platforms. By modeling textual data and network interactions, this work seeks to uncover patterns in collaboration and knowledge-sharing, with the goal of designing adaptive systems that foster productivity and engagement.
    • Ching-Hao Fan, Hao Zhou, Yao Sun, Geovanny Palomino Roldan, Olga Kokshagina, Marc Santolini, and Lijing Wang. Incorporating Knowledge Sharing in Graph Learning for User Behavior Prediction in Crowd-Empowered Online Communities. Under review at the 15th ACM International Conference on Multimedia Retrieval (ICMR 2025).

  • AI in Medical Imaging and Surgical Navigation.

    Key words: NLP, LLM, image processing
    In collaboration with faculty in bioinformatics at NJIT and experts in medical imaging and neural sciences, we are developing an AI-based platform for automated surgical planning and navigation, with a focus on brain tumor surgeries. This work leverages LLMs, advanced ML algorithms, and cutting-edge imaging techniques to enhance the interaction between clinicians and imaging systems, aiming to replace manual tasks while ensuring precision and efficiency.

Previous Research Projects


  • Improving consistency of deep learning models via ensemble techniques.

    Deep Learning Theory and Algorithm

    Key words: deep learning, consistency, correct-consistency, snapshot ensemble
    Deep learning models are assisting humans in making decisions, and hence the user's trust in these models is of paramount importance. Trust is often a function of constant behavior. From an AI model perspective, this means that given the same input the user would expect the same output, especially for correct outputs; in other words, consistently correct outputs. We study model behavior in the context of periodic retraining of deployed models, where the outputs from successive generations of a model might not agree on the correct label assigned to the same input. We formally define consistency and correct-consistency of a learning model. We prove that the consistency and correct-consistency of an ensemble learner are not less than the average consistency and correct-consistency of the individual learners, and that correct-consistency can be improved with a probability by combining learners whose accuracy is not less than the average accuracy of the ensemble component learners. To validate the theory, we also propose an efficient dynamic snapshot ensemble method and demonstrate its value using three datasets and two state-of-the-art deep learning classifiers.
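
    As a rough illustration of the two quantities discussed in this project, the sketch below measures agreement between two generations of a retrained classifier. It assumes that consistency is the fraction of inputs on which both generations predict the same label, and that correct-consistency is the fraction on which both predict the same correct label. The function names and toy predictions are hypothetical, and the majority vote is a generic ensemble combiner, not the dynamic snapshot ensemble method itself.

```python
from collections import Counter


def consistency(preds_a, preds_b):
    """Fraction of inputs on which two model generations give the same label."""
    assert len(preds_a) == len(preds_b)
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def correct_consistency(preds_a, preds_b, labels):
    """Fraction of inputs on which both generations give the same, correct label."""
    assert len(preds_a) == len(preds_b) == len(labels)
    return sum(a == b == y for a, b, y in zip(preds_a, preds_b, labels)) / len(labels)


def majority_vote(member_preds):
    """Combine per-member prediction lists into one ensemble prediction list."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*member_preds)]


# Toy data (made up for illustration): true labels and two model generations.
labels = [0, 1, 1, 0, 1]
gen_a = [0, 1, 0, 0, 1]
gen_b = [0, 0, 0, 1, 1]

print(consistency(gen_a, gen_b))
print(correct_consistency(gen_a, gen_b, labels))
```

    In this toy case the two generations agree on three of five inputs, but only two of those agreements are correct, which is why the two measures are tracked separately.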
  • Epidemic forecasting with recurrent neural networks and graph neural networks.

    Epidemic Forecasting and Simulating

    Key words: RNN, GNN, dynamic networks, mobility map
    Forecasting the spatial and temporal evolution of epidemics has been an area of active research over the past couple of decades. Purely data-driven methods employ statistical and time-series-based methodologies to learn patterns in historical epidemic data and leverage those patterns for forecasting. Recurrent neural networks (RNNs) are widely used for time series forecasting since they can capture temporal dynamics. Graph neural networks (GNNs) are known for their ability to capture cross-spatial effects in dynamic environments. We propose novel frameworks that use RNNs and GNNs for spatio-temporal epidemic forecasting. Extensive experiments on seasonal influenza-like-illness (ILI) datasets and COVID-19 case datasets demonstrate the value of the proposed methods.
  • Combining theory and deep learning for epidemic forecasting.

    Epidemic Forecasting and Simulating

    Key words: DNN, theory-based causal models, synthetic data
    Deep learning methods have gained popularity in the epidemic forecasting domain due to their advances in computer vision, natural language processing, and many other domains. A drawback of deep learning models is their black-box nature: while they are capable of providing correct inferences, they lack explanatory power for the underlying phenomena. We are the first to propose combining mechanistic causal methods with deep learning-based methods, leading to explainable AI. The proposed methods provide correct inferences as well as a better understanding of the learned models.
  • Epidemic forecasting with mobility data

    Epidemic Forecasting and Simulating

    Key words: human mobility, GNN, agent-based SEIR models, metapopulation SEIR models
    Human mobility is a primary driver of infectious disease spread. Thus, the disease dynamics are heavily affected by human mobility behaviours. In this research work, we propose new models (metapopulation models, agent-based models, and graph neural network models) that leverage a large-scale anonymized mobility map aggregated over hundreds of millions of smartphones and evaluate its utility in forecasting epidemics. On one side, we factor the mobility map into a metapopulation model to retrospectively forecast influenza in the USA and Australia. On the other side, we use mobility information to build graph neural networks for COVID-19 confirmed case forecasting at the US state level. Our work takes the first step towards timely infectious disease forecasting at a global scale and opens new possibilities in studying human mobility and its applications to infectious disease epidemiology.
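
    To make the idea of factoring mobility into a metapopulation model concrete, here is a minimal deterministic sketch. It is not the model used in this project: it assumes a simple discrete-time SEIR update in which mobility[i][j] is the fraction of patch i's residents who spend the day in patch j, and all names and parameter values (beta, sigma, gamma, the two-patch setup) are hypothetical.

```python
def seir_step(state, mobility, beta, sigma, gamma):
    """Advance every patch by one day; state[i] = (S, E, I, R) for patch i.

    mobility[i][j] is the fraction of patch i's residents present in patch j
    (each row sums to 1).
    """
    n = len(state)
    pop = [sum(s) for s in state]
    # People and infectious people physically present in each patch j.
    present = [sum(mobility[i][j] * pop[i] for i in range(n)) for j in range(n)]
    infectious = [sum(mobility[i][j] * state[i][2] for i in range(n)) for j in range(n)]
    new_state = []
    for i in range(n):
        S, E, I, R = state[i]
        # Force of infection on residents of i, weighted by where they spend time.
        force = sum(mobility[i][j] * beta * infectious[j] / present[j]
                    for j in range(n) if present[j] > 0)
        new_e = min(S, force * S)   # newly exposed
        new_i = sigma * E           # exposed becoming infectious
        new_r = gamma * I           # infectious recovering
        new_state.append((S - new_e, E + new_e - new_i, I + new_i - new_r, R + new_r))
    return new_state


def simulate(state, mobility, days, beta=0.5, sigma=0.2, gamma=0.1):
    """Run seir_step for a number of days and return the final state."""
    for _ in range(days):
        state = seir_step(state, mobility, beta, sigma, gamma)
    return state


# Two patches; the outbreak starts only in patch 0.
state0 = [(990.0, 0.0, 10.0, 0.0), (1000.0, 0.0, 0.0, 0.0)]
mixing = [[0.9, 0.1], [0.1, 0.9]]      # 10% of residents visit the other patch
isolated = [[1.0, 0.0], [0.0, 1.0]]    # no travel at all

with_travel = simulate(state0, mixing, days=30)
no_travel = simulate(state0, isolated, days=30)
```

    With travel, patch 1 acquires infections seeded from patch 0; without travel it stays untouched, which is the effect a mobility map is meant to capture.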
  • Epidemic forecasting with social media data

    Epidemic Forecasting and Simulating

    Key words: twitter posts, topic modeling, agent-based SEIR models
    Traditional compartmental epidemiology models can capture disease-spreading trends through contact networks but cannot provide timely updates from real-world data. In contrast, techniques focusing on emerging social media platforms can collect and monitor real-time disease data but do not provide an understanding of the underlying dynamics of disease propagation. To achieve efficient and accurate real-time disease prediction, the framework proposed in this work combines the strengths of social media mining and computational epidemiology. Specifically, individual health status is first learned from users' online posts through Bayesian inference; disease parameters are then extracted for the computational models at the population level; and the outputs of the computational epidemiology model are fed back into the mining of social media data for further performance improvement.
  • Health disparity analysis in infectious disease via agent-based SEIR simulations

    Epidemic Forecasting and Simulating

    Key words: health disparity, agent-based SEIR models, net return, vaccination strategy
    Infectious diseases such as influenza and Ebola pose a serious threat to everyone, but certain demographics and cohorts face a higher risk of infection than others. This research provides a computational framework for studying health disparities among cohorts based on individual-level features such as age, gender, and income. We apply this framework to find health disparities among subpopulations in an influenza epidemic and evaluate vaccination prioritization strategies to achieve specific objectives. The results, framework, and methodology developed here can assist public health policy makers in efficiently allocating limited pharmaceutical resources.
  • Identity reconciliation via graph matching

    Network Science

    Key words: social network, percolation-based graph matching
    Linking multiple accounts owned by the same user across different online social networks (OSNs) is an important problem in social networks, known as identity reconciliation. Graph matching is one of the popular techniques for solving this problem: it identifies a map that matches a set of vertices across different OSNs. Among graph matching techniques, percolation-based graph matching (PGM) has been explored to identify entities belonging to the same user across two different networks based on a set of initial pre-matched seed nodes and graph structural information. However, existing PGM algorithms have been applied only to undirected networks, while many OSNs are represented by directional relationships (e.g., followers or followees in Twitter or Facebook). For PGM to be applicable to real-world OSNs represented by directed networks with a small set of overlapping vertices, we propose a percolation-based directed graph matching algorithm, namely PDGM, which considers the following two key features: (1) similarity of two nodes based on directional relationships (i.e., outgoing edges vs. incoming edges); and (2) a celebrity penalty, i.e., a penalty given to nodes with a high in-degree. Extensive simulation experiments show that the proposed PDGM outperforms the baseline PGM counterpart, which considers neither directional relationships nor the celebrity penalty.
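
    The general shape of seed-based percolation matching on directed graphs can be sketched as follows. This is a simplified illustration, not the PDGM algorithm itself: the scoring rule, the logarithmic in-degree discount standing in for a celebrity penalty, the threshold, and all function names are assumptions made for the example. Graphs are dictionaries mapping each node to the set of nodes it points to.

```python
import math
from collections import defaultdict


def invert(graph):
    """Return node -> set of predecessors for a dict of node -> set of successors."""
    preds = defaultdict(set)
    for u, outs in graph.items():
        for v in outs:
            preds[v].add(u)
    return preds


def match_directed(g1, g2, seeds, threshold=2.0):
    """Grow a node matching from seed pairs by spreading evidence along edges."""
    pred1, pred2 = invert(g1), invert(g2)
    matched = dict(seeds)
    taken = set(seeds.values())
    score = defaultdict(float)
    queue = list(seeds.items())
    while queue:
        a, b = queue.pop()
        # Nodes that the matched pair points to: evidence arrives over the
        # candidates' incoming edges, so it is discounted by their in-degree
        # (an assumed stand-in for a celebrity penalty).
        for u in g1.get(a, ()):
            for v in g2.get(b, ()):
                penalty = math.log2(2 + max(len(pred1[u]), len(pred2[v])))
                score[(u, v)] += 1.0 / penalty
        # Nodes that point to the matched pair: evidence over outgoing edges.
        for u in pred1[a]:
            for v in pred2[b]:
                score[(u, v)] += 1.0
        # Percolate: accept the best-scoring unmatched pairs above the threshold.
        for (u, v), s in sorted(score.items(), key=lambda kv: -kv[1]):
            if s >= threshold and u not in matched and v not in taken:
                matched[u] = v
                taken.add(v)
                queue.append((u, v))
    return matched


# Toy networks: c and d follow other accounts; e and 5 are unrelated accounts.
g1 = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"a", "c"}, "e": {"a"}}
g2 = {1: set(), 2: set(), 3: {1, 2}, 4: {1, 3}, 5: {2}}
result = match_directed(g1, g2, seeds={"a": 1, "b": 2})
```

    Starting from the two seed pairs, the sketch first matches c with 3, then uses that new pair to match d with 4, while e gathers too little evidence to be matched.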