2022 IUPUI REU Workshop

About

Overview
The 2022 IUPUI REU Workshop will showcase the research projects conducted during the 2022 Mobile Cloud and Data Security Research Experience for Undergraduates (REU) program at Indiana University-Purdue University Indianapolis. The workshop focuses on data science and cybersecurity, but some projects also address topics such as Artificial Intelligence and Human-Computer Interaction. The workshop will be hosted online on Friday, August 12, 2022, from 10:00 a.m. to 1:00 p.m. EST.

Review Process
Papers should be submitted by August 9 at 11:59 p.m. PST to Kritika Verma or Sofia Khemani. All submissions will be peer reviewed by two reviewers. The review process will end on August 11 at 4:00 p.m.

Paper Format
Papers should adhere to the IEEE manuscript standard, be at least five pages, and be submitted as PDFs.

Committee

General Co-Chairs:
Croix Gyurek and Emma Tong
Website Chairs:
Cary Xiao and Nam Pham
Technical Program Chairs:
Kritika Verma and Sofia Khemani
Poster Chairs:
Kalika Raje and Julia Zeng
Publicity Chairs:
Brevon Gude and Charles Richards

Publications

Patient-GAT: Disease Prediction using Multi-modal Data Fusion and Weighted Graph Attention Networks

Cary Xiao and Nam Pham

Abstract: The recent, widespread adoption of Electronic Health Record (EHR) data enables real-world patient similarity networks to be easily generated and analyzed. Graph Attention Networks (GATs) have been extensively used to perform node-level classification on graphs. However, few papers have investigated the effectiveness of GATs on graph representations of patient similarity networks. This paper proposes Patient-GAT, a novel method to predict chronic health conditions. The method first integrates imputed lab variables with other structured data to form patient representations, then constructs a patient network by measuring patient similarity, and finally applies a GAT to the patient network for health status prediction. We demonstrate our framework on real-world EHR data obtained from the Indiana University Medical Hospital. Experiments show that our approach meets or outperforms other methods on the Area Under the ROC Curve (AUC) metric.
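The network-construction step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or edge rule Patient-GAT uses, so the cosine similarity, the threshold, the function names, and the three toy feature vectors below are all assumptions made for illustration, not the authors' method or data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_patient_graph(features, threshold):
    """Return weighted undirected edges {(p, q): similarity} for pairs at or above threshold.

    `features` maps a patient id to its feature vector. The threshold rule is an
    assumption; the abstract only says similarity is measured.
    """
    ids = sorted(features)
    edges = {}
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            sim = cosine_similarity(features[p], features[q])
            if sim >= threshold:
                edges[(p, q)] = sim
    return edges


# Hypothetical toy patients (not real EHR data): each vector stands in for
# structured data fused with imputed lab variables.
patients = {
    "p1": [1.0, 0.0, 1.0],
    "p2": [1.0, 0.1, 0.9],
    "p3": [0.0, 1.0, 0.0],
}
graph = build_patient_graph(patients, threshold=0.9)
print(graph)
```

In this toy input only p1 and p2 are similar enough to be linked. The resulting edge weights could then be supplied to a graph attention layer, which is the part a library such as a GAT implementation would handle.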

Evaluating Various Defense Techniques Against Targeted Poisoning Attacks in Federated Learning

Charles Richards and Sofia Khemani

Abstract: Federated Learning (FL) allows individual clients to train a global model by aggregating local model updates each round. This results in collaborative model training while maintaining the privacy of clients' sensitive data. However, malicious clients can join the training process and train with poisoned data or send artificial model updates in targeted poisoning attacks. Many defenses against targeted poisoning attacks rely on anomaly-detection-based metrics, which remove participants that deviate from the majority. Similarly, aggregation-based defenses aim to reduce the impact of outliers, while L2-norm clipping tries to scale down the impact of malicious models. However, these defenses often misidentify benign clients as malicious or only work under specific attack conditions. In our paper, we examine the effectiveness of two anomaly-detection metrics on two different aggregation methods, as well as the effect of L2-norm clipping and weight selection, across two different types of attacks. We also combine different defenses in order to examine their interaction.
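As a rough illustration of how L2-norm clipping limits an oversized update before aggregation, the sketch below clips each client update to a maximum norm and then takes a plain unweighted average. The function names, the averaging rule, the `max_norm` value, and the toy updates are assumptions for illustration; the abstract does not specify the aggregation methods or parameters the paper evaluates.

```python
import math


def l2_norm(update):
    """Euclidean norm of a flat list of model-update values."""
    return math.sqrt(sum(x * x for x in update))


def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm; small updates pass through."""
    norm = l2_norm(update)
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def aggregate(updates, max_norm):
    """Clip every client update, then average them coordinate-wise (unweighted mean)."""
    clipped = [clip_update(u, max_norm) for u in updates]
    n = len(clipped)
    dim = len(clipped[0])
    return [sum(u[i] for u in clipped) / n for i in range(dim)]


# Hypothetical updates: three benign clients and one attacker sending a huge update.
benign = [[0.1, -0.2], [0.2, -0.1], [0.15, -0.15]]
malicious = [[30.0, 40.0]]

unclipped_result = aggregate(benign + malicious, max_norm=float("inf"))
clipped_result = aggregate(benign + malicious, max_norm=1.0)
print(unclipped_result, clipped_result)
```

Without clipping, the single large update dominates the average; with clipping, its norm is reduced to 1.0 and the aggregate stays much closer to the benign updates. Clipping bounds the magnitude of an update but not its direction, which is one reason it is usually studied in combination with other defenses.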

Point Cloud Image Classification Via Machine Learning

Kalika Raje and Julia Zeng

Abstract: With the increase in digital communications, there are growing concerns about privacy threats and digital tracking. As a result, mmWave radar has been adopted in many systems to avoid the concerns raised by other widely used methods (e.g., vision-based and lidar-based systems). Current research uses mmWave radar signals to generate point cloud images that reflect the locations of objects intended for tracking and identification. However, the varying quality of the generated point cloud images may impact the accuracy of gesture recognition and object tracking. As such, it is critical to be able to distinguish high-quality point cloud images for better performance. In this paper, we present an analysis of the ability of multiple machine learning algorithms to classify higher- and lower-quality point clouds. We find that convolutional neural networks, known for their image processing abilities, performed best at classification.

Quantifying Text Data Quality for Pedestrian Behavior Prediction by Autonomous Vehicles

Brevon Gude and Emma Tong

Abstract: With the recent proliferation of computing technologies, multiple modalities such as image/video data and sensor data have been used to improve the decision-making ability of autonomous vehicles in tasks such as pedestrian intent and trajectory prediction. To make further progress toward better prediction of pedestrian behavior, we argue that text data, which closely reflects the human thought process, is a potential modality that may enhance the reasoning mechanism of machine learning models. In order to collect such quality text data, we introduced a comprehensive data annotation tool equipped with a user-assistive, data-enhancing feature known as Object Level Grounding (OLG). However, little evaluation has been done so far on the OLG feature, leaving open the question of whether it improves the ability of models to predict pedestrian behavior. In this paper, we propose a general approach to assess the OLG feature through quantitative analysis, defining new metrics for measuring the quality of text data. We further discuss some of the limitations of our study and offer suggestions for future work.

Brief Survey of Cryptographic Attribute-Based Access Control Systems

Croix Gyurek and Kritika Verma

Abstract: With a growing number of attackers trying to access important data files at companies large and small, access control technology must keep pace. Access control allows us to control who can and cannot view certain documents. One method for enforcing this is Attribute-Based Access Control (ABAC), where access to files is allowed or blocked based on "attributes", which are binary properties that a user can have or not have. Users can have multiple attributes and policies can be based on arbitrary Boolean functions of these attributes, so implementations of ABAC must be designed specifically for these policies. In this paper, we survey three research papers that use different schemes to design a stronger and more efficient ABAC model. All three methods rely on bilinear maps to allow the decryption step to derive mask values from the encryption step based on the user's key, without allowing multi-user collusion or key forgery. Two of the three papers use a Linear Secret Sharing Scheme (LSSS) matrix to represent policies numerically, and one paper uses recursion and polynomial interpolation in its decryption step. All three systems claim to be resistant to attacks, but only two provide explicit hardness assumptions to justify such claims, and one paper relies on the hardness of four different, though related, problems for its security. We compare the three schemes with one another and discuss their efficiency as the number of attributes increases.
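To give a feel for how polynomial interpolation can enforce a threshold policy over attributes, here is a minimal sketch of Shamir-style secret sharing over a prime field. This is a generic textbook building block chosen for illustration, not a reconstruction of any of the three surveyed schemes; it omits the bilinear maps and collusion resistance the abstract describes. The attribute names, the mapping of attributes to x-coordinates, the fixed polynomial coefficient, and the secret value are all hypothetical.

```python
PRIME = 2**61 - 1  # a Mersenne prime used as the field modulus (illustrative choice)


def make_shares(secret, coeffs, xs):
    """Evaluate f(x) = secret + coeffs[0]*x + coeffs[1]*x^2 + ... at each x, mod PRIME.

    A polynomial of degree k-1 gives a k-of-n threshold: any k shares recover f(0).
    Real schemes draw coeffs at random; they are passed in here to keep the demo repeatable.
    """
    poly = [secret] + list(coeffs)
    shares = {}
    for x in xs:
        y = 0
        for c in reversed(poly):  # Horner's rule
            y = (y * x + c) % PRIME
        shares[x] = y
    return shares


def reconstruct(shares):
    """Recover f(0) by Lagrange interpolation over the given {x: y} shares."""
    secret = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        inv_den = pow(den, PRIME - 2, PRIME)  # modular inverse via Fermat's little theorem
        secret = (secret + yi * num * inv_den) % PRIME
    return secret


# Hypothetical policy: any 2 of these 3 attributes unlock the file key.
attribute_x = {"doctor": 1, "cardiology": 2, "on_call": 3}
file_key = 424242
shares = make_shares(file_key, coeffs=[12345], xs=attribute_x.values())

# A user holding "doctor" and "on_call" meets the threshold.
user_shares = {x: shares[x] for x in (attribute_x["doctor"], attribute_x["on_call"])}
recovered = reconstruct(user_shares)

# A user holding only "doctor" interpolates a different value.
insufficient = reconstruct({1: shares[1]})
print(recovered == file_key, insufficient == file_key)
```

A user with two matching attributes recovers the key, while a user with one does not. In this toy version nothing stops two users from pooling their shares, which is exactly the collusion problem that the surveyed cryptographic constructions are designed to prevent.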