2023 IUPUI REU Workshop

About

Overview
The 2023 IUPUI REU Workshop will showcase the research projects conducted during the 2023 Mobile Cloud and Data Security Research Experience for Undergraduates (REU) program at Indiana University-Purdue University Indianapolis. The focus of the workshop is on data science and cybersecurity, with a special focus on security related to current Deep Learning techniques. The workshop will be hosted online on Friday, August 11th, 2023 from 8:45 A.M. to 12:00 P.M. EST. The Zoom link can be found here.

Review Process
Papers should be submitted by August 9 at 11:59 p.m. EST to Joshua Schoenbachler or Vinay Krishnan. All submissions will be peer reviewed by two reviewers. The review process will end on August 10 at 4:00 p.m.

Paper Format
Papers should adhere to the ACM manuscript standard, be at least five pages, and be submitted as PDFs. Related information on formatting can be found at the REUNS 2023 Conference Website

Committee Members

General Co-Chairs:
David Chen and Ellie Fassman
Website Chairs:
Edwin Sanchez and David Xu
Technical Program Chairs:
Joshua Schoenbachler and Vinay Krishnan
Poster Chairs:
Sanaz Matinmehr and Anthony Weyer
Publicity Chairs:
Kehan Wang and Paul Jiang

Publications

Sorting Ransomware from Malware Utilizing Machine Learning Methods with Dynamic Analysis

Joshua Schoenbachler & Vinay Krishnan

Abstract: Ransomware attacks have grown significantly in the past dozen years and have disrupted businesses that engage with personal data. In this paper, we discuss the identification of ransomware, malware, and benign software from one another using machine learning tech- niques. We collected data samples from repositories on the internet as well as using a dataset from a previous study that provided a basis for our approach. We also collected ransomware, malware, and benign software samples manually from Cuckoo Sandbox™. We also filtered on certain feature groups to test to see if certain activity/processes in the infection process could be used to correctly distinguish ransomware from malware and benign software. These feature groups represent correlated processes within a running ap- plication: network activity, PROC memory activity, registry/events processes, and file interactions. The datasets were analyzed using several ML models which included Random Forest, SVM, Gradi- ent Boosting, and Decision Trees using binary classification. The best classifiers for distinctly identifying ransomware from benign software were Random Forest and SVC with an F1- score of 86% and an F1-score of 82% as well as an 85% in overall accuracy for Random Forest. In addition to ransomware versus benign software, we also compared malware software to ransomware data. Yielding a 100% accuracy in performance, Gradient Boosting Classifier and Decision Trees were the best at distinguishing ransomware from malware software. This high result may partially be caused by a smaller malware and ransomware dataset. Overall, we were able to successfully distinguish ransomware from malware and benign software.Read More

Evaluating the Impact of Noisy Data on Time-Sensitive Point Clouds from Millimeter Wave Gesture Recognition Systems

Paul Jiang & Ellie Fassman

Abstract: Point cloud data gathered through millimeter wave sensors has garnered increasing attention for its critical applications, including automotive radars, security systems, and notably, gesture recog- nition. It provides a non-intrusive and robust approach towards human-computer interactions; however, its reliance on real-time data makes resilience of paramount concern. Attacks on millimeter- wave sensors can have catastrophic effects. From real-time spoofing to data poisoning attacks or even just imperfect or poor data, sys- tems based on 2D and 3D point cloud machine learning models can be extremely vulnerable. Despite this, there exist few studies pri- oritizing the robustness of time-sensitive point clouds. This study presents an in-depth examination on the effects of noisy data on frame based time-sensitive point clouds used in millimeter wave gesture recognition machine learning models. Noisy data can be introduced during the training stage where imperfect data is fed to the model, causing this model to misclassify test-time samples and lower the overall accuracy of the model. We stage and evaluate the impact of four different, simple data noising scenarios to observe vulnerabilities within this system and to emphasize the importance of robust machine learning models. Noisy databases are particularly relevant to deep learning systems because these models need large amounts of data to train, many of which commonly scraped from the internet with little to no manual inspection. Our findings high- light the importance to not only dedicate time and research towards innovations in mmWave gesture recognition, but also towards the robustness and resiliency of these systems in order to proactively prevent destructive effects. Read More

Enhanced Disease Detection via Graph Clustering and Centroid-based Representations

David Xu & Sanaz Matinmehr

Abstract: According to the World Health Organization (WHO), 15 mil- lion people around the world become victims of a stroke every year. Of those 15 million people, 5 million succumb to the disease and another 5 million who survive become permanently disabled. Though it may not be common for those under the age of 40, anyone can become susceptible to a stroke, as factors such as high blood pressure can play a major role increasing one’s risk. It is crucial to identify such symptoms to discover patterns within a patient that may lead to a stroke. In this paper, we utilize five different graph clus- tering techniques to analyze patient data in order to evaluate and find the most effective and accurate method of detect- ing a stroke. Patient data from both stroke and non-stroke patients are used to identify phenotypes, which then form clusters to uncover overlapping patterns in the symptoms. Finally, we conclude our discussion with our final thoughts to the study and other methods that could be evaluated in the future. Read More

TrustAggFL: Enhancing Federated Learning with Trusted Client Aggregation for Improved Security

David Francis Chen & Kehan Wang

Abstract: Federated Learning (FL) has emerged as a promising approach for training machine learning models across individual devices while preserving data privacy. However, FL faces many challenges, specifically a vulnerability to adversarial attacks due to its strict adherence to ensuring individual client model and data privacy. To mitigate these issues, dynamic clipping techniques have been proposed which dynamically adjust the gradient clipping threshold during model aggregation. However current iterations depend on specific and often intensive calculations to determine a clipping threshold which can lead to an over fitting to a specific data set or attacker model. In this paper, we address the limitations of existing FL and dynamic clipping approaches by introducing a novel method that incorporates a group of trusted users during the aggregation of client models for a global update. By identifying and utilizing a subset of trusted clients, our method enhances the robustness of model aggregation against malicious updates. This approach not only maintains the model’s performance but also improves its resistance to adversarial influences. We demonstrate the effective- ness of our proposed method through extensive experiments thus showcasing its superiority and simplicity in achieving enhanced model security in federated learning settings. Read More

Advancing Active Authentication for User Privacy and Revocability with BioCapsule

Edwin Sanchez & Anthony Weyer

Abstract: Biometric Facial Authentication has become a widely used mode of authentication in recent years, which can be attributed to the ever-growing popularity of mobile devices. With this growth in popularity, there is also a growth in concern over privacy for biometrics. Along with the issue of template revocability with biometric data, there is a need for a system that can provide for these issues while remaining easy to use and practical. BioCapsule is a system designed to solve these issues. While BioCapsule has been tested for its face authentication capabilities, this paper extends the scheme to Active Authentication, where a user is continu- ously authenticated throughout a session on a mobile device. The MOBIO dataset is used for testing, which contains video recordings of 150 individuals using mobile devices over sev- eral sessions. We find that the BioCapsule system not only performs comparably to the baseline system performance, but in some cases exceeds baseline performance in terms of False Acceptance Rate, False Rejection Rate, and Equal Error Rate. We examine these findings to learn about both the na- ture of the Active Authentication task and how BioCapsule interacts with this system. We also examine hyperparameters such as time interval for sampling user facial features, and window size, referring to how many past samples to average over with the current sample to determine user authenticity.Read More