Applied Data Analytics
CIT 52600 / 3 Cr.
This course covers the fundamentals and concepts of the data analytics cycle as well as advanced data analytics techniques. The focus is on emerging advanced data analytics techniques and their applications to practical problems in different disciplines, such as IT, health care, and economics. Both advanced supervised and unsupervised learning algorithms are explored, along with data visualization techniques. Students apply these techniques in labs and in a research project that uses public data sets to address an applied research problem and identify scientific findings. A research project report is required at the end of the course, and its quality is expected to align with the requirements of IEEE or ACM international conferences.
- Available Online: No
- Credit by Exam: No
- Laptop Required: Yes
Prerequisites/Co-requisites:
CIT 50700 Measurement and Evaluation in Industry and Technology or equivalent, basic knowledge of computing architecture, and programming experience in one of the major programming languages
Software
- R, Python, Data Mining Software
Outcomes
CIT Student Outcomes
(i) Demonstrate a solid understanding of the fundamental concepts, principles, and life cycle of data analytics
(ii) Apply the data analytics cycle for effective problem solving
(iii) Design state-of-the-art data analytics techniques to apply to practical work and research
(iv) Evaluate different existing data analytics techniques for a specific research problem.
Integrate data from multiple sources for interdisciplinary predictive data analysis.
(v) Assess existing data analytics tools and recommend appropriate ones based on the organization’s resources and plans
(vi) Produce a research paper that could lead to scientific publication and present the scientific findings
Topics
- Introduction to data analytics fundamentals and concepts
- Data sampling & Preprocessing
- Data visualization and Data Correlation (Linear, non-linear, Chi-square)
- Feature selection
- Data forecasting
- Supervised Learning: Naïve Bayes, Linear Discriminant Analysis
- Supervised Learning: Support Vector Machine, Nearest Neighbor Classifier, Decision Trees
- Unsupervised Learning: partitioning methods, sequential clustering, hierarchical methods
- Unsupervised Learning: fuzzy clustering, relational clustering, cluster tendency assessment, cluster validity
- Unsupervised Learning: self-organizing map, DBSCAN
- Advanced topics in data mining techniques and Data mining techniques for big data