ESI 6934 Advanced Analytics I

Hui Yang

 

Industry in the 21st century is investing in a variety of sensor networks and dedicated data centers to increase information visibility. Recent technological advances are delivering massive volumes of data on ever shorter time scales. This offers unprecedented opportunities to manage, analyze, visualize, and extract useful information from large, diverse, distributed, and heterogeneous data sets in order to make better decisions, improve system performance, and optimize operations management. Data are driving a profound transformation of operations management in every field of engineering and business.

This course will help you navigate this data overload and prepare and enrich your data as a key ingredient for powerful analytical insights. The objectives of this course are as follows: (1) Enhance analytic effectiveness with predictive modeling. (2) Discover useful trends and patterns in data. (3) Develop neural network models to drive fact-based decisions. (4) Build descriptive and predictive models with real-world examples from a variety of industries.

Textbooks:

 

Martin T. Hagan, Howard B. Demuth, Mark H. Beale, Orlando De Jesús, Neural Network Design (2nd edition), 2014

SAS Institute Inc., Advanced Business Analytics - Volume I and Volume II

 

Prerequisite: EIN 4933 Engineering Analytics and ESI 4244/6247 Statistical Design Models, or equivalent background in statistics and linear algebra.

Topics:

 

Introduction to data analytics
- Overview of analytics
- Data management (correctly consolidated data is the first step in analytics)
- Data challenges (errors, outliers, missing values, size)
- Missing data imputation

Predictive modeling
- Linear regression model
- Multiple logistic regression model
- Regularization

Neuron model and network architectures
- Neuron model
- Network architectures

Perceptron Learning Rule
- Perceptron architecture
- Perceptron learning rule

Signal and Weight Vector Spaces
- Linear vector spaces
- Gram-Schmidt orthogonalization

Performance Surfaces and Optimum Points
- Necessary conditions for optimality
- Eigensystem of the Hessian

Performance Optimization
- Gradient-based methods
- Steepest descent, Newton's method, and conjugate gradient

Backpropagation
- Multilayer perceptron
- Backpropagation algorithm

Competitive Networks
- Competitive learning
- Self-organizing feature map

Radial Basis Networks
- Radial basis network
- Training RBF networks

Clustering and Dimensionality Reduction
- Data clustering and variable clustering
- Principal component analysis

Large-scale Data Mining
- Stochastic gradient search
- Map-reduce and data parallelism

Grading Policy

 

2 Exams - 35 pts each
Homework/Quiz - 30 pts
1 Project - 35 pts


There will be two exams, numerous homework/quiz sets, and one project. Exam dates will be announced as the course progresses. The higher of the two exam scores will be added to the project and homework/quiz scores to obtain the final grade for the course (out of a total of 100 pts). No make-up exams will be given unless prior arrangements have been made. Students are expected to attend class and prepare assignments; habitual failure to do so will result in a reduced grade. An incomplete grade will be given only when a student misses a portion of the semester because of illness or accident. Cheating on examinations, plagiarism, and other forms of academic dishonesty are serious offenses and may subject the student to penalties ranging from failing grades to dismissal.
 

The following grading scale will be used: A: 90+; B: 80+; C: 70+; D: 60+; F: <60 (College of Engineering rule: only grades of C or better will be accepted in all Math, Science, and Engineering courses).
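
As an illustration of the scoring rule and grading scale above, the following Python sketch computes a course total and letter grade. The function names and sample scores are hypothetical, and the sketch assumes the cutoffs apply to the unrounded total; it is not an official grade calculator.

```python
def final_score(exam1, exam2, homework_quiz, project):
    """Course total out of 100 pts: higher exam (max 35) + homework/quiz (max 30) + project (max 35)."""
    return max(exam1, exam2) + homework_quiz + project


def letter_grade(total):
    """Map a total to a letter using the cutoffs A: 90+, B: 80+, C: 70+, D: 60+, F: <60."""
    # Assumption: cutoffs are applied to the raw total, with no rounding.
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if total >= cutoff:
            return letter
    return "F"


# Hypothetical student: exams 28 and 33, homework/quiz 26, project 30.
total = final_score(28, 33, 26, 30)
print(total, letter_grade(total))
```

Only the higher exam score counts, so in this hypothetical case the 28 is dropped and the total is 33 + 26 + 30.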

 

Software tools used in the course

 

SAS® Enterprise Miner

 

SAS Enterprise Miner streamlines the data mining process to create highly accurate predictive and descriptive models based on large volumes of data from across the enterprise. It offers a rich, easy-to-use set of integrated capabilities for creating and sharing insights that can be used to drive better decisions.
Forward-thinking organizations today are using SAS data mining software to detect fraud, minimize risk, anticipate resource demands, reduce asset downtime, increase response rates for marketing campaigns and curb customer attrition.

 

 

 

Matlab

 

Matlab Tutorials

 

http://www.mathworks.com/academia/student_center/tutorials/launchpad.html

http://www.math.ufl.edu/help/matlab-tutorial/
http://www.math.utah.edu/lab/ms/matlab/matlab.html
http://users.ece.gatech.edu/~bonnie/book/TUTORIAL/tutorial.html
http://www.engin.umich.edu/group/ctm/
http://www.math.mtu.edu/~msgocken/intro/intro.html
http://www.math.siu.edu/matlab/tutorials.html
http://www.cyclismo.org/tutorial/matlab/
http://www.cs.ubc.ca/spider/cavers/MatlabGuide/guide.html
http://www.duke.edu/~hpgavin/matlab.html
http://amath.colorado.edu/computing/Matlab/Tutorial/