Statistical Thinking for Industrial Problem Solving

A free online statistics course

Predictive Modeling and Text Mining

Predictive analytics uses data and statistical algorithms to predict what is likely to happen next, given the current process and environment.

In this module, you will learn some of the core techniques used in building predictive models, including how to address overfitting, select the best predictive model, and use multiple linear regression and logistic regression. You will also see how to fit other types of predictive models, including penalized regression, decision trees, and neural networks. Finally, you will learn how to extract information and meaning from unstructured text data, such as survey responses.

Estimated time to complete this module: 3 to 4 hours

Specific topics covered in this module include:

Essentials of Predictive Modeling

  • Introduction to Predictive Modeling
  • Overfitting and Model Validation
  • Assessing Model Performance: Prediction Models
  • Assessing Model Performance: Classification Models
  • Receiver-Operating Characteristic (ROC) Curves
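
As a rough illustration of the classification topics above, the sketch below computes confusion-matrix counts at a cutoff and the area under the ROC curve. The labels, scores, the 0.5 threshold, and the function names are all made up for this example and are not taken from the course materials.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count true/false positives and negatives at a given score cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive case scores higher than a random negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation-set labels (1 = event) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

tp, fp, tn, fn = confusion_counts(y_true, scores)
misclassification_rate = (fp + fn) / len(y_true)
auc = roc_auc(y_true, scores)
```

The confusion counts depend on the chosen cutoff, while the AUC summarizes performance across all cutoffs, which is one reason the two are usually reported together.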

Decision Trees

  • Introduction to Decision Trees
  • Classification Trees
  • Regression Trees
  • Decision Trees with Validation
  • Random (Bootstrap) Forests

Neural Networks

  • What Is a Neural Network?
  • Interpreting Neural Networks
  • Predictive Modeling with Neural Networks

Generalized Regression

  • Introduction to Generalized Regression
  • Fitting Models Using Maximum Likelihood
  • Introduction to Penalized Regression

Model Comparison and Selection

  • Comparing Predictive Models

Introduction to Text Mining

  • Introduction to Text Mining
  • Processing Text Data
  • Curating the Term List
  • Visualizing and Exploring Text Data
  • Analyzing (Mining) Text Data
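
As a rough illustration of the text mining topics above, the sketch below tokenizes a few free-text responses, drops stop words, and counts terms per document. The responses, the stop word list, and the function names are invented for this example; a real term list would be curated for the data at hand.

```python
import re
from collections import Counter

# Hypothetical survey responses, invented for illustration only.
responses = [
    "The pump failed after the seal leaked.",
    "Seal replaced; pump runs fine now.",
    "Operator reported a leaking seal on the pump.",
]

# Hypothetical stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "on", "after", "now"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_counts(docs, stop_words):
    """Return one Counter of term frequencies per document."""
    return [Counter(t for t in tokenize(d) if t not in stop_words)
            for d in docs]


counts = document_term_counts(responses, STOP_WORDS)

# Number of documents in which each term appears at least once.
doc_freq = Counter(term for c in counts for term in c)
```

Note that "leaked" and "leaking" are counted as separate terms here. Deciding whether to merge such variants is part of curating the term list.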