 # Data Mining for Business Analytics

## Concepts, Techniques and Applications with JMP Pro

### Companion Site

This page provides a link to request data sets, slides and exercise solutions, along with access to useful resources for teaching analytics and predictive modeling.  Errata, which will be addressed in the next edition, are also listed here.

Request Data and Instructor Materials:

To request access to the data sets, or to request instructor materials, please use this form:  Request Data or Instructor Materials.

Note that in order to receive access to the instructor materials (data sets, slides, screenshots, and solutions for all but the last chapter), you will be required to provide proof that you are an instructor using this book in the classroom.

• Live and recorded webinars for getting started with JMP, data analysis, graphics, data preparation, and modeling.
• JMP Case Study Library: Business-oriented and analytics case studies, from basic graphics to multiple linear and logistic regression, classification and regression trees, neural networks, and model validation and selection.
• JMP Learning Library: One-page guides and short videos on a number of topics.

Errata:

Page 43:  Paragraph starting with "Google" is listed twice.

Page 67:  A different version of the data was used to create the graphs in the book, so your graphs may look slightly different.

Pages 113 and 121, Figures 5.5 and 5.10:

Probabilities don't match.  Figure 5.5 is based on a model with main effects, interactions and a quadratic effect, while Figure 5.10 is based on a model with only main effects and interactions.

Page 116:  In the box, the false-positive rate (FPR) and false-negative rate (FNR) are discussed.

Under "false-positive rate", the definition and calculation for the "false discovery rate" (FDR) are given, and under "false-negative rate", the definition and calculation for the "false omission rate" (FOR) are provided.  To compute FPR, use n01/(n01+n00), and to compute FNR, use n10/(n10+n11).

For a discussion of sensitivity, specificity, FPR and FNR, and confusion surrounding these topics, see this blog.
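To make the difference between the four rates concrete, here is a minimal Python sketch that computes them from the cells of a 2x2 confusion matrix. It assumes the convention used in the formulas above: n_ij is the number of records of actual class i classified as class j, with class 1 as the "positive" class. The function name and the counts are hypothetical, chosen only for illustration.

```python
def error_rates(n00: int, n01: int, n10: int, n11: int) -> dict[str, float]:
    """Error rates from a 2x2 confusion matrix (hypothetical helper).

    Assumed convention: n_ij = number of records of actual class i that
    were classified as class j; class 1 is the "positive" class.
    """
    return {
        # Share of actual class-0 records wrongly classified as class 1
        "FPR": n01 / (n01 + n00),
        # Share of actual class-1 records wrongly classified as class 0
        "FNR": n10 / (n10 + n11),
        # Share of records classified as class 1 that are actually class 0
        "FDR": n01 / (n01 + n11),
        # Share of records classified as class 0 that are actually class 1
        "FOR": n10 / (n10 + n00),
    }

# Hypothetical counts, for illustration only
rates = error_rates(n00=50, n01=10, n10=5, n11=35)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Note that FPR and FDR share the same numerator (n01) but divide by different totals: FPR conditions on the actual class, while FDR conditions on the predicted class. The same holds for FNR and FOR.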

Page 148, last paragraph:  "The 5 parameter model..." should be "The model with four parameters and the intercept..."

Page 151:  Problem 6.1c - Eliminate the last sentence, "What is the prediction error?"

There is no way to calculate the prediction error since there is no actual value given for the tract.

Page 160:  Suggestion for Figure 7.4 - Display the whole table, with the rows saved for the selected nearest neighbors.

Page 165:  Problem 7.3a - It is not necessary to normalize the data first.  Remove the sentence, "Make sure to normalize the data...".

Page 174:  Naive Bayes in JMP Pro 13 (referenced in the box)

The Naive Bayes platform in JMP Pro 13 produces slightly different results because it uses a bias factor of 0.5 when computing the probabilities. Therefore, a probability is not simply the number of observations at a particular level divided by the total number of observations. Instead, it is the number of observations at that level plus 0.5 divided by the number of levels, all divided by the total number of observations plus 0.5. For example, the probability of a fraudulent observation is:

[4 + (0.5/2)] / [10 + 0.5]  = 4.25 / 10.5  = 0.404762

The same adjustment is applied when computing the conditional probabilities, so your results will differ slightly from those calculated by hand.
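The adjusted formula can be checked with a short Python sketch. The helper name `smoothed_probability` is hypothetical; it simply encodes (count + bias / number of levels) / (total + bias) with the 0.5 bias factor described above. The count of 6 non-fraudulent records follows from 10 records minus 4 fraudulent ones. How JMP applies the adjustment to the conditional probabilities is not spelled out here, so the sketch covers only the class probabilities.

```python
def smoothed_probability(count: int, total: int, n_levels: int,
                         bias: float = 0.5) -> float:
    """Bias-adjusted probability (hypothetical helper):
    (count + bias / n_levels) / (total + bias)."""
    return (count + bias / n_levels) / (total + bias)

# Probability of a fraudulent record: 4 of 10 records, 2 class levels
p_fraud = smoothed_probability(count=4, total=10, n_levels=2)
print(round(p_fraud, 6))  # 0.404762

# The remaining 6 records are non-fraudulent
p_other = smoothed_probability(count=6, total=10, n_levels=2)

# Unadjusted (by-hand) value for comparison
print(4 / 10)  # 0.4
```

Because each of the levels receives bias / n_levels and the denominator grows by the full bias, the adjusted probabilities across all levels still sum to 1.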

In addition to this adjustment to the probabilities, the platform works with logarithms. Rather than multiplying Probability(Fraudulent) * Probability(Yes | Fraudulent) * Probability(Large | Fraudulent) to calculate the scores, JMP takes the log of each term, sums the logs, and then exponentiates the result. Therefore, the numeric values you see in the Naive Scores are the logs of the probabilities. The average of the scores is subtracted from the score for a particular observation before the exponentiation.
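The log-scale calculation can be sketched in Python as follows. This is an illustration of the general idea, not JMP's implementation: the probabilities are hypothetical, and the centring step assumes the average is taken across the class scores of a single record. Subtracting the same constant from every class score does not change the normalized probabilities, which the final comparison with direct multiplication shows.

```python
import math

def class_probabilities(log_terms: dict[str, list[float]]) -> dict[str, float]:
    """Combine per-class probabilities on the log scale (hypothetical helper).

    log_terms maps each class to the logs of its prior and conditional
    probabilities for one record.
    """
    # Score on the log scale: a sum of logs instead of a product
    scores = {c: sum(terms) for c, terms in log_terms.items()}
    # Subtract the average score across classes (assumed centring step);
    # every score shifts by the same constant, so the normalized result
    # is unchanged, but exponentiation is numerically safer
    avg = sum(scores.values()) / len(scores)
    expd = {c: math.exp(s - avg) for c, s in scores.items()}
    total = sum(expd.values())
    return {c: v / total for c, v in expd.items()}

# Hypothetical probabilities for one record (Charges = Yes, Size = Large):
# [prior, P(Yes | class), P(Large | class)]
probs = {
    "fraudulent": [0.4, 0.7, 0.6],
    "non-fraudulent": [0.6, 0.2, 0.4],
}
log_terms = {c: [math.log(p) for p in ps] for c, ps in probs.items()}
result = class_probabilities(log_terms)
print(result)

# Same answer as multiplying directly and normalizing
direct = {c: math.prod(ps) for c, ps in probs.items()}
print({c: v / sum(direct.values()) for c, v in direct.items()})
```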

Page 181:  Problem 8.1b - "Create a summary of the training data..." (insert the word "training").  Problem 8.1h - should refer to part f.

Page 182:  Problem 8.2bii - Move the sentence starting with "First, check to make sure..." to 8.2bv.

Page 209:  Problem 9.4 parts a, c, and d - change "error rate" to "prediction error" (3 times)

Page 210:  Problem 9.5 - change "test" to "validation" (twice), and fix the spelling of "similar"

Page 216:  Last sentence in the box, remove the "." between Window and (select)

Page 244:  Problem 10.4c - change "excluding price" to "excluding closing price".  Problem 10.4f - remove "an exhaustive search" (not available for logistic regression)

Page 266:  Problem 11.4c - add "in terms of lift" to the end of the last sentence.

Page 284: Problem 12.3

Problem 12.3a - change the hint to use Tabulate.

Problem 12.3e - revise to "A spam filter that is based on your model is used, so that only messages that are classified as nonspam are delivered while messages that are classified as spam are quarantined. Consequently, misclassifying a nonspam email (as spam) has much heftier consequences. Suppose that the cost of quarantining a nonspam email is 20 times that of not detecting a spam message. Show how the distance formula can be adjusted to account for these costs (assume that the proportion of spam is reflected correctly by the sample proportion)."

Page 297:  Problem 13.2 - change the lift from 0.10 to 0.20 (twice).

Page 344:  Problem 15.1bi - change "the plot" to "the plots"

Page 367: Problem 16.1

Problem 16.1a - the section should be numbered ii.

Problem 16.1c - in the first sentence, change "multiplicative seasonality" to "multiplicative (exponential) seasonality".

Problem 16.1dii - Change the first sentence to "Save the prediction formula to the data table and calculate residuals".

Page 398:  Problem 17.8c - add a note that we need to hold out data - hold out the last year.
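Holding out the last year means partitioning by time rather than at random: the model is fit on the earlier periods and evaluated on the most recent ones. A minimal Python sketch follows; the helper name is hypothetical, and monthly data (12 periods per year) is an assumption, so adjust `periods_per_year` to the frequency of the series used in the problem.

```python
def split_last_year(series: list[float],
                    periods_per_year: int = 12) -> tuple[list[float], list[float]]:
    """Temporal partition (hypothetical helper): the final year is held out.

    The time order is preserved, so the training period always precedes
    the validation period.
    """
    if len(series) <= periods_per_year:
        raise ValueError("series must be longer than one year")
    return series[:-periods_per_year], series[-periods_per_year:]

# Hypothetical monthly series covering three years
monthly = [float(i) for i in range(36)]
train, valid = split_last_year(monthly)
print(len(train), len(valid))  # 24 12
```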

References: Add the following: Hyndman, R., and Yang, Y. Z. (2018).  tsdl: Time Series Data Library. v0.1.0.   https://pkg.yangzhourang.com/tsdl/.

Problems: Change the following references to [Source: Hyndman and Yang (2018).]

Problem 15.6 - “Souvenir Sales”

Problem 15.7 - “Shampoo Sales”

Problem 16.6 - “Souvenir Sales”

Problem 16.8 - “Australian Wine Sales”

Problem 17.7 - “Shampoo Sales”

Problem 17.9 - “Australian Wine Sales”