Seven Steps to Effective Predictive Modeling

This self-study course is designed to teach statisticians, analysts and modelers the basics of predictive modeling in a data-rich, fast-paced business environment. The course covers how to design, build, validate and implement predictive models for a variety of applications and industries. Each step is tied to the company’s specific business needs, goals and objectives, which shows how the process is both powerful and practical. Steps 3 through 7 contain proprietary SAS® macros and applications of the Output Delivery System (ODS) to streamline data cleansing, variable preparation, model development, validation and implementation. Data is provided for hands-on experience using Base SAS®, SAS/STAT®, SAS/GRAPH® and Microsoft Excel®.


Step 1: Defining the Objective
The first step in any modeling process is defining the objective. This chapter explores ways to define the model objective function in relation to the business goals as well as the overall company strategy. Several methods for developing models are discussed, including linear regression, logistic regression and classification trees. Numerous types of models are explained, including response, activation, risk, retention and lifetime value. The main case study, developing an activation model for life insurance, is introduced.

Step 2: Gathering the Data
Accurate, actionable, accessible data is the lifeblood of any successful model. This chapter discusses various types of data as well as its many sources, both internal and external. Numerous examples are provided for ways to collect and/or generate valid samples for model development. Multiple scenarios over a variety of industries describe data for acquisition, retention, risk, cross-sell and up-sell. Sampling theory and techniques are discussed.

Step 3: Preparing the Data for Modeling
The average modeler spends 60% of his or her time preparing data. This chapter details the entire data preparation process beginning with a description of the different classifications of data and how they can be adapted for predictive modeling. Several techniques are introduced for handling common data problems such as missing values and outliers.
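
As an illustration of the kind of cleanup this step covers, here is a minimal sketch in Python (not the course's SAS macros, which are not shown here). It assumes two common choices that the course may or may not use: filling missing values with the median while keeping a missing-value flag, and capping outliers at chosen percentiles. The function names and the sample `income` values are hypothetical.

```python
from statistics import median


def impute_with_flag(values):
    """Replace None with the median of the observed values.

    Returns (filled_values, flags), where flags mark which entries were missing.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


def cap_outliers(values, lower_pct=0.01, upper_pct=0.99):
    """Clamp values to approximate lower/upper percentiles (floor-based index)."""
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[int(lower_pct * (n - 1))]
    high = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, low), high) for v in values]


# Hypothetical data: two missing values and one extreme value.
income = [42, 38, None, 51, 47, None, 900, 45, 40, 44]
filled, missing_flag = impute_with_flag(income)
capped = cap_outliers(filled, lower_pct=0.10, upper_pct=0.90)
print(filled)
print(missing_flag)
print(capped)
```

Keeping the flag as a separate variable lets a later model test whether the fact that a value was missing is itself predictive.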

Step 4: Selecting and Transforming the Variables
Determining the best fit is essential to good model performance. The underlying structure of the independent variables in relation to the dependent variable determines the power and longevity of a model. This chapter details the steps for binning and transforming independent variables to ensure the best fit with the dependent variable.
Special consideration is given to the fact that marketing data can have hundreds or even thousands of variables. This chapter introduces several quick methods for identifying the best candidate variables. Programs are introduced that automatically segment and transform the most powerful variables to ensure the best fit. Finally, selection methods are combined to easily bring the best-fitting variables into the final stage of the modeling process.
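
To make the idea of binning concrete, the following Python sketch splits a continuous variable into equal-count bins and reports the response rate in each bin. This is one common approach, assumed here for illustration; the course's own programs may segment variables differently. The names `quantile_bins` and `response_rate_by_bin` and the sample data are hypothetical.

```python
def quantile_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 with roughly equal counts.

    Note: tied values can land in different bins with this simple rank rule.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def response_rate_by_bin(bins, target):
    """Return {bin: share of records with target == 1}."""
    totals, events = {}, {}
    for b, y in zip(bins, target):
        totals[b] = totals.get(b, 0) + 1
        events[b] = events.get(b, 0) + y
    return {b: events[b] / totals[b] for b in sorted(totals)}


# Hypothetical data: age and a 0/1 activation flag.
age = [23, 35, 47, 52, 61, 29, 44, 38, 66, 58, 31, 49]
activated = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0]

bins = quantile_bins(age, 3)
rates = response_rate_by_bin(bins, activated)
print(rates)
```

A response rate that rises or falls steadily across bins suggests the variable is a reasonable candidate; a flat profile suggests it adds little.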

Step 5: Processing and Evaluating the Model
All the preparation work up to this point makes this next step run smoothly. This chapter introduces several methods for processing and evaluating the model, with a practical discussion on the ideal number of variables. Weights of Evidence and Information Values are calculated. For our main case study, we use various options within PROC LOGISTIC to determine the model with the best fit. The SAS® Output Delivery System (ODS) is used to capture information and display the data. Several models are compared using KS, Gini, C-Statistic, Bayesian Information Criterion (BIC), decile analysis and SAS/GRAPH®. Validation data are scored, tabulated and compared using both SAS® and Microsoft Excel®.
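
For readers unfamiliar with Weights of Evidence (WoE) and Information Value (IV), here is a short Python sketch of one widely used definition: WoE for a bin is the natural log of the bin's share of events divided by its share of non-events, and IV is the sum over bins of (share of events minus share of non-events) times WoE. Sign conventions vary between references, and the course may use a different one. The function name and data are hypothetical.

```python
from math import log


def woe_iv(bins, target):
    """Return ({bin: WoE}, IV) with WoE = ln(%events / %non-events).

    Raises ValueError if any bin has zero events or zero non-events,
    since the log ratio is undefined there without smoothing.
    """
    events, nonevents = {}, {}
    for b, y in zip(bins, target):
        events[b] = events.get(b, 0) + y
        nonevents[b] = nonevents.get(b, 0) + (1 - y)
    total_events = sum(events.values())
    total_nonevents = sum(nonevents.values())

    woe, iv = {}, 0.0
    for b in sorted(events):
        if events[b] == 0 or nonevents[b] == 0:
            raise ValueError(f"bin {b} has an empty cell; merge bins or smooth")
        pct_event = events[b] / total_events
        pct_nonevent = nonevents[b] / total_nonevents
        woe[b] = log(pct_event / pct_nonevent)
        iv += (pct_event - pct_nonevent) * woe[b]
    return woe, iv


# Hypothetical binned variable (3 bins) and 0/1 target.
bins = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
target = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]

woe, iv = woe_iv(bins, target)
print(woe)
print(round(iv, 4))
```

Bins with WoE near zero behave like the population as a whole; a larger IV indicates a variable that separates events from non-events more strongly.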

Step 6: Validating the Model
By definition, a model should perform well on the development data. If the hold-out sample is randomly selected, the model should also score the validation data with similar results. The true test of model performance is how well it performs on data from a different time period or market area. This chapter demonstrates three powerful methods for ensuring model fit: 1) scoring alternate data is the best way to tell if your model will perform in a real campaign; 2) bootstrapping uses simple resampling techniques to find confidence intervals around your estimates; 3) key variable analysis calculates important market factors as they are affected by the model, thus ensuring reasonable results.
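
The bootstrapping idea can be sketched in a few lines of Python. This version assumes a percentile bootstrap (resample with replacement, recompute the statistic, read the interval from the sorted estimates); the course may use another variant. The function name, the fixed seed and the sample data are hypothetical.

```python
import random


def bootstrap_ci(sample, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    estimates = sorted(
        stat([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    low = estimates[int((alpha / 2) * (n_resamples - 1))]
    high = estimates[int((1 - alpha / 2) * (n_resamples - 1))]
    return low, high


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical validation segment: 30 responders out of 200 (15% response rate).
responses = [1] * 30 + [0] * 170
low, high = bootstrap_ci(responses, mean)
print(f"observed rate: {mean(responses):.3f}")
print(f"95% interval: {low:.3f} to {high:.3f}")
```

A wide interval is a warning that the estimate rests on too few records to be trusted in a campaign forecast.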

Step 7: Implementing and Maintaining the Model
Effective implementation is a combination of business intelligence and well-designed procedures. This chapter begins with scoring a new data set with the new model. Several auditing procedures are discussed. Tracking and model maintenance are emphasized as best practices.
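
Scoring a new data set with a logistic model amounts to applying the fitted coefficients to each record and converting the result to a probability. The Python sketch below shows that arithmetic only; the coefficients, variable names and records are invented for illustration and do not come from the course's case study.

```python
from math import exp

# Hypothetical coefficients, standing in for the output of a fitted model.
COEFFICIENTS = {"intercept": -2.0, "age_bin": 0.8, "has_policy": 1.2}


def score(record, coefs=COEFFICIENTS):
    """Return the predicted probability 1 / (1 + exp(-logit)) for one record."""
    logit = coefs["intercept"] + sum(
        coefs[name] * record[name] for name in coefs if name != "intercept"
    )
    return 1.0 / (1.0 + exp(-logit))


# Hypothetical new records to be scored.
new_data = [
    {"id": 1, "age_bin": 0, "has_policy": 0},
    {"id": 2, "age_bin": 2, "has_policy": 1},
    {"id": 3, "age_bin": 1, "has_policy": 0},
]

# Rank records from highest to lowest predicted probability.
scored = sorted(((score(r), r["id"]) for r in new_data), reverse=True)
for probability, record_id in scored:
    print(record_id, round(probability, 4))
```

A simple audit is to compare the distribution of these scores on the new file with the distribution seen at development time; a large shift usually points to a data or coding problem.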


Offer Details
After completing this course, you will be able to quickly determine what model or combination of models will do the best job of meeting your company’s objectives. You will have all the steps to successfully develop, validate and implement your predictive model.

  • Course notes (hard copy mailed to you)
  • SAS code to run on Base SAS®, SAS/STAT® and SAS/GRAPH®
  • Exercises & solutions
  • Datasets for model development, validation and implementation
  • Email and phone support (2 hours)
  • Certificate of completion

Pricing Options

  • Single User: $799
  • Group (3-6 people): $500 per person
  • Group (7+ people): Contact Olivia for pricing

Prerequisites

  • Familiarity with SAS®, including the DATA step
  • Basic knowledge of statistics

Who Should Attend

Duration: Self-Paced