
Data Preparation for Analytics

Data preparation is an essential yet often under-emphasized step in the data mining process. People tend to focus on knowledge discovery, but without sufficient preparation of the data, the return on that effort will be limited. Data prepared without adequate skill and knowledge can lead to poor modeling results.

This class offers in-depth coverage of data preparation techniques, taking a step-by-step approach through a variety of tools and providing practical illustrations using real datasets. The hands-on exercises anchor the concepts and offer valuable first-hand experience in cleaning, filtering, and preparing data for mining and for predictive or descriptive modeling. The goal is to transform the datasets so that their information content is best exposed to the mining tool.

Topics include:

  • Prerequisites to good data preparation
  • Dealing with variables
    • Sparsity
    • Monotonicity
    • Increasing dimensionality
    • Anachronisms
    • Missing values
    • Outliers
  • Normalization, transformation, feature extraction, and feature reduction
  • Building mineable datasets
  • Data separation
  • Dealing with imbalanced data

Practical experience:

  • Hands-on data mining projects

Software: WEKA is used for class assignments. There is no additional cost for this product.

Course typically offered: Online in Winter and Summer

Prerequisites: Fundamentals of Data Mining or equivalent experience required.

Next Steps: Upon completion of this course, consider taking Data Mining: Advanced Concepts and Algorithms.

More Information: For more information about this course, please contact unex-techdata@ucsd.edu.

Course Number: CSE-41261
Credit: 2.00 unit(s)
Related Certificate Programs: Data Mining for Advanced Analytics
