Text Mining

Vast amounts of unstructured data are available on the web and stored in databases, and that data promises insights unavailable in structured data. As a result, text mining has become an indispensable addition to traditional predictive analytics.

In this course, students will learn practical techniques for text extraction and text mining, including document clustering and classification, information retrieval, and the enhancement of structured data. Emphasis will be placed on the practical use of text mining in business. The course also covers basic concepts for working with textual information, such as tokenization, part-of-speech tagging, and disambiguation.

Topics include:

  • Structured vs. unstructured learning
  • CRISP-DM
  • Data sources
  • Dictionaries and lexicons
  • Text parsing
  • Regular expressions
  • Structured data from unstructured data
  • Document clustering and classification
  • Sentiment analysis

Practical experience:

  • Working with R
  • Working with unstructured text
  • Prepping text data for modeling
  • Visualizing text data

Software: Students will use R in this course. R is free, open-source software, so there is no additional cost.

Course typically offered: Online in Fall and Spring

Prerequisites: Introduction to R Programming or equivalent knowledge required.

Next Steps: Upon completion of this course, consider taking other data science courses to continue learning.

More Information: For more information about this course, please contact unex-techdata@ucsd.edu.

Course Number: CSE-41151
Credit: 2.00 unit(s)
Related Certificate Programs: Data Mining for Advanced Analytics