With vast amounts of unstructured data available on the web and stored in databases, and the promise that this data will yield insights unavailable from structured data alone, text mining has become an indispensable addition to traditional predictive analytics.
In this course, students will learn practical techniques for text extraction and text mining in a data mining context, including document clustering and classification, information retrieval, and the enhancement of structured data. Emphasis will be placed on the practical use of text mining in business. In addition, basic concepts of textual information such as tokenization, part-of-speech tagging, and disambiguation will be covered.
Topics include:
- Structured vs. unstructured learning
- CRISP-DM
- Data sources
- Dictionaries and lexicons
- Text parsing
- Regular expressions
- Structured data from unstructured data
- Document clustering and classification
- Sentiment analysis
Practical experience:
- Working with R
- Working with unstructured text
- Prepping text data for modeling
- Visualizing text data
Software: Students will use R in this course. R is free to download, so there is no additional software cost.
Course typically offered: Online in Fall and Spring
Prerequisites: Introduction to R Programming or equivalent knowledge.
Next Steps: Upon completion of this course, consider taking other courses in data science to continue learning.
More Information: For more information about this course, please contact unex-techdata@ucsd.edu.
Course Number: CSE-41151
Credit: 2.00 unit(s)
Related Certificate Programs: Data Mining for Advanced Analytics
There are no sections of this course currently scheduled. Please contact the Science & Technology department at 858-534-3229 or unex-sciencetech@ucsd.edu for information about when this course will be offered again.