Abstract
Motivation: Text mining is one of the newest areas of data mining. It is used to extract information from free-form text data, such as the text in claim description fields. This paper introduces the methods used in text mining and applies them to a simple example.
Method: The paper describes the methods used to parse text data into vectors of terms for analysis. It then shows how information extracted from the vectorized data can be used to create new features for use in analysis. The focus is on clustering as a method for finding patterns in unstructured text information.
Results: The paper shows how feature variables can be created from unstructured text information and used for prediction.
Conclusions: Text mining has significant potential to expand the amount of information that is available to insurance analysts for exploring and modeling data.
Availability: Free software that can be used to perform some of the analyses described in this paper is listed in the appendix.
Keywords: Predictive modeling, data mining, text mining, statistical analysis
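The parsing and clustering steps named in the Method paragraph can be illustrated with a minimal sketch. It is not the paper's own implementation: the claim descriptions, the stopword list, the choice of k-means as the clustering algorithm, and all function names (`tokenize`, `build_term_matrix`, `kmeans`) are assumptions made for illustration. The sketch parses each description into a vector of term counts, groups the vectors into two clusters, and attaches the cluster label to each claim as a new feature that could feed a predictive model.

```python
import re
from collections import Counter

# Hypothetical claim descriptions, invented for illustration only.
CLAIMS = [
    "driver injured in car accident",
    "car accident driver injured",
    "worker back strain lifting box",
    "back strain lifting heavy box",
]

# Assumed stopword list; a real analysis would use a fuller one.
STOPWORDS = {"in", "the", "a", "of", "and"}


def tokenize(text):
    """Lowercase a description and split it into alphabetic terms."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_term_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] counts term j in document i."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    matrix = []
    for doc in tokens:
        counts = Counter(doc)
        matrix.append([counts[t] for t in vocab])
    return vocab, matrix


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, init_centers, n_iter=10):
    """Plain k-means with caller-supplied starting centers (deterministic)."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assign each row to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(r, centers[k]))
            for r in rows
        ]
        # Move each center to the mean of its assigned rows.
        for k in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vocab, matrix = build_term_matrix(CLAIMS)
labels = kmeans(matrix, init_centers=[matrix[0], matrix[2]])

# The cluster label becomes a new categorical feature for each claim.
features = [{"claim": c, "text_cluster": lab} for c, lab in zip(CLAIMS, labels)]
for row in features:
    print(row)
```

Starting centers are passed in explicitly so the result is reproducible; in practice they are usually chosen at random and the procedure is repeated several times.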
Volume: Winter
Page: 51 - 88
Year: 2006
Categories:
Financial and Statistical Methods > Statistical Models and Methods > Data Mining
Financial and Statistical Methods > Statistical Models and Methods > Predictive Modeling
Publications: Casualty Actuarial Society E-Forum
Prizes: Management Data and Information Prize