Text Mining Handbook

Abstract
Motivation: Provide a reference guide to open source tools for text mining.

Method: We apply the text processing language Perl and the statistical language R to two text databases, an accident description database and a survey database.

Results: For the accident description data, new variables were extracted from the free-form text and used to predict the likelihood of attorney involvement and the severity of claims. For the survey data, text mining identified key themes in the responses. Thus, useful information that would not otherwise be available was derived from both databases.

Conclusion: Open source software can be used to apply text mining procedures to insurance text data.

Availability: The Perl and R programs along with the CAS survey data will be available on the CAS Web Site.

Keywords: Predictive modeling, data mining, text mining

Volume: Spring
Page: 1-61
Year: 2010
Categories: Financial and Statistical Methods / Statistical Models and Methods / Data Mining; Financial and Statistical Methods / Statistical Models and Methods / Predictive Modeling
Publications: Casualty Actuarial Society E-Forum
Authors: Louise A Francis