Methods: This paper gives an overview of methods for detecting errors and remediating data problems. The methods include outlier detection procedures from the exploratory data analysis and data mining literature, as well as methods from research on coping with missing values. The paper also addresses the need for accurate and comprehensive metadata.
Conclusions: Graphical tools such as histograms and box-and-whisker plots are useful for highlighting unusual values in data. A new tool based on data spheres appears to have the potential to screen multiple variables simultaneously for outliers. For remediating missing data problems, imputation is a straightforward and frequently used approach.
Availability: The R statistical language can be used to perform the exploratory and cleaning methods described in this paper. It can be downloaded for free at http://cran.r-project.org/
Keywords: data quality, data mining, ratemaking, exploratory data analysis.
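
Illustration: The box-and-whisker screening and the imputation mentioned in the Conclusions can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's own procedure: it uses Python's standard library rather than R, the conventional 1.5 x IQR whisker rule, and median imputation as one simple imputation choice. The function names and the claim amounts are hypothetical and made up for illustration.

```python
from statistics import median, quantiles


def boxplot_outliers(values, k=1.5):
    """Return the values that fall outside the box-plot whiskers.

    Assumes the conventional rule: a point is unusual if it lies more
    than k * IQR below the first quartile or above the third quartile.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical claim amounts; None marks a missing value.
claims = [1200, 950, 1100, None, 1050, 98000, 1000, None, 1150]

observed = [v for v in claims if v is not None]
flagged = boxplot_outliers(observed)   # values beyond the whiskers
completed = impute_median(claims)      # missing entries filled in

print("Flagged as unusual:", flagged)
print("After imputation:", completed)
```

A flagged value is only a candidate for review; whether it is an error or a legitimate large observation still has to be decided from the metadata or the source records.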