Data Cleaning Material Collection


Problems
Papers
Talks
Webpages
Conferences
Workshops
People
Theses
Books
Misc

Papers:

Data Cleaning: Problems and Current Approaches    Rahm, Erhard; Do, Hong-Hai, IEEE Bulletin of the Technical Committee on Data Engineering, Vol 23 No. 4, December 2000
Data Quality Mining - Making a Virtue of Necessity   Jochen Hipp, Ulrich Güntzer, and Udo Grimmer, Proceedings of the 6th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2001)
A Framework for Analysis of Data Quality Research   R.Y. Wang, V.C. Storey, and C.P. Firth, IEEE Transactions on Knowledge and Data Engineering 7 (1995), no. 4, 623--640
Monitoring data quality problems in network traffic databases.  F. Korn, S. Muthukrishnan and Y. Zhu. VLDB 2003.
Mining Database Structure: Or, How to build a Data Quality Browser   T. Dasu, T. Johnson, S. Muthukrishnan, and V. Shkapenyuk, In Proceedings of the ACM Conf. on Management of Data (SIGMOD), 2002
Systematic Development of Data Mining-Based Data Quality Tools   Dominik Luebbers, Udo Grimmer, Matthias Jarke, VLDB 2003
Potter's Wheel: An Interactive Data Cleaning System   V. Raman, J. Hellerstein. In Proc. VLDB (Roma, Italy, 2001), pp. 381-390.
Schema Mapping as Query Discovery   R. J. Miller, L. M. Haas, and M. Hernandez. In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 77--88, 2000.
Real World Data is Dirty: Data Cleansing and The Merge/Purge Problem   Mauricio Hernandez, Salvatore Stolfo. Journal of Data Mining and Knowledge Discovery, 2(1), 1998.
An Extensible Framework for Data Cleaning   Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon. ICDE 2000
Declarative Data Cleaning: Language, Model, and Algorithms   Helena Galhardas, Daniela Florescu, Eric Simon, Cristian-Augustin Saita, Dennis Shasha, Proc. of the Int. Conf. on Very Large Data Bases (VLDB), Rome, Italy, September 2001
Data Cleansing: Beyond Integrity Analysis   Jonathan I. Maletic and Andrian Marcus. In Proceedings of The Conference on Information Quality (IQ2000).
Cleansing Data for Mining and Warehousing   Mong-Li Lee, Tok Wang Ling, Hongjun Lu, and Yee Teng Ko. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA), volume 1677 of LNCS, pages 751-760, Florence, Italy, 1999.
ARKTOS: A Tool for Data Cleaning and Transformation in Data Warehouse Environments   Panos Vassiliadis, Zografoula Vagena, Spiros Skiadopoulos, Nikos Karayannidis, Timos Sellis, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 23, no. 4, pp. 42-47, December 2000
 

Webpages:

Data Quality Research at AT&T Labs
The MIT Total Data Quality Management Program
Data Cleaning and Information Quality (Drexel)
Data Cleaning at Microsoft
Univ. of Toronto DB Group
Automated Data Cleansing (SDML)
Server log cleaning
Dagstuhl Seminar "Data Quality on the Web"
A Reading List

Data Cleaning Projects and Commercial Tools:

AJAX: An Extensible Data Cleaning Tool
ARKTOS II (2002-2004)
Data Cleaning and Integration (a list of commercial tools and papers)
IBM DataJoiner
WinPure ListCleaner
Business Advantage
Practical Analysis of Nutritional Data (PANDA): Data Cleaning
Data Providers and Data Cleaning - KDnuggets
SAS Data Quality-Cleanse
Dataquality.com
Datacleaning.com

Conferences:

SIGMOD, VLDB, ICDE'04, ICDT'05, EDBT
SIGKDD, CIKM, ICDM'04, SDM'04
ACM TODS, IEEE TKDE, J. VLDB, J. DMKD

Workshops:

Data Cleaning, Record Linkage, and Object Consolidation
DIMACS Workshop on Data Quality, Data Cleaning and Treatment of Noisy Data
International Workshop on Data Quality in Cooperative Information Systems
ICDE 2000: Special Issue on Data Cleaning, all-in-one papers [PDF, PS]
 

Courses:
 

Talks:

Data Quality and Data Cleaning: An Overview
Data Warehousing Systems: Design & Research Issues
 

Faculty & Students:

Renée J. Miller, Professor at U. Toronto
Jiawei Han, Professor at UIUC
Helena Galhardas, Professor at IST and Researcher at INESC
Ahmed K. Elmagarmid, Professor at Purdue U.
Mohamed Galal Elfeky, Ph.D. Student, Department of Computer Sciences, Purdue U.
Panos Vassiliadis, University of Ioannina, Greece

Ph.D. theses:

H. Galhardas: Data Cleaning: Model, Language and Algorithms, PhD thesis, University of Versailles, September 2001 [ps] [pdf]
Alvaro E. Monge: Adaptive detection of approximately duplicate database records and the database integration approach to information discovery. University of California, San Diego, 1997 [pdf]
Edwin M. Knorr: Outliers and Data Mining: Finding Exceptions in Data, PhD Thesis, University of British Columbia, April, 2002. [pdf]

Books:

Exploratory Data Mining and Data Cleaning by Tamraparni Dasu and Theodore Johnson

Miscellaneous:

SUGI 27: Data Cleaning 101
Data Quality on the Web
Courses in Data Cleaning and Analysis