Adaptive Approximate Record Matching

Rahnamoun, Ramin

همایش سردبیران نشریات علمی دانشگاه آزاد اسلامی

سامانه یکپارچه نشریات علمی دانشگاه آزاد اسلامی

تعداد نشریات	418
تعداد شماره‌ها	10,005
تعداد مقالات	83,622
تعداد مشاهده مقاله	78,354,563
تعداد دریافت فایل اصل مقاله	55,394,944

	Adaptive Approximate Record Matching
International Journal of Smart Electrical Engineering
مقاله 4، دوره 03، شماره 01، اردیبهشت 2014، صفحه 23-27 اصل مقاله (360.83 K)
نویسنده
Ramin Rahnamoun
Computer Engineering Department, Azad University-Tehran Central Branch, Tehran, Iran.
چکیده
Typographical data entry errors and incomplete documents, produce imperfect records in real world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error patterns. In field matching phase, edit distance method is used. Naturally, it had been customized for Persian language problems such as similarity of Persian characters, usual typographical errors in Persian, etc. In record matching phase, the importance of each field can be determined by specifying a coefficient related to each field. Coefficient of each field must be dynamically changed, because of changes of typographical error patterns. For this reason, Genetic Algorithm (GA) is used for supervised learning of coefficient values. The simulation results show the high abilities of this algorithm compared with other methods (such as Decision Trees).
کلیدواژه‌ها
record matching؛ edit distance؛ data cleaning؛ genetic algorithms

مراجع
[1] D. E. Goldberg, “Genetic Algorithms in Search Optimization and Machine Learning”, Addison_Wesley, 1989. [2] ] J. Han, M. Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann, 2001. [3] M. A. Hernadez, S.J.Stolfo, "Real-world Data is Dirty: Data Cleansing and the Merge/Purge Problem", Journal of Data Mining and Knowledge Discovery, Vol.1, No.2, 1998 [4] J.A. Hylton, “Identifying and Merging Related Bibliographical Records”, Master's Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1996. [5] M. Kantardzic,” Data Mining: Concepts, Methods, and Algorithms”, IEEE Press, 2003. [6] K. Kukich, "Techniques for Automatically Correcting Words in Text", ACM Computing Survey, Vol.24, No.4, 1992. [7] A. E. Monge, “Adaptive Detection of Approximately Duplicate Database Records and Database Integration Approach to Information Discovery”, PHD Thesis, University of California, San Diego, 1997. [8] ] A. E. Monge, C. P. Elkan, "The Field Matching Problem: Algorithms and Applications" Second International Conference of Knowledge Discovery and Data Mining, AAAI Press, 1996. [9] V. S. Verykios, A.K.Elmagarmid, E.H.Houstis, "Automating the Approximate Record Matching Process", Information Science, Vol.126, No 1-4, 2000. [10] V. S. Verykios, G.V.Moustakides, "A Cost Optimal Decision Model for Record Matching", Workshop on Data Quality, 2001.
آمار تعداد مشاهده مقاله: 2,639 تعداد دریافت فایل اصل مقاله: 1,456

سامانه مدیریت نشریات علمی. قدرت گرفته از سیناوب

پیوندهای مفید

اخبار و اعلانات

آمار

Adaptive Approximate Record Matching