Schema Matching Quality: Thesaurus as the Matcher

Authors

  • Thabit Sabbah, Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Ali Selamat, Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia

DOI:

https://doi.org/10.11113/jt.v70.3514

Keywords:

Schema matching, thesaurus, information retrieval, performance

Abstract

Thesauri are used in many Information Retrieval (IR) applications, such as data integration, data warehousing, semantic query processing, and classification. They have also been used to solve the schema matching problem. Because many thesauri exist for any given area of knowledge, the quality of schema matching results obtained with different thesauri in the same field is not predictable. In this paper, we propose a methodology for studying the performance of a thesaurus in solving schema matching. The paper also presents the results of experiments using different thesauri. Precision, recall, F-measure, and average similarity were calculated, showing that matching quality changes with the thesaurus used.
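As an illustration of the quality measures named in the abstract, the following minimal sketch computes precision, recall, and F-measure for two sets of proposed element matches against a reference set. The function name, the element pairs, and the two "thesaurus" result sets are hypothetical and do not come from the paper; the standard IR definitions are assumed, and the average similarity measure is omitted because the abstract does not define it.

```python
def evaluate_matches(found, reference):
    """Return (precision, recall, f_measure) for proposed matches.

    `found` and `reference` are collections of (source, target) element
    pairs. Standard IR definitions are assumed here; the paper may define
    its measures differently.
    """
    found, reference = set(found), set(reference)
    correct = found & reference  # matches that are both proposed and true
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical reference (manually determined) matches between two schemas.
reference = {("author", "writer"), ("title", "name"),
             ("price", "cost"), ("isbn", "isbn")}

# Hypothetical results produced with two different thesauri.
found_a = {("author", "writer"), ("title", "name"), ("price", "year")}
found_b = {("author", "writer"), ("isbn", "isbn")}

scores_a = evaluate_matches(found_a, reference)
scores_b = evaluate_matches(found_b, reference)

for label, scores in (("thesaurus A", scores_a), ("thesaurus B", scores_b)):
    print(label, "precision=%.3f recall=%.3f F=%.3f" % scores)
```

In this made-up example the two result sets have the same recall but different precision and F-measure, which is the kind of variation across thesauri that the abstract describes.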

Published

2014-09-18

Section

Science and Engineering

How to Cite

Schema Matching Quality: Thesaurus as the Matcher. (2014). Jurnal Teknologi, 70(5). https://doi.org/10.11113/jt.v70.3514