Generating the missing links for semantic relations within Wiktionary

Abdullah Bawakid

University of Jeddah

Keywords: Semantic Relations, Aligning Words Senses, WSD, Wiktionary

Abstract

A single surface form of a term often carries multiple meanings. Wiktionary lists these meanings as separate senses for each entry, and it also records semantic relations between entries. In its current form, however, Wiktionary links these relations only at the term level; links that connect entries at the word-sense level do not exist. In this paper, we propose a novel method for generating this missing type of link. The method aligns the source word sense of each semantic relation in Wiktionary with its corresponding target word sense. It uses surface-level features that rely only on the structure and content of Wiktionary, without the aid of any external lexical resources or knowledge bases. We present the details of the method and its implementation. We also describe the evaluations we performed and the competitive results we obtained in comparison with other systems. Our findings indicate that our system outperforms the baselines and performs comparably to state-of-the-art systems without requiring access to external online resources or training data.
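To make the alignment task concrete, the following minimal sketch picks, for one source sense, the target sense whose gloss shares the most words with it. It is only an illustration under stated assumptions: the abstract does not specify which surface-level features the method uses, so plain gloss overlap (Jaccard similarity) stands in for them here, and the function names, stopword list, and example glosses are all hypothetical.

```python
import re

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "in",
             "on", "by", "for", "that", "is", "with"}

def tokens(text):
    """Lowercase a gloss and return its set of non-stopword tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def overlap(gloss_a, gloss_b):
    """Jaccard overlap between the token sets of two glosses."""
    a, b = tokens(gloss_a), tokens(gloss_b)
    return len(a & b) / len(a | b) if a and b else 0.0

def align_sense(source_gloss, target_glosses):
    """Return the index of the target gloss most similar to the source gloss.

    target_glosses must be non-empty; ties resolve to the first best index.
    """
    scores = [overlap(source_gloss, g) for g in target_glosses]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical example: a sense of "boat" lists "ship" as a related term,
# and "ship" has two senses to choose from.
source = "A vessel for travelling on water"
target = [
    "A large water-borne vessel for travelling on water",
    "To send a parcel or goods by a carrier",
]
best = align_sense(source, target)
print(best, target[best])
```

A term-level link only says that "boat" relates to "ship"; the sense-level link produced by such an alignment additionally records which sense of "ship" is meant.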

