Language and Disciplinary Concepts in Corpus Linguistics: Investigating Corpus Data


  • Zuraidah Mohd Don Language Academy, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Gerry Knowles Independent Scholar



Corpus Linguistics, Malay, Empirical, MaLex, Digital Humanities


This paper is intended for researchers who are engaged in or contemplating research in corpus linguistics, and is concerned in particular with the language of the discipline. It introduces and explains technical terms in the contexts in which they are normally used. The technical terms lead on to the concepts to which they refer, and the concepts are related in turn to the procedures, including tagging and parsing, by which they are implemented. English and Malay are used as the languages of illustration, and Malay examples are translated into English for the benefit of readers who do not know Malay. The paper has a historical dimension: the language of corpus linguistics is traced to traditional usage in the language classroom, and in particular to the study of Latin in Europe. This inheritance from the past is evident in the design of MaLex, a working device for empirical Malay corpus linguistics, which is presented here as a contribution to the digital humanities.




How to Cite

Mohd Don, Z., & Knowles, G. (2021). Language and Disciplinary Concepts in Corpus Linguistics: Investigating Corpus Data. LSP International Journal, 8(2), 79-91.