• Suraya Alias, Faculty of Computing and Informatics, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Sabah, Malaysia
• Siti Khaotijah Mohammad, School of Computer Sciences, Universiti Sains Malaysia, 11800 USM Pulau Pinang, Malaysia
• Gan Keng Hoon, School of Computer Sciences, Universiti Sains Malaysia, 11800 USM Pulau Pinang, Malaysia
• Tan Tien Ping, School of Computer Sciences, Universiti Sains Malaysia, 11800 USM Pulau Pinang, Malaysia




Sentence Compression, Pattern-Growth, Text Summarization, Malay


A text summary serves as a condensed representation of a written input source in which important and salient information is retained. However, this condensed representation can suffer from a lack of semantic meaning and coherence if the summary is produced verbatim from the input itself. Sentence Compression is a technique that eliminates unimportant details from a sentence while preserving the sentence's grammatical pattern. In this study, we conducted an analysis of our developed Malay Text Corpus to discover the rules and patterns by which human summarizers compress sentences and eliminate unimportant constituents to construct a summary. A Pattern-Growth based model named Frequent Eliminated Pattern (FASPe) is introduced to represent the text as a set of sequences of adjacent words that are frequently eliminated across the document collection. From the rules obtained, heuristic knowledge for Sentence Compression is presented, with confidence values as high as 85%, which can serve as a reference for further work in Text Summarization for the Malay language.
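The abstract describes mining sequences of adjacent words that human summarizers frequently eliminate, then reporting rules with confidence values. As an illustration only, the sketch below assumes each source sentence has already been aligned with its human summary so that every word carries a hypothetical `eliminated` flag. It grows contiguous patterns in a PrefixSpan-like, pattern-growth fashion (each frequent pattern is extended by the adjacent word to its right inside eliminated spans), takes support as the number of documents containing the pattern, and takes confidence as the fraction of the pattern's occurrences in the source text that were fully eliminated. The function names, the toy Malay sentences and these support and confidence definitions are assumptions, not the authors' FASPe implementation.

```python
from collections import defaultdict

# A tagged sentence: list of (word, eliminated) pairs. `eliminated` is a
# hypothetical flag meaning the human summarizer dropped that word.
Sentence = list[tuple[str, bool]]
Document = list[Sentence]


def eliminated_spans(sentence: Sentence) -> list[list[str]]:
    """Split a tagged sentence into maximal runs of adjacent eliminated words."""
    spans, current = [], []
    for word, eliminated in sentence:
        if eliminated:
            current.append(word)
        elif current:
            spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans


def grow_patterns(docs: list[Document], min_support: int) -> dict[tuple[str, ...], int]:
    """Mine contiguous word sequences occurring inside eliminated spans.

    Support (assumed definition) = number of documents in which the pattern
    occurs inside an eliminated span. Frequent patterns grow one word rightward.
    """
    # Flatten to (doc_id, span) pairs so each occurrence can point at a span index.
    spans = [(d, span) for d, doc in enumerate(docs)
             for sent in doc for span in eliminated_spans(sent)]

    # Seed with single words; an occurrence is (doc_id, span_idx, end_position).
    frontier = defaultdict(list)
    for s_idx, (d, span) in enumerate(spans):
        for i, word in enumerate(span):
            frontier[(word,)].append((d, s_idx, i + 1))

    frequent = {}
    while frontier:
        next_frontier = defaultdict(list)
        for pattern, occs in frontier.items():
            support = len({d for d, _, _ in occs})
            if support < min_support:
                continue  # no extension of an infrequent pattern can be frequent
            frequent[pattern] = support
            # Grow: append the adjacent word to the right of each occurrence.
            for d, s_idx, end in occs:
                span = spans[s_idx][1]
                if end < len(span):
                    next_frontier[pattern + (span[end],)].append((d, s_idx, end + 1))
        frontier = next_frontier
    return frequent


def confidence(docs: list[Document], pattern: tuple[str, ...]) -> float:
    """Fraction of the pattern's source occurrences in which every word was eliminated."""
    n, total, dropped = len(pattern), 0, 0
    for doc in docs:
        for sent in doc:
            for i in range(len(sent) - n + 1):
                window = sent[i:i + n]
                if tuple(w for w, _ in window) == pattern:
                    total += 1
                    if all(e for _, e in window):
                        dropped += 1
    return dropped / total if total else 0.0


# Toy corpus (hypothetical): one tagged sentence per document.
docs = [
    [[("Menteri", False), ("itu", True), ("berkata", False),
      ("pada", True), ("hari", True), ("ini", True)]],
    [[("Polis", False), ("menahan", False), ("suspek", False),
      ("pada", True), ("hari", True), ("ini", True)]],
    [[("Kerajaan", False), ("mengumumkan", False), ("bajet", False),
      ("pada", False), ("hari", False), ("ini", False)]],
]

for pattern, support in grow_patterns(docs, min_support=2).items():
    print(" ".join(pattern), support, round(confidence(docs, pattern), 2))
```

On this toy corpus, `pada hari ini` ("today") is found with support 2 and confidence 2/3, because the third document keeps the phrase, while `itu` is pruned since it is eliminated in only one document. Because document support can only stay the same or drop when a pattern is extended, pruning infrequent patterns before growing them loses no frequent pattern; this anti-monotone property is what lets pattern-growth miners avoid enumerating every candidate sequence.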


A MALAY TEXT CORPUS ANALYSIS FOR SENTENCE COMPRESSION USING PATTERN-GROWTH METHOD. (2016). Jurnal Teknologi, 78(8). https://doi.org/10.11113/jt.v78.7413