A CLUSTERED SEMANTIC GRAPH APPROACH FOR MULTI-DOCUMENT ABSTRACTIVE SUMMARIZATION

Authors

  • Atif Khan, Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Naomie Salim, Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Waleed Reafee, Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Anupong Sukprasert, Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
  • Yogan Jaya Kumar, Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, 76100 Melaka, Malaysia

DOI:

https://doi.org/10.11113/jt.v77.6491

Keywords:

Multi-document abstractive summarization, semantic role labeling (SRL), graph-based ranking algorithm, semantic graph, semantic similarity measure

Abstract

Multi-document abstractive summarization aims to create a compact version of the source text that preserves the important information. Existing graph-based methods rely on the bag-of-words approach, which treats a sentence as a bag of words and relies on a content similarity measure. The obvious limitation of the bag-of-words approach is that it ignores semantic relationships among words, so the summary produced from the source text may not be adequate. This paper proposes a clustered semantic graph-based approach for multi-document abstractive summarization. The approach employs semantic role labeling (SRL) to extract the semantic structure (predicate argument structures) from the document text. The predicate argument structures (PASs) are compared pairwise using the Lin semantic similarity measure to build a semantic similarity matrix, which is then represented as a semantic graph in which the vertices represent the PASs and the edges carry the semantic similarity weights between the vertices. Content for the summary is selected by ranking the important graph vertices (PASs) with a modified graph-based ranking algorithm. Agglomerative hierarchical clustering is then performed to eliminate redundancy: the representative PAS with the highest salience score in each cluster is chosen and fed to language generation to produce summary sentences. The experiments in this study are performed on DUC-2002, a standard corpus for text summarization. Experimental results reveal that the proposed approach outperforms other summarization systems.
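
The content selection steps described in the abstract (semantic graph, vertex ranking, clustering, one representative per cluster) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The similarity matrix is invented toy data standing in for pairwise Lin similarities between PASs. The ranking step is a plain weighted PageRank-style iteration, used here only as a stand-in because the abstract does not specify how the ranking algorithm was modified. Average linkage, the 0.5 merge threshold, and all function names are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def rank_vertices(sim: Matrix, damping: float = 0.85, iterations: int = 50) -> List[float]:
    """Weighted PageRank-style salience scores over a similarity graph.

    Stand-in for the paper's modified ranking algorithm (assumption).
    """
    n = len(sim)
    # Total edge weight leaving each vertex, ignoring self-similarity.
    out_weight = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def agglomerative_clusters(sim: Matrix, threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering (linkage choice is an assumption).

    Repeatedly merges the two most similar clusters and stops when no pair
    has an average similarity of at least `threshold`.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, best_sim = None, threshold
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if avg >= best_sim:
                    best, best_sim = (a, b), avg
        if best is None:
            break
        a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def select_representatives(sim: Matrix, threshold: float = 0.5) -> List[int]:
    """Pick the highest-scoring vertex (PAS index) from each cluster."""
    scores = rank_vertices(sim)
    clusters = agglomerative_clusters(sim, threshold)
    return sorted(max(cluster, key=lambda v: scores[v]) for cluster in clusters)


# Toy symmetric similarity matrix for five PASs (invented values).
similarity = [
    [1.0, 0.9, 0.7, 0.1, 0.2],
    [0.9, 1.0, 0.6, 0.2, 0.1],
    [0.7, 0.6, 1.0, 0.1, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.8],
    [0.2, 0.1, 0.2, 0.8, 1.0],
]

print(select_representatives(similarity))  # [0, 4]
```

On this toy input the clustering groups PASs 0 to 2 and PASs 3 to 4, and the most salient vertex of each group is kept. In the approach described above, the selected PASs would then be passed to a language generation step to produce the summary sentences.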

Published

2015-11-26

How to Cite

A CLUSTERED SEMANTIC GRAPH APPROACH FOR MULTI-DOCUMENT ABSTRACTIVE SUMMARIZATION. (2015). Jurnal Teknologi, 77(18). https://doi.org/10.11113/jt.v77.6491