Authors: Namaswini Padhy
Abstract: In recent years, text summarization has become a prominent challenge in the field of Natural Language Processing (NLP). It involves generating a concise and meaningful summary from a lengthy text document. There are two main approaches to summarization based on the type of output: extractive, which selects key sentences or phrases from the original text, and abstractive, which generates new sentences to convey the core information. While significant research has been conducted on extractive summarization for Indian languages, the development of effective abstractive summarization models remains limited, particularly for low-resource languages such as Gujarati. This work presents an efficient and accurate abstractive text summarizer tailored for the Gujarati language. Our model is built upon a Sequence-to-Sequence (Seq2Seq) framework employing an encoder-decoder architecture integrated with an attention mechanism. To enhance the preprocessing pipeline for Gujarati text, we introduce a custom preprocessor designed to handle the linguistic and syntactic peculiarities of Gujarati. We curated a dataset comprising Gujarati news articles and their corresponding headlines to train and evaluate our model. Experimental results demonstrate that the proposed approach effectively captures the core semantics of the source text, generating fluent and human-readable summaries.
DOI: https://doi.org/10.5281/zenodo.17119304
International Journal of Science, Engineering and Technology