Transformer Models in NLP
Authors – Ayush Sharma, Harsh Mehta, Abhishek Gautam, Priyanka Rajpal, Ashima Thakur, Harleen Kaur
Abstract – Natural Language Processing (NLP) has been transformed by transformer models, which introduce self-attention mechanisms that effectively capture long-range dependencies. Unlike recurrent networks, transformers allow parallel processing, which improves efficiency for tasks such as text generation and translation. This architecture, first presented by Vaswani et al. (2017) [1], opened the door for models such as GPT (autoregressive generation) and BERT (bidirectional understanding) [2]. More advanced variants such as T5 and GPT-4 further expand the range of NLP applications [6]. Despite these achievements, interpretability and high computational cost remain obstacles [7]. Ongoing research focuses on improving efficiency and reducing bias so that transformers continue to drive innovation in AI-driven language processing [14].
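For reference, the self-attention mechanism mentioned above is the scaled dot-product attention defined by Vaswani et al. [1]; the symbols Q, K, V (query, key, and value matrices) and d_k (key dimension) follow that paper's notation:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
\]

Because this expression is computed for all positions at once via matrix products, it admits the parallel processing noted above, in contrast to the step-by-step updates of recurrent networks.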