Post by simranratry20244 on Feb 12, 2024 4:47:26 GMT -5
It was only in 2019, with the launch of BETO by the University of Chile, that we could really start talking about language models in Spanish. Large technology companies continue to lead the development of NLP tools and resources in English, and perhaps for that reason the revolution in our country is closely tied to the practical application and usefulness of these models. Spanish institutions and companies have focused less on the quantity of these resources than on their quality, and on adapting the models to specific tasks or problems with a dedicated corpus and training.
Normally, research centers are in charge of developing language models, while private entities and consultancies adapt them and put them into practice in specific contexts and businesses. At the IIC, because of our nature, we can address both processes: NLP remains the focus of one of our lines of research, but we also work on its applicability in different areas.
Tasks of a computational linguist
Computational linguists are in charge of developing and maintaining the resources (mainly annotated corpora) used to train models for a specific NLP task, such as summarizing texts, classifying them or extracting terms from them. For example, the AnCora corpus is a classic corpus annotated with basic morphosyntactic information, on which most Spanish NLP tools (spaCy, Stanza) are trained. Along the same lines, MLSUM is a corpus of press summaries in several languages that is used to train models that generate summaries.
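To make "annotated with basic morphosyntactic information" more concrete, here is a minimal sketch of reading one annotated sentence in CoNLL-U, the tab-separated format used by Universal Dependencies treebanks. The sample sentence and its annotations are hand-written for illustration and are not taken from AnCora; the function name `parse_conllu_sentence` is likewise just an assumption for this example.

```python
# Illustrative sentence in CoNLL-U style: 10 tab-separated columns per token
# (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
# The annotations below are hand-written for this example, not corpus data.
SAMPLE = (
    "# text = La lingüista anota corpus.\n"
    "1\tLa\tel\tDET\t_\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t2\tdet\t_\t_\n"
    "2\tlingüista\tlingüista\tNOUN\t_\tGender=Fem|Number=Sing\t3\tnsubj\t_\t_\n"
    "3\tanota\tanotar\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_\n"
    "4\tcorpus\tcorpus\tNOUN\t_\tGender=Masc\t3\tobj\t_\tSpaceAfter=No\n"
    "5\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)


def parse_conllu_sentence(block):
    """Return a list of token dicts for one CoNLL-U sentence block."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and comment lines such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1");
        # this sketch only keeps ordinary syntactic words.
        if "-" in cols[0] or "." in cols[0]:
            continue
        # FEATS is "_" when empty, otherwise "Key=Value" pairs joined by "|".
        feats = {}
        if cols[5] != "_":
            feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
        tokens.append(
            {
                "id": int(cols[0]),
                "form": cols[1],
                "lemma": cols[2],
                "upos": cols[3],
                "feats": feats,
                "head": int(cols[6]),
                "deprel": cols[7],
            }
        )
    return tokens


if __name__ == "__main__":
    for tok in parse_conllu_sentence(SAMPLE):
        print(tok["form"], tok["lemma"], tok["upos"], tok["feats"])
```

Each token carries its lemma, part of speech and morphological features, which is the kind of information a tagger or lemmatizer learns to predict from such a corpus.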
Along this path, we have already developed our own language model (RigoBERTa), with a multidisciplinary team ensuring the quality of every piece that makes it up. Beyond that, what makes the difference in NLP is adapting the models to the needs of companies and institutions of all types. This means adjusting them not only to the task you want to automate or improve, but also to the language and terminology used in that specific area, which is known as domain adaptation.
Who today has not heard of the virtues of ChatGPT or, more generally, of the advances in language models? Natural Language Processing (NLP) is in fashion, and the teams behind these projects are made up not only of data scientists, architects and software developers, but also of computational linguists (CL). This hybrid profile, located between Linguistics and Artificial Intelligence, may become the new unicorn, since it currently requires additional preparation that is not always easy to obtain through a university degree. In this post, we review the skills required and some master's degrees and courses for training in computational linguistics.