In today's rapidly evolving world of technology and research, Large Language Models (LLMs) have taken centre stage. These sophisticated models have become invaluable tools for scientific researchers, opening up new horizons in natural language processing and data analysis.
In this blog post, I delve into the world of LLMs and explore what they are and their worth to researchers, especially those in developing countries. Additionally, I address the challenges researchers in low- and middle-income countries face when trying to exploit the power of LLMs, and I issue a call for a more inclusive research landscape.
What are Large Language Models?
LLMs are a class of artificial intelligence (AI) models that use deep learning algorithms to generate human-like language. They are useful for tasks like answering questions in a conversational manner, summarising documents, text generation and classification, and interpretation of major global languages, among others.
They have applications across a wide range of areas, from chatbots and language translation to content generation, data analysis, and more. The key to their power lies in their massive neural networks, trained on extensive data sets, which allow them to predict and generate text with remarkable fluency. There are a number of publicly accessible LLMs, although customised LLMs have also been developed for private use in businesses and non-profit organisations (see Khanmigo, the Khan Academy’s AI tool). These models vary in scope, purpose, and complexity. One fundamental reason for their differences is the type of data used to train them, on which their predictions and outputs are based.
Examples of LLMs and Their Key Features
Let's now explore some of the prominent LLMs that have captured the imagination of researchers and developers.
OpenAI’s Generative Pre-Trained Transformer (GPT): GPT-3 (since updated to GPT-3.5) and its successor, GPT-4, are well known for their content generation capabilities. GPT-3, with its 175 billion parameters, can produce human-like text and assist in various natural language understanding tasks. GPT-4 is widely reported to be larger still, although OpenAI has not published its parameter count, and it promises to push the boundaries further. At the time of writing, only GPT-3.5 is free to use, while GPT-4 requires a paid subscription (about US$ 20 per month, or roughly US$ 240 annually). Several analyses suggest that the high cost of running these models underpins this subscription pricing.
Google's Bard and Pathways Language Model (PaLM 2): Google's Bard is another conversational chatbot that has caught the attention of many. It is based on PaLM 2, which aims to enhance the understanding and abstraction of text, making it a valuable tool for providing summaries and data analysis.
Meta’s (Facebook) Large Language Model Meta AI (LLaMA): This is also a generative text model that can be used for linguistic analysis, algorithm development, and language modelling. It focuses on understanding and generating text in multiple languages and is said to have the potential to break down language barriers in research and communication. This is questionable, however, since the model can handle only a select few of the world's languages. The tool is available free of direct charges for companies and researchers to use. Meanwhile, Meta and Microsoft have formed a partnership to integrate their tools, including new additions to the Bing search engine and the Azure cloud services.
Aside from text-generative LLMs, other AI tools are well known for image generation (e.g., Microsoft's Bing Image Creator, powered by DALL·E 3) and coding (e.g., StarCoder or StarCoderBase), among other applications.
Benefits of LLMs to Scientific Researchers
LLMs are sometimes touted as having the potential to carry out knowledge synthesis and analysis, and to accelerate the process of reviewing the literature, among other uses. While these tools might support such processes, I take these claims with a pinch of salt given the many limitations the tools have in these areas. I therefore focus on two consistent, generic benefits of these models, bearing in mind that, as algorithmic tools, the quality of what they put out depends on the quality of what was put in (“garbage in, garbage out”). These benefits are their potential in producing summaries of information, and their use as coding assistants and database querying tools.
Summarising and generating information based on algorithms: LLMs are useful for textual analysis and can help in producing summaries of more extensive information. They also have the potential to uncover new information. In many cases, LLMs drawing on large training data sets are able to produce grammatically correct text, although there are cases where the outputs need to be cross-checked. These tools produce outputs based on algorithmic predictions, so the quality of the outputs depends on the quality of the underlying algorithms and training data, which determine both their capabilities and their limitations. For example, it is possible to instruct these tools to mimic the styles of certain well-known writers, fields of study, countries, levels of education, and even certain periods in history. I illustrate this in Figure 1, based on Google's Bard.
Natural Language Querying for Databases and Coding Assistance: Some LLMs can act as powerful interfaces for researchers to query databases using natural language. This simplifies the process of extracting information, enabling researchers to formulate complex queries without needing to learn specialised database query languages. Such LLMs have also proven useful for coding assistance and debugging: popular general-purpose LLMs can help with writing and fixing code for analytical tools such as R, Stata, Python, and MATLAB.
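To make the idea of natural language querying more concrete, here is a minimal Python sketch of how such an interface might be wired together. It is an illustration under stated assumptions, not how any particular product works: the function `translate_to_sql` is a hypothetical stand-in for a call to an LLM (it simply returns a canned query), and the `papers` table and its contents are made up. The one design point worth noting is the check that the generated query only reads data, since text produced by a model should not be run against a database unchecked.

```python
import sqlite3


def translate_to_sql(question: str) -> str:
    """Hypothetical stand-in for an LLM call: returns a canned SQL query."""
    canned = {
        "how many papers per country?":
            "SELECT country, COUNT(*) FROM papers "
            "GROUP BY country ORDER BY country",
    }
    return canned[question.strip().lower()]


def is_read_only(sql: str) -> bool:
    """Accept only a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def ask(conn: sqlite3.Connection, question: str) -> list:
    """Translate a question to SQL, check it, then run it."""
    sql = translate_to_sql(question)
    if not is_read_only(sql):
        raise ValueError(f"Refusing to run non-SELECT query: {sql!r}")
    return conn.execute(sql).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (title TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO papers VALUES (?, ?)",
        [("Paper A", "Ghana"), ("Paper B", "Ghana"), ("Paper C", "Nigeria")],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(ask(conn, "How many papers per country?"))
    # prints [('Ghana', 2), ('Nigeria', 1)]
```

In a real setting, the canned lookup would be replaced by a request to an LLM, and the researcher would still need to confirm that the generated query answers the question that was actually asked.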
Limitations of Current LLMs for Researchers in Developing Countries
A main problem associated with the use of these models is that the training data upon which they are built is biased towards the literature and knowledge sourced from developed countries. The implication is that the results will not properly reflect the contexts of developing countries, often ignoring a vast amount of literature.
Lack of Domain Specificity: LLMs may lack domain specificity, leading to limitations in understanding and generating highly specialised scientific content. Researchers in niche fields may find that these models struggle to grasp nuanced terminology and may produce outputs that lack the depth required for advanced scientific discourse.
Potential for Biases: LLMs can inadvertently perpetuate biases which may be present in the training data, thus posing a significant concern for scientific research. If the training data contains biases, the models may produce biased or inaccurate results. This can potentially affect research outcomes and reinforce existing biases.
Limited explainability and inability to provide accurate references to generated texts: The inherent complexity of LLMs can result in a lack of transparency, making it challenging for researchers to understand how the models arrive at specific conclusions. This may make it hard for researchers to trust the outputs of these models, especially in critical scientific decision-making processes where transparency is paramount. Since LLMs tend to generate content from existing sources, it is important that they provide references for the information produced. However, this turns out to be quite a difficult task. An evident example is presented in a recent case study: “Artificial Intelligence Not Yet Intelligent to be a Trusted Research Aid”. The author highlighted ChatGPT's consistent failure to provide true and correct information and references. This is why texts generated by some LLMs are hard to trust.
Limitations to the Use of Real-Time Information: LLMs are trained on historical data. Thus, they incorporate information only to the extent that it was included in the training process. This means that real-time events, such as current news reports, may not be reflected in LLM predictions, which may result in wrong conclusions or misleading information. Although paid LLM versions tend to attempt to include more current information, there is a time lag between when events happen and when they form part of the input on which LLMs base their output. For researchers who rely on recent journal issues, or who want to reflect the latest evidence in policy implementation, this lack of real-time information is a problem.
Hallucinations and lack of replicability: LLMs generate outputs by extrapolating from their training data, and they cannot always cite their sources appropriately. These extrapolations sometimes produce outputs that are false or do not make sense, which is known as hallucination. For the cover picture of a recent blog post, for example, I used Microsoft Bing, which created a blurry image due to hallucinations. Some researchers argue that these outputs amount to fabrication and falsification, and call for care in the use of these tools. Moreover, the same LLM can produce a different result each time the same query is entered. In research, being able to obtain the same result from the same query supports replicability and reproducibility, aspects that are lacking in the LLM context. Figure 1 above shows that the same query produced different results using Google's Bard. A critical comparison shows that the tool mainly swaps in synonyms, without any clear sign of creativity. To achieve replicability, the same prompt would need to yield the same results when used multiple times and across multiple LLMs.
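One reason the same query can return different text is that LLMs typically choose each next word by sampling from a probability distribution rather than always taking the most likely option. The toy Python sketch below illustrates only that mechanism. The candidate words and their probabilities are invented for illustration (a real model computes them with a neural network), and the function names are hypothetical. It contrasts unseeded sampling, which may differ from run to run, with two ways of making the choice repeatable: fixing the random seed, or always taking the most probable word.

```python
import random

# Invented candidates and probabilities, for illustration only.
# A real LLM computes such probabilities with a neural network.
CANDIDATES = ["vital", "important", "essential", "crucial"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]


def sample_completion(prompt, seed=None):
    """Pick the next word at random, weighted by probability.

    With seed=None the generator is seeded unpredictably, so repeated
    calls may differ. With a fixed seed the choice is repeatable.
    """
    rng = random.Random(seed)
    word = rng.choices(CANDIDATES, weights=WEIGHTS, k=1)[0]
    return f"{prompt} {word}"


def greedy_completion(prompt):
    """Always pick the single most probable word (no randomness)."""
    best = CANDIDATES[WEIGHTS.index(max(WEIGHTS))]
    return f"{prompt} {best}"


if __name__ == "__main__":
    prompt = "Human oversight is"
    # Unseeded: the three results may differ from each other.
    print([sample_completion(prompt) for _ in range(3)])
    # Same seed each time: the three results are identical.
    print([sample_completion(prompt, seed=7) for _ in range(3)])
    # Greedy: always the most probable word.
    print(greedy_completion(prompt))
```

This is a simplification. Chat tools aimed at the general public may not expose settings of this kind, and even where they exist, identical output is not guaranteed across model versions or different LLMs, which is why the replicability concern remains.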
Limited Language Support: LLMs often favour English or widely spoken languages such as French. Limited language support can hinder researchers in using less common languages.
Case Study: Problems Using ChatGPT to Translate from French and Ga-Dangme to English
LLMs can translate text between languages, thus potentially breaking down language barriers, facilitating international collaboration, and supporting a more comprehensive research process. While this is true for languages like English and French, among others, it is not so for many local languages in countries in the Global South. In Ghana, for instance, there are over 80 local languages; in Nigeria, there are over 525 languages; and there are over 260 spoken languages in Cameroon.
As an example, Figure 2 above shows ChatGPT's failure to correctly translate two sentences that have the same meaning in English: it translated the French sentence correctly, but translated the same sentence in a Ghanaian local language incorrectly.
Conclusion and Implications
In conclusion, LLMs represent a pivotal development in the field of human language processing, with far-reaching implications. Tailor-made LLMs that capture the contexts of the Global South, and that can produce outputs relevant to those contexts, are needed to give researchers from low- and middle-income countries the tools they require. While I am suspicious of any alleged “intelligent” capabilities of LLMs, their capabilities in language processing, summarising, and large-scale data processing, backed by application programming interfaces (APIs), offer strong research potential for researchers in developing countries.
Therefore, both the technology development community and researchers should aim to work together to address the limitations of LLM use in developing countries and enhance their utility for practitioners. Fostering collaboration, sharing knowledge, and providing training opportunities will help achieve a more inclusive and equitable research landscape. As many fancy algorithmic tech tools are being developed, it is vital to note that human oversight remains crucial to avoid falling into the obvious pitfalls.
Rhoda Ladjer Akuaku is currently the Secretary of the AuthorAID Ghana Hub and a Fellow of the Aspire Institute, a programme founded at Harvard University. She is also an administrative volunteer at the University of Chicago Medicine, Illinois, USA. Her recent research focuses on the drivers of TikTok usage and their effect on academic performance. She is also a sustainability associate with the Dataking Research Lab. Rhoda is a recent graduate of the University of Ghana and holds a Bachelor of Arts in English Education.