A common cliché in the language industry is that translation helps to break the language barrier. Since the late 1950s, researchers have been attempting to understand animal communication. Now, scientists are turning to the technology behind large language models such as Google's LaMDA and OpenAI's GPT-3 to help decode it.
By studying massive datasets, which can include audio recordings, video footage, and behavioural data, researchers are now using machine learning to build programs that can interpret these animal communication signals.
Closer to Reality
The Earth Species Project (ESP) seeks to build on this research by using AI to address some of the field's enduring problems. With projects like mapping out crow vocalisations and creating a benchmark of animal sounds, ESP is laying the groundwork for further AI research.
The organisation's first peer-reviewed paper, published in Scientific Reports, presented a technique that can separate a single voice from a recording of numerous overlapping speakers, an impressive stride for AI-assisted animal communication research.
Scientists refer to the complex task of isolating and understanding individual animal communication signals amid a cacophony of sounds as the cocktail-party problem. From there, the organisation began evaluating the recordings to pair behaviours with communication signals.
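ESP's published separation model is a deep neural network, but the underlying idea can be illustrated with a much simpler classical technique. The sketch below uses independent component analysis to unmix two synthetic "callers" recorded on two channels; the signals and mixing matrix are invented purely for illustration, not taken from ESP's paper.

```python
# Toy illustration of the cocktail-party problem: unmixing two
# synthetic calls with independent component analysis (ICA).
# ESP's published approach uses deep learning; this classical
# sketch only demonstrates the underlying source-separation idea.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000)                      # 1 s at 8 kHz

# Two hypothetical "callers": a frequency sweep and a pulsed tone.
call_a = np.sin(2 * np.pi * (500 + 300 * t) * t)
call_b = np.sign(np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 900 * t)
sources = np.c_[call_a, call_b]

# Two "microphones" each record a different mixture of both callers.
mixing = np.array([[0.7, 0.3], [0.4, 0.6]])
recordings = sources @ mixing.T + 0.01 * rng.standard_normal((8000, 2))

# ICA recovers the individual voices (up to sign and ordering).
ica = FastICA(n_components=2, random_state=0)
separated = ica.fit_transform(recordings)
print(separated.shape)   # (8000, 2): one column per recovered caller
```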
ESP co-founder Aza Raskin stated, “As human beings, our ability to understand is limited by our ability to perceive. AI does widen the window of what human perception is capable of.”
Easier Said than Done
A common mistake is assuming that animals communicate through sound alone. Visual and tactile stimuli are just as significant in animal communication as auditory stimuli.
For example, beluga whales use specific vocalisation cues that reflect their social systems. Meerkats employ a complex system of alarm calls that varies with a predator's proximity and level of risk. Birds, too, convey danger and other information, such as the status of a mating pair, to their flock members.
These are only a few of the challenges researchers must address when studying animal communication.
To do this, Raskin and the ESP team are incorporating some of the most popular and consequential innovations of the moment into a suite of tools – generative AI and large language models. These technologies use machine learning to understand and generate human-like responses across multiple languages, styles, and contexts.
Understanding non-human communication can be significantly aided by the insights provided by generative models such as OpenAI's GPT-3 and Google's LaMDA.
ESP has recently developed the Benchmark of Animal Sounds, or BEANS for short, the first-ever benchmark for animal vocalisations. It establishes a standard against which to measure the performance of machine learning algorithms on bioacoustics data.
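The BEANS code and datasets are not reproduced here, but the workflow such a benchmark standardises looks roughly like the following: extract one feature vector per labelled clip, train a baseline classifier, and report accuracy on a held-out split. The features and labels below are random placeholders, not the actual BEANS data or loaders.

```python
# Hedged sketch of what a bioacoustics benchmark measures: train a
# baseline classifier on labelled clips and report held-out accuracy.
# The feature extraction and dataset here are placeholders, not the
# actual BEANS pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Placeholder features: one 128-dim embedding per clip, with integer
# species labels. A real benchmark would load audio and extract
# spectrogram or learned features instead.
X = rng.standard_normal((600, 128))
y = rng.integers(0, 10, size=600)        # 10 hypothetical species classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy is near chance here because the placeholder data is random;
# the point is the fixed train/test evaluation protocol.
print("accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
```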
ESP has also created the Animal Vocalisation Encoder based on Self-supervision, or AVES. This is the first foundational model for animal vocalisations, and it can be applied to many downstream tasks, including signal detection and classification.
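A foundational model like AVES is typically used by freezing the pretrained encoder and training only a small task head on its embeddings. The sketch below shows that pattern with a stand-in encoder module; it is not AVES's real architecture, weights, or loading code.

```python
# Hedged sketch of the foundational-model workflow AVES enables:
# a pretrained encoder turns raw audio into embeddings, and a small
# classifier head is trained on top. The encoder below is a stand-in
# module, not the real AVES model.
import torch
import torch.nn as nn

class StandInEncoder(nn.Module):
    """Placeholder for a pretrained vocalisation encoder."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.conv = nn.Conv1d(1, embed_dim, kernel_size=400, stride=160)

    def forward(self, waveform):                 # (batch, samples)
        feats = self.conv(waveform.unsqueeze(1)) # (batch, dim, frames)
        return feats.mean(dim=-1)                # one embedding per clip

encoder = StandInEncoder()
encoder.eval()                                   # pretrained: kept frozen

classifier = nn.Linear(256, 5)                   # 5 hypothetical call types
clips = torch.randn(8, 16000)                    # eight 1 s clips at 16 kHz

with torch.no_grad():
    embeddings = encoder(clips)                  # reusable across tasks

logits = classifier(embeddings)                  # only this head gets trained
print(logits.shape)                              # torch.Size([8, 5])
```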
The nonprofit is just one of many groups that have recently emerged to translate animal communication. Some, like Project CETI (Cetacean Translation Initiative), are dedicated to a single species – in this case, sperm whales – and CETI's research focuses on deciphering their complex vocalisations.
DeepSqueak, developed by University of Washington researchers Kevin Coffey and Russell Marx, is another machine learning tool, capable of decoding rodent chatter. Working from raw audio, DeepSqueak identifies rodent calls, compares them to calls with similar features, and provides behavioural insights.
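DeepSqueak itself runs a neural object detector over spectrograms, but its first step, flagging where a call occurs in raw audio, can be sketched with simple band-energy thresholding. Everything below, including the 60 kHz "squeak", is synthetic and only illustrates the general idea, not DeepSqueak's actual method.

```python
# Simplified illustration of call detection in raw audio, the first
# stage of tools like DeepSqueak. This sketch just flags time bins
# whose ultrasonic band energy exceeds a noise-based threshold.
import numpy as np
from scipy.signal import spectrogram

rng = np.random.default_rng(1)
fs = 250_000                                   # ultrasonic sampling rate
t = np.arange(int(fs * 0.5)) / fs              # 0.5 s of audio

# Background noise plus one hypothetical 60 kHz "squeak" at 0.20-0.25 s.
audio = 0.05 * rng.standard_normal(t.size)
mask = (t >= 0.20) & (t < 0.25)
audio[mask] += np.sin(2 * np.pi * 60_000 * t[mask])

freqs, times, sxx = spectrogram(audio, fs=fs, nperseg=1024)

# Energy in the 40-100 kHz band typical of rodent vocalisations.
band = (freqs >= 40_000) & (freqs <= 100_000)
band_energy = sxx[band].sum(axis=0)

# Median tracks the noise floor; call bins sit far above it.
threshold = 10 * np.median(band_energy)
detected = times[band_energy > threshold]
print(f"call detected around {detected.min():.2f}-{detected.max():.2f} s")
```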
ChatGPT for Animals
In 2023, an X user named Cooper claimed that GPT-4 helped save his dog's life. He used GPT-4 to help diagnose his Border Collie, Sassy, and the LLM helped him narrow down the underlying issue.
Though achieving AGI may still be years away, Sassy’s recovery demonstrates the potential practical applications of GPT-4 for animals.
Astonishing as that is, developing a foundational tool to comprehend all animal communication remains challenging. Animal data is hard to obtain and requires specialised expertise to annotate, in contrast to human data, which humans can annotate straightforwardly.
Compared to humans, animals have a far more limited range of sounds, even though many of them form sophisticated, complex communities. This means that the same sound can have multiple meanings depending on the context in which it is used. The only way to determine meaning is to examine the context, which includes the caller's identity, relationships with others, hierarchy, and past interactions.
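One way to see why context matters is to give a classifier the same acoustic embedding with and without contextual features such as the caller's rank and recent interactions. In the entirely hypothetical sketch below, an identical-sounding call means "threat" only in one context, so the sound-only model can do no better than guessing the majority class.

```python
# Hedged sketch of context-dependent meaning: identical call
# embeddings receive different labels depending on contextual
# features. All names and data here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 1000

acoustic = rng.standard_normal((n, 16))     # identical-sounding calls
rank = rng.integers(0, 3, size=n)           # 0 = subordinate, 2 = alpha
conflict = rng.integers(0, 2, size=n)       # recent conflict? 0/1

# Toy ground truth: the call means "threat" (1) only when an alpha
# caller has recently been in conflict; otherwise it is neutral (0).
meaning = ((rank == 2) & (conflict == 1)).astype(int)

context = np.c_[acoustic, rank, conflict]
Xs_tr, Xs_te, Xc_tr, Xc_te, y_tr, y_te = train_test_split(
    acoustic, context, meaning, test_size=0.3, random_state=0)

sound_only = LogisticRegression(max_iter=1000).fit(Xs_tr, y_tr)
with_context = LogisticRegression(max_iter=1000).fit(Xc_tr, y_tr)

print("sound only  :", sound_only.score(Xs_te, y_te))    # ~majority-class rate
print("with context:", with_context.score(Xc_te, y_te))  # ~1.0
```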
Yet, this might be possible within a few years, according to Raskin. “We anticipate being able to produce original animal vocalisations within the next 12 to 36 months. Imagine if we could create a synthetic crow or whale that would seem to them to be communicating with one of their own. The plot twist is that, before we realise what we are saying, we might be able to engage in conversation”, Raskin says.
This “plot twist”, as Raskin calls it, refers to the potential for AI to not only understand animal communication but also to facilitate human-animal communication, opening up new possibilities for conservation and coexistence.