Success Story

Where machine learning and ecology collide, formerly threatened species are making a comeback

Citizen science initiatives are generating an unprecedented quantity of wildlife data. As a result, conservation biologists can now look to sophisticated statistical approaches like machine learning to improve the survival of species threatened by global change. 

Karlsruhe Institute of Technology

Challenge

Challenges abound in a field like conservation biology where traditional wildlife monitoring is time- and resource-intensive. With the advent of app-based citizen science initiatives, however, wildlife conservationists can now use deep learning approaches to do far more with less. But many classically trained biologists lack the statistical expertise and coding skills to take full advantage of big data without more user-friendly tools.

Solution

Researchers like Frederick Kistner, a doctoral student at the Karlsruhe Institute of Technology in Germany, are working to develop algorithmic tools that will lower the barrier to entry for biologists with little prior experience dealing with big data. A key part of Kistner’s solution is JMP® statistical discovery software, which provides a customizable, end-to-end platform for the Footprint Identification Technique (FIT) first developed by conservation biologists at WildTrack.

Results

Kistner says that, by using a custom FIT application in JMP to optimize cluster models, he has been able to generate findings that would not have otherwise been possible in his work studying the Eurasian otter. Even more importantly, Kistner has pushed the boundaries of the application of both FIT and deep learning methods for ecological research that, if scaled, could ultimately better outcomes for endangered and threatened species worldwide.

Though it has one of the most extensive geographical ranges for mammals globally – from coastal Europe to the waterways of North Africa and East Asia – the Eurasian otter (Lutra lutra) is endangered or even extinct in many areas where it was once ubiquitous. Urban development, overhunting and the impact of multiple climate crises are just a few factors that increasingly threaten species once thought to be resilient to global change. Even so, scientists report that the Eurasian otter population seems to be recovering in areas where it was once depleted. This recovery, like others of its kind, is evidence that conservation interventions can and do work.

The field of conservation, however, is today peppered with unknowns as scientists race to define new ways for humans and wildlife to co-exist – and to prioritize the actions taken to mitigate the effects of human encroachment on natural habitats. Even simply counting and mapping existing wildlife populations can be extraordinarily challenging, and traditional approaches to wildlife monitoring that rely on observational fieldwork are highly time- and resource-intensive. But advances in smartphone technology have recently enabled an exciting alternative: crowdsourcing data collection.

With powerful cameras in the pockets of millions of smartphone users worldwide, there has been a surge of new citizen science apps like Epicollect5, iNaturalist and eBird available to the public for free. Smartphone apps, with their image recognition capabilities, are the ideal field guide for citizen scientists. In turn, photos uploaded by the public are added to a growing database of geolocated wildlife image data – much of which is managed by prominent research institutions like the Smithsonian, Cornell and Oxford University.    

With new image data from around the world added to these data sets every minute, citizen science is changing the face of conservation research. No longer must biologists depend on their own field studies for insight; now, they have access to up-to-date observational data from around the world with just a single click. And technology-driven approaches mean researchers can use limited funding more efficiently.

“The availability of smartphones and the ability to collect loads of data and implement this in citizen science schemes is why I got back into [conservation biology],” explains Frederick Kistner, a doctoral student in the Department of Photogrammetry and Remote Sensing at the Karlsruhe Institute of Technology (KIT) in Germany. A self-described “numbers-savvy” ecologist, Kistner has a penchant for multivariate statistics and coding – a skillset he feels will become increasingly valuable in the world of wildlife conservation. Machine learning, he says, could revolutionize biological research should such approaches gain wider acceptance. For now, Kistner hopes his own research on Eurasian otter conservation will serve as a strong use case. 

With the Footprint Identification Technique (FIT), scientists monitor wildlife populations from afar 

Otters are semiaquatic mammals that spend much of their lives in or near the water. One of the most effective ways of tracking their movements has therefore traditionally involved looking for the signs of presence and absence – in the form of feces or footprints – and, when sufficient funding is available, sending samples for DNA analysis. This kind of fieldwork, however, is extremely cost- and time-intensive and requires researchers to get physically close to animals in the wild – a requirement that has increasingly come under scrutiny as the science showing the negative effects of human-wildlife interaction has evolved.

An increasingly preferred alternative, Kistner explains, is footprint tracking, a non-invasive approach that doesn’t disturb animals in the wild. Better still, footprint tracking is primed for big data applications that can extend the impact of observational research using methods like the Footprint Identification Technique (FIT).

First introduced by Zoe Jewell and Sky Alibhai, scientists and co-founders of not-for-profit organization WildTrack, FIT draws on digital images of footprints to discern the species, individual identity, age-class and sex of the animal that left the print. This information is critical to conservation managers seeking to understand population dynamics over time.

“It may sound like a really simple question: How many animals are there?” Kistner explains. “But from a conservation perspective, it's this very simple information that's often missing.” While it may be impossible for researchers to tell unique individuals apart via traditional observational approaches, FIT uses a customized morphometric model to identify and track animals using non-invasive biometric data that would not be discernible to the naked eye.

Kistner coaxes animals to walk across test substrates that mirror the sandy or muddy ground in their natural environment. Ultimately, he has used this method to collect foot and footprint image data for more than 40 unique individuals. 

Developing a machine learning model from a simulated data set

Before FIT could be applied to the study of otters in the wild, however, Kistner had to train the underlying algorithm he had developed. With no established data set meeting his requirements, Kistner partnered with captive otter rehabilitation facilities like the German Otter Foundation, where he constructed a reference database of footprints from known animals that can be used to train machine learning models.

This training data set could then be used to develop benchmark biometric and deep learning species prediction models. The extraction process, as with other image data applications, Kistner says, could have been excessively tedious were it not for the dynamic visual platform that FIT is built on: JMP® statistical discovery software.

“JMP is Excel on steroids,” he says. “A lot of the data exploration I do is done in JMP because it's very intuitive.” With JMP, Kistner can easily create landmark points on footprint images within the platform’s graphics window as part of the extraction process. The software automatically converts these landmarks into data that, when fed into a model, can then be compared in a pairwise way with other footprint images.
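As a rough illustration of how landmark points can become model-ready numbers, the Python sketch below turns a handful of (x, y) landmarks into scale-normalized pairwise distances. It is not the FIT implementation in JMP: the landmark names, the coordinates and the `landmark_features` helper are all hypothetical, and a real morphometric model may draw on further measurements such as angles and areas.

```python
import math
from itertools import combinations


def landmark_features(landmarks):
    """Turn named (x, y) landmark points into pairwise distance features.

    Each distance is divided by the largest distance in the print so that
    the features do not depend on image scale (an assumption made for this
    sketch). Requires at least two landmarks at distinct positions.
    """
    names = sorted(landmarks)
    raw = {}
    for a, b in combinations(names, 2):
        (xa, ya), (xb, yb) = landmarks[a], landmarks[b]
        raw[f"{a}-{b}"] = math.hypot(xb - xa, yb - ya)
    longest = max(raw.values())
    return {pair: d / longest for pair, d in raw.items()}


# Hypothetical landmark coordinates, in pixels, for one footprint image.
print_a = {"heel": (0.0, 0.0), "toe1": (3.0, 4.0), "toe2": (6.0, 0.0)}
features = landmark_features(print_a)
for pair, value in features.items():
    print(pair, round(value, 3))
```

Normalizing by the longest distance is one simple way to make prints photographed at different zoom levels comparable; other choices, such as a ruler placed in the frame, would work equally well for the purpose of this example.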

“It’s a three-group discriminant comparison where one group is a geometrical anchor of inter-canonical space,” he explains. “You have two confidence ellipses of two [otter footprint] trails that have two scenarios. If the ellipses overlap, it’s likely the same animal, and if they don’t, it’s likely different animals.” The distances between each trail calculated from pairwise comparisons inform a cluster model, which produces an output showing local population size probabilities.
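The idea of turning pairwise trail comparisons into a population estimate can be sketched in a few lines of Python. This is a simplified stand-in, not the FIT method: instead of discriminant analysis and confidence ellipses, it compares trail centroids by Euclidean distance and groups any trails that fall within an assumed threshold. The function names, feature values and threshold are all hypothetical.

```python
import math


def trail_centroid(trail):
    """Mean feature vector of all footprints in one trail."""
    n = len(trail)
    return [sum(fp[i] for fp in trail) / n for i in range(len(trail[0]))]


def estimate_individuals(trails, threshold):
    """Group trails whose centroids lie within `threshold` of each other.

    Uses single-linkage grouping via union-find; the number of groups is
    a rough estimate of how many animals left the trails.
    """
    centroids = [trail_centroid(t) for t in trails]
    parent = list(range(len(trails)))

    def find(i):
        # Follow parent links to the group root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(trails)):
        for j in range(i + 1, len(trails)):
            if math.dist(centroids[i], centroids[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(trails))})


# Hypothetical trails: each footprint is a two-value feature vector.
trails = [
    [[1.0, 1.0], [1.2, 0.8]],  # trail A
    [[1.1, 1.1], [0.9, 0.9]],  # trail B, close to A
    [[5.0, 5.0], [5.2, 4.8]],  # trail C, far from both
]
print(estimate_individuals(trails, threshold=0.5))  # prints 2
```

Here trails A and B fall within the threshold and are treated as the same animal, while trail C stands alone, giving an estimate of two individuals. The threshold plays the role that ellipse overlap plays in Kistner's description, and choosing it well would require reference data from known animals.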

“Footprint identification technology works by comparing sets of footprints on a parallel space against each other,” Kistner says. “When you apply this model in the field, you're looking for trails of footprints – one animal walking along a sandy river patch or beach, for example.”

Although Kistner has created models with classification accuracy beyond 80% for both sets and individuals, translating those models to the wild is not without its logistical hurdles. Because otters spend a majority of their time in the water – not walking across sandy beaches – he explains, the main challenge is collecting enough data points to reach the high level of certainty the algorithms require.

In the meantime, Kistner is testing the biometric model’s accuracy by comparing results to data from the German Otter Foundation, which has identified biometrics via DNA analysis for the known individuals in its care. Furthermore, he is contributing to WildTrack’s prototyping of an end-to-end field testing functionality whereby researchers could upload an image of a footprint and get instant feedback as to the animal’s species, sex or even individual ID. 

    Citizen scientists fuel deep learning by contributing a more complete data set

    Even given their current limitations, however, machine learning models like the one Kistner has developed for otter research will only get more accurate with time – and, crucially, with an influx of new data from which the model can learn. “When I started [using FIT] seven years ago, I thought, ‘This is a cool method, but it's more of an exciting outlier.’ At the time, we needed a lot of expertise to read a footprint and identify the important features,” Kistner says. “Now with deep learning, things are changing. We’re making progress.

    “There was one model I helped train that gave almost instant predictions of the bounding boxes. You could basically tell that with every 100 new images added [to the data set], the bounding boxes got better.”

    Another example of how a model’s accuracy improves over time, he explains, is in cases where there is a footprint within a footprint – what Kistner calls a double register. Surprisingly, he adds, “after a few hundred images, the model was actually [distinguishing between] the two footprints, and for different species no less.”

    Those images can now come not just from a single scientist collecting data in the field but from the public anywhere in the world. The advent of citizen science apps, Kistner says, was a game changer that has since made the requirements of accuracy attainable for deep learning approaches in conservation biology.

    “Incomplete data sets and missing values are a widely known problem... and where the data is insufficient is where I fit in as a tool builder,” he says. “We, as conservation biologists, need to follow the data scientists as there are so many powerful tools available.” 

    Lifting the burden of statistics on scientists in any field

    Deep learning may sound intimidating to biologists not trained in statistics, Kistner admits, but platforms like JMP have made data-driven approaches so much more accessible to a broad spectrum of scientists. “You may need to have a bit of training to at least understand what a P-value is and how distributions work… but you don’t need to make life harder by coding. That’s where I would argue JMP is definitely a great tool.”

    Moreover, the platform encourages exploration via trial and error, ultimately leading to a deeper understanding of the data. “You don’t have to be scared you’re going to break something. Just run your model and see what comes out. If it doesn’t make sense, think about why that might be the case and then try something else,” Kistner advises. JMP, he says, reduces the barriers to entry for scientists to apply statistical approaches to their fieldwork: Dive in and don’t worry about the “black box” of statistics. And for students, he adds, an interest in statistics is often born of having a relationship to the questions and data being explored: “Make them curious first, and then make sure they have a tool [like JMP] on hand.”

    That same spirit of curiosity is also driving the citizen science revolution more broadly. “We’re encouraging people to reconnect with nature,” Kistner says. “We’re getting people interested in going out in the forest to look at footprints. And from that we can extract data that helps advance bigger [wildlife] monitoring programs” – many of which rely on objective scientific evidence to build consensus around difficult conservation management and prioritization decisions.

    When anyone anywhere in the world with a smartphone can contribute to conservation efforts – simply by going out into their backyard – there are benefits all around: to scientists, to the public, to policymakers and ultimately, to animals like the Eurasian otter who are making a comeback from the brink of extinction.  

    The FIT platform is continually being improved by JMP developers who have partnered with WildTrack to advance the technology. “The JMP [organization] is very active in supporting projects,” Kistner says.

    Contribute data from your own backyard to support these important citizen science initiatives using machine learning to advance conservation projects around the world:

    • Epicollect5, an app developed by Oxford University’s Big Data Institute where you can help Fred Kistner and JMP partner WildTrack to collect footprint image data.
    • Leafsnap, an electronic field guide app from the Smithsonian and Columbia University.
    • eBird, an app developed by Cornell University to crowdsource ornithology data.
    • iNaturalist, a joint project of the California Academy of Sciences and the National Geographic Society.