Automatic exploration and classification of behaviours of migratory birds with ARLAS

April 09, 2020

        To better understand animal behaviour, many researchers equip individual animals with GPS beacons and sensors to track their movements. These new data are enabling major advances in our understanding of biodiversity and of climate change. Analysing such large volumes of data requires high-performance tools.

        At Gisaïa, we are sensitive to environmental concerns and are developing ARLAS Exploration, open-source software for geospatial data mining that is capable of responding to Big Data challenges. We therefore decided to put ARLAS to the test by using it to explore animal data interactively.

        We used data from a study of a stork population, and we will see how ARLAS can be used to discover the behaviour of these migratory birds. We will also see how it can support the implementation of Machine Learning algorithms that detect some of these behaviours automatically.

A data set of storks

        The Max Planck Institute of Animal Behavior, a German research institute that studies wildlife, recently launched Movebank, a platform that aims to centralise data from animal studies conducted by researchers around the world, to encourage collaboration and make it possible to cross-reference data from different studies. This initiative also aims to make this data freely available to the general public. 

        In a study by Cheng et al. (2019) [1], researchers equipped a population of 169 storks with GPS beacons and collected data between 2013 and 2019. These data were retrieved from the Movebank [2] platform as CSV files.

        The white stork (Ciconia ciconia) is a large species of wading bird in the Ciconiidae family. Its plumage is mainly white, with black on the wings. This species has been the subject of protection and reintroduction programmes and is mainly found in Eastern and Western Europe. The stork is highly migratory and winters in Africa, making its movements particularly fascinating to study.

        The data were first processed to group and link together the successive observations of the same bird and to calculate the distances travelled. This type of processing is done using ARLAS PROC/ML, our massive distributed processing platform. The formatted data are then integrated into ARLAS Exploration, and our storks are ready to be explored.
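
The preprocessing described above can be sketched in a few lines of plain Python. This is only an illustration and not the ARLAS PROC/ML implementation: the field names (`bird_id`, `timestamp`, `lat`, `lon`), the function names and the choice of the haversine formula on a spherical Earth are all assumptions.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def build_fragments(observations):
    """Link successive observations of each bird into fragments.

    `observations` is an iterable of dicts with the hypothetical keys
    bird_id, timestamp (seconds), lat and lon.
    """
    # Group the observations by bird
    by_bird = defaultdict(list)
    for obs in observations:
        by_bird[obs["bird_id"]].append(obs)

    fragments = []
    for bird_id, track in by_bird.items():
        # Order each bird's observations in time, then pair neighbours
        track.sort(key=lambda o: o["timestamp"])
        for start, end in zip(track, track[1:]):
            fragments.append({
                "bird_id": bird_id,
                "start": start["timestamp"],
                "end": end["timestamp"],
                "distance_m": haversine_m(start["lat"], start["lon"],
                                          end["lat"], end["lon"]),
            })
    return fragments
```

Each fragment is the interval between two successive measurements of the same bird, which is also the unit used later for labelling and classification.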

ARLAS, a fluid and interactive exploration

        ARLAS Exploration is a map-centric application that makes it easy to appreciate the spatial dispersion of the data. A bar of graphs on the left of the application lets you visualise and filter the other dimensions of the data, so that we can observe the distributions of quantities such as travelled distance, altitude and speed. At the bottom, the timeline shows the temporal distribution of the measurements.

Overview of the study data in ARLAS Exploration

        It is clear that the tracked storks move within a perimeter that extends from southern Germany to West Africa. There is a peak in the number of observations in August 2014. Data are actually available for 81 birds, for a total of about 7 million positions.

        The various graphs allow you to filter the data on the dimensions represented, and the whole application is updated instantly with each new selection. It is also possible to navigate the map to specific areas and to filter according to geographical selections drawn with a tool on the right side of the application.

Example of a selection at the Strait of Gibraltar

        Depending on the amount of data to be displayed in the application window, ARLAS switches between an “aggregate” mode, which shows density maps ideal for a general view of the flows, and a “features” mode, which shows the detail of the data so that the actual paths of the storks can be observed. We can thus isolate interesting behaviours and see, for example, that some storks seem to use rising warm air currents to gain altitude:

Example of the path of the ‘Wibi 3’ stork coloured by altitude

        The route displayed can be coloured in different ways, for example by speed or by bird ID:

Trajectories of 4 storks coloured by speed
Trajectories of 4 storks coloured by bird ID

        These two representations show that the four selected storks, named Hans, Schwitza, Kiki and Julia, move together at the same pace over this period, while remaining clearly distinguishable from one another.

       ARLAS Exploration is thus particularly well suited to intuitive, interactive work with bird positioning data, even when the volume of data becomes large. This makes it a strong partner for researchers.

Towards the detection of migration

        The observed storks tend to migrate over great distances to change habitat. Two behaviours can then be distinguished: staying in the same area (local) or travelling to change areas (travel). Both behaviours are visible to the naked eye at a local scale in ARLAS Exploration, but detecting them automatically would make it possible to study the migrations of all birds on a large scale very efficiently. This is why we chose to use Machine Learning algorithms to automate this identification, training a classification model by supervised learning.

This process was therefore carried out in several stages: 

  • Construction of a training set
  • Calculation of new indicators
  • Choice of the classification model
  • Viewing the results


Construction of a training set

     A supervised classification model needs training data to learn how to recognise the targeted behaviours. In our case, this means annotating our data: each fraction of a trip must be identified as ‘travel’ or ‘local’.

ARLAS Exploration is particularly well suited to creating training sets, since it lets you assign a label to the current data selection. It is therefore possible to manually identify the parts of trajectories corresponding to a large displacement (“travel”) or a local activity (“local”) and label them as such.

Labelling interface
Example of a trajectory of the training set, ‘local’ (red) and ‘travel’ (green).

In practice, 4 birds have been labelled as such:

        For each bird, a period of approximately a year was used to capture at least one round trip in the migration. We therefore have a total of 334,372 fragments (interval between two measurements) that will be usable for training the model.

Calculation of new indicators

       The quality of a Machine Learning algorithm depends above all on the quality of its training data. Once a sufficient number of fragments representative of the behaviours to be detected have been labelled, the quantities to be given as input to the model must be chosen. First of all, the tagged data can be downloaded using an ARLAS API, which is available in Python among other languages.

Data retrieval using the ARLAS API (Python)

      In our case, the selected features illustrating the movement of these birds are based on travelled distances and “as the crow flies” distances calculated over different time windows. These features are not present in the raw data and can be calculated using ARLAS PROC/ML, our processing platform adapted to large volumes of data.
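
One possible way to derive such features is sketched below. For each position, it sums the distance travelled over a trailing time window and compares it with the straight-line distance between the first and last positions of that window. The function names, the trailing-window convention and the `straightness` ratio are assumptions made for illustration; the article does not specify the exact features or window lengths used.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def window_features(track, window_s, distance=haversine_m):
    """Travelled and straight-line distances over a trailing time window.

    `track` is a time-sorted list of (timestamp_s, (lat, lon)) tuples for one bird.
    For each fix, the window covers the fixes of the previous `window_s` seconds.
    """
    features = []
    first = 0  # index of the oldest fix still inside the window
    for i, (t, pos) in enumerate(track):
        while t - track[first][0] > window_s:
            first += 1
        # Sum of the step lengths between successive fixes inside the window
        steps = zip(track[first:i], track[first + 1:i + 1])
        travelled = sum(distance(a[1], b[1]) for a, b in steps)
        # "As the crow flies" distance between both ends of the window
        straight = distance(track[first][1], pos)
        features.append({
            "timestamp": t,
            "travelled_m": travelled,
            "straight_m": straight,
            # close to 1 for directed flight, close to 0 for local wandering
            "straightness": straight / travelled if travelled > 0 else 0.0,
        })
    return features
```

Calling `window_features` with several values of `window_s` (for example a few hours and a few days) gives one set of columns per time window. A bird that covers a lot of ground but ends up near its starting point is probably foraging locally, whereas similar travelled and straight-line distances suggest directed travel.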

       Once these new quantities have been calculated, they can be used to train the different chosen Machine Learning models.

Choice of the classification model

        Several classification models were tested. Comparing the quality of these classifiers requires computing indicators. Since the classes are imbalanced (~6% ‘travel’ in the training set), several metrics were used to correctly evaluate the quality of ‘travel’ detection. Cross-validation is performed to guard against overfitting: the training set is partitioned, and classification performance is measured on data that were not used to train the model.
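
The partitioning step can be illustrated with a small stratified k-fold splitter written in plain Python. This is a sketch under assumptions: the article does not say which cross-validation variant was used, and the function name and parameters are hypothetical. Stratification is chosen here because, with only about 6% ‘travel’, a purely random split could leave some folds with very few ‘travel’ fragments.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k stratified folds.

    Indices of each class are shuffled and dealt round-robin into the folds,
    so every fold keeps roughly the same share of 'travel' and 'local'.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    # Each fold is held out once; the model is trained on the other k - 1 folds
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        yield train, test
```

Because successive fragments of the same bird are strongly correlated, holding out whole birds rather than individual fragments may give a more conservative estimate. Whether the experiments described here did so is not stated.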

The metrics used are based on the confusion matrix of the prediction:

We have:

  • Accuracy: Overall proportion of correct classifications
  • Recall: Share of real ‘travel’ that is correctly detected
  • Precision: Proportion of detected ‘travel’ that is actually real
  • Specificity: Share of real ‘local’ that is correctly detected
  • F1-score: Harmonic mean (trade-off) between Recall and Precision
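
As an illustration, these metrics can be computed from the four counts of a binary confusion matrix using their standard definitions, with ‘travel’ taken as the positive class. The function name and the counts below are made up for the example and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the five metrics from confusion-matrix counts.

    tp: 'travel' predicted as 'travel'    fp: 'local' predicted as 'travel'
    fn: 'travel' predicted as 'local'     tn: 'local' predicted as 'local'
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a class is never predicted or present
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "recall": recall,
        "precision": precision,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up counts with 6% 'travel', for illustration only
model = classification_metrics(tp=50, fp=10, fn=10, tn=930)

# A classifier that always answers 'local' on the same data
always_local = classification_metrics(tp=0, fp=0, fn=60, tn=940)
```

On such imbalanced data, the classifier that always answers ‘local’ still reaches 94% accuracy while its recall and F1-score are 0, which is why accuracy alone is not enough to judge ‘travel’ detection.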

       As all experiments are performed under the same conditions, the models can be compared with each other, in particular thanks to the MLflow tool used to record the results. Finally, after numerous experiments, an XGBoost classifier was chosen, both for its performance and for its training speed.

Viewing the results

        Once the model has been chosen and trained, it can be applied to other birds, and the results of this migration detection can be exported to ARLAS Exploration thanks to the tagging system (also available via API). It is then possible to visualise the results directly in the application. This gives a better understanding of our model by quickly showing on which parts of the data the predictions fail. It is also possible to validate or correct the results, which makes it possible to enlarge the training set and retrain the model on more data. In the case of our storks, the model was applied to 26 birds, corresponding to 2,650,000 fragments.

        We can also follow the track of a particular bird and identify the different stopping places along its route. For the stork named Zozu, for example, the following results are obtained:

Selection of a trace of the stork named Zozu

      We can also date the great movements of these storks. If we consider the predicted travel fragments of the stork named Zozu, for example, we can observe the different peaks on the timeline, which makes it possible to identify the periods of the year when the bird migrated:

Selection of ‘travel’ fragments of the stork named Zozu

      If we look at one of these peaks in particular, we can date these migrations very precisely and see the different stages of the journey. For example, this journey between Switzerland and northern Spain, made between 21/08/2015 and 29/08/2015, took place in 7 stages:

Selection of ‘travel’ fragments of the stork named Zozu (21/08/2015 - 29/08/2015)
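
Counting such stages can also be automated. The sketch below assumes that a stage is an uninterrupted run of fragments predicted as ‘travel’, separated from the next stage by ‘local’ fragments. This definition, the function name and the tuple layout are assumptions for illustration, not necessarily how the 7 stages above were identified.

```python
from itertools import groupby


def travel_stages(fragments):
    """Group consecutive 'travel' fragments of one bird into stages.

    `fragments` is a time-sorted list of (start, end, label) tuples.
    Returns one (stage_start, stage_end) pair per uninterrupted run of 'travel'.
    """
    stages = []
    for label, run in groupby(fragments, key=lambda fragment: fragment[2]):
        if label == "travel":
            run = list(run)
            # A stage spans from the start of its first fragment to the end of its last
            stages.append((run[0][0], run[-1][1]))
    return stages
```

Applied to the predicted fragments of one bird over a given period, `len(travel_stages(fragments))` gives the number of stages and each pair gives the time span of one stage.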

    The migration of these birds can also be explored on a larger scale. If we select all the ‘travel’ fragments for all the labelled birds, we can see a migration corridor that follows the Mediterranean coast through the south of France towards Spain:

Selection of fragments identified as ‘travel’

        Finally, it is possible to identify the different living places favoured by the storks during their journey around the Strait of Gibraltar by selecting the ‘local’ predicted fragments for all the storks:

Selection of fragments identified as ‘local’

     The automation of migration detection has therefore greatly facilitated its analysis, and ornithological experts can now focus on the variations in dates and destinations of the migrations undertaken by the storks. The possibility of cross-referencing this information with other data, such as meteorological data, can provide an even better understanding of the behaviour of these large migratory birds in relation to, for example, climate change.


         We have seen that ARLAS is particularly well suited to exploring the position data of storks. The interactive map navigation provides valuable information on the behaviour of these migratory birds. ARLAS can also support the production of Machine Learning algorithms by facilitating the creation of training sets and the visualisation of classification results. Finally, once the Machine Learning models have been trained, it is possible to apply them to large-scale data with ARLAS PROC/ML and see the results in ARLAS Exploration. All these results can be explored in a demonstration available at demo

          Although we have focused here on migration, many other animal behaviours could be the subject of such studies. ARLAS can of course be applied to all kinds of geo-tracked animals, and more broadly to any geo-referenced data. Feel free to have a look at the other application examples on


We would like to thank the director of the Max Planck Institute of Animal Behavior, Dr M. Wikelski, and his team for providing this data.


[1] Cheng Y, Fiedler W, Wikelski M, Flack A (2019) “Closer-to-home” strategy benefits juvenile survival in a long-distance migratory bird. Ecology and Evolution. doi:10.1002/ece3.5395

[2] Fiedler W, Flack A, Schäfle W, Keeves B, Quetting M, Eid B, Schmid H, Wikelski M (2019) Data from: Study “LifeTrack White Stork SW Germany” (2013-2019). Movebank Data Repository. doi:10.5441/001/1.ck04mn78