Maximum Sense about MaxEnt? My Two Cents

Some of you may be aware of an intense discussion surrounding a post by Professor David Warton on the wonderful Methods in Ecology and Evolution blog.  In this post Professor Warton outlines a series of key findings from various papers and, in particular, from one of his own, coauthored with Ian Renner, which confirms a mathematical equivalence between what is often considered to be a rather ‘old’ statistical model type known as a Generalised Linear Model (GLM) and a ‘new’ and very popular modelling technique known as MaxEnt.  Both techniques have been used extensively in the field of species distribution modelling, and the popularity of MaxEnt has meant that this post has received an incredible amount of attention: indeed it has been brought up in a number of conversations with my colleagues (admittedly often at my behest) and even in a question and answer session at the GfÖ macroecology meeting in Göttingen last month.

Now in some respects I am very late to this particular party: the original post was submitted to the Methods in Ecology and Evolution blog way back in February.  However, the post is still generating comments and there is still clear interest in discussing this further so I’ve decided that, despite my lateness, I’m going to pour further fuel onto this debate with my own ill-considered opinions.  Before I go into the meat of the claims, I would just like to point out that this topic would not have been so pertinent had MaxEnt not gained the popularity that it has today.  That popularity is, in no small part, down to the brilliant coding efforts of Steven Phillips, Robert Anderson and Robert Schapire who, through the provision of an excellent, easy-to-use software package, have made it easier for generations of macroecologists to perform distribution modelling on their particular data sets.

Now, what exactly do Ian Renner and David Warton mean when they say in their paper that MaxEnt is equivalent to a GLM?  Well, their paper outlines two key findings:

  1. They first prove that the MaxEnt procedure fits a model that is mathematically equivalent to Poisson regression (a particular flavour of GLM).
  2. They then show that this Poisson regression model can be related to a Poisson point process model and that, therefore, MaxEnt can be too.

This first point is important.  The authors state that MaxEnt is mathematically equivalent to Poisson regression.  This is not the same as statistically equivalent, or predictively equivalent.  If true, what this means is that any differences observed between MaxEnt and Poisson regression are not because MaxEnt has ‘extra flexibility’ or is a ‘better model’; fundamentally at least, these two models are doing exactly the same thing.
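To make the first finding concrete, here is a minimal numerical sketch of the equivalence.  It is my own illustration, not the authors' code and not the MaxEnt software: the data are simulated, the two covariates are hypothetical, only linear features are used, and the ‘MaxEnt’ fit here is the unregularised maximum-likelihood version (the real software adds regularisation on top).  Under those assumptions, the slope coefficients from the two fits coincide and the Poisson fitted values, once rescaled to sum to one, reproduce the MaxEnt distribution over cells.

```python
import numpy as np

# Simulated grid of cells with two hypothetical climate covariates.
rng = np.random.default_rng(0)
n_cells = 200
X = rng.normal(size=(n_cells, 2))
true_beta = np.array([1.0, -0.5])
p_true = np.exp(X @ true_beta)
p_true /= p_true.sum()
counts = rng.multinomial(150, p_true)  # presence records per cell


def fit_poisson(X, y, iters=50):
    """Poisson regression (log link, with intercept) by Newton's method."""
    A = np.column_stack([np.ones(len(y)), X])
    coef = np.zeros(A.shape[1])
    coef[0] = np.log(y.mean())  # start at the intercept-only fit
    for _ in range(iters):
        mu = np.exp(A @ coef)
        grad = A.T @ (y - mu)
        hess = A.T @ (A * mu[:, None])
        coef = coef + np.linalg.solve(hess, grad)
    return coef


def fit_maxent(X, y, iters=50):
    """Unregularised MaxEnt-style fit: maximise the likelihood of the
    records under p(cell) proportional to exp(X @ beta)."""
    n = y.sum()
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        w = np.exp(X @ beta)
        p = w / w.sum()
        mean = p @ X  # expected feature values under the current model
        grad = X.T @ y - n * mean
        Xc = X - mean
        hess = n * (Xc.T @ (Xc * p[:, None]))
        beta = beta + np.linalg.solve(hess, grad)
    return beta


coef_poisson = fit_poisson(X, counts)
beta_maxent = fit_maxent(X, counts)

# Fitted Poisson means, and the MaxEnt probability of each cell.
mu = np.exp(coef_poisson[0] + X @ coef_poisson[1:])
w = np.exp(X @ beta_maxent)
p_maxent = w / w.sum()

print("Poisson slopes:", coef_poisson[1:])
print("MaxEnt slopes: ", beta_maxent)
print("Largest difference in cell probabilities:",
      np.abs(mu / mu.sum() - p_maxent).max())
```

The only thing the Poisson fit has that the MaxEnt fit lacks is the intercept, and all that does is make the fitted values sum to the total number of records rather than to one.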

I should point out a couple of caveats at this point.  Firstly, the mathematical equivalence is only proven, as far as I can tell, for those feature types from MaxEnt that can be specified in terms of linear predictors: linear features, quadratic features, and product features.  The latest version of MaxEnt also offers ‘hinge features’ and ‘threshold features’ which are not, as yet, accounted for in this paper.  However, this is a minor point, for two reasons.  I feel that these aren’t very good terms to include in a distribution model in the first place.  And if you really care about them that much, you can include them pretty easily by extending the standard Poisson regression with the corresponding terms, at which point it would seem that you have restored the mathematical equivalence between your new model and MaxEnt anyway.
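As a rough sketch of what ‘extending the standard Poisson regression’ might look like, the snippet below builds hinge and threshold terms as extra columns of a design matrix.  The function names are hypothetical, the knot positions are picked by hand, and the definitions are my understanding of these feature types (a hinge is zero up to the knot and then rises linearly to one at the largest observed value; a threshold is a step at the knot), so treat it as an illustration rather than a reproduction of what the MaxEnt software does internally.

```python
import numpy as np


def hinge_feature(x, knot):
    """Zero up to the knot, then linear, reaching 1 at the largest x.
    Assumes knot < max(x)."""
    x = np.asarray(x, dtype=float)
    return np.clip((x - knot) / (x.max() - knot), 0.0, None)


def threshold_feature(x, knot):
    """Step function: 0 at or below the knot, 1 above it."""
    return (np.asarray(x, dtype=float) > knot).astype(float)


# Hypothetical temperature values for five grid cells.
temperature = np.array([2.0, 6.0, 10.0, 14.0, 18.0])

# Design matrix: linear, quadratic, hinge and threshold terms.
design = np.column_stack([
    temperature,
    temperature ** 2,
    hinge_feature(temperature, knot=10.0),
    threshold_feature(temperature, knot=10.0),
])
print(design)
```

Each column is just another predictor as far as the Poisson regression is concerned, so the model stays a GLM.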

Secondly, this mathematical equivalence is true only at the fitted values of a model with the same predictor variables.  However, there is variation in the implementation of these techniques.  For example, MaxEnt undergoes its own form of internal model selection (it calls it ‘feature selection’), meaning that the final output may not use all the predictor variables in the same way that you have entered them.  Similarly, when we implement a GLM, we often run a number of different models with different numbers of predictor variables and use some sort of model selection criterion to choose the best one.  How you make predictions into unsampled areas also differs between common implementations of the methods: the default settings of MaxEnt use ‘clamping’, which means that the projection of the model in climatically novel areas does not follow the same relationship derived from climates with known occurrences (I personally dislike the ad hoc nature of clamping but that can be another post for another day).
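For anyone unfamiliar with clamping, the idea can be sketched in a few lines.  This is a simplified illustration under my own assumptions (hypothetical covariate values, made-up coefficients, and clamping applied to the raw covariates), not the MaxEnt implementation: covariate values outside the range seen during fitting are pulled back to the nearest training limit before the model is projected.

```python
import numpy as np


def clamp_covariates(X_new, X_train):
    """Restrict each covariate to the range observed in the training data."""
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    return np.clip(X_new, lo, hi)


def predict_intensity(X, intercept, beta):
    """Log-linear prediction, as in a fitted Poisson regression."""
    return np.exp(intercept + X @ beta)


# Hypothetical training covariates: temperature (C) and rainfall (mm).
X_train = np.array([[5.0, 200.0], [12.0, 800.0], [18.0, 1500.0]])
# First new cell is hotter and drier than anything in the training data.
X_new = np.array([[25.0, 100.0], [10.0, 500.0]])

X_clamped = clamp_covariates(X_new, X_train)

# Made-up coefficients, purely for illustration.
intercept, beta = -2.0, np.array([0.1, 0.0])
unclamped = predict_intensity(X_new, intercept, beta)
clamped = predict_intensity(X_clamped, intercept, beta)
print(X_clamped)
print(unclamped, clamped)
```

The second cell lies inside the training range, so its prediction is untouched; the first is held at the value the model gives at the edge of the training data rather than following the fitted curve outwards.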

These implementation details are therefore the reason why we see any differences between MaxEnt and Poisson regression at all but, as the authors show, even with all the implementation differences, maps generated using the two techniques are still remarkably similar.  Some commentators have argued that we may see differences between the models because of deficiencies in the metrics used to assess the performance of distribution models.  Whilst I agree that commonly used metrics of model performance are terrible (I’m looking at you AUC: again another post for another day), that cannot be the explanation in this case.  If the models are mathematically equivalent and they have the same implementation, then they produce identical predictions, and any metric of performance, no matter how poor, would be unable to distinguish between them.

Now, we come to the second point of the paper: that MaxEnt can be related to a point process model.  For those not familiar with point process models, these models basically describe the probability of observing a known number of records in space at particular locations.  The particular point process model being compared in this paper is the ‘inhomogeneous Poisson point process model’, in which we can imagine a two-dimensional probability distribution being draped over the region of interest and presence records being drawn according to it.  The authors show that both MaxEnt and Poisson regression models essentially parameterise this probability distribution by setting the probability of a presence record falling in each cell according to a log-linear relationship with the climatic predictors and only the predictors.  Without extra modifications, both modelling frameworks assume that there are no observation biases preventing observations from appearing in certain cells.  Some commentators have argued that the equivalence is not true because of differences in the assumptions about sampling effort but, as we can see here, neither method tackles sampling bias at all and both assume, rather poorly, that the only thing determining the probability of a record being drawn from a particular cell is the climate.
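For readers who prefer symbols, the inhomogeneous Poisson point process described above can be written down as follows.  This is the standard textbook form of the model rather than a quotation from the paper, and the notation is my own: $\lambda(s)$ is the intensity at location $s$, $x(s)$ the climatic predictors there, $s_1, \dots, s_n$ the locations of the presence records, and $A$ the study region.

```latex
\log \lambda(s) = \beta_0 + x(s)^{\top}\beta,
\qquad
\ell(\beta_0, \beta) = \sum_{i=1}^{n} \log \lambda(s_i) - \int_{A} \lambda(s)\, ds
```

Dividing $\lambda(s)$ by its integral over $A$ gives the probability density of where a single record falls, which is the ‘draped’ distribution.  Note that nothing in either expression involves observer effort: the predictors are the only thing that varies from place to place.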

Some commentators have reacted with something along the lines of ‘so MaxEnt and Poisson regression are the same, so what?’.  This is a line that I simply cannot understand.  Firstly, we have known for a long time now that, whilst it is still an impressive package, there are some deficiencies in MaxEnt.  Just like in any field, we should always be on the lookout for new ways to improve our predictions, and what has hampered us for a long time now is the ‘black box’ nature of our most beloved modelling technique.  Now that this equivalence has been shown, we can begin to develop better ways to model species distributions by extending a well-known and mathematically tractable system rather than having to unpick the guts of MaxEnt.  Secondly, this paper also explains why we get the observed differences in model performance between MaxEnt and Poisson regression.  Now we know to start looking at the implementation details and the model selection procedures.  For those who think this is simply a theoretical exercise in mathematical acrobatics, I can assure you that it is not, and I venture that it will not be long before we start to see new methods appear that base themselves on the findings of this paper.



2 thoughts on “Maximum Sense about MaxEnt? My Two Cents”

  1. nice read!

    a thought on your statement below..

    ” …, neither method tackles sampling bias at all and both assume, rather poorly, that the only thing determining the probability of a record being drawn from a particular cell is the climate.”

    I don’t think assuming climate is the only factor affecting a species distribution is a problem, as long as the results are clearly labeled as “potential climatic suitability”. I believe one can then further develop a realised distribution by considering other factors not accounted for in the models.

    • Thank you for your comment! Unfortunately I don’t really believe the outputs of commonly applied SDMs can even be reliably interpreted as ‘potential climatic suitability’. The occurrence data used to parameterise the SDMs are the product of ‘realised’ processes and, as such, non-climatic range limitation, such as dispersal limitation, may mean that the outputs of your SDM fail to capture suitable areas that a species hasn’t yet colonised.
