Why using select_related() by default can be dangerous

As you (should) already know, Django's select_related allows you to improve the performance of your applications by using joins to fetch related objects, thus reducing the number of executed SQL queries.

I was wondering why select_related() wasn't used by default when performing lookups, and it only became clear to me when I hit a weird bug after trying to optimize an app.

A concrete example

The following example has been taken and adapted from a real app, rank-me. Let's consider this model:

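from django.db import models


class Game(models.Model):
    # Two foreign keys to the same model need distinct related_name values,
    # otherwise their reverse accessors clash (the names here are arbitrary).
    # on_delete is mandatory since Django 2.0.
    winner = models.ForeignKey(
        CustomUser, on_delete=models.CASCADE, related_name='games_won')
    loser = models.ForeignKey(
        CustomUser, on_delete=models.CASCADE, related_name='games_lost')
    date = models.DateField()

    def update_score(self):
        # calculate_score() is out of the scope of this article
        self.winner.score, self.loser.score = calculate_score(
            self.winner.score,
            self.loser.score
        )

        # Persist the new scores of both players
        self.winner.save()
        self.loser.save()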

Now let's say we want to create a command in the app that will recalculate the score of every participant by going through each game and running the update_score method. We could implement it like this:

# Reset all users' scores to an initial value of 100
CustomUser.objects.update(score=100)

# Cycle through each game and update the winner/loser score
games = Game.objects.all()
for game in games:
    game.update_score()

So far so good. If we execute this piece of code, the users' scores will be recalculated correctly. Now let's imagine our app makes heavy use of the Game model and we'd like to reduce the number of SQL queries generated by the default lookups. All we need to do is override the default manager and redefine its get_queryset() method to always use select_related():

class GameManager(models.Manager):
    def get_queryset(self):
        return super(GameManager, self).get_queryset().select_related()

class Game(models.Model):
    # ...

    objects = GameManager()

Now if we run the score recalculation again, we'll notice that every user ends up with a score that went through only one iteration (i.e. update_score() always calls calculate_score(100, 100)), regardless of the number of games the player has won or lost.

Why?

The difference is that in the first case, when we fetched the Game objects with Game.objects.all(), the query returned something like this:

id  winner_id  loser_id  date
1   1          2         2014-02-27
2   1          2         2014-02-28
3   2          1         2014-02-28

With select_related() added, it returned something like this instead:

id  winner_id  loser_id  date        winner.score  loser.score
1   1          2         2014-02-27  100           100
2   1          2         2014-02-28  100           100
3   2          1         2014-02-28  100           100

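So when we fetched all the Game objects to iterate over them, every Game object already had its winner and loser objects populated with the scores read at query time (100 for everyone). Django keeps no identity map, so each row got its own copies of the two users. Every time we used self.winner or self.loser, we got the instance that was created when we fetched all the games instead of a fresh read from the database, so the scores saved by earlier games in the loop were never seen by the later ones.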
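To make the mechanism concrete, here is a small pure-Python simulation of both fetch strategies. It does not use Django at all: the dict standing in for the user table, the eager flag and the +10/-10 rule inside calculate_score() are assumptions made up for this sketch, and only the three game rows come from the tables above.

```python
# In-memory stand-in for the user table: {user_id: score}.
user_table = {1: 100, 2: 100}

# (winner_id, loser_id) rows, in the same order as the tables above.
game_rows = [(1, 2), (1, 2), (2, 1)]


def calculate_score(winner_score, loser_score):
    # Hypothetical rule, for illustration only: the winner takes 10 points.
    return winner_score + 10, loser_score - 10


class User:
    """A snapshot of one row, like a model instance held in memory."""

    def __init__(self, user_id):
        self.id = user_id
        self.score = user_table[user_id]  # value at the time of the read

    def save(self):
        user_table[self.id] = self.score


class Game:
    def __init__(self, winner_id, loser_id, eager):
        self.winner_id, self.loser_id = winner_id, loser_id
        # eager=True mimics select_related(): both users are read right now.
        self._winner = User(winner_id) if eager else None
        self._loser = User(loser_id) if eager else None

    @property
    def winner(self):
        if self._winner is None:  # lazy: read on first access, then cache
            self._winner = User(self.winner_id)
        return self._winner

    @property
    def loser(self):
        if self._loser is None:
            self._loser = User(self.loser_id)
        return self._loser

    def update_score(self):
        self.winner.score, self.loser.score = calculate_score(
            self.winner.score, self.loser.score
        )
        self.winner.save()
        self.loser.save()


def recalculate(eager):
    user_table.update({1: 100, 2: 100})  # reset every score to 100
    # All games are built before the loop starts, like an evaluated queryset.
    games = [Game(w, l, eager) for w, l in game_rows]
    for game in games:
        game.update_score()
    return dict(user_table)


lazy_scores = recalculate(eager=False)   # {1: 110, 2: 90}
eager_scores = recalculate(eager=True)   # {1: 90, 2: 110}
```

In the lazy run each game reads the scores left by the previous one, so user 1 ends at 110 after two wins and one loss. In the eager run every game starts from its own 100/100 snapshot, so the final scores only reflect the last game that was saved.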

So?

Think twice before adding select_related() to your default managers, because any code that relies on related objects being read fresh from the database will silently work on stale copies. If you did add it to your default manager and you know you don't want it in a particular case, remember you can call select_related(None) on the queryset to clear it (Django >= 1.6 only).
