How to detect field changes in Django
The Problem
While working on a Django project, we have needed, every now and then, to know whether a specific field of a model has changed, and to act accordingly. Say you are developing a logistics website and want to store every status change of a package. You would have a model structure similar to this:
from django.contrib.auth import get_user_model
from django.db import models

UserModel = get_user_model()


class Status(models.Model):
    name = models.CharField(max_length=32, unique=True)


class Package(models.Model):
    user = models.ForeignKey(UserModel, on_delete=models.CASCADE)
    shipment_cost = models.DecimalField(max_digits=6, decimal_places=2)
    weight = models.DecimalField(max_digits=5, decimal_places=2)
    status = models.ForeignKey(Status, on_delete=models.CASCADE)


class PackageStatusHistory(models.Model):
    package = models.ForeignKey(Package, on_delete=models.CASCADE)
    from_status = models.ForeignKey(Status, on_delete=models.CASCADE, related_name='from_status', null=True)
    to_status = models.ForeignKey(Status, on_delete=models.CASCADE, related_name='to_status')
    created_at = models.DateTimeField(auto_now_add=True)
Then one would add a post_save signal receiver and register the status change:
from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender=Package)
def register_status_change(sender, instance: Package, **kwargs):
    old_status_id = ...  # the previous value is not available here
    if instance.status_id != old_status_id:
        # There is a status change
        PackageStatusHistory.objects.create(package_id=instance.id,
                                            from_status_id=old_status_id,
                                            to_status_id=instance.status_id)
The problem is that we do not know the old status id! If somewhere in the code we do package.status = new_status, the status field of that instance has changed and the old value is lost. There are several ways to tackle this problem, and we will analyze some of them.
Note:
We manually added 5 statuses: Created, InPreparation, Shipped, Received, Delivered.
We also added one package with id=1 for testing.
Solution 1: The easy way?
Anyone who has used the post_save signal a lot knows that it sends the updated fields as an argument. That is, **kwargs contains update_fields, which is a frozenset of field names (or None). That would help us, right?
Not really. This method has several drawbacks:
- If you set a field to its current value (that is, the value stays the same, no change), it would still be reflected in update_fields, which is not suitable for us.
- This frozenset is not generated automatically by Django. It contains only the fields that you explicitly passed to the save() method, which makes the code hard to develop and maintain.
I am not going to test this; however, you are free to do so.
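To make the two drawbacks concrete, here is a minimal sketch of how a receiver could interpret the update_fields keyword argument. The helper name status_may_have_been_saved is hypothetical, and it is written in plain Python so that it runs without a Django project. It assumes the documented behaviour that post_save passes update_fields=None when save() is called without that argument, and a frozenset of field names otherwise.

```python
from typing import FrozenSet, Optional


def status_may_have_been_saved(update_fields: Optional[FrozenSet[str]]) -> bool:
    """Hypothetical helper: interpret the `update_fields` kwarg of post_save."""
    if update_fields is None:
        # save() was called without update_fields: every field was written,
        # so the signal says nothing about which ones actually changed.
        return True
    # Only the names the caller listed are present, whether or not
    # their values differ from what was stored before.
    return "status" in update_fields


# package.save()
print(status_may_have_been_saved(None))
# package.save(update_fields=["status"]), even if status was set to the same value
print(status_may_have_been_saved(frozenset({"status"})))
# package.save(update_fields=["weight"])
print(status_may_have_been_saved(frozenset({"weight"})))
```

In every case the best answer is "maybe", and the old value is never available, which is why the remaining solutions look elsewhere.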
Solution 2: Query the old value
We know that a Django model instance keeps the values we assigned to it in memory until refresh_from_db() is called, while the database still holds the old ones until we save. We can use this to our advantage: what if we get the old status from the database, and compare it with the “dirty” Django model?
@receiver(post_save, sender=Package)
def register_status_change(sender, instance: Package, **kwargs):
    old_status_id = Package.objects.get(id=instance.id).status_id
    if instance.status_id != old_status_id:
        # There is a status change
        PackageStatusHistory.objects.create(package_id=instance.id,
                                            from_status_id=old_status_id,
                                            to_status_id=instance.status_id)
If we check:
>>> from cache_fields.models import *
>>> package = Package.objects.get()
>>> package
<Package: 1: Created>
>>> package.status_id
1
>>> package.status_id = 2
>>> package.status
<Status: InPreparation>
>>>
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet []>
>>> package.save()
>>> package.status
<Status: InPreparation>
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet []>
>>>
What? Why is our status history empty? Here is why: we are using post_save, so the database query for the old value happens AFTER the new value has been written to the database. We can switch to the pre_save signal (so that we read the old value before updating the database); however, we need some extra handling: when the package is created for the first time, it has no id to pass to PackageStatusHistory:
from django.db.models.signals import pre_save


@receiver(pre_save, sender=Package)
def register_status_change(sender, instance: Package, **kwargs):
    old_status_id = instance.id and Package.objects.get(id=instance.id).status_id
    # An unsaved package has no id yet, so there is nothing to reference in the history
    if instance.id and instance.status_id != old_status_id:
        # There is a status change
        PackageStatusHistory.objects.create(package_id=instance.id,
                                            from_status_id=old_status_id,
                                            to_status_id=instance.status_id)
Now, if we try this:
>>> from cache_fields.models import *
>>> package = Package.objects.get()
>>> package
<Package: 1: Created>
>>> package.status_id = 2
>>> package.status
<Status: InPreparation>
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet []>
>>> package.save()
>>> package.status
<Status: InPreparation>
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet [<PackageStatusHistory: 1: 1 -> 2>]>
>>>
Notice how we used
old_status_id = instance.id and Package.objects.get(id=instance.id).status_id
It is a nice shortcut for an if statement: the and operator evaluates its second (right-hand) operand ONLY IF the first one is truthy.
In our case, if id is None, the second part is not executed, so old_status_id = None.
If it is a valid value, the second part is executed: old_status_id = Package.objects.get(id=instance.id).status_id
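The short-circuit behaviour is easy to check in isolation. The snippet below is only an illustration: fetch_status_id is a hypothetical stand-in for Package.objects.get(id=...).status_id that records each call, so we can see when the "query" really runs.

```python
queries = []  # records every simulated database lookup


def fetch_status_id(package_id: int) -> int:
    """Hypothetical stand-in for Package.objects.get(id=...).status_id."""
    queries.append(package_id)
    return 1


def old_status_id_for(package_id):
    # Same shape as the line used in the signal handler
    return package_id and fetch_status_id(package_id)


unsaved = old_status_id_for(None)  # left side is falsy: no lookup, result is None
saved = old_status_id_for(7)       # left side is truthy: lookup runs, result is 1

print(unsaved, saved, queries)  # None 1 [7]
```

Note that and returns its left operand whenever that operand is falsy, so a primary key of 0 would yield 0 rather than None. An explicit "if instance.id is None" check avoids that edge case if it matters in your project.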
So, it works? Yes; however, it is not the best implementation. Every time a package is saved (even if the status field did not change), you execute an extra database query, which will slow you down in large systems.
Solution 3: Override __init__
This method allows us to save extra database calls. Considering that Django models are just Python classes containing database object data, we can use our own fields to save old data:
from typing import Any


class Package(models.Model):
    user = models.ForeignKey(UserModel, on_delete=models.CASCADE)
    shipment_cost = models.DecimalField(max_digits=6, decimal_places=2)
    weight = models.DecimalField(max_digits=5, decimal_places=2)
    status = models.ForeignKey(Status, on_delete=models.CASCADE)

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        # Remember the status the instance was loaded (or created) with
        self.cached_status_id = self.status_id
When the object is first fetched from the database, we will have a “duplicate” status field. However, if we change the status field anywhere in the code, the original value is still available in the cached_status_id attribute. Pretty cool, huh? Notice how we use status_id instead of status, because integers are easier to handle than objects. Moreover, accessing status when it is not handled correctly (say, select_related is not used while fetching) costs extra database queries to the Status model.
We can also go back to the post_save signal, as we capture the old value right at the beginning. Now, if we implement the signal:
@receiver(post_save, sender=Package)
def register_status_change(sender, instance: Package, **kwargs):
    if instance.status_id != instance.cached_status_id:
        # There is a status change
        PackageStatusHistory.objects.create(package_id=instance.id,
                                            from_status_id=instance.cached_status_id,
                                            to_status_id=instance.status_id)
Which results in:
>>> from cache_fields.models import *
>>> package = Package.objects.get()
>>> package
<Package: 1: InPreparation>
>>> package.status_id
2
>>> package.cached_status_id
2
>>> package.status_id = 3
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet [<PackageStatusHistory: 1: 1 -> 2>]>
>>> package.save()
>>> package.status_id
3
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet [<PackageStatusHistory: 1: 1 -> 2>, <PackageStatusHistory: 1: 2 -> 3>]>
>>>
We prevented extra database calls, yay! We can extend this method by saving a dict of the whole model as an attribute (using the model_to_dict function), and then we would have access to the old value of every field! You can also turn this logic into a mixin, as explained in this StackOverflow answer. However, I personally prefer this method of having an instance attribute for each cached field, which is a little harder to maintain but efficient.
Solution 4: Third-Party Libraries
A similar mixin-based implementation is also available as a third-party library: Django Dirty Fields. Once Package inherits from the library’s DirtyFieldsMixin, there is not much explaining to do, so let’s just test it:
The code:
@receiver(post_save, sender=Package)
def register_status_change(sender, instance: Package, **kwargs):
    # get_dirty_fields() maps each changed field to its old value
    dirty_fields = instance.get_dirty_fields(check_relationship=True)
    if "status" in dirty_fields:
        # There is a status change
        PackageStatusHistory.objects.create(package_id=instance.id,
                                            from_status_id=dirty_fields["status"],
                                            to_status_id=instance.status_id)
The result:
>>> from cache_fields.models import *
>>> package = Package.objects.get()
>>> package
<Package: 1: Shipped>
>>> package.status_id
3
>>> package.cached_status_id
3
>>> package.is_dirty(check_relationship=True)
False
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet [<PackageStatusHistory: 1: 1 -> 2>, <PackageStatusHistory: 1: 2 -> 3>]>
>>> package.status_id = 4
>>> package.is_dirty(check_relationship=True)
True
>>> package.get_dirty_fields(check_relationship=True)
{'status': 3}
>>> package.save()
>>> package.status_id
4
>>> PackageStatusHistory.objects.filter(package_id=package.id)
<QuerySet [<PackageStatusHistory: 1: 1 -> 2>, <PackageStatusHistory: 1: 2 -> 3>, <PackageStatusHistory: 1: 3 -> 4>]>
>>>
It works, nice! You can get access to the whole code from this repository. Feel free to add your ideas/suggestions!