Sunday, January 14, 2007

Immutable instances in Python

Imagine you have a class like Coordinate that implements a two-dimensional coordinate pair. You might want to use instances of that class in a dictionary, but the problem is that instance keys compare by identity, not equality:

>>> D = {Coordinate(2, 3): "something"} # Coordinate is a custom class.
>>> D.has_key(Coordinate(2, 3)
False

This is not what you expect: even though the two instances of Coordinate(2, 3) have the same value, they don't have the same ID and therefore Python won't treat them as the same dictionary key.

The answer to this problem is to give the class __hash__ and __eq__ methods:

class Coordinate(object):
def __init__(self, x, y):
self.x = x
self.y = y
def __hash__(self):
return hash(self.x) ^ hash(self.y)
def __eq__(self, other):
try:
return self.x == other.x and self.y == other.y
except AttributeError:
return False


Now two instances that compare equal will also have the same hash, and Python will recognise them as the same dictionary key.

But there's a gotcha: unlike built-in types like int and tuple, classes in Python are mutable. That's generally what you want, but in this case it can bite you. If the instance which is the key is changed, the hash will also change and your code will probably experience difficult to track down bugs.

The solution is to make Coordinate immutable, or at least as immutable as any Python class can be. To make a class immutable, have the __setattr__ and __delattr__ methods raise exceptions. (But watch out -- that means that you can no longer write something like self.x = x, you have to delegate that to the superclass.)

class Coordinate(object):
def __setattr__(*args):
raise TypeError("can't change immutable class")
__delattr__ = __setattr__
def __init__(self, x, y):
super(Coordinate, self).__setattr__('x', x)
super(Coordinate, self).__setattr__('y', y)
def __hash__(self):
return hash(self.x) ^ hash(self.y)
def __eq__(self, other):
try:
return self.x == other.x and self.y == other.y
except AttributeError:
return False


There are a few other things you can do as well: as a memory optimization, you can use __slots__ = ('x', 'y') to allocate memory for the two attributes you do use and avoid giving each instance an attribute dictionary it can't use. If the superclass defines in-place operators like __iadd__ etc. you should over-ride them to raise exceptions. If your class is a container, you must also make sure that __setitem__ etc. either don't exist at all or raise exceptions.

(I am indebted to Python guru Alex Martelli's explanation about immutable instances.)

[Update, 2007-04-02: fixed a stupid typo where I called super(Immutable, ...) instead of super(Coordinate, ...).]

2 comments:

alsuren said...

Thanks for the tutorial. Good to know that making objects immutable is as easy as breaking __setattr__ and __delattr__ :D.

One thing strike me though:

1) Your hashing hack is broken, because:
hash(5)^hash(2)==hash(2)^hash(5)
A slightly more reliable way to do it might be:
return hash( ("Coordinate",self.x,self.y) )

What would be really nice is if you could define a function like this:

def setInConcrete(instance):
instance.__delattr__=_RaiseError
instance.__setattr__=_RaiseError

Shame it doesn't work, eh? :P

David
http://alsuren.livejournal.com

Vlad the Impala said...

Thanks for the comment David, glad it was helpful.

As for the __hash__ method, it is true that Coordinate(2, 5) and Coordinate(5, 2) return the same hash function. That's known as a collision, and generally it is a good idea to minimize the number of collisions. But dict lookup will still work, if you have an __eq__ method to resolve collisions.