Imagine you have a class like Coordinate that implements a two-dimensional coordinate pair. You might want to use instances of that class in a dictionary, but the problem is that instance keys compare by identity, not equality:>>> D = {Coordinate(2, 3): "something"} # Coordinate is a custom class.
>>> D.has_key(Coordinate(2, 3)
False
This is not what you expect: even though the two instances of Coordinate(2, 3)
have the same value, they don't have the same ID and therefore Python won't treat them as the same dictionary key.
The answer to this problem is to give the class __hash__
and __eq__
methods:class Coordinate(object):
def __init__(self, x, y):
self.x = x
self.y = y
def __hash__(self):
return hash(self.x) ^ hash(self.y)
def __eq__(self, other):
try:
return self.x == other.x and self.y == other.y
except AttributeError:
return False
Now two instances that compare equal will also have the same hash, and Python will recognise them as the same dictionary key.
But there's a gotcha: unlike built-in types like int
and tuple
, classes in Python are mutable. That's generally what you want, but in this case it can bite you. If the instance which is the key is changed, the hash will also change and your code will probably experience difficult to track down bugs.
The solution is to make Coordinate
immutable, or at least as immutable as any Python class can be. To make a class immutable, have the __setattr__ and __delattr__ methods raise exceptions. (But watch out -- that means that you can no longer write something like self.x = x
, you have to delegate that to the superclass.)class Coordinate(object):
def __setattr__(*args):
raise TypeError("can't change immutable class")
__delattr__ = __setattr__
def __init__(self, x, y):
super(Coordinate, self).__setattr__('x', x)
super(Coordinate, self).__setattr__('y', y)
def __hash__(self):
return hash(self.x) ^ hash(self.y)
def __eq__(self, other):
try:
return self.x == other.x and self.y == other.y
except AttributeError:
return False
There are a few other things you can do as well: as a memory optimization, you can use __slots__ = ('x', 'y')
to allocate memory for the two attributes you do use and avoid giving each instance an attribute dictionary it can't use. If the superclass defines in-place operators like __iadd__
etc. you should over-ride them to raise exceptions. If your class is a container, you must also make sure that __setitem__
etc. either don't exist at all or raise exceptions.
(I am indebted to Python guru Alex Martelli's explanation about immutable instances.)
[Update, 2007-04-02: fixed a stupid typo where I called super(Immutable, ...)
instead of super(Coordinate, ...)
.]
Sunday, January 14, 2007
Immutable instances in Python
Posted by Vlad the Impala at 1/14/2007 06:49:00 pm
Labels: programming, python
Subscribe to:
Post Comments (Atom)
3 comments:
Thanks for the tutorial. Good to know that making objects immutable is as easy as breaking __setattr__ and __delattr__ :D.
One thing strike me though:
1) Your hashing hack is broken, because:
hash(5)^hash(2)==hash(2)^hash(5)
A slightly more reliable way to do it might be:
return hash( ("Coordinate",self.x,self.y) )
What would be really nice is if you could define a function like this:
def setInConcrete(instance):
instance.__delattr__=_RaiseError
instance.__setattr__=_RaiseError
Shame it doesn't work, eh? :P
David
http://alsuren.livejournal.com
Thanks for the comment David, glad it was helpful.
As for the __hash__ method, it is true that Coordinate(2, 5) and Coordinate(5, 2) return the same hash function. That's known as a collision, and generally it is a good idea to minimize the number of collisions. But dict lookup will still work, if you have an __eq__ method to resolve collisions.
Great rreading your blog post
Post a Comment