Showing posts with label python. Show all posts

Saturday, March 15, 2008

RIP Joseph Weizenbaum

Joseph Weizenbaum, the creator of Eliza, has passed away. That led to this amusing exchange on the comp.lang.python newsgroup (compiled from various contributors):

    How do you feel about creator of Eliza?
    What is Eliza?
    Does that question interest you?
    Well played, sir.
    Earlier you said what is Eliza. Do you still feel that way?
    I am embarrassed to say that this vaguely disrespectful exchange made me laugh out loud.
    Does it bother you that this vaguely disrespectful exchange made you laugh out loud?


Linux users wanting to play with Eliza can run the Emacs text editor and choose "Emacs Psychotherapist" from the Help menu.

Sunday, January 14, 2007

Immutable instances in Python

Imagine you have a class like Coordinate that implements a two-dimensional coordinate pair. You might want to use instances of that class in a dictionary, but the problem is that instance keys compare by identity, not equality:

>>> D = {Coordinate(2, 3): "something"} # Coordinate is a custom class.
>>> D.has_key(Coordinate(2, 3))
False

This is not what you expect: even though the two instances of Coordinate(2, 3) have the same value, they don't have the same ID and therefore Python won't treat them as the same dictionary key.

The answer to this problem is to give the class __hash__ and __eq__ methods:

class Coordinate(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __hash__(self):
        return hash(self.x) ^ hash(self.y)
    def __eq__(self, other):
        try:
            return self.x == other.x and self.y == other.y
        except AttributeError:
            return False


Now two instances that compare equal will also have the same hash, and Python will recognise them as the same dictionary key.
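
As a quick check, here is a minimal sketch of the class above used as a dictionary key. It assumes Python 3, which postdates this post: has_key no longer exists there, so the sketch uses the in operator instead. The variable names are only for illustration.

```python
class Coordinate(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __hash__(self):
        return hash(self.x) ^ hash(self.y)
    def __eq__(self, other):
        try:
            return self.x == other.x and self.y == other.y
        except AttributeError:
            return False

D = {Coordinate(2, 3): "something"}

# A second, distinct instance with the same value now finds the entry.
found = Coordinate(2, 3) in D
value = D[Coordinate(2, 3)]

# Comparing against an object without x and y falls into the except branch.
not_equal = (Coordinate(2, 3) == "not a coordinate")
```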

But there's a gotcha: unlike instances of built-in types like int and tuple, instances of ordinary Python classes are mutable. That's generally what you want, but in this case it can bite you. If an instance is changed while it is in use as a key, its hash changes too, the dictionary can no longer find the entry, and your code will probably suffer from bugs that are difficult to track down.
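
The gotcha can be reproduced with a few lines. This is only a sketch, assuming Python 3 and the mutable Coordinate class from above; the variable names are made up for the illustration.

```python
class Coordinate(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __hash__(self):
        return hash(self.x) ^ hash(self.y)
    def __eq__(self, other):
        try:
            return self.x == other.x and self.y == other.y
        except AttributeError:
            return False

key = Coordinate(2, 3)
D = {key: "something"}

# Mutate the key after it has been stored in the dictionary.
key.x = 10

# The entry was filed under the old hash but now compares equal only
# to the new value, so neither lookup succeeds.
lookup_by_old_value = Coordinate(2, 3) in D
lookup_by_new_value = Coordinate(10, 3) in D

# The entry has not gone anywhere; it is just unreachable by value.
entries = len(D)
```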

The solution is to make Coordinate immutable, or at least as immutable as any Python class can be. To make a class immutable, have the __setattr__ and __delattr__ methods raise exceptions. (But watch out -- that means that you can no longer write something like self.x = x, you have to delegate that to the superclass.)

class Coordinate(object):
    def __setattr__(*args):
        raise TypeError("can't change immutable class")
    __delattr__ = __setattr__
    def __init__(self, x, y):
        super(Coordinate, self).__setattr__('x', x)
        super(Coordinate, self).__setattr__('y', y)
    def __hash__(self):
        return hash(self.x) ^ hash(self.y)
    def __eq__(self, other):
        try:
            return self.x == other.x and self.y == other.y
        except AttributeError:
            return False


There are a few other things you can do as well. As a memory optimization, you can use __slots__ = ('x', 'y') to allocate memory for only the two attributes you actually use and avoid giving each instance an attribute dictionary it can't use. If the superclass defines in-place operators like __iadd__, you should override them to raise exceptions. If your class is a container, you must also make sure that __setitem__ and friends either don't exist at all or raise exceptions.
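
Putting the immutable class and the __slots__ suggestion together might look like the sketch below. It assumes Python 3 (the explicit super(Coordinate, self) form still works there), and the try/except scaffolding exists only to show what happens on assignment and deletion.

```python
class Coordinate(object):
    __slots__ = ('x', 'y')  # no per-instance attribute dictionary

    def __setattr__(*args):
        raise TypeError("can't change immutable class")
    __delattr__ = __setattr__

    def __init__(self, x, y):
        # Normal assignment is blocked, so delegate to the superclass.
        super(Coordinate, self).__setattr__('x', x)
        super(Coordinate, self).__setattr__('y', y)

    def __hash__(self):
        return hash(self.x) ^ hash(self.y)

    def __eq__(self, other):
        try:
            return self.x == other.x and self.y == other.y
        except AttributeError:
            return False

c = Coordinate(2, 3)

try:
    c.x = 10
    changed = True
except TypeError:
    changed = False

try:
    del c.y
    deleted = True
except TypeError:
    deleted = False

has_dict = hasattr(c, '__dict__')
usable_as_key = Coordinate(2, 3) in {c: "something"}
```

This is still only "as immutable as any Python class can be": a determined caller can bypass the guard with object.__setattr__(c, 'x', 10).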

(I am indebted to Python guru Alex Martelli's explanation about immutable instances.)

[Update, 2007-04-02: fixed a stupid typo where I called super(Immutable, ...) instead of super(Coordinate, ...).]

Tuesday, January 02, 2007

Asking why on technical forums

If you spend any time on technical mailing lists or newsgroups, you'll often come across conversations that go something like this:

    "How do I frabulate the transfibulator?"

    "Why do you want to do that?"

    "Why do you care? Just tell me how to frabulate the transfibulator!"


Why should people on technical lists care about the why? Why not just answer the question?

Firstly, and most importantly, because people have an ethical obligation not to give bad advice.

It is foolish to assume that every random poster on the Internet or Usenet is a responsible, intelligent, clear-thinking, sufficiently cautious adult who knows what they are doing. In fact, if you were going to play the odds, you'd bet on them being the complete opposite. This is true even on many general-purpose technical mailing lists (although perhaps not so much on the more elite lists). If frabulating the transfibulator carries risks or serious costs, then the chances are very good that the person asking about it isn't aware of those risks.

It is one thing to give a straight technical answer if it seems that the poster knows what they're doing. There's no reason not to tell someone how to shoot themselves in the foot if they are fully aware of the consequences of doing so; it is another thing altogether if their post indicates that they haven't thought it through and have no idea that they are even pointing the gun at their foot.

If somebody asks for help writing a rotor-based encryption engine (like the World War Two Enigma), it would be sheer irresponsibility to answer their technical question without pointing out that Enigma was broken back in the 1940s and is not even close to secure today. So ask "Why do you want to do that?". If the answer is "I'm storing confidential medical records in a database", then you can gently apply the cluebat. It might be your own medical records you prevent from being stolen. But if the answer is "I'm doing it to obfuscate some data in a game, I know this is weak encryption, but it is good enough for a game", then that's a horse of a different colour.

The second reason for asking "why?" is that it is extremely common for people to ask the wrong question because of a misunderstanding or misapprehension. Some time ago I read an exchange of posts on comp.lang.python started by a programmer who was looking for a faster method to access items in a list. Eventually somebody asked him "Why?", and it turned out that he had assumed that Python lists are linked lists and that item access was a very slow procedure. In fact, Python lists are smart arrays, and item access is exceedingly fast.

If folks had merely answered his technical question, he would have solved a non-problem, learnt nothing, and ended up with slow and inefficient code. His real problem wasn't "How do I do this...?". His real problem was that he was labouring under false information, and by asking "Why do you want to do this?", people helped him to solve his real problem.

Wednesday, July 05, 2006

Capturing output of print in Python

The standard Python disassembler, dis, prints its output directly to standard output instead of returning it as a string. This is very inconvenient if you wish to do further processing on the disassembled code, or even if you just want to do something simple like count the number of lines.

Fortunately, it is easy to write a wrapper to capture the output of any Python function:

import sys, cStringIO, traceback

def capture(func, *args, **kwargs):
    """Capture the output of func when called with the given arguments.

    The function output includes any exception raised. capture returns
    a tuple of (function result, standard output, standard error).
    """
    stdout, stderr = sys.stdout, sys.stderr
    sys.stdout = c1 = cStringIO.StringIO()
    sys.stderr = c2 = cStringIO.StringIO()
    result = None
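    try:
        result = func(*args, **kwargs)
    except:
        # Send the traceback to the captured stderr instead of propagating it.
        traceback.print_exc()
    finally:
        # Always restore the real streams (try/except/finally needs Python 2.5+).
        sys.stdout = stdout
        sys.stderr = stderr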
    return (result, c1.getvalue(), c2.getvalue())
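
For readers on Python 3, where cStringIO is gone, the same idea can be sketched with io.StringIO and the redirect helpers in contextlib. This is an assumed modern equivalent rather than the code above; noisy and broken are hypothetical functions used only to exercise it, and the sketch catches Exception rather than using a bare except.

```python
import contextlib
import io
import traceback

def capture(func, *args, **kwargs):
    """Return (result, captured stdout, captured stderr) for func(*args, **kwargs)."""
    out, err = io.StringIO(), io.StringIO()
    result = None
    # The context managers restore sys.stdout and sys.stderr on exit.
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            result = func(*args, **kwargs)
        except Exception:
            # print_exc writes to the current (redirected) sys.stderr.
            traceback.print_exc()
    return (result, out.getvalue(), err.getvalue())

def noisy(n):
    # Hypothetical function that prints and returns a value.
    print("hello")
    return n * 2

def broken():
    # Hypothetical function that always fails.
    raise ValueError("boom")

result, out, err = capture(noisy, 21)
failed_result, failed_out, failed_err = capture(broken)
```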


With the aid of capture it is easy to grab the disassembled code:

import dis

def disassemble(obj=None):
    """Capture the output of dis.dis and return it."""
    return capture(dis.dis, obj)[1]
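
In Python 3 (an assumption; it postdates this post), dis.dis accepts a file argument, so the wrapper is not strictly needed for this particular job. A minimal sketch, with add as a hypothetical function to disassemble:

```python
import dis
import io

def disassemble(obj):
    """Return the output of dis.dis(obj) as a string."""
    buf = io.StringIO()
    dis.dis(obj, file=buf)  # write the listing to buf instead of stdout
    return buf.getvalue()

def add(a, b):
    # Hypothetical function, used only as something to disassemble.
    return a + b

listing = disassemble(add)
line_count = len(listing.splitlines())
```

The exact listing varies between Python versions, so the sketch only counts lines rather than relying on specific opcodes.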

Sunday, July 02, 2006

Python: using print with lambda

In the Python programming language, print is a statement, not a function. (This has been recognised as a design flaw, and will be corrected in Python 3.) One of the disadvantages of that is that you can't use print in anonymous functions using lambda.

However, there are workarounds.

Firstly, the simplest: create your own print function:

def prnt(x):
    """Print object x."""
    print x


And now you can write lambda obj: prnt(obj).

Too easy? Here's a marginally more complex solution. If you know that your code has imported the sys module, you can do this:

lambda obj: sys.stdout.write(str(obj)+'\n')

What if you can't assume that some other piece of code has called import sys for you? There are those who claim that the right behaviour is to raise an exception if sys hasn't been imported, which Python will kindly do for you at runtime. Others prefer to import it when needed:

lambda obj: __import__("sys").stdout.write(str(obj)+'\n')

Of course, the real print statement accepts any number of arguments and prints them separated by single spaces. Can we do that?

A list comprehension makes it easy:

lambda *args: __import__("sys").stdout.write(
" ".join([str(obj) for obj in args])+'\n')
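
To see what that lambda writes, here is a small sketch that binds it to a name and captures its output. The name say and the capturing code are assumptions for the demonstration, and the capture relies on Python 3's contextlib.redirect_stdout; the lambda itself is unchanged.

```python
import contextlib
import io

# The lambda from above, bound to a hypothetical name so it can be called.
say = lambda *args: __import__("sys").stdout.write(
    " ".join([str(obj) for obj in args]) + '\n')

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    say("spam", 42, None)

# Arguments are converted with str() and joined by single spaces.
printed = buf.getvalue()
```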


The __import__ function, like the import statement, is smart about importing modules. If the module has already been imported, it simply returns the existing module object from the sys.modules cache, so importing an already imported module has negligible penalty.
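
The caching is easy to confirm. A minimal sketch (the variable names are only for illustration):

```python
import sys

first = __import__("sys")
second = __import__("sys")

# Both calls hand back the very same module object that import sys bound.
same_object = first is second is sys

# That object is the one held in the sys.modules cache.
cached = sys.modules["sys"] is sys
```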

The real print has subtly different behaviour if you end its argument list with a trailing comma: it doesn't print a trailing newline. Our lambda can't capture that behaviour, but this gives you almost all the functionality of print within an anonymous function.
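
For comparison, once print becomes a function in Python 3, none of these tricks is needed. The sketch below assumes Python 3; show and show_no_newline are hypothetical names, and the output is captured only so that the result can be inspected.

```python
import contextlib
import io

# print is an ordinary function in Python 3, so it works inside a lambda.
show = lambda *args: print(*args)

# end="" suppresses the trailing newline, the job the trailing comma used to do.
show_no_newline = lambda *args: print(*args, end="")

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    show("a", 1)
    show_no_newline("b", 2)

printed = buf.getvalue()
```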