Thursday, December 30, 2004

Anonymous functions in Python

Python currently supports anonymous functions using the lambda keyword. This is a rather ugly beast, and I've yet to find anyone who actually likes the syntax, rather than tolerating it because they want to be able to use anonymous functions. It also forces non-mathematicians to learn that mathemetaticians and functional programmers seem to like calling anonymous functions lambdas, for reasons known only to them.

GvR has stated that he wants to get rid of lambda for Python 3.0. His main reasons seem to be that he dislikes the restriction to a single expression, and that he dislikes the current syntax. The question then, is whether a more Pythonic syntax for anonymous functions can be found to replace the current lambda, and whether the restriction to a single expression is really a problem.

I believe Python 2.4's generator expressions provide good guidance on the correct attitude towards 'anonymous functions as expressions'. Generator expressions are related to for loops similarly to the way anonymous functions are related to full function definitions. Firstly, generator expressions are restricted to a single expression in the "body" of the for loop. They also include an implied yield statement. That is, the following two pieces of code are equivalent (neglecting namespace effects):
sum(x * x for x in seq)

def squares(seq):
  for x in seq:
    yield x * x
sum(squares(seq))
Nobody seems to complain about the fact that generator expressions are restricted to a single expression inside the for loop. Instead, they are extremely happy about the fact that they can do their simple for loop inside an expression instead of breaking it out into a separate generator. I believe the same attitude should apply to anonymous functions - if something is simple enough to express with a single expression, it may be simple enough to embed inside another expression (such as a function call). If it cannot be expressed with a single expression, it is almost certainly too complicated to be embedded inside another expression.

The other argument in favour of the conceptual integrity of restricting anonymous functions to a single expression is Python's distinction between suites (which contain statements), statements (which contain expressions, statements or suites) and expressions (which contain only other expressions). Allowing a suite inside an anonymous function would break the concept of "expressions can only contain other expressions". Even if it turned out to be possible, it would do horrible things to Python's grammar, and open the door to some seriously unreadable code. Even when restricted to a single expression, overuse of lambdas can already lead to incomprehensible code (abuse of generator expressions can also lead to unreadable code, so don't take that last comment as an argument against allowing anonymous functions).

Even if you agree with me that restricting anonymous functions to a single expression is legitimate, that still leaves the question of "Why not just define a named function?". This is certainly GvR's standard response when questioned about the removal of lambda in Python 3.0. To my mind, the killer app for a clean anonymous function syntax is lazy evaluation of function arguments - only performing a calculation if the function actually needs that value. Another, rarer, use is the ability to have a generator expression which yields a sequence of functions. My examples will be based on these two use cases.

The standard mechanism for lazy evaluation in Python is to write a function that accepts a zero-argument callable instead of the argument we want lazily evaluated. If the function actually needs the relevant argument, it invokes the callable and uses the returned value. This approach is very clean when the caller has a function on hand that produces the desired result. When they do not, the caller must create a zero-argument function to be passed as the lazy argument. This is generally either a lambda or a named function created specifically for the purpose. Removing lambda eliminates the choice - you must use a named function. That approach, however, gets rather silly if the caller has the actual desired argument value on hand:
accepts_lazy_arg(lambda: val)

def ret_val():
  return val
accepts_lazy_arg(ret_val)
The canonical use cases for lazy evaluation, of course, are short circuiting versions of functions which implement conditional expressions and switch statements.

Moving on to the second use case, consider the toy problem of creating a list of incrementors - functions that add differing amounts to their arguments. With anonymous functions, this can be done with an expression, without them, it requires several statements:
funcs = [(lambda x: x + i) for i in range(10)]

def incrementors():
  for i in range(10):
    def incrementor(x):
      return x + i
    yield incrementor
funcs = list(incrementors)
There are legitimate use cases for anonymous functions. I don't use them very often, but when I do use them, it would be a genuine pain to work around not having them. So I would be very disappointed to see them disappear completely in Python 3.0. However, where I agree with GvR entirely is that the current syntax is as ugly as sin - I sometimes don't use lambda when it might be useful, simply because it is so ugly and un-Pythonic. That means it must be time to move on to a syntax proposal.

The proposed syntax is based on the idea of functions as mappings from tuples of arguments to tuples of results. In mathematical terms, a function maps from a given domain (e.g. the Cartesian product of the real numbers) to a given range (e.g. negative pi inclusive to positive pi exclusive). Anonymous functions cover only those cases where the result tuple can be obtained from the input tuple using Python expressions. If you need something more complex, switch to using a named function (just as generator expressions require you to switch to a named generator if either the desired result or the filtration condition cannot be written as Python expressions)

One of the problems with lambda is that we already have a perfectly good function keyword in def. So the proposed syntax uses def in an expression context (where it is currently illegal). Another problem I personally have with lambda is that it embeds a colon in the middle of an expression, which I find makes it difficult to parse the rest of the expression. So the proposed syntax uses the new keyword to instead. The existing pseudo-keyword as was considered, but it is already overloaded with enough uses, and the word to better fits the above interpretation of the meaning of anonymous functions. One of GvR's criticisms of lambda that I agree with is that it doesn't require parentheses arounds its argument list. So the proposed syntax requires parentheses around the argument list. Parentheses surrounding the entire anonymous function will be required, even as an argument to a single argument function call. This avoids ambiguity problems with returning a tuple from the anonymous function. All of which gives something like the following equivalent pieces of code:
accepts_func((def (a, b, c) to f(a) + g(b) - h(c)))

def f1(a, b, c):
  return f(a) + g(b) - h(c)
accepts_func(f1)
The proposed syntax can be read as "define an anonymous function from arguments a, b and c to the result f of a plus g of b minus h of c". Or, in a shorter form, "def from a, b and c to f a plus g b minus h c". An earlier version of this post actually contained a bug in the named function version - I had incorrectly used the name 'f', accidentally creating a recursive function. Anyway, using the proposed syntax, the examples above become:
accepts_lazy_arg(def () to val)

funcs = [(def (x) to x + i) for i in range(10)]
An idea worth toying with is whether the argument list and the to keyword should be optional when the argument list is empty. This makes calls to functions which take lazy arguments extremely clean - just take whatever the argument would have been using immediate evaluation, prepend "(def " and append ")". However, that approach may not be possible given the constraints of CPython's simple parser.

Finally, no discussion that covers lazy evaluation would be complete without showing how conditional expressions with short-circuiting behaviour would look. The example usages assume a syntax which allows "() to" to be omitted.
def either(condition, true_case, false_case):
  if condition:
    return true_case()
  else:
    return false_case()

print either(A == B, (def "A equals B"), (def "A does not equal B"))
either(thefile, (def thefile.close()), (def 0))
Note that this proposal does not add any new capability to the Python language. Instead, it merely aims to provide a more Pythonic syntax for the existing lambda expressions, with the aim of retaining anonymous functions for the hypothetical Python 3.0.

Friday, December 24, 2004

Email Blogging

Just checking it works. Nothing to see here. Move along.

--
Nick Coghlan | ncoghlan@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net

Type-checking in Python

So, Guido has brought up the idea of optional static typing again, posting his current throughts on the idea, as well as noting what he sees as the problem areas.

His favoured syntax is:

def (a: sometype, b: sometype) -> sometype:
  pass

Bleh. The main reason I dislike the current version of anonymous functions is because they embed a colon in the middle of an expression so you can probably guess how I feel about Guido's reuse of the colon here. And I've always quite liked VB's approach of using 'as' to indicate the return type of a function. Anyway, syntax aside, there's a question of what the optional type declarations are actually for.

Now one point to make at the start is that optional typing (checked by the VM) and optional static typing (checked by the compiler) are different things, and it makes some sense to do the former before doing the latter. Once you have a syntax for optional typing, making it static is merely a question of figuring out how to get the compiler to do the type check, instead of the VM. This activity would then blend in with Python's general issue of "how can we move things to the compiler to save run-time activity, without losing too much dynamism?"

Having dropped the static idea for the moment, there's the fundamental question of what does a type declaration mean? Python has historically relied on an approach that says "if it defines the right methods, it's OK by me". This is great for flexibility and code reuse, but plays merry hell with type inferencing systems, and can lead to some exceedingly cryptic error messages when you pass a type that doesn't provide the correct methods (or provides methods with the right names, but the wrong signatures, etc, etc).

Stealing an example I like:

def int_divide(x as Integer, y as Integer) as Integer:
  return x / y

We don't really want x as Integer (or x : Integer in Guido's syntax) to mean isinstance(x, Integer) do we? After all, we'd like this function to work for builtin types, and Python's builtin types won't know anything about this interface we have created. It would be far nicer if the optional typing was just a way of formalising the 'duck typing' that Python currently relies on.

So let's consider something like the interfaces from PJE's PEAK, or Eiffel's idea of conformance (PEP 246, basically). In this case, we have a builtin method adapt() to adapt a given object to a given protocol. I'd suggest the meaning of the example should become:

def int_divide(x, y):
  x = adapt(x, Integer)
  y = adapt(y, Integer)
  return adapt(x / y, Integer)

Objects participate in this scheme as interfaces by definining __adapt__ special methods, and as adaptable objects by defining __conform__ special methods. That way, interfaces and types can be written in any order, and still play well together. For instance, the existing 'adaptation methods' understood by the builtin objects' constructors (i.e. __int__, __str__ and friends) could be incorporated into the system by having the __adapt__ methods of the relevant interfaces invoke the appropriate constructor - if the constructor throws an exception, then the adaptor method converts it to the appropriate adaptation exception.

As mentioned in PEP 246, it would also be possible to have an 'adaptation registry' which mapped from (type, interface) tuples to adaptation methods. While this doesn't really matter to the basic idea of adaptation, it's handy for people trying to integrate code which provides the right interface, but doesn't actually provide the relevant adaptation information (e.g. if it provides a read() method, and the function uses an interface which expects that method).

For containers, it would make sense to have the interfaces be parameterisable (e.g. List(Integer), List(Number), List(int, long) or List() - that last example meaning, "allow a List with any types", since a list which allowed no types wouldn't be very useful). This suggests the concept of interface factories - classes whose instances are themselves interfaces.

For example (assume AdaptationError is a subclass of TypeError that is thrown when an adaptation fails):

class AdaptedOk(Exception): pass

class List(object):
  def __init__(self, *args):
    self._allowed_interfaces = args

  def __adapt__(self, obj):
    try:
      lst = list(obj)
    except Exception, ex:
      raise AdaptationError(str(ex))
    interfaces = self._allowed_interfaces
    if interfaces:
      for i, x in enumerate(lst):
          try:
            for interface in interfaces:
              try:
                lst[i] = adapt(x, interface, None)
                raise AdaptedOk
              except AdaptationError:
                continue
            raise AdaptationError("List element %s does not "
                   "support any allowed interface" % str(x))
          except AdaptedOk:
            pass
    return lst

  def __eq__(self, other):
    return (isinstance(other, type(self)) and
     (self._allowed_interfaces == other._allowed_interfaces))

OK, so PEP 246 combined with syntactic support would give a cleaner mechanism for dynamic type checking. However, it would still be nice to have some sort of static checking for optimisation purposes (if the compiler knows the types at compilation time, it can do all the operator lookups and so forth then, instead of waiting to do the lookups at runtime).

Well, how about a slightly different pair of special methods: __adapt_strict__ and __conform_strict__. The result of strict adaptation guarantees that the result of adaptation is an actual instance of the interface. If an interface defines __adapt_strict__ without defining __adapt__, then Python can be certain that the results of adaptation to that interface will be an instance of that interface.

For example, the builtin types might provide __adapt_strict__ methods, allowing them to be used as interfaces which guaranteed that the result was an instance of the builtin type:

  isinstance(adapt(x, int), int) # Always true
  isinstance(adapt(x, Integer), Integer) # Likely false
  isinstance(adapt(x, Integer), int) # Only possibly true

This can give us static typing, as long as the compiler can check for the existence of __adapt__ and __adapt_strict__ on the supplied interface (e.g. by assuming the names of builtins actually refer to the builtins). Here's some hypothetical implementations of __adapt_strict__ for the builtins object and list:

def object:
  # The rest of object's definition is as normal
  # Naturally this would really be implemented in C. . .
  # Use a class method so any new-style class
  # can automatically be used for strict adaptation
  @classmethod
  def __adapt_strict__(cls, obj):
    if isinstance(obj, cls):
      return obj
    try:
      result = cls(obj)
    except Exception, ex:
      raise AdaptationError(str(ex))
    return result

class AdaptedOk(Exception): pass
def list(object):
  # The rest of list's definition is as normal
  # Naturally this would really be implemented in C. . .
  # Uses an instance method, so we use self
  # to store the list of allowed interfaces
  def __adapt_strict__(self, obj):
    try:
      lst = list(obj)
    except Exception, ex:
      raise AdaptationError(str(ex))
    if self:
      for i, x in enumerate(lst):
        try:
          for interface in self:
            try:
              lst[i] = adapt(x, interface, None)
              raise AdaptedOk
            except AdaptationError:
              continue
          raise AdaptationError("List element %s does not "
                 "support any allowed interface" % str(x))
        except AdaptedOk:
          pass
    return lst

With strict adaptation available, our earlier example of non-strict list adaptation would change to be:

def List(list):
  def __adapt__(self, obj):
    return self.__adapt_strict__(obj)

The rules for adaptation would change slightly from those suggested in the PEP:
  1. If the object is an exact instance of the interface, return it
  2. Try the object's __conform_strict__ method, if it has one. If that works, return the result.
  3. If the interface allows non-strict adaptation (it defines __adapt__), then try the object's __conform__ method, if it has one. If that works, return the result.
  4. Try the interface's __adapt_strict__ method, if it has one. If that works, return the result.
  5. Try the interface's __adapt__ method, if it has one. If that works, return the result.

The new additions are steps 2 & 4, and step 3 has been modified so that it is only tried if the interface implements __adapt__. The idea of restricting step 1 to exact instances is taken from the PEP - it allows subclasses to say "I don't implement my parent's interface" by throwing an exception in its conformation method. The check for instances of subclasses has been moved to the implementation of object.__adapt_strict__. This allows interfaces to decide how they choose to deal with subclasses.

Wednesday, December 22, 2004

Google Footprint

OK, so here's the thing. In the real world, most Australians talking about 'Nick Coghlan' are going to be talking about the guy on Secret Life of Us. If not him, then the surfer from somewhere down south. If we went to Canada, well, there's a diplomat by that name, so at least a few people would know who he is, or would have seen an article or two about him.

Enter the world of Google, though, and an awful lot of it is about me. On the first two pages of a search for "Nick Coghlan", there's just one entry halfway down the second page for the Canadian guy. The other two don't show up until the third page. The results are skewed massively in my favour because of the public mailing list web archives that Google indexes - and every one of my messages to those lists includes my name.

You can really see the effect of this by searching for "Nicholas Coghlan" instead. I disappear from the results, and the TV actor and the Canadian diplomat take control of the show. I don't show up until page 4, on an old announcement from UQ. As you might guess from that, I don't use my full first name very often.

Another interesting search is "nick coghlan -python -cygwin -software". The results are then quite similar to the "nicholas coghlan" results.

Anyway, I guess my only real observation is that, when searching just for names, Google gives extremely high weight to participation in public online communities. Obvious, one might say, but it's an interesting limitation when looking for information on more famous people that happen to share a name with someone like me.

Monday, December 06, 2004

Brain Dead Software #1: Yast Online Update

Suppose a new employee came to work for you. They're asked to collect 20 items from the company warehouse. You come back at the end of the day to find they have retrieved only one of the items. When queried, they say, "Well, the warheouse didn't have the second item on the list, so I came back to see what you wanted to do about it." "OK, fine, where are the other 18 items, then?" "Oh, I didn't try to get those. I needed to ask you about the second item, so I came back here." You'd be justifiably pissed off, and they'd probably be well on their way to getting fired.

Yast Online Update is that employee. If it encounters a problem with any of the packages you ask it to install, it sits there with a freaking dialog box on the screen doing absolutely nothing until you come back and tell it, "Look, just get on with the rest of the downloads already".

To be fair, this problem isn't specific to YOU - YOU just happens to be the most recent example I've encountered. When a computer program is given a list of tasks to do, and encounters a problem with one of them, it should look at the list and continue on with as many of the remaining tasks as it can. At the end, it can present a report detailing any problems encountered, and asking what is to be done about each of them. With Brain Dead Software like YOU, I can't just leave it to run overnight - chances are it will only do useful work for a short while before some glitch causes it to twiddle its thumbs for the rest of the night, waiting for me to wake up and reassure it that everything is fine.

Anyway, that's Brain Dead Software - programs that do things that would get a human fired. I'm sure there'll be more entries in this category.