Thursday, May 31, 2007

NTTMCL is hiring

NTTMCL is looking for software engineers (but then, who isn't right now?). We are a small research-and-development subsidiary of NTT Communications of Japan. I won't rehash our company profile because it is all on our web site.

We use and develop a lot of different technologies, so while the current positions are focused on wireless, encryption, and VoIP, one of the great things about working at NTTMCL is that you have the chance to work on many different projects beyond those you were originally hired for. I will add that we are far more focused on development than research, with the proportion of full-time engineers to full-time researchers somewhere around 5:1.

The positions ask for C, C++, or Java experience, but we also have at least as much Python and Perl code in use on various projects. Which is to say that being flexible is quite an asset here at NTTMCL. Just knowing/using languages isn't the point: the point is having a large tool chest to pull from and being able to identify which tool is the best tool for a job. Because we are an R&D company, management is relatively open to trying new tools, languages, etc. if you can justify the choice by explaining how it gets the job done better than the alternatives.

Speaking of which, I've been meaning to write up a post about how much I like my job, but it turns out my co-worker Zach has already beaten me to it. Hopefully, I'll still get around to writing up my own thoughts based on my 5+ years at NTTMCL sometime soon.

Thursday, May 24, 2007

XML-RPC patented

Ever being the astute one, I just now discovered that webMethods was awarded a patent for XML-RPC back in April of last year. They don't seem to be cracking down yet, but could XML-RPC be the next GIF/LZW controversy?

It would be hard to compose an unencumbered alternative to XML-RPC when the first claim of the patent reads:
A method of communicating between first and second machines, said method comprising the steps of: generating a message at a first machine including at least one argument and a type label for said argument; and transmitting said message from said first machine.

Since SOAP was developed by Microsoft, it goes without saying that it is patented too. I guess I need to start converting my XML-RPC clients and servers to JSON-RPC to be on the safe side.

Monday, May 21, 2007

Python: islambda()

The Python inspect module provides functions that determine whether objects are methods, functions, classes, modules, etc. However, there is no function that tells you whether something is a lambda expression. isfunction() is probably close enough for most applications, but believe it or not, I recently encountered a case where it made sense to issue a warning if an argument was a lambda expression.

Here is a Python function that determines whether or not its argument is a lambda expression:

import inspect
def islambda(f):
    return inspect.isfunction(f) and \
           f.__name__ == (lambda: True).__name__

Currently, the __name__ of anonymous functions created by lambda is "<lambda>", so I could have just hard-coded that string into the comparison. But I chose to use the (lambda: True).__name__ expression instead just in case python uses a different name for lambda expressions in the future.

This function works as expected in all common cases:

>>> islambda(lambda: 1)
True
>>> islambda(islambda)
False
>>> islambda(globals)
False
>>> islambda(str)
False
>>> islambda(str.join)
False
>>> islambda("".join)
False

The only case that I am aware of where it will not work is if you carefully craft a function with the name "<lambda>":

>>> import new
>>> x = new.function(
...     compile("print 'Hello World!'", "<string>", "exec"),
...     {}, '<lambda>')
>>> islambda(x)
True

Consider yourself warned. :)
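The new module used above only exists in Python 2. As a rough sketch for Python 3 (the helper name ordinary is made up for illustration), the same false positive can be reproduced more simply, because a function's __name__ is writable:

```python
import inspect

def islambda(f):
    # Same check as above, written for Python 3.
    return inspect.isfunction(f) and f.__name__ == (lambda: True).__name__

def ordinary():
    # Hypothetical, perfectly normal named function.
    return "Hello World!"

before = islambda(ordinary)     # ordinary still carries its own name
ordinary.__name__ = "<lambda>"  # rename it to look like a lambda
after = islambda(ordinary)      # the name-based check is now fooled
```

So the check is only as trustworthy as the __name__ attribute, which is the same caveat as with the crafted function above.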

In case you are curious, the application I was working on had a method that took a callable as an argument and held a weak reference to it. I would have loved to have been able to issue a warning anytime a callable was passed that would "immediately" be garbage collected before anything useful was done, but that is a non-trivial condition to detect (hint: it involves reading the programmer's mind). But there is a common subset of that error case that is relatively easy to detect: callers passing their only reference to a lambda expression. That case can be trivially detected using the islambda() function described above along with the sys.getrefcount() function like so:

import sys
from warnings import warn
...
def myfunc(f):
    if sys.getrefcount(f) == 3 and islambda(f):
        warn('f is too short-lived to be useful', stacklevel=2)
    ...

Since it is not obvious, I should point out that (in this example) a reference count of 3 indicates that myfunc()'s caller holds no references to the callable f. The reason is that sys.getrefcount() will hold one reference, the name f is bound to one reference, and there is a temporary reference held by the python interpreter across the call to myfunc(), so if sys.getrefcount() returns 3, we know those are the only three references.
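The absolute number returned by sys.getrefcount() is a CPython implementation detail and may differ between interpreter versions, so treat the literal 3 as specific to the interpreter this was written for. A hedged sketch (Python 3 syntax; refs_seen and holder are hypothetical names) that compares against a measured baseline instead of a hard-coded constant:

```python
import sys

def refs_seen(f):
    # Reference count of f as observed from inside the callee.
    return sys.getrefcount(f)

# The list keeps its own reference to the function it contains.
holder = [lambda: None]

# Caller passes its only reference: nothing else keeps the lambda alive.
count_temporary = refs_seen(lambda: None)

# Caller still holds a reference (via the list) across the call.
count_held = refs_seen(holder[0])

# The held callable should show a higher count than the temporary one.
caller_keeps_reference = count_held > count_temporary
```

Measuring the baseline with a throwaway lambda at the same call depth avoids depending on exactly how many temporary references the interpreter holds during a call.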

Incidentally, the fact that islambda() erroneously identifies a function with the name "<lambda>" as a lambda expression is inconsequential for my stated purpose: if the crafted function has no other references, I want to issue a warning just the same as if it had truly been a lambda expression.

Which brings me back to isfunction(). It turns out, not surprisingly, that isfunction() is sufficient for my needs since a function with only 3 references has, by definition, no external references. In the end, I didn't actually use my islambda() function and went with isfunction() for my application instead:

import inspect
import sys
from warnings import warn
...
def myfunc(f):
    if sys.getrefcount(f) == 3 and inspect.isfunction(f):
        warn('f is too short-lived to be useful', stacklevel=2)
    ...

This handles both lambda expressions and functions dynamically created using the new module.
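For context, here is a minimal sketch of the failure mode the warning is meant to catch. The Registry class is a hypothetical stand-in for the application, not its real code, and the behaviour shown assumes CPython, where an object is freed as soon as its last strong reference goes away:

```python
import weakref

class Registry(object):
    """Hypothetical stand-in: remembers a callable only through a weak reference."""

    def __init__(self, callback):
        self._ref = weakref.ref(callback)

    def fire(self):
        callback = self._ref()
        if callback is None:
            return None  # the callable has already been garbage collected
        return callback()

def named():
    return "called"

lost = Registry(lambda: "called")  # caller passed its only reference
kept = Registry(named)             # module namespace still references named

lost_result = lost.fire()   # nothing left to call
kept_result = kept.fire()   # the function is still alive
```

The lambda is gone by the time fire() runs, which is exactly the case where a warning at registration time would have helped.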

Sunday, May 20, 2007

Python Pitfall: Not all objects are created equal

I've been entertaining the idea of writing a series of posts about Python warts for a couple of weeks now. Overall, python is a remarkably consistent programming language, but there are a few edge cases that people should be aware of. My hope is that, by pointing out their existence, others can save themselves a rude surprise. I've decided to call the series "python pitfalls".

So here goes my inaugural post: not all objects are created equal in python. New-style classes (which are only "new" in the sense they were introduced in python 2.2 which is quite old now) all inherit from the base object class. For example, consider the following simple class:

>>> class MyObject(object):
...     pass

This is about as simple a class as you can make in Python; MyObject inherits all of its behaviour from the object base class. Now, let's set an attribute on an instance of our new class:

>>> b = MyObject()
>>> b.myattr = 42
>>> print b.myattr
42

Nothing fancy here. But recall that our MyObject class adds nothing to the base object class, implying that the ability to set arbitrary attributes on an instance must originate with the object class's implementation. Let's give it a try:

>>> a = object()
>>> a.myattr = 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'myattr'
>>> setattr(a, 'myattr', 42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'myattr'

What is going on here? We were able to set attributes on instances of MyObject, but not on instances of object itself? That's odd: MyObject inherits all of its behaviour from object, so they should be exactly the same!

My first guess was that the object class had a __slots__ attribute restricting which attributes could be set on it (see this article for an explanation of __slots__). One of the properties of __slots__ is that, unlike most other class attributes, it is not inherited by subclasses. Which would explain why we can set arbitrary attributes on instances of MyObject, which is a subclass of object, but not on instances of object itself. However, to my surprise, object does not define a __slots__ attribute:

>>> '__slots__' in dir(object)
False
>>> dir(object)
['__class__', '__delattr__', '__doc__', '__getattribute__',
'__hash__', '__init__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__', '__str__']

Look: no __slots__! So that isn't it.

As far as I can tell, the fact that instances of object do not allow attributes to be set on them is simply an implementation artifact. They should, but they don't. Go figure. Luckily, there is seldom need to create instances of object directly; the class really just exists as a base class for deriving new-style classes. But I do still find it odd that subclassing object somehow adds functionality not present in the base class.
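One way to see where the difference comes from, at least on CPython, is to look for the per-instance __dict__: instances of a subclass defined without __slots__ get one, while bare object instances do not. A small sketch in Python 3 syntax (the Slotted class is added here purely for comparison):

```python
class MyObject(object):
    pass

class Slotted(object):
    # Declaring empty __slots__ suppresses the per-instance __dict__.
    __slots__ = ()

plain = object()
derived = MyObject()

plain_has_dict = hasattr(plain, "__dict__")      # bare object instance
derived_has_dict = hasattr(derived, "__dict__")  # subclass instance

derived.myattr = 42
stored = dict(derived.__dict__)  # the attribute lives in the instance dict

try:
    Slotted().myattr = 42
    slotted_accepts = True
except AttributeError:
    slotted_accepts = False
```

Seen this way, the subclass is not inheriting the ability from object; the class statement adds the instance dictionary unless __slots__ says otherwise.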

Thursday, May 17, 2007

XML-RPC namespace

Is it just me or does XML-RPC appear to not have a defined XML namespace?

I realize that most people are using XML-RPC as the entire XML document, so namespaces probably don't matter to them. But I'm implementing a job scheduler in which I am separating the scheduling logic from the job handling logic. I do this by defining a job as an XML document consisting of scheduling parameters and an XML-RPC method invocation. When it is time to execute the job, the scheduler acts as an XML-RPC client, making the method call to the job handler process (which is an XML-RPC server). In this context, my job's XML document consists of tags from two namespaces: my job scheduler tags and XML-RPC tags. XML namespaces solve the ambiguity, but XML-RPC doesn't appear to have a well-defined namespace.

Since this is a slightly out-of-the-ordinary usage of XML-RPC, perhaps an example would help illustrate the problem:

<?xml version="1.0" ?>
<job xmlns="http://www.nttmcl.com/kelly/jobsched">
  <!-- Schedules are expressed in ISO-8601 notation. -->
  <!-- For example, the following schedule says to
       repeat 5 times 1 minute apart, starting on May
       18th 2007 at midnight UTC. -->
  <schedule>
    R5/2007-05-18T00:00:00Z/PT1M
  </schedule>
  <action>
    <methodCall xmlns="">
      <methodName>ExecProcess</methodName>
      <params><param><value>
        <struct>
          <member>
            <name>path</name>
            <value>/bin/echo</value>
          </member>
          <member>
            <name>args</name>
            <value>
              <array><data>
                <value>This is a test</value>
              </data></array>
            </value>
          </member>
        </struct>
      </value></param></params>
    </methodCall>
  </action>
</job>

As you can see, I'm currently just resetting the XML namespace for the XML-RPC <methodCall> tag (and all nested sub-tags) to an empty namespace. If XML-RPC has a well-defined namespace, however, I would feel much better using that. If anyone is aware of the proper namespace declaration for XML-RPC, please let me know. If not, I will gladly offer a unique URL in my personal domain for use as the official XML-RPC identifier: http://www.posi.net/xml-rpc/.
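To illustrate how the empty-namespace reset plays out on the consuming side, here is a rough sketch using Python 3's standard library (xml.etree.ElementTree and xmlrpc.client). It is not the scheduler's actual code; JOB_XML is a condensed copy of the document above and the variable names are made up for the example:

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

JOB_NS = "http://www.nttmcl.com/kelly/jobsched"

JOB_XML = """<?xml version="1.0" ?>
<job xmlns="http://www.nttmcl.com/kelly/jobsched">
  <schedule>R5/2007-05-18T00:00:00Z/PT1M</schedule>
  <action>
    <methodCall xmlns="">
      <methodName>ExecProcess</methodName>
      <params><param><value>
        <struct>
          <member><name>path</name><value>/bin/echo</value></member>
          <member><name>args</name><value>
            <array><data><value>This is a test</value></data></array>
          </value></member>
        </struct>
      </value></param></params>
    </methodCall>
  </action>
</job>"""

root = ET.fromstring(JOB_XML)

# Scheduler tags are qualified with the job namespace...
schedule = root.find("{%s}schedule" % JOB_NS).text.strip()

# ...while xmlns="" leaves <methodCall> and its children unqualified.
method_call = root.find("{%s}action/methodCall" % JOB_NS)

# The unqualified subtree can be handed to the XML-RPC machinery as-is.
params, method_name = xmlrpc.client.loads(ET.tostring(method_call))
```

Because the XML-RPC subtree carries no namespace at all, it serializes back to exactly what an ordinary XML-RPC parser expects.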

Patents and innovation

One of the first rules of science is to know what you are measuring. Which is why I am surprised that The Economist has supposedly issued a report claiming that Japan is the most innovative country in the world (the U.S. is in third place after Japan and Switzerland). How did they arrive at this conclusion? By comparing the number of patents per capita.

Which would seem to ignore the fact that different countries have different patent laws and different degrees of patent protection. Imagine for a second that U.S. patent law did not allow for making multiple claims in a single application; it is easy to see how the number of U.S. patents could be several times its current number.

Which, of course, still avoids the question of whether the number of patents is any indication of innovation. Apparently The Economist thinks so. On the surface it would appear that Microsoft would agree. But Microsoft's own Bill Gates has admitted that software companies must accumulate as many patents as possible to defend against and wage patent wars against other companies. That isn't any definition of innovation that I am aware of. In any event, if such patent warchest accumulation is inflating the U.S. numbers can we assume that 3rd place in world innovation might be too generous? Or might other countries be experiencing similar phenomena?

Given all the more tangible measures of innovation (venture capital funding, entrepreneurship, new product development, etc.), I cannot help but find the selection of the abstract, easily gamed number of patents issued to be an insincere -- if not outright misleading -- metric. Which begs the question: for what gain?

Friday, May 11, 2007

Brain-teasers make bad interview questions

The other day BitTorrent posted a job listing to the BayPiggies mailing list that included the following brain-teaser:
What is the exponent of the largest power of two whose base seven representation doesn't contain three zeros in a row?

Normally, I hate "brain-teaser" type interview questions. I make it a point to never ask brain-teasers in interviews and don't put too much weight on candidates' answers when my fellow hiring managers ask them. The reason is simple: the typical brain-teaser is too easy to game.

Take BitTorrent's question, for example: the answer is 8833.

BitTorrent asked for python code to solve the problem, but that is a red herring. You cannot solve the problem in python. Sure, you could write a program like the following that prints the exponents of all of the powers of 2 that do not have three consecutive zeros in their base-7 representation:
def base7(x):
    digits = []
    while x != 0:
        x, r = divmod(x, 7)
        digits.append(str(r))
    return ''.join(reversed(digits)) or '0'

i, x = 0, 1
while True:
    if '000' not in base7(x):
        print i,
    i += 1
    x += x

But BitTorrent asked for the largest power of two that meets the requirements. How do you know when a number printed by the above program is the largest? You don't! There are infinitely many numbers and this program has to test them all. That'll take a while.
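To make the point concrete, here is a bounded variant of the same search, sketched for Python 3. The function name and the limit parameter are my own additions; the cut-off is arbitrary, and whatever it returns says nothing about exponents beyond it, which is precisely the problem:

```python
def exponents_without_triple_zero(limit):
    """Exponents n < limit for which 2**n has no '000' in base 7."""
    digits = [1]  # base-7 digits of 2**0, least significant first
    found = []
    for exponent in range(limit):
        # Digit order does not matter when looking for three zeros in a row.
        if '000' not in ''.join(map(str, digits)):
            found.append(exponent)
        # Double the number in place, digit by digit, carrying in base 7.
        carry = 0
        for k in range(len(digits)):
            carry, digits[k] = divmod(2 * digits[k] + carry, 7)
        if carry:
            digits.append(carry)
    return found

small = exponents_without_triple_zero(9)
```

The last element of the returned list is only the largest exponent found so far; deciding whether the search can stop takes an argument that no finite run of the loop supplies.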

The full solution to BitTorrent's problem requires you to find the upper bound of the search -- which, despite their requirement to use python, cannot be done with python code: you have to solve it using math. Which implies that BitTorrent actually wants their ideal candidate to be a mathematician first and foremost. That isn't necessarily a bad thing; I don't know anything about BitTorrent's business, maybe they need people with particularly strong math backgrounds (much stronger than mine, that is).

But there is actually another way to obtain the solution: use Google. So knowing how to work a search engine will get you in the door just as effectively as having an advanced math degree. Go figure.

Neither skill says anything about the candidate's programming ability or knowledge of python in particular. In fact, I would argue that attempting to "solve" the problem by writing a python program without including the proof supporting your solution should disqualify the candidate as demonstrating that they don't understand the problem. Writing code without understanding the problem is well into Code Cowboy territory.

Anyway, the point I wanted to make was that it has been my experience with "brain-teasers" that at best they give the interviewer some vague idea of the candidate's problem-solving ability. Far more often, however, they randomly filter out people who may be perfectly good programmers but who didn't happen to have memorized (or found online) the answer to the particular brain-teaser you throw at them.

When I do interviews, I figure I only have a limited time to interview a potential employee. I would rather spend that time asking questions specific to software engineering, computer programming, or the problem domain we are hiring for. I always prepare a progression of real-world questions ranging from "easy" to "guru", with no expectation of ever asking the "guru" questions -- my intention is to see what the candidate's level of ability is and, more importantly, to see how they handle a question or two slightly beyond their ability. Maybe it's because I'm not too bright myself, but that tells me much more about the candidate's suitability for a position than an abstract brain-teaser ever does.

Thursday, May 10, 2007

(^^)

For the past year or so, it has been the trend in Japanese TV advertising to include search keywords, rather than URLs, in commercials. The ads typically use a fairly unique, sometimes made-up, word, appearing in a search-engine like box near the end of the commercial. Presumably using a keyword is intended to make it easy to remember; obviously they use odd or made-up words to increase the odds that the advertiser's site comes up first in the search results. I guess the concept of Google Bombs hasn't made it to Japan yet (I say in jest; currently there are half a million hits for "Google Bomb" and another 14400 hits for グーグル爆弾).

Anyway, it looks like the marketing team at Coca Cola Japan put the cart before the horse and has started running ads telling viewers to search for "(^^)" to learn more. It is the left-most commercial under "TVCM Collection" on Coke's site (with the girl smiling in front of a brick wall); I would post a link to the commercial directly, but the site is all flash. Near the end you see a search box at the bottom of the screen with "(^^)" in it.

The only problem is that most search engines don't let you search for all-punctuation keywords! 0 hits on Google; not even Google Japan. 0 hits on Yahoo! and live.com too. Oops.

Besides, "(^^)" is a Japanese emoticon (aka "smiley"). Assume search engines did return results for all-punctuation keywords -- could you imagine the number of results that would come up for something like ":)"? In other words, the keyword isn't even unique, making it a terrible keyword even if you could actually search for it!

To Coca Cola Japan's credit, it looks like they had the sense to at least make special arrangements with Yahoo! Japan. Too bad it isn't still 1997. You'd think they would have at least tried their own search keyword in the other search engines (even the small fries) before they paid millions to promote their site using that keyword.

Wednesday, May 9, 2007

Buddhabrot


Back in the early nineties I (or I should say, my 286/12) spent many a night rendering fractals using the Fractint DOS program. Like probably everyone else into fractals, I spent a large part of my time exploring the Mandelbrot set -- one of the best known, most easily recognized fractals.

Which is why I was both amazed and amused to learn of the Buddhabrot a couple of days ago. The same iterative series is used as for the Mandelbrot set, but the image is colored based on the density plot of the trajectories that escape to infinity. As you can see, the images are similar to those of the Mandelbrot set, only more detailed.
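For the curious, the density-plot idea can be sketched in a few lines of Python 3. This is a toy version under my own assumptions (random sampling of c, a 4x4 plot window, tiny grid and iteration limit), not how any particular renderer does it:

```python
import math
import random

def buddhabrot_counts(size=32, samples=5000, max_iter=100, seed=0):
    """Return a size x size grid counting visits by escaping trajectories."""
    rng = random.Random(seed)   # seeded so repeated runs give the same grid
    grid = [[0] * size for _ in range(size)]
    low, span = -2.0, 4.0       # plot window: -2 to 2 on both axes
    for _sample in range(samples):
        c = complex(rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
        z, path = 0j, []
        for _step in range(max_iter):
            z = z * z + c       # the usual Mandelbrot iteration
            path.append(z)
            if abs(z) > 2.0:    # this trajectory escapes
                # Record every point visited on the way out.
                for p in path:
                    col = math.floor((p.real - low) / span * size)
                    row = math.floor((p.imag - low) / span * size)
                    if 0 <= col < size and 0 <= row < size:
                        grid[row][col] += 1
                break
    return grid

counts = buddhabrot_counts(size=16, samples=500, max_iter=30)
```

Trajectories that never escape within max_iter contribute nothing, which is the opposite of how the Mandelbrot set itself is usually colored. Rows here follow the imaginary axis; transposing the grid puts the real axis vertical.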

I was simultaneously amused, not just by the clever name, but by the fact that you have to swap the traditional axes such that the real axis is vertical and the imaginary axis is horizontal -- in other words, you have to use a non-standard orientation to make it look like a "buddha" at all. Which just goes to show that fractals are as much art as they are math.

Tuesday, May 8, 2007

Sample Podcast RSS file

When I was putting together a podcast of my favorite Japanese-language radio program last month, it took me several tries to get the RSS tags right. Even with Apple's explanation of where the tags' contents are presented in iTunes' podcast display, I still couldn't get it right the first time. On top of that, even once I was satisfied with how it looked in iTunes, I didn't like how the episodes were presented in the iPod U.I.

I'm not going to bore you with the details, but in the hope of helping others avoid the same trial-and-error process I went through, I put together a sample podcast feed in which the tags' contents are the names of the tags themselves.

If you subscribe to the feed [link only works if you have iTunes installed], you'll quickly see what I mean. Click around the iTunes U.I. and the podcast and episode properties and you can learn how each tag is displayed. Also, sync it to your iPod and you can see how the tags are displayed in the iPod U.I.

Unfortunately, my sample podcast does not validate with feedvalidator.org. The reason is simple, though: it doesn't like the tags' contents being the tag names themselves. For example, the <itunes:email> tag should contain an e-mail address; my sample podcast has the text "<channel><itunes:owner><itunes:email>" in it. While this makes it clear which tag's contents you are seeing in the U.I., you'll obviously want to fill them in with real data in your podcast. Just be sure you validate your podcast's RSS feed when you are done.
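If you would rather generate such a self-labelling feed than copy mine, here is a minimal sketch using Python 3's xml.etree.ElementTree. The add() helper and the handful of tags chosen are illustrative assumptions, not the full set from my sample feed:

```python
import xml.etree.ElementTree as ET

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"
ET.register_namespace("itunes", ITUNES_NS)

def add(parent, path, tag, ns=None, leaf=True):
    """Append a child tag; leaf tags get their own tag path as text."""
    qname = "{%s}%s" % (ns, tag) if ns else tag
    path = path + "<%s>" % (("itunes:" + tag) if ns else tag)
    child = ET.SubElement(parent, qname)
    if leaf:
        child.text = path
    return child, path

rss = ET.Element("rss", version="2.0")
channel, channel_path = add(rss, "", "channel", leaf=False)
add(channel, channel_path, "title")
add(channel, channel_path, "author", ns=ITUNES_NS)
owner, owner_path = add(channel, channel_path, "owner", ns=ITUNES_NS, leaf=False)
add(owner, owner_path, "email", ns=ITUNES_NS)
item, item_path = add(channel, channel_path, "item", leaf=False)
add(item, item_path, "title")
add(item, item_path, "subtitle", ns=ITUNES_NS)

feed_xml = ET.tostring(rss, encoding="unicode")
```

As with my hand-written sample, the result is for exploring the U.I. only; a real feed needs real values (and enclosures) before it will validate.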

Thursday, May 3, 2007

Crayons and coloring books

In one of my first blog posts, I lamented how I was frustrated with web frameworks and disappointed by the lack of libraries. I suppose this may be a result of my Un*x development background in which the mantra has been "tools not policy". Frameworks are all about policy.

Just recently I read an amusing analogy that reflected the feeling I got when investigating python web frameworks. Which got me thinking again: why is it that frameworks feel so wrong to me?

Then this morning I finally found the words to express my frustration: Frameworks are coloring books; libraries are crayons.

If you want to draw a picture, crayons are a tool to do it. You can draw whatever you like, in whatever color you like. And the final result, depending on your drawing ability, will look exactly as you intended.

Or you can flip through coloring book after coloring book for a picture that looks vaguely like what you had in mind. You may find what you want quickly, or you may spend days and still not find what you really wanted but get sick of looking so settle on something "good enough". The good news is, once you settle on a picture, you now have a rough idea what your result is going to look like.

Now you are free to color it in whatever color you like (notice that you still need crayons for the coloring). Assuming you were able to stay within the lines, you'll end up with a pretty consistent result. The colors are yours, but the overall image is that of the coloring book picture you started with.

Don't get me wrong: a coloring book can save you a lot of work. For better or for worse, you can start coloring with a pretty good idea of what your picture is going to look like. But what if I don't want to rehash the same picture that has been colored in by a hundred other people before me? You know, what if I want to actually create something new? Well, I've seen people color outside the lines, and it is rarely attractive.

Perhaps the crayon analogy is too pedestrian: try oil paintings instead. Frameworks would be paint-by-numbers templates; libraries would be oil paints. If you are just getting started in oils, a paint-by-numbers template can help you make a nice painting pretty quickly. It won't be original, it won't be great, but you'll still probably get compliments on it from your friends and family.

That said, if I want to make a web site just like a hundred other web sites (with different colors, of course!), a web framework would probably save me a lot of time. For example, if I were making a content management system, especially for a newspaper, Django would be a good framework for the job.

Now there seem to be a lot of database-backed, simple CRUD web applications. Which explains why so many web frameworks make (or try to make) CRUD easy. And along the same lines, a lot of little girls like ponies so there are a lot of coloring books with pictures of ponies; one pony with its head up, or two ponies, or ponies eating grass. If I wanted to draw a pony, I would have an embarrassment of riches. With a little creativity, I could even draw a unicorn.

And therein lies the rub: I don't want a pony.

Which explains why frameworks don't feel right to me: they aren't. There is nothing wrong with frameworks per se; they just aren't right for me. Which still leaves me with the question of why good libraries are so hard to find, but I already covered that peeve in my first rant.