Monday, April 30, 2007

I'm Prolog

I'm not sure how I feel about it, but apparently I am Prolog. What programming language are you?

Thursday, April 26, 2007

How I didn't buy a Mac

Before our move to Japan, my wife and I are considering buying a new, compact computer to take with us. Being the cool kids we are, we stopped by the local Apple Store to see if we liked anything. My wife worked in publishing for years and is pretty comfortable on a Mac; being a former FreeBSD developer, I liked the idea of BSD under the hood.

Anyway, we were pretty well sold on getting a Mac Mini until a thought crossed my mind: how do you uninstall software? I appreciate that Macs come with less circusware than Windows PCs, but when my "Test Drive" version of Microsoft Office or 30 day trial of iWork (sound familiar?) expires, how do I remove that useless trialware from my Mac?

We checked the help file on the Mac we were looking at -- there wasn't even an entry in the help addressing how to remove software. Not under "uninstall", not under "deinstall", or even "remove". In disbelief, we each took a turn visually scanning every single topic in the help file for something that even hinted at telling us how to uninstall a program. If there is an entry in the Mac help on removing software, we couldn't find it.

So we asked an employee. I wish I had a picture of the expression he made. Apparently it had never occurred to him you might ever want to remove software from a Mac. I guess that explains why Macs come with such large hard drives.

Experimenting, we tried just dragging an app into the trash to see if that would uninstall it. It worked: the app went into the trash. But I have no idea whether that meant the application was really uninstalled or not. Apparently, it depends on the app. And that is the answer that we (and the store employee we had baffled with the question) finally arrived at. Removing software on a Mac is really simple, unless it isn't.

On FreeBSD, I can pkg_delete programs I don't want anymore. On Windows they have an Add/Remove Programs item in the Control Panel that gets the job done pretty consistently. I'm sure Linux has got the problem solved several different ways by now. I was honestly surprised that Apple didn't have a good, consistent solution for removing unwanted software. Perhaps the problem is too mundane?

I recall reading that retail stores' sales increase when shoppers know they can return goods without penalty. I think installing software is a lot like shopping: if I don't know whether I can remove it cleanly (i.e. without penalty) then I'm not likely to install it. Which means that I become really hesitant to install anything on my computer. If I can't comfortably try different editors, or mail readers, or web browsers, or whatever, then the value of the computer is appreciably reduced. For example, I know I would have never installed the FireBug add-on for Firefox had I not been sure I could remove the thing if it sucked (which it doesn't, by the way).

So we didn't buy a Mac. Yeah, I'm stupid, but that's the story: not being assured that we can cleanly uninstall software was enough that we left the store empty-handed. Perhaps we're more pragmatic than most. Perhaps I'm not as cool as I thought.

Tuesday, April 24, 2007

Python: Printing unicode

By default, the stdout stream in python is assumed to have ascii encoding. While this is the only safe assumption, it gets mighty annoying when your terminal supports utf8 or Microsoft's eponymous mbcs encoding (e.g. pyDev for Eclipse), especially when you are working with unicode data that you would like to print out while debugging.

It seems like this is a common problem. In fact, it comes up numerous times at work and I met a gentleman at the past BayPiggies meeting who was looking for a solution himself. It doesn't help that sys.setdefaultencoding() is a red herring that seems to throw everyone off track.

Enough with the introduction, here is the snippet I use to get my stdout to a non-ascii encoding:
    import codecs, sys
    sys.stdout = codecs.getwriter('mbcs')(sys.stdout)
Of course, change 'mbcs' to 'utf8' or whatever encoding you need. You can get fancy and look up the appropriate encoding based on the terminal environment (actually, 'mbcs' does this for you on Windows), but if you're just looking to print unicode for testing/debugging, this short snippet gets you to the goal in two lines of code.
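
If you want to see what the wrapper actually does without touching your real stdout, here is a minimal sketch. The snippet above is Python 2; on Python 3, sys.stdout already encodes text for you, so this version wraps an in-memory byte stream instead. The helper name make_encoding_writer is mine, not part of the standard library.

```python
import codecs
import io


def make_encoding_writer(byte_stream, encoding="utf8"):
    """Wrap a byte stream so text written to it is encoded on the way out.

    Hypothetical helper for illustration; it only calls codecs.getwriter().
    """
    return codecs.getwriter(encoding)(byte_stream)


# Stand-in for a terminal: an in-memory byte buffer.
buffer = io.BytesIO()
writer = make_encoding_writer(buffer, "utf8")

# Non-ASCII text goes in as unicode and comes out as utf8 bytes.
text = u"caf\u00e9 \u65e5\u672c\n"
writer.write(text)
encoded = buffer.getvalue()
```

The same wrapper works for any object with a write() method that accepts bytes, which is all the two-line snippet relies on.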

Friday, April 20, 2007

Don't get the wrong idea: I'm still stupid

I just renamed my blog from "Ignorance. Illustrated." to something more innocuous. I'm still an idiot, but I sometimes write about bugs I find in other people's code and I wouldn't want them to think I was referring to them.

In case you're curious, the new title is an allusion to the fact that, believe it or not, there is another Kelly Yancey and she's famous. And a girl.

Thursday, April 19, 2007

What open source is all about

Ian Bicking just confirmed that he has fixed the bug I previously reported in Paste:
I think I did what you suggested in r6482 (just removed the regex check)

Talk about fast turnaround. I only mentioned the bug to Ian today and it was fixed almost immediately. You don't see that kind of response time everywhere. Thanks Ian!

Monday, April 16, 2007

Life hacks

Every programmer has implemented Conway's Game of Life at some point or another. In addition to the Javascript implementations [images, checkboxes] I did back in 1999, I also wrote an xscreensaver hack implementing a colorized life simulation back in 2003. Unfortunately, Jamie Zawinski rejected it for inclusion in the xscreensaver distribution with the response,
Well, my only problem is that I find almost all "life"-based savers pretty dull to look at :)

A number of my coworkers have humored me over the years and at least pretended to like my hack, so I've dusted off the files, updated them to xscreensaver 5.0, and put the files online. With the default settings, it looks like this:

Cells are born with a color based on the color of the three neighbor cells. Dead cells are drawn with a darker variant of the last living cell that occupied the space (causing "moving" clusters to leave trails). The trails can be disabled for a more pure Life experience. The hack also supports loading patterns from files in the Life 1.05 format, although the glider, bhept, and rabbits patterns are built-in if you don't have patterns of your own. To keep things interesting, if the simulation starts converging on a steady-state, a pattern of cells will be randomly added to the field.
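
For the curious, the birth rule can be sketched in a few lines of Python. This is not the C code from clife.c; it is a rough illustration on an unbounded grid, and averaging the three parents' RGB values is my assumption about what "based on the color of the three neighbor cells" could mean. Trails and pattern loading are left out.

```python
def step(live):
    """Advance one generation.

    `live` maps (x, y) -> (r, g, b) for every living cell.
    """
    # Collect, for each position, the colors of its living neighbors.
    neighbor_colors = {}
    for (x, y), color in live.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    neighbor_colors.setdefault((x + dx, y + dy), []).append(color)

    next_live = {}
    for pos, colors in neighbor_colors.items():
        count = len(colors)
        if pos in live and count in (2, 3):
            # Survivors keep their own color.
            next_live[pos] = live[pos]
        elif pos not in live and count == 3:
            # Newborn cells blend the colors of their three parents
            # (assumed blending rule: per-channel integer average).
            next_live[pos] = tuple(sum(channel) // 3 for channel in zip(*colors))
    return next_live


# A horizontal blinker made of one red, one green, and one blue cell.
blinker = {(0, 1): (255, 0, 0), (1, 1): (0, 255, 0), (2, 1): (0, 0, 255)}
after = step(blinker)
```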

Until I (or someone else) can convince Jamie Zawinski that Life isn't boring, the only way to install it for yourself is to:
  1. Download and extract the xscreensaver 5.01 source distribution.
  2. Copy clife.c and clife.man into the hacks subdirectory.
  3. Copy config/clife.xml into the hacks/config subdirectory.
  4. Patch Makefile.in using Makefile.in.diff.
  5. Finally, configure and build xscreensaver.

Sunday, April 15, 2007

Python: The anti-snippet

This past Thursday was snippets night at the BayPiggies meeting. Unfortunately, my laptop's battery died before I was able to present my snippet. I call it an "anti-snippet" since I start with a terrible, terrible piece of code and show various ways to clean it up (the best I can) using python. In this case, I started with an implementation of the key hiding algorithm used by the Wireless Zero Config service on Windows Mobile 5. Here is a straightforward port of the original C sample code:

def _hideKeyMaterial(str):
    # This gem comes directly from the wzctool source code.
    # I like how they pointlessly obfuscated their own code
    # by storing the XOR values out-of-order in the
    # chXORData array. Nobody does security like
    # Microsoft.
    if str is None:
        str = ''
    assert len(str) <= 32
    # Pad out to WZCCTL_MAX_WEPK_MATERIAL (32) bytes.
    str = str.ljust(32, '\0')
    chXORData = [0x56, 0x09, 0x08, 0x98, 0x4D, 0x08, 0x11,
                 0x66, 0x42, 0x03, 0x01, 0x67, 0x66]
    return ''.join([ chr(ord(c) ^ chXORData[(i * 7) % 13])
                     for (i, c) in enumerate(str) ])

First, I'd like to point out how silly this hiding logic is. As the comments say, the bytes of the chXORData array are pointlessly out-of-order; this adds nothing to security and just obfuscates the logic. Speaking of security, the purported purpose for hiding the key data in the first place is to prevent memory scans from extracting the key material. However, since the routine pads the key material out to 32 bytes with nul characters before XORing it with known values, this "hiding" algorithm actually makes identifying the key data in memory trivial: XORing a nul byte with a known value yields that value unchanged, so you would only need to search for a string of values from the chXORData array (in their real order). For example, finding the values 0x03, 0x98, 0x01, 0x4D, 0x67, 0x08, 0x66, 0x11 in memory would give you a pretty good chance of having found the key data. Then you just need to apply the hiding algorithm a second time (since it is a simple XOR-based hiding algorithm) to recover the key material. Ironically, a search for a run of nul values in memory would lead to a lot more false matches, so "hiding" the key actually makes it easier to find.
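
To make the weakness concrete, here is a minimal sketch of the attack. It is written against Python 3 bytes rather than the Python 2 strings used in the port above, and the names SHUFFLED_XOR_DATA, REAL_ORDER, and hide_key_material are mine. It derives the real order of the XOR bytes from the shuffled array, hides a short key, then shows that the padding leaks the XOR data verbatim and that a second pass recovers the key.

```python
from itertools import cycle

# The XOR bytes as stored (out of order) in the port above.
SHUFFLED_XOR_DATA = [0x56, 0x09, 0x08, 0x98, 0x4D, 0x08, 0x11,
                     0x66, 0x42, 0x03, 0x01, 0x67, 0x66]

# The order in which the bytes are actually applied: index (i * 7) % 13.
REAL_ORDER = bytes(SHUFFLED_XOR_DATA[(i * 7) % 13] for i in range(13))


def hide_key_material(key):
    """XOR a key, nul-padded to 32 bytes, with the repeating XOR data."""
    padded = key.ljust(32, b"\0")
    return bytes(c ^ x for c, x in zip(padded, cycle(REAL_ORDER)))


key = b"secret"
hidden = hide_key_material(key)

# Bytes 13..25 of the padded key are all nul, so the "hidden" output
# contains the XOR data itself, in order, at those positions.
leaked = hidden[13:26]

# XOR is its own inverse: hiding the hidden data gives back the key.
recovered = hide_key_material(hidden).rstrip(b"\0")
```

Anyone scanning memory for REAL_ORDER (or a long enough slice of it) lands right on top of the key buffer.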


Anyway, the demerits of the algorithm aside, let's look at how we can improve the implementation using python. First, the most obvious improvement we can make is to remove the pointless rearrangement of the bytes in the XOR data array:

def _hideKeyMaterial2(str):
    str = str.ljust(32, '\0')
    chXORData = [0x56, 0x66, 0x09, 0x42, 0x08, 0x03, 0x98,
                 0x01, 0x4D, 0x67, 0x08, 0x66, 0x11]
    return ''.join([ chr(ord(c) ^ chXORData[i % 13])
                     for (i, c) in enumerate(str) ])

Personally I find this to be the easiest-to-read version, but it isn't really much of an improvement. We could have done as much in C; in fact, I have to wonder why the programmers who wrote the original C code in Microsoft's reference wzctool didn't do as much.


There isn't much we can do about the chr() and ord() calls since python, unlike C, treats integers and characters as separate types and XOR is not defined for characters.


We can eliminate the intermediate list by replacing the list comprehension with a generator expression instead:

def _hideKeyMaterial3(str):
    str = str.ljust(32, '\0')
    chXORData = [0x56, 0x66, 0x09, 0x42, 0x08, 0x03, 0x98,
                 0x01, 0x4D, 0x67, 0x08, 0x66, 0x11]
    return ''.join(( chr(ord(c) ^ chXORData[i % 13])
                     for (i, c) in enumerate(str) ))

Using the itertools module, we can re-write the algorithm as:

from itertools import cycle, izip
def _hideKeyMaterial4(str):
    str = str.ljust(32, '\0')
    chXORData = [0x56, 0x66, 0x09, 0x42, 0x08, 0x03, 0x98,
                 0x01, 0x4D, 0x67, 0x08, 0x66, 0x11]
    # cycle() loops over the XOR bytes endlessly; izip() stops at the end of str.
    return ''.join(( chr(t[0] ^ ord(t[1]))
                     for t in izip(cycle(chXORData), str) ))

Personally, I think this version isn't particularly easy on the eyes, but it does demonstrate how to use some of the new functions in Python 2.4 (which isn't really that new anymore).


Anyway, that is all I had. Nothing particularly exciting, but I thought it would be fun to poke fun at a terrible piece of logic and then use it to demonstrate different implementations in python. It's too bad my laptop wasn't up to the task.

Wednesday, April 4, 2007

Security is a process

Let's say, hypothetically, you had a library of C code which implemented some function, frobulate(const struct foo *fooarray, size_t numfoos). For the sake of discussion, it really isn't important what frobulate does; let's just say that it performs an action on an array of structures. Now, it turns out that it is pretty common to call frobulate() with just a single structure, so we provide a helper function frobulateOne(const struct foo *foop).
I realize this is silly, but bear with me.

Now, let's say a developer using your library finds that passing a malformed structure to frobulateOne() causes it to crash. You identify that there is a buffer overflow in your library code that is caused by not properly validating the contents of the structure passed as an argument. How do you fix it?

You might add some additional validation to frobulateOne(), but certainly you would fix frobulate() to validate each and every structure. After all, frobulateOne() just turns around and calls frobulate() to do the real work. Obviously, it is necessary to validate the input to frobulate().
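
Here is a rough sketch of that fix. The hypothetical library above is C; I've written the sketch in Python to keep it short, and everything in it (the Foo class, MAX_PAYLOAD, the validate() helper, and what frobulate actually does) is made up for illustration. The only point is where the validation lives: inside the function that does the real work, so every caller is covered.

```python
MAX_PAYLOAD = 16  # hypothetical limit on the internal buffer size


class Foo(object):
    """Hypothetical structure: a payload plus a length field a caller could get wrong."""

    def __init__(self, length, payload):
        self.length = length
        self.payload = payload


def validate(foo):
    """Reject structures whose declared length is wrong or too large."""
    if foo.length != len(foo.payload) or foo.length > MAX_PAYLOAD:
        raise ValueError("malformed foo")


def frobulate(foos):
    """Does the real work, so this is where every structure gets validated."""
    for foo in foos:
        validate(foo)
    return [foo.payload.upper() for foo in foos]


def frobulate_one(foo):
    """Convenience wrapper; it inherits frobulate()'s validation for free."""
    return frobulate([foo])[0]


good = frobulate_one(Foo(2, "ab"))

# A malformed structure is rejected no matter which entry point is used.
rejected = []
for call in (lambda: frobulate_one(Foo(99, "ab")),
             lambda: frobulate([Foo(2, "ab"), Foo(99, "ab")])):
    try:
        call()
    except ValueError:
        rejected.append(True)
```

Had the check been added only to frobulate_one(), the second call would have sailed straight through.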

In fact, it is so obvious you are probably wondering why I'm wasting my time mentioning it.

Because somehow Microsoft, who made a high-profile effort to fix security bugs in their flagship product, screwed it up. When it was found in November 2004 that malformed animated cursors could lead to a buffer overflow, Microsoft issued a fix (MS05-002) 4 months later. The fix issued added input validation to the LoadCursorIconFromFileMap() function. With Microsoft being so security-conscious, you would assume that in the four months spent developing a fix, someone would have noticed that LoadCursorIconFromFileMap() is a wrapper that loads the icon data from the disk and calls LoadAniIcon() to actually process the data. Or that, after discovering such a bug in one piece of code, they would have their crack team of security experts scour through related code looking for similar bugs.

But, sadly, they did not. Earlier this week, Microsoft had to scramble to issue new patches applying to LoadAniIcon() the exact same input validation they added to LoadCursorIconFromFileMap() two years ago. Independent security researchers had found Microsoft's obvious error before they did and hackers were exploiting the vulnerability to take control of Windows users' machines.

Of course, the IT press is encouraging everyone to apply Microsoft's patches ASAP due to the severity of the bug. Microsoft's detractors will hold it up as another example of how Microsoft's Windows operating system is inherently more insecure than their favorite OS.

But what strikes me is not the severity of the bug itself, but what it says about the internal processes at Microsoft. This latest security flaw is not the result of some previously-undiscovered bug -- it is the direct manifestation of Microsoft's failure to properly address the exact same flaw two years ago. And the correct solution, as I hope my little example at the start of this post demonstrated, should have been obvious from the start.

I'll avoid the debate of whether Microsoft's development processes are fundamentally flawed; I've never worked there so I wouldn't know. However, this bug should make it abundantly clear that Microsoft's processes for addressing known security issues are not just flawed, but practically non-existent. No organization employing professional software engineers could have missed the proper fix two years ago, much less an organization that professes to take security issues seriously. Microsoft claims to "have dedicated staff whose job it is to perform root-cause analysis of defects". Somehow I don't believe it. Or, if they do, I should expect they'll be hiring a new group soon.

Update 2007/04/06: Apparently I'm not the only one who is baffled by how Microsoft could fail to have proactively fixed this recent bug and still profess to be focused on making Windows more secure.

Monday, April 2, 2007

Python: Class objects as signal arguments, redux

Just the other day, I mentioned that I'm using class objects as signal arguments to functions/methods in some code I'm working on. Not entirely unlike perl, it seems that, increasingly, There Is More Than One Way To Do It in python so JJ's comment that he uses the same technique is very reassuring.

However, one thing still doesn't sit right with me regarding this technique: a novice programmer may see "class" and think they need to instantiate it and pass the instance as the function/method argument. Using my previous example:

ApplyFlavor("blueberry", CurrentWidget)
Class object passed as argument; widget is CurrentWidget so code that flavors the "current widget" is executed.

ApplyFlavor("blueberry", CurrentWidget())
Caller mistakenly passes instance as argument; widget is not CurrentWidget so ApplyFlavor() does not recognize the signal argument and runs code that accesses the widget object in ways that will probably throw exceptions.


This is nothing that a little documentation wouldn't fix, but I've yet to meet a junior programmer that actually reads all the documentation I wrote for them. So, being proactive, let's assume they don't read directions and catch their mistake for them:

from warnings import warn

class AllWidgets(object):
    """Placeholder for indicating action applies for all widgets
    """
    def __new__(cls):
        warn("%s should not be instantiated" % cls.__name__,
             stacklevel=2)
        return cls

....

def ApplyFlavor(flavor, widget=CurrentWidget):
    if widget is CurrentWidget:
        # Code that flavors the "current widget".
        ....
    elif widget is AllWidgets:
        # Code that flavors all widgets.
        ....
    else:
        # Code that flavors just the widget specified.
        ....

Now you cannot instantiate the class; trying to do so just returns the class object itself, so AllWidgets() is AllWidgets.

If you don't mind whether users erroneously try to instantiate the class, you can omit the warning. If, like me, you believe such errors are signs of misunderstanding that could lead to further bugs, issue a warning or -- if you are really zealous -- throw an exception from the __new__() method.
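
Here is a small, self-contained sketch you can run to see the behavior. It is not the code from my project: the shared _Signal base class is just a convenience so I don't repeat __new__() in both placeholders, and the strings returned by apply_flavor() stand in for the real flavoring code.

```python
import warnings


class _Signal(object):
    """Hypothetical base class for placeholder classes used as signal arguments."""

    def __new__(cls):
        # Warn the caller, then hand back the class itself instead of an instance.
        warnings.warn("%s should not be instantiated" % cls.__name__,
                      stacklevel=2)
        return cls


class CurrentWidget(_Signal):
    """Placeholder for indicating action applies to the current widget."""


class AllWidgets(_Signal):
    """Placeholder for indicating action applies for all widgets."""


def apply_flavor(flavor, widget=CurrentWidget):
    if widget is CurrentWidget:
        return "current:" + flavor
    elif widget is AllWidgets:
        return "all:" + flavor
    else:
        return "one:" + flavor


# The mistaken call still takes the right branch, and the warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = apply_flavor("blueberry", CurrentWidget())
```

Because __new__() returns something that is not an instance of the class, python never calls __init__(), so there is nothing else to guard against.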

Python: HTTP Accept-Language header parsing

The more I dig through the code, the more Paste is growing on me.

However, I noticed a few nights ago that Paste's Accept-Language header parsing is subtly non-RFC-compliant (sorry JJ!). The issue is that the regular expression it uses (from paste.httpheaders class _AcceptLanguage):
languageRegEx = re.compile(r"^[a-z]{2}(-[a-z]{2})?$", re.I)
does not match all language tags defined by section 3.10 of RFC 2616. Admittedly, it matches all language tags in common usage, but fails to comply with the letter of RFC 2616 in that it fails to match tags such as en-cockney, i-cherokee, or x-pig-latin. The RFC says:
any two-letter primary-tag is an ISO-639 language abbreviation
and any two-letter initial subtag is an ISO-3166 country code.

Note that it does not mandate that all primary-tags and subtags must be two letters in length, nor does it restrict the number of subtags to zero or one. It just says that if they are two letters, they have the meanings cited. In fact, the augmented BNF grammar (1*8ALPHA) only says the primary-tag is one to eight alphabetic characters followed by zero or more subtags, each consisting of one to eight alphabetic characters.
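
A quick way to see the mismatch is to run a few tags through both patterns. The first regex is the one quoted from Paste above; the second is my own reading of the RFC 2616 grammar (one to eight letters per tag, any number of subtags), not something Paste ships, and the names PASTE_TAG, RFC_2616_TAG, and the sample list are mine.

```python
import re

# The pattern quoted from paste.httpheaders above.
PASTE_TAG = re.compile(r"^[a-z]{2}(-[a-z]{2})?$", re.I)

# Assumed translation of the RFC 2616 section 3.10 grammar:
#   language-tag = primary-tag *( "-" subtag ), each tag being 1*8ALPHA.
RFC_2616_TAG = re.compile(r"^[a-z]{1,8}(-[a-z]{1,8})*$", re.I)

samples = ["en", "en-US", "en-cockney", "i-cherokee", "x-pig-latin"]

accepted_by_paste = [tag for tag in samples if PASTE_TAG.match(tag)]
accepted_by_rfc = [tag for tag in samples if RFC_2616_TAG.match(tag)]
```

Only the first two samples survive the Paste pattern, while all five are valid according to the grammar.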

Luckily, the problem regex isn't integral to the parsing algorithm and can be safely removed. As such, all that appears to be necessary to bring the code into RFC 2616 compliance is to remove the definition of languageRegEx as well as the following two lines:
    if not self.languageRegEx.match(lang):
        continue
With that obscure bug fixed, now you are ready to start serving Upper Sorbian, Cockney, or even Klingon localized versions of your Paste-powered web site.

(Now if we could only get the httpheaders code into the python standard library so everyone can get the benefit of bug-free parsing, whether they use Paste or not.)