| FeedSee | |
|
Ned Batchelder's blog

Ben's birthday was a few weeks ago, and we ended up with three different occasions for cakes to celebrate. For the day itself, we honored the animated Karl Pilkington. For those who haven't enjoyed The Ricky Gervais Show, it's an animated cartoon of Ricky's podcasts, in which he discusses anything and everything with his colleagues Stephen Merchant and especially Karl Pilkington. For an extended family gathering, we made cupcakes based on Little Big Planet sackboys. Finally, for his delayed party, we made a monstrosity based on Dante's Inferno, a fascination for Ben. There are three levels here: the top is suicides turned into trees, the bottom is the icy level with a demon guarding the place, and in the middle is the gluttonous third circle, with tormented souls swimming in their own excretions (don't worry, just chocolate pudding and Tootsie rolls). OK, so this is an unusual theme for a cheerful birthday party, but believe me, they loved this, and it's right up Ben's alley. Take a look at some of his art: Forgiveness Pt. 2.

Cog in Matlab - Cog, my templating and code generation tool, seems to be like the little engine that could. I wrote it years ago to bring a little Python power to a non-Python job. But then it was unexpectedly useful while preparing my slides for PyCon this year. I did a lightning talk explaining why (I start about 8 minutes in). One of the things I didn't expect when I released Cog was that people would take the concept and port it to other languages. There are implementations for PHP, Ruby, and Perl. And now, Doug Harriman has written another, so you can Cog in Matlab. I don't know anything about Matlab, and I didn't realize this was even a sensible idea, but now it's real. When I look at Cog now, I see things I'd like to change about it. Maybe there will be a more modern implementation some day. But it does its job well now.
If you have text files that you want to do a little bit of processing on, look into Cog; people seem to like it.

A Javascript lexer in Python, and the saga behind it - In the last week I've written a new Javascript lexer, jslex. Why I did it is one of those open source adventures that starts innocently enough. I'm working on a Django project for a client, and it needs to be localized into their language. Django has good support for localization, providing tools for extracting strings from Python, HTML, and Javascript files. But something wasn't right: the client reported that some of the strings were still in English. Usually this means that they made a small mistake during the translation process, and the English in the source doesn't match the English in the message file. But when I looked, it turned out the English was completely missing from the message file. Check the source: yup, it's properly marked for translation. Then I remembered: parsing Javascript source files for messages is fragile. I'd encountered this before, and had simply fiddled with the Javascript source to make the problem go away. But this time, as one message was re-harvested, other messages would disappear. The problem seemed more severe than I had encountered in the past. I decided to learn more about why it was happening.

Like many open source projects, Django uses GNU gettext to manage the message files, including using the xgettext tool to parse the source files to find strings to translate. But xgettext doesn't support parsing Javascript. Django has a strange accommodation to deal with this: it performs a simple transformation on the Javascript source, then tells xgettext that it's Perl. I can only guess why Perl was chosen: because Javascript and Perl both have regex literals, which, as we'll see, play a large part in this story. But Django's Javascript-to-Perl transformation is simplistic: it just converts all //-comments on their own line into #-comments.
So this Javascript:

    // My awesome Javascript

gets transformed into this "Perl":

    # My awesome Javascript

I assume the reason //-comments that share a line with code are skipped is to avoid clobbering strings with // in them, though with multi-line strings, even that is not enough to protect them. Of course, this transformation is insufficient to properly carry the strings into the "Perl" so that xgettext can find them. For example, in the above sample, the Javascript comment on line 2 is still executable Perl code after the transformation, and the apostrophe in the comment is considered the start of a string literal, so the gettext call is skipped as part of a multi-line string. In fact, depending on the version of gettext, which determines how advanced its Perl parsing is, all sorts of innocuous Javascript constructs can throw off the parser:

    gettext("Message on 1");

Here messages 1 and 5 are found, and 3 and 4 are not. How come? Because Perl's y operator consumes two strings delimited by the next character, in this case a semicolon, so lines 3 and 4 are considered literals rather than code. In truth, Django's accommodation for Javascript is an egregious hack.

So I wanted to find a better solution. I figured that if I could properly lex Javascript, then I could manipulate the token stream to create something that could reliably be parsed by gettext. The result is jslex, a pure-Python lexer for Javascript. Lexing Javascript turns out to be tricky due to our old friend the regex literal. When a slash character is found, it could mean one of four things: a division operator (either / or /=), a line comment (//), a multi-line comment (/*), or a regex literal. The two comment forms are simple to deal with, because a regex literal can't be empty, so // is always a comment, and a regex can't start with a star, so /* is always a comment.
But distinguishing between division and regexes is impossible to do at a purely lexical level, and can be quite subtle:

    for (var x = a in foo && "</x>" || mot ? z:/x:3;x<5;y</g/i) {xyz(x++);}

The first line has a regex of /x:3;x<5;y</g, the second has /g/i. The ECMAScript standard says you need to parse the code, and if you're at a point where a regex literal would be a valid next token, then lex it as a regex, but if you're at a point where a division would be valid, then lex it as division. I wasn't willing to write a full parser, but I've taken an approach similar to other light Javascript tools, and use the previous token to decide if the next token can be division or regex. It seems to work well.

The lexer is a general-purpose multi-state lexer built on regular expressions. The rules create a two-state lexer with one state for "division possible" and one for "regex possible." When I thought I had it working, I outsourced the QA to Stack Overflow, finally finding something to do with my too-many reputation points: pay a bounty to find Javascript it doesn't lex properly. Mind-twistingly, a respondent there found a useful test: a Javascript lexer written in Javascript, which when fed through my lexer, failed because my regex-matching regex couldn't properly lex his regex-matching regex!

To bridge Javascript code to xgettext, I chose to transform it into "C" instead of Perl. That means getting rid of the regex literals by turning them all into the C string "REGEX", and changing single-quoted strings into double-quoted strings. The next phase is to determine whether this gets into Django or not. I've prepared it as a patch, but there was already some momentum to replace gettext with Babel, and it's looking like it might all have to wait for 1.4 in any case. As someone who's recently lost time to this bug, I would really rather get something into 1.3.1, so we'll see where that ends up. In any case, if you have need for lexing Javascript in Python, use jslex; it works.
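To make the previous-token idea concrete, here is a minimal sketch of a two-state, regex-driven lexer. It is not jslex's actual rule set: the token names, the keyword list, the punctuation rules, and the choice that `}` is followed by the "regex possible" state are all simplifying assumptions made for illustration, and real Javascript has many cases this ignores (hex numbers, multi-character operators like `===`, and so on).

```python
import re

# Hypothetical keyword list: identifiers after which a regex may follow.
KEYWORDS_BEFORE_REGEX = {"return", "typeof", "instanceof", "in", "of", "new",
                         "delete", "void", "throw", "case", "do", "else"}

# Each rule: (token name, pattern, state to enter afterwards).
# A next state of None means "skip this token and stay in the current state".
COMMON_BEFORE = [
    ("ws", r"\s+", None),
    ("linecomment", r"//[^\n]*", None),
    ("comment", r"/\*.*?\*/", None),
]
COMMON_AFTER = [
    ("id", r"[A-Za-z_$][A-Za-z0-9_$]*", "div"),
    ("num", r"\d+(?:\.\d+)?", "div"),
    ("string", r'"(?:[^"\\\n]|\\.)*"|' r"'(?:[^'\\\n]|\\.)*'", "div"),
    ("punct", r"\+\+|--|[)\]]", "div"),
    ("punct", r"&&|\|\||[-+*%&|^<>=!]=?|[{}(\[;,~?:.]", "reg"),
]
# The only difference between the two states is how a slash is read.
REGEX_LITERAL = ("regex",
                 r"/(?:[^/\\\n\[]|\\.|\[(?:[^\]\\\n]|\\.)*\])+/[A-Za-z]*",
                 "div")
DIVISION = ("punct", r"/=?", "reg")


def _compile(rules):
    return [(name, re.compile(pat, re.DOTALL), nxt) for name, pat, nxt in rules]


STATES = {
    "reg": _compile(COMMON_BEFORE + [REGEX_LITERAL] + COMMON_AFTER),
    "div": _compile(COMMON_BEFORE + [DIVISION] + COMMON_AFTER),
}


def lex(js):
    """Yield (name, text) tokens, skipping whitespace and comments."""
    state, pos = "reg", 0
    while pos < len(js):
        for name, pattern, next_state in STATES[state]:
            match = pattern.match(js, pos)
            if match:
                break
        else:
            raise ValueError("cannot lex at offset %d: %r" % (pos, js[pos:pos + 20]))
        text, pos = match.group(), match.end()
        if next_state is None:
            continue  # whitespace or comment: the previous real token still decides
        if name == "id" and text in KEYWORDS_BEFORE_REGEX:
            next_state = "reg"
        state = next_state
        yield name, text
```

Feeding it the `for` line shown above yields a single regex token, `/x:3;x<5;y</g`, because the slash follows a `:`. If the `:` before the slash is dropped (an assumed variant, written here only for contrast) so the slash follows the identifier `z`, the first slash is lexed as division and the regex token becomes `/g/i`.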
Obscene cuts - The current Federal budget negotiations make me sick. The new Congress started out by talking about the corrosive nature of the deficit, and the need to cut spending as a way to combat it. But now they seem focused purely on cutting for the sake of cutting. In the early days of this Congress, jobs seemed to be a big concern. In fact, they were the main reason claimed for the attempt to repeal the health care bill, they were even mentioned in the title of the legislation. But now that the Republicans have turned their attention to "big government", they tell us that 700,000 jobs lost is a small price to pay. Republicans' real priorities were made clear in their early rule change that said laws have to state their effect on the deficit, unless the law is a tax cut. There's no better demonstration of the choice they would make: between reducing the deficit and cutting taxes, they'll cut taxes. That isn't fiscal responsibility, it's shameless pandering, and it's part of the reason we have the deficit we do. So now Washington is negotiating budget cuts, and it's disgraceful the things being considered, like funding for homeless veterans. Social programs will be prime targets, not because they contribute significantly to the deficit, but because Republicans don't like them in the first place. Again, the deficit is not the top-of-mind issue here. Republicans, and especially the Tea Partiers who are now holding them hostage, like to decry spending. But spending in and of itself is not a bad thing. Over-spending is a bad thing, because it leads to deficits and out-of-control debt. But there are two ways to control the deficit and Congress is artificially tying one hand behind their back as they battle the deficit monster. Reducing the deficit will be really hard, and any solution will involve new real pain. When I look at the budget cuts being proposed, it's clear that all the proposed new pain will be felt by the lessers in society. 
The Republicans will cut social programs, or they'll try to dent Social Security. Some teachers will lose their jobs, unions will be hobbled, and so on. Here's my question: what new pain will the well-to-do feel to reduce the deficit? How will those among us with the most make a new contribution to solve this serious problem? I suspect the Republicans won't come up with a way. That's just wrong: if you have resources, you should help. That's part of what it means to be a community of citizens: when your country has a problem, you help how you can. Of course there's an obvious answer: the well-to-do can contribute revenue. If Congress can consider cutting services that go to those with the least, they can certainly consider increasing taxes on those with the most. Politicians who claim they can't are nothing but dishonest cowards, and they should be ashamed of themselves.

Pi Day puzzle solutions - On Monday, I posed two puzzles from PyCon. The commenters there have pretty much covered everything, but I wanted to post my own approach.

For the sum of the digits in the number of palindromes between zero and a googol, first think about how many palindromes there are with 2n digits. Each is formed by joining an n-digit number with its reverse, so there are as many 2n-digit palindromes as there are n-digit numbers. The number of n-digit numbers is all combinations of n digits, $10^n$, except you can't have a leading zero, so remove all those, for a total of $10^n - 10^{n-1}$. The number of palindromes with 2n-1 digits is the same, since you just remove one of the doubled center digits. The number of palindromes between zero and $10^{100}$ is then:

$$\sum_{n=1}^{50} 2\,(10^n - 10^{n-1})$$
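Refactoring and expanding the summation:

$$2\sum_{n=1}^{50} (10^n - 10^{n-1}) = 2\left[(10^1 - 10^0) + (10^2 - 10^1) + \cdots + (10^{50} - 10^{49})\right]$$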
Most of the terms in the summation cancel each other out, leaving:

$$2\,(10^{50} - 10^0) = 2 \cdot 10^{50} - 2$$
This is 199999999999999999999999999999999999999999999999998, the number of palindromes between 0 and $10^{100}$. It has 49 9's, for a digit sum of 1+49×9+8, or 450.

For the stairs problem, let's call the number of ways to walk up a flight of n stairs S(n). We know that S(1) is 1, and S(2) is 2. For the arbitrary case n, there are two possible ways to start up the stairs: you can take the first step, or you can skip the first step. If you take the first, there are S(n-1) ways to finish your walk. If you skip the first, there are S(n-2) ways to finish it. So S(n) = S(n-1) + S(n-2). Combined with our values for S(1) and S(2), we see that S is the classic Fibonacci series: 1, 2, 3, 5, 8, 13, 21, 34, etc.

Two Pi Day puzzles from PyCon - Happy Pi Day, everyone! (3/14, get it?) I got back from PyCon last night and have been trying to figure out how to integrate the energy and direction from the conference into my regular life here in Boston. It's a challenge, but PyCon is always an invigorating experience, and I'm really glad to have gone. In honor of Pi Day, I'll present you with two puzzles I heard at PyCon, one as part of Google's recruiting efforts, and one as part of a panel about Python in middle school.

Google's puzzle: A number is a palindrome if the digits read the same backwards as forwards: 1, 88, 343, 234565432, and so on. What is the sum of the digits in the number of palindromes less than a googol ($10^{100}$)? That is, count all the palindromes greater than zero and less than a googol, then sum all the digits in that number, not the sum of the digits in all the palindromes. What's your answer? They actually posed it as "write a program to compute the sum of the digits, etc," and were interested in the shortest program, but I prefer it as a pure math question.

The education question was a puzzle presented to middle-school kids, who were asked to write programs to find the answer. Imagine a set of stairs with n steps from bottom to top.
You can walk up the stairs by taking every step, or by skipping a single step any time you want. You can't skip more than one step at a time. How many different ways are there to walk up a flight of n steps? For example, representing a step as t and a skip as k, you could do a flight of 3 steps as ttt, tk, kt, and 4 steps could be tttt, ttk, tkt, ktt, or kk.

Update: I posted my solutions.

Quick and dirty multi-threaded Django dev server - The Django development server is great: it comes in the box, serves Django, auto-restarts on source code changes, and now even color-codes the log lines based on the status returns. But it isn't multi-threaded, which normally wouldn't be a problem for a development server, unless you're writing Ajax interactions, and these days, who isn't? The Django team has declared that they will not offer a multi-threaded development server, for good or bad, so we are left to our own devices. James Aylett wrote django_concurrent_test_server, which offers multi-threading and forking, though I haven't tried it. David Cramer offers django-devserver, which also seems to offer a number of interesting new logging options. Many developers simply use other "real" web servers, like Apache or gunicorn, but those don't detect code changes, and often don't provide stdout for debugging.

I wanted multi-threading on a project but I didn't want to use a big real web server, and didn't want to install a new Django app and modify settings.py, so I adapted the patch from the closed Django bug ticket to create threadedmanage.py:

    #!/usr/bin/env python

Now I can run "./threadedmanage.py runserver .." and get the standard development server, but with multiple threads. The usual caveats apply: This isn't a real web server, don't use it in production. Your code likely has threading issues, please fix them. I'm pretty sure there are good reasons not to use this code, but it's working well for me.
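For readers curious about the general idea behind a threaded development server, here is a minimal standard-library sketch. It is not the threadedmanage.py script and does not touch Django's internals; it assumes only that the underlying technique is mixing `socketserver.ThreadingMixIn` into a WSGI server class so each request is handled on its own thread. The names `ThreadedWSGIServer`, `demo_app`, and `make_threaded_server` are hypothetical.

```python
import socketserver
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


class ThreadedWSGIServer(socketserver.ThreadingMixIn, WSGIServer):
    """A WSGI server that handles each request in a new thread."""
    # Don't let in-flight request threads keep the process alive on exit.
    daemon_threads = True


def demo_app(environ, start_response):
    """A stand-in WSGI application (a real project would pass its own)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from a worker thread\n"]


def make_threaded_server(host, port, app):
    """Build a development-only threaded server for the given WSGI app."""
    return make_server(host, port, app,
                       server_class=ThreadedWSGIServer,
                       handler_class=WSGIRequestHandler)


if __name__ == "__main__":
    # Development use only, same caveats as above: not for production.
    server = make_threaded_server("127.0.0.1", 8000, demo_app)
    server.serve_forever()
```

With this in place, a slow request no longer blocks a second Ajax call arriving at the same time, which is the symptom that motivates wanting threads in the first place. Auto-reloading on code changes is a separate feature and is not covered by this sketch.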
Hobbit cake - Max is turning 19 in a few days, and we made him a cake tonight, Bilbo's house in the Shire, from The Hobbit: Chocolate cake, vanilla icing tinted green for grass, with just a little coconut for texture. Mini-marshmallow cobblestones, trees of pretzels, shortbread cookie door, chocolate chimney, and of course, Lego figures. BTW: in keeping with the birthday theme, today is the ninth anniversary of the first post on this blog.

Boston printing office auction - Last week I attended the auction of the Boston Printing Office, and it was fascinating. The printing office was a genuine printing plant, a factory really, that produced whatever printed materials the City of Boston needed. Poking around, it was clear they weren't producing fine novels; it was a lot of ballots, policies, notices and commendations. I'm sure they did good work there, but this wasn't a craft shop: they were blue-collar workers doing city work. In the corner was a phone booth, the inside plastered with cut-out pictures of women in bikinis. In magic marker on one wall it said, "When in doubt, ship it out."

I was there mostly to see the old printing equipment and especially the type. In these days of digital publishing, it astounds me that people used to (and still do) print by arranging tiny pieces of metal into lines of letters, placing those just right into rectangular forms, then running them through presses to produce individual sheets. Hundreds of years ago entire encyclopedias were produced through this painstaking manual process. It's a testament to the printed word that it was a viable commercial endeavor. Of course this office was more automated than that, using Linotype machines and large automatic presses, but the interest for me was the more antiquated technology.

The auction itself was interesting and fun. The crowd clearly divided into the industrial people, and the craft and designer people.
As the auction got going, people clustered around the auctioneer, getting a sense of who was really buying. All of the type was collected into one lot, Lot 400. During the auction, all the conversation was about who would get Lot 400, how much it would go for, and what they would do with it. The educated opinion was that it was not in good condition to print with, and the cases were worn and too large to sell to a general audience.

The difficulty with many of the lots was that they were very heavy and large, so no matter what you paid for them, you'd also be paying thousands of dollars just to move and store them somewhere else. The two Linotype machines went for $10 each, precisely because they were so unwieldy. Everyone was relieved that they were bought by The Charles River Museum of Industry & Innovation, rather than by a scrapper who considered them only so many pounds of metal and would have melted them down.

Lot 400 was finally sold to a mysterious individual who frankly looked a lot like Locke from Lost. He paid $9750 for all the type and cases in the room, and as soon as he did, he was swarmed by a dozen people asking how they could get part of it. To add to his mystery, he had no business cards, and no email address. Perhaps he really was Locke, jumping through time to save outdated technology!

Or maybe it isn't outdated. There are more people doing letterpress printing than there were 15 years ago, so it's experiencing a resurgence, and people are working to document and save the tools that are still around. One good side effect of the day was to meet and hear about people in the Boston area working in letterpress:
In the end, the auction had two distinct feelings: first, a nostalgia and sadness as a working printing factory was split up and shipped off, some parts to be simply dismantled for scrap. The old ways were good ways, they just aren't good enough any more. But second, a hopefulness seeing all these people turn up to see the old equipment off, and to make use of the parts they can in their own smaller ways.

PyCon presentations, Hollywood style - PyCon 2011 is right around the corner, which means the Boston Python Meetup is doing its annual PyCon on the Charles practice sessions, and I'm thinking about what makes a good presentation. The advice I wrote last year is good, but I thought of a new analogy: a PyCon presentation should be a trailer for your expertise. That is, imagine your expertise on your topic is like a full-length Hollywood movie. Then your PyCon talk should be the trailer for that expertise.

A trailer is by nature short, and so is your talk. Not as short as a trailer, but shorter than you want. You have to think hard about what to take out and what to leave in. Like a trailer, your talk needs to tell a compressed story, it should have some relatable emotion in it, and ideally it will have some action (demos). The point of a trailer is to convince people to watch the movie. The point of a PyCon talk is not to make people experts, but to convince them to learn more about your topic, which they can do afterward. You don't have to cram all the information into them, just as the trailer doesn't have to tell the entire story of the movie. If they leave thinking, "I'd like to know more about that," you've done your job.

Writing 25-minute technical presentations is hard, and this trailer analogy may not be perfect, but I think it's a good mindset to get into for crafting a good PyCon talk.