Here’s a clarification of my recent Tweet about Diana. Sometime over the weekend Diana, our primary Linux box that serves as the backbone of our home network (DNS, file server, internal Web server, SSH gateway, SVN repository server, etc.), gave up the ghost. I only discovered this yesterday evening, so I haven’t had much time to diagnose the problem. It’s almost certainly a hardware issue. I’m thinking it’s the power supply or the motherboard, as when I try to power her up, nothing happens. The power light comes on, I can watch the CPU fan twitch like it wants to start spinning, but otherwise nothing else visible occurs. No output makes its way to the monitor so there are no error messages to follow.
At this point, I’m not sure of the status of the hard drives. My hope is that they’re fine; the obvious problem appears to be occurring before they even start to spin, as if they’re not getting any power (and that’s why I suspect it’s a power supply issue). The good news is that Demeter, her predecessor, has been sitting idle and collecting dust and has since been rapidly pressed back into service. I should be able to slip Diana’s disks into Demeter, check their integrity, and hopefully recover the data. That’s the core thing right now, getting the data off; hardware is replaceable, data is not. The only hitch is that Demeter is old enough that I’m not sure her BIOS will read Diana’s larger disks. Demeter’s current HD is already larger than her BIOS supports, though, and Linux seems to work fine in this situation, so I’m hoping that won’t be a problem. A worst-case scenario might be to throw a live Linux distro into Athena, our current “alpha” Windows XP desktop, and try to grab the data that way. (Diana’s disks are in ext3, which obviously Windows can’t read.) Both Demeter and Diana have EIDE drives while Athena uses SATA, but I’m almost certain Athena also has legacy EIDE on the motherboard somewhere; if not, I’m hosed there.
Why might this be a concern to you? Well, for one thing, Diana was one of several redundant backup locations for storing my high-resolution original strips. Fortunately, everything from Year Nine and back has already been backed up to multiple DVDs stored in multiple physical locations, while Year Ten’s files are stored across three redundant drives (two in separate physical machines and one external USB drive). More importantly, Diana was my SVN repository server, housing all the source code for the GPF site. I have working copies of that repository in multiple locations so I’m not hurting there, but with the repository down I’m stuck manually keeping those working copies in sync. The biggest problem that may affect you guys is the humongous time sink this will be for me to repair/replace Diana and get all our internal mechanisms working again. With my day job, two hours of commute, and toddler patrol vying for my time, my comic production schedule is severely squeezed as it is. This is probably going to impact that buffer I was forced to take a hiatus in December to reclaim as I wasn’t able to increase my production, just maintain the status quo.
For those of you who might care, I’ll post updates here when I can. More frequent cries of frustration will likely come through the Twitter feed. If the comic will be severely impacted, you’ll get something in the GPF News. So keep watching those RSS feeds.
Sorry again for the long dry spell. As hinted at in the latest GPF News post, things have been hectic in the Darlington household these past few months, with tons of minute issues slowly chipping away at the overall allotment of free time. The good news for GPF fans, though, is that I should have a good month’s worth of comics in the buffer when the comic restarts on January 5th, and with the holidays behind us I should be able to concentrate more on getting things done and on time.
In the tradition of last year’s “Christmas loot” post, I thought I’d post some of the awesome things I received as gifts this year. I know some people might look at this as a bit of bragging—and I can see how it can be read that way—but it’s really not. It’s an honest, geeky desire to share some of the exciting things my friends and family blessed me with out of love and happiness. If you want to read bragging into this, well, that’s your choice and you’re free to ignore this post. Otherwise, let me squeal with geeky glee as I delineate some of the cool things I was blessed to receive from people I love.
I’ll start off with a note to the folks: I know some of my family reads this blog, so don’t be offended if I didn’t mention something in particular that you got me. It’s not that it wasn’t memorable or that I didn’t like it; it’s because you know I have the memory of a sieve and I didn’t take copious notes after each present was opened. Since I’m composing this away from where the presents are stashed, I’m doing everything from memory. I also spent most of my time during the present opening ceremonies assembling and subsequently helping Ben play with his new toys, so there were lots of interruptions. So here’s my apologies in advance and don’t forget that blog posts can thankfully be edited.
My favorite gift, by far, is the one given to me by my wife. (Well, she signed Ben’s name on the tag, but I know he has neither the budget nor expertise to have picked it out himself. Just remember that if you read this years later, my son.) She got me a Nikon D60 digital SLR camera. As I previously Tweeted, “It’s like giving a 16-year-old with a beat-up ’85 Civic the keys to a sports car.” 10.2 megapixels, “real” lenses, tons of preset and manual options… it may technically be a “prosumer” or low-end professional camera, but it’s definitely the best I’ve ever had.
I’ve always wanted to learn more about photography, but have had neither the time nor capital to really invest in more than casual picture taking. We’ve had a succession of digital cameras over the years, all of which have served us very well (the Shows & Cons subsite is loaded with the results). However, they’ve all been relatively cheap, low-end models geared for amateur consumers. Our previous family camera was a nice little Olympus that only topped out at three megapixels and still used SmartMedia cards. Do you have any idea how hard those things are to find these days? While still functional, it was definitely showing its age. However, like many consumer cameras, it did all the automagic focus and lighting settings, making it a simple point-and-shoot device. This new Nikon can do point-and-shoot well, but it has enough manual options to make it a good learning platform for a curious amateur to graduate to a serious hobbyist. Now my biggest problem is finding time to actually play with it….
As an ironic side note, as I mentioned in the previous “Christmas loot” post, my wife’s birthday is also in December, and guess what I got her? That’s right, a new camera. Hers is admittedly not as nice, but it is exactly what she wanted: a small point-and-shooter that she can tuck away in her purse for those spur-of-the-moment photo ops where lugging the old Olympus around (and, for that matter, my new Nikon) would be inconvenient. As she so succinctly put it, “Who knew we were going to have such a photogenic holiday?”
Other items of note:

My sister's GPF quilt/wall-hanging
So, what did Santa leave in your stocking this year?
So I was listening to this week’s edition of TWiT, during which Leo Laporte and the usual band of miscreants psychoanalyze Microsoft’s new ad campaign featuring Bill Gates and Jerry Seinfeld. I had not seen the ad yet myself (apparently it debuted during an NFL opening game, and considering that I don’t watch professional sports and the overwhelming majority of my television watching now consists of shows containing magic backpacks and talking monkeys that wear red boots, it hadn’t come to my attention yet), so the discussion naturally raised my morbid curiosity. So I dug around a little on YouTube and found this. I must admit, it’s as surreal as I was led to believe. I won’t try to mine this thing for hidden meaning like Ryan Block did; the only comment I think I can really make about it is that it tells me absolutely nothing about Microsoft, Windows, or any other product they may have in the pipeline, and after watching it I am no more inclined to pick Microsoft options over the competition than I was before. I thought that was the point of advertising….
But that’s not the weirdest part. Last night, I dreamed about Bill Gates. Maybe it was exhaustion, maybe it was a prescription-drug fueled haze (I’m currently in the middle of my quarterly bout with bronchitis), but it was not something I was particularly expecting. There’s nothing really interesting to say about the dream, though. In what little I remember, Mr. Gates was there, tying his shoes. He wasn’t necessarily trying on new ones, nor was there any indication that the shoes were noticeably old. They were shiny, brown leather dress shoes, so they could have been either new or well maintained. Mr. Seinfeld was nowhere in sight. The setting was unclear; I can’t say that it was a shoe store, a men’s locker room, or any other recognizable setting. I know only that I was seated on a wooden bench which I believe was painted a dark green and that Bill Gates stood next to me, lifted one leg, and set the foot on the bench, then proceeded to tie his shoe laces. Then he left without saying a word and the dream moved on to wherever it went after that. I remember nothing else about the dream, and to my knowledge Mr. Gates appeared nowhere else within it.
I have no desire to do any research on what kind of Freudian analysis can be drawn from watching a billionaire-CEO-turned-philanthropist from one of the world’s largest and most reviled software companies tying his shoes next to me. I’d be afraid of what I’d find. So I’ll just say it was the prescription cough syrup working its magic and go back to talking to the pink elephant and the green roast beef sandwich on either side of me. It’s a conversation about world politics and how an economy built entirely around edible golf balls will solve the world’s energy crisis. It’s very enlightening. Maybe, somehow, some way, we’ll figure out exactly what makes Windows “delicious” while we’re at it. Drug-induced hysteria is about the only way I can think of in my current semi-lucid state to make an operating system taste delicious. It makes me begin to wonder, though… what would other OSes taste like? Would Mac OS be crunchy? Would Linux be spicy? Would my Treo’s PalmOS be light in calories? I certainly hope so… I am trying to lose weight….
Not long ago, I took advantage of a nifty WordPress plugin to enable XML sitemaps for the blog. For those who’ve never heard of XML sitemaps (I hadn’t for quite a while), they are little XML files in a specific format that give search engines like Google hints on how to index your site. They don’t necessarily improve your search rankings per se, but they help the search engine better decide what to index, when it was last updated, relative priorities of different pages, etc. You then throw a special line into your robots.txt file or directly submit the file to the search engine to let it know the file is available. Once the engine knows about it, it will check it periodically to optimize how the site is indexed.
The plugin, of course, makes this ridiculously easy for WordPress. However, GPF gets orders of magnitude higher traffic than the blog does, so finding a way to generate sitemaps there would be ideal. I toyed with the idea for a while until I finally sat down, examined the sitemap specification, and figured out how to roll my own code. It now successfully runs via cron each morning and gives a pretty thorough census of what’s available on the GPF server. The problem is that the GPF site is divided into several parts that are largely autonomous and self-contained:
Ignoring the forum, that left me three major sub-projects for creating sitemaps. It’s easy enough to segregate these into separate files and tie them together using a “sitemap index” file, so that wasn’t a problem. The archive would just be a formatted dump of the archive database, deriving approximate update times from the posting date. The bulk of the rest of the site could be done by stepping through the file structure of the site and taking note of every HTML or PHP file and its last modification time (conveniently ignoring certain files and directories that don’t need to be counted, like access-restricted Premium pages). And that leaves the wiki.
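For the curious, here's a rough sketch of the file-walking pass and the "sitemap index" that ties the separate files together. My real crons are in Perl; this is Python purely for illustration, and the function names, the "premium" directory name, and the example.com URLs are all made up for the example rather than taken from the GPF code.

```python
import os
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def walk_site(doc_root, base_url, skip_dirs=("premium",)):
    """Yield (url, lastmod) for every .html/.php file under doc_root.

    skip_dirs is a hypothetical list of directory names to ignore,
    standing in for things like access-restricted pages.
    """
    for dirpath, dirnames, filenames in os.walk(doc_root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in sorted(filenames):
            if not name.endswith((".html", ".php")):
                continue
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, doc_root).replace(os.sep, "/")
            # Last modification time, expressed in UTC in W3C date-time format.
            mtime = datetime.fromtimestamp(os.path.getmtime(full), tz=timezone.utc)
            yield base_url + rel, mtime.strftime("%Y-%m-%dT%H:%M:%SZ")


def sitemap_index(sitemap_urls):
    """Return a sitemap index document pointing at the individual sitemap files."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append("  <sitemap><loc>%s</loc></sitemap>" % escape(url))
    lines.append("</sitemapindex>")
    return "\n".join(lines)
```

Pruning the directory list in place is what keeps the walk from ever touching the excluded directories, which is cheaper than visiting them and throwing the results away.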
I managed to come up with a decent wiki sitemap routine that I thought I’d share, just in case someone else might be interested. Of course, it’s not likely to be useful for massive wikis like Wikipedia—sitemaps are restricted to 10MB in size and 50,000 URLs—but something small like the GPF Wiki would be easy to submit and index. It was built using MediaWiki 1.12.0; I am uncertain what database changes may be needed for older or newer versions. Here’s my current process:
I only want to index relevant pages, including category pages. The relevant database table for this is “page”. (How… convenient). Unfortunately, this table also contains things like redirects and images. Each image has its own “page” assigned to it; try clicking on an image in Wikipedia or in the GPF Wiki to see what I mean. The time stamp of the latest revision, however, is stored in the “revision” table, joined to the page table by the latest revision ID number. So a good starting bit of SQL would be:
select p.page_title, r.rev_timestamp
  from page p, revision r
 where p.page_latest = r.rev_id
   and p.page_is_redirect = 0
   and p.page_title not like '%.gif'
   and p.page_title not like '%.png'
   and p.page_title not like '%.jpg';
Unfortunately, this also returns a few meta pages like the sidebar and editing pages. Before selecting, I define a look-up hash of titles I want to avoid and as I loop through the results I just skip those.
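The skip list is about as simple as it sounds. Here's a minimal sketch in Python (again, the real thing is Perl), assuming the query results arrive as (title, timestamp) pairs. The titles in the set are hypothetical placeholders, not my actual list.

```python
# Hypothetical titles of meta pages to leave out of the sitemap.
SKIP_TITLES = {"Sidebar", "Edittools"}


def keep_rows(rows, skip=SKIP_TITLES):
    """Return only the (title, timestamp) rows whose title is not in the skip set."""
    return [(title, ts) for title, ts in rows if title not in skip]
```

A set (or a hash, in Perl terms) makes each membership check constant time, so the cost of filtering stays proportional to the number of rows no matter how long the skip list grows.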
The title, of course, is both the displayed title and the input portion of the URL that uniquely identifies the page. Thus, knowing the base URL (http://www.gpf-comics.com/wiki/) I can easily reconstruct the public URL of any article from the title. As with Wikipedia links, spaces have already been converted to underscores, but the rest of the string needs to be URL encoded. This is easy enough, so we can quickly build the full URL as required by the XML schema.
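As a sketch of that URL-building step in Python: the page_url name is made up for this example, and I'm assuming titles come back as UTF-8 text (or bytes) from the database. Leaving "/" unencoded is a choice made for this sketch, not something the sitemap schema demands.

```python
from urllib.parse import quote

BASE_URL = "http://www.gpf-comics.com/wiki/"


def page_url(title, base=BASE_URL):
    """Build the public URL for a wiki page title."""
    if isinstance(title, bytes):
        # Some database drivers hand back raw bytes; assume UTF-8.
        title = title.decode("utf-8")
    # Titles should already use underscores, but be defensive about spaces.
    # quote() percent-encodes everything except letters, digits, "_.-~" and "/".
    return base + quote(title.replace(" ", "_"))
```

For a plain title like "Nick" this just appends the title to the base URL; anything with punctuation or non-ASCII characters gets percent-encoded, which also takes care of characters like "&" that would otherwise need escaping in the XML.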
The time stamp is a little bit tougher. MediaWiki stores time stamps as a 14-digit number in YYYYMMDDHHMMSS format, always in UTC time. In Perl (in which almost all my crons are coded) this is easy enough to break apart and turn into a UNIX time stamp. I then output the date in W3C ISO 8601 format as required by the schema. A sample of a resulting entry would be:
<url>
  <loc>http://www.gpf-comics.com/wiki/Nick</loc>
  <lastmod>2008-08-22T06:00:07Z</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.3</priority>
</url>
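The time stamp conversion and the entry formatting might look something like this in Python. My actual code is Perl and isn't shown here; the function names are invented for the sketch, and the defaults simply mirror the sample entry.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def mediawiki_to_w3c(ts):
    """Convert a 14-digit MediaWiki time stamp (UTC) to W3C date-time format."""
    if isinstance(ts, bytes):
        # Some database drivers return the column as bytes.
        ts = ts.decode("ascii")
    dt = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def url_entry(loc, lastmod, changefreq="monthly", priority=0.3):
    """Format one <url> element for the sitemap."""
    return (
        "<url><loc>%s</loc><lastmod>%s</lastmod>"
        "<changefreq>%s</changefreq><priority>%.1f</priority></url>"
        % (escape(loc), lastmod, changefreq, priority)
    )
```

Since the stored value is already UTC, the only real work is re-punctuating it and tacking on the "Z" suffix; no time zone math is involved.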
Change frequency and priority are purely guesses and fudges on my part. According to the sitemap specification, priorities are purely relative to other parts of the site. I rated the wiki pages as relatively low since the wiki at GPF is considered a “supporting” page and subordinate to things like the archive. As for change frequency, the sitemap specification includes a number of predefined choices (hourly, daily, weekly, monthly, etc.). Monthly was a purely off-the-cuff guess; some pages may update more or less frequently, but monthly would be a good average. It is entirely possible to rate select pages as higher priority or frequency than others, but I decided to take the easy route and rate everything the same. To apply different values, you just need to pay special attention to the title and assign a non-default value when that title crops up.
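If you did want per-page values, a look-up table keyed on title would do it. Here's a small Python sketch; the "Main_Page" override and its values are hypothetical, since I don't actually do this on the GPF Wiki.

```python
# Default (changefreq, priority) applied to every wiki page.
DEFAULTS = ("monthly", 0.3)

# Hypothetical per-title overrides; any title not listed gets DEFAULTS.
OVERRIDES = {
    "Main_Page": ("weekly", 0.5),
}


def settings_for(title):
    """Return the (changefreq, priority) pair to use for a given page title."""
    return OVERRIDES.get(title, DEFAULTS)
```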
Well, I hope someone out there might find this helpful. I’m not sure if it really helps anyone find anything at GPF, but it was a fun little exercise nonetheless.
I hope to post more on this when there’s more data to post, but I thought I’d throw up a quick note stating that the latest episode of the Security Now! “netcast” features a question posed by yours truly. (The best part was listening to Leo Laporte stumble over my long-winded rambling.) The high-quality version of the show can be found at the previous link; a low-bandwidth version as well as a text-only transcript can be found at the corresponding page at GRC.com. A search in the transcript for “Darlington” will take you to the beginning of my question; in the netcast, it starts around 38 minutes, 22 seconds in. (Of course, I encourage everyone to read/listen to the entire thing.)
For the full effect, though, you’ll also need to listen to/read the previous two non-Q&A episodes of the show, #149 and #151. (Low-bandwidth versions and transcripts can be found here and here.) The entire dialog concerns the recent trend of ISPs selling out their customers to allow third-party advertisers to come in and install hardware at the ISP to facilitate tracking the ISPs’ customers’ surfing habits across sites. While the ad companies in question claim to not be recording personally identifiable information about the ISPs’ customers, the capability is there and the possibilities for abuse are enormous. It brings back many shades of the DoubleClick controversies of the late 1990s-early 2000s, only much more ominous. I provided a unique standpoint to the discussion: that of a Web developer hosting a site and encountering similar mysterious “first party” cookies set for my domain but not set by me.
The full body of my question is present, but I’m not completely satisfied with the answer.
Let’s just say I think Steve Gibson made an assumption about the GPF site that’s not 100% true. I’ve replied to his response with additional information. I don’t necessarily expect another response (he does, after all, have his own agenda to follow on his show), and even if he does it will likely be in episode #154, the next scheduled Q&A episode. If anyone is interested, I’ll post updates if and when this occurs. If I don’t get a response, I’ll post my response here, especially since it contains some disturbing observations about “first party” cookies that have mildly paranoid folks like me nervous. (I’d hate to see what it does to really paranoid people.)
So ICANN, the organization that oversees the doling out of domain names on the Internet, has approved the relaxation of the rules for top-level domains (TLDs) to allow for arbitrary TLDs for whoever has the money and technical capability to grab it. If things go according to plan, by the middle of next year you may be able to just type into your browser something like http://search.google/ rather than http://www.google.com/, or perhaps you’d rather http://drink.coke/ or http://drive.ford/ or even http://have.crazy.monkey.sex/.
To quote virtually every character in the Star Wars universe, I have a bad feeling about this.
I am so sitting on the fence on this one. My initial gut reaction is this can’t be a good thing. I know far too many non-techies who are confused by Internet addressing as it is, so let’s confuse them some more by adding even more things for them to figure out. JD Frazer over at User Friendly hit the nail on the head; anyone who has ever used Usenet is probably rolling their eyes a lot more lately. The potential for cybersquatting and trademark dilution is enormous. ICANN insists that an “objection-based mechanism” will be in place to prevent such things, but how much red tape (and legal dollars) will someone have to go through to protect their brand? Every day that a squatter sits on a domain equates to valuable time, money, and reputation that can be lost, something big corporations may be able to wait out but little guys like me can’t afford. It’s been hard enough right now for me to keep up with all the variants of gpf-comics.something out there. And let’s not get into the discussion of what “offensive” TLDs creative individuals might come up with….
Of course, it’s not like I’m going to be registering .gpf anytime soon anyway. I suppose that’s one thing ICANN did right: to create your own TLD, you’ll need a truckload of money first. The CBC is reporting an estimated $100,000 per TLD (I have no idea if that’s Canadian dollars or not), but ICANN only says for now that “fee information is not yet available”. Ordinary domain names are dirt cheap nowadays, which is a blessing to small-time operators like me but a curse in that squatters with cash to burn can snap up thousands at a time and hold them for ransom. At least starting a new TLD will take capital, making it a serious investment. It will also be quite a technical undertaking; owning a TLD also means you have to build the infrastructure to support it. So if Google were to grab .google with their pocket change, they’ll also need to pony up the hardware and bandwidth to maintain the root server. Google may be a bad example (they’ve got servers to spare, I’m sure), but for organizations not used to maintaining that kind of “big iron” it will be a significant learning curve.
But then it occurred to me… how awesome would it be if all your favorite comics or comic-related sites could be found at “something dot comics”?
Imagine if you will that some philanthropic comics creator/reader with a hundred grand in “mad money” under his bed were to snatch up .comics and register that with ICANN. Being philanthropic, this individual would charge a minimal fee to register a domain there, just enough to cover operational costs and maybe make a modest living in the process, aggregated out to anticipated demand (of which I’m sure there’d be plenty). There would be only one additional requirement for application beyond the current standard (ethical) process: the domain must be used for a site publishing, promoting, or discussing comics in some way, shape, or form. Consideration for approval would require proof of content, such as a preview development site, previously published work, portfolios, etc., just enough to prove the site really will be used for something comic-related. Individual titles would be encouraged to register at the root level (dilbert.comics, gpf.comics, x-men.comics) while companies would register their names (dc.comics, marvel.comics, keenspot.comics) and potentially use sub-domains for their own titles (x-men.marvel.comics). Our hypothetical philanthropic registrar would also be fair and balanced so as not to let big conglomerates dominate the little guys. Disputes over domains would come down to traditional copyright and trademark resolutions, requiring proof of prior art, etc.
Wouldn’t that be just grand?
Of course, what will really happen will be that some big company will come along and buy up .comics with far more misanthropic intentions (and we know such an obvious TLD wouldn’t sit dormant for long). They’d either squirrel it away selfishly for promoting their own works and no one else’s, or they’ll charge such an exorbitant “premium” price for registrations that only big publishing houses like DC, Marvel, etc. will be able to afford it, shutting out the little independents and webcomics. Even if they price it fairly and keep it open, I’d bet it would get so swamped with squatters that the novelty of the whole TLD would become as diluted as .info is today. Maybe it’s just that I’m pessimistic… or that I’ve been annoyed for so long that some jerk had been holding gpf-comics.org hostage for years… but I just don’t see this turning into as promising a possibility as I think it could be.
Oh, well. I’ve been waiting for gpf.com for nearly a decade now. I guess I can just add gpf.comics to the list. Wishful thinking….
For both of you out there who care, WinHasher has now been bumped to version 1.3. The changes are very minor, so there’s no need to upgrade unless you find the following two new features useful:
I had originally started adding support for HMAC signed hashes but have abandoned that for now. If there’s anyone out there who might actually find that useful, drop me a line and I’ll revisit the code to see what I might be able to add. Downloads can be found at the first link above.
The following is a specification proposal for a new pseudo-random character generator (PRCG), tentatively called the “Tiny Tots PRCG”. This specification is to be considered open and royalty free; everyone is free to implement and extend this specification, although attribution is appreciated. Its usefulness, however, may be limited and may only be of interest to cryptographic and mathematical academics or really bored parents.
System Requirements:
Implementation:
Caveats, Limitations, and Additional Notes:
Just a heads-up to say I’ll be guest hosting Friday’s installment of the Jesus Geek podcast. I apologize in advance for any static or artifacts in the audio; chalk that up to my podcasting inexperience and not as an overall indicator of the quality of Jesus Geek as a whole. I’ll post a direct link to the download page as soon as I see that it goes live.
Update March 21: Aaaand… here it is.
The new GPF site has been running live for half a month now, and I’m proud to say things have been running incredibly smoothly. That is, at least, from my perspective; I haven’t seen any major glitches, and aside from a few typos in the comic (which are obviously independent of the site code), nobody has written me about any problems. This is especially heartening because the new site was pretty much entirely coded by hand by me, sans a few bits and pieces. (I can’t take credit for the OS, the web server software, the database engine, or the forum. But everything else… yep, that was me.)
There were a lot of motivations for writing my own archiving system, but the primary one was efficiency. While I considered trying something off-the-shelf, so to speak, like ComicPress or Drupal, I really wanted something that would be blazingly fast yet still dynamically generated to let me do things like GPF Premium on the server side, primarily for security reasons. (Server-side processing means no messy JavaScript is required by the users, thus exposing them to fewer risks, while Premium content doesn’t even get sent to the browser at all if Premium isn’t enabled.) So the GPF site is optimized out the wazoo, with certain high-volume pages built once by nightly crons while others that require more interactivity reduce database queries to simple selects as much as possible. I’m never one to brag and toot my own horn, but I’m actually pretty proud of the new site and how responsive it is.
Of course, I can’t really take all the credit. I do have to give some serious props to XCache.
For those unfamiliar with PHP, it is one of many server-side, interpreted scripting languages commonly used for dynamic Web site development. The caveat, however, to any interpreted language is that on each request the source script must be read, parsed, compiled, and executed before anything is sent back to the end user’s browser. This is one reason why dynamic sites are and will always be slower than serving purely static HTML files. Static HTML just needs to be read and regurgitated; anything that requires the Web server to actually think takes more time. Add to that the fact that there could be hundreds or even thousands of requests all competing at once for content and it’s a miracle anything gets served at all.
XCache is one of several opcode caching extensions for PHP. Essentially, when the first request for a script is made, the script is parsed and compiled as usual. However, XCache stores the compiled code so subsequent requests can skip the parsing and compilation steps and go directly to executing the code. This significantly increases the speed of execution by eliminating one of the costliest parts of the process (except perhaps database connections). In addition, XCache also includes the ability to cache variables and objects, so commonly repeated and expensive variable generation (such as the cryptographic hashes I use for salting cookie hashes or database look-ups for common elements like the Premium subscription levels) can be stored in the cache rather than rebuilt on each request.
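To illustrate the variable caching idea (and only the idea), here's a tiny in-process sketch in Python. This is not XCache's API and not the GPF code; the VarCache class, its method name, and the time-to-live default are all invented for the example.

```python
import time


class VarCache:
    """A minimal get-or-compute cache with per-entry expiry times."""

    def __init__(self, clock=time.monotonic):
        self._store = {}      # key -> (value, expires_at)
        self._clock = clock   # injectable so expiry can be tested

    def get_or_compute(self, key, compute, ttl=300.0):
        """Return the cached value for key, calling compute() only on a miss."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        # Miss or expired: do the expensive work once and remember the result.
        value = compute()
        self._store[key] = (value, now + ttl)
        return value
```

Something like cache.get_or_compute("premium_levels", load_levels) would then hit the database (via a hypothetical load_levels function) once per time-to-live window instead of once per request. The big difference with a real opcode cacher's variable store is that it lives in shared memory and survives across requests, which a plain in-process dictionary like this one does not.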
I was first introduced to XCache by the XCache for WordPress plugin, which was probably mentioned in one of the development feeds built into the WordPress dashboard. I’ve been running this combination here on the blog for a little while with moderate success; I’m still trying to find a good balance of configuration settings to get the best results, but I’ve been happy with the results so far. Without putting much thought into it, I went ahead and installed XCache on the GPF server, hoping that it would help even if I never got a chance to optimize it. Fortunately, it has helped, and now that I’ve optimized the settings it’s exceeded most of my expectations. I’m not sure if there’s something about my code that caches better than WordPress, but GPF has done much better with XCache than the blog has.
Admittedly, I haven’t compared it to any other opcode cachers, nor have I benchmarked it against any of the competition. That said, however, I heartily recommend it to anybody running PHP applications. To get the greatest benefit, you may need to modify some code (or install a plugin if you’re using a prepackaged application) to take advantage of the variable/object caching. But even without modification the opcode caching alone makes for a vast improvement.