So I was listening to this week’s edition of TWiT, during which Leo Laporte and the usual band of miscreants psychoanalyze Microsoft‘s new ad campaign featuring Bill Gates and Jerry Seinfeld. I had not seen the ad yet myself—apparently it debuted during an NFL opening game, and considering that I don’t watch professional sports and the overwhelming majority of my television watching now consists of shows containing magic backpacks and talking monkeys that wear red boots, it hadn’t come to my attention yet—so the discussion naturally raised my morbid curiosity. So I dug around a little on YouTube and found this. I must admit, it’s as surreal as I was led to believe. I won’t attempt to try and mine this thing for hidden meaning like Ryan Block did; the only comment I think I can really make about it is that it tells me absolutely nothing about Microsoft, Windows, or any other product they may have in the pipeline, and after watching it I am no more inclined to pick Microsoft options over the competition than I was before. I thought that was the point of advertising….
But that’s not the weirdest part. Last night, I dreamed about Bill Gates. Maybe it was exhaustion, maybe it was a prescription-drug fueled haze (I’m currently in the middle of my quarterly bout with bronchitis), but it was not something I was particularly expecting. There’s nothing really interesting to say about the dream, though. In what little I remember, Mr. Gates was there, tying his shoes. He wasn’t necessarily trying on new ones, nor was there any indication that the shoes were noticeably old. They were shiny, brown leather dress shoes, so they could have been either new or well maintained. Mr. Seinfeld was nowhere in sight. The setting was unclear; I can’t say that it was a shoe store, a men’s locker room, or any other recognizable setting. I know only that I was seated on a wooden bench which I believe was painted a dark green and that Bill Gates stood next to me, lifted one leg, and set the foot on the bench, then proceeded to tie his shoe laces. Then he left without saying a word and the dream moved on to wherever it went after that. I remember nothing else about the dream, and to my knowledge Mr. Gates appeared nowhere else within it.
I have no desire to do any research on what kind of Fruedian analysis can be drawn from watching a billionare-CEO-turned-philanthropist from one of the world’s largest and most reviled software companies tying his shoes next to me. I’d be afraid of what I’d find. So I’ll just say it was the prescription cough syrup working its magic and go back to talking to the pink elephant and the green roast beef sandwich on either side of me. It’s a conversation about world politics and an economy built entirely around edible golf balls will solve the world’s energy crisis. It’s very enlightening. Maybe, somehow, some way, we’ll figure out exactly what makes Windows “delicious” while we’re at it. Drug-enduced hysteria is about the only way I can think of in my current semi-lucid state to make an operating system taste delicious. It makes me begin to wonder, though… what would other OSes taste like? Would Mac OS be crunchy? Would Linux be spicy? Would my Treo’s PalmOS be light in calories? I certainly hope so… I am trying to lose weight….
Not long ago, I took advantage of a nifty WordPress plugin to enable XML sitemaps for the blog. For those who’ve never heard of XML sitemaps (I hadn’t for quite a while), they are little XML files in a specific format that give search engines like Google hints on how to index your site. They don’t necessarily improve your search rankings per se, but they help the search engine better decide what to index, when it was last updated, relative priorities of different pages, etc. You then throw a special line into your robots.txt file or directly submit the file to the search engine to let it know the file is available. Once the engine knows about it, it will check it periodically to optimize how the site is indexed.
The plugin, of course, makes this ridiculously easy for WordPress. However, GPF gets orders of magnitude higher traffic than the blog does, so finding a way to generate sitemaps there would be ideal. I toyed with the idea for a while until I finally sat down, examined the sitemap specification, and figured out how to roll my own code. It now successfully runs via cron each morning and gives a pretty thorough census of what’s available on the GPF server. The problem is that the GPF site is divided into several parts that are largely autonomous and self-contained:
Ignoring the forum, that left me three major sub-projects for creating sitemaps. It’s easy enough to segregate these into separate files and tie them together using a “sitemap index” file, so that wasn’t a problem. The archive would just be a formatted dump of the archive database, deriving approximate update times from the posting date. The bulk of the rest of the site could be done by stepping through the file structure of the site and taking note of every HTML or PHP file and its last modification time (conveniently ignoring certain files and directories that don’t need to be counted, like access-restricted Premium pages). And that leaves the wiki.
I managed to come up with a decent wiki sitemap routine that I thought I’d share, just in case someone else might be interested. Of course, it’s not likely to be useful for massive wikis like Wikipedia—sitemaps are restricted to 10MB in size and 50,000 URLs—but something small like the GPF Wiki would be easy to submit and index. It was built using MediaWiki 1.12.0; I am uncertain what database changes may be needed for older or newer versions. Here’s my current process:
I only want to index relevant pages, including category pages. The relevant database table for this is “page”. (How… convenient). Unfortunately, this table also contains things like redirects and images. Each image has its own “page” assigned to it; try clicking on an image in Wikipedia or in the GPF Wiki to see what I mean. The time stamp of the latest revision, however, is stored in the “revision” table, joined to the page table by the latest revision ID number. So a good starting bit of SQL would be:
select p.page_title, r.rev_timestamp from page p, revision r where p.page_latest = r.rev_id and p.page_is_redirect = 0 and p.page_title not like '%.gif' and p.page_title not like '%.png' and p.page_title not like '%.jpg';
Unfortunately, this also returns a few meta pages like the sidebar and editing pages. Before selecting, I define a look-up hash of titles I want to avoid and as I loop through the results I just skip those.
The title, of course, is both the displayed title and the input portion of the URL that uniquely identifies the page. Thus, knowing the base URL (http://www.gpf-comics.com/wiki/) I can easily reconstruct the public URL of any article from the title. As with Wikipedia links, spaces have already been converted to underscores, but the rest of the string needs to be be URL encoded. This is easy enough, so we can quickly build the full URL as required by the XML schema.
The time stamp is a little bit tougher. MediaWiki stores time stamps as a 14-digit number in YYYYMMDDHHMMSS format, always in UTC time. In Perl (in which almost all my crons are coded) this is easy enough to break apart and turn into a UNIX time stamp. I then output the date in W3C ISO 8601 format as required by the schema. A sample of a resulting entry would be:
<url> <loc>http://www.gpf-comics.com/wiki/Nick</loc> <lastmod>2008-08-22T06:00:07Z</lastmod> <changefreq>monthly</changefreq> <priority>0.3</priority> </url>
Change frequency and priority are purely guesses and fudges for mine. According to the sitemap specification, priorities are purely relative to other parts of the site. I rated the wiki pages as relatively low since the wiki at GPF is considered a “supporting” page and subordinate to things like the archive. As for change frequency, the sitemap specification includes a number of predefined choices (hourly, daily, weekly, monthly, etc.). Monthly was a purely off-the-cuff guess; some pages may update more or less frequently, but monthly would be a good average. It is entirely possible to rate select pages as higher priority or frequency than others, but I decided to take the easy route and rate everything the same. To apply different values, you just need to pay special attention to the title and assign a non-default value when that title crops up.
Well, I hope someone out there might find this helpful. I’m not sure if it really helps anyone find anything at GPF, but it was a fun little exercise nonetheless.
I hope to post more on this when there’s more data to post, but I thought I’d throw up a quick note stating that the latest episode of the Security Now! “netcast” features a question posed by yours truly. (The best part was listening to Leo Laporte stumble over my long-winded rambling.
) The high-quality version of the show can be found at the previous link; a low-bandwidth version as well as a text-only transcript can be found at the corresponding page at GRC.com. A search in the transcript for “Darlington” will take you to the beginning of my question; in the netcast, it starts around 38 minutes, 22 seconds in. (Of course, I encourage everyone to read/listen to the entire thing.)
For the full effect, though, you’ll also need to listen to/read the previous two non-Q&A episodes of the show, #149 and #151. (Low-bandwidth and trascriptions can be found here and here.) The entire dialog concerns the recent trend of ISPs selling out their customers to allow third-party advertisers to come in and install hardware at the ISP to facilitate tracking the ISPs’ customers’ surfing habits across sites. While the ad companies in question claim to not be recording personally identifyable information about the ISPs’ customers, the capability is there and the possibilities for abuse are enormous. It brings back many shades of the DoubleClick controversies of the late 1990s-early 2000s, only much more ominous. I provided a unqiue standpoint to the discussion: that of a Web developer hosting a site and encountering similiar mysterious “first party” cookies set for my domain but not set by me.
The full body my question is present, but I’m not completely satisfied with the answer.
Let’s just say I think Steve Gibson made an assumption about the GPF site that’s not 100% true. I’ve replied to his response with additional information. I don’t necessarily expect another response (he does, after all, have his own agenda to follow on his show), and even if he does it will likely be in episode #154, the next scheduled Q&A episode. If anyone is interested, I’ll post updates if and when this occurs. If I don’t get a response, I’ll post my response here, especially since it contains some disturbing observations about “first party” cookies that have mildly paranoid folks like me nervous. (I’d hate to see what it does to really paranoid people.)
So ICANN, the organization that oversees the doling out of domain names on the Internet, has approved the relaxation of the rules for top-level domains (TLDs) to allow for arbitrary TLDs for whoever has the money and technical capability to grab it. If things go according to plan, by the middle of next year you may be able to just type into your browser something like http://search.google/ rather than http://www.google.com/, or perhaps you’d rather http://drink.coke/ or http://drive.ford/ or even http://have.crazy.monkey.sex/.
To quote virtually ever character in the Star Wars universe, I have a bad feeling about this.
I am so sitting on the fence on this one. My initial gut reaction is this can’t be a good thing. I know far too many non-techies who are confused by Internet addressing as it is, so let’s confuse them some more by adding even more things for them to figure out. JD Fraizer over at User Friendly hit the nail on the head; anyone who has ever used Usenet is probably rolling their eyes a lot more lately. The potential for cybersquatting and trademark dilution is enormous. ICANN insists that an “objection-based mechanism” will be in place to prevent such things, but how much red tape (and legal dollars) will someone have to go through to protect their brand? Every day that a squatter sits on a domain equates to valuable time, money, and reputation that can be lost, something big corporations may be able to wait out but little guys like me can’t afford. It’s been hard enough right now for me to keep up with all the variants of gpf-comics.something out there. And let’s not get into the discussion of what “offensive” TLDs creative individuals might come up with….
Of course, it’s not like I’m going to be registering .gpf anytime soon anyway. I suppose that’s one thing ICANN did right: to create your own TLD, you’ll need a truck load of money first. The CBC is reporting an estimated $100,000 per TLD—I have no idea if that’s Canadian dollars or not—but ICANN only says for now that “fee information is not yet available”. Ordinary domain names are dirt cheap nowadays, which is a blessing to small-time operators like me but a curse in that squatters with cash to burn can snap up thousands at a time and hold them for ransom. At least starting a new TLD will take capital, making it a serious investment. It will also be quite a technical undertaking; owning a TLD also means you have to build the infrastructure support it. So if Google were to grab .google with their pocket change, they’ll also need to pony up the hardware and bandwidth to maintain the root server. Google may be a bad example (they’ve got servers to spare, I’m sure), but for organizations not used to maintaining that kind of “big iron” it will be a significant learning curve.
But then it occurred to me… how awesome would it be if all your favorite comics or comic-related sites could found at “something dot comics”?
Imagine if you will that some philanthropic comics creator/reader with a hundred grand in “mad money” under his bed were to snatch up .comics and register that with ICANN. Being philanthropic, this individual would charge a minimal fee to register a domain there, just enough to cover operational costs and maybe make a modest living in the process, aggregated out to anticipated demand (of which I’m sure there’d be plenty). There would be only one additional requirement for application beyond the current standard (ethical) process: the domain must be used for a site publishing, promoting, or discussing comics in some way, shape, or form. Consideration for approval would require proof of content, such as a preview development site, previously published work, portfolios, etc.—just enough to prove the site really will be used for something comic-related. Individual titles would be encouraged to register at the root level (dilbert.comics, gpf.comics, x-men.comics) while companies would register their names (dc.comics, marvel.comics, keenspot.comics) and potentially use sub-domains for their own titles (x-men.marvel.comics). Our hypothetical philanthropic registrar would also be fair and balanced as to not let big conglomerates dominate the little guys. Disputes over domains would come down to traditional copyright and trademark resolutions, requiring proof of prior art, etc.
Wouldn’t that be just grand?
Of course, what will really happen will be that some big company will come along and buy up .comics with far more misanthropic intentions (and we know such an obvious TLD wouldn’t sit dormant for long). They’d either squirrel it away selfishly for promoting their own works and no one else’s, or they’ll charge such an exorbitant “premium” price for registrations that only big publishing houses like DC, Marvel, etc. will be able to afford it, shutting out the little independents and webcomics. Even if they price it fairly and keep it open, I’d bet it would get so swamped with squatters that the novelty of the whole TLD would become as diluted .info is today. Maybe it’s just that I’m pessimistic… or that I’ve been annoyed for so long that some jerk had been holding gpf-comics.org hostage for years… but I just don’t see this turning into as promising a possibility as I think it could be.
Oh, well. I’ve been waiting for gpf.com for nearly a decade now. I guess I can just add gpf.comics to the list. Wishful thinking….
For both of you out there who care, WinHasher has now been bumped to version 1.3. The changes are very minor, so there’s no need to upgrade unless you find the following two new features useful:
I had originally started adding support for HMAC signed hashes but have abandoned that for now. If there’s anyone out there who might actually find that useful, drop me a line and I’ll revisit the code to see what I might be able to add. Downloads can be found at the first link above.
The following is a specification proposal for a new pseudo-random character generator (PRCG), tentatively called the “Tiny Tots PRCG”. This specification is to be considered open and royalty free; everyone is free to implement and extend this specification, although attribution is appreciated. It usefulness, however, may be limited and may only be of interest to cryptographic and mathematical academics or really bored parents.
System Requirements:
Implementation:
Caveats, Limitations, and Additional Notes:
Just a head’s up to say I’ll be guest hosting Friday’s installment of the Jesus Geek podcast. I apologize in advance for any static or artifacts in the audio; chalk that up to my podcasting inexperience and not as an overall indicator of the quality of Jesus Geek as a whole. I’ll post a direct link to the download page as soon as I see that it goes live.
Update March 21: Aaaand… here it is.
The new GPF site has been running live for half a month now, and I’m proud to say things have been running incredibly smoothly. That is, at least, from my perspective; I haven’t seen any major glitches, and aside from a few typos in the comic (which are obviously independent of the site code), nobody has written me about any problems. This is especially heartening because the new site was pretty much entirely coded by hand by me, sans a few bits and pieces. (I can’t take credit for the OS, the web server software, the database engine, or the forum. But everything else… yep, that was me.)
There were a lot of motivations for writing my own archiving system, but the primary one was efficiency. While I considered trying something off-the-shelf, so to speak, like ComicPress or Drupal, I really wanted something that would be blazingly fast yet still dynamically generated to let me do things like GPF Premium on the server side, primarily for security reasons. (Server-side processing means no messy JavaScript is required by the users, thus exposing them to less risks, while Premium content doesn’t even get sent to the browser at all if Premium isn’t enabled.) So the GPF site is optimized out the wahzoo, with certain high-volume pages built once by nightly crons while others that require more interactivity reduce database queries to simple selects as much as possible. I’m never one to brag and toot my own horn, but I’m actually pretty proud of the new site and how responsive it is.
Of course, I can’t really take all the credit. I do have to give some serious props to XCache.
For those unfamiliar with PHP, it is one of many server-side, interpreted scripting languages commonly used for dynamic Web site development. The caveat, however, to any interpreted language is that on each request the source script must be read, parsed, compiled, and executed before anything is set back to the end user’s browser. This is one reason why dynamic sites are and will always be slower than serving purely static HTML files. Static HTML just needs to be read and regurgitated; anything that requires the Web server to actually think takes more time. Add to that the fact that there could be hundreds or even thousands of requests all competing at once for content and it’s a miracle anything get served at all.
XCache is one of several opcode caching extensions for PHP. Essentially, when the first request for a script is made, the script is parsed and compiled as usual. However, XCache stores the compiled code so subsequent requests can skip the parsing and compilation steps and go directly to executing the code. This significantly increases the speed of execution by eliminating one of the costliest parts of the process (except perhaps database connections). In addition, XCache also includes the ability to cache variables and objects, so commonly repeated and expensive variable generation–such as the cryptographic hashes I use for salting cookie hashes or database look-ups for common elements like the Premium subscription levels–can be stored in the cache rather rebuilt on each request.
I was first introduced to XCache by the XCache for WordPress plugin, which was probably mentioned in one of the development feeds built into the WordPress dashboard. I’ve been running this combination here on the blog for a little while with moderate success; I’m still trying to find a good balance of configuration settings to get the best results, but I’ve been happy with the results so far. Without putting much thought into it, I went ahead and installed XCache on the GPF server, hoping that it would help even if I never got a chance to optimize it. Fortunately, it has helped, and now that I’ve optimized the settings it’s exceeded most of my expectations. I’m not sure if there’s something about my code that caches better than WordPress, but GPF has done much better with XCache than the blog has.
Admittedly, I haven’t compared it to any other opcode cachers, nor have I benchmarked it against any of the competition. That said, however, I heartily recommend it to anybody running PHP applications. To get the greatest benefit, you may need to modify some code (or install a plugin if you’re using a prepackaged application) to take advantage of the variable/object caching. But even without modification the opcode caching alone makes for a vast improvement.
Not sure if anyone noticed, but both the blog and the new GPF beta test site were down last night. Our hosting service, Slicehost, informed us that a breaker blew in their data center and they were forced to bring a number of machines down to protect them. In addition, the blog server (which also hosts a number other private sites I run) stopped responding, so they had to reboot it again.
Unfortunately, while Slicehost was very informative and sent me several e-mails to keep me apprised of the situation, the sites continued to be down until early this morning. That’s when I discovered that for some bizarre reason the MySQL and Apache services were not configured to start at boot time. This is baffling, in my opinion, as I thought this was automatic with Fedora. You install the application package and, if it’s a service like this, it also installs the appropriate links in the init directories to make sure the services start on boot. Not so, apparently. I’m not sure if this is Fedora’s fault, Slicehost’s, or mine, to be honest, but it should be fixed now.
There’s one part of me thinks that this outage is an ominous sign on the eve of my leaving Keenspot. Then again, it also helped me catch a critical flaw that would have been extremely annoying if it happened a week later, after the move when thousands of readers would be hitting the new site. So I don’t know whether to be paranoid or relieved. (O_O)
Anyone interested in the history of webcomics should check out this week’s episode of the This Week in Tech (TWiT) podcast. Especially since it has nothing to do with webcomics.
Here’s my line of reasoning: In this episode, Leo Laporte and his unusual round of suspects are joined by Jonathan Coulton, geek musician extraordinaire. Aside from discussing a few topics of current note (like the death of HD DVD), they discuss a recent concert by Coulton where Leo and company joined him to play Rock Band before a nerd-filled audience. They go on to talk about the “new” Internet phenomena of niche entertainment targeting–skipping the big, mass-market blitzkrieg typically used by music, TV, and movie studios and canvasing thousands or millions of potential customers, to instead go directly to your core fans, the few dedicated people who are the ones that will really appreciate what you do. Coulton talks of making a living catering to a small handful of hard-core fans and how this is much more fulfilling that the big media alternative, where both the artist and the audience are faceless statistics on the bottom line of a balance sheet. And they discuss this with such freshness and enthusiasm, as if this is were the next new thing, some epiphany that no one has yet uncovered.
What I find so funny about it is… those of us in webcomics have already been doing this… for years.
I’ve noticed this a lot over the past near-decade of GPF‘s existence. Blogs, podcasts, and other forms of grass-roots media have all cropped up during that time, putting publishing power in the hands of the masses, becoming “innovative” and “groundbreaking” in bringing content production to the people. But a fair number of “new” trends (and problems) associated with these technologies are things I remember seeing crop up among webcartoonists several years before. Long before the term “blog” was coined, I remember chatting with other cartoonists on mailing lists and news groups, swapping ideas about search engine optimization (before that term was coined as well), getting and retaining readers, how to monetize your site, etc. It’s entertaining now to watch many tech headlines to see “fresh” ideas crop up that I’ve personally tried–and abandoned–a couple years before. It’s like the wheel reinventing itself every couple of years, only with different colors and/or materials.
Of course, I would never be so conceited to believe webcomics “did it first.” Webcomics themselves borrow heavily from the underground comics movement of the 1950s, 60s, and 70s, where small independent publishers ducked under government sensors to push out innovated and controversial content directly to the people who wanted them. What changed between then and now is that the interconnectivity of the Internet moved this from basements and back rooms to hidden mailing lists and chat rooms, eventually making its way to the mainstream, all while expanding the sphere of availability from isolated pockets of common interest to global reach. It would also be naive to believe this flow of “innovation” is one-way; RSS and other syndication technologies took off first in the blogosphere, and was only later ret-conned and shoe-horned into webcomic automation systems as a handy update notification system.
Perhaps one of the reasons bloggers and podcasters didn’t learn any lessons from webcartoonists is the difference between skill level–real or perceived, take your pick–required for entry. Cartooning obviously requires some level of artistic talent as cartooning, in all of its myriad of forms, is a form of art. It’s often a commercial art, intended more to generate revenue than anything else, but an art nonetheless, conveying ideas and emotions graphically. And while a well-crafted blog certainly requires a talent for writing, that is often easier to come by than the ability to both write and draw. Thus the critical mass of webcartoonists is much smaller than that of bloggers and podcasters, making it less noticeable to the mainstream. That’s also why “break-out” blogs now seem to be a dime a dozen, but it’s still major news when an online comic gets noticed by big media and gets optioned for TV/movie deals. Everyone knows about blogs and maybe even reads a few, but there are other comics on the “intraweb” besides Dilbert?
I’m not sure if there’s anything useful to these observations, other than the fact that they amuse me occasionally and it gives me something to post about. I’m not sure if anyone else has made these kinds of observations or, for that matter, anybody else cares. But I’ve often wondered if those underground cartoonists of yesteryear thought to same way about us webcartoonists as I have about bloggers. I’d like to think so, just because it creates a nice symmetry. I can’t wait for bloggers to sit around in the old bloggers’ home, thinking such thoughts about whatever comes next. “Those kids with their holocasts… if they had learned the lessons we did about AI search, they’d be raking the quatloos by now….”