This week an couple errors were reported in the custom CMS application I built at work a couple years ago. I haven’t touched this code in at least a year, so it took me bit to swap some mental virtual memory and recall how everything worked. I’m not sure if these “bugs” were something new that had manifested themselves after a recent platform upgrade or design flaws that had been there since the beginning only to be recently noticed. None of that really matters for the sake of this post, however. Suffice it to say there were two problems, one of which was likely to be entirely my fault but relatively easy to fix with a little bit of C# hacking.
The other problem was a bit obscure. The application is built in ASP.NET 2.0 and written entirely in C#. It also makes use of Microsoft’s AJAX Toolkit for ASP.NET to “pretty up” some of the interface interactions. Unfortunately, one particular user began to experience problems with the system recently. Since she’s the project manager, needless to say the problem was escalated to top priority with little to no delay. To make things more difficult, the problem was especially cryptic. In true Microsoft fashion, the pop-up JavaScript error dialog offered little to no useful information:
Sys.WebForms.PageRequestManagerServerErrorException: An unknown error occurred while processing the request on the server. The status code returned from the server was: 500
Google, of course, is my friend and found no shortage of pages where this turned up. The odd thing was that none of the purported causes for the error were anything that I was using.
After much searching, I finally happened upon this site. It seems Ted Jardine hit the same problem I did. He had narrowed it down to something to do with the .NET session, which he wasn’t really using but I was using extensively. What I found most interesting was his solution:
So, based on one of the comments in one of the above posts, even though I’m not touching session on one of the problem pages, I tried a hack in one of the problem page’s Page_Load:
Session["FixAJAXSysBug"] = true;
And lo and behold, we’re good to go!
I followed the various links he provided—as well as Googling for “FixAJAXSysBug” itself—and found lots more anecdotal evidence to support its usefulness. I applied this “fix” to the common header of the application to make sure it took affect everywhere and, so far, all reports seem to indicate its success.
Needless to say, I was instantly reminded of this GPF strip from the crossover with Help Desk. I can’t remember now if that joke was my idea or Chris Wright’s. It doesn’t matter now, really… it audacity is as brilliant now as it was eight years ago. The idea of setting a simple Boolean flag to “turn off bugs” is something I will always find hilarious.
Now if only all Microsoft bugs were so easy to fix….
Here’s a clarification of my recent Tweet about Diana. Sometime over the weekend Diana, our primary Linux box that serves as the backbone of our home network (DNS, file server, internal Web server, SSH gateway, SVN repository server, etc.), gave up the ghost. I only discovered this yesterday evening, so I haven’t had much time to diagnose the problem. It’s almost certainly a hardware issue. I’m thinking it’s the power supply or the motherboard, as when I try to power her up, nothing happens. The power light comes on, I can watch the CPU fan twitch like it wants to start spinning, but otherwise nothing else visible occurs. No output makes its way to the monitor so there are no error messages to follow.
At this point, I’m not sure of the status of the hard drives. My hope is that they’re fine; the obvious problem appears to be occurring before they even start to spin, as if they’re not getting any power (and that’s why I suspect it’s a power supply issue). The good news is that Demeter, her predecessor, has been sitting idle and collecting dust and has since been rapidly pressed back into service. I should be able to slip Diana’s disks into Demeter, check their integrity, and hopefully recover the data. That’s the core thing right now, getting the data off; hardware is replaceable, data is not. The only hitch is that Demeter is old enough that I’m not sure her BIOS will read Diana’s larger disks. Demeter’s current HD is already larger than her BIOS supports, though, and Linux seems to work fine in this situation, so I’m hoping that won’t be a problem. A worst-case scenario might be to throw a live Linux distro into Athena, our current “alpha” Windows XP desktop, and try to grab the data that way. (Diana’s disks are in ext3, which obviously Windows can’t read.) Both Demeter and Diana have EIDE drives while Athena uses SATA, but I’m almost certain Athena also has legacy EIDE on the motherboard somewhere; if not, I’m hosed there.
Why might this be a concern to you? Well, for one thing, Diana was one of several redundant backup locations for storing my my high-resolution original strips. Fortunately, everything from Year Nine and back has already been backed up to multiple DVDs stored in multiple physical locations, while Year Ten’s files are stored across three redundant drives (two in separate physical machines and one external USB drive). More importantly, Diana was my SVN repository server, housing all the source code for the GPF site. I have working copies of that repository in multiple locations so I’m not hurting there, but with the repository down I’m stuck manually keeping those working copies in sync. The biggest problem that may affect you guys is the humongous time sink this will be for me to repair/replace Diana and get all our internal mechanisms working again. With my day job, two hours of commute, and toddler patrol vying for my time, my comic production schedule is severely squeezed as it is. This is probably going to impact that buffer I was forced to take a hiatus in December to reclaim as I wasn’t able to increase my production, just maintain the status quo.
For those of you who might care, I’ll post updates here when I can. More frequent cries of frustration will likely come through the Twitter feed. If the comic will be severely impacted, you’ll get something in the GPF News. So keep watching those RSS feeds.
Not long ago, I took advantage of a nifty WordPress plugin to enable XML sitemaps for the blog. For those who’ve never heard of XML sitemaps (I hadn’t for quite a while), they are little XML files in a specific format that give search engines like Google hints on how to index your site. They don’t necessarily improve your search rankings per se, but they help the search engine better decide what to index, when it was last updated, relative priorities of different pages, etc. You then throw a special line into your robots.txt file or directly submit the file to the search engine to let it know the file is available. Once the engine knows about it, it will check it periodically to optimize how the site is indexed.
The plugin, of course, makes this ridiculously easy for WordPress. However, GPF gets orders of magnitude higher traffic than the blog does, so finding a way to generate sitemaps there would be ideal. I toyed with the idea for a while until I finally sat down, examined the sitemap specification, and figured out how to roll my own code. It now successfully runs via cron each morning and gives a pretty thorough census of what’s available on the GPF server. The problem is that the GPF site is divided into several parts that are largely autonomous and self-contained:
Ignoring the forum, that left me three major sub-projects for creating sitemaps. It’s easy enough to segregate these into separate files and tie them together using a “sitemap index” file, so that wasn’t a problem. The archive would just be a formatted dump of the archive database, deriving approximate update times from the posting date. The bulk of the rest of the site could be done by stepping through the file structure of the site and taking note of every HTML or PHP file and its last modification time (conveniently ignoring certain files and directories that don’t need to be counted, like access-restricted Premium pages). And that leaves the wiki.
I managed to come up with a decent wiki sitemap routine that I thought I’d share, just in case someone else might be interested. Of course, it’s not likely to be useful for massive wikis like Wikipedia—sitemaps are restricted to 10MB in size and 50,000 URLs—but something small like the GPF Wiki would be easy to submit and index. It was built using MediaWiki 1.12.0; I am uncertain what database changes may be needed for older or newer versions. Here’s my current process:
I only want to index relevant pages, including category pages. The relevant database table for this is “page”. (How… convenient). Unfortunately, this table also contains things like redirects and images. Each image has its own “page” assigned to it; try clicking on an image in Wikipedia or in the GPF Wiki to see what I mean. The time stamp of the latest revision, however, is stored in the “revision” table, joined to the page table by the latest revision ID number. So a good starting bit of SQL would be:
select p.page_title, r.rev_timestamp from page p, revision r where p.page_latest = r.rev_id and p.page_is_redirect = 0 and p.page_title not like '%.gif' and p.page_title not like '%.png' and p.page_title not like '%.jpg';
Unfortunately, this also returns a few meta pages like the sidebar and editing pages. Before selecting, I define a look-up hash of titles I want to avoid and as I loop through the results I just skip those.
The title, of course, is both the displayed title and the input portion of the URL that uniquely identifies the page. Thus, knowing the base URL (http://www.gpf-comics.com/wiki/) I can easily reconstruct the public URL of any article from the title. As with Wikipedia links, spaces have already been converted to underscores, but the rest of the string needs to be be URL encoded. This is easy enough, so we can quickly build the full URL as required by the XML schema.
The time stamp is a little bit tougher. MediaWiki stores time stamps as a 14-digit number in YYYYMMDDHHMMSS format, always in UTC time. In Perl (in which almost all my crons are coded) this is easy enough to break apart and turn into a UNIX time stamp. I then output the date in W3C ISO 8601 format as required by the schema. A sample of a resulting entry would be:
<url> <loc>http://www.gpf-comics.com/wiki/Nick</loc> <lastmod>2008-08-22T06:00:07Z</lastmod> <changefreq>monthly</changefreq> <priority>0.3</priority> </url>
Change frequency and priority are purely guesses and fudges for mine. According to the sitemap specification, priorities are purely relative to other parts of the site. I rated the wiki pages as relatively low since the wiki at GPF is considered a “supporting” page and subordinate to things like the archive. As for change frequency, the sitemap specification includes a number of predefined choices (hourly, daily, weekly, monthly, etc.). Monthly was a purely off-the-cuff guess; some pages may update more or less frequently, but monthly would be a good average. It is entirely possible to rate select pages as higher priority or frequency than others, but I decided to take the easy route and rate everything the same. To apply different values, you just need to pay special attention to the title and assign a non-default value when that title crops up.
Well, I hope someone out there might find this helpful. I’m not sure if it really helps anyone find anything at GPF, but it was a fun little exercise nonetheless.
I hope to post more on this when there’s more data to post, but I thought I’d throw up a quick note stating that the latest episode of the Security Now! “netcast” features a question posed by yours truly. (The best part was listening to Leo Laporte stumble over my long-winded rambling.
) The high-quality version of the show can be found at the previous link; a low-bandwidth version as well as a text-only transcript can be found at the corresponding page at GRC.com. A search in the transcript for “Darlington” will take you to the beginning of my question; in the netcast, it starts around 38 minutes, 22 seconds in. (Of course, I encourage everyone to read/listen to the entire thing.)
For the full effect, though, you’ll also need to listen to/read the previous two non-Q&A episodes of the show, #149 and #151. (Low-bandwidth and trascriptions can be found here and here.) The entire dialog concerns the recent trend of ISPs selling out their customers to allow third-party advertisers to come in and install hardware at the ISP to facilitate tracking the ISPs’ customers’ surfing habits across sites. While the ad companies in question claim to not be recording personally identifyable information about the ISPs’ customers, the capability is there and the possibilities for abuse are enormous. It brings back many shades of the DoubleClick controversies of the late 1990s-early 2000s, only much more ominous. I provided a unqiue standpoint to the discussion: that of a Web developer hosting a site and encountering similiar mysterious “first party” cookies set for my domain but not set by me.
The full body my question is present, but I’m not completely satisfied with the answer.
Let’s just say I think Steve Gibson made an assumption about the GPF site that’s not 100% true. I’ve replied to his response with additional information. I don’t necessarily expect another response (he does, after all, have his own agenda to follow on his show), and even if he does it will likely be in episode #154, the next scheduled Q&A episode. If anyone is interested, I’ll post updates if and when this occurs. If I don’t get a response, I’ll post my response here, especially since it contains some disturbing observations about “first party” cookies that have mildly paranoid folks like me nervous. (I’d hate to see what it does to really paranoid people.)
So ICANN, the organization that oversees the doling out of domain names on the Internet, has approved the relaxation of the rules for top-level domains (TLDs) to allow for arbitrary TLDs for whoever has the money and technical capability to grab it. If things go according to plan, by the middle of next year you may be able to just type into your browser something like http://search.google/ rather than http://www.google.com/, or perhaps you’d rather http://drink.coke/ or http://drive.ford/ or even http://have.crazy.monkey.sex/.
To quote virtually ever character in the Star Wars universe, I have a bad feeling about this.
I am so sitting on the fence on this one. My initial gut reaction is this can’t be a good thing. I know far too many non-techies who are confused by Internet addressing as it is, so let’s confuse them some more by adding even more things for them to figure out. JD Fraizer over at User Friendly hit the nail on the head; anyone who has ever used Usenet is probably rolling their eyes a lot more lately. The potential for cybersquatting and trademark dilution is enormous. ICANN insists that an “objection-based mechanism” will be in place to prevent such things, but how much red tape (and legal dollars) will someone have to go through to protect their brand? Every day that a squatter sits on a domain equates to valuable time, money, and reputation that can be lost, something big corporations may be able to wait out but little guys like me can’t afford. It’s been hard enough right now for me to keep up with all the variants of gpf-comics.something out there. And let’s not get into the discussion of what “offensive” TLDs creative individuals might come up with….
Of course, it’s not like I’m going to be registering .gpf anytime soon anyway. I suppose that’s one thing ICANN did right: to create your own TLD, you’ll need a truck load of money first. The CBC is reporting an estimated $100,000 per TLD—I have no idea if that’s Canadian dollars or not—but ICANN only says for now that “fee information is not yet available”. Ordinary domain names are dirt cheap nowadays, which is a blessing to small-time operators like me but a curse in that squatters with cash to burn can snap up thousands at a time and hold them for ransom. At least starting a new TLD will take capital, making it a serious investment. It will also be quite a technical undertaking; owning a TLD also means you have to build the infrastructure support it. So if Google were to grab .google with their pocket change, they’ll also need to pony up the hardware and bandwidth to maintain the root server. Google may be a bad example (they’ve got servers to spare, I’m sure), but for organizations not used to maintaining that kind of “big iron” it will be a significant learning curve.
But then it occurred to me… how awesome would it be if all your favorite comics or comic-related sites could found at “something dot comics”?
Imagine if you will that some philanthropic comics creator/reader with a hundred grand in “mad money” under his bed were to snatch up .comics and register that with ICANN. Being philanthropic, this individual would charge a minimal fee to register a domain there, just enough to cover operational costs and maybe make a modest living in the process, aggregated out to anticipated demand (of which I’m sure there’d be plenty). There would be only one additional requirement for application beyond the current standard (ethical) process: the domain must be used for a site publishing, promoting, or discussing comics in some way, shape, or form. Consideration for approval would require proof of content, such as a preview development site, previously published work, portfolios, etc.—just enough to prove the site really will be used for something comic-related. Individual titles would be encouraged to register at the root level (dilbert.comics, gpf.comics, x-men.comics) while companies would register their names (dc.comics, marvel.comics, keenspot.comics) and potentially use sub-domains for their own titles (x-men.marvel.comics). Our hypothetical philanthropic registrar would also be fair and balanced as to not let big conglomerates dominate the little guys. Disputes over domains would come down to traditional copyright and trademark resolutions, requiring proof of prior art, etc.
Wouldn’t that be just grand?
Of course, what will really happen will be that some big company will come along and buy up .comics with far more misanthropic intentions (and we know such an obvious TLD wouldn’t sit dormant for long). They’d either squirrel it away selfishly for promoting their own works and no one else’s, or they’ll charge such an exorbitant “premium” price for registrations that only big publishing houses like DC, Marvel, etc. will be able to afford it, shutting out the little independents and webcomics. Even if they price it fairly and keep it open, I’d bet it would get so swamped with squatters that the novelty of the whole TLD would become as diluted .info is today. Maybe it’s just that I’m pessimistic… or that I’ve been annoyed for so long that some jerk had been holding gpf-comics.org hostage for years… but I just don’t see this turning into as promising a possibility as I think it could be.
Oh, well. I’ve been waiting for gpf.com for nearly a decade now. I guess I can just add gpf.comics to the list. Wishful thinking….
The new GPF site has been running live for half a month now, and I’m proud to say things have been running incredibly smoothly. That is, at least, from my perspective; I haven’t seen any major glitches, and aside from a few typos in the comic (which are obviously independent of the site code), nobody has written me about any problems. This is especially heartening because the new site was pretty much entirely coded by hand by me, sans a few bits and pieces. (I can’t take credit for the OS, the web server software, the database engine, or the forum. But everything else… yep, that was me.)
There were a lot of motivations for writing my own archiving system, but the primary one was efficiency. While I considered trying something off-the-shelf, so to speak, like ComicPress or Drupal, I really wanted something that would be blazingly fast yet still dynamically generated to let me do things like GPF Premium on the server side, primarily for security reasons. (Server-side processing means no messy JavaScript is required by the users, thus exposing them to less risks, while Premium content doesn’t even get sent to the browser at all if Premium isn’t enabled.) So the GPF site is optimized out the wahzoo, with certain high-volume pages built once by nightly crons while others that require more interactivity reduce database queries to simple selects as much as possible. I’m never one to brag and toot my own horn, but I’m actually pretty proud of the new site and how responsive it is.
Of course, I can’t really take all the credit. I do have to give some serious props to XCache.
For those unfamiliar with PHP, it is one of many server-side, interpreted scripting languages commonly used for dynamic Web site development. The caveat, however, to any interpreted language is that on each request the source script must be read, parsed, compiled, and executed before anything is set back to the end user’s browser. This is one reason why dynamic sites are and will always be slower than serving purely static HTML files. Static HTML just needs to be read and regurgitated; anything that requires the Web server to actually think takes more time. Add to that the fact that there could be hundreds or even thousands of requests all competing at once for content and it’s a miracle anything get served at all.
XCache is one of several opcode caching extensions for PHP. Essentially, when the first request for a script is made, the script is parsed and compiled as usual. However, XCache stores the compiled code so subsequent requests can skip the parsing and compilation steps and go directly to executing the code. This significantly increases the speed of execution by eliminating one of the costliest parts of the process (except perhaps database connections). In addition, XCache also includes the ability to cache variables and objects, so commonly repeated and expensive variable generation–such as the cryptographic hashes I use for salting cookie hashes or database look-ups for common elements like the Premium subscription levels–can be stored in the cache rather rebuilt on each request.
I was first introduced to XCache by the XCache for WordPress plugin, which was probably mentioned in one of the development feeds built into the WordPress dashboard. I’ve been running this combination here on the blog for a little while with moderate success; I’m still trying to find a good balance of configuration settings to get the best results, but I’ve been happy with the results so far. Without putting much thought into it, I went ahead and installed XCache on the GPF server, hoping that it would help even if I never got a chance to optimize it. Fortunately, it has helped, and now that I’ve optimized the settings it’s exceeded most of my expectations. I’m not sure if there’s something about my code that caches better than WordPress, but GPF has done much better with XCache than the blog has.
Admittedly, I haven’t compared it to any other opcode cachers, nor have I benchmarked it against any of the competition. That said, however, I heartily recommend it to anybody running PHP applications. To get the greatest benefit, you may need to modify some code (or install a plugin if you’re using a prepackaged application) to take advantage of the variable/object caching. But even without modification the opcode caching alone makes for a vast improvement.
Not sure if anyone noticed, but both the blog and the new GPF beta test site were down last night. Our hosting service, Slicehost, informed us that a breaker blew in their data center and they were forced to bring a number of machines down to protect them. In addition, the blog server (which also hosts a number other private sites I run) stopped responding, so they had to reboot it again.
Unfortunately, while Slicehost was very informative and sent me several e-mails to keep me apprised of the situation, the sites continued to be down until early this morning. That’s when I discovered that for some bizarre reason the MySQL and Apache services were not configured to start at boot time. This is baffling, in my opinion, as I thought this was automatic with Fedora. You install the application package and, if it’s a service like this, it also installs the appropriate links in the init directories to make sure the services start on boot. Not so, apparently. I’m not sure if this is Fedora’s fault, Slicehost’s, or mine, to be honest, but it should be fixed now.
There’s one part of me thinks that this outage is an ominous sign on the eve of my leaving Keenspot. Then again, it also helped me catch a critical flaw that would have been extremely annoying if it happened a week later, after the move when thousands of readers would be hitting the new site. So I don’t know whether to be paranoid or relieved. (O_O)
Anyone interested in the history of webcomics should check out this week’s episode of the This Week in Tech (TWiT) podcast. Especially since it has nothing to do with webcomics.
Here’s my line of reasoning: In this episode, Leo Laporte and his unusual round of suspects are joined by Jonathan Coulton, geek musician extraordinaire. Aside from discussing a few topics of current note (like the death of HD DVD), they discuss a recent concert by Coulton where Leo and company joined him to play Rock Band before a nerd-filled audience. They go on to talk about the “new” Internet phenomena of niche entertainment targeting–skipping the big, mass-market blitzkrieg typically used by music, TV, and movie studios and canvasing thousands or millions of potential customers, to instead go directly to your core fans, the few dedicated people who are the ones that will really appreciate what you do. Coulton talks of making a living catering to a small handful of hard-core fans and how this is much more fulfilling that the big media alternative, where both the artist and the audience are faceless statistics on the bottom line of a balance sheet. And they discuss this with such freshness and enthusiasm, as if this is were the next new thing, some epiphany that no one has yet uncovered.
What I find so funny about it is… those of us in webcomics have already been doing this… for years.
I’ve noticed this a lot over the past near-decade of GPF’s existence. Blogs, podcasts, and other forms of grass-roots media have all cropped up during that time, putting publishing power in the hands of the masses, becoming “innovative” and “groundbreaking” in bringing content production to the people. But a fair number of “new” trends (and problems) associated with these technologies are things I remember seeing crop up among webcartoonists several years before. Long before the term “blog” was coined, I remember chatting with other cartoonists on mailing lists and news groups, swapping ideas about search engine optimization (before that term was coined as well), getting and retaining readers, how to monetize your site, etc. It’s entertaining now to watch many tech headlines to see “fresh” ideas crop up that I’ve personally tried–and abandoned–a couple years before. It’s like the wheel reinventing itself every couple of years, only with different colors and/or materials.
Of course, I would never be so conceited to believe webcomics “did it first.” Webcomics themselves borrow heavily from the underground comics movement of the 1950s, 60s, and 70s, where small independent publishers ducked under government sensors to push out innovated and controversial content directly to the people who wanted them. What changed between then and now is that the interconnectivity of the Internet moved this from basements and back rooms to hidden mailing lists and chat rooms, eventually making its way to the mainstream, all while expanding the sphere of availability from isolated pockets of common interest to global reach. It would also be naive to believe this flow of “innovation” is one-way; RSS and other syndication technologies took off first in the blogosphere, and was only later ret-conned and shoe-horned into webcomic automation systems as a handy update notification system.
Perhaps one of the reasons bloggers and podcasters didn’t learn any lessons from webcartoonists is the difference between skill level–real or perceived, take your pick–required for entry. Cartooning obviously requires some level of artistic talent as cartooning, in all of its myriad of forms, is a form of art. It’s often a commercial art, intended more to generate revenue than anything else, but an art nonetheless, conveying ideas and emotions graphically. And while a well-crafted blog certainly requires a talent for writing, that is often easier to come by than the ability to both write and draw. Thus the critical mass of webcartoonists is much smaller than that of bloggers and podcasters, making it less noticeable to the mainstream. That’s also why “break-out” blogs now seem to be a dime a dozen, but it’s still major news when an online comic gets noticed by big media and gets optioned for TV/movie deals. Everyone knows about blogs and maybe even reads a few, but there are other comics on the “intraweb” besides Dilbert?
I’m not sure if there’s anything useful to these observations, other than the fact that they amuse me occasionally and it gives me something to post about. I’m not sure if anyone else has made these kinds of observations or, for that matter, anybody else cares. But I’ve often wondered if those underground cartoonists of yesteryear thought to same way about us webcartoonists as I have about bloggers. I’d like to think so, just because it creates a nice symmetry. I can’t wait for bloggers to sit around in the old bloggers’ home, thinking such thoughts about whatever comes next. “Those kids with their holocasts… if they had learned the lessons we did about AI search, they’d be raking the quatloos by now….”
By now, I’m assuming most of you have read Mondays GPF News item. (If you haven’t, shame on you.) GPF is leaving Keenspot, and I’m neck-deep in unit testing the new site with hopes of releasing it to beta testers soon. If you’re interested in beta testing, you can volunteer in this thread on the old forum.
However, I’ve hit upon one little programming snag, so I thought I’d put out an appeal for help. I thought the blog would be more appropriate venue for this than the forum; that assumption could be wrong, but I’ll go with it anyway. For those of you with some Web-based programming knowledge, especially in the areas of PHP and cookies, please put on your thinking caps.
As part of the new site, I’m implementing my own version of Keenspot’s PREMIUM service, reusing the old relabeling of GPF Premium. Keenspot PREMIUM is going away (for several reasons I won’t go into here), but as the service’s biggest proponent and largest beneficiary, I’d hate to lose that functionality. So the new site will launch with its own independent Premium functionality including all the old service’s features (optional ad-free surfing, weekly archives, High-Def archives, tons of exclusives like Jeff’s Sketchbook, etc.) plus a few new features that I’ve been wanting to implement but haven’t had the time or technological hoop-jumping expertise to work on at Keen.
For security reasons, I want to secure Premium sign-ups and account management via secure HTTP (HTTPS). The benefits should be obvious. By encrypting account creation & management pages, you eliminate sniffing attacks and protect user privacy. While these pages may still be susceptible to other forms of attacks (and I’ve coded them to be as resilient as I know how), encrypting the traffic end-to-end can go a long way to cutting off those vectors of attack.
However, I seem to have hit a brick wall when it comes to setting the Premium authentication cookie. Like Keenspot’s implementation, the subscriber’s browser will be “enabled” by “branding” it with a cookie, which will be read and authenticated each time the page is loaded. If valid, Premium features for that page will be turned on; if invalid, the page will default to a non-enabled state, which could be a simple as showing all ads or as complex as denying access to the content within. Unlike Keenspot’s implementation, which was JavaScript based, mine is scripted server-side in PHP, meaning it should be more accessible to a wider range of browsers and in theory more secure (no Premium content is sent at all if Premium is not enabled, rather than letting the client browser decide). My implementation has been thoroughly tested and appears to work pretty much flawlessly… with one hitch.
The problem occurs when I set the cookie over the encrypted HTTPS connection, then try to read it over unencrypted HTTP. I appears that none of my test browsers send the cookie back when the encryption state changes. The reverse is the same; if I change the URL and set the cookie over HTTP, then try to access a page via HTTPS, the encrypted page can’t see the cookie either. It works like an either-or situation, when what I really want is both. If I set a cookie over HTTPS, I want to see it in both HTTP and HTTPS mode.
PHP’s primary cookie interface is the setcookie() method (for setting) and the $_COOKIE array (for reading). setcookie() includes a boolean parameter for secure cookies, i.e. cookies that will only be sent via HTTPS. What’s annoying is that even when I set this flag to false to force it to be insecure, the scripts continue to exhibit the same behavior: cookies set via HTTP can only be read via HTTP and vice versa. I’ve also tried setting the same cookie both ways–first in one protocol, then the other, without erasing the first cookie–but that didn’t seem to work. The second cookie overwrites the first one, effectively turning it off.
I had heard that IE 6 exhibited this behavior as a bug. However, I tried the exact same tests in Firefox 2.0.0.11, Opera 9.24, and Safari 3.0.4 (all on Windows) as well as IE 7, and all reacted the same way. Cookies set over HTTP could not be read over HTTPS and vice versa. It’s a bit frustrating. Obviously, I don’t want my Premium folks to be forced to use the new site in encrypted mode all the time, as this would slow down all the pages and put a significant extra load on the server as the number of subscribers increases. But I want to protect my users’ privacy and settings (and one of my important revenue streams) by encrypting their account access.
So I guess I’m looking for answers to two questions:
Any responses via e-mail or (preferred) comments below will be appreciated.
Update March 5, 2008: Thanks to the input of many commentors below, it looks like I’ve got a solution. The problem, as usual, was somewhere between the chair and the keyboard and the faulty component has been sufficiently flogged with a wet noodle. Immense thanks to everyone who provided feedback and suggestions.
I don’t usually do link-and-run posts (I prefer to have actual content in a blog), but I thought this was disturbing enough to disseminate. I’ll probably add my own blathering commentary which will make it more than a link-and-run post anyway. (After all, I know all of you who come here really come for the blathering. I’m just so blatherful….)
I’m not sure how many of you out there follow the Security Now! podcast over at TWiT, but it’s probably obvious by now that I do, given recent posts. This past week’s episode, #119, exposes a rather unsettling fact that shouldn’t be ignored. (The high quality 64kbps MP3 can be found at that link, while a 16kbps MP3, a transcript in various formats, and additional notes can be found here.) While I encourage you to download and listen/read the facts for yourself, I’ll see if I can summarize it below for the attention-span impaired.
For a long time, I’ve defended PayPal as a method of monetary transfer. They’ve always been good to me personally, even during the stormy periods where some GPF readers boycotted them for “questionable” practices. (See the PayPal Wikipedia entry for an abbreviated history.) For that matter, many online comics wouldn’t be able to monetize themselves in any fashion if it weren’t for PayPal, as many webcomics use the service for donations and online stores. (PayPal has always been an acceptable form of payment in every incarnation of the GPF Store.) They’ve always had issues with customer service, but they’ve also been champions in anti-phishing campaigns.
But Steve Gibson and Leo Laporte have helped disclose a rather shady new practice: In a previous Security Now! episode, a listener mentioned problems downloading a software service from PayPal, only to discover that the download link was sending him to a server over at DoubleClick rather than PayPal. Since he was locally blocking access to the domain “doubleclick.net” in his hosts file, the link failed and the software would not download. Gibson promised to investigate the incident and after a number of side-tracks finally presented his results.
DoubleClick, for the few out there unfamiliar with it, is one of the Internet’s largest online advertising agencies, serving ad banners to millions of Web sites (including, indirectly, GPF). DoubleClick has long been unpopular among netizens for its questionable policies of tracking Web surfers across multiple sites, using a trick with tracking cookies to follow you from site to site. Privacy concerns were raised even further when Google, a company that itself stores and indexes a lot of personal information about its users of GMail, Ad-Sense, and other services, recently purchased DoubleClick. DoubleClick eventually bowed to pressure from the Net at large and created an opt-out page so their tracking cookie would contain “non-personally-identifiable information” and thus negate some of the tracking cookie’s effectiveness. (This opt-out page is still linked to (now indirectly, as the URL has changed) from the GPF privacy policy page.) Many folks these days, however, including myself, simply run spyware scanners like Spybot: Search & Destroy or Ad-Aware and periodically delete such tracking cookies, or just block the “doubleclick.net” domain and its subdomains using the hosts file trick mentioned above. (This is how, in part, Spybot’s immunization against cookies works.) This eliminates or at least minimizes the opportunity for your Web surfing habits to be linked personally to you.
However, PayPal’s new links bypass many of these anti-drive-by-cookie-ing techniques by sending you directly to DoubleClick’s servers, rather than inlining content like Flash or images from their site. Since these are internal PayPal URLs and not links that are expected to send you to the outside, they should be immediately suspicious. What’s even worse is that if you examine the URL closely, there appears to be some sort of “user ID” like number included that may personally identify you if you click on it. What’s even more disturbing is the number of these links you run across as you surf the PayPal site; while some obviously ad-like images contain the “doubleclick.net” URL, many links in the site bar that look like ordinary navigational links contain it as well. While Gibson points out–quite rightly–that there is no evidence to support any sort of conspiracy theories that many come to mind, it is obvious enough that some sort of information sharing is going on between the two companies, and that if a unique user identifier is indeed being passed along with the URL, there’s a likelihood that both companies can link your potential spending habits with PayPal to your surfing habits tracked by DoubleClick.
Now it’s easy to be alarmist and to say everyone should boycott PayPal. Unfortunately, so many of us in webcomics depend on PayPal for survival, so there’s no way we can easily remove ourselves from it. And there’s no competitor out there with enough critical mass to really challenge PayPal for dominance, so there aren’t many viable alternatives. Thus the only current immunization option is diligent observation.
The good news is that the DoubleClick URLs within PayPal’s site all contain at the end PayPal URL you will eventually be redirected to. It’s trivial to copy the URL, paste it into your address bar, crop out the DoubleClick portion, and go directly the the PayPal internal destination. Laporte even suggested that it won’t be long before someone comes up with a Firefox plugin that does that for you on the fly. The problem I see with this is that it won’t be long before the diabolical duo figures out savvy users are bypassing the links and they find a better way to obscure the redirection target URL so the copy/paste/edit trick will no longer work. While true encryption might be a bit too much server load for them to handle en masse, a simple ROT13 or Base64 encode might be enough to thwart all but the most stalwart gearheads.
So… should you avoid PayPal? That’s up to you. I can’t, but I’ll be a lot more careful of where I click on their site from now on.