Friday, January 18, 2008

Compressing and Obfuscating JavaScript (with Prototype & Script.aculo.us)

Abstract
My methods, trials and eventual solution for optimizing the delivery of JavaScript files via minification, obfuscation, and compression, including examples with libraries that are notoriously difficult to minify, such as Prototype and Script.aculo.us. Some of my explanation will work on any platform, although my eventual sweet solution is Java-specific and relies on a JSP taglib.

Introduction
Last fall, my company developed an election game for the primaries and caucuses called Kingmaker, which we launched in the middle of December. We built portions of the site in Adobe Flex. Although this provided some cool functionality, it increased our page sizes significantly. So, I began my search for ways to shrink down everything else on the site to compensate for the tremendous increase caused by Flex apps.

First, we compressed the images as far as possible before losing too much quality. Next, I saw to the shrinking of our JavaScript files. This presented a series of problems:
  1. I like Prototype. I wanted to keep it. But it's huge (124K uncompressed).
  2. I like Script.aculo.us. I wanted to keep that too. Again, it's huge (I only use effects, controls, slider, and dragdrop... 111K uncompressed)
  3. The widely used JavaScript compressors I knew of were known to be unable to compress Prototype or Script.aculo.us correctly.

Why Minify/Obfuscate/Compress Anyway?
At the most basic level, regardless of connection speeds, smaller files translate into shorter download times. Sure, web browsers cache static content such as stylesheets and JavaScript files, but there is no justification for unnecessarily slowing the initial download, or relying on the client when you can do something about it quickly and easily.

As for obfuscation, you can't ever really protect JavaScript the way you can server-side code or bytecode, and you shouldn't be writing JavaScript that makes it easy for a malicious user to manipulate your system. Bottom line: a security risk is still a security risk when obfuscated, and you should avoid it altogether. That said, with a system as easy to use as the one I'll outline shortly, there's no reason not to make a malicious user's goals just a little bit more difficult. On another level, perhaps there are some people you want to share your code with, and some people you don't. Obfuscation gives you more control, since obfuscated code is a major pain to read, understand, and modify safely. Plus, substituting tiny meaningless variable names also means a smaller file (minimally, but smaller nonetheless).
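To make the size argument concrete, here is a small hand-written before/after sketch. The function and variable names are made up for illustration, and the "obfuscated" version was shortened by hand rather than produced by any particular tool.

```javascript
// Readable source with descriptive local names (hypothetical example).
function totalPriceReadable(itemPrices, taxRate) {
  var runningTotal = 0;
  for (var index = 0; index < itemPrices.length; index++) {
    runningTotal += itemPrices[index];
  }
  return runningTotal * (1 + taxRate);
}

// Same logic with whitespace removed and locals renamed to single letters.
function totalPriceObfuscated(a,b){var c=0;for(var d=0;d<a.length;d++){c+=a[d]}return c*(1+b)}

// Compare the source length of each function.
var readableSize = totalPriceReadable.toString().length;
var obfuscatedSize = totalPriceObfuscated.toString().length;
console.log("readable:", readableSize, "obfuscated:", obfuscatedSize);
```

Both functions return the same value for the same input; only the source text a visitor has to download (and read) changes.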

What Didn't Work
Dean Edwards' Packer is a pretty common tool for packing JavaScript files. There's an online (JavaScript) version, as well as offline versions in .NET, Perl, and PHP. Packer, however, is regex based and has strict syntax requirements that Prototype is known to break. Also, even after alleviating Prototype's syntax issues, the online version of Packer still chokes on a few properties, though the offline versions do not. Reportedly there is an online version of the PHP port that does work, but it didn't work when I tried it. Besides, you still have to fix up the syntax first, and even then there are multiple better solutions.

JSMin is also regex based, so it has the same issues.
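A rough illustration of the kind of syntax these tools trip over: code that omits semicolons is valid JavaScript because the interpreter inserts them at line breaks, but it stops being valid once a tool removes those line breaks without understanding the grammar. The `naiveMinify` function below is a deliberately simplistic stand-in written for this example. It is not how Packer or JSMin are actually implemented.

```javascript
// Valid JavaScript that relies on automatic semicolon insertion.
var source = [
  "var greet = function (name) {",
  "  return 'Hello, ' + name",
  "}",
  "var result = greet('world')"
].join("\n");

// The same code with explicit semicolons (the "fixed" syntax).
var withSemicolons = [
  "var greet = function (name) {",
  "  return 'Hello, ' + name;",
  "};",
  "var result = greet('world');"
].join("\n");

// Hypothetical naive minifier: removes every line break and the
// whitespace around it, with no knowledge of JavaScript grammar.
function naiveMinify(code) {
  return code.replace(/\s*\n\s*/g, "");
}

// Returns true if the code parses; new Function compiles without running it.
function parses(code) {
  try {
    new Function(code);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(parses(source));                      // original parses
console.log(parses(naiveMinify(source)));         // breaks after joining lines
console.log(parses(naiveMinify(withSemicolons))); // survives joining lines
```

This is why the syntax has to be "fixed" before handing a file to a tool that does not parse JavaScript properly.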

Fixing Prototype's Syntax
I really shouldn't say fixing... because technically (as in, according to JavaScript interpreters) what Prototype does is 100% OK. Nevertheless, regex based parsers need stricter syntax than a JavaScript interpreter. Fortunately, the guys over at Dojo had the wonderful insight to ditch the regex altogether and use (you guessed it) a JavaScript interpreter! Specifically, they use a custom version of Rhino, an open source JavaScript interpreter from Mozilla.

Rhino is written in Java, and as a JavaScript interpreter it has more context about what's going on in a JavaScript file than a rigid regex does. Hence, Dojo ShrinkSafe came into the world with the ability to safely compress JavaScript files without additional strict syntax requirements. Files run through Dojo ShrinkSafe first can then be packed/obfuscated further by regex packers, because the output of the Rhino pass is syntactically 'fixed' for a regex packer.

But there was a remaining problem: if you're using the Dojo Toolkit, your files are automatically compressed using ShrinkSafe. I am not, and I didn't want to go through those steps to manually re-minify a JavaScript file every time I edited it... so my search continued.

A Changing Solution
Shortly, I discovered the pack:tag JSP Tag Library, which seemed to be what I was looking for. The pack:tag had many benefits:
  1. Simultaneous minification and obfuscation
  2. gzip compression
  3. Combination of static resources to minimize round-trips from client to server
  4. Caching of compressed static resources via a memory (Servlet) or file cache, and thus minification, obfuscation, and compression at resource request time (as opposed to compile time)
  5. Pluggable compression algorithms (and an implementable interface for creating your own packing strategy)
  6. Configuration via a .properties file
Four packing strategies are included: two for CSS (Isaac Schlueter's CSS Compressor and the iBloom CSS Compressor) and two for JavaScript (JSMin and the YUI Compressor).
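The combination of request-time packing, caching, and pluggable strategies listed above can be sketched in a few lines. The pack:tag itself is a Java taglib, so this JavaScript version only illustrates the idea. Every name in it (`createPacker`, `stripCommentsStrategy`, the file paths) is hypothetical and not part of the pack:tag API.

```javascript
// Hypothetical packer: combines sources, runs a pluggable strategy once,
// and serves later requests for the same resource list from a cache.
function createPacker(strategy) {
  var cache = new Map(); // key: joined resource paths, value: packed output
  var strategyCalls = 0;
  return {
    pack: function (sources) { // sources: array of { path, code }
      var key = sources.map(function (s) { return s.path; }).join("|");
      if (!cache.has(key)) {
        var combined = sources.map(function (s) { return s.code; }).join("\n");
        cache.set(key, strategy(combined));
        strategyCalls++;
      }
      return cache.get(key);
    },
    calls: function () { return strategyCalls; }
  };
}

// Stand-in strategy: drops full-line comments, indentation, and blank lines.
// A real strategy would delegate to something like JSMin or the YUI Compressor.
function stripCommentsStrategy(code) {
  return code.split("\n")
    .map(function (line) { return line.replace(/^\s*\/\/.*$/, "").trim(); })
    .filter(function (line) { return line.length > 0; })
    .join("\n");
}

var packer = createPacker(stripCommentsStrategy);
var files = [
  { path: "/js/a.js", code: "// first file\nvar a = 1;" },
  { path: "/js/b.js", code: "// second file\nvar b = a + 1;" }
];
var firstResponse = packer.pack(files);  // packed on the first request
var secondResponse = packer.pack(files); // served from the cache
console.log(firstResponse);
```

Swapping the strategy function is all it takes to change how files are packed, which is the property that matters later when JSMin turns out not to be enough.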

My favorite additional benefits are as follows:

First, the pack:tag allows you to edit your JavaScript files in their un-minified and un-obfuscated form, thus relieving me of having to either edit a minified file or use Dojo ShrinkSafe and the Perl version of Dean Edwards' Packer every time I changed a file.

Second, the pack:tag checks to ensure that a resource is not included more than once in the same page, automatically ignoring subsequent requests, which can accidentally happen quite easily when using multiple JSP includes (used in multiple places) to dynamically build a page.
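The duplicate check amounts to remembering which resources have already been written to the current page. A minimal sketch of that idea follows, using a hypothetical `createPageIncludes` helper. This is not the pack:tag's actual implementation.

```javascript
// Hypothetical per-page tracker: emits a script tag the first time a
// path is requested and an empty string for any repeat on the same page.
function createPageIncludes() {
  var seen = new Set();
  return function include(path) {
    if (seen.has(path)) {
      return ""; // already emitted on this page, so ignore the request
    }
    seen.add(path);
    return '<script type="text/javascript" src="' + path + '"></script>';
  };
}

var include = createPageIncludes();
var firstInclude = include("/js/prototype.js");  // emits the tag
var repeatInclude = include("/js/prototype.js"); // emits nothing
console.log(firstInclude);
```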

Third, the pack:tag allows you to keep your uncompressed (and thus easy to read and edit) JavaScript files within the WEB-INF directory of a web application, where they are protected from prying eyes.

Using the pack:tag is as simple as including the JSP Taglib declaration in your JSP, and then using the following instead of normal <script> tags to include JavaScript files:

<pack:script src="/my/file.js" />

Or, to combine multiple resources:

<pack:script>
<src>/my/file1.js</src>
<src>/my/file2.js</src>
</pack:script>

pack:tag Refined
The only remaining problem with the pack:tag was the same one I discussed earlier: JSMin and other regex based packers do not safely minify Prototype. Fortunately, pluggable compressor strategies come to the rescue! As of version 2.2, the YUICompressor can safely compress Prototype (thanks to the fact that it, like Dojo ShrinkSafe, is implemented using Rhino). This was an amazing development because I now had a solution whereby I could hide my JavaScript files, edit them with complete clarity, and have them automatically combined, minified, obfuscated, and gzip compressed at request time, then cached for the next request!

An added benefit of using the YUICompressor is a significant amount of logging output that complains when you're doing something stupid. Unfortunately, I haven't found a way to turn off that output, so it does fill up the log a bit when compressing files in which I reference functions that are declared in other files (which it has no context for, unless the files are combined before processing).

Statistical Results
For testing, all downloaded file size measurements were taken using the Firefox plugin Firebug, version 1.05.

I wanted large, commonly used libraries for testing. Since I happen to be a big fan of Prototype and Script.aculo.us, and since Prototype has been my resident problem in this description, I decided those two libraries would be sufficient. I use the same technique outlined above on my own JavaScript files with wonderful results. Since I mostly use Script.aculo.us for the effects, dragdrop, and slider, and thus never include all the components at once, I decided to measure each file individually rather than Script.aculo.us as a whole package.

Terminology
Minification - refers to both minifying and obfuscating the JavaScript
Individual - Each JavaScript file was included separately in the page; for the base case via:

<script type="text/javascript" src="/js/prototype.js"></script>

Or for the other scenarios using pack:tag to include each with:

<pack:script src="/js/prototype.js" />

Grouped - The JavaScript files were included together in one file. For the case with minification and gzip via:

<pack:script>
<src>/js/prototype.js</src>
<src>/js/effects.js</src>
<src>/js/controls.js</src>
<src>/js/slider.js</src>
<src>/js/builder.js</src>
<src>/js/dragdrop.js</src>
<src>/js/sound.js</src>
</pack:script>

For the minified test without gzip, I created a new file (total.js) with the output of the above grouping and included it via:

<script type="text/javascript" src="/js/total.js"></script>

Versions used in testing
Prototype 1.6
Script.aculo.us 1.7.1 Beta 3

Individual Results


Grouped Results


Conclusion
Compression in some form can significantly decrease file size and, implicitly, download times. Different methods of compression yield varying degrees of benefit. Namely, gzip (which, by the way, is Prototype's 'officially supported' compression method) yields the greatest initial benefit, with minification yielding additional benefit. On the whole, I now have a setup that requires no extra work from me. I create my JavaScript files as if I were not compressing, include them with slightly different syntax, and the pack:tag with the YUICompressor strategy takes care of everything else.
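For anyone who wants to try this kind of comparison on their own files, here is a rough sketch using Node.js and its built-in zlib module. It runs on a synthetic, comment-heavy sample generated in the script, not on Prototype or Script.aculo.us, and the "minification" is a crude stand-in that only drops comment lines and indentation. The numbers it prints are therefore not the measurements from my tests.

```javascript
var zlib = require("zlib");

// Build a repetitive, commented sample as a stand-in for a library file.
var lines = [];
for (var i = 0; i < 200; i++) {
  lines.push("// helper number " + i);
  lines.push("function helper" + i + "(value) {");
  lines.push("    return value + " + i + ";");
  lines.push("}");
}
var original = lines.join("\n");

// Crude minification stand-in: remove comment lines and indentation.
var minified = original.split("\n")
  .filter(function (line) { return !/^\s*\/\//.test(line); })
  .map(function (line) { return line.trim(); })
  .join("\n");

// Byte counts for each delivery option.
var sizes = {
  original: Buffer.byteLength(original),
  minified: Buffer.byteLength(minified),
  gzipped: zlib.gzipSync(original).length,
  minifiedAndGzipped: zlib.gzipSync(minified).length
};
console.log(sizes);
```

Replacing the generated sample with the contents of a real file (read via `fs.readFileSync`, for example) gives a quick check of how much each step buys you.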

Addendum
As a testament to this conclusion, more compression options have become available lately. John-David Dalton has released a collection of compressed Prototype and Script.aculo.us versions on Google Code. He explained his process in an Ajaxian post last December:

I format the code manually, fixing semi-colons and fixing references to $super. I run them through a compressor with quotes around the $super vars so they aren’t changed then fix their method arguments. I use Dean Edward’s Packer because it creates the smallest files. From there you can use a server side solution to gzip/version/and deploy the file. I use Prado (www.pradosoft.com) and their asset publishing capabilities.

I have a Blog but it’s currently in the early stages, I never have time to work on it.

Basically it's the process I described above, except manually fixing the syntax 'problems' rather than using a JavaScript interpreter to do it for you. It's a great solution for non-Java server environments where the pack:tag is not an option.

Thursday, January 10, 2008

Update and things to come

Been a while since I've posted. To update, I was swamped working on my latest project, Kingmaker, but there should be a flurry of blog posts soon as I have a backlog of ideas I want to write about.

In other good news, I was introduced to an awesome program for OS X called Journler, which has become one of my most frequently used programs due to its versatility and thus my ability to use it practically however I want. It is now my second most recommended download for OS X following Quicksilver.

I finally upgraded my Apple MacBook Pro to 3GB of RAM. I have had 2GB since I first got it. Although that was fine for a while, I discovered that attempting to run Netbeans, Adobe Flex Builder, and Photoshop with the usual accompaniment of Apple Mail, web browsers, iChat, and iTunes resulted in a ridiculously high degree of paging and significantly decreased my productivity as I sat there waiting to switch between programs or waiting for programs to start and exit.

Adding the RAM made an incredible difference and my productivity has been much the better for it. I only wish that I could have 4GB... which brings to mind that I debated putting in a matched pair of 2GB sticks knowing that my MBP actually supports up to 3.3GB. Ultimately, I decided the extra 0.3GB and having a matched pair were not worth the price of the stick. I had debated moving to 3GB knowing that doing so would mean not having a matched pair and thus losing the 6-8% performance increase recorded for using a matched pair in MBPs. I reasoned that the decrease in paging from moving from 2GB to 3GB would significantly outweigh the performance increase from a matched pair, and my experience thus far with 3GB has confirmed this.

Blogs to look forward to soon:

1) The continuation of my series on setting up a glassfish cluster and mysql cluster.

2) Effectively and safely packing JavaScript files (including Prototype and Scriptaculous) using the pack:tag JSP-Taglib.

3) My attempts to use HA-JDBC on my glassfish cluster as a way of load balancing requests to the SQL nodes in my MySQL cluster.

4) Using Quartz for scheduling in Glassfish.

5) A Glassfish Admin GUI JDBCRealm bug that affects the container's ability to detect and pick up changes to the realm security.

6) Prototype specific IE6 JavaScript quirks.

7) Minifying DWR

8) Clustered second-level caching with JPA and Hibernate in my glassfish cluster. My 24-hour attempt to use JBoss TreeCache, and the subsequent 30-minute solution using Ehcache.

... But not necessarily in that order.