A Property By Any Other Name, Part 1

Tuesday, January 18, 2011 | 1:47 PM

When you use Closure Compiler's --compilation_level=ADVANCED_OPTIMIZATIONS flag, the compiler will try to rename properties on your objects. For example, it may rename x.longPropertyName to x.a.

Because property renaming is a complex topic, we're going to split this discussion into three blog posts. Part 1 is about the property renaming that you get with ADVANCED_OPTIMIZATIONS. Part 2 will be about other property renaming algorithms we've tried that didn't work so well. Part 3 will be about property renaming algorithms that we're currently experimenting with and that are available from the Java API.

If you're using Closure Compiler's Java API, you have more fine-grained control over what renaming the compiler does. The API treats variable renaming (foo.bar -> a.bar) and property renaming (foo.bar -> foo.a) as completely independent optimizations. You can choose a variable renaming policy and a property renaming policy. The best property renaming policy, "All Unquoted," is what you get when you use ADVANCED_OPTIMIZATIONS. Most large Google projects use it. It significantly changes how we write JavaScript.

In the general case, a compiler can't safely rename properties at compile time, because it doesn't have enough information. There will always be objects that come from external sources the compiler can't see (like JSON responses from the server), and property names that are undecidable. (Consider the expression foo[undecidableFunction()] = function(){};.) So property renaming can never be perfect. There will always be rules and gotchas.
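As a rough illustration of the external-data problem, the sketch below stands in for a server response with an inline JSON string and compares the access the author wrote with the access a property renamer might emit. The field name userName and the short name a are made up for the example, and the "renamed" line is written by hand, not taken from real compiler output.

```javascript
// Stand-in for a JSON payload fetched from a server. The compiler never
// sees the contents of this string, so it cannot rename the key inside it.
var payload = '{"userName": "ada"}';
var response = JSON.parse(payload);

// The access as the author wrote it in source.
var beforeRenaming = response.userName;

// The access as it could look if a renamer shortened userName to "a"
// (hypothetical short name). The key in the payload is still "userName".
var afterRenaming = response.a;

console.log(beforeRenaming); // ada
console.log(afterRenaming);  // undefined
```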

Before we talk about the best property renamer, "All Unquoted," we have to define what we mean by "best." Usually, we use three criteria.

  1. Power: How much smaller does it make your code?

  2. Failure Cases: If it renames properties incorrectly, how easy is it to figure out what went wrong? How easy is it to fix the issue? If you make a change, how confident can you be that it won't break renaming?

  3. Migration: How easy is it to update a legacy codebase so that it can take advantage of property renaming?

"All Unquoted" renaming was designed to optimize for #2: making the failure cases easy to debug and correct. By design, the algorithm is transparent and simple.

  1. If a property is in the externs file, don't rename it.

  2. If the property appears in quotes, don't rename it.

  3. Otherwise, rename it.
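The three rules above can be sketched as a toy decision function. This is only a model of the policy as described here, not the compiler's implementation; the externs set and the shouldRename helper are hypothetical names invented for the sketch.

```javascript
// Hypothetical, tiny stand-in for the property names declared in an
// externs file. A real externs file is far larger.
var externs = new Set(['document', 'window', 'length']);

/**
 * Toy model of the "All Unquoted" rules.
 * @param {string} name Property name as written in source.
 * @param {boolean} isQuoted True if the source wrote the name in quotes.
 * @return {boolean} Whether this model would rename the property.
 */
function shouldRename(name, isQuoted) {
  if (externs.has(name)) return false; // Rule 1: listed in the externs.
  if (isQuoted) return false;          // Rule 2: written in quotes.
  return true;                         // Rule 3: everything else.
}

console.log(shouldRename('alice', false));    // true
console.log(shouldRename('bob', true));       // false
console.log(shouldRename('document', false)); // false
```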

For example, if you write the code:

var obj = {
  alice: true,
  'bob': true
};
obj.claire = true;
obj.document = true;
window['obj'] = obj;

then you will get something that looks like this:

var a = {
  a: true, // alice was not in quotes or in the externs file
  bob: true // bob was in quotes
};
a.b = true; // claire was not in quotes or in the externs file
a.document = true; // document was in the externs file
window.obj = a; // obj was in quotes

If you use the --debug flag, the same properties still get renamed, but now it will be much easier to see what the original names were:

var $obj$$ = {
  $alice$: true,
  'bob': true
};
$obj$$.$claire$ = true;
$obj$$.document = true;
window['obj'] = $obj$$;

This has some nice features that help debugging and development. It's straightforward for the average programmer to look at the compiled output and figure out why the compiler is renaming something, and what the name should be. If we decide that we don't want the property "claire" to be renamed, then we can change obj.claire to obj['claire'].

Furthermore, this convention makes it easier to read and refactor code written by a large team of JavaScript developers. If Bob is using "All Unquoted," and he has a method defined as Foo.prototype.methodA = fn, then Alice can easily search for all calls to methodA in the codebase. She doesn't have to worry about opaque accesses to the method, like obj['method' + 'A'], because she knows that the compiler will break those accesses anyway.

Another way to look at it is that we use dotted access (obj.alice) for compile-time property lookups, and quoted bracket access (obj['alice']) for run-time property lookups.
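To see why the two styles should not be mixed for the same property, here is a hand-written imitation of what compiled code could look like when one property is written with a dot and read three different ways. The short name a is hypothetical, and the snippet is a sketch rather than real compiler output.

```javascript
// Source was:
//   var obj = {};
//   obj.alice = true;               // dotted write
//   var dotted = obj.alice;         // dotted read
//   var quoted = obj['alice'];      // quoted read
//   var built = obj['al' + 'ice'];  // name computed at run time
var obj = {};
obj.a = true;                   // dotted write, renamed
var dotted = obj.a;             // dotted read, renamed the same way
var quoted = obj['alice'];      // quoted read, left alone
var built = obj['al' + 'ice'];  // computed name, left alone

console.log(dotted); // true
console.log(quoted); // undefined
console.log(built);  // undefined
```

Picking one access style per property and using it everywhere avoids the mismatch.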

But there are some big downsides to "All Unquoted." As you saw above, a property named in the externs file cannot be renamed anywhere in the program. When your methods have the same names as native browser methods, they won't be renamed. It's not as powerful as it could be.
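A small hand-written sketch of that lost opportunity follows. Widget, W, and the method names are hypothetical, and the example assumes that focus is declared in the browser externs because it is a DOM element method. The second half imitates compiled output and is not produced by the compiler.

```javascript
// Source, as an author might write it.
function Widget() {}
Widget.prototype.focus = function () { return 'focused'; };
Widget.prototype.recalculateLayout = function () { return 'laid out'; };

// Imitation of "All Unquoted" output. "focus" collides with a name in the
// externs (assumed), so it keeps its full name even though this method has
// nothing to do with the DOM. "recalculateLayout" gets a short name.
function W() {}
W.prototype.focus = function () { return 'focused'; };
W.prototype.a = function () { return 'laid out'; };

var w = new W();
console.log(w.focus()); // focused
console.log(w.a());     // laid out
```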

Even worse, it is difficult to safely convert a legacy codebase to use this renaming policy. Because you never had to declare your "external" properties, you probably didn't, and it's a bear to find them all.

Could we do better? Maybe there's additional information we could leverage to get fewer false positives and false negatives? Perhaps we could look at the type annotations, or at other property assignments in the program? We'll get to those questions in the next two posts.


LJHarb said...

Since obj.alice is equivalent to obj['alice'] - why can't the compiler handle unquoted property names correctly by just converting them to the quoted version?

It seems like using dot notation (i.e., unquoted property names) implicitly /always/ means that it's external in my code. At the very least, the compiler should have an option or support a pragma so that I am not forced to rewrite my JavaScript to support what is, for me, overly aggressive property name optimization.