A Property By Any Other Name, Part 3

Tuesday, January 25, 2011 | 2:15 PM

This is the last in a series of blog posts on how Closure Compiler decides what properties to rename. Part 1 was about the --compilation_level=ADVANCED_OPTIMIZATIONS flag, and Part 2 was about renaming policies that didn't work. This blog post will be a bit of a grab bag of newer tools for better property renaming.

Using Type Checking to Fill in Missing Externs


Your externs file should declare all the properties that are defined outside of your program. If you haven't done this before, then finding all those properties can be a pain. Fortunately, Closure Compiler can point you in the right direction by checking for missing properties.

There are a few ways to turn on the missing properties check. For example, you can specify the flags:
--warning_level=VERBOSE --jscomp_warning=missingProperties
or
--jscomp_warning=checkTypes
The compiler will try to find places where you've used dot access (foo.bar) to read a property that can't possibly be defined on that object, perhaps because you've forgotten to declare it in your externs.
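When the check flags a property that really does come from outside the program, the remedy is to declare it in your externs file. A minimal sketch, assuming a hypothetical externally supplied type called `ServerResponse` that carries the `apartment` property (both names are made up for illustration):

```javascript
// Hypothetical externs file. Declaring a property here tells the compiler
// that it is defined outside the program, so reads of it on ServerResponse
// objects are not flagged, and the name is left unrenamed.

/** @constructor */
function ServerResponse() {}

/** @type {string} */
ServerResponse.prototype.apartment;
```

The bare `ServerResponse.prototype.apartment;` statement does nothing at run time; it exists only so the compiler can see the declaration and its type annotation.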

Notice that this check is subtly different from compiler checks in more static languages, like Java. The Java compiler uses a "must-define" approach: the property bar must be declared on all possible values of foo (except null), or it is a compile error. Closure Compiler uses a "may-define" approach. It only requires that some possible value of foo has a property bar. For example, if you have:

function f(x) {
  return x.apartment;
}


Closure Compiler will emit a warning if the apartment property is not assigned anywhere in the program. Similarly, if you have:

/** @param {Element} x */
function f(x) {
  return x.apartment;
}


the compiler will emit a warning if the apartment property is not assigned on any object that could possibly be an Element. It will not emit a warning if apartment is assigned on some specific subtype of Element, like HTMLTextAreaElement.

As you can see, you don't need complete type annotations to use this check, but more type annotations will get better results.
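The difference between the two approaches can be modeled in a few lines. This is a toy sketch, not how Closure Compiler is implemented: `definitions` (a hypothetical name) maps each type to the properties assigned on it, a parameter annotated `{Element}` is assumed to possibly hold Element or one of its subtypes, and `penthouse` is a made-up property that is never assigned.

```javascript
// Toy model of the two checking styles (not Closure Compiler's code).
// definitions: type name -> set of properties assigned on that type.
// possibleTypes: every type the expression `x` might hold at run time.

function definesOn(definitions, type, prop) {
  const props = definitions.get(type);
  return props !== undefined && props.has(prop);
}

// "Must-define" (Java style): every possible type needs the property.
function mustDefine(definitions, possibleTypes, prop) {
  return possibleTypes.every((t) => definesOn(definitions, t, prop));
}

// "May-define" (Closure style): one possible type is enough.
function mayDefine(definitions, possibleTypes, prop) {
  return possibleTypes.some((t) => definesOn(definitions, t, prop));
}

// Hypothetical program: apartment is assigned only on a subtype.
const definitions = new Map([
  ["Element", new Set(["id"])],
  ["HTMLTextAreaElement", new Set(["id", "apartment"])],
]);
// A parameter annotated {Element} might hold Element or its subtype.
const elementOrSubtype = ["Element", "HTMLTextAreaElement"];

console.log(mustDefine(definitions, elementOrSubtype, "apartment")); // false
console.log(mayDefine(definitions, elementOrSubtype, "apartment"));  // true: no warning
console.log(mayDefine(definitions, elementOrSubtype, "penthouse"));  // false: warning
```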

Using Type Analysis for Better Renaming


Under the "All Unquoted" naming policy, a property listed in the externs file will not be renamed anywhere in the program. This is a bummer. We went through a lot of trouble to make the Closure Library events API consistent with the native browser events API. But because we gave the methods the same names as extern methods, those methods can't be renamed. Could we use type information to distinguish method calls on user-defined objects from method calls on browser-defined objects?

We can. Closure Compiler's Java API has two options: disambiguateProperties and ambiguateProperties.

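Disambiguation means that Closure Compiler will look at all accesses to a property x in the program. If two types both have a property x, and the compiler can tell which type each access refers to, it gives each type its own name for x. Types declared in the externs file keep the original name. For example:

// externs file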
/** @constructor */ function Element() {}
Element.prototype.id;

// user code
/** @constructor */
function MyWidget() {
  this.id = 3;
}

Disambiguate properties will rename this to something like:

/** @constructor */
function MyWidget() {
  this.MyWidget$id = 3;
}
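A toy model of this renaming step follows. It assumes the compiler has already inferred a receiver type for every property access (`null` stands for "unknown"), and, to stay conservative, it leaves a property completely alone if any access to it has an unknown type. The function and field names are hypothetical, and the real pass is considerably more involved.

```javascript
// Toy model of disambiguateProperties (not the real implementation).
// Each access records the property name and the receiver type the
// compiler inferred for it; null means the type is unknown.

function disambiguate(accesses, externTypes) {
  // Stay conservative: a property with any untyped access is never renamed.
  const unsafe = new Set(
    accesses.filter((a) => a.type === null).map((a) => a.prop)
  );
  return accesses.map((a) => {
    if (unsafe.has(a.prop) || externTypes.has(a.type)) {
      return a.prop; // keep the original (extern or unsafe) name
    }
    return a.type + "$" + a.prop; // verbose, type-specific name
  });
}

const externTypes = new Set(["Element"]);
const renamed = disambiguate(
  [
    { type: "Element", prop: "id" },  // declared in the externs file
    { type: "MyWidget", prop: "id" }, // user-defined type
  ],
  externTypes
);
console.log(renamed); // [ 'id', 'MyWidget$id' ]
```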


By design, disambiguate properties produces very verbose names, on the assumption that they will be given shorter names by the "All Unquoted" naming policy. This makes it easier to debug, because you can turn off this optimization independently of "All Unquoted" property renaming.

Disambiguate properties allows us to rename some properties whose names appear in the externs file, but it creates a new problem: there are more unique property names, which makes the gzipped code bigger. To solve this problem, we use ambiguateProperties to minimize the number of unique properties. Ambiguate properties looks for properties on different objects where there's no chance those objects will appear in the same variable, and gives those properties the same name.
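Ambiguation can be viewed as a graph-coloring problem, and a greedy version is sketched below. This is an illustrative model, not the compiler's algorithm: it assumes two properties conflict whenever some type carries both of them, the property names are hypothetical, and it only draws names from a through z.

```javascript
// Toy model of ambiguation as greedy graph coloring.
// propTypes: (already disambiguated) property -> set of types it lives on.
// Assumption: two properties conflict when some type carries both,
// because a single object could then need both names at once.

const NAMES = "abcdefghijklmnopqrstuvwxyz"; // toy limit: 26 names

function ambiguate(propTypes) {
  const assigned = new Map(); // property -> short name
  for (const [prop, types] of propTypes) {
    // Collect names already held by conflicting properties.
    const taken = new Set();
    for (const [other, name] of assigned) {
      const otherTypes = propTypes.get(other);
      if ([...types].some((t) => otherTypes.has(t))) taken.add(name);
    }
    // Reuse the first name that no conflicting property holds.
    assigned.set(prop, [...NAMES].find((n) => !taken.has(n)));
  }
  return assigned;
}

const result = ambiguate(new Map([
  ["MyWidget$id", new Set(["MyWidget"])],
  ["MyWidget$label", new Set(["MyWidget"])],
  ["Point$x", new Set(["Point"])],
]));
for (const [prop, name] of result) console.log(prop + " -> " + name);
// MyWidget$id -> a
// MyWidget$label -> b
// Point$x -> a
```

Point$x can share the name `a` with MyWidget$id because, in this model, no object is ever both a MyWidget and a Point.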

Disambiguate and ambiguate properties are very conservative. They will only rename things if they are reasonably sure that it's safe to do so. (They can never be absolutely sure, because you could always pass in external objects that violate the declared types.) These optimizations only make sense when used in conjunction with "All Unquoted" renaming.

Short Names Aren't Necessarily Better


So far in this series, we've been assuming that short names make your binary smaller, and will always be better than long names.

That makes sense if you're sending your entire JS file across the network. But what if your visitors have old versions of your JS in their cache? You might want to figure out what version they have, and send them only the parts that have changed.

This is called delta encoding. If the compiler is choosing the shortest possible names for your properties, then small changes to your JS may change every compiled property in the output. The delta between two versions may be as large as the original files.
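A small simulation shows why. It assumes, purely for illustration, that the shortest names go to the most frequently used properties; the property names and usage counts are made up.

```javascript
// Toy model: the shortest names go to the most frequently used
// properties (an assumption for this sketch). Ties keep input order,
// since Array.prototype.sort is stable.

const NAMES = "abcdefghijklmnopqrstuvwxyz";

function shortestNames(counts) {
  const byFrequency = [...counts].sort((x, y) => y[1] - x[1]);
  return new Map(byFrequency.map(([prop], i) => [prop, NAMES[i]]));
}

const v1 = shortestNames(new Map([["render", 9], ["update", 5], ["destroy", 2]]));
// Version 2 adds one heavily used property and changes nothing else.
const v2 = shortestNames(
  new Map([["render", 9], ["update", 5], ["destroy", 2], ["log", 20]])
);

for (const prop of v1.keys()) {
  console.log(prop + ": " + v1.get(prop) + " -> " + v2.get(prop));
}
// render: a -> b
// update: b -> c
// destroy: c -> d
```

One new property shifted every existing name, so every use site in the output changes even though the source barely did.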

What we really want is a way to tell the compiler, "give these properties the shortest possible names, unless you've seen them before, and in that case give them the same name you gave them last time." The compiler has 4 flags for this:

--variable_map_input_file
--variable_map_output_file
--property_map_input_file
--property_map_output_file


The output maps from one compilation can be used as the input map for the next compilation. Although the compiler cannot guarantee that a property will be renamed the same way in both binaries, it will make a best-effort attempt to do so.
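The idea behind reusing a map can be sketched as a two-pass renamer. This is a simplified model: the real map files have their own on-disk format, which is not reproduced here, so plain `Map` objects stand in for them, and the property names are hypothetical.

```javascript
// Toy model of renaming with a previous property map: keep old names
// where possible, and give new properties the shortest names still free.

const NAMES = "abcdefghijklmnopqrstuvwxyz";

function renameWithMap(props, previousMap) {
  const result = new Map();
  const used = new Set();
  // First pass: keep every name handed out last time.
  for (const prop of props) {
    if (previousMap.has(prop)) {
      result.set(prop, previousMap.get(prop));
      used.add(previousMap.get(prop));
    }
  }
  // Second pass: new properties take the shortest unused names.
  for (const prop of props) {
    if (!result.has(prop)) {
      const name = [...NAMES].find((n) => !used.has(n));
      result.set(prop, name);
      used.add(name);
    }
  }
  return result;
}

const previous = new Map([["render", "a"], ["update", "b"], ["destroy", "c"]]);
const next = renameWithMap(["log", "render", "update", "destroy"], previous);
for (const [prop, name] of next) console.log(prop + " -> " + name);
// render -> a
// update -> b
// destroy -> c
// log -> d
```

Only the new property introduces a new name, so the delta between the two outputs stays small, at the cost of `log` no longer getting the very shortest name.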

That covers most of our major property renaming policies. If you have ideas for more, let us know at the Closure Compiler discussion group.

A Property By Any Other Name, Part 2

Friday, January 21, 2011 | 1:52 PM

This is the second in a series of blog posts on how Closure Compiler decides what properties to rename with the --compilation_level=ADVANCED_OPTIMIZATIONS flag. Part 1 talks about the current algorithm. This blog post will focus on property renaming policies that we tried that didn't work so well.

In the beginning, we tried to use coding conventions to decide when to rename things. Uppercase property names (foo.MyMethod) were renamed, and lowercase property names were not renamed. This didn't work well. One man's internal code is another man's external library. Sometimes you really didn't want uppercase property names to be renamed. Changing your code to use this convention meant breaking your API.

Later, we tried to move towards "smarter" algorithms, ones that did not require the programmer to be aware of renaming policies. These were called the "heuristic" property renaming policies. These algorithms looked at the entire input to the compiler, and tried to find all writes to and reads of a specific property. If it saw at least one write to the property, and was reasonably sure that all the reads of that property came from those writes, then it renamed them all.
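The core of the heuristic can be reduced to a toy rule: rename a property only if the program contains at least one write to it. This sketch deliberately skips the second check (whether all reads really come from those writes), and the access list is a hypothetical stand-in for a parsed program. Even so, it shows what made the policy fragile: whether a read gets renamed depends on writes anywhere else in the input.

```javascript
// Toy model of the old "heuristic" policy (simplified).
// program: list of property accesses, each a read or a write.

function heuristicRenames(program) {
  const written = new Set(
    program.filter((a) => a.kind === "write").map((a) => a.prop)
  );
  const mentioned = new Set(program.map((a) => a.prop));
  // Rename every property that has at least one write somewhere.
  return [...mentioned].filter((prop) => written.has(prop));
}

// Program 1: the property is only ever read (for example, from JSON).
const before = [{ kind: "read", prop: "estate" }];
// Program 2: the same code, plus one write added somewhere else.
const after = [...before, { kind: "write", prop: "estate" }];

console.log(heuristicRenames(before)); // []
console.log(heuristicRenames(after));  // [ 'estate' ]
```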

In small apps, heuristic renaming policies worked well. They were not very powerful, but they were easy to migrate to. Even when you didn't declare all the properties on external objects in the externs file, you'd usually still be ok. There would be no property writes to that property name, so the compiler wouldn't try to rename it.

But for medium to large apps, these advantages were a curse. Consider the following code:

/** @param {Object} json Some external JSON. */
function f(json) {
  return json.estate;
}
window['__receive_json'] = f;

// ...

// in some other code base
Foo.prototype.estate = 3;
f(new Foo());


If these were the only two appearances of the property estate in your binary, the compiler would rename it. The compiler can't tell that you're calling f from external code, and that you expect estate to be preserved.

You could have this piece of code that worked for years and years. Then, somebody you've never met could add Foo.prototype.estate in a different part of the codebase. It would break your code for no obvious reason, and the breaking change would be difficult to track down. When we have common JavaScript libraries, this becomes orders of magnitude more problematic. Adding Foo.prototype.estate could break any of the 25 products that depend on your library in subtle and difficult-to-debug ways.

Even if you did find the problem, how would you work around it? If this is shared code, then changing json.estate to something like json['estate'] might break other projects that depend on it, because their binaries do expect estate to get renamed.

Because of these problems, most projects that use Closure Compiler do not use heuristic renaming algorithms. But heuristic renaming wasn't a total failure. We learned some useful lessons:

  • If the compiler looks at your whole program to determine whether a property should be renamed, then that means a change in one part of the program can change property renaming in an unrelated part of the program.
  • If your code is shared across projects, then you probably want the property to be renamed in all projects or none of them.
  • When renaming properties, it's better to be transparent and 90% accurate than to be cryptic and 99% accurate.


Could we use these lessons to develop a better renaming algorithm? We'll talk about this more in Part 3.

A Property By Any Other Name, Part 1

Tuesday, January 18, 2011 | 1:47 PM

When you use Closure Compiler's --compilation_level=ADVANCED_OPTIMIZATIONS flag, the compiler will try to rename properties on your objects. For example, it may rename x.longPropertyName to x.a.

Because property renaming is a complex topic, we're going to split this discussion up into three blog posts. Part 1 is about the property renaming that you get with ADVANCED_OPTIMIZATIONS. Part 2 will be about other property renaming algorithms we've tried that didn't work so well. Part 3 will be about property renaming algorithms that we're currently experimenting with and that are available from the Java API.

If you're using Closure Compiler's Java API, you have more fine-grained control over what renaming the compiler does. The API treats variable renaming (foo.bar -> a.bar) and property renaming (foo.bar -> foo.a) as completely independent optimizations. You can choose a variable renaming policy and a property renaming policy. The best property renaming policy, "All Unquoted," is what you get when you use ADVANCED_OPTIMIZATIONS. Most large Google projects use it. It significantly changes how we write JavaScript.

In the general case, a compiler can't safely rename properties at compile time; it simply doesn't have enough information. There will always be objects that come from external sources that the compiler can't see (like JSON responses from the server), and property names that are undecidable. (Consider the expression foo[undecidableFunction()] = function(){};.) So property renaming can never be perfect. There will always be rules and gotchas.

Before we talk about the best property renamer, "All Unquoted," we have to define what we mean by "best." Usually, we use three criteria.

  1. Power: How much smaller does it make your code?

  2. Failure Cases: If it renames properties incorrectly, how easy is it to figure out what went wrong? How easy is it to fix the issue? If you make a change, how confident can you be that it won't break renaming?

  3. Migration: How easy is it to update a legacy codebase so that it can take advantage of property renaming?


"All Unquoted" renaming was designed to optimize for #2: making the failure cases easy to debug and correct. By design, the algorithm is transparent and simple.

  1. If a property is in the externs file, don't rename it.

  2. If the property appears in quotes, don't rename it.

  3. Otherwise, rename it.
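The three rules are simple enough to express as a small decision function. This toy sketch decides per reference; `shouldRename` and the reference objects are hypothetical, and `document` is assumed to be declared in the externs file.

```javascript
// Toy model of the "All Unquoted" policy. Each property reference
// records its name and whether it was written in quotes.

function shouldRename(ref, externs) {
  if (externs.has(ref.name)) return false; // rule 1: declared in externs
  if (ref.quoted) return false;            // rule 2: appears in quotes
  return true;                             // rule 3: everything else
}

const externs = new Set(["document"]);
const refs = [
  { name: "alice", quoted: false },
  { name: "bob", quoted: true },
  { name: "claire", quoted: false },
  { name: "document", quoted: false },
];
for (const ref of refs) {
  console.log(ref.name + ": " + (shouldRename(ref, externs) ? "rename" : "keep"));
}
// alice: rename
// bob: keep
// claire: rename
// document: keep
```

Nothing in the function looks beyond the reference itself and the externs, which is what makes its decisions easy to predict by reading the source.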


For example, if you write the code:

var obj = {
  alice: true,
  'bob': true
};
obj.claire = true;
obj.document = true;
window['obj'] = obj;


then you will get something that looks like this:

var a = {
  a: true, // alice was not in quotes or in the externs file
  bob: true // bob was in quotes
};
a.b = true; // claire was not in quotes or in the externs file
a.document = true; // document was in the externs file
window.obj = a; // obj was in quotes


If you use the --debug flag, the same properties still get renamed, but now it will be much easier to see what the original names were:

var $obj$$ = {
  $alice$: true,
  'bob': true
};
$obj$$.$claire$ = true;
$obj$$.document = true;
window['obj'] = $obj$$;


This has some nice features that help debugging and development. It's straightforward for the average programmer to look at the compiled output and figure out why the compiler is renaming something, and what the name should be. If we decide that we don't want the property "claire" to be renamed, then we can change obj.claire to obj['claire'].

Furthermore, this convention makes it easier to read and refactor code written by a large team of JavaScript developers. If Bob is using "All Unquoted," and he has a method defined as Foo.prototype.methodA = fn, then Alice can easily search for all calls to methodA in the codebase. She doesn't have to worry about opaque accesses to the method, like obj['method' + 'A'], because she knows that the compiler will break those accesses anyway.

Another way to look at it is that we use the dotted access (obj.alice) for compile-time property lookups, and array access (obj['alice']) for run-time property lookups.
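A hand-written before/after pair illustrates why a run-time lookup breaks once the compile-time name changes. The short name `a` is only an example of what the compiler might choose, and `CompiledFoo` is a hypothetical stand-in for the compiled output; nothing here invokes the compiler.

```javascript
// Source: methodA is defined with dotted access, so it will be renamed.
function Foo() {}
Foo.prototype.methodA = function () { return "called"; };

// What the same class might look like after "All Unquoted" renaming.
function CompiledFoo() {}
CompiledFoo.prototype.a = function () { return "called"; };

// A run-time (computed) lookup that the compiler cannot see through.
const key = "method" + "A";
const source = new Foo();
const compiled = new CompiledFoo();

console.log(typeof source[key]);   // function
console.log(typeof compiled[key]); // undefined: the renamed method is missed
```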

But there are some big downsides to "All Unquoted." As you saw above, any property in the externs file cannot be renamed anywhere in the program. When your methods have the same names as native browser methods, they won't be renamed. It's not as powerful as it could be.

Even worse, it is difficult to safely convert a legacy codebase to use this renaming policy. Because you never had to declare your "external" properties, you probably didn't, and it's a bear to find them all.

Could we do better? Maybe there's additional information we could leverage to get fewer false positives and false negatives? Perhaps we could look at the type annotations, or at other property assignments in the program? We'll get to those questions in the next two posts.