A Property By Any Other Name, Part 2

Friday, January 21, 2011 | 1:52 PM

This is the second in a series of blog posts on how Closure Compiler decides which properties to rename with the --compilation_level=ADVANCED_OPTIMIZATIONS flag. Part 1 talks about the current algorithm. This post focuses on property renaming policies that we tried and that didn't work so well.

In the beginning, we tried to use coding conventions to decide when to rename things. Uppercase property names (foo.MyMethod) were renamed, and lowercase property names were not renamed. This didn't work well. One man's internal code is another man's external library. Sometimes you really didn't want uppercase property names to be renamed. Changing your code to use this convention meant breaking your API.
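As a rough illustration, a convention-based policy like this comes down to a check on the first character of the property name. The function name shouldRenameByConvention is hypothetical, and reading "uppercase" as "starts with an uppercase ASCII letter" is an assumption; this is a sketch of the idea, not the compiler's actual code:

```javascript
// Hypothetical sketch of a convention-based renaming policy.
// Assumption: "uppercase" means the first character is an uppercase ASCII letter.
function shouldRenameByConvention(propertyName) {
  return /^[A-Z]/.test(propertyName);
}

shouldRenameByConvention('MyMethod'); // true: treated as internal, renamed
shouldRenameByConvention('myMethod'); // false: left alone
```

The policy only ever sees the name, which is why it cannot tell an internal method from a public API method that happens to be capitalized.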

Later, we tried to move towards "smarter" algorithms, ones that did not require the programmer to be aware of renaming policies. These were called the "heuristic" property renaming policies. These algorithms looked at the entire input to the compiler, and tried to find all writes and reads to a specific property. If they saw at least one write to the property, and were reasonably sure that all the reads of that property came from those writes, then they renamed them all.
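A heavily simplified sketch of that kind of decision might look like the following. The function propertiesToRename and the shape of the access records are hypothetical, and the "reasonably sure" part is reduced here to "the name is not declared in the externs", which is much cruder than what a real compiler pass would do:

```javascript
// Hypothetical sketch of a heuristic renaming decision.
// Each access is {name, kind}, where kind is 'write' or 'read'.
// Simplification: "reasonably sure the reads come from those writes" is
// reduced to "the name is not declared in the externs".
function propertiesToRename(accesses, externNames) {
  const written = new Set();
  for (const access of accesses) {
    if (access.kind === 'write') {
      written.add(access.name);
    }
  }
  const result = [];
  for (const name of written) {
    if (!externNames.has(name)) {
      result.push(name);
    }
  }
  return result.sort();
}

const accesses = [
  {name: 'width', kind: 'write'}, // written somewhere in the input
  {name: 'width', kind: 'read'},
  {name: 'title', kind: 'read'},  // only ever read, never written
];
propertiesToRename(accesses, new Set(['title'])); // ['width']
```

Note that the answer depends on every access in the whole input: adding a single write for title anywhere would change the result.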

In small apps, heuristic renaming policies worked well. They were not very powerful, but they were easy to migrate to. Even when you didn't declare all the properties on external objects in the externs file, you'd usually still be ok. There would be no property writes to that property name, so the compiler wouldn't try to rename it.

But for medium to large apps, these advantages were a curse. Consider the following code:

/** @param {Object} json Some external JSON. */
function f(json) {
  return json.estate;
}
window['__receive_json'] = f;

// ...

// in some other code base
Foo.prototype.estate = 3;
f(new Foo());


If these were the only two appearances of the property estate in your binary, the compiler would rename it. The compiler can't tell that you're calling f from external code, and that you expect estate to be preserved.
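To make the failure concrete, here is a hand-written approximation of what the compiled output might behave like. The short name a is an assumption (the compiler picks its own names), and the JSON payload is made up for illustration:

```javascript
// Hand-written approximation of compiled output; the short name `a` is assumed.
function f(json) {
  return json.a; // was json.estate before renaming
}

function Foo() {}
Foo.prototype.a = 3; // was Foo.prototype.estate

// The internal call still works, because both sides were renamed together.
const internalResult = f(new Foo()); // 3

// External callers still send the original property name.
const externalResult = f(JSON.parse('{"estate": 3}')); // undefined
```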

You could have this piece of code that worked for years and years. Then, somebody who you never met could add Foo.prototype.estate in a different part of the codebase. It would break your code for no obvious reason, and the breaking change would be difficult to track down. When we have common JavaScript libraries, this becomes orders of magnitude more problematic. Adding Foo.prototype.estate could break any of the 25 products that depend on your library in subtle and difficult-to-debug ways.

Even if you did find the problem, how would you work around it? If this is shared code, then changing json.estate to something like json['estate'] might break other projects that depend on it, because their binaries do expect estate to get renamed.
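The conflict can be sketched by hand. Closure Compiler leaves quoted property names alone, so after the workaround f always reads the literal name estate, while another project's binary may have renamed its dotted write. As before, the short name a and the JSON payload are assumptions for illustration:

```javascript
// Hand-written approximation of two projects sharing f after the workaround.
function f(json) {
  return json['estate']; // quoted access: the name is kept as written
}

// Project A receives external JSON, so the quoted read works.
const fromJson = f(JSON.parse('{"estate": 3}')); // 3

// Project B's binary renamed the dotted write to a short name.
function Foo() {}
Foo.prototype.a = 3; // was Foo.prototype.estate
const fromFoo = f(new Foo()); // undefined
```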

Because of these problems, most projects that use Closure Compiler do not use heuristic renaming algorithms. But heuristic renaming wasn't a total failure. We learned some useful lessons:

  • If the compiler looks at your whole program to determine whether a property should be renamed, then that means a change in one part of the program can change property renaming in an unrelated part of the program.
  • If your code is shared across projects, then you probably want the property to be renamed in all projects or none of them.
  • When renaming properties, it's better to be transparent and 90% accurate than to be cryptic and 99% accurate.


Could we use these lessons to develop a better renaming algorithm? We'll talk about this more in Part 3.

3 comments:

Unknown said...

Could you guys please consider slightly increasing the quote for the closure compiler api? I have well over 30 separate files that need to be compiled and that I have attempted to automate thru a batch job. Run this twice for testing and I get kicked off...

PS: Thank you for the software

Unknown said...

*Quota

Nick said...

@MaximG: Some of the closure-compiler webservice's quota is self-enforced. But some of it is enforced from above us by AppEngine. We couldn't exceed those quotas even if we wanted to.

Have you tried plovr? (http://plovr.com/)

It's open-source, so you can run it locally without any quota at all.