Call of the Octocat: From Google Code to GitHub

Wednesday, June 11, 2014 | 11:14 AM

Earlier this spring, Closure engineers proposed migrating our source repositories from Google Code to GitHub. We asked the open-source community whether hosting the repositories on GitHub would improve their workflow, and whether it would justify the (small) cost of migration.

The response was unequivocal: Closure users preferred GitHub's issue tracking, documentation, and pull request systems to their Google Code equivalents. Enthusiastic users outside Google had even set up unofficial GitHub mirrors of the Closure projects to fit them better into their development process.

Today, we are pleased to announce that three of the four top-level Closure projects have migrated to GitHub:

  • Closure Compiler
  • Closure Library
  • Closure Stylesheets

(The fourth project, Closure Linter, still needs some work to ensure a smooth migration. It will stay at Google Code for now.) If your own Git repositories currently use the Google Code repositories as a remote, use these instructions to update.
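
For a typical clone, the change is a one-line remote update; a minimal sketch, assuming the Closure Compiler repository (substitute the GitHub URL of the project you actually use):

# Point an existing clone at the new GitHub remote, then re-fetch.
git remote set-url origin https://github.com/google/closure-compiler.git
git fetch origin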

We hope this change will make it easier for developers to learn about, use, discuss, and contribute to the Closure Tools. Let us know what you think on the project-specific groups (Compiler, Library, Stylesheets).

Which Compilation Level is Right for Me?

Wednesday, September 26, 2012 | 7:28 AM

Cross-posted from the Missouri State Web and New Media Blog

When first starting with Closure Compiler, it is easy to see the names for the compilation levels and assume that advanced is better than simple. While it is true that advanced optimization generally produces a smaller file, that does not mean it is the best fit for all projects.

What is really the difference?

There are quite a few differences, but the most significant is dead-code elimination. With advanced optimizations, the compiler removes any code that it knows you are not using. Perfect! Who would not want that? It turns out a lot of people, because the compiler can only eliminate code correctly when you specifically tell it about ALL of the other code used in your project AND ALL of the ways that your code is used by other scripts. Everything should be compiled together at the same time. That is a pretty big gotcha.

Here is a classic example:

<html>
<head>
  <title>Advanced Optimization Gotchas</title>
  <!-- an external library -->
  <script src="jquery-1.7.2.js"></script>
  <script>
    //This section is compiled
    function ChangeBackground() {
      $('body').css('background-color', 'pink');
    }
    //Export for external use
    window['ChangeBackground'] = ChangeBackground;
  </script>
</head>
<body>
  <!-- external use of compiled code -->
  <a onclick="ChangeBackground()">Pinkify</a>
</body>
</html>

In this case we have to explicitly tell the compiler about jQuery during compilation with an extern file and we have to tell it that our ChangeBackground function is called from external code. While this is a contrived example, it illustrates a case where it probably was not worth the time to ensure compatibility with the advanced optimization level.
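
For reference, here is a minimal sketch of the externs side of that work (a hypothetical file, trimmed to just the jQuery surface this page touches; a real project would use the community-maintained jQuery externs):

// jquery.externs.js (hypothetical): declares the external API so the
// compiler neither renames nor removes calls to it.

/**
 * @param {string} selector
 * @return {!jQueryObject}
 */
function $(selector) {}

/** @constructor */
function jQueryObject() {}

/**
 * @param {string} name
 * @param {string} value
 * @return {!jQueryObject}
 */
jQueryObject.prototype.css = function(name, value) {};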

General decision factors

So how do you actually decide which optimization level is right for your project? Below are some of the most common factors in that decision:

Simple optimizations may be the right fit when you are:

  • Looking for a replacement JavaScript compressor
  • Compiling a library where the vast majority of functions are part of the publicly exposed API
  • Unwilling to make substantial changes to code style
  • Using external libraries that do not have existing extern files and are not compatible with advanced optimizations
  • On a tight timeline that does not allow for troubleshooting obscure errors after compilation

Advanced optimizations may be the right fit when you are:

  • Looking for every last byte of savings in delivered code size or in execution time
  • Authoring a very large application with multiple modules
  • Starting a new project, or willing to make substantial changes to coding style and patterns
  • Wanting the best possible obfuscation of your code to protect intellectual property
  • Authoring a large library but wanting to support users who only use a small subset of the code

Coding style factors

Most of us are proud of our JavaScript. In fact, we may have some slick coding patterns that make our code elegant to read and maintain. However, not all JavaScript coding patterns compile equally well with Closure Compiler advanced optimizations. If your code contains any of the following (and you are unwilling to change it), then simple optimizations are probably the best choice for you:

  • Mixed property access methods
    Closure Compiler treats properties accessed with dotted notation (obj.prop) differently than properties accessed via brackets or quoted notation (obj['prop']). In fact, it sees them as completely different properties. This item is first on the list for a reason: it is almost always the biggest hurdle. Because of this, the following patterns can all cause problems with advanced optimizations:

    1. Building method names with strings
      var name = 'replace';
      obj[name] = function() {};
      obj[name + 'With'] = function() {};
      obj.replaceWith(); //Mixed access problem
    2. Testing for property existence with strings
      obj.prop = 'exists';
      if ('prop' in obj) … //Mixed access problem
    3. Using a property name in a loop
      obj.prop = function() {};
      for (var propName in obj) {
        if (propName == 'prop') { //Mixed access problem
        }
      }
  • Using the “this” keyword outside of constructors or prototype methods
    var obj = {};
    //Static method using “this”
    obj.prop = function() { this.newprop = 'exists' };
    obj.prop();
    alert(obj.newprop);
    ...

    In advanced optimizations, Closure Compiler can move and refactor code in unexpected ways: functions may be inlined, and properties may be collapsed to variables. In many of these cases, that changes which object the “this” keyword references. These cases have workarounds, but without special attention your code will likely not execute as intended. To illustrate, under advanced optimizations the compiler might change the above code to:

    var obj = {};
    var a = function() { this.newprop = 'exists' };
    a();
    //Property does not exist - it is defined on the window object
    alert(obj.newprop);

Choose wisely

Regardless of your choice between simple and advanced optimizations, you can still use the compiler's many compile-time code checks and warnings. From a missing semicolon to a misspelled property, the compiler can assist in identifying problems with your code before your users find them for you.

So to recap, advanced is not always better than simple. Modifying an existing code base to be compatible with Closure Compiler advanced optimizations can be a daunting challenge, and it definitely is not the best choice for every project.

Subtyping Functions Without Poking Your Eyes Out

Friday, June 1, 2012 | 2:12 PM

Pop quiz: suppose you see two Closure Compiler type annotations.

A) {function(Object)}
B) {function(Array)}

Which one is the subtype?

Telling you that the right answer is (A) feels a lot like driving on the wrong side of the road. Try reading it out loud.

Right Answer: "A function taking an Object IS A function taking an Array"
Wrong Answer: "A function taking an Array IS A function taking an Object"

Saying the right answer feels so wrong! Saying the wrong answer feels so right! We often have to do multiple rounds of whiteboarding the problem to try to explain it to other JS engineers.

This problem is not unique to JavaScript. Most object-oriented languages have the same issue. But JavaScript engineers are used to living without types. With Closure, suddenly you have types. And you have lots of complex function types that real OO languages don't have, and that you're probably unfamiliar with if you haven't used a functional language.

The problem, I think, goes back to a lot of "Intro to Objects and Classes" books that teach the "is-a" relationship. An apple IS A fruit. A blog IS A website. A werewolf IS A man but also IS A wolf.

The "is-a" wording is an oversimplification that breaks down if you're not careful. This wikipedia page gives a good counterexample, obtusely explained. Suppose you define "A Doctor" as a person that can treat all patients, and a Pediatrician as a person that can only treat children. If we need a Doctor that can treat all patients, then a Pediatrician is not an acceptable substitude. So, counterintuitively, a Pediatrician is not a subclass a Doctor.

Oddly, a Pediatrician is a superclass of this "Doctor." If you need a pediatrician, a doctor that can treat all patients is an acceptable substitute.

What you really want is to define a Doctor as "a person that can treat at least one patient." You might call this the Empty Doctor or (for the calculus nerds) the Epsilon Doctor. This doctor can treat some patient, somewhere. But if you go to see the Epsilon Doctor, it's likely that they won't know how to treat you. A pediatrician is a subclass of the Epsilon Doctor, because a pediatrician can treat some patients.

Conversely, when you have a {function(Object)}, this does not mean "a function that can take some object". It means "a function that knows how to handle all objects." A {function(Object)} is substitutable anywhere you need a {function(Array)}.
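
Here is that conclusion as a small Closure-annotated sketch (the function names are hypothetical):

/** @param {function(Array)} callback */
function withValues(callback) {
  callback([1, 2, 3]);
}

/** @param {Object} obj */
function handleAnyObject(obj) {
  alert(Object.prototype.toString.call(obj));
}

// OK: handleAnyObject is a {function(Object)}, which knows how to
// handle all objects, so it can stand in for a {function(Array)}.
withValues(handleAnyObject);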

Type-checking Tips

Friday, February 24, 2012 | 9:02 AM

Closure Compiler’s type language is a bit complicated. It has unions (“variable x can be A or B”), structural functions (“variable x is a function that returns a number”), and record types (“variable x is any object with properties foo and bar”).

A lot of people have told us that that’s still not expressive enough. There are many ways that you can write JavaScript that do not fit cleanly into our type system. People have suggested that we should add mixins, traits, and post-hoc naming of anonymous objects.

This was not particularly surprising to us. The rules of objects in JavaScript are a little bit like the rules of Calvinball. You can change anything, and make up new rules as you go along. A lot of people think that a good type system gives you a powerful way to describe how your program is structured. But it also gives you a set of rules. The type system ensures that everyone agrees on what a “class” is, and what an “interface” is, and what “is” means. When you’re trying to add type annotations to untyped JS, you’re inevitably going to run into issues where the rules in my head don’t quite match the rules in your head. That’s OK.

But we were surprised that when we gave people this type system, they often found multiple ways to express the same thing. Some ways worked better than others. I thought I’d write this to describe some of the things that people tried, and how they worked out.

Function vs. function()

There are two ways to describe a function. One is to use the {Function} type, which the compiler literally interprets as “any object x where ‘x instanceof Function’ is true”. A {Function} is deliberately mushy. It can accept any arguments, and return anything. You can even use ‘new’ on it. The compiler will let you call it however you want without emitting warnings.

A structural function is much more specific, and gives you fine-grained control over what the function can do. A {function()} takes no arguments, but we don’t care what it returns. A {function(?): number} returns a number and takes exactly one argument, but we don’t care about the type of that argument. A {function(new:Array)} creates an Array when you call it with “new”. Our type documentation and JavaScript style guide have more examples of how to use structural functions.
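
Putting those signatures side by side (an illustrative sketch, not an exhaustive list):

/** @type {Function} */
var anyCallable;   // mushy: any arguments, any return value, 'new' is OK

/** @type {function()} */
var thunk;         // takes no arguments; we don't care what it returns

/** @type {function(?): number} */
var toNumber;      // exactly one argument of any type; returns a number

/** @type {function(new:Array)} */
var makeArray;     // creates an Array when called with 'new'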

A lot of people have asked us if {Function} is discouraged, because it’s less specific. Actually, it’s very useful. For example, consider the definition of Function.prototype.bind. It lets you curry functions: you can give it a function and a list of arguments, and it will give you back a new function with those arguments “pre-filled in”. It’s impossible for our type system to express that the returned function type is a transformation of the type of the first argument. So the JSDoc on Function.prototype.bind says that it returns a {Function}, and the compiler has to have hand-coded logic to figure out the real type.
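
For instance (a hypothetical sketch of that currying):

/**
 * @param {number} a
 * @param {number} b
 * @return {number}
 */
function add(a, b) { return a + b; }

// bind's JSDoc can only promise a {Function}; the compiler's hand-coded
// logic recovers the precise type, function(number): number.
var addFive = add.bind(null, 5);
addFive(3); // 8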

There are also many cases where you want to pass a callback function to collect results, but the results are context-specific.


rpc.get('MyObject', function(x) {
  // process MyObject
});

The “rpc.get” method is a lot clumsier to use if the callback you pass has to type-cast everything it gets. So it's often easier just to give the parameter a {Function} type, and accept that the callback isn't worth type-checking.

Object vs. Anonymous Objects

Many JS libraries define a single global object with lots of methods. What type annotation should that object have?

var bucket = {};
/** @param {number} stuff */ bucket.fill = function(stuff) {};

If you come from Java, you may be tempted to just give it type {Object}.

/** @type {Object} */ var bucket = {};
/** @param {number} stuff */ bucket.fill = function(stuff) {};

That’s usually not what you want. If you add a “@type {Object}” annotation, you’re not just telling the compiler “bucket is an Object.” You’re telling it “bucket is allowed to be any Object.” So the compiler has to assume that anybody can assign any object to “bucket”, and the program would still be type-safe.

Instead, you’re often better off using @const.

/** @const */ var bucket = {};
/** @param {number} stuff */ bucket.fill = function(stuff) {};

Now we know that bucket can’t be assigned to any other object, and the compiler’s type inference engine can make much stronger checks on bucket and its methods.

Can Everything Just Be a Record Type?

JavaScript’s type system isn’t that complicated. It has 8 types with special syntax: null, undefined, boolean, number, string, Object, Array, and Function. Some people have noticed that record types let you define “an object with properties x, y, and z”, and that typedefs let you give a name to any type expression. So between the two, you should be able to define any user-defined type with record types and typedefs. Is that all we need?

Record types are great when you need a function to accept a large number of optional parameters. So if you have this function:

/**
* @param {boolean=} withKetchup
* @param {boolean=} withLettuce
* @param {boolean=} withOnions
*/
function makeBurger(withKetchup, withLettuce, withOnions) {}

you can make it a bit easier to invoke like this:

/**
* @param {{withKetchup: (boolean|undefined),
*          withLettuce: (boolean|undefined),
*          withOnions: (boolean|undefined)}=} options
*/
function makeBurger(options) {}

This works well. But when you use the same record type in many places across a program, things can get a bit hairy. Suppose you create a type for makeBurger’s parameter:

/** @typedef {{withKetchup: (boolean|undefined),
               withLettuce: (boolean|undefined),
               withOnions: (boolean|undefined)}} */
var BurgerToppings;

/** @const */
var bobsBurgerToppings = {withKetchup: true};

function makeBurgerForBob() {
  return makeBurger(bobsBurgerToppings);
}

Later, Alice builds a restaurant app on top of Bob’s library. In a separate file, she tries to add onions, but screws up the API.

bobsBurgerToppings.withOnions = 3;

Closure Compiler will notice that bobsBurgerToppings no longer matches the BurgerToppings record type. But it won’t complain about Alice’s code. It will complain that Bob’s code is making the type error. For non-trivial programs, it might be very hard for Bob to figure out why the types don’t match anymore.

A good type system doesn’t just express contracts about types. It also gives us a good way to assign blame when code breaks the contract. Because a class is usually defined in one place, the compiler can figure out who’s responsible for breaking the definition of the class. But when you have an anonymous object that’s passed to many different functions, and has properties set from many disparate places, it’s much harder--for both humans and compilers--to figure out who’s breaking the type contracts.

Posted by Nick Santos, Software Engineer

Introducing Closure Stylesheets

Saturday, November 12, 2011 | 2:58 PM

(CSS is for programming, not for pasting)

When the Closure Tools were first released a little over two years ago, they gave web developers the ability to organize and optimize their JavaScript and HTML in a new way. But there was something missing, namely, a tool to help manage CSS.

You see, the nature of CSS runs contrary to the DRY principle that is exhibited in good software engineering. For example, if there is a color that should be used for multiple classes in a stylesheet, a developer has no choice but to copy-and-paste it everywhere because CSS has no concept of variables. Similarly, if there is a value in a stylesheet that is derived from other values, there is no way to express that because CSS also lacks functions. Common patterns of style blocks are duplicated over and over because CSS has no macros. All of these properties of CSS conspire to make stylesheets extremely difficult to maintain.

To this end, we are excited to introduce the missing piece in the Closure Tools suite: Closure Stylesheets. Closure Stylesheets is an extension to CSS that adds variables, functions, conditionals, and mixins to standard CSS. The tool also supports minification, linting, RTL flipping, and CSS class renaming. As the existing Closure Tools have done for JavaScript and HTML, Closure Stylesheets will help you write CSS in a maintainable way, while also empowering you to deliver optimized code to your users. We hope you enjoy it! Please let us know what you think in the discussion forum.
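
As a taste of the syntax, here is a small sketch with hypothetical names (conditionals omitted for brevity; see the project documentation for the full feature set):

/* Variables */
@def TEXT_COLOR #333;
@def PADDING_BASE 8px;

/* Functions: derive one value from another */
@def PADDING_WIDE mult(PADDING_BASE, 2);

/* A mixin for a repeated pattern */
@defmixin rounded(RADIUS) {
  border-radius: RADIUS;
}

.panel {
  color: TEXT_COLOR;
  padding: PADDING_WIDE;
  @mixin rounded(4px);
}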

By Michael Bolin, Open Source Engineer

Cross-posted from the Google Open Source Blog.

A Property By Any Other Name, Part 3

Tuesday, January 25, 2011 | 2:15 PM

This is the last in a series of blog posts on how Closure Compiler decides what properties to rename. Part 1 was about the --compilation_level=ADVANCED_OPTIMIZATIONS flag, and part 2 was about renaming policies that didn't work. This blog post will be a bit of a grab bag of newer tools for better property renaming.

Using Type Checking to Fill in Missing Externs


Your externs file should declare all the properties that are defined outside of your program. If you haven't done this before, then finding all those properties can be a pain. Fortunately, Closure Compiler can point us in the right direction by checking for missing properties.

There are a few ways to turn on the missing properties check. For example, you can specify the flags:
--warning_level=VERBOSE --jscomp_warning=missingProperties
or
--jscomp_warning=checkTypes
The compiler will try to find places where you've used dot access (foo.bar) to read a property that can't possibly be defined on that object, perhaps because you've forgotten to declare it in your externs.

Notice that this check is subtly different from compiler checks in more static languages, like Java. The Java compiler uses a "must-define" approach: the property bar must be declared on all possible values of foo (except null), or it will be a compiler error. Closure Compiler uses a "may-define" approach: it only requires that some possible value of foo has a property bar. For example, if you have:

function f(x) {
  return x.apartment;
}


Closure Compiler will emit a warning if the apartment property is not assigned anywhere in the program. Similarly, if you have:

/** @param {Element} x */
function f(x) {
  return x.apartment;
}


the compiler will emit a warning if the apartment property is not assigned on any object that could possibly be an Element. It will not emit a warning if apartment is assigned on some specific subtype of Element, like HTMLTextAreaElement.

As you can see, you don't need complete type annotations to use this check, but more type annotations will get better results.

Using Type Analysis for Better Renaming


If a property is listed in the externs file under the "All Unquoted" naming policy, that property will not be renamed anywhere in the program. This is a bummer. We went through a lot of trouble to make the Closure Library events API consistent with the native browser events API. But because we gave the methods the same name as extern methods, those methods can't be renamed. Could we use type information to differentiate method calls on user-defined objects from method calls on browser-defined objects?

We can. Closure Compiler's Java API has two options: disambiguateProperties and ambiguateProperties.

Disambiguation means that Closure Compiler will look at all accesses to a property x in the program. If two unrelated types each have a property x, the compiler renames the two properties so that they no longer share a name. For example:

// externs file
/** @constructor */ function Element() {}
Element.prototype.id;

// user code
/** @constructor */
function MyWidget() {
  this.id = 3;
}

Disambiguate properties will rename this to something like:

/** @constructor */
function MyWidget() {
  this.MyWidget$id = 3;
}


By design, disambiguate properties gives things very verbose names, on the assumption that they will be given shorter names by the "All Unquoted" naming policy. This makes it easier to debug, because you can turn off this optimization independently of "all unquoted" property renaming.

Disambiguate properties allows us to rename some properties that are in the externs file, but it creates a new problem: there are more unique properties, which makes the gzipped code bigger. To solve this problem, we use ambiguateProperties to minimize the number of unique properties. Ambiguate properties looks for two property names on different objects where there is no chance those objects will ever appear in the same variable, and gives those properties the same name.
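
To make the effect concrete, here is a hypothetical sketch (the actual names the compiler picks will differ):

// After disambiguation, two unrelated types carry verbose, unique names:
/** @constructor */ function Cart() { this.Cart$total = 0; }
/** @constructor */ function Clock() { this.Clock$hour = 0; }

// Ambiguation notices that no variable can ever hold both a Cart and a
// Clock, so after "All Unquoted" renaming the two properties can safely
// share one short name:
function Cart() { this.a = 0; }
function Clock() { this.a = 0; }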

Disambiguate and ambiguate properties are very conservative. They will only rename things if they are reasonably sure that it's safe to do so. (They can never be absolutely sure, because you could always pass in external objects that violate the declared types.) These optimizations only make sense when used in conjunction with "All Unquoted" renaming.

Short Names Aren't Necessarily Better


So far in this series, we've been assuming that short names make your binary smaller, and will always be better than long names.

That makes sense if you're sending your entire JS file across the network. But what if your visitors had old versions of your JS in their cache? You might want to figure out what version they have, and send them only the parts that had changed.

This is called delta encoding. If the compiler is choosing the shortest possible names for your properties, then small changes to your JS may change every compiled property in the output. The delta between two versions may be as large as the original files.

What we really want is a way to tell the compiler, "give these properties the shortest possible names, unless you've seen them before, and in that case give them the same name you gave them last time." The compiler has 4 flags for this:

--variable_map_input_file
--variable_map_output_file
--property_map_input_file
--property_map_output_file


The output maps from one compilation can be used as the input map for the next compilation. Although the compiler cannot guarantee that a property will be renamed the same way in both binaries, it will make a best-effort attempt to do so.
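
A minimal sketch of the round trip (flag spellings as above; the file paths are hypothetical):

# Release 1: record the names the compiler chose.
java -jar compiler.jar --compilation_level ADVANCED_OPTIMIZATIONS \
  --js app.js --js_output_file app.v1.min.js \
  --variable_map_output_file variables.map \
  --property_map_output_file properties.map

# Release 2: feed the old maps back in so unchanged names stay stable.
java -jar compiler.jar --compilation_level ADVANCED_OPTIMIZATIONS \
  --js app.js --js_output_file app.v2.min.js \
  --variable_map_input_file variables.map \
  --property_map_input_file properties.map \
  --variable_map_output_file variables.v2.map \
  --property_map_output_file properties.v2.map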

That covers most of our major property renaming policies. If you have ideas for more, let us know at the Closure Compiler discussion group.

A Property By Any Other Name, Part 2

Friday, January 21, 2011 | 1:52 PM

This is the second in a series of blog posts on how Closure Compiler decides what properties to rename with the --compilation_level=ADVANCED_OPTIMIZATIONS flag. Part 1 talks about the current algorithm. This blog post will focus on property renaming policies that we tried that didn't work so well.

In the beginning, we tried to use coding conventions to decide when to rename things. Uppercase property names (foo.MyMethod) were renamed, and lowercase property names were not renamed. This didn't work well. One man's internal code is another man's external library. Sometimes you really didn't want uppercase property names to be renamed. Changing your code to use this convention meant breaking your API.

Later, we tried to move towards "smarter" algorithms, ones that did not require the programmer to be aware of renaming policies. These were called the "heuristic" property renaming policies. These algorithms looked at the entire input to the compiler, and tried to find all writes and reads to a specific property. If it saw at least one write to the property, and was reasonably sure that all the reads of that property came from those writes, then it renamed them all.

In small apps, heuristic renaming policies worked well. They were not very powerful, but they were easy to migrate to. Even when you didn't declare all the properties on external objects in the externs file, you'd usually still be ok. There would be no property writes to that property name, so the compiler wouldn't try to rename it.

But for medium to large apps, these advantages were a curse. Consider the following code:

/** @param {Object} json Some external JSON. */
function f(json) {
  return json.estate;
}
window['__receive_json'] = f;

// ...

// in some other code base
Foo.prototype.estate = 3;
f(new Foo());


If these were the only two appearances of the property estate in your binary, the compiler would rename it. The compiler can't tell that you're calling f from external code, and that you expect estate to be preserved.

You could have this piece of code that worked for years and years. Then, somebody who you never met could add Foo.prototype.estate in a different part of the codebase. It would break your code for no obvious reason, and the breaking change would be difficult to track down. When we have common JavaScript libraries, this becomes orders of magnitude more problematic. Adding Foo.prototype.estate could break any of the 25 products that depend on your library in subtle and difficult-to-debug ways.

Even if you did find the problem, how would you work around it? If this is shared code, then changing json.estate to something like json['estate'] might break other projects that depend on it, because their binaries do expect estate to get renamed.

Because of these problems, most projects that use Closure Compiler do not use heuristic renaming algorithms. But heuristic renaming wasn't a total failure. We learned some useful lessons:

  • If the compiler looks at your whole program to determine whether a property should be renamed, then that means a change in one part of the program can change property renaming in an unrelated part of the program.
  • If your code is shared across projects, then you probably want the property to be renamed in all projects or none of them.
  • When renaming properties, it's better to be transparent and 90% accurate than to be cryptic and 99% accurate.


Could we use these lessons to develop a better renaming algorithm? We'll talk about this more in Part 3.