The Google repository

I’ve been reading how Google organizes its codebase: they maintain a hyper-large repository containing everything, since the beginning of the company. I guess you may find Gmail, Photos, or AdWords there. You won’t find Android or Chrome, though – these are open source projects.
The repository is 86Tb of data, 1 billion of files, and 35 billion of commits. To manage this complexity, they needed to build their own tools: a home-grown Version Control System that can work effectively with such a repository at this scale, editor integration, building and automated testing tools, etc.
They develop all the code against trunk/master, meaning that if you are updating a library, you’ll also need to fix all applications that depend on it. Every project will be up-to-date, even abandoned projects.

The advantages

The main reasons they claim this approach works for them are: it makes easier reusing blocks of knowledge company-wide and reduces the friction to contribute between projects/teams. UI primitives, building tools, etc, all are shared by any project that wants them, it’s just a matter of depending on the master version. It minimizes the costs of versioning/integration and the curse of being left behind when something is updated and you cannot keep up with the changes (the experts will do it for you!).
As a side effect, when working on libraries/frameworks it’s easier to understand the performance/impact/etc of a specific change (you can run tests on real projects) and to put together a task-force to fix issues affecting several applications.

The disadvantages

This approach comes with downsides as well: they mention the amount of maintenance this setup requires even with all the tooling they have already built. With a monolithic repo, it’s easy to run into unnecessary dependencies that bloat the binary size of a project (and they do), the costs inherent to updating basic blocks used through the whole company, etc.
Another point is that it makes difficult having external contributors. Although they have a space in the repository for public/open-sourced projects, the article is unclear on how they manage 3rd-party contributions there – external programmers don’t have access to the internal building tools that Google programmers have. High-profile products like Android or Chrome -where outside contributors are expected and encouraged- have walked away from this approach.

Coda

I highly recommend reading the paper, it’s a pretty unique approach, and the article does a good job on presenting a balanced perspective.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *