CS373 (1pm) : Comyar Zaheri


Generating Precise Dependencies for Large Software

Google Research Paper

Large software projects inherently have a large number of dependencies, which makes them fragile when those dependencies drift from their original design specifications. Google Research presents a viable solution to this problem: a tool that seeks to identify what the paper calls bad dependencies and to facilitate large-scale refactoring.

So, what are bad dependencies? The paper defines them as underutilized dependencies and inconsistent dependencies. Underutilized dependencies come about when a project uses only a small portion of a module it depends on. It makes sense that these are bad: they unnecessarily bloat the code size, slow the build process, and hint at a more serious issue. Low utilization suggests that the small portion actually in use may be only loosely related to the rest of the module and should be separated out.
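To make the idea of utilization concrete, here is a minimal sketch in Python. It is my own illustration, not the paper's tool: it assumes utilization can be approximated as the fraction of a module's symbols that the project references, and the module names, symbol names, and 0.2 threshold are all made up for the example.

```python
def utilization(defined_symbols, used_symbols):
    """Fraction of a module's symbols that the project actually references."""
    defined = set(defined_symbols)
    if not defined:
        # Assumption: an empty module has no unused code, so treat it as fully used.
        return 1.0
    return len(defined & set(used_symbols)) / len(defined)


def underutilized(modules, used_symbols, threshold=0.2):
    """Names of modules whose utilization falls below the (arbitrary) threshold."""
    return sorted(
        name
        for name, symbols in modules.items()
        if utilization(symbols, used_symbols) < threshold
    )


# Hypothetical project: six string helpers, of which only one is referenced.
modules = {
    "strings": ["split", "join", "trim", "pad", "reverse", "hash"],
    "net": ["connect", "send"],
}
used = {"split", "connect", "send"}

flagged = underutilized(modules, used)
```

In this toy project only "strings" is flagged, since one of its six symbols is used. A real analysis would presumably weigh code size rather than count symbols, but the ratio is the same idea.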

Inconsistent dependencies are dependencies that violate the design of the software. The example given is a project that is meant to be built without third-party libraries while one of its dependencies relies on a third-party library. The example seemed a little vague to me, but the gist appears to be that inconsistent dependencies result when software violates its own dependency rules.
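Here is how I picture that kind of check, as a small Python sketch. This is my own reading, not the paper's algorithm: it assumes the dependency graph is already known as a plain dictionary and that the design rule is just a set of forbidden modules. All module names are hypothetical.

```python
def transitive_deps(graph, module):
    """All modules reachable from `module` by following dependency edges."""
    seen = set()
    stack = list(graph.get(module, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen


def violations(graph, module, forbidden):
    """Forbidden modules that `module` reaches, directly or indirectly."""
    return sorted(transitive_deps(graph, module) & set(forbidden))


# Hypothetical graph: "app" never names the third-party library itself,
# but pulls it in through "ui".
graph = {
    "app": ["base", "ui"],
    "ui": ["base", "third_party_png"],
    "base": [],
}

app_violations = violations(graph, "app", {"third_party_png"})
base_violations = violations(graph, "base", {"third_party_png"})
```

The point of walking the graph transitively is that "app" looks clean if you only read its direct dependencies. The violation only shows up one level down.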

Google Research describes a tool they developed to identify these types of bad dependencies by analyzing C++ template-related dependencies, virtual function calls, and module-level dependencies. After describing how the tool works, the paper details the results of running it on Chromium, which I found to be the most interesting part of the paper.

Some modules in Chromium have less than 20% utilization, and in those modules more than 80% of the code is dead. That's a strikingly large amount of dead code for a well-maintained open-source project. Even worse, the base module, which contains common code shared by all of Chromium's sub-projects, depends on eight other modules, even though the design states that it depends on none.

The tool developed by Google Research seems invaluable to any organization working on large software projects. Dead code simply adds fragility to software projects and contributes to what Kent Beck refers to as code smells.