Pros and Cons of Code Repository Strategies

Pros:

• Code repositories provide a centralized location for developers to store and share code. This makes it easier for developers to collaborate on projects and access code from any location.

• Code repositories can be used to track changes to code over time, making it easier to identify and fix bugs.

• Code repositories can be used to store and share documentation, making it easier for developers to understand the code they are working with.

• Code repositories can be used to store and share test cases, making it easier to ensure that code is working correctly.

Cons:

• Code repositories can be difficult to set up and maintain, especially for large projects.

• Code repositories can be difficult to secure, making them vulnerable to malicious attacks.

• Code repositories can be hard to search, which makes it difficult to find the code you need.

• Code repositories can be difficult to integrate with other systems, making it difficult to use the code in other applications.

There are two main strategies for hosting and managing code through Git: monorepo vs multi-repo. Both approaches have their pros and cons.

We can use either approach for any codebase in any language, and for projects containing anything from a handful of libraries to thousands of them. Whether the project involves a few team members or hundreds, and whether you want to host private or open-source code, you can go with monorepo or multi-repo based on various factors.

What are the benefits and drawbacks of each approach? When should we use one or the other? Let’s find out!

What Are Repos?

A repo (short for repository) is a storage location for all the files of a project and the changes made to them, enabling developers to “version control” the project’s assets throughout its development.

We usually refer to Git repositories (as provided by GitHub, GitLab, or Bitbucket), but the concept also applies to other version control systems (such as Mercurial).

What Is a Monorepo?

The monorepo approach uses a single repository to host all the code for the multiple libraries or services composing a company’s projects. At its most extreme, the whole codebase from a company — spanning various projects and coded in different languages — is hosted in a single repository.

Benefits of Monorepo

Hosting the whole codebase on a single repository provides the following benefits.

Lowers Barriers of Entry

When new staff members start working for a company, they need to download the code and install the required tools to begin working on their tasks. If the project is scattered across many repositories, each with its own installation instructions and required tooling, the initial setup will be complex. More often than not, the documentation will be incomplete, requiring these new team members to reach out to colleagues for help.

A monorepo simplifies matters. Since there is a single location containing all code and documentation, you can streamline the initial setup.

Centrally Located Code Management

Having a single repository gives visibility of all the code to all developers. It simplifies code management since we can use a single issue tracker to watch all issues throughout the application’s life cycle.

For instance, these characteristics are valuable when an issue spans two (or more) child libraries with the bug existing on the dependent library. With multiple repositories, it may be challenging to find the piece of code where the problem happens.

On top of this, we would need to figure out which repository to use to create the issue and then invite and cross-tag members of other teams to help resolve the problem.

With a monorepo, though, both locating code problems and collaborating to troubleshoot become simpler to achieve.

Painless Application-Wide Refactorings

When creating an application-wide refactoring of the code, multiple libraries will be affected. If you’re hosting them via multiple repositories, managing all the different pull requests to keep them synchronized with each other can prove to be a challenge.

A monorepo makes it easy to perform all modifications to all code for all libraries and submit it under a single pull request.

More Difficult To Break Adjacent Functionality

With the monorepo, we can set up all tests for all libraries to run whenever any single library is modified. As a result, a change in one library is less likely to have adverse effects on other libraries.
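
As a minimal sketch of that setup, the function below runs the test script of every library in the repository and reports failure if any of them fails. The `packages/<name>/` layout and the `run-tests.sh` script name are assumptions made for this illustration, not a convention required by any tool.

```sh
# Run the tests of every library in a monorepo.
# Assumed (hypothetical) layout: each library lives in packages/<name>/
# and ships a run-tests.sh script.
# The body runs in a subshell, so variables and "cd" do not leak out.
run_all_tests() (
  root=${1:-packages}
  failed=0
  for dir in "$root"/*/; do
    # Skip directories that do not provide a test script.
    [ -f "${dir}run-tests.sh" ] || continue
    echo "Testing ${dir%/}"
    if ! (cd "$dir" && sh ./run-tests.sh); then
      echo "FAILED: ${dir%/}" >&2
      failed=1
    fi
  done
  # Non-zero status if at least one library failed.
  exit "$failed"
)

# Usage, from the repository root:
#   run_all_tests packages
```

A continuous integration job could call `run_all_tests` on every pull request, so a change to one library cannot be merged while it breaks another.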

Teams Share Development Culture

Even though not impossible, with a monorepo approach, it becomes challenging to inspire unique subcultures among different teams. Since they’ll share the same repository, they will most likely share the same programming and management methodologies and use the same development tools.

Issues With the Monorepo Approach

Using a single repository for all our code has several drawbacks.

Slower Development Cycles

When the code for a library contains breaking changes that make the tests for dependent libraries fail, the dependent code must also be fixed before the changes can be merged.

If these libraries depend on other teams, who are busy working on some other task and are not able (or willing) to adapt their code to avoid the breaking changes and have the tests pass, the development of the new feature may stall.

What’s more, the project may well start advancing only at the speed of the slowest team in the company. This outcome could frustrate the members of the fastest teams, creating conditions for them to want to leave the company.

In addition, a library will need to run the tests for all other libraries too. The more tests to run, the more time it takes to run them, slowing down how fast we can iterate on our code.

Requires Download of Entire Codebase

When the monorepo contains all the code for a company, it can be huge, containing gigabytes of data. To contribute to any library hosted within, anybody would need to download the whole repository.

Dealing with a vast codebase implies a poor use of space on our hard drives and slower interactions with it. For instance, everyday actions such as executing git status or searching in the codebase with a regex may take many seconds or even minutes longer than they would with multiple repos.

Unmodified Libraries May Be Newly Versioned

When we tag the monorepo, all code within is assigned the new tag. If this action triggers a new release, then all libraries hosted in the repository will be newly released with the version number from the tag, even though many of those libraries may not have had any change.
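
To see which libraries a tag really affects, we can compare the files changed between two tags. The sketch below assumes a hypothetical `packages/<name>/` layout and example tag names. It reads changed file paths and prints the libraries they belong to. Any library missing from the output would be re-released without changes.

```sh
# Print the names of the libraries touched by a list of changed files.
# Input (stdin): file paths, one per line, relative to the repository root.
# Assumed (hypothetical) layout: libraries live in packages/<name>/.
changed_packages() {
  # Keep only paths inside packages/<name>/ and reduce them to <name>.
  sed -n 's|^packages/\([^/]*\)/.*|\1|p' | sort -u
}

# Usage with two example tags (the tag names are placeholders):
#   git diff --name-only v1.0.0 v1.1.0 | changed_packages
```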

Forking Is More Difficult

Open source projects must make it as easy as possible for contributors to become involved. With multiple repositories, contributors can head directly to the specific repository for the project they want to contribute to. With a monorepo hosting various projects, though, contributors must first navigate their way into the right project and will need to understand how their contribution may affect all other projects.

What Is Multi-Repo?

The multi-repo approach uses several repositories to host the multiple libraries or services of a project developed by a company. At its most extreme, it hosts every minimal set of reusable code or standalone functionality (such as a microservice) in its own repository.

Benefits of Multi-Repo

Hosting every library independently of all others provides a plethora of benefits.

Independent Library Versioning

When tagging a repository, its whole codebase is assigned the “new” tag. Since only the code for a specific library is on the repository, the library can be tagged and versioned independently of all other libraries hosted elsewhere.

Having an independent version for every library helps define the dependency tree for the application, allowing us to configure what version of each library to use.

Independent Service Releases

Since the repository only contains the code for some service and nothing else, it can have its own deployment cycle, independently of any progress made on the applications accessing it.

The service can use a fast release cycle such as continuous delivery (where new code is deployed after it passes all the tests). Some libraries accessing the service may use a slower release cycle, such as those that only produce a new release once a week.

Helps Define Access Control Across the Organization

Only the team members involved with developing a library need to be added to the corresponding repository and download its code. As a result, there’s an implicit access control strategy for each layer in the application. Those involved with the library will be granted editing rights, and everyone else may get no access to the repository. Or they may be given reading but not editing rights.

Allows Teams To Work Autonomously

Team members can design the library’s architecture and implement its code working in isolation from all other teams. They can make decisions based on what the library does in the general context without being affected by the specific requirements from some external team or application.

Issues With the Multi-Repo Approach

Using multiple repositories can give rise to several issues.

Libraries Must Constantly Be Resynced

When a new version of a library containing breaking changes is released, libraries depending on this library will need to be adapted to start using the latest version. If the release cycle of the library is faster than that of its dependent libraries, they could quickly become out of sync with each other.

Teams will need to constantly catch up to use the latest releases from other teams. Given that different teams have different priorities, this may sometimes prove arduous to achieve.

Consequently, a team not able to catch up may end up sticking to the outdated version of the depended-upon library. This outcome will have implications on the application (in terms of security, speed, and other considerations), and the gap in development across libraries may only get wider.

May Fragment Teams

When different teams don’t need to interact, they may work in their own silos. In the long term, this could result in teams developing their own subcultures within the company, such as employing different methodologies of programming or management or utilizing different sets of development tools.

If a team member eventually needs to work in a different team, they may suffer a bit of culture shock and have to learn a new way of doing their job.

Monorepo vs Multi-Repo: Primary Differences

Both approaches ultimately deal with the same objective: managing the codebase. Hence, they must both solve the same challenges, including release management, fostering collaboration among team members, handling issues, running tests, and others.

Their main difference concerns when team members must make decisions: either upfront for monorepo or down the line for multi-repo.

Let’s analyze this idea in more detail.

Because all libraries are versioned independently in the multi-repo, a team releasing a library with breaking changes can do it safely by assigning a new major version number to the latest release. Other groups can have their dependent libraries stick to the old version and switch to the new one once their code has been adapted.
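
The check below is a small sketch of that convention. It assumes versions follow a `MAJOR.MINOR.PATCH` scheme (optionally prefixed with `v`), which the text implies but does not spell out, and it treats any change in the major number as a breaking upgrade that a dependent team may want to postpone. The function names are made up for this example.

```sh
# Extract the major number from a version such as 2.3.1 or v2.3.1.
major_of() {
  v=${1#v}           # drop an optional leading "v"
  echo "${v%%.*}"    # keep everything before the first dot
}

# Succeeds (exit status 0) when moving from version $1 to version $2
# changes the major number, i.e. the upgrade may contain breaking changes.
is_breaking_upgrade() {
  [ "$(major_of "$1")" != "$(major_of "$2")" ]
}

# Usage:
#   if is_breaking_upgrade 1.4.2 2.0.0; then
#     echo "Stay on the old version until the code has been adapted"
#   fi
```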

This approach leaves the decision of when to adapt all other libraries to each responsible team, who can do it at any time. If they do it too late and new library versions are released, closing the gap across libraries will become increasingly difficult.

Consequently, while one team can iterate fast and often on their code, other teams may prove unable to catch up, ultimately producing libraries that diverge.

On the other hand, in a monorepo environment, we cannot release a new version of one library that breaks some other library since their tests will fail. In this case, the first team must communicate with the second team to incorporate the changes.

This approach forces teams to adapt all libraries together whenever a change for a single library must happen. All teams are forced to talk to each other and reach a solution together.

As a result, the first team will not be able to iterate as fast as they wish to, but the code across different libraries will at no point start diverging.

In summary, the multi-repo approach can help create a culture of “move fast and break things” among teams, where nimble independent teams can produce their output at their own speed. The monorepo approach, by contrast, favors a culture of awareness and care, where teams should not be left behind to deal with a problem all by themselves.

Hybrid Poly-As-Mono Approach

If we can’t decide whether to use the multi-repo or the monorepo approach, there is also an in-between approach: use multiple repositories and employ some tool to keep them synchronized, making the result resemble a monorepo but with more flexibility.

Meta is one such tool. It organizes multiple repositories under subdirectories and provides a command-line interface that executes the same command on all of them simultaneously.

A meta-repository contains the information on which repositories make up a project. Cloning this repository via meta will then recursively clone all the required repositories, making it easier for new team members to start working on their projects immediately.

To clone a meta-repository and all its defined multiple repos, we must execute the following:

meta git clone [meta repo url]

Meta will execute a git clone for each repository and place it in a subfolder:

Figure: Cloning a meta-project. (Image source: github.com/mateodelnorte/meta)

From then on, executing the meta exec command will execute the command on each subfolder. For instance, executing git checkout master on each repository is done like this:

meta exec "git checkout master"
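
To make the idea concrete, here is a rough plain-shell approximation of running one command in every repository. It is not how Meta is implemented. It simply loops over the immediate subdirectories of a folder and assumes that each one containing a `.git` entry is a repository. The function name `exec_in_repos` is made up for this sketch.

```sh
# Run the same command in every Git repository directly under a folder.
# $1 is the parent folder; the remaining arguments are the command.
# The body runs in a subshell, so "cd" does not affect the caller.
exec_in_repos() (
  root=$1
  shift
  for dir in "$root"/*/; do
    # Only visit subdirectories that look like Git repositories.
    [ -e "${dir}.git" ] || continue
    echo "${dir%/}:"
    (cd "$dir" && "$@") || echo "command failed in ${dir%/}" >&2
  done
)

# Usage, from the folder that contains the cloned repositories:
#   exec_in_repos . git checkout master
```

Unlike `meta exec`, this sketch takes the command as separate arguments rather than as one quoted string.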

Hybrid Mono-As-Poly Approach

Another approach is managing the code via a monorepo for development, but copying each library’s code into its independent repository for deployment.

This strategy is prevalent within the PHP ecosystem because Packagist (the main Composer repository) requires a public repository URL to publish a package, and it’s not possible to indicate that the package is located within a subdirectory of the repository.

Given the Packagist limitation, PHP projects can still use a monorepo for development, but they must use the multi-repo approach for deployment.

To achieve this conversion, we can execute a script with git subtree split or use one of the available tools that perform the same logic.
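
A script built on git subtree split could look roughly like the sketch below. The package path, the remote URL, the temporary `split/<name>` branch and the `main` target branch are all assumptions for illustration. Setting `DRY_RUN=1` prints the Git commands instead of running them, which is a convenient way to review what would happen first.

```sh
# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Publish one monorepo subdirectory to its own repository.
# $1: path of the package inside the monorepo (hypothetical, e.g. packages/foo)
# $2: URL of the per-package repository (hypothetical)
# The body runs in a subshell, so its variables stay local.
split_package() (
  prefix=$1
  remote_url=$2
  branch="split/$(basename "$prefix")"
  # Extract the history that touches $prefix into a temporary branch.
  run git subtree split --prefix="$prefix" -b "$branch" || exit 1
  # Push that branch as "main" of the per-package repository.
  run git push "$remote_url" "$branch:refs/heads/main" || exit 1
  # Remove the temporary branch again.
  run git branch -D "$branch"
)

# Usage, from the root of the monorepo:
#   DRY_RUN=1 split_package packages/foo git@example.com:org/foo.git
```

Note that `git subtree` is distributed as a contributed Git command, so it may need to be installed separately on some systems.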

Who’s Using Monorepo vs Multi-Repo

Several big tech companies favor the monorepo approach, while others have decided to use the multi-repo method.

Google, Facebook, Twitter, and Uber have all publicly vouched for the monorepo approach. Microsoft runs the largest Git monorepo on the planet to host the source code of the Windows operating system.

On the opposite side, Netflix, Amazon, and Lyft are famous companies using the multi-repo approach.

On the hybrid poly-as-mono side, Android updates multiple repositories, which are managed like a monorepo.

On the hybrid mono-as-poly side, Symfony keeps the code for all of its components in a monorepo and splits it into independent repositories for deployment (such as symfony/dependency-injection and symfony/event-dispatcher).

Examples of Monorepo and Multi-Repo

The WordPress account on GitHub hosts examples of both the monorepo and multi-repo approaches.

Gutenberg, the WordPress block editor, is composed of several dozen JavaScript packages. These packages are all hosted on the WordPress/gutenberg monorepo and managed through Lerna to help publish them in the npm repository.

Openverse, the search engine for openly licensed media, hosts its main parts in independent repositories: Front-end, Catalog, and API.

Monorepo vs Multi-Repo: How to Choose?

As with many development problems, there is no predefined answer on which approach you should use. Different companies and projects will benefit from one strategy or the other based on their unique conditions, such as:

• How big is the codebase? Does it contain gigabytes of data?
• How many people will work on the codebase? Is it around 10, 100, or 1,000?
• How many packages will there be? Is it around 10, 100, or 1,000?
• How many packages does the team need to work on at a given time?
• How tightly coupled are the packages?
• Are different programming languages involved? Do they require particular software to be installed or special hardware to run?
• How many deployment tools are required, and how complex are they to set up?
• What is the culture in the company? Are teams encouraged to collaborate?
• What tools and technologies do the teams know how to use?

Summary

There are two main strategies for hosting and managing code: monorepo vs multi-repo. The monorepo approach entails storing the code for different libraries or projects — and even all code from a company — in a single repository. And the multi-repo system divides the code into units, such as libraries or services, and keeps their code hosted in independent repositories.

Which approach to use depends on a multitude of conditions. Both strategies have several advantages and disadvantages, and we’ve just covered all of them in detail in this article.

Do you have any questions left about monorepos or multi-repos? Let us know in the comments section!
