Aryan Shaily
Why do we even need pnpm?
We use pnpm for our monorepo at work. I remember the first time I asked, "Why pnpm?" A senior of mine said that it saves space by using a central store, and that was a good enough answer for me. Yeah, I know, it was silly of me not to ask more questions. In this blog, let's try to understand why different package managers exist.
As I already mentioned, we use pnpm for our monorepo at work, but we also use yarn for our marketing website, and yes, we've used npm here and there in some internal tools as well. So there are a lot of options out there, and people like us are using almost all of them. Some prefer one over the other. So why's that? Why do we even have different package managers? Why do we use pnpm for our monorepo but not for the website project? How does pnpm save space? These are all questions I should've asked my senior the first time I asked about the choice of package manager for our monorepo, and now I'm gonna try to answer them.
There are three package managers we'll look at: npm, yarn, and pnpm. We all already know npm; it's the first package manager we get to know when dipping our toes into the JavaScript world. Yarn and pnpm come a little later, as you explore more projects on the internet or follow tutorials for more complex projects.
These package managers try to balance three competing constraints:
- Deterministic installs
- Correct dependency resolution
- Performance and disk efficiency
npm was the first package manager, and it ships with Node. It was nice and easy to use. You had package.json to list all your dependencies and a node_modules folder where all of them would go. The package-lock.json file (added in npm v5) stores the dependency graph with the exact versions of the dependencies installed. All of this came with a few drawbacks as well. The node_modules folders were gigantic, and you ended up with multiple such node_modules folders across your projects. Package downloads were also historically serial and slower. While npm introduced package-lock.json for determinism, it had problems of its own, like lockfile instability across npm versions and resolution differences across platforms (especially pre-v7). And due to excessive hoisting, packages could execute code from other packages that were not specified in their dependencies, which could create unexpected results at runtime.
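To make the "determinism" part concrete, here is a minimal sketch of why a lockfile matters. It is a simplified model, not npm's real resolver: `resolveVersion`, the caret-only range matching, and the registry snapshots are all assumptions made up for illustration.

```typescript
// Hypothetical, simplified model: real package managers implement the full
// semver spec and a much richer lockfile format.
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const [major, minor, patch] = v.split(".").map(Number);
  return [major, minor, patch];
}

function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Caret ranges (for major >= 1) accept any version with the same major
// that is not lower than the base version.
function satisfiesCaret(version: string, range: string): boolean {
  const base = parseVersion(range.slice(1)); // drop the leading "^"
  const v = parseVersion(version);
  return v[0] === base[0] && compareVersions(v, base) >= 0;
}

// Without a lock entry: pick the highest published version in range.
// With a lock entry: reuse the pinned version, whatever was published since.
function resolveVersion(
  name: string,
  range: string,
  published: string[],
  lock: Record<string, string> = {}
): string {
  if (lock[name] !== undefined) return lock[name];
  const matches = published.filter((v) => satisfiesCaret(v, range));
  if (matches.length === 0) {
    throw new Error(`no version of ${name} satisfies ${range}`);
  }
  matches.sort((a, b) => compareVersions(parseVersion(a), parseVersion(b)));
  return matches[matches.length - 1];
}

// Illustrative registry snapshots at two points in time.
const earlier = ["18.2.0"];
const later = ["18.2.0", "18.3.1", "19.0.0"];

console.log(resolveVersion("react", "^18.2.0", earlier)); // 18.2.0
console.log(resolveVersion("react", "^18.2.0", later)); // 18.3.1
console.log(resolveVersion("react", "^18.2.0", later, { react: "18.2.0" })); // 18.2.0
```

The same package.json gives different installs on different days unless the lockfile pins the version, which is the problem all three package managers had to solve.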
This is where yarn came in, with a new deterministic resolution algorithm (top-down), a stable lockfile format, and parallel package installs. The lockfile format is human-readable and easier to resolve in case of merge conflicts than npm's old ones. A more deterministic lockfile also meant that we could reliably install projects on multiple devices without getting weird dependency version mismatches. pnpm improved on both yarn and npm: the lockfile is human-readable and deterministic, and it avoids hoisting packages into the root node_modules. All package contents are stored once in a global content-addressable store and hardlinked into projects, while symlinks encode the dependency graph, which helps pnpm enforce strict dependency boundaries. pnpm does this while mirroring Node's resolution rules instead of bending them.
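The "stored once, linked everywhere" idea is easier to see in a toy model. The sketch below is an in-memory stand-in, not pnpm's implementation: `ContentStore`, `install`, and the tiny non-cryptographic hash are assumptions for illustration, and a plain map from file path to store key plays the role of a hardlink.

```typescript
// Stand-in hash (djb2-style) so the sketch needs no imports.
// A real store would use a cryptographic hash of the file contents.
function hashContent(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 33 + text.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash.toString(16);
}

class ContentStore {
  // hash -> file content, kept once no matter how many projects use it
  private blobs = new Map<string, string>();

  add(content: string): string {
    const key = hashContent(content);
    if (!this.blobs.has(key)) this.blobs.set(key, content);
    return key;
  }

  read(key: string): string | undefined {
    return this.blobs.get(key);
  }

  get size(): number {
    return this.blobs.size;
  }
}

// A "project" maps file paths to store keys: many paths, one underlying copy.
type Project = Map<string, string>;

function install(store: ContentStore, project: Project, files: Record<string, string>): void {
  for (const path of Object.keys(files)) {
    project.set(path, store.add(files[path]));
  }
}

const store = new ContentStore();
const projectA: Project = new Map();
const projectB: Project = new Map();
const reactFiles: Record<string, string> = {
  "react/index.js": "module.exports = 'react 18.2.0';", // made-up file content
};

install(store, projectA, reactFiles);
install(store, projectB, reactFiles);

console.log(store.size); // 1
console.log(projectA.get("react/index.js") === projectB.get("react/index.js")); // true
```

Two projects installed the same file, but the store holds a single copy. That is where the disk savings my senior mentioned come from.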
While working with package managers, you'll come across terms like hoisting, peerDependencies, and dependency resolution, so let's discuss them as well. Dependency hoisting is an optimization in which the package manager places packages that are shared between other packages at the top level of node_modules. This improves deduplication but also enables phantom dependencies (packages your code can import without ever declaring them), which is why pnpm avoids hoisting by design. A peer dependency declares a dependency that must be provided by the consumer and must resolve to the same instance the consumer uses. Dependency resolution is the algorithm Node.js uses to locate a module by traversing node_modules directories upward from the requiring file.
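Here is a rough sketch of that upward walk, and of how hoisting turns it into a phantom dependency. It is a simplification under stated assumptions: `lookupDirs` and `resolveModule` are hypothetical helpers, the `/app` paths are made up, a set of strings stands in for the file system, and symlink following, core modules, and package.json fields are ignored.

```typescript
// Directories to try, nearest first, following Node's upward walk.
function lookupDirs(fromDir: string, name: string): string[] {
  const parts = fromDir.split("/").filter((p) => p.length > 0);
  const candidates: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    // Node never looks inside ".../node_modules/node_modules".
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    const base = "/" + parts.slice(0, i).join("/");
    candidates.push((base === "/" ? "" : base) + "/node_modules/" + name);
  }
  return candidates;
}

// Return the first candidate directory that exists, or null if none does.
function resolveModule(name: string, fromDir: string, existing: Set<string>): string | null {
  for (const dir of lookupDirs(fromDir, name)) {
    if (existing.has(dir)) return dir;
  }
  return null;
}

// Hoisted layout: react sits at the top level even though the app never declared it.
const hoisted = new Set<string>([
  "/app/node_modules/react-dom",
  "/app/node_modules/react",
  "/app/node_modules/react-router-dom",
]);

// Isolated layout: react is only reachable from the packages that declared it.
const isolated = new Set<string>([
  "/app/node_modules/react-dom",
  "/app/node_modules/react-router-dom",
  "/app/node_modules/.pnpm/react-dom@18.2.0/node_modules/react-dom",
  "/app/node_modules/.pnpm/react-dom@18.2.0/node_modules/react",
  "/app/node_modules/.pnpm/react@18.2.0/node_modules/react",
]);

// App code importing react without declaring it:
console.log(resolveModule("react", "/app/src", hoisted)); // /app/node_modules/react
console.log(resolveModule("react", "/app/src", isolated)); // null

// react-dom importing react from its real location in the isolated layout:
const reactDomDir = "/app/node_modules/.pnpm/react-dom@18.2.0/node_modules/react-dom";
console.log(resolveModule("react", reactDomDir, isolated));
// /app/node_modules/.pnpm/react-dom@18.2.0/node_modules/react
```

The lookup rules are identical in both cases. Only the directory layout changes, and that decides whether the undeclared import works by accident or fails loudly.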
pnpm is very strict about how it handles dependency resolution. As mentioned earlier, dependencies are scoped to their parent packages, which avoids inconsistencies during installs and prevents packages from accessing other packages not listed in their dependencies. This comes in very handy when working with monorepos, where multiple versions of the same dependency must coexist, workspace packages must not leak dependencies, and deterministic builds are mandatory at scale. This is the reason we use pnpm for our monorepo as well. It makes dependency boundaries explicit, which prevents accidental coupling between packages in a monorepo while also saving some space.
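A small model of what those boundaries mean inside a monorepo. This is only a sketch of the rule, not of how pnpm implements it: the workspace package names, `linkWorkspace`, and `requireFrom` are hypothetical, and each package's "node_modules view" is just a map built from its declared dependencies.

```typescript
interface WorkspacePackage {
  name: string;
  // dependency name -> exact version, as a lockfile would pin it
  dependencies: Record<string, string>;
}

// Build one node_modules view per workspace package.
// Only declared dependencies get a link, each pointing at "name@version".
function linkWorkspace(packages: WorkspacePackage[]): Map<string, Map<string, string>> {
  const views = new Map<string, Map<string, string>>();
  for (const pkg of packages) {
    const links = new Map<string, string>();
    for (const dep of Object.keys(pkg.dependencies)) {
      links.set(dep, `${dep}@${pkg.dependencies[dep]}`);
    }
    views.set(pkg.name, links);
  }
  return views;
}

// Look a dependency up from inside one workspace package; fail if undeclared.
function requireFrom(views: Map<string, Map<string, string>>, pkgName: string, dep: string): string {
  const target = views.get(pkgName)?.get(dep);
  if (target === undefined) {
    throw new Error(`${pkgName} does not declare ${dep}`);
  }
  return target;
}

// Hypothetical workspace: two packages that need different lodash versions.
const views = linkWorkspace([
  { name: "web", dependencies: { lodash: "4.17.21" } },
  { name: "legacy-api", dependencies: { lodash: "3.10.1", express: "4.18.2" } },
]);

console.log(requireFrom(views, "web", "lodash")); // lodash@4.17.21
console.log(requireFrom(views, "legacy-api", "lodash")); // lodash@3.10.1

try {
  requireFrom(views, "web", "express"); // express belongs to legacy-api only
} catch (err) {
  console.log((err as Error).message); // web does not declare express
}
```

Both lodash versions coexist because each package only sees its own links, and `web` cannot quietly start depending on something only `legacy-api` installed.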
# Dependency resolution graph:
# (Your app uses react-dom and react-router-dom, both depend on react)
app
├── react-dom
│   └── react
└── react-router-dom
    └── react
# npm node_modules structure:
node_modules/
├── react-dom/
├── react/ (hoisted to top)
└── react-router-dom/
# yarn node_modules structure:
node_modules/
├── react-dom/
├── react/ (hoisted to top)
└── react-router-dom/
# pnpm node_modules structure:
node_modules/
├── .pnpm/
│   ├── react-dom@18.2.0/
│   │   └── node_modules/
│   │       ├── react-dom/ (actual package files, hardlinked from the store)
│   │       └── react -> ../../react@18.2.0/node_modules/react (symlink)
│   ├── react@18.2.0/
│   │   └── node_modules/
│   │       └── react/ (actual package files, stored once)
│   └── react-router-dom@6.8.0/
│       └── node_modules/
│           ├── react-router-dom/ (actual package files, hardlinked from the store)
│           └── react -> ../../react@18.2.0/node_modules/react (symlink)
├── react-dom -> .pnpm/react-dom@18.2.0/node_modules/react-dom (symlink)
└── react-router-dom -> .pnpm/react-router-dom@6.8.0/node_modules/react-router-dom (symlink)