Open-Sourcing the Universal Package Manager, Hacker News


On Repl.it, working with packages is made easy. You can simply type import flask in your Python code, and Flask will automatically be installed for you. Or, if you’re more the browsing type, you can search for packages and install them through a graphical interface.

In Repl.it tradition, once you know how to do package management in one language, you know how to do it in every language. You can use the same interface to install packages in Node.js: simply type require("express") to get Express up and running, and so on.

Today we’re excited to release several months of work on improving the package management experience. Here are the highlights:

  • Reproducible package management. We’ll still look at your code and install packages automatically, but now we’ll remember exactly which versions to use, so your code will keep working no matter how many package updates are published.
  • Modern tooling and best practices. We now use the modern dependency manager Poetry to manage your Python packages. Poetry improves on Pip in its security, consistency, usability, and flexibility. We believe tools like Poetry are the future, and we are migrating to them to do our part in improving the ecosystem for developers everywhere.
  • Giving back to the community. The core of our language-agnostic package management is now open-source on GitHub. UPM, the Universal Package Manager, is a manager for your package managers: it knows all of their features, best practices, and quirks so that you don’t have to. UPM provides a unified set of abstractions (adding, removing, and listing project packages, searching for packages online, and guessing what packages need to be installed for your project to run) and a consistent, scriptable command-line interface that you can use to manage packages for every language the same way, just like we do on Repl.it. If you want to get your favorite package manager on Repl.it, all you need to do is submit a pull request to UPM. Supporting a new language now only takes about 300 lines of code!
  • More language support. Splitting out package management into a project which abstracts over language-specific differences makes adding package management to more languages much easier. In fact, we’ve already received a contribution from the DartLang team, and we have also added package management to Emacs Lisp; check it out here!

You can try out UPM on your own computer. Check out the Installation section on GitHub for full instructions for your system.

Here is a quick demo of the CLI on Repl.it. (You can open a shell in the workspace with ctrl-shift-s, or command-shift-s on Mac.)

Let’s dive into some of the technical aspects of UPM and Repl.it’s new package management.

Different kinds of package managers

There’s more than one kind of package manager. Broadly speaking, I like to define two categories: system package managers and project package managers.

System package managers:

  • include Homebrew, APT, RPM, Pacman, and Chocolatey.
  • generally install only the latest versions of all software.
  • install software globally, for everyone (or in some cases for a single user).
  • often have guarantees by package repository maintainers that the software they install will work together.
  • can install anything that’s packaged, regardless of the language it’s written in.

Project package managers:

  • include Pipenv, Poetry, NPM, Yarn, Bundler, Stack, and Cask.
  • can generally install any versions of available packages, and generally include dependency resolution algorithms that can compute solutions to a large set of package version constraints.
  • install packages into an isolated environment for every project, so that they aren’t available (and can’t conflict) globally.
  • usually don’t have any guarantee about the quality, compatibility, or even safety of the packages that are installed.
  • are almost always limited to one specific language.

In other words, system package managers are meant to administrate your system and install the tools that you use everywhere on your machine, whereas project package managers are meant to help develop and package new software. These are very different use cases, and so the resulting package managers are very different.

You might ask, what about tools like Pip, RubyGems, and cabal-install? These tools occupy a middle ground: by default, they install packages globally (making them unsuitable for project package management), yet they are also limited to a specific programming language (making them unsuitable for system package management as well). As package management ecosystems have evolved, using these tools directly has stopped being the recommended approach. Instead, for system package management you should use a system package manager which packages the software you want to install globally, and for project package management you should use a tool which wraps Pip (e.g. Pipenv or Poetry), RubyGems (Bundler), or cabal-install (Stack) to provide isolation and reproducibility.

How should project package managers work?

Here’s how we visualize project package management as working in an ideal world: source → specfile → lockfile → installed packages. Let’s break that down in detail:

  • The source code is what really defines a project’s dependencies. Although this information is often imprecise and implicit (it comes from how packages are imported and used), the source should be used as the basis whenever possible. For example, upm add --guess will add any packages it thinks are being used to the specfile.
  • The specfile describes project dependencies in a human-and-machine-readable format. For example, your specfile might say: “This is a Python project. It needs Flask (at least version 1.1, but not 2.0 or anything later) as well as Selenium (any version) to run.” For Poetry, this file is called pyproject.toml. Typically you edit the specfile either by hand or by using a command-line interface (or both). For example, you could create the specfile I described above by running poetry add "flask^1.1" "selenium*".
  • The lockfile is a file that describes project dependencies exactly, in a machine-readable format. This means that it includes transitive dependencies (dependencies of your dependencies), and it has exact versions for every package, rather than the version constraints (at least 1.1, less than 2.0) that are found in the specfile. The lockfile is automatically generated from the specfile by the project package manager via dependency resolution. Why is it important? If you have a lockfile, then every developer working on your library or application will use exactly the same versions of its dependencies. This is very important for reproducibility! Typically, lockfiles also include some checksums or hashes to improve security (if someone has replaced a dependency on PyPI with malware, then the build will fail). For Poetry, the lockfile is called poetry.lock.
  • The packages are installed based on what is listed in the lockfile. Typically, they will be installed into an isolated per-project directory (like node_modules or a Python virtualenv).
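To make the specfile-to-lockfile step concrete, here is a minimal Python sketch of how a caret constraint such as ^1.1 could be narrowed to one exact version. It is a simplification and not how Poetry resolves dependencies: it ignores transitive dependencies, only handles major versions of 1 or higher, and the list of available versions is hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.1.2' into (1, 1, 2)."""
    return tuple(int(part) for part in text.split("."))


def caret_allows(constraint, candidate):
    """Check a simplified caret constraint such as '^1.1'.

    Assumption: the major version is 1 or higher, so '^1.1' means
    "at least 1.1, but below 2.0".
    """
    base = parse_version(constraint.lstrip("^"))
    version = parse_version(candidate)
    upper = (base[0] + 1,)  # first version of the next major release
    return base <= version < upper


def lock(spec, available):
    """Pick the newest available version that satisfies each constraint."""
    locked = {}
    for name, constraint in spec.items():
        candidates = [
            version
            for version in available[name]
            if constraint == "*" or caret_allows(constraint, version)
        ]
        locked[name] = max(candidates, key=parse_version)
    return locked


# Hypothetical index contents, for illustration only.
available = {
    "flask": ["1.0.4", "1.1.1", "1.1.2", "2.0.0"],
    "selenium": ["3.14.1", "3.141.0"],
}
print(lock({"flask": "^1.1", "selenium": "*"}, available))
```

The specfile keeps the loose constraints a human wrote, while the dictionary returned by lock plays the role of the lockfile: one exact version per package.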

This one-directional information flow from source to specfile to lockfile to installed packages neatly separates the different functions of a project package manager, with each stage having less human involvement than the last.

How do project package managers actually behave?

Not well, it turns out. While building the package management infrastructure at Repl.it, we discovered a laundry list of language-specific limitations, quirks, and design mistakes. This is what inspired us to create UPM: we want to make package management as easy as it should be.

Here are some of our favorite quirks:

  • Bundler, despite using a project-local specfile and lockfile, installs dependencies system-globally by default. You can configure this, but there’s no standard project-local location to use.
  • When you run pip search Flask, the package named Flask does not appear in the search results. As of yet, I’ve been unable to determine why. Also, you can’t compare Python package names using string equality because they are case insensitive and hyphens and underscores are equivalent. (But nonetheless there is a canonical format for each package name, which cannot be determined without network access and which is used in some contexts but not others.)
  • The lockfile format used by Yarn is almost YAML, but not quite. Yes, really.
  • The command-line option to make Bundler produce machine-parseable output is “documented” only by virtue of existing in the source code. Reading Bundler specfiles and lockfiles from an external tool is also only possible by threading together a bunch of internal functions from the Bundler source code.
  • There’s no reasonable way in Poetry to discover where dependencies are going to be installed. You have to either create a virtual environment and then ask where the Python binary was installed, or manually reimplement the algorithm, which includes checking environment variables, reading two different configuration variables, parsing an optional TOML file, looking up the Python version, and lowercasing the project name.
  • Given a Python import, the best way to determine which package provides it is apparently to look it up in a big list that is generated by people manually adding packages one at a time.
  • For some packages, the NPM Registry API returns a URL for the package homepage in search results but not when you look up details for the package individually.
  • The standard Emacs package manager has literally no support for installing anything but the latest version of a package.

If you use UPM, you don’t have to worry about any of this!
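As an illustration of the Python name-comparison quirk in the list above, here is a small sketch of the normalization rule from PEP 503, which lowercases names and treats runs of hyphens, underscores, and dots as equivalent. This only covers comparing two names; it does not recover the canonical display name the post mentions, and it is not taken from UPM’s source.

```python
import re


def normalize(name):
    """Normalize a Python package name for comparison (PEP 503 rule)."""
    # Runs of '-', '_' and '.' collapse to a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def same_package(a, b):
    """Return True if two spellings refer to the same package name."""
    return normalize(a) == normalize(b)


print(same_package("Flask", "flask"))
print(same_package("typing_extensions", "typing-extensions"))
print(same_package("flask", "flask-login"))
```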

UPM abstractions

The basic principle of UPM is to define a sensible internal API which can be implemented for each language, and then define the user-facing command-line interface in terms of this API. This way, all of the business logic of UPM is guaranteed to be language-independent.

Some parts of the API are simple constants: the names of the specfile and lockfile, and what filenames correspond to the language. These are used for project language autodetection. Other parts implement the core UPM operations: adding or removing packages, listing the specfile or lockfile, and searching project source code for possible dependencies to install. In addition to guaranteeing language-independence, this API / CLI split makes it easier to implement language backends. For example, ‘upm add flask’ will first list the specfile and filter out Flask if it’s already been added. This means the implementation of LanguageBackend.Add for the Python backend of UPM can just invoke poetry add, without needing to worry about the fact that Poetry throws an error if you try to add the same package twice.
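The split described above can be sketched in Python roughly as follows. The field names, the upm_add function, and the in-memory fake backend are all hypothetical stand-ins rather than UPM’s real API; the point is only that the generic command filters out already-present packages, so a backend’s add operation never sees a duplicate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LanguageBackend:
    # Simple constants, usable for project language autodetection.
    name: str
    specfile: str
    lockfile: str
    filename_patterns: List[str]
    # Core operations supplied by each language.
    list_specfile: Callable[[], Dict[str, str]]
    add: Callable[[Dict[str, str]], None]


def upm_add(backend, requested):
    """Generic 'add': skip packages already in the specfile, then delegate."""
    existing = {name.lower() for name in backend.list_specfile()}
    to_add = {
        name: constraint
        for name, constraint in requested.items()
        if name.lower() not in existing
    }
    if to_add:
        backend.add(to_add)
    return to_add


def make_fake_backend():
    """In-memory stand-in for a backend whose 'add' rejects duplicates."""
    spec = {"flask": "^1.1"}

    def add(packages):
        for name, constraint in packages.items():
            if name in spec:
                raise RuntimeError(f"{name} is already present")
            spec[name] = constraint

    backend = LanguageBackend(
        name="python-python3-poetry",
        specfile="pyproject.toml",
        lockfile="poetry.lock",
        filename_patterns=["*.py"],
        list_specfile=lambda: dict(spec),
        add=add,
    )
    return backend, spec


backend, spec = make_fake_backend()
print(upm_add(backend, {"Flask": "", "selenium": "*"}))
print(spec)
```

Because the duplicate check lives in upm_add, the fake backend’s strict add never raises here, which mirrors how the Python backend can call poetry add without special-casing repeated packages.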

One of the main challenges in designing UPM’s language backend API was the fact that different package managers act quite differently. In an ideal world, each package manager would implement three separate operations: add to or remove from the specfile, generate the lockfile from the specfile, and install packages from the lockfile. In reality, some package managers force you to do two or even three steps at once. In UPM, we deal with this by having each language backend declare a set of “quirks”, like AddRemoveAlsoLocks and LockAlsoInstalls. The implementation of upm add will run the Add backend method, and will then follow it up with the Lock backend method unless AddRemoveAlsoLocks is included in the backend’s quirks configuration (indicating that the lockfile was already generated in addition to the specfile being modified).
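A quirks set of this kind can be modeled as bit flags. The Python sketch below is an illustration and not UPM’s implementation: the two flag names come from the paragraph above, the rule for add follows its description, and the rule that Install is skipped after Lock when LockAlsoInstalls is set is an assumption made by analogy.

```python
from enum import Flag, auto


class Quirks(Flag):
    NONE = 0
    ADD_REMOVE_ALSO_LOCKS = auto()
    LOCK_ALSO_INSTALLS = auto()


def steps_for_add(quirks):
    """Backend methods a generic 'add' command would call, in order."""
    steps = ["Add"]
    # Only run Lock separately if Add did not already regenerate the lockfile.
    if Quirks.ADD_REMOVE_ALSO_LOCKS not in quirks:
        steps.append("Lock")
    return steps


def steps_for_lock_and_install(quirks):
    """Assumed analogue: skip Install when Lock already installed packages."""
    steps = ["Lock"]
    if Quirks.LOCK_ALSO_INSTALLS not in quirks:
        steps.append("Install")
    return steps


print(steps_for_add(Quirks.NONE))
print(steps_for_add(Quirks.ADD_REMOVE_ALSO_LOCKS))
print(steps_for_lock_and_install(Quirks.LOCK_ALSO_INSTALLS))
```

Declaring quirks as data keeps the command logic in one place: a new backend only states which steps its package manager fuses together, and the generic code decides what still needs to run.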

Even worse than some package managers combining steps, some package managers don’t have any concept of a lockfile at all! For example, the standard package manager for Emacs Lisp (package.el, wrapped by Cask) has no support at all for installing a specific version of a package, so the idea of a lockfile is really a non-starter. (Aside: this annoyed me so much that I wrote my own package manager for Emacs, which was part of the reason I got hired to improve the package management infrastructure at Repl.it!)

The approach of UPM to this problem is to preserve the spirit of the specfile / lockfile abstraction as much as possible. For Emacs Lisp, UPM will install directly from the specfile, then generate a lockfile from what is installed (listing exact versions and transitive dependencies, of course).

Caching and dependency guessing

At Repl.it, we care about performance, because nobody wants to wait for their code to run. That means our package management needs to be as fast as possible, especially when there isn’t actually anything that needs to be installed. Since we want UPM to be as useful a standalone tool as possible, we opted to implement all of the performance optimizations directly in UPM. All of the package management code in Repl.it is essentially just a wrapper around UPM:

  • when you add a package through the interface, Repl.it calls upm add
  • when you remove a package through the interface, Repl.it calls upm remove
  • when you run your code, Repl.it calls upm add --guess (which searches your code for import or require statements and installs any missing packages)

You might ask how it isn’t incredibly slow to do a code search on every run. (Not to mention making sure the lockfile and installed packages are up to date, since you’re allowed to edit the specfile directly at any time if you want to!)

The answer is that UPM transparently keeps track of some information in a hidden JSON file behind the scenes. It looks something like this:

    {
      "version": 2,
      "languages": {
        "python-python3-poetry": {
          "specfileHash": "361e6bddc6a…",
          "lockfileHash": "f208ad0efc…",
          "guessedImports": [
            "Flask",
            "selenium"
          ],
          "guessedImportsHash": "…"
        }
      }
    }

After a successful operation, UPM will automatically record hashes of the specfile and lockfile. That way, it can tell if the specfile has changed since the last time the lockfile was generated. If it hasn’t, then upm lock is a very fast no-op. Similarly, if the lockfile hasn’t been changed since the last time packages were installed, then upm install can be optimized away.
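The hash check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not UPM’s code: the store is a plain dictionary standing in for the hidden JSON file, the key name specfileHash is borrowed from the example above, and SHA-256 is an arbitrary choice of hash function.

```python
import hashlib


def content_hash(text):
    """Hash file contents; the specific algorithm is an assumption here."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def lock_if_needed(specfile_text, store, run_lock):
    """Run the expensive lock step only if the specfile changed.

    Returns True if run_lock was called, False if it was skipped.
    """
    current = content_hash(specfile_text)
    if store.get("specfileHash") == current:
        return False  # nothing changed since the last successful lock
    run_lock()
    # Record the hash only after the operation succeeded.
    store["specfileHash"] = current
    return True


store = {}
calls = []
spec_text = 'flask = "^1.1"\n'
print(lock_if_needed(spec_text, store, lambda: calls.append("lock")))
print(lock_if_needed(spec_text, store, lambda: calls.append("lock")))
print(len(calls))
```

The same pattern applies one stage later: comparing a recorded lockfile hash against the current one would let the install step be skipped.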

UPM also optimizes dependency guessing by means of a two-step search. First, it uses a fast regexp match to heuristically find things that might be import or require statements. Then it converts the deterministically generated sequence of matches into a hash. If this hash matches what was recorded in the JSON file last time a search was done, then the list of guessed packages from last time (also in the JSON file) can be reused. This is very fast. Otherwise, the language backend is asked to do a more advanced search, usually involving AST parsing.
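Here is a minimal Python sketch of that two-step search for Python source, assuming a dictionary in place of the JSON file and SHA-256 as the hash. The regular expression and function names are illustrative, not UPM’s own, and the sketch stops at top-level module names: mapping a module name to the package that provides it is a separate lookup that is not shown.

```python
import ast
import hashlib
import re

# Cheap heuristic: lines that look like import statements.
IMPORT_RE = re.compile(r"^[ \t]*(?:import|from)\s+[\w.]+.*$", re.MULTILINE)


def quick_hash(source):
    """Hash the sequence of import-like lines found by the regexp."""
    matches = [line.strip() for line in IMPORT_RE.findall(source)]
    return hashlib.sha256("\n".join(matches).encode("utf-8")).hexdigest()


def parse_imports(source):
    """Slower, precise pass: collect top-level module names via the AST."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return sorted(names)


def guess(source, store):
    """Return (imports, reused_cache), parsing only when the hash changed."""
    current = quick_hash(source)
    if store.get("guessedImportsHash") == current:
        return store["guessedImports"], True
    imports = parse_imports(source)
    store["guessedImports"] = imports
    store["guessedImportsHash"] = current
    return imports, False


store = {}
code = "import flask\nfrom selenium import webdriver\nprint('hi')\n"
print(guess(code, store))
print(guess(code + "print('more')\n", store))
```

Editing code without touching any import-like line leaves the hash unchanged, so the second call reuses the cached list and never invokes the AST parser.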

Closing

We hope you enjoy faster, more modern, and more open package management support on Repl.it. Now that we’ve aggregated all of the language-specific code into a single place, we hope it will be much easier to add package management support for new languages, like Emacs Lisp. Check out UPM on GitHub and see what it would take to add your favorite package manager to Repl.it! (Or, if Repl.it doesn’t have your favorite programming language yet, check out our other open-source projects, Polygott and Prybar, to help us add it.)
