
Taweret/OpenBTMixing Building & Distribution #83

Open
jared321 opened this issue Apr 24, 2024 · 4 comments
@jared321
Contributor

jared321 commented Apr 24, 2024

The simplest means for users to install Taweret would be by issuing pip install Taweret with that command installing openbtmixing automatically from PyPI. However, we need to determine if such a scheme is feasible given openbtmixing's dependence on MPI. As part of this, we can try to determine all the ways in which users can install and use both of these packages.

Possible Requirements

  • openbtmixing’s build system shall allow users to use the software on laptops, desktops, nodes, clusters, or supercomputers. The build system shall be capable of building, installing, and being used on macOS, *nix, Windows/Ubuntu, and Windows/PowerShell.
    • We will concentrate on macOS and Debian/Ubuntu for now.
  • openbtmixing’s build system shall allow for CI testing across the Cartesian product of common setups:
    {All OS} x {GCC, Intel, …} x {OpenMPI, MPICH, MVAPICH, etc.} x {debug, production}
    • This implies the need to encode in the build system debug and production flags for each compiler suite. Is that sufficient? Should we allow for site-specific flags as well or can autotools obviate the need for this?
  • Where possible, for each OS that we support the openbtmixing command line tools (CLT) and libraries shall be made available through at least one compatible package manager (e.g., homebrew, apt-get, spack, etc.).
    • This implies that openbtmixing C++ should be installed by itself and then the Python package just looks for it or finds it in the path. Is this what we want? Should the Python package build its own internal version of the tools as it presently does?
    • Will there be users who want to use the CLT by themselves and through Python? If so, should openbtmixing be installable with and without integrated CLT?
  • Users shall be able to build/install openbtmixing with the compiler suite and MPI implementation of their choice. This includes using suites and implementations installed by experts and that are optimized for the associated platform.
  • openbtmixing shall be pip installable. Will this be a source-only distribution with automatic building integrated? Can we distribute prebuilt wheels and satisfy our MPI-based requirements?
  • Taweret shall be pip installable. If MPI is isolated in openbtmixing, then Taweret is a pure python package and users shall be able to pip install from PyPI via source distribution or universal binary wheel. Should we account for users that want to run through a git clone? Should we allow for users/developers to install via a clone in editable/developer mode (i.e., pip install -e)?
  • The openbtmixing Python package shall be listed as an external dependency of Taweret.
    • This implies that pip install Taweret would try to install openbtmixing from PyPI and therefore with the default openbtmixing install scheme. Feasible? Good idea?
  • If an openbtmixing distribution includes an MPI implementation, then that implementation shall be integrated into the package such that other MPI implementations on a user's system cannot accidentally be used with openbtmixing at execution time, and such that our MPI implementation cannot accidentally insert itself into a different software's stack.
  • Users shall be able to install openbtmixing and Taweret in a regular Python installation or in an Anaconda installation.
  • For Anaconda installations, users shall be allowed to use the OpenMPI or MPICH (others?) packages available through conda for installing openbtmixing.
    • How does installing openbtmixing Python package know what compilers to use to build the CLT/libraries and where to find the MPI implementation?
    • Should we encourage this? How do we actually encourage this if there are multiple means to install and this one requires extra effort?
    • Follow the example of mpi4py for installing with an mpi dependence?
  • Based on all of the above, a possible means to install Taweret on macOS in a general Python installation is
    • brew install open-mpi (or mpich and also a particular version using which compiler suite?)
    • brew install eigen (header only => no compilation here)
    • brew install openbtmixing (C++ CLT and libraries built with your homebrew MPI installation & matching compiler suite using your eigen)
    • pip install Taweret (this should automatically install openbtmixing Python package based on your homebrew openbtmixing/MPI)
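A recurring question in the requirements above is how a source build of the openbtmixing Python package would locate the compilers and MPI implementation (see the mpi4py bullet). A minimal sketch of mpi4py-style toolchain discovery, with all names and messages hypothetical:

```python
import shutil


def find_mpi_cxx(candidates=("mpicxx", "mpic++", "mpiicpc")):
    """Locate an MPI C++ compiler wrapper on PATH.

    Returns the path of the first wrapper found; raises RuntimeError
    with installation hints when none is available.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    raise RuntimeError(
        "No MPI compiler wrapper found on PATH. Install one first, e.g.:\n"
        "  brew install open-mpi    # macOS/Homebrew\n"
        "  apt-get install mpich    # Debian/Ubuntu"
    )
```

A build backend could run a probe like this at install time and feed the result to CMake or autotools, which is roughly what mpi4py does when it searches for mpicc.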
@jared321 jared321 self-assigned this Apr 24, 2024
@ominusliticus
Collaborator

Let's begin by noting that Taweret is intended to be a framework that standardizes APIs for Bayesian inference software (developed by the nuclear physics community).
Such a framework should not depend on an implementation thereof.
With this reasoning, I would like to strike the ability to blanket-install Taweret and have it take care of all its implementations' requirements.

Assuming that openbtmixing does gain widespread adoption, we would, at a minimum, want to support compute clusters Spack-installing OpenBT and openbtmixing in whatever optimization configurations they can concoct.

Build OpenBT as standalone

For this reason, it is, in my opinion, a little inappropriate to discuss the guts of OpenBT in a Taweret issue.
But this will make for the best place to keep track of our ideas for now.

The work flow that I envision is as follows:

  1. Require the user to build OpenBT from scratch
    1. Users should have their compiler and MPI implementation of choice
    2. Clear build instructions should be developed; best practices (e.g., manipulating OpenMPI install locations) should be included
    3. Unit tests for the matrix of compilers and operating systems, ideally covering all conceivable combinations, should be implemented
  2. Require the user to install openbtmixing separately, with the appropriate prompt from Taweret should the module not be installed
    1. This is how it is done in packages like bilby
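The bilby-style prompt mentioned above can be sketched as a lazy import with an actionable message (function name and message wording are hypothetical):

```python
def load_openbt_backend():
    """Import the openbtmixing backend on demand.

    Mirrors how packages such as bilby handle optional dependencies:
    the import happens only when the user selects the corresponding
    model, and a missing module produces a helpful error pointing
    at the install docs.
    """
    try:
        import openbtmixing  # optional dependency, not installed by Taweret
    except ImportError as exc:
        raise ImportError(
            "This mixing method requires the openbtmixing package, which "
            "Taweret does not install automatically. See the install docs "
            "(e.g., pip install openbtmixing)."
        ) from exc
    return openbtmixing
```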

openbtmixing and OpenBT as Taweret dependencies

Should people insist on the convenience of one-line installations, we could appeal to the pip install flag --no-binary :all:, which could be tailored to build dependencies with exactly one prescribed compiler and MPI implementation, chosen from the compilers available on systems like Ubuntu (personal computers) and CentOS (clusters).
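To make the pieces of that one-liner explicit, here is a small helper that assembles the source-only pip invocation; the environment variables, defaults, and package name are illustrative, while --no-binary :all: is pip's documented switch for refusing prebuilt wheels:

```python
import sys


def source_only_install_cmd(package="Taweret", cc="mpicc", cxx="mpicxx"):
    """Assemble a pip command that builds everything from source.

    '--no-binary :all:' makes pip ignore prebuilt wheels for every
    distribution, so any C++ extensions are compiled locally against
    the toolchain selected through CC/CXX.
    """
    env = {"CC": cc, "CXX": cxx}
    cmd = [sys.executable, "-m", "pip", "install",
           "--no-binary", ":all:", package]
    return env, cmd
```

Running the returned command with the returned variables merged into the environment (e.g., subprocess.run(cmd, env={**os.environ, **env})) would reproduce the one-line install while pinning a single compiler prescription.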

@jared321
Contributor Author

@ominusliticus Interesting. I've never heard of a package building in a dependency but refusing to install it. That does sound like something we might need. As long as the error message is helpful and points them to clear docs, it seems acceptable.

It seems like much of your workflow is in accord with my requirements. In other words, it could be a solution that satisfies the requirements. Do you agree?

I've realized that while we are evolving the package-distribution portion of OpenBT, as you point out, we are at the same time updating the overall Taweret software architecture for the case where OpenBTMixing is a dependency, so that the architecture is compatible with an acceptable build/distribution strategy. We've seen this in the work that John is doing, and my next question is related.

There is the "homebrew-style" architecture where the OpenBT/C++ CLTs and the OpenBTMixing Python package are independent layers in the SW architecture. In such case, the Python package is a pure Python package. However, there is a "Python source build" architecture where the OpenBTMixing Python package is a true wrapper of the OpenBT/C++ CLTs. I believe that this style implies that the package must have its own build system and users would likely always have to build from source. Is one of these what you have in mind for your workflow?

(image: diagram comparing the two candidate architectures, "homebrew-style" on the left and "Python source build" on the right)

@ominusliticus
Collaborator

In my own view of things, I am more inclined towards the figure on the right. The picture on the left is something we have to fit into the "fast-paced data science in python" pipeline. If we can enable this for users, we ought to.

Regarding the discussion of the relevance of OpenBT infrastructure to Taweret, I agree. Though the results of our findings/conclusions will probably be documented in the BAND software guidelines, which will be useful for future developers.

@asemposki
Collaborator

asemposki commented Apr 30, 2024

In regard to Anaconda, here are the things I have found out so far:

  • conda-forge can use mpi4py to install the MPI implementation of the user's choice (MPICH or OpenMPI) (see this link Jared sent)
  • conda-forge allows users to use their own MPI, seen here
  • Conda has a procedure for adding packages, and this includes a "recipe" where we specify dependencies, build requirements, etc. One of these is that we can set the compiler of choice if we are not only using Python code; this might be useful for openbtmixing? See this link, line 41.
  • I'm also slowly investigating how packages like NumPy are included in Conda, which may also be useful since NumPy interfaces with C, C++, and Fortran code underneath
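For reference, compiler and MPI selection in a conda-forge recipe is expressed in meta.yaml along these lines; the exact pins are illustrative, but the {{ compiler('cxx') }} jinja function and the openmpi/mpich variant packages are standard conda-forge conventions:

```yaml
requirements:
  build:
    - {{ compiler('cxx') }}   # pulls in conda-forge's C++ toolchain
  host:
    - python
    - pip
    - openmpi                 # or mpich; conda-forge makes one build per MPI variant
  run:
    - python
    - mpi4py
```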

Update:

  • How to contribute to Conda: on their contributors page (here), they say anyone can contribute; by contributing the package you become its maintainer; and one PRs the package to conda-forge, where the review team works with us to get it in the right shape to be released through conda-forge. This sounds very much to me like the JOSS process, and there don't seem to be stringent constraints (unlike homebrew's requirements on GitHub star counts, etc.).
  • It might be prudent to look at the feedstock repos from SciPy and NumPy, or other large packages, to see how they handle dependencies, compilers, etc.; for example, SciPy's conda-forge feedstock repo is here and NumPy's is here. These repos are where conda-forge automatically moves the recipes when a package is PR'd and accepted into conda-forge. If we go the Conda route, these might be good recipes to model ours on.
