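The Milvus core is written in C++, and dependency management in C++ has long been a major pain point for developers as well as a bottleneck limiting the growth of the C++ ecosystem.

In its early days, Milvus used CMake's built-in FetchContent and ExternalProject to download dependencies automatically, which was enough in most cases. But as the Milvus core gained capabilities, the number of dependencies grew: for example, Folly was added for its optimized thread pool and data structures, and opentelemetry-cpp was introduced to improve observability.

This caused problems: build times kept growing, dependencies pulled in their own recursive dependencies that could not be shared with each other, and adding each new dependency was painful. A dependency management tool was urgently needed. After evaluating Conan, vcpkg, bazel, and others, we chose Conan for its mature ecosystem and the best CMake compatibility.

Today, all C++ projects in the Milvus community use Conan to manage dependencies. We hit some unavoidable pitfalls during the migration, so this article walks through common Conan concepts, usage, and frequently asked questions to make Conan easier to use and understand.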

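01. Common Conan usage

Installation

Conan released version 2.0 in early 2023, but some third-party packages have not fully migrated yet, so Milvus still uses Conan 1.58.0 and will try upgrading to 2.0 in the future.

Conan is written in Python 3 and can be installed with pip: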

pip install conan==1.58.0

How it works in Milvus

After you run make, Milvus automatically calls Conan to download and install the dependencies. The details are as follows:

  • scripts/core_build.sh runs conan install to download and build the dependencies:
case "${unameOut}" in
  Darwin*)
    conan install ${CPP_SRC_DIR} --install-folder conan --build=missing -s compiler=clang -s compiler.version=${llvm_version} -s compiler.libcxx=libc++ -s compiler.cppstd=17 || { echo 'conan install failed'; exit 1; }
    ;;
  Linux*)
    GCC_VERSION=`${CC} -dumpversion`
    if [[ `${CC} -v 2>&1 | sed -n 's/.*\(--with-default-libstdcxx-abi\)=\(\w*\).*/\2/p'` == "gcc4" ]]; then
      conan install ${CPP_SRC_DIR} --install-folder conan --build=missing -s compiler.version=${GCC_VERSION} || { echo 'conan install failed'; exit 1; }
    else
      conan install ${CPP_SRC_DIR} --install-folder conan --build=missing -s compiler.version=${GCC_VERSION} -s compiler.libcxx=libstdc++11 || { echo 'conan install failed'; exit 1; }
    fi
    ;;
  *)
    echo "Cannot build on windows"
    ;;
esac
  • The dependency configuration is generated in the cmake_build/conan directory.
  • core/CMakeLists.txt includes the generated configuration, which makes the third-party dependencies defined in Conan available:
list( APPEND CMAKE_MODULE_PATH ${CMAKE_BINARY_DIR}/conan )
include( ${CMAKE_BINARY_DIR}/conan/conanbuildinfo.cmake )
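The platform branching in the script above can be sketched in Python to make the differences between the three cases easier to see. This is only an illustration, not part of Milvus: the function name conan_install_args and its parameters are hypothetical, and "internal/core" is used purely as a sample source directory.

```python
import shlex


def conan_install_args(system, src_dir, compiler_version, libstdcxx_abi=None):
    """Build the `conan install` argument list for one platform.

    system: "Darwin" or "Linux", as reported by `uname`.
    compiler_version: LLVM version on macOS, GCC version on Linux.
    libstdcxx_abi: value of GCC's --with-default-libstdcxx-abi, if known.
    """
    args = ["conan", "install", src_dir,
            "--install-folder", "conan", "--build=missing"]
    if system == "Darwin":
        args += ["-s", "compiler=clang",
                 "-s", f"compiler.version={compiler_version}",
                 "-s", "compiler.libcxx=libc++",
                 "-s", "compiler.cppstd=17"]
    elif system == "Linux":
        args += ["-s", f"compiler.version={compiler_version}"]
        # Toolchains that default to the old "gcc4" ABI keep the profile's
        # libcxx; all others are switched to the C++11 libstdc++ ABI.
        if libstdcxx_abi != "gcc4":
            args += ["-s", "compiler.libcxx=libstdc++11"]
    else:
        # The shell script only prints a message here; raising is a choice
        # made for this sketch.
        raise ValueError(f"unsupported platform: {system}")
    return args


print(shlex.join(conan_install_args("Linux", "internal/core", "11")))
```

Returning a list rather than a string keeps the arguments safe to pass to subprocess.run without extra quoting.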

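Conan profiles

The profile is an important piece of Conan configuration: it determines the parameters Conan uses when building third-party dependencies, including the compiler version, the C++ standard, and so on.

Conan uses the profile plus the options to decide whether a dependency must be built. If Conan Center has a prebuilt binary that matches the profile and options, it is downloaded and used directly; otherwise the dependency is built from source.

The default configuration lives in ~/.conan/profiles/default, for example: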

[settings]
os=Macos
os_build=Macos
arch=armv8
arch_build=armv8
compiler=clang
compiler.version=15
# C++ standard library variant; variants differ in whether they support the C++11 ABI
compiler.libcxx=libc++
compiler.cppstd=17
build_type=Release
[options]
[build_requires]
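To inspect a profile from a script, the [settings] section can be read with Python's standard configparser. This is a rough sketch for illustration only: Conan has its own profile loader, and the helper read_profile_settings is a hypothetical name, not a Conan API.

```python
import configparser

# A shortened copy of the default profile shown above.
PROFILE_TEXT = """\
[settings]
os=Macos
arch=armv8
compiler=clang
compiler.version=15
compiler.libcxx=libc++
compiler.cppstd=17
build_type=Release
[options]
[build_requires]
"""


def read_profile_settings(text):
    """Return the [settings] section of a profile as a plain dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(text)
    return dict(parser["settings"])


settings = read_profile_settings(PROFILE_TEXT)
print(settings["compiler"], settings["compiler.version"], settings["compiler.cppstd"])
```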

Milvus's conanfile.py overrides arrow's default build options, so arrow is always built from source rather than downloaded as a prebuilt binary:

class MilvusConan(ConanFile):
    settings = "os", "compiler", "build_type", "arch"
    requires = (
        "arrow/8.0.1",
    )
    generators = ("cmake", "cmake_find_package")
    default_options = {
        "arrow:with_zstd": True,
        "arrow:shared": False,
        "arrow:with_jemalloc": True,
    }

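Where are third-party packages installed?

Taking arrow as an example, the package is installed in the local Conan cache. The hash in the package path is computed from the profile and the options, so changing either one produces a new package.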
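The idea that the package hash follows from the profile and options can be illustrated with a toy function. To be clear, this is not Conan's real package ID algorithm; illustrative_package_id is a hypothetical helper that only shows the property described above: identical inputs give the same ID, and any change gives a different one.

```python
import hashlib
import json


def illustrative_package_id(settings, options):
    """Hash settings and options into a stable identifier.

    NOT Conan's actual algorithm; it only demonstrates that the ID is a
    pure function of the settings and options.
    """
    payload = json.dumps({"settings": settings, "options": options}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


settings = {"os": "Macos", "arch": "armv8", "compiler": "clang", "build_type": "Release"}
default_id = illustrative_package_id(settings, {"shared": False, "with_zstd": False})
custom_id = illustrative_package_id(settings, {"shared": False, "with_zstd": True})

# Flipping a single option changes the ID, so a separate package is created.
print(default_id != custom_id)
```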

02. How to write conanfile.py

See internal/core/conanfile.py for reference:

from conans import ConanFile


class MilvusConan(ConanFile):
    settings = "os", "compiler", "build_type", "arch"
    # Search https://conan.io/center/ for the packages and versions you need
    requires = (
        "rocksdb/6.29.5",
        "boost/1.81.0",
        "onetbb/2021.7.0",
        "nlohmann_json/3.11.2",
        "zstd/1.5.5",
        # ...
    )
    generators = ("cmake", "cmake_find_package")
    default_options = {
        "rocksdb:shared": True,
        # ...
    }

    # Adjust the build options of dependencies dynamically based on settings
    def configure(self):
        if self.settings.os == "Macos":
            # On non-x86 Macs (e.g. Apple silicon), drop folly's use_sse4_2 option
            if self.settings.arch not in ("x86_64", "x86"):
                del self.options["folly"].use_sse4_2

    # imports() copies the matching files into cmake_build/
    def imports(self):
        self.copy("*.dylib", "../lib", "lib")
        self.copy("*.dll", "../lib", "lib")
        self.copy("*.so*", "../lib", "lib")
        self.copy("*", "../bin", "bin")
        self.copy("*.proto", "../include", "include")
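What the imports() step does can be approximated with the standard library: find files matching a pattern in a dependency's folder and copy them to a destination. The sketch below is only a conceptual stand-in for self.copy, not Conan's implementation; copy_matching and the temporary directory layout are hypothetical.

```python
import fnmatch
import shutil
import tempfile
from pathlib import Path


def copy_matching(pattern, src_dir, dst_dir):
    """Copy files under src_dir whose names match pattern into dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    copied = []
    if not src.is_dir():
        return copied
    for path in sorted(src.rglob("*")):
        if path.is_file() and fnmatch.fnmatch(path.name, pattern):
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(target.name)
    return copied


# Demo with a throwaway directory standing in for a dependency package.
with tempfile.TemporaryDirectory() as tmp:
    package_lib = Path(tmp) / "package" / "lib"
    package_lib.mkdir(parents=True)
    (package_lib / "libfoo.so.1").write_text("shared library placeholder")
    (package_lib / "libfoo.a").write_text("static library placeholder")
    copied = copy_matching("*.so*", package_lib, Path(tmp) / "lib")
    print(copied)
```

Only the shared library matches "*.so*", which mirrors why the recipe lists one pattern per platform (.dylib, .dll, .so).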

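03. How to write and publish a library's conanfile.py

Compared with merely consuming dependencies through Conan, writing a library's conanfile.py is much more complex: it must define the dependencies, offer users multiple build options, and declare the various definitions of the exported package.

See Knowhere's conanfile.py for reference: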

import os

from conan import ConanFile
from conan.errors import ConanInvalidConfiguration
from conan.tools import files
from conan.tools.build import check_min_cppstd
from conan.tools.cmake import CMake, CMakeDeps, CMakeToolchain, cmake_layout
from conan.tools.microsoft import is_msvc, msvc_runtime_flag
from conan.tools.scm import Version
from conans import tools


class KnowhereConan(ConanFile):
    name = "knowhere"
    description = "Knowhere is written in C++. It is an independent project that acts as Milvus's internal core"
    topics = ("vector", "simd", "ann")
    url = "https://github.com/milvus-io/knowhere"
    homepage = "https://github.com/milvus-io/knowhere"
    license = "Apache-2.0"
    generators = "pkg_config"
    settings = "os", "arch", "compiler", "build_type"

    # Each option must be declared together with its default value
    options = {
        "shared": [True, False],
        "fPIC": [True, False],
        "with_raft": [True, False],
        "with_asan": [True, False],
        "with_diskann": [True, False],
        "with_profiler": [True, False],
        "with_ut": [True, False],
        "with_benchmark": [True, False],
    }
    default_options = {
        "shared": True,
        "fPIC": False,
        "with_raft": False,
        "with_asan": False,
        "with_diskann": False,
        "with_profiler": False,
        "with_ut": False,
        "glog:with_gflags": False,
        "prometheus-cpp:with_pull": False,
        "with_benchmark": False,
    }

    # Files included in the exported source package
    exports_sources = (
        "src/*",
        "thirdparty/*",
        "tests/ut/*",
        "include/*",
        "CMakeLists.txt",
        "*.cmake",
        "conanfile.py",
    )

    @property
    def _minimum_cpp_standard(self):
        return 17

    @property
    def _minimum_compilers_version(self):
        return {
            "gcc": "8",
            "Visual Studio": "16",
            "clang": "6",
            "apple-clang": "10",
        }

    def config_options(self):
        if self.settings.os == "Windows":
            self.options.rm_safe("fPIC")

    def configure(self):
        if self.options.shared:
            self.options.rm_safe("fPIC")

    def requirements(self):
        self.requires("boost/1.81.0")
        self.requires("glog/0.6.0")
        self.requires("nlohmann_json/3.11.2")
        self.requires("openssl/1.1.1t")
        self.requires("prometheus-cpp/1.1.0")
        if self.options.with_ut:
            self.requires("catch2/3.3.1")
        if self.options.with_benchmark:
            self.requires("gtest/1.13.0")
            self.requires("hdf5/1.14.0")

    @property
    def _required_boost_components(self):
        return ["program_options"]

    def validate(self):
        if self.settings.compiler.get_safe("cppstd"):
            check_min_cppstd(self, self._minimum_cpp_standard)
        min_version = self._minimum_compilers_version.get(str(self.settings.compiler))
        if not min_version:
            self.output.warn(
                "{} recipe lacks information about the {} compiler support.".format(
                    self.name, self.settings.compiler
                )
            )
        else:
            if Version(self.settings.compiler.version) < min_version:
                raise ConanInvalidConfiguration(
                    "{} requires C++{} support. The current compiler {} {} does not support it.".format(
                        self.name,
                        self._minimum_cpp_standard,
                        self.settings.compiler,
                        self.settings.compiler.version,
                    )
                )

    def layout(self):
        cmake_layout(self)

    # Generates the key CMake toolchain file, the CMake dependency
    # config files, and the CMake build variables
    def generate(self):
        tc = CMakeToolchain(self)
        tc.variables["CMAKE_POSITION_INDEPENDENT_CODE"] = self.options.get_safe(
            "fPIC", True
        )
        # Relocatable shared lib on Macos
        tc.cache_variables["CMAKE_POLICY_DEFAULT_CMP0042"] = "NEW"
        # Honor BUILD_SHARED_LIBS from conan_toolchain (see https://github.com/conan-io/conan/issues/11840)
        tc.cache_variables["CMAKE_POLICY_DEFAULT_CMP0077"] = "NEW"

        cxx_std_flag = tools.cppstd_flag(self.settings)
        cxx_std_value = (
            cxx_std_flag.split("=")[1]
            if cxx_std_flag
            else "c++{}".format(self._minimum_cpp_standard)
        )
        tc.variables["CXX_STD"] = cxx_std_value
        if is_msvc(self):
            tc.variables["MSVC_LANGUAGE_VERSION"] = cxx_std_value
            tc.variables["MSVC_ENABLE_ALL_WARNINGS"] = False
            tc.variables["MSVC_USE_STATIC_RUNTIME"] = "MT" in msvc_runtime_flag(self)
        tc.variables["WITH_ASAN"] = self.options.with_asan
        tc.variables["WITH_DISKANN"] = self.options.with_diskann
        tc.variables["WITH_RAFT"] = self.options.with_raft
        tc.variables["WITH_PROFILER"] = self.options.with_profiler
        tc.variables["WITH_UT"] = self.options.with_ut
        tc.variables["WITH_BENCHMARK"] = self.options.with_benchmark
        tc.generate()

        deps = CMakeDeps(self)
        deps.generate()

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()

    def package(self):
        cmake = CMake(self)
        cmake.install()
        files.rmdir(self, os.path.join(self.package_folder, "lib", "cmake"))
        files.rmdir(self, os.path.join(self.package_folder, "lib", "pkgconfig"))

    def package_info(self):
        self.cpp_info.set_property("cmake_file_name", "knowhere")
        self.cpp_info.set_property("cmake_target_name", "Knowhere::knowhere")
        self.cpp_info.set_property("pkg_config_name", "libknowhere")
        self.cpp_info.components["libknowhere"].libs = ["knowhere"]
        self.cpp_info.components["libknowhere"].requires = [
            "boost::program_options",
            "glog::glog",
            "prometheus-cpp::core",
            "prometheus-cpp::push",
        ]

        self.cpp_info.filenames["cmake_find_package"] = "knowhere"
        self.cpp_info.filenames["cmake_find_package_multi"] = "knowhere"
        self.cpp_info.names["cmake_find_package"] = "Knowhere"
        self.cpp_info.names["cmake_find_package_multi"] = "Knowhere"
        self.cpp_info.names["pkg_config"] = "libknowhere"
        self.cpp_info.components["libknowhere"].names["cmake_find_package"] = "knowhere"
        self.cpp_info.components["libknowhere"].names[
            "cmake_find_package_multi"
        ] = "knowhere"
        self.cpp_info.components["libknowhere"].set_property(
            "cmake_target_name", "Knowhere::knowhere"
        )
        self.cpp_info.components["libknowhere"].set_property(
            "pkg_config_name", "libknowhere"
        )
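The compiler check in validate() above boils down to a table lookup followed by a version comparison. The standalone sketch below reproduces that logic without Conan so it can be tried in isolation. The helper names are hypothetical, and the tuple-based comparison stands in for Conan's Version class; it is used because comparing plain strings would wrongly rank "11" below "8".

```python
# Same minimum versions as the recipe's _minimum_compilers_version table.
MINIMUM_COMPILER_VERSIONS = {
    "gcc": "8",
    "Visual Studio": "16",
    "clang": "6",
    "apple-clang": "10",
}


def version_tuple(text):
    """Turn "11.2" into (11, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_compiler(compiler, version):
    """Return "ok", "unknown" or "too old" for a compiler/version pair."""
    minimum = MINIMUM_COMPILER_VERSIONS.get(compiler)
    if minimum is None:
        return "unknown"  # the recipe only emits a warning in this case
    if version_tuple(version) < version_tuple(minimum):
        return "too old"  # the recipe raises ConanInvalidConfiguration
    return "ok"


print(check_compiler("gcc", "7.5"), check_compiler("gcc", "11"), check_compiler("icc", "2021"))
```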

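In theory the original CMakeLists.txt does not need to change, but some third-party package names are inconsistent and have to be adjusted accordingly. Adding find_package(XXX REQUIRED) to CMakeLists.txt is enough to locate the corresponding package.

How it works

Taking the Knowhere build as an example:

Run the following command in the build directory. You can pass custom options, which must be defined in conanfile.py.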

conan install .. --build=missing -o with_ut=True -o with_asan=True -s build_type=Debug

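This command downloads and builds the dependencies and generates the key configuration files under build/Debug/generators. Then run the following command to build the Knowhere project: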

conan build ..

The conan build command essentially runs cmake with a few extra arguments, roughly equivalent to:

cmake -G "Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=./Debug/generators/conan_toolchain.cmake -DCMAKE_BUILD_TYPE="Debug" ..  
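If the equivalent cmake call has to be produced from a script (for example to paste into an IDE setting), it can be assembled from the build type. This is a minimal sketch under the assumption that the generators folder follows the <build_type>/generators layout shown above; cmake_configure_command is a hypothetical helper.

```python
import shlex


def cmake_configure_command(build_type, source_dir="..", generator="Unix Makefiles"):
    """Assemble the cmake configure call that roughly matches `conan build`."""
    # Assumes the toolchain file sits in <build_type>/generators, as above.
    toolchain = f"./{build_type}/generators/conan_toolchain.cmake"
    return [
        "cmake", "-G", generator,
        f"-DCMAKE_TOOLCHAIN_FILE={toolchain}",
        f"-DCMAKE_BUILD_TYPE={build_type}",
        source_dir,
    ]


print(shlex.join(cmake_configure_command("Debug")))
```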

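Many editors and IDEs configure their environment automatically from CMakeLists.txt. After adopting Conan, many developers find that project configuration fails and the project becomes unusable. The fix is to edit the IDE's CMake configuration and add the -DCMAKE_TOOLCHAIN_FILE=build/Debug/generators/conan_toolchain.cmake argument.

How to write and test a new package

https://github.com/milvus-io/conanfiles contains several examples. Taking arrow as an example, run the following in the arrow/all directory: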

conan create . arrow/12.0.0-dev1@milvus/dev --build=missing

If the build succeeds, the package is generated under ~/.conan/data/arrow.
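The reference passed to conan create has the form name/version@user/channel. The sketch below splits such a reference and derives the folder where the package is expected to land. The directory layout <cache_root>/<name>/<version>/<user>/<channel> is an assumption about the Conan 1.x cache, and both helper names are hypothetical.

```python
from pathlib import PurePosixPath


def parse_reference(reference):
    """Split "name/version@user/channel" into its four parts."""
    name_version, _, user_channel = reference.partition("@")
    name, _, version = name_version.partition("/")
    user, _, channel = user_channel.partition("/")
    if not (name and version and user and channel):
        raise ValueError(f"expected name/version@user/channel, got {reference!r}")
    return name, version, user, channel


def cache_folder(reference, cache_root="~/.conan/data"):
    """Assumed Conan 1.x layout: <cache_root>/<name>/<version>/<user>/<channel>."""
    return PurePosixPath(cache_root, *parse_reference(reference))


print(cache_folder("arrow/12.0.0-dev1@milvus/dev"))
```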

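How to upload to a center

Some libraries Milvus depends on, such as Knowhere and velox, are either missing from https://conan.io/center/ or not available in a suitable version. In that case they need to be uploaded to a private center: obtain the username and password, then run the following commands: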

conan user -p $password -r default-conan-local $user
conan upload arrow/12.0.0-dev1@milvus/dev -r default-conan-local

For how to set up a private center, see: https://docs.conan.io/1/uploading_packages/remotes.html
