关于golang:Go-Rust-排序和二分搜索的对比

作者：王东阳

前言

在计算机科学中，排序和二分搜寻是十分常见的一种算法，在上篇文章《leveldb memdb源码剖析(下)之Rust实现篇》中，就能够看到外面很多办法都用到了二分搜寻来实现。本文将会比照 Go 和 Rust 语言中排序和搜寻办法中的一些个性和不同，对于Go次要应用数字切片 []int 作为例子，对于 Rust 次要应用 Vec 作为例子。

排序
在Go语言中，对于 []int 咱们能够间接应用 sort.Ints 进行排序如下：

func sort_simple() {
    a := []int{1, 2, 6, 7, 8, 3, 4}
    sort.Ints(a)
    fmt.Println(a)
}

在 Rust 中，对于 Vec 能够应用 sort 办法进行排序如下：

fn sort_simple() {
        let mut a = vec![1, 2, 6, 7, 8, 3, 4];
        a.sort();
        println!("{:?}", a);
    }

如果心愿自定义排序规定，在Go中咱们能够应用 sort.Slice ,实现逆序排序或基于绝对值等特定规定的排序，如下：

func sort_reverse() {
    a := []int{1, 2, 6, 7, 8, 3, 4}
    sort.Slice(a, func(i, j int) bool {
        return a[i] > a[j]
    })
    fmt.Println(a)
}

func sort_abs() {
    a := []int{1, -2, 6, 7, -8, 3, 4}
    sort.Slice(a, func(i, j int) bool {
        return abs(a[i]) > abs(a[j])
    })
    fmt.Println(a)
}

func abs(i int) int {
    if i > 0 {
        return i
    }
    return -i
}

在Rust中则能够应用 sort_by来自定义排序

fn sort_reverse() {
        let mut a = vec![1, 2, 6, 7, 8, 3, 4];
        a.sort_by(|a, b| b.cmp(a));
        println!("{:?}", a);
    }

    fn sort_abs() {
        let mut a: Vec<i32> = vec![1, -2, 6, 7, -8, 3, 4];
        a.sort_by(|a, b| a.abs().cmp(&b.abs()));
        println!("{:?}", a);
    }

除此之外，在Rust如果要给予key的特定规定进行排序，也能够应用sort_by_key

fn sort_by_key_abs() {
        let mut a: Vec<i32> = vec![1, -2, 6, 7, -8, 3, 4];
        a.sort_by_key(|k| k.abs());
        println!("{:?}", a);
    }

另外Rust思考到sort_by_key办法传入的闭包函数在解决key的时候可能比拟耗时，所以还提供了sort_by_cached_key ，不会反复计算key，优化排序性能。

在Rust中还提供了对应的 _unstable 版本，相比 stable 排序有更好的性能，感兴趣的能够自行去看对应的api文件。

在Go中，对于自定义类型要进行排序，除了应用 sort.Slice ，也能够应用上面这种比拟麻烦的形式，实现排序相干接口：

type SortBy []Type
func (a SortBy) Len() int           { return len(a) }
func (a SortBy) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }
func (a SortBy) Less(i, j int) bool { return a[i] < a[j] }

搜寻
Go

Go中能够别离应用 SearchInts，SearchFloat64s，SearchStrings对不同类型的切片进行排序，本文次要以SearchInts为例讲述：

func SearchInts(a []int, x int) int

对应的官网介绍如下

SearchInts searches for x in a sorted slice of ints and returns the index as specified by Search. The return value is the index to insert x if x is not present (it could be len(a)). The slice must be sorted in ascending order.

这里有几点十分重要，许多人很容易搞错或素来没有留神到：

如果元素找不到，返回的是这个sorted ints 中大于x的元素中最小的下标。
如果x比 sorted ints中的所有元素都大，返回的下标是这个 sorted ints 的长度；
如果这个 sorted ints 外面有多个雷同的x元素，返回下标最小的那个

接下来咱们将会通过具体的例子介绍这3点：

package main

import (
    "fmt"
    "sort"
)

func main() {
    a := []int{1, 2, 3, 4, 6, 7, 8}

    x := 2
    i := sort.SearchInts(a, x)// 找到 x
    fmt.Printf("found %d at index %d in %v\n", x, i, a)

    x = 5
    i = sort.SearchInts(a, x) // 找不到 x 
    fmt.Printf("%d not found, can be inserted at index %d in %v\n", x, i, a)


    x = 9
    i = sort.SearchInts(a, x) // 找不到 x 且 x 比所有元素都大
    fmt.Printf("%d not found, can be inserted at index %d in %v\n", x, i, a)
}

例如下面的例子中，对于有序的切片 a:=[]int{1, 2, 3, 4, 6, 7, 8} ，

执行sort.SearchInts(a, 2)，因为2在a中存在，所以返回对应的下表索引1；
执行 sort.SearchInts(a, 5)，因为5不存在，所以返回a中比第一个比5大的数也就是6的下标索引4；
执行 sort.SearchInts(a, 9)，因为9比a中所有元素都大，所以返回a的长度。

由下面的例子咱们也就晓得了，如果要判断一个元素是否存在，执行sort.SearchInts搜寻后还要再进行一下判断。

对于蕴含反复元素的切片，如下：

a = []int{0,2,2,2,2,2,5}

如果执行搜寻 sort.SearchInts(a, 2)，那么返回的是a中等于2的元素中下标最小的那个元素的下标，为什么会有这样的行为呢？咱们看下

对应的源码，SearchInts 底层其实是调用了 Search 办法，其实SearchFloat64s，SearchStrings 也都是调用了Search

咱们先看下Search的源码如下，是十分经典的二分搜寻的写法

func Search(n int, f func(int) bool) int {
    // Define f(-1) == false and f(n) == true.
    // Invariant: f(i-1) == false, f(j) == true.
    i, j := 0, n
    for i < j {
        h := int(uint(i+j) >> 1) // avoid overflow when computing h
        // i ≤ h < j
        if !f(h) {
            i = h + 1 // preserves f(i-1) == false
        } else {
            j = h // preserves f(j) == true
        }
    }
    // i == j, f(i-1) == false, and f(j) (= f(i)) == true  =>  answer is i.
    return i
}

SearchInts 调用 Search 时传递的函数是 return a[i] >= x

func SearchInts(a []int, x int) int {
    return Search(len(a), func(i int) bool { return a[i] >= x })
}

联合下面 sort.SearchInts的源码，咱们就能够明确为什么 sort.SearchInts会有那样的行为，因为 sort.SearchInts其实是在切片中查问大于等于特定值x的最小元素的下标。

所以如果切片中如果有多个雷同的元素，查问时sort.SearchInts返回的是最右边的那个元素下标，如果咱们要返回最左边的那个元素的下标呢？其实很简略，咱们只须要通过 sort.SearchInts查问 x+1 的下标，而后将这个下标减去一返回就能够失去了。

Rust

在Rust进行二分搜寻罕用的办法是binary_search ，该办法返回Result构造。

如果找到则返回Result::Ok , 蕴含找到元素的下标，如果有多个元素都匹配，则返回的元素的下标不确定；这个跟Go中的不一样；
如果找不到返回 Result::Err，外面是vec中第一个大于搜寻元素的索引下标，如果查问的元素比所有元素都大，外面的值等于vec的长度，这个行为跟go中统一；

咱们先来看上面的例子：

let s = [0, 1, 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55];

    assert_eq!(s.binary_search(&13), Ok(9));
    assert_eq!(s.binary_search(&4), Err(7));
    assert_eq!(s.binary_search(&100), Err(13));

如果有序vec中蕴含反复的元素，例如下面的s中1有多个，那么返回的下标就不确定了，可能是从1到4中的任意一个下标。

let r = s.binary_search(&1);
    assert!(match r {
        Ok(1..=4) => true,
        _ => false,
    });

为了了解binary_search的行为，咱们看下binary_search的源码，外部是调用了 binary_search_by，两个办法的源码如下：

#[stable(feature = "rust1", since = "1.0.0")]
    pub fn binary_search(&self, x: &T) -> Result<usize, usize>
    where
        T: Ord,
    {
        self.binary_search_by(|p| p.cmp(x))
    }

binary_search_by 的官网文档形容如下：

Binary searches this sorted slice with a comparator function.

The comparator function should implement an order consistent with the sort order of the underlying slice, returning an order code that indicates whether its argument is Less, Equal or Greater the desired target.

If the value is found then Result::Ok is returned, containing the index of the matching element. If there are multiple matches, then any one of the matches could be returned. The index is chosen deterministically, but is subject to change in future versions of Rust. If the value is not found then Result::Err is returned, containing the index where a matching element could be inserted while maintaining sorted order.

基于传入的比拟函数进行排序；
比拟函数要返回Ordering类型 F: FnMut(&’a T) -> Ordering；
如果找到就返回 Result::Ok 蕴含找到元素的下标，如果有多个匹配，任意一个都有可能返回，这个在上面的源码中能够看到；

#[stable(feature = "rust1", since = "1.0.0")]
    #[inline]
    pub fn binary_search_by<'a, F>(&'a self, mut f: F) -> Result<usize, usize>
    where
        F: FnMut(&'a T) -> Ordering,
    {
        let mut size = self.len();
        let mut left = 0;
        let mut right = size;
        while left < right {
            let mid = left + size / 2;

            // SAFETY: the call is made safe by the following invariants:
            // - `mid >= 0`
            // - `mid < size`: `mid` is limited by `[left; right)` bound.
            let cmp = f(unsafe { self.get_unchecked(mid) });

            // The reason why we use if/else control flow rather than match
            // is because match reorders comparison operations, which is perf sensitive.
            // This is x86 asm for u8: https://rust.godbolt.org/z/8Y8Pra.
            if cmp == Less {
                left = mid + 1;
            } else if cmp == Greater {
                right = mid;
            } else {
                // SAFETY: same as the `get_unchecked` above
                unsafe { crate::intrinsics::assume(mid < self.len()) };
                return Ok(mid);
            }

            size = right - left;
        }
        Err(left)
    }

如果想要在rust中实现 golang 中 SearchInts 那样的行为，能够应用 partition_point, 官网介绍如下：

Returns the index of the partition point according to the given predicate (the index of the first element of the second partition).

The slice is assumed to be partitioned according to the given predicate. This means that all elements for which the predicate returns true are at the start of the slice and all elements for which the predicate returns false are at the end. For example, [7, 15, 3, 5, 4, 12, 6] is a partitioned under the predicate x % 2 != 0 (all odd numbers are at the start, all even at the end).

If this slice is not partitioned, the returned result is unspecified and meaningless, as this method performs a kind of binary search.

简略来说，partition_point 就是用于把输出的slice分为两局部，返回索引以前的元素都是满足条件 P，索引以及索引当前的元素不满足条件P。

例子如下：

let v = [1, 2, 3, 3, 5, 6, 7];
let i = v.partition_point(|&x| x < 5);

assert_eq!(i, 4);
assert!(v[..i].iter().all(|&x| x < 5));
assert!(v[i..].iter().all(|&x| !(x < 5)));

下面基于条件 x<5 把 v 分为两局部, i 之前的都是小于5，i 以及 i 前面的元素都是大于5，partition_point 底层其实也是调用了binary_search_by来实现，如下：

#[stable(feature = "partition_point", since = "1.52.0")]
    pub fn partition_point<P>(&self, mut pred: P) -> usize
    where
        P: FnMut(&T) -> bool,
    {
        self.binary_search_by(|x| if pred(x) { Less } else { Greater }).unwrap_or_else(|i| i)
    }

总结

本文介绍了Go语言和Rust语言中排序和二分搜寻相干api的用法，同时对比了两种语言在进行二分搜寻的一些不同行为个性，防止在开发中踩坑，总结如下：

Go 中切片能够应用Sort办法进行排序，应用sort.Slice()进行自定义行为的排序；
Rust中能够应用sort_by ，sort_by_key进行自定义排序，同时还有性能更高的sort_by_cached_key, _unstable版本可供选择；
Go中应用sort.SearchInts，sort.SearchFloat64s，sort.SearchStrings 在相应的切片中执行二分搜寻，外部是调用sort.Search查问大于等于特定值的索引；
Rust中应用binary_search 执行二分搜寻，外部其实是调用 binary_search_by 执行的二分查找；
Go调用sort.Search搜寻的时候，遇到满足条件的元素还是会持续搜寻；而Rust的binary_search_by 搜寻的时候，遇到满足条件的元素就会立即返回。这两种搜索算法实现的不同，导致在查问蕴含反复元素的数据时候，获取的后果就会不一样。

关于golang:Go-Rust-排序和二分搜索的对比

评论

发表回复取消回复

更多文章

DDN HPC 存储硬件架构设计深度分析

探秘IO500：从Lustre并行文件系统出发，开启HPC存储性能新征程

苹果iOS打包的ipa应用无法安装？一篇文章带你了解可能的原因及排查方法

图解Golang：从零开始实现简易版过期LRU缓存

关于golang:Go-Rust-排序和二分搜索的对比

评论

发表回复 取消回复

更多文章

DDN HPC 存储硬件架构设计深度分析

探秘IO500：从Lustre并行文件系统出发，开启HPC存储性能新征程

苹果iOS打包的ipa应用无法安装？一篇文章带你了解可能的原因及排查方法

图解Golang：从零开始实现简易版过期LRU缓存

发表回复取消回复