基本概念

优先队列(priority queue)是一种非凡的队列,取出元素的程序是依照元素的优先权(关键字)大小,而不是进入队列的程序,堆就是一种优先队列的实现。堆个别是由数组实现的,逻辑上堆能够被看做一个齐全二叉树(除底层元素外是齐全充斥的,且底层元素是从左到右排列的)。

堆分为最大堆和最小堆,最大堆是指每个根结点的值大于左右孩子的节点值,最小堆则是根结点的值小于左右孩子的值。

实现

Python中堆的是以小根堆的形式实现,本文次要介绍Python源码中入堆和出堆办法的实现:

"""                                  0                  1                                 2          3               4                5               6      7       8       9       10      11      12      13      14    15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30"""# push 将newitem放入底部,从底部开始把父结点往下替换def heappush(heap, item):    """Push item onto heap, maintaining the heap invariant."""    heap.append(item)    _siftdown(heap, 0, len(heap)-1)def _siftdown(heap, startpos, pos):    newitem = heap[pos]    # Follow the path to the root, moving parents down until finding a place    # newitem fits.    while pos > startpos:        parentpos = (pos - 1) >> 1        parent = heap[parentpos]        if newitem < parent:            heap[pos] = parent            pos = parentpos            continue        break    heap[pos] = newitem# pop 顶部元素被弹出,把最初一个元素放到顶部,在把子节点(小于本人的)往上替换def heappop(heap):    """Pop the smallest item off the heap, maintaining the heap invariant."""    lastelt = heap.pop()    # pop tail in list                                                        # raises appropriate IndexError if heap is empty    if heap:        returnitem = heap[0]        heap[0] = lastelt        _siftup(heap, 0)        return returnitem    return lasteltdef _siftup(heap, pos):    endpos = len(heap)    startpos = pos    newitem = heap[pos]    # Bubble up the smaller child until hitting a leaf.    childpos = 2*pos + 1    # leftmost child position    while childpos < endpos:        # Set childpos to index of smaller child.        rightpos = childpos + 1        if rightpos < endpos and not heap[childpos] < heap[rightpos]:            childpos = rightpos        # Move the smaller child up.        heap[pos] = heap[childpos]        pos = childpos        childpos = 2*pos + 1    # The leaf at pos is empty now. Put newitem there, and bubble it up    # to its final resting place (by sifting its parents down).    heap[pos] = newitem    _siftdown(heap, startpos, pos)

对于_siftup办法实现的解释

能够看到python中_siftup办法的实现是和教科书以及直觉不太吻合的:

  1. siftup只关怀哪个子节点小,就把它往上移(并不关怀子节点是否大于本人)
  2. 最初当节点达到叶子节点后在通过siftDown把大于本人的父结点往下移

如下是示意图,按理来说每次_siftup的时候间接和子节点比拟仿佛更高效。

"""      3   4       6 9  15  [8]        [8]  4       69  15        4 [8]      69  15       4  9       6[8] 15    _siftdown      4 [8]      69  15"""

接下来,先贴上源码中的解释

# The child indices of heap index pos are already heaps, and we want to make# a heap at index pos too.  We do this by bubbling the smaller child of# pos up (and so on with that child's children, etc) until hitting a leaf,# then using _siftdown to move the oddball originally at index pos into place.## We *could* break out of the loop as soon as we find a pos where newitem <=# both its children, but turns out that's not a good idea, and despite that# many books write the algorithm that way.  During a heap pop, the last array# element is sifted in, and that tends to be large, so that comparing it# against values starting from the root usually doesn't pay (= usually doesn't# get us out of the loop early).  See Knuth, Volume 3, where this is# explained and quantified in an exercise.## Cutting the # of comparisons is important, since these routines have no# way to extract "the priority" from an array element, so that intelligence# is likely to be hiding in custom comparison methods, or in array elements# storing (priority, record) tuples.  Comparisons are thus potentially# expensive.

从这里的形容,咱们能够这么了解,因为位于 pos 的 ball 是从 end 处挪动到此的,它的所有 child indices 都是 heap,通过 _siftup bubbling smaller child up till hitting a leaf, then use _siftdown to move the ball to place. 这个过程会发现和教科书中讲的不一样,个别状况下 _siftup 过程中须要比拟以后 ball 与 both childs,这样能够尽快从 loop 中 break,然而事实上,ball 的值一般来说会很大(因为在heap中被排在的end),那么把它从 root 处开始和 child 开始比拟通常没有意义,也就是因为可能会始终 compare 直到比拟底部甚至可能到叶子处,这样来看通常并不会让咱们 break loop early(作者提到在The Art of Computer Programming volume3 中有具体的论述和比拟)。此外,因为comparison methods实现办法的不同(比方在tuples,或者自定义类中),comparisons 有可能 more expensive,所以python在对heap的实现上是采纳 siftup till leaf then siftdown to right place 的办法。