关于算法:Python源码阅读堆的入堆出堆方法实现

65次阅读

共计 3157 个字符,预计需要花费 8 分钟才能阅读完成。

基本概念

优先队列 (priority queue) 是一种非凡的队列,取出元素的程序是依照元素的优先权 (关键字) 大小,而不是进入队列的程序,堆就是一种优先队列的实现。堆个别是由数组实现的,逻辑上堆能够被看做一个齐全二叉树(除底层元素外是齐全充斥的,且底层元素是从左到右排列的)。

堆分为最大堆和最小堆,最大堆是指每个根结点的值大于左右孩子的节点值,最小堆则是根结点的值小于左右孩子的值。

实现

Python 中堆的是以小根堆的形式实现,本文次要介绍 Python 源码中入堆和出堆办法的实现:

"""
                                  0

                  1                                 2

          3               4                5               6

      7       8       9       10      11      12      13      14

    15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30
"""
# push 将 newitem 放入底部,从底部开始把父结点往下替换
def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown(heap, 0, len(heap)-1)

def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem

# pop 顶部元素被弹出,把最初一个元素放到顶部,在把子节点 (小于本人的) 往上替换
def heappop(heap):
    """Pop the smallest item off the heap, maintaining the heap invariant."""
    lastelt = heap.pop()    # pop tail in list
                                                        # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
        return returnitem
    return lastelt

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now. Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

对于_siftup 办法实现的解释

能够看到 python 中_siftup 办法的实现是和教科书以及直觉不太吻合的:

  1. siftup 只关怀哪个子节点小,就把它往上移(并不关怀子节点是否大于本人)
  2. 最初当节点达到叶子节点后在通过 siftDown 把大于本人的父结点往下移

如下是示意图,按理来说每次_siftup 的时候间接和子节点比拟仿佛更高效。

"""
      3 
  4       6 
9  15  [8]
   
     [8]
  4       6
9  15  

      4
 [8]      6
9  15 

      4
  9       6
[8] 15
    
_siftdown

      4
 [8]      6
9  15
"""

接下来,先贴上源码中的解释

# The child indices of heap index pos are already heaps, and we want to make
# a heap at index pos too.  We do this by bubbling the smaller child of
# pos up (and so on with that child's children, etc) until hitting a leaf,
# then using _siftdown to move the oddball originally at index pos into place.
#
# We *could* break out of the loop as soon as we find a pos where newitem <=
# both its children, but turns out that's not a good idea, and despite that
# many books write the algorithm that way.  During a heap pop, the last array
# element is sifted in, and that tends to be large, so that comparing it
# against values starting from the root usually doesn't pay (= usually doesn't
# get us out of the loop early).  See Knuth, Volume 3, where this is
# explained and quantified in an exercise.
#
# Cutting the # of comparisons is important, since these routines have no
# way to extract "the priority" from an array element, so that intelligence
# is likely to be hiding in custom comparison methods, or in array elements
# storing (priority, record) tuples.  Comparisons are thus potentially
# expensive.

从这里的形容,咱们能够这么了解,因为位于 pos 的 ball 是从 end 处挪动到此的,它的所有 child indices 都是 heap,通过 _siftup bubbling smaller child up till hitting a leaf, then use _siftdown to move the ball to place. 这个过程会发现和教科书中讲的不一样,个别状况下 _siftup 过程中须要比拟以后 ball 与 both childs,这样能够尽快从 loop 中 break,然而事实上,ball 的值一般来说会很大(因为在 heap 中被排在的 end),那么把它从 root 处开始和 child 开始比拟通常没有意义,也就是因为可能会始终 compare 直到比拟底部甚至可能到叶子处,这样来看通常并不会让咱们 break loop early(作者提到在 The Art of Computer Programming volume3 中有具体的论述和比拟)。此外,因为 comparison methods 实现办法的不同(比方在 tuples,或者自定义类中),comparisons 有可能 more expensive,所以 python 在对 heap 的实现上是采纳 siftup till leaf then siftdown to right place 的办法。

正文完
 0