关于后端:最小堆提升每次排序的效率

之前写过一个散布是任务调度零碎，每次执行完工作都要对工作进行排序，应用最小堆的确优化了效率及cpu

我的项目中须要应用一个简略的定时任务调度的框架，最后间接从GitHub上搜了一个star比拟多的，就是https://github.com/robfig/cron，目前有8000+ star。刚开始应用的时候发现问题不大，然而随着单机须要定时调度的工作越来越多，高峰期差不多靠近500QPS，随着业务的推广应用，能够预期增长还会比拟快，然而曾经遇到CPU使用率偏高的问题，通过pprof剖析，很多都是在做排序，看了下这个我的项目的代码，整体执行的过程大略如下：

对所有工作进行排序，依照下次执行工夫进行排序

抉择数组中第一个工作，计算下次执行工夫减去以后工夫失去工夫t，而后sleep t

而后从数组第一个元素开始遍历工作，如果此工作须要调度的工夫 < now，那么就执行此工作，执行之后从新计算这个工作的next执行工夫

每次待执行的工作执行结束之后，都会从新对这个数组进行排序

而后再循环从排好序的数组中找到第一个须要执行的工作去执行。

代码如下：

for {        // Determine the next entry to run.        sort.Sort(byTime(c.entries))        var timer *time.Timer        if len(c.entries) == 0 || c.entries[0].Next.IsZero() {            // If there are no entries yet, just sleep - it still handles new entries            // and stop requests.            timer = time.NewTimer(100000 * time.Hour)        } else {            timer = time.NewTimer(c.entries[0].Next.Sub(now))        }        for {            select {            case now = <-timer.C:                now = now.In(c.location)                c.logger.Info("wake", "now", now)                // Run every entry whose next time was less than now                for _, e := range c.entries {                    if e.Next.After(now) || e.Next.IsZero() {                        break                    }                    c.startJob(e.WrappedJob)                    e.Prev = e.Next                    e.Next = e.Schedule.Next(now)                    c.logger.Info("run", "now", now, "entry", e.ID, "next", e.Next)                }            case newEntry := <-c.add:                timer.Stop()                now = c.now()                newEntry.Next = newEntry.Schedule.Next(now)                c.entries = append(c.entries, newEntry)                c.logger.Info("added", "now", now, "entry", newEntry.ID, "next", newEntry.Next)            case replyChan := <-c.snapshot:                replyChan <- c.entrySnapshot()                continue            case <-c.stop:                timer.Stop()                c.logger.Info("stop")                return            case id := <-c.remove:                timer.Stop()                now = c.now()                c.removeEntry(id)                c.logger.Info("removed", "entry", id)            }            break        }    }

问题就不言而喻了，执行一个工作（或几个工作）都从新计算next执行工夫，从新排序，最坏状况就是每次执行1个工作，排序一遍，那么执行k个工作须要的工夫的工夫复杂度就是O(k*nlogn)，这无疑是十分低效的。

于是想着怎么优化一下这个框架，不难想到每次找最先须要执行的工作就是从一堆工作中找schedule_time最小的那一个（设schedule_time是工作的执行工夫），那么比拟容易想到的思路就是应用最小堆：

在初始化工作列表的时候就间接构建一个最小堆

每次执行查看peek元素是否须要执行

须要执行就pop堆顶元素，计算next执行工夫，从新push入堆

不须要执行就break到外层循环取堆顶元素，计算next_time-now() = need_sleep_time，而后select 睡眠、add、remove等操作。

我批改为min-heap的形式之后，每次增加工作的时候通过堆的属性进行up和down调整，每次增加工作工夫复杂度O(logn)，执行k个工作工夫复杂度是O(klogn)。通过验证线上CPU应用升高4~5倍。CPU从50%左右升高至10%左右。

优化后的代码如下，只是其中一部分。

全副的代码也曾经在github上曾经创立了一个Fork的仓库并推送下来了，全副单测Case也都PASS。感兴趣能够点过来看。https://github.com/tovenja/cron

    for {        // Determine the next entry to run.        // Use min-heap no need sort anymore     // 这里不再须要排序了，因为add的时候间接进行堆调整     //sort.Sort(byTime(c.entries))        var timer *time.Timer        if len(c.entries) == 0 || c.entries[0].Next.IsZero() {            // If there are no entries yet, just sleep - it still handles new entries            // and stop requests.            timer = time.NewTimer(100000 * time.Hour)        } else {            timer = time.NewTimer(c.entries[0].Next.Sub(now))            //fmt.Printf(" %v, %+v\n", c.entries[0].Next.Sub(now), c.entries[0].ID)        }        for {            select {            case now = <-timer.C:                now = now.In(c.location)                c.logger.Info("wake", "now", now)                // Run every entry whose next time was less than now                for {                    e := c.entries.Peek()                    if e.Next.After(now) || e.Next.IsZero() {                        break                    }                    e = heap.Pop(&c.entries).(*Entry)                    c.startJob(e.WrappedJob)                    e.Prev = e.Next                    e.Next = e.Schedule.Next(now)                    heap.Push(&c.entries, e)                    c.logger.Info("run", "now", now, "entry", e.ID, "next", e.Next)                }            case newEntry := <-c.add:                timer.Stop()                now = c.now()                newEntry.Next = newEntry.Schedule.Next(now)                heap.Push(&c.entries, newEntry)                c.logger.Info("added", "now", now, "entry", newEntry.ID, "next", newEntry.Next)            case replyChan := <-c.snapshot:                replyChan <- c.entrySnapshot()                continue            case <-c.stop:                timer.Stop()                c.logger.Info("stop")                return            case id := <-c.remove:                timer.Stop()                now = c.now()                c.removeEntry(id)                c.logger.Info("removed", "entry", id)            }            break        }    }

转自：

cnblogs.com/aboutblank/p/14860571.html

更多好文关注

本文由mdnice多平台公布