关于java:空间范围中点位数量的快速统计算法

之前做了个需要，过后搞了一个基于树结构的算法，这里记录一下。需要是在任意地图层级下，实时统计以后屏幕中散布的点的数量，数据放在 mongodb 的单个表中，总量是一亿多。地图范畴是某个省份。

在任意地图层级下查问，意味着以后屏幕中散布的点的数量能够是 0~1 亿 +，这样的话，间接从数据库查问很难做到实时返回；过后测试了一下，间接在内存中计数 1000w 也要 1s 左右。所以，思考本人实现一个计数算法，该算法有肯定的误差。

算法也比较简单，次要思路有两点：

把数据放在内存中
提前计数。

结构一颗多叉树，前面称为统计树；
依据 内存使用量 和误差容忍度 能够自定义树的深度和分叉数量；
树的每个节点寄存的是一个范畴，与该范畴内的点的数量。

首先，做出全省边界的最小矩形框(由 4 个经纬度坐标点形成), 即 java JTS 外面的 Envelope，作为根节点的范畴，这里称为 E0,0；
其次，将 E0 均分为 n 个雷同的矩形，例如 n =16，别离是 E1,0~E1,15，作为根节点的子节点的范畴；
而后，再将 E1,0~E1,15 中的每个矩形持续宰割，依据内存使用量和误差容忍度确定树的深度。
最初，遍历所有数据，计算出树中的每个节点中范畴内的点数，存在树的节点中。
深度是 3 的树，如下图：

例如查问范畴是下图红色局部 S

首先，判断 S 的范畴与根节点的范畴 E0,0 的关系：
如果 S 蕴含 E0,0，则 S 中点位的数量就是提前统计的 E0,0 范畴的点数；这里显然，S 是在 E0,0 范畴内，两者是相交关系。

那就遍历 E0,0 的所有子节点，再次判断 S 与每个 E1,n(0<=n<=15)的关系：
如果 S 蕴含 E1,k(0<=k<=15), 则查问后果就加上 E1,k 所在节点的统计数量；否则，再遍历 E1,k 的子节点，直到达到树的叶子节点；如果 S 与某个叶子节点 Ep,q 相交，计算两个面交加的面积，将最初的统计后果加上 Ep,q 的统计数量 * 交加的面积 /Ep,q 的面积。

因为最初的叶子节点没有子节点，因而用估计量代替理论数量，这里简略地应用了面积的百分比作为对数量的预计。该算法的误差就来源于此。能够看出，每个叶子节点的范畴越小，最初的误差也越小，这也意味着树的分叉数量和深度的减少，即减少了内存使用量和查问工夫。

为了可能新增和删除点位后，树内的统计量也随之变动，须要在事后的统计中为每条数据设置一个字段进行标识，标记该数据曾经在树中。

新增数据：

判断该数据节点在哪一个叶子节点的范畴内，将统计数量加 1；
而后往上遍历所有父节点，把每个父节点的统计数量都加 1；
最初在数据库中标识该条记录曾经退出到统计树。

对于删除数据，与新增数据相似：

判断该数据节点在哪一个叶子节点的范畴内，将统计数量减 1；
而后往上遍历所有父节点，把每个父节点的统计数量都减 1；
最初在数据库中删除统计树标识。

更新操作 (数据点位的地位发送变动) 就是新增和删除的组合，对地位更新之前执行上述删除操作，对地位更新之后执行上述新增操作。

事后遍历数据表生成统计树是一个很耗时的工作，做好了树之后，不可能每次利用重启都要从新做一遍，因而，能够将做好的统计树进行序列化，每次重启时，再反序列化进去。还能够开启定时工作，周期性地执行序列化操作，每次重启时，反序列化最新的那个序列化文件。

树的高度和分叉数量须要做一个衡量，一般来说，CRUD 都很快，都是毫秒级的；
内存占用状况：对于 32 个分叉、深度是 6 层的树，节点数量是一百万多点，对于 java，假如每个节点的范畴由 4 个 double 类型的数据表示，统计量是一个 int 类型，每个节点就是 36 个字节，统计树的大小应该不超过 100MB。
查问的范畴越大，相对误差越小；小范畴查问能够思考用其它计划，例如间接查数据库。不严格的测试后果：对于 32 个分叉、深度是 6 层的统计树，总数是 1 亿的数据量，超过百万点位的查问平均误差小于 1%。

顺便说一下，在生成统计树的时候很耗时 (下面曾经提过了)，有可能发生意外(网络问题、数据库问题) 导致中断的状况，因而能够在树节点内增加额定字段，标识该节点是否曾经统计实现。这样，当程序产生异样时，在捕捉异样中进行序列化操作，这样下次就能够持续统计，而不是重头开始。

还有不太重要的一点是，统计之前，能够先大抵预计统计树的内存占用量，个别其实也不须要很大的内存空间。

最初一点，能够思考将多个统计量放到同一个统计对象外面，例如人口点位数量、poi 点位数量，省得保护多个树结构。

上面是 java 源码实现，次要的依赖就是 JTS 和序列化包 msgpack，写的比拟潦草，起初也没有再优化，欢送指出问题。

// 统计树的节点对象
@Message
public class Cnode implements Serializable {
    // 树的深度, 根节点是 1
    private int level;
    private Envelope envelope;
    // 这里寄存统计量，统计标识
    private StatisticBo statisticBo;
    // 子节点
    private Cnode[][] subNodes;

    Cnode() {}

    Cnode(Envelope envelope, Ctree ctree) {
        this.envelope = envelope;
        computeLevel(ctree);
    }

    private void computeLevel(Ctree ctree) {if (ctree.getRoot() == null) {this.level = 1;} else {int times = new Long(Math.round(ctree.getRoot().envelope.getWidth() / this.envelope.getWidth())).intValue();
            int count = 1;
            while (times != 1) {
                count++;
                times = times / ctree.getWidthSize();}
            this.level = count;
        }
    }

    public void updateStatic(StatisticBo statisticBo) {if (this.statisticBo == null) {this.statisticBo = statisticBo;} else {this.statisticBo.plus(statisticBo);
        }
    }

   // ... 省略 getter、setter
}

// 统计量
@Message
public class StatisticBo implements Serializable {

    private int personCount;
    // 节点统计实现标识
    private boolean personStatisticFinish;

    private int poiCount;
    // 节点统计实现标识
    private boolean poiStatisticFinish;

    public StatisticBo plus(StatisticBo statisticBo) {if (statisticBo.personStatisticFinish) {this.personCount = statisticBo.personCount;}
        if (statisticBo.poiStatisticFinish) {this.poiCount = statisticBo.poiCount;}
        
        this.personCount += statisticBo.getPersonCount();
        this.poiCount += statisticBo.getPoiCount();
        

        return this;
    }

    public com.tuyun.mapserver.bo.StatisticBo multiply(double factor) {
        this.personCount *= factor;
        this.poiCount *= factor;
        
        return this;
    }
   
    // ... 省略 getter、setter
}

// 统计树对象
@Message
public class Ctree implements Serializable {private static final Logger logger = LoggerFactory.getLogger(Ctree.class);
    /**
     * 树的根节点
     */
    private Cnode root;
    /**
     * envelop 横向分隔的数量
     */
    private int widthSize;
    /**
     * envelop 纵向分隔的数量
     */
    private int heightSize;
    /**
     * 默认 envelop 分为 16 个子方格
     */
    private final int default_widthSize = 4;
    private final int defaust_heightSize = 4;


    public Ctree() {}

    /**
     * 初始化树，设置树的深度，所有节点的 envelope
     *
     * @param envelope
     * @param depth
     */
    public void init(Envelope envelope, int widthSize, int heightSize, int depth) {if (widthSize < 1 || heightSize < 1) {logger.warn("envelop 分隔的个数不能小于 1: [widthSize: {}], [heightSize: {}]", widthSize, heightSize);
            logger.warn("应用默认值: [widthSize: {}], [heightSize: {}]", default_widthSize, defaust_heightSize);
            this.widthSize = default_widthSize;
            this.heightSize = defaust_heightSize;
        } else {
            this.widthSize = widthSize;
            this.heightSize = heightSize;
        }
        if (depth < 1) {throw new IllegalArgumentException("树的深度不能小于 1: [" + depth + "]");
        }
        this.root = new Cnode(envelope, this);
        if (depth == 1) {return;}
        createSubNodes(this.root);
        for (int i = 0; i < depth - 2; i++) {moreDeep(this.root.getSubNodes());
        }

    }

    private void createSubNodes(Cnode cnode) {Envelope envelope = cnode.getEnvelope();
        Cnode[][] subNodes = new Cnode[widthSize][heightSize];
        double minx = envelope.getMinX();
        double miny = envelope.getMinY();
        for (int i = 0; i < widthSize; i++) {for (int j = 0; j < heightSize; j++) {Envelope envelope1 = new Envelope(minx + envelope.getWidth() / widthSize * i,
                        minx + envelope.getWidth() / widthSize * (i + 1),
                        miny + envelope.getHeight() / heightSize * j,
                        miny + envelope.getHeight() / heightSize * (j + 1));
                subNodes[i][j] = new Cnode(envelope1, this);
            }
        }
        cnode.setSubNodes(subNodes);
    }

    /**
     * 减少树的深度
     *
     * @param cnodes
     */
    private void moreDeep(Cnode[][] cnodes) {for (int m = 0; m < widthSize; m++) {for (int n = 0; n < widthSize; n++) {if (cnodes[m][n].getSubNodes() == null) {createSubNodes(cnodes[m][n]);
                } else {moreDeep(cnodes[m][n].getSubNodes());
                }
            }
        }
    }

    /**
     * 获取树的深度
     *
     * @return
     */
    public int getDepth() {if (root == null) {return 0;}
        int depth = 1;
        Cnode[][] subNodes = root.getSubNodes();
        while (subNodes != null) {
            depth++;
            subNodes = subNodes[0][0].getSubNodes();}
        return depth;
    }

    public int getCnodeCount() {if (this.root == null) {return 0;}
        return getCnodeCount(this.root);
    }

    private int getCnodeCount(Cnode cnode) {
        int count = 1;
        if (cnode.getSubNodes() == null) {return count;}
        Cnode[][] subNodes = cnode.getSubNodes();
        for (int i = 0; i < subNodes.length; i++) {for (int j = 0; j < subNodes[0].length; j++) {Cnode cnode1 = subNodes[i][j];
                int count1 = getCnodeCount(cnode1);
                count += count1;
            }

        }
        return count;
    }

    // ... 省略 getter、setter
}


// 程序入口
public class CtreeMain {
    public static Ctree ctree;

    private static final String PATH = "E:\\ctree.msgpack";
    
    public static void main(String[] args) throws Exception {
        try {File file = new File(PATH);
            if (file.exists()) {
                try {
                    // 如果序列化文件存在, 反序列化获取统计树
                    deserializer(PATH);
                } catch (IOException e) {e.printStackTrace();
                }
            } else {Envelope rootEnvelop = new Envelope(Minx, Maxx, Miny, Maxy);
                // 结构树(创立节点, 设置范畴)
                ctree = buildCtree(rootEnvelop, 4, 4, 6);
            }
            // 填充树(设置统计量)
            fillCtree(ctree);
            // 统计实现后，再序列化一次
            serializer(ctree,PATH);
        } catch (Exception e) {e.printStackTrace();
            if (ctree != null && ctree.getRoot() != null) {serializer(PATH);
            }
        }
    }
}

// 反序列化
public void deserializer(String path) throws IOException {MessagePack messagePack = new MessagePack();
    messagePack.register(Envelope.class);
    ctree = messagePack.read(new BufferedInputStream(new FileInputStream(new File(path))), Ctree.class);
}

// 结构树
public Ctree buildCtree(Envelope envelope, int widthSize, int heightSize, int depth) {Ctree ctree = new Ctree();
    ctree.init(envelope, widthSize, heightSize, depth);
    return ctree;
}

// 填充树，这个办法的实现是用于统计单个统计量的, 如果要同时统计多个统计量，须要改变 / 或者 屡次调用
public StatisticBo fillCnode(Cnode cnode) {
    //TODO 统计不同的数据时，这里要改变
    // 曾经统计过了
    if (cnode.getStatisticBo() != null && cnode.getStatisticBo().isPersonStatisticFinish()) {return cnode.getStatisticBo();
    }
    StatisticBo statisticBo = new StatisticBo();
    if (cnode.getSubNodes() == null) {Envelope envelope = cnode.getEnvelope();
        //TODO 统计不同的数据时，这里要改变
        // 依据范畴查数据库计算数量，这里不是遍历数据库所有记录，而是依据每个范畴查问的
        statisticBo = staticPerson(envelope);
//statisticBo = staticPoi(envelope);
        cnode.updateStatic(statisticBo);
        return statisticBo;
    }

    Cnode[][] subNodes = cnode.getSubNodes();
    for (int i = 0; i < subNodes.length; i++) {for (int j = 0; j < subNodes[0].length; j++) {Cnode cnode1 = subNodes[i][j];
            StatisticBo statisticBo1 = fillCnode(cnode1);
            statisticBo.plus(statisticBo1);
        }
    }
    cnode.updateStatic(statisticBo);
    return statisticBo;
}

// 序列化树
public void serializer(String path) throws IOException {MessagePack messagePack = new MessagePack();
    messagePack.register(Envelope.class);
    byte[] bytes = messagePack.write(ctree);
    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
    Files.newOutputStream(Paths.get(path)).write(byteBuffer.array());
}

// 数量查问
public static StatisticBo queryCount(Geometry geometry) {Cnode root = ctree.getRoot();
    if (geometry.contains(geometryFactory.toGeometry(root.getEnvelope())) || root.getSubNodes() == null) {return root.getStatisticBo() != null ? root.getStatisticBo() : new StatisticBo();
    }
    return queryCount(geometry, root.getSubNodes());
}

// 递归调用
private static StatisticBo queryCount(Geometry geometry, Cnode[][] subNodes) {StatisticBo statisticBo = new StatisticBo();
    if (subNodes != null) {for (int i = 0; i < subNodes.length; i++) {for (int j = 0; j < subNodes[0].length; j++) {Cnode cNode = subNodes[i][j];
                Geometry envelopeGeometry = geometryFactory.toGeometry(cNode.getEnvelope());
                if (geometry.contains(envelopeGeometry)) {statisticBo.plus(cNode.getStatisticBo());
                } else if (envelopeGeometry.contains(geometry) || envelopeGeometry.intersects(geometry)) {Cnode[][] cNodes = cNode.getSubNodes();
                    if (cNodes == null) {statisticBo.plus(cNode.getStatisticBo().multiply(Math.random()));
                    } else {statisticBo.plus(queryCount(geometry, cNodes));
                    }
                }
            }
        }
    }
    return statisticBo;
}

转发请注明原作者和出处！

关于java:空间范围中点位数量的快速统计算法

需要背景

算法介绍

算法详情

查问：

新增和删除

更新

序列化和反序列化

算法性能

源码

Just My Socks（注册教程内含优惠码）

关于java:空间范围中点位数量的快速统计算法

需要背景

算法介绍

算法详情

查问：

新增和删除

更新

序列化和反序列化

算法性能

源码

Just My Socks（注册教程 内含优惠码）

Just My Socks（注册教程内含优惠码）