关于java:MyBatis-别再乱用-foreach-批量插入了5000-数据用了-14-分钟实力劝退

8次阅读

共计 4879 个字符,预计需要花费 13 分钟才能阅读完成。

近日,我的项目中有一个耗时较长的 Job 存在 CPU 占用过高的问题,经排查发现,次要工夫耗费在往 MyBatis 中批量插入数据。mapper configuration 是用 foreach 循环做的,差不多是这样。(因为我的项目窃密,以下代码均为本人手写的 demo 代码)

<insert id="batchInsert" parameterType="java.util.List">
    insert into USER (id, name) values
    <foreach collection="list" item="model" index="index" separator=","> 
        (#{model.id}, #{model.name})
    </foreach>
</insert>

这个办法晋升批量插入速度的原理是,将传统的:

INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");

转化为:

INSERT INTO `table1` (`field1`, `field2`) 
VALUES ("data1", "data2"),
("data1", "data2"),
("data1", "data2"),
("data1", "data2"),
("data1", "data2");

在 MySql Docs 中也提到过这个 trick,如果要优化插入速度时,能够将许多小型操作组合到一个大型操作中。现实状况下,这样能够在单个连贯中一次性发送许多新行的数据,并将所有索引更新和一致性查看提早到最初才进行。

乍看上去这个 foreach 没有问题,然而通过我的项目实际发现,当表的列数较多(20+),以及一次性插入的行数较多(5000+)时,整个插入的耗时非常漫长,达到了 14 分钟,这是不能忍的。在材料中也提到了一句话:

Of course don’t combine ALL of them, if the amount is HUGE. Say you have 1000 rows you need to insert, then don’t do it one at a time. You shouldn’t equally try to have all 1000 rows in a single query. Instead break it into smaller sizes.

它强调,当插入数量很多时,不能一次性全放在一条语句里。可是为什么不能放在同一条语句里呢?这条语句为什么会耗时这么久呢?

我查阅了材料发现:

Insert inside Mybatis foreach is not batch, this is a single (could become giant) SQL statement and that brings drawbacks:

  • some database such as Oracle here does not support.
  • in relevant cases: there will be a large number of records to insert and the database configured limit (by default around 2000 parameters per statement) will be hit, and eventually possibly DB stack error if the statement itself become too large.

Iteration over the collection must not be done in the mybatis XML. Just execute a simple Insertstatement in a Java Foreach loop. The most important thing is the session Executor type.

SqlSession session = sessionFactory.openSession(ExecutorType.BATCH);
for (Model model : list) {session.insert("insertStatement", model);
}
session.flushStatements();

Unlike default ExecutorType.SIMPLE, the statement will be prepared once and executed for each record to insert.

从材料中可知,默认执行器类型为 Simple,会为每个语句创立一个新的预处理语句,也就是创立一个 PreparedStatement 对象。在咱们的我的项目中,会不停地应用批量插入这个办法,而因为 MyBatis 对于含有 <foreach> 的语句,无奈采纳缓存,那么在每次调用办法时,都会从新解析 sql 语句。

Internally, it still generates the same single insert statement with many placeholders as the JDBC code above.

MyBatis has an ability to cache PreparedStatement, but this statement cannot be cached because it contains <foreach /> element and the statement varies depending on the parameters. As a result, MyBatis has to 1) evaluate the foreach part and 2) parse the statement string to build parameter mapping [1] on every execution of this statement.

And these steps are relatively costly process when the statement string is big and contains many placeholders.

[1] simply put, it is a mapping between placeholders and the parameters.

从上述材料可知,耗时就耗在,因为我 foreach 后有 5000+ 个 values,所以这个 PreparedStatement 特地长,蕴含了很多占位符,对于占位符和参数的映射尤其耗时。并且,查阅相干材料可知,values 的增长与所需的解析工夫,是呈指数型增长的。

所以,如果非要应用 foreach 的形式来进行批量插入的话,能够思考缩小一条 insert 语句中 values 的个数,最好能达到下面曲线的最底部的值,使速度最快。个别按教训来说,一次性插 20~50 行数量是比拟适合的,工夫耗费也能承受。

重点来了。下面讲的是,如果非要用 <foreach> 的形式来插入,能够晋升性能的形式。而实际上,MyBatis 文档中写批量插入的时候,是举荐应用另外一种办法。(能够看 http://www.mybatis.org/mybati… 中 Batch Insert Support 题目里的内容)

SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
try {SimpleTableMapper mapper = session.getMapper(SimpleTableMapper.class);
    List<SimpleTableRecord> records = getRecordsToInsert(); // not shown
 
    BatchInsert<SimpleTableRecord> batchInsert = insert(records)
            .into(simpleTable)
            .map(id).toProperty("id")
            .map(firstName).toProperty("firstName")
            .map(lastName).toProperty("lastName")
            .map(birthDate).toProperty("birthDate")
            .map(employed).toProperty("employed")
            .map(occupation).toProperty("occupation")
            .build()
            .render(RenderingStrategy.MYBATIS3);
 
    batchInsert.insertStatements().stream().forEach(mapper::insert);
 
    session.commit();} finally {session.close();
}

即根本思维是将 MyBatis session 的 executor type 设为 Batch,而后屡次执行插入语句。就相似于 JDBC 的上面语句一样。

Connection connection = DriverManager.getConnection("jdbc:mysql://127.0.0.1:3306/mydb?useUnicode=true&characterEncoding=UTF-8&useServerPrepStmts=false&rewriteBatchedStatements=true","root","root");
connection.setAutoCommit(false);
PreparedStatement ps = connection.prepareStatement("insert into tb_user (name) values(?)");
for (int i = 0; i < stuNum; i++) {ps.setString(1,name);
    ps.addBatch();}
ps.executeBatch();
connection.commit();
connection.close();

通过试验,应用了 ExecutorType.BATCH 的插入方式,性能显著晋升,不到 2s 便能全副插入实现。

总结一下,如果 MyBatis 须要进行批量插入,举荐应用 ExecutorType.BATCH 的插入方式,如果非要应用 <foreach>的插入的话,须要将每次插入的记录管制在 20~50 左右。

参考资料

  1. https://dev.mysql.com/doc/ref…
  2. https://stackoverflow.com/que…
  3. https://stackoverflow.com/que…
  4. https://blog.csdn.net/wlwlwlw…
  5. http://blog.harawata.net/2016…
  6. https://www.red-gate.com/simp…
  7. https://stackoverflow.com/que…
  8. http://www.mybatis.org/mybati…

原文链接:https://blog.csdn.net/huangha…

版权申明:本文为 CSDN 博主「huanghanqian」的原创文章,遵循 CC 4.0 BY-SA 版权协定,转载请附上原文出处链接及本申明。

近期热文举荐:

1.1,000+ 道 Java 面试题及答案整顿(2022 最新版)

2. 劲爆!Java 协程要来了。。。

3.Spring Boot 2.x 教程,太全了!

4. 别再写满屏的爆爆爆炸类了,试试装璜器模式,这才是优雅的形式!!

5.《Java 开发手册(嵩山版)》最新公布,速速下载!

感觉不错,别忘了顺手点赞 + 转发哦!

正文完
 0