PL/SQL Fast Bulk Inserts: FORALL and BULK COLLECT

What you will learn in this chapter: why FORALL is faster than plain INSERT statements, where it applies, and how to use it.

Background: how a SQL statement is executed. The important concept here is the context switch. When the PL/SQL runtime engine processes a block of code, it executes the procedural code itself and hands SQL statements over to the SQL engine; once the SQL engine finishes, it returns the results to the PL/SQL engine. This back-and-forth between the PL/SQL engine and the SQL engine is called a context switch. The hidden cost: every switch adds a certain amount of overhead.

Q: Why is FORALL faster than plain INSERT statements?

FORALL speeds up the exchange from the PL/SQL engine to the SQL engine; BULK COLLECT speeds up the exchange from the SQL engine back to the PL/SQL engine (a sketch of it appears at the end of this post). With FORALL, a whole batch of DML is sent to the SQL engine at once, minimizing the overhead of context switching.

Q: At what data volume is it worth using? Roughly a million rows or more; below that, the difference is less noticeable.

FORALL syntax, element by element:

1. index_name: an identifier, used without declaration, that serves as the collection subscript.
2. lower_bound .. upper_bound: numeric expressions giving a contiguous range of valid index values. Each expression is evaluated only once.
3. INDICES OF collection_name: iterates over the actual subscripts of a sparse collection, skipping unassigned elements (for example, elements removed with DELETE; NULL still counts as a value).
4. VALUES OF index_collection_name: uses the values of that collection as subscripts; the element type of that collection must be PLS_INTEGER/BINARY_INTEGER.
5. SAVE EXCEPTIONS: an optional keyword; even if some of the DML statements fail, no exception is raised until the whole FORALL loop has finished. The failures can then be inspected through SQL%BULK_EXCEPTIONS (demonstrated in the second sketch at the end of this post).
6. dml_statement: a static statement, such as UPDATE or DELETE, or a dynamic (EXECUTE IMMEDIATE) DML statement.

A FORALL usage example. The table structure used:

```sql
-- Create a new table
CREATE TABLE tmp_tab(
  id   NUMBER(5),
  name VARCHAR2(50));
```

Bulk insert:

```sql
DECLARE
  -- Define an index-by (associative array) type
  TYPE tb_table_type IS TABLE OF tmp_tab%ROWTYPE INDEX BY BINARY_INTEGER;
  tb_table tb_table_type;
BEGIN
  -- First stage the rows in the in-memory collection
  FOR i IN 1..100 LOOP
    tb_table(i).id   := i;
    tb_table(i).name := 'NAME' || i;
  END LOOP;
  -- Then load the staged rows into tmp_tab in a single batch
  FORALL i IN 1..tb_table.count
    INSERT INTO tmp_tab VALUES tb_table(i);
END;
```

FORALL caveats. When using FORALL, follow these rules: ...
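The title promises BULK COLLECT as well, but the post itself only demonstrates FORALL. Here is a minimal BULK COLLECT sketch, assuming the same tmp_tab table as above; the cursor is drained in fixed-size batches, with LIMIT capping how much memory each fetch may use:

```sql
DECLARE
  CURSOR c IS SELECT * FROM tmp_tab;
  TYPE tb_table_type IS TABLE OF tmp_tab%ROWTYPE INDEX BY BINARY_INTEGER;
  tb_table tb_table_type;
BEGIN
  OPEN c;
  LOOP
    -- One context switch pulls up to 1000 rows into the collection.
    FETCH c BULK COLLECT INTO tb_table LIMIT 1000;
    EXIT WHEN tb_table.COUNT = 0;
    -- Process the batch here, e.g. hand it straight to a FORALL DML.
    DBMS_OUTPUT.PUT_LINE('Fetched ' || tb_table.COUNT || ' rows');
  END LOOP;
  CLOSE c;
END;
```

The LIMIT value of 1000 is an arbitrary choice for this sketch: larger batches mean fewer context switches but more PGA memory per fetch.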

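And a second sketch for item 5 above, showing SAVE EXCEPTIONS together with SQL%BULK_EXCEPTIONS. The oversized names are deliberately contrived so that some inserts fail; ORA-24381 is the standard "error(s) in array DML" code raised when a FORALL with SAVE EXCEPTIONS finishes with failures:

```sql
DECLARE
  TYPE tb_table_type IS TABLE OF tmp_tab%ROWTYPE INDEX BY BINARY_INTEGER;
  tb_table   tb_table_type;
  dml_errors EXCEPTION;
  PRAGMA EXCEPTION_INIT(dml_errors, -24381);  -- ORA-24381: error(s) in array DML
BEGIN
  FOR i IN 1..100 LOOP
    tb_table(i).id := i;
    -- Every 10th name is made too long for VARCHAR2(50), forcing a failure.
    tb_table(i).name := CASE WHEN MOD(i, 10) = 0
                             THEN RPAD('NAME', 60, 'X')
                             ELSE 'NAME' || i END;
  END LOOP;
  FORALL i IN 1..tb_table.COUNT SAVE EXCEPTIONS
    INSERT INTO tmp_tab VALUES tb_table(i);
EXCEPTION
  WHEN dml_errors THEN
    -- Raised only after the whole FORALL has run; inspect each failed row.
    FOR j IN 1..SQL%BULK_EXCEPTIONS.COUNT LOOP
      DBMS_OUTPUT.PUT_LINE('Row ' || SQL%BULK_EXCEPTIONS(j).ERROR_INDEX ||
                           ' failed with ORA-' || SQL%BULK_EXCEPTIONS(j).ERROR_CODE);
    END LOOP;
END;
```

The 90 valid rows are still inserted; only the 10 oversized ones show up in SQL%BULK_EXCEPTIONS.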
June 27, 2020 · 1 min · jiezi

Hive 0.14: testing insert, update, and delete

Questions this post works through:
1. The insert test fails; how to fix it?
2. Hive delete and update fail; how to fix them?
3. Under what conditions are delete and update allowed at all?

First, create a table with the most ordinary DDL:

```sql
hive> create table test(id int, name string)
      row format delimited fields terminated by ',';
```

Test insert:

```sql
insert into table test values (1,'row1'),(2,'row2');
```

It fails:

```
java.io.FileNotFoundException: File does not exist: hdfs://127.0.0.1:9000/home/hadoop/git/hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/lib/curator-client-2.6.0.jar
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
        at java.security.AccessController.doPrivileged(Native Method)
        ......
```

Apparently Hive went looking for the jars on HDFS. A minor problem: just upload the contents of lib to HDFS:

```
hadoop fs -mkdir -p /home/hadoop/git/hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/lib/
hadoop fs -put $HIVE_HOME/lib/* /home/hadoop/git/hive/packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT-bin/lib/
```

Now insert runs without problems. Next, test delete:

```sql
hive> delete from test where id = 1;
```

It fails:

```
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
```

The transaction manager in use does not support update or delete. It turns out that extra configuration is required before they work; see https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-ConfigurationParametersforTransactions
Per those docs, configure hive-site.xml:

```
hive.support.concurrency – true
hive.enforce.bucketing – true
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on – true
hive.compactor.worker.threads – 1
```

With that in place I expected a clean run, but instead got:

```
FAILED: LockException [Error 10280]: Error communicating with the metastore
```

Something is wrong with the metastore. Raising the log level to DEBUG reveals the real error:

```
2014-11-04 14:20:14,367 DEBUG [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:findReadyToClean(265)) - Going to execute query <select cq_id, cq_database, cq_table, cq_partition, cq_type, cq_run_as from COMPACTION_QUEUE where cq_state = 'r'>
2014-11-04 14:20:14,367 ERROR [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:findReadyToClean(285)) - Unable to select next element for cleaning, Table 'hive.COMPACTION_QUEUE' doesn't exist
2014-11-04 14:20:14,367 DEBUG [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:findReadyToClean(287)) - Going to rollback
2014-11-04 14:20:14,368 ERROR [Thread-8]: compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message:Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist
    at sun.reflect.GeneratedConstructorAccessor19.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:409)
```

The COMPACTION_QUEUE table cannot be found in the metastore database. A quick look in MySQL confirms the table really is missing. Why would it be missing? After a long fruitless search, the only option left is the source code. The TxnDbUtil class under org.apache.hadoop.hive.metastore.txn contains the CREATE TABLE statements, and following the trail leads to the method that invokes them:

```java
private void checkQFileTestHack() {
    boolean hackOn = HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEST) ||
        HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_IN_TEZ_TEST);
    if (hackOn) {
      LOG.info("Hacking in canned values for transaction manager");
      // Set up the transaction/locking db in the derby metastore
      TxnDbUtil.setConfValues(conf);
      try {
        TxnDbUtil.prepDb();
      } catch (Exception e) {
        // We may have already created the tables and thus don't need to redo it.
        if (!e.getMessage().contains("already exists")) {
          throw new RuntimeException("Unable to set up transaction database for" +
              " testing: " + e.getMessage());
        }
      }
    }
}
```

In other words, there is one more condition before the table-creation code runs: HIVE_IN_TEST or HIVE_IN_TEZ_TEST must be set. delete and update are only usable in a test environment, which is understandable given that the feature is not fully finished. With the cause found, the fix is simple; add the following to hive-site.xml:

```xml
<property>
  <name>hive.in.test</name>
  <value>true</value>
</property>
```

Restart the service and run delete again:

```sql
hive> delete from test where id = 1;
```

Another error:

```
FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table default.test that does not use an AcidOutputFormat or is not bucketed
```

So the table test being deleted from does not use an AcidOutputFormat, or is not bucketed. Apparently the output format must be an AcidOutputFormat and the table must be bucketed. Searching around confirms this, and at the moment only the ORC file format supports AcidOutputFormat; on top of that, the table must be created with TBLPROPERTIES ('transactional' = 'true'). Quite a lot of hoops... Following the examples online, recreate the table:

```sql
hive> create table test(id int, name string)
      clustered by (id) into 2 buckets
      stored as orc TBLPROPERTIES('transactional'='true');
```

insert:

```sql
hive> insert into table test values (1,'row1'),(2,'row2'),(3,'row3');
```

delete:

```sql
hive> delete from test where id = 1;
```

update:

```sql
hive> update test set name = 'Raj' where id = 2;
```

OK! Everything finally runs, though it seems painfully slow: each statement takes around 30 seconds. Presumably this can be optimized; more investigation needed...
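A quick way to confirm that the delete and update actually took effect, with the expected output given the three rows inserted above:

```sql
hive> select * from test;
-- Expected result: id=1 is gone and id=2 was renamed:
-- 2   Raj
-- 3   row3
```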

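Since the compactor and its COMPACTION_QUEUE table featured in the errors above, it can also help to watch the transaction machinery directly. To my knowledge both of the following statements exist since Hive 0.13, but treat that as an assumption rather than something this post verified:

```sql
hive> show transactions;  -- open transactions known to the DbTxnManager
hive> show compactions;   -- queued/running compactions, i.e. COMPACTION_QUEUE contents
```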
October 24, 2017 · 3 min · jiezi