Hive Optimization


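Fetch Task Conversion

Fetch task conversion means that Hive can answer certain queries without running a MapReduce job. For example, for SELECT * FROM employees; Hive can simply read the files in the storage directory of the employees table and print the query result to the console.
In hive-default.xml.template, hive.fetch.task.conversion defaults to more (older Hive versions default to minimal). With the value more, full-table selects, single-column selects, and LIMIT queries all skip MapReduce.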

  <property>
    <name>hive.fetch.task.conversion</name>
    <value>more</value>
    <description>
      Expects one of [none, minimal, more].
      Some select queries can be converted to single FETCH task minimizing latency.
      Currently the query should be single sourced not having any subquery and should not have
      any aggregations or distincts (which incurs RS), lateral views and joins.
      0. none : disable hive.fetch.task.conversion
      1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
      2. more  : SELECT, FILTER, LIMIT only (support TABLESAMPLE and virtual columns)
    </description>
  </property>

1. Set hive.fetch.task.conversion to none and run the queries below. Every one of them launches a MapReduce job.

hive (default)> set hive.fetch.task.conversion=none;
hive (default)> select * from emp;
hive (default)> select ename from emp;
hive (default)> select ename from emp limit 3;

2. Set hive.fetch.task.conversion to more and run the same queries. None of them launches a MapReduce job.

hive (default)> set hive.fetch.task.conversion=more;
hive (default)> select * from emp;
hive (default)> select ename from emp;
hive (default)> select ename from emp limit 3;
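
The rules in the property description can be modeled roughly as follows. This is a simplified Python sketch, not Hive's actual implementation: QueryShape and can_use_fetch are hypothetical names introduced only for illustration, and the real planner may apply further checks that are not covered here.

```python
from dataclasses import dataclass


@dataclass
class QueryShape:
    """Hypothetical summary of a query; not a Hive data structure."""
    select_star: bool = False
    # True when there is no WHERE clause or it touches partition columns only.
    filters_on_partition_columns_only: bool = True
    has_aggregation_or_distinct: bool = False
    has_join: bool = False
    has_subquery: bool = False
    has_lateral_view: bool = False


def can_use_fetch(mode: str, q: QueryShape) -> bool:
    """Simplified reading of the hive.fetch.task.conversion description."""
    if mode not in ("none", "minimal", "more"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "none":
        return False
    # Both minimal and more require a single-source query without
    # aggregations, distincts, joins, subqueries, or lateral views.
    if (q.has_aggregation_or_distinct or q.has_join
            or q.has_subquery or q.has_lateral_view):
        return False
    if mode == "minimal":
        # SELECT *, filters on partition columns, LIMIT only.
        return q.select_star and q.filters_on_partition_columns_only
    # mode == "more": any plain SELECT, FILTER, LIMIT.
    return True


queries = {
    "select * from emp": QueryShape(select_star=True),
    "select ename from emp": QueryShape(),
    "select ename from emp limit 3": QueryShape(),
}
for mode in ("none", "minimal", "more"):
    for sql, shape in queries.items():
        plan = "fetch" if can_use_fetch(mode, shape) else "mapreduce"
        print(f"{mode:8} | {sql:30} | {plan}")
```

Under this model, only select * from emp qualifies for a fetch task when the mode is minimal, while all three example queries qualify when the mode is more.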

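Local Mode

Most Hadoop jobs need the full scalability that Hadoop provides in order to process large data sets. Sometimes, however, the input to a Hive query is very small. In that case, the overhead of launching the job can be far greater than the time the job actually takes to run. For most such cases, Hive can run all the tasks on a single machine in local mode, which can shorten execution time noticeably for small data sets.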

Setting hive.exec.mode.local.auto to true lets Hive apply this optimization automatically when appropriate.

-- enable local MR
set hive.exec.mode.local.auto=true;

Set the maximum input size for local MR. Local MR is used when the input size is below this value. The default is 134217728 bytes, i.e. 128 MB.

set hive.exec.mode.local.auto.inputbytes.max=50000000;

Set the maximum number of input files for local MR. Local MR is used when the number of input files is below this value. The default is 4.

set hive.exec.mode.local.auto.input.files.max=10;
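
The way these three settings combine can be sketched as below. This is a minimal Python illustration of the rule as stated in the text, not Hive's source code: should_run_locally is a hypothetical helper, the strict "less than" comparison follows the wording above, and Hive may apply additional conditions that are not modeled here.

```python
# Defaults quoted in the text above.
DEFAULT_MAX_INPUT_BYTES = 134217728  # hive.exec.mode.local.auto.inputbytes.max (128 MB)
DEFAULT_MAX_INPUT_FILES = 4          # hive.exec.mode.local.auto.input.files.max


def should_run_locally(auto_enabled, input_bytes, input_files,
                       max_bytes=DEFAULT_MAX_INPUT_BYTES,
                       max_files=DEFAULT_MAX_INPUT_FILES):
    """Return True when a job would qualify for local mode under the stated rule."""
    if not auto_enabled:
        # hive.exec.mode.local.auto=false: never switch to local mode automatically.
        return False
    # Both thresholds must be satisfied (assumption: strict comparison).
    return input_bytes < max_bytes and input_files < max_files


# A small table in a single file qualifies with the defaults.
print(should_run_locally(True, input_bytes=2_000, input_files=1))
# Ten input files exceed the default file limit of 4.
print(should_run_locally(True, input_bytes=2_000, input_files=10))
```

Raising hive.exec.mode.local.auto.input.files.max to 10, as in the setting above, corresponds to passing max_files=10 in this sketch.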

1. Enable local mode and run a query

hive (default)> set hive.exec.mode.local.auto=true;
hive (default)> select * from emp cluster by deptno;
Time taken: 1.328 seconds, Fetched: 14 row(s)

2. Disable local mode and run the same query

hive (default)> set hive.exec.mode.local.auto=false;
hive (default)> select * from emp cluster by deptno;
Time taken: 20.09 seconds, Fetched: 14 row(s)

2020-12-09

0

关于hive:Hive

关于hive:分享一个-hive-on-spark-模式下使用-HikariCP-数据库连接池造成的资源泄露问题

关于hive:Hive日期时间函数总结

关于hive:Hive作业产生的临时数据占用HDFS空间大问题处理

关于javascript:写了个网页版的五笔跟打器玫枫跟打器

关于hive:hive优化

企业级调优

Fetch 抓取

案例实操：

本地模式（重要）

案例实操：

Just My Socks（注册教程内含优惠码）

	<property>
	<name>hive.fetch.task.conversion</name>
	<value>more</value>
	<description>
	Expects one of [none, minimal, more].
	Some select queries can be converted to single FETCH task minimizing latency.
	Currently the query should be single sourced not having any subquery and should not have
	any aggregations or distincts (which incurs RS), lateral views and joins.
	0. none : disable hive.fetch.task.conversion
	1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
	2. more : SELECT, FILTER, LIMIT only (support TABLESAMPLE and virtual columns)
	</description>
	</property>

	hive (default)> set hive.fetch.task.conversion=none;
	hive (default)> select * from emp;
	hive (default)> select ename from emp;
	hive (default)> select ename from emp limit 3;

	hive (default)> set hive.exec.mode.local.auto=true;
	hive (default)> select * from emp cluster by deptno;
	Time taken: 1.328 seconds, Fetched: 14 row(s)

	hive (default)> set hive.exec.mode.local.auto=false;
	hive (default)> select * from emp cluster by deptno;
	Time taken: 20.09 seconds, Fetched: 14 row(s)

关于hive:hive优化

企业级调优

Fetch 抓取

案例实操：

本地模式（重要）

案例实操：

Just My Socks（注册教程 内含优惠码）

Just My Socks（注册教程内含优惠码）