Presto crash course: configure and run Hive Metastore and Presto on your computer in 10 minutes


Author: Bin Fan (范斌), founding member of Alluxio and VP of Open Source

To beginners:

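This tutorial walks beginners through setting up Presto and Hive Metastore on a local machine to query data on S3.
Presto is the SQL engine that plans and executes queries, S3 stores the table partition files, and Hive Metastore is the catalog service through which Presto looks up table schemas and locations.
The tutorial shows, step by step, how to install and configure Presto and Hive Metastore to query data stored in a public S3 bucket.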

Step 1: Download and start Hive Metastore

This tutorial uses [apache-hive-2.3.7-bin.tar.gz]. Download the Hive binary tarball and extract it:

$ cd /path/to/tutorial/root
$ wget https://downloads.apache.org/hive/hive-2.3.7/apache-hive-2.3.7-bin.tar.gz
$ tar -zxf apache-hive-2.3.7-bin.tar.gz
$ cd apache-hive-2.3.7-bin
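Before extracting a downloaded tarball it can be worth checking its integrity. The sketch below assumes a SHA-256 digest is published alongside the release; `verify_sha256` is a hypothetical helper name, not something Hive provides. Older Hive releases are usually moved from downloads.apache.org to archive.apache.org, so the download URL above may need adjusting.

```shell
# Sketch: compare a file's SHA-256 digest with an expected hex string.
# verify_sha256 is a hypothetical helper, not part of Hive or Presto.
verify_sha256() {
  file=$1
  expected=$2
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | awk '{print $1}')
  else
    # macOS ships shasum instead of sha256sum
    actual=$(shasum -a 256 "$file" | awk '{print $1}')
  fi
  [ "$actual" = "$expected" ]
}

# Usage (the digest must come from the release's published checksum file):
#   verify_sha256 apache-hive-2.3.7-bin.tar.gz "<digest from the checksum file>" \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```

The function only returns an exit status, so it can guard the `tar` step in a script.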

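We only need to start Hive Metastore, which serves catalog information such as table schemas and partition locations to Presto.

If this is the first time you start Hive Metastore, prepare the configuration files and environment, and initialize a new Metastore.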

$ export HIVE_HOME=`pwd`
$ cp conf/hive-default.xml.template conf/hive-site.xml
$ mkdir -p hcatalog/var/log/
$ bin/schematool -dbType derby -initSchema
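Re-running `-initSchema` against an existing Derby database generally fails, so a setup script may want to guard that step. The sketch below assumes the default Derby settings, under which the database is a directory named `metastore_db` in the directory where `schematool` was run; both function names are hypothetical.

```shell
# Sketch: run schematool only when no Derby database exists yet.
# Assumption: the default Derby database lives in ./metastore_db
# relative to the directory where schematool was started.
metastore_initialized() {
  [ -d "${1:-.}/metastore_db" ]
}

init_metastore_once() {
  if metastore_initialized "$1"; then
    echo "metastore_db already exists, skipping -initSchema"
  else
    (cd "${1:-.}" && bin/schematool -dbType derby -initSchema)
  fi
}

# Usage, from anywhere:
#   init_metastore_once /path/to/tutorial/root/apache-hive-2.3.7-bin
```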

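Hive must be configured to access S3; add the following lines to conf/hive-env.sh. Hive needs the corresponding jars to read files at "s3a://" addresses, and it needs AWS credentials to access an S3 bucket, even a public one.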

export HIVE_AUX_JARS_PATH=${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-core-1.10.6.jar:${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-s3-1.10.6.jar:${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-aws-2.8.4.jar
export AWS_ACCESS_KEY_ID=<Your AWS Access Key>
export AWS_SECRET_ACCESS_KEY=<Your AWS Secret Key>

If your Hadoop distribution does not include these jars, you can also download them from Maven Central:

<aws-java-sdk-core-1.10.6.jar>, <aws-java-sdk-s3-1.10.6.jar>, <hadoop-aws-2.8.4.jar>
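A wrong entry in `HIVE_AUX_JARS_PATH` tends to surface only later as a class-loading error, so it can help to check the path up front. This is a minimal sketch; `missing_jars` is a hypothetical helper that prints every entry of a colon-separated path that is not an existing file.

```shell
# Sketch: list entries of a colon-separated jar path that do not exist.
# missing_jars is a hypothetical helper; it prints one missing path per line.
missing_jars() {
  old_ifs=$IFS
  IFS=:
  for jar in $1; do
    [ -f "$jar" ] || echo "$jar"
  done
  IFS=$old_ifs
}

# Usage, after sourcing conf/hive-env.sh:
#   missing_jars "$HIVE_AUX_JARS_PATH"
```

Empty output means every jar on the path was found.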

Start Hive Metastore. It runs in the background and listens on port 9083 (the default).

$ hcatalog/sbin/hcat_server.sh start
Started metastore server init, testing if initialized correctly...
Metastore initialized successfully on port[9083].

To verify that the Metastore is running, check the Hive Metastore logs under hcatalog/var/log/.
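Besides reading the logs, a script can simply wait until port 9083 accepts connections. The sketch below assumes bash is installed, because it relies on bash's `/dev/tcp` redirection; `port_open` and `wait_for_port` are hypothetical helper names.

```shell
# Sketch: poll a TCP port until it accepts connections or attempts run out.
# Assumption: bash is available, since /dev/tcp is a bash feature.
port_open() {
  bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

wait_for_port() {
  host=$1
  port=$2
  attempts=${3:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if port_open "$host" "$port"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_port localhost 9083 30 && echo "Metastore is listening"
```

An open port only shows that something is listening; the logs remain the place to look for startup errors.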

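Step 2: Download and start the Presto server

This tutorial uses the [0.237.1] server release as an example. Follow the link to the Presto server installation page, then download and extract the pre-built server tarball.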

$ cd /path/to/tutorial/root
$ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.237.1/presto-server-0.237.1.tar.gz
$ tar -zxf presto-server-0.237.1.tar.gz
$ cd presto-server-0.237.1

Create a configuration file, etc/config.properties, containing the basic Presto configuration.

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080

Create etc/jvm.config with the following JVM settings.

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

Create etc/node.properties, which should contain the following lines:

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/tmp/presto/data
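The all-f `node.id` above is fine for a single local node, but the identifier has to be unique once more than one node joins the same environment. As a hedged sketch, the file could be generated once with a random UUID; `new_uuid` and `write_node_properties` are hypothetical helpers, and the fallback path assumes Linux.

```shell
# Sketch: write node.properties with a freshly generated node.id.
# new_uuid and write_node_properties are hypothetical helpers; the
# property keys are the ones shown in the tutorial.
new_uuid() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen | tr 'A-Z' 'a-z'
  else
    # Linux fallback when uuidgen is not installed
    cat /proc/sys/kernel/random/uuid
  fi
}

write_node_properties() {
  etc_dir=$1
  mkdir -p "$etc_dir"
  cat > "$etc_dir/node.properties" <<EOF
node.environment=production
node.id=$(new_uuid)
node.data-dir=/tmp/presto/data
EOF
}

# Usage, from the presto-server-0.237.1 directory:
#   write_node_properties etc
```

Run it once per node and keep the resulting file, since the identifier should stay the same across restarts.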

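Finally, configure the Presto Hive connector in etc/catalog/hive.properties to point at the Hive Metastore service you just started. The AWS credentials must be entered here again so that Presto can read the input files from S3.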

connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=<Your AWS Access Key>
hive.s3.aws-secret-key=<Your AWS Secret Key>
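To avoid typing the credentials twice, the catalog file could be generated from the same `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables used for Hive. This is only a sketch; `write_hive_catalog` is a hypothetical helper, and the restrictive umask is an added precaution, not something Presto requires.

```shell
# Sketch: generate etc/catalog/hive.properties from environment variables.
# write_hive_catalog is a hypothetical helper; the property keys are the
# ones shown in the tutorial.
write_hive_catalog() {
  catalog_dir=$1
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi
  mkdir -p "$catalog_dir"
  (
    # Keep the file readable by the current user only
    umask 077
    cat > "$catalog_dir/hive.properties" <<EOF
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=${AWS_ACCESS_KEY_ID}
hive.s3.aws-secret-key=${AWS_SECRET_ACCESS_KEY}
EOF
  )
}

# Usage, from the presto-server-0.237.1 directory:
#   write_hive_catalog etc/catalog
```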

Start the Presto server in the background:

$ ./bin/launcher start

To verify that the Presto server is running, open http://localhost:8080 in a browser and check the server status in the web UI.
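The same check can be scripted without a browser. The sketch below assumes the coordinator exposes a `/v1/info` endpoint whose JSON includes a `starting` flag, as recent Presto releases do, and that `curl` is installed; `info_says_ready` and `presto_ready` are hypothetical helper names.

```shell
# Sketch: decide from the /v1/info JSON whether the server finished starting.
# Assumption: the response contains a boolean "starting" field.
info_says_ready() {
  printf '%s' "$1" | grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'
}

# Fetch /v1/info from the given host:port (default localhost:8080).
presto_ready() {
  info=$(curl -fsS "http://${1:-localhost:8080}/v1/info") || return 1
  info_says_ready "$info"
}

# Usage:
#   presto_ready localhost:8080 && echo "Presto is ready"
```

`./bin/launcher status` is another quick way to see whether the server process is running at all.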

Step 3: Start the Presto CLI

To run queries, download the Presto command-line tool, Presto CLI, which is a standalone binary separate from the server.

$ cd /path/to/tutorial/root
$ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.237.1/presto-cli-0.237.1-executable.jar
$ mv presto-cli-0.237.1-executable.jar presto
$ chmod +x presto

Connect to the Presto server started in the previous step.

$ ./presto --server localhost:8080  --catalog hive --debug

Use the default schema:

presto> use default;
USE

Create a new table in the default schema based on the files in S3; this information is sent to the Hive Metastore.

presto:default> CREATE TABLE reason (
  r_reason_sk integer,
  r_reason_id varchar,
  r_reason_desc varchar
) WITH (
  external_location = 's3a://apc999/presto-tutorial/example-reason',
  format = 'PARQUET'
);
CREATE TABLE
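If a query on the new table returns no rows, a first thing to check is whether the external location actually contains files. The AWS CLI expects `s3://` rather than `s3a://`, so the sketch below rewrites the scheme first; `to_s3_uri` is a hypothetical helper, and the usage line assumes the AWS CLI is installed and configured.

```shell
# Sketch: rewrite an s3a:// URI to the s3:// form the AWS CLI expects.
# to_s3_uri is a hypothetical helper; other URIs pass through unchanged.
to_s3_uri() {
  case $1 in
    s3a://*) printf 's3://%s\n' "${1#s3a://}" ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# Usage (requires the AWS CLI and credentials):
#   aws s3 ls "$(to_s3_uri s3a://apc999/presto-tutorial/example-reason)/"
```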

Scan the newly created table:

presto:default> SELECT * FROM reason limit 3;
 r_reason_sk |   r_reason_id    |     r_reason_desc      
-------------+------------------+------------------------
           1 | AAAAAAAABAAAAAAA | Package was damaged    
           2 | AAAAAAAACAAAAAAA | Stopped working        
           3 | AAAAAAAADAAAAAAA | Did not get it on time 
(3 rows)

Query 20200703_074406_00011_8vq8w, FINISHED, 1 node
http://localhost:8080/ui/query.html?20200703_074406_00011_8vq8w
Splits: 18 total, 18 done (100.00%)
CPU Time: 0.5s total,     6 rows/s, 2.06KB/s, 27% active
Per Node: 0.1 parallelism,     0 rows/s,   279B/s
Parallelism: 0.1
Peak User Memory: 0B
Peak Total Memory: 219B
Peak Task Total Memory: 219B
0:04 [3 rows, 1002B] [0 rows/s, 279B/s]
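The same query can also be run without the interactive prompt by using the CLI's `--execute` option. The wrapper below is a sketch: `run_presto_query`, `PRESTO_CLI`, and `PRESTO_SERVER` are hypothetical names introduced here, with defaults matching the paths and address used in this tutorial.

```shell
# Sketch: run one SQL statement through the Presto CLI and print CSV.
# run_presto_query, PRESTO_CLI and PRESTO_SERVER are hypothetical names;
# the defaults match the tutorial's ./presto binary and localhost:8080.
run_presto_query() {
  "${PRESTO_CLI:-./presto}" \
    --server "${PRESTO_SERVER:-localhost:8080}" \
    --catalog hive \
    --schema default \
    --output-format CSV_HEADER \
    --execute "$1"
}

# Usage, from /path/to/tutorial/root:
#   run_presto_query "SELECT * FROM reason LIMIT 3"
```

Passing `--schema default` replaces the interactive `use default;` step.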

Step 4: Stop the servers

When you are finished, stop the Presto server and then Hive Metastore:

$ cd /path/to/tutorial/root
$ presto-server-0.237.1/bin/launcher stop
$ apache-hive-2.3.7-bin/hcatalog/sbin/hcat_server.sh stop

Summary:

In this tutorial we showed how to set up Presto and Hive Metastore to run SQL queries on data stored in a public S3 bucket. We hope you find it helpful.

For more events, technical articles, and expert opinions, follow [Alluxio 智库](https://page.ma.scrmtech.com/…)


Posted in: presto

2022-09-27


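Presto crash course: configure and run Hive Metastore and Presto on your computer in 10 minutes

Step 1: Download and start Hive Metastore

Step 2: Download and start the Presto server

Step 3: Start the Presto CLI (the Presto command-line tool)

Step 4: Stop the servers

Summary: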
