Databases: JDBC Vertica Source Connector User Guide

Supported engines

  • Spark
  • Flink
  • SeaTunnel Zeta

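Key features

  • Batch processing
  • Exactly-once processing
  • Column projection
  • Parallel processing
  • Supports user-defined splits
  • Supports SQL queries, which can also achieve a projection effect

Description

Reads data from an external data source through JDBC.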

Supported data source info

| Datasource | Supported versions | Driver | Url | Maven |
|---|---|---|---|---|
| Vertica | Different dependency versions have different driver classes. | com.vertica.jdbc.Driver | jdbc:vertica://localhost:5433/vertica | Download |

Database dependency

Download the driver listed under 'Maven' above and copy it to the '$SEATUNNEL_HOME/plugins/jdbc/lib/' working directory.
For example, for the Vertica data source: cp vertica-jdbc-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/

Data type mapping

| Vertica data type | SeaTunnel data type |
|---|---|
| BIT | BOOLEAN |
| TINYINT, TINYINT UNSIGNED, SMALLINT, SMALLINT UNSIGNED, MEDIUMINT, MEDIUMINT UNSIGNED, INT, INTEGER, YEAR | INT |
| INT UNSIGNED, INTEGER UNSIGNED, BIGINT | LONG |
| BIGINT UNSIGNED | DECIMAL(20,0) |
| DECIMAL(x,y) (the column's specified size is < 38) | DECIMAL(x,y) |
| DECIMAL(x,y) (the column's specified size is > 38) | DECIMAL(38,18) |
| DECIMAL UNSIGNED | DECIMAL(the column's specified size + 1, the column's number of digits to the right of the decimal point) |
| FLOAT, FLOAT UNSIGNED | FLOAT |
| DOUBLE, DOUBLE UNSIGNED | DOUBLE |
| CHAR, VARCHAR, TINYTEXT, MEDIUMTEXT, TEXT, LONGTEXT, JSON | STRING |
| DATE | DATE |
| TIME | TIME |
| DATETIME, TIMESTAMP | TIMESTAMP |
| TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, BINARY, VARBINARY, BIT(n) | BYTES |
| GEOMETRY, UNKNOWN | Not supported yet |

Source options

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| url | String | Yes | - | The URL of the JDBC connection, for example: jdbc:vertica://localhost:5433/vertica |
| driver | String | Yes | - | The JDBC class name used to connect to the remote data source. For Vertica the value is com.vertica.jdbc.Driver. |
| user | String | No | - | User name of the connection instance |
| password | String | No | - | Password of the connection instance |
| query | String | Yes | - | Query statement |
| connection_check_timeout_sec | Int | No | 30 | The time in seconds to wait for the database operation used to validate the connection to complete |
| partition_column | String | No | - | The column used to partition the data for parallel reads. Only a numeric primary key column is supported, and only one column can be configured. |
| partition_lower_bound | Long | No | - | The minimum value of partition_column for the scan. If not set, SeaTunnel queries the database to get the minimum value. |
| partition_upper_bound | Long | No | - | The maximum value of partition_column for the scan. If not set, SeaTunnel queries the database to get the maximum value. |
| partition_num | Int | No | job parallelism | The number of partitions; only a positive integer is supported. The default value is the job parallelism. |
| fetch_size | Int | No | 0 | For queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number of database hits required to satisfy the selection criteria. Zero means use the JDBC default value. |
| common-options | | No | - | Source plugin common parameters. See Source Common Options for details. |
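
As a minimal sketch of how the fetch_size option from the table above might be set, the source block below reuses the connection settings from this document's own examples. The value 1000 is an illustrative assumption, not a recommended setting; a suitable value depends on row width and available memory.

```hocon
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    user = "root"
    password = "123456"
    query = "select * from type_bin"
    # Illustrative value only (an assumption); 0 keeps the JDBC default
    fetch_size = 1000
  }
}
```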
  • Tips

If partition_column is not set, the source runs with a single concurrent reader; if partition_column is set, it runs in parallel according to the concurrency of the job.

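Task example

Simple example:

This example reads 16 rows from the type_bin table in your test database with a single parallel reader and queries all of its fields. You can also specify which fields to query. The final output is printed to the console.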

env {
  # You can set Flink configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    query = "select * from type_bin limit 16"
  }
}

transform {
  # If you would like more information about how to configure SeaTunnel and to see the full list of transform plugins,
  # please go to https://seatunnel.apache.org/docs/transform-v2/sql
}

sink {
  Console {}
}
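
The simple example reads every field with select *. As a hedged sketch of column projection, the query can name the fields explicitly instead. The id column appears in the parallel examples in this document; name is a hypothetical column used only for illustration, so substitute the columns that exist in your table.

```hocon
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Only the listed columns are read; "name" is a hypothetical column
    query = "select id, name from type_bin limit 16"
  }
}
```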

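Parallel example:

Read the queried table in parallel using the shard field and the number of shards you configure. Use this approach if you want to read the whole table.

source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Define the query logic as required
    query = "select * from type_bin"
    # The field used to shard the parallel read
    partition_column = "id"
    # Number of shards
    partition_num = 10
  }
}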

Parallel boundary example:

It is more efficient to read the data source within the lower and upper bounds you configure for the query.

source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Define the query logic as required
    query = "select * from type_bin"
    partition_column = "id"
    # Lower bound of the read
    partition_lower_bound = 1
    # Upper bound of the read
    partition_upper_bound = 500
    partition_num = 10
  }
}

This article is published with support from 白鲸开源科技!
