Supported Engines
- Spark
- Flink
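- SeaTunnel Zeta
Key Features
- Batch processing
- Exactly-once processing
- Column projection
- Parallel processing
- Support for user-defined splits
- Support for query SQL, which can achieve the projection effect
Description
Reads data from external data sources through JDBC.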
Supported Data Source Info
Datasource | Supported versions | Driver | Url | Maven |
---|---|---|---|---|
Vertica | Different dependency versions have different driver classes. | com.vertica.jdbc.Driver | jdbc:vertica://localhost:5433/vertica | Download |
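Database Dependency
Download the driver listed under 'Maven' in the support list above and copy it to the '$SEATUNNEL_HOME/plugins/jdbc/lib/' working directory.
For example, for the Vertica data source: cp vertica-jdbc-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/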
Data Type Mapping
Vertica Data Type | SeaTunnel Data Type |
---|---|
BIT | BOOLEAN |
TINYINT<br/>TINYINT UNSIGNED<br/>SMALLINT<br/>SMALLINT UNSIGNED<br/>MEDIUMINT<br/>MEDIUMINT UNSIGNED<br/>INT<br/>INTEGER<br/>YEAR | INT |
INT UNSIGNED<br/>INTEGER UNSIGNED<br/>BIGINT | LONG |
BIGINT UNSIGNED | DECIMAL(20,0) |
DECIMAL(x,y) (column size < 38) | DECIMAL(x,y) |
DECIMAL(x,y) (column size > 38) | DECIMAL(38,18) |
DECIMAL UNSIGNED | DECIMAL(column size + 1,<br/>number of digits to the right of the decimal point) |
FLOAT<br/>FLOAT UNSIGNED | FLOAT |
DOUBLE<br/>DOUBLE UNSIGNED | DOUBLE |
CHAR<br/>VARCHAR<br/>TINYTEXT<br/>MEDIUMTEXT<br/>TEXT<br/>LONGTEXT<br/>JSON | STRING |
DATE | DATE |
TIME | TIME |
DATETIME<br/>TIMESTAMP | TIMESTAMP |
TINYBLOB<br/>MEDIUMBLOB<br/>BLOB<br/>LONGBLOB<br/>BINARY<br/>VARBINARY<br/>BIT(n) | BYTES |
GEOMETRY<br/>UNKNOWN | Not supported yet |
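The DECIMAL rows of the table can be read as a small rule. The following is a minimal Python sketch of that rule, not SeaTunnel's actual code: the function name `map_decimal` is hypothetical, and because the table does not say what happens at a column size of exactly 38, the sketch assumes such a column is kept unchanged.

```python
from typing import Tuple

MAX_PRECISION = 38   # largest column size the table keeps as-is
FALLBACK = (38, 18)  # DECIMAL(38,18), used when the column size exceeds 38


def map_decimal(precision: int, scale: int, unsigned: bool = False) -> Tuple[int, int]:
    """Return the (precision, scale) of the mapped SeaTunnel DECIMAL type.

    Hypothetical helper that only mirrors the mapping table.
    """
    if unsigned:
        # DECIMAL UNSIGNED -> DECIMAL(column size + 1, scale)
        return (precision + 1, scale)
    if precision > MAX_PRECISION:
        return FALLBACK
    # Assumption: a column size of exactly 38 is kept unchanged.
    return (precision, scale)


print(map_decimal(10, 2))
print(map_decimal(40, 5))
print(map_decimal(10, 2, unsigned=True))
```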
Source Options
Name | Type | Required | Default | Description |
---|---|---|---|---|
url | String | Yes | – | The URL of the JDBC connection, for example: jdbc:vertica://localhost:5433/vertica |
driver | String | Yes | – | The JDBC driver class name used to connect to the remote data source;<br/> for Vertica the value is com.vertica.jdbc.Driver. |
user | String | No | – | Connection instance user name |
password | String | No | – | Connection instance password |
query | String | Yes | – | Query statement |
connection_check_timeout_sec | Int | No | 30 | The time in seconds to wait for the database operation used to validate the connection to complete |
partition_column | String | No | – | The column used to partition the parallel read. Only a numeric primary key column is supported, and only one column can be configured. |
partition_lower_bound | Long | No | – | The minimum value of partition_column for the scan. If not set, SeaTunnel queries the database for the minimum value. |
partition_upper_bound | Long | No | – | The maximum value of partition_column for the scan. If not set, SeaTunnel queries the database for the maximum value. |
partition_num | Int | No | job parallelism | The number of partitions; must be a positive integer. Defaults to the job parallelism. |
fetch_size | Int | No | 0 | For queries that return a large number of objects, you can configure<br/> the row fetch size used in the query to improve performance by<br/> reducing the number of database hits required to satisfy the selection criteria.<br/> Zero means use the JDBC default value. |
common-options | | No | – | Source plugin common parameters, please refer to Source Common Options for details |
Tips
If partition_column is not set, the read runs with a single concurrency; if partition_column is set, it is executed in parallel according to the concurrency of the tasks.
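This rule, combined with the `partition_num` default from the options table, can be sketched as follows. It is a minimal Python illustration, not SeaTunnel's implementation; `effective_partitions` is a hypothetical helper name.

```python
from typing import Optional


def effective_partitions(partition_column: Optional[str],
                         partition_num: Optional[int],
                         job_parallelism: int) -> int:
    """Number of partitions the query is divided into (illustrative only)."""
    if not partition_column:
        # No partition_column: the read runs with a single concurrency.
        return 1
    if partition_num is None:
        # partition_num defaults to the job parallelism.
        return job_parallelism
    if partition_num <= 0:
        raise ValueError("partition_num must be a positive integer")
    return partition_num


print(effective_partitions(None, 10, 2))
print(effective_partitions("id", None, 2))
print(effective_partitions("id", 10, 2))
```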
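Task Example
Simple example:
This example reads 16 rows from the table type_bin in your test database with single parallelism and queries all of its fields. You can also specify which fields to query; the final output is printed to the console.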
env {
  # You can set Flink configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
}
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    query = "select * from type_bin limit 16"
  }
}
transform {
  # If you would like more information about how to configure SeaTunnel and see the full list of transform plugins,
  # please go to https://seatunnel.apache.org/docs/transform-v2/sql
}
sink {
  Console {}
}
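Parallel example:
Reads the queried table in parallel, using the shard column and number of shards you configure. Use this approach if you want to read the whole table.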
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Define the query logic as required
    query = "select * from type_bin"
    # Column used to shard the parallel read
    partition_column = "id"
    # Number of shards
    partition_num = 10
  }
}
Parallel boundary example:
Reading the data source within the upper and lower bounds you configure for the query is more efficient.
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Define the query logic as required
    query = "select * from type_bin"
    partition_column = "id"
    # Lower bound of the read
    partition_lower_bound = 1
    # Upper bound of the read
    partition_upper_bound = 500
    partition_num = 10
  }
}
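To make the effect of the bounds concrete, the sketch below splits the range from `partition_lower_bound` to `partition_upper_bound` into `partition_num` contiguous slices and wraps the query with a range filter for each slice. It is a minimal Python illustration under stated assumptions: the even-split strategy, the subquery wrapping, and the names `split_range` and `build_split_queries` are hypothetical and are not taken from SeaTunnel's source.

```python
from typing import List, Tuple


def split_range(lower: int, upper: int, num: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lower, upper] into at most `num` contiguous slices."""
    if num <= 0:
        raise ValueError("partition_num must be a positive integer")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    total = upper - lower + 1
    num = min(num, total)  # never create empty slices
    base, extra = divmod(total, num)
    slices = []
    start = lower
    for i in range(num):
        # Spread the remainder over the first `extra` slices.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        slices.append((start, end))
        start = end + 1
    return slices


def build_split_queries(query: str, column: str,
                        lower: int, upper: int, num: int) -> List[str]:
    """Wrap `query` in a subquery and filter each slice by `column` (assumed strategy)."""
    return [
        f"SELECT * FROM ({query}) t WHERE {column} BETWEEN {lo} AND {hi}"
        for lo, hi in split_range(lower, upper, num)
    ]


# Values taken from the parallel boundary example.
for sql in build_split_queries("select * from type_bin", "id", 1, 500, 10):
    print(sql)
```

With these values each slice covers 50 consecutive `id` values, so up to ten readers can scan disjoint ranges.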
This article is published with support from 白鲸开源科技!