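Installation and Deployment
Prepare the installation package
The installation package is located in the inlong-sort-standalone/sort-standalone-dist/target/ directory and is named apache-inlong-sort-standalone-${project.version}-bin.tar.gz.
Start the inlong-sort-standalone application
Once the build has produced the tar.gz package described above, extract it and start the inlong-sort-standalone application.
Example: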
./bin/sort-start.sh
conf/common.properties configuration
| Name | Required | Default | Description |
|---|---|---|---|
| clusterId | Y | NA | Uniquely identifies an inlong-sort-standalone cluster |
| sortSource.type | N | org.apache.inlong.sort.standalone.source.readapi.ReadApiSource | Source class name |
| sortChannel.type | N | org.apache.inlong.sort.standalone.channel.BufferQueueChannel | Channel class name |
| sortSink.type | N | org.apache.inlong.sort.standalone.sink.hive.HiveSink | Sink class name; each dispatch type uses a different Sink class |
| sortClusterConfig.type | N | org.apache.inlong.sort.standalone.config.loader.ClassResourceSortClusterConfigLoader | Class name of the sort cluster configuration loader; ClassResourceSortClusterConfigLoader reads the sort cluster configuration from the SortClusterConfig.conf resource file on the classpath |
| sortClusterConfig.managerPath | N | NA | Parameter of the loader class org.apache.inlong.sort.standalone.config.loader.ManagerSortClusterConfigLoader; specifies the Inlong Manager URL, e.g. http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getClusterConfig |
| eventFormatHandler | N | org.apache.inlong.sort.standalone.sink.hive.DefaultEventFormatHandler | Class name of the format converter applied before dispatching to Hive |
| maxThreads | N | 10 | Sink parallelism |
| reloadInterval | N | 60000 | Reload interval of the sort cluster configuration, in milliseconds |
| processInterval | N | 100 | Interval at which dispatch groups are processed, in milliseconds |
| metricDomains | N | Sort | Metric aggregation domain name |
| metricDomains.Sort.domainListeners | N | org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener | List of metric listener class names, separated by spaces |
| prometheusHttpPort | N | 8080 | Parameter of org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener; port of the Prometheus HttpServer |
| metricDomains.Sort.snapshotInterval | N | 60000 | Metric snapshot interval, in milliseconds |
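To make the table concrete, the sketch below embeds a small sample common.properties and resolves it against a subset of the documented defaults. The helper `load_common_properties`, the cluster id `sort_cluster_demo`, and the Manager address `127.0.0.1:8083` are hypothetical and exist only for illustration; this is not how inlong-sort-standalone itself loads the file.

```python
# Illustrative only: this helper is not part of inlong-sort-standalone.
# A subset of the defaults listed in the common.properties table.
DEFAULTS = {
    "sortSource.type": "org.apache.inlong.sort.standalone.source.readapi.ReadApiSource",
    "sortChannel.type": "org.apache.inlong.sort.standalone.channel.BufferQueueChannel",
    "sortSink.type": "org.apache.inlong.sort.standalone.sink.hive.HiveSink",
    "maxThreads": "10",
    "reloadInterval": "60000",
    "processInterval": "100",
    "metricDomains": "Sort",
    "prometheusHttpPort": "8080",
}

# Sample file content; clusterId and the Manager host:port are hypothetical.
SAMPLE = """\
# required
clusterId=sort_cluster_demo
# load the sort cluster configuration from Inlong Manager
sortClusterConfig.type=org.apache.inlong.sort.standalone.config.loader.ManagerSortClusterConfigLoader
sortClusterConfig.managerPath=http://127.0.0.1:8083/api/inlong/manager/openapi/sort/standalone/getClusterConfig
maxThreads=20
"""


def load_common_properties(text):
    """Parse key=value lines, skip blanks and '#' comments, then apply defaults."""
    props = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    if not props.get("clusterId"):
        raise ValueError("clusterId is required")
    return props


props = load_common_properties(SAMPLE)
print(props["clusterId"], props["maxThreads"], props["reloadInterval"])
```

Only clusterId has no default, so it is the one key the sketch refuses to run without; every other key falls back to the value in the table.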
SortClusterConfig configuration
- Can be read from the SortClusterConfig.conf resource file on the classpath, but this source does not support live updates
- Can be fetched from the Inlong Manager HTTP interface
| Name | Required | Default | Description |
|---|---|---|---|
| clusterName | Y | NA | Uniquely identifies an inlong-sort-standalone cluster |
| sortTasks | Y | NA | List of sort tasks |
SortTaskConfig configuration
| Name | Required | Default | Description |
|---|---|---|---|
| name | Y | NA | Sort task name |
| type | Y | NA | Sort task type, e.g. HIVE("hive"), TUBE("tube"), KAFKA("kafka"), PULSAR("pulsar"), ElasticSearch("ElasticSearch"), UNKNOWN("n") |
| idParams | Y | NA | List of Inlong data stream parameters |
| sinkParams | Y | NA | Parameters of the sort task |
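idParams of a Hive sort task
| Name | Required | Default | Description |
|---|---|---|---|
| inlongGroupId | Y | NA | inlongGroupId |
| inlongStreamId | Y | NA | inlongStreamId |
| separator | Y | NA | Field separator |
| partitionIntervalMs | N | 3600000 | Partition interval, in milliseconds |
| idRootPath | Y | NA | HDFS root directory of the Inlong data stream |
| partitionSubPath | Y | NA | Partition subdirectory of the Inlong data stream |
| hiveTableName | Y | NA | Hive table name of the Inlong data stream |
| partitionFieldName | N | dt | Partition field name of the Inlong data stream |
| partitionFieldPattern | Y | NA | Format of the partition field value of the Inlong data stream, e.g. {yyyyMMdd}, {yyyyMMddHH}, {yyyyMMddHHmm} |
| msgTimeFieldPattern | Y | NA | Format of the message generation time field value, as a Java date-time pattern |
| maxPartitionOpenDelayHour | N | 8 | Maximum delay for opening a partition, in hours |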
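As a rough illustration of how msgTimeFieldPattern, partitionSubPath, and partitionIntervalMs might fit together, the sketch below parses a message time, expands each `{...}` placeholder of a sub-path, and floors a timestamp to its partition interval. Whether inlong-sort-standalone resolves these fields exactly this way is an assumption; the function names are hypothetical, and only the pattern letters yyyy, MM, dd, HH, mm, and ss are handled.

```python
import re
from datetime import datetime

# Java pattern tokens used in this document, mapped to strftime codes.
# This is not a complete Java date-pattern translator.
_JAVA_TO_STRFTIME = {
    "yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S",
}
_TOKEN_RE = re.compile("|".join(_JAVA_TO_STRFTIME))


def to_strftime(java_pattern):
    """Translate a Java-style pattern such as yyyyMMddHH into a strftime pattern."""
    return _TOKEN_RE.sub(lambda m: _JAVA_TO_STRFTIME[m.group()], java_pattern)


def expand_sub_path(sub_path, when):
    """Replace every {pattern} placeholder in sub_path with the formatted time."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda m: when.strftime(to_strftime(m.group(1))),
        sub_path,
    )


def partition_start_ms(msg_time_ms, partition_interval_ms=3600000):
    """Floor a millisecond timestamp to the start of its partition interval."""
    return msg_time_ms - msg_time_ms % partition_interval_ms


# Parse a message time written in the msgTimeFieldPattern "yyyy-MM-dd HH:mm:ss".
msg_time = datetime.strptime("2021-12-01 13:05:09", to_strftime("yyyy-MM-dd HH:mm:ss"))
print(expand_sub_path("/{yyyyMMdd}/{yyyyMMddHH}", msg_time))  # /20211201/2021120113
```

Under this reading, the full partition directory would be idRootPath followed by the expanded sub-path.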
sinkParams of a Hive sort task
| Name | Required | Default | Description |
|---|---|---|---|
| hdfsPath | Y | NA | HDFS NameNode address |
| maxFileOpenDelayMinute | N | 5 | Maximum write duration of a single HDFS file, in minutes |
| tokenOvertimeMinute | N | 60 | Maximum time the partition-creation token of a single Inlong data stream can be held, in minutes |
| maxOutputFileSizeGb | N | 2 | Maximum size of a single HDFS file, in GB |
| hiveJdbcUrl | Y | NA | Hive JDBC URL |
| hiveDatabase | Y | NA | Hive database |
| hiveUsername | Y | NA | Hive user name |
| hivePassword | Y | NA | Hive password |
idParams of a Pulsar sort task
| Name | Required | Default | Description |
|---|---|---|---|
| inlongGroupId | Y | NA | inlongGroupId |
| inlongStreamId | Y | NA | inlongStreamId |
| topic | Y | NA | Pulsar topic |
sinkParams of a Pulsar sort task
| Name | Required | Default | Description |
|---|---|---|---|
| serviceUrl | Y | NA | Pulsar service URL |
| authentication | Y | NA | Pulsar cluster authentication |
| enableBatching | N | true | enableBatching |
| batchingMaxBytes | N | 5242880 | batchingMaxBytes |
| batchingMaxMessages | N | 3000 | batchingMaxMessages |
| batchingMaxPublishDelay | N | 1 | batchingMaxPublishDelay |
| maxPendingMessages | N | 1000 | maxPendingMessages |
| maxPendingMessagesAcrossPartitions | N | 50000 | maxPendingMessagesAcrossPartitions |
| sendTimeout | N | 0 | sendTimeout |
| compressionType | N | NONE | compressionType |
| blockIfQueueFull | N | true | blockIfQueueFull |
| roundRobinRouterBatchingPartitionSwitchFrequency | N | 10 | roundRobinRouterBatchingPartitionSwitchFrequency |
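The Pulsar tables can be assembled into a SortClusterConfig object in the same shape as the Hive example later in this document. The sketch below is one way to do that; the builder function, the cluster and task names, the group and topic values, the service URL, and the authentication placeholder are all hypothetical. The type string "PULSAR" and the use of string values in sinkParams are assumptions made by analogy with the Hive example.

```python
import json


def build_pulsar_cluster_config(cluster_name, task_name, streams, service_url, authentication):
    """Build a SortClusterConfig dict with a single Pulsar sort task.

    streams is a list of (inlongGroupId, inlongStreamId, topic) tuples.
    """
    id_params = [
        {"inlongGroupId": group_id, "inlongStreamId": stream_id, "topic": topic}
        for group_id, stream_id, topic in streams
    ]
    task = {
        "name": task_name,
        "type": "PULSAR",  # assumption, by analogy with "HIVE" in the Hive example
        "idParams": id_params,
        "sinkParams": {
            # required keys
            "serviceUrl": service_url,
            "authentication": authentication,
            # optional keys, shown with the defaults from the table
            "enableBatching": "true",
            "batchingMaxMessages": "3000",
            "compressionType": "NONE",
        },
    }
    return {"clusterName": cluster_name, "sortTasks": [task]}


config = build_pulsar_cluster_config(
    cluster_name="pulsar_demo_cluster",          # hypothetical
    task_name="sid_pulsar_demo",                 # hypothetical
    streams=[("demo_group", "", "demo_topic")],  # hypothetical
    service_url="pulsar://127.0.0.1:6650",       # hypothetical
    authentication="<authentication value>",     # placeholder, format not specified here
)
print(json.dumps(config, indent=2))
```

Optional sinkParams that are left out fall back to the defaults in the table, so a minimal task only needs serviceUrl and authentication.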
Hive configuration example
{
  "data":{
    "clusterName":"hivev3-sz-sz1",
    "sortTasks":[
      {
        "idParams":[
          {
            "inlongGroupId":"0fc00000046",
            "inlongStreamId":"",
            "separator":"|",
            "partitionIntervalMs":3600000,
            "idRootPath":"/user/hive/warehouse/t_inlong_v1_0fc00000046",
            "partitionSubPath":"/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName":"t_inlong_v1_0fc00000046",
            "partitionFieldName":"dt",
            "partitionFieldPattern":"yyyyMMddHH",
            "msgTimeFieldPattern":"yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour":8
          },
          {
            "inlongGroupId":"03600000045",
            "inlongStreamId":"",
            "separator":"|",
            "partitionIntervalMs":3600000,
            "idRootPath":"/user/hive/warehouse/t_inlong_v1_03600000045",
            "partitionSubPath":"/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName":"t_inlong_v1_03600000045",
            "partitionFieldName":"dt",
            "partitionFieldPattern":"yyyyMMddHH",
            "msgTimeFieldPattern":"yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour":8
          },
          {
            "inlongGroupId":"05100054990",
            "inlongStreamId":"",
            "separator":"|",
            "partitionIntervalMs":3600000,
            "idRootPath":"/user/hive/warehouse/t_inlong_v1_05100054990",
            "partitionSubPath":"/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName":"t_inlong_v1_05100054990",
            "partitionFieldName":"dt",
            "partitionFieldPattern":"yyyyMMddHH",
            "msgTimeFieldPattern":"yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour":8
          },
          {
            "inlongGroupId":"09c00014434",
            "inlongStreamId":"",
            "separator":"|",
            "partitionIntervalMs":3600000,
            "idRootPath":"/user/hive/warehouse/t_inlong_v1_09c00014434",
            "partitionSubPath":"/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName":"t_inlong_v1_09c00014434",
            "partitionFieldName":"dt",
            "partitionFieldPattern":"yyyyMMddHH",
            "msgTimeFieldPattern":"yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour":8
          },
          {
            "inlongGroupId":"0c900035509",
            "inlongStreamId":"",
            "separator":"|",
            "partitionIntervalMs":3600000,
            "idRootPath":"/user/hive/warehouse/t_inlong_v1_0c900035509",
            "partitionSubPath":"/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName":"t_inlong_v1_0c900035509",
            "partitionFieldName":"dt",
            "partitionFieldPattern":"yyyyMMddHH",
            "msgTimeFieldPattern":"yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour":8
          }
        ],
        "name":"sid_hive_inlong6th_v3",
        "sinkParams":{
          "hdfsPath":"hdfs://127.0.0.1:9000",
          "maxFileOpenDelayMinute":"5",
          "tokenOvertimeMinute":"60",
          "maxOutputFileSizeGb":"2",
          "hiveJdbcUrl":"jdbc:hive2://127.0.0.2:10000",
          "hiveDatabase":"default",
          "hiveUsername":"hive",
          "hivePassword":"hive"
        },
        "type":"HIVE"
      }
    ]
  },
  "errCode":0,
  "md5":"md5",
  "result":true
}
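Before deploying a configuration like the one above, it can help to check it against the required fields from the tables. The sketch below is a standalone checker, not part of inlong-sort-standalone; the function names are hypothetical, and it assumes the cluster configuration sits under the "data" key as in the example. It only checks for the presence of keys marked Y in the tables.

```python
import json

# Keys marked "Y" in the tables of this document.
REQUIRED_TASK_KEYS = ("name", "type", "idParams", "sinkParams")
REQUIRED_HIVE_ID_KEYS = (
    "inlongGroupId", "inlongStreamId", "separator", "idRootPath",
    "partitionSubPath", "hiveTableName", "partitionFieldPattern",
    "msgTimeFieldPattern",
)
REQUIRED_HIVE_SINK_KEYS = (
    "hdfsPath", "hiveJdbcUrl", "hiveDatabase", "hiveUsername", "hivePassword",
)


def missing_keys(obj, required, where):
    """Return dotted paths of the required keys that are absent from obj."""
    return [f"{where}.{key}" for key in required if key not in obj]


def validate_cluster_config(response):
    """Return a list of missing required keys; an empty list means none are missing."""
    data = response.get("data", {})
    problems = missing_keys(data, ("clusterName", "sortTasks"), "data")
    for i, task in enumerate(data.get("sortTasks", [])):
        where = f"sortTasks[{i}]"
        problems += missing_keys(task, REQUIRED_TASK_KEYS, where)
        if task.get("type") == "HIVE":
            for j, id_param in enumerate(task.get("idParams", [])):
                problems += missing_keys(id_param, REQUIRED_HIVE_ID_KEYS, f"{where}.idParams[{j}]")
            problems += missing_keys(task.get("sinkParams", {}), REQUIRED_HIVE_SINK_KEYS, f"{where}.sinkParams")
    return problems


# A shortened copy of the example above with a single data stream.
sample = json.loads("""
{"data": {"clusterName": "hivev3-sz-sz1", "sortTasks": [{
  "idParams": [{"inlongGroupId": "0fc00000046", "inlongStreamId": "",
    "separator": "|", "idRootPath": "/user/hive/warehouse/t_inlong_v1_0fc00000046",
    "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
    "hiveTableName": "t_inlong_v1_0fc00000046",
    "partitionFieldPattern": "yyyyMMddHH",
    "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss"}],
  "name": "sid_hive_inlong6th_v3",
  "sinkParams": {"hdfsPath": "hdfs://127.0.0.1:9000",
    "hiveJdbcUrl": "jdbc:hive2://127.0.0.2:10000", "hiveDatabase": "default",
    "hiveUsername": "hive", "hivePassword": "hive"},
  "type": "HIVE"}]},
 "errCode": 0, "md5": "md5", "result": true}
""")
print(validate_cluster_config(sample))  # []
```

Optional keys such as partitionIntervalMs are deliberately not checked, since the tables give them defaults.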