Deployment
Get the module archive
The module archive is in the directory inlong-sort-standalone/sort-standalone-dist/target/; the archive file is apache-inlong-sort-standalone-${project.version}-bin.tar.gz.
Start the inlong-sort-standalone application
After compilation completes and the tar.gz package has been generated, extract the archive and start the inlong-sort-standalone application from the extracted directory.
Example:
./bin/sort-start.sh
Configuration of conf/common.properties
| Parameter | Required | DefaultValue | Remark |
|---|---|---|---|
| clusterId | Y | NA | inlong-sort-standalone cluster id |
| sortSource.type | N | org.apache.inlong.sort.standalone.source.readapi.ReadApiSource | Source class name |
| sortChannel.type | N | org.apache.inlong.sort.standalone.channel.BufferQueueChannel | Channel class name |
| sortSink.type | N | org.apache.inlong.sort.standalone.sink.hive.HiveSink | Sink class name. Different distribution types use different Sink classes |
| sortClusterConfig.type | N | org.apache.inlong.sort.standalone.config.loader.ClassResourceSortClusterConfigLoader | Class name of the distribution cluster configuration loader. ClassResourceSortClusterConfigLoader reads the distribution cluster configuration from the SortClusterConfig.conf source file in the ClassPath |
| sortClusterConfig.managerPath | N | NA | Parameter of the distribution cluster configuration loader class org.apache.inlong.sort.standalone.config.loader.ManagerSortClusterConfigLoader; specifies the URL path of the Inlong Manager |
| eventFormatHandler | N | org.apache.inlong.sort.standalone.sink.hive.DefaultEventFormatHandler | Class name of the format converter applied before distributing to Hive |
| maxThreads | N | 10 | Number of sink threads |
| reloadInterval | N | 60000 | Interval for reloading configuration data, in milliseconds |
| processInterval | N | 100 | Interval for processing data, in milliseconds |
| metricDomains | N | Sort | Domain name of metric |
| metricDomains.Sort.domainListeners | N | org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener | List of metric listener class names, separated by spaces |
| prometheusHttpPort | N | 8080 | HTTP server port of prometheus simple client |
| metricDomains.Sort.snapshotInterval | N | 60000 | Interval for taking snapshots of metric data, in milliseconds |
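To make the table concrete, the following is a minimal Python sketch of how a small common.properties file combines with the documented defaults. It is an illustration only, not the application's own loader: the helper names `parse_properties` and `resolve` and the cluster id `sort_standalone_demo` are hypothetical, and only a subset of the defaults is listed.

```python
# Defaults copied from the conf/common.properties table (subset).
DEFAULTS = {
    "sortSource.type": "org.apache.inlong.sort.standalone.source.readapi.ReadApiSource",
    "sortChannel.type": "org.apache.inlong.sort.standalone.channel.BufferQueueChannel",
    "sortSink.type": "org.apache.inlong.sort.standalone.sink.hive.HiveSink",
    "maxThreads": "10",
    "reloadInterval": "60000",
    "processInterval": "100",
    "metricDomains": "Sort",
    "prometheusHttpPort": "8080",
    "metricDomains.Sort.snapshotInterval": "60000",
}
REQUIRED = ("clusterId",)  # the only parameter marked "Y" in the table


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def resolve(props):
    """Check required keys, then overlay the given values on the defaults."""
    missing = [key for key in REQUIRED if key not in props]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return {**DEFAULTS, **props}


# Hypothetical minimal file: one required key plus one override.
EXAMPLE = """\
# conf/common.properties
clusterId=sort_standalone_demo
maxThreads=20
"""

resolved = resolve(parse_properties(EXAMPLE))
print(resolved["maxThreads"], resolved["prometheusHttpPort"])  # prints: 20 8080
```

Only `clusterId` has to be present; every other key in the sketch falls back to the value in the DefaultValue column.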
SortClusterConfig Configuration
The configuration can be read from the SortClusterConfig.conf source file in the ClassPath, but this method does not support real-time updates.
Alternatively, the configuration can be fetched from the HTTP interface of the Inlong Manager.
| Parameter | Required | DefaultValue | Remark |
|---|---|---|---|
| clusterName | Y | NA | Used to uniquely identify an inlong-sort-standalone cluster |
| sortTasks | Y | NA | List of distribution tasks |
SortTaskConfig Configuration
| Parameter | Required | DefaultValue | Remark |
|---|---|---|---|
| name | Y | NA | Distribution task name |
| type | Y | NA | Distribution task type, such as HIVE("hive"), TUBE("tube"), KAFKA("kafka"), PULSAR("pulsar"), ElasticSearch("ElasticSearch"), UNKNOWN("n") |
| idParams | Y | NA | Inlong data stream parameter list |
| sinkParams | Y | NA | Distribution task parameters |
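The required fields of SortClusterConfig and SortTaskConfig can be checked before a configuration is deployed. The sketch below is one possible pre-flight check in Python, not part of inlong-sort-standalone; the function name `missing_fields` and the sample values `demo-cluster` and `demo-task` are hypothetical.

```python
import json

# Fields marked "Y" in the SortClusterConfig and SortTaskConfig tables.
CLUSTER_REQUIRED = ("clusterName", "sortTasks")
TASK_REQUIRED = ("name", "type", "idParams", "sinkParams")


def missing_fields(cluster):
    """Return a list of problems; an empty list means no required field is missing."""
    problems = [f"cluster: missing {key}" for key in CLUSTER_REQUIRED if key not in cluster]
    for index, task in enumerate(cluster.get("sortTasks", [])):
        problems += [
            f"sortTasks[{index}]: missing {key}" for key in TASK_REQUIRED if key not in task
        ]
    return problems


# Hypothetical configuration in which the task lacks "sinkParams".
cluster = json.loads("""
{
  "clusterName": "demo-cluster",
  "sortTasks": [
    {"name": "demo-task", "type": "HIVE", "idParams": []}
  ]
}
""")

problems = missing_fields(cluster)
print(problems)  # prints: ['sortTasks[0]: missing sinkParams']
```

The check looks only at the presence of keys. Validating the contents of idParams and sinkParams depends on the task type and is left out here.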
Hive Distributed Tasks IdParams
| Parameter | Required | DefaultValue | Remark |
|---|---|---|---|
| inlongGroupId | Y | NA | inlongGroupId |
| inlongStreamId | Y | NA | inlongStreamId |
| separator | Y | NA | Delimiter |
| partitionIntervalMs | N | 3600000 | Partition interval, in milliseconds |
| idRootPath | Y | NA | HDFS root directory of Inlong data stream |
| partitionSubPath | Y | NA | Partition subdirectories for inlong data streams |
| hiveTableName | Y | NA | Hive table name of the Inlong data stream |
| partitionFieldName | N | dt | Partition field name of the Inlong data stream |
| partitionFieldPattern | Y | NA | The partition field value format of the Inlong data stream, such as {yyyyMMdd}, {yyyyMMddHH}, {yyyyMMddHHmm} |
| msgTimeFieldPattern | Y | NA | Format of the message generation time field, given as a Java time pattern |
| maxPartitionOpenDelayHour | N | 8 | Maximum opening delay time of the partition, in hours |
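The date patterns in idParams are easier to read with a worked example. The Python sketch below parses a message time with `msgTimeFieldPattern` and expands the `{...}` placeholders of `partitionSubPath`. It assumes that the partition directory is simply `idRootPath` followed by the expanded sub-path, which this document does not confirm; the helpers `java_to_strftime` and `expand_sub_path` and the sample timestamp are hypothetical, and only the pattern letters that appear in the table are handled.

```python
import re
from datetime import datetime

# Java pattern letters used in the table, mapped to Python strftime directives.
_JAVA_TO_PY = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
_JAVA_TOKEN = re.compile("|".join(_JAVA_TO_PY))


def java_to_strftime(pattern):
    """Translate a Java-style pattern such as yyyyMMddHH in a single pass."""
    return _JAVA_TOKEN.sub(lambda match: _JAVA_TO_PY[match.group()], pattern)


def expand_sub_path(sub_path, when):
    """Replace each {pattern} placeholder with the formatted time."""
    return re.sub(
        r"\{([^}]+)\}",
        lambda match: when.strftime(java_to_strftime(match.group(1))),
        sub_path,
    )


# Values taken from the parameter descriptions; the timestamp is made up.
msg_time = datetime.strptime(
    "2022-03-01 13:45:00", java_to_strftime("yyyy-MM-dd HH:mm:ss")
)
sub_path = expand_sub_path("/{yyyyMMdd}/{yyyyMMddHH}", msg_time)
print(sub_path)  # prints: /20220301/2022030113

# Assumption: the partition directory is idRootPath + expanded sub-path.
print("/user/hive/warehouse/t_inlong_v1_0fc00000046" + sub_path)
```

With `partitionIntervalMs` at its default of 3600000 (one hour), an hourly sub-path such as `{yyyyMMddHH}` yields one directory per partition interval.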
Hive Distributed Tasks SinkParams
| Parameter | Required | DefaultValue | Remark |
|---|---|---|---|
| hdfsPath | Y | NA | HDFS nameNode |
| maxFileOpenDelayMinute | N | 5 | Maximum write time of a single HDFS file, in minutes |
| tokenOvertimeMinute | N | 60 | The maximum time it takes to create a token for a partition of a single Inlong data stream, in minutes |
| maxOutputFileSizeGb | N | 2 | Maximum size of a single HDFS file, in GB |
| hiveJdbcUrl | Y | NA | Hive JDBC Path |
| hiveDatabase | Y | NA | Hive Database |
| hiveUsername | Y | NA | Hive Username |
| hivePassword | Y | NA | Hive Password |
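The two file limits, `maxFileOpenDelayMinute` and `maxOutputFileSizeGb`, can be read as conditions for closing the current HDFS file and starting a new one. The Python sketch below expresses that reading. It is an interpretation of the parameter descriptions, not the HiveSink implementation: the function `should_roll` is hypothetical, and treating 1 GB as 2**30 bytes is an assumption.

```python
from datetime import datetime, timedelta

GIB = 1024 ** 3  # assumption: "GB" in maxOutputFileSizeGb means 2**30 bytes


def should_roll(opened_at, bytes_written, now, max_open_minutes=5, max_size_gb=2):
    """Return True once the file has been open too long or has grown too large.

    The defaults mirror maxFileOpenDelayMinute=5 and maxOutputFileSizeGb=2.
    """
    too_old = now - opened_at >= timedelta(minutes=max_open_minutes)
    too_big = bytes_written >= max_size_gb * GIB
    return too_old or too_big


opened = datetime(2022, 3, 1, 13, 0, 0)
print(should_roll(opened, 10 * 1024, datetime(2022, 3, 1, 13, 4, 0)))  # prints: False
print(should_roll(opened, 10 * 1024, datetime(2022, 3, 1, 13, 5, 0)))  # prints: True
print(should_roll(opened, 2 * GIB, datetime(2022, 3, 1, 13, 1, 0)))    # prints: True
```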
Pulsar Distributed Tasks IdParams
| Parameter | Required | DefaultValue | Remark |
|---|---|---|---|
| inlongGroupId | Y | NA | inlongGroupId |
| inlongStreamId | Y | NA | inlongStreamId |
| topic | Y | NA | Pulsar Topic |
Pulsar Distributed Tasks SinkParams
| Parameter | Required | DefaultValue | Remark |
|---|---|---|---|
| serviceUrl | Y | NA | Pulsar service path |
| authentication | Y | NA | Pulsar cluster authentication |
| enableBatching | N | true | enableBatching |
| batchingMaxBytes | N | 5242880 | batchingMaxBytes |
| batchingMaxMessages | N | 3000 | batchingMaxMessages |
| batchingMaxPublishDelay | N | 1 | batchingMaxPublishDelay |
| maxPendingMessages | N | 1000 | maxPendingMessages |
| maxPendingMessagesAcrossPartitions | N | 50000 | maxPendingMessagesAcrossPartitions |
| sendTimeout | N | 0 | sendTimeout |
| compressionType | N | NONE | compressionType |
| blockIfQueueFull | N | true | blockIfQueueFull |
| roundRobinRouterBatchingPartitionSwitchFrequency | N | 10 | roundRobinRouterBatchingPartitionSwitchFrequency |
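A Pulsar distribution task can be assembled from the two tables above. The Python sketch below builds one and prints it as JSON. Several details are assumptions rather than documented facts: the task `type` string `"PULSAR"`, the use of string values in `sinkParams`, and every concrete value such as the cluster name, ids, topic, service URL, and authentication placeholder. Optional parameters are set to the defaults from the table.

```python
import json

pulsar_task = {
    "name": "sid_pulsar_demo",  # hypothetical task name
    "type": "PULSAR",           # assumed from PULSAR("pulsar") in the task type list
    "idParams": [
        {
            "inlongGroupId": "demo_group",    # hypothetical ids and topic
            "inlongStreamId": "demo_stream",
            "topic": "demo_topic",
        }
    ],
    "sinkParams": {
        # Required parameters; both values are placeholders.
        "serviceUrl": "pulsar://127.0.0.1:6650",
        "authentication": "<authentication>",
        # Optional parameters, set to the defaults listed in the table.
        "enableBatching": "true",
        "batchingMaxBytes": "5242880",
        "batchingMaxMessages": "3000",
        "batchingMaxPublishDelay": "1",
        "maxPendingMessages": "1000",
        "maxPendingMessagesAcrossPartitions": "50000",
        "sendTimeout": "0",
        "compressionType": "NONE",
        "blockIfQueueFull": "true",
        "roundRobinRouterBatchingPartitionSwitchFrequency": "10",
    },
}

cluster_config = {"clusterName": "pulsar-demo", "sortTasks": [pulsar_task]}
print(json.dumps(cluster_config, indent=2))
```

The printed object covers only the cluster configuration itself (clusterName and sortTasks); any response envelope around it is not modeled here.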
Hive Configuration Example
{
  "data": {
    "clusterName": "hivev3-sz-sz1",
    "sortTasks": [
      {
        "idParams": [
          {
            "inlongGroupId": "0fc00000046",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_0fc00000046",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_0fc00000046",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "03600000045",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_03600000045",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_03600000045",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "05100054990",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_05100054990",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_05100054990",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "09c00014434",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_09c00014434",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_09c00014434",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "0c900035509",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_0c900035509",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_0c900035509",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          }
        ],
        "name": "sid_hive_inlong6th_v3",
        "sinkParams": {
          "hdfsPath": "hdfs://127.0.0.1:9000",
          "maxFileOpenDelayMinute": "5",
          "tokenOvertimeMinute": "60",
          "maxOutputFileSizeGb": "2",
          "hiveJdbcUrl": "jdbc:hive2://127.0.0.2:10000",
          "hiveDatabase": "default",
          "hiveUsername": "hive",
          "hivePassword": "hive"
        },
        "type": "HIVE"
      }
    ]
  },
  "errCode": 0,
  "md5": "md5",
  "result": true
}