Reference ¶

Table of Contents

Reference

Documentation Conventions ¶

Commands and keywords are in this font.
$OFFLOAD_HOME is set when the environment file (offload.env) is sourced, unless already set, and refers to the directory named offload that is created when the software is unpacked. This is also referred to as <OFFLOAD_HOME> in sections of this guide where the environment file has not been created/sourced.
Third party vendor product names might be aliased or shortened for simplicity. See Third Party Vendor Products for cross-references to full product names and trademarks.

Name of host(s) to use to connect to Impala/Hive. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3. Use when configuring Gluent Query Engine to connect to a different Cloudera Data Platform experience to Gluent Offload Engine (e.g. Data Warehouse rather than Data Hub). If unset, all connections will be made to HIVE_SERVER_HOST.

Supported Values	Hostname or IP address of Impala/Hive host(s)
Version Added	`4.1.0`

CONNECTOR_HIVE_SERVER_HTTP_PATH¶

Path component of URL endpoint when connecting to HiveServer2 in HTTP mode (i.e. when HIVE_SERVER_HTTP_TRANSPORT is true). Use when configuring Gluent Query Engine to connect to a different Cloudera Data Platform experience to Gluent Offload Engine (e.g. Data Warehouse rather than Data Hub). If unset, all connections will be made with HIVE_SERVER_HTTP_PATH.

Supported Values	Valid URL path
Version Added	`4.1.0`

CONNECTOR_SQL_ENGINE¶

SQL engine used by Gluent Query Engine for hybrid queries.

Default Value	`IMPALA`
Supported Values	`IMPALA\|BIGQUERY\|SNOWFLAKE\|SYNAPSE`
Version Added	`3.1.0`

CONN_PRE_CMD¶

Used to set pre-commands before query execution, e.g. set hive.execution.engine=tez;.

Supported Values	Supported session `set` parameters
Version Added	`2.3.0`

DATA_GOVERNANCE_API_PASS¶

Password for the account specified in DATA_GOVERNANCE_API_USER. Password encryption is supported using the Password Tool utility.

Supported Values	Cloudera Navigator service account password
Version Added	`2.11.0`

DATA_GOVERNANCE_API_URL¶

URL for a data governance REST API in the format http://fqdn-n.example.com:port/api. Leaving this configuration item blank disables data governance integration.

Supported Values	Valid Cloudera Navigator REST API URL
Version Added	`2.11.0`

DATA_GOVERNANCE_API_USER¶

Service account to be used to connect to a data governance REST API.

Supported Values	Cloudera Navigator service account name
Version Added	`2.11.0`

DATA_GOVERNANCE_AUTO_PROPERTIES¶

CSV string of dynamic properties to include in data governance metadata. The tokens in the CSV will be expanded at runtime if prefixed with + or ignored if prefixed with -.

Supported Values

CSV containing the following tokens prefixed with either + or -.

GLUENT_OBJECT_TYPE, SOURCE_RDBMS_TABLE, TARGET_RDBMS_TABLE, INITIAL_GLUENT_VERSION, LATEST_GLUENT_VERSION, INITIAL_OPERATION_DATETIME, LATEST_OPERATION_DATETIME

Version Added

2.11.0

DATA_GOVERNANCE_AUTO_TAGS¶

CSV string of tags to include in data governance metadata. Tags are free-format except for +RDBMS_NAME which is expanded at run time.

Default Value:	`GLUENT,+RDBMS_NAME`
Supported Values	CSV containing tags to attach to data governance metadata
Version Added	`2.11.0`

DATA_GOVERNANCE_BACKEND¶

Specify the data governance API type accessed via DATA_GOVERNANCE_API_URL.

Supported Values	`navigator`
Version Added	`2.11.0`

DATA_GOVERNANCE_CUSTOM_PROPERTIES¶

JSON string of key/value pairs to include in data governance metadata.

Associated Option	`--data-governance-custom-properties`
Supported Values	Valid JSON string of key/value pairs (no nested or complex data types)
Version Added	`2.11.0`

DATA_GOVERNANCE_CUSTOM_TAGS¶

CSV string of tags to include in data governance metadata.

Associated Option	`--data-governance-custom-tags`
Supported Values	CSV containing tags to attach to data governance metadata
Version Added	`2.11.0`

DATA_SAMPLE_PARALLELISM¶

Degree of parallelism to use when sampling data for all columns in the source RDBMS table that are either date/timestamp-based or defined as a number without a precision and scale. A value of 0 or 1 disables parallelism.

Associated Option	`--data-sample-parallelism`
Default Value	`0`
Supported Values	`0` and positive integers
Version Added	`4.2.0`

DATAD_ADDRESS¶

The address(es) of Data Daemon. For a single daemon the format is <hostname/IP address>:<port>. Specifying multiple daemons can be achieved in one of two ways:

By DNS address. The DNS server can return multiple A records for a hostname and Gluent Data Platform will load balance between these, e.g. <load-balancer-address>:<load-balancer-port>

By IP address and port. The comma-separated list must be prefixed with ipv4: e.g. ipv4:<hostname/IP address>:<port>,<hostname/IP address>:<port>

Supported Values	`<hostname/IP address>:<port>`, `<load-balancer-address>:<load-balancer-port>`, `ipv4:<hostname/IP address>:<port>,<hostname/IP address>:<port>`
Version Added	`4.0.0`

DATAD_SSL_ACTIVE¶

Set to true when TLS is enabled on the Data Daemon socket.

Supported Values	`true\|false`
Version Added	`4.0.0`

DATAD_SSL_TRUSTED_CERTS¶

The trusted certificate when TLS is enabled on the Data Daemon socket.

Supported Values	Full path to the trusted certificate
Version Added	`4.0.0`

DATAD_WEB_PASS¶

Password for authentication with Data Daemon Web Interface (if configured). Password encryption is supported using the Password Tool utility.

Supported Values	Data Daemon Web Interface user password
Version Added	`4.1.0`

DATAD_WEB_USER¶

User for authentication with Data Daemon Web Interface (if configured).

Supported Values	Data Daemon Web Interface username
Version Added	`4.1.0`

DB_NAME_PREFIX¶

Database name/path prefix for multitenant support. This allows multiple Oracle Database databases to offload to the same backend cluster. If undefined, the DB_UNIQUE_NAME will be used, giving <DB_UNIQUE_NAME>_<schema>. If defined but empty, no prefix is used, giving <schema>. Otherwise, databases will be named <DB_NAME_PREFIX>_<schema>.

If the source database is part of an Oracle Data Guard configuration set DB_NAME_PREFIX to ensure that DB_UNIQUE_NAME is not used.

Associated Option	`--db-name-prefix`
Supported Values	Characters supported by the backend database/dataset/schema-naming rules
Version Added	`2.3.0`

DEFAULT_BUCKETS¶

Default number of offload buckets for parallel data retrieval from the backend Hadoop system. If you aim to run your biggest queries with parallel DOP X then set this value to X. This way each Oracle Database PX slave can start its own Smart Connector process for fetching a subset of data.

Associated Option	`--num-buckets`
Supported Values	Valid Oracle Database DOP
Version Added	`2.3.0`

DEFAULT_BUCKETS_MAX¶

Upper limit of DEFAULT_BUCKETS when DEFAULT_BUCKETS=AUTO.

Default Value	`16`
Supported Values	Valid Oracle Database DOP
Version Added	`2.7.0`

DEFAULT_BUCKETS_THRESHOLD¶

Threshold at which RDBMS segments are considered “small” by DEFAULT_BUCKETS=AUTO tuning.

Supported Values	E.g. `3M`, `0.5G`
Version Added	`2.7.0`

GOOGLE_APPLICATION_CREDENTIALS¶

Path to Google service account private key JSON file.

Supported Values	Valid paths
Version Added	`4.0.0`

GOOGLE_KMS_KEY_NAME¶

Google Cloud Key Management Service cryptographic key name to use for encryption and decryption operations. The purpose of this key must be Symmetric encryption.

Supported Values	Valid KMS key name
Version Added	`4.2.0`

GOOGLE_KMS_KEY_RING_NAME¶

Google Cloud Key Management Service cryptographic key ring name containing the key defined in GOOGLE_KMS_KEY_NAME

Supported Values	Valid KMS key ring name
Version Added	`4.2.0`

GOOGLE_KMS_KEY_RING_LOCATION¶

Google Cloud Key Management Service cryptographic key ring location of the key ring defined in GOOGLE_KMS_KEY_RING_NAME

Supported Values	Valid Google Cloud Service locations
Version Added	`4.2.0`

HADOOP_SSH_USER¶

User to connect to Hadoop server(s) defined in HIVE_SERVER_HOST using password-less SSH.

Supported Values	Valid host username
Version Added	`2.3.0`

HDFS_CMD_HOST¶

Overrides HIVE_SERVER_HOST for the HDFS command steps only. In split installation environments where orchestration commands are run from a Hadoop edge node(s), set this to localhost in the Hadoop edge node(s) configuration file.

Supported Values	Hostname or IP address of HDFS host
Version Added	`2.3.0`

HDFS_DATA¶

HDFS data directory of the HIVE_SERVER_USER. Used to store offloaded data.

Associated Option	`--hdfs-data`
Supported Values	Valid HDFS directory
Version Added	`2.3.0`

HDFS_DB_PATH_SUFFIX¶

Hadoop databases are named <schema><HDFS_DB_PATH_SUFFIX> and <schema>_load<HDFS_DB_PATH_SUFFIX>. When this value is not set the suffix of the databases defaults to .db, giving <schema>.db and <schema>_load.db. Set this to an empty string to use no suffix. For backend systems other than Hadoop this variable has no effect.

Associated Option	`--hdfs-db-path-suffix`
Supported Values	Hostname or IP address of HDFS host
Version Added	`2.3.0`

HDFS_HOME¶

HDFS home directory of the HIVE_SERVER_USER.

Supported Values	Valid HDFS directory
Version Added	`2.3.0`

HDFS_LOAD¶

HDFS data directory of the HIVE_SERVER_USER. Used to stage offloaded data.

Supported Values	Valid HDFS directory
Version Added	`3.4.0`

HDFS_NAMENODE_ADDRESS¶

Hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured. This value is required in order to execute result cache queries. In a deployment where result cache queries will never be used, this variable can safely be unset.

Supported Values	Hostname or IP address of active HDFS namenode or ID of the HDFS nameservice if HDFS High Availability is configured
Version Added	`2.3.0`

HDFS_NAMENODE_PORT¶

Port of the active HDFS namenode. Set to 0 if HDFS High Availability is configured and HDFS_NAMENODE_ADDRESS is set to a nameservice ID. As with HDFS_NAMENODE_ADDRESS, this value is necessary for executing result cache queries, but otherwise can safely be unset.

Supported Values	Port of active HDFS namenode or `0` if HDFS High Availability is configured
Version Added	`2.3.0`

HDFS_RESULT_CACHE_USER¶

Hadoop user to impersonate when making HDFS requests for result cache queries; must have write permissions to HDFS_HOME. In a deployment where result cache queries will never be used, this variable can safely be unset.

Default Value	`HIVE_SERVER_USER`
Supported Values	Hadoop username
Version Added	`2.3.0`

HDFS_SNAPSHOT_PATH¶

Before an Incremental Update compaction a HDFS snapshot will be automatically created in the location specified by HDFS_SNAPSHOT_PATH. This location must be a snapshottable directory (consult your HDFS administrators to enable this). When changing HDFS_SNAPSHOT_PATH from the default ensure that it remains a parent directory of HDFS_DATA. Unsetting this variable will disable automatic HDFS snapshots.

Default Value	`HDFS_DATA`
Supported Values	HDFS path that is equal to or a parent of `HDFS_DATA`
Version Added	`2.10.0`

HDFS_SNAPSHOT_SUDO_COMMAND¶

If HADOOP_SSH_USER is not the inode owner of HDFS_SNAPSHOT_PATH then HDFS superuser rights will be required to take HDFS snapshots. A sudo rule (or equivalent user substitution tool) can be used to enable this using HDFS_SNAPSHOT_SUDO_COMMAND. The command must be password-less.

Supported Values	A valid user-substitution command
Version Added	`2.10.0`

HIVE_SERVER_AUTH_MECHANISM¶

Authentication mechanism for HiveServer2. In non-kerberized and non-LDAP environments, should be set to: Impala: NOSASL, Hive: value of hive-site.xml: hive.server2.authentication. In LDAP environments, should be set to PLAIN.

Supported Values	`NOSASL\|PLAIN`, value of `hive-site.xml`: `hive.server2.authentication`
Version Added	`2.3.0`

HIVE_SERVER_HOST¶

Name of host(s) to connect to Impala/Hive. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3.

Supported Values	Hostname or IP address of Impala/Hive host(s)
Version Added	`2.3.0`

HIVE_SERVER_HTTP_PATH¶

Path component of URL endpoint when connecting to HiveServer2 in HTTP mode (i.e. when HIVE_SERVER_HTTP_TRANSPORT is true).

Supported Values	Valid URL path
Version Added	`4.1.0`

HIVE_SERVER_HTTP_TRANSPORT¶

Use HTTP transport for HiveServer2 connections.

Default Value	`false`
Supported Values	`true\|false`
Version Added	`4.1.0`

HIVE_SERVER_PASS¶

Password of the user to authenticate with HiveServer2 service. Required in LDAP enabled Impala configurations. Password encryption is supported using the Password Tool utility.

Supported Values	HiveServer2 service password
Version Added	`2.3.0`

HIVE_SERVER_PORT¶

Port of HiveServer2 service. Default Impala port is 21050, default Hive port is 10000.

Default Value	`21050\|10000`
Supported Values	Port of HiveServer2 service
Version Added	`2.3.0`

HIVE_SERVER_USER¶

Name of the user to authenticate with HiveServer2 service.

Supported Values	HiveServer2 service username
Version Added	`2.3.0`

HYBRID_EXT_TABLE_DEGREE¶

Default degree of parallelism for base hybrid external tables. When set to AUTO Offload will copy settings from the source RDBMS table to the hybrid external table.

Associated Option	`--ext-table-degree`
Supported Values	`AUTO` and positive integers
Version Added	`2.11.2`

HS2_SESSION_PARAMS¶

Comma-separated list of HiveServer2 session parameters to set.

BATCH_SIZE=16384 is a recommended performance setting.

E.g. export HS2_SESSION_PARAMS="BATCH_SIZE=16384,MEM_LIMIT=2G".

Supported Values	Valid Impala/Hive session parameters
Version Added	`2.3.0`

IN_LIST_JOIN_TABLE¶

Database and table name of the in-list join table. Can be created and populated with ./connect --create-sequence-table. Applicable to Impala.

Supported Values	Valid database and table name
Version Added	`2.4.2`

IN_LIST_JOIN_TABLE_SIZE¶

Size of table specified by IN_LIST_JOIN_TABLE. Required for both table population by connect, and table usage by Gluent Query Engine. Applicable to Impala.

Supported Values	Up to 1000000
Version Added	`2.4.2`

KERBEROS_KEYTAB¶

The path of the keytab file. If not provided, a valid ticket must already exist in the cache (i.e. manual kinit).

Supported Values	Path to the keytab file
Version Added	`2.3.0`

KERBEROS_PATH¶

If your Kerberos utilities (like kinit) reside in some non-standard directory, set the path here.

Supported Values	Path to Kerberos utilities
Version Added	`2.3.0`

KERBEROS_PRINCIPAL¶

The Kerberos user to authenticate as. i.e. kinit -kt KERBEROS_KEYTAB KERBEROS_PRINCIPAL should succeed. If KERBEROS_KEYTAB is provided, this should also be provided.

Supported Values	Name of Kerberos principal
Version Added	`2.3.0`

KERBEROS_SERVICE¶

The Impala/Hive service (typically impala/hive). If empty, Smart Connector will attempt to connect unsecured.

Supported Values	Name of Impala service
Version Added	`2.3.0`

KERBEROS_TICKET_CACHE_PATH¶

Required to use the libhdfs3-based result cache with an HDFS cluster that uses Kerberos authentication. In a deployment where result cache queries will never be used, this variable can safely be unset.

Supported Values	Path to Kerberos ticket cache path for the user that will be executing Smart Connector processes
Version Added	`2.3.0`

LD_LIBRARY_PATH¶

Ensures Gluent lib directory is included.

Supported Values	Valid paths
Version Added	`2.3.0`

LIBHDFS3_CONF¶

HDFS client configuration file location.

Supported Values	Valid path to XML configuration file
Version Added	`3.0.4`

LOG_LEVEL¶

Logging level verbosity.

Default Value	`info`
Supported Values	`info\|detail\|debug`
Version Added	`2.3.0`

MAX_OFFLOAD_CHUNK_COUNT¶

Restrict number of partitions offloaded per cycle. See Offload Transport Chunks for usage.

Associated Option	`--max-offload-chunk-count`
Supported Values	`1`-`1000`
Version Added	`2.9.0`

MAX_OFFLOAD_CHUNK_SIZE¶

Restrict size of partitions offloaded per cycle. See Offload Transport Chunks for usage.

Associated Option	`--max-offload-chunk-size`
Supported Values	E.g. `100M`, `1G`, `1.5G`
Version Added	`2.9.0`

METAD_AUTOSTART¶

Enable Metadata Daemon automatic start:

TRUE: If Metadata Daemon is not running, Smart Connector will attempt to start Metadata Daemon automatically. FALSE: Smart Connector will only attempt to connect to an already running Metadata Daemon.

Default Value	`true`
Supported Values	`true\|false`
Version Added	`2.6.0`

METAD_POOL_SIZE¶

The maximum number of connections Metadata Daemon will maintain in its connection pool to Oracle Database.

Default Value	`16`
Supported Values	Number of connections
Version Added	`2.4.5`

METAD_POOL_TIMEOUT¶

The timeout for idle connections in Metadata Daemon’s connection pool to Oracle Database.

Default Value	`300`
Supported Values	Timeout value in seconds
Version Added	`2.4.5`

NLS_LANG¶

Should be set to the value of Oracle Database NLS_CHARACTERSET.

Supported Values	Valid `NLS_CHARACTERSET` values
Version Added	`2.3.0`

NUM_LOCATION_FILES¶

Number of external table location files for parallel data retrieval.

Associated Option	`--num-location-files`
Supported Values	Integer values
Version Added	`2.7.2`

OFFLOAD_BACKEND_SESSION_PARAMETERS¶

Key/value pairs, in JSON format, to override backend query engine parameters. These take effect when establishing a connection to the backend system. For example:

"{\"export OFFLOAD_BACKEND_SESSION_PARAMETERS="{\"request_pool\": \"'root.gluent'\"}"

Supported Values	Valid JSON string of key/value pairs (no nested or complex data types)
Version Added	`3.3.2`

OFFLOAD_BIN¶

Path to the Gluent Data Platform bin directory ($OFFLOAD_HOME/bin).

Supported Values	Oracle Database directory object name
Version Added	`2.3.0`

OFFLOAD_CONF¶

Path to the Gluent Data Platform conf directory.

Supported Values	Path to `conf` directory
Version Added	`2.3.0`

OFFLOAD_COMPRESS_LOAD_TABLE¶

Compress staged data during an Offload. This can be useful when staging to cloud storage.

Associated Option	`--compress-load-table`
Supported Values	`true\|false`
Version Added	`4.0.0`

OFFLOAD_DISTRIBUTE_ENABLED¶

Distribute data by partition key(s) during the final INSERT operation of an offload. Hive only.

Associated Option	`--offload-distribute-enabled`
Supported Values	`true\|false`
Version Added	`2.8.0`

OFFLOAD_FS_AZURE_ACCOUNT_DOMAIN¶

Microsoft Azure storage account service domain, required when staging offloaded data in Azure storage.

Supported Values	`blob.core.windows.net`
Version Added	`4.1.0`

OFFLOAD_FS_AZURE_ACCOUNT_KEY¶

Microsoft Azure account key, required when staging offloaded data in Azure storage.

Supported Values	Valid Azure account key
Version Added	`4.1.0`

OFFLOAD_FS_AZURE_ACCOUNT_NAME¶

Microsoft Azure account name, required when staging offloaded data in Azure storage.

Supported Values	Valid Azure account name
Version Added	`4.1.0`

OFFLOAD_FS_CONTAINER¶

The name of the bucket or container to be used when offloading to cloud storage.

Associated Option	`--offload-fs-container`
Supported Values	A cloud storage bucket/container name configured for use by the backend cluster
Version Added	`3.0.0`

OFFLOAD_FS_PREFIX¶

A directory path used to prefix database locations within OFFLOAD_FS_SCHEME. When OFFLOAD_FS_SCHEME is inherit HDFS_DATA takes precedence over this setting.

Associated Option	`--offload-fs-prefix`
Supported Values	A valid directory in HDFS or cloud storage
Version Added	`3.0.0`

OFFLOAD_FS_SCHEME¶

The filesystem scheme to be used for database and table locations. inherit specifies that all tables created by Offload will not specify a LOCATION clause, they will inherit the location from the parent database. See Integrating with Cloud Storage for details.

Associated Option	`--offload-fs-scheme`
Supported Values	`inherit`, `hdfs`, `s3a`, `adl`, `abfs`, `abfss`
Version Added	`3.0.0`

OFFLOAD_HOME¶

Location of Gluent Data Platform installation.

Supported Values	Path to installed `offload` directory
Version Added	`2.3.0`

OFFLOAD_LOG¶

Path to the Gluent Data Platform log directory.

Supported Values	Oracle Database directory object name
Version Added	`2.3.0`

OFFLOAD_LOGDIR¶

Override Smart Connector log path. If undefined defaults to $OFFLOAD_HOME/log.

Supported Values	Valid path
Version Added	`2.3.0`

OFFLOAD_NOT_NULL_PROPAGATION¶

Specify how Offload should treat NOT NULL constraints on offloaded columns. A value of AUTO will propagate all RDBMS NOT NULL constraints to the backend and a value of NONE will not propagate any NOT NULL constraints columns to the backend table. Only applies to Google BigQuery, Snowflake or Azure Synapse Analytics backends. The --not-null-columns option can be used to override this global setting, allowing a specific list of columns to be defined as NOT NULL for an individual offload.

Default Value	`AUTO`
Supported Values	`AUTO\|NONE`
Version Added	`4.3.4`

OFFLOAD_SORT_ENABLED¶

Enables the sorting/clustering of data when inserting in to the final destination table. Columns used for sorting/clustering are specified using --sort-columns.

Associated Option	`--offload-sort-enabled`
Supported Values	`true\|false`
Version Added	`2.7.0`

OFFLOAD_STAGING_FORMAT¶

Staging file format to use when staging offloaded data for loading into Snowflake.

Default value	`PARQUET`
Supported Values	`AVRO\|PARQUET`
Version Added	`4.1.0`

OFFLOAD_TRANSPORT¶

Method used to transport data from an RDBMS frontend to a backend system. AUTO selects the optimal method based on configuration and table structure.

Associated Option	`--offload-transport`
Supported Values	`AUTO\|GLUENT\|SQOOP`
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_AUTH_USING_ORACLE_WALLET¶

Instruct Offload that RDBMS authentication is via an Oracle Wallet. The wallet location should be configured using Hadoop configuration appropriate to method used for data transport. See SQOOP_OVERRIDES and OFFLOAD_TRANSPORT_SPARK_PROPERTIES for examples.

Supported Values	`true\|false`
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_CMD_HOST¶

An override for HDFS_CMD_HOST when running shell based Offload Transport commands such as Sqoop or Spark Submit.

Associated Option	`--offload-transport-cmd-host`
Supported Values	Hostname or IP address of HDFS host
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_CONSISTENT_READ¶

Control whether parallel data transport tasks should use a consistent point in time when reading RDBMS data.

Associated Option	`--offload-transport-consistent-read`
Supported Values	`true\|false`
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH¶

The credential provider path to be used in conjunction with OFFLOAD_TRANSPORT_PASSWORD_ALIAS. Integration with Hadoop Credential Provider API is only supported by Sqoop, Spark Submit and Livy based Offload Transport.

Supported Values	A valid HDFS path
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_DSN¶

Database connection details for Offload Transport if different to ORA_CONN.

Associated Option	`--offload-transport-dsn`
Supported Values	`<hostname>:<port>/<service>`
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_FETCH_SIZE¶

Number of records to fetch in a single batch from the RDBMS during Offload. Offload Transport may encounter memory pressure if a table is very wide (e.g. contains LOB columns) and there are lots of records in a batch. Reducing the fetch size can alleviate this if more memory cannot be allocated.

Associated Option	`--offload-transport-fetch-size`
Supported Values	Positive integers
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_LIVY_API_VERIFY_SSL¶

Used to enable SSL for Livy API calls. There are 4 states:

Empty: Do not use SSL.

TRUE: Use SSL and verify Hadoop certificate against known certificates.

FALSE: Use SSL and do not verify Hadoop certificate.

/some/path/here/cert-bundle.crt: Use SSL and verify Hadoop certificate against path to certificate bundle.

Supported Values	Empty, `true\|false` , `<path to certificate bundle>`
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_LIVY_API_URL¶

URL for Livy/Spark REST API in the format http://fqdn-n.example.com:port. https can be used in place of http.

Associated Option	`--offload-transport-livy-api-url`
Supported Values	Valid Livy REST API URL
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_LIVY_IDLE_SESSION_TIMEOUT¶

Timeout (in seconds) for idle Spark client sessions created in Livy.

Associated Option	`--offload-transport-livy-idle-session-timeout`
Supported Values	Positive integers
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_LIVY_MAX_SESSIONS¶

Limits the number of Livy sessions Offload will create. Sessions are re-used when idle. New sessions are only created when no idle sessions are available.

Associated Option	`--offload-transport-livy-max-sessions`
Supported Values	Positive integers
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_PARALLELISM¶

The number of parallel streams to be used when transporting data from the source RDBMS to the backend.

Associated Option	`--offload-transport-parallelism`
Supported Values	Positive integers
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_PASSWORD_ALIAS¶

An alias provided by Hadoop Credential Provider API to be used for RDBMS authentication during Offload Transport. The key store containing the alias must be specified in either OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH or in Hadoop configuration Path (hadoop.security.credential.provider.path).

Associated Option	`--offload-transport-password-alias`
Supported Values	Valid Hadoop Credential Provider API alias
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_RDBMS_SESSION_PARAMETERS¶

Key/value pairs, in JSON format, to supply database session parameter values. These only take effect during Offload Transport, e.g. '{"cell_offload_processing": "false"}'

Supported Values	Valid JSON string of key/value pairs (no nested or complex data types)
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_SMALL_TABLE_THRESHOLD¶

Threshold above which Query Import is no longer considered the correct offload choice for non-partitioned tables.

Supported Values	E.g. `100M`, `1G`, `1.5G`
Version Added	`4.2.0`

OFFLOAD_TRANSPORT_SPARK_OVERRIDES¶

Override JVM flags for a spark-submit command, inserted immediately after spark-submit.

Associated Option	`--offload-transport-jvm-overrides`
Supported Values	Valid JVM options
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_SPARK_PROPERTIES¶

Key/value pairs, in JSON format, to override Spark property defaults. Examples:

'{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}'

'{"spark.driver.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet", "spark.executor.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet"}'

Associated Option	`--offload-transport-spark-properties`
Supported Values	Valid JSON string of key/value pairs (no nested or complex data types)
Version Added	`3.1.0`

Note

Some properties will not take effect when connecting to the Spark Thrift Server because the Spark context has already been created.

OFFLOAD_TRANSPORT_SPARK_QUEUE_NAME¶

YARN queue name for Gluent Offload Engine Spark jobs.

Associated Option	`--offload-transport-queue-name`
Supported Values	Valid YARN queue name
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_SPARK_SUBMIT_EXECUTABLE¶

The executable to use for submitting Spark applications. Can be empty, spark-submit or spark2-submit.

Supported Values	Blank or `spark-submit\|spark2-submit`
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_SPARK_SUBMIT_MASTER_URL¶

The master URL for the Spark cluster, only used for non-Hadoop Spark clusters. If empty, Spark will use default settings.

Associated Option	None
Supported Values	Valid master URL
Version Added	`4.0.0`

OFFLOAD_TRANSPORT_SPARK_THRIFT_HOST¶

Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3.

Associated Option	`--offload-transport-spark-thrift-host`
Supported Values	Hostname or IP address of Spark Thrift Server host(s)
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_SPARK_THRIFT_PORT¶

Port that the Spark Thrift Server is listening on.

Associated Option	`--offload-transport-spark-thrift-port`
Supported Values	Active port
Version Added	`3.1.0`

OFFLOAD_TRANSPORT_USER¶

User to authenticate as when executing Offload Transport commands such as SSH for spark-submit or Sqoop commands, or Livy API calls

Associated Option	None
Supported Values	Valid username
Version Added	`4.0.0`

OFFLOAD_TRANSPORT_VALIDATION_POLLING_INTERVAL¶

Polling interval in seconds for validation of Spark transport row count. A value of -1 disables retrieval of RDBMS SQL statistics. A value of 0 disables polling resulting in a single capture of SQL statistics after Offload Transport. A value greater than 0 polls RDBMS SQL statistics using the specified interval.

Associated Option	`--offload-transport-validation-polling-interval`
Supported Values	Interval value in seconds, `0` or `-1`
Version Added	`4.2.1`

Note

When the Spark Thrift Server or Apache Livy are used for Offload Transport it is recommended to set OFFLOAD_TRANSPORT_VALIDATION_POLLING_INTERVAL to a positive value. This is because polling RDBMS SQL statistics is the primary validation for both Spark Thrift Server and Apache Livy based Offload Transport.

OFFLOAD_UDF_DB¶

For Impala/Hive, the database that Gluent Data Platform UDFs are created in. If undefined defaults to the default database.

For BigQuery, the name of the dataset that contains custom UDF(s) for synthetic partitioning. If undefined, the dataset will be determined from the --partition-functions option.

Supported Values	Valid Impala/Hive database or BigQuery dataset
Version Added	`2.3.0`

OFFLOAD_VERIFY_PARALLELISM¶

Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.

Associated Option	`--verify-parallelism` `--frontend-parallelism`
Supported Values	`0` and positive integers
Version Added	`4.2.1`

ORA_ADM_CONN¶

Connection string (typically tnsnames.ora entry) for ORA_ADM_USER connections. Primarily for use with Oracle Wallet as each entry requires a unique connection string.

Supported Values	Connection string corresponding to the Oracle Wallet entry for `ORA_ADM_USER`
Version Added	`4.2.0`

ORA_ADM_PASS¶

Password of the Gluent Data Platform Admin Schema chosen during installation. Password encryption is supported using the Password Tool utility.

Supported Values	Oracle Database ADM password
Version Added	`2.3.0`

ORA_ADM_USER¶

Name of the Gluent Data Platform Admin Schema chosen during installation.

Supported Values	Oracle Database ADM username
Version Added	`2.3.0`

ORA_APP_PASS¶

Password of the Gluent Data Platform Application Schema chosen during installation. Password encryption is supported using the Password Tool utility.

Supported Values	Oracle Database APP password
Version Added	`2.3.0`

ORA_APP_USER¶

Name of the Gluent Data Platform Application Schema chosen during installation.

Supported Values	Oracle Database APP username
Version Added	`2.3.0`

ORA_CONN¶

Oracle Database connection details. Fully qualified DB service name must be used if the Oracle Database service name includes domain-names (DB_DOMAIN), e.g. ORCL12.gluent.com.

Supported Values	`<hostname>:<port>/<service>`
Version Added	`2.3.0`

ORA_REPO_USER¶

Name of the Gluent Data Platform Repository Schema chosen during installation.

Supported Values	Oracle Database REPO username
Version Added	`3.3.0`

PASSWORD_KEY_FILE¶

Password key file generated by Password Tool and used to create encrypted password strings.

Supported Values	Path to Password Key File
Version Added	`2.5.0`

PATH¶

Ensures Gluent Data Platform bin directory is included. The path order is important to ensure that the Python distribution included with Gluent Data Platform is used.

Supported Values	Valid paths
Version Added	`2.3.0`

QUERY_ENGINE¶

Backend SQL engine to use for commands issued as part of Offload/Present orchestration.

Supported Values	`BIGQUERY\|IMPALA\|SNOWFLAKE\|SYNAPSE`
Version Added	`2.3.0`

QUERY_MONITOR_THRESHOLD¶

Threshold for hybrid query execution time (in seconds) that enables automatic monitoring of a query in the backend. Queries with Data Daemon execution time below this threshold will not gather any backend trace metrics or profiles. A value of 0 will enable automatic trace/profile collection for all hybrid queries. Individual hybrid queries can have trace enabled or disabled with the GLUENT_QUERY_MONITOR or GLUENT_NO_QUERY_MONITOR hints, respectively.

Supported Values	Integers >= 0
Version Added	`4.3.2`

SNOWFLAKE_ACCOUNT¶

Name of the Snowflake account to use with Gluent Data Platform.

Supported Values	Snowflake account name
Version Added	`4.1.0`

SNOWFLAKE_DATABASE¶

Name of the Snowflake database to use with Gluent Data Platform.

Supported Values	Snowflake database name
Version Added	`4.1.0`

SNOWFLAKE_FILE_FORMAT_PREFIX¶

Name prefix for Gluent Offload Engine to use when creating file format objects while offloading to Snowflake.

Default Value	`GLUENT_OFFLOAD_FILE_FORMAT`
Supported Values	Valid Snowflake file format object name <= 120 characters
Version Added	`4.1.0`

SNOWFLAKE_INTEGRATION¶

Name of the Snowflake storage integration for Gluent Offload Engine to use when offloading to Snowflake.

Supported Values	Valid Snowflake integration name
Version Added	`4.1.0`

SNOWFLAKE_PASS¶

Password for Snowflake service account user for Gluent Data Platform, required when using password authentication. Password encryption is supported using the Password Tool utility.

Supported Values	Snowflake user’s password
Version Added	`4.1.0`

SNOWFLAKE_PEM_FILE¶

Path to private PEM file for Snowflake service account user for Gluent Data Platform, required when using key-pair authentication.

Supported Values	Path to Snowflake user’s private PEM key file
Version Added	`4.1.0`

SNOWFLAKE_PEM_PASSPHRASE¶

Optional PEM passphrase to authenticate the Snowflake service account user for Gluent Data Platform, only required when using key-pair authentication with a passphrase. Passphrase encryption is supported using the Password Tool utility.

Supported Values	Snowflake user’s PEM passphrase
Version Added	`4.1.0`

SNOWFLAKE_ROLE¶

Name of the Snowflake database role created by Gluent Data Platform.

Default Value	`GLUENT_OFFLOAD_ROLE`
Supported Values	Valid Snowflake role name
Version Added	`4.1.0`

SNOWFLAKE_STAGE¶

Name for Gluent Offload Engine to use when creating schema-level stage objects while offloading to Snowflake.

Default Value	`GLUENT_OFFLOAD_STAGE`
Supported Values	Valid Snowflake stage name
Version Added	`4.1.0`

SNOWFLAKE_USER¶

Name of the Snowflake service account user for Gluent Data Platform.

Supported Values	Valid Snowflake user name
Version Added	`4.1.0`

SNOWFLAKE_WAREHOUSE¶

Default Snowflake warehouse for Gluent Data Platform to use when interacting with Snowflake.

Supported Values	Valid Snowflake warehouse name
Version Added	`4.1.0`

SPARK_HISTORY_SERVER¶

SPARK_HISTORY_SERVER is a URL for accessing the runtime history of the running Spark Thrift Server UI.

Supported Values	URL of Spark History Server e.g. `http://hadoop1:18081/`
Version Added	`3.1.0`

SPARK_THRIFT_HOST¶

Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3.

Supported Values	Hostname or IP address of Spark Thrift Server host(s)
Version Added	`3.1.0`

SPARK_THRIFT_PORT¶

Port that the Spark Thrift Server is listening on.

Supported Values	Active port
Version Added	`3.1.0`

SQOOP_DISABLE_DIRECT¶

It is recommended that the OraOOP optimizations for Sqoop (included in standard Apache Sqoop from v1.4.5) are used. If not, then disable direct path mode.

Associated Option	`--sqoop-disable-direct`
Supported Values	`true\|false`
Version Added	`2.3.0`

SQOOP_OVERRIDES¶

Override flags for Sqoop command, inserted immediately after sqoop import. To avoid issues, -Dsqoop.avro.logical_types.decimal.enable=false is included by default and should not be removed. Additional settings can be added as below. For example:

"-Dsqoop.avro.logical_types.decimal.enable=false -Dmapreduce.map.java.opts='-Doracle.net.wallet_location=/some/path/here/gluent_wallet'"

Associated Option	`--offload-transport-jvm-overrides`
Supported Values	Valid Sqoop parameters
Version Added	`2.3.0`

SQOOP_ADDITIONAL_OPTIONS¶

Additional Sqoop command options added at the end of the Sqoop command.

Associated Option	`--sqoop-additional-options`
Supported Values	Any Sqoop command option/argument not already included in the Sqoop command line
Version Added	`2.9.0`

SQOOP_PASSWORD_FILE¶

HDFS path to Sqoop password file, readable by HADOOP_SSH_USER. If not specified, ORA_APP_PASS will be used.

Associated Option	`--sqoop-password-file`
Supported Values	HDFS path to password file
Version Added	`2.5.0`

SQOOP_QUEUE_NAME¶

YARN queue name for Gluent Offload Engine Sqoop jobs.

Associated Option	`--offload-transport-queue-name`
Supported Values	Valid YARN queue name
Version Added	`3.1.0`

SSL_ACTIVE¶

Set to true when Impala/Hive uses SSL/TLS encryption.

Supported Values	`true\|false`
Version Added	`2.3.0`

SSL_TRUSTED_CERTS¶

SSL/TLS trusted certificates.

Supported Values	Path to SSL certificate
Version Added	`2.3.0`

START_OF_WEEK¶

Specify the first day of the week for TO_CHAR(<value>, 'D') predicate pushdown. Applies to Snowflake and Azure Synapse Analytics.

Default Value	`7`
Supported Values	`1` (Monday) to `7` (Sunday)
Version Added	`4.3.0`

SYNAPSE_AUTH_MECHANISM¶

Azure Synapse Analytics authentication mechanism.

Supported Values	`SqlPassword`, `ActiveDirectoryPassword`, `ActiveDirectoryMsi`, `ActiveDirectoryServicePrincipal`
Version Added	`4.3.0`

SYNAPSE_COLLATION¶

Azure Synapse Analytics collation to use for character columns. Note that changing this to a value with different behavior to the frontend system may give unexpected results.

Supported Values	Valid collations
Version Added	`4.3.0`

SYNAPSE_DATA_SOURCE¶

Name of the external data source for Gluent Offload Engine to use when offloading to Azure Synapse Analytics. Note that in databases with case-sensitive collations this parameter is case-sensitive.

Supported Values	Valid Azure Synapse Analytics external data source
Version Added	`4.3.0`

SYNAPSE_DATABASE¶

Name of the Azure Synapse Analytics database to use with Gluent Data Platform. Note that in databases with case-sensitive collations this parameter is case-sensitive.

Supported Values	Valid Azure Synapse Analytics database name
Version Added	`4.3.0`

SYNAPSE_FILE_FORMAT¶

Name of the file format for Gluent Offload Engine to use when offloading to Azure Synapse Analytics. Note that in databases with case-sensitive collations this parameter is case-sensitive.

Supported Values	Valid Azure Synapse Analytics Parquet file format
Version Added	`4.3.0`

SYNAPSE_MSI_CLIENT_ID¶

Specifies the object (principal) ID of the identity for ActiveDirectoryMsi authentication with a user-assigned identity. Leave blank when using other authentication mechanisms.

Supported Values	Object (principal) ID of the identity
Version Added	`4.3.0`

SYNAPSE_PASS¶

Specifies the password for the Gluent Data Platform user for SqlPassword or ActiveDirectoryPassword authentication. Leave blank when using other authentication mechanisms. Password encryption is supported using the Password Tool utility.

Supported Values	Azure Synapse Analytics user’s password
Version Added	`4.3.0`

SYNAPSE_PORT¶

Dedicated SQL endpoint port of Azure Synapse Analytics workspace.

Default Value	`1433`
Supported Values	Valid port
Version Added	`4.3.0`

SYNAPSE_RESOURCE_GROUP¶

Resource group of Azure Synapse Analytics workspace.

Supported Values	Valid Azure Synapse Analytics resource group
Version Added	`4.3.0`

SYNAPSE_ROLE¶

Name of the Azure Synapse Analytics database role assigned to the Gluent Data Platform user. Note that in databases with case-sensitive collations this parameter is case-sensitive.

Supported Values	Valid Azure Synapse Analytics role name
Version Added	`4.3.0`

SYNAPSE_SERVER¶

Dedicated SQL endpoint of Azure Synapse Analytics workspace.

Supported Values	Valid Azure Synapse Analytics dedicated SQL endpoint
Version Added	`4.3.0`

SYNAPSE_SERVICE_PRINCIPAL_ID¶

Specifies the application (client) ID for ActiveDirectoryServicePrincipal authentication. Leave blank when using other authentication mechanisms.

Supported Values	Application (client) ID
Version Added	`4.3.0`

SYNAPSE_SERVICE_PRINCIPAL_SECRET¶

Specifies the client secret for ActiveDirectoryServicePrincipal authentication. Leave blank when using other authentication mechanisms.

Supported Values	Client secret
Version Added	`4.3.0`

SYNAPSE_SUBSCRIPTION_ID¶

ID of the subscription containing the Azure Synapse Analytics workspace.

Supported Values	Valid Azure Synapse Analytics resource group
Version Added	`4.3.0`

SYNAPSE_USER¶

Specifies the username for the Gluent Data Platform user for SqlPassword or ActiveDirectoryPassword authentication. Leave blank when using other authentication mechanisms.

Supported Values	Azure Synapse Analytics username
Version Added	`4.3.0`

SYNAPSE_WORKSPACE¶

Name of the Azure Synapse Analytics workspace.

Supported Values	Valid Azure Synapse Analytics workspace
Version Added	`4.3.0`

TWO_TASK¶

Used to support Pluggable Databases in Oracle Database Multitenant environments. Set to ORA_CONN for single instance and an EZconnect string connecting to the local instance, typically <hostname>:<port>/<ORACLE_SID>, for Oracle RAC (Real Application Clusters).

Supported Values	`ORA_CONN` or EZconnect string
Version Added	`2.10.0`

USE_ORACLE_WALLET¶

Controls use of Oracle Wallet for authentication for orchestration commands and Metadata Daemon. When set to true OFFLOAD_TRANSPORT_AUTH_USING_ORACLE_WALLET is automatically set to true.

Default Value	`false`
Supported Values	`true\|false`
Version Added	`4.2.0`

WEBHDFS_HOST¶

Can be used in conjunction with WEBHDFS_PORT to optimize HDFS activities removing JVM start-up overhead by utilizing WebHDFS. From version 2.4.7 the value can be a comma-separated list of hosts if HDFS is configured for High Availability.

Supported Values	Hostname or IP address of WebHDFS host
Version Added	`2.3.0`

WEBHDFS_PORT¶

Can be used in conjunction with WEBHDFS_HOST to optimize HDFS activities removing JVM start-up overhead by utilizing WebHDFS. If this value is unset then default ports of 50070 (HTTP) or 50470 (HTTPS) are used.

Default Value	`50070\|50470`
Supported Values	Port of HDFS namenode
Version Added	`2.3.0`

WEBHDFS_VERIFY_SSL¶

Used to enable SSL for WebHDFS calls. There are 4 states:

Empty: Do not use SSL
TRUE: Use SSL & verify Hadoop certificate against known certificates
FALSE: Use SSL & do not verify Hadoop certificate
/some/path/here/cert-bundle.crt: Use SSL & verify Hadoop certificate against path to certificate bundle

Supported Values	Empty, `true\|false`, `<path to certificate bundle>`
Version Added	`2.3.0`

Common Parameters ¶

--execute¶

Perform operations, rather than just printing.

Alias	`-x`
Default Value	None
Supported Values	None
Version Added	`2.3.0`

-f¶

Force option. Replace Gluent Offload Engine managed tables/views as required. Use with caution.

Alias	`--force`
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--force¶

Force option. Replace Gluent Offload Engine managed tables/views as required. Use with caution.

Alias	`-f`
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--no-webhdfs¶

Prevent the use of WebHDFS even when configured for use.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

-t¶

Owner and table name.

Alias	`--table`
Default Value	None
Supported Values	`<OWNER>.<NAME>`
Version Added	`2.3.0`

--table¶

Owner and table name.

Alias	`-t`
Default Value	None
Supported Values	`<OWNER>.<NAME>`
Version Added	`2.3.0`

--target-name¶

Override owner and/or name of created frontend or backend object as appropriate for a command.

Allows separation of the RDBMS owner and/or name from the backend system. This can be necessary as some characters supported for owner and name in Oracle Database are not supported in all backend systems, for example $ in Hadoop-based or BigQuery backends.

Allows offload to an existing backend database with a different name to the source RDBMS schema.

Allows present to a hybrid schema without a corresponding application RDBMS schema or with a different name to the source backend database.

Alias	None
Default Value	None
Supported Values	`<OWNER>.<NAME>`
Version Added	`2.3.0`

-v¶

Verbose output.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--vv¶

More verbose output.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

-x¶

Perform operations, rather than just printing.

Alias	`--execute`
Default Value	None
Supported Values	None
Version Added	`2.3.0`

Connect Parameters ¶

--create-sequence-table¶

Create the Gluent Data Platform sequence table. See IN_LIST_JOIN_TABLE and IN_LIST_JOIN_TABLE_SIZE.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.4.2`

--install-udfs¶

Install Gluent Data Platform user-defined functions (UDFs).

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--sequence-table-name¶

See IN_LIST_JOIN_TABLE.

Alias	None
Default Value	`default.gluent_sequence`
Supported Values	Valid database and table name
Version Added	`2.4.2`

--sequence-table-size¶

See IN_LIST_JOIN_TABLE_SIZE.

Alias	None
Default Value	`10000`
Supported Values	Up to 1000000
Version Added	`2.4.2`

--sql-file¶

Write SQL commands to a file rather than execute them when connect is run.

Alias	None
Default Value	None
Supported Values	Any valid path
Version Added	`2.11.0`

--update-root-files¶

Updates both Metadata Daemon and Data Daemon scripts with configuration and sets ownership to root:root. This option can only be run with root privileges.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.11.0`

--update-metad-files¶

Updates Metadata Daemon scripts with configuration and sets ownership to root:root. This option can only be run with root privileges.

Alias	None
Default Value	None
Supported Values	None
Version Added	`4.0.0`

--update-datad-files¶

Updates Data Daemon scripts with configuration and sets ownership to root:root. This option can only be run with root privileges.

Alias	None
Default Value	None
Supported Values	None
Version Added	`4.0.0`

--upgrade-environment-file¶

Updates configuration file (offload.env) with any missing default configuration from offload.env.template. Typically used after upgrades.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.11.0`

--validate-udfs¶

Validate that the Gluent Data Platform user-defined functions (UDFs) are accessible from Impala after installation/upgrade.

Alias	None
Default Value	None
Supported Values	None
Version Added	`4.1.0`

Offload Parameters ¶

--allow-decimal-scale-rounding¶

Confirm that it is acceptable for Offload to round decimal places when loading data into a backend system.

Alias	None
Default Value	None
Supported Values	None
Version Added	`4.0.0`

--allow-floating-point-conversions¶

Confirm that it is acceptable for Offload to convert NaN or Infinity special values to NULL when loading data into a backend system.

Alias	None
Default Value	None
Supported Values	None
Version Added	`4.3.0`

--allow-nanosecond-timestamp-columns¶

Confirm that it is safe to offload timestamp columns with nanosecond capability when the backend system does not support nanoseconds.

Alias	None
Default Value	None
Supported Values	None
Version Added	`4.0.2`

--bucket-hash-column¶

Column to use when calculating offload bucket values.

Alias	None
Default Value	None
Supported Values	Valid column name
Version Added	`2.3.0`

--compress-load-table¶

Compress the contents of the load table during offload.

Alias	None
Default Value	`OFFLOAD_COMPRESS_LOAD_TABLE`, `false`
Supported Values	None
Version Added	`2.3.0`

--compute-load-table-stats¶

Compute statistics on the load table during offload. Applicable to Impala.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.9.0`

--create-backend-db¶

Automatically create backend databases. Either use this option, or ensure the correct databases/datasets/schemas (base and load databases) for offloading and presenting already exist.

Alias	None
Default Value	None
Supported Values	None
Version Added	`3.3.0`

--count-star-expressions¶

CSV list of functional equivalents to COUNT(*) for aggregation pushdown.

If you also use COUNT(x) in your SQL statements then, apart from COUNT(1) which is automatically catered for, the presence of COUNT(x) will cause rewrite rules to fail unless you include it with this parameter.

Alias	None
Default Value	None
Supported Values	E.g. `COUNT(9)`
Version Added	`2.3.0`

--data-governance-custom-properties¶

CSV list of free-format tags for data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_PROPERTIES and will override DATA_GOVERNANCE_CUSTOM_PROPERTIES.

Alias	None
Default Value	`DATA_GOVERNANCE_CUSTOM_PROPERTIES`
Supported Values	Valid JSON string of key/value pairs (no nested or complex data types)
Version Added	`2.11.0`

--data-governance-custom-tags¶

CSV list of free-format tags for data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_TAGS and therefore useful for tags to be applied to specific activities.

Alias	None
Default Value	`DATA_GOVERNANCE_CUSTOM_TAGS`
Supported Values	E.g. `CONFIDENTIAL,TIER1`
Version Added	`2.11.0`

--data-sample-parallelism¶

Degree of parallelism to use when sampling data for all columns in the source RDBMS table that are either date or timestamp-based or defined as a number without a precision and scale. A value of 0 or 1 disables parallelism.

Alias	None
Default Value	`DATA_SAMPLE_PARALLELISM`
Supported Values	`0` and positive integers
Version Added	`4.2.0`

--data-sample-percent¶

Sample data for all columns in the source RDBMS table that are either date or timestamp-based or defined as a number without a precision and scale. A value of 0 disables sampling. A value of AUTO will enable Offload choose a percentage based on the size of the RDBMS table.

Alias	None
Default Value	`AUTO`
Supported Values	`AUTO` or `0`-`100`
Version Added	`2.5.0`

--date-columns¶

CSV list of columns to offload as DATE (effective for date/timestamp columns).

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`4.0.0`

--db-name-prefix¶

Multitenant support, enabling many Oracle Database databases to offload to the same backend cluster. See DB_NAME_PREFIX for details.

Alias	None
Default Value	`DB_NAME_PREFIX`
Supported Values	Supported backend characters
Version Added	`2.3.0`

--decimal-columns¶

CSV list of columns to offload/present as a fixed precision and scale numeric data type. For example DECIMAL(p,s) where “p,s” is specified in a paired --decimal-columns-type option. Only effective for numeric columns. These options allow repeat inclusion for flexible data type specification, for example:

"--decimal-columns-type=18,2 --decimal-columns=price,cost --decimal-columns-type=6,4 --decimal-columns=location"

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`2.5.0`

--decimal-columns-type¶

State the precision and scale of columns listed in a paired --decimal-columns option. Must be of format “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision. e.g.:

"--decimal-columns-type=18,2"

When offloading, values specified in this option are subject to padding as per the --decimal-padding-digits option.

Alias	None
Default Value	None
Supported Values	Valid “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision
Version Added	`2.5.0`

--decimal-padding-digits¶

Padding to apply to precision and scale of DECIMALs during an offload.

Alias	None
Default Value	2
Supported Values	Integral values
Version Added	`2.5.0`

--double-columns¶

CSV list of columns to store as a double precision floating-point. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`2.4.7`

--equal-to-values¶

Used for list-partitioned tables to specify a partition to be included for Partition-Based Offload by partition key value. This option can be included multiple times to match multiple partitions, for example:

--equal-to-values=2011 --equal-to-values=2012 --equal-to-values=2013

Alias	None
Default Value	None
Supported Values	Valid literals matching list-partition key values
Version Added	`3.3.0`

--ext-table-degree¶

Default degree of parallelism for base hybrid external tables. When set to AUTO Offload will copy settings from the source RDBMS table to the hybrid external table.

Alias	None
Default Value	`HYBRID_EXT_TABLE_DEGREE` or `AUTO`
Supported Values	`AUTO` and positive integers
Version Added	`2.11.2`

--hdfs-data¶

Command line override for HDFS_DATA.

Alias	None
Default Value	`HDFS_DATA`
Supported Values	Valid HDFS path
Version Added	`2.3.0`

--hdfs-db-path-suffix¶

Hadoop databases are named <schema><HDFS_DB_PATH_SUFFIX> and <schema>_load<HDFS_DB_PATH_SUFFIX>. When this value is not set the suffix of the databases defaults to .db, giving <schema>.db and <schema>_load.db. Set this to an empty string to use no suffix. For backend systems other than Hadoop this variable has no effect.

Alias	None
Default Value	`HDFS_DB_PATH_SUFFIX`, `.db` on Hadoop systems, or `''` on other backend systems.
Supported Values	Valid HDFS path
Version Added	`2.3.0`

--hive-column-stats¶

Enable computation of column stats with “NATIVE” --offload-stats method. Applies to Hive only.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.6.1`

--integer-1-columns¶

CSV list of columns to offload/present (as applicable) as a 1-byte integer, known as TINYINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--integer-2-columns¶

CSV list of columns to offload/present (as applicable) as a 2-byte integer, known as SMALLINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--integer-4-columns¶

CSV list of columns to offload/present (as applicable) as a 4-byte integer, known as INT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--integer-8-columns¶

CSV list of columns to offload/present (as applicable) as an 8-byte integer, known as BIGINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--integer-38-columns¶

CSV list of columns to offload/present (as applicable) as 38 digit integral column. If a system does not support 38 digits of precision then the most appropriate data type available will be used. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--less-than-value¶

Offload partitions with high water mark less than this value.

Alias	None
Default Value	None
Supported Values	Integer or date values (use `YYYY-MM-DD` format)
Version Added	`2.3.0`

--lob-data-length¶

Expected length of RDBMS LOB data

Alias	None
Default Value	`32K`
Supported Values	E.g. `64K`, `10M`
Version Added	`2.4.7`

--max-offload-chunk-count¶

Restrict the number of partitions offloaded per cycle. See Offload Transport Chunks for usage.

Alias	None
Default Value	`MAX_OFFLOAD_CHUNK_COUNT`, `100`
Supported Values	`1`-`1000`
Version Added	`2.3.0`

--max-offload-chunk-size¶

Restrict the size of partitions offloaded per cycle. See Offload Transport Chunks for usage.

Alias	None
Default Value	`MAX_OFFLOAD_CHUNK_SIZE`, `2G`
Supported Values	E.g. `100M`, `1G`, `1.5G`
Version Added	`2.3.0`

--no-auto-detect-dates¶

Turn off automatic adoption of string data type for RDBMS date values that are incompatible with the backend system. For example, dates preceding 1400-01-01 are invalid in Impala and will be offloaded to string columns unless this option is used.

Alias	None
Default Value	False
Supported Values	None
Version Added	`2.5.1`

--no-auto-detect-numbers¶

Turn off automatic adoption of numeric data types based on their precision and scale in the RDBMS. All numeric data types will be offloaded to a general purpose data type such as DECIMAL(38,18) on Hadoop systems, NUMERIC or BIGNUMERIC on Google BigQuery or NUMBER(38,18) on Snowflake.

Alias	None
Default Value	False
Supported Values	None
Version Added	`2.3.0`

--no-create-aggregations¶

Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--no-generate-dependent-views¶

Dependent views will not be automatically re-generated in the hybrid schema.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--no-materialize-join¶

Offload a join (specified by --offload-join) as a view.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--no-modify-hybrid-view¶

Prevent an offload predicate from being added to the boundary conditions in a hybrid view. Can only be used in conjunction with --offload-predicate for --offload-predicate-type values of RANGE, LIST_AS_RANGE, RANGE_AND_PREDICATE or LIST_AS_RANGE_AND_PREDICATE.

Alias	None
Default Value	None
Supported Values	None
Version Added	`3.4.0`

--no-verify¶

Skip the data validation step at the end of an offload.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--not-null-columns¶

Specifies which columns should be created as NOT NULL when offloading a table. Used to override the global OFFLOAD_NOT_NULL_PROPAGATION configuration variable at an offload level. Accepts a CSV list and/or wildcard(s) of valid columns to create as NOT NULL in the backend. Only applies to Google BigQuery, Snowflake or Azure Synapse Analytics backends.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`4.3.4`

--num-buckets¶

Default number of offload buckets (subpartitions) for an offloaded table, allowing parallel data retrieval. A value of AUTO tunes to a value between 1 and DEFAULT_BUCKETS_MAX.

Alias	None
Default Value	`DEFAULT_BUCKETS` or `AUTO`
Supported Values	Integer values or `AUTO`
Version Added	`2.3.0`

--num-location-files¶

Number of external table location files for parallel data retrieval.

Alias	None
Default Value	`NUM_LOCATION_FILES`
Supported Values	Integer values
Version Added	`2.7.2`

Note

When offloading or materializing data in Impala then --num-location-files will be aligned with --num-buckets/DEFAULT_BUCKETS

--offload-by-subpartition¶

Offload a subpartitioned table with Subpartition-Based Offload (i.e. with reference to subpartition keys and high values rather than partition-level information).

Alias	None
Default Value	True for composite partitioned tables that are unsupported for Partition-Based Offload but supported for Subpartition-Based Offload, False for all other tables
Supported Values	None
Version Added	`2.7.0`

--offload-chunk-column¶

Splits load data by this column during insert from the load table to the final table. This can be used to manage memory usage.

Alias	None
Default Value	None
Supported Values	Valid column name
Version Added	`2.3.0`

--offload-chunk-impala-insert-hint¶

Used to inject a hint into the INSERT AS SELECT moving data from load table to final destination. The absence of a value injects no hint. Impala only.

Alias	None
Default Value	None
Supported Values	`SHUFFLE\|NOSHUFFLE`
Version Added	`2.3.0`

--offload-distribute-enabled¶

Distribute data by partition key(s) during the final INSERT operation of an offload. Hive only.

Alias	None
Default Value	`OFFLOAD_DISTRIBUTE_ENABLED`, `true`
Supported Values	None
Version Added	`2.8.0`

--offload-fs-container¶

The name of the bucket or container to be used when offloading to cloud storage.

Alias	None
Default Value	`OFFLOAD_FS_CONTAINER`
Supported Values	A cloud storage bucket/container name configured for use by the backend cluster
Version Added	`3.0.0`

--offload-fs-prefix¶

A directory path used to prefix database locations within OFFLOAD_FS_SCHEME. When OFFLOAD_FS_SCHEME is inherit HDFS_DATA takes precedence over this setting.

Alias	None
Default Value	`OFFLOAD_FS_PREFIX`
Supported Values	A valid directory in HDFS or cloud storage
Version Added	`3.0.0`

--offload-fs-scheme¶

The filesystem scheme to be used for database and table locations. inherit specifies that all tables created by Offload will not specify a LOCATION clause, they will inherit the location from the parent database. See Integrating with Cloud Storage for details.

Alias	None
Default Value	`OFFLOAD_FS_SCHEME`, `inherit`
Supported Values	`inherit`, `hdfs`, `s3a`, `adl`, `abfs`, `abfss`
Version Added	`3.0.0`

--offload-join¶

Offload a materialized view of the supplied join(s), allowing join processing to be offloaded. Repeated use of --offload-join allows multiple row sources to be included. See documentation for syntax details.

Alias	None
Default Value	None
Supported Values	See Offload Join Grammar
Version Added	`2.3.0`

--offload-predicate¶

Specify a predicate to identify a set of data in a table for offload. Can be used to offload all or some of the data in any table type. See documentation for syntax details.

Alias	None
Default Value	None
Supported Values	See Predicate Grammar
Version Added	`3.4.0`

--offload-predicate-type¶

Override the default INCREMENTAL_PREDICATE_TYPE for a partitioned table. Can be used to offload LIST partitioned tables using RANGE logic with an --offload-predicate-type value of LIST_AS_RANGE or used for specialized cases of offloading with Partition-Based Offload and Predicate-Based Offload.

Alias	None
Default Value	None
Supported Values	`LIST`, `LIST_AS_RANGE`, `RANGE`, `RANGE_AND_PREDICATE`, `LIST_AS_RANGE_AND_PREDICATE`, `PREDICATE`
Version Added	`3.3.1`

--offload-sort-enabled¶

Sort/cluster data during the final INSERT operation of an offload. Configure sort/cluster columns using --sort-columns.

Alias	None
Default Value	`OFFLOAD_SORT_ENABLED`, `false`
Supported Values	None
Version Added	`2.7.0`

--offload-stats¶

Method used to manage backend table stats during an Offload, Incremental Update Extraction or Compaction. NATIVE is the default. HISTORY will gather stats on all partitions without stats (applicable to an Offload on Hive only and will automatically be replaced with NATIVE on Impala). COPY will copy table statistics from the RDBMS to an offloaded table if the backend system supports setting of statistics. NONE will prevent Offload from managing stats; for Hive this results in no stats being gathered even if hive.stats.autogather=true is set at the system level.

Alias	None
Default Value	`NATIVE`
Supported Values	`NATIVE\|HISTORY\|COPY\|NONE`
Version Added	`2.4.7` (`HISTORY` added in `2.9.0`)

--offload-transport¶

Method used to transport data from an RDBMS frontend to a backend system. AUTO selects the optimal method based on configuration and table structure.

Alias	None
Default Value	`OFFLOAD_TRANSPORT`, `AUTO`
Supported Values	`AUTO\|GLUENT\|SQOOP`
Version Added	`3.1.0`

--offload-transport-cmd-host¶

An override for HDFS_CMD_HOST when running shell based Offload Transport commands such as Sqoop or Spark Submit.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_CMD_HOST`
Supported Values	Hostname or IP address of HDFS host
Version Added	`3.1.0`

--offload-transport-consistent-read¶

Control whether parallel data transport tasks should use a consistent point in time when reading RDBMS data.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_CONSISTENT_READ`, `true`
Supported Values	`true\|false`
Version Added	`3.1.0`

--offload-transport-dsn¶

Database connection details for Offload Transport if different to ORA_CONN.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_DSN`, `ORA_CONN`
Supported Values	`<hostname>:<port>/<service>`
Version Added	`3.1.0`

--offload-transport-fetch-size¶

Number of records to fetch in a single batch from the RDBMS during Offload. Offload Transport may encounter memory pressure if a table is very wide (e.g. contains LOB columns) and there are lots of records in a batch. Reducing the fetch size can alleviate this if more memory cannot be allocated.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_FETCH_SIZE`, `5000`
Supported Values	Positive integers
Version Added	`3.1.0`

--offload-transport-jvm-overrides¶

JVM overrides (inserted right after sqoop import or spark-submit).

Alias	None
Default Value	`SQOOP_OVERRIDES`, `OFFLOAD_TRANSPORT_SPARK_OVERRIDES`
Supported Values	See `SQOOP_OVERRIDES` and `OFFLOAD_TRANSPORT_SPARK_OVERRIDES`
Version Added	`3.1.0`

--offload-transport-livy-api-url¶

URL for Livy/Spark REST API in the format http://fqdn-n.example.com:port. https can be used in place of http.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_LIVY_API_URL`
Supported Values	Valid Livy REST API URL
Version Added	`3.1.0`

--offload-transport-livy-idle-session-timeout¶

Timeout (in seconds) for idle Spark client sessions created in Livy.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_LIVY_IDLE_SESSION_TIMEOUT`, `600`
Supported Values	Positive integers
Version Added	`3.1.0`

--offload-transport-livy-max-sessions¶

Limits the number of Livy sessions Offload will create. Sessions are re-used when idle. New sessions are only created when no idle sessions are available.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_LIVY_MAX_SESSIONS`, `10`
Supported Values	Positive integers
Version Added	`3.1.0`

--offload-transport-parallelism¶

The number of parallel streams to be used when transporting data from the source RDBMS to the backend.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_PARALLELISM`, `2`
Supported Values	Positive integers
Version Added	`3.1.0`

--offload-transport-password-alias¶

An alias provided by Hadoop Credential Provider API to be used for RDBMS authentication during Offload Transport. The key store containing the alias must be specified in either OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH or in Hadoop configuration Path (hadoop.security.credential.provider.path).

Alias	None
Default Value	`OFFLOAD_TRANSPORT_PASSWORD_ALIAS`
Supported Values	Valid Hadoop Credential Provider API alias
Version Added	`3.1.0`

--offload-transport-queue-name¶

YARN queue name to be used for Offload Transport jobs.

Alias	None
Default Value	`SQOOP_QUEUE_NAME`, `OFFLOAD_TRANSPORT_SPARK_QUEUE_NAME`
Supported Values	See `SQOOP_QUEUE_NAME` and `OFFLOAD_TRANSPORT_SPARK_QUEUE_NAME`
Version Added	`3.1.0`

--offload-transport-small-table-threshold¶

Threshold above which Query Import is no longer considered the correct offload choice for non-partitioned tables.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_SMALL_TABLE_THRESHOLD` or `20M`
Supported Values	E.g. `100M`, `1G`, `1.5G`
Version Added	`3.1.0`

--offload-transport-spark-properties¶

Key/value pairs, in JSON format, to override Spark property defaults. Examples:

'{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}'

'{"spark.driver.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet", "spark.executor.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet"}'

Alias	None
Default Value	`OFFLOAD_TRANSPORT_SPARK_PROPERTIES`
Supported Values	Valid JSON string of key/value pairs (no nested or complex data types)
Version Added	`3.1.0`

--offload-transport-spark-thrift-host¶

Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_SPARK_THRIFT_HOST`
Supported Values	Hostname or IP address of Spark Thrift Server host(s)
Version Added	`3.1.0`

--offload-transport-spark-thrift-port¶

Port that the Spark Thrift Server is listening on.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_SPARK_THRIFT_PORT`
Supported Values	Active port
Version Added	`3.1.0`

--offload-transport-validation-polling-interval¶

Polling interval in seconds for validation of Spark transport row count. A value of -1 disables retrieval of RDBMS SQL statistics. A value of 0 disables polling resulting in a single capture of SQL statistics after Offload Transport. A value greater than 0 polls RDBMS SQL statistics using the specified interval.

Alias	None
Default Value	`OFFLOAD_TRANSPORT_VALIDATION_POLLING_INTERVAL`
Supported Values	Interval value in seconds, `0` or `-1`
Version Added	`4.2.1`

--offload-type¶

Identifies a range-partitioned offload as FULL or INCREMENTAL. FULL dictates that all data is offloaded. INCREMENTAL dictates that data up to a boundary threshold will be offloaded.

Alias	None
Default Value	`INCREMENTAL` for RDBMS tables capable of supporting Partition-Based Offload that are partially offloaded (e.g. using `--older-than-date`). `FULL` for all other offloads.
Supported Values	`FULL\|INCREMENTAL`
Version Added	`2.5.0`

--older-than-date¶

Offload partitions older than this date (use YYYY-MM-DD format). Overrides --older-than-days if both are present.

Alias	None
Default Value	None
Supported Values	Date in `YYYY-MM-DD` format
Version Added	`2.3.0`

--older-than-days¶

Offload partitions older than this number of days (exclusive, i.e. the boundary partition is not offloaded). Suitable for keeping data up to a certain age in the source table. Alternative to --older-than-date option. If both are supplied, --older-than-date will be used.

Alias	None
Default Value	None
Supported Values	Valid number of days
Version Added	`2.3.0`

--partition-columns¶

Override column(s) to use for partitioning backend data. Defaults to source table partition columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`2.3.0`

--partition-digits¶

Maximum digits allowed for a numeric partition value.

Alias	None
Default Value	`15`
Supported Values	Integer values
Version Added	`2.3.0`

--partition-functions¶

Custom UDF to use for synthetic partitioning of offloaded data. Used when no native partitioning scheme exists for the partition column data type. Google BigQuery only.

Alias	None
Default Value	None
Supported Values	Valid custom UDF
Version Added	`4.2.0`

--partition-granularity¶

Partition level/granularity. Use:

Y, M, D for date/timestamp partition columns
Integral size for numeric partitions. A value of 1 is effectively list partitioning
Sub-string length for string partitions

Examples:

M partitions the table by Year-Month
D partitions the table by Year-Month-Day
5000 partitions the table in ranges of 5000 values
1 creates a partition per value, useful for columns holding values such as year and month or categories
2 on a string partition key partitions using the first two characters

Alias	None
Default Value	See Partition Granularity
Supported Values	`Y\|M\|D\|\d+`
Version Added	`2.3.0`

--partition-lower-value¶

Integer value defining the lower bound of a range values used for backend integer range partitioning. BigQuery only.

Alias	None
Default Value	None
Supported Values	Positive integers
Version Added	`4.0.0`

--partition-names¶

Specify partitions to be included for offload with Partition-Based Offload. For range-partitioned tables only a single partition name can be specified and it is used to derive a value for --less-than-value/--older-than-date as appropriate. For list-partitioned tables, this option is used to supply a CSV of all partitions to be offloaded and is additional to any partitions offloaded in previous operations.

Alias	None
Default Value	None
Supported Values	Valid partition name(s)
Version Added	`3.3.0`

--partition-upper-value¶

Integer value defining the upper bound of a range of values used for backend integer range partitioning. BigQuery only.

Alias	None
Default Value	None
Supported Values	Positive integers
Version Added	`4.0.0`

--preserve-load-table¶

Stops the load table being dropped on completion of offload.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--purge¶

When supported by the backend system, utilize purge when removing a table due to --reset-backend-table.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.4.9`

--reset-backend-table¶

Remove the backend table before offloading. Use with caution as this will delete previously offloaded data for this table.

Alias	None
Default Value	None
Supported Values	None
Version Added	`3.3.0`

--reset-hybrid-view¶

Reset Partition-Based Offload, Subpartition-Based Offload or Predicate-Based Offload predicates in the hybrid view.

Alias	None
Default Value	None
Supported Values	None
Version Added	`3.3.0`

--skip-steps¶

Skip given steps. CSV of step IDs to be skipped. Step IDs are found by replacing spaces with underscore and are case-insensitive.

For example, it is possible to skip Impala compute statistics commands using a value of Compute_backend_statistics if an initial offload is being performed in stages, and then gather them with the final offload command.

Alias	None
Default Value	None
Supported Values	Valid offload step names
Version Added	`2.3.0`

--sort-columns¶

CSV list of columns used to sort or cluster data when inserting into the final destination table. Offloads using Partition-Based Offload or Subpartition-Based Offload will retrieve the value used by the prior offload if no list of columns is explicitly provided. This option has no effect when OFFLOAD_SORT_ENABLED/--offload-sort-enabled is false.

When using Offload Join the column names in --sort-columns must match those in the final destination table (not the names used in the source tables).

This option supports the wildcard character * in column names.

Alias	None
Default Value	None for non-partitioned source tables, `--partition-columns` for partitioned source tables
Supported Values	Valid column name(s)
Version Added	`2.7.0`

--sqoop-disable-direct¶

It is recommended that the OraOOP optimizations for Sqoop (included in standard Apache Sqoop from v1.4.5) are used. If not, then disable direct path mode.

Alias	None
Default Value	`SQOOP_DISABLE_DIRECT`, `false`
Supported Values	`true\|false`
Version Added	`2.3.0`

--sqoop-mapreduce-map-java-opts¶

Sqoop specific setting for -Dmapreduce.map.java.opts. Allows control over Java options for Sqoop MapReduce jobs.

Alias	None
Default Value	None
Supported Values	Valid Sqoop Java options
Version Added	`2.3.0`

--sqoop-mapreduce-map-memory-mb¶

Sqoop specific setting for -Dmapreduce.map.memory.mb. Allows control over memory allocation for Sqoop MapReduce jobs.

Alias	None
Default Value	None
Supported Values	Valid numbers in MB
Version Added	`2.3.0`

--sqoop-additional-options¶

Additional Sqoop command options added to the end of the Sqoop command.

Alias	None
Default Value	`SQOOP_ADDITIONAL_OPTIONS`
Supported Values	Any Sqoop command option/argument not already included in the Sqoop command line
Version Added	`2.9.0`

--sqoop-password-file¶

Path to an HDFS file containing ORA_APP_PASS which is then passed to Sqoop using the Sqoop --password-file option. This file should be protected with appropriate file system permissions.

Alias	None
Default Value	`SQOOP_PASSWORD_FILE`
Supported Values	Valid HDFS path
Version Added	`2.5.0`

--storage-compression¶

Storage compression of final offload table. GZIP only available with Parquet. ZLIB only available with ORC.

MED is an alias for SNAPPY on both Impala and Hive. This is the default value because it gives the best balance of elapsed time to compression.

HIGH is an alias for GZIP on Impala, ZLIB on Hive.

Alias	None
Default Value	`MED`
Supported Values	`HIGH\|MED\|NONE\|GZIP\|ZLIB\|SNAPPY`
Version Added	`2.3.0`

--storage-format¶

Storage format of final backend table. Not applicable to Google BigQuery or Snowflake.

Alias	None
Default Value	`PARQUET` for Impala, `ORC` for Hive
Supported Values	`ORC\|PARQUET`
Version Added	`2.3.0`

--timestamp-tz-columns¶

CSV list of columns to offload as a timestamp with time zone (will only be effective for date-based columns).

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`4.0.0`

--udf-db¶

Backend database to use for user-defined functions (UDFs).

Gluent Data Platform UDFs are used in Hadoop-based backends to:

Convert data to Oracle Database binary formats (ORACLE_NUMBER, ORACLE_DATE)
Perform Run-Length Encoding
Handle data conversion functions e.g. UPPER, LOWER

They are installed once during installation, and upgraded, using the connect --install-udfs command.

Custom UDFs can also be created by users in BigQuery and used by Gluent Data Platform for synthetic partitioning. Custom UDFs must be installed prior to running any offload commands that require access to them.

Alias	None
Default Value	`OFFLOAD_UDF_DB`
Supported Values	Valid backend database
Version Added	`2.3.0`

--unicode-string-columns¶

CSV list of columns to Offload as Unicode string (only effective for string columns).

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

4.3.0

--variable-string-columns¶

CSV list of columns to offload as a variable length string. Only effective for date/timestamp columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--verify¶

Validation method to use when verifying data at the end of an offload.

Alias	None
Default Value	`minus`
Supported Values	`minus\|aggregate`
Version Added	`2.3.0`

--verify-parallelism¶

Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.

Alias	None
Default Value	`OFFLOAD_VERIFY_PARALLELISM`
Supported Values	`0` and positive integers
Version Added	`4.2.1`

Present Parameters ¶

--aggregate-by¶

CSV list of columns to aggregate by (GROUP BY) when presenting an Advanced Aggregation Pushdown rule.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`2.3.0`

--base-name¶

For aggregations only. Provide the name of the base hybrid view originally presented before aggregation. Use when the base view name is different to its source backend table.

Alias	None
Default Value	None
Supported Values	`<SCHEMA>.<VIEW_NAME>`
Version Added	`2.3.0`

--binary-columns¶

CSV list of columns to present using a binary data type. Only effective for string-based columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--columns¶

CSV list of columns to present.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`2.3.0`

--count-star-expressions¶

CSV list of functional equivalents to COUNT(*) for aggregation pushdown.

If you also use COUNT(x) in your SQL statements then, apart from COUNT(1) which is automatically catered for, the presence of COUNT(x) will cause rewrite rules to fail unless you include it with this parameter.

Alias	None
Default Value	None
Supported Values	E.g. `COUNT(9)`
Version Added	`2.3.0`

--data-governance-custom-properties¶

CSV list of free-format tags for data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_PROPERTIES and will override DATA_GOVERNANCE_CUSTOM_PROPERTIES.

Alias	None
Default Value	`DATA_GOVERNANCE_CUSTOM_PROPERTIES`
Supported Values	Valid JSON string of key/value pairs (no nested or complex data types)
Version Added	`2.11.0`

--data-governance-custom-tags¶

CSV list of free-format tags for data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_TAGS and therefore useful for tags to be applied to specific activities.

Alias	None
Default Value	`DATA_GOVERNANCE_CUSTOM_TAGS`
Supported Values	E.g. `CONFIDENTIAL,TIER1`
Version Added	`2.11.0`

--date-columns¶

CSV list of columns to present to Oracle Database as DATE (effective for datetime/timestamp columns).

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`2.3.0`

--date-fns¶

CSV list of functions to apply to the non-aggregating date/timestamp projection.

Alias	None
Default Value	`MIN`, `MAX`, `COUNT`
Supported Values	`MIN`, `MAX`, `COUNT`
Version Added	`2.3.0`

--decimal-columns¶

CSV list of columns to offload/present as a fixed precision and scale numeric data type. For example DECIMAL(p,s) where “p,s” is specified in a paired --decimal-columns-type option. Only effective for numeric columns. These options allow repeat inclusion for flexible data type specification, for example:

"--decimal-columns-type=18,2 --decimal-columns=price,cost --decimal-columns-type=6,4 --decimal-columns=location"

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`2.5.0`

--decimal-columns-type¶

State the precision and scale of columns listed in a paired --decimal-columns option. Must be of format “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision. e.g.:

"--decimal-columns-type=18,2"

When offloading, values specified in this option are subject to padding as per the --decimal-padding-digits option.

Alias	None
Default Value	None
Supported Values	Valid “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision
Version Added	`2.5.0`

--detect-sizes¶

Query backend table/view data length and set external table columns sizes accordingly.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--integer-1-columns¶

CSV list of columns to offload/present (as applicable) as a 1-byte integer, known as TINYINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--integer-2-columns¶

CSV list of columns to offload/present (as applicable) as a 2-byte integer, known as SMALLINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--integer-4-columns¶

CSV list of columns to offload/present (as applicable) as a 4-byte integer, known as INT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--integer-8-columns¶

CSV list of columns to offload/present (as applicable) as an 8-byte integer, known as BIGINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--integer-38-columns¶

CSV list of columns to offload/present (as applicable) as 38 digit integral column. If a system does not support 38 digits of precision then the most appropriate data type available will be used. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--interval-ds-columns¶

CSV list of columns to present to Oracle Database as INTERVAL DAY TO SECOND type (will only be effective for backend STRING columns).

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`2.3.0`

--interval-ym-columns¶

CSV list of columns to present to Oracle Database as INTERVAL YEAR TO MONTH type (will only be effective for backend STRING columns).

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`2.3.0`

--large-binary-columns¶

CSV list of columns to present using a large binary data type, for example Oracle Database BLOB. Only effective for string-based columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--large-string-columns¶

CSV list of columns to present as a large string data type, for example Oracle Database CLOB. Only effective for string-based columns.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`3.3.0`

--lob-data-length¶

Expected length of RDBMS LOB data

Alias	None
Default Value	`32K`
Supported Values	E.g. `64K`, `10M`
Version Added	`2.4.7`

--materialize-join¶

Use this option to materialize a join specified using --present-join.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--measures¶

CSV list of aggregated columns to include in the projection of an aggregated present.

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`2.4.0`

--no-create-aggregations¶

Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--no-gather-stats¶

Skip generation of new statistics for presented tables/views (default behavior is to generate statistics for new aggregate/join views or existing backend tables with no statistics).

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

--num-location-files¶

Number of external table location files for parallel data retrieval.

Alias	None
Default Value	`NUM_LOCATION_FILES`
Supported Values	Integer values
Version Added	`2.7.2`

--numeric-fns¶

CSV list of aggregate functions to apply to aggregated numeric columns or measures in an aggregation projection.

Alias	None
Default Value	`MIN`, `MAX`, `AVG`, `SUM`, `COUNT`
Supported Values	`MIN`, `MAX`, `AVG`, `SUM`, `COUNT`
Version Added	`2.3.0`

--present-join¶

Present a view of the supplied join(s) allowing the join processing to be offloaded. Repeated use of --present-join allows multiple row sources to be included. See documentation for syntax.

Alias	None
Default Value	None
Supported Values	See Present Join Grammar
Version Added	`2.3.0`

--reset-backend-table¶

Remove the backend table before offloading. Use with caution as this will delete previously offloaded data for this table.

Alias	None
Default Value	None
Supported Values	None
Version Added	`3.3.0`

--sample-stats¶

Estimate statistics by scanning a few (random) partitions for presented partitioned tables/views, or a percentage of the non-partitioned presented table/view for backends that support row based percentage sampling (default behavior is to scan the entire table).

Alias	None
Default Value	None
Supported Values	`0-100`
Version Added	`2.3.0`

--string-fns¶

CSV list of aggregate functions to apply to aggregated string columns or measures in an aggregation projection.

Alias	None
Default Value	`MIN`, `MAX`, `COUNT`
Supported Values	`MIN`, `MAX`, `COUNT`
Version Added	`2.3.0`

--timestamp-columns¶

CSV list of columns to present as a TIMESTAMP (only effective for date based columns)

This option supports the wildcard character * in column names.

Alias	None
Default Value	None
Supported Values	Valid column name(s)
Version Added	`4.0.0`

--unicode-string-columns¶

CSV list of columns to Present as Unicode string (only effective for string columns).

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

4.3.0

Incremental Update Parameters ¶

--incremental-batch-size¶

Batch (fetch) size to use when extracting changes for shipping from a table that is enabled for Incremental Update.

Alias	None
Default Value	`1000`
Supported Values	Positive integers
Version Added	`2.5.0`

--incremental-changelog-sequence-cache-size¶

Specifies the cache size to use for a sequence coupled to the log table used for Incremental Update extraction.

Alias	None
Default Value	`100`
Supported Values	Positive integers
Version Added	`2.10.0`

--incremental-changelog-table¶

Specifies the name of the log table to use for Incremental Update extraction (format is <OWNER>.<TABLE>). Not required when --incremental-extraction-method is ORA_ROWSCN.

Alias	None
Default Value	`<Hybrid Schema>.<Table Name>_LOG`
Supported Values	`<OWNER>.<TABLE>`
Version Added	`2.5.0`

--incremental-delta-threshold¶

When running the compaction routine for a table enabled for Incremental Update, this threshold denotes the minimum number of changes required to enable the compaction routine to be executed (i.e. compaction will only be executed if there are at least this many rows in the delta table at a given time).

Alias	None
Default Value	`50000`
Supported Values	Positive integers
Version Added	`2.5.0`

--incremental-extraction-method¶

Indicates which change extraction method to use when enabling Incremental Update for a table during an offload.

Alias	None
Default Value	`ORA_ROWSCN`
Supported Values	`ORA_ROWSCN,CHANGELOG,UPDATABLE_CHANGELOG,UPDATABLE,CHANGELOG_INSERT,UPDATABLE_INSERT`
Version Added	`2.5.0`

--incremental-full-compaction¶

When running the compaction routine for a table that has Incremental Update enabled, insert compacted records into a new base table, also known as an out-of-place compaction.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.10.0`

--incremental-key-columns¶

Comma-separated list of columns that uniquely identify rows in an offloaded source table. Columns are used when extracting incremental changes from the source table and applying them to the offloaded table. In the absence of this parameter the primary key of the table is used.

This option supports the wildcard character * in column names.

Alias	None
Default Value	Primary key
Supported Values	Comma-separated list of columns
Version Added	`2.5.0`

--incremental-no-lockfile¶

When running the compaction routine for a table that is enabled for Incremental Update, do not use a lockfile on the local filesystem to prevent multiple compaction processes from running concurrently (on that machine).

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.5.0`

--incremental-no-verify-primary-key¶

Bypass verification of mandatory primary key when using CHANGELOG_INSERT or UPDATABLE_INSERT extraction methods.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.9.0`

Warning

With this option, users must ensure that no duplicate records are inserted.

--incremental-no-verify-shipped¶

Bypass verification of the number of change records shipped when extracting and shipping changes for a table that is enabled for Incremental Update. Not applicable when using Incremental Update with Google BigQuery backends.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.5.0`

--incremental-partition-wise-full-compaction¶

When running the compaction routine for a table that has Incremental Update enabled, insert compacted records into the new base table partition-wise. Note that this may cause the compaction process to take significantly longer overall, but it can also significantly reduce the cluster resources used by compaction at any one time.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.5.0`. Renamed from `--incremental-partition-wise-compaction` in `2.10.0`

--incremental-retain-obsolete-objects¶

Retain the previous artifacts when the compaction routine has completed for a table with Incremental Update enabled.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.5.0`

Warning

With this option, users must manage previous artifacts and associated storage. In some circumstances, retained obsolete objects can cause the re-offloading of entire tables (with the --reset-backend-table option) to fail.

--incremental-run-compaction¶

Run the compaction routine for a table that has Incremental Update enabled. Must be used in conjunction with the --execute parameter.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.5.0`

--incremental-run-compaction-without-snapshot¶

Run the compaction routine for a table without creating an HDFS snapshot.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.10.0`

--incremental-run-extraction¶

Extract and ship all new changes for a table that has Incremental Update enabled. Must be used in conjunction with the --execute parameter.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.5.0`

--incremental-terminate-compaction¶

When running the compaction routine for a table with Incremental Update enabled, instruct the compaction process to exit when blocked by some external condition. By default, the compaction process will keep running when blocked, but will drop into a sleep-then-poll loop.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.5.0`

--incremental-tmp-dir¶

When extracting and shipping changes for a table that has Incremental Update enabled, this specifies the staging directory to be used for local data files, before they are shipped to HDFS.

Alias	None
Default Value	`<OFFLOAD_HOME>/tmp/incremental_changes`
Supported Values	Valid writable directory
Version Added	`2.5.0`

--incremental-updates-disabled¶

Disables Incremental Update for the specified table.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.6.0`

--incremental-updates-enabled¶

Enables Incremental Update for the table being offloaded.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.5.0`

--incremental-wait-time¶

When running the compaction routine for a table that has Incremental Update enabled, this specifies the minimum amount of time (in minutes) to allow for active queries to complete before performing any database operations that could cause such queries to fail.

Alias	None
Default Value	`15`
Supported Values	`0 and positive integers`
Version Added	`2.5.0`

Validate Parameters ¶

--aggregate-functions¶

Comma-separated list of aggregate functions to apply, e.g. MIN,MAX,COUNT. Functions need to be available and use the same arguments in both frontend and backend databases.

Alias	`-A`
Default Value	`[('MIN', 'MAX', 'COUNT')]`
Supported Values	CSV list of expressions
Version Added	`2.3.0`

--as-of-scn¶

Execute validation on frontend site as-of a specified SCN (assumes an ORACLE frontend).

Alias	None
Default Value	None
Supported Values	Valid SCN
Version Added	`2.3.0`

--filters¶

Comma-separated list of (<column> <operation> <value>) expressions, e.g. PROD_ID < 12, CUST_ID >= 1000. Expressions must be supported in both frontend and backend databases.

Alias	`-F`
Default Value	None
Supported Values	CSV list of expressions
Version Added	`2.3.0`

--frontend-parallelism¶

Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.

Alias	None
Default Value	`OFFLOAD_VERIFY_PARALLELISM`
Supported Values	`0` and positive integers
Version Added	`4.2.1`

--group-bys¶

Comma-separated list of group by expressions, e.g. COL1, COL2. Expressions must be supported in both frontend and backend databases.

This option supports the wildcard character * in column names.

Alias	`-G`
Default Value	None
Supported Values	csv list of expressions
Version Added	`2.3.0`

--selects¶

Comma-separated list of columns OR <number> of columns to run aggregations on. If <number> is specified the first and last columns and the <number>-2 highest cardinality columns will be selected.

This option supports the wildcard character * in column names.

Alias	`-S`
Default Value	`5`
Supported Values	CSV list of columns OR <number>
Version Added	`2.3.0`

--skip-boundary-check¶

Do not include ‘offloaded boundary check’ in the list of filters. The ‘offloaded boundary check’ filter defines data that was offloaded to the backend database. For example: WHERE TIME_ID < timestamp '2015-07-01 00:00:00' which resulted from applying the --older-than-date=2015-07-01 filter during offload.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

Schema Sync Parameters ¶

--command-file¶

Name of an additional log file to record the commands that have been applied (if the --execute option has been used) or should be applied (if the --execute option has not been used). Supplied as full or relative path.

Alias	None
Default Value	None
Supported Values	Full or relative path to file
Version Added	`2.8.0`

--include¶

CSV list of schemas, schema.tables or tables to examine for change detection and evolution. Supports wildcards (using *). Example formats: SCHEMA1, SCHEMA*, SCHEMA1.TABLE1,SCHEMA1.TABLE2,SCHEMA2.TAB*, SCHEMA1.TAB*, *.TABLE1,*.TABLE2, *.TAB*.

Alias	None
Default Value	None
Supported Values	List of one or more schema(s), schema(s).table(s) or table(s)
Version Added	`2.8.0`

--no-create-aggregations¶

Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.3.0`

Diagnose Parameters ¶

--backend-log-size-limit¶

Size limit for data returned from each backend log e.g. 100K, 0.5M, 1G.

Alias	None
Default Value	`10M`
Supported Values	`<n><K\|M\|G\|T>`
Version Added	`2.11.0`

--hive-http-endpoint¶

Endpoint of the HiverServer2 or HiveServer2 Interactive (LLAP) service in the format <server|ip address>:<port>.

Alias	None
Default Value	None
Supported Values	`<server\|ip address>:<port>`
Version Added	`3.1.0`

--impalad-http-port¶

Port of the Impala Daemon HTTP Server.

Alias	None
Default Value	`25000`
Supported Values	Positive integers
Version Added	`2.11.0`

--include-backend-logs¶

Retrieve backend query engine logs.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.11.0`

--include-backend-config¶

Retrieve backend query engine config.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.11.0`

--include-logs-from¶

Collate and package log files modified or created since date (format: YYYY-MM-DD) or date/time (format: YYYY-MM-DD_HH24:MM:SS). Can be used in conjunction with the --include-logs-to parameter to specify a search range.

Alias	None
Default Value	None
Supported Values	`YYYY-MM-DD` or `YYYY-MM-DD_HH24:MM:SS`
Version Added	`2.11.0`

--include-logs-last¶

Collate and package log files modified or created in the last n [d]ays (e.g. 3d) or [h]ours (e.g. 7h).

Alias	None
Default Value	None
Supported Values	`<n><d\|h>`
Version Added	`2.11.0`

--include-logs-to¶

Collate and package log files modified or created since date (format: YYYY-MM-DD) or date/time (format: YYYY-MM-DD_HH24:MM:SS). Can be used in conjunction with the --include-logs-from parameter to specify a search range.

Alias	None
Default Value	None
Supported Values	`YYYY-MM-DD` or `YYYY-MM-DD_HH24:MM:SS`
Version Added	`2.11.0`

--include-permissions¶

Collect permissions of files and directories related to Gluent Data Platform.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.11.0`

--include-processes¶

Collect details for running processes related to Gluent Data Platform.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.11.0`

--include-query-logs¶

Retrieve logs for a supplied query ID.

Alias	None
Default Value	None
Supported Values	Valid Impala/LLAP query ID
Version Added	`2.11.0`

--log-location¶

Location in which to search for log files.

Alias	None
Default Value	`OFFLOAD_HOME/log`
Supported Values	Valid directory path
Version Added	`2.11.0`

--output-location¶

Location in which to save files created by Diagnose.

Alias	None
Default Value	`OFFLOAD_HOME/log`
Supported Values	Valid directory path
Version Added	`2.11.0`

--retain-created-files¶

By default, after they have been packaged, files created by Diagnose in --output-location are removed. Specify this parameter to retain them.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.11.0`

--spark-application-id¶

Retrieve logs for a supplied Spark application ID.

Alias	None
Default Value	None
Supported Values	Valid Spark application ID
Version Added	`3.1.0`

Offload Status Report Parameters ¶

--csv-delimiter¶

Field delimiter character for output.

Alias	None
Default Value	`,`
Supported Values	Must be a single character
Version Added	`2.11.0`

--csv-enclosure¶

Enclosure character for string fields in CSV output.

Alias	None
Default Value	`"`
Supported Values	Must be a single character
Version Added	`2.11.0`

-o¶

Output format for Offload Status Report data.

Alias	`--output-format`
Default Value	`text`
Supported Values	`csv\|text\|html\|json\|raw`
Version Added	`2.11.0`

--output-format¶

Output format for Offload Status Report data.

Alias	`-o`
Default Value	`text`
Supported Values	`csv\|text\|html\|json\|raw`
Version Added	`2.11.0`

--output-level¶

Level of detail required for the Offload Status Report.

Alias	None
Default Value	`summary`
Supported Values	`summary\|detail`
Version Added	`2.11.0`

--report-directory¶

Directory to save the report in.

Alias	None
Default Value	`OFFLOAD_HOME/log`
Supported Values	Valid directory path
Version Added	`2.11.0`

--report-name¶

Name of report.

Alias	None
Default Value	`Gluent_Offload_Status_Report_{DB_NAME}_{YYYY}-{MM}-{DD}_{HH}-{MI}-{SS}.[html\|txt\|csv]`
Supported Values	Valid filename
Version Added	`2.11.0`

-s¶

Optional name of schema to run the Offload Status Report for.

Alias	`--schema`
Default Value	None
Supported Values	Valid schema name
Version Added	`2.11.0`

--schema¶

Optional name of schema to run the Offload Status Report for.

Alias	`-s`
Default Value	None
Supported Values	Valid schema name
Version Added	`2.11.0`

-t¶

Optional name of table to run the Offload Status Report for.

Alias	`--table`
Default Value	None
Supported Values	Valid table name
Version Added	`2.11.0`

--table¶

Optional name of table to run the Offload Status Report for.

Alias	`-t`
Default Value	None
Supported Values	Valid table name
Version Added	`2.11.0`

Password Tool Parameters ¶

--encrypt¶

Encrypt a clear-text, case-sensitive password. User will be prompted for the input password and the encrypted version will be output.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.5.0`

--keygen¶

Generate a password key file of the name given by --keyfile.

Alias	None
Default Value	None
Supported Values	None
Version Added	`2.5.0`

--keyfile¶

Name of the password key file to generate.

Alias	None
Default Value	None
Supported Values	Valid path and file name
Version Added	`2.5.0`

Result Cache Manager Parameters ¶

--rc-retention-hours¶

Controls how long to retain Result Cache files for.

Alias	None
Default Value	`24`
Supported Values	Valid number of hours
Version Added	`2.3.0`

Oracle Database Schemas ¶

Gluent Data Platform Admin Schema¶

This account is used by Gluent Data Platform to perform administrative activities. It is defined by ORA_ADM_USER.

Non-standard privileges granted to this schema are:

ANALYZE ANY: Required to copy optimizer statistics from application schema to hybrid schema

GRANT ANY OBJECT PRIVILEGE: Enables the Admin Schema to grant permission on application schema tables to the hybrid schema.

SELECT ANY DICTIONARY: Enables Offload and Present operations to access the Oracle Database data dictionary for information such as column names, data types and partitioning schemes.

SELECT ANY TABLE: Required for Offload activity.

Gluent Data Platform Application Schema¶

This account is used by Gluent Data Platform to perform read-only activities. It is defined by ORA_APP_USER.

Non-standard privileges granted to this schema are:

FLASHBACK ANY TABLE: Required for Sqoop to provide a consistent point-in-time data load. The Gluent Data Platform application schema does not have DML privileges on user application schema tables, therefore there is no threat posed by this configuration.

SELECT ANY DICTIONARY: Documented requirement of Sqoop.

SELECT ANY TABLE: Required for Sqoop to read application schema tables during an offload.

Gluent Data Platform Repository Schema¶

This account is used by Gluent Data Platform to store operational metadata. It is defined by ORA_REPO_USER.

Non-standard privileges granted to this schema are:

SELECT ANY DICTIONARY: Enables installed database packages in support of the metadata repository to access the Oracle Database data dictionary.

Hybrid Schemas¶

Gluent Data Platform hybrid schemas are required to enable remote data to be queried in tandem with customer data in the RDBMS application schema.

Non-standard privileges granted to hybrid schemas are:

CONNECT THROUGH GLUENT_ADM: Offload and Present use this to create hybrid objects without requiring powerful CREATE ANY and DROP ANY privileges.

GLOBAL QUERY REWRITE: Required to support Gluent Query Engine optimizations.

SELECT ANY TABLE: Enables a hybrid view to access the original application schema and offloaded table.

Data Daemon ¶

Properties¶

The following Java properties can be set by creating a $OFFLOAD_HOME/conf/datad.properties file containing <property>=<value> properties and values.

datad.initial-request-pool-size¶

The initial size of the thread pool for concurrent read requests from the RDBMS.

Default Value	`16`
Supported Values	Positive integers
Version Added	`4.2.2`

datad.max-request-pool-size¶

The maximum size of the thread pool for concurrent read requests from the RDBMS.

Default Value	`1024`
Supported Values	Positive integers
Version Added	`4.2.2`

datad.read-pipeline-size¶

The number of reads from the backend to keep in the pipeline to be processed.

Default Value	`4`
Supported Values	Positive integers
Version Added	`4.0.0`

datad.send-queue-size¶

The maximum size in MB of the queue to send to the RDBMS.

Default Value	`16`
Supported Values	Positive integers
Version Added	`4.0.0`

grpc.port¶

The port used for Data Daemon. Setting to 0 results in random port selection.

Default Value	`50051`
Supported Values	Any valid port
Version Added	`4.0.0`

grpc.security.cert-chain¶

The full path to the certificate chain in PEM format to enable TLS on the Data Daemon socket.

Default Value	None
Supported Values	`file:<full path to PEM file>`
Version Added	`4.0.0`

grpc.security.private-key¶

The full path to the private key in PEM format to enable TLS on the Data Daemon socket.

Default Value	None
Supported Values	`file:<full path to PEM file>`
Version Added	`4.0.0`

logging.config¶

The full path to a LOGBack format configuration file to override default logging.

Default Value	None
Supported Values	`<full path to xml file>`
Version Added	`4.0.0`

logging.level.com.gluent.providers.bigquery.BigQueryProvider¶

The log level for Data Daemon interactions with BigQuery.

Default Value	`info`
Supported Values	`off\|error\|warn\|info\|debug\|all`
Version Added	`4.0.0`

logging.level.com.gluent.providers.impala.ImpalaProvider¶

The log level for Data Daemon interactions with Impala.

Default Value	`info`
Supported Values	`off\|error\|warn\|info\|debug\|all`
Version Added	`4.0.0`

logging.level.com.gluent.providers.jdbc.JdbcDataProvider¶

The log level for general Data Daemon operations when interacting with Snowflake and Azure Synapse Analytics.

Default Value	`info`
Supported Values	`off\|error\|warn\|info\|debug\|all`
Version Added	`4.1.0`

logging.level.com.gluent.providers.snowflake.SnowflakeJdbcDataProvider¶

The log level for Data Daemon interactions with Snowflake.

Default Value	`info`
Supported Values	`off\|error\|warn\|info\|debug\|all`
Version Added	`4.1.0`

logging.level.com.gluent.providers.synapse.SynapseProvider¶

The log level for Data Daemon interactions with Azure Synapse Analytics.

Default Value	`info`
Supported Values	`off\|error\|warn\|info\|debug\|all`
Version Added	`4.3.0`

server.port¶

The port used for Data Daemon Web Interface. Setting to 0 results in random port selection.

Default Value	`50052`
Supported Values	Any valid port
Version Added	`4.0.0`

spring.main.web-application-type¶

Allows Data Daemon Web Interface to be disabled.

Default Value	None
Supported Values	`NONE`
Version Added	`4.0.0`

Configuration¶

The following Java configuration options can be set by creating a $OFFLOAD_HOME/conf/datad.conf file containing JAVA_OPTS="<parameter1> <parameter2> ..." e.g. JAVA_OPTS="-Xms2048m -Xmx2048m -Djavax.security.auth.useSubjectCredsOnly=false".

-Xms¶

Sets the initial and minimum Java heap size.

Default Value	Smaller of 1/4th of the physical memory or 1GB
Supported Values	`-Xms<size>[g\|G\|m\|M\|k\|K]`
Version Added	`4.0.0`

-Xmx¶

Sets the maximum Java heap size.

Default Value	Larger of 1/64th of the physical memory or some reasonable minimum
Supported Values	`-Xmx<size>[g\|G\|m\|M\|k\|K]`
Version Added	`4.0.0`

-Djavax.security.auth.useSubjectCredsOnly¶

Required to be set to false when authenticating with a Kerberos enabled backend.

Default Value	`true`
Supported Values	`true\|false`
Version Added	`4.0.0`

Documentation Feedback ¶

Send feedback on this documentation to: feedback@gluent.com

Supported Values	Valid AWS secret access key
Version Added	`4.1.0`

Supported Values	`CDH\|GCP\|MSAZURE\|SNOWFLAKE`
Version Added	`2.3.0`

Supported Values	Valid `odbcinst.ini` entry
Version Added	`4.3.0`

Supported Values	Any valid Google BigQuery location
Version Added	`4.0.2`

Supported Values	Valid Cloudera Navigator entity ID
Version Added	`2.11.0`