Reference

Documentation Conventions

  • Commands and keywords are in this font.

  • $OFFLOAD_HOME is set when the environment file (offload.env) is sourced, unless already set, and refers to the directory named offload that is created when the software is unpacked. This is also referred to as <OFFLOAD_HOME> in sections of this guide where the environment file has not been created/sourced.

  • Third party vendor product names might be aliased or shortened for simplicity. See Third Party Vendor Products for cross-references to full product names and trademarks.

Environment File

AWS_ACCESS_KEY_ID

Access key ID for AWS authentication, required when staging offloaded data to S3 and not using either an AWS credentials file or instance-level permissions.

Supported Values

Valid AWS access key ID

Version Added

4.1.0

AWS_SECRET_ACCESS_KEY

Secret access key for AWS authentication, required when staging offloaded data to S3 and not using either an AWS credentials file or instance-level permissions.

Supported Values

Valid AWS secret access key

Version Added

4.1.0

BACKEND_DISTRIBUTION

Backend system distribution override.

Supported Values

CDH|GCP|MSAZURE|SNOWFLAKE

Version Added

2.3.0

BACKEND_IDENTIFIER_CASE

Case conversion to be applied to any backend identifier names created by Gluent Data Platform. Backend systems may ignore any case conversion if they are case-insensitive.

Supported Values

UPPER|LOWER|NO_MODIFY

Version Added

4.0.0

BACKEND_ODBC_DRIVER_NAME

Name of the Microsoft ODBC driver as specified in odbcinst.ini.

Supported Values

Valid odbcinst.ini entry

Version Added

4.3.0

BIGQUERY_DATASET_LOCATION

Google BigQuery location to use when creating a dataset. Only applicable when creating datasets using the --create-backend-db option.

Supported Values

Any valid Google BigQuery location

Version Added

4.0.2

Note

The Google BigQuery dataset location must be compatible with that of the Google Cloud Storage bucket specified in OFFLOAD_FS_CONTAINER.

CLASSPATH

Ensures Gluent lib directory is included.

Supported Values

Valid paths

Version Added

2.3.0

CLOUDERA_NAVIGATOR_HIVE_SOURCE_ID

The Cloudera Navigator entity ID for the Hive source that will register metadata. See the Installation and Upgrade guide for details on how to set this parameter.

Supported Values

Valid Cloudera Navigator entity ID

Version Added

2.11.0

CONNECTOR_HIVE_SERVER_HOST

Name of host(s) to use to connect to Impala/Hive. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3. Use when configuring Gluent Query Engine to connect to a different Cloudera Data Platform experience from the one used by Gluent Offload Engine (e.g. Data Warehouse rather than Data Hub). If unset, all connections will be made to HIVE_SERVER_HOST.

Supported Values

Hostname or IP address of Impala/Hive host(s)

Version Added

4.1.0

CONNECTOR_HIVE_SERVER_HTTP_PATH

Path component of URL endpoint when connecting to HiveServer2 in HTTP mode (i.e. when HIVE_SERVER_HTTP_TRANSPORT is true). Use when configuring Gluent Query Engine to connect to a different Cloudera Data Platform experience from the one used by Gluent Offload Engine (e.g. Data Warehouse rather than Data Hub). If unset, all connections will be made using HIVE_SERVER_HTTP_PATH.

Supported Values

Valid URL path

Version Added

4.1.0

CONNECTOR_SQL_ENGINE

SQL engine used by Gluent Query Engine for hybrid queries.

Default Value

IMPALA

Supported Values

IMPALA|BIGQUERY|SNOWFLAKE|SYNAPSE

Version Added

3.1.0

CONN_PRE_CMD

Used to set pre-commands before query execution, e.g. set hive.execution.engine=tez;.

Supported Values

Supported session set parameters

Version Added

2.3.0

DATA_GOVERNANCE_API_PASS

Password for the account specified in DATA_GOVERNANCE_API_USER. Password encryption is supported using the Password Tool utility.

Supported Values

Cloudera Navigator service account password

Version Added

2.11.0

DATA_GOVERNANCE_API_URL

URL for a data governance REST API in the format http://fqdn-n.example.com:port/api. Leaving this configuration item blank disables data governance integration.

Supported Values

Valid Cloudera Navigator REST API URL

Version Added

2.11.0

DATA_GOVERNANCE_API_USER

Service account to be used to connect to a data governance REST API.

Supported Values

Cloudera Navigator service account name

Version Added

2.11.0

DATA_GOVERNANCE_AUTO_PROPERTIES

CSV string of dynamic properties to include in data governance metadata. The tokens in the CSV will be expanded at runtime if prefixed with + or ignored if prefixed with -.
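
For example, a hypothetical selection that expands two tokens and excludes one could be set as:

export DATA_GOVERNANCE_AUTO_PROPERTIES=+GLUENT_OBJECT_TYPE,+SOURCE_RDBMS_TABLE,-LATEST_OPERATION_DATETIME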

Supported Values

CSV containing the following tokens prefixed with either + or -.

GLUENT_OBJECT_TYPE, SOURCE_RDBMS_TABLE, TARGET_RDBMS_TABLE, INITIAL_GLUENT_VERSION, LATEST_GLUENT_VERSION, INITIAL_OPERATION_DATETIME, LATEST_OPERATION_DATETIME

Version Added

2.11.0

DATA_GOVERNANCE_AUTO_TAGS

CSV string of tags to include in data governance metadata. Tags are free-format except for +RDBMS_NAME which is expanded at run time.

Default Value

GLUENT,+RDBMS_NAME

Supported Values

CSV containing tags to attach to data governance metadata

Version Added

2.11.0

DATA_GOVERNANCE_BACKEND

Specify the data governance API type accessed via DATA_GOVERNANCE_API_URL.

Supported Values

navigator

Version Added

2.11.0

DATA_GOVERNANCE_CUSTOM_PROPERTIES

JSON string of key/value pairs to include in data governance metadata.

Associated Option

--data-governance-custom-properties

Supported Values

Valid JSON string of key/value pairs (no nested or complex data types)

Version Added

2.11.0

DATA_GOVERNANCE_CUSTOM_TAGS

CSV string of tags to include in data governance metadata.

Associated Option

--data-governance-custom-tags

Supported Values

CSV containing tags to attach to data governance metadata

Version Added

2.11.0

DATA_SAMPLE_PARALLELISM

Degree of parallelism to use when sampling data for all columns in the source RDBMS table that are either date/timestamp-based or defined as a number without a precision and scale. A value of 0 or 1 disables parallelism.

Associated Option

--data-sample-parallelism

Default Value

0

Supported Values

0 and positive integers

Version Added

4.2.0

DATAD_ADDRESS

The address(es) of Data Daemon. For a single daemon the format is <hostname/IP address>:<port>. Specifying multiple daemons can be achieved in one of two ways:

  • By DNS address. The DNS server can return multiple A records for a hostname and Gluent Data Platform will load balance between these, e.g. <load-balancer-address>:<load-balancer-port>

  • By IP address and port. The comma-separated list must be prefixed with ipv4: e.g. ipv4:<hostname/IP address>:<port>,<hostname/IP address>:<port>
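
For example (the hostnames, load balancer address and port shown are hypothetical placeholders):

export DATAD_ADDRESS=datad-lb.example.com:50055
export DATAD_ADDRESS=ipv4:hadoop1:50055,hadoop2:50055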

Supported Values

<hostname/IP address>:<port>, <load-balancer-address>:<load-balancer-port>, ipv4:<hostname/IP address>:<port>,<hostname/IP address>:<port>

Version Added

4.0.0

DATAD_SSL_ACTIVE

Set to true when TLS is enabled on the Data Daemon socket.

Supported Values

true|false

Version Added

4.0.0

DATAD_SSL_TRUSTED_CERTS

The trusted certificate when TLS is enabled on the Data Daemon socket.

Supported Values

Full path to the trusted certificate

Version Added

4.0.0

DATAD_WEB_PASS

Password for authentication with Data Daemon Web Interface (if configured). Password encryption is supported using the Password Tool utility.

Supported Values

Data Daemon Web Interface user password

Version Added

4.1.0

DATAD_WEB_USER

User for authentication with Data Daemon Web Interface (if configured).

Supported Values

Data Daemon Web Interface username

Version Added

4.1.0

DB_NAME_PREFIX

Database name/path prefix for multitenant support. This allows multiple Oracle Database databases to offload to the same backend cluster. If undefined, the DB_UNIQUE_NAME will be used, giving <DB_UNIQUE_NAME>_<schema>. If defined but empty, no prefix is used, giving <schema>. Otherwise, databases will be named <DB_NAME_PREFIX>_<schema>.

If the source database is part of an Oracle Data Guard configuration, set DB_NAME_PREFIX to ensure that DB_UNIQUE_NAME is not used.
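
As an illustration (the schema name SH, DB_UNIQUE_NAME of PROD and prefix ERP are hypothetical): leaving DB_NAME_PREFIX unset results in a backend database named PROD_SH, setting it to an empty string results in SH, and the following results in ERP_SH:

export DB_NAME_PREFIX=ERP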

Associated Option

--db-name-prefix

Supported Values

Characters supported by the backend database/dataset/schema-naming rules

Version Added

2.3.0

DEFAULT_BUCKETS

Default number of offload buckets for parallel data retrieval from the backend Hadoop system. If you aim to run your biggest queries with parallel DOP X then set this value to X. This way each Oracle Database PX slave can start its own Smart Connector process for fetching a subset of data.

Associated Option

--num-buckets

Supported Values

Valid Oracle Database DOP

Version Added

2.3.0

DEFAULT_BUCKETS_MAX

Upper limit of DEFAULT_BUCKETS when DEFAULT_BUCKETS=AUTO.

Default Value

16

Supported Values

Valid Oracle Database DOP

Version Added

2.7.0

DEFAULT_BUCKETS_THRESHOLD

Threshold at which RDBMS segments are considered “small” by DEFAULT_BUCKETS=AUTO tuning.

Supported Values

E.g. 3M, 0.5G

Version Added

2.7.0

GOOGLE_APPLICATION_CREDENTIALS

Path to Google service account private key JSON file.

Supported Values

Valid paths

Version Added

4.0.0

GOOGLE_KMS_KEY_NAME

Google Cloud Key Management Service cryptographic key name to use for encryption and decryption operations. The purpose of this key must be Symmetric encryption.

Supported Values

Valid KMS key name

Version Added

4.2.0

GOOGLE_KMS_KEY_RING_NAME

Google Cloud Key Management Service cryptographic key ring name containing the key defined in GOOGLE_KMS_KEY_NAME

Supported Values

Valid KMS key ring name

Version Added

4.2.0

GOOGLE_KMS_KEY_RING_LOCATION

Google Cloud Key Management Service cryptographic key ring location of the key ring defined in GOOGLE_KMS_KEY_RING_NAME

Supported Values

Valid Google Cloud Service locations

Version Added

4.2.0

HADOOP_SSH_USER

User to connect to Hadoop server(s) defined in HIVE_SERVER_HOST using password-less SSH.

Supported Values

Valid host username

Version Added

2.3.0

HDFS_CMD_HOST

Overrides HIVE_SERVER_HOST for the HDFS command steps only. In split installation environments where orchestration commands are run from a Hadoop edge node(s), set this to localhost in the Hadoop edge node(s) configuration file.

Supported Values

Hostname or IP address of HDFS host

Version Added

2.3.0

HDFS_DATA

HDFS data directory of the HIVE_SERVER_USER. Used to store offloaded data.

Associated Option

--hdfs-data

Supported Values

Valid HDFS directory

Version Added

2.3.0

HDFS_DB_PATH_SUFFIX

Hadoop databases are named <schema><HDFS_DB_PATH_SUFFIX> and <schema>_load<HDFS_DB_PATH_SUFFIX>. When this value is not set the suffix of the databases defaults to .db, giving <schema>.db and <schema>_load.db. Set this to an empty string to use no suffix. For backend systems other than Hadoop this variable has no effect.
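
For example (the schema name sh and suffix _dev are hypothetical), the following results in databases named sh_dev and sh_load_dev, whereas leaving the variable unset results in sh.db and sh_load.db:

export HDFS_DB_PATH_SUFFIX=_dev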

Associated Option

--hdfs-db-path-suffix

Supported Values

Valid database path suffix string

Version Added

2.3.0

HDFS_HOME

HDFS home directory of the HIVE_SERVER_USER.

Supported Values

Valid HDFS directory

Version Added

2.3.0

HDFS_LOAD

HDFS data directory of the HIVE_SERVER_USER. Used to stage offloaded data.

Supported Values

Valid HDFS directory

Version Added

3.4.0

HDFS_NAMENODE_ADDRESS

Hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured. This value is required in order to execute result cache queries. In a deployment where result cache queries will never be used, this variable can safely be unset.

Supported Values

Hostname or IP address of active HDFS namenode or ID of the HDFS nameservice if HDFS High Availability is configured

Version Added

2.3.0

HDFS_NAMENODE_PORT

Port of the active HDFS namenode. Set to 0 if HDFS High Availability is configured and HDFS_NAMENODE_ADDRESS is set to a nameservice ID. As with HDFS_NAMENODE_ADDRESS, this value is necessary for executing result cache queries, but otherwise can safely be unset.

Supported Values

Port of active HDFS namenode or 0 if HDFS High Availability is configured

Version Added

2.3.0

HDFS_RESULT_CACHE_USER

Hadoop user to impersonate when making HDFS requests for result cache queries; must have write permissions to HDFS_HOME. In a deployment where result cache queries will never be used, this variable can safely be unset.

Default Value

HIVE_SERVER_USER

Supported Values

Hadoop username

Version Added

2.3.0

HDFS_SNAPSHOT_PATH

Before an Incremental Update compaction, an HDFS snapshot will be automatically created in the location specified by HDFS_SNAPSHOT_PATH. This location must be a snapshottable directory (consult your HDFS administrators to enable this). When changing HDFS_SNAPSHOT_PATH from the default, ensure that it remains a parent directory of HDFS_DATA. Unsetting this variable will disable automatic HDFS snapshots.

Default Value

HDFS_DATA

Supported Values

HDFS path that is equal to or a parent of HDFS_DATA

Version Added

2.10.0

HDFS_SNAPSHOT_SUDO_COMMAND

If HADOOP_SSH_USER is not the inode owner of HDFS_SNAPSHOT_PATH then HDFS superuser rights will be required to take HDFS snapshots. A sudo rule (or equivalent user substitution tool) can be used to enable this using HDFS_SNAPSHOT_SUDO_COMMAND. The command must be password-less.
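
A minimal sketch, assuming a hypothetical password-less sudo rule that allows HADOOP_SSH_USER to act as the hdfs superuser:

export HDFS_SNAPSHOT_SUDO_COMMAND="sudo -u hdfs"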

Supported Values

A valid user-substitution command

Version Added

2.10.0

HIVE_SERVER_AUTH_MECHANISM

Authentication mechanism for HiveServer2. In non-Kerberized and non-LDAP environments, this should be set to NOSASL for Impala, or to the value of hive.server2.authentication from hive-site.xml for Hive. In LDAP environments, it should be set to PLAIN.

Supported Values

NOSASL|PLAIN, value of hive-site.xml: hive.server2.authentication

Version Added

2.3.0

HIVE_SERVER_HOST

Name of host(s) to connect to Impala/Hive. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3.

Supported Values

Hostname or IP address of Impala/Hive host(s)

Version Added

2.3.0

HIVE_SERVER_HTTP_PATH

Path component of URL endpoint when connecting to HiveServer2 in HTTP mode (i.e. when HIVE_SERVER_HTTP_TRANSPORT is true).

Supported Values

Valid URL path

Version Added

4.1.0

HIVE_SERVER_HTTP_TRANSPORT

Use HTTP transport for HiveServer2 connections.

Default Value

false

Supported Values

true|false

Version Added

4.1.0

HIVE_SERVER_PASS

Password of the user to authenticate with HiveServer2 service. Required in LDAP enabled Impala configurations. Password encryption is supported using the Password Tool utility.

Supported Values

HiveServer2 service password

Version Added

2.3.0

HIVE_SERVER_PORT

Port of HiveServer2 service. Default Impala port is 21050, default Hive port is 10000.

Default Value

21050|10000

Supported Values

Port of HiveServer2 service

Version Added

2.3.0

HIVE_SERVER_USER

Name of the user to authenticate with HiveServer2 service.

Supported Values

HiveServer2 service username

Version Added

2.3.0

HYBRID_EXT_TABLE_DEGREE

Default degree of parallelism for base hybrid external tables. When set to AUTO Offload will copy settings from the source RDBMS table to the hybrid external table.

Associated Option

--ext-table-degree

Supported Values

AUTO and positive integers

Version Added

2.11.2

HS2_SESSION_PARAMS

Comma-separated list of HiveServer2 session parameters to set.

BATCH_SIZE=16384 is a recommended performance setting.

E.g. export HS2_SESSION_PARAMS="BATCH_SIZE=16384,MEM_LIMIT=2G".

Supported Values

Valid Impala/Hive session parameters

Version Added

2.3.0

IN_LIST_JOIN_TABLE

Database and table name of the in-list join table. Can be created and populated with ./connect --create-sequence-table. Applicable to Impala.

Supported Values

Valid database and table name

Version Added

2.4.2

IN_LIST_JOIN_TABLE_SIZE

Size of table specified by IN_LIST_JOIN_TABLE. Required for both table population by connect, and table usage by Gluent Query Engine. Applicable to Impala.

Supported Values

Up to 1000000

Version Added

2.4.2

KERBEROS_KEYTAB

The path of the keytab file. If not provided, a valid ticket must already exist in the cache (i.e. manual kinit).

Supported Values

Path to the keytab file

Version Added

2.3.0

KERBEROS_PATH

If your Kerberos utilities (like kinit) reside in some non-standard directory, set the path here.

Supported Values

Path to Kerberos utilities

Version Added

2.3.0

KERBEROS_PRINCIPAL

The Kerberos user to authenticate as, i.e. kinit -kt KERBEROS_KEYTAB KERBEROS_PRINCIPAL should succeed. If KERBEROS_KEYTAB is provided, this should also be provided.

Supported Values

Name of Kerberos principal

Version Added

2.3.0

KERBEROS_SERVICE

The Impala/Hive service (typically impala/hive). If empty, Smart Connector will attempt to connect unsecured.

Supported Values

Name of Impala service

Version Added

2.3.0

KERBEROS_TICKET_CACHE_PATH

Required to use the libhdfs3-based result cache with an HDFS cluster that uses Kerberos authentication. In a deployment where result cache queries will never be used, this variable can safely be unset.

Supported Values

Path to Kerberos ticket cache path for the user that will be executing Smart Connector processes

Version Added

2.3.0

LD_LIBRARY_PATH

Ensures Gluent lib directory is included.

Supported Values

Valid paths

Version Added

2.3.0

LIBHDFS3_CONF

HDFS client configuration file location.

Supported Values

Valid path to XML configuration file

Version Added

3.0.4

LOG_LEVEL

Logging level verbosity.

Default Value

info

Supported Values

info|detail|debug

Version Added

2.3.0

MAX_OFFLOAD_CHUNK_COUNT

Restrict number of partitions offloaded per cycle. See Offload Transport Chunks for usage.

Associated Option

--max-offload-chunk-count

Supported Values

1-1000

Version Added

2.9.0

MAX_OFFLOAD_CHUNK_SIZE

Restrict size of partitions offloaded per cycle. See Offload Transport Chunks for usage.

Associated Option

--max-offload-chunk-size

Supported Values

E.g. 100M, 1G, 1.5G

Version Added

2.9.0

METAD_AUTOSTART

Enable Metadata Daemon automatic start:

  • TRUE: If Metadata Daemon is not running, Smart Connector will attempt to start Metadata Daemon automatically.

  • FALSE: Smart Connector will only attempt to connect to an already running Metadata Daemon.

Default Value

true

Supported Values

true|false

Version Added

2.6.0

METAD_POOL_SIZE

The maximum number of connections Metadata Daemon will maintain in its connection pool to Oracle Database.

Default Value

16

Supported Values

Number of connections

Version Added

2.4.5

METAD_POOL_TIMEOUT

The timeout for idle connections in Metadata Daemon’s connection pool to Oracle Database.

Default Value

300

Supported Values

Timeout value in seconds

Version Added

2.4.5

NLS_LANG

Should be set to the value of Oracle Database NLS_CHARACTERSET.

Supported Values

Valid NLS_CHARACTERSET values

Version Added

2.3.0

NUM_LOCATION_FILES

Number of external table location files for parallel data retrieval.

Associated Option

--num-location-files

Supported Values

Integer values

Version Added

2.7.2

OFFLOAD_BACKEND_SESSION_PARAMETERS

Key/value pairs, in JSON format, to override backend query engine parameters. These take effect when establishing a connection to the backend system. For example:

"{\"export OFFLOAD_BACKEND_SESSION_PARAMETERS="{\"request_pool\": \"'root.gluent'\"}"

Supported Values

Valid JSON string of key/value pairs (no nested or complex data types)

Version Added

3.3.2

OFFLOAD_BIN

Path to the Gluent Data Platform bin directory ($OFFLOAD_HOME/bin).

Supported Values

Oracle Database directory object name

Version Added

2.3.0

OFFLOAD_CONF

Path to the Gluent Data Platform conf directory.

Supported Values

Path to conf directory

Version Added

2.3.0

OFFLOAD_COMPRESS_LOAD_TABLE

Compress staged data during an Offload. This can be useful when staging to cloud storage.

Associated Option

--compress-load-table

Supported Values

true|false

Version Added

4.0.0

OFFLOAD_DISTRIBUTE_ENABLED

Distribute data by partition key(s) during the final INSERT operation of an offload. Hive only.

Associated Option

--offload-distribute-enabled

Supported Values

true|false

Version Added

2.8.0

OFFLOAD_FS_AZURE_ACCOUNT_DOMAIN

Microsoft Azure storage account service domain, required when staging offloaded data in Azure storage.

Supported Values

blob.core.windows.net

Version Added

4.1.0

OFFLOAD_FS_AZURE_ACCOUNT_KEY

Microsoft Azure account key, required when staging offloaded data in Azure storage.

Supported Values

Valid Azure account key

Version Added

4.1.0

OFFLOAD_FS_AZURE_ACCOUNT_NAME

Microsoft Azure account name, required when staging offloaded data in Azure storage.

Supported Values

Valid Azure account name

Version Added

4.1.0

OFFLOAD_FS_CONTAINER

The name of the bucket or container to be used when offloading to cloud storage.

Associated Option

--offload-fs-container

Supported Values

A cloud storage bucket/container name configured for use by the backend cluster

Version Added

3.0.0

OFFLOAD_FS_PREFIX

A directory path used to prefix database locations within OFFLOAD_FS_SCHEME. When OFFLOAD_FS_SCHEME is inherit, HDFS_DATA takes precedence over this setting.

Associated Option

--offload-fs-prefix

Supported Values

A valid directory in HDFS or cloud storage

Version Added

3.0.0

OFFLOAD_FS_SCHEME

The filesystem scheme to be used for database and table locations. inherit specifies that tables created by Offload will not specify a LOCATION clause; instead they will inherit the location from the parent database. See Integrating with Cloud Storage for details.
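
For example, a hypothetical configuration staging offloaded data to a cloud storage bucket could combine this setting with OFFLOAD_FS_CONTAINER and OFFLOAD_FS_PREFIX as follows (the bucket name and prefix are illustrative):

export OFFLOAD_FS_SCHEME=s3a
export OFFLOAD_FS_CONTAINER=my-offload-bucket
export OFFLOAD_FS_PREFIX=gluent/offload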

Associated Option

--offload-fs-scheme

Supported Values

inherit, hdfs, s3a, adl, abfs, abfss

Version Added

3.0.0

OFFLOAD_HOME

Location of Gluent Data Platform installation.

Supported Values

Path to installed offload directory

Version Added

2.3.0

OFFLOAD_LOG

Path to the Gluent Data Platform log directory.

Supported Values

Oracle Database directory object name

Version Added

2.3.0

OFFLOAD_LOGDIR

Override Smart Connector log path. If undefined defaults to $OFFLOAD_HOME/log.

Supported Values

Valid path

Version Added

2.3.0

OFFLOAD_NOT_NULL_PROPAGATION

Specify how Offload should treat NOT NULL constraints on offloaded columns. A value of AUTO will propagate all RDBMS NOT NULL constraints to the backend and a value of NONE will not propagate any NOT NULL constraints to the backend table. Only applies to Google BigQuery, Snowflake or Azure Synapse Analytics backends. The --not-null-columns option can be used to override this global setting, allowing a specific list of columns to be defined as NOT NULL for an individual offload.

Default Value

AUTO

Supported Values

AUTO|NONE

Version Added

4.3.4

OFFLOAD_SORT_ENABLED

Enables the sorting/clustering of data when inserting into the final destination table. Columns used for sorting/clustering are specified using --sort-columns.

Associated Option

--offload-sort-enabled

Supported Values

true|false

Version Added

2.7.0

OFFLOAD_STAGING_FORMAT

Staging file format to use when staging offloaded data for loading into Snowflake.

Default value

PARQUET

Supported Values

AVRO|PARQUET

Version Added

4.1.0

OFFLOAD_TRANSPORT

Method used to transport data from an RDBMS frontend to a backend system. AUTO selects the optimal method based on configuration and table structure.

Associated Option

--offload-transport

Supported Values

AUTO|GLUENT|SQOOP

Version Added

3.1.0

OFFLOAD_TRANSPORT_AUTH_USING_ORACLE_WALLET

Instruct Offload that RDBMS authentication is via an Oracle Wallet. The wallet location should be configured using Hadoop configuration appropriate to the method used for data transport. See SQOOP_OVERRIDES and OFFLOAD_TRANSPORT_SPARK_PROPERTIES for examples.

Supported Values

true|false

Version Added

3.1.0

OFFLOAD_TRANSPORT_CMD_HOST

An override for HDFS_CMD_HOST when running shell based Offload Transport commands such as Sqoop or Spark Submit.

Associated Option

--offload-transport-cmd-host

Supported Values

Hostname or IP address of HDFS host

Version Added

3.1.0

OFFLOAD_TRANSPORT_CONSISTENT_READ

Control whether parallel data transport tasks should use a consistent point in time when reading RDBMS data.

Associated Option

--offload-transport-consistent-read

Supported Values

true|false

Version Added

3.1.0

OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH

The credential provider path to be used in conjunction with OFFLOAD_TRANSPORT_PASSWORD_ALIAS. Integration with Hadoop Credential Provider API is only supported by Sqoop, Spark Submit and Livy based Offload Transport.

Supported Values

A valid HDFS path

Version Added

3.1.0

OFFLOAD_TRANSPORT_DSN

Database connection details for Offload Transport if different to ORA_CONN.

Associated Option

--offload-transport-dsn

Supported Values

<hostname>:<port>/<service>

Version Added

3.1.0

OFFLOAD_TRANSPORT_FETCH_SIZE

Number of records to fetch in a single batch from the RDBMS during Offload. Offload Transport may encounter memory pressure if a table is very wide (e.g. contains LOB columns) and there are lots of records in a batch. Reducing the fetch size can alleviate this if more memory cannot be allocated.

Associated Option

--offload-transport-fetch-size

Supported Values

Positive integers

Version Added

3.1.0

OFFLOAD_TRANSPORT_LIVY_API_VERIFY_SSL

Used to enable SSL for Livy API calls. There are 4 states:

  1. Empty: Do not use SSL.

  2. TRUE: Use SSL and verify Hadoop certificate against known certificates.

  3. FALSE: Use SSL and do not verify Hadoop certificate.

  4. /some/path/here/cert-bundle.crt: Use SSL and verify Hadoop certificate against path to certificate bundle.

Supported Values

Empty, true|false, <path to certificate bundle>

Version Added

3.1.0

OFFLOAD_TRANSPORT_LIVY_API_URL

URL for Livy/Spark REST API in the format http://fqdn-n.example.com:port. https can be used in place of http.

Associated Option

--offload-transport-livy-api-url

Supported Values

Valid Livy REST API URL

Version Added

3.1.0

OFFLOAD_TRANSPORT_LIVY_IDLE_SESSION_TIMEOUT

Timeout (in seconds) for idle Spark client sessions created in Livy.

Associated Option

--offload-transport-livy-idle-session-timeout

Supported Values

Positive integers

Version Added

3.1.0

OFFLOAD_TRANSPORT_LIVY_MAX_SESSIONS

Limits the number of Livy sessions Offload will create. Sessions are re-used when idle. New sessions are only created when no idle sessions are available.

Associated Option

--offload-transport-livy-max-sessions

Supported Values

Positive integers

Version Added

3.1.0

OFFLOAD_TRANSPORT_PARALLELISM

The number of parallel streams to be used when transporting data from the source RDBMS to the backend.

Associated Option

--offload-transport-parallelism

Supported Values

Positive integers

Version Added

3.1.0

OFFLOAD_TRANSPORT_PASSWORD_ALIAS

An alias provided by the Hadoop Credential Provider API to be used for RDBMS authentication during Offload Transport. The key store containing the alias must be specified either in OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH or in the Hadoop configuration property hadoop.security.credential.provider.path.

Associated Option

--offload-transport-password-alias

Supported Values

Valid Hadoop Credential Provider API alias

Version Added

3.1.0

OFFLOAD_TRANSPORT_RDBMS_SESSION_PARAMETERS

Key/value pairs, in JSON format, to supply database session parameter values. These only take effect during Offload Transport, e.g. '{"cell_offload_processing": "false"}'
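
In the environment file, this could be set using the example value as follows:

export OFFLOAD_TRANSPORT_RDBMS_SESSION_PARAMETERS='{"cell_offload_processing": "false"}'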

Supported Values

Valid JSON string of key/value pairs (no nested or complex data types)

Version Added

3.1.0

OFFLOAD_TRANSPORT_SMALL_TABLE_THRESHOLD

Threshold above which Query Import is no longer considered the correct offload choice for non-partitioned tables.

Supported Values

E.g. 100M, 1G, 1.5G

Version Added

4.2.0

OFFLOAD_TRANSPORT_SPARK_OVERRIDES

Override JVM flags for a spark-submit command, inserted immediately after spark-submit.

Associated Option

--offload-transport-jvm-overrides

Supported Values

Valid JVM options

Version Added

3.1.0

OFFLOAD_TRANSPORT_SPARK_PROPERTIES

Key/value pairs, in JSON format, to override Spark property defaults. Examples:

'{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}'

'{"spark.driver.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet", "spark.executor.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet"}'

Associated Option

--offload-transport-spark-properties

Supported Values

Valid JSON string of key/value pairs (no nested or complex data types)

Version Added

3.1.0

Note

Some properties will not take effect when connecting to the Spark Thrift Server because the Spark context has already been created.

OFFLOAD_TRANSPORT_SPARK_QUEUE_NAME

YARN queue name for Gluent Offload Engine Spark jobs.

Associated Option

--offload-transport-queue-name

Supported Values

Valid YARN queue name

Version Added

3.1.0

OFFLOAD_TRANSPORT_SPARK_SUBMIT_EXECUTABLE

The executable to use for submitting Spark applications. Can be empty, spark-submit or spark2-submit.

Supported Values

Blank or spark-submit|spark2-submit

Version Added

3.1.0

OFFLOAD_TRANSPORT_SPARK_SUBMIT_MASTER_URL

The master URL for the Spark cluster, only used for non-Hadoop Spark clusters. If empty, Spark will use default settings.

Associated Option

None

Supported Values

Valid master URL

Version Added

4.0.0

OFFLOAD_TRANSPORT_SPARK_THRIFT_HOST

Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3.

Associated Option

--offload-transport-spark-thrift-host

Supported Values

Hostname or IP address of Spark Thrift Server host(s)

Version Added

3.1.0

OFFLOAD_TRANSPORT_SPARK_THRIFT_PORT

Port that the Spark Thrift Server is listening on.

Associated Option

--offload-transport-spark-thrift-port

Supported Values

Active port

Version Added

3.1.0

OFFLOAD_TRANSPORT_USER

User to authenticate as when executing Offload Transport commands such as SSH for spark-submit or Sqoop commands, or Livy API calls

Associated Option

None

Supported Values

Valid username

Version Added

4.0.0

OFFLOAD_TRANSPORT_VALIDATION_POLLING_INTERVAL

Polling interval in seconds for validation of Spark transport row count. A value of -1 disables retrieval of RDBMS SQL statistics. A value of 0 disables polling resulting in a single capture of SQL statistics after Offload Transport. A value greater than 0 polls RDBMS SQL statistics using the specified interval.

Associated Option

--offload-transport-validation-polling-interval

Supported Values

Interval value in seconds, 0 or -1

Version Added

4.2.1

Note

When the Spark Thrift Server or Apache Livy are used for Offload Transport it is recommended to set OFFLOAD_TRANSPORT_VALIDATION_POLLING_INTERVAL to a positive value. This is because polling RDBMS SQL statistics is the primary validation for both Spark Thrift Server and Apache Livy based Offload Transport.

OFFLOAD_UDF_DB

For Impala/Hive, the database that Gluent Data Platform UDFs are created in. If undefined defaults to the default database.

For BigQuery, the name of the dataset that contains custom UDF(s) for synthetic partitioning. If undefined, the dataset will be determined from the --partition-functions option.

Supported Values

Valid Impala/Hive database or BigQuery dataset

Version Added

2.3.0

OFFLOAD_VERIFY_PARALLELISM

Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.

Associated Option

--verify-parallelism --frontend-parallelism

Supported Values

0 and positive integers

Version Added

4.2.1

ORA_ADM_CONN

Connection string (typically tnsnames.ora entry) for ORA_ADM_USER connections. Primarily for use with Oracle Wallet as each entry requires a unique connection string.

Supported Values

Connection string corresponding to the Oracle Wallet entry for ORA_ADM_USER

Version Added

4.2.0

ORA_ADM_PASS

Password of the Gluent Data Platform Admin Schema chosen during installation. Password encryption is supported using the Password Tool utility.

Supported Values

Oracle Database ADM password

Version Added

2.3.0

ORA_ADM_USER

Name of the Gluent Data Platform Admin Schema chosen during installation.

Supported Values

Oracle Database ADM username

Version Added

2.3.0

ORA_APP_PASS

Password of the Gluent Data Platform Application Schema chosen during installation. Password encryption is supported using the Password Tool utility.

Supported Values

Oracle Database APP password

Version Added

2.3.0

ORA_APP_USER

Name of the Gluent Data Platform Application Schema chosen during installation.

Supported Values

Oracle Database APP username

Version Added

2.3.0

ORA_CONN

Oracle Database connection details. Fully qualified DB service name must be used if the Oracle Database service name includes domain-names (DB_DOMAIN), e.g. ORCL12.gluent.com.
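
For example (the hostname and port are hypothetical; the domain-qualified service name follows the example above):

export ORA_CONN=dbhost.gluent.com:1521/ORCL12.gluent.com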

Supported Values

<hostname>:<port>/<service>

Version Added

2.3.0

ORA_REPO_USER

Name of the Gluent Data Platform Repository Schema chosen during installation.

Supported Values

Oracle Database REPO username

Version Added

3.3.0

PASSWORD_KEY_FILE

Password key file generated by Password Tool and used to create encrypted password strings.

Supported Values

Path to Password Key File

Version Added

2.5.0

PATH

Ensures Gluent Data Platform bin directory is included. The path order is important to ensure that the Python distribution included with Gluent Data Platform is used.

Supported Values

Valid paths

Version Added

2.3.0

QUERY_ENGINE

Backend SQL engine to use for commands issued as part of Offload/Present orchestration.

Supported Values

BIGQUERY|IMPALA|SNOWFLAKE|SYNAPSE

Version Added

2.3.0

QUERY_MONITOR_THRESHOLD

Threshold for hybrid query execution time (in seconds) that enables automatic monitoring of a query in the backend. Queries with Data Daemon execution time below this threshold will not gather any backend trace metrics or profiles. A value of 0 will enable automatic trace/profile collection for all hybrid queries. Individual hybrid queries can have trace enabled or disabled with the GLUENT_QUERY_MONITOR or GLUENT_NO_QUERY_MONITOR hints, respectively.

Supported Values

Integers >= 0

Version Added

4.3.2

SNOWFLAKE_ACCOUNT

Name of the Snowflake account to use with Gluent Data Platform.

Supported Values

Snowflake account name

Version Added

4.1.0

SNOWFLAKE_DATABASE

Name of the Snowflake database to use with Gluent Data Platform.

Supported Values

Snowflake database name

Version Added

4.1.0

SNOWFLAKE_FILE_FORMAT_PREFIX

Name prefix for Gluent Offload Engine to use when creating file format objects while offloading to Snowflake.

Default Value

GLUENT_OFFLOAD_FILE_FORMAT

Supported Values

Valid Snowflake file format object name <= 120 characters

Version Added

4.1.0

SNOWFLAKE_INTEGRATION

Name of the Snowflake storage integration for Gluent Offload Engine to use when offloading to Snowflake.

Supported Values

Valid Snowflake integration name

Version Added

4.1.0

SNOWFLAKE_PASS

Password for Snowflake service account user for Gluent Data Platform, required when using password authentication. Password encryption is supported using the Password Tool utility.

Supported Values

Snowflake user’s password

Version Added

4.1.0

SNOWFLAKE_PEM_FILE

Path to private PEM file for Snowflake service account user for Gluent Data Platform, required when using key-pair authentication.

Supported Values

Path to Snowflake user’s private PEM key file

Version Added

4.1.0

SNOWFLAKE_PEM_PASSPHRASE

Optional PEM passphrase to authenticate the Snowflake service account user for Gluent Data Platform, only required when using key-pair authentication with a passphrase. Passphrase encryption is supported using the Password Tool utility.

Supported Values

Snowflake user’s PEM passphrase

Version Added

4.1.0

SNOWFLAKE_ROLE

Name of the Snowflake database role created by Gluent Data Platform.

Default Value

GLUENT_OFFLOAD_ROLE

Supported Values

Valid Snowflake role name

Version Added

4.1.0

SNOWFLAKE_STAGE

Name for Gluent Offload Engine to use when creating schema-level stage objects while offloading to Snowflake.

Default Value

GLUENT_OFFLOAD_STAGE

Supported Values

Valid Snowflake stage name

Version Added

4.1.0

SNOWFLAKE_USER

Name of the Snowflake service account user for Gluent Data Platform.

Supported Values

Valid Snowflake user name

Version Added

4.1.0

SNOWFLAKE_WAREHOUSE

Default Snowflake warehouse for Gluent Data Platform to use when interacting with Snowflake.

Supported Values

Valid Snowflake warehouse name

Version Added

4.1.0

SPARK_HISTORY_SERVER

URL of the Spark History Server, used to access the runtime history of the running Spark Thrift Server.

Supported Values

URL of Spark History Server e.g. http://hadoop1:18081/

Version Added

3.1.0

SPARK_THRIFT_HOST

Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3.

Supported Values

Hostname or IP address of Spark Thrift Server host(s)

Version Added

3.1.0

SPARK_THRIFT_PORT

Port that the Spark Thrift Server is listening on.

Supported Values

Active port

Version Added

3.1.0

SQOOP_DISABLE_DIRECT

It is recommended that the OraOOP optimizations for Sqoop (included in standard Apache Sqoop from v1.4.5) are used. If not, then disable direct path mode.

Associated Option

--sqoop-disable-direct

Supported Values

true|false

Version Added

2.3.0

SQOOP_OVERRIDES

Override flags for the Sqoop command, inserted immediately after sqoop import. To avoid issues, -Dsqoop.avro.logical_types.decimal.enable=false is included by default and should not be removed. Additional settings can be added, for example:

"-Dsqoop.avro.logical_types.decimal.enable=false -Dmapreduce.map.java.opts='-Doracle.net.wallet_location=/some/path/here/gluent_wallet'"

Associated Option

--offload-transport-jvm-overrides

Supported Values

Valid Sqoop parameters

Version Added

2.3.0

SQOOP_ADDITIONAL_OPTIONS

Additional Sqoop command options added at the end of the Sqoop command.

Associated Option

--sqoop-additional-options

Supported Values

Any Sqoop command option/argument not already included in the Sqoop command line

Version Added

2.9.0

SQOOP_PASSWORD_FILE

HDFS path to Sqoop password file, readable by HADOOP_SSH_USER. If not specified, ORA_APP_PASS will be used.

Associated Option

--sqoop-password-file

Supported Values

HDFS path to password file

Version Added

2.5.0

SQOOP_QUEUE_NAME

YARN queue name for Gluent Offload Engine Sqoop jobs.

Associated Option

--offload-transport-queue-name

Supported Values

Valid YARN queue name

Version Added

3.1.0

SSL_ACTIVE

Set to true when Impala/Hive uses SSL/TLS encryption.

Supported Values

true|false

Version Added

2.3.0

SSL_TRUSTED_CERTS

SSL/TLS trusted certificates.

Supported Values

Path to SSL certificate

Version Added

2.3.0

START_OF_WEEK

Specify the first day of the week for TO_CHAR(<value>, 'D') predicate pushdown. Applies to Snowflake and Azure Synapse Analytics.

Default Value

7

Supported Values

1 (Monday) to 7 (Sunday)

Version Added

4.3.0

SYNAPSE_AUTH_MECHANISM

Azure Synapse Analytics authentication mechanism.

Supported Values

SqlPassword, ActiveDirectoryPassword, ActiveDirectoryMsi, ActiveDirectoryServicePrincipal

Version Added

4.3.0

SYNAPSE_COLLATION

Azure Synapse Analytics collation to use for character columns. Note that changing this to a value with different behavior from that of the frontend system may give unexpected results.

Supported Values

Valid collations

Version Added

4.3.0

SYNAPSE_DATA_SOURCE

Name of the external data source for Gluent Offload Engine to use when offloading to Azure Synapse Analytics. Note that in databases with case-sensitive collations this parameter is case-sensitive.

Supported Values

Valid Azure Synapse Analytics external data source

Version Added

4.3.0

SYNAPSE_DATABASE

Name of the Azure Synapse Analytics database to use with Gluent Data Platform. Note that in databases with case-sensitive collations this parameter is case-sensitive.

Supported Values

Valid Azure Synapse Analytics database name

Version Added

4.3.0

SYNAPSE_FILE_FORMAT

Name of the file format for Gluent Offload Engine to use when offloading to Azure Synapse Analytics. Note that in databases with case-sensitive collations this parameter is case-sensitive.

Supported Values

Valid Azure Synapse Analytics Parquet file format

Version Added

4.3.0

SYNAPSE_MSI_CLIENT_ID

Specifies the object (principal) ID of the identity for ActiveDirectoryMsi authentication with a user-assigned identity. Leave blank when using other authentication mechanisms.

Supported Values

Object (principal) ID of the identity

Version Added

4.3.0

SYNAPSE_PASS

Specifies the password for the Gluent Data Platform user for SqlPassword or ActiveDirectoryPassword authentication. Leave blank when using other authentication mechanisms. Password encryption is supported using the Password Tool utility.

Supported Values

Azure Synapse Analytics user’s password

Version Added

4.3.0

SYNAPSE_PORT

Dedicated SQL endpoint port of Azure Synapse Analytics workspace.

Default Value

1433

Supported Values

Valid port

Version Added

4.3.0

SYNAPSE_RESOURCE_GROUP

Resource group of Azure Synapse Analytics workspace.

Supported Values

Valid Azure Synapse Analytics resource group

Version Added

4.3.0

SYNAPSE_ROLE

Name of the Azure Synapse Analytics database role assigned to the Gluent Data Platform user. Note that in databases with case-sensitive collations this parameter is case-sensitive.

Supported Values

Valid Azure Synapse Analytics role name

Version Added

4.3.0

SYNAPSE_SERVER

Dedicated SQL endpoint of Azure Synapse Analytics workspace.

Supported Values

Valid Azure Synapse Analytics dedicated SQL endpoint

Version Added

4.3.0

SYNAPSE_SERVICE_PRINCIPAL_ID

Specifies the application (client) ID for ActiveDirectoryServicePrincipal authentication. Leave blank when using other authentication mechanisms.

Supported Values

Application (client) ID

Version Added

4.3.0

SYNAPSE_SERVICE_PRINCIPAL_SECRET

Specifies the client secret for ActiveDirectoryServicePrincipal authentication. Leave blank when using other authentication mechanisms.

Supported Values

Client secret

Version Added

4.3.0

SYNAPSE_SUBSCRIPTION_ID

ID of the subscription containing the Azure Synapse Analytics workspace.

Supported Values

Valid Azure subscription ID

Version Added

4.3.0

SYNAPSE_USER

Specifies the username for the Gluent Data Platform user for SqlPassword or ActiveDirectoryPassword authentication. Leave blank when using other authentication mechanisms.

Supported Values

Azure Synapse Analytics username

Version Added

4.3.0

SYNAPSE_WORKSPACE

Name of the Azure Synapse Analytics workspace.

Supported Values

Valid Azure Synapse Analytics workspace

Version Added

4.3.0

TWO_TASK

Used to support Pluggable Databases in Oracle Database Multitenant environments. Set to the value of ORA_CONN for single-instance databases, or to an EZconnect string connecting to the local instance (typically <hostname>:<port>/<ORACLE_SID>) for Oracle RAC (Real Application Clusters).
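
For example (the hostname, port and ORACLE_SID below are hypothetical):

export TWO_TASK=${ORA_CONN}                        # single instance
export TWO_TASK=dbnode1.gluent.com:1521/ORCL1      # Oracle RAC, local instance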

Supported Values

ORA_CONN or EZconnect string

Version Added

2.10.0

USE_ORACLE_WALLET

Controls use of Oracle Wallet for authentication for orchestration commands and Metadata Daemon. When set to true OFFLOAD_TRANSPORT_AUTH_USING_ORACLE_WALLET is automatically set to true.

Default Value

false

Supported Values

true|false

Version Added

4.2.0

WEBHDFS_HOST

Can be used in conjunction with WEBHDFS_PORT to optimize HDFS activities by utilizing WebHDFS, removing JVM start-up overhead. From version 2.4.7 the value can be a comma-separated list of hosts if HDFS is configured for High Availability.
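
For example (the hostnames are hypothetical):

export WEBHDFS_HOST=namenode1.example.com,namenode2.example.com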

Supported Values

Hostname or IP address of WebHDFS host

Version Added

2.3.0

WEBHDFS_PORT

Can be used in conjunction with WEBHDFS_HOST to optimize HDFS activities by utilizing WebHDFS, removing JVM start-up overhead. If this value is unset then the default ports of 50070 (HTTP) or 50470 (HTTPS) are used.

Default Value

50070|50470

Supported Values

Port of HDFS namenode

Version Added

2.3.0

WEBHDFS_VERIFY_SSL

Used to enable SSL for WebHDFS calls. There are 4 states:

  1. Empty: Do not use SSL

  2. TRUE: Use SSL & verify Hadoop certificate against known certificates

  3. FALSE: Use SSL & do not verify Hadoop certificate

  4. /some/path/here/cert-bundle.crt: Use SSL & verify Hadoop certificate against path to certificate bundle

Supported Values

Empty, true|false, <path to certificate bundle>

Version Added

2.3.0

Common Parameters

--execute

Perform operations, rather than just printing.

Alias

-x

Default Value

None

Supported Values

None

Version Added

2.3.0

-f

Force option. Replace Gluent Offload Engine managed tables/views as required. Use with caution.

Alias

--force

Default Value

None

Supported Values

None

Version Added

2.3.0

--force

Force option. Replace Gluent Offload Engine managed tables/views as required. Use with caution.

Alias

-f

Default Value

None

Supported Values

None

Version Added

2.3.0

--no-webhdfs

Prevent the use of WebHDFS even when configured for use.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

-t

Owner and table name.

Alias

--table

Default Value

None

Supported Values

<OWNER>.<NAME>

Version Added

2.3.0

--table

Owner and table name.

Alias

-t

Default Value

None

Supported Values

<OWNER>.<NAME>

Version Added

2.3.0

--target-name

Override owner and/or name of created frontend or backend object as appropriate for a command.

Allows separation of the RDBMS owner and/or name from the backend system. This can be necessary as some characters supported for owner and name in Oracle Database are not supported in all backend systems, for example $ in Hadoop-based or BigQuery backends.

Allows offload to an existing backend database with a different name to the source RDBMS schema.

Allows present to a hybrid schema without a corresponding application RDBMS schema or with a different name to the source backend database.
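
A hypothetical invocation (the command path and table names are illustrative), offloading a table whose name contains a $ character to a backend-safe name:

./offload -t 'SH.SALES$' --target-name=SH.SALES -x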

Alias

None

Default Value

None

Supported Values

<OWNER>.<NAME>

Version Added

2.3.0

-v

Verbose output.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--vv

More verbose output.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

-x

Perform operations, rather than just printing.

Alias

--execute

Default Value

None

Supported Values

None

Version Added

2.3.0

Connect Parameters

--create-sequence-table

Create the Gluent Data Platform sequence table. See IN_LIST_JOIN_TABLE and IN_LIST_JOIN_TABLE_SIZE.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.4.2

--install-udfs

Install Gluent Data Platform user-defined functions (UDFs).

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--sequence-table-name

See IN_LIST_JOIN_TABLE.

Alias

None

Default Value

default.gluent_sequence

Supported Values

Valid database and table name

Version Added

2.4.2

--sequence-table-size

See IN_LIST_JOIN_TABLE_SIZE.

Alias

None

Default Value

10000

Supported Values

Up to 1000000

Version Added

2.4.2

--sql-file

Write SQL commands to a file rather than execute them when connect is run.

Alias

None

Default Value

None

Supported Values

Any valid path

Version Added

2.11.0

--update-root-files

Updates both Metadata Daemon and Data Daemon scripts with configuration and sets ownership to root:root. This option can only be run with root privileges.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.11.0

--update-metad-files

Updates Metadata Daemon scripts with configuration and sets ownership to root:root. This option can only be run with root privileges.

Alias

None

Default Value

None

Supported Values

None

Version Added

4.0.0

--update-datad-files

Updates Data Daemon scripts with configuration and sets ownership to root:root. This option can only be run with root privileges.

Alias

None

Default Value

None

Supported Values

None

Version Added

4.0.0

--upgrade-environment-file

Updates configuration file (offload.env) with any missing default configuration from offload.env.template. Typically used after upgrades.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.11.0

--validate-udfs

Validate that the Gluent Data Platform user-defined functions (UDFs) are accessible from Impala after installation/upgrade.

Alias

None

Default Value

None

Supported Values

None

Version Added

4.1.0

Offload Parameters

--allow-decimal-scale-rounding

Confirm that it is acceptable for Offload to round decimal places when loading data into a backend system.

Alias

None

Default Value

None

Supported Values

None

Version Added

4.0.0

--allow-floating-point-conversions

Confirm that it is acceptable for Offload to convert NaN or Infinity special values to NULL when loading data into a backend system.

Alias

None

Default Value

None

Supported Values

None

Version Added

4.3.0

--allow-nanosecond-timestamp-columns

Confirm that it is safe to offload timestamp columns with nanosecond capability when the backend system does not support nanoseconds.

Alias

None

Default Value

None

Supported Values

None

Version Added

4.0.2

--bucket-hash-column

Column to use when calculating offload bucket values.

Alias

None

Default Value

None

Supported Values

Valid column name

Version Added

2.3.0

--compress-load-table

Compress the contents of the load table during offload.

Alias

None

Default Value

OFFLOAD_COMPRESS_LOAD_TABLE, false

Supported Values

None

Version Added

2.3.0

--compute-load-table-stats

Compute statistics on the load table during offload. Applicable to Impala.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.9.0

--create-backend-db

Automatically create backend databases. Either use this option, or ensure the correct databases/datasets/schemas (base and load databases) for offloading and presenting already exist.

Alias

None

Default Value

None

Supported Values

None

Version Added

3.3.0

--count-star-expressions

CSV list of functional equivalents to COUNT(*) for aggregation pushdown.

If your SQL statements also use other COUNT(<expression>) forms then, apart from COUNT(1) which is automatically catered for, their presence will cause rewrite rules to fail unless they are included with this parameter.

Alias

None

Default Value

None

Supported Values

E.g. COUNT(9)

Version Added

2.3.0

--data-governance-custom-properties

JSON string of key/value pairs to include in data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_PROPERTIES and will override DATA_GOVERNANCE_CUSTOM_PROPERTIES.

Alias

None

Default Value

DATA_GOVERNANCE_CUSTOM_PROPERTIES

Supported Values

Valid JSON string of key/value pairs (no nested or complex data types)

Version Added

2.11.0

--data-governance-custom-tags

CSV list of free-format tags for data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_TAGS and therefore useful for tags to be applied to specific activities.

Alias

None

Default Value

DATA_GOVERNANCE_CUSTOM_TAGS

Supported Values

E.g. CONFIDENTIAL,TIER1

Version Added

2.11.0

--data-sample-parallelism

Degree of parallelism to use when sampling data for all columns in the source RDBMS table that are either date or timestamp-based or defined as a number without a precision and scale. A value of 0 or 1 disables parallelism.

Alias

None

Default Value

DATA_SAMPLE_PARALLELISM

Supported Values

0 and positive integers

Version Added

4.2.0

--data-sample-percent

Sample data for all columns in the source RDBMS table that are either date or timestamp-based or defined as a number without a precision and scale. A value of 0 disables sampling. A value of AUTO allows Offload to choose a percentage based on the size of the RDBMS table.

Alias

None

Default Value

AUTO

Supported Values

AUTO or 0-100

Version Added

2.5.0

--date-columns

CSV list of columns to offload as DATE (effective for date/timestamp columns).

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

4.0.0

--db-name-prefix

Multitenant support, enabling many Oracle Database databases to offload to the same backend cluster. See DB_NAME_PREFIX for details.

Alias

None

Default Value

DB_NAME_PREFIX

Supported Values

Supported backend characters

Version Added

2.3.0

--decimal-columns

CSV list of columns to offload/present as a fixed precision and scale numeric data type. For example DECIMAL(p,s) where “p,s” is specified in a paired --decimal-columns-type option. Only effective for numeric columns. These options allow repeat inclusion for flexible data type specification, for example:

"--decimal-columns-type=18,2 --decimal-columns=price,cost --decimal-columns-type=6,4 --decimal-columns=location"

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

2.5.0

--decimal-columns-type

State the precision and scale of columns listed in a paired --decimal-columns option. Must be of format “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision. e.g.:

"--decimal-columns-type=18,2"

When offloading, values specified in this option are subject to padding as per the --decimal-padding-digits option.

Alias

None

Default Value

None

Supported Values

Valid “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision

Version Added

2.5.0

--decimal-padding-digits

Padding to apply to precision and scale of DECIMALs during an offload.

Alias

None

Default Value

2

Supported Values

Integral values

Version Added

2.5.0

--double-columns

CSV list of columns to store as a double precision floating-point. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

2.4.7

--equal-to-values

Used for list-partitioned tables to specify a partition to be included for Partition-Based Offload by partition key value. This option can be included multiple times to match multiple partitions, for example:

--equal-to-values=2011 --equal-to-values=2012 --equal-to-values=2013

Alias

None

Default Value

None

Supported Values

Valid literals matching list-partition key values

Version Added

3.3.0

--ext-table-degree

Default degree of parallelism for base hybrid external tables. When set to AUTO Offload will copy settings from the source RDBMS table to the hybrid external table.

Alias

None

Default Value

HYBRID_EXT_TABLE_DEGREE or AUTO

Supported Values

AUTO and positive integers

Version Added

2.11.2

--hdfs-data

Command line override for HDFS_DATA.

Alias

None

Default Value

HDFS_DATA

Supported Values

Valid HDFS path

Version Added

2.3.0

--hdfs-db-path-suffix

Hadoop databases are named <schema><HDFS_DB_PATH_SUFFIX> and <schema>_load<HDFS_DB_PATH_SUFFIX>. When this value is not set the suffix of the databases defaults to .db, giving <schema>.db and <schema>_load.db. Set this to an empty string to use no suffix. For backend systems other than Hadoop this variable has no effect.

Alias

None

Default Value

HDFS_DB_PATH_SUFFIX, .db on Hadoop systems, or '' on other backend systems.

Supported Values

Valid HDFS path

Version Added

2.3.0

--hive-column-stats

Enable computation of column stats with “NATIVE” --offload-stats method. Applies to Hive only.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.6.1

--integer-1-columns

CSV list of columns to offload/present (as applicable) as a 1-byte integer, known as TINYINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--integer-2-columns

CSV list of columns to offload/present (as applicable) as a 2-byte integer, known as SMALLINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--integer-4-columns

CSV list of columns to offload/present (as applicable) as a 4-byte integer, known as INT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--integer-8-columns

CSV list of columns to offload/present (as applicable) as an 8-byte integer, known as BIGINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--integer-38-columns

CSV list of columns to offload/present (as applicable) as a 38-digit integral column. If a system does not support 38 digits of precision then the most appropriate data type available will be used. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--less-than-value

Offload partitions with high water mark less than this value.
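
For example, a hypothetical option fragment offloading all partitions with a high water mark below 1st July 2015 (the boundary value is illustrative):

--less-than-value=2015-07-01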

Alias

None

Default Value

None

Supported Values

Integer or date values (use YYYY-MM-DD format)

Version Added

2.3.0

--lob-data-length

Expected length of RDBMS LOB data.

Alias

None

Default Value

32K

Supported Values

E.g. 64K, 10M

Version Added

2.4.7

--max-offload-chunk-count

Restrict the number of partitions offloaded per cycle. See Offload Transport Chunks for usage.

Alias

None

Default Value

MAX_OFFLOAD_CHUNK_COUNT, 100

Supported Values

1-1000

Version Added

2.3.0

--max-offload-chunk-size

Restrict the size of partitions offloaded per cycle. See Offload Transport Chunks for usage.

Alias

None

Default Value

MAX_OFFLOAD_CHUNK_SIZE, 2G

Supported Values

E.g. 100M, 1G, 1.5G

Version Added

2.3.0

--no-auto-detect-dates

Turn off automatic adoption of string data type for RDBMS date values that are incompatible with the backend system. For example, dates preceding 1400-01-01 are invalid in Impala and will be offloaded to string columns unless this option is used.

Alias

None

Default Value

False

Supported Values

None

Version Added

2.5.1

--no-auto-detect-numbers

Turn off automatic adoption of numeric data types based on their precision and scale in the RDBMS. All numeric data types will be offloaded to a general purpose data type such as DECIMAL(38,18) on Hadoop systems, NUMERIC or BIGNUMERIC on Google BigQuery or NUMBER(38,18) on Snowflake.

Alias

None

Default Value

False

Supported Values

None

Version Added

2.3.0

--no-create-aggregations

Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--no-generate-dependent-views

Dependent views will not be automatically re-generated in the hybrid schema.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--no-materialize-join

Offload a join (specified by --offload-join) as a view.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--no-modify-hybrid-view

Prevent an offload predicate from being added to the boundary conditions in a hybrid view. Can only be used in conjunction with --offload-predicate for --offload-predicate-type values of RANGE, LIST_AS_RANGE, RANGE_AND_PREDICATE or LIST_AS_RANGE_AND_PREDICATE.

Alias

None

Default Value

None

Supported Values

None

Version Added

3.4.0

--no-verify

Skip the data validation step at the end of an offload.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--not-null-columns

Specifies which columns should be created as NOT NULL when offloading a table. Used to override the global OFFLOAD_NOT_NULL_PROPAGATION configuration variable at an offload level. Accepts a CSV list and/or wildcard(s) of valid columns to create as NOT NULL in the backend. Only applies to Google BigQuery, Snowflake or Azure Synapse Analytics backends.

This option supports the wildcard character * in column names.
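
For example, a hypothetical option fragment creating a column named ID and any column ending in _KEY as NOT NULL (the column names are illustrative):

--not-null-columns=ID,*_KEY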

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

4.3.4

--num-buckets

Default number of offload buckets (subpartitions) for an offloaded table, allowing parallel data retrieval. A value of AUTO tunes to a value between 1 and DEFAULT_BUCKETS_MAX.

Alias

None

Default Value

DEFAULT_BUCKETS or AUTO

Supported Values

Integer values or AUTO

Version Added

2.3.0

--num-location-files

Number of external table location files for parallel data retrieval.

Alias

None

Default Value

NUM_LOCATION_FILES

Supported Values

Integer values

Version Added

2.7.2

Note

When offloading or materializing data in Impala, --num-location-files will be aligned with --num-buckets/DEFAULT_BUCKETS.

--offload-by-subpartition

Offload a subpartitioned table with Subpartition-Based Offload (i.e. with reference to subpartition keys and high values rather than partition-level information).

Alias

None

Default Value

True for composite partitioned tables that are unsupported for Partition-Based Offload but supported for Subpartition-Based Offload, False for all other tables

Supported Values

None

Version Added

2.7.0

--offload-chunk-column

Splits load data by this column during insert from the load table to the final table. This can be used to manage memory usage.

Alias

None

Default Value

None

Supported Values

Valid column name

Version Added

2.3.0

--offload-chunk-impala-insert-hint

Used to inject a hint into the INSERT AS SELECT statement that moves data from the load table to the final destination. If no value is supplied, no hint is injected. Impala only.

Alias

None

Default Value

None

Supported Values

SHUFFLE|NOSHUFFLE

Version Added

2.3.0

--offload-distribute-enabled

Distribute data by partition key(s) during the final INSERT operation of an offload. Hive only.

Alias

None

Default Value

OFFLOAD_DISTRIBUTE_ENABLED, true

Supported Values

None

Version Added

2.8.0

--offload-fs-container

The name of the bucket or container to be used when offloading to cloud storage.

Alias

None

Default Value

OFFLOAD_FS_CONTAINER

Supported Values

A cloud storage bucket/container name configured for use by the backend cluster

Version Added

3.0.0

--offload-fs-prefix

A directory path used to prefix database locations within OFFLOAD_FS_SCHEME. When OFFLOAD_FS_SCHEME is inherit, HDFS_DATA takes precedence over this setting.

Alias

None

Default Value

OFFLOAD_FS_PREFIX

Supported Values

A valid directory in HDFS or cloud storage

Version Added

3.0.0

--offload-fs-scheme

The filesystem scheme to be used for database and table locations. inherit specifies that tables created by Offload will not include a LOCATION clause; they will inherit the location from the parent database. See Integrating with Cloud Storage for details.

Alias

None

Default Value

OFFLOAD_FS_SCHEME, inherit

Supported Values

inherit, hdfs, s3a, adl, abfs, abfss

Version Added

3.0.0

--offload-join

Offload a materialized view of the supplied join(s), allowing join processing to be offloaded. Repeated use of --offload-join allows multiple row sources to be included. See documentation for syntax details.

Alias

None

Default Value

None

Supported Values

See Offload Join Grammar

Version Added

2.3.0

--offload-predicate

Specify a predicate to identify a set of data in a table for offload. Can be used to offload all or some of the data in any table type. See documentation for syntax details.
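
For example, a hypothetical option fragment offloading rows matching a single predicate (the column name and value are illustrative; refer to Predicate Grammar for the authoritative syntax):

--offload-predicate='column(PROD_ID) = numeric(12)'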

Alias

None

Default Value

None

Supported Values

See Predicate Grammar

Version Added

3.4.0

--offload-predicate-type

Override the default INCREMENTAL_PREDICATE_TYPE for a partitioned table. Can be used to offload LIST partitioned tables using RANGE logic with an --offload-predicate-type value of LIST_AS_RANGE or used for specialized cases of offloading with Partition-Based Offload and Predicate-Based Offload.

Alias

None

Default Value

None

Supported Values

LIST, LIST_AS_RANGE, RANGE, RANGE_AND_PREDICATE, LIST_AS_RANGE_AND_PREDICATE, PREDICATE

Version Added

3.3.1

--offload-sort-enabled

Sort/cluster data during the final INSERT operation of an offload. Configure sort/cluster columns using --sort-columns.

Alias

None

Default Value

OFFLOAD_SORT_ENABLED, false

Supported Values

None

Version Added

2.7.0

--offload-stats

Method used to manage backend table stats during an Offload, Incremental Update Extraction or Compaction. NATIVE is the default. HISTORY will gather stats on all partitions without stats (applicable to an Offload on Hive only and will automatically be replaced with NATIVE on Impala). COPY will copy table statistics from the RDBMS to an offloaded table if the backend system supports setting of statistics. NONE will prevent Offload from managing stats; for Hive this results in no stats being gathered even if hive.stats.autogather=true is set at the system level.

Alias

None

Default Value

NATIVE

Supported Values

NATIVE|HISTORY|COPY|NONE

Version Added

2.4.7 (HISTORY added in 2.9.0)

--offload-transport

Method used to transport data from an RDBMS frontend to a backend system. AUTO selects the optimal method based on configuration and table structure.

Alias

None

Default Value

OFFLOAD_TRANSPORT, AUTO

Supported Values

AUTO|GLUENT|SQOOP

Version Added

3.1.0

--offload-transport-cmd-host

An override for HDFS_CMD_HOST when running shell-based Offload Transport commands such as Sqoop or Spark Submit.

Alias

None

Default Value

OFFLOAD_TRANSPORT_CMD_HOST

Supported Values

Hostname or IP address of HDFS host

Version Added

3.1.0

--offload-transport-consistent-read

Control whether parallel data transport tasks should use a consistent point in time when reading RDBMS data.

Alias

None

Default Value

OFFLOAD_TRANSPORT_CONSISTENT_READ, true

Supported Values

true|false

Version Added

3.1.0

--offload-transport-dsn

Database connection details for Offload Transport if different to ORA_CONN.

Alias

None

Default Value

OFFLOAD_TRANSPORT_DSN, ORA_CONN

Supported Values

<hostname>:<port>/<service>

Version Added

3.1.0

--offload-transport-fetch-size

Number of records to fetch in a single batch from the RDBMS during Offload. Offload Transport may encounter memory pressure if a table is very wide (e.g. contains LOB columns) and there are lots of records in a batch. Reducing the fetch size can alleviate this if more memory cannot be allocated.

Alias

None

Default Value

OFFLOAD_TRANSPORT_FETCH_SIZE, 5000

Supported Values

Positive integers

Version Added

3.1.0

--offload-transport-jvm-overrides

JVM overrides (inserted right after sqoop import or spark-submit).

Alias

None

Default Value

SQOOP_OVERRIDES, OFFLOAD_TRANSPORT_SPARK_OVERRIDES

Supported Values

See SQOOP_OVERRIDES and OFFLOAD_TRANSPORT_SPARK_OVERRIDES

Version Added

3.1.0

--offload-transport-livy-api-url

URL for Livy/Spark REST API in the format http://fqdn-n.example.com:port. https can be used in place of http.

Alias

None

Default Value

OFFLOAD_TRANSPORT_LIVY_API_URL

Supported Values

Valid Livy REST API URL

Version Added

3.1.0

--offload-transport-livy-idle-session-timeout

Timeout (in seconds) for idle Spark client sessions created in Livy.

Alias

None

Default Value

OFFLOAD_TRANSPORT_LIVY_IDLE_SESSION_TIMEOUT, 600

Supported Values

Positive integers

Version Added

3.1.0

--offload-transport-livy-max-sessions

Limits the number of Livy sessions Offload will create. Sessions are re-used when idle. New sessions are only created when no idle sessions are available.

Alias

None

Default Value

OFFLOAD_TRANSPORT_LIVY_MAX_SESSIONS, 10

Supported Values

Positive integers

Version Added

3.1.0

--offload-transport-parallelism

The number of parallel streams to be used when transporting data from the source RDBMS to the backend.

Alias

None

Default Value

OFFLOAD_TRANSPORT_PARALLELISM, 2

Supported Values

Positive integers

Version Added

3.1.0

--offload-transport-password-alias

An alias provided by the Hadoop Credential Provider API to be used for RDBMS authentication during Offload Transport. The key store containing the alias must be specified either in OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH or in the Hadoop configuration property hadoop.security.credential.provider.path.

Alias

None

Default Value

OFFLOAD_TRANSPORT_PASSWORD_ALIAS

Supported Values

Valid Hadoop Credential Provider API alias

Version Added

3.1.0

--offload-transport-queue-name

YARN queue name to be used for Offload Transport jobs.

Alias

None

Default Value

SQOOP_QUEUE_NAME, OFFLOAD_TRANSPORT_SPARK_QUEUE_NAME

Supported Values

See SQOOP_QUEUE_NAME and OFFLOAD_TRANSPORT_SPARK_QUEUE_NAME

Version Added

3.1.0

--offload-transport-small-table-threshold

Threshold above which Query Import is no longer considered the correct offload choice for non-partitioned tables.

Alias

None

Default Value

OFFLOAD_TRANSPORT_SMALL_TABLE_THRESHOLD or 20M

Supported Values

E.g. 100M, 1G, 1.5G

Version Added

3.1.0

--offload-transport-spark-properties

Key/value pairs, in JSON format, to override Spark property defaults. Examples:

'{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}'

'{"spark.driver.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet", "spark.executor.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet"}'

Alias

None

Default Value

OFFLOAD_TRANSPORT_SPARK_PROPERTIES

Supported Values

Valid JSON string of key/value pairs (no nested or complex data types)

Version Added

3.1.0

--offload-transport-spark-thrift-host

Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3.

Alias

None

Default Value

OFFLOAD_TRANSPORT_SPARK_THRIFT_HOST

Supported Values

Hostname or IP address of Spark Thrift Server host(s)

Version Added

3.1.0

--offload-transport-spark-thrift-port

Port that the Spark Thrift Server is listening on.

Alias

None

Default Value

OFFLOAD_TRANSPORT_SPARK_THRIFT_PORT

Supported Values

Active port

Version Added

3.1.0

--offload-transport-validation-polling-interval

Polling interval in seconds for validation of Spark transport row count. A value of -1 disables retrieval of RDBMS SQL statistics. A value of 0 disables polling resulting in a single capture of SQL statistics after Offload Transport. A value greater than 0 polls RDBMS SQL statistics using the specified interval.

Alias

None

Default Value

OFFLOAD_TRANSPORT_VALIDATION_POLLING_INTERVAL

Supported Values

Interval value in seconds, 0 or -1

Version Added

4.2.1

--offload-type

Identifies a range-partitioned offload as FULL or INCREMENTAL. FULL dictates that all data is offloaded. INCREMENTAL dictates that data up to a boundary threshold will be offloaded.

Alias

None

Default Value

INCREMENTAL for RDBMS tables capable of supporting Partition-Based Offload that are partially offloaded (e.g. using --older-than-date). FULL for all other offloads.

Supported Values

FULL|INCREMENTAL

Version Added

2.5.0

--older-than-date

Offload partitions older than this date (use YYYY-MM-DD format). Overrides --older-than-days if both are present.
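
For example, a hypothetical option fragment offloading all partitions with a high water mark earlier than 1st July 2015 (the date is illustrative):

--older-than-date=2015-07-01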

Alias

None

Default Value

None

Supported Values

Date in YYYY-MM-DD format

Version Added

2.3.0

--older-than-days

Offload partitions older than this number of days (exclusive, i.e. the boundary partition is not offloaded). Suitable for keeping data up to a certain age in the source table. Alternative to --older-than-date option. If both are supplied, --older-than-date will be used.
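
For example, a hypothetical option fragment keeping the most recent 90 days of partitions in the source table (the number of days is illustrative):

--older-than-days=90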

Alias

None

Default Value

None

Supported Values

Valid number of days

Version Added

2.3.0

--partition-columns

Override column(s) to use for partitioning backend data. Defaults to source table partition columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

2.3.0

--partition-digits

Maximum digits allowed for a numeric partition value.

Alias

None

Default Value

15

Supported Values

Integer values

Version Added

2.3.0

--partition-functions

Custom UDF to use for synthetic partitioning of offloaded data. Used when no native partitioning scheme exists for the partition column data type. Google BigQuery only.

Alias

None

Default Value

None

Supported Values

Valid custom UDF

Version Added

4.2.0

--partition-granularity

Partition level/granularity. Use:

  • Y, M, D for date/timestamp partition columns

  • Integral size for numeric partitions. A value of 1 is effectively list partitioning

  • Sub-string length for string partitions

Examples:

  • M partitions the table by Year-Month

  • D partitions the table by Year-Month-Day

  • 5000 partitions the table in ranges of 5000 values

  • 1 creates a partition per value, useful for columns holding values such as year and month or categories

  • 2 on a string partition key partitions using the first two characters
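
As an illustration, a hypothetical option fragment partitioning backend data by month on a date-based partition column (the column name is illustrative):

--partition-columns=TIME_ID --partition-granularity=M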

Alias

None

Default Value

See Partition Granularity

Supported Values

Y|M|D|\d+

Version Added

2.3.0

--partition-lower-value

Integer value defining the lower bound of a range of values used for backend integer range partitioning. BigQuery only.

Alias

None

Default Value

None

Supported Values

Positive integers

Version Added

4.0.0

--partition-names

Specify partitions to be included for offload with Partition-Based Offload. For range-partitioned tables only a single partition name can be specified and it is used to derive a value for --less-than-value/--older-than-date as appropriate. For list-partitioned tables, this option is used to supply a CSV of all partitions to be offloaded and is additional to any partitions offloaded in previous operations.
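
For example, a hypothetical option fragment offloading two list partitions by name (the partition names are illustrative):

--partition-names=P_2011,P_2012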

Alias

None

Default Value

None

Supported Values

Valid partition name(s)

Version Added

3.3.0

--partition-upper-value

Integer value defining the upper bound of a range of values used for backend integer range partitioning. BigQuery only.
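
For example, a hypothetical option fragment combining this option with --partition-lower-value and --partition-granularity to create integer range partitions of 1,000 values between 0 and 1,000,000 (the column name and bounds are illustrative):

--partition-columns=CUST_ID --partition-lower-value=0 --partition-upper-value=1000000 --partition-granularity=1000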

Alias

None

Default Value

None

Supported Values

Positive integers

Version Added

4.0.0

--preserve-load-table

Stops the load table being dropped on completion of offload.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--purge

When supported by the backend system, utilize purge when removing a table due to --reset-backend-table.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.4.9

--reset-backend-table

Remove the backend table before offloading. Use with caution as this will delete previously offloaded data for this table.

Alias

None

Default Value

None

Supported Values

None

Version Added

3.3.0

--reset-hybrid-view

Reset Partition-Based Offload, Subpartition-Based Offload or Predicate-Based Offload predicates in the hybrid view.

Alias

None

Default Value

None

Supported Values

None

Version Added

3.3.0

--skip-steps

Skip the given steps. CSV of step IDs to be skipped. Step IDs are derived from step names by replacing spaces with underscores and are case-insensitive.

For example, it is possible to skip Impala compute statistics commands using a value of Compute_backend_statistics if an initial offload is being performed in stages, and then gather them with the final offload command.
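
A hypothetical option fragment for that example:

--skip-steps=Compute_backend_statistics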

Alias

None

Default Value

None

Supported Values

Valid offload step names

Version Added

2.3.0

--sort-columns

CSV list of columns used to sort or cluster data when inserting into the final destination table. Offloads using Partition-Based Offload or Subpartition-Based Offload will retrieve the value used by the prior offload if no list of columns is explicitly provided. This option has no effect when OFFLOAD_SORT_ENABLED/--offload-sort-enabled is false.

When using Offload Join the column names in --sort-columns must match those in the final destination table (not the names used in the source tables).

This option supports the wildcard character * in column names.
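
For example, a hypothetical option fragment sorting offloaded data on two columns (the column names are illustrative; OFFLOAD_SORT_ENABLED/--offload-sort-enabled must also be enabled for it to take effect):

--sort-columns=TIME_ID,PROD_ID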

Alias

None

Default Value

None for non-partitioned source tables, --partition-columns for partitioned source tables

Supported Values

Valid column name(s)

Version Added

2.7.0

--sqoop-disable-direct

Disable Sqoop direct path mode. It is recommended that the OraOOP optimizations for Sqoop (included in standard Apache Sqoop from v1.4.5) are used; if they cannot be used, disable direct path mode with this option.

Alias

None

Default Value

SQOOP_DISABLE_DIRECT, false

Supported Values

true|false

Version Added

2.3.0

--sqoop-mapreduce-map-java-opts

Sqoop specific setting for -Dmapreduce.map.java.opts. Allows control over Java options for Sqoop MapReduce jobs.

Alias

None

Default Value

None

Supported Values

Valid Sqoop Java options

Version Added

2.3.0

--sqoop-mapreduce-map-memory-mb

Sqoop specific setting for -Dmapreduce.map.memory.mb. Allows control over memory allocation for Sqoop MapReduce jobs.

Alias

None

Default Value

None

Supported Values

Valid numbers in MB

Version Added

2.3.0

--sqoop-additional-options

Additional Sqoop command options added to the end of the Sqoop command.

Alias

None

Default Value

SQOOP_ADDITIONAL_OPTIONS

Supported Values

Any Sqoop command option/argument not already included in the Sqoop command line

Version Added

2.9.0

--sqoop-password-file

Path to an HDFS file containing ORA_APP_PASS which is then passed to Sqoop using the Sqoop --password-file option. This file should be protected with appropriate file system permissions.

Alias

None

Default Value

SQOOP_PASSWORD_FILE

Supported Values

Valid HDFS path

Version Added

2.5.0

--storage-compression

Storage compression of final offload table. GZIP only available with Parquet. ZLIB only available with ORC.

MED is an alias for SNAPPY on both Impala and Hive. This is the default value because it gives the best balance of elapsed time to compression.

HIGH is an alias for GZIP on Impala, ZLIB on Hive.

Alias

None

Default Value

MED

Supported Values

HIGH|MED|NONE|GZIP|ZLIB|SNAPPY

Version Added

2.3.0

--storage-format

Storage format of final backend table. Not applicable to Google BigQuery or Snowflake.

Alias

None

Default Value

PARQUET for Impala, ORC for Hive

Supported Values

ORC|PARQUET

Version Added

2.3.0

--timestamp-tz-columns

CSV list of columns to offload as a timestamp with time zone (will only be effective for date-based columns).

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

4.0.0

--udf-db

Backend database to use for user-defined functions (UDFs).

Gluent Data Platform UDFs are used in Hadoop-based backends to:

  • Convert data to Oracle Database binary formats (ORACLE_NUMBER, ORACLE_DATE)

  • Perform Run-Length Encoding

  • Handle data conversion functions e.g. UPPER, LOWER

They are installed once during installation, and upgraded when required, using the connect --install-udfs command.

Custom UDFs can also be created by users in BigQuery and used by Gluent Data Platform for synthetic partitioning. Custom UDFs must be installed prior to running any offload commands that require access to them.

Alias

None

Default Value

OFFLOAD_UDF_DB

Supported Values

Valid backend database

Version Added

2.3.0

--unicode-string-columns

CSV list of columns to Offload as Unicode string (only effective for string columns).

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

4.3.0

--variable-string-columns

CSV list of columns to offload as a variable length string. Only effective for date/timestamp columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--verify

Validation method to use when verifying data at the end of an offload.

Alias

None

Default Value

minus

Supported Values

minus|aggregate

Version Added

2.3.0

--verify-parallelism

Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.

Alias

None

Default Value

OFFLOAD_VERIFY_PARALLELISM

Supported Values

0 and positive integers

Version Added

4.2.1

Present Parameters

--aggregate-by

CSV list of columns to aggregate by (GROUP BY) when presenting an Advanced Aggregation Pushdown rule.

This option supports the wildcard character * in column names.
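
For example, a hypothetical combination of aggregation options for a present (the column names are illustrative; --measures and --numeric-fns are described later in this section):

--aggregate-by=PROD_ID,TIME_ID --measures=AMOUNT_SOLD --numeric-fns=SUM,MIN,MAX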

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

2.3.0

--base-name

For aggregations only. Provide the name of the base hybrid view originally presented before aggregation. Use when the base view name is different to its source backend table.

Alias

None

Default Value

None

Supported Values

<SCHEMA>.<VIEW_NAME>

Version Added

2.3.0

--binary-columns

CSV list of columns to present using a binary data type. Only effective for string-based columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--columns

CSV list of columns to present.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

2.3.0

--count-star-expressions

CSV list of functional equivalents to COUNT(*) for aggregation pushdown.

If you also use COUNT(x) expressions in your SQL statements, their presence will cause rewrite rules to fail unless you include them with this parameter (COUNT(1) is automatically catered for).

Alias

None

Default Value

None

Supported Values

E.g. COUNT(9)

Version Added

2.3.0

--data-governance-custom-properties

Key/value properties for data governance metadata, supplied as a JSON string. These are in addition to DATA_GOVERNANCE_AUTO_PROPERTIES and will override DATA_GOVERNANCE_CUSTOM_PROPERTIES.

Alias

None

Default Value

DATA_GOVERNANCE_CUSTOM_PROPERTIES

Supported Values

Valid JSON string of key/value pairs (no nested or complex data types)

Version Added

2.11.0

--data-governance-custom-tags

CSV list of free-format tags for data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_TAGS and therefore useful for tags to be applied to specific activities.

Alias

None

Default Value

DATA_GOVERNANCE_CUSTOM_TAGS

Supported Values

E.g. CONFIDENTIAL,TIER1

Version Added

2.11.0

--date-columns

CSV list of columns to present to Oracle Database as DATE (effective for datetime/timestamp columns).

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

2.3.0

--date-fns

CSV list of functions to apply to the non-aggregating date/timestamp projection.

Alias

None

Default Value

MIN, MAX, COUNT

Supported Values

MIN, MAX, COUNT

Version Added

2.3.0

--decimal-columns

CSV list of columns to offload/present as a fixed precision and scale numeric data type. For example DECIMAL(p,s) where “p,s” is specified in a paired --decimal-columns-type option. Only effective for numeric columns. These options allow repeat inclusion for flexible data type specification, for example:

"--decimal-columns-type=18,2 --decimal-columns=price,cost --decimal-columns-type=6,4 --decimal-columns=location"

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

2.5.0

--decimal-columns-type

State the precision and scale of columns listed in a paired --decimal-columns option. Must be of format “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision. e.g.:

"--decimal-columns-type=18,2"

When offloading, values specified in this option are subject to padding as per the --decimal-padding-digits option.

Alias

None

Default Value

None

Supported Values

Valid “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision

Version Added

2.5.0

--detect-sizes

Query backend table/view data lengths and set external table column sizes accordingly.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--integer-1-columns

CSV list of columns to offload/present (as applicable) as a 1-byte integer, known as TINYINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--integer-2-columns

CSV list of columns to offload/present (as applicable) as a 2-byte integer, known as SMALLINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--integer-4-columns

CSV list of columns to offload/present (as applicable) as a 4-byte integer, known as INT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--integer-8-columns

CSV list of columns to offload/present (as applicable) as an 8-byte integer, known as BIGINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--integer-38-columns

CSV list of columns to offload/present (as applicable) as a 38-digit integral column. If a system does not support 38 digits of precision then the most appropriate data type available will be used. Only effective for numeric columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--interval-ds-columns

CSV list of columns to present to Oracle Database as INTERVAL DAY TO SECOND type (will only be effective for backend STRING columns).

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

2.3.0

--interval-ym-columns

CSV list of columns to present to Oracle Database as INTERVAL YEAR TO MONTH type (will only be effective for backend STRING columns).

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

2.3.0

--large-binary-columns

CSV list of columns to present using a large binary data type, for example Oracle Database BLOB. Only effective for string-based columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--large-string-columns

CSV list of columns to present as a large string data type, for example Oracle Database CLOB. Only effective for string-based columns.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

3.3.0

--lob-data-length

Expected length of RDBMS LOB data.

Alias

None

Default Value

32K

Supported Values

E.g. 64K, 10M

Version Added

2.4.7

--materialize-join

Use this option to materialize a join specified using --present-join.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--measures

CSV list of aggregated columns to include in the projection of an aggregated present.

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

2.4.0

--no-create-aggregations

Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--no-gather-stats

Skip generation of new statistics for presented tables/views (default behavior is to generate statistics for new aggregate/join views or existing backend tables with no statistics).

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

--num-location-files

Number of external table location files for parallel data retrieval.

Alias

None

Default Value

NUM_LOCATION_FILES

Supported Values

Integer values

Version Added

2.7.2

--numeric-fns

CSV list of aggregate functions to apply to aggregated numeric columns or measures in an aggregation projection.

Alias

None

Default Value

MIN, MAX, AVG, SUM, COUNT

Supported Values

MIN, MAX, AVG, SUM, COUNT

Version Added

2.3.0

--present-join

Present a view of the supplied join(s) allowing the join processing to be offloaded. Repeated use of --present-join allows multiple row sources to be included. See documentation for syntax.

Alias

None

Default Value

None

Supported Values

See Present Join Grammar

Version Added

2.3.0

--reset-backend-table

Remove the backend table before offloading. Use with caution as this will delete previously offloaded data for this table.

Alias

None

Default Value

None

Supported Values

None

Version Added

3.3.0

--sample-stats

Estimate statistics by scanning a few (random) partitions for presented partitioned tables/views, or a percentage of the non-partitioned presented table/view for backends that support row-based percentage sampling (default behavior is to scan the entire table).

Alias

None

Default Value

None

Supported Values

0-100

Version Added

2.3.0

--string-fns

CSV list of aggregate functions to apply to aggregated string columns or measures in an aggregation projection.

Alias

None

Default Value

MIN, MAX, COUNT

Supported Values

MIN, MAX, COUNT

Version Added

2.3.0

--timestamp-columns

CSV list of columns to present as a TIMESTAMP (only effective for date-based columns).

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

4.0.0

--unicode-string-columns

CSV list of columns to Present as Unicode string (only effective for string columns).

This option supports the wildcard character * in column names.

Alias

None

Default Value

None

Supported Values

Valid column name(s)

Version Added

4.3.0

Incremental Update Parameters

--incremental-batch-size

Batch (fetch) size to use when extracting changes for shipping from a table that is enabled for Incremental Update.

Alias

None

Default Value

1000

Supported Values

Positive integers

Version Added

2.5.0

--incremental-changelog-sequence-cache-size

Specifies the cache size to use for a sequence coupled to the log table used for Incremental Update extraction.

Alias

None

Default Value

100

Supported Values

Positive integers

Version Added

2.10.0

--incremental-changelog-table

Specifies the name of the log table to use for Incremental Update extraction (format is <OWNER>.<TABLE>). Not required when --incremental-extraction-method is ORA_ROWSCN.

Alias

None

Default Value

<Hybrid Schema>.<Table Name>_LOG

Supported Values

<OWNER>.<TABLE>

Version Added

2.5.0

--incremental-delta-threshold

When running the compaction routine for a table enabled for Incremental Update, this threshold denotes the minimum number of changes required to enable the compaction routine to be executed (i.e. compaction will only be executed if there are at least this many rows in the delta table at a given time).

Alias

None

Default Value

50000

Supported Values

Positive integers

Version Added

2.5.0

--incremental-extraction-method

Indicates which change extraction method to use when enabling Incremental Update for a table during an offload.

Alias

None

Default Value

ORA_ROWSCN

Supported Values

ORA_ROWSCN,CHANGELOG,UPDATABLE_CHANGELOG,UPDATABLE,CHANGELOG_INSERT,UPDATABLE_INSERT

Version Added

2.5.0

--incremental-full-compaction

When running the compaction routine for a table that has Incremental Update enabled, insert compacted records into a new base table, also known as an out-of-place compaction.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.10.0

--incremental-key-columns

Comma-separated list of columns that uniquely identify rows in an offloaded source table. Columns are used when extracting incremental changes from the source table and applying them to the offloaded table. In the absence of this parameter the primary key of the table is used.

This option supports the wildcard character * in column names.

Alias

None

Default Value

Primary key

Supported Values

Comma-separated list of columns

Version Added

2.5.0

--incremental-no-lockfile

When running the compaction routine for a table that is enabled for Incremental Update, do not use a lockfile on the local filesystem to prevent multiple compaction processes from running concurrently (on that machine).

Alias

None

Default Value

None

Supported Values

None

Version Added

2.5.0

--incremental-no-verify-primary-key

Bypass verification of mandatory primary key when using CHANGELOG_INSERT or UPDATABLE_INSERT extraction methods.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.9.0

Warning

With this option, users must ensure that no duplicate records are inserted.

--incremental-no-verify-shipped

Bypass verification of the number of change records shipped when extracting and shipping changes for a table that is enabled for Incremental Update. Not applicable when using Incremental Update with Google BigQuery backends.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.5.0

--incremental-partition-wise-full-compaction

When running the compaction routine for a table that has Incremental Update enabled, insert compacted records into the new base table partition-wise. Note that this may cause the compaction process to take significantly longer overall, but it can also significantly reduce the cluster resources used by compaction at any one time.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.5.0. Renamed from --incremental-partition-wise-compaction in 2.10.0

--incremental-retain-obsolete-objects

Retain the previous artifacts when the compaction routine has completed for a table with Incremental Update enabled.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.5.0

Warning

With this option, users must manage previous artifacts and associated storage. In some circumstances, retained obsolete objects can cause the re-offloading of entire tables (with the --reset-backend-table option) to fail.

--incremental-run-compaction

Run the compaction routine for a table that has Incremental Update enabled. Must be used in conjunction with the --execute parameter.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.5.0

--incremental-run-compaction-without-snapshot

Run the compaction routine for a table without creating an HDFS snapshot.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.10.0

--incremental-run-extraction

Extract and ship all new changes for a table that has Incremental Update enabled. Must be used in conjunction with the --execute parameter.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.5.0

--incremental-terminate-compaction

When running the compaction routine for a table with Incremental Update enabled, instruct the compaction process to exit when blocked by some external condition. By default, the compaction process will keep running when blocked, but will drop into a sleep-then-poll loop.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.5.0

--incremental-tmp-dir

When extracting and shipping changes for a table that has Incremental Update enabled, this specifies the staging directory to be used for local data files, before they are shipped to HDFS.

Alias

None

Default Value

<OFFLOAD_HOME>/tmp/incremental_changes

Supported Values

Valid writable directory

Version Added

2.5.0

--incremental-updates-disabled

Disables Incremental Update for the specified table.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.6.0

--incremental-updates-enabled

Enables Incremental Update for the table being offloaded.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.5.0

--incremental-wait-time

When running the compaction routine for a table that has Incremental Update enabled, this specifies the minimum amount of time (in minutes) to allow for active queries to complete before performing any database operations that could cause such queries to fail.

Alias

None

Default Value

15

Supported Values

0 and positive integers

Version Added

2.5.0

Validate Parameters

--aggregate-functions

Comma-separated list of aggregate functions to apply, e.g. MIN,MAX,COUNT. Functions need to be available and use the same arguments in both frontend and backend databases.

Alias

-A

Default Value

MIN, MAX, COUNT

Supported Values

CSV list of expressions

Version Added

2.3.0

--as-of-scn

Execute validation on the frontend side as of a specified SCN (assumes an Oracle Database frontend).

Alias

None

Default Value

None

Supported Values

Valid SCN

Version Added

2.3.0

--filters

Comma-separated list of (<column> <operation> <value>) expressions, e.g. PROD_ID < 12, CUST_ID >= 1000. Expressions must be supported in both frontend and backend databases.

Alias

-F

Default Value

None

Supported Values

CSV list of expressions

Version Added

2.3.0

--frontend-parallelism

Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.

Alias

None

Default Value

OFFLOAD_VERIFY_PARALLELISM

Supported Values

0 and positive integers

Version Added

4.2.1

--group-bys

Comma-separated list of group by expressions, e.g. COL1, COL2. Expressions must be supported in both frontend and backend databases.

This option supports the wildcard character * in column names.

Alias

-G

Default Value

None

Supported Values

CSV list of expressions

Version Added

2.3.0

--selects

Comma-separated list of columns OR <number> of columns to run aggregations on. If <number> is specified the first and last columns and the <number>-2 highest cardinality columns will be selected.

This option supports the wildcard character * in column names.

Alias

-S

Default Value

5

Supported Values

CSV list of columns OR <number>

Version Added

2.3.0

--skip-boundary-check

Do not include ‘offloaded boundary check’ in the list of filters. The ‘offloaded boundary check’ filter defines data that was offloaded to the backend database. For example: WHERE TIME_ID < timestamp '2015-07-01 00:00:00' which resulted from applying the --older-than-date=2015-07-01 filter during offload.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

Schema Sync Parameters

--command-file

Name of an additional log file to record the commands that have been applied (if the --execute option has been used) or should be applied (if the --execute option has not been used). Supplied as full or relative path.

Alias

None

Default Value

None

Supported Values

Full or relative path to file

Version Added

2.8.0

--include

CSV list of schemas, schema.tables or tables to examine for change detection and evolution. Supports wildcards (using *). Example formats: SCHEMA1, SCHEMA*, SCHEMA1.TABLE1,SCHEMA1.TABLE2,SCHEMA2.TAB*, SCHEMA1.TAB*, *.TABLE1,*.TABLE2, *.TAB*.

Alias

None

Default Value

None

Supported Values

List of one or more schema(s), schema(s).table(s) or table(s)

Version Added

2.8.0

--no-create-aggregations

Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.3.0

Diagnose Parameters

--backend-log-size-limit

Size limit for data returned from each backend log e.g. 100K, 0.5M, 1G.

Alias

None

Default Value

10M

Supported Values

<n><K|M|G|T>

Version Added

2.11.0

--hive-http-endpoint

Endpoint of the HiveServer2 or HiveServer2 Interactive (LLAP) service in the format <server|ip address>:<port>.

Alias

None

Default Value

None

Supported Values

<server|ip address>:<port>

Version Added

3.1.0

--impalad-http-port

Port of the Impala Daemon HTTP Server.

Alias

None

Default Value

25000

Supported Values

Positive integers

Version Added

2.11.0

--include-backend-logs

Retrieve backend query engine logs.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.11.0

--include-backend-config

Retrieve backend query engine config.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.11.0

--include-logs-from

Collate and package log files modified or created since date (format: YYYY-MM-DD) or date/time (format: YYYY-MM-DD_HH24:MM:SS). Can be used in conjunction with the --include-logs-to parameter to specify a search range.

Alias

None

Default Value

None

Supported Values

YYYY-MM-DD or YYYY-MM-DD_HH24:MM:SS

Version Added

2.11.0

--include-logs-last

Collate and package log files modified or created in the last n [d]ays (e.g. 3d) or [h]ours (e.g. 7h).

Alias

None

Default Value

None

Supported Values

<n><d|h>

Version Added

2.11.0

--include-logs-to

Collate and package log files modified or created up to the given date (format: YYYY-MM-DD) or date/time (format: YYYY-MM-DD_HH24:MM:SS). Can be used in conjunction with the --include-logs-from parameter to specify a search range.

Alias

None

Default Value

None

Supported Values

YYYY-MM-DD or YYYY-MM-DD_HH24:MM:SS

Version Added

2.11.0

--include-permissions

Collect permissions of files and directories related to Gluent Data Platform.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.11.0

--include-processes

Collect details for running processes related to Gluent Data Platform.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.11.0

--include-query-logs

Retrieve logs for a supplied query ID.

Alias

None

Default Value

None

Supported Values

Valid Impala/LLAP query ID

Version Added

2.11.0

--log-location

Location in which to search for log files.

Alias

None

Default Value

OFFLOAD_HOME/log

Supported Values

Valid directory path

Version Added

2.11.0

--output-location

Location in which to save files created by Diagnose.

Alias

None

Default Value

OFFLOAD_HOME/log

Supported Values

Valid directory path

Version Added

2.11.0

--retain-created-files

By default, after they have been packaged, files created by Diagnose in --output-location are removed. Specify this parameter to retain them.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.11.0

--spark-application-id

Retrieve logs for a supplied Spark application ID.

Alias

None

Default Value

None

Supported Values

Valid Spark application ID

Version Added

3.1.0

Offload Status Report Parameters

--csv-delimiter

Field delimiter character for CSV output.

Alias

None

Default Value

,

Supported Values

Must be a single character

Version Added

2.11.0

--csv-enclosure

Enclosure character for string fields in CSV output.

Alias

None

Default Value

"

Supported Values

Must be a single character

Version Added

2.11.0

-o

Output format for Offload Status Report data.

Alias

--output-format

Default Value

text

Supported Values

csv|text|html|json|raw

Version Added

2.11.0

--output-format

Output format for Offload Status Report data.

Alias

-o

Default Value

text

Supported Values

csv|text|html|json|raw

Version Added

2.11.0

--output-level

Level of detail required for the Offload Status Report.

Alias

None

Default Value

summary

Supported Values

summary|detail

Version Added

2.11.0

--report-directory

Directory to save the report in.

Alias

None

Default Value

OFFLOAD_HOME/log

Supported Values

Valid directory path

Version Added

2.11.0

--report-name

Name of report.

Alias

None

Default Value

Gluent_Offload_Status_Report_{DB_NAME}_{YYYY}-{MM}-{DD}_{HH}-{MI}-{SS}.[html|txt|csv]

Supported Values

Valid filename

Version Added

2.11.0

-s

Optional name of schema to run the Offload Status Report for.

Alias

--schema

Default Value

None

Supported Values

Valid schema name

Version Added

2.11.0

--schema

Optional name of schema to run the Offload Status Report for.

Alias

-s

Default Value

None

Supported Values

Valid schema name

Version Added

2.11.0

-t

Optional name of table to run the Offload Status Report for.

Alias

--table

Default Value

None

Supported Values

Valid table name

Version Added

2.11.0

--table

Optional name of table to run the Offload Status Report for.

Alias

-t

Default Value

None

Supported Values

Valid table name

Version Added

2.11.0

Password Tool Parameters

--encrypt

Encrypt a clear-text, case-sensitive password. User will be prompted for the input password and the encrypted version will be output.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.5.0

--keygen

Generate a password key file of the name given by --keyfile.

Alias

None

Default Value

None

Supported Values

None

Version Added

2.5.0

--keyfile

Name of the password key file to generate.

Alias

None

Default Value

None

Supported Values

Valid path and file name

Version Added

2.5.0

Result Cache Manager Parameters

--rc-retention-hours

Controls how long (in hours) to retain Result Cache files.

Alias

None

Default Value

24

Supported Values

Valid number of hours

Version Added

2.3.0

Oracle Database Schemas

Gluent Data Platform Admin Schema

This account is used by Gluent Data Platform to perform administrative activities. It is defined by ORA_ADM_USER.

Non-standard privileges granted to this schema are:

ANALYZE ANY

Required to copy optimizer statistics from the application schema to the hybrid schema.

GRANT ANY OBJECT PRIVILEGE

Enables the Admin Schema to grant permission on application schema tables to the hybrid schema.

SELECT ANY DICTIONARY

Enables Offload and Present operations to access the Oracle Database data dictionary for information such as column names, data types and partitioning schemes.

SELECT ANY TABLE

Required for Offload activity.

Gluent Data Platform Application Schema

This account is used by Gluent Data Platform to perform read-only activities. It is defined by ORA_APP_USER.

Non-standard privileges granted to this schema are:

FLASHBACK ANY TABLE

Required for Sqoop to provide a consistent point-in-time data load. The Gluent Data Platform application schema does not have DML privileges on user application schema tables, therefore there is no threat posed by this configuration.

SELECT ANY DICTIONARY

Documented requirement of Sqoop.

SELECT ANY TABLE

Required for Sqoop to read application schema tables during an offload.

Gluent Data Platform Repository Schema

This account is used by Gluent Data Platform to store operational metadata. It is defined by ORA_REPO_USER.

Non-standard privileges granted to this schema are:

SELECT ANY DICTIONARY

Enables installed database packages in support of the metadata repository to access the Oracle Database data dictionary.

Hybrid Schemas

Gluent Data Platform hybrid schemas are required to enable remote data to be queried in tandem with customer data in the RDBMS application schema.

Non-standard privileges granted to hybrid schemas are:

CONNECT THROUGH GLUENT_ADM

Offload and Present use this to create hybrid objects without requiring powerful CREATE ANY and DROP ANY privileges.

GLOBAL QUERY REWRITE

Required to support Gluent Query Engine optimizations.

SELECT ANY TABLE

Enables a hybrid view to access the original application schema and offloaded table.

Data Daemon

Properties

The following Java properties can be set by creating a $OFFLOAD_HOME/conf/datad.properties file containing <property>=<value> properties and values.
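
For example, a minimal hypothetical datad.properties overriding two of the properties described below (the values shown are illustrative):

datad.read-pipeline-size=8
grpc.port=50055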

datad.initial-request-pool-size

The initial size of the thread pool for concurrent read requests from the RDBMS.

Default Value

16

Supported Values

Positive integers

Version Added

4.2.2

datad.max-request-pool-size

The maximum size of the thread pool for concurrent read requests from the RDBMS.

Default Value

1024

Supported Values

Positive integers

Version Added

4.2.2

datad.read-pipeline-size

The number of reads from the backend to keep in the pipeline to be processed.

Default Value

4

Supported Values

Positive integers

Version Added

4.0.0

datad.send-queue-size

The maximum size in MB of the queue to send to the RDBMS.

Default Value

16

Supported Values

Positive integers

Version Added

4.0.0

grpc.port

The port used for Data Daemon. Setting to 0 results in random port selection.

Default Value

50051

Supported Values

Any valid port

Version Added

4.0.0

grpc.security.cert-chain

The full path to the certificate chain in PEM format to enable TLS on the Data Daemon socket.

Default Value

None

Supported Values

file:<full path to PEM file>

Version Added

4.0.0

grpc.security.private-key

The full path to the private key in PEM format to enable TLS on the Data Daemon socket.

Default Value

None

Supported Values

file:<full path to PEM file>

Version Added

4.0.0

logging.config

The full path to a LOGBack format configuration file to override default logging.

Default Value

None

Supported Values

<full path to xml file>

Version Added

4.0.0

logging.level.com.gluent.providers.bigquery.BigQueryProvider

The log level for Data Daemon interactions with BigQuery.

Default Value

info

Supported Values

off|error|warn|info|debug|all

Version Added

4.0.0

logging.level.com.gluent.providers.impala.ImpalaProvider

The log level for Data Daemon interactions with Impala.

Default Value

info

Supported Values

off|error|warn|info|debug|all

Version Added

4.0.0

logging.level.com.gluent.providers.jdbc.JdbcDataProvider

The log level for general Data Daemon operations when interacting with Snowflake and Azure Synapse Analytics.

Default Value

info

Supported Values

off|error|warn|info|debug|all

Version Added

4.1.0

logging.level.com.gluent.providers.snowflake.SnowflakeJdbcDataProvider

The log level for Data Daemon interactions with Snowflake.

Default Value

info

Supported Values

off|error|warn|info|debug|all

Version Added

4.1.0

logging.level.com.gluent.providers.synapse.SynapseProvider

The log level for Data Daemon interactions with Azure Synapse Analytics.

Default Value

info

Supported Values

off|error|warn|info|debug|all

Version Added

4.3.0

server.port

The port used for Data Daemon Web Interface. Setting to 0 results in random port selection.

Default Value

50052

Supported Values

Any valid port

Version Added

4.0.0

spring.main.web-application-type

Allows Data Daemon Web Interface to be disabled.

Default Value

None

Supported Values

NONE

Version Added

4.0.0

Configuration

The following Java configuration options can be set by creating a $OFFLOAD_HOME/conf/datad.conf file containing JAVA_OPTS="<parameter1> <parameter2> ..." e.g. JAVA_OPTS="-Xms2048m -Xmx2048m -Djavax.security.auth.useSubjectCredsOnly=false".

-Xms

Sets the initial and minimum Java heap size.

Default Value

Larger of 1/64th of the physical memory or some reasonable minimum

Supported Values

-Xms<size>[g|G|m|M|k|K]

Version Added

4.0.0

-Xmx

Sets the maximum Java heap size.

Default Value

Smaller of 1/4th of the physical memory or 1GB

Supported Values

-Xmx<size>[g|G|m|M|k|K]

Version Added

4.0.0

-Djavax.security.auth.useSubjectCredsOnly

Required to be set to false when authenticating with a Kerberos enabled backend.

Default Value

true

Supported Values

true|false

Version Added

4.0.0

Documentation Feedback

Send feedback on this documentation to: feedback@gluent.com