Reference¶
Documentation Conventions¶
Commands and keywords are in this font.
$OFFLOAD_HOME is set when the environment file (offload.env) is sourced, unless already set, and refers to the directory named offload that is created when the software is unpacked. This is also referred to as <OFFLOAD_HOME> in sections of this guide where the environment file has not been created/sourced.
Third party vendor product names might be aliased or shortened for simplicity. See Third Party Vendor Products for cross-references to full product names and trademarks.
Environment File¶
-
AWS_ACCESS_KEY_ID
¶ Access key ID for AWS authentication, required when staging offloaded data to S3 and not using either an AWS credentials file or instance-level permissions.
Supported Values
Valid AWS access key ID
Version Added
4.1.0
-
AWS_SECRET_ACCESS_KEY
¶ Secret access key for AWS authentication, required when staging offloaded data to S3 and not using either an AWS credentials file or instance-level permissions.
Supported Values
Valid AWS secret access key
Version Added
4.1.0
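As a hedged illustration, both AWS variables might be set together in offload.env as shown below; the key values are the well-known placeholder examples from AWS documentation, not real credentials:
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY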
-
BACKEND_DISTRIBUTION
¶ Backend system distribution override.
Supported Values
CDH|GCP|MSAZURE|SNOWFLAKE
Version Added
2.3.0
-
BACKEND_IDENTIFIER_CASE
¶ Case conversion to be applied to any backend identifier names created by Gluent Data Platform. Backend systems may ignore any case conversion if they are case-insensitive.
Supported Values
UPPER|LOWER|NO_MODIFY
Version Added
4.0.0
-
BACKEND_ODBC_DRIVER_NAME
¶ Name of the Microsoft ODBC driver as specified in
odbcinst.ini
.Supported Values
Valid
odbcinst.ini
entryVersion Added
4.3.0
-
BIGQUERY_DATASET_LOCATION
¶ Google BigQuery location to use when creating a dataset. Only applicable when creating datasets using the --create-backend-db option.
Supported Values
Any valid Google BigQuery location
Version Added
4.0.2
Note
Google BigQuery dataset locations must be compatible with that of the Google Cloud Storage bucket specified in OFFLOAD_FS_CONTAINER
-
CLASSPATH
¶ Ensures the Gluent lib directory is included.
Supported Values
Valid paths
Version Added
2.3.0
-
CLOUDERA_NAVIGATOR_HIVE_SOURCE_ID
¶ The Cloudera Navigator entity ID for the Hive source that will register metadata. See the Installation and Upgrade guide for details on how to set this parameter.
Supported Values
Valid Cloudera Navigator entity ID
Version Added
2.11.0
-
CONNECTOR_HIVE_SERVER_HOST
¶ Name of host(s) to use to connect to Impala/Hive. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3. Use when configuring Gluent Query Engine to connect to a different Cloudera Data Platform experience from the one used by Gluent Offload Engine (e.g. Data Warehouse rather than Data Hub). If unset, all connections will be made to HIVE_SERVER_HOST.
Supported Values
Hostname or IP address of Impala/Hive host(s)
Version Added
4.1.0
-
CONNECTOR_HIVE_SERVER_HTTP_PATH
¶ Path component of URL endpoint when connecting to HiveServer2 in HTTP mode (i.e. when HIVE_SERVER_HTTP_TRANSPORT is true). Use when configuring Gluent Query Engine to connect to a different Cloudera Data Platform experience from the one used by Gluent Offload Engine (e.g. Data Warehouse rather than Data Hub). If unset, all connections will be made with HIVE_SERVER_HTTP_PATH.
Supported Values
Valid URL path
Version Added
4.1.0
-
CONNECTOR_SQL_ENGINE
¶ SQL engine used by Gluent Query Engine for hybrid queries.
Default Value
IMPALA
Supported Values
IMPALA|BIGQUERY|SNOWFLAKE|SYNAPSE
Version Added
3.1.0
-
CONN_PRE_CMD
¶ Used to set pre-commands before query execution, e.g. set hive.execution.engine=tez;.
Supported Values
Supported session set parameters
Version Added
2.3.0
-
DATA_GOVERNANCE_API_PASS
¶ Password for the account specified in
DATA_GOVERNANCE_API_USER
. Password encryption is supported using the Password Tool utility.Supported Values
Cloudera Navigator service account password
Version Added
2.11.0
-
DATA_GOVERNANCE_API_URL
¶ URL for a data governance REST API in the format
http://fqdn-n.example.com:port/api
. Leaving this configuration item blank disables data governance integration.Supported Values
Valid Cloudera Navigator REST API URL
Version Added
2.11.0
-
DATA_GOVERNANCE_API_USER
¶ Service account to be used to connect to a data governance REST API.
Supported Values
Cloudera Navigator service account name
Version Added
2.11.0
-
DATA_GOVERNANCE_AUTO_PROPERTIES
¶ CSV string of dynamic properties to include in data governance metadata. The tokens in the CSV will be expanded at runtime if prefixed with + or ignored if prefixed with -.
Supported Values
CSV containing the following tokens prefixed with either + or -: GLUENT_OBJECT_TYPE, SOURCE_RDBMS_TABLE, TARGET_RDBMS_TABLE, INITIAL_GLUENT_VERSION, LATEST_GLUENT_VERSION, INITIAL_OPERATION_DATETIME, LATEST_OPERATION_DATETIME
Version Added
2.11.0
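For example, a hypothetical offload.env setting that expands the object type and source table tokens but suppresses the version tokens could look like:
export DATA_GOVERNANCE_AUTO_PROPERTIES=+GLUENT_OBJECT_TYPE,+SOURCE_RDBMS_TABLE,-INITIAL_GLUENT_VERSION,-LATEST_GLUENT_VERSION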
-
DATA_GOVERNANCE_AUTO_TAGS
¶ CSV string of tags to include in data governance metadata. Tags are free-format except for +RDBMS_NAME, which is expanded at run time.
Default Value
GLUENT,+RDBMS_NAME
Supported Values
CSV containing tags to attach to data governance metadata
Version Added
2.11.0
-
DATA_GOVERNANCE_BACKEND
¶ Specify the data governance API type accessed via
DATA_GOVERNANCE_API_URL
.Supported Values
navigator
Version Added
2.11.0
-
DATA_GOVERNANCE_CUSTOM_PROPERTIES
¶ JSON string of key/value pairs to include in data governance metadata.
Associated Option
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
2.11.0
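As an illustrative sketch (the key names below are invented for the example), a value could be supplied as:
export DATA_GOVERNANCE_CUSTOM_PROPERTIES='{"business_owner": "finance", "retention_class": "7_years"}'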
-
DATA_GOVERNANCE_CUSTOM_TAGS
¶ CSV string of tags to include in data governance metadata.
Associated Option
Supported Values
CSV containing tags to attach to data governance metadata
Version Added
2.11.0
-
DATA_SAMPLE_PARALLELISM
¶ Degree of parallelism to use when sampling data for all columns in the source RDBMS table that are either date/timestamp-based or defined as a number without a precision and scale. A value of 0 or 1 disables parallelism.
Associated Option
Default Value
0
Supported Values
0
and positive integersVersion Added
4.2.0
-
DATAD_ADDRESS
¶ The address(es) of Data Daemon. For a single daemon the format is <hostname/IP address>:<port>. Specifying multiple daemons can be achieved in one of two ways:
By DNS address. The DNS server can return multiple A records for a hostname and Gluent Data Platform will load balance between these, e.g. <load-balancer-address>:<load-balancer-port>
By IP address and port. The comma-separated list must be prefixed with ipv4:, e.g. ipv4:<hostname/IP address>:<port>,<hostname/IP address>:<port>
Supported Values
<hostname/IP address>:<port>, <load-balancer-address>:<load-balancer-port>, ipv4:<hostname/IP address>:<port>,<hostname/IP address>:<port>
Version Added
4.0.0
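For example, a hedged sketch of the prefixed multi-daemon form in offload.env (hostnames and ports are placeholders) might be:
export DATAD_ADDRESS=ipv4:datad1.example.com:50051,datad2.example.com:50051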
-
DATAD_SSL_ACTIVE
¶ Set to true when TLS is enabled on the Data Daemon socket.
Supported Values
true|false
Version Added
4.0.0
-
DATAD_SSL_TRUSTED_CERTS
¶ The trusted certificate when TLS is enabled on the Data Daemon socket.
Supported Values
Full path to the trusted certificate
Version Added
4.0.0
-
DATAD_WEB_PASS
¶ Password for authentication with Data Daemon Web Interface (if configured). Password encryption is supported using the Password Tool utility.
Supported Values
Data Daemon Web Interface user password
Version Added
4.1.0
-
DATAD_WEB_USER
¶ User for authentication with Data Daemon Web Interface (if configured).
Supported Values
Data Daemon Web Interface username
Version Added
4.1.0
-
DB_NAME_PREFIX
¶ Database name/path prefix for multitenant support. This allows multiple Oracle Database databases to offload to the same backend cluster. If undefined, the DB_UNIQUE_NAME will be used, giving <DB_UNIQUE_NAME>_<schema>. If defined but empty, no prefix is used, giving <schema>. Otherwise, databases will be named <DB_NAME_PREFIX>_<schema>.
If the source database is part of an Oracle Data Guard configuration, set DB_NAME_PREFIX to ensure that DB_UNIQUE_NAME is not used.
Associated Option
Supported Values
Characters supported by the backend database/dataset/schema-naming rules
Version Added
2.3.0
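As a hypothetical example, with export DB_NAME_PREFIX=PROD an offload of schema SALES would create a backend database named PROD_SALES, whereas export DB_NAME_PREFIX= (defined but empty) would give simply SALES (subject to BACKEND_IDENTIFIER_CASE).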
-
DEFAULT_BUCKETS
¶ Default number of offload buckets for parallel data retrieval from the backend Hadoop system. If you aim to run your biggest queries with parallel DOP X then set this value to X. This way each Oracle Database PX slave can start its own Smart Connector process for fetching a subset of data.
Associated Option
Supported Values
Valid Oracle Database DOP
Version Added
2.3.0
-
DEFAULT_BUCKETS_MAX
¶ Upper limit of DEFAULT_BUCKETS when DEFAULT_BUCKETS=AUTO.
Default Value
16
Supported Values
Valid Oracle Database DOP
Version Added
2.7.0
-
DEFAULT_BUCKETS_THRESHOLD
¶ Threshold at which RDBMS segments are considered “small” by DEFAULT_BUCKETS=AUTO tuning.
Supported Values
E.g.
3M
,0.5G
Version Added
2.7.0
-
GOOGLE_APPLICATION_CREDENTIALS
¶ Path to Google service account private key JSON file.
Supported Values
Valid paths
Version Added
4.0.0
-
GOOGLE_KMS_KEY_NAME
¶ Google Cloud Key Management Service cryptographic key name to use for encryption and decryption operations. The purpose of this key must be Symmetric encryption.
Supported Values
Valid KMS key name
Version Added
4.2.0
-
GOOGLE_KMS_KEY_RING_NAME
¶ Google Cloud Key Management Service cryptographic key ring name containing the key defined in
GOOGLE_KMS_KEY_NAME
Supported Values
Valid KMS key ring name
Version Added
4.2.0
-
GOOGLE_KMS_KEY_RING_LOCATION
¶ Google Cloud Key Management Service cryptographic key ring location of the key ring defined in
GOOGLE_KMS_KEY_RING_NAME
Supported Values
Valid Google Cloud Service locations
Version Added
4.2.0
-
HADOOP_SSH_USER
¶ User to connect to Hadoop server(s) defined in
HIVE_SERVER_HOST
using password-less SSH.Supported Values
Valid host username
Version Added
2.3.0
-
HDFS_CMD_HOST
¶ Overrides HIVE_SERVER_HOST for the HDFS command steps only. In split installation environments where orchestration commands are run from Hadoop edge node(s), set this to localhost in the Hadoop edge node(s) configuration file.
Supported Values
Hostname or IP address of HDFS host
Version Added
2.3.0
-
HDFS_DATA
¶ HDFS data directory of the
HIVE_SERVER_USER
. Used to store offloaded data.Associated Option
Supported Values
Valid HDFS directory
Version Added
2.3.0
-
HDFS_DB_PATH_SUFFIX
¶ Hadoop databases are named <schema><HDFS_DB_PATH_SUFFIX> and <schema>_load<HDFS_DB_PATH_SUFFIX>. When this value is not set the suffix of the databases defaults to .db, giving <schema>.db and <schema>_load.db. Set this to an empty string to use no suffix. For backend systems other than Hadoop this variable has no effect.
Associated Option
Supported Values
Valid database path suffix
Version Added
2.3.0
-
HDFS_HOME
¶ HDFS home directory of the
HIVE_SERVER_USER
.Supported Values
Valid HDFS directory
Version Added
2.3.0
-
HDFS_LOAD
¶ HDFS data directory of the
HIVE_SERVER_USER
. Used to stage offloaded data.Supported Values
Valid HDFS directory
Version Added
3.4.0
-
HDFS_NAMENODE_ADDRESS
¶ Hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured. This value is required in order to execute result cache queries. In a deployment where result cache queries will never be used, this variable can safely be unset.
Supported Values
Hostname or IP address of active HDFS namenode or ID of the HDFS nameservice if HDFS High Availability is configured
Version Added
2.3.0
-
HDFS_NAMENODE_PORT
¶ Port of the active HDFS namenode. Set to 0 if HDFS High Availability is configured and HDFS_NAMENODE_ADDRESS is set to a nameservice ID. As with HDFS_NAMENODE_ADDRESS, this value is necessary for executing result cache queries, but otherwise can safely be unset.
Supported Values
Port of active HDFS namenode or
0
if HDFS High Availability is configuredVersion Added
2.3.0
-
HDFS_RESULT_CACHE_USER
¶ Hadoop user to impersonate when making HDFS requests for result cache queries; must have write permissions to HDFS_HOME. In a deployment where result cache queries will never be used, this variable can safely be unset.
Default Value
Supported Values
Hadoop username
Version Added
2.3.0
-
HDFS_SNAPSHOT_PATH
¶ Before an Incremental Update compaction an HDFS snapshot will be automatically created in the location specified by HDFS_SNAPSHOT_PATH. This location must be a snapshottable directory (consult your HDFS administrators to enable this). When changing HDFS_SNAPSHOT_PATH from the default, ensure that it remains a parent directory of HDFS_DATA. Unsetting this variable will disable automatic HDFS snapshots.
Default Value
Supported Values
HDFS path that is equal to or a parent of
HDFS_DATA
Version Added
2.10.0
-
HDFS_SNAPSHOT_SUDO_COMMAND
¶ If HADOOP_SSH_USER is not the inode owner of HDFS_SNAPSHOT_PATH then HDFS superuser rights will be required to take HDFS snapshots. A sudo rule (or equivalent user substitution tool) can be used to enable this using HDFS_SNAPSHOT_SUDO_COMMAND. The command must be password-less.
Supported Values
A valid user-substitution command
Version Added
2.10.0
-
HIVE_SERVER_AUTH_MECHANISM
¶ Authentication mechanism for HiveServer2. In non-kerberized and non-LDAP environments, this should be set to NOSASL for Impala, or to the value of hive.server2.authentication from hive-site.xml for Hive. In LDAP environments, it should be set to PLAIN.
Supported Values
NOSASL|PLAIN, or the value of hive.server2.authentication from hive-site.xml
Version Added
2.3.0
-
HIVE_SERVER_HOST
¶ Name of host(s) to connect to Impala/Hive. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3
.Supported Values
Hostname or IP address of Impala/Hive host(s)
Version Added
2.3.0
-
HIVE_SERVER_HTTP_PATH
¶ Path component of URL endpoint when connecting to HiveServer2 in HTTP mode (i.e. when HIVE_SERVER_HTTP_TRANSPORT is true).
Supported Values
Valid URL path
Version Added
4.1.0
-
HIVE_SERVER_HTTP_TRANSPORT
¶ Use HTTP transport for HiveServer2 connections.
Default Value
false
Supported Values
true|false
Version Added
4.1.0
-
HIVE_SERVER_PASS
¶ Password of the user to authenticate with HiveServer2 service. Required in LDAP enabled Impala configurations. Password encryption is supported using the Password Tool utility.
Supported Values
HiveServer2 service password
Version Added
2.3.0
-
HIVE_SERVER_PORT
¶ Port of HiveServer2 service. Default Impala port is 21050, default Hive port is 10000.
Default Value
21050|10000
Supported Values
Port of HiveServer2 service
Version Added
2.3.0
-
HIVE_SERVER_USER
¶ Name of the user to authenticate with HiveServer2 service.
Supported Values
HiveServer2 service username
Version Added
2.3.0
-
HYBRID_EXT_TABLE_DEGREE
¶ Default degree of parallelism for base hybrid external tables. When set to AUTO, Offload will copy settings from the source RDBMS table to the hybrid external table.
Associated Option
Supported Values
AUTO
and positive integersVersion Added
2.11.2
-
HS2_SESSION_PARAMS
¶ Comma-separated list of HiveServer2 session parameters to set. BATCH_SIZE=16384 is a recommended performance setting. E.g. export HS2_SESSION_PARAMS="BATCH_SIZE=16384,MEM_LIMIT=2G".
Supported Values
Valid Impala/Hive session parameters
Version Added
2.3.0
-
IN_LIST_JOIN_TABLE
¶ Database and table name of the in-list join table. Can be created and populated with ./connect --create-sequence-table. Applicable to Impala.
Supported Values
Valid database and table name
Version Added
2.4.2
-
IN_LIST_JOIN_TABLE_SIZE
¶ Size of table specified by IN_LIST_JOIN_TABLE. Required both for table population by connect and for table usage by Gluent Query Engine. Applicable to Impala.
Supported Values
Up to 1000000
Version Added
2.4.2
-
KERBEROS_KEYTAB
¶ The path of the keytab file. If not provided, a valid ticket must already exist in the cache (i.e. a manual kinit).
Supported Values
Path to the keytab file
Version Added
2.3.0
-
KERBEROS_PATH
¶ If your Kerberos utilities (like kinit) reside in a non-standard directory, set the path here.
Supported Values
Path to Kerberos utilities
Version Added
2.3.0
-
KERBEROS_PRINCIPAL
¶ The Kerberos user to authenticate as, i.e. kinit -kt KERBEROS_KEYTAB KERBEROS_PRINCIPAL should succeed. If KERBEROS_KEYTAB is provided, this should also be provided.
Supported Values
Name of Kerberos principal
Version Added
2.3.0
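As a hedged sketch (the keytab path and principal below are placeholders), a kerberized configuration might pair these two settings so that the kinit -kt check above succeeds:
export KERBEROS_KEYTAB=/etc/security/keytabs/gluent.keytab
export KERBEROS_PRINCIPAL=gluent@EXAMPLE.COM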
-
KERBEROS_SERVICE
¶ The Impala/Hive service (typically impala/hive). If empty, Smart Connector will attempt to connect unsecured.
Supported Values
Name of Impala service
Version Added
2.3.0
-
KERBEROS_TICKET_CACHE_PATH
¶ Required to use the libhdfs3-based result cache with an HDFS cluster that uses Kerberos authentication. In a deployment where result cache queries will never be used, this variable can safely be unset.
Supported Values
Path to Kerberos ticket cache path for the user that will be executing Smart Connector processes
Version Added
2.3.0
-
LD_LIBRARY_PATH
¶ Ensures the Gluent lib directory is included.
Supported Values
Valid paths
Version Added
2.3.0
-
LIBHDFS3_CONF
¶ HDFS client configuration file location.
Supported Values
Valid path to XML configuration file
Version Added
3.0.4
-
LOG_LEVEL
¶ Logging level verbosity.
Default Value
info
Supported Values
info|detail|debug
Version Added
2.3.0
-
MAX_OFFLOAD_CHUNK_COUNT
¶ Restrict number of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Associated Option
Supported Values
1
-1000
Version Added
2.9.0
-
MAX_OFFLOAD_CHUNK_SIZE
¶ Restrict size of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Associated Option
Supported Values
E.g.
100M
,1G
,1.5G
Version Added
2.9.0
-
METAD_AUTOSTART
¶ Enable Metadata Daemon automatic start:
TRUE: If Metadata Daemon is not running, Smart Connector will attempt to start Metadata Daemon automatically.
FALSE: Smart Connector will only attempt to connect to an already running Metadata Daemon.
Default Value
true
Supported Values
true|false
Version Added
2.6.0
-
METAD_POOL_SIZE
¶ The maximum number of connections Metadata Daemon will maintain in its connection pool to Oracle Database.
Default Value
16
Supported Values
Number of connections
Version Added
2.4.5
-
METAD_POOL_TIMEOUT
¶ The timeout for idle connections in Metadata Daemon’s connection pool to Oracle Database.
Default Value
300
Supported Values
Timeout value in seconds
Version Added
2.4.5
-
NLS_LANG
¶ Should be set to the value of Oracle Database NLS_CHARACTERSET.
Supported Values
Valid NLS_CHARACTERSET values
Version Added
2.3.0
-
NUM_LOCATION_FILES
¶ Number of external table location files for parallel data retrieval.
Associated Option
Supported Values
Integer values
Version Added
2.7.2
-
OFFLOAD_BACKEND_SESSION_PARAMETERS
¶ Key/value pairs, in JSON format, to override backend query engine parameters. These take effect when establishing a connection to the backend system. For example:
"{\"export OFFLOAD_BACKEND_SESSION_PARAMETERS="{\"request_pool\": \"'root.gluent'\"}"
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.3.2
-
OFFLOAD_BIN
¶ Path to the Gluent Data Platform
bin
directory ($OFFLOAD_HOME/bin
).Supported Values
Oracle Database directory object name
Version Added
2.3.0
-
OFFLOAD_CONF
¶ Path to the Gluent Data Platform
conf
directory.Supported Values
Path to
conf
directoryVersion Added
2.3.0
-
OFFLOAD_COMPRESS_LOAD_TABLE
¶ Compress staged data during an Offload. This can be useful when staging to cloud storage.
Associated Option
Supported Values
true|false
Version Added
4.0.0
-
OFFLOAD_DISTRIBUTE_ENABLED
¶ Distribute data by partition key(s) during the final INSERT operation of an offload. Hive only.
Associated Option
Supported Values
true|false
Version Added
2.8.0
-
OFFLOAD_FS_AZURE_ACCOUNT_DOMAIN
¶ Microsoft Azure storage account service domain, required when staging offloaded data in Azure storage.
Supported Values
blob.core.windows.net
Version Added
4.1.0
-
OFFLOAD_FS_AZURE_ACCOUNT_KEY
¶ Microsoft Azure account key, required when staging offloaded data in Azure storage.
Supported Values
Valid Azure account key
Version Added
4.1.0
-
OFFLOAD_FS_AZURE_ACCOUNT_NAME
¶ Microsoft Azure account name, required when staging offloaded data in Azure storage.
Supported Values
Valid Azure account name
Version Added
4.1.0
-
OFFLOAD_FS_CONTAINER
¶ The name of the bucket or container to be used when offloading to cloud storage.
Associated Option
Supported Values
A cloud storage bucket/container name configured for use by the backend cluster
Version Added
3.0.0
-
OFFLOAD_FS_PREFIX
¶ A directory path used to prefix database locations within OFFLOAD_FS_SCHEME. When OFFLOAD_FS_SCHEME is inherit, HDFS_DATA takes precedence over this setting.
Associated Option
Supported Values
A valid directory in HDFS or cloud storage
Version Added
3.0.0
-
OFFLOAD_FS_SCHEME
¶ The filesystem scheme to be used for database and table locations. inherit specifies that all tables created by Offload will not specify a LOCATION clause; they will inherit the location from the parent database. See Integrating with Cloud Storage for details.
Associated Option
Supported Values
inherit, hdfs, s3a, adl, abfs, abfss
Version Added
3.0.0
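As an illustrative sketch of how the OFFLOAD_FS_* settings combine (bucket and path names below are placeholders), the following would place offloaded data under s3a://example-bucket/user/gluent/...:
export OFFLOAD_FS_SCHEME=s3a
export OFFLOAD_FS_CONTAINER=example-bucket
export OFFLOAD_FS_PREFIX=user/gluent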
-
OFFLOAD_HOME
¶ Location of Gluent Data Platform installation.
Supported Values
Path to installed
offload
directoryVersion Added
2.3.0
-
OFFLOAD_LOG
¶ Path to the Gluent Data Platform
log
directory.Supported Values
Oracle Database directory object name
Version Added
2.3.0
-
OFFLOAD_LOGDIR
¶ Override Smart Connector log path. If undefined defaults to
$OFFLOAD_HOME/log
.Supported Values
Valid path
Version Added
2.3.0
-
OFFLOAD_NOT_NULL_PROPAGATION
¶ Specify how Offload should treat NOT NULL constraints on offloaded columns. A value of AUTO will propagate all RDBMS NOT NULL constraints to the backend and a value of NONE will not propagate any NOT NULL constraints to the backend table. Only applies to Google BigQuery, Snowflake or Azure Synapse Analytics backends. The --not-null-columns option can be used to override this global setting, allowing a specific list of columns to be defined as NOT NULL for an individual offload.
Default Value
AUTO
Supported Values
AUTO|NONE
Version Added
4.3.4
-
OFFLOAD_SORT_ENABLED
¶ Enables the sorting/clustering of data when inserting into the final destination table. Columns used for sorting/clustering are specified using --sort-columns.
Associated Option
Supported Values
true|false
Version Added
2.7.0
-
OFFLOAD_STAGING_FORMAT
¶ Staging file format to use when staging offloaded data for loading into Snowflake.
Default value
PARQUET
Supported Values
AVRO|PARQUET
Version Added
4.1.0
-
OFFLOAD_TRANSPORT
¶ Method used to transport data from an RDBMS frontend to a backend system. AUTO selects the optimal method based on configuration and table structure.
Associated Option
Supported Values
AUTO|GLUENT|SQOOP
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_AUTH_USING_ORACLE_WALLET
¶ Instruct Offload that RDBMS authentication is via an Oracle Wallet. The wallet location should be configured using Hadoop configuration appropriate to the method used for data transport. See SQOOP_OVERRIDES and OFFLOAD_TRANSPORT_SPARK_PROPERTIES for examples.
Supported Values
true|false
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_CMD_HOST
¶ An override for HDFS_CMD_HOST when running shell-based Offload Transport commands such as Sqoop or Spark Submit.
Associated Option
Supported Values
Hostname or IP address of HDFS host
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_CONSISTENT_READ
¶ Control whether parallel data transport tasks should use a consistent point in time when reading RDBMS data.
Associated Option
Supported Values
true|false
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH
¶ The credential provider path to be used in conjunction with OFFLOAD_TRANSPORT_PASSWORD_ALIAS. Integration with the Hadoop Credential Provider API is only supported by Sqoop, Spark Submit and Livy based Offload Transport.
Supported Values
A valid HDFS path
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_DSN
¶ Database connection details for Offload Transport if different to ORA_CONN.
Associated Option
Supported Values
<hostname>:<port>/<service>
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_FETCH_SIZE
¶ Number of records to fetch in a single batch from the RDBMS during Offload. Offload Transport may encounter memory pressure if a table is very wide (e.g. contains LOB columns) and there are lots of records in a batch. Reducing the fetch size can alleviate this if more memory cannot be allocated.
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_API_VERIFY_SSL
¶ Used to enable SSL for Livy API calls. There are 4 states:
Empty: Do not use SSL.
TRUE: Use SSL and verify Hadoop certificate against known certificates.
FALSE: Use SSL and do not verify Hadoop certificate.
/some/path/here/cert-bundle.crt
: Use SSL and verify Hadoop certificate against path to certificate bundle.
Supported Values
Empty,
true|false
,<path to certificate bundle>
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_API_URL
¶ URL for Livy/Spark REST API in the format http://fqdn-n.example.com:port. https can be used in place of http.
Associated Option
Supported Values
Valid Livy REST API URL
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_IDLE_SESSION_TIMEOUT
¶ Timeout (in seconds) for idle Spark client sessions created in Livy.
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_MAX_SESSIONS
¶ Limits the number of Livy sessions Offload will create. Sessions are re-used when idle. New sessions are only created when no idle sessions are available.
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_PARALLELISM
¶ The number of parallel streams to be used when transporting data from the source RDBMS to the backend.
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_PASSWORD_ALIAS
¶ An alias provided by the Hadoop Credential Provider API to be used for RDBMS authentication during Offload Transport. The key store containing the alias must be specified in either OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH or in the Hadoop configuration (hadoop.security.credential.provider.path).
Associated Option
Supported Values
Valid Hadoop Credential Provider API alias
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_RDBMS_SESSION_PARAMETERS
¶ Key/value pairs, in JSON format, to supply database session parameter values. These only take effect during Offload Transport, e.g.
'{"cell_offload_processing": "false"}'
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SMALL_TABLE_THRESHOLD
¶ Threshold above which Query Import is no longer considered the correct offload choice for non-partitioned tables.
Supported Values
E.g.
100M
,1G
,1.5G
Version Added
4.2.0
-
OFFLOAD_TRANSPORT_SPARK_OVERRIDES
¶ Override JVM flags for a spark-submit command, inserted immediately after spark-submit.
Associated Option
Supported Values
Valid JVM options
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_PROPERTIES
¶ Key/value pairs, in JSON format, to override Spark property defaults. Examples:
'{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}' '{"spark.driver.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet", "spark.executor.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet"}'
Associated Option
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.1.0
Note
Some properties will not take effect when connecting to the Spark Thrift Server because the Spark context has already been created.
-
OFFLOAD_TRANSPORT_SPARK_QUEUE_NAME
¶ YARN queue name for Gluent Offload Engine Spark jobs.
Associated Option
Supported Values
Valid YARN queue name
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_SUBMIT_EXECUTABLE
¶ The executable to use for submitting Spark applications. Can be empty, spark-submit or spark2-submit.
Supported Values
Blank or spark-submit|spark2-submit
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_SUBMIT_MASTER_URL
¶ The master URL for the Spark cluster, only used for non-Hadoop Spark clusters. If empty, Spark will use default settings.
Associated Option
None
Supported Values
Valid master URL
Version Added
4.0.0
-
OFFLOAD_TRANSPORT_SPARK_THRIFT_HOST
¶ Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3
.Associated Option
Supported Values
Hostname or IP address of Spark Thrift Server host(s)
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_THRIFT_PORT
¶ Port that the Spark Thrift Server is listening on.
Associated Option
Supported Values
Active port
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_USER
¶ User to authenticate as when executing Offload Transport commands, such as SSH for spark-submit or Sqoop commands, or Livy API calls.
Associated Option
None
Supported Values
Valid username
Version Added
4.0.0
-
OFFLOAD_TRANSPORT_VALIDATION_POLLING_INTERVAL
¶ Polling interval in seconds for validation of Spark transport row count. A value of -1 disables retrieval of RDBMS SQL statistics. A value of 0 disables polling resulting in a single capture of SQL statistics after Offload Transport. A value greater than 0 polls RDBMS SQL statistics using the specified interval.
Associated Option
Supported Values
Interval value in seconds,
0
or-1
Version Added
4.2.1
Note
When the Spark Thrift Server or Apache Livy are used for Offload Transport it is recommended to set OFFLOAD_TRANSPORT_VALIDATION_POLLING_INTERVAL to a positive value. This is because polling RDBMS SQL statistics is the primary validation for both Spark Thrift Server and Apache Livy based Offload Transport.
-
OFFLOAD_UDF_DB
¶ For Impala/Hive, the database that Gluent Data Platform UDFs are created in. If undefined, defaults to the default database.
For BigQuery, the name of the dataset that contains custom UDF(s) for synthetic partitioning. If undefined, the dataset will be determined from the --partition-functions option.
Supported Values
Valid Impala/Hive database or BigQuery dataset
Version Added
2.3.0
-
OFFLOAD_VERIFY_PARALLELISM
¶ Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.
Associated Option
Supported Values
0
and positive integersVersion Added
4.2.1
-
ORA_ADM_CONN
¶ Connection string (typically a tnsnames.ora entry) for ORA_ADM_USER connections. Primarily for use with Oracle Wallet, as each entry requires a unique connection string.
Supported Values
Connection string corresponding to the Oracle Wallet entry for ORA_ADM_USER
Version Added
4.2.0
-
ORA_ADM_PASS
¶ Password of the Gluent Data Platform Admin Schema chosen during installation. Password encryption is supported using the Password Tool utility.
Supported Values
Oracle Database ADM password
Version Added
2.3.0
-
ORA_ADM_USER
¶ Name of the Gluent Data Platform Admin Schema chosen during installation.
Supported Values
Oracle Database ADM username
Version Added
2.3.0
-
ORA_APP_PASS
¶ Password of the Gluent Data Platform Application Schema chosen during installation. Password encryption is supported using the Password Tool utility.
Supported Values
Oracle Database APP password
Version Added
2.3.0
-
ORA_APP_USER
¶ Name of the Gluent Data Platform Application Schema chosen during installation.
Supported Values
Oracle Database APP username
Version Added
2.3.0
-
ORA_CONN
¶ Oracle Database connection details. A fully qualified DB service name must be used if the Oracle Database service name includes domain names (DB_DOMAIN), e.g. ORCL12.gluent.com.
Supported Values
<hostname>:<port>/<service>
Version Added
2.3.0
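For example, a hypothetical value for a service named ORCL12 in domain gluent.com, listening on port 1521 of host dbhost1, could be:
export ORA_CONN=dbhost1:1521/ORCL12.gluent.com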
-
ORA_REPO_USER
¶ Name of the Gluent Data Platform Repository Schema chosen during installation.
Supported Values
Oracle Database REPO username
Version Added
3.3.0
-
PASSWORD_KEY_FILE
¶ Password key file generated by Password Tool and used to create encrypted password strings.
Supported Values
Path to Password Key File
Version Added
2.5.0
-
PATH
¶ Ensures the Gluent Data Platform bin directory is included. The path order is important to ensure that the Python distribution included with Gluent Data Platform is used.
Supported Values
Valid paths
Version Added
2.3.0
-
QUERY_ENGINE
¶ Backend SQL engine to use for commands issued as part of Offload/Present orchestration.
Supported Values
BIGQUERY|IMPALA|SNOWFLAKE|SYNAPSE
Version Added
2.3.0
-
QUERY_MONITOR_THRESHOLD
¶ Threshold for hybrid query execution time (in seconds) that enables automatic monitoring of a query in the backend. Queries with Data Daemon execution time below this threshold will not gather any backend trace metrics or profiles. A value of 0 will enable automatic trace/profile collection for all hybrid queries. Individual hybrid queries can have trace enabled or disabled with the
GLUENT_QUERY_MONITOR or GLUENT_NO_QUERY_MONITOR hints, respectively.
Supported Values
Integers >= 0
Version Added
4.3.2
-
SNOWFLAKE_ACCOUNT
¶ Name of the Snowflake account to use with Gluent Data Platform.
Supported Values
Snowflake account name
Version Added
4.1.0
-
SNOWFLAKE_DATABASE
¶ Name of the Snowflake database to use with Gluent Data Platform.
Supported Values
Snowflake database name
Version Added
4.1.0
-
SNOWFLAKE_FILE_FORMAT_PREFIX
¶ Name prefix for Gluent Offload Engine to use when creating file format objects while offloading to Snowflake.
Default Value
GLUENT_OFFLOAD_FILE_FORMAT
Supported Values
Valid Snowflake file format object name <= 120 characters
Version Added
4.1.0
-
SNOWFLAKE_INTEGRATION
¶ Name of the Snowflake storage integration for Gluent Offload Engine to use when offloading to Snowflake.
Supported Values
Valid Snowflake integration name
Version Added
4.1.0
-
SNOWFLAKE_PASS
¶ Password for Snowflake service account user for Gluent Data Platform, required when using password authentication. Password encryption is supported using the Password Tool utility.
Supported Values
Snowflake user’s password
Version Added
4.1.0
-
SNOWFLAKE_PEM_FILE
¶ Path to private PEM file for Snowflake service account user for Gluent Data Platform, required when using key-pair authentication.
Supported Values
Path to Snowflake user’s private PEM key file
Version Added
4.1.0
-
SNOWFLAKE_PEM_PASSPHRASE
¶ Optional PEM passphrase to authenticate the Snowflake service account user for Gluent Data Platform, only required when using key-pair authentication with a passphrase. Passphrase encryption is supported using the Password Tool utility.
Supported Values
Snowflake user’s PEM passphrase
Version Added
4.1.0
-
SNOWFLAKE_ROLE
¶ Name of the Snowflake database role created by Gluent Data Platform.
Default Value
GLUENT_OFFLOAD_ROLE
Supported Values
Valid Snowflake role name
Version Added
4.1.0
-
SNOWFLAKE_STAGE
¶ Name for Gluent Offload Engine to use when creating schema-level stage objects while offloading to Snowflake.
Default Value
GLUENT_OFFLOAD_STAGE
Supported Values
Valid Snowflake stage name
Version Added
4.1.0
-
SNOWFLAKE_USER
¶ Name of the Snowflake service account user for Gluent Data Platform.
Supported Values
Valid Snowflake user name
Version Added
4.1.0
-
SNOWFLAKE_WAREHOUSE
¶ Default Snowflake warehouse for Gluent Data Platform to use when interacting with Snowflake.
Supported Values
Valid Snowflake warehouse name
Version Added
4.1.0
-
SPARK_HISTORY_SERVER
¶ URL of the Spark History Server, used to access the runtime history of the running Spark Thrift Server.
Supported Values
URL of Spark History Server e.g.
http://hadoop1:18081/
Version Added
3.1.0
-
SPARK_THRIFT_HOST
¶ Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3
.Supported Values
Hostname or IP address of Spark Thrift Server host(s)
Version Added
3.1.0
-
SPARK_THRIFT_PORT
¶ Port that the Spark Thrift Server is listening on.
Supported Values
Active port
Version Added
3.1.0
-
SQOOP_DISABLE_DIRECT
¶ It is recommended that the OraOOP optimizations for Sqoop (included in standard Apache Sqoop from v1.4.5) are used. If they are not, disable direct path mode.
Associated Option
Supported Values
true|false
Version Added
2.3.0
-
SQOOP_OVERRIDES
¶ Override flags for the Sqoop command, inserted immediately after sqoop import. To avoid issues, -Dsqoop.avro.logical_types.decimal.enable=false is included by default and should not be removed. Additional settings can be added, for example:
"-Dsqoop.avro.logical_types.decimal.enable=false -Dmapreduce.map.java.opts='-Doracle.net.wallet_location=/some/path/here/gluent_wallet'"
Associated Option
Supported Values
Valid Sqoop parameters
Version Added
2.3.0
-
SQOOP_ADDITIONAL_OPTIONS
¶ Additional Sqoop command options added at the end of the Sqoop command.
Associated Option
Supported Values
Any Sqoop command option/argument not already included in the Sqoop command line
Version Added
2.9.0
-
SQOOP_PASSWORD_FILE
¶ HDFS path to Sqoop password file, readable by HADOOP_SSH_USER. If not specified, ORA_APP_PASS will be used.
Associated Option
Supported Values
HDFS path to password file
Version Added
2.5.0
-
SQOOP_QUEUE_NAME
¶ YARN queue name for Gluent Offload Engine Sqoop jobs.
Associated Option
Supported Values
Valid YARN queue name
Version Added
3.1.0
-
SSL_ACTIVE
¶ Set to true when Impala/Hive uses SSL/TLS encryption.
Supported Values
true|false
Version Added
2.3.0
-
SSL_TRUSTED_CERTS
¶ SSL/TLS trusted certificates.
Supported Values
Path to SSL certificate
Version Added
2.3.0
-
START_OF_WEEK
¶ Specify the first day of the week for TO_CHAR(<value>, 'D') predicate pushdown. Applies to Snowflake and Azure Synapse Analytics.
Default Value
7
Supported Values
1 (Monday) to 7 (Sunday)
Version Added
4.3.0
-
SYNAPSE_AUTH_MECHANISM
¶ Azure Synapse Analytics authentication mechanism.
Supported Values
SqlPassword, ActiveDirectoryPassword, ActiveDirectoryMsi, ActiveDirectoryServicePrincipal
Version Added
4.3.0
-
SYNAPSE_COLLATION
¶ Azure Synapse Analytics collation to use for character columns. Note that changing this to a value with different behavior to the frontend system may give unexpected results.
Supported Values
Valid collations
Version Added
4.3.0
-
SYNAPSE_DATA_SOURCE
¶ Name of the external data source for Gluent Offload Engine to use when offloading to Azure Synapse Analytics. Note that in databases with case-sensitive collations this parameter is case-sensitive.
Supported Values
Valid Azure Synapse Analytics external data source
Version Added
4.3.0
-
SYNAPSE_DATABASE
¶ Name of the Azure Synapse Analytics database to use with Gluent Data Platform. Note that in databases with case-sensitive collations this parameter is case-sensitive.
Supported Values
Valid Azure Synapse Analytics database name
Version Added
4.3.0
-
SYNAPSE_FILE_FORMAT
¶ Name of the file format for Gluent Offload Engine to use when offloading to Azure Synapse Analytics. Note that in databases with case-sensitive collations this parameter is case-sensitive.
Supported Values
Valid Azure Synapse Analytics Parquet file format
Version Added
4.3.0
-
SYNAPSE_MSI_CLIENT_ID
¶ Specifies the object (principal) ID of the identity for
ActiveDirectoryMsi
authentication with a user-assigned identity. Leave blank when using other authentication mechanisms.Supported Values
Object (principal) ID of the identity
Version Added
4.3.0
-
SYNAPSE_PASS
¶ Specifies the password for the Gluent Data Platform user for SqlPassword or ActiveDirectoryPassword authentication. Leave blank when using other authentication mechanisms. Password encryption is supported using the Password Tool utility.
Supported Values
Azure Synapse Analytics user’s password
Version Added
4.3.0
-
SYNAPSE_PORT
¶ Dedicated SQL endpoint port of Azure Synapse Analytics workspace.
Default Value
1433
Supported Values
Valid port
Version Added
4.3.0
-
SYNAPSE_RESOURCE_GROUP
¶ Resource group of Azure Synapse Analytics workspace.
Supported Values
Valid Azure Synapse Analytics resource group
Version Added
4.3.0
-
SYNAPSE_ROLE
¶ Name of the Azure Synapse Analytics database role assigned to the Gluent Data Platform user. Note that in databases with case-sensitive collations this parameter is case-sensitive.
Supported Values
Valid Azure Synapse Analytics role name
Version Added
4.3.0
-
SYNAPSE_SERVER
¶ Dedicated SQL endpoint of Azure Synapse Analytics workspace.
Supported Values
Valid Azure Synapse Analytics dedicated SQL endpoint
Version Added
4.3.0
-
SYNAPSE_SERVICE_PRINCIPAL_ID
¶ Specifies the application (client) ID for
ActiveDirectoryServicePrincipal
authentication. Leave blank when using other authentication mechanisms.Supported Values
Application (client) ID
Version Added
4.3.0
-
SYNAPSE_SERVICE_PRINCIPAL_SECRET
¶ Specifies the client secret for
ActiveDirectoryServicePrincipal
authentication. Leave blank when using other authentication mechanisms.Supported Values
Client secret
Version Added
4.3.0
-
SYNAPSE_SUBSCRIPTION_ID
¶ ID of the subscription containing the Azure Synapse Analytics workspace.
Supported Values
Valid Azure subscription ID
Version Added
4.3.0
-
SYNAPSE_USER
¶ Specifies the username for the Gluent Data Platform user for SqlPassword or ActiveDirectoryPassword authentication. Leave blank when using other authentication mechanisms.
Supported Values
Azure Synapse Analytics username
Version Added
4.3.0
-
SYNAPSE_WORKSPACE
¶ Name of the Azure Synapse Analytics workspace.
Supported Values
Valid Azure Synapse Analytics workspace
Version Added
4.3.0
-
TWO_TASK
¶ Used to support Pluggable Databases in Oracle Database Multitenant environments. Set to ORA_CONN for single instance, or to an EZconnect string connecting to the local instance, typically <hostname>:<port>/<ORACLE_SID>, for Oracle RAC (Real Application Clusters).
Supported Values
ORA_CONN or EZconnect string
Version Added
2.10.0
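As a hypothetical illustration for an Oracle RAC node whose local instance is ORCL121 listening on port 1521 of host dbnode1:
export TWO_TASK=dbnode1:1521/ORCL121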
-
USE_ORACLE_WALLET
¶ Controls use of Oracle Wallet for authentication for orchestration commands and Metadata Daemon. When set to true, OFFLOAD_TRANSPORT_AUTH_USING_ORACLE_WALLET is automatically set to true.
Default Value
false
Supported Values
true|false
Version Added
4.2.0
-
WEBHDFS_HOST
¶ Can be used in conjunction with WEBHDFS_PORT to optimize HDFS activities, removing JVM start-up overhead by utilizing WebHDFS. From version 2.4.7 the value can be a comma-separated list of hosts if HDFS is configured for High Availability.
Supported Values
Hostname or IP address of WebHDFS host
Version Added
2.3.0
-
WEBHDFS_PORT
¶ Can be used in conjunction with WEBHDFS_HOST to optimize HDFS activities, removing JVM start-up overhead by utilizing WebHDFS. If this value is unset then default ports of 50070 (HTTP) or 50470 (HTTPS) are used.
Default Value
50070|50470
Supported Values
Port of HDFS namenode
Version Added
2.3.0
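As an illustrative sketch for an HDFS High Availability pair (hostnames are placeholders):
export WEBHDFS_HOST=namenode1.example.com,namenode2.example.com
export WEBHDFS_PORT=50070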
-
WEBHDFS_VERIFY_SSL
¶ Used to enable SSL for WebHDFS calls. There are 4 states:
Empty: Do not use SSL
TRUE: Use SSL & verify Hadoop certificate against known certificates
FALSE: Use SSL & do not verify Hadoop certificate
/some/path/here/cert-bundle.crt
: Use SSL & verify Hadoop certificate against path to certificate bundle
Supported Values
Empty,
true|false
,<path to certificate bundle>
Version Added
2.3.0
Common Parameters¶
-
--execute
¶
Perform operations, rather than just printing.
Alias
-x
Default Value
None
Supported Values
None
Version Added
2.3.0
-
-f
¶
Force option. Replace Gluent Offload Engine managed tables/views as required. Use with caution.
Alias
--force
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--force
¶
Force option. Replace Gluent Offload Engine managed tables/views as required. Use with caution.
Alias
-f
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-webhdfs
¶
Prevent the use of WebHDFS even when configured for use.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
-t
¶
Owner and table name.
Alias
--table
Default Value
None
Supported Values
<OWNER>.<NAME>
Version Added
2.3.0
-
--table
¶
Owner and table name.
Alias
-t
Default Value
None
Supported Values
<OWNER>.<NAME>
Version Added
2.3.0
-
--target-name
¶
Override owner and/or name of created frontend or backend object as appropriate for a command.
Allows separation of the RDBMS owner and/or name from the backend system. This can be necessary as some characters supported for owner and name in Oracle Database are not supported in all backend systems, for example $ in Hadoop-based or BigQuery backends.
Allows offload to an existing backend database with a different name to the source RDBMS schema.
Allows present to a hybrid schema without a corresponding application RDBMS schema or with a different name to the source backend database.
Alias
None
Default Value
None
Supported Values
<OWNER>.<NAME>
Version Added
2.3.0
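For instance, a hypothetical offload of a table whose schema name contains $ might map it to a backend-safe name as follows (the command path, schema and table names here are illustrative only):
./offload -t SALES$HIST.ORDERS --target-name=SALES_HIST.ORDERS -x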
-
-v
¶
Verbose output.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--vv
¶
More verbose output.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
-x
¶
Perform operations, rather than just printing.
Alias
--execute
Default Value
None
Supported Values
None
Version Added
2.3.0
Connect Parameters¶
-
--create-sequence-table
¶
Create the Gluent Data Platform sequence table. See IN_LIST_JOIN_TABLE and IN_LIST_JOIN_TABLE_SIZE.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.4.2
-
--install-udfs
¶
Install Gluent Data Platform user-defined functions (UDFs).
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--sequence-table-name
¶
See IN_LIST_JOIN_TABLE.
Alias
None
Default Value
default.gluent_sequence
Supported Values
Valid database and table name
Version Added
2.4.2
-
--sequence-table-size
¶
See IN_LIST_JOIN_TABLE_SIZE.
Alias
None
Default Value
10000
Supported Values
Up to 1000000
Version Added
2.4.2
-
--sql-file
¶
Write SQL commands to a file rather than execute them when connect is run.
Alias
None
Default Value
None
Supported Values
Any valid path
Version Added
2.11.0
-
--update-root-files
¶
Updates both Metadata Daemon and Data Daemon scripts with configuration and sets ownership to root:root. This option can only be run with root privileges.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--update-metad-files
¶
Updates Metadata Daemon scripts with configuration and sets ownership to root:root. This option can only be run with root privileges.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.0
-
--update-datad-files
¶
Updates Data Daemon scripts with configuration and sets ownership to root:root. This option can only be run with root privileges.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.0
-
--upgrade-environment-file
¶
Updates the configuration file (offload.env) with any missing default configuration from offload.env.template. Typically used after upgrades.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--validate-udfs
¶
Validate that the Gluent Data Platform user-defined functions (UDFs) are accessible from Impala after installation/upgrade.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.1.0
Offload Parameters¶
-
--allow-decimal-scale-rounding
¶
Confirm that it is acceptable for Offload to round decimal places when loading data into a backend system.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.0
-
--allow-floating-point-conversions
¶
Confirm that it is acceptable for Offload to convert NaN or Infinity special values to NULL when loading data into a backend system.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.3.0
-
--allow-nanosecond-timestamp-columns
¶
Confirm that it is safe to offload timestamp columns with nanosecond capability when the backend system does not support nanoseconds.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.2
-
--bucket-hash-column
¶
Column to use when calculating offload bucket values.
Alias
None
Default Value
None
Supported Values
Valid column name
Version Added
2.3.0
-
--compress-load-table
¶
Compress the contents of the load table during offload.
Alias
None
Default Value
OFFLOAD_COMPRESS_LOAD_TABLE
,false
Supported Values
None
Version Added
2.3.0
-
--compute-load-table-stats
¶
Compute statistics on the load table during offload. Applicable to Impala.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.9.0
-
--create-backend-db
¶
Automatically create backend databases. Either use this option, or ensure the correct databases/datasets/schemas (base and load databases) for offloading and presenting already exist.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--count-star-expressions
¶
CSV list of functional equivalents to COUNT(*) for aggregation pushdown.
If you also use COUNT(x) in your SQL statements then, apart from COUNT(1) which is automatically catered for, the presence of COUNT(x) will cause rewrite rules to fail unless you include it with this parameter.
Alias
None
Default Value
None
Supported Values
E.g.
COUNT(9)
Version Added
2.3.0
-
--data-governance-custom-properties
¶
JSON string of key/value pairs to include in data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_PROPERTIES and will override DATA_GOVERNANCE_CUSTOM_PROPERTIES.
Alias
None
Default Value
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
2.11.0
-
--data-governance-custom-tags
¶
CSV list of free-format tags for data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_TAGS and are therefore useful for tags to be applied to specific activities.
Alias
None
Default Value
Supported Values
E.g.
CONFIDENTIAL,TIER1
Version Added
2.11.0
-
--data-sample-parallelism
¶
Degree of parallelism to use when sampling data for all columns in the source RDBMS table that are either date or timestamp-based or defined as a number without a precision and scale. A value of 0 or 1 disables parallelism.
Alias
None
Default Value
Supported Values
0
and positive integersVersion Added
4.2.0
-
--data-sample-percent
¶
Sample data for all columns in the source RDBMS table that are either date or timestamp-based or defined as a number without a precision and scale. A value of 0 disables sampling. A value of AUTO lets Offload choose a percentage based on the size of the RDBMS table.
Alias
None
Default Value
AUTO
Supported Values
AUTO
or0
-100
Version Added
2.5.0
-
--date-columns
¶
CSV list of columns to offload as DATE (effective for date/timestamp columns).
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.0.0
-
--db-name-prefix
¶
Multitenant support, enabling many Oracle Database databases to offload to the same backend cluster. See
DB_NAME_PREFIX
for details.Alias
None
Default Value
Supported Values
Supported backend characters
Version Added
2.3.0
-
--decimal-columns
¶
CSV list of columns to offload/present as a fixed precision and scale numeric data type, for example DECIMAL(p,s), where "p,s" is specified in a paired --decimal-columns-type option. Only effective for numeric columns. These options allow repeat inclusion for flexible data type specification, for example:
"--decimal-columns-type=18,2 --decimal-columns=price,cost --decimal-columns-type=6,4 --decimal-columns=location"
This option supports the wildcard character * in column names.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.5.0
-
--decimal-columns-type
¶
State the precision and scale of columns listed in a paired --decimal-columns option. Must be of format "precision,scale" where 1<=precision<=38, 0<=scale<=38 and scale<=precision, e.g.:
"--decimal-columns-type=18,2"
When offloading, values specified in this option are subject to padding as per the --decimal-padding-digits option.
Alias
None
Default Value
None
Supported Values
Valid “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision
Version Added
2.5.0
-
--decimal-padding-digits
¶
Padding to apply to precision and scale of DECIMALs during an offload.
Alias
None
Default Value
2
Supported Values
Integral values
Version Added
2.5.0
-
--double-columns
¶
CSV list of columns to store as a double precision floating-point. Only effective for numeric columns.
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.4.7
-
--equal-to-values
¶
Used for list-partitioned tables to specify a partition to be included for Partition-Based Offload by partition key value. This option can be included multiple times to match multiple partitions, for example:
--equal-to-values=2011 --equal-to-values=2012 --equal-to-values=2013
Alias
None
Default Value
None
Supported Values
Valid literals matching list-partition key values
Version Added
3.3.0
-
--ext-table-degree
¶
Default degree of parallelism for base hybrid external tables. When set to AUTO, Offload will copy settings from the source RDBMS table to the hybrid external table.
Alias
None
Default Value
HYBRID_EXT_TABLE_DEGREE or AUTO
Supported Values
AUTO and positive integers
Version Added
2.11.2
-
--hdfs-data
¶
Command line override for
HDFS_DATA
.Alias
None
Default Value
Supported Values
Valid HDFS path
Version Added
2.3.0
-
--hdfs-db-path-suffix
¶
Hadoop databases are named <schema><HDFS_DB_PATH_SUFFIX> and <schema>_load<HDFS_DB_PATH_SUFFIX>. When this value is not set the suffix of the databases defaults to .db, giving <schema>.db and <schema>_load.db. Set this to an empty string to use no suffix. For backend systems other than Hadoop this option has no effect.
Alias
None
Default Value
HDFS_DB_PATH_SUFFIX, .db on Hadoop systems, or '' on other backend systems.
Supported Values
Valid HDFS path
Version Added
2.3.0
-
--hive-column-stats
¶
Enable computation of column stats with “NATIVE”
--offload-stats
method. Applies to Hive only.Alias
None
Default Value
None
Supported Values
None
Version Added
2.6.1
-
--integer-1-columns
¶
CSV list of columns to offload/present (as applicable) as a 1-byte integer, known as
TINYINT
in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-2-columns
¶
CSV list of columns to offload/present (as applicable) as a 2-byte integer, known as
SMALLINT
in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-4-columns
¶
CSV list of columns to offload/present (as applicable) as a 4-byte integer, known as
INT
in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-8-columns
¶
CSV list of columns to offload/present (as applicable) as an 8-byte integer, known as
BIGINT
in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-38-columns
¶
CSV list of columns to offload/present (as applicable) as 38 digit integral column. If a system does not support 38 digits of precision then the most appropriate data type available will be used. Only effective for numeric columns.
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--less-than-value
¶
Offload partitions with high water mark less than this value.
Alias
None
Default Value
None
Supported Values
Integer or date values (use
YYYY-MM-DD
format)Version Added
2.3.0
-
--lob-data-length
¶
Expected length of RDBMS LOB data
Alias
None
Default Value
32K
Supported Values
E.g.
64K
,10M
Version Added
2.4.7
-
--max-offload-chunk-count
¶
Restrict the number of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Alias
None
Default Value
Supported Values
1
-1000
Version Added
2.3.0
-
--max-offload-chunk-size
¶
Restrict the size of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Alias
None
Default Value
Supported Values
E.g.
100M
,1G
,1.5G
Version Added
2.3.0
-
--no-auto-detect-dates
¶
Turn off automatic adoption of string data type for RDBMS date values that are incompatible with the backend system. For example, dates preceding 1400-01-01 are invalid in Impala and will be offloaded to string columns unless this option is used.
Alias
None
Default Value
False
Supported Values
None
Version Added
2.5.1
-
--no-auto-detect-numbers
¶
Turn off automatic adoption of numeric data types based on their precision and scale in the RDBMS. All numeric data types will be offloaded to a general purpose data type such as
DECIMAL(38,18)
on Hadoop systems,NUMERIC
orBIGNUMERIC
on Google BigQuery orNUMBER(38,18)
on Snowflake.Alias
None
Default Value
False
Supported Values
None
Version Added
2.3.0
-
--no-create-aggregations
¶
Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-generate-dependent-views
¶
Dependent views will not be automatically re-generated in the hybrid schema.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-materialize-join
¶
Offload a join (specified by
--offload-join
) as a view.Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-modify-hybrid-view
¶
Prevent an offload predicate from being added to the boundary conditions in a hybrid view. Can only be used in conjunction with
--offload-predicate
for--offload-predicate-type
values ofRANGE
,LIST_AS_RANGE
,RANGE_AND_PREDICATE
orLIST_AS_RANGE_AND_PREDICATE
.Alias
None
Default Value
None
Supported Values
None
Version Added
3.4.0
-
--no-verify
¶
Skip the data validation step at the end of an offload.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--not-null-columns
¶
Specifies which columns should be created as
NOT NULL
when offloading a table. Used to override the globalOFFLOAD_NOT_NULL_PROPAGATION
configuration variable at an offload level. Accepts a CSV list and/or wildcard(s) of valid columns to create asNOT NULL
in the backend. Only applies to Google BigQuery, Snowflake or Azure Synapse Analytics backends.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.3.4
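For example, the global OFFLOAD_NOT_NULL_PROPAGATION setting might be overridden for a single offload as in this sketch (the offload entry point, -t selector and names are illustrative assumptions):
offload -t SH.CUSTOMERS --execute --not-null-columns=CUST_ID,CUST_*_CODE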
-
--num-buckets
¶
Default number of offload buckets (subpartitions) for an offloaded table, allowing parallel data retrieval. A value of
AUTO
tunes to a value between 1 andDEFAULT_BUCKETS_MAX
.Alias
None
Default Value
DEFAULT_BUCKETS
orAUTO
Supported Values
Integer values or
AUTO
Version Added
2.3.0
-
--num-location-files
¶
Number of external table location files for parallel data retrieval.
Alias
None
Default Value
Supported Values
Integer values
Version Added
2.7.2
Note
When offloading or materializing data in Impala, --num-location-files
will be aligned with --num-buckets
/DEFAULT_BUCKETS
-
--offload-by-subpartition
¶
Offload a subpartitioned table with Subpartition-Based Offload (i.e. with reference to subpartition keys and high values rather than partition-level information).
Alias
None
Default Value
True for composite partitioned tables that are unsupported for Partition-Based Offload but supported for Subpartition-Based Offload, False for all other tables
Supported Values
None
Version Added
2.7.0
-
--offload-chunk-column
¶
Splits load data by this column during insert from the load table to the final table. This can be used to manage memory usage.
Alias
None
Default Value
None
Supported Values
Valid column name
Version Added
2.3.0
-
--offload-chunk-impala-insert-hint
¶
Used to inject a hint into the
INSERT AS SELECT
moving data from the load table to the final destination. If no value is supplied, no hint is injected. Impala only.Alias
None
Default Value
None
Supported Values
SHUFFLE|NOSHUFFLE
Version Added
2.3.0
-
--offload-distribute-enabled
¶
Distribute data by partition key(s) during the final INSERT operation of an offload. Hive only.
Alias
None
Default Value
Supported Values
None
Version Added
2.8.0
-
--offload-fs-container
¶
The name of the bucket or container to be used when offloading to cloud storage.
Alias
None
Default Value
Supported Values
A cloud storage bucket/container name configured for use by the backend cluster
Version Added
3.0.0
-
--offload-fs-prefix
¶
A directory path used to prefix database locations within
OFFLOAD_FS_SCHEME
. WhenOFFLOAD_FS_SCHEME
isinherit
HDFS_DATA
takes precedence over this setting.Alias
None
Default Value
Supported Values
A valid directory in HDFS or cloud storage
Version Added
3.0.0
-
--offload-fs-scheme
¶
The filesystem scheme to be used for database and table locations.
inherit
specifies that all tables created by Offload will not specify aLOCATION
clause; they will inherit the location from the parent database. See Integrating with Cloud Storage for details.Alias
None
Default Value
OFFLOAD_FS_SCHEME
,inherit
Supported Values
inherit
,hdfs
,s3a
,adl
,abfs
,abfss
Version Added
3.0.0
-
--offload-join
¶
Offload a materialized view of the supplied join(s), allowing join processing to be offloaded. Repeated use of
--offload-join
allows multiple row sources to be included. See documentation for syntax details.Alias
None
Default Value
None
Supported Values
Version Added
2.3.0
-
--offload-predicate
¶
Specify a predicate to identify a set of data in a table for offload. Can be used to offload all or some of the data in any table type. See documentation for syntax details.
Alias
None
Default Value
None
Supported Values
Version Added
3.4.0
-
--offload-predicate-type
¶
Override the default INCREMENTAL_PREDICATE_TYPE for a partitioned table. Can be used to offload LIST partitioned tables using RANGE logic with an
--offload-predicate-type
value ofLIST_AS_RANGE
or used for specialized cases of offloading with Partition-Based Offload and Predicate-Based Offload.Alias
None
Default Value
None
Supported Values
LIST
,LIST_AS_RANGE
,RANGE
,RANGE_AND_PREDICATE
,LIST_AS_RANGE_AND_PREDICATE
,PREDICATE
Version Added
3.3.1
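A hedged sketch of offloading a LIST partitioned table with RANGE logic, combining this option with a date boundary (the offload entry point, -t selector and table name are assumptions):
offload -t SH.SALES --execute --offload-predicate-type=LIST_AS_RANGE --older-than-date=2015-07-01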
-
--offload-sort-enabled
¶
Sort/cluster data during the final INSERT operation of an offload. Configure sort/cluster columns using
--sort-columns
.Alias
None
Default Value
OFFLOAD_SORT_ENABLED
,false
Supported Values
None
Version Added
2.7.0
-
--offload-stats
¶
Method used to manage backend table stats during an Offload, Incremental Update Extraction or Compaction.
NATIVE
is the default.HISTORY
will gather stats on all partitions without stats (applicable to an Offload on Hive only and will automatically be replaced withNATIVE
on Impala).COPY
will copy table statistics from the RDBMS to an offloaded table if the backend system supports setting of statistics.NONE
will prevent Offload from managing stats; for Hive this results in no stats being gathered even ifhive.stats.autogather=true
is set at the system level.Alias
None
Default Value
NATIVE
Supported Values
NATIVE|HISTORY|COPY|NONE
Version Added
2.4.7
(HISTORY
added in2.9.0
)
-
--offload-transport
¶
Method used to transport data from an RDBMS frontend to a backend system.
AUTO
selects the optimal method based on configuration and table structure.Alias
None
Default Value
OFFLOAD_TRANSPORT
,AUTO
Supported Values
AUTO|GLUENT|SQOOP
Version Added
3.1.0
-
--offload-transport-cmd-host
¶
An override for
HDFS_CMD_HOST
when running shell based Offload Transport commands such as Sqoop or Spark Submit.Alias
None
Default Value
Supported Values
Hostname or IP address of HDFS host
Version Added
3.1.0
-
--offload-transport-consistent-read
¶
Control whether parallel data transport tasks should use a consistent point in time when reading RDBMS data.
Alias
None
Default Value
Supported Values
true|false
Version Added
3.1.0
-
--offload-transport-dsn
¶
Database connection details for Offload Transport if different to
ORA_CONN
.Alias
None
Default Value
Supported Values
<hostname>:<port>/<service>
Version Added
3.1.0
-
--offload-transport-fetch-size
¶
Number of records to fetch in a single batch from the RDBMS during Offload. Offload Transport may encounter memory pressure if a table is very wide (e.g. contains LOB columns) and there are lots of records in a batch. Reducing the fetch size can alleviate this if more memory cannot be allocated.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-jvm-overrides
¶
JVM overrides (inserted right after
sqoop import
orspark-submit
).Alias
None
Default Value
Supported Values
Version Added
3.1.0
-
--offload-transport-livy-api-url
¶
URL for Livy/Spark REST API in the format
http://fqdn-n.example.com:port
.https
can be used in place ofhttp
.Alias
None
Default Value
Supported Values
Valid Livy REST API URL
Version Added
3.1.0
-
--offload-transport-livy-idle-session-timeout
¶
Timeout (in seconds) for idle Spark client sessions created in Livy.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-livy-max-sessions
¶
Limits the number of Livy sessions Offload will create. Sessions are re-used when idle. New sessions are only created when no idle sessions are available.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-parallelism
¶
The number of parallel streams to be used when transporting data from the source RDBMS to the backend.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-password-alias
¶
An alias provided by Hadoop Credential Provider API to be used for RDBMS authentication during Offload Transport. The key store containing the alias must be specified in either
OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH
or in the Hadoop configuration property (hadoop.security.credential.provider.path
).Alias
None
Default Value
Supported Values
Valid Hadoop Credential Provider API alias
Version Added
3.1.0
-
--offload-transport-queue-name
¶
YARN queue name to be used for Offload Transport jobs.
Alias
None
Default Value
Supported Values
Version Added
3.1.0
-
--offload-transport-small-table-threshold
¶
Threshold above which Query Import is no longer considered the correct offload choice for non-partitioned tables.
Alias
None
Default Value
Supported Values
E.g.
100M
,1G
,1.5G
Version Added
3.1.0
-
--offload-transport-spark-properties
¶
Key/value pairs, in JSON format, to override Spark property defaults. Examples:
'{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}' '{"spark.driver.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet", "spark.executor.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet"}'
Alias
None
Default Value
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.1.0
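For illustration, the JSON value is typically supplied alongside other transport options in one command (the offload entry point, -t selector and table name are assumptions):
offload -t SH.SALES --execute --offload-transport=GLUENT --offload-transport-parallelism=4 --offload-transport-spark-properties='{"spark.executor.memory": "8G"}'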
-
--offload-transport-spark-thrift-host
¶
Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3
.Alias
None
Default Value
Supported Values
Hostname or IP address of Spark Thrift Server host(s)
Version Added
3.1.0
-
--offload-transport-spark-thrift-port
¶
Port that the Spark Thrift Server is listening on.
Alias
None
Default Value
Supported Values
Active port
Version Added
3.1.0
-
--offload-transport-validation-polling-interval
¶
Polling interval in seconds for validation of Spark transport row count. A value of -1 disables retrieval of RDBMS SQL statistics. A value of 0 disables polling, resulting in a single capture of SQL statistics after Offload Transport. A value greater than 0 polls RDBMS SQL statistics using the specified interval.
Alias
None
Default Value
Supported Values
Interval value in seconds,
0
or-1
Version Added
4.2.1
-
--offload-type
¶
Identifies a range-partitioned offload as
FULL
orINCREMENTAL
.FULL
dictates that all data is offloaded.INCREMENTAL
dictates that data up to a boundary threshold will be offloaded.Alias
None
Default Value
INCREMENTAL
for RDBMS tables capable of supporting Partition-Based Offload that are partially offloaded (e.g. using--older-than-date
).FULL
for all other offloads.Supported Values
FULL|INCREMENTAL
Version Added
2.5.0
-
--older-than-date
¶
Offload partitions older than this date (use
YYYY-MM-DD
format). Overrides--older-than-days
if both are present.Alias
None
Default Value
None
Supported Values
Date in
YYYY-MM-DD
formatVersion Added
2.3.0
-
--older-than-days
¶
Offload partitions older than this number of days (exclusive, i.e. the boundary partition is not offloaded). Suitable for keeping data up to a certain age in the source table. Alternative to
--older-than-date
option. If both are supplied,--older-than-date
will be used.Alias
None
Default Value
None
Supported Values
Valid number of days
Version Added
2.3.0
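For example, a Partition-Based Offload of all partitions older than a date boundary might look like the following sketch; --older-than-days=90 could be used instead for a rolling window (the offload entry point, -t selector and table name are assumptions):
offload -t SH.SALES --execute --older-than-date=2015-07-01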
-
--partition-columns
¶
Override column(s) to use for partitioning backend data. Defaults to source table partition columns.
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--partition-digits
¶
Maximum digits allowed for a numeric partition value.
Alias
None
Default Value
15
Supported Values
Integer values
Version Added
2.3.0
-
--partition-functions
¶
Custom UDF to use for synthetic partitioning of offloaded data. Used when no native partitioning scheme exists for the partition column data type. Google BigQuery only.
Alias
None
Default Value
None
Supported Values
Valid custom UDF
Version Added
4.2.0
-
--partition-granularity
¶
Partition level/granularity. Use:
Y, M or D for date/timestamp partition columns
An integral size for numeric partitions (a value of 1 is effectively list partitioning)
A sub-string length for string partitions
Examples:
M partitions the table by Year-Month
D partitions the table by Year-Month-Day
5000 partitions the table in ranges of 5000 values
1 creates a partition per value, useful for columns holding values such as year and month or categories
2 on a string partition key partitions using the first two characters
Alias
None
Default Value
Supported Values
Y|M|D|\d+
Version Added
2.3.0
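As an illustrative sketch, monthly backend partitioning on a date column could be requested as follows (the offload entry point, -t selector and names are assumptions):
offload -t SH.SALES --execute --partition-columns=TIME_ID --partition-granularity=M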
-
--partition-lower-value
¶
Integer value defining the lower bound of a range of values used for backend integer range partitioning. BigQuery only.
Alias
None
Default Value
None
Supported Values
Positive integers
Version Added
4.0.0
-
--partition-names
¶
Specify partitions to be included for offload with Partition-Based Offload. For range-partitioned tables, only a single partition name can be specified, and it is used to derive a value for
--less-than-value
/--older-than-date
as appropriate. For list-partitioned tables, this option is used to supply a CSV of all partitions to be offloaded and is additional to any partitions offloaded in previous operations.Alias
None
Default Value
None
Supported Values
Valid partition name(s)
Version Added
3.3.0
-
--partition-upper-value
¶
Integer value defining the upper bound of a range of values used for backend integer range partitioning. BigQuery only.
Alias
None
Default Value
None
Supported Values
Positive integers
Version Added
4.0.0
-
--preserve-load-table
¶
Stops the load table being dropped on completion of offload.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--purge
¶
When supported by the backend system, utilize purge when removing a table due to
--reset-backend-table
.Alias
None
Default Value
None
Supported Values
None
Version Added
2.4.9
-
--reset-backend-table
¶
Remove the backend table before offloading. Use with caution as this will delete previously offloaded data for this table.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--reset-hybrid-view
¶
Reset Partition-Based Offload, Subpartition-Based Offload or Predicate-Based Offload predicates in the hybrid view.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--skip-steps
¶
Skip the given steps. CSV list of step IDs to be skipped. Step IDs are derived by replacing spaces with underscores and are case-insensitive.
For example, it is possible to skip Impala compute statistics commands using a value of
Compute_backend_statistics
if an initial offload is being performed in stages, and then gather them with the final offload command.Alias
None
Default Value
None
Supported Values
Valid offload step names
Version Added
2.3.0
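For example, to run an offload while deferring backend statistics gathering to a later command (a sketch; the offload entry point, -t selector and table name are assumptions):
offload -t SH.SALES --execute --skip-steps=Compute_backend_statistics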
-
--sort-columns
¶
CSV list of columns used to sort or cluster data when inserting into the final destination table. Offloads using Partition-Based Offload or Subpartition-Based Offload will retrieve the value used by the prior offload if no list of columns is explicitly provided. This option has no effect when
OFFLOAD_SORT_ENABLED
/--offload-sort-enabled
is false.When using Offload Join the column names in
--sort-columns
must match those in the final destination table (not the names used in the source tables).This option supports the wildcard character
*
in column names.Alias
None
Default Value
None for non-partitioned source tables,
--partition-columns
for partitioned source tablesSupported Values
Valid column name(s)
Version Added
2.7.0
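A sketch of enabling sorted/clustered storage for an offload (the offload entry point, -t selector and names are assumptions):
offload -t SH.SALES --execute --offload-sort-enabled=true --sort-columns=TIME_ID,CHANNEL_ID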
-
--sqoop-disable-direct
¶
It is recommended that the OraOOP optimizations for Sqoop (included in standard Apache Sqoop from
v1.4.5
) are used. If they are not available, use this option to disable direct path mode.Alias
None
Default Value
SQOOP_DISABLE_DIRECT
,false
Supported Values
true|false
Version Added
2.3.0
-
--sqoop-mapreduce-map-java-opts
¶
Sqoop specific setting for
-Dmapreduce.map.java.opts
. Allows control over Java options for Sqoop MapReduce jobs.Alias
None
Default Value
None
Supported Values
Valid Sqoop Java options
Version Added
2.3.0
-
--sqoop-mapreduce-map-memory-mb
¶
Sqoop specific setting for
-Dmapreduce.map.memory.mb
. Allows control over memory allocation for Sqoop MapReduce jobs.Alias
None
Default Value
None
Supported Values
Valid numbers in MB
Version Added
2.3.0
-
--sqoop-additional-options
¶
Additional Sqoop command options added to the end of the Sqoop command.
Alias
None
Default Value
Supported Values
Any Sqoop command option/argument not already included in the Sqoop command line
Version Added
2.9.0
-
--sqoop-password-file
¶
Path to an HDFS file containing
ORA_APP_PASS
which is then passed to Sqoop using the Sqoop--password-file
option. This file should be protected with appropriate file system permissions.Alias
None
Default Value
Supported Values
Valid HDFS path
Version Added
2.5.0
-
--storage-compression
¶
Storage compression of the final offload table.
GZIP
is only available with Parquet.ZLIB
is only available with ORC.MED
is an alias forSNAPPY
on both Impala and Hive. This is the default value because it gives the best balance of elapsed time to compression.HIGH
is an alias forGZIP
on Impala,ZLIB
on Hive.Alias
None
Default Value
MED
Supported Values
HIGH|MED|NONE|GZIP|ZLIB|SNAPPY
Version Added
2.3.0
-
--storage-format
¶
Storage format of final backend table. Not applicable to Google BigQuery or Snowflake.
Alias
None
Default Value
PARQUET
for Impala,ORC
for HiveSupported Values
ORC|PARQUET
Version Added
2.3.0
-
--timestamp-tz-columns
¶
CSV list of columns to offload as a timestamp with time zone (will only be effective for date-based columns).
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.0.0
-
--udf-db
¶
Backend database to use for user-defined functions (UDFs).
Gluent Data Platform UDFs are used in Hadoop-based backends to:
Convert data to Oracle Database binary formats (ORACLE_NUMBER, ORACLE_DATE)
Perform Run-Length Encoding
Handle data conversion functions, e.g. UPPER, LOWER
They are installed once during installation, and upgraded, using the
connect --install-udfs
command.Custom UDFs can also be created by users in BigQuery and used by Gluent Data Platform for synthetic partitioning. Custom UDFs must be installed prior to running any
offload
commands that require access to them.Alias
None
Default Value
Supported Values
Valid backend database
Version Added
2.3.0
-
--unicode-string-columns
¶
CSV list of columns to Offload as Unicode string (only effective for string columns).
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.3.0
-
--variable-string-columns
¶
CSV list of columns to offload as a variable length string. Only effective for date/timestamp columns.
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--verify
¶
Validation method to use when verifying data at the end of an offload.
Alias
None
Default Value
minus
Supported Values
minus|aggregate
Version Added
2.3.0
-
--verify-parallelism
¶
Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.
Alias
None
Default Value
Supported Values
0
and positive integersVersion Added
4.2.1
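As a final illustration of the verification options above, an offload could request aggregate-based validation with a parallel RDBMS query (the offload entry point, -t selector and table name are assumptions):
offload -t SH.SALES --execute --verify=aggregate --verify-parallelism=8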
Present Parameters¶
-
--aggregate-by
¶
CSV list of columns to aggregate by (GROUP BY) when presenting an Advanced Aggregation Pushdown rule.
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--base-name
¶
For aggregations only. Provide the name of the base hybrid view originally presented before aggregation. Use when the base view name is different to its source backend table.
Alias
None
Default Value
None
Supported Values
<SCHEMA>.<VIEW_NAME>
Version Added
2.3.0
-
--binary-columns
¶
CSV list of columns to present using a binary data type. Only effective for string-based columns.
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--columns
¶
CSV list of columns to present.
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--count-star-expressions
¶
CSV list of functional equivalents to
COUNT(*)
for aggregation pushdown.If you also use
COUNT(x)
in your SQL statements then, apart fromCOUNT(1)
which is automatically catered for, the presence ofCOUNT(x)
will cause rewrite rules to fail unless you include it with this parameter.Alias
None
Default Value
None
Supported Values
E.g.
COUNT(9)
Version Added
2.3.0
-
--data-governance-custom-properties
¶
Key/value pairs, in JSON format, of custom properties for data governance metadata. These are in addition to
DATA_GOVERNANCE_AUTO_PROPERTIES
and will overrideDATA_GOVERNANCE_CUSTOM_PROPERTIES
.Alias
None
Default Value
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
2.11.0
-
--data-governance-custom-tags
¶
CSV list of free-format tags for data governance metadata. These are in addition to
DATA_GOVERNANCE_AUTO_TAGS
and therefore useful for tags to be applied to specific activities.Alias
None
Default Value
Supported Values
E.g.
CONFIDENTIAL,TIER1
Version Added
2.11.0
-
--date-columns
¶
CSV list of columns to present to Oracle Database as DATE (effective for datetime/timestamp columns).
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--date-fns
¶
CSV list of functions to apply to the non-aggregating date/timestamp projection.
Alias
None
Default Value
MIN
,MAX
,COUNT
Supported Values
MIN
,MAX
,COUNT
Version Added
2.3.0
-
--decimal-columns
¶
CSV list of columns to offload/present as a fixed precision and scale numeric data type. For example
DECIMAL(p,s)
where “p,s” is specified in a paired--decimal-columns-type
option. Only effective for numeric columns. These options allow repeat inclusion for flexible data type specification, for example:"--decimal-columns-type=18,2 --decimal-columns=price,cost --decimal-columns-type=6,4 --decimal-columns=location"
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.5.0
-
--decimal-columns-type
¶
State the precision and scale of columns listed in a paired
--decimal-columns
option. Must be of format “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision. e.g.:"--decimal-columns-type=18,2"
When offloading, values specified in this option are subject to padding as per the
--decimal-padding-digits
option.Alias
None
Default Value
None
Supported Values
Valid “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision
Version Added
2.5.0
-
--detect-sizes
¶
Query backend table/view data length and set external table columns sizes accordingly.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--integer-1-columns
¶
CSV list of columns to offload/present (as applicable) as a 1-byte integer, known as
TINYINT
in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-2-columns
¶
CSV list of columns to offload/present (as applicable) as a 2-byte integer, known as
SMALLINT
in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-4-columns
¶
CSV list of columns to offload/present (as applicable) as a 4-byte integer, known as
INT
in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-8-columns
¶
CSV list of columns to offload/present (as applicable) as an 8-byte integer, known as
BIGINT
in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-38-columns
¶
CSV list of columns to offload/present (as applicable) as a 38-digit integral column. If a system does not support 38 digits of precision then the most appropriate data type available will be used. Only effective for numeric columns.
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--interval-ds-columns
¶
CSV list of columns to present to Oracle Database as
INTERVAL DAY TO SECOND
type (will only be effective for backendSTRING
columns).This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--interval-ym-columns
¶
CSV list of columns to present to Oracle Database as
INTERVAL YEAR TO MONTH
type (will only be effective for backendSTRING
columns).This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--large-binary-columns
¶
CSV list of columns to present using a large binary data type, for example Oracle Database
BLOB
. Only effective for string-based columns.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--large-string-columns
¶
CSV list of columns to present as a large string data type, for example Oracle Database
CLOB
. Only effective for string-based columns.This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--lob-data-length
¶
Expected length of RDBMS LOB data.
Alias
None
Default Value
32K
Supported Values
E.g.
64K
,10M
Version Added
2.4.7
-
--materialize-join
¶
Use this option to materialize a join specified using
--present-join
.Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--measures
¶
CSV list of aggregated columns to include in the projection of an aggregated present.
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.4.0
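A hedged sketch of presenting an Advanced Aggregation Pushdown rule with these options (the present entry point, -t selector and the object/column names are illustrative assumptions, not documented syntax):
present -t SH.SALES_AGG --execute --base-name=SH.SALES --aggregate-by=PROD_ID,TIME_ID --measures=AMOUNT_SOLD,QUANTITY_SOLD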
-
--no-create-aggregations
¶
Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-gather-stats
¶
Skip generation of new statistics for presented tables/views (default behavior is to generate statistics for new aggregate/join views or existing backend tables with no statistics).
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--num-location-files
¶
Number of external table location files for parallel data retrieval.
Alias
None
Default Value
Supported Values
Integer values
Version Added
2.7.2
-
--numeric-fns
¶
CSV list of aggregate functions to apply to aggregated numeric columns or measures in an aggregation projection.
Alias
None
Default Value
MIN
,MAX
,AVG
,SUM
,COUNT
Supported Values
MIN
,MAX
,AVG
,SUM
,COUNT
Version Added
2.3.0
-
--present-join
¶
Present a view of the supplied join(s) allowing the join processing to be offloaded. Repeated use of
--present-join
allows multiple row sources to be included. See documentation for syntax.Alias
None
Default Value
None
Supported Values
Version Added
2.3.0
-
--reset-backend-table
¶
Remove the backend table before offloading. Use with caution as this will delete previously offloaded data for this table.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--sample-stats
¶
Estimate statistics by scanning a few (random) partitions for presented partitioned tables/views, or a percentage of the non-partitioned presented table/view for backends that support row based percentage sampling (default behavior is to scan the entire table).
Alias
None
Default Value
None
Supported Values
0-100
Version Added
2.3.0
-
--string-fns
¶
CSV list of aggregate functions to apply to aggregated string columns or measures in an aggregation projection.
Alias
None
Default Value
MIN
,MAX
,COUNT
Supported Values
MIN
,MAX
,COUNT
Version Added
2.3.0
-
--timestamp-columns
¶
CSV list of columns to present as a
TIMESTAMP
(only effective for date-based columns).This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.0.0
-
--unicode-string-columns
¶
CSV list of columns to Present as Unicode string (only effective for string columns).
This option supports the wildcard character
*
in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.3.0
Incremental Update Parameters¶
-
--incremental-batch-size
¶
Batch (fetch) size to use when extracting changes for shipping from a table that is enabled for Incremental Update.
Alias
None
Default Value
1000
Supported Values
Positive integers
Version Added
2.5.0
-
--incremental-changelog-sequence-cache-size
¶
Specifies the cache size to use for a sequence coupled to the log table used for Incremental Update extraction.
Alias
None
Default Value
100
Supported Values
Positive integers
Version Added
2.10.0
-
--incremental-changelog-table
¶
Specifies the name of the log table to use for Incremental Update extraction (format is
<OWNER>.<TABLE>
). Not required when--incremental-extraction-method
isORA_ROWSCN
.Alias
None
Default Value
<Hybrid Schema>.<Table Name>_LOG
Supported Values
<OWNER>.<TABLE>
Version Added
2.5.0
-
--incremental-delta-threshold
¶
When running the compaction routine for a table enabled for Incremental Update, this threshold denotes the minimum number of changes required to enable the compaction routine to be executed (i.e. compaction will only be executed if there are at least this many rows in the delta table at a given time).
Alias
None
Default Value
50000
Supported Values
Positive integers
Version Added
2.5.0
-
--incremental-extraction-method
¶
Indicates which change extraction method to use when enabling Incremental Update for a table during an offload.
Alias
None
Default Value
ORA_ROWSCN
Supported Values
ORA_ROWSCN,CHANGELOG,UPDATABLE_CHANGELOG,UPDATABLE,CHANGELOG_INSERT,UPDATABLE_INSERT
Version Added
2.5.0
-
--incremental-full-compaction
¶
When running the compaction routine for a table that has Incremental Update enabled, insert compacted records into a new base table, also known as an out-of-place compaction.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.10.0
-
--incremental-key-columns
¶
Comma-separated list of columns that uniquely identify rows in an offloaded source table. Columns are used when extracting incremental changes from the source table and applying them to the offloaded table. In the absence of this parameter the primary key of the table is used.
This option supports the wildcard character
*
in column names.Alias
None
Default Value
Primary key
Supported Values
Comma-separated list of columns
Version Added
2.5.0
-
--incremental-no-lockfile
¶
When running the compaction routine for a table that is enabled for Incremental Update, do not use a lockfile on the local filesystem to prevent multiple compaction processes from running concurrently (on that machine).
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-no-verify-primary-key
¶
Bypass verification of mandatory primary key when using
CHANGELOG_INSERT
orUPDATABLE_INSERT
extraction methods.Alias
None
Default Value
None
Supported Values
None
Version Added
2.9.0
Warning
With this option, users must ensure that no duplicate records are inserted.
-
--incremental-no-verify-shipped
¶
Bypass verification of the number of change records shipped when extracting and shipping changes for a table that is enabled for Incremental Update. Not applicable when using Incremental Update with Google BigQuery backends.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-partition-wise-full-compaction
¶
When running the compaction routine for a table that has Incremental Update enabled, insert compacted records into the new base table partition-wise. Note that this may cause the compaction process to take significantly longer overall, but it can also significantly reduce the cluster resources used by compaction at any one time.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
. Renamed from--incremental-partition-wise-compaction
in2.10.0
-
--incremental-retain-obsolete-objects
¶
Retain the previous artifacts when the compaction routine has completed for a table with Incremental Update enabled.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
Warning
With this option, users must manage previous artifacts and associated storage. In some circumstances, retained obsolete objects can cause the re-offloading of entire tables (with the
--reset-backend-table
option) to fail.
-
--incremental-run-compaction
¶
Run the compaction routine for a table that has Incremental Update enabled. Must be used in conjunction with the
--execute
parameter.Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-run-compaction-without-snapshot
¶
Run the compaction routine for a table without creating an HDFS snapshot.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.10.0
-
--incremental-run-extraction
¶
Extract and ship all new changes for a table that has Incremental Update enabled. Must be used in conjunction with the
--execute
parameter.Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-terminate-compaction
¶
When running the compaction routine for a table with Incremental Update enabled, instruct the compaction process to exit when blocked by some external condition. By default, the compaction process will keep running when blocked, but will drop into a sleep-then-poll loop.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-tmp-dir
¶
When extracting and shipping changes for a table that has Incremental Update enabled, this specifies the staging directory to be used for local data files, before they are shipped to HDFS.
Alias
None
Default Value
<OFFLOAD_HOME>/tmp/incremental_changes
Supported Values
Valid writable directory
Version Added
2.5.0
-
--incremental-updates-disabled
¶
Disables Incremental Update for the specified table.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.6.0
-
--incremental-updates-enabled
¶
Enables Incremental Update for the table being offloaded.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
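For illustration, Incremental Update might be enabled at offload time as in the following sketch (the offload entry point, -t selector and names are assumptions):
offload -t SH.ORDERS --execute --incremental-updates-enabled --incremental-extraction-method=CHANGELOG --incremental-key-columns=ORDER_ID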
-
--incremental-wait-time
¶
When running the compaction routine for a table that has Incremental Update enabled, this specifies the minimum amount of time (in minutes) to allow for active queries to complete before performing any database operations that could cause such queries to fail.
Alias
None
Default Value
15
Supported Values
0 and positive integers
Version Added
2.5.0
Validate Parameters¶
-
--aggregate-functions
¶
Comma-separated list of aggregate functions to apply, e.g.
MIN,MAX,COUNT
. Functions need to be available and use the same arguments in both frontend and backend databases.Alias
-A
Default Value
[('MIN', 'MAX', 'COUNT')]
Supported Values
CSV list of expressions
Version Added
2.3.0
-
--as-of-scn
¶
Execute validation on the frontend side as of a specified SCN (assumes an
ORACLE
frontend).Alias
None
Default Value
None
Supported Values
Valid SCN
Version Added
2.3.0
-
--filters
¶
Comma-separated list of (<column> <operation> <value>) expressions, e.g.
PROD_ID < 12, CUST_ID >= 1000
. Expressions must be supported in both frontend and backend databases.Alias
-F
Default Value
None
Supported Values
CSV list of expressions
Version Added
2.3.0
-
--frontend-parallelism
¶
Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.
Alias
None
Default Value
Supported Values
0
and positive integersVersion Added
4.2.1
-
--group-bys
¶
Comma-separated list of group by expressions, e.g.
COL1, COL2
. Expressions must be supported in both frontend and backend databases.This option supports the wildcard character
*
in column names.Alias
-G
Default Value
None
Supported Values
CSV list of expressions
Version Added
2.3.0
-
--selects
¶
Comma-separated list of columns OR <number> of columns to run aggregations on. If <number> is specified, the first and last columns and the <number>-2 highest-cardinality columns will be selected.
This option supports the wildcard character
*
in column names.Alias
-S
Default Value
5
Supported Values
CSV list of columns OR <number>
Version Added
2.3.0
-
--skip-boundary-check
¶
Do not include ‘offloaded boundary check’ in the list of filters. The ‘offloaded boundary check’ filter defines data that was offloaded to the backend database. For example:
WHERE TIME_ID < timestamp '2015-07-01 00:00:00'
which resulted from applying the--older-than-date=2015-07-01
filter during offload.Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
Schema Sync Parameters¶
-
--command-file
¶
Name of an additional log file to record the commands that have been applied (if the
--execute
option has been used) or should be applied (if the--execute
option has not been used). Supplied as full or relative path.Alias
None
Default Value
None
Supported Values
Full or relative path to file
Version Added
2.8.0
-
--include
¶
CSV list of schemas, schema.tables or tables to examine for change detection and evolution. Supports wildcards (using
*
). Example formats:SCHEMA1
,SCHEMA*
,SCHEMA1.TABLE1,SCHEMA1.TABLE2,SCHEMA2.TAB*
,SCHEMA1.TAB*
,*.TABLE1,*.TABLE2
,*.TAB*
.Alias
None
Default Value
None
Supported Values
List of one or more schema(s), schema(s).table(s) or table(s)
Version Added
2.8.0
-
--no-create-aggregations
¶
Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
Diagnose Parameters¶
-
--backend-log-size-limit
¶
Size limit for data returned from each backend log e.g.
100K
,0.5M
,1G
.Alias
None
Default Value
10M
Supported Values
<n><K|M|G|T>
Version Added
2.11.0
-
--hive-http-endpoint
¶
Endpoint of the HiveServer2 or HiveServer2 Interactive (LLAP) service in the format
<server|ip address>:<port>
.Alias
None
Default Value
None
Supported Values
<server|ip address>:<port>
Version Added
3.1.0
-
--impalad-http-port
¶
Port of the Impala Daemon HTTP Server.
Alias
None
Default Value
25000
Supported Values
Positive integers
Version Added
2.11.0
-
--include-backend-logs
¶
Retrieve backend query engine logs.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-backend-config
¶
Retrieve backend query engine config.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-logs-from
¶
Collate and package log files modified or created since date (format:
YYYY-MM-DD
) or date/time (format:YYYY-MM-DD_HH24:MM:SS
). Can be used in conjunction with the--include-logs-to
parameter to specify a search range.Alias
None
Default Value
None
Supported Values
YYYY-MM-DD
orYYYY-MM-DD_HH24:MM:SS
Version Added
2.11.0
-
--include-logs-last
¶
Collate and package log files modified or created in the last
n
[d]ays (e.g. 3d) or [h]ours (e.g. 7h).Alias
None
Default Value
None
Supported Values
<n><d|h>
Version Added
2.11.0
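For example, logs for the last three days, including backend logs and process details, might be collected as follows (the diagnose entry point is an assumption; the options are as documented above):
diagnose --include-logs-last=3d --include-backend-logs --include-processes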
-
--include-logs-to
¶
Collate and package log files modified or created up to the given date (format:
YYYY-MM-DD
) or date/time (format:YYYY-MM-DD_HH24:MM:SS
). Can be used in conjunction with the--include-logs-from
parameter to specify a search range.Alias
None
Default Value
None
Supported Values
YYYY-MM-DD
orYYYY-MM-DD_HH24:MM:SS
Version Added
2.11.0
-
--include-permissions
¶
Collect permissions of files and directories related to Gluent Data Platform.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-processes
¶
Collect details for running processes related to Gluent Data Platform.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-query-logs
¶
Retrieve logs for a supplied query ID.
Alias
None
Default Value
None
Supported Values
Valid Impala/LLAP query ID
Version Added
2.11.0
-
--log-location
¶
Location in which to search for log files.
Alias
None
Default Value
OFFLOAD_HOME/log
Supported Values
Valid directory path
Version Added
2.11.0
-
--output-location
¶
Location in which to save files created by Diagnose.
Alias
None
Default Value
OFFLOAD_HOME/log
Supported Values
Valid directory path
Version Added
2.11.0
-
--retain-created-files
¶
By default, after they have been packaged, files created by Diagnose in
--output-location
are removed. Specify this parameter to retain them.Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--spark-application-id
¶
Retrieve logs for a supplied Spark application ID.
Alias
None
Default Value
None
Supported Values
Valid Spark application ID
Version Added
3.1.0
Offload Status Report Parameters¶
-
--csv-delimiter
¶
Field delimiter character for output.
Alias
None
Default Value
,
Supported Values
Must be a single character
Version Added
2.11.0
-
--csv-enclosure
¶
Enclosure character for string fields in CSV output.
Alias
None
Default Value
"
Supported Values
Must be a single character
Version Added
2.11.0
-
-o
¶
Output format for Offload Status Report data.
Alias
--output-format
Default Value
text
Supported Values
csv|text|html|json|raw
Version Added
2.11.0
-
--output-format
¶
Output format for Offload Status Report data.
Alias
-o
Default Value
text
Supported Values
csv|text|html|json|raw
Version Added
2.11.0
-
--output-level
¶
Level of detail required for the Offload Status Report.
Alias
None
Default Value
summary
Supported Values
summary|detail
Version Added
2.11.0
-
--report-directory
¶
Directory to save the report in.
Alias
None
Default Value
OFFLOAD_HOME/log
Supported Values
Valid directory path
Version Added
2.11.0
-
--report-name
¶
Name of report.
Alias
None
Default Value
Gluent_Offload_Status_Report_{DB_NAME}_{YYYY}-{MM}-{DD}_{HH}-{MI}-{SS}.[html|txt|csv]
Supported Values
Valid filename
Version Added
2.11.0
-
-s
¶
Optional name of schema to run the Offload Status Report for.
Alias
--schema
Default Value
None
Supported Values
Valid schema name
Version Added
2.11.0
-
--schema
¶
Optional name of schema to run the Offload Status Report for.
Alias
-s
Default Value
None
Supported Values
Valid schema name
Version Added
2.11.0
-
-t
¶
Optional name of table to run the Offload Status Report for.
Alias
--table
Default Value
None
Supported Values
Valid table name
Version Added
2.11.0
-
--table
¶
Optional name of table to run the Offload Status Report for.
Alias
-t
Default Value
None
Supported Values
Valid table name
Version Added
2.11.0
Password Tool Parameters¶
-
--encrypt
¶
Encrypt a clear-text, case-sensitive password. User will be prompted for the input password and the encrypted version will be output.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--keygen
¶
Generate a password key file of the name given by
--keyfile
.Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--keyfile
¶
Name of the password key file to generate.
Alias
None
Default Value
None
Supported Values
Valid path and file name
Version Added
2.5.0
Result Cache Manager Parameters¶
-
--rc-retention-hours
¶
Controls how long to retain Result Cache files for.
Alias
None
Default Value
24
Supported Values
Valid number of hours
Version Added
2.3.0
Oracle Database Schemas¶
Gluent Data Platform Admin Schema¶
This account is used by Gluent Data Platform to perform administrative activities. It is defined by ORA_ADM_USER
.
Non-standard privileges granted to this schema are:
-
ANALYZE ANY
Required to copy optimizer statistics from the application schema to the hybrid schema.
-
GRANT ANY OBJECT PRIVILEGE
Enables the Admin Schema to grant permission on application schema tables to the hybrid schema.
-
SELECT ANY DICTIONARY
Enables Offload and Present operations to access the Oracle Database data dictionary for information such as column names, data types and partitioning schemes.
-
SELECT ANY TABLE
Required for Offload activity.
Gluent Data Platform Application Schema¶
This account is used by Gluent Data Platform to perform read-only activities. It is defined by ORA_APP_USER
.
Non-standard privileges granted to this schema are:
-
FLASHBACK ANY TABLE
Required for Sqoop to provide a consistent point-in-time data load. The Gluent Data Platform application schema does not have DML privileges on user application schema tables; therefore, there is no threat posed by this configuration.
-
SELECT ANY DICTIONARY
Documented requirement of Sqoop.
-
SELECT ANY TABLE
Required for Sqoop to read application schema tables during an offload.
Gluent Data Platform Repository Schema¶
This account is used by Gluent Data Platform to store operational metadata. It is defined by ORA_REPO_USER
.
Non-standard privileges granted to this schema are:
-
SELECT ANY DICTIONARY
Enables installed database packages in support of the metadata repository to access the Oracle Database data dictionary.
Hybrid Schemas¶
Gluent Data Platform hybrid schemas are required to enable remote data to be queried in tandem with customer data in the RDBMS application schema.
Non-standard privileges granted to hybrid schemas are:
-
CONNECT THROUGH GLUENT_ADM
Offload and Present use this to create hybrid objects without requiring powerful
CREATE ANY
andDROP ANY
privileges.
-
GLOBAL QUERY REWRITE
Required to support Gluent Query Engine optimizations.
-
SELECT ANY TABLE
Enables a hybrid view to access the original application schema and offloaded table.
Data Daemon¶
Properties¶
The following Java properties can be set by creating a $OFFLOAD_HOME/conf/datad.properties
file containing <property>=<value>
properties and values.
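For illustration, a minimal datad.properties might look like the following; the values shown mirror the documented defaults, apart from the raised Impala log level:
grpc.port=50051
datad.initial-request-pool-size=16
datad.read-pipeline-size=4
logging.level.com.gluent.providers.impala.ImpalaProvider=debug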
-
datad.initial-request-pool-size
¶
The initial size of the thread pool for concurrent read requests from the RDBMS.
Default Value
16
Supported Values
Positive integers
Version Added
4.2.2
-
datad.max-request-pool-size
¶
The maximum size of the thread pool for concurrent read requests from the RDBMS.
Default Value
1024
Supported Values
Positive integers
Version Added
4.2.2
-
datad.read-pipeline-size
¶
The number of reads from the backend to keep in the pipeline to be processed.
Default Value
4
Supported Values
Positive integers
Version Added
4.0.0
-
datad.send-queue-size
¶
The maximum size in MB of the queue to send to the RDBMS.
Default Value
16
Supported Values
Positive integers
Version Added
4.0.0
-
grpc.port
¶
The port used for Data Daemon. Setting to
0
results in random port selection.Default Value
50051
Supported Values
Any valid port
Version Added
4.0.0
-
grpc.security.cert-chain
¶
The full path to the certificate chain in PEM format to enable TLS on the Data Daemon socket.
Default Value
None
Supported Values
file:<full path to PEM file>
Version Added
4.0.0
-
grpc.security.private-key
¶
The full path to the private key in PEM format to enable TLS on the Data Daemon socket.
Default Value
None
Supported Values
file:<full path to PEM file>
Version Added
4.0.0
-
logging.config
¶
The full path to a LOGBack format configuration file to override default logging.
Default Value
None
Supported Values
<full path to xml file>
Version Added
4.0.0
-
logging.level.com.gluent.providers.bigquery.BigQueryProvider
¶
The log level for Data Daemon interactions with BigQuery.
Default Value
info
Supported Values
off|error|warn|info|debug|all
Version Added
4.0.0
-
logging.level.com.gluent.providers.impala.ImpalaProvider
¶
The log level for Data Daemon interactions with Impala.
Default Value
info
Supported Values
off|error|warn|info|debug|all
Version Added
4.0.0
-
logging.level.com.gluent.providers.jdbc.JdbcDataProvider
¶
The log level for general Data Daemon operations when interacting with Snowflake and Azure Synapse Analytics.
Default Value
info
Supported Values
off|error|warn|info|debug|all
Version Added
4.1.0
-
logging.level.com.gluent.providers.snowflake.SnowflakeJdbcDataProvider
¶
The log level for Data Daemon interactions with Snowflake.
Default Value
info
Supported Values
off|error|warn|info|debug|all
Version Added
4.1.0
-
logging.level.com.gluent.providers.synapse.SynapseProvider
¶
The log level for Data Daemon interactions with Azure Synapse Analytics.
Default Value
info
Supported Values
off|error|warn|info|debug|all
Version Added
4.3.0
-
server.port
¶
The port used for Data Daemon Web Interface. Setting to
0
results in random port selection.Default Value
50052
Supported Values
Any valid port
Version Added
4.0.0
-
spring.main.web-application-type
¶
Allows Data Daemon Web Interface to be disabled.
Default Value
None
Supported Values
NONE
Version Added
4.0.0
Configuration¶
The following Java configuration options can be set by creating a $OFFLOAD_HOME/conf/datad.conf
file containing JAVA_OPTS="<parameter1> <parameter2> ..."
e.g. JAVA_OPTS="-Xms2048m -Xmx2048m -Djavax.security.auth.useSubjectCredsOnly=false"
.
-
-Xms
¶
Sets the initial and minimum Java heap size.
Default Value
Larger of 1/64th of the physical memory or some reasonable minimum
Supported Values
-Xms<size>[g|G|m|M|k|K]
Version Added
4.0.0
-
-Xmx
¶
Sets the maximum Java heap size.
Default Value
Smaller of 1/4th of the physical memory or 1GB
Supported Values
-Xmx<size>[g|G|m|M|k|K]
Version Added
4.0.0
-
-Djavax.security.auth.useSubjectCredsOnly
¶
Required to be set to
false
when authenticating with a Kerberos enabled backend.Default Value
true
Supported Values
true|false
Version Added
4.0.0