Reference¶
Documentation Conventions¶
Commands and keywords are in this font.
$OFFLOAD_HOME is set when the environment file (offload.env) is sourced, unless already set, and refers to the directory named offload that is created when the software is unpacked. This is also referred to as <OFFLOAD_HOME> in sections of this guide where the environment file has not been created/sourced.
Third party vendor product names might be aliased or shortened for simplicity. See Third Party Vendor Products for cross-references to full product names and trademarks.
Environment File¶
-
AWS_ACCESS_KEY_ID¶ Access key ID for AWS authentication, required when staging offloaded data to S3 and not using either an AWS credentials file or instance-level permissions.
Supported Values
Valid AWS access key ID
Version Added
4.1.0
-
AWS_SECRET_ACCESS_KEY¶ Secret access key for AWS authentication, required when staging offloaded data to S3 and not using either an AWS credentials file or instance-level permissions.
Supported Values
Valid AWS secret access key
Version Added
4.1.0
-
BACKEND_DISTRIBUTION¶ Backend system distribution override.
Supported Values
CDH|GCP|MSAZURE|SNOWFLAKEVersion Added
2.3.0
-
BACKEND_IDENTIFIER_CASE¶ Case conversion to be applied to any backend identifier names created by Gluent Data Platform. Backend systems may ignore any case conversion if they are case-insensitive.
Supported Values
UPPER|LOWER|NO_MODIFYVersion Added
4.0.0
-
BACKEND_ODBC_DRIVER_NAME¶ Name of the Microsoft ODBC driver as specified in odbcinst.ini.
Supported Values
Valid odbcinst.ini entry
Version Added
4.3.0
-
BIGQUERY_DATASET_LOCATION¶ Google BigQuery location to use when creating a dataset. Only applicable when creating datasets using the --create-backend-db option.
Supported Values
Any valid Google BigQuery location
Version Added
4.0.2
Note
Google BigQuery dataset locations must be compatible with that of the Google Cloud Storage bucket specified in OFFLOAD_FS_CONTAINER
-
CLASSPATH¶ Ensures the Gluent lib directory is included.
Supported Values
Valid paths
Version Added
2.3.0
-
CLOUDERA_NAVIGATOR_HIVE_SOURCE_ID¶ The Cloudera Navigator entity ID for the Hive source that will register metadata. See the Installation and Upgrade guide for details on how to set this parameter.
Supported Values
Valid Cloudera Navigator entity ID
Version Added
2.11.0
-
CONNECTOR_HIVE_SERVER_HOST¶ Name of host(s) to use to connect to Impala/Hive. Can be a comma-separated list of hosts to randomly choose from, e.g. hadoop1,hadoop2,hadoop3. Use when configuring Gluent Query Engine to connect to a different Cloudera Data Platform experience to Gluent Offload Engine (e.g. Data Warehouse rather than Data Hub). If unset, all connections will be made to HIVE_SERVER_HOST.
Supported Values
Hostname or IP address of Impala/Hive host(s)
Version Added
4.1.0
-
CONNECTOR_HIVE_SERVER_HTTP_PATH¶ Path component of URL endpoint when connecting to HiveServer2 in HTTP mode (i.e. when HIVE_SERVER_HTTP_TRANSPORT is true). Use when configuring Gluent Query Engine to connect to a different Cloudera Data Platform experience to Gluent Offload Engine (e.g. Data Warehouse rather than Data Hub). If unset, all connections will be made with HIVE_SERVER_HTTP_PATH.
Supported Values
Valid URL path
Version Added
4.1.0
-
CONNECTOR_SQL_ENGINE¶ SQL engine used by Gluent Query Engine for hybrid queries.
Default Value
IMPALASupported Values
IMPALA|BIGQUERY|SNOWFLAKE|SYNAPSEVersion Added
3.1.0
-
CONN_PRE_CMD¶ Used to set pre-commands before query execution, e.g. set hive.execution.engine=tez;.
Supported Values
Supported session set parameters
Version Added
2.3.0
-
DATA_GOVERNANCE_API_PASS¶ Password for the account specified in
DATA_GOVERNANCE_API_USER. Password encryption is supported using the Password Tool utility.Supported Values
Cloudera Navigator service account password
Version Added
2.11.0
-
DATA_GOVERNANCE_API_URL¶ URL for a data governance REST API in the format
http://fqdn-n.example.com:port/api. Leaving this configuration item blank disables data governance integration.Supported Values
Valid Cloudera Navigator REST API URL
Version Added
2.11.0
-
DATA_GOVERNANCE_API_USER¶ Service account to be used to connect to a data governance REST API.
Supported Values
Cloudera Navigator service account name
Version Added
2.11.0
-
DATA_GOVERNANCE_AUTO_PROPERTIES¶ CSV string of dynamic properties to include in data governance metadata. The tokens in the CSV will be expanded at runtime if prefixed with + or ignored if prefixed with -.
Supported Values
CSV containing the following tokens prefixed with either + or -: GLUENT_OBJECT_TYPE, SOURCE_RDBMS_TABLE, TARGET_RDBMS_TABLE, INITIAL_GLUENT_VERSION, LATEST_GLUENT_VERSION, INITIAL_OPERATION_DATETIME, LATEST_OPERATION_DATETIME
Version Added
2.11.0
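For example, a hypothetical offload.env setting that expands the object type and source table tokens but suppresses the version tokens might be:
export DATA_GOVERNANCE_AUTO_PROPERTIES="+GLUENT_OBJECT_TYPE,+SOURCE_RDBMS_TABLE,-INITIAL_GLUENT_VERSION,-LATEST_GLUENT_VERSION"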
-
DATA_GOVERNANCE_AUTO_TAGS¶ CSV string of tags to include in data governance metadata. Tags are free-format except for +RDBMS_NAME which is expanded at run time.
Default Value
GLUENT,+RDBMS_NAME
Supported Values
CSV containing tags to attach to data governance metadata
Version Added
2.11.0
-
DATA_GOVERNANCE_BACKEND¶ Specify the data governance API type accessed via
DATA_GOVERNANCE_API_URL.Supported Values
navigatorVersion Added
2.11.0
-
DATA_GOVERNANCE_CUSTOM_PROPERTIES¶ JSON string of key/value pairs to include in data governance metadata.
Associated Option
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
2.11.0
-
DATA_GOVERNANCE_CUSTOM_TAGS¶ CSV string of tags to include in data governance metadata.
Associated Option
Supported Values
CSV containing tags to attach to data governance metadata
Version Added
2.11.0
-
DATA_SAMPLE_PARALLELISM¶ Degree of parallelism to use when sampling data for all columns in the source RDBMS table that are either date/timestamp-based or defined as a number without a precision and scale. A value of 0 or 1 disables parallelism.
Associated Option
Default Value
0Supported Values
0and positive integersVersion Added
4.2.0
-
DATAD_ADDRESS¶ The address(es) of Data Daemon. For a single daemon the format is <hostname/IP address>:<port>. Specifying multiple daemons can be achieved in one of two ways:
By DNS address. The DNS server can return multiple A records for a hostname and Gluent Data Platform will load balance between these, e.g. <load-balancer-address>:<load-balancer-port>
By IP address and port. The comma-separated list must be prefixed with ipv4:, e.g. ipv4:<hostname/IP address>:<port>,<hostname/IP address>:<port>
Supported Values
<hostname/IP address>:<port>, <load-balancer-address>:<load-balancer-port>, or ipv4:<hostname/IP address>:<port>,<hostname/IP address>:<port>
Version Added
4.0.0
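For example (hostnames and ports below are illustrative assumptions, not defaults): a load-balanced DNS address might be set as export DATAD_ADDRESS=datad-lb.example.com:50051, while an explicit list of daemons might be set as export DATAD_ADDRESS=ipv4:gluent1.example.com:50051,gluent2.example.com:50051.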
-
DATAD_SSL_ACTIVE¶ Set to
truewhen TLS is enabled on the Data Daemon socket.Supported Values
true|falseVersion Added
4.0.0
-
DATAD_SSL_TRUSTED_CERTS¶ The trusted certificate when TLS is enabled on the Data Daemon socket.
Supported Values
Full path to the trusted certificate
Version Added
4.0.0
-
DATAD_WEB_PASS¶ Password for authentication with Data Daemon Web Interface (if configured). Password encryption is supported using the Password Tool utility.
Supported Values
Data Daemon Web Interface user password
Version Added
4.1.0
-
DATAD_WEB_USER¶ User for authentication with Data Daemon Web Interface (if configured).
Supported Values
Data Daemon Web Interface username
Version Added
4.1.0
-
DB_NAME_PREFIX¶ Database name/path prefix for multitenant support. This allows multiple Oracle databases to offload to the same backend cluster. If undefined, the DB_UNIQUE_NAME will be used, giving <DB_UNIQUE_NAME>_<schema>. If defined but empty, no prefix is used, giving <schema>. Otherwise, databases will be named <DB_NAME_PREFIX>_<schema>. If the source database is part of an Oracle Data Guard configuration, set DB_NAME_PREFIX to ensure that DB_UNIQUE_NAME is not used.
Associated Option
Supported Values
Characters supported by the backend database/dataset/schema-naming rules
Version Added
2.3.0
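For example, with a hypothetical setting of export DB_NAME_PREFIX=PROD, offloading the SH schema would create a backend database named PROD_SH (plus the corresponding load database); with export DB_NAME_PREFIX="" the database would simply be named SH.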
-
DEFAULT_BUCKETS¶ Default number of offload buckets for parallel data retrieval from the backend Hadoop system. If you aim to run your biggest queries with parallel DOP X then set this value to X. This way each Oracle Database PX slave can start its own Smart Connector process for fetching a subset of data.
Associated Option
Supported Values
Valid Oracle Database DOP
Version Added
2.3.0
-
DEFAULT_BUCKETS_MAX¶ Upper limit of
DEFAULT_BUCKETSwhenDEFAULT_BUCKETS=AUTO.Default Value
16Supported Values
Valid Oracle Database DOP
Version Added
2.7.0
-
DEFAULT_BUCKETS_THRESHOLD¶ Threshold at which RDBMS segments are considered “small” by DEFAULT_BUCKETS=AUTO tuning.
Supported Values
E.g. 3M, 0.5G
Version Added
2.7.0
-
GOOGLE_APPLICATION_CREDENTIALS¶ Path to Google service account private key JSON file.
Supported Values
Valid paths
Version Added
4.0.0
-
GOOGLE_KMS_KEY_NAME¶ Google Cloud Key Management Service cryptographic key name to use for encryption and decryption operations. The purpose of this key must be Symmetric encryption.
Supported Values
Valid KMS key name
Version Added
4.2.0
-
GOOGLE_KMS_KEY_RING_NAME¶ Google Cloud Key Management Service cryptographic key ring name containing the key defined in
GOOGLE_KMS_KEY_NAMESupported Values
Valid KMS key ring name
Version Added
4.2.0
-
GOOGLE_KMS_KEY_RING_LOCATION¶ Google Cloud Key Management Service cryptographic key ring location of the key ring defined in
GOOGLE_KMS_KEY_RING_NAMESupported Values
Valid Google Cloud Service locations
Version Added
4.2.0
-
HADOOP_SSH_USER¶ User to connect to Hadoop server(s) defined in
HIVE_SERVER_HOSTusing password-less SSH.Supported Values
Valid host username
Version Added
2.3.0
-
HDFS_CMD_HOST¶ Overrides
HIVE_SERVER_HOSTfor the HDFS command steps only. In split installation environments where orchestration commands are run from a Hadoop edge node(s), set this tolocalhostin the Hadoop edge node(s) configuration file.Supported Values
Hostname or IP address of HDFS host
Version Added
2.3.0
-
HDFS_DATA¶ HDFS data directory of the
HIVE_SERVER_USER. Used to store offloaded data.Associated Option
Supported Values
Valid HDFS directory
Version Added
2.3.0
-
HDFS_DB_PATH_SUFFIX¶ Hadoop databases are named <schema><HDFS_DB_PATH_SUFFIX> and <schema>_load<HDFS_DB_PATH_SUFFIX>. When this value is not set the suffix of the databases defaults to .db, giving <schema>.db and <schema>_load.db. Set this to an empty string to use no suffix. For backend systems other than Hadoop this variable has no effect.
Associated Option
Supported Values
Valid HDFS path suffix
Version Added
2.3.0
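For example, leaving this unset and offloading the SH schema gives Hadoop databases SH.db and SH_load.db, while a hypothetical export HDFS_DB_PATH_SUFFIX="" gives databases named SH and SH_load.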
-
HDFS_HOME¶ HDFS home directory of the
HIVE_SERVER_USER.Supported Values
Valid HDFS directory
Version Added
2.3.0
-
HDFS_LOAD¶ HDFS data directory of the
HIVE_SERVER_USER. Used to stage offloaded data.Supported Values
Valid HDFS directory
Version Added
3.4.0
-
HDFS_NAMENODE_ADDRESS¶ Hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured. This value is required in order to execute result cache queries. In a deployment where result cache queries will never be used, this variable can safely be unset.
Supported Values
Hostname or IP address of active HDFS namenode or ID of the HDFS nameservice if HDFS High Availability is configured
Version Added
2.3.0
-
HDFS_NAMENODE_PORT¶ Port of the active HDFS namenode. Set to
0if HDFS High Availability is configured andHDFS_NAMENODE_ADDRESSis set to a nameservice ID. As withHDFS_NAMENODE_ADDRESS, this value is necessary for executing result cache queries, but otherwise can safely be unset.Supported Values
Port of active HDFS namenode or
0if HDFS High Availability is configuredVersion Added
2.3.0
-
HDFS_RESULT_CACHE_USER¶ Hadoop user to impersonate when making HDFS requests for result cache queries; must have write permissions to HDFS_HOME. In a deployment where result cache queries will never be used, this variable can safely be unset.
Default Value
Supported Values
Hadoop username
Version Added
2.3.0
-
HDFS_SNAPSHOT_PATH¶ Before an Incremental Update compaction, an HDFS snapshot will be automatically created in the location specified by HDFS_SNAPSHOT_PATH. This location must be a snapshottable directory (consult your HDFS administrators to enable this). When changing HDFS_SNAPSHOT_PATH from the default, ensure that it remains a parent directory of HDFS_DATA. Unsetting this variable will disable automatic HDFS snapshots.
Default Value
Supported Values
HDFS path that is equal to or a parent of HDFS_DATA
Version Added
2.10.0
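As a hypothetical illustration, if HDFS_DATA is /user/gluent/offload then HDFS_SNAPSHOT_PATH could be set to the parent directory /user/gluent, which an HDFS administrator would make snapshottable with a command such as hdfs dfsadmin -allowSnapshot /user/gluent (paths are illustrative).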
-
HDFS_SNAPSHOT_SUDO_COMMAND¶ If HADOOP_SSH_USER is not the inode owner of HDFS_SNAPSHOT_PATH then HDFS superuser rights will be required to take HDFS snapshots. A sudo rule (or equivalent user substitution tool) can be used to enable this using HDFS_SNAPSHOT_SUDO_COMMAND. The command must be password-less.
Supported Values
A valid user-substitution command
Version Added
2.10.0
-
HIVE_SERVER_AUTH_MECHANISM¶ Authentication mechanism for HiveServer2. In non-kerberized and non-LDAP environments, should be set to: Impala: NOSASL, Hive: value of hive-site.xml: hive.server2.authentication. In LDAP environments, should be set to PLAIN.
Supported Values
NOSASL | PLAIN, or the value of hive-site.xml: hive.server2.authentication
Version Added
2.3.0
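For example, a hypothetical LDAP-enabled Impala environment would use export HIVE_SERVER_AUTH_MECHANISM=PLAIN, whereas a non-kerberized, non-LDAP Impala environment would use export HIVE_SERVER_AUTH_MECHANISM=NOSASL.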
-
HIVE_SERVER_HOST¶ Name of host(s) to connect to Impala/Hive. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3.Supported Values
Hostname or IP address of Impala/Hive host(s)
Version Added
2.3.0
-
HIVE_SERVER_HTTP_PATH¶ Path component of URL endpoint when connecting to HiveServer2 in HTTP mode (i.e. when
HIVE_SERVER_HTTP_TRANSPORTistrue).Supported Values
Valid URL path
Version Added
4.1.0
-
HIVE_SERVER_HTTP_TRANSPORT¶ Use HTTP transport for HiveServer2 connections.
Default Value
falseSupported Values
true|falseVersion Added
4.1.0
-
HIVE_SERVER_PASS¶ Password of the user to authenticate with HiveServer2 service. Required in LDAP enabled Impala configurations. Password encryption is supported using the Password Tool utility.
Supported Values
HiveServer2 service password
Version Added
2.3.0
-
HIVE_SERVER_PORT¶ Port of HiveServer2 service. Default Impala port is 21050, default Hive port is 10000.
Default Value
21050 | 10000
Supported Values
Port of HiveServer2 service
Version Added
2.3.0
-
HIVE_SERVER_USER¶ Name of the user to authenticate with HiveServer2 service.
Supported Values
HiveServer2 service username
Version Added
2.3.0
-
HYBRID_EXT_TABLE_DEGREE¶ Default degree of parallelism for base hybrid external tables. When set to
AUTOOffload will copy settings from the source RDBMS table to the hybrid external table.Associated Option
Supported Values
AUTOand positive integersVersion Added
2.11.2
-
HS2_SESSION_PARAMS¶ Comma-separated list of HiveServer2 session parameters to set.
BATCH_SIZE=16384is a recommended performance setting.E.g.
export HS2_SESSION_PARAMS="BATCH_SIZE=16384,MEM_LIMIT=2G".Supported Values
Valid Impala/Hive session parameters
Version Added
2.3.0
-
IN_LIST_JOIN_TABLE¶ Database and table name of the in-list join table. Can be created and populated with
./connect --create-sequence-table. Applicable to Impala.Supported Values
Valid database and table name
Version Added
2.4.2
-
IN_LIST_JOIN_TABLE_SIZE¶ Size of table specified by
IN_LIST_JOIN_TABLE. Required for both table population byconnect, and table usage by Gluent Query Engine. Applicable to Impala.Supported Values
Up to 1000000
Version Added
2.4.2
-
KERBEROS_KEYTAB¶ The path of the keytab file. If not provided, a valid ticket must already exist in the cache (i.e. manual
kinit).Supported Values
Path to the keytab file
Version Added
2.3.0
-
KERBEROS_PATH¶ If your Kerberos utilities (like
kinit) reside in some non-standard directory, set the path here.Supported Values
Path to Kerberos utilities
Version Added
2.3.0
-
KERBEROS_PRINCIPAL¶ The Kerberos user to authenticate as. i.e.
kinit -kt KERBEROS_KEYTAB KERBEROS_PRINCIPALshould succeed. IfKERBEROS_KEYTABis provided, this should also be provided.Supported Values
Name of Kerberos principal
Version Added
2.3.0
-
KERBEROS_SERVICE¶ The Impala/Hive service (typically
impala/hive). If empty, Smart Connector will attempt to connect unsecured.Supported Values
Name of Impala service
Version Added
2.3.0
-
KERBEROS_TICKET_CACHE_PATH¶ Required to use the
libhdfs3-based result cache with an HDFS cluster that uses Kerberos authentication. In a deployment where result cache queries will never be used, this variable can safely be unset.Supported Values
Path to Kerberos ticket cache path for the user that will be executing Smart Connector processes
Version Added
2.3.0
-
LD_LIBRARY_PATH¶ Ensures Gluent
libdirectory is included.Supported Values
Valid paths
Version Added
2.3.0
-
LIBHDFS3_CONF¶ HDFS client configuration file location.
Supported Values
Valid path to XML configuration file
Version Added
3.0.4
-
LOG_LEVEL¶ Logging level verbosity.
Default Value
infoSupported Values
info|detail|debugVersion Added
2.3.0
-
MAX_OFFLOAD_CHUNK_COUNT¶ Restrict number of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Associated Option
Supported Values
1-1000Version Added
2.9.0
-
MAX_OFFLOAD_CHUNK_SIZE¶ Restrict size of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Associated Option
Supported Values
E.g.
100M,1G,1.5GVersion Added
2.9.0
-
METAD_AUTOSTART¶ Enable Metadata Daemon automatic start:
TRUE: If Metadata Daemon is not running, Smart Connector will attempt to start Metadata Daemon automatically.
FALSE: Smart Connector will only attempt to connect to an already running Metadata Daemon.
Default Value
trueSupported Values
true|falseVersion Added
2.6.0
-
METAD_POOL_SIZE¶ The maximum number of connections Metadata Daemon will maintain in its connection pool to Oracle Database.
Default Value
16Supported Values
Number of connections
Version Added
2.4.5
-
METAD_POOL_TIMEOUT¶ The timeout for idle connections in Metadata Daemon’s connection pool to Oracle Database.
Default Value
300Supported Values
Timeout value in seconds
Version Added
2.4.5
-
NLS_LANG¶ Should be set to the value of Oracle Database
NLS_CHARACTERSET.Supported Values
Valid
NLS_CHARACTERSETvaluesVersion Added
2.3.0
-
NUM_LOCATION_FILES¶ Number of external table location files for parallel data retrieval.
Associated Option
Supported Values
Integer values
Version Added
2.7.2
-
OFFLOAD_BACKEND_SESSION_PARAMETERS¶ Key/value pairs, in JSON format, to override backend query engine parameters. These take effect when establishing a connection to the backend system. For example:
export OFFLOAD_BACKEND_SESSION_PARAMETERS="{\"request_pool\": \"'root.gluent'\"}"
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.3.2
-
OFFLOAD_BIN¶ Path to the Gluent Data Platform
bindirectory ($OFFLOAD_HOME/bin).Supported Values
Oracle Database directory object name
Version Added
2.3.0
-
OFFLOAD_CONF¶ Path to the Gluent Data Platform
confdirectory.Supported Values
Path to
confdirectoryVersion Added
2.3.0
-
OFFLOAD_COMPRESS_LOAD_TABLE¶ Compress staged data during an Offload. This can be useful when staging to cloud storage.
Associated Option
Supported Values
true|falseVersion Added
4.0.0
-
OFFLOAD_DISTRIBUTE_ENABLED¶ Distribute data by partition key(s) during the final INSERT operation of an offload. Hive only.
Associated Option
Supported Values
true|falseVersion Added
2.8.0
-
OFFLOAD_FS_AZURE_ACCOUNT_DOMAIN¶ Microsoft Azure storage account service domain, required when staging offloaded data in Azure storage.
Supported Values
blob.core.windows.netVersion Added
4.1.0
-
OFFLOAD_FS_AZURE_ACCOUNT_KEY¶ Microsoft Azure account key, required when staging offloaded data in Azure storage.
Supported Values
Valid Azure account key
Version Added
4.1.0
-
OFFLOAD_FS_AZURE_ACCOUNT_NAME¶ Microsoft Azure account name, required when staging offloaded data in Azure storage.
Supported Values
Valid Azure account name
Version Added
4.1.0
-
OFFLOAD_FS_CONTAINER¶ The name of the bucket or container to be used when offloading to cloud storage.
Associated Option
Supported Values
A cloud storage bucket/container name configured for use by the backend cluster
Version Added
3.0.0
-
OFFLOAD_FS_PREFIX¶ A directory path used to prefix database locations within OFFLOAD_FS_SCHEME. When OFFLOAD_FS_SCHEME is inherit, HDFS_DATA takes precedence over this setting.
Associated Option
Supported Values
A valid directory in HDFS or cloud storage
Version Added
3.0.0
-
OFFLOAD_FS_SCHEME¶ The filesystem scheme to be used for database and table locations. inherit specifies that tables created by Offload will not specify a LOCATION clause; they will inherit the location from the parent database. See Integrating with Cloud Storage for details.
Associated Option
Supported Values
inherit,hdfs,s3a,adl,abfs,abfssVersion Added
3.0.0
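As a hypothetical illustration of how these settings combine, export OFFLOAD_FS_SCHEME=s3a, export OFFLOAD_FS_CONTAINER=my-offload-bucket and export OFFLOAD_FS_PREFIX=gluent would place offloaded table locations beneath s3a://my-offload-bucket/gluent/ (the bucket and prefix names are assumptions, not defaults).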
-
OFFLOAD_HOME¶ Location of Gluent Data Platform installation.
Supported Values
Path to installed
offloaddirectoryVersion Added
2.3.0
-
OFFLOAD_LOG¶ Path to the Gluent Data Platform
logdirectory.Supported Values
Oracle Database directory object name
Version Added
2.3.0
-
OFFLOAD_LOGDIR¶ Override Smart Connector log path. If undefined defaults to
$OFFLOAD_HOME/log.Supported Values
Valid path
Version Added
2.3.0
-
OFFLOAD_NOT_NULL_PROPAGATION¶ Specify how Offload should treat NOT NULL constraints on offloaded columns. A value of AUTO will propagate all RDBMS NOT NULL constraints to the backend and a value of NONE will not propagate any NOT NULL constraints to the backend table. Only applies to Google BigQuery, Snowflake or Azure Synapse Analytics backends. The --not-null-columns option can be used to override this global setting, allowing a specific list of columns to be defined as NOT NULL for an individual offload.
Default Value
AUTOSupported Values
AUTO|NONEVersion Added
4.3.4
-
OFFLOAD_SORT_ENABLED¶ Enables the sorting/clustering of data when inserting in to the final destination table. Columns used for sorting/clustering are specified using
--sort-columns.Associated Option
Supported Values
true|falseVersion Added
2.7.0
-
OFFLOAD_STAGING_FORMAT¶ Staging file format to use when staging offloaded data for loading into Snowflake.
Default value
PARQUETSupported Values
AVRO|PARQUETVersion Added
4.1.0
-
OFFLOAD_TRANSPORT¶ Method used to transport data from an RDBMS frontend to a backend system.
AUTOselects the optimal method based on configuration and table structure.Associated Option
Supported Values
AUTO|GLUENT|SQOOPVersion Added
3.1.0
-
OFFLOAD_TRANSPORT_AUTH_USING_ORACLE_WALLET¶ Instruct Offload that RDBMS authentication is via an Oracle Wallet. The wallet location should be configured using Hadoop configuration appropriate to method used for data transport. See
SQOOP_OVERRIDESandOFFLOAD_TRANSPORT_SPARK_PROPERTIESfor examples.Supported Values
true|falseVersion Added
3.1.0
-
OFFLOAD_TRANSPORT_CMD_HOST¶ An override for
HDFS_CMD_HOSTwhen running shell based Offload Transport commands such as Sqoop or Spark Submit.Associated Option
Supported Values
Hostname or IP address of HDFS host
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_CONSISTENT_READ¶ Control whether parallel data transport tasks should use a consistent point in time when reading RDBMS data.
Associated Option
Supported Values
true|falseVersion Added
3.1.0
-
OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH¶ The credential provider path to be used in conjunction with
OFFLOAD_TRANSPORT_PASSWORD_ALIAS. Integration with Hadoop Credential Provider API is only supported by Sqoop, Spark Submit and Livy based Offload Transport.Supported Values
A valid HDFS path
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_DSN¶ Database connection details for Offload Transport if different to
ORA_CONN.Associated Option
Supported Values
<hostname>:<port>/<service>Version Added
3.1.0
-
OFFLOAD_TRANSPORT_FETCH_SIZE¶ Number of records to fetch in a single batch from the RDBMS during Offload. Offload Transport may encounter memory pressure if a table is very wide (e.g. contains LOB columns) and there are lots of records in a batch. Reducing the fetch size can alleviate this if more memory cannot be allocated.
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_API_VERIFY_SSL¶ Used to enable SSL for Livy API calls. There are 4 states:
Empty: Do not use SSL.
TRUE: Use SSL and verify Hadoop certificate against known certificates.
FALSE: Use SSL and do not verify Hadoop certificate.
/some/path/here/cert-bundle.crt: Use SSL and verify Hadoop certificate against path to certificate bundle.
Supported Values
Empty,
true|false,<path to certificate bundle>Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_API_URL¶ URL for Livy/Spark REST API in the format
http://fqdn-n.example.com:port.httpscan be used in place ofhttp.Associated Option
Supported Values
Valid Livy REST API URL
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_IDLE_SESSION_TIMEOUT¶ Timeout (in seconds) for idle Spark client sessions created in Livy.
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_MAX_SESSIONS¶ Limits the number of Livy sessions Offload will create. Sessions are re-used when idle. New sessions are only created when no idle sessions are available.
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_PARALLELISM¶ The number of parallel streams to be used when transporting data from the source RDBMS to the backend.
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_PASSWORD_ALIAS¶ An alias provided by Hadoop Credential Provider API to be used for RDBMS authentication during Offload Transport. The key store containing the alias must be specified in either
OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATHor in Hadoop configuration Path (hadoop.security.credential.provider.path).Associated Option
Supported Values
Valid Hadoop Credential Provider API alias
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_RDBMS_SESSION_PARAMETERS¶ Key/value pairs, in JSON format, to supply database session parameter values. These only take effect during Offload Transport, e.g.
'{"cell_offload_processing": "false"}'Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SMALL_TABLE_THRESHOLD¶ Threshold above which Query Import is no longer considered the correct offload choice for non-partitioned tables.
Supported Values
E.g.
100M,1G,1.5GVersion Added
4.2.0
-
OFFLOAD_TRANSPORT_SPARK_OVERRIDES¶ Override JVM flags for a
spark-submitcommand, inserted immediately afterspark-submit.Associated Option
Supported Values
Valid JVM options
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_PROPERTIES¶ Key/value pairs, in JSON format, to override Spark property defaults. Examples:
'{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}'
'{"spark.driver.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet", "spark.executor.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet"}'
Associated Option
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.1.0
Note
Some properties will not take effect when connecting to the Spark Thrift Server because the Spark context has already been created.
-
OFFLOAD_TRANSPORT_SPARK_QUEUE_NAME¶ YARN queue name for Gluent Offload Engine Spark jobs.
Associated Option
Supported Values
Valid YARN queue name
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_SUBMIT_EXECUTABLE¶ The executable to use for submitting Spark applications. Can be empty,
spark-submitorspark2-submit.Supported Values
Blank or
spark-submit|spark2-submitVersion Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_SUBMIT_MASTER_URL¶ The master URL for the Spark cluster, only used for non-Hadoop Spark clusters. If empty, Spark will use default settings.
Associated Option
None
Supported Values
Valid master URL
Version Added
4.0.0
-
OFFLOAD_TRANSPORT_SPARK_THRIFT_HOST¶ Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3.Associated Option
Supported Values
Hostname or IP address of Spark Thrift Server host(s)
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_THRIFT_PORT¶ Port that the Spark Thrift Server is listening on.
Associated Option
Supported Values
Active port
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_USER¶ User to authenticate as when executing Offload Transport commands such as SSH for spark-submit or Sqoop commands, or Livy API calls.
Associated Option
None
Supported Values
Valid username
Version Added
4.0.0
-
OFFLOAD_TRANSPORT_VALIDATION_POLLING_INTERVAL¶ Polling interval in seconds for validation of Spark transport row count. A value of -1 disables retrieval of RDBMS SQL statistics. A value of 0 disables polling resulting in a single capture of SQL statistics after Offload Transport. A value greater than 0 polls RDBMS SQL statistics using the specified interval.
Associated Option
Supported Values
Interval value in seconds,
0or-1Version Added
4.2.1
Note
When the Spark Thrift Server or Apache Livy are used for Offload Transport it is recommended to set OFFLOAD_TRANSPORT_VALIDATION_POLLING_INTERVAL to a positive value. This is because polling RDBMS SQL statistics is the primary validation for both Spark Thrift Server and Apache Livy based Offload Transport.
-
OFFLOAD_UDF_DB¶ For Impala/Hive, the database that Gluent Data Platform UDFs are created in. If undefined, defaults to the default database. For BigQuery, the name of the dataset that contains custom UDF(s) for synthetic partitioning. If undefined, the dataset will be determined from the --partition-functions option.
Supported Values
Valid Impala/Hive database or BigQuery dataset
Version Added
2.3.0
-
OFFLOAD_VERIFY_PARALLELISM¶ Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.
Associated Option
Supported Values
0and positive integersVersion Added
4.2.1
-
ORA_ADM_CONN¶ Connection string (typically tnsnames.ora entry) for
ORA_ADM_USERconnections. Primarily for use with Oracle Wallet as each entry requires a unique connection string.Supported Values
Connection string corresponding to the Oracle Wallet entry for
ORA_ADM_USERVersion Added
4.2.0
-
ORA_ADM_PASS¶ Password of the Gluent Data Platform Admin Schema chosen during installation. Password encryption is supported using the Password Tool utility.
Supported Values
Oracle Database ADM password
Version Added
2.3.0
-
ORA_ADM_USER¶ Name of the Gluent Data Platform Admin Schema chosen during installation.
Supported Values
Oracle Database ADM username
Version Added
2.3.0
-
ORA_APP_PASS¶ Password of the Gluent Data Platform Application Schema chosen during installation. Password encryption is supported using the Password Tool utility.
Supported Values
Oracle Database APP password
Version Added
2.3.0
-
ORA_APP_USER¶ Name of the Gluent Data Platform Application Schema chosen during installation.
Supported Values
Oracle Database APP username
Version Added
2.3.0
-
ORA_CONN¶ Oracle Database connection details. A fully qualified DB service name must be used if the Oracle Database service name includes a domain name (DB_DOMAIN), e.g. ORCL12.gluent.com.
Supported Values
<hostname>:<port>/<service>Version Added
2.3.0
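For example, a hypothetical setting for a database service ORCL12 in domain gluent.com listening on port 1521 would be export ORA_CONN=dbhost1.example.com:1521/ORCL12.gluent.com (the hostname is illustrative).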
-
ORA_REPO_USER¶ Name of the Gluent Data Platform Repository Schema chosen during installation.
Supported Values
Oracle Database REPO username
Version Added
3.3.0
-
PASSWORD_KEY_FILE¶ Password key file generated by Password Tool and used to create encrypted password strings.
Supported Values
Path to Password Key File
Version Added
2.5.0
-
PATH¶ Ensures Gluent Data Platform
bindirectory is included. The path order is important to ensure that the Python distribution included with Gluent Data Platform is used.Supported Values
Valid paths
Version Added
2.3.0
-
QUERY_ENGINE¶ Backend SQL engine to use for commands issued as part of Offload/Present orchestration.
Supported Values
BIGQUERY|IMPALA|SNOWFLAKE|SYNAPSEVersion Added
2.3.0
-
QUERY_MONITOR_THRESHOLD¶ Threshold for hybrid query execution time (in seconds) that enables automatic monitoring of a query in the backend. Queries with Data Daemon execution time below this threshold will not gather any backend trace metrics or profiles. A value of 0 will enable automatic trace/profile collection for all hybrid queries. Individual hybrid queries can have trace enabled or disabled with the GLUENT_QUERY_MONITOR or GLUENT_NO_QUERY_MONITOR hints, respectively.
Supported Values
Integers >= 0
Version Added
4.3.2
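For example, export QUERY_MONITOR_THRESHOLD=0 would enable automatic trace/profile collection for every hybrid query, and an individual query could then opt out with the hint described above, e.g. SELECT /*+ GLUENT_NO_QUERY_MONITOR */ COUNT(*) FROM sh.sales (the table name is illustrative).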
-
SNOWFLAKE_ACCOUNT¶ Name of the Snowflake account to use with Gluent Data Platform.
Supported Values
Snowflake account name
Version Added
4.1.0
-
SNOWFLAKE_DATABASE¶ Name of the Snowflake database to use with Gluent Data Platform.
Supported Values
Snowflake database name
Version Added
4.1.0
-
SNOWFLAKE_FILE_FORMAT_PREFIX¶ Name prefix for Gluent Offload Engine to use when creating file format objects while offloading to Snowflake.
Default Value
GLUENT_OFFLOAD_FILE_FORMATSupported Values
Valid Snowflake file format object name <= 120 characters
Version Added
4.1.0
-
SNOWFLAKE_INTEGRATION¶ Name of the Snowflake storage integration for Gluent Offload Engine to use when offloading to Snowflake.
Supported Values
Valid Snowflake integration name
Version Added
4.1.0
-
SNOWFLAKE_PASS¶ Password for Snowflake service account user for Gluent Data Platform, required when using password authentication. Password encryption is supported using the Password Tool utility.
Supported Values
Snowflake user’s password
Version Added
4.1.0
-
SNOWFLAKE_PEM_FILE¶ Path to private PEM file for Snowflake service account user for Gluent Data Platform, required when using key-pair authentication.
Supported Values
Path to Snowflake user’s private PEM key file
Version Added
4.1.0
-
SNOWFLAKE_PEM_PASSPHRASE¶ Optional PEM passphrase to authenticate the Snowflake service account user for Gluent Data Platform, only required when using key-pair authentication with a passphrase. Passphrase encryption is supported using the Password Tool utility.
Supported Values
Snowflake user’s PEM passphrase
Version Added
4.1.0
-
SNOWFLAKE_ROLE¶ Name of the Snowflake database role created by Gluent Data Platform.
Default Value
GLUENT_OFFLOAD_ROLESupported Values
Valid Snowflake role name
Version Added
4.1.0
-
SNOWFLAKE_STAGE¶ Name for Gluent Offload Engine to use when creating schema-level stage objects while offloading to Snowflake.
Default Value
GLUENT_OFFLOAD_STAGESupported Values
Valid Snowflake stage name
Version Added
4.1.0
-
SNOWFLAKE_USER¶ Name of the Snowflake service account user for Gluent Data Platform.
Supported Values
Valid Snowflake user name
Version Added
4.1.0
-
SNOWFLAKE_WAREHOUSE¶ Default Snowflake warehouse for Gluent Data Platform to use when interacting with Snowflake.
Supported Values
Valid Snowflake warehouse name
Version Added
4.1.0
-
SPARK_HISTORY_SERVER¶ URL of the Spark History Server, used to access the runtime history of the running Spark Thrift Server UI.
Supported Values
URL of Spark History Server e.g.
http://hadoop1:18081/Version Added
3.1.0
-
SPARK_THRIFT_HOST¶ Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3.Supported Values
Hostname or IP address of Spark Thrift Server host(s)
Version Added
3.1.0
-
SPARK_THRIFT_PORT¶ Port that the Spark Thrift Server is listening on.
Supported Values
Active port
Version Added
3.1.0
-
SQOOP_DISABLE_DIRECT¶ It is recommended that the OraOOP optimizations for Sqoop (included in standard Apache Sqoop from v1.4.5) are used. If they are not available, set this to true to disable direct path mode.
Associated Option
Supported Values
true|falseVersion Added
2.3.0
-
SQOOP_OVERRIDES¶ Override flags for the Sqoop command, inserted immediately after sqoop import. To avoid issues, -Dsqoop.avro.logical_types.decimal.enable=false is included by default and should not be removed. Additional settings can be added, for example:
"-Dsqoop.avro.logical_types.decimal.enable=false -Dmapreduce.map.java.opts='-Doracle.net.wallet_location=/some/path/here/gluent_wallet'"
Associated Option
Supported Values
Valid Sqoop parameters
Version Added
2.3.0
-
SQOOP_ADDITIONAL_OPTIONS¶ Additional Sqoop command options added at the end of the Sqoop command.
Associated Option
Supported Values
Any Sqoop command option/argument not already included in the Sqoop command line
Version Added
2.9.0
-
SQOOP_PASSWORD_FILE¶ HDFS path to Sqoop password file, readable by
HADOOP_SSH_USER. If not specified,ORA_APP_PASSwill be used.Associated Option
Supported Values
HDFS path to password file
Version Added
2.5.0
-
SQOOP_QUEUE_NAME¶ YARN queue name for Gluent Offload Engine Sqoop jobs.
Associated Option
Supported Values
Valid YARN queue name
Version Added
3.1.0
-
SSL_ACTIVE¶ Set to
truewhen Impala/Hive uses SSL/TLS encryption.Supported Values
true|falseVersion Added
2.3.0
-
SSL_TRUSTED_CERTS¶ SSL/TLS trusted certificates.
Supported Values
Path to SSL certificate
Version Added
2.3.0
-
START_OF_WEEK¶ Specify the first day of the week for TO_CHAR(<value>, 'D') predicate pushdown. Applies to Snowflake and Azure Synapse Analytics.
Default Value
7
Supported Values
1 (Monday) to 7 (Sunday)
Version Added
4.3.0
-
SYNAPSE_AUTH_MECHANISM¶ Azure Synapse Analytics authentication mechanism.
Supported Values
SqlPassword,ActiveDirectoryPassword,ActiveDirectoryMsi,ActiveDirectoryServicePrincipalVersion Added
4.3.0
-
SYNAPSE_COLLATION¶ Azure Synapse Analytics collation to use for character columns. Note that changing this to a value with different behavior to the frontend system may give unexpected results.
Supported Values
Valid collations
Version Added
4.3.0
-
SYNAPSE_DATA_SOURCE¶ Name of the external data source for Gluent Offload Engine to use when offloading to Azure Synapse Analytics. Note that in databases with case-sensitive collations this parameter is case-sensitive.
Supported Values
Valid Azure Synapse Analytics external data source
Version Added
4.3.0
-
SYNAPSE_DATABASE¶ Name of the Azure Synapse Analytics database to use with Gluent Data Platform. Note that in databases with case-sensitive collations this parameter is case-sensitive.
Supported Values
Valid Azure Synapse Analytics database name
Version Added
4.3.0
-
SYNAPSE_FILE_FORMAT¶ Name of the file format for Gluent Offload Engine to use when offloading to Azure Synapse Analytics. Note that in databases with case-sensitive collations this parameter is case-sensitive.
Supported Values
Valid Azure Synapse Analytics Parquet file format
Version Added
4.3.0
-
SYNAPSE_MSI_CLIENT_ID¶ Specifies the object (principal) ID of the identity for
ActiveDirectoryMsiauthentication with a user-assigned identity. Leave blank when using other authentication mechanisms.Supported Values
Object (principal) ID of the identity
Version Added
4.3.0
-
SYNAPSE_PASS¶ Specifies the password for the Gluent Data Platform user for
SqlPasswordorActiveDirectoryPasswordauthentication. Leave blank when using other authentication mechanisms. Password encryption is supported using the Password Tool utility.Supported Values
Azure Synapse Analytics user’s password
Version Added
4.3.0
-
SYNAPSE_PORT¶ Dedicated SQL endpoint port of Azure Synapse Analytics workspace.
Default Value
1433Supported Values
Valid port
Version Added
4.3.0
-
SYNAPSE_RESOURCE_GROUP¶ Resource group of Azure Synapse Analytics workspace.
Supported Values
Valid Azure Synapse Analytics resource group
Version Added
4.3.0
-
SYNAPSE_ROLE¶ Name of the Azure Synapse Analytics database role assigned to the Gluent Data Platform user. Note that in databases with case-sensitive collations this parameter is case-sensitive.
Supported Values
Valid Azure Synapse Analytics role name
Version Added
4.3.0
-
SYNAPSE_SERVER¶ Dedicated SQL endpoint of Azure Synapse Analytics workspace.
Supported Values
Valid Azure Synapse Analytics dedicated SQL endpoint
Version Added
4.3.0
-
SYNAPSE_SERVICE_PRINCIPAL_ID¶ Specifies the application (client) ID for
ActiveDirectoryServicePrincipalauthentication. Leave blank when using other authentication mechanisms.Supported Values
Application (client) ID
Version Added
4.3.0
-
SYNAPSE_SERVICE_PRINCIPAL_SECRET¶ Specifies the client secret for
ActiveDirectoryServicePrincipalauthentication. Leave blank when using other authentication mechanisms.Supported Values
Client secret
Version Added
4.3.0
-
SYNAPSE_SUBSCRIPTION_ID¶ ID of the subscription containing the Azure Synapse Analytics workspace.
Supported Values
Valid Azure subscription ID
Version Added
4.3.0
-
SYNAPSE_USER¶ Specifies the username for the Gluent Data Platform user for
SqlPasswordorActiveDirectoryPasswordauthentication. Leave blank when using other authentication mechanisms.Supported Values
Azure Synapse Analytics username
Version Added
4.3.0
-
SYNAPSE_WORKSPACE¶ Name of the Azure Synapse Analytics workspace.
Supported Values
Valid Azure Synapse Analytics workspace
Version Added
4.3.0
-
TWO_TASK¶ Used to support Pluggable Databases in Oracle Database Multitenant environments. Set to ORA_CONN for single instance, or to an EZconnect string connecting to the local instance, typically <hostname>:<port>/<ORACLE_SID>, for Oracle RAC (Real Application Clusters).
Supported Values
ORA_CONN or EZconnect string
Version Added
2.10.0
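For example, in a hypothetical Oracle RAC environment with a local instance ORCL121 listening on port 1521, the setting might be export TWO_TASK=dbnode1.example.com:1521/ORCL121, whereas a single instance environment would simply reuse the ORA_CONN value (hostname and instance name are illustrative).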
-
USE_ORACLE_WALLET¶ Controls use of Oracle Wallet for authentication for orchestration commands and Metadata Daemon. When set to
trueOFFLOAD_TRANSPORT_AUTH_USING_ORACLE_WALLETis automatically set totrue.Default Value
falseSupported Values
true|falseVersion Added
4.2.0
-
WEBHDFS_HOST¶ Can be used in conjunction with WEBHDFS_PORT to optimize HDFS activities, removing JVM start-up overhead by utilizing WebHDFS. From version 2.4.7 the value can be a comma-separated list of hosts if HDFS is configured for High Availability.
Supported Values
Hostname or IP address of WebHDFS host
Version Added
2.3.0
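For example, a hypothetical High Availability configuration might use export WEBHDFS_HOST=namenode1.example.com,namenode2.example.com together with export WEBHDFS_PORT=50070 (the hostnames are illustrative; the port shown is the default HTTP port described under WEBHDFS_PORT).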
-
WEBHDFS_PORT¶ Can be used in conjunction with
WEBHDFS_HOSTto optimize HDFS activities removing JVM start-up overhead by utilizing WebHDFS. If this value is unset then default ports of 50070 (HTTP) or 50470 (HTTPS) are used.Default Value
50070|50470Supported Values
Port of HDFS namenode
Version Added
2.3.0
-
WEBHDFS_VERIFY_SSL¶ Used to enable SSL for WebHDFS calls. There are 4 states:
Empty: Do not use SSL
TRUE: Use SSL & verify Hadoop certificate against known certificates
FALSE: Use SSL & do not verify Hadoop certificate
/some/path/here/cert-bundle.crt: Use SSL & verify Hadoop certificate against path to certificate bundle
Supported Values
Empty,
true|false,<path to certificate bundle>Version Added
2.3.0
Common Parameters¶
-
--execute¶ Perform operations, rather than just printing.
Alias
-xDefault Value
None
Supported Values
None
Version Added
2.3.0
-
-f¶ Force option. Replace Gluent Offload Engine managed tables/views as required. Use with caution.
Alias
--forceDefault Value
None
Supported Values
None
Version Added
2.3.0
-
--force¶ Force option. Replace Gluent Offload Engine managed tables/views as required. Use with caution.
Alias
-fDefault Value
None
Supported Values
None
Version Added
2.3.0
-
--no-webhdfs¶ Prevent the use of WebHDFS even when configured for use.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
-t¶ Owner and table name.
Alias
--tableDefault Value
None
Supported Values
<OWNER>.<NAME>Version Added
2.3.0
-
--table¶ Owner and table name.
Alias
-tDefault Value
None
Supported Values
<OWNER>.<NAME>Version Added
2.3.0
-
--target-name¶ Override owner and/or name of created frontend or backend object as appropriate for a command.
Allows separation of the RDBMS owner and/or name from the backend system. This can be necessary as some characters supported for owner and name in Oracle Database are not supported in all backend systems, for example $ in Hadoop-based or BigQuery backends.
Allows offload to an existing backend database with a different name to the source RDBMS schema.
Allows present to a hybrid schema without a corresponding application RDBMS schema or with a different name to the source backend database.
Alias
None
Default Value
None
Supported Values
<OWNER>.<NAME>Version Added
2.3.0
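For example, a hypothetical offload of a table whose owner contains an unsupported character might rename it in the backend: ./offload -t SH$ARCHIVE.SALES --target-name=SH_ARCHIVE.SALES -x (the schema and table names are illustrative).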
-
-v¶ Verbose output.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--vv¶ More verbose output.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
-x¶ Perform operations, rather than just printing.
Alias
--executeDefault Value
None
Supported Values
None
Version Added
2.3.0
Connect Parameters¶
-
--create-sequence-table¶ Create the Gluent Data Platform sequence table. See
IN_LIST_JOIN_TABLEandIN_LIST_JOIN_TABLE_SIZE.Alias
None
Default Value
None
Supported Values
None
Version Added
2.4.2
-
--install-udfs¶ Install Gluent Data Platform user-defined functions (UDFs).
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--sequence-table-name¶ See
IN_LIST_JOIN_TABLE.Alias
None
Default Value
default.gluent_sequenceSupported Values
Valid database and table name
Version Added
2.4.2
-
--sequence-table-size¶ See IN_LIST_JOIN_TABLE_SIZE.
Alias
None
Default Value
10000Supported Values
Up to 1000000
Version Added
2.4.2
-
--sql-file¶ Write SQL commands to a file rather than execute them when
connectis run.Alias
None
Default Value
None
Supported Values
Any valid path
Version Added
2.11.0
-
--update-root-files¶ Updates both Metadata Daemon and Data Daemon scripts with configuration and sets ownership to
root:root. This option can only be run withrootprivileges.Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--update-metad-files¶ Updates Metadata Daemon scripts with configuration and sets ownership to
root:root. This option can only be run withrootprivileges.Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.0
-
--update-datad-files¶ Updates Data Daemon scripts with configuration and sets ownership to
root:root. This option can only be run withrootprivileges.Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.0
-
--upgrade-environment-file¶ Updates configuration file (
offload.env) with any missing default configuration from offload.env.template. Typically used after upgrades.Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--validate-udfs¶ Validate that the Gluent Data Platform user-defined functions (UDFs) are accessible from Impala after installation/upgrade.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.1.0
Offload Parameters¶
-
--allow-decimal-scale-rounding¶ Confirm that it is acceptable for Offload to round decimal places when loading data into a backend system.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.0
-
--allow-floating-point-conversions¶ Confirm that it is acceptable for Offload to convert NaN or Infinity special values to NULL when loading data into a backend system.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.3.0
-
--allow-nanosecond-timestamp-columns¶ Confirm that it is safe to offload timestamp columns with nanosecond capability when the backend system does not support nanoseconds.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.2
-
--bucket-hash-column¶ Column to use when calculating offload bucket values.
Alias
None
Default Value
None
Supported Values
Valid column name
Version Added
2.3.0
-
--compress-load-table¶ Compress the contents of the load table during offload.
Alias
None
Default Value
OFFLOAD_COMPRESS_LOAD_TABLE,falseSupported Values
None
Version Added
2.3.0
-
--compute-load-table-stats¶ Compute statistics on the load table during offload. Applicable to Impala.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.9.0
-
--create-backend-db¶ Automatically create backend databases. Either use this option, or ensure the correct databases/datasets/schemas (base and load databases) for offloading and presenting already exist.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--count-star-expressions¶ CSV list of functional equivalents to
COUNT(*)for aggregation pushdown.If you also use
COUNT(x)in your SQL statements then, apart fromCOUNT(1)which is automatically catered for, the presence ofCOUNT(x)will cause rewrite rules to fail unless you include it with this parameter.Alias
None
Default Value
None
Supported Values
E.g.
COUNT(9)Version Added
2.3.0
-
--data-governance-custom-properties¶ JSON string of key/value pairs to include in data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_PROPERTIES and will override DATA_GOVERNANCE_CUSTOM_PROPERTIES.
Alias
None
Default Value
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
2.11.0
-
--data-governance-custom-tags¶ CSV list of free-format tags for data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_TAGS and therefore useful for tags to be applied to specific activities.
Alias
None
Default Value
Supported Values
E.g.
CONFIDENTIAL,TIER1Version Added
2.11.0
-
--data-sample-parallelism¶ Degree of parallelism to use when sampling data for all columns in the source RDBMS table that are either date or timestamp-based or defined as a number without a precision and scale. A value of 0 or 1 disables parallelism.
Alias
None
Default Value
Supported Values
0and positive integersVersion Added
4.2.0
-
--data-sample-percent¶ Sample data for all columns in the source RDBMS table that are either date or timestamp-based or defined as a number without a precision and scale. A value of 0 disables sampling. A value of AUTO lets Offload choose a percentage based on the size of the RDBMS table.
Alias
None
Default Value
AUTOSupported Values
AUTOor0-100Version Added
2.5.0
-
--date-columns¶ CSV list of columns to offload as DATE (effective for date/timestamp columns).
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.0.0
-
--db-name-prefix¶ Multitenant support, enabling many Oracle Database databases to offload to the same backend cluster. See
DB_NAME_PREFIXfor details.Alias
None
Default Value
Supported Values
Supported backend characters
Version Added
2.3.0
-
--decimal-columns¶ CSV list of columns to offload/present as a fixed precision and scale numeric data type. For example
DECIMAL(p,s)where “p,s” is specified in a paired--decimal-columns-typeoption. Only effective for numeric columns. These options allow repeat inclusion for flexible data type specification, for example:"--decimal-columns-type=18,2 --decimal-columns=price,cost --decimal-columns-type=6,4 --decimal-columns=location"
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.5.0
-
--decimal-columns-type¶ State the precision and scale of columns listed in a paired
--decimal-columnsoption. Must be of format “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision. e.g.:"--decimal-columns-type=18,2"
When offloading, values specified in this option are subject to padding as per the
--decimal-padding-digitsoption.Alias
None
Default Value
None
Supported Values
Valid “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision
Version Added
2.5.0
-
--decimal-padding-digits¶ Padding to apply to precision and scale of DECIMALs during an offload.
Alias
None
Default Value
2
Supported Values
Integral values
Version Added
2.5.0
-
--double-columns¶ CSV list of columns to store as a double precision floating-point. Only effective for numeric columns.
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.4.7
-
--equal-to-values¶ Used for list-partitioned tables to specify a partition to be included for Partition-Based Offload by partition key value. This option can be included multiple times to match multiple partitions, for example:
--equal-to-values=2011 --equal-to-values=2012 --equal-to-values=2013
Alias
None
Default Value
None
Supported Values
Valid literals matching list-partition key values
Version Added
3.3.0
-
--ext-table-degree¶ Default degree of parallelism for base hybrid external tables. When set to
AUTOOffload will copy settings from the source RDBMS table to the hybrid external table.Alias
None
Default Value
HYBRID_EXT_TABLE_DEGREEorAUTOSupported Values
AUTOand positive integersVersion Added
2.11.2
-
--hdfs-data¶ Command line override for
HDFS_DATA.Alias
None
Default Value
Supported Values
Valid HDFS path
Version Added
2.3.0
-
--hdfs-db-path-suffix¶ Hadoop databases are named
<schema><HDFS_DB_PATH_SUFFIX>and<schema>_load<HDFS_DB_PATH_SUFFIX>. When this value is not set the suffix of the databases defaults to.db, giving<schema>.dband<schema>_load.db. Set this to an empty string to use no suffix. For backend systems other than Hadoop this variable has no effect.Alias
None
Default Value
HDFS_DB_PATH_SUFFIX,.dbon Hadoop systems, or''on other backend systems.Supported Values
Valid HDFS path
Version Added
2.3.0
-
--hive-column-stats¶ Enable computation of column stats with “NATIVE”
--offload-statsmethod. Applies to Hive only.Alias
None
Default Value
None
Supported Values
None
Version Added
2.6.1
-
--integer-1-columns¶ CSV list of columns to offload/present (as applicable) as a 1-byte integer, known as
TINYINTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-2-columns¶ CSV list of columns to offload/present (as applicable) as a 2-byte integer, known as
SMALLINTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-4-columns¶ CSV list of columns to offload/present (as applicable) as a 4-byte integer, known as
INTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-8-columns¶ CSV list of columns to offload/present (as applicable) as an 8-byte integer, known as
BIGINTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-38-columns¶ CSV list of columns to offload/present (as applicable) as 38 digit integral column. If a system does not support 38 digits of precision then the most appropriate data type available will be used. Only effective for numeric columns.
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--less-than-value¶ Offload partitions with high water mark less than this value.
Alias
None
Default Value
None
Supported Values
Integer or date values (use YYYY-MM-DD format)
Version Added
2.3.0
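For example, a hypothetical Partition-Based Offload of all partitions below 2012 could use ./offload -t SH.SALES --less-than-value=2012-01-01 -x for a date-partitioned table, or --less-than-value=2012 for a numeric partition key (the schema and table names are illustrative).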
-
--lob-data-length¶ Expected length of RDBMS LOB data
Alias
None
Default Value
32KSupported Values
E.g.
64K,10MVersion Added
2.4.7
-
--max-offload-chunk-count¶ Restrict the number of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Alias
None
Default Value
Supported Values
1-1000Version Added
2.3.0
-
--max-offload-chunk-size¶ Restrict the size of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Alias
None
Default Value
Supported Values
E.g.
100M,1G,1.5GVersion Added
2.3.0
-
--no-auto-detect-dates¶ Turn off automatic adoption of string data type for RDBMS date values that are incompatible with the backend system. For example, dates preceding 1400-01-01 are invalid in Impala and will be offloaded to string columns unless this option is used.
Alias
None
Default Value
False
Supported Values
None
Version Added
2.5.1
-
--no-auto-detect-numbers¶ Turn off automatic adoption of numeric data types based on their precision and scale in the RDBMS. All numeric data types will be offloaded to a general purpose data type such as
DECIMAL(38,18)on Hadoop systems,NUMERICorBIGNUMERICon Google BigQuery orNUMBER(38,18)on Snowflake.Alias
None
Default Value
False
Supported Values
None
Version Added
2.3.0
-
--no-create-aggregations¶ Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-generate-dependent-views¶ Dependent views will not be automatically re-generated in the hybrid schema.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-materialize-join¶ Offload a join (specified by
--offload-join) as a view.Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-modify-hybrid-view¶ Prevent an offload predicate from being added to the boundary conditions in a hybrid view. Can only be used in conjunction with
--offload-predicatefor--offload-predicate-typevalues ofRANGE,LIST_AS_RANGE,RANGE_AND_PREDICATEorLIST_AS_RANGE_AND_PREDICATE.Alias
None
Default Value
None
Supported Values
None
Version Added
3.4.0
-
--no-verify¶ Skip the data validation step at the end of an offload.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--not-null-columns¶ Specifies which columns should be created as
NOT NULLwhen offloading a table. Used to override the globalOFFLOAD_NOT_NULL_PROPAGATIONconfiguration variable at an offload level. Accepts a CSV list and/or wildcard(s) of valid columns to create asNOT NULLin the backend. Only applies to Google BigQuery, Snowflake or Azure Synapse Analytics backends.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.3.4
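For example, a minimal sketch of forcing specific backend columns to NOT NULL during an offload. The table and column names are hypothetical, and it is assumed the source table is named with a -t <schema>.<table> option:
offload -t SH.SALES --not-null-columns=PROD_ID,TIME_* --execute
The TIME_* entry uses the wildcard support described above to match all columns beginning with TIME_.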
-
--num-buckets¶ Default number of offload buckets (subpartitions) for an offloaded table, allowing parallel data retrieval. A value of
AUTOtunes to a value between 1 andDEFAULT_BUCKETS_MAX.Alias
None
Default Value
DEFAULT_BUCKETSorAUTOSupported Values
Integer values or
AUTOVersion Added
2.3.0
-
--num-location-files¶ Number of external table location files for parallel data retrieval.
Alias
None
Default Value
Supported Values
Integer values
Version Added
2.7.2
Note
When offloading or materializing data in Impala, --num-location-files will be aligned with --num-buckets/DEFAULT_BUCKETS.
-
--offload-by-subpartition¶ Offload a subpartitioned table with Subpartition-Based Offload (i.e. with reference to subpartition keys and high values rather than partition-level information).
Alias
None
Default Value
True for composite partitioned tables that are unsupported for Partition-Based Offload but supported for Subpartition-Based Offload, False for all other tables
Supported Values
None
Version Added
2.7.0
-
--offload-chunk-column¶ Splits load data by this column during insert from the load table to the final table. This can be used to manage memory usage.
Alias
None
Default Value
None
Supported Values
Valid column name
Version Added
2.3.0
-
--offload-chunk-impala-insert-hint¶ Used to inject a hint into the
INSERT AS SELECTmoving data from load table to final destination. The absence of a value injects no hint. Impala only.Alias
None
Default Value
None
Supported Values
SHUFFLE|NOSHUFFLEVersion Added
2.3.0
-
--offload-distribute-enabled¶ Distribute data by partition key(s) during the final INSERT operation of an offload. Hive only.
Alias
None
Default Value
Supported Values
None
Version Added
2.8.0
-
--offload-fs-container¶ The name of the bucket or container to be used when offloading to cloud storage.
Alias
None
Default Value
Supported Values
A cloud storage bucket/container name configured for use by the backend cluster
Version Added
3.0.0
-
--offload-fs-prefix¶ A directory path used to prefix database locations within
OFFLOAD_FS_SCHEME. WhenOFFLOAD_FS_SCHEMEisinheritHDFS_DATAtakes precedence over this setting.Alias
None
Default Value
Supported Values
A valid directory in HDFS or cloud storage
Version Added
3.0.0
-
--offload-fs-scheme¶ The filesystem scheme to be used for database and table locations.
inheritspecifies that all tables created by Offload will not specify aLOCATIONclause, they will inherit the location from the parent database. See Integrating with Cloud Storage for details.Alias
None
Default Value
OFFLOAD_FS_SCHEME,inheritSupported Values
inherit,hdfs,s3a,adl,abfs,abfssVersion Added
3.0.0
-
--offload-join¶ Offload a materialized view of the supplied join(s), allowing join processing to be offloaded. Repeated use of
--offload-joinallows multiple row sources to be included. See documentation for syntax details.Alias
None
Default Value
None
Supported Values
Version Added
2.3.0
-
--offload-predicate¶ Specify a predicate to identify a set of data in a table for offload. Can be used to offload all or some of the data in any table type. See documentation for syntax details.
Alias
None
Default Value
None
Supported Values
Version Added
3.4.0
-
--offload-predicate-type¶ Override the default INCREMENTAL_PREDICATE_TYPE for a partitioned table. Can be used to offload LIST partitioned tables using RANGE logic with an
--offload-predicate-typevalue ofLIST_AS_RANGEor used for specialized cases of offloading with Partition-Based Offload and Predicate-Based Offload.Alias
None
Default Value
None
Supported Values
LIST,LIST_AS_RANGE,RANGE,RANGE_AND_PREDICATE,LIST_AS_RANGE_AND_PREDICATE,PREDICATEVersion Added
3.3.1
-
--offload-sort-enabled¶ Sort/cluster data during the final INSERT operation of an offload. Configure sort/cluster columns using
--sort-columns.Alias
None
Default Value
OFFLOAD_SORT_ENABLED,falseSupported Values
None
Version Added
2.7.0
-
--offload-stats¶ Method used to manage backend table stats during an Offload, Incremental Update Extraction or Compaction.
NATIVE is the default. HISTORY will gather stats on all partitions without stats (applicable to an Offload on Hive only and will automatically be replaced with NATIVE on Impala). COPY will copy table statistics from the RDBMS to an offloaded table if the backend system supports setting of statistics. NONE will prevent Offload from managing stats; for Hive this results in no stats being gathered even if hive.stats.autogather=true is set at the system level.
Alias
None
Default Value
NATIVESupported Values
NATIVE|HISTORY|COPY|NONEVersion Added
2.4.7(HISTORYadded in2.9.0)
-
--offload-transport¶ Method used to transport data from an RDBMS frontend to a backend system.
AUTOselects the optimal method based on configuration and table structure.Alias
None
Default Value
OFFLOAD_TRANSPORT,AUTOSupported Values
AUTO|GLUENT|SQOOPVersion Added
3.1.0
-
--offload-transport-cmd-host¶ An override for
HDFS_CMD_HOSTwhen running shell based Offload Transport commands such as Sqoop or Spark Submit.Alias
None
Default Value
Supported Values
Hostname or IP address of HDFS host
Version Added
3.1.0
-
--offload-transport-consistent-read¶ Control whether parallel data transport tasks should use a consistent point in time when reading RDBMS data.
Alias
None
Default Value
Supported Values
true|falseVersion Added
3.1.0
-
--offload-transport-dsn¶ Database connection details for Offload Transport if different to
ORA_CONN.Alias
None
Default Value
Supported Values
<hostname>:<port>/<service>Version Added
3.1.0
-
--offload-transport-fetch-size¶ Number of records to fetch in a single batch from the RDBMS during Offload. Offload Transport may encounter memory pressure if a table is very wide (e.g. contains LOB columns) and there are lots of records in a batch. Reducing the fetch size can alleviate this if more memory cannot be allocated.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-jvm-overrides¶ JVM overrides (inserted right after
sqoop importorspark-submit).Alias
None
Default Value
Supported Values
Version Added
3.1.0
-
--offload-transport-livy-api-url¶ URL for Livy/Spark REST API in the format
http://fqdn-n.example.com:port.httpscan be used in place ofhttp.Alias
None
Default Value
Supported Values
Valid Livy REST API URL
Version Added
3.1.0
-
--offload-transport-livy-idle-session-timeout¶ Timeout (in seconds) for idle Spark client sessions created in Livy.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-livy-max-sessions¶ Limits the number of Livy sessions Offload will create. Sessions are re-used when idle. New sessions are only created when no idle sessions are available.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-parallelism¶ The number of parallel streams to be used when transporting data from the source RDBMS to the backend.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
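For example, a hedged sketch that forces Spark-based transport with eight parallel streams. The table name is hypothetical and the -t <schema>.<table> option is assumed:
offload -t SH.SALES --offload-transport=GLUENT --offload-transport-parallelism=8 --execute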
-
--offload-transport-password-alias¶ An alias provided by Hadoop Credential Provider API to be used for RDBMS authentication during Offload Transport. The key store containing the alias must be specified in either
OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATHor in Hadoop configuration Path (hadoop.security.credential.provider.path).Alias
None
Default Value
Supported Values
Valid Hadoop Credential Provider API alias
Version Added
3.1.0
-
--offload-transport-queue-name¶ YARN queue name to be used for Offload Transport jobs.
Alias
None
Default Value
Supported Values
Version Added
3.1.0
-
--offload-transport-small-table-threshold¶ Threshold above which Query Import is no longer considered the correct offload choice for non-partitioned tables.
Alias
None
Default Value
Supported Values
E.g.
100M,1G,1.5GVersion Added
3.1.0
-
--offload-transport-spark-properties¶ Key/value pairs, in JSON format, to override Spark property defaults. Examples:
'{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}' '{"spark.driver.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet", "spark.executor.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet"}'Alias
None
Default Value
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.1.0
-
--offload-transport-spark-thrift-host¶ Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3.Alias
None
Default Value
Supported Values
Hostname or IP address of Spark Thrift Server host(s)
Version Added
3.1.0
-
--offload-transport-spark-thrift-port¶ Port that the Spark Thrift Server is listening on.
Alias
None
Default Value
Supported Values
Active port
Version Added
3.1.0
-
--offload-transport-validation-polling-interval¶ Polling interval in seconds for validation of Spark transport row count. A value of -1 disables retrieval of RDBMS SQL statistics. A value of 0 disables polling resulting in a single capture of SQL statistics after Offload Transport. A value greater than 0 polls RDBMS SQL statistics using the specified interval.
Alias
None
Default Value
Supported Values
Interval value in seconds,
0or-1Version Added
4.2.1
-
--offload-type¶ Identifies a range-partitioned offload as
FULLorINCREMENTAL.FULLdictates that all data is offloaded.INCREMENTALdictates that data up to a boundary threshold will be offloaded.Alias
None
Default Value
INCREMENTALfor RDBMS tables capable of supporting Partition-Based Offload that are partially offloaded (e.g. using--older-than-date).FULLfor all other offloads.Supported Values
FULL|INCREMENTALVersion Added
2.5.0
-
--older-than-date¶ Offload partitions older than this date (use
YYYY-MM-DDformat). Overrides--older-than-daysif both are present.Alias
None
Default Value
None
Supported Values
Date in
YYYY-MM-DDformatVersion Added
2.3.0
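For example, a minimal sketch of a date-bounded Partition-Based Offload. The table name is hypothetical and the -t <schema>.<table> option is assumed:
offload -t SH.SALES --older-than-date=2015-07-01 --execute
Without --execute no changes are applied; the command reports the steps it would perform.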
-
--older-than-days¶ Offload partitions older than this number of days (exclusive, i.e. the boundary partition is not offloaded). Suitable for keeping data up to a certain age in the source table. Alternative to
--older-than-dateoption. If both are supplied,--older-than-datewill be used.Alias
None
Default Value
None
Supported Values
Valid number of days
Version Added
2.3.0
-
--partition-columns¶ Override column(s) to use for partitioning backend data. Defaults to source table partition columns.
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--partition-digits¶ Maximum digits allowed for a numeric partition value.
Alias
None
Default Value
15Supported Values
Integer values
Version Added
2.3.0
-
--partition-functions¶ Custom UDF to use for synthetic partitioning of offloaded data. Used when no native partitioning scheme exists for the partition column data type. Google BigQuery only.
Alias
None
Default Value
None
Supported Values
Valid custom UDF
Version Added
4.2.0
-
--partition-granularity¶ Partition level/granularity. Use:
Y, M or D for date/timestamp partition columns
Integral size for numeric partitions (a value of 1 is effectively list partitioning)
Sub-string length for string partitions
Examples:
M partitions the table by Year-Month
D partitions the table by Year-Month-Day
5000 partitions the table in ranges of 5000 values
1 creates a partition per value, useful for columns holding values such as year and month or categories
2 on a string partition key partitions using the first two characters
Alias
None
Default Value
Supported Values
Y|M|D|\d+Version Added
2.3.0
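As an illustration, a hedged sketch that partitions offloaded data by month on a date column. The table and column names are hypothetical and the -t <schema>.<table> option is assumed:
offload -t SH.SALES --partition-columns=TIME_ID --partition-granularity=M --older-than-date=2015-07-01 --execute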
-
--partition-lower-value¶ Integer value defining the lower bound of a range of values used for backend integer range partitioning. BigQuery only.
Alias
None
Default Value
None
Supported Values
Positive integers
Version Added
4.0.0
-
--partition-names¶ Specify partitions to be included for offload with Partition-Based Offload. For range-partitioned tables only a single partition name can be specified and it is used to derive a value for
--less-than-value/--older-than-dateas appropriate. For list-partitioned tables, this option is used to supply a CSV of all partitions to be offloaded and is additional to any partitions offloaded in previous operations.Alias
None
Default Value
None
Supported Values
Valid partition name(s)
Version Added
3.3.0
-
--partition-upper-value¶ Integer value defining the upper bound of a range of values used for backend integer range partitioning. BigQuery only.
Alias
None
Default Value
None
Supported Values
Positive integers
Version Added
4.0.0
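For backend integer range partitioning on Google BigQuery, a hedged sketch combining the bounds with a numeric granularity. The table, column and bound values are hypothetical and the -t <schema>.<table> option is assumed:
offload -t SH.SALES --partition-columns=PROD_ID --partition-granularity=5000 --partition-lower-value=1 --partition-upper-value=1000000 --execute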
-
--preserve-load-table¶ Stops the load table being dropped on completion of offload.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--purge¶ When supported by the backend system, utilize purge when removing a table due to
--reset-backend-table.Alias
None
Default Value
None
Supported Values
None
Version Added
2.4.9
-
--reset-backend-table¶ Remove the backend table before offloading. Use with caution as this will delete previously offloaded data for this table.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--reset-hybrid-view¶ Reset Partition-Based Offload, Subpartition-Based Offload or Predicate-Based Offload predicates in the hybrid view.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--skip-steps¶ Skip given steps. CSV of step IDs to be skipped. Step IDs are derived from step names by replacing spaces with underscores and are case-insensitive.
For example, it is possible to skip Impala compute statistics commands using a value of
Compute_backend_statisticsif an initial offload is being performed in stages, and then gather them with the final offload command.Alias
None
Default Value
None
Supported Values
Valid offload step names
Version Added
2.3.0
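For example, a hedged sketch of the staged approach described above, deferring the Impala statistics step during an initial offload. The table name is hypothetical and the -t <schema>.<table> option is assumed:
offload -t SH.SALES --older-than-date=2015-01-01 --skip-steps=Compute_backend_statistics --execute
A subsequent offload command issued without --skip-steps can then gather the statistics.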
-
--sort-columns¶ CSV list of columns used to sort or cluster data when inserting into the final destination table. Offloads using Partition-Based Offload or Subpartition-Based Offload will retrieve the value used by the prior offload if no list of columns is explicitly provided. This option has no effect when
OFFLOAD_SORT_ENABLED/--offload-sort-enabledis false.When using Offload Join the column names in
--sort-columnsmust match those in the final destination table (not the names used in the source tables).This option supports the wildcard character
*in column names.Alias
None
Default Value
None for non-partitioned source tables,
--partition-columnsfor partitioned source tablesSupported Values
Valid column name(s)
Version Added
2.7.0
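For example, a hedged sketch that clusters offloaded data on two hypothetical columns, assuming sorting has been enabled via OFFLOAD_SORT_ENABLED or --offload-sort-enabled and that the -t <schema>.<table> option names the source table:
offload -t SH.SALES --sort-columns=TIME_ID,PROD_ID --execute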
-
--sqoop-disable-direct¶ It is recommended that the OraOOP optimizations for Sqoop (included in standard Apache Sqoop from
v1.4.5) are used. If they cannot be used, disable direct path mode with this option.
Alias
None
Default Value
SQOOP_DISABLE_DIRECT,falseSupported Values
true|falseVersion Added
2.3.0
-
--sqoop-mapreduce-map-java-opts¶ Sqoop specific setting for
-Dmapreduce.map.java.opts. Allows control over Java options for Sqoop MapReduce jobs.Alias
None
Default Value
None
Supported Values
Valid Sqoop Java options
Version Added
2.3.0
-
--sqoop-mapreduce-map-memory-mb¶ Sqoop specific setting for
-Dmapreduce.map.memory.mb. Allows control over memory allocation for Sqoop MapReduce jobs.Alias
None
Default Value
None
Supported Values
Valid numbers in MB
Version Added
2.3.0
-
--sqoop-additional-options¶ Additional Sqoop command options added to the end of the Sqoop command.
Alias
None
Default Value
Supported Values
Any Sqoop command option/argument not already included in the Sqoop command line
Version Added
2.9.0
-
--sqoop-password-file¶ Path to an HDFS file containing
ORA_APP_PASSwhich is then passed to Sqoop using the Sqoop--password-fileoption. This file should be protected with appropriate file system permissions.Alias
None
Default Value
Supported Values
Valid HDFS path
Version Added
2.5.0
-
--storage-compression¶ Storage compression of final offload table.
GZIP is only available with Parquet. ZLIB is only available with ORC. MED is an alias for SNAPPY on both Impala and Hive; this is the default value because it gives the best balance of elapsed time to compression. HIGH is an alias for GZIP on Impala and ZLIB on Hive.
Alias
None
Default Value
MEDSupported Values
HIGH|MED|NONE|GZIP|ZLIB|SNAPPYVersion Added
2.3.0
-
--storage-format¶ Storage format of final backend table. Not applicable to Google BigQuery or Snowflake.
Alias
None
Default Value
PARQUETfor Impala,ORCfor HiveSupported Values
ORC|PARQUETVersion Added
2.3.0
-
--timestamp-tz-columns¶ CSV list of columns to offload as a timestamp with time zone (will only be effective for date-based columns).
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.0.0
-
--udf-db¶ Backend database to use for user-defined functions (UDFs).
Gluent Data Platform UDFs are used in Hadoop-based backends to:
Convert data to Oracle Database binary formats (ORACLE_NUMBER, ORACLE_DATE)
Perform Run-Length Encoding
Handle data conversion functions, e.g. UPPER, LOWER
They are installed once during installation, and upgraded, using the connect --install-udfs command. Custom UDFs can also be created by users in BigQuery and used by Gluent Data Platform for synthetic partitioning. Custom UDFs must be installed prior to running any offload commands that require access to them.
Alias
None
Default Value
Supported Values
Valid backend database
Version Added
2.3.0
-
--unicode-string-columns¶ CSV list of columns to Offload as Unicode string (only effective for string columns).
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.3.0
-
--variable-string-columns¶ CSV list of columns to offload as a variable length string. Only effective for date/timestamp columns.
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--verify¶ Validation method to use when verifying data at the end of an offload.
Alias
None
Default Value
minusSupported Values
minus|aggregateVersion Added
2.3.0
-
--verify-parallelism¶ Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.
Alias
None
Default Value
Supported Values
0and positive integersVersion Added
4.2.1
Present Parameters¶
-
--aggregate-by¶ CSV list of columns to aggregate by (GROUP BY) when presenting an Advanced Aggregation Pushdown rule.
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--base-name¶ For aggregations only. Provide the name of the base hybrid view originally presented before aggregation. Use when the base view name is different to its source backend table.
Alias
None
Default Value
None
Supported Values
<SCHEMA>.<VIEW_NAME>Version Added
2.3.0
-
--binary-columns¶ CSV list of columns to present using a binary data type. Only effective for string-based columns.
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--columns¶ CSV list of columns to present.
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--count-star-expressions¶ CSV list of functional equivalents to
COUNT(*)for aggregation pushdown.If you also use
COUNT(x)in your SQL statements then, apart fromCOUNT(1)which is automatically catered for, the presence ofCOUNT(x)will cause rewrite rules to fail unless you include it with this parameter.Alias
None
Default Value
None
Supported Values
E.g.
COUNT(9)Version Added
2.3.0
-
--data-governance-custom-properties¶ Key/value pairs, in JSON format, of custom properties for data governance metadata. These are in addition to
DATA_GOVERNANCE_AUTO_PROPERTIESand will overrideDATA_GOVERNANCE_CUSTOM_PROPERTIES.Alias
None
Default Value
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
2.11.0
-
--data-governance-custom-tags¶ CSV list of free-format tags for data governance metadata. These are in addition to
DATA_GOVERNANCE_AUTO_TAGSand therefore useful for tags to be applied to specific activities.Alias
None
Default Value
Supported Values
E.g.
CONFIDENTIAL,TIER1Version Added
2.11.0
-
--date-columns¶ CSV list of columns to present to Oracle Database as DATE (effective for datetime/timestamp columns).
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--date-fns¶ CSV list of functions to apply to the non-aggregating date/timestamp projection.
Alias
None
Default Value
MIN,MAX,COUNTSupported Values
MIN,MAX,COUNTVersion Added
2.3.0
-
--decimal-columns¶ CSV list of columns to offload/present as a fixed precision and scale numeric data type. For example
DECIMAL(p,s)where “p,s” is specified in a paired--decimal-columns-typeoption. Only effective for numeric columns. These options allow repeat inclusion for flexible data type specification, for example:"--decimal-columns-type=18,2 --decimal-columns=price,cost --decimal-columns-type=6,4 --decimal-columns=location"
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.5.0
-
--decimal-columns-type¶ State the precision and scale of columns listed in a paired
--decimal-columnsoption. Must be of format “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision. e.g.:"--decimal-columns-type=18,2"
When offloading, values specified in this option are subject to padding as per the
--decimal-padding-digitsoption.Alias
None
Default Value
None
Supported Values
Valid “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision
Version Added
2.5.0
-
--detect-sizes¶ Query backend table/view data length and set external table columns sizes accordingly.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--integer-1-columns¶ CSV list of columns to offload/present (as applicable) as a 1-byte integer, known as
TINYINTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-2-columns¶ CSV list of columns to offload/present (as applicable) as a 2-byte integer, known as
SMALLINTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-4-columns¶ CSV list of columns to offload/present (as applicable) as a 4-byte integer, known as
INTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-8-columns¶ CSV list of columns to offload/present (as applicable) as an 8-byte integer, known as
BIGINTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-38-columns¶ CSV list of columns to offload/present (as applicable) as a 38-digit integral column. If a system does not support 38 digits of precision then the most appropriate data type available will be used. Only effective for numeric columns.
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--interval-ds-columns¶ CSV list of columns to present to Oracle Database as
INTERVAL DAY TO SECONDtype (will only be effective for backendSTRINGcolumns).This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--interval-ym-columns¶ CSV list of columns to present to Oracle Database as
INTERVAL YEAR TO MONTHtype (will only be effective for backendSTRINGcolumns).This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--large-binary-columns¶ CSV list of columns to present using a large binary data type, for example Oracle Database
BLOB. Only effective for string-based columns.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--large-string-columns¶ CSV list of columns to present as a large string data type, for example Oracle Database
CLOB. Only effective for string-based columns.This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--lob-data-length¶ Expected length of RDBMS LOB data
Alias
None
Default Value
32KSupported Values
E.g.
64K,10MVersion Added
2.4.7
-
--materialize-join¶ Use this option to materialize a join specified using
--present-join.Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--measures¶ CSV list of aggregated columns to include in the projection of an aggregated present.
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.4.0
-
--no-create-aggregations¶ Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-gather-stats¶ Skip generation of new statistics for presented tables/views (default behavior is to generate statistics for new aggregate/join views or existing backend tables with no statistics).
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--num-location-files¶ Number of external table location files for parallel data retrieval.
Alias
None
Default Value
Supported Values
Integer values
Version Added
2.7.2
-
--numeric-fns¶ CSV list of aggregate functions to apply to aggregated numeric columns or measures in an aggregation projection.
Alias
None
Default Value
MIN,MAX,AVG,SUM,COUNTSupported Values
MIN,MAX,AVG,SUM,COUNTVersion Added
2.3.0
-
--present-join¶ Present a view of the supplied join(s) allowing the join processing to be offloaded. Repeated use of
--present-joinallows multiple row sources to be included. See documentation for syntax.Alias
None
Default Value
None
Supported Values
Version Added
2.3.0
-
--reset-backend-table¶ Remove the backend table before offloading. Use with caution as this will delete previously offloaded data for this table.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--sample-stats¶ Estimate statistics by scanning a few (random) partitions for presented partitioned tables/views, or a percentage of the non-partitioned presented table/view for backends that support row based percentage sampling (default behavior is to scan the entire table).
Alias
None
Default Value
None
Supported Values
0-100Version Added
2.3.0
-
--string-fns¶ CSV list of aggregate functions to apply to aggregated string columns or measures in an aggregation projection.
Alias
None
Default Value
MIN,MAX,COUNTSupported Values
MIN,MAX,COUNTVersion Added
2.3.0
-
--timestamp-columns¶ CSV list of columns to present as a
TIMESTAMP(only effective for date based columns)This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.0.0
-
--unicode-string-columns¶ CSV list of columns to Present as Unicode string (only effective for string columns).
This option supports the wildcard character
*in column names.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.3.0
Incremental Update Parameters¶
-
--incremental-batch-size¶ Batch (fetch) size to use when extracting changes for shipping from a table that is enabled for Incremental Update.
Alias
None
Default Value
1000Supported Values
Positive integers
Version Added
2.5.0
-
--incremental-changelog-sequence-cache-size¶ Specifies the cache size to use for a sequence coupled to the log table used for Incremental Update extraction.
Alias
None
Default Value
100Supported Values
Positive integers
Version Added
2.10.0
-
--incremental-changelog-table¶ Specifies the name of the log table to use for Incremental Update extraction (format is
<OWNER>.<TABLE>). Not required when--incremental-extraction-methodisORA_ROWSCN.Alias
None
Default Value
<Hybrid Schema>.<Table Name>_LOGSupported Values
<OWNER>.<TABLE>Version Added
2.5.0
-
--incremental-delta-threshold¶ When running the compaction routine for a table enabled for Incremental Update, this threshold denotes the minimum number of changes required to enable the compaction routine to be executed (i.e. compaction will only be executed if there are at least this many rows in the delta table at a given time).
Alias
None
Default Value
50000Supported Values
Positive integers
Version Added
2.5.0
-
--incremental-extraction-method¶ Indicates which change extraction method to use when enabling Incremental Update for a table during an offload.
Alias
None
Default Value
ORA_ROWSCNSupported Values
ORA_ROWSCN,CHANGELOG,UPDATABLE_CHANGELOG,UPDATABLE,CHANGELOG_INSERT,UPDATABLE_INSERTVersion Added
2.5.0
-
--incremental-full-compaction¶ When running the compaction routine for a table that has Incremental Update enabled, insert compacted records into a new base table, also known as an out-of-place compaction.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.10.0
-
--incremental-key-columns¶ Comma-separated list of columns that uniquely identify rows in an offloaded source table. Columns are used when extracting incremental changes from the source table and applying them to the offloaded table. In the absence of this parameter the primary key of the table is used.
This option supports the wildcard character
*in column names.Alias
None
Default Value
Primary key
Supported Values
Comma-separated list of columns
Version Added
2.5.0
-
--incremental-no-lockfile¶ When running the compaction routine for a table that is enabled for Incremental Update, do not use a lockfile on the local filesystem to prevent multiple compaction processes from running concurrently (on that machine).
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-no-verify-primary-key¶ Bypass verification of mandatory primary key when using
CHANGELOG_INSERTorUPDATABLE_INSERTextraction methods.Alias
None
Default Value
None
Supported Values
None
Version Added
2.9.0Warning
With this option, users must ensure that no duplicate records are inserted.
-
--incremental-no-verify-shipped¶ Bypass verification of the number of change records shipped when extracting and shipping changes for a table that is enabled for Incremental Update. Not applicable when using Incremental Update with Google BigQuery backends.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-partition-wise-full-compaction¶ When running the compaction routine for a table that has Incremental Update enabled, insert compacted records into the new base table partition-wise. Note that this may cause the compaction process to take significantly longer overall, but it can also significantly reduce the cluster resources used by compaction at any one time.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0. Renamed from--incremental-partition-wise-compactionin2.10.0
-
--incremental-retain-obsolete-objects¶ Retain the previous artifacts when the compaction routine has completed for a table with Incremental Update enabled.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0Warning
With this option, users must manage previous artifacts and associated storage. In some circumstances, retained obsolete objects can cause the re-offloading of entire tables (with the
--reset-backend-tableoption) to fail.
-
--incremental-run-compaction¶ Run the compaction routine for a table that has Incremental Update enabled. Must be used in conjunction with the
--executeparameter.Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
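As a hedged sketch only: it is assumed here that the flag is supplied to the offload command together with a -t <schema>.<table> option naming the offloaded table; the exact invocation may differ in your installation:
offload -t SH.SALES --incremental-run-compaction --execute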
-
--incremental-run-compaction-without-snapshot¶ Run the compaction routine for a table without creating an HDFS snapshot.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.10.0
-
--incremental-run-extraction¶ Extract and ship all new changes for a table that has Incremental Update enabled. Must be used in conjunction with the
--executeparameter.Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-terminate-compaction¶ When running the compaction routine for a table with Incremental Update enabled, instruct the compaction process to exit when blocked by some external condition. By default, the compaction process will keep running when blocked, but will drop into a sleep-then-poll loop.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-tmp-dir¶ When extracting and shipping changes for a table that has Incremental Update enabled, this specifies the staging directory to be used for local data files, before they are shipped to HDFS.
Alias
None
Default Value
<OFFLOAD_HOME>/tmp/incremental_changesSupported Values
Valid writable directory
Version Added
2.5.0
-
--incremental-updates-disabled¶ Disables Incremental Update for the specified table.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.6.0
-
--incremental-updates-enabled¶ Enables Incremental Update for the table being offloaded.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-wait-time¶ When running the compaction routine for a table that has Incremental Update enabled, this specifies the minimum amount of time (in minutes) to allow for active queries to complete before performing any database operations that could cause such queries to fail.
Alias
None
Default Value
15Supported Values
0 and positive integersVersion Added
2.5.0
Validate Parameters¶
-
--aggregate-functions¶ Comma-separated list of aggregate functions to apply, e.g.
MIN,MAX,COUNT. Functions need to be available and use the same arguments in both frontend and backend databases.Alias
-ADefault Value
[('MIN', 'MAX', 'COUNT')]Supported Values
CSV list of expressions
Version Added
2.3.0
-
--as-of-scn¶ Execute validation on frontend site as-of a specified SCN (assumes an
ORACLEfrontend).Alias
None
Default Value
None
Supported Values
Valid SCN
Version Added
2.3.0
-
--filters¶ Comma-separated list of (<column> <operation> <value>) expressions, e.g.
PROD_ID < 12, CUST_ID >= 1000. Expressions must be supported in both frontend and backend databases.Alias
-FDefault Value
None
Supported Values
CSV list of expressions
Version Added
2.3.0
-
--frontend-parallelism¶ Degree of parallelism to use for the RDBMS query executed when validating an offload. Values of 0 or 1 will execute the query without parallelism. Values > 1 will force a parallel query of the given degree. If unset, the RDBMS query will fall back to using the behavior specified by RDBMS defaults.
Alias
None
Default Value
Supported Values
0and positive integersVersion Added
4.2.1
-
--group-bys¶ Comma-separated list of group by expressions, e.g.
COL1, COL2. Expressions must be supported in both frontend and backend databases.This option supports the wildcard character
*in column names.Alias
-GDefault Value
None
Supported Values
CSV list of expressions
Version Added
2.3.0
-
--selects¶ Comma-separated list of columns OR <number> of columns to run aggregations on. If <number> is specified the first and last columns and the <number>-2 highest cardinality columns will be selected.
This option supports the wildcard character
*in column names.Alias
-SDefault Value
5Supported Values
CSV list of columns OR <number>
Version Added
2.3.0
-
--skip-boundary-check¶ Do not include ‘offloaded boundary check’ in the list of filters. The ‘offloaded boundary check’ filter defines data that was offloaded to the backend database. For example:
WHERE TIME_ID < timestamp '2015-07-01 00:00:00'which resulted from applying the--older-than-date=2015-07-01filter during offload.Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
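Putting the Validate parameters together, a hedged sketch of a validation run. The validate command name, the -t option and the table, filter and group-by values are all assumptions for illustration:
validate -t SH.SALES -A MIN,MAX,COUNT -F 'PROD_ID < 12' -G TIME_ID -S 5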
Schema Sync Parameters¶
-
--command-file¶ Name of an additional log file to record the commands that have been applied (if the
--executeoption has been used) or should be applied (if the--executeoption has not been used). Supplied as full or relative path.Alias
None
Default Value
None
Supported Values
Full or relative path to file
Version Added
2.8.0
-
--include¶ CSV list of schemas, schema.tables or tables to examine for change detection and evolution. Supports wildcards (using
*). Example formats:SCHEMA1,SCHEMA*,SCHEMA1.TABLE1,SCHEMA1.TABLE2,SCHEMA2.TAB*,SCHEMA1.TAB*,*.TABLE1,*.TABLE2,*.TAB*.Alias
None
Default Value
None
Supported Values
List of one or more schema(s), schema(s).table(s) or table(s)
Version Added
2.8.0
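For example, a hedged sketch that examines two hypothetical schemas for change detection and records the applied commands; the schema_sync command name is an assumption:
schema_sync --include='SCHEMA1.*,SCHEMA2.TAB*' --command-file=/tmp/schema_sync_commands.log --execute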
-
--no-create-aggregations¶ Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
Diagnose Parameters¶
-
--backend-log-size-limit¶ Size limit for data returned from each backend log e.g.
100K,0.5M,1G.Alias
None
Default Value
10MSupported Values
<n><K|M|G|T>Version Added
2.11.0
-
--hive-http-endpoint¶ Endpoint of the HiveServer2 or HiveServer2 Interactive (LLAP) service in the format
<server|ip address>:<port>.Alias
None
Default Value
None
Supported Values
<server|ip address>:<port>Version Added
3.1.0
-
--impalad-http-port¶ Port of the Impala Daemon HTTP Server.
Alias
None
Default Value
25000Supported Values
Positive integers
Version Added
2.11.0
-
--include-backend-logs¶ Retrieve backend query engine logs.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-backend-config¶ Retrieve backend query engine config.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-logs-from¶ Collate and package log files modified or created since date (format:
YYYY-MM-DD) or date/time (format:YYYY-MM-DD_HH24:MM:SS). Can be used in conjunction with the--include-logs-toparameter to specify a search range.Alias
None
Default Value
None
Supported Values
YYYY-MM-DDorYYYY-MM-DD_HH24:MM:SSVersion Added
2.11.0
-
--include-logs-last¶ Collate and package log files modified or created in the last
n[d]ays (e.g. 3d) or [h]ours (e.g. 7h).Alias
None
Default Value
None
Supported Values
<n><d|h>Version Added
2.11.0
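For example, a hedged sketch that packages the last three days of logs along with backend configuration and process details; the diagnose command name is an assumption:
diagnose --include-logs-last=3d --include-backend-config --include-processes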
-
--include-logs-to¶ Collate and package log files modified or created up to and including the specified date (format:
YYYY-MM-DD) or date/time (format:YYYY-MM-DD_HH24:MM:SS). Can be used in conjunction with the--include-logs-fromparameter to specify a search range.Alias
None
Default Value
None
Supported Values
YYYY-MM-DDorYYYY-MM-DD_HH24:MM:SSVersion Added
2.11.0
-
--include-permissions¶ Collect permissions of files and directories related to Gluent Data Platform.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-processes¶ Collect details for running processes related to Gluent Data Platform.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-query-logs¶ Retrieve logs for a supplied query ID.
Alias
None
Default Value
None
Supported Values
Valid Impala/LLAP query ID
Version Added
2.11.0
-
--log-location¶ Location in which to search for log files.
Alias
None
Default Value
OFFLOAD_HOME/logSupported Values
Valid directory path
Version Added
2.11.0
-
--output-location¶ Location in which to save files created by Diagnose.
Alias
None
Default Value
OFFLOAD_HOME/logSupported Values
Valid directory path
Version Added
2.11.0
-
--retain-created-files¶ By default, after they have been packaged, files created by Diagnose in
--output-locationare removed. Specify this parameter to retain them.Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--spark-application-id¶ Retrieve logs for a supplied Spark application ID.
Alias
None
Default Value
None
Supported Values
Valid Spark application ID
Version Added
3.1.0
Offload Status Report Parameters¶
-
--csv-delimiter¶ Field delimiter character for output.
Alias
None
Default Value
,Supported Values
Must be a single character
Version Added
2.11.0
-
--csv-enclosure¶ Enclosure character for string fields in CSV output.
Alias
None
Default Value
"Supported Values
Must be a single character
Version Added
2.11.0
-
-o¶ Output format for Offload Status Report data.
Alias
--output-formatDefault Value
textSupported Values
csv|text|html|json|rawVersion Added
2.11.0
-
--output-format¶ Output format for Offload Status Report data.
Alias
-oDefault Value
textSupported Values
csv|text|html|json|rawVersion Added
2.11.0
-
--output-level¶ Level of detail required for the Offload Status Report.
Alias
None
Default Value
summarySupported Values
summary|detailVersion Added
2.11.0
-
--report-directory¶ Directory to save the report in.
Alias
None
Default Value
OFFLOAD_HOME/logSupported Values
Valid directory path
Version Added
2.11.0
-
--report-name¶ Name of report.
Alias
None
Default Value
Gluent_Offload_Status_Report_{DB_NAME}_{YYYY}-{MM}-{DD}_{HH}-{MI}-{SS}.[html|txt|csv]Supported Values
Valid filename
Version Added
2.11.0
-
-s¶ Optional name of schema to run the Offload Status Report for.
Alias
--schemaDefault Value
None
Supported Values
Valid schema name
Version Added
2.11.0
-
--schema¶ Optional name of schema to run the Offload Status Report for.
Alias
-sDefault Value
None
Supported Values
Valid schema name
Version Added
2.11.0
-
-t¶ Optional name of table to run the Offload Status Report for.
Alias
--tableDefault Value
None
Supported Values
Valid table name
Version Added
2.11.0
-
--table¶ Optional name of table to run the Offload Status Report for.
Alias
-tDefault Value
None
Supported Values
Valid table name
Version Added
2.11.0
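For example, a hedged sketch that produces a detailed HTML report for a single hypothetical schema; the offload_status_report command name is an assumption:
offload_status_report --schema=SH --output-format=html --output-level=detail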
Password Tool Parameters¶
-
--encrypt¶ Encrypt a clear-text, case-sensitive password. User will be prompted for the input password and the encrypted version will be output.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--keygen¶ Generate a password key file of the name given by
--keyfile.Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--keyfile¶ Name of the password key file to generate.
Alias
None
Default Value
None
Supported Values
Valid path and file name
Version Added
2.5.0
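For example, a hedged sketch of generating a key file and then encrypting a password with the options above; the pass_tool command name and the key file path are assumptions:
pass_tool --keygen --keyfile=/path/to/gluent.key
pass_tool --encrypt
The --encrypt step prompts for the clear-text password and outputs the encrypted version, as described above.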
Result Cache Manager Parameters¶
-
--rc-retention-hours¶ Controls how long to retain Result Cache files for.
Alias
None
Default Value
24Supported Values
Valid number of hours
Version Added
2.3.0
Oracle Database Schemas¶
Gluent Data Platform Admin Schema¶
This account is used by Gluent Data Platform to perform administrative activities. It is defined by ORA_ADM_USER.
Non-standard privileges granted to this schema are:
-
ANALYZE ANY Required to copy optimizer statistics from application schema to hybrid schema
-
GRANT ANY OBJECT PRIVILEGE Enables the Admin Schema to grant permission on application schema tables to the hybrid schema.
-
SELECT ANY DICTIONARY Enables Offload and Present operations to access the Oracle Database data dictionary for information such as column names, data types and partitioning schemes.
-
SELECT ANY TABLE Required for Offload activity.
Gluent Data Platform Application Schema¶
This account is used by Gluent Data Platform to perform read-only activities. It is defined by ORA_APP_USER.
Non-standard privileges granted to this schema are:
-
FLASHBACK ANY TABLE Required for Sqoop to provide a consistent point-in-time data load. The Gluent Data Platform application schema does not have DML privileges on user application schema tables, therefore there is no threat posed by this configuration.
-
SELECT ANY DICTIONARY Documented requirement of Sqoop.
-
SELECT ANY TABLE Required for Sqoop to read application schema tables during an offload.
Gluent Data Platform Repository Schema¶
This account is used by Gluent Data Platform to store operational metadata. It is defined by ORA_REPO_USER.
Non-standard privileges granted to this schema are:
-
SELECT ANY DICTIONARY Enables installed database packages in support of the metadata repository to access the Oracle Database data dictionary.
Hybrid Schemas¶
Gluent Data Platform hybrid schemas are required to enable remote data to be queried in tandem with customer data in the RDBMS application schema.
Non-standard privileges granted to hybrid schemas are:
-
CONNECT THROUGH GLUENT_ADM Offload and Present use this to create hybrid objects without requiring powerful
CREATE ANYandDROP ANYprivileges.
-
GLOBAL QUERY REWRITE Required to support Gluent Query Engine optimizations.
-
SELECT ANY TABLE Enables a hybrid view to access the original application schema and offloaded table.
Data Daemon¶
Properties¶
The following Java properties can be set by creating a $OFFLOAD_HOME/conf/datad.properties file containing <property>=<value> properties and values.
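For example, a minimal sketch of a datad.properties file using only the properties documented below; the values are illustrative rather than recommendations:
grpc.port=50051
server.port=50052
datad.initial-request-pool-size=16
datad.max-request-pool-size=1024
datad.read-pipeline-size=4
logging.level.com.gluent.providers.bigquery.BigQueryProvider=debug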
-
datad.initial-request-pool-size¶ The initial size of the thread pool for concurrent read requests from the RDBMS.
Default Value
16Supported Values
Positive integers
Version Added
4.2.2
-
datad.max-request-pool-size¶ The maximum size of the thread pool for concurrent read requests from the RDBMS.
Default Value
1024Supported Values
Positive integers
Version Added
4.2.2
-
datad.read-pipeline-size¶ The number of reads from the backend to keep in the pipeline to be processed.
Default Value
4Supported Values
Positive integers
Version Added
4.0.0
-
datad.send-queue-size¶ The maximum size in MB of the queue to send to the RDBMS.
Default Value
16Supported Values
Positive integers
Version Added
4.0.0
-
grpc.port¶ The port used for Data Daemon. Setting to
0results in random port selection.Default Value
50051Supported Values
Any valid port
Version Added
4.0.0
-
grpc.security.cert-chain¶ The full path to the certificate chain in PEM format to enable TLS on the Data Daemon socket.
Default Value
None
Supported Values
file:<full path to PEM file>Version Added
4.0.0
-
grpc.security.private-key¶ The full path to the private key in PEM format to enable TLS on the Data Daemon socket.
Default Value
None
Supported Values
file:<full path to PEM file>Version Added
4.0.0
-
logging.config¶ The full path to a LOGBack format configuration file to override default logging.
Default Value
None
Supported Values
<full path to xml file>Version Added
4.0.0
-
logging.level.com.gluent.providers.bigquery.BigQueryProvider¶ The log level for Data Daemon interactions with BigQuery.
Default Value
infoSupported Values
off|error|warn|info|debug|allVersion Added
4.0.0
-
logging.level.com.gluent.providers.impala.ImpalaProvider¶ The log level for Data Daemon interactions with Impala.
Default Value
infoSupported Values
off|error|warn|info|debug|allVersion Added
4.0.0
-
logging.level.com.gluent.providers.jdbc.JdbcDataProvider¶ The log level for general Data Daemon operations when interacting with Snowflake and Azure Synapse Analytics.
Default Value
infoSupported Values
off|error|warn|info|debug|allVersion Added
4.1.0
-
logging.level.com.gluent.providers.snowflake.SnowflakeJdbcDataProvider¶ The log level for Data Daemon interactions with Snowflake.
Default Value
infoSupported Values
off|error|warn|info|debug|allVersion Added
4.1.0
-
logging.level.com.gluent.providers.synapse.SynapseProvider¶ The log level for Data Daemon interactions with Azure Synapse Analytics.
Default Value
infoSupported Values
off|error|warn|info|debug|allVersion Added
4.3.0
-
server.port¶ The port used for Data Daemon Web Interface. Setting to
0results in random port selection.Default Value
50052Supported Values
Any valid port
Version Added
4.0.0
-
spring.main.web-application-type¶ Allows Data Daemon Web Interface to be disabled.
Default Value
None
Supported Values
NONEVersion Added
4.0.0
Configuration¶
The following Java configuration options can be set by creating a $OFFLOAD_HOME/conf/datad.conf file containing JAVA_OPTS="<parameter1> <parameter2> ..." e.g. JAVA_OPTS="-Xms2048m -Xmx2048m -Djavax.security.auth.useSubjectCredsOnly=false".
-
-Xms¶ Sets the initial and minimum Java heap size.
Default Value
Larger of 1/64th of the physical memory or some reasonable minimum
Supported Values
-Xms<size>[g|G|m|M|k|K]Version Added
4.0.0
-
-Xmx¶ Sets the maximum Java heap size.
Default Value
Smaller of 1/4th of the physical memory or 1GB
Supported Values
-Xmx<size>[g|G|m|M|k|K]Version Added
4.0.0
-
-Djavax.security.auth.useSubjectCredsOnly¶ Required to be set to
falsewhen authenticating with a Kerberos enabled backend.Default Value
trueSupported Values
true|falseVersion Added
4.0.0