Reference¶
Documentation Conventions¶
Commands and keywords are in this
font. $OFFLOAD_HOME is set when the environment file (offload.env) is sourced, unless already set, and refers to the directory named offload that is created when the software is unpacked. This is also referred to as <OFFLOAD_HOME> in sections of this guide where the environment file has not been created/sourced.
Third party vendor product names might be aliased or shortened for simplicity. See Third Party Vendor Products for cross-references to full product names and trademarks.
Environment File¶
-
BACKEND_DISTRIBUTION¶ Backend system distribution override.
Necessity
Mandatory
Supported Values
CDH|EMR|GCP|HDP|MAPR
Version Added
2.3.0
-
BACKEND_IDENTIFIER_CASE¶ Case conversion to be applied to any backend identifier names created by Gluent Data Platform. Backend systems may ignore any case conversion if they are case-insensitive.
Necessity
Optional
Supported Values
UPPER|LOWER|NO_MODIFY
Version Added
4.0.0
-
BIGQUERY_DATASET_LOCATION¶ Google BigQuery location to use when creating a dataset. Only applicable when creating datasets using the
--create-backend-db option.
Necessity
Optional
Supported Values
Any valid Google BigQuery location
Version Added
4.0.2
Note
Google BigQuery dataset locations must be compatible with that of the Google Cloud Storage bucket specified in OFFLOAD_FS_CONTAINER.
-
CLASSPATH¶ Ensures Gluent
lib directory is included.
Necessity
Optional
Supported Values
Valid paths
Version Added
2.3.0
-
CLOUDERA_NAVIGATOR_HIVE_SOURCE_ID¶ The Cloudera Navigator entity ID for the Hive source that will register metadata. See the Installation and Upgrade guide for details on how to set this parameter.
Necessity
Mandatory when integrating with Cloudera Navigator, otherwise optional
Supported Values
Valid Cloudera Navigator entity ID
Version Added
2.11.0
-
CONNECTOR_SQL_ENGINE¶ SQL engine used by Gluent Query Engine for hybrid queries.
Necessity
Optional
Default Value
IMPALA
Supported Values
IMPALA|BIGQUERY
Version Added
3.1.0
-
CONN_PRE_CMD¶ Used to set pre-commands before query execution, e.g.
set hive.execution.engine=tez;.
Necessity
Optional
Supported Values
Supported session
set parameters
Version Added
2.3.0
-
DATAD_ADDRESS¶ The address(es) of Data Daemon. For a single daemon the format is
<hostname/IP address>:<port>. Specifying multiple daemons can be achieved in one of two ways:
By DNS address. The DNS server can return multiple A records for a hostname and Gluent Data Platform will load balance between these, e.g. <load-balancer-address>:<load-balancer-port>
By IP address and port. The comma-separated list must be prefixed with ipv4: e.g. ipv4:<hostname/IP address>:<port>,<hostname/IP address>:<port>
Necessity
Mandatory
Supported Values
<hostname/IP address>:<port>, <load-balancer-address>:<load-balancer-port>, or ipv4:<hostname/IP address>:<port>,<hostname/IP address>:<port>
Version Added
4.0.0
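E.g. a minimal sketch of both forms in offload.env (hostnames and ports are illustrative):
export DATAD_ADDRESS=datad-lb.example.com:50051
export DATAD_ADDRESS=ipv4:hadoop1.example.com:50051,hadoop2.example.com:50051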
-
DATAD_SSL_ACTIVE¶ Set to
true when TLS is enabled on the Data Daemon socket.
Necessity
Optional
Supported Values
true|false
Version Added
4.0.0
-
DATAD_SSL_TRUSTED_CERTS¶ The trusted certificate when TLS is enabled on the Data Daemon socket.
Necessity
Optional
Supported Values
Full path to the trusted certificate
Version Added
4.0.0
-
DATA_GOVERNANCE_API_PASS¶ Password for the account specified in
DATA_GOVERNANCE_API_USER. Password encryption is supported using the Password Tool utility.
Necessity
Optional
Supported Values
Cloudera Navigator service account password
Version Added
2.11.0
-
DATA_GOVERNANCE_API_URL¶ URL for a data governance REST API in the format
http://fqdn-n.example.com:port/api. Leaving this configuration item blank disables data governance integration.
Necessity
Optional
Supported Values
Valid Cloudera Navigator REST API URL
Version Added
2.11.0
-
DATA_GOVERNANCE_API_USER¶ Service account to be used to connect to a data governance REST API.
Necessity
Optional
Supported Values
Cloudera Navigator service account name
Version Added
2.11.0
-
DATA_GOVERNANCE_AUTO_PROPERTIES¶ CSV string of dynamic properties to include in data governance metadata. The tokens in the CSV will be expanded at runtime if prefixed with
+ or ignored if prefixed with -.
Necessity
Optional
Supported Values
CSV containing the following tokens prefixed with either + or -: GLUENT_OBJECT_TYPE, SOURCE_RDBMS_TABLE, TARGET_RDBMS_TABLE, INITIAL_GLUENT_VERSION, LATEST_GLUENT_VERSION, INITIAL_OPERATION_DATETIME, LATEST_OPERATION_DATETIME
Version Added
2.11.0
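E.g. an illustrative setting that records the object type and initial version but suppresses the source table name:
export DATA_GOVERNANCE_AUTO_PROPERTIES="+GLUENT_OBJECT_TYPE,+INITIAL_GLUENT_VERSION,-SOURCE_RDBMS_TABLE"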
-
DATA_GOVERNANCE_AUTO_TAGS¶ CSV string of tags to include in data governance metadata. Tags are free-format except for
+RDBMS_NAME which is expanded at run time.
Necessity
Optional
Default Value
GLUENT,+RDBMS_NAME
Supported Values
CSV containing tags to attach to data governance metadata
Version Added
2.11.0
-
DATA_GOVERNANCE_BACKEND¶ Specify the data governance API type accessed via
DATA_GOVERNANCE_API_URL.
Necessity
Optional
Supported Values
navigator
Version Added
2.11.0
-
DATA_GOVERNANCE_CUSTOM_PROPERTIES¶ JSON string of key/value pairs to include in data governance metadata.
Necessity
Optional
Associated Option
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
2.11.0
-
DATA_GOVERNANCE_CUSTOM_TAGS¶ CSV string of tags to include in data governance metadata.
Necessity
Optional
Associated Option
Supported Values
CSV containing tags to attach to data governance metadata
Version Added
2.11.0
-
DB_NAME_PREFIX¶ Database name/path prefix for multitenant support. This allows multiple Oracle Database databases to offload to the same backend cluster. If undefined, the
DB_UNIQUE_NAME will be used, giving <DB_UNIQUE_NAME>_<schema>. If defined but empty, no prefix is used, giving <schema>. Otherwise, databases will be named <DB_NAME_PREFIX>_<schema>. If the source database is part of an Oracle Data Guard configuration, set DB_NAME_PREFIX to ensure that DB_UNIQUE_NAME is not used.
Necessity
Optional
Associated Option
Supported Values
Supported Hadoop characters
Version Added
2.3.0
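E.g. a sketch of the resulting names (prefix and schema are illustrative):
export DB_NAME_PREFIX=PROD   # schema SH offloads to backend database PROD_SH
export DB_NAME_PREFIX=       # defined but empty: schema SH offloads to backend database SH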
-
DEFAULT_BUCKETS¶ Default number of offload buckets (subpartitions) for parallel data retrieval from the backend system. If you aim to run your biggest queries with parallel DOP X then set this value to X. This way each Oracle Database PX slave can start its own Smart Connector process for fetching a subset of data.
Necessity
Optional
Associated Option
Supported Values
Valid Oracle Database DOP
Version Added
2.3.0
-
DEFAULT_BUCKETS_MAX¶ Upper limit of
DEFAULT_BUCKETS when DEFAULT_BUCKETS=AUTO.
Necessity
Optional
Default Value
16
Supported Values
Valid Oracle Database DOP
Version Added
2.7.0
-
DEFAULT_BUCKETS_THRESHOLD¶ Threshold at which RDBMS segments are considered “small” by
DEFAULT_BUCKETS=AUTO tuning.
Necessity
Optional
Supported Values
E.g. 3M, 0.5G
Version Added
2.7.0
-
GOOGLE_APPLICATION_CREDENTIALS¶ Path to Google service account private key JSON file.
Necessity
Mandatory
Supported Values
Valid paths
Version Added
4.0.0
-
HADOOP_SSH_USER¶ User to connect to Hadoop server(s) defined in
HIVE_SERVER_HOST using password-less SSH.
Necessity
Mandatory
Supported Values
Valid OS username
Version Added
2.3.0
-
HDFS_CMD_HOST¶ Overrides
HIVE_SERVER_HOST for the HDFS command steps only. In split installation environments where orchestration commands are run from Hadoop edge node(s), set this to localhost in the Hadoop edge node(s) configuration file.
Necessity
Optional
Supported Values
Hostname or IP address of HDFS host
Version Added
2.3.0
-
HDFS_DATA¶ HDFS data directory of the
HIVE_SERVER_USER. Used to store offloaded data.
Necessity
Mandatory
Associated Option
Supported Values
Valid HDFS directory
Version Added
2.3.0
-
HDFS_DB_PATH_SUFFIX¶ Hadoop databases are named
<schema><HDFS_DB_PATH_SUFFIX> and <schema>_load<HDFS_DB_PATH_SUFFIX>. When this value is not set the suffix of the databases defaults to .db, giving <schema>.db and <schema>_load.db. Set this to an empty string to use no suffix. For backend systems other than Hadoop this variable has no effect.
Necessity
Optional
Associated Option
Supported Values
Supported HDFS path characters
Version Added
2.3.0
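E.g. a sketch of the resulting database names for an illustrative schema SH:
# unset (default .db): databases are named sh.db and sh_load.db
export HDFS_DB_PATH_SUFFIX=  # empty string: databases are named sh and sh_load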
-
HDFS_HOME¶ HDFS home directory of the
HIVE_SERVER_USER.
Necessity
Mandatory
Supported Values
Valid HDFS directory
Version Added
2.3.0
-
HDFS_LOAD¶ HDFS data directory of the
HIVE_SERVER_USER. Used to stage offloaded data.
Necessity
Mandatory
Supported Values
Valid HDFS directory
Version Added
3.4.0
-
HDFS_NAMENODE_ADDRESS¶ Hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured. This value is required in order to execute result cache queries. In a deployment where result cache queries will never be used, this variable can safely be unset.
Necessity
Optional
Supported Values
Hostname or IP address of active HDFS namenode or ID of the HDFS nameservice if HDFS High Availability is configured
Version Added
2.3.0
-
HDFS_NAMENODE_PORT¶ Port of the active HDFS namenode. Set to
0 if HDFS High Availability is configured and HDFS_NAMENODE_ADDRESS is set to a nameservice ID. As with HDFS_NAMENODE_ADDRESS, this value is necessary for executing result cache queries, but otherwise can safely be unset.
Necessity
Optional
Supported Values
Port of active HDFS namenode, or 0 if HDFS High Availability is configured
Version Added
2.3.0
-
HDFS_RESULT_CACHE_USER¶ Hadoop user to impersonate when making HDFS requests for result cache queries; must have write permissions to HDFS_HOME. In a deployment where result cache queries will never be used, this variable can safely be unset.
Necessity
Mandatory
Default Value
Supported Values
Hadoop username
Version Added
2.3.0
-
HDFS_SNAPSHOT_PATH¶ Before an Incremental Update compaction an HDFS snapshot will be automatically created in the location specified by HDFS_SNAPSHOT_PATH. This location must be a snapshottable directory (consult your HDFS administrators to enable this). When changing HDFS_SNAPSHOT_PATH from the default, ensure that it remains a parent directory of HDFS_DATA. Unsetting this variable will disable automatic HDFS snapshots.
Necessity
Optional
Default Value
Supported Values
HDFS path that is equal to or a parent of HDFS_DATA
Version Added
2.10.0
-
HDFS_SNAPSHOT_SUDO_COMMAND¶ If
HADOOP_SSH_USER is not the inode owner of HDFS_SNAPSHOT_PATH then HDFS superuser rights will be required to take HDFS snapshots. A sudo rule (or equivalent user substitution tool) can be used to enable this using HDFS_SNAPSHOT_SUDO_COMMAND. The command must be password-less.
Necessity
Optional
Supported Values
A valid user-substitution command
Version Added
2.10.0
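E.g. a sketch, assuming a password-less sudo rule allows substitution to the HDFS superuser:
export HDFS_SNAPSHOT_SUDO_COMMAND="sudo -u hdfs"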
-
HIVE_SERVER_HOST¶ Name of host(s) to connect to Impala/Hive. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3.
Necessity
Mandatory
Supported Values
Hostname or IP address of Impala/Hive host(s)
Version Added
2.3.0
-
HIVE_SERVER_PORT¶ Port of HiveServer2 service. Default Impala port is
21050, default Hive port is 10000.
Necessity
Mandatory
Default Value
21050|10000
Supported Values
Port of HiveServer2 service
Version Added
2.3.0
-
HIVE_SERVER_AUTH_MECHANISM¶ Authentication mechanism for HiveServer2. In non-kerberized and non-LDAP environments, should be set to: Impala:
NOSASL, Hive: value of hive-site.xml:hive.server2.authentication. In LDAP environments, should be set to PLAIN.
Necessity
Optional
Supported Values
NOSASL|PLAIN, value of hive-site.xml:hive.server2.authentication
Version Added
2.3.0
-
HIVE_SERVER_PASS¶ Password of the user to authenticate with HiveServer2 service. Required in LDAP enabled Impala configurations. Password encryption is supported using the Password Tool utility.
Necessity
Mandatory with LDAP
Supported Values
HiveServer2 service password
Version Added
2.3.0
-
HIVE_SERVER_USER¶ Name of the user to authenticate with HiveServer2 service.
Necessity
Mandatory
Supported Values
HiveServer2 service username
Version Added
2.3.0
-
HYBRID_EXT_TABLE_DEGREE¶ Default degree of parallelism for base hybrid external tables. When set to
AUTO, Offload will copy settings from the source RDBMS table to the hybrid external table.
Necessity
Optional
Associated Option
Supported Values
AUTO and positive integers
Version Added
2.11.2
-
HS2_SESSION_PARAMS¶ Comma-separated list of HiveServer2 session parameters to set.
BATCH_SIZE=16384 is a recommended performance setting.
E.g. export HS2_SESSION_PARAMS="BATCH_SIZE=16384,MEM_LIMIT=2G".
Necessity
Optional
Supported Values
Valid Impala/Hive session parameters
Version Added
2.3.0
-
IN_LIST_JOIN_TABLE¶ Database and table name of the in-list-join table. Can be created and populated with
./connect --create-sequence-table.
Necessity
Optional
Supported Values
Valid database and table name
Version Added
2.4.2
-
IN_LIST_JOIN_TABLE_SIZE¶ Size of table specified by
IN_LIST_JOIN_TABLE. Required for both table population by connect, and table usage by Gluent Query Engine.
Necessity
Optional
Supported Values
Up to 1000000
Version Added
2.4.2
-
KERBEROS_KEYTAB¶ The path of the keytab file. If not provided, a valid ticket must already exist in the cache (i.e. manual
kinit).
Necessity
Optional
Supported Values
Path to the keytab file
Version Added
2.3.0
-
KERBEROS_PATH¶ If your Kerberos utilities (like
kinit) reside in some non-standard directory, set the path here.
Necessity
Optional
Supported Values
Path to Kerberos utilities
Version Added
2.3.0
-
KERBEROS_PRINCIPAL¶ The Kerberos user to authenticate as, i.e. kinit -kt KERBEROS_KEYTAB KERBEROS_PRINCIPAL should succeed. If KERBEROS_KEYTAB is provided, this should also be provided.
Necessity
Optional
Supported Values
Name of Kerberos principal
Version Added
2.3.0
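E.g. a sketch (keytab path and principal are illustrative), with the equivalent manual test alongside:
export KERBEROS_KEYTAB=/etc/security/keytabs/gluent.keytab
export KERBEROS_PRINCIPAL=gluent@EXAMPLE.COM
kinit -kt /etc/security/keytabs/gluent.keytab gluent@EXAMPLE.COM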
-
KERBEROS_SERVICE¶ The Impala/Hive service (typically
impala/hive). If empty, Smart Connector will attempt to connect unsecured.
Necessity
Optional
Supported Values
Name of Impala/Hive service
Version Added
2.3.0
-
KERBEROS_TICKET_CACHE_PATH¶ Required to use the
libhdfs3-based result cache with an HDFS cluster that uses Kerberos authentication. In a deployment where result cache queries will never be used, this variable can safely be unset.
Necessity
Optional
Supported Values
Path to the Kerberos ticket cache for the user that will be executing Smart Connector processes
Version Added
2.3.0
-
LD_LIBRARY_PATH¶ Ensures Gluent
lib directory is included.
Necessity
Optional
Supported Values
Valid paths
Version Added
2.3.0
-
LIBHDFS3_CONF¶ HDFS client configuration file location.
Necessity
Optional
Supported Values
Valid path to XML configuration file
Version Added
3.0.4
-
LOG_LEVEL¶ Logging level verbosity.
Necessity
Mandatory
Default Value
info
Supported Values
info|detail|debug
Version Added
2.3.0
-
MAX_OFFLOAD_CHUNK_COUNT¶ Restrict number of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Necessity
Optional
Associated Option
Supported Values
1-1000
Version Added
2.9.0
-
MAX_OFFLOAD_CHUNK_SIZE¶ Restrict size of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Necessity
Optional
Associated Option
Supported Values
E.g. 100M, 1G, 1.5G
Version Added
2.9.0
-
METAD_AUTOSTART¶ Enable Metadata Daemon automatic start:
TRUE: If Metadata Daemon is not running, Smart Connector will attempt to start Metadata Daemon automatically.
FALSE: Smart Connector will only attempt to connect to an already running Metadata Daemon.
Necessity
Optional
Default Value
true
Supported Values
true|false
Version Added
2.6.0
-
METAD_POOL_SIZE¶ The maximum number of connections Metadata Daemon will maintain in its connection pool to Oracle Database.
Necessity
Optional
Default Value
16
Supported Values
Number of connections
Version Added
2.4.5
-
METAD_POOL_TIMEOUT¶ The timeout for idle connections in Metadata Daemon’s connection pool to Oracle Database.
Necessity
Optional
Default Value
300
Supported Values
Timeout value in seconds
Version Added
2.4.5
-
NLS_LANG¶ Should be set to the value of Oracle Database
NLS_CHARACTERSET.
Necessity
Optional
Supported Values
Valid NLS_CHARACTERSET values
Version Added
2.3.0
-
NUM_LOCATION_FILES¶ Number of external table location files for parallel data retrieval.
Necessity
Optional
Associated Option
Supported Values
Integer values
Version Added
2.7.2
-
OFFLOAD_BACKEND_SESSION_PARAMETERS¶ Key/value pairs, in JSON format, to override backend query engine parameters. These take effect when establishing a connection to the backend system. For example:
"{\"export OFFLOAD_BACKEND_SESSION_PARAMETERS="{\"request_pool\": \"'root.gluent'\"}"Necessity
Optional
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.3.2
-
OFFLOAD_BIN¶ Path to the Gluent Data Platform
bin directory ($OFFLOAD_HOME/bin).
Necessity
Mandatory
Supported Values
Oracle Database directory object name
Version Added
2.3.0
-
OFFLOAD_CONF¶ Path to the Gluent Data Platform
conf directory.
Necessity
Optional
Supported Values
Path to conf directory
Version Added
2.3.0
-
OFFLOAD_COMPRESS_LOAD_TABLE¶ Compress staged data during an Offload. This can be useful when staging to cloud storage.
Necessity
Optional
Associated Option
Supported Values
true|false
Version Added
4.0.0
-
OFFLOAD_DISTRIBUTE_ENABLED¶ Distribute data by partition key(s) during the final INSERT operation of an offload. Hive only.
Necessity
Optional
Associated Option
Supported Values
true|false
Version Added
2.8.0
-
OFFLOAD_FS_CONTAINER¶ The name of the bucket or container to be used when offloading to cloud storage.
Necessity
Optional
Associated Option
Supported Values
A cloud storage bucket/container name configured for use by the backend cluster
Version Added
3.0.0
-
OFFLOAD_FS_PREFIX¶ A directory path used to prefix database locations within
OFFLOAD_FS_SCHEME. When OFFLOAD_FS_SCHEME is inherit, HDFS_DATA takes precedence over this setting.
Necessity
Optional
Associated Option
Supported Values
A valid directory in HDFS or cloud storage
Version Added
3.0.0
-
OFFLOAD_FS_SCHEME¶ The filesystem scheme to be used for database and table locations.
inherit specifies that all tables created by Offload will not specify a LOCATION clause; they will inherit the location from the parent database. See Integrating with Cloud Storage for details.
Necessity
Mandatory
Associated Option
Supported Values
inherit, hdfs, s3a, adl, abfs, abfss
Version Added
3.0.0
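E.g. a sketch of a cloud storage combination of the three related settings (bucket name and path are illustrative):
export OFFLOAD_FS_SCHEME=s3a
export OFFLOAD_FS_CONTAINER=my-offload-bucket
export OFFLOAD_FS_PREFIX=gluent/offloaded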
-
OFFLOAD_HOME¶ Location of Gluent Data Platform installation.
Necessity
Mandatory
Supported Values
Path to installed offload directory
Version Added
2.3.0
-
OFFLOAD_LOG¶ Path to the Gluent Data Platform
log directory.
Necessity
Mandatory
Supported Values
Oracle Database directory object name
Version Added
2.3.0
-
OFFLOAD_LOGDIR¶ Override Smart Connector log path. If undefined defaults to
$OFFLOAD_HOME/log.
Necessity
Optional
Supported Values
Valid path
Version Added
2.3.0
-
OFFLOAD_SORT_ENABLED¶ Enables the sorting/clustering of data when inserting into the final destination table. Columns used for sorting/clustering are specified using --sort-columns.
Necessity
Optional
Associated Option
Supported Values
true|false
Version Added
2.7.0
-
OFFLOAD_TRANSPORT¶ Method used to transport data from an RDBMS frontend to a backend system.
AUTO selects the optimal method based on configuration and table structure.
Necessity
Optional
Associated Option
Supported Values
AUTO|GLUENT|SQOOP
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_AUTH_USING_ORACLE_WALLET¶ Instruct Offload that RDBMS authentication is via an Oracle Wallet. The wallet location should be configured using Hadoop configuration appropriate to the method used for data transport. See SQOOP_OVERRIDES and OFFLOAD_TRANSPORT_SPARK_PROPERTIES for examples.
Necessity
Optional
Supported Values
true|false
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_CMD_HOST¶ An override for
HDFS_CMD_HOST when running shell-based Offload Transport commands such as Sqoop or Spark Submit.
Necessity
Optional
Associated Option
Supported Values
Hostname or IP address of HDFS host
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_CONSISTENT_READ¶ Control whether parallel data transport tasks should use a consistent point in time when reading RDBMS data.
Necessity
Optional
Associated Option
Supported Values
true|false
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH¶ The credential provider path to be used in conjunction with
OFFLOAD_TRANSPORT_PASSWORD_ALIAS. Integration with the Hadoop Credential Provider API is only supported by Sqoop, Spark Submit and Livy based Offload Transport.
Necessity
Optional
Supported Values
A valid HDFS path
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_DSN¶ Database connection details for Offload Transport if different to
ORA_CONN.
Necessity
Optional
Associated Option
Supported Values
<hostname>:<port>/<service>
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_FETCH_SIZE¶ Number of records to fetch in a single batch from the RDBMS during Offload. Offload Transport may encounter memory pressure if a table is very wide (e.g. contains LOB columns) and there are lots of records in a batch. Reducing the fetch size can alleviate this if more memory cannot be allocated.
Necessity
Optional
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_API_VERIFY_SSL¶ Used to enable SSL for Livy API calls. There are 4 states:
Empty: Do not use SSL.
TRUE: Use SSL and verify Hadoop certificate against known certificates.
FALSE: Use SSL and do not verify Hadoop certificate.
/some/path/here/cert-bundle.crt: Use SSL and verify Hadoop certificate against path to certificate bundle.
Necessity
Optional
Supported Values
Empty, true|false, <path to certificate bundle>
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_API_URL¶ URL for Livy/Spark REST API in the format
http://fqdn-n.example.com:port. https can be used in place of http.
Necessity
Optional
Associated Option
Supported Values
Valid Livy REST API URL
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_IDLE_SESSION_TIMEOUT¶ Timeout (in seconds) for idle Spark client sessions created in Livy.
Necessity
Optional
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_LIVY_MAX_SESSIONS¶ Limits the number of Livy sessions Offload will create. Sessions are re-used when idle. New sessions are only created when no idle sessions are available.
Necessity
Optional
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_PARALLELISM¶ The number of parallel streams to be used when transporting data from the source RDBMS to the backend.
Necessity
Optional
Associated Option
Supported Values
Positive integers
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_PASSWORD_ALIAS¶ An alias provided by Hadoop Credential Provider API to be used for RDBMS authentication during Offload Transport. The key store containing the alias must be specified in either
OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH or in Hadoop configuration (hadoop.security.credential.provider.path).
Necessity
Optional
Associated Option
Supported Values
Valid Hadoop Credential Provider API alias
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_RDBMS_SESSION_PARAMETERS¶ Key/value pairs, in JSON format, to supply database session parameter values. These only take effect during Offload Transport, e.g.
'{"cell_offload_processing": "false"}'Necessity
Optional
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.1.0
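E.g. the example above expressed as a shell export (single quotes protect the JSON from the shell):
export OFFLOAD_TRANSPORT_RDBMS_SESSION_PARAMETERS='{"cell_offload_processing": "false"}'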
-
OFFLOAD_TRANSPORT_SPARK_OVERRIDES¶ Override JVM flags for a
spark-submit command, inserted immediately after spark-submit. For example: "-Dmapred.map.child.java.opts='-Doracle.net.wallet_location=/some/path/here/gluent_wallet'"
Necessity
Optional
Associated Option
Supported Values
Valid JVM options
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_PROPERTIES¶ Key/value pairs, in JSON format, to override Spark property defaults. Examples:
'{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}' '{"spark.driver.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet", "spark.executor.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet"}'Necessity
Optional
Associated Option
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.1.0
Note
Some properties will not take effect when connecting to the Spark Thrift Server because the Spark context has already been created.
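E.g. the first example above expressed as a shell export (single quotes protect the JSON from the shell):
export OFFLOAD_TRANSPORT_SPARK_PROPERTIES='{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}'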
-
OFFLOAD_TRANSPORT_SPARK_QUEUE_NAME¶ YARN queue name for Gluent Offload Engine Spark jobs.
Necessity
Optional
Associated Option
Supported Values
Valid YARN queue name
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_SUBMIT_EXECUTABLE¶ The executable to use for submitting Spark applications. Can be empty,
spark-submit or spark2-submit.
Necessity
Optional
Supported Values
Blank or spark-submit|spark2-submit
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_SUBMIT_MASTER_URL¶ The master URL for the Spark cluster, only used for non-Hadoop Spark clusters. If empty, Spark will use default settings.
Necessity
Optional
Associated Option
None
Supported Values
Valid master URL
Version Added
4.0.0
-
OFFLOAD_TRANSPORT_SPARK_THRIFT_HOST¶ Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3.
Necessity
Optional
Associated Option
Supported Values
Hostname or IP address of Spark Thrift Server host(s)
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_SPARK_THRIFT_PORT¶ Port that the Spark Thrift Server is listening on.
Necessity
Optional
Associated Option
Supported Values
Active port
Version Added
3.1.0
-
OFFLOAD_TRANSPORT_USER¶ User to authenticate as when executing Offload Transport commands, such as SSH for spark-submit or Sqoop commands, or Livy API calls.
Necessity
Mandatory
Associated Option
None
Supported Values
Valid username
Version Added
4.0.0
-
OFFLOAD_UDF_DB¶ Impala/Hive database that Gluent UDFs are created in. If undefined defaults to the
default database.
Necessity
Optional
Supported Values
Valid Impala/Hive database
Version Added
2.3.0
-
ORA_ADM_PASS¶ Password of the Gluent Data Platform Admin Schema chosen during installation. Password encryption is supported using the Password Tool utility.
Necessity
Mandatory
Supported Values
Oracle Database ADM password
Version Added
2.3.0
-
ORA_ADM_USER¶ Name of the Gluent Data Platform Admin Schema chosen during installation.
Necessity
Mandatory
Supported Values
Oracle Database ADM username
Version Added
2.3.0
-
ORA_APP_PASS¶ Password of the Gluent Data Platform Application Schema chosen during installation. Password encryption is supported using the Password Tool utility.
Necessity
Mandatory
Supported Values
Oracle Database APP password
Version Added
2.3.0
-
ORA_APP_USER¶ Name of the Gluent Data Platform Application Schema chosen during installation.
Necessity
Mandatory
Supported Values
Oracle Database APP username
Version Added
2.3.0
-
ORA_CONN¶ Oracle Database connection details. A fully qualified DB service name must be used if the Oracle Database service name includes domain names (DB_DOMAIN), e.g. ORCL12.gluent.com.
Necessity
Mandatory
Supported Values
<hostname>:<port>/<service>
Version Added
2.3.0
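E.g. a sketch (hostname, port and service name are illustrative):
export ORA_CONN=dbhost1.example.com:1521/ORCL12.gluent.com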
-
ORA_REPO_USER¶ Name of the Gluent Data Platform Repository Schema chosen during installation.
Necessity
Mandatory
Supported Values
Oracle Database REPO username
Version Added
3.3.0
-
PASSWORD_KEY_FILE¶ Password key file generated by Password Tool and used to create encrypted password strings.
Necessity
Optional
Supported Values
Path to Password Key File
Version Added
2.5.0
-
PATH¶ Ensures Gluent Data Platform
bin directory is included. The path order is important to ensure that the Python distribution included with Gluent Data Platform is used.
Necessity
Optional
Supported Values
Valid paths
Version Added
2.3.0
-
QUERY_ENGINE¶ Backend SQL engine to use for commands issued as part of Offload/Present orchestration.
Necessity
Optional
Default Value
IMPALA
Supported Values
IMPALA|HIVE
Version Added
2.3.0
-
SPARK_HISTORY_SERVER¶ URL for accessing the runtime history of the running Spark Thrift Server UI.
Necessity
Optional
Supported Values
URL of Spark History Server, e.g. http://hadoop1:18081/
Version Added
3.1.0
-
SPARK_THRIFT_HOST¶ Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3.
Necessity
Optional
Supported Values
Hostname or IP address of Spark Thrift Server host(s)
Version Added
3.1.0
-
SPARK_THRIFT_PORT¶ Port that the Spark Thrift Server is listening on.
Necessity
Optional
Supported Values
Active port
Version Added
3.1.0
-
SQOOP_DISABLE_DIRECT¶ It is recommended that the OraOOP optimizations for Sqoop (included in standard Apache Sqoop from v1.4.5) are used. If they cannot be, use this to disable direct path mode.
Necessity
Optional
Associated Option
Supported Values
true|false
Version Added
2.3.0
-
SQOOP_OVERRIDES¶ Override flags for Sqoop command, inserted immediately after
sqoop import. For example: "-Dmapred.map.child.java.opts='-Doracle.net.wallet_location=/some/path/here/gluent_wallet'"
Necessity
Optional
Associated Option
Supported Values
Valid Sqoop parameters
Version Added
2.3.0
-
SQOOP_ADDITIONAL_OPTIONS¶ Additional Sqoop command options added at the end of the Sqoop command.
Necessity
Optional
Associated Option
Supported Values
Any Sqoop command option/argument not already included in the Sqoop command line
Version Added
2.9.0
-
SQOOP_PASSWORD_FILE¶ HDFS path to Sqoop password file, readable by
HADOOP_SSH_USER. If not specified, ORA_APP_PASS will be used.
Necessity
Optional
Associated Option
Supported Values
HDFS path to password file
Version Added
2.5.0
-
SQOOP_QUEUE_NAME¶ YARN queue name for Gluent Offload Engine Sqoop jobs.
Necessity
Optional
Associated Option
Supported Values
Valid YARN queue name
Version Added
3.1.0
-
SSL_ACTIVE¶ Set to
true when Impala/Hive uses SSL/TLS encryption.
Necessity
Optional
Supported Values
true|false
Version Added
2.3.0
-
SSL_TRUSTED_CERTS¶ SSL/TLS trusted certificates.
Necessity
Optional
Supported Values
Path to SSL certificate
Version Added
2.3.0
-
TWO_TASK¶ Used to support Pluggable Databases in Oracle Database Multitenant environments. Set to
ORA_CONN for single instance, or to an EZconnect string connecting to the local instance, typically <hostname>:<port>/<ORACLE_SID>, for Oracle RAC (Real Application Clusters).
Necessity
Required for Pluggable Databases
Supported Values
ORA_CONN or EZconnect string
Version Added
2.10.0
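E.g. a sketch (hostname, port and SID are illustrative):
export TWO_TASK=$ORA_CONN                          # single instance
export TWO_TASK=dbnode1.example.com:1521/ORCL121   # Oracle RAC, local instance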
-
WEBHDFS_HOST¶ Can be used in conjunction with
WEBHDFS_PORT to optimize HDFS activities, removing JVM start-up overhead by utilizing WebHDFS. From version 2.4.7 the value can be a comma-separated list of hosts if HDFS is configured for High Availability.
Necessity
Optional
Supported Values
Hostname or IP address of WebHDFS host
Version Added
2.3.0
-
WEBHDFS_PORT¶ Can be used in conjunction with
WEBHDFS_HOST to optimize HDFS activities, removing JVM start-up overhead by utilizing WebHDFS. If this value is unset then default ports of 50070 (HTTP) or 50470 (HTTPS) are used.
Necessity
Optional
Default Value
50070|50470
Supported Values
Port of HDFS namenode
Version Added
2.3.0
-
WEBHDFS_VERIFY_SSL¶ Used to enable SSL for WebHDFS calls. There are 4 states:
Empty: Do not use SSL
TRUE: Use SSL & verify Hadoop certificate against known certificates
FALSE: Use SSL & do not verify Hadoop certificate
/some/path/here/cert-bundle.crt: Use SSL & verify Hadoop certificate against path to certificate bundle
Necessity
Optional
Supported Values
Empty, true|false, <path to certificate bundle>
Version Added
2.3.0
Common Parameters¶
-
--execute¶ Perform operations, rather than just printing.
Alias
-x
Default Value
None
Supported Values
None
Version Added
2.3.0
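E.g. a sketch (schema and table are illustrative); without -x/--execute the command only reports what it would do:
./offload -t SH.SALES -x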
-
-f¶ Force option. Replace Gluent Offload Engine managed tables/views as required. Use with caution.
Alias
--force
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--force¶ Force option. Replace Gluent Offload Engine managed tables/views as required. Use with caution.
Alias
-f
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-webhdfs¶ Prevent the use of WebHDFS even when configured for use.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
-t¶ Owner and table name.
Alias
--table
Default Value
None
Supported Values
<OWNER>.<NAME>
Version Added
2.3.0
-
--table¶ Owner and table name.
Alias
-t
Default Value
None
Supported Values
<OWNER>.<NAME>
Version Added
2.3.0
-
--target-name¶ Override owner and/or name of created frontend or backend object as appropriate for a command.
Allows separation of the RDBMS owner and/or name from the backend system. This can be necessary as some characters supported for owner and name in Oracle Database are not supported in all backend systems, for example
$ in Hadoop-based backends.
Allows offload to an existing backend database with a different name to the source RDBMS schema.
Allows present to a hybrid schema without a corresponding application RDBMS schema or with a different name to the source backend database.
Alias
None
Default Value
None
Supported Values
<OWNER>.<NAME>
Version Added
2.3.0
-
-v¶ Verbose output.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--vv¶ More verbose output.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
-x¶ Perform operations, rather than just printing.
Alias
--execute
Default Value
None
Supported Values
None
Version Added
2.3.0
Connect Parameters¶
-
--create-sequence-table¶ Create the Gluent Data Platform sequence table. See
IN_LIST_JOIN_TABLE and IN_LIST_JOIN_TABLE_SIZE.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.4.2
-
--install-udfs¶ Install Gluent Data Platform user-defined functions (UDFs).
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--sequence-table-name¶ See
IN_LIST_JOIN_TABLE.
Alias
None
Default Value
default.gluent_sequence
Supported Values
Valid database and table name
Version Added
2.4.2
-
--sequence-table-size¶ See IN_LIST_JOIN_TABLE_SIZE.
Alias
None
Default Value
10000
Supported Values
Up to 1000000
Version Added
2.4.2
-
--sql-file¶ Write SQL commands to a file rather than execute them when
connect is run.
Alias
None
Default Value
None
Supported Values
Any valid path
Version Added
2.11.0
-
--update-root-files¶ Updates both Metadata Daemon and Data Daemon scripts with configuration and sets ownership to
root:root. This option can only be run with root privileges.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--update-metad-files¶ Updates Metadata Daemon scripts with configuration and sets ownership to
root:root. This option can only be run with root privileges.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.0
-
--update-datad-files¶ Updates Data Daemon scripts with configuration and sets ownership to
root:root. This option can only be run with root privileges.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.0
-
--upgrade-environment-file¶ Updates the configuration file (offload.env) with any missing default configuration from offload.env.template. Typically used after upgrades.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
Offload Parameters¶
-
--allow-decimal-scale-rounding¶ Confirm that it is acceptable for Offload to round decimal places when loading data into a backend system.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.0
-
--allow-nanosecond-timestamp-columns¶ Confirm that it is safe to offload timestamp columns with nanosecond capability when the backend system does not support nanoseconds.
Alias
None
Default Value
None
Supported Values
None
Version Added
4.0.2
-
--allow-time-zone-columns¶ Confirm that it is safe to offload time zone sensitive data.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--bucket-hash-column¶ Column to use when calculating offload bucket values.
Alias
None
Default Value
None
Supported Values
Valid column name
Version Added
2.3.0
-
--compress-load-table¶ Compress the contents of the load table during offload.
Alias
None
Default Value
OFFLOAD_COMPRESS_LOAD_TABLE, false
Supported Values
None
Version Added
2.3.0
-
--compute-load-table-stats¶ Compute statistics on the load table during offload.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.9.0
-
--create-backend-db¶ Automatically create backend databases. Either use this option, or ensure databases matching 1) the Oracle Database source schema and 2) the Oracle Database source schema with suffix
_load already exist.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
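E.g. a sketch of a first offload of an illustrative table, letting Offload create the backend databases:
./offload -t SH.SALES --create-backend-db -x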
-
--count-star-expressions¶ CSV list of functional equivalents to
COUNT(*) for aggregation pushdown.
If you also use COUNT(x) in your SQL statements then, apart from COUNT(1) which is automatically catered for, the presence of COUNT(x) will cause rewrite rules to fail unless you include it with this parameter.
Alias
None
Default Value
None
Supported Values
E.g. COUNT(9)
Version Added
2.3.0
-
--data-governance-custom-properties¶ JSON string of key/value pairs to include in data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_PROPERTIES and will override DATA_GOVERNANCE_CUSTOM_PROPERTIES.
Alias
None
Default Value
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
2.11.0
-
--data-governance-custom-tags¶ CSV list of free-format tags for data governance metadata. These are in addition to DATA_GOVERNANCE_AUTO_TAGS and therefore useful for tags to be applied to specific activities.
Alias
None
Default Value
Supported Values
E.g. CONFIDENTIAL,TIER1
Version Added
2.11.0
-
--data-sample-percent¶ Sample RDBMS data for numeric columns with no RDBMS precision and scale properties. A value of 0 disables sampling. A value of AUTO will enable Offload to choose a percentage based on the size of the RDBMS table.
Alias
None
Default Value
AUTO
Supported Values
AUTO or 0-100
Version Added
2.5.0
-
--date-columns¶ CSV list of columns to offload as DATE (effective for date/timestamp columns).
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.0.0
-
--db-name-prefix¶ Multitenant support, enabling many Oracle Database databases to offload to the same backend cluster. See
DB_NAME_PREFIX for details.
Alias
None
Default Value
Supported Values
Supported backend characters
Version Added
2.3.0
-
--decimal-columns¶ CSV list of columns to offload/present as a fixed precision and scale numeric data type. For example
DECIMAL(p,s) where “p,s” is specified in a paired --decimal-columns-type option. Only effective for numeric columns. These options allow repeat inclusion for flexible data type specification, for example: "--decimal-columns-type=18,2 --decimal-columns=price,cost --decimal-columns-type=6,4 --decimal-columns=location"
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.5.0
-
--decimal-columns-type¶ State the precision and scale of columns listed in a paired
--decimal-columns option. Must be of format “precision,scale” where 1<=precision<=38, 0<=scale<=38 and scale<=precision, e.g.: "--decimal-columns-type=18,2"
When offloading, values specified in this option are subject to padding as per the --decimal-padding-digits option.
Alias
None
Default Value
None
Supported Values
Valid “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision
Version Added
2.5.0
-
--decimal-padding-digits¶ Padding to apply to precision and scale of DECIMALs during an offload.
Alias
None
Default Value
2
Supported Values
Integral values
Version Added
2.5.0
-
--double-columns¶ CSV list of columns to store as a double-precision floating-point number. Only effective for numeric columns.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.4.7
-
--equal-to-values¶ Used for list-partitioned tables to specify a partition to be included for Partition-Based Offload by partition key value. This option can be included multiple times to match multiple partitions, for example:
--equal-to-values=2011 --equal-to-values=2012 --equal-to-values=2013
Alias
None
Default Value
None
Supported Values
Valid literals matching list-partition key values
Version Added
3.3.0
-
--ext-table-degree¶ Default degree of parallelism for base hybrid external tables. When set to
AUTO, Offload will copy settings from the source RDBMS table to the hybrid external table.
Alias
None
Default Value
HYBRID_EXT_TABLE_DEGREE or AUTO
Supported Values
AUTO and positive integers
Version Added
2.11.2
-
--hdfs-data¶ Command line override for
HDFS_DATA.
Alias
None
Default Value
Supported Values
Valid HDFS path
Version Added
2.3.0
-
--hdfs-db-path-suffix¶ Hadoop databases are named
<schema><HDFS_DB_PATH_SUFFIX> and <schema>_load<HDFS_DB_PATH_SUFFIX>. When this value is not set the suffix of the databases defaults to .db, giving <schema>.db and <schema>_load.db. Set this to an empty string to use no suffix. For backend systems other than Hadoop this option has no effect.
Alias
None
Default Value
HDFS_DB_PATH_SUFFIX, .db on Hadoop systems, or '' on other backend systems.
Supported Values
Valid HDFS path
Version Added
2.3.0
-
--hive-column-stats¶ Enable computation of column stats with “NATIVE”
--offload-stats method. Applies to Hive only.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.6.1
-
--integer-1-columns¶ CSV list of columns to offload/present (as applicable) as a 1-byte integer, known as
TINYINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-2-columns¶ CSV list of columns to offload/present (as applicable) as a 2-byte integer, known as
SMALLINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-4-columns¶ CSV list of columns to offload/present (as applicable) as a 4-byte integer, known as
INT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-8-columns¶ CSV list of columns to offload/present (as applicable) as an 8-byte integer, known as
BIGINT in many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-38-columns¶ CSV list of columns to offload/present (as applicable) as a 38-digit integral column. If a system does not support 38 digits of precision then the most appropriate data type available will be used. Only effective for numeric columns.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--less-than-value¶ Offload partitions with high water mark less than this value.
Alias
None
Default Value
None
Supported Values
Integer or date values (use YYYY-MM-DD format)
Version Added
2.3.0
-
--lob-data-length¶ Expected length of RDBMS LOB data.
Alias
None
Default Value
32K
Supported Values
E.g. 64K, 10M
Version Added
2.4.7
-
--max-offload-chunk-count¶ Restrict the number of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Alias
None
Default Value
Supported Values
1-1000
Version Added
2.3.0
-
--max-offload-chunk-size¶ Restrict the size of partitions offloaded per cycle. See Offload Transport Chunks for usage.
Alias
None
Default Value
Supported Values
E.g. 100M, 1G, 1.5G
Version Added
2.3.0
-
--no-auto-detect-dates¶ Turn off automatic adoption of string data type for RDBMS date values that are incompatible with the backend system. For example, dates preceding 1400-01-01 are invalid in Impala and will be offloaded to string columns unless this option is used.
Alias
None
Default Value
False
Supported Values
None
Version Added
2.5.1
-
--no-auto-detect-numbers¶ Turn off automatic adoption of numeric data types based on their precision and scale in the RDBMS. All numeric data types will be offloaded to a general purpose data type such as
DECIMAL(38,18) on Hadoop systems or NUMERIC on Google BigQuery.
Alias
None
Default Value
False
Supported Values
None
Version Added
2.3.0
-
--no-create-aggregations¶ Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-generate-dependent-views¶ Dependent views will not be automatically re-generated in the hybrid schema.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-materialize-join¶ Offload a join (specified by
--offload-join) as a view.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-modify-hybrid-view¶ Prevent an offload predicate from being added to the boundary conditions in a hybrid view. Can only be used in conjunction with
--offload-predicate for --offload-predicate-type values of RANGE, LIST_AS_RANGE, RANGE_AND_PREDICATE or LIST_AS_RANGE_AND_PREDICATE.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.4.0
-
--no-verify¶ Skip the data validation step at the end of an offload.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--num-buckets¶ Default number of offload buckets (subpartitions) for an offloaded table, allowing parallel data retrieval. A value of
AUTO tunes to a value between 1 and DEFAULT_BUCKETS_MAX.
Alias
None
Default Value
DEFAULT_BUCKETS or AUTO
Supported Values
Integer values or AUTO
Version Added
2.3.0
-
--num-location-files¶ Number of external table location files for parallel data retrieval.
Alias
None
Default Value
Supported Values
Integer values
Version Added
2.7.2
Note
When offloading or materializing data in Impala, --num-location-files will be aligned with --num-buckets/DEFAULT_BUCKETS.
-
--offload-by-subpartition¶ Instructs Offload to use subpartition keys and high values in place of top-level partition information.
Alias
None
Default Value
True for supported LIST/RANGE and HASH/RANGE partitioned tables, False for all other tables
Supported Values
None
Version Added
2.7.0
-
--offload-chunk-column¶ Splits load data by this column during insert from the load table to the final table. This can be used to manage memory usage.
Alias
None
Default Value
None
Supported Values
Valid column name
Version Added
2.3.0
-
--offload-chunk-impala-insert-hint¶ Used to inject a hint into the
INSERT AS SELECT moving data from load table to final destination. The absence of a value injects no hint. Impala only.
Alias
None
Default Value
None
Supported Values
SHUFFLE|NOSHUFFLE
Version Added
2.3.0
-
--offload-distribute-enabled¶ Distribute data by partition key(s) during the final INSERT operation of an offload. Hive only.
Alias
None
Default Value
Supported Values
None
Version Added
2.8.0
-
--offload-fs-container¶ The name of the bucket or container to be used when offloading to cloud storage.
Alias
None
Default Value
Supported Values
A cloud storage bucket/container name configured for use by the backend cluster
Version Added
3.0.0
-
--offload-fs-prefix¶ A directory path used to prefix database locations within
OFFLOAD_FS_SCHEME. When OFFLOAD_FS_SCHEME is inherit, HDFS_DATA takes precedence over this setting.
Alias
None
Default Value
Supported Values
A valid directory in HDFS or cloud storage
Version Added
3.0.0
-
--offload-fs-scheme¶ The filesystem scheme to be used for database and table locations.
inherit specifies that all tables created by Offload will not specify a LOCATION clause; they will inherit the location from the parent database. See Integrating with Cloud Storage for details.
Alias
None
Default Value
OFFLOAD_FS_SCHEME, inherit
Supported Values
inherit, hdfs, s3a, adl, abfs, abfss
Version Added
3.0.0
-
--offload-join¶ Offload a materialized view of the supplied join(s), allowing join processing to be offloaded. Repeated use of
--offload-join allows multiple row sources to be included. See documentation for syntax details.
Alias
None
Default Value
None
Supported Values
Version Added
2.3.0
-
--offload-predicate¶ Specify a predicate to identify a set of data in a table for offload. Can be used to offload all or some of the data in any table type. See documentation for syntax details.
Alias
None
Default Value
None
Supported Values
Version Added
3.4.0
-
--offload-predicate-type¶ Override the default INCREMENTAL_PREDICATE_TYPE for a partitioned table. Can be used to offload LIST partitioned tables using RANGE logic with an
--offload-predicate-type value of LIST_AS_RANGE, or used for specialized cases of offloading with Partition-Based Offload and Predicate-Based Offload.
Alias
None
Default Value
None
Supported Values
LIST, LIST_AS_RANGE, RANGE, RANGE_AND_PREDICATE, LIST_AS_RANGE_AND_PREDICATE, PREDICATE
Version Added
3.3.1
-
--offload-sort-enabled¶ Sort/cluster data during the final INSERT operation of an offload. Configure sort/cluster columns using
--sort-columns.
Alias
None
Default Value
OFFLOAD_SORT_ENABLED,falseSupported Values
None
Version Added
2.7.0
-
--offload-stats¶ Method used to manage backend table stats during an Offload, Incremental Update Extraction or Compaction.
NATIVE is the default.
HISTORY will gather stats on all partitions without stats (applicable to an Offload on Hive only and will automatically be replaced with NATIVE on Impala).
COPY will copy table statistics from the RDBMS to an offloaded table if the backend system supports setting of statistics.
NONE will prevent Offload from managing stats; for Hive this results in no stats being gathered even if hive.stats.autogather=true is set at the system level.
Alias
None
Default Value
NATIVE
Supported Values
NATIVE|HISTORY|COPY|NONE
Version Added
2.4.7 (HISTORY added in 2.9.0)
-
--offload-transport¶ Method used to transport data from an RDBMS frontend to a backend system.
AUTO selects the optimal method based on configuration and table structure.
Alias
None
Default Value
OFFLOAD_TRANSPORT, AUTO
Supported Values
AUTO|GLUENT|SQOOP
Version Added
3.1.0
-
--offload-transport-cmd-host¶ An override for
HDFS_CMD_HOST when running shell-based Offload Transport commands such as Sqoop or Spark Submit.
Alias
None
Default Value
Supported Values
Hostname or IP address of HDFS host
Version Added
3.1.0
-
--offload-transport-consistent-read¶ Control whether parallel data transport tasks should use a consistent point in time when reading RDBMS data.
Alias
None
Default Value
Supported Values
true|false
Version Added
3.1.0
-
--offload-transport-dsn¶ Database connection details for Offload Transport if different to
ORA_CONN.
Alias
None
Default Value
Supported Values
<hostname>:<port>/<service>
Version Added
3.1.0
-
--offload-transport-fetch-size¶ Number of records to fetch in a single batch from the RDBMS during Offload. Offload Transport may encounter memory pressure if a table is very wide (e.g. contains LOB columns) and there are lots of records in a batch. Reducing the fetch size can alleviate this if more memory cannot be allocated.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-jvm-overrides¶ JVM overrides (inserted right after
sqoop import or spark-submit).
Alias
None
Default Value
Supported Values
Version Added
3.1.0
-
--offload-transport-livy-api-url¶ URL for Livy/Spark REST API in the format
http://fqdn-n.example.com:port. https can be used in place of http.
Alias
None
Default Value
Supported Values
Valid Livy REST API URL
Version Added
3.1.0
-
--offload-transport-livy-idle-session-timeout¶ Timeout (in seconds) for idle Spark client sessions created in Livy.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-livy-max-sessions¶ Limits the number of Livy sessions Offload will create. Sessions are re-used when idle. New sessions are only created when no idle sessions are available.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-parallelism¶ The number of parallel streams to be used when transporting data from the source RDBMS to the backend.
Alias
None
Default Value
Supported Values
Positive integers
Version Added
3.1.0
-
--offload-transport-password-alias¶ An alias provided by Hadoop Credential Provider API to be used for RDBMS authentication during Offload Transport. The key store containing the alias must be specified in either
OFFLOAD_TRANSPORT_CREDENTIAL_PROVIDER_PATH or in Hadoop configuration (hadoop.security.credential.provider.path).
Alias
None
Default Value
Supported Values
Valid Hadoop Credential Provider API alias
Version Added
3.1.0
-
--offload-transport-queue-name¶ YARN queue name to be used for Offload Transport jobs.
Alias
None
Default Value
Supported Values
Version Added
3.1.0
-
--offload-transport-small-table-threshold¶ Threshold above which Query Import is no longer considered the correct offload choice for non-partitioned tables.
Alias
None
Default Value
20M
Supported Values
E.g. 100M, 1G, 1.5G
Version Added
3.1.0
-
--offload-transport-spark-properties¶ Key/value pairs, in JSON format, to override Spark property defaults. Examples:
'{"spark.driver.memory": "8G", "spark.executor.memory": "8G"}' '{"spark.driver.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet", "spark.executor.extraJavaOptions": "-Doracle.net.wallet_location=/some/path/here/gluent_wallet"}'Alias
None
Default Value
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
3.1.0
-
--offload-transport-spark-thrift-host¶ Name of host(s) where the Spark Thrift Server is running. Can be a comma-separated list of hosts to randomly choose from, e.g.
hadoop1,hadoop2,hadoop3.
Alias
None
Default Value
Supported Values
Hostname or IP address of Spark Thrift Server host(s)
Version Added
3.1.0
-
--offload-transport-spark-thrift-port¶ Port that the Spark Thrift Server is listening on.
Alias
None
Default Value
Supported Values
Active port
Version Added
3.1.0
-
--offload-type¶ Identifies a range-partitioned offload as
FULL or INCREMENTAL. FULL dictates that all data is offloaded. INCREMENTAL dictates that data up to a boundary threshold will be offloaded.
Alias
None
Default Value
INCREMENTAL for RDBMS tables capable of supporting Partition-Based Offload that are partially offloaded (e.g. using --older-than-date). FULL for all other offloads.
Supported Values
FULL|INCREMENTAL
Version Added
2.5.0
-
--older-than-date¶ Offload partitions older than this date (use
YYYY-MM-DD format). Overrides --older-than-days if both are present.
Alias
None
Default Value
None
Supported Values
Date in YYYY-MM-DD format
Version Added
2.3.0
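E.g. a sketch offloading all partitions below an illustrative date boundary:
./offload -t SH.SALES --older-than-date=2015-01-01 -x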
-
--older-than-days¶ Offload partitions older than this number of days (exclusive, i.e. the boundary partition is not offloaded). Suitable for keeping data up to a certain age in the source table. Alternative to
--older-than-date option. If both are supplied, --older-than-date will be used.
Alias
None
Default Value
None
Supported Values
Valid number of days
Version Added
2.3.0
-
--partition-columns¶ Override column(s) to use for partitioning backend data. Defaults to source table partition columns.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--partition-digits¶ Maximum digits allowed for a numeric partition value.
Alias
None
Default Value
15
Supported Values
Integer values
Version Added
2.3.0
-
--partition-granularity¶ Partition level/granularity. Use:
Y, M, D for date/timestamp partition columns
Integral size for numeric partitions. A value of 1 is effectively list partitioning
Sub-string length for string partitions
Examples:
M partitions the table by Year-Month
D partitions the table by Year-Month-Day
5000 partitions the table in ranges of 5000 values
1 creates a partition per value, useful for columns holding values such as year and month or categories
2 on a string partition key partitions using the first two characters
Alias
None
Default Value
M for date/timestamp partition columns. There is no default for other partition column types
Supported Values
Y|M|D|\d+
Version Added
2.3.0
-
--partition-lower-value¶ Integer value defining the lower bound of a range of values used for backend integer range partitioning.
Alias
None
Default Value
None
Supported Values
Positive integers
Version Added
4.0.0
-
--partition-names¶ Specify partitions to be included for offload with Partition-Based Offload. For range-partitioned tables only a single partition name can be specified and it is used to derive a value for
--less-than-value/--older-than-date as appropriate. For list-partitioned tables, this option is used to supply a CSV of all partitions to be offloaded and is additional to any partitions offloaded in previous operations.
Alias
None
Default Value
None
Supported Values
Valid partition name(s)
Version Added
3.3.0
-
--partition-upper-value¶ Integer value defining the upper bound of a range of values used for backend integer range partitioning.
Alias
None
Default Value
None
Supported Values
Positive integers
Version Added
4.0.0
-
--preserve-load-table¶ Stops the load table being dropped on completion of offload.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--purge¶ When supported by the backend system, utilize purge when removing a table due to
--reset-backend-table.Alias
None
Default Value
None
Supported Values
None
Version Added
2.4.9
-
--reset-backend-table¶ Remove the backend table before offloading. Use with caution as this will delete previously offloaded data for this table.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--reset-hybrid-view¶ Reset Partition-Based Offload or Predicate-Based Offload predicates in the hybrid view.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--skip-steps¶ Skip the given steps, supplied as a CSV of step IDs. Step IDs are derived by replacing spaces with underscores and are case-insensitive.
For example, it is possible to skip Impala compute statistics commands using a value of
Compute_backend_statisticsif an initial offload is being performed in stages, and then gather them with the final offload command.Alias
None
Default Value
None
Supported Values
Valid offload step names
Version Added
2.3.0
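For example, an illustrative option fragment using the step ID described above to defer statistics gathering to a final offload command:
--skip-steps=Compute_backend_statistics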
-
--sort-columns¶ CSV list of columns used to sort or cluster data when inserting into the final destination table. Offloads using Partition-Based Offload or Subpartition-Based Offload will retrieve the value used by the prior offload if no list of columns is explicitly provided. This option has no effect when
OFFLOAD_SORT_ENABLED/--offload-sort-enabledis false.When using Offload Join the column names in
--sort-columnsmust match those in the final destination table (not the names used in the source tables).Alias
None
Default Value
None for non-partitioned source tables,
--partition-columnsfor partitioned source tablesSupported Values
Valid column name(s)
Version Added
2.7.0
-
--sqoop-disable-direct¶ It is recommended to use the OraOOP optimizations for Sqoop (included in standard Apache Sqoop from
v1.4.5). If they are not available, use this option to disable direct path mode.Alias
None
Default Value
SQOOP_DISABLE_DIRECT,falseSupported Values
true|falseVersion Added
2.3.0
-
--sqoop-mapreduce-map-java-opts¶ Sqoop specific setting for
-Dmapreduce.map.java.opts. Allows control over Java options for Sqoop MapReduce jobs.Alias
None
Default Value
None
Supported Values
Valid Sqoop Java options
Version Added
2.3.0
-
--sqoop-mapreduce-map-memory-mb¶ Sqoop specific setting for
-Dmapreduce.map.memory.mb. Allows control over memory allocation for Sqoop MapReduce jobs.Alias
None
Default Value
None
Supported Values
Valid numbers in MB
Version Added
2.3.0
-
--sqoop-additional-options¶ Additional Sqoop command options added to the end of the Sqoop command.
Alias
None
Default Value
Supported Values
Any Sqoop command option/argument not already included in the Sqoop command line
Version Added
2.9.0
-
--sqoop-password-file¶ Path to an HDFS file containing
ORA_APP_PASSwhich is then passed to Sqoop using the Sqoop--password-fileoption. This file should be protected with appropriate file system permissions.Alias
None
Default Value
Supported Values
Valid HDFS path
Version Added
2.5.0
-
--storage-compression¶ Storage compression of final offload table.
GZIP is only available with Parquet.
ZLIB is only available with ORC.
MED is an alias for SNAPPY on both Impala and Hive. This is the default value because it gives the best balance of elapsed time to compression.
HIGH is an alias for GZIP on Impala and ZLIB on Hive.Alias
None
Default Value
MEDSupported Values
HIGH|MED|NONE|GZIP|ZLIB|SNAPPYVersion Added
2.3.0
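For example, an illustrative option fragment requesting the highest available compression for a Parquet table:
--storage-format=PARQUET --storage-compression=HIGH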
-
--storage-format¶ Storage format of final backend table. Not applicable to Google BigQuery.
Alias
None
Default Value
PARQUETfor Impala,ORCfor HiveSupported Values
ORC|PARQUETVersion Added
2.3.0
-
--timestamp-tz-columns¶ CSV list of columns to offload as a timestamp with time zone (will only be effective for date-based columns).
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.0.0
-
--udf-db¶ Backend database to use for custom user-defined functions (UDFs).
Gluent Data Platform UDFs are used in Hadoop-based backends to:
Convert data to Oracle Database binary formats (ORACLE_NUMBER, ORACLE_DATE)
Perform Run-Length Encoding
Handle data conversion functions, e.g. UPPER, LOWER
They are installed once during installation and upgraded when required using the connect --install-udfs command.Alias
None
Default Value
Supported Values
Valid backend database
Version Added
2.3.0
-
--variable-string-columns¶ CSV list of columns to offload as a variable length string. Only effective for date/timestamp columns.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--verify¶ Validation method to use when verifying data at the end of an offload.
Alias
None
Default Value
minusSupported Values
minus|aggregateVersion Added
2.3.0
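For example, an illustrative option fragment selecting the lighter-weight aggregate-based verification in place of the default minus comparison:
--verify=aggregate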
Present Parameters¶
-
--aggregate-by¶ CSV list of columns to aggregate by (GROUP BY) when presenting an Advanced Aggregation Pushdown rule.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--base-name¶ For aggregations only. Provide the name of the base hybrid view originally presented before aggregation. Use when the base view name is different to its source backend table.
Alias
None
Default Value
None
Supported Values
<SCHEMA>.<VIEW_NAME>Version Added
2.3.0
-
--binary-columns¶ CSV list of columns to present using a binary data type. Only effective for string-based columns.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--columns¶ CSV list of columns to present.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--count-star-expressions¶ CSV list of functional equivalents to
COUNT(*)for aggregation pushdown.If you also use
COUNT(x)in your SQL statements then, apart fromCOUNT(1)which is automatically catered for, the presence ofCOUNT(x)will cause rewrite rules to fail unless you include it with this parameter.Alias
None
Default Value
None
Supported Values
E.g.
COUNT(9)Version Added
2.3.0
-
--data-governance-custom-properties¶ JSON string of free-format key/value properties for data governance metadata. These are in addition to
DATA_GOVERNANCE_AUTO_PROPERTIESand will overrideDATA_GOVERNANCE_CUSTOM_PROPERTIES.Alias
None
Default Value
Supported Values
Valid JSON string of key/value pairs (no nested or complex data types)
Version Added
2.11.0
-
--data-governance-custom-tags¶ CSV list of free-format tags for data governance metadata. These are in addition to
DATA_GOVERNANCE_AUTO_TAGSand therefore useful for tags to be applied to specific activities.Alias
None
Default Value
Supported Values
E.g.
CONFIDENTIAL,TIER1Version Added
2.11.0
-
--date-columns¶ CSV list of columns to present to Oracle Database as DATE (effective for datetime/timestamp columns).
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--date-fns¶ CSV list of functions to apply to the non-aggregating date/timestamp projection.
Alias
None
Default Value
MIN,MAX,COUNTSupported Values
MIN,MAX,COUNTVersion Added
2.3.0
-
--decimal-columns¶ CSV list of columns to offload/present as a fixed precision and scale numeric data type. For example
DECIMAL(p,s)where “p,s” is specified in a paired--decimal-columns-typeoption. Only effective for numeric columns. These options allow repeat inclusion for flexible data type specification, for example:"--decimal-columns-type=18,2 --decimal-columns=price,cost --decimal-columns-type=6,4 --decimal-columns=location"
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.5.0
-
--decimal-columns-type¶ State the precision and scale of columns listed in a paired
--decimal-columnsoption. Must be of format “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision. e.g.:"--decimal-columns-type=18,2"
When offloading, values specified in this option are subject to padding as per the
--decimal-padding-digitsoption.Alias
None
Default Value
None
Supported Values
Valid “precision,scale” where 1<=precision<=38 and 0<=scale<=38 and scale<=precision
Version Added
2.5.0
-
--detect-sizes¶ Query backend table/view data length and set external table columns sizes accordingly.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--integer-1-columns¶ CSV list of columns to offload/present (as applicable) as a 1-byte integer, known as
TINYINTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-2-columns¶ CSV list of columns to offload/present (as applicable) as a 2-byte integer, known as
SMALLINTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-4-columns¶ CSV list of columns to offload/present (as applicable) as a 4-byte integer, known as
INTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-8-columns¶ CSV list of columns to offload/present (as applicable) as an 8-byte integer, known as
BIGINTin many systems. Check your backend/RDBMS documentation to ensure column values are compatible. Only effective for numeric columns.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--integer-38-columns¶ CSV list of columns to offload/present (as applicable) as a 38-digit integral column. If a system does not support 38 digits of precision then the most appropriate data type available will be used. Only effective for numeric columns.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--interval-ds-columns¶ CSV list of columns to present to Oracle Database as
INTERVAL DAY TO SECONDtype (will only be effective for backendSTRINGcolumns).Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--interval-ym-columns¶ CSV list of columns to present to Oracle Database as
INTERVAL YEAR TO MONTHtype (will only be effective for backendSTRINGcolumns).Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.3.0
-
--large-binary-columns¶ CSV list of columns to present using a large binary data type, for example Oracle Database
BLOB. Only effective for string-based columns.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--large-string-columns¶ CSV list of columns to present as a large string data type, for example Oracle Database
CLOB. Only effective for string-based columns.Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
3.3.0
-
--lob-data-length¶ Expected length of RDBMS LOB data.
Alias
None
Default Value
32KSupported Values
E.g.
64K,10MVersion Added
2.4.7
-
--materialize-join¶ Use this option to materialize a join specified using
--present-join.Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--measures¶ CSV list of aggregated columns to include in the projection of an aggregated present.
Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
2.4.0
-
--no-create-aggregations¶ Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--no-gather-stats¶ Skip generation of new statistics for presented tables/views (default behavior is to generate statistics for new aggregate/join views or existing backend tables with no statistics).
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
-
--num-location-files¶ Number of external table location files for parallel data retrieval.
Alias
None
Default Value
Supported Values
Integer values
Version Added
2.7.2
-
--numeric-fns¶ CSV list of aggregate functions to apply to non-aggregating numeric columns or measures in an aggregation projection.
Alias
None
Default Value
MIN,MAX,AVG,SUM,COUNTSupported Values
MIN,MAX,AVG,SUM,COUNTVersion Added
2.3.0
-
--present-join¶ Present a view of the supplied join(s) allowing the join processing to be offloaded. Repeated use of
--present-joinallows multiple row sources to be included. See documentation for permitted syntax.Alias
None
Default Value
None
Supported Values
Version Added
2.3.0
-
--reset-backend-table¶ Remove the backend table before offloading. Use with caution as this will delete previously offloaded data for this table.
Alias
None
Default Value
None
Supported Values
None
Version Added
3.3.0
-
--sample-stats¶ Estimate statistics by scanning a few (random) partitions for presented tables/views (default behavior is to scan the entire table).
Alias
None
Default Value
None
Supported Values
0-100Version Added
2.3.0
-
--string-fns¶ CSV list of aggregate functions to apply to non-aggregating string columns or measures in an aggregation projection.
Alias
None
Default Value
MIN,MAX,COUNTSupported Values
MIN,MAX,COUNTVersion Added
2.3.0
-
--timestamp-columns¶ CSV list of columns to present as a
TIMESTAMP(only effective for date-based columns).Alias
None
Default Value
None
Supported Values
Valid column name(s)
Version Added
4.0.0
Incremental Update Parameters¶
-
--incremental-batch-size¶ Batch (fetch) size to use when extracting changes for shipping from a table that is enabled for Incremental Update.
Alias
None
Default Value
1000Supported Values
Positive integers
Version Added
2.5.0
-
--incremental-changelog-sequence-cache-size¶ Specifies the cache size to use for a sequence coupled to the log table used for Incremental Update extraction.
Alias
None
Default Value
100Supported Values
Positive integers
Version Added
2.10.0
-
--incremental-changelog-table¶ Specifies the name of the log table to use for Incremental Update extraction (format is
<OWNER>.<TABLE>). Not required when--incremental-extraction-methodisORA_ROWSCN.Alias
None
Default Value
<Hybrid Schema>.<Table Name>_LOGSupported Values
<OWNER>.<TABLE>Version Added
2.5.0
-
--incremental-delta-threshold¶ When running the compaction routine for a table enabled for Incremental Update, this threshold denotes the minimum number of changes required to enable the compaction routine to be executed (i.e. compaction will only be executed if there are at least this many rows in the delta table at a given time).
Alias
None
Default Value
50000Supported Values
Positive integers
Version Added
2.5.0
-
--incremental-extraction-method¶ Indicates which change extraction method to use when enabling Incremental Update for a table during an offload.
Alias
None
Default Value
ORA_ROWSCNSupported Values
ORA_ROWSCN,CHANGELOG,UPDATABLE_CHANGELOG,UPDATABLE,CHANGELOG_INSERT,UPDATABLE_INSERTVersion Added
2.5.0
-
--incremental-full-compaction¶ When running the compaction routine for a table that has Incremental Update enabled, insert compacted records into a new base table, also known as an out-of-place compaction.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.10.0
-
--incremental-key-columns¶ Comma-separated list of columns that uniquely identify rows in an offloaded source table. Columns are used when extracting incremental changes from the source table and applying them to the offloaded table. In the absence of this parameter the primary key of the table is used.
Alias
None
Default Value
Primary key
Supported Values
Comma-separated list of columns
Version Added
2.5.0
-
--incremental-no-lockfile¶ When running the compaction routine for a table that is enabled for Incremental Update, do not use a lockfile on the local filesystem to prevent multiple compaction processes from running concurrently (on that machine).
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-no-verify-primary-key¶ Bypass verification of mandatory primary key when using
CHANGELOG_INSERTorUPDATABLE_INSERTextraction methods.Alias
None
Default Value
None
Supported Values
None
Version Added
2.9.0
Warning
With this option, users must ensure that no duplicate records are inserted.
-
--incremental-no-verify-shipped¶ Bypass verification of the number of change records shipped when extracting and shipping changes for a table that is enabled for Incremental Update.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-partition-wise-full-compaction¶ When running the compaction routine for a table that has Incremental Update enabled, insert compacted records into the new base table partition-wise. Note that this may cause the compaction process to take significantly longer overall, but it can also significantly reduce the cluster resources used by compaction at any one time.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0. Renamed from--incremental-partition-wise-compactionin2.10.0
-
--incremental-retain-obsolete-objects¶ Retain the previous artifacts when the compaction routine has completed for a table with Incremental Update enabled.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
Warning
With this option, users must manage previous artifacts and associated storage. In some circumstances, retained obsolete objects can cause the re-offloading of entire tables (with the
--reset-backend-tableoption) to fail.
-
--incremental-run-compaction¶ Run the compaction routine for a table that has Incremental Update enabled. Must be used in conjunction with the
--executeparameter.Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
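For example, a sketch of an invocation (table name hypothetical; assumes the table is passed with -t) running compaction for a table with Incremental Update enabled:
offload -t SH.SALES --incremental-run-compaction --execute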
-
--incremental-run-compaction-without-snapshot¶ Run the compaction routine for a table without creating an HDFS snapshot.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.10.0
-
--incremental-run-extraction¶ Extract and ship all new changes for a table that has Incremental Update enabled. Must be used in conjunction with the
--executeparameter.Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
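For example, a sketch of an invocation (table name hypothetical; assumes the table is passed with -t) extracting and shipping outstanding changes:
offload -t SH.SALES --incremental-run-extraction --execute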
-
--incremental-terminate-compaction¶ When running the compaction routine for a table with Incremental Update enabled, instruct the compaction process to exit when blocked by some external condition. By default, the compaction process will keep running when blocked, but will drop into a sleep-then-poll loop.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-tmp-dir¶ When extracting and shipping changes for a table that has Incremental Update enabled, this specifies the staging directory to be used for local data files, before they are shipped to HDFS.
Alias
None
Default Value
<OFFLOAD_HOME>/tmp/incremental_changesSupported Values
Valid writable directory
Version Added
2.5.0
-
--incremental-updates-disabled¶ Disables Incremental Update for the specified table.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.6.0
-
--incremental-updates-enabled¶ Enables Incremental Update for the table being offloaded.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--incremental-wait-time¶ When running the compaction routine for a table that has Incremental Update enabled, this specifies the minimum amount of time (in minutes) to allow for active queries to complete before performing any database operations that could cause such queries to fail.
Alias
None
Default Value
15Supported Values
0 and positive integersVersion Added
2.5.0
Validate Parameters¶
-
--aggregate-functions¶ Comma-separated list of aggregate functions to apply, e.g.
max, min, count. Functions need to be available and use the same arguments in both frontend and backend databases.Alias
-ADefault Value
[('min', 'max', 'count')]Supported Values
CSV list of expressions
Version Added
2.3.0
-
--as-of-scn¶ Execute validation on the frontend side as-of a specified SCN (assumes an
ORACLEfrontend).Alias
None
Default Value
None
Supported Values
Valid SCN
Version Added
2.3.0
-
--filters¶ Comma-separated list of (<column> <operation> <value>) expressions, e.g.
PROD_ID < 12, CUST_ID >= 1000. Expressions must be supported in both frontend and backend databases.Alias
-FDefault Value
None
Supported Values
CSV list of expressions
Version Added
2.3.0
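For example, an illustrative option fragment (column names hypothetical, reusing the expressions above) restricting validation to a subset of rows:
--filters="PROD_ID < 12, CUST_ID >= 1000"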
-
--group-bys¶ Comma-separated list of group by expressions, e.g.
COL1, COL2. Expressions must be supported in both frontend and backend databases.Alias
-GDefault Value
None
Supported Values
CSV list of expressions
Version Added
2.3.0
-
--selects¶ Comma-separated list of columns OR <number> of columns to run aggregations on. If <number> is specified the first and last columns and the <number>-2 highest cardinality columns will be selected.
Alias
-SDefault Value
5Supported Values
CSV list of columns OR <number>
Version Added
2.3.0
-
--skip-boundary-check¶ Do not include ‘offloaded boundary check’ in the list of filters. The ‘offloaded boundary check’ filter defines data that was offloaded to the backend database. For example:
WHERE TIME_ID < timestamp '2015-07-01 00:00:00'which resulted from applying the--older-than-date=2015-07-01filter during offload.Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
Schema Sync Parameters¶
-
--command-file¶ Name of an additional log file to record the commands that have been applied (if the
--executeoption has been used) or should be applied (if the--executeoption has not been used). Supplied as full or relative path.Alias
None
Default Value
None
Supported Values
Full or relative path to file
Version Added
2.8.0
-
--include¶ CSV list of schemas, schema.tables or tables to examine for change detection and evolution. Supports wildcards (using
*). Example formats:SCHEMA1,SCHEMA*,SCHEMA1.TABLE1,SCHEMA1.TABLE2,SCHEMA2.TAB*,SCHEMA1.TAB*,*.TABLE1,*.TABLE2,*.TAB*.Alias
None
Default Value
None
Supported Values
List of one or more schema(s), schema(s).table(s) or table(s)
Version Added
2.8.0
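For example, an illustrative option fragment covering one whole schema plus a wildcarded set of tables in another:
--include=SCHEMA1,SCHEMA2.TAB*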
-
--no-create-aggregations¶ Skip aggregation creation. If this parameter is used, then to benefit from Advanced Aggregation Pushdown the aggregate hybrid objects must be created using Present.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.3.0
Diagnose Parameters¶
-
--backend-log-size-limit¶ Size limit for data returned from each backend log, e.g. 100K, 0.5M, 1G.
Alias
None
Default Value
10MSupported Values
<n><K|M|G|T>Version Added
2.11.0
-
--hive-http-endpoint¶ Endpoint of the HiveServer2 or HiveServer2 Interactive (LLAP) service in the format
<server|ip address>:<port>.Alias
None
Default Value
None
Supported Values
<server|ip address>:<port>Version Added
3.1.0
-
--impalad-http-port¶ Port of the Impala Daemon HTTP Server.
Alias
None
Default Value
25000Supported Values
Positive integers
Version Added
2.11.0
-
--include-backend-logs¶ Retrieve backend query engine logs.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-backend-config¶ Retrieve backend query engine config.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-logs-from¶ Collate and package log files modified or created since a date (format:
YYYY-MM-DD) or date/time (format:YYYY-MM-DD_HH24:MM:SS). Can be used in conjunction with the--include-logs-toparameter to specify a search range.Alias
None
Default Value
None
Supported Values
YYYY-MM-DDorYYYY-MM-DD_HH24:MM:SSVersion Added
2.11.0
-
--include-logs-last¶ Collate and package log files modified or created in the last
n[d]ays (e.g. 3d) or [h]ours (e.g. 7h).Alias
None
Default Value
None
Supported Values
<n><d|h>Version Added
2.11.0
-
--include-logs-to¶ Collate and package log files modified or created up to a date (format:
YYYY-MM-DD) or date/time (format:YYYY-MM-DD_HH24:MM:SS). Can be used in conjunction with the--include-logs-fromparameter to specify a search range.Alias
None
Default Value
None
Supported Values
YYYY-MM-DDorYYYY-MM-DD_HH24:MM:SSVersion Added
2.11.0
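For example, an illustrative option fragment bounding the log search to a one-day window:
--include-logs-from=2020-01-01 --include-logs-to=2020-01-02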
-
--include-permissions¶ Collect permissions of files and directories related to Gluent Data Platform.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-processes¶ Collect details for running processes related to Gluent Data Platform.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--include-query-logs¶ Retrieve logs for a supplied query ID.
Alias
None
Default Value
None
Supported Values
Valid Impala/LLAP query ID
Version Added
2.11.0
-
--log-location¶ Location in which to search for log files.
Alias
None
Default Value
OFFLOAD_HOME/logSupported Values
Valid directory path
Version Added
2.11.0
-
--output-location¶ Location in which to save files created by Diagnose.
Alias
None
Default Value
OFFLOAD_HOME/logSupported Values
Valid directory path
Version Added
2.11.0
-
--retain-created-files¶ By default, after they have been packaged, files created by Diagnose in
--output-locationare removed. Specify this parameter to retain them.Alias
None
Default Value
None
Supported Values
None
Version Added
2.11.0
-
--spark-application-id¶ Retrieve logs for a supplied Spark application ID.
Alias
None
Default Value
None
Supported Values
Valid Spark application ID
Version Added
3.1.0
Offload Status Report Parameters¶
-
--csv-delimiter¶ Field delimiter character for CSV output.
Alias
None
Default Value
,Supported Values
Must be a single character
Version Added
2.11.0
-
--csv-enclosure¶ Enclosure character for string fields in CSV output.
Alias
None
Default Value
"Supported Values
Must be a single character
Version Added
2.11.0
-
-o¶ Output format for Offload Status Report data.
Alias
--output-formatDefault Value
textSupported Values
csv|text|html|json|rawVersion Added
2.11.0
-
--output-format¶ Output format for Offload Status Report data.
Alias
-oDefault Value
textSupported Values
csv|text|html|json|rawVersion Added
2.11.0
-
--output-level¶ Level of detail required for the Offload Status Report.
Alias
None
Default Value
summarySupported Values
summary|detailVersion Added
2.11.0
-
--report-directory¶ Directory to save the report in.
Alias
None
Default Value
OFFLOAD_HOME/logSupported Values
Valid directory path
Version Added
2.11.0
-
--report-name¶ Name of report.
Alias
None
Default Value
Gluent_Offload_Status_Report_{DB_NAME}_{YYYY}-{MM}-{DD}_{HH}-{MI}-{SS}.[html|txt|csv]Supported Values
Valid filename
Version Added
2.11.0
-
-s¶ Optional name of schema to run the Offload Status Report for.
Alias
--schemaDefault Value
None
Supported Values
Valid schema name
Version Added
2.11.0
-
--schema¶ Optional name of schema to run the Offload Status Report for.
Alias
-sDefault Value
None
Supported Values
Valid schema name
Version Added
2.11.0
-
-t¶ Optional name of table to run the Offload Status Report for.
Alias
--tableDefault Value
None
Supported Values
Valid table name
Version Added
2.11.0
-
--table¶ Optional name of table to run the Offload Status Report for.
Alias
-tDefault Value
None
Supported Values
Valid table name
Version Added
2.11.0
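For example, an illustrative option fragment (the SH schema name is hypothetical) producing a detailed HTML report for a single schema:
--schema=SH --output-format=html --output-level=detail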
Password Tool Parameters¶
-
--encrypt¶ Encrypt a clear-text, case-sensitive password. The user will be prompted for the input password and the encrypted version will be output.
Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--keygen¶ Generate a password key file of the name given by
--keyfile.Alias
None
Default Value
None
Supported Values
None
Version Added
2.5.0
-
--keyfile¶ Name of the password key file to generate.
Alias
None
Default Value
None
Supported Values
Valid path and file name
Version Added
2.5.0
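For example, an illustrative option fragment (the file path is hypothetical) generating a new password key file:
--keygen --keyfile=/u01/app/gluent/offload/conf/offload.key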
Result Cache Manager Parameters¶
-
--rc-retention-hours¶ Controls how long Result Cache files are retained.
Alias
None
Default Value
24Supported Values
Valid number of hours
Version Added
2.3.0
Oracle Database Schemas¶
Gluent Data Platform Admin Schema¶
This account is used by Gluent Data Platform to perform administrative activities. It is defined by ORA_ADM_USER.
Non-standard privileges granted to this schema are:
-
ANALYZE ANY Required to copy optimizer statistics from the application schema to the hybrid schema.
-
GRANT ANY OBJECT PRIVILEGE Enables the Admin Schema to grant permission on application schema tables to the hybrid schema.
-
SELECT ANY DICTIONARY Enables Offload and Present operations to access the Oracle Database data dictionary for information such as column names, data types and partitioning schemes.
-
SELECT ANY TABLE Required for Offload activity.
Gluent Data Platform Application Schema¶
This account is used by Gluent Data Platform to perform read-only activities. It is defined by ORA_APP_USER.
Non-standard privileges granted to this schema are:
-
FLASHBACK ANY TABLE Required for Sqoop to provide a consistent point-in-time data load. The Gluent Data Platform application schema does not have DML privileges on user application schema tables, therefore there is no threat posed by this configuration.
-
SELECT ANY DICTIONARY Documented requirement of Sqoop.
-
SELECT ANY TABLE Required for Sqoop to read application schema tables during an offload.
Gluent Data Platform Repository Schema¶
This account is used by Gluent Data Platform to store operational metadata. It is defined by ORA_REPO_USER.
Non-standard privileges granted to this schema are:
-
SELECT ANY DICTIONARY Enables installed database packages in support of the metadata repository to access the Oracle Database data dictionary.
Hybrid Schemas¶
Gluent Data Platform hybrid schemas are required to enable remote data to be queried in tandem with customer data in the RDBMS application schema.
Non-standard privileges granted to hybrid schemas are:
-
CONNECT THROUGH GLUENT_ADM Offload and Present use this to create hybrid objects without requiring powerful
CREATE ANYandDROP ANYprivileges.
-
GLOBAL QUERY REWRITE Required to support Gluent Query Engine optimizations.
-
SELECT ANY TABLE Enables a hybrid view to access the original application schema and offloaded table.
Data Daemon¶
Properties¶
The following Java properties can be set by creating a $OFFLOAD_HOME/conf/datad.properties file containing <property>=<value> entries.
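For example, an illustrative datad.properties (the values are arbitrary, for demonstration only) raising request concurrency and disabling the Data Daemon Web Interface:
datad.max-concurrent-requests=32
spring.main.web-application-type=NONE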
-
datad.max-concurrent-requests¶ The maximum number of concurrent read requests from the RDBMS that can be processed.
Default Value
16Supported Values
Positive integers
Version Added
4.0.0
-
datad.read-pipeline-size¶ The number of reads from the backend to keep in the pipeline to be processed.
Default Value
4Supported Values
Positive integers
Version Added
4.0.0
-
datad.send-queue-size¶ The maximum size in MB of the queue to send to the RDBMS.
Default Value
16Supported Values
Positive integers
Version Added
4.0.0
-
grpc.port¶ The port used for Data Daemon. Setting to
0results in random port selection.Default Value
50051Supported Values
Any valid port
Version Added
4.0.0
-
grpc.security.cert-chain¶ The full path to the certificate chain in PEM format to enable TLS on the Data Daemon socket.
Default Value
None
Supported Values
file:<full path to PEM file>Version Added
4.0.0
-
grpc.security.private-key¶ The full path to the private key in PEM format to enable TLS on the Data Daemon socket.
Default Value
None
Supported Values
file:<full path to PEM file>Version Added
4.0.0
-
logging.config¶ The full path to a LOGBack format configuration file to override default logging.
Default Value
None
Supported Values
<full path to xml file>Version Added
4.0.0
-
logging.level.com.gluent.providers.impala.ImpalaProvider¶ The log level for Data Daemon interactions with Impala.
Default Value
infoSupported Values
off|error|warn|info|debug|allVersion Added
4.0.0
-
logging.level.com.gluent.providers.bigquery.BigQueryProvider¶ The log level for Data Daemon interactions with BigQuery.
Default Value
infoSupported Values
off|error|warn|info|debug|allVersion Added
4.0.0
-
server.port¶ The port used for Data Daemon Web Interface. Setting to
0results in random port selection.Default Value
50052Supported Values
Any valid port
Version Added
4.0.0
-
spring.main.web-application-type¶ Allows the Data Daemon Web Interface to be disabled.
Default Value
None
Supported Values
NONEVersion Added
4.0.0
Configuration¶
The following Java configuration options can be set by creating a $OFFLOAD_HOME/conf/datad.conf file containing JAVA_OPTS="<parameter1> <parameter2> ..." e.g. JAVA_OPTS="-Xms2048m -Xmx2048m -Djavax.security.auth.useSubjectCredsOnly=false".
-
-Xms¶ Sets the initial and minimum Java heap size.
Default Value
Larger of 1/64th of the physical memory or some reasonable minimum
Supported Values
-Xms<size>[g|G|m|M|k|K]Version Added
4.0.0
-
-Xmx¶ Sets the maximum Java heap size.
Default Value
Smaller of 1/4th of the physical memory or 1GB
Supported Values
-Xmx<size>[g|G|m|M|k|K]Version Added
4.0.0
-
-Djavax.security.auth.useSubjectCredsOnly¶ Required to be set to
falsewhen authenticating with a Kerberos enabled backend.Default Value
trueSupported Values
true|falseVersion Added
4.0.0