Gluent Data Platform Environment File Creation¶

Table of Contents

Introduction
Templates
Oracle
Backend
Propagate to Remaining Oracle RAC Nodes
Documentation Feedback

Introduction ¶

Gluent Data Platform uses an environment file named offload.env located in the $OFFLOAD_HOME/conf directory. The Gluent Data Platform environment file contains configuration key-value pairs that allow Gluent Data Platform components to interact with both the Oracle Database and backend system.

The Gluent Data Platform environment file is initially populated during installation and subsequently modified during upgrade or when environmental changes occur.

The steps required for the creation of the Gluent Data Platform environment file are as follows:

Step	Necessity	Details
Step 1	Mandatory	Copy the environment file template
Step 2	Mandatory	Configure Oracle parameters
Step 3	Mandatory	Configure Backend parameters
Step 4	Optional	Propagate environment file to remaining Oracle RAC nodes (if Gluent Data Platform is installed on those nodes)

Note

Gluent Data Platform is highly configurable and there are a number of other options that may be appropriate for an environment. For assistance please contact Gluent Support.

Templates ¶

Template environment files are included with Gluent Data Platform. The template is the starting point for the creation of the offload.env file. The template required depends on the source and backend combination outlined in the table below.

Source	Backend	Details
Oracle	Cloudera Data Hub	`oracle-hadoop-offload.env.template`
Oracle	Cloudera Data Platform	`oracle-hadoop-offload.env.template`
Oracle	Google BigQuery	`oracle-bigquery-offload.env.template`
Oracle	Snowflake	`oracle-snowflake-offload.env.template`

On the Oracle Database server navigate to the $OFFLOAD_HOME/conf directory and copy the correct environment file template, for example:

$ cd $OFFLOAD_HOME/conf/
$ cp oracle-hadoop-offload.env.template offload.env

Oracle ¶

Open offload.env in a text editor.

Note

When offload.env is copied to the server from an external source then be mindful of potential conversion from UNIX to DOS/MAC file format. A file in DOS/MAC format will cause errors such as syntax error: unexpected end of file. In these cases the dos2unix command will resolve the issue.

The Oracle parameters that require attention are:

Parameter	Reference
`METAD_AUTOSTART`	Set to `false` if the outcome of the Metadata Daemon OS User prerequisite action was root
`NLS_LANG`	Set to value gathered during the Database Characterset prerequisite action
`ORA_CONN`	EZConnect connection string to Oracle database
`ORA_ADM_USER`	Username for connecting to Oracle for administration activities from Install Oracle Database Components
`ORA_ADM_PASS`	Password for `ORA_ADM_USER` from Set Passwords for Gluent Oracle Database Users
`ORA_ADM_CONN`	Connection string (typically tnsnames.ora entry) for `ORA_ADM_USER` connections - primarily for use with Oracle Wallet as each entry requires a unique connection string.
`ORA_APP_USER`	Username for connecting to Oracle for read activities from Install Oracle Database Components
`ORA_APP_PASS`	Password for `ORA_APP_USER` from Set Passwords for Gluent Oracle Database Users
`ORA_REPO_USER`	Username for the Gluent Metadata Repository owner from Install Oracle Database Components
`TWO_TASK`	For single instance Oracle Multitenant/Pluggable database environments uncomment the preconfigured `TWO_TASK` variable. For RAC Multitenant/Pluggable database environments `TWO_TASK` should be set to an EZconnect string connecting to the local instance, typically `<hostname>:<port>/<ORACLE_SID>`

Backend ¶

Refer to the appropriate backend distribution section below.

Cloudera Data Hub¶

The Cloudera Data Hub parameters that require attention are:

Parameter	Reference
`DATAD_ADDRESS`	Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons
`HDFS_CMD_HOST`	Host for running HDFS command steps. Overrides `HIVE_SERVER_HOST` if set
`HDFS_DATA`	Set to the location created for Impala databases containing offloaded data during Create HDFS Directories
`HDFS_HOME`	Set to the location created for storage of files used by Gluent Data Platform user-defined functions during Create HDFS Directories
`HDFS_LOAD`	Set to the location created as the transient staging area used by the data transport phase of Offload during Create HDFS Directories
`HDFS_NAMENODE_ADDRESS`	Set to the hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured
`HDFS_NAMENODE_PORT`	Set to the port of the active HDFS namenode, or `0` if HDFS High Availability is configured and `HDFS_NAMENODE_ADDRESS` is set to a nameservice ID
`HIVE_SERVER_AUTH_MECHANISM`	If the Cluster Configuration prerequisite shows LDAP is enabled, set to `PLAIN`
`HIVE_SERVER_HOST`	Hadoop node(s) running Impala Frontend Server (impalad). Can be a single entry, a comma-separated list of entries, or a load balancer entry
`HIVE_SERVER_PASS`	If the Cluster Configuration prerequisite shows LDAP is enabled, set to the password of the `HIVE_SERVER_USER` user. Password encryption is supported using the Password Tool utility
`HIVE_SERVER_PORT`	Set to the value obtained from Cluster Configuration prerequisite if different from default value (`21050`)
`HIVE_SERVER_USER`	If the Cluster Configuration prerequisite shows LDAP is enabled, set to the LDAP user with which Gluent Data Platform will authenticate to Impala
`IN_LIST_JOIN_TABLE`	On Cloudera Data Hub versions earlier than 5.10.x this parameter should be uncommented for the Creation of Sequence Table step
`IN_LIST_JOIN_TABLE_SIZE`	On Cloudera Data Hub versions earlier than 5.10.x this parameter should be uncommented for the Creation of Sequence Table step
`KERBEROS_KEYTAB`	If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path to the Kerberos keytab file created during the Kerberos prerequisite
`KERBEROS_PRINCIPAL`	If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the Kerberos principal created during the Kerberos prerequisite
`KERBEROS_SERVICE`	If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the value of Kerberos Principal from the Cluster Configuration prerequisite
`KERBEROS_TICKET_CACHE_PATH`	If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path of the Kerberos ticket cache file (e.g. `/tmp/krb5cc_54321`)
`OFFLOAD_TRANSPORT_CMD_HOST`	Host for running Offload data transport (Sqoop or Spark Submit) commands
`OFFLOAD_TRANSPORT_PARALLELISM`	Choose a value appropriate to the resources available in both the source RDBMS and target Hadoop cluster. Refer to the Offload Guide for further information
`OFFLOAD_TRANSPORT_USER`	Set to the Gluent Data Platform OS user created in the Provision a Gluent Data Platform OS User prerequisite
`SQOOP_OVERRIDES`	If `OFFLOAD_TRANSPORT_CMD_HOST` is using the Sqoop 1 Client and “Enable avro logical types” is true, then set this to `"-Dsqoop.avro.logical_types.decimal.enable=false"`
`SSL_ACTIVE`	If the Cluster Configuration prerequisite shows SSL is enabled set to `true`
`SSL_TRUSTED_CERTS`	If `SSL_ACTIVE` is `true` set to the value of SSL Certificate from the Cluster Configuration prerequisite
`WEBHDFS_HOST`	Set to the address of the HDFS NameNode. If a High Availability configuration is used, this address can be a Nameservice and Gluent Data Platform will connect to the active NameNode
`WEBHDFS_PORT`	Set to the port of the WebHDFS service
`WEBHDFS_VERIFY_SSL`	Set if WebHDFS is secured with SSL. Valid values are `false` (use SSL but do not verify certificates), `true` (use SSL and verify certificates against default certificates), and a path to a certificate bundle for the HDFS cluster (all nodes)

Note

If using Sqoop for offload transport and the path /user/$OFFLOAD_TRANSPORT_USER will not exist in HDFS then you may also need to override yarn.app.mapreduce.am.staging-dir with a writable location using SQOOP_OVERRIDES as in the example below:

export SQOOP_OVERRIDES='-Dyarn.app.mapreduce.am.staging-dir=/encryption_zone/data/gluent'

Review the following for additional parameters that may require attention before continuing with installation steps:

Integrating with Cloud Storage

Cloudera Data Platform Private Cloud¶

The Cloudera Data Platform Private Cloud parameters that require attention are:

Parameter	Reference
`DATAD_ADDRESS`	Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons
`HDFS_CMD_HOST`	Host for running HDFS command steps. Overrides `HIVE_SERVER_HOST` if set
`HDFS_DATA`	Set to the location created for Impala databases containing offloaded data during Create HDFS Directories
`HDFS_HOME`	Set to the location created for storage of files used by Gluent Data Platform user-defined functions during Create HDFS Directories
`HDFS_LOAD`	Set to the location created as the transient staging area used by the data transport phase of Offload during Create HDFS Directories
`HDFS_NAMENODE_ADDRESS`	Set to the hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured
`HDFS_NAMENODE_PORT`	Set to the port of the active HDFS namenode, or `0` if HDFS High Availability is configured and `HDFS_NAMENODE_ADDRESS` is set to a nameservice ID
`HIVE_SERVER_AUTH_MECHANISM`	If the Cluster Configuration prerequisite shows LDAP is enabled, set to `PLAIN`
`HIVE_SERVER_HOST`	Hadoop node(s) running Impala Frontend Server (impalad). Can be a single entry, a comma-separated list of entries, or a load balancer entry
`HIVE_SERVER_PASS`	If the Cluster Configuration prerequisite shows LDAP is enabled, set to the password of the `HIVE_SERVER_USER` user. Password encryption is supported using the Password Tool utility
`HIVE_SERVER_PORT`	Set to the value obtained from Cluster Configuration prerequisite if different from default value (`21050`)
`HIVE_SERVER_USER`	If the Cluster Configuration prerequisite shows LDAP is enabled, set to the LDAP user with which Gluent Data Platform will authenticate to Impala
`KERBEROS_KEYTAB`	If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path to the Kerberos keytab file created during the Kerberos prerequisite
`KERBEROS_PRINCIPAL`	If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the Kerberos principal created during the Kerberos prerequisite
`KERBEROS_SERVICE`	If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the value of Kerberos Principal from the Cluster Configuration prerequisite
`KERBEROS_TICKET_CACHE_PATH`	If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path of the Kerberos ticket cache file (e.g. `/tmp/krb5cc_54321`)
`OFFLOAD_TRANSPORT_CMD_HOST`	Host for running Offload data transport (Sqoop or Spark Submit) commands
`OFFLOAD_TRANSPORT_PARALLELISM`	Choose a value appropriate to the resources available in both the source RDBMS and target Hadoop cluster. Refer to the Offload Guide for further information
`OFFLOAD_TRANSPORT_USER`	Set to the Gluent Data Platform OS user created in the Provision a Gluent Data Platform OS User prerequisite
`SQOOP_OVERRIDES`	If `OFFLOAD_TRANSPORT_CMD_HOST` is using the Sqoop 1 Client and “Enable avro logical types” is true, then set this to `"-Dsqoop.avro.logical_types.decimal.enable=false"`
`SSL_ACTIVE`	If the Cluster Configuration prerequisite shows SSL is enabled set to `true`
`SSL_TRUSTED_CERTS`	If `SSL_ACTIVE` is `true` set to the value of SSL Certificate from the Cluster Configuration prerequisite
`WEBHDFS_HOST`	Set to the address of the HDFS NameNode. If a High Availability configuration is used, this address can be a Nameservice and Gluent Data Platform will connect to the active NameNode
`WEBHDFS_PORT`	Set to the port of the WebHDFS service
`WEBHDFS_VERIFY_SSL`	Set if WebHDFS is secured with SSL. Valid values are `false` (use SSL but do not verify certificates), `true` (use SSL and verify certificates against default certificates), and a path to a certificate bundle for the HDFS cluster (all nodes)

Note

If using Sqoop for offload transport and the path /user/$OFFLOAD_TRANSPORT_USER will not exist in HDFS then you may also need to override yarn.app.mapreduce.am.staging-dir with a writable location using SQOOP_OVERRIDES as in the example below:

export SQOOP_OVERRIDES='-Dyarn.app.mapreduce.am.staging-dir=/encryption_zone/data/gluent'

Review the following for additional parameters that may require attention before continuing with installation steps:

Integrating with Cloud Storage

Cloudera Data Platform Public Cloud¶

The Cloudera Data Platform Public Cloud parameters that require attention are:

Parameter	Reference
`CONNECTOR_HIVE_SERVER_HOST`	If Gluent Query Engine is using Data Warehouse Impala set to the domain of the Data Warehouse Impala endpoint. E.g. `coordinator-dw.env-rh6wcb.dw.z0if-2pfm.cloudera.site`
`CONNECTOR_HIVE_SERVER_HTTP_PATH`	If Gluent Query Engine is using Data Warehouse Impala set to `cliservice/impala`
`DATAD_ADDRESS`	Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons
`HDFS_CMD_HOST`	Host for running HDFS command steps. Overrides `HIVE_SERVER_HOST` if set.
`HDFS_LOAD`	Set to the location created as the transient staging area used by the data transport phase of Offload during Create HDFS Directories
`HIVE_SERVER_HOST`	Set to the domain of the Data Hub Impala endpoint. E.g. `dh-master10.env.z0if-2pfm.cloudera.site`
`HIVE_SERVER_HTTP_PATH`	Set to the path of the Data Hub Impala endpoint. E.g. `data-hub/cdp-proxy-api/impala`
`HIVE_SERVER_PASS`	Set to the Workload Password of the CDP User for Gluent Data Platform. Password encryption is supported using the Password Tool utility
`HIVE_SERVER_PORT`	Set to `443`
`HIVE_SERVER_HTTP_TRANSPORT`	Set to `true`
`HIVE_SERVER_USER`	Set to the Workload User Name of the CDP User for Gluent Data Platform
`HS2_SESSION_PARAMS`	If Gluent Query Engine is using Data Hub Impala set to `SPOOL_QUERY_RESULTS=1`
`OFFLOAD_FS_CONTAINER`	Set to the bucket (AWS) or container (Azure) used as the “Storage Location Base” in the Data Lake
`OFFLOAD_FS_PREFIX`	Set to the path within the bucket (AWS) or container (Azure) to use as the location for Impala databases containing offloaded data
`OFFLOAD_FS_SCHEME`	Set to `s3a` (AWS) or `abfs` (Azure)
`OFFLOAD_TRANSPORT_CMD_HOST`	Set to the Data Hub host for running Offload data transport (Sqoop or Spark Submit) commands. This is typically a Gateway node but can be any Data Hub node running a YARN role (e.g. YARN ResourceManager)
`OFFLOAD_TRANSPORT_PARALLELISM`	Choose a value appropriate to the resources available in both the source RDBMS and Data Hub cluster. Refer to the Offload Guide for further information
`OFFLOAD_TRANSPORT_USER`	Set to the Workload User Name of the CDP User for Gluent Data Platform
`SQOOP_OVERRIDES`	If `OFFLOAD_TRANSPORT_CMD_HOST` is using the Sqoop 1 Client and “Enable avro logical types” is true, then set this to `"-Dsqoop.avro.logical_types.decimal.enable=false"`
`SMART_CONNECTOR_OPTIONS`	Add this parameter as a new entry: `export SMART_CONNECTOR_OPTIONS=-no-result-cache`
`SSL_ACTIVE`	Set to `true`

Google BigQuery¶

The Google BigQuery parameters that require attention are:

Parameter	Reference
`DATAD_ADDRESS`	Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons
`GOOGLE_APPLICATION_CREDENTIALS`	Path to Google Service Account private key JSON file
`GOOGLE_KMS_KEY_NAME`	Google Cloud Key Management Service cryptographic key name if customer-managed encryption keys (CMEK) for BigQuery will be used
`GOOGLE_KMS_KEY_RING_NAME`	Google Cloud Key Management Service cryptographic key ring name if customer-managed encryption keys (CMEK) for BigQuery will be used
`GOOGLE_KMS_KEY_RING_LOCATION`	Google Cloud Key Management Service cryptographic key ring location if customer-managed encryption keys (CMEK) for BigQuery will be used
`OFFLOAD_FS_CONTAINER`	The name of the Google Cloud Storage Bucket to be used for staging data during Gluent Orchestration
`OFFLOAD_FS_PREFIX`	An optional path with which to prefix offloaded table paths
`OFFLOAD_TRANSPORT_CMD_HOST`	Gluent Node hostname or IP address
`OFFLOAD_TRANSPORT_PARALLELISM`	Choose a value appropriate to the resources available in both the source RDBMS and Gluent Node. Refer to the Offload Guide for further information
`OFFLOAD_TRANSPORT_SPARK_SUBMIT_EXECUTABLE`	The executable to use for submitting Spark applications. Set to `spark-submit`
`OFFLOAD_TRANSPORT_USER`	Set to gluent if the Gluent Node was provisioned from a Google Cloud Platform Marketplace image. Otherwise the name of the Gluent Data Platform OS User

Snowflake¶

The Snowflake parameters that require attention are:

Parameter	Reference
`DATAD_ADDRESS`	Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons
`OFFLOAD_FS_CONTAINER`	The name of the cloud storage bucket or container to be used for staging data during Gluent Orchestration
`OFFLOAD_FS_PREFIX`	An optional path with which to prefix offloaded table paths
`OFFLOAD_FS_SCHEME`	The cloud storage scheme to be used for staging data during Gluent Orchestration
`OFFLOAD_TRANSPORT_CMD_HOST`	Gluent Node hostname or IP address
`OFFLOAD_TRANSPORT_PARALLELISM`	Choose a value appropriate to the resources available in both the source RDBMS and Gluent Node. Refer to the Offload Guide for further information
`OFFLOAD_TRANSPORT_SPARK_SUBMIT_EXECUTABLE`	The executable to use for submitting Spark applications. Set to `spark-submit`
`OFFLOAD_TRANSPORT_USER`	Set to gluent if the Gluent Node was provisioned from a Google Cloud Platform Marketplace image. Otherwise the name of the Gluent Data Platform OS User
`SNOWFLAKE_ACCOUNT`	Name of the Snowflake Account to be used with Gluent Data Platform
`SNOWFLAKE_DATABASE`	Name of the Snowflake Database to be used with Gluent Data Platform
`SNOWFLAKE_ROLE`	Name of the Gluent Data Platform database Role
`SNOWFLAKE_WAREHOUSE`	Name of the Snowflake Warehouse to use with Gluent Data Platform
`SNOWFLAKE_INTEGRATION`	Name of the Snowflake Storage Integration to use with Gluent Data Platform
`SNOWFLAKE_STAGE`	Name of storage stages to be created by Gluent Data Platform
`SNOWFLAKE_FILE_FORMAT_PREFIX`	Name prefix to use when Gluent Data Platform creates file formats for data loading
`SNOWFLAKE_USER`	Name of the Snowflake User to use to connect to Snowflake for all Gluent Data Platform operations
`SNOWFLAKE_PASS`	Password for the `SNOWFLAKE_USER`, if applicable. Password encryption is supported using the Password Tool utility
`SNOWFLAKE_PEM_FILE`	If the `SNOWFLAKE_USER` was created with RSA key-based file authentication, path to the PEM file
`SNOWFLAKE_PEM_PASSPHRASE`	Passphrase for PEM file authentication, if applicable. Password encryption is supported using the Password Tool utility

One of the following cloud storage vendor credentials needs to be set.

Amazon S3¶

Note

Alternate methods of authentication with Amazon S3 are available and are listed below. Either of these can be used can be used instead of populating offload.env.

Instance Attached IAM Role (Recommended)	No configuration is required. Ensure the IAM role is attached to all servers on which Gluent Data Platform is installed
AWS Credentials File	Authentication can be configured via a credentials file with `aws configure`. Ensure this is done on all servers on which Gluent Data Platform is installed

`AWS_ACCESS_KEY_ID`	Access key ID for the service account used to authenticate
`AWS_SECRET_ACCESS_KEY`	Secret access key for the service account used to authenticate

Google Cloud Storage¶

Parameter	Reference
`GOOGLE_APPLICATION_CREDENTIALS`	Path to Google service account private key JSON file

Microsoft Azure¶

`OFFLOAD_FS_AZURE_ACCOUNT_KEY`	Storage account access key
`OFFLOAD_FS_AZURE_ACCOUNT_NAME`	Storage account name
`OFFLOAD_FS_AZURE_ACCOUNT_DOMAIN`	Storage account service domain. Set to `blob.core.windows.net`

Propagate to Remaining Oracle RAC Nodes ¶

Perform the following steps on each additional Oracle RAC node on which Gluent Data Platform is installed:

Copy offload.env from first node into $OFFLOAD_HOME/conf
For Oracle Multitenant/Pluggable Database environments update TWO_TASK in offload.env to connect to the local instance

Documentation Feedback ¶

Send feedback on this documentation to: feedback@gluent.com

Gluent Data Platform Environment File Creation¶

Introduction¶

Templates¶

Oracle¶

Backend¶

Cloudera Data Hub¶

Cloudera Data Platform Private Cloud¶

Cloudera Data Platform Public Cloud¶

Google BigQuery¶

Snowflake¶

Amazon S3¶

Google Cloud Storage¶

Microsoft Azure¶

Propagate to Remaining Oracle RAC Nodes¶

Documentation Feedback¶

Introduction ¶

Templates ¶

Oracle ¶

Backend ¶

Propagate to Remaining Oracle RAC Nodes ¶

Documentation Feedback ¶