Gluent Data Platform Environment File Creation

Introduction
Gluent Data Platform uses an environment file named offload.env, located in the $OFFLOAD_HOME/conf directory. The environment file contains configuration key-value pairs that allow Gluent Data Platform components to interact with both the Oracle Database and the backend system.
The Gluent Data Platform environment file is initially populated during installation and subsequently modified during upgrade or when environmental changes occur.
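As an illustration, a minimal fragment of an offload.env file might look like the following. The two parameters shown appear later in this guide; the values are placeholders only, not recommendations:

# Hypothetical offload.env fragment -- all values are placeholders
export TWO_TASK=oradb01.example.com:1521/PDB1    # EZConnect string (placeholder host/service)
export OFFLOAD_TRANSPORT_USER=gluent             # OS user for offload transport (placeholder)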
The steps required for the creation of the Gluent Data Platform environment file are as follows:
Step | Necessity | Details
---|---|---
Step 1 | Mandatory | Copy the environment file template
Step 2 | Mandatory | Configure Oracle parameters
Step 3 | Mandatory | Configure backend parameters
Step 4 | Optional | Propagate the environment file to remaining Oracle RAC nodes (if Gluent Data Platform is installed on those nodes)
Note
Gluent Data Platform is highly configurable and a number of other options may be appropriate for your environment. For assistance, please contact Gluent Support.
Templates
Template environment files are included with Gluent Data Platform. The template is the starting point for the creation of the offload.env file. The template required depends on the source and backend combination outlined in the table below.
Source | Backend | Details
---|---|---
Oracle | Cloudera Data Hub | oracle-hadoop-offload.env.template
Oracle | Google BigQuery | …
On the Oracle Database server, navigate to the $OFFLOAD_HOME/conf directory and copy the correct environment file template, for example:
$ cd $OFFLOAD_HOME/conf/
$ cp oracle-hadoop-offload.env.template offload.env
Oracle

Open offload.env in a text editor.
Note
When offload.env is copied to the server from an external source, be mindful of potential conversion from UNIX to DOS/MAC file format. A file in DOS/MAC format will cause errors such as syntax error: unexpected end of file. In these cases the dos2unix command will resolve the issue.
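For example, to detect and repair DOS-format line endings (assuming the dos2unix utility is installed):

$ file offload.env       # a DOS-format file is reported with "CRLF line terminators"
$ dos2unix offload.env   # converts the file to UNIX format in place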
The Oracle parameters that require attention are:
- Set to …
- Set to the value gathered during the Database Characterset prerequisite action
- EZConnect connection string to the Oracle database
- Username for connecting to Oracle for administration activities, from Install Oracle Database Components
- Password for the administration username
- Username for connecting to Oracle for read activities, from Install Oracle Database Components
- Password for the read username
- Username for the Gluent Metadata Repository owner, from Install Oracle Database Components
- For single-instance Oracle Multitenant/Pluggable Database environments, uncomment the preconfigured TWO_TASK parameter
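An EZConnect connection string takes the form host:port/service_name. For example, in a multitenant environment TWO_TASK might be set as follows; the host, port and service name shown are placeholders:

export TWO_TASK=oradb01.example.com:1521/PDB1   # placeholder EZConnect string for the local PDB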
Backend
Refer to the appropriate backend distribution section below.
Cloudera Data Hub
The Cloudera Data Hub parameters that require attention are:
- Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons
- Set to the location created for Impala databases containing offloaded data during Create HDFS Directories
- Set to the location created for storage of files used by Gluent Data Platform user-defined functions during Create HDFS Directories
- Set to the hostname or IP address of the active HDFS NameNode, or the ID of the HDFS nameservice if HDFS High Availability is configured
- Set to the port of the active HDFS NameNode, or …
- If the Cluster Configuration prerequisite shows LDAP is enabled, set to …
- Hadoop node(s) running the Impala Frontend Server (impalad). Can be a single entry, a comma-separated list of entries, or a load balancer entry
- If the Cluster Configuration prerequisite shows LDAP is enabled, set to the password of the …
- Set to the value obtained from the Cluster Configuration prerequisite if different from the default value (…)
- If the Cluster Configuration prerequisite shows LDAP is enabled, set to the LDAP user with which Gluent Data Platform will authenticate to Impala
- On Cloudera Data Hub versions earlier than 5.10.x this parameter should be uncommented for the Creation of Sequence Table step
- On Cloudera Data Hub versions earlier than 5.10.x this parameter should be uncommented for the Creation of Sequence Table step
- If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path to the Kerberos keytab file created during the Kerberos prerequisite
- If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the Kerberos principal created during the Kerberos prerequisite
- If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the value of Kerberos Principal from the Cluster Configuration prerequisite
- If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path of the Kerberos ticket cache file (e.g. …)
- Host for running Offload data transport (Sqoop or Spark Submit) commands
- Choose a value appropriate to the resources available in both the source RDBMS and the target Hadoop cluster. Refer to the Offload Guide for further information
- Set to the Gluent Data Platform OS user created in the Provision a Gluent Data Platform OS User prerequisite
- If the Cluster Configuration prerequisite shows SSL is enabled, set to …
- If …
- Set to the address of the HDFS NameNode. If a High Availability configuration is used, this address can be a nameservice and Gluent Data Platform will connect to the active NameNode
- Set to the port of the WebHDFS service
- Set if WebHDFS is secured with SSL. Valid values are …
Note
If using Sqoop for offload transport and the path /user/$OFFLOAD_TRANSPORT_USER will not exist in HDFS, you may also need to override yarn.app.mapreduce.am.staging-dir with a writable location using SQOOP_OVERRIDES, as in the example below:
export SQOOP_OVERRIDES='-Dyarn.app.mapreduce.am.staging-dir=/encryption_zone/data/gluent'
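Where Kerberos and WebHDFS are in use, credentials and connectivity can be sanity-checked before proceeding. A sketch using standard tooling; the keytab path, principal, host and port below are placeholders:

$ klist -kt /etc/security/keytabs/gluent.keytab                      # list principals in the keytab (placeholder path)
$ kinit -kt /etc/security/keytabs/gluent.keytab gluent@EXAMPLE.COM   # obtain a ticket (placeholder principal)
$ curl -s "http://namenode01.example.com:50070/webhdfs/v1/?op=LISTSTATUS"   # confirm WebHDFS responds (placeholder host/port)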
Google BigQuery
The Google BigQuery parameters that require attention are:
- Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons
- Path to the Google Service Account private key JSON file
- The name of the Google Cloud Storage bucket to be used for staging data during Gluent Orchestration
- An optional path with which to prefix offloaded table paths
- Gluent Node hostname or IP address
- Choose a value appropriate to the resources available in both the source RDBMS and the Gluent Node. Refer to the Offload Guide for further information
- The executable to use for submitting Spark applications. Set to …
- Set to gluent if the Gluent Node was provisioned from a Google Cloud Platform Marketplace image; otherwise, set to the name of the Gluent Data Platform OS user
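The service account key and staging bucket can be verified from the Gluent Node with the Google Cloud SDK, assuming it is installed. The key path and bucket name below are placeholders:

$ gcloud auth activate-service-account --key-file=/path/to/keyfile.json   # placeholder key path
$ gsutil ls gs://gluent-staging-bucket/                                   # placeholder bucket name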
Propagate to Remaining Oracle RAC Nodes
Perform the following steps on each additional Oracle RAC node on which Gluent Data Platform is installed:
1. Copy offload.env from the first node into $OFFLOAD_HOME/conf
2. For Oracle Multitenant/Pluggable Database environments, update TWO_TASK in offload.env to connect to the local instance
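For example, running from the first node (racnode2 is a placeholder hostname, and the same $OFFLOAD_HOME path is assumed on both nodes):

$ scp $OFFLOAD_HOME/conf/offload.env racnode2:$OFFLOAD_HOME/conf/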