Gluent Data Platform Environment File Creation

Introduction
Gluent Data Platform uses an environment file named offload.env, located in the $OFFLOAD_HOME/conf directory. The environment file contains configuration key-value pairs that allow Gluent Data Platform components to interact with both the Oracle Database and the backend system.
The Gluent Data Platform environment file is initially populated during installation and subsequently modified during upgrade or when environmental changes occur.
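As an illustration, a minimal fragment of an offload.env file might look like the following. The two parameters shown appear later in this guide; the values are placeholders only, not recommendations:

# Hypothetical offload.env fragment -- all values are placeholders
export TWO_TASK=oradb01.example.com:1521/PDB1    # EZConnect string (placeholder host/service)
export OFFLOAD_TRANSPORT_USER=gluent             # OS user for offload transport (placeholder)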
The steps required for the creation of the Gluent Data Platform environment file are as follows:
Step | Necessity | Details
---|---|---
Step 1 | Mandatory | Copy the environment file template
Step 2 | Mandatory | Configure Oracle parameters
Step 3 | Mandatory | Configure backend parameters
Step 4 | Optional | Propagate the environment file to remaining Oracle RAC nodes (if Gluent Data Platform is installed on those nodes)
Note
Gluent Data Platform is highly configurable and a number of other options may be appropriate for your environment. For assistance, please contact Gluent Support.
Templates
Template environment files are included with Gluent Data Platform. The template is the starting point for the creation of the offload.env file. The template required depends on the source and backend combination outlined in the table below.
Source | Backend | Details
---|---|---
Oracle | Cloudera Data Hub | oracle-hadoop-offload.env.template
Oracle | Google BigQuery | …
On the Oracle Database server, navigate to the $OFFLOAD_HOME/conf directory and copy the correct environment file template, for example:
$ cd $OFFLOAD_HOME/conf/
$ cp oracle-hadoop-offload.env.template offload.env
Oracle

Open offload.env in a text editor.
Note
When offload.env is copied to the server from an external source, be mindful of potential conversion from UNIX to DOS/MAC file format. A file in DOS/MAC format will cause errors such as syntax error: unexpected end of file. In these cases the dos2unix command will resolve the issue.
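For example, to detect and repair DOS-format line endings (assuming the dos2unix utility is installed):

$ file offload.env       # a DOS-format file is reported with "CRLF line terminators"
$ dos2unix offload.env   # converts the file to UNIX format in place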
The Oracle parameters that require attention are:
- Set to …
- Set to the value gathered during the Database Characterset prerequisite action
- EZConnect connection string to the Oracle database
- Username for connecting to Oracle for administration activities, from Install Oracle Database Components
- Password for the administration username
- Username for connecting to Oracle for read activities, from Install Oracle Database Components
- Password for the read username
- Username for the Gluent Metadata Repository owner, from Install Oracle Database Components
- For single-instance Oracle Multitenant/Pluggable Database environments, uncomment the preconfigured TWO_TASK parameter
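An EZConnect connection string takes the form host:port/service_name. For example, in a multitenant environment TWO_TASK might be set as follows; the host, port and service name shown are placeholders:

export TWO_TASK=oradb01.example.com:1521/PDB1   # placeholder EZConnect string for the local PDB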
Backend
Refer to the appropriate backend distribution section below.
Cloudera Data Hub
The Cloudera Data Hub parameters that require attention are:
- Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons
- Set to the location created for Impala databases containing offloaded data during Create HDFS Directories
- Set to the location created for storage of files used by Gluent Data Platform user-defined functions during Create HDFS Directories
- Set to the hostname or IP address of the active HDFS NameNode, or the ID of the HDFS nameservice if HDFS High Availability is configured
- Set to the port of the active HDFS NameNode, or …
- If the Cluster Configuration prerequisite shows LDAP is enabled, set to …
- Hadoop node(s) running the Impala Frontend Server (impalad). Can be a single entry, a comma-separated list of entries, or a load balancer entry
- If the Cluster Configuration prerequisite shows LDAP is enabled, set to the password of the …
- Set to the value obtained from the Cluster Configuration prerequisite if different from the default value (…)
- If the Cluster Configuration prerequisite shows LDAP is enabled, set to the LDAP user with which Gluent Data Platform will authenticate to Impala
- On Cloudera Data Hub versions earlier than 5.10.x this parameter should be uncommented for the Creation of Sequence Table step
- On Cloudera Data Hub versions earlier than 5.10.x this parameter should be uncommented for the Creation of Sequence Table step
- If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path to the Kerberos keytab file created during the Kerberos prerequisite
- If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the Kerberos principal created during the Kerberos prerequisite
- If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the value of Kerberos Principal from the Cluster Configuration prerequisite
- If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path of the Kerberos ticket cache file (e.g. …)
- Host for running Offload data transport (Sqoop or Spark Submit) commands
- Choose a value appropriate to the resources available in both the source RDBMS and the target Hadoop cluster. Refer to the Offload Guide for further information
- Set to the Gluent Data Platform OS user created in the Provision a Gluent Data Platform OS User prerequisite
- If the Cluster Configuration prerequisite shows SSL is enabled, set to …
- If …
- Set to the address of the HDFS NameNode. If a High Availability configuration is used, this address can be a nameservice and Gluent Data Platform will connect to the active NameNode
- Set to the port of the WebHDFS service
- Set if WebHDFS is secured with SSL. Valid values are …
Note
If using Sqoop for offload transport and the path /user/$OFFLOAD_TRANSPORT_USER will not exist in HDFS, you may also need to override yarn.app.mapreduce.am.staging-dir with a writable location using SQOOP_OVERRIDES, as in the example below:
export SQOOP_OVERRIDES='-Dyarn.app.mapreduce.am.staging-dir=/encryption_zone/data/gluent'
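Where Kerberos and WebHDFS are in use, credentials and connectivity can be sanity-checked before proceeding. A sketch using standard tooling; the keytab path, principal, host and port below are placeholders:

$ klist -kt /etc/security/keytabs/gluent.keytab                      # list principals in the keytab (placeholder path)
$ kinit -kt /etc/security/keytabs/gluent.keytab gluent@EXAMPLE.COM   # obtain a ticket (placeholder principal)
$ curl -s "http://namenode01.example.com:50070/webhdfs/v1/?op=LISTSTATUS"   # confirm WebHDFS responds (placeholder host/port)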
Google BigQuery
The Google BigQuery parameters that require attention are:
- Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons
- Path to the Google Service Account private key JSON file
- The name of the Google Cloud Storage bucket to be used for staging data during Gluent Orchestration
- An optional path with which to prefix offloaded table paths
- Gluent Node hostname or IP address
- Choose a value appropriate to the resources available in both the source RDBMS and the Gluent Node. Refer to the Offload Guide for further information
- The executable to use for submitting Spark applications. Set to …
- Set to gluent if the Gluent Node was provisioned from a Google Cloud Platform Marketplace image; otherwise, set to the name of the Gluent Data Platform OS user
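The service account key and staging bucket can be verified from the Gluent Node with the Google Cloud SDK, assuming it is installed. The key path and bucket name below are placeholders:

$ gcloud auth activate-service-account --key-file=/path/to/keyfile.json   # placeholder key path
$ gsutil ls gs://gluent-staging-bucket/                                   # placeholder bucket name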
Propagate to Remaining Oracle RAC Nodes
Perform the following steps on each additional Oracle RAC node on which Gluent Data Platform is installed:
1. Copy offload.env from the first node into $OFFLOAD_HOME/conf
2. For Oracle Multitenant/Pluggable Database environments, update TWO_TASK in offload.env to connect to the local instance
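For example, running from the first node (racnode2 is a placeholder hostname, and the same $OFFLOAD_HOME path is assumed on both nodes):

$ scp $OFFLOAD_HOME/conf/offload.env racnode2:$OFFLOAD_HOME/conf/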