Gluent Data Platform Environment File Creation¶
Introduction¶
Gluent Data Platform uses an environment file named offload.env located in the $OFFLOAD_HOME/conf directory. The Gluent Data Platform environment file contains configuration key-value pairs that allow Gluent Data Platform components to interact with both the Oracle Database and backend system.
The Gluent Data Platform environment file is initially populated during installation and subsequently modified during upgrade or when environmental changes occur.
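Entries in the file use shell-style export syntax (as in the SQOOP_OVERRIDES example later in this document). A minimal illustrative sketch of the key-value format; the parameter names and values below are placeholders, not actual Gluent Data Platform parameters:
# Illustrative format only; names and values are placeholders
export EXAMPLE_PARAMETER=some_value
export EXAMPLE_PASSWORD='a quoted value containing spaces'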
The steps required for the creation of the Gluent Data Platform environment file are as follows:
| Step | Necessity | Details |
|---|---|---|
| Step 1 | Mandatory | Copy the environment file template |
| Step 2 | Mandatory | Configure Oracle parameters |
| Step 3 | Mandatory | Configure Backend parameters |
| Step 4 | Optional | Propagate environment file to remaining Oracle RAC nodes (if Gluent Data Platform is installed on those nodes) |
Note
Gluent Data Platform is highly configurable and there are a number of other options that may be appropriate for an environment. For assistance please contact Gluent Support.
Templates¶
Template environment files are included with Gluent Data Platform. The template is the starting point for the creation of the offload.env file. The template required depends on the source and backend combination outlined in the table below.
| Source | Backend | Details |
|---|---|---|
| Oracle | Azure Synapse Analytics | |
| Oracle | Cloudera Data Hub | |
| Oracle | Cloudera Data Platform | |
| Oracle | Google BigQuery | |
| Oracle | Snowflake | |
On the Oracle Database server, navigate to the $OFFLOAD_HOME/conf directory and copy the correct environment file template, for example:
$ cd $OFFLOAD_HOME/conf/
$ cp oracle-hadoop-offload.env.template offload.env
Oracle¶
Open offload.env in a text editor.
Note
When offload.env is copied to the server from an external source, be mindful of potential conversion from UNIX to DOS/MAC file format. A file in DOS/MAC format will cause errors such as syntax error: unexpected end of file. In these cases, the dos2unix command will resolve the issue.
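A quick way to check for and repair DOS/MAC line endings, assuming the file and dos2unix utilities are available on the server (the second line shows example output for an affected file):
$ file offload.env
offload.env: ASCII text, with CRLF line terminators
$ dos2unix offload.env
dos2unix: converting file offload.env to Unix format...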
The Oracle parameters that require attention are:
| Parameter | Reference |
|---|---|
| Set to |  |
| Set to value gathered during the Database Characterset prerequisite action |  |
| EZConnect connection string to the Oracle database (see the example below this table) |  |
| Username for connecting to Oracle for administration activities from Install Oracle Database Components |  |
| Password for |  |
| Connection string (typically a tnsnames.ora entry) for |  |
| Username for connecting to Oracle for read activities from Install Oracle Database Components |  |
| Password for |  |
| Username for the Gluent Metadata Repository owner from Install Oracle Database Components |  |
| For single instance Oracle Multitenant/Pluggable Database environments, uncomment the preconfigured |  |
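For the EZConnect connection string referenced in the table above, the value follows the standard Oracle host:port/service_name form. A hypothetical example (host, port and service name are placeholders):
# EZConnect format: <host>:<port>/<service_name>
# e.g. ora-db01.example.com:1521/ORCL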
Backend¶
Refer to the appropriate backend distribution section below.
Azure Synapse Analytics¶
The Azure Synapse Analytics parameters that require attention are:
| Parameter | Reference |
|---|---|
| Name of the Microsoft ODBC driver as specified in |  |
| Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons |  |
| Azure storage account access key |  |
| Azure storage account name |  |
| Domain of the Azure storage service. Set to |  |
| Name of the Azure Storage Container to be used for staging data during Gluent Orchestration |  |
| Optional path with which to prefix offloaded table paths |  |
| Azure storage scheme to be used for staging data during Gluent Orchestration |  |
| Gluent Node hostname or IP address |  |
| Choose a value appropriate to the resources available in both the source RDBMS and Gluent Node. Refer to the Offload Guide for further information |  |
| Executable to use for submitting Spark applications. Set to |  |
| Name of the Gluent Data Platform OS User |  |
| Dedicated SQL endpoint of Azure Synapse Analytics workspace (typical value formats are shown below this table) |  |
| Name of the Azure Synapse Analytics SQL Pool Database |  |
| Name of the Gluent Data Platform database Role |  |
| Azure Synapse Analytics authentication mechanism |  |
| For |  |
| For |  |
| For |  |
| For |  |
| For |  |
| Name of the Azure Synapse Analytics Data Source to be used with Gluent Data Platform |  |
| Name of the Azure Synapse Analytics File Format to be used with Gluent Data Platform |  |
| ID of the subscription containing the Azure Synapse Analytics workspace |  |
| Resource group of Azure Synapse Analytics workspace |  |
| Name of the Azure Synapse Analytics workspace |  |
| Azure Synapse Analytics collation to use for character columns. Please note that changing this to a value with different behavior from the frontend system may give unexpected results |  |
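For reference, the dedicated SQL endpoint and Azure storage values in the table above commonly take the following forms. These are general Azure conventions rather than Gluent-specific requirements, and the workspace name is a placeholder:
# Dedicated SQL endpoint:       <workspace-name>.sql.azuresynapse.net
# Storage domain (ADLS Gen2):   dfs.core.windows.net
# Storage scheme (ADLS Gen2):   abfs, or abfss when TLS is used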
Cloudera Data Hub¶
The Cloudera Data Hub parameters that require attention are:
| Parameter | Reference |
|---|---|
| Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons |  |
| Host for running HDFS command steps. Overrides |  |
| Set to the location created for Impala databases containing offloaded data during Create HDFS Directories |  |
| Set to the location created for storage of files used by Gluent Data Platform user-defined functions during Create HDFS Directories |  |
| Set to the location created as the transient staging area used by the data transport phase of Offload during Create HDFS Directories |  |
| Set to the hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured |  |
| Set to the port of the active HDFS namenode, or |  |
| If the Cluster Configuration prerequisite shows LDAP is enabled, set to |  |
| Hadoop node(s) running Impala Frontend Server (impalad). Can be a single entry, a comma-separated list of entries, or a load balancer entry |  |
| If the Cluster Configuration prerequisite shows LDAP is enabled, set to the password of the |  |
| Set to the value obtained from the Cluster Configuration prerequisite if different from the default value ( |  |
| If the Cluster Configuration prerequisite shows LDAP is enabled, set to the LDAP user with which Gluent Data Platform will authenticate to Impala |  |
| On Cloudera Data Hub versions earlier than 5.10.x this parameter should be uncommented for the Creation of Sequence Table step |  |
| On Cloudera Data Hub versions earlier than 5.10.x this parameter should be uncommented for the Creation of Sequence Table step |  |
| If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path to the Kerberos keytab file created during the Kerberos prerequisite |  |
| If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the Kerberos principal created during the Kerberos prerequisite |  |
| If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the value of Kerberos Principal from the Cluster Configuration prerequisite |  |
| If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path of the Kerberos ticket cache file (e.g. |  |
| Host for running Offload data transport (Sqoop or Spark Submit) commands |  |
| Choose a value appropriate to the resources available in both the source RDBMS and target Hadoop cluster. Refer to the Offload Guide for further information |  |
| Set to the Gluent Data Platform OS user created in the Provision a Gluent Data Platform OS User prerequisite |  |
| Any additional Sqoop parameters. To avoid offload issues, |  |
| If the Cluster Configuration prerequisite shows SSL is enabled, set to |  |
| If |  |
| Set to the address of the HDFS NameNode. If a High Availability configuration is used, this address can be a Nameservice and Gluent Data Platform will connect to the active NameNode |  |
| Set to the port of the WebHDFS service |  |
| Set if WebHDFS is secured with SSL. Valid values are |  |
Note
If using Sqoop for offload transport and the path /user/$OFFLOAD_TRANSPORT_USER does not exist in HDFS, you may also need to override yarn.app.mapreduce.am.staging-dir with a writable location using SQOOP_OVERRIDES, as in the example below:
export SQOOP_OVERRIDES='-Dsqoop.avro.logical_types.decimal.enable=false -Dyarn.app.mapreduce.am.staging-dir=/encryption_zone/data/gluent'
Review the following for additional parameters that may require attention before continuing with installation steps:
Cloudera Data Platform Private Cloud¶
The Cloudera Data Platform Private Cloud parameters that require attention are:
| Parameter | Reference |
|---|---|
| Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons |  |
| Host for running HDFS command steps. Overrides |  |
| Set to the location created for Impala databases containing offloaded data during Create HDFS Directories |  |
| Set to the location created for storage of files used by Gluent Data Platform user-defined functions during Create HDFS Directories |  |
| Set to the location created as the transient staging area used by the data transport phase of Offload during Create HDFS Directories |  |
| Set to the hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured |  |
| Set to the port of the active HDFS namenode, or |  |
| If the Cluster Configuration prerequisite shows LDAP is enabled, set to |  |
| Hadoop node(s) running Impala Frontend Server (impalad). Can be a single entry, a comma-separated list of entries, or a load balancer entry |  |
| If the Cluster Configuration prerequisite shows LDAP is enabled, set to the password of the |  |
| Set to the value obtained from the Cluster Configuration prerequisite if different from the default value ( |  |
| If the Cluster Configuration prerequisite shows LDAP is enabled, set to the LDAP user with which Gluent Data Platform will authenticate to Impala |  |
| If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path to the Kerberos keytab file created during the Kerberos prerequisite |  |
| If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the Kerberos principal created during the Kerberos prerequisite |  |
| If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the value of Kerberos Principal from the Cluster Configuration prerequisite |  |
| If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path of the Kerberos ticket cache file (e.g. |  |
| Host for running Offload data transport (Sqoop or Spark Submit) commands |  |
| Choose a value appropriate to the resources available in both the source RDBMS and target Hadoop cluster. Refer to the Offload Guide for further information |  |
| Set to the Gluent Data Platform OS user created in the Provision a Gluent Data Platform OS User prerequisite |  |
| Any additional Sqoop parameters. To avoid offload issues, |  |
| If the Cluster Configuration prerequisite shows SSL is enabled, set to |  |
| If |  |
| Set to the address of the HDFS NameNode. If a High Availability configuration is used, this address can be a Nameservice and Gluent Data Platform will connect to the active NameNode |  |
| Set to the port of the WebHDFS service |  |
| Set if WebHDFS is secured with SSL. Valid values are |  |
Note
If using Sqoop for offload transport and the path /user/$OFFLOAD_TRANSPORT_USER does not exist in HDFS, you may also need to override yarn.app.mapreduce.am.staging-dir with a writable location using SQOOP_OVERRIDES, as in the example below:
export SQOOP_OVERRIDES='-Dsqoop.avro.logical_types.decimal.enable=false -Dyarn.app.mapreduce.am.staging-dir=/encryption_zone/data/gluent'
Review the following for additional parameters that may require attention before continuing with installation steps:
Cloudera Data Platform Public Cloud¶
The Cloudera Data Platform Public Cloud parameters that require attention are:
| Parameter | Reference |
|---|---|
| If Gluent Query Engine is using Data Warehouse Impala set to the domain of the Data Warehouse Impala endpoint. E.g. |  |
| If Gluent Query Engine is using Data Warehouse Impala set to |  |
| Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons |  |
| Host for running HDFS command steps. Overrides |  |
| Set to the location created as the transient staging area used by the data transport phase of Offload during Create HDFS Directories |  |
| Set to the domain of the Data Hub Impala endpoint. E.g. |  |
| Set to the path of the Data Hub Impala endpoint. E.g. |  |
| Set to the Workload Password of the CDP User for Gluent Data Platform. Password encryption is supported using the Password Tool utility |  |
| Set to |  |
| Set to |  |
| Set to the Workload User Name of the CDP User for Gluent Data Platform |  |
| If Gluent Query Engine is using Data Hub Impala set to |  |
| Set to the bucket (AWS) or container (Azure) used as the “Storage Location Base” in the Data Lake (see the illustration below this table) |  |
| Set to the path within the bucket (AWS) or container (Azure) to use as the location for Impala databases containing offloaded data |  |
| Set to |  |
| Set to the Data Hub host for running Offload data transport (Sqoop or Spark Submit) commands. This is typically a Gateway node but can be any Data Hub node running a YARN role (e.g. YARN ResourceManager) |  |
| Choose a value appropriate to the resources available in both the source RDBMS and Data Hub cluster. Refer to the Offload Guide for further information |  |
| Set to the Workload User Name of the CDP User for Gluent Data Platform |  |
| Add this parameter as a new entry: |  |
| Set to |  |
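As an illustration of the Data Lake storage values above, with hypothetical bucket and path names (AWS example):
# Hypothetical values only:
# Storage Location Base bucket:                    my-datalake-bucket
# Path within the bucket for offloaded databases:  gluent/offload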
Google BigQuery¶
The Google BigQuery parameters that require attention are:
| Parameter | Reference |
|---|---|
| Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons |  |
| Path to Google Service Account private key JSON file (a verification example follows this table) |  |
| Google Cloud Key Management Service cryptographic key name if customer-managed encryption keys (CMEK) for BigQuery will be used |  |
| Google Cloud Key Management Service cryptographic key ring name if customer-managed encryption keys (CMEK) for BigQuery will be used |  |
| Google Cloud Key Management Service cryptographic key ring location if customer-managed encryption keys (CMEK) for BigQuery will be used |  |
| The name of the Google Cloud Storage Bucket to be used for staging data during Gluent Orchestration |  |
| An optional path with which to prefix offloaded table paths |  |
| Gluent Node hostname or IP address |  |
| Choose a value appropriate to the resources available in both the source RDBMS and Gluent Node. Refer to the Offload Guide for further information |  |
| The executable to use for submitting Spark applications. Set to |  |
| Set to gluent if the Gluent Node was provisioned from a Google Cloud Platform Marketplace image. Otherwise the name of the Gluent Data Platform OS User |  |
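One way to sanity-check the Google Service Account private key JSON file referenced above, assuming the gcloud CLI is installed on the Gluent Node (the key file path is hypothetical):
$ gcloud auth activate-service-account --key-file=/path/to/gluent-service-account.json
If the key file is valid, the command reports the activated service account; otherwise it returns an error.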
Snowflake¶
The Snowflake parameters that require attention are:
| Parameter | Reference |
|---|---|
| Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons |  |
| The name of the cloud storage bucket or container to be used for staging data during Gluent Orchestration |  |
| An optional path with which to prefix offloaded table paths |  |
| The cloud storage scheme to be used for staging data during Gluent Orchestration |  |
| Gluent Node hostname or IP address |  |
| Choose a value appropriate to the resources available in both the source RDBMS and Gluent Node. Refer to the Offload Guide for further information |  |
| The executable to use for submitting Spark applications. Set to |  |
| Set to gluent if the Gluent Node was provisioned from a Google Cloud Platform Marketplace image. Otherwise the name of the Gluent Data Platform OS User |  |
| Name of the Snowflake Account to be used with Gluent Data Platform |  |
| Name of the Snowflake Database to be used with Gluent Data Platform |  |
| Name of the Gluent Data Platform database Role |  |
| Name of the Snowflake Warehouse to use with Gluent Data Platform |  |
| Name of the Snowflake Storage Integration to use with Gluent Data Platform |  |
| Name of storage stages to be created by Gluent Data Platform |  |
| Name prefix to use when Gluent Data Platform creates file formats for data loading |  |
| Name of the Snowflake User to use to connect to Snowflake for all Gluent Data Platform operations |  |
| Password for the |  |
| If the |  |
| Passphrase for PEM file authentication, if applicable (see the key generation example below this table). Password encryption is supported using the Password Tool utility |  |
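If PEM (key-pair) authentication is used instead of a password, an encrypted private key can be generated with OpenSSL, for example (the file name is illustrative; the passphrase entered at the prompt is the one referenced in the table above):
$ openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.p8
The corresponding public key must also be registered against the Snowflake user; refer to the Snowflake documentation for details.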
One of the following cloud storage vendor credentials needs to be set.
Amazon S3¶
Note
Alternate methods of authentication with Amazon S3 are available and are listed below. Either of these can be used instead of populating offload.env.
| Instance Attached IAM Role (Recommended) | No configuration is required. Ensure the IAM role is attached to all servers on which Gluent Data Platform is installed |
| AWS Credentials File | Authentication can be configured via a credentials file with (see the example below) |
| Parameter | Reference |
|---|---|
| Access key ID for the service account used to authenticate |  |
| Secret access key for the service account used to authenticate |  |
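If the AWS credentials file method mentioned above is used, the standard ~/.aws/credentials format applies; the values below are placeholders:
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx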
Google Cloud Storage¶
| Parameter | Reference |
|---|---|
| Path to Google service account private key JSON file |  |
Microsoft Azure¶
| Parameter | Reference |
|---|---|
| Storage account access key |  |
| Storage account name |  |
| Storage account service domain. Set to (typical Azure domains are shown below) |  |
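For the storage account service domain, typical Azure endpoints are shown below. These are general Azure values, not Gluent-specific requirements:
# Blob storage:            blob.core.windows.net
# ADLS Gen2 (Data Lake):   dfs.core.windows.net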
Propagate to Remaining Oracle RAC Nodes¶
Perform the following steps on each additional Oracle RAC node on which Gluent Data Platform is installed:
1. Copy offload.env from the first node into $OFFLOAD_HOME/conf
2. For Oracle Multitenant/Pluggable Database environments, update TWO_TASK in offload.env to connect to the local instance (see the sketch below)
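A minimal sketch of these steps, assuming $OFFLOAD_HOME is the same path on every node, passwordless SSH between nodes, and a hypothetical first-node hostname:
# Run on each additional RAC node as the Gluent Data Platform OS user
$ scp racnode1:$OFFLOAD_HOME/conf/offload.env $OFFLOAD_HOME/conf/offload.env
# Multitenant/PDB only: edit offload.env so TWO_TASK points to the local instance, e.g.
# export TWO_TASK=<local EZConnect string or TNS alias>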