Gluent Data Platform Environment File Creation

Introduction

Gluent Data Platform uses an environment file named offload.env, located in the $OFFLOAD_HOME/conf directory. The file contains configuration key-value pairs that allow Gluent Data Platform components to interact with both the Oracle Database and the backend system.

The Gluent Data Platform environment file is initially populated during installation and is subsequently modified during upgrades or when the environment changes.

The steps required for the creation of the Gluent Data Platform environment file are as follows:

Step 1 (Mandatory): Copy the environment file template

Step 2 (Mandatory): Configure Oracle parameters

Step 3 (Mandatory): Configure Backend parameters

Step 4 (Optional): Propagate environment file to remaining Oracle RAC nodes (if Gluent Data Platform is installed on those nodes)

Note

Gluent Data Platform is highly configurable, and a number of other options may be appropriate for a given environment. For assistance, contact Gluent Support.

Templates

Template environment files are included with Gluent Data Platform and are the starting point for creating the offload.env file. The template required depends on the source and backend combination:

Oracle source, Cloudera Data Hub backend: oracle-hadoop-offload.env.template

Oracle source, Cloudera Data Platform backend: oracle-hadoop-offload.env.template

Oracle source, Google BigQuery backend: oracle-bigquery-offload.env.template

Oracle source, Snowflake backend: oracle-snowflake-offload.env.template

On the Oracle Database server, navigate to the $OFFLOAD_HOME/conf directory and copy the appropriate environment file template to offload.env, for example:

$ cd $OFFLOAD_HOME/conf/
$ cp oracle-hadoop-offload.env.template offload.env
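The template choice above can also be scripted. The following is a minimal sketch only. The function names gdp_template_for and gdp_copy_template, and the backend keywords they accept (cdh, cdp, bigquery, snowflake), are hypothetical and are not part of Gluent Data Platform. Only the template file names come from this page.

```shell
# Hypothetical helper: print the template file name for a backend keyword.
# Keywords are illustrative; file names are those listed for each backend.
gdp_template_for() {
    case "$1" in
        cdh|cdp)   echo "oracle-hadoop-offload.env.template" ;;
        bigquery)  echo "oracle-bigquery-offload.env.template" ;;
        snowflake) echo "oracle-snowflake-offload.env.template" ;;
        *)         echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: copy the matching template to offload.env in the
# conf directory ($2, defaulting to $OFFLOAD_HOME/conf). It refuses to
# overwrite an existing offload.env.
gdp_copy_template() {
    backend=$1
    conf_dir=${2:-$OFFLOAD_HOME/conf}
    template=$(gdp_template_for "$backend") || return 1
    if [ -e "$conf_dir/offload.env" ]; then
        echo "$conf_dir/offload.env already exists; not overwriting" >&2
        return 1
    fi
    cp "$conf_dir/$template" "$conf_dir/offload.env"
}
```

For example, running gdp_copy_template cdh on the Oracle Database server would copy oracle-hadoop-offload.env.template to $OFFLOAD_HOME/conf/offload.env. The overwrite check is there so that an already configured file is not replaced by accident.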

Oracle

Open offload.env in a text editor.

Note

If offload.env is copied to the server from another system, be aware that it may have been converted from UNIX to DOS/Mac file format. A file in DOS/Mac format causes errors such as "syntax error: unexpected end of file". In these cases, running the dos2unix command on the file resolves the issue.
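A minimal sketch of detecting and fixing DOS line endings, assuming a POSIX shell on the Oracle Database server. The function name fix_line_endings is hypothetical. It calls dos2unix when that command is available and otherwise falls back to tr.

```shell
# Hypothetical helper: convert a file from DOS (CRLF) to UNIX line endings.
# Uses dos2unix when installed; otherwise strips carriage returns with tr.
# Old Mac files that use CR alone need mac2unix and are not handled here.
fix_line_endings() {
    f=$1
    cr=$(printf '\r')
    # Nothing to do if the file contains no carriage returns
    if ! grep -q "$cr" "$f"; then
        return 0
    fi
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix "$f"
    else
        # Write to a temp file, then copy back over the original so that
        # the original file's ownership and permissions are kept
        tr -d '\r' < "$f" > "$f.tmp" && cat "$f.tmp" > "$f" && rm -f "$f.tmp"
    fi
}
```

Run it as fix_line_endings $OFFLOAD_HOME/conf/offload.env before the file is sourced.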

The Oracle parameters that require attention are:

Parameter

Reference

METAD_AUTOSTART

Set to false if the outcome of the Metadata Daemon OS User prerequisite action was root

NLS_LANG

Set to the value gathered during the Database Characterset prerequisite action

ORA_CONN

EZConnect connection string for the Oracle database

ORA_ADM_USER

Username for connecting to Oracle for administration activities (see Install Oracle Database Components)

ORA_ADM_PASS

Password for ORA_ADM_USER (see Set Passwords for Gluent Oracle Database Users)

ORA_APP_USER

Username for connecting to Oracle for read activities (see Install Oracle Database Components)

ORA_APP_PASS

Password for ORA_APP_USER (see Set Passwords for Gluent Oracle Database Users)

ORA_REPO_USER

Username of the Gluent Metadata Repository owner (see Install Oracle Database Components)

TWO_TASK

For single-instance Oracle Multitenant/Pluggable Database environments, uncomment the preconfigured TWO_TASK variable. For RAC Multitenant/Pluggable Database environments, set TWO_TASK to an EZConnect string that connects to the local instance, typically <hostname>:<port>/<ORACLE_SID>
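After editing, a quick check that the Oracle parameters are populated can catch omissions before later steps fail. This is a sketch under stated assumptions. require_vars is a hypothetical helper, and the host, port, service and SID values in the comments are placeholders, not defaults.

```shell
# Example entry formats (placeholder values only):
#   export ORA_CONN=dbhost01:1521/ORCLPDB1   # EZConnect host:port/service
#   export TWO_TASK=dbhost01:1521/ORCL1      # RAC + Multitenant: local instance

# Hypothetical helper: print every named variable that is unset or empty
# and return non-zero if any are missing.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: . $OFFLOAD_HOME/conf/offload.env && require_vars NLS_LANG ORA_CONN ORA_ADM_USER ORA_ADM_PASS ORA_APP_USER ORA_APP_PASS ORA_REPO_USER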

Backend

Refer to the appropriate backend distribution section below.

Cloudera Data Hub

The Cloudera Data Hub parameters that require attention are:

Parameter

Reference

DATAD_ADDRESS

Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons

HDFS_CMD_HOST

Host for running HDFS command steps. Overrides HIVE_SERVER_HOST if set

HDFS_DATA

Set to the location created for Impala databases containing offloaded data during Create HDFS Directories

HDFS_HOME

Set to the location created for storage of files used by Gluent Data Platform user-defined functions during Create HDFS Directories

HDFS_LOAD

Set to the location created as the transient staging area used by the data transport phase of Offload during Create HDFS Directories

HDFS_NAMENODE_ADDRESS

Set to the hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured

HDFS_NAMENODE_PORT

Set to the port of the active HDFS namenode, or 0 if HDFS High Availability is configured and HDFS_NAMENODE_ADDRESS is set to a nameservice ID

HIVE_SERVER_AUTH_MECHANISM

If the Cluster Configuration prerequisite shows LDAP is enabled, set to PLAIN

HIVE_SERVER_HOST

Hadoop node(s) running Impala Frontend Server (impalad). Can be a single entry, a comma-separated list of entries, or a load balancer entry

HIVE_SERVER_PASS

If the Cluster Configuration prerequisite shows LDAP is enabled, set to the password of the HIVE_SERVER_USER user. Password encryption is supported using the Password Tool utility

HIVE_SERVER_PORT

Set to the value obtained from the Cluster Configuration prerequisite if it differs from the default value (21050)

HIVE_SERVER_USER

If the Cluster Configuration prerequisite shows LDAP is enabled, set to the LDAP user with which Gluent Data Platform will authenticate to Impala

IN_LIST_JOIN_TABLE

On Cloudera Data Hub versions earlier than 5.10.x, uncomment this parameter for the Creation of Sequence Table step

IN_LIST_JOIN_TABLE_SIZE

On Cloudera Data Hub versions earlier than 5.10.x, uncomment this parameter for the Creation of Sequence Table step

KERBEROS_KEYTAB

If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path to the Kerberos keytab file created during the Kerberos prerequisite

KERBEROS_PRINCIPAL

If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the Kerberos principal created during the Kerberos prerequisite

KERBEROS_SERVICE

If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the value of Kerberos Principal from the Cluster Configuration prerequisite

KERBEROS_TICKET_CACHE_PATH

If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path of the Kerberos ticket cache file (e.g. /tmp/krb5cc_54321)

OFFLOAD_TRANSPORT_CMD_HOST

Host for running Offload data transport (Sqoop or Spark Submit) commands

OFFLOAD_TRANSPORT_PARALLELISM

Choose a value appropriate to the resources available in both the source RDBMS and target Hadoop cluster. Refer to the Offload Guide for further information

OFFLOAD_TRANSPORT_USER

Set to the Gluent Data Platform OS user created in the Provision a Gluent Data Platform OS User prerequisite

SQOOP_OVERRIDES

If OFFLOAD_TRANSPORT_CMD_HOST is using the Sqoop 1 Client and “Enable avro logical types” is true, then set this to "-Dsqoop.avro.logical_types.decimal.enable=false"

SSL_ACTIVE

If the Cluster Configuration prerequisite shows SSL is enabled, set to true

SSL_TRUSTED_CERTS

If SSL_ACTIVE is true, set to the value of SSL Certificate from the Cluster Configuration prerequisite

WEBHDFS_HOST

Set to the address of the HDFS NameNode. If a High Availability configuration is used, this address can be a Nameservice and Gluent Data Platform will connect to the active NameNode

WEBHDFS_PORT

Set to the port of the WebHDFS service

WEBHDFS_VERIFY_SSL

Set if WebHDFS is secured with SSL. Valid values are false (use SSL but do not verify certificates), true (use SSL and verify certificates against the default certificates), or the path to a certificate bundle for the HDFS cluster (all nodes)
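Several of the parameters above apply only when a related feature is enabled. The sketch below encodes those conditional rules (LDAP, SSL, Kerberos, HDFS High Availability) as a hypothetical check_hadoop_env function. It is not a Gluent Data Platform tool. It only checks that values are present, not that they are correct for the cluster. The same rules apply to the Cloudera Data Platform Private Cloud parameters.

```shell
# Hypothetical consistency check for the conditional rules in the table.
# Reads variables from the current environment (e.g. after sourcing
# offload.env), prints one line per problem, returns non-zero on problems.
check_hadoop_env() {
    problems=0
    # LDAP: PLAIN authentication needs an Impala user and password
    if [ "${HIVE_SERVER_AUTH_MECHANISM:-}" = "PLAIN" ]; then
        for v in HIVE_SERVER_USER HIVE_SERVER_PASS; do
            eval "val=\${$v:-}"
            [ -n "$val" ] || { echo "LDAP enabled but $v is empty"; problems=1; }
        done
    fi
    # SSL: trusted certificates are needed when SSL_ACTIVE is true
    if [ "${SSL_ACTIVE:-}" = "true" ] && [ -z "${SSL_TRUSTED_CERTS:-}" ]; then
        echo "SSL_ACTIVE is true but SSL_TRUSTED_CERTS is empty"; problems=1
    fi
    # Kerberos: a principal implies a readable keytab and a service name
    if [ -n "${KERBEROS_PRINCIPAL:-}" ]; then
        [ -r "${KERBEROS_KEYTAB:-}" ] || { echo "KERBEROS_KEYTAB is not a readable file"; problems=1; }
        [ -n "${KERBEROS_SERVICE:-}" ] || { echo "KERBEROS_SERVICE is empty"; problems=1; }
    fi
    # HA: port 0 is only valid when HDFS_NAMENODE_ADDRESS is a nameservice ID
    if [ "${HDFS_NAMENODE_PORT:-}" = "0" ]; then
        echo "note: HDFS_NAMENODE_PORT=0, so HDFS_NAMENODE_ADDRESS must be a nameservice ID"
    fi
    return "$problems"
}
```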

Note

If Sqoop is used for offload transport and the path /user/$OFFLOAD_TRANSPORT_USER does not exist in HDFS, you may also need to override yarn.app.mapreduce.am.staging-dir with a writable location using SQOOP_OVERRIDES, as in the example below:

export SQOOP_OVERRIDES='-Dyarn.app.mapreduce.am.staging-dir=/encryption_zone/data/gluent'
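Whether this override is needed can be checked with the HDFS shell (hdfs dfs -test -d). The sketch below is hypothetical. suggest_sqoop_overrides is an illustrative name, and it must run on a host with an HDFS client, such as OFFLOAD_TRANSPORT_CMD_HOST, with OFFLOAD_TRANSPORT_USER set. It also assumes that SQOOP_OVERRIDES accepts several -D options separated by spaces, which is not stated on this page.

```shell
# Hypothetical helper: print a suggested SQOOP_OVERRIDES line. If
# /user/$OFFLOAD_TRANSPORT_USER is not an HDFS directory, the staging-dir
# override for the writable location in $1 is appended to any existing
# SQOOP_OVERRIDES value (assumed to be space-separated -D options).
suggest_sqoop_overrides() {
    staging_dir=$1
    current=${SQOOP_OVERRIDES:-}
    if ! hdfs dfs -test -d "/user/${OFFLOAD_TRANSPORT_USER}"; then
        current="${current:+$current }-Dyarn.app.mapreduce.am.staging-dir=${staging_dir}"
    fi
    printf "export SQOOP_OVERRIDES='%s'\n" "$current"
}
```

For example, suggest_sqoop_overrides /encryption_zone/data/gluent prints an export line that can be reviewed and copied into offload.env on the Oracle Database server.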

Review the following for additional parameters that may require attention before continuing with installation steps:

Cloudera Data Platform Private Cloud

The Cloudera Data Platform Private Cloud parameters that require attention are:

Parameter

Reference

DATAD_ADDRESS

Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons

HDFS_CMD_HOST

Host for running HDFS command steps. Overrides HIVE_SERVER_HOST if set

HDFS_DATA

Set to the location created for Impala databases containing offloaded data during Create HDFS Directories

HDFS_HOME

Set to the location created for storage of files used by Gluent Data Platform user-defined functions during Create HDFS Directories

HDFS_LOAD

Set to the location created as the transient staging area used by the data transport phase of Offload during Create HDFS Directories

HDFS_NAMENODE_ADDRESS

Set to the hostname or IP address of the active HDFS namenode or the ID of the HDFS nameservice if HDFS High Availability is configured

HDFS_NAMENODE_PORT

Set to the port of the active HDFS namenode, or 0 if HDFS High Availability is configured and HDFS_NAMENODE_ADDRESS is set to a nameservice ID

HIVE_SERVER_AUTH_MECHANISM

If the Cluster Configuration prerequisite shows LDAP is enabled, set to PLAIN

HIVE_SERVER_HOST

Hadoop node(s) running Impala Frontend Server (impalad). Can be a single entry, a comma-separated list of entries, or a load balancer entry

HIVE_SERVER_PASS

If the Cluster Configuration prerequisite shows LDAP is enabled, set to the password of the HIVE_SERVER_USER user. Password encryption is supported using the Password Tool utility

HIVE_SERVER_PORT

Set to the value obtained from the Cluster Configuration prerequisite if it differs from the default value (21050)

HIVE_SERVER_USER

If the Cluster Configuration prerequisite shows LDAP is enabled, set to the LDAP user with which Gluent Data Platform will authenticate to Impala

KERBEROS_KEYTAB

If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path to the Kerberos keytab file created during the Kerberos prerequisite

KERBEROS_PRINCIPAL

If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the Kerberos principal created during the Kerberos prerequisite

KERBEROS_SERVICE

If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the value of Kerberos Principal from the Cluster Configuration prerequisite

KERBEROS_TICKET_CACHE_PATH

If the Cluster Configuration prerequisite shows Kerberos is enabled, set to the full path of the Kerberos ticket cache file (e.g. /tmp/krb5cc_54321)

OFFLOAD_TRANSPORT_CMD_HOST

Host for running Offload data transport (Sqoop or Spark Submit) commands

OFFLOAD_TRANSPORT_PARALLELISM

Choose a value appropriate to the resources available in both the source RDBMS and target Hadoop cluster. Refer to the Offload Guide for further information

OFFLOAD_TRANSPORT_USER

Set to the Gluent Data Platform OS user created in the Provision a Gluent Data Platform OS User prerequisite

SQOOP_OVERRIDES

If OFFLOAD_TRANSPORT_CMD_HOST is using the Sqoop 1 Client and “Enable avro logical types” is true, then set this to "-Dsqoop.avro.logical_types.decimal.enable=false"

SMART_CONNECTOR_OPTIONS

Add this parameter as a new entry: export SMART_CONNECTOR_OPTIONS=-no-result-cache

SSL_ACTIVE

If the Cluster Configuration prerequisite shows SSL is enabled, set to true

SSL_TRUSTED_CERTS

If SSL_ACTIVE is true, set to the value of SSL Certificate from the Cluster Configuration prerequisite

WEBHDFS_HOST

Set to the address of the HDFS NameNode. If a High Availability configuration is used, this address can be a Nameservice and Gluent Data Platform will connect to the active NameNode

WEBHDFS_PORT

Set to the port of the WebHDFS service

WEBHDFS_VERIFY_SSL

Set if WebHDFS is secured with SSL. Valid values are false (use SSL but do not verify certificates), true (use SSL and verify certificates against the default certificates), or the path to a certificate bundle for the HDFS cluster (all nodes)
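SMART_CONNECTOR_OPTIONS is added as a new entry rather than by editing an existing template line, so it helps to append it only once. The following is a sketch that assumes a hypothetical helper named add_env_entry. The helper does not change an existing value.

```shell
# Hypothetical helper: append "export NAME=VALUE" to an env file unless an
# uncommented export of NAME already exists there.
add_env_entry() {
    file=$1 name=$2 value=$3
    if grep -q "^[[:space:]]*export[[:space:]]\{1,\}${name}=" "$file"; then
        echo "$name already present in $file; leaving it unchanged" >&2
        return 0
    fi
    # Make sure the new entry starts on its own line
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    printf 'export %s=%s\n' "$name" "$value" >> "$file"
}
```

For example: add_env_entry "$OFFLOAD_HOME/conf/offload.env" SMART_CONNECTOR_OPTIONS -no-result-cache. The same call applies to the Cloudera Data Platform Public Cloud entry.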

Note

If Sqoop is used for offload transport and the path /user/$OFFLOAD_TRANSPORT_USER does not exist in HDFS, you may also need to override yarn.app.mapreduce.am.staging-dir with a writable location using SQOOP_OVERRIDES, as in the example below:

export SQOOP_OVERRIDES='-Dyarn.app.mapreduce.am.staging-dir=/encryption_zone/data/gluent'

Review the following for additional parameters that may require attention before continuing with installation steps:

Cloudera Data Platform Public Cloud

The Cloudera Data Platform Public Cloud parameters that require attention are:

Parameter

Reference

CONNECTOR_HIVE_SERVER_HOST

If Gluent Query Engine is using Data Warehouse Impala, set to the domain of the Data Warehouse Impala endpoint. E.g. coordinator-dw.env-rh6wcb.dw.z0if-2pfm.cloudera.site

CONNECTOR_HIVE_SERVER_HTTP_PATH

If Gluent Query Engine is using Data Warehouse Impala, set to cliservice/impala

DATAD_ADDRESS

Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons

HDFS_CMD_HOST

Host for running HDFS command steps. Overrides HIVE_SERVER_HOST if set.

HDFS_HOME

Set to the same location as HDFS_LOAD

HDFS_LOAD

Set to the location created as the transient staging area used by the data transport phase of Offload during Create HDFS Directories

HIVE_SERVER_HOST

Set to the domain of the Data Hub Impala endpoint. E.g. dh-master10.env.z0if-2pfm.cloudera.site

HIVE_SERVER_HTTP_PATH

Set to the path of the Data Hub Impala endpoint. E.g. data-hub/cdp-proxy-api/impala

HIVE_SERVER_PASS

Set to the Workload Password of the CDP User for Gluent Data Platform. Password encryption is supported using the Password Tool utility

HIVE_SERVER_PORT

Set to 443

HIVE_SERVER_HTTP_TRANSPORT

Set to true

HIVE_SERVER_USER

Set to the Workload User Name of the CDP User for Gluent Data Platform

HS2_SESSION_PARAMS

If Gluent Query Engine is using Data Hub Impala, set to SPOOL_QUERY_RESULTS=1

OFFLOAD_FS_CONTAINER

Set to the bucket (AWS) or container (Azure) used as the “Storage Location Base” in the Data Lake

OFFLOAD_FS_PREFIX

Set to the path within the bucket (AWS) or container (Azure) to use as the location for Impala databases containing offloaded data

OFFLOAD_FS_SCHEME

Set to s3a (AWS) or abfs (Azure)

OFFLOAD_TRANSPORT_CMD_HOST

Set to the Data Hub host for running Offload data transport (Sqoop or Spark Submit) commands. This is typically a Gateway node but can be any Data Hub node running a YARN role (e.g. YARN ResourceManager)

OFFLOAD_TRANSPORT_PARALLELISM

Choose a value appropriate to the resources available in both the source RDBMS and Data Hub cluster. Refer to the Offload Guide for further information

OFFLOAD_TRANSPORT_USER

Set to the Workload User Name of the CDP User for Gluent Data Platform

SQOOP_OVERRIDES

If OFFLOAD_TRANSPORT_CMD_HOST is using the Sqoop 1 Client and “Enable avro logical types” is true, then set this to "-Dsqoop.avro.logical_types.decimal.enable=false"

SMART_CONNECTOR_OPTIONS

Add this parameter as a new entry: export SMART_CONNECTOR_OPTIONS=-no-result-cache

SSL_ACTIVE

Set to true
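Several Cloudera Data Platform Public Cloud values above are either fixed or follow from a single choice between AWS and Azure. The sketch below prints those entries for review. cdp_public_fixed_entries is a hypothetical name. Endpoint hosts, HTTP paths, credentials and storage locations still have to be set by hand.

```shell
# Hypothetical helper: print the fixed-value offload.env entries for CDP
# Public Cloud. $1 is the cloud (aws or azure); $2 is the HDFS_LOAD
# location created during Create HDFS Directories (HDFS_HOME uses the same).
cdp_public_fixed_entries() {
    case "$1" in
        aws)   scheme=s3a ;;
        azure) scheme=abfs ;;
        *)     echo "cloud must be aws or azure" >&2; return 1 ;;
    esac
    cat <<EOF
export HIVE_SERVER_PORT=443
export HIVE_SERVER_HTTP_TRANSPORT=true
export SSL_ACTIVE=true
export OFFLOAD_FS_SCHEME=$scheme
export HDFS_LOAD=$2
export HDFS_HOME=$2
export SMART_CONNECTOR_OPTIONS=-no-result-cache
EOF
}
```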

Google BigQuery

The Google BigQuery parameters that require attention are:

Parameter

Reference

DATAD_ADDRESS

Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons

GOOGLE_APPLICATION_CREDENTIALS

Path to Google Service Account private key JSON file

OFFLOAD_FS_CONTAINER

The name of the Google Cloud Storage Bucket to be used for staging data during Gluent Orchestration

OFFLOAD_FS_PREFIX

An optional path with which to prefix offloaded table paths

OFFLOAD_TRANSPORT_CMD_HOST

Gluent Node hostname or IP address

OFFLOAD_TRANSPORT_PARALLELISM

Choose a value appropriate to the resources available in both the source RDBMS and Gluent Node. Refer to the Offload Guide for further information

OFFLOAD_TRANSPORT_SPARK_SUBMIT_EXECUTABLE

The executable to use for submitting Spark applications. Set to spark-submit

OFFLOAD_TRANSPORT_USER

Set to gluent if the Gluent Node was provisioned from a Google Cloud Platform Marketplace image; otherwise, set to the name of the Gluent Data Platform OS User
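Below is a hypothetical pre-flight check for the Google BigQuery credentials. It assumes a standard service account key file, which contains a "type": "service_account" field. check_gcp_key is an illustrative name, not a Gluent Data Platform utility.

```shell
# Hypothetical check: GOOGLE_APPLICATION_CREDENTIALS must name a readable
# file that looks like a Google service account key.
check_gcp_key() {
    key=${GOOGLE_APPLICATION_CREDENTIALS:-}
    if [ -z "$key" ] || [ ! -r "$key" ]; then
        echo "GOOGLE_APPLICATION_CREDENTIALS is unset or not readable" >&2
        return 1
    fi
    # Service account key files carry "type": "service_account"
    if ! grep -q '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key"; then
        echo "$key does not look like a service account key file" >&2
        return 1
    fi
}
```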

Snowflake

The Snowflake parameters that require attention are:

Parameter

Reference

DATAD_ADDRESS

Address(es) of Data Daemon(s). Can point to a single daemon or multiple daemons

OFFLOAD_FS_CONTAINER

The name of the cloud storage bucket or container to be used for staging data during Gluent Orchestration

OFFLOAD_FS_PREFIX

An optional path with which to prefix offloaded table paths

OFFLOAD_FS_SCHEME

The cloud storage scheme to be used for staging data during Gluent Orchestration

OFFLOAD_TRANSPORT_CMD_HOST

Gluent Node hostname or IP address

OFFLOAD_TRANSPORT_PARALLELISM

Choose a value appropriate to the resources available in both the source RDBMS and Gluent Node. Refer to the Offload Guide for further information

OFFLOAD_TRANSPORT_SPARK_SUBMIT_EXECUTABLE

The executable to use for submitting Spark applications. Set to spark-submit

OFFLOAD_TRANSPORT_USER

Set to gluent if the Gluent Node was provisioned from a Google Cloud Platform Marketplace image; otherwise, set to the name of the Gluent Data Platform OS User

SNOWFLAKE_ACCOUNT

Name of the Snowflake Account to be used with Gluent Data Platform

SNOWFLAKE_DATABASE

Name of the Snowflake Database to be used with Gluent Data Platform

SNOWFLAKE_ROLE

Name of the Gluent Data Platform database Role

SNOWFLAKE_WAREHOUSE

Name of the Snowflake Warehouse to use with Gluent Data Platform

SNOWFLAKE_INTEGRATION

Name of the Snowflake Storage Integration to use with Gluent Data Platform

SNOWFLAKE_STAGE

Name of the storage stages to be created by Gluent Data Platform

SNOWFLAKE_FILE_FORMAT_PREFIX

Name prefix to use when Gluent Data Platform creates file formats for data loading

SNOWFLAKE_USER

Name of the Snowflake User to use to connect to Snowflake for all Gluent Data Platform operations

SNOWFLAKE_PASS

Password for the SNOWFLAKE_USER, if applicable. Password encryption is supported using the Password Tool utility

SNOWFLAKE_PEM_FILE

If SNOWFLAKE_USER was created with RSA key pair authentication, the path to the PEM file

SNOWFLAKE_PEM_PASSPHRASE

Passphrase for PEM file authentication, if applicable. Password encryption is supported using the Password Tool utility
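The Snowflake parameters describe two ways to authenticate SNOWFLAKE_USER: a password, or an RSA key pair PEM file. The sketch below reports which one is configured. It assumes exactly one method should be set, which is a simplification for illustration. snowflake_auth_method is a hypothetical name.

```shell
# Hypothetical check: report "password" or "key-pair" depending on which
# Snowflake authentication entry is set. Assumes exactly one is used.
snowflake_auth_method() {
    if [ -n "${SNOWFLAKE_PEM_FILE:-}" ] && [ -n "${SNOWFLAKE_PASS:-}" ]; then
        echo "both SNOWFLAKE_PASS and SNOWFLAKE_PEM_FILE are set" >&2
        return 1
    fi
    if [ -n "${SNOWFLAKE_PEM_FILE:-}" ]; then
        [ -r "$SNOWFLAKE_PEM_FILE" ] || { echo "cannot read $SNOWFLAKE_PEM_FILE" >&2; return 1; }
        echo "key-pair"
    elif [ -n "${SNOWFLAKE_PASS:-}" ]; then
        echo "password"
    else
        echo "neither SNOWFLAKE_PASS nor SNOWFLAKE_PEM_FILE is set" >&2
        return 1
    fi
}
```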

Credentials for one of the following cloud storage vendors must be set.

Amazon S3

Note

Alternative methods of authenticating with Amazon S3 are listed below. Either can be used instead of populating offload.env.

Instance Attached IAM Role (Recommended)

No configuration is required. Ensure the IAM role is attached to all servers on which Gluent Data Platform is installed

AWS Credentials File

Authentication can be configured via a credentials file with aws configure. Ensure this is done on all servers on which Gluent Data Platform is installed

AWS_ACCESS_KEY_ID

Access key ID for the service account used to authenticate

AWS_SECRET_ACCESS_KEY

Secret access key for the service account used to authenticate

Google Cloud Storage

Parameter

Reference

GOOGLE_APPLICATION_CREDENTIALS

Path to Google service account private key JSON file

Microsoft Azure

OFFLOAD_FS_AZURE_ACCOUNT_KEY

Storage account access key

OFFLOAD_FS_AZURE_ACCOUNT_NAME

Storage account name

OFFLOAD_FS_AZURE_ACCOUNT_DOMAIN

Storage account service domain. Set to blob.core.windows.net
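To confirm that credentials for one vendor are present after editing offload.env, a sketch such as the hypothetical storage_credentials_found function can list what it finds. It only inspects environment variables. As a result, an instance-attached IAM role or an AWS credentials file created with aws configure will not be reported.

```shell
# Hypothetical helper: print which vendor credentials (s3, gcs, azure) are
# present in the environment, space-separated; prints an empty line if none.
storage_credentials_found() {
    found=""
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        found="$found s3"
    fi
    if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
        found="$found gcs"
    fi
    if [ -n "${OFFLOAD_FS_AZURE_ACCOUNT_NAME:-}" ] &&
       [ -n "${OFFLOAD_FS_AZURE_ACCOUNT_KEY:-}" ] &&
       [ -n "${OFFLOAD_FS_AZURE_ACCOUNT_DOMAIN:-}" ]; then
        found="$found azure"
    fi
    # Drop the leading space
    echo "${found# }"
}
```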

Propagate to Remaining Oracle RAC Nodes

Perform the following steps on each additional Oracle RAC node on which Gluent Data Platform is installed:

  1. Copy offload.env from the first node into $OFFLOAD_HOME/conf

  2. For Oracle Multitenant/Pluggable Database environments, update TWO_TASK in offload.env to connect to the local instance
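On each node, the TWO_TASK update amounts to rewriting one line. The following is a sketch, assuming a hypothetical helper named set_local_two_task and the <hostname>:<port>/<ORACLE_SID> form described for TWO_TASK. The scp copy and the port 1521 in the usage comment are placeholders.

```shell
# Hypothetical helper: point TWO_TASK in a copied offload.env at the local
# instance. Replaces any existing (optionally commented) "export TWO_TASK="
# line, or appends one if none exists.
set_local_two_task() {
    file=$1 host=$2 port=$3 sid=$4
    new="export TWO_TASK=${host}:${port}/${sid}"
    if grep -q '^[#[:space:]]*export[[:space:]]\{1,\}TWO_TASK=' "$file"; then
        # Edit via a temp file, then copy back to keep the file's permissions
        sed "s|^[#[:space:]]*export[[:space:]]\{1,\}TWO_TASK=.*|$new|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" && rm -f "$file.tmp"
    else
        printf '%s\n' "$new" >> "$file"
    fi
}

# Usage on an additional node (placeholder host names and port):
#   scp firstnode:$OFFLOAD_HOME/conf/offload.env $OFFLOAD_HOME/conf/
#   set_local_two_task "$OFFLOAD_HOME/conf/offload.env" "$(hostname)" 1521 "$ORACLE_SID"
```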

Documentation Feedback

Send feedback on this documentation to: feedback@gluent.com