Cloudera Data Platform Private Cloud Prerequisites

Introduction

This document describes the prerequisite steps for using Gluent Data Platform with Cloudera Data Platform Private Cloud.

Cluster Configuration

Several cluster options require specific Gluent Data Platform configuration parameters to be set when completing the Gluent Data Platform Environment File Creation installation step. Confirm the following values in Cloudera Manager, using the location shown beneath each option, and note them down (a REST API alternative is sketched after the list below):

DataNode Data Transfer Protection
    Clusters → Cluster Name → HDFS Service → Configuration → Search (dfs.data.transfer.protection)

Data Transfer Encryption
    Clusters → Cluster Name → HDFS Service → Configuration → Search (dfs.encrypt.data.transfer)

Hadoop RPC Protection
    Clusters → Cluster Name → HDFS Service → Configuration → Search (hadoop.rpc.protection)

HDFS High Availability
    Clusters → Cluster Name → HDFS Service → Instances → Federation and High Availability

Impala HS2 Port
    Clusters → Cluster Name → Impala Service → Configuration → Search (hs2_port)

Kerberos Enabled
    Administration → Security

Kerberos Principal
    Clusters → Cluster Name → Impala Service → Configuration → Search (Kerberos Principal)

LDAP Enabled
    Clusters → Cluster Name → Impala Service → Configuration → Search (enable_ldap_auth)

Ranger Enabled
    Clusters → Cluster Name → Impala Service → Configuration → Search (Ranger Service)

SSL Certificate
    Clusters → Cluster Name → Impala Service → Configuration → Search (ssl_server_certificate)

SSL Enabled
    Clusters → Cluster Name → Impala Service → Configuration → Search (client_services_ssl_enabled)
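Where Cloudera Manager UI navigation is inconvenient, the same values can often be read from the Cloudera Manager REST API. The commands below are a minimal sketch only: the HTTPS port (7183), API version (v41), cluster and service names and credentials are placeholders to adjust for the deployment, and property names returned by the API may differ from the Hadoop configuration names listed above:

$ curl -s -k -u <cm_user> "https://<cm_host>:7183/api/version"
$ curl -s -k -u <cm_user> "https://<cm_host>:7183/api/v41/clusters/<cluster_name>/services/<hdfs_service_name>/config?view=full"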

Provision a Gluent Data Platform OS User

A Gluent Data Platform OS user (assumed to be gluent for the remainder of this document) is required on the Hadoop node(s) on which Gluent Offload Engine commands will be initiated.

This user should be provisioned using the appropriate method for the environment, e.g. LDAP, local users, etc. There are no specific group membership requirements for this user.
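For illustration only, assuming local user management is appropriate for the environment, the user could be created as follows (as root):

# useradd -m gluent
# passwd gluent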

Verify the user is present using the following command:

$ id gluent

Storage Requirements

Note

This prerequisite is needed only if Gluent Data Platform is to be installed on Hadoop node(s).

A filesystem location must be created for Gluent Data Platform installation.

Gluent Data Platform occupies approximately 1GB of storage once unpacked.

During operation, Gluent Data Platform will write log and trace files within its installation directory. Sufficient space will need to be allocated for continuing operations.

The filesystem location must be owned by the provisioned Gluent Data Platform OS user.
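For illustration only, assuming a hypothetical installation path of /opt/gluent, the location could be created and ownership assigned as follows (as root; the trailing colon in the chown command sets the group to the gluent user's login group):

# mkdir -p /opt/gluent
# chown gluent: /opt/gluent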

Default Shell

The owner of the Gluent Data Platform software requires the Bash shell. The following command, run as that user, should report a Bash shell (e.g. /bin/bash):

$ echo $SHELL
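If a different shell is reported, it can usually be changed with chsh (as root); this is a sketch only and assumes local account management rather than a directory service:

# chsh -s /bin/bash gluent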

Create HDFS Directories

Gluent Data Platform requires up to three locations within HDFS depending on the use of cloud storage:

HDFS_DATA
    Purpose: Stores a persistent copy of data offloaded from Oracle Database, and Incremental Update metadata
    Necessity: Mandatory if any data is to be persisted in HDFS or if Incremental Update will be used
    Required Permissions: Read, write for HADOOP_SSH_USER; read, write for hive group
    Default Location: /user/gluent/offload

HDFS_HOME
    Purpose: Stores the Gluent UDF library file
    Necessity: Mandatory if UDFs are to be based in HDFS
    Required Permissions: Read, write for HADOOP_SSH_USER; read for hive group
    Default Location: /user/gluent

HDFS_LOAD
    Purpose: Transient staging area used by the data transport phase of Offload
    Necessity: Mandatory
    Required Permissions: Read, write for HADOOP_SSH_USER; read for hive group
    Default Location: /user/gluent/offload

The steps to create the default locations with the correct permissions are detailed below.

Create gluent directory in HDFS (as hdfs):

hdfs dfs -mkdir /user/gluent

Change ownership of gluent directory (as hdfs):

hdfs dfs -chown gluent:hive /user/gluent

Create offload directory (as gluent):

hdfs dfs -mkdir /user/gluent/offload

Change permissions on offload directory to allow group write (as gluent):

hdfs dfs -chmod 770 /user/gluent/offload

Verify permissions on offload directory (as gluent):

hdfs dfs -ls -d /user/gluent/offload

Note

The offload directory should be group writable, i.e., the final ls command above should show permissions of drwxrwx---.
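For illustration only (owner, group, size and timestamp will differ by environment), the verification should produce output similar to:

drwxrwx---   - gluent hive          0 2021-06-01 12:00 /user/gluent/offload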

Oracle JDBC Drivers

Oracle’s JDBC driver should be downloaded from Oracle's JDBC and UCP Downloads page and installed to the location shown below for the chosen Offload Transport Method (i.e. the method that Offload will use to Transport Data to Staging). The driver should be installed on all nodes where offload transport jobs will be initiated, and the file permissions must be world readable. An example copy command follows the list below.

Sqoop
    /var/lib/sqoop

Spark
    $SPARK_HOME/jars
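For example, assuming the Sqoop transport method and a driver file named ojdbc8.jar (the actual file name depends on the driver version downloaded), the driver could be installed as follows (as root) on each node where offload transport jobs will be initiated:

# cp ojdbc8.jar /var/lib/sqoop/
# chmod 644 /var/lib/sqoop/ojdbc8.jar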

Sqoop

If Sqoop will be used to Transport Data to Staging, save the example command below into a temporary script (e.g. gl_sqoop.sh) and replace the placeholders in --connect, --username, --password and --target-dir with appropriate values for the environment:

gluent$ sqoop import -Doracle.sessionTimeZone=UTC \
-Doraoop.timestamp.string=true \
-Doraoop.jdbc.url.verbatim=true \
--connect \
jdbc:oracle:thin:@<db_host|vip>:<port>/<service> \
--username <database_username> \
--password $'<database_password>' \
--table SYS.DBA_OBJECTS \
--split-by OBJECT_ID \
--target-dir=/user/gluent/offload/test \
--delete-target-dir \
-m4 \
--direct \
--as-avrodatafile \
--outdir=.glsqoop

Note

If the database password contains a single-quote character (‘) then this must be escaped with a backslash.
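For example, a password of pass'word (an illustrative value only) would be supplied as:

--password $'pass\'word'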

Run the test Sqoop job (as gluent) from the node to which the Oracle JDBC Drivers were copied:

$ ./gl_sqoop.sh

Verify the test Sqoop job completes without error.

Oracle OS Package

If Gluent Data Platform is to be installed on a Hadoop node, install the operating system libaio package on that node if it is not already present (as root):

# yum install libaio
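The presence of the package can be confirmed with:

$ rpm -q libaio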

Kerberos

Note

This prerequisite is needed in a Kerberized cluster only if Gluent Data Platform is to be installed on a Hadoop node or if HDFS commands are to be run from a Hadoop node.

The keytab of the Kerberos principal that will be used to authenticate must be accessible by the Gluent Data Platform OS user on the Hadoop node.
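To confirm the keytab is readable by the gluent user and contains the expected principal, the following checks can be run (as gluent); klist -kt lists the principals contained in a keytab:

$ ls -l <path_to_keytab_file>
$ klist -kt <path_to_keytab_file>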

Verify that a Kerberos ticket can be obtained for the principal and keytab created (as gluent):

$ kinit -kt <path_to_keytab_file> <principal_name>
$ klist

Documentation Feedback

Send feedback on this documentation to: feedback@gluent.com