Cloudera Data Platform Private Cloud Prerequisites¶
Introduction¶
This document describes the prerequisite steps required to prepare Cloudera Data Platform Private Cloud for Gluent Data Platform.
Cluster Configuration¶
A number of cluster options require specific Gluent Data Platform configuration parameters to be set when completing the Gluent Data Platform Environment File Creation installation step. Confirm these values in Cloudera Manager as follows and note them down:
| Option | Cloudera Manager Location |
|---|---|
| DataNode Data Transfer Protection | Clusters → Cluster Name → HDFS Service → Configuration → Search (dfs.data.transfer.protection) |
| Data Transfer Encryption | Clusters → Cluster Name → HDFS Service → Configuration → Search (dfs.encrypt.data.transfer) |
| Hadoop RPC Protection | Clusters → Cluster Name → HDFS Service → Configuration → Search (hadoop.rpc.protection) |
| HDFS High Availability | Clusters → Cluster Name → HDFS Service → Instances → Federation and High Availability |
| Impala HS2 Port | Clusters → Cluster Name → Impala Service → Configuration → Search (hs2_port) |
| Kerberos Enabled | Administration → Security |
| Kerberos Principal | Clusters → Cluster Name → Impala Service → Configuration → Search (Kerberos Principal) |
| LDAP Enabled | Clusters → Cluster Name → Impala Service → Configuration → Search (enable_ldap_auth) |
| Ranger Enabled | Clusters → Cluster Name → Impala Service → Configuration → Search (Ranger Service) |
| SSL Certificate | Clusters → Cluster Name → Impala Service → Configuration → Search (ssl_server_certificate) |
| SSL Enabled | Clusters → Cluster Name → Impala Service → Configuration → Search (client_services_ssl_enabled) |
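Many of these values can also be read programmatically through the Cloudera Manager REST API. The following is a minimal sketch only; the host cm-host.example.com, port 7180, the admin credentials, the API version v41 and the cluster name Cluster 1 are assumptions to be replaced with environment-specific values:

$ # Return the full HDFS service configuration, including defaults
$ curl -s -u admin:admin \
    'http://cm-host.example.com:7180/api/v41/clusters/Cluster%201/services/hdfs/config?view=full'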
Provision a Gluent Data Platform OS User¶
A Gluent Data Platform OS user (assumed to be gluent for the remainder of this document) is required on the Hadoop node(s) on which Gluent Offload Engine commands will be initiated.
This user should be provisioned using the appropriate method for the environment, e.g. LDAP, local users, etc. There are no specific group membership requirements for this user.
Verify the user is present using the following command:
$ id gluent
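If local users are appropriate for the environment, the user could be created as follows (as root); a minimal sketch, in which the home directory and shell shown are assumptions:

# useradd -m -d /home/gluent -s /bin/bash gluent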
Storage Requirements¶
Note
This prerequisite is needed only if Gluent Data Platform is to be installed on Hadoop node(s).
A filesystem location must be created for Gluent Data Platform installation.
Gluent Data Platform occupies approximately 1GB of storage once unpacked.
During operation, Gluent Data Platform writes log and trace files within its installation directory, so sufficient space must be allocated for ongoing operations.
The filesystem location must be owned by the provisioned Gluent Data Platform OS user.
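A minimal sketch of creating such a location follows (as root); the path /opt/gluent and the owning group gluent are assumptions to be replaced with environment-specific values:

# mkdir -p /opt/gluent
# chown gluent:gluent /opt/gluent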
Default Shell¶
The owner of the Gluent Data Platform software requires the Bash shell. The output of the following command should show a Bash shell (e.g. /bin/bash) for that user:
$ echo $SHELL
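If a different shell is reported and local users are in use, the default shell can be changed as follows (as root); the Bash path /bin/bash is an assumption:

# usermod -s /bin/bash gluent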
Create HDFS Directories¶
Gluent Data Platform requires up to three locations within HDFS depending on the use of cloud storage:
| Parameter | Purpose | Necessity | Required Permissions | Default Location |
|---|---|---|---|---|
|  | Stores a persistent copy of data offloaded from Oracle Database, and Incremental Update metadata | Mandatory if any data is to be persisted in HDFS or Incremental Update will be used | Read, write for HADOOP_SSH_USER; read, write for hive group |  |
|  | Stores the Gluent UDF library file | Mandatory if UDFs are to be based in HDFS | Read, write for HADOOP_SSH_USER; read for hive group |  |
|  | Transient staging area used by the data transport phase of Offload | Mandatory | Read, write for HADOOP_SSH_USER; read for hive group |  |
The steps to create the default locations with the correct permissions are detailed below.
Create the gluent directory in HDFS (as hdfs):
hdfs dfs -mkdir /user/gluent
Change ownership of gluent directory (as hdfs):
hdfs dfs -chown gluent:hive /user/gluent
Create offload directory (as gluent):
hdfs dfs -mkdir /user/gluent/offload
Change permissions on offload directory to allow group write (as gluent):
hdfs dfs -chmod 770 /user/gluent/offload
Verify permissions on offload directory (as gluent):
hdfs dfs -ls -d /user/gluent/offload
Note
The offload directory should be group writable, i.e. the final ls command above should show permissions of drwxrwx---.
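For illustration, the verification command might produce output similar to the following; the owner, group, timestamp and size are environment-specific:

$ hdfs dfs -ls -d /user/gluent/offload
drwxrwx---   - gluent hive          0 2021-06-01 12:00 /user/gluent/offload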
Oracle JDBC Drivers¶
Oracle's JDBC driver should be downloaded from Oracle's JDBC and UCP Downloads page and installed in the location shown below. The location depends on the method that will be used by Offload to Transport Data to Staging. The driver should be installed on all nodes where offload transport jobs will be initiated. Ensure the file permissions are world-readable.
| Offload Transport Method | Location |
|---|---|
| Sqoop |  |
| Spark |  |
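Once the correct location has been identified, the driver can be installed with standard OS commands; a minimal sketch (as root), in which the driver file name ojdbc8.jar and the placeholder <location> are assumptions:

# cp ojdbc8.jar <location>/
# chmod 644 <location>/ojdbc8.jar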
Sqoop¶
If Sqoop will be used to Transport Data to Staging, save the example command below into a temporary script (e.g. gl_sqoop.sh) and modify the placeholders in --connect, --username, --password and --target-dir with appropriate environment values:
gluent$ sqoop import -Doracle.sessionTimeZone=UTC \
-Doraoop.timestamp.string=true \
-Doraoop.jdbc.url.verbatim=true \
--connect \
jdbc:oracle:thin:@<db_host|vip>:<port>/<service> \
--username <database_username> \
--password $'<database_password>' \
--table SYS.DBA_OBJECTS \
--split-by OBJECT_ID \
--target-dir=/user/gluent/offload/test \
--delete-target-dir \
-m4 \
--direct \
--as-avrodatafile \
--outdir=.glsqoop
Note
If the database password contains a single-quote character (') then this must be escaped with a backslash.
Run the test Sqoop job (as gluent) from the node to which the Oracle JDBC Drivers were copied:
$ ./gl_sqoop.sh
Verify the test Sqoop job completes without error.
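The staged data can also be inspected directly; since the example job writes Avro data files, a listing of the target directory would be expected to show a _SUCCESS marker and part files:

$ hdfs dfs -ls /user/gluent/offload/test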
Oracle OS Package¶
If Gluent Data Platform is to be installed on a Hadoop node, install the operating system libaio package on that node if it is not already present (as root):
# yum install libaio
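On RPM-based systems, presence of the package can subsequently be verified with a query of the RPM database:

$ rpm -q libaio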
Kerberos¶
Note
This prerequisite is needed in a Kerberized cluster only if Gluent Data Platform is to be installed on a Hadoop node or if HDFS commands are to be run from a Hadoop node.
The keytab of the Kerberos principal that will be used to authenticate must be accessible by the Gluent Data Platform OS user on the Hadoop node.
Verify that a Kerberos ticket can be obtained using the principal and keytab (as gluent):
$ kinit -kt <path_to_keytab_file> <principal_name>
$ klist
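If successful, klist reports a valid ticket; the output below is illustrative only, in which the principal, cache location and timestamps are environment-specific assumptions:

Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: gluent@EXAMPLE.COM

Valid starting       Expires              Service principal
06/01/2021 12:00:00  06/02/2021 12:00:00  krbtgt/EXAMPLE.COM@EXAMPLE.COM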