Cloudera Data Platform Public Cloud Installation

Introduction

This document describes the installation steps for Cloudera Data Platform Public Cloud.

Gluent Data Platform Software Installation

In addition to the mandatory installation on Oracle Database servers, for production deployments it is recommended that Gluent Data Platform is installed on at least one other server. This can be any server running an operating system listed in Gluent Data Platform Supported Operating Systems and Versions. A Gluent node may be provisioned to satisfy this requirement.

The additional server enables Data Daemon to be sized appropriately for throughput without consuming resources on the Oracle Database server(s). It may also be beneficial for the following reasons:

  • Restricted connectivity: Password-less SSH between Oracle Database servers and Hadoop nodes may not be permitted

  • Separation of duties: Orchestration commands can be run by the Hadoop administrators rather than the Oracle database team

This document assumes that the OS user to be used for installation is gluent.

Unpack Software

Perform the following actions as gluent.

Unpack the install tarball (gluent_offload_<version>.tar.bz2):

Note

When unpacking, an offload directory will be created if it does not exist. The offload directory is referred to as <OFFLOAD_HOME> and an environment variable ($OFFLOAD_HOME) will be set when offload.env is sourced.

$ cd <Gluent Data Platform Base Directory>
$ tar xpf <Gluent Data Platform Installation Media Directory>/gluent_offload_<version>.tar.bz2

Gluent Data Platform Environment File

  1. Copy offload.env from an Oracle Database server into $OFFLOAD_HOME/conf

  2. If both HDFS and Offload Transport commands will be issued from this server, set both HDFS_CMD_HOST and OFFLOAD_TRANSPORT_CMD_HOST to localhost in offload.env
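Step 2 above can be sketched with sed. This is illustrative only: the temporary file below stands in for $OFFLOAD_HOME/conf/offload.env, and the original host value (oradb01) is hypothetical.

```shell
# Stand-in for $OFFLOAD_HOME/conf/offload.env, copied from an Oracle
# Database server in step 1 (file contents here are illustrative)
conf=$(mktemp)
printf 'export HDFS_CMD_HOST=oradb01\nexport OFFLOAD_TRANSPORT_CMD_HOST=oradb01\n' > "$conf"

# Point both command hosts at this server
sed -i \
  -e 's/^export HDFS_CMD_HOST=.*/export HDFS_CMD_HOST=localhost/' \
  -e 's/^export OFFLOAD_TRANSPORT_CMD_HOST=.*/export OFFLOAD_TRANSPORT_CMD_HOST=localhost/' \
  "$conf"
```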

User-Defined Functions

Creation

If Gluent Data Platform has been installed on a server in addition to the Oracle Database server, the connect command to create the user-defined functions (UDFs) detailed below should be run from that server. Otherwise, run this command using the Gluent Data Platform installation on an Oracle Database server.

Tip

By default, UDFs are created in the default Impala database. To use a different database, specify its name in the OFFLOAD_UDF_DB parameter in offload.env.
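A minimal sketch of this setting in offload.env ("gluent_udfs" is a hypothetical database name):

```shell
# offload.env (illustrative): create the Gluent UDFs in a dedicated
# Impala database rather than the default database
export OFFLOAD_UDF_DB=gluent_udfs
```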

The storage location of the library that is referenced by the Gluent UDFs is determined by the values of parameters in offload.env. See Integrating with Cloud Storage. Ad hoc overrides to a different cloud or HDFS location are available with the --offload-fs-scheme, --offload-fs-container, --offload-fs-prefix and --hdfs-home parameters with the connect --install-udfs command.
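As a hypothetical example of such an override, the UDF library could be placed in a cloud storage bucket for a single run (the scheme, container and prefix values below are illustrative, not recommendations):

$ cd $OFFLOAD_HOME/bin
$ . ../conf/offload.env
$ ./connect --install-udfs --offload-fs-scheme=s3a --offload-fs-container=example-bucket --offload-fs-prefix=gluent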

To create the UDFs run the supplied connect command with the --install-udfs option:

$ cd $OFFLOAD_HOME/bin
$ . ../conf/offload.env
$ ./connect --install-udfs

Note

In systems using Sentry to control authorization, the ALL ON SERVER or CREATE ON SERVER privilege is required in order to install UDFs. The privilege can be safely revoked once this task is complete.
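As an illustrative sequence on a Sentry-enabled system, the privilege might be granted to the role used by the Gluent Impala user, the UDFs installed, and the privilege then revoked (the role name and use of impala-shell are assumptions, not part of the product):

$ impala-shell -q "GRANT ALL ON SERVER TO ROLE gluent_role"
$ ./connect --install-udfs
$ impala-shell -q "REVOKE ALL ON SERVER FROM ROLE gluent_role"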

In systems using Ranger to control authorization, appropriate Ranger permissions are required in order to install UDFs. See Ranger Privileges.

If the user with which Gluent Data Platform will authenticate to Impala is not permitted to have the necessary privileges to create UDFs, even on a temporary basis, then a script can be generated for execution by a system administrator. Use the --sql-file option to specify a file where commands should be written instead of being executed:

$ cd $OFFLOAD_HOME/bin
$ . ../conf/offload.env
$ ./connect --install-udfs --sql-file=/tmp/gluent_udfs.sql

The /tmp/gluent_udfs.sql file can then be run by an Impala user with the required Sentry or Ranger privileges.
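For instance, an administrator could apply the generated script with impala-shell (the host name below is illustrative; Kerberized clusters would also need the -k option):

$ impala-shell -i impalad01 -f /tmp/gluent_udfs.sql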

Data Warehouse

If Gluent Query Engine is configured to use Data Warehouse Impala, then the newly created UDFs need to be validated with a connection to Data Warehouse Impala.

To validate the UDFs run the supplied connect command with the --validate-udfs option:

$ cd $OFFLOAD_HOME/bin
$ . ../conf/offload.env
$ ./connect --validate-udfs --hadoop-host=${CONNECTOR_HIVE_SERVER_HOST} --hiveserver2-http-path=${CONNECTOR_HIVE_SERVER_HTTP_PATH}

Documentation Feedback

Send feedback on this documentation to: feedback@gluent.com