Security

Documentation Conventions

  • Commands and keywords are shown in a monospace font.

  • $OFFLOAD_HOME is set when the environment file (offload.env) is sourced, unless already set, and refers to the directory named offload that is created when the software is unpacked. This is also referred to as <OFFLOAD_HOME> in sections of this guide where the environment file has not been created/sourced.

  • Third-party vendor product names may be aliased or shortened for simplicity. See Third Party Vendor Products for cross-references to full product names and trademarks.

Introduction

Gluent Data Platform acts as an interface between traditional proprietary relational database systems and backend data platforms. As such, Gluent Data Platform utilizes the security features of these systems.

In a data lake architecture, where large volumes of business data are brought together in a single system, it is more important than ever to apply a diligent approach to security. The remainder of this guide covers security functionality in Gluent Data Platform and how Gluent Data Platform interacts with other systems.

Details of configuring security features in Oracle Database and backend data platforms are beyond the scope of this guide.

System Accounts

In order for Gluent Data Platform to operate, system accounts are required in both the RDBMS (such as Oracle Database) and backend data platforms (such as Cloudera Data Hub and Google BigQuery).

Oracle Database

In the source Oracle Database instance the following users, roles and privileges are provisioned during installation (see Install Oracle Database Components):

Table 1: Oracle Database Users

  • GLUENT_ADM: An administrative user with access to create objects in hybrid schemas
  • GLUENT_APP: A read-only user
  • GLUENT_REPO: Owner schema of the Gluent Metadata Repository

Note

GLUENT is the default system user prefix in the Oracle Database instance, but it can be changed during installation if desired. The suffixes _ADM, _APP and _REPO are mandatory.

At installation time the above accounts are created with 30-character random string passwords. The passwords are not retained, and the installing administrator is expected to set the passwords for the _ADM and _APP accounts to known values as part of the configuration process.
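This guide does not describe the installer's own password generator. The following minimal sketch uses Python's standard `secrets` module to produce a comparable 30-character random password and builds the `ALTER USER` statement that sets a known value for the _ADM or _APP account. The letters-and-digits alphabet, the leading-letter rule and the `alter_user_sql` helper are illustrative assumptions, not part of Gluent Data Platform.

```python
import secrets
import string

# Assumption: letters and digits only, so the value is safe inside a
# double-quoted Oracle password literal without escaping.
ALPHABET = string.ascii_letters + string.digits


def random_password(length: int = 30) -> str:
    """Return a cryptographically random password that starts with a letter."""
    if length < 1:
        raise ValueError("length must be at least 1")
    first = secrets.choice(string.ascii_letters)
    rest = "".join(secrets.choice(ALPHABET) for _ in range(length - 1))
    return first + rest


def alter_user_sql(username: str, password: str) -> str:
    """Hypothetical helper: build the statement that sets a known password."""
    if not (password.isascii() and password.isalnum()):
        raise ValueError("this sketch only accepts ASCII letters and digits")
    return f'ALTER USER {username} IDENTIFIED BY "{password}"'
```

Run the resulting statement from a secure session, and keep the password out of logs and shell history.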

The creation of GLUENT_ADM, GLUENT_APP and GLUENT_REPO allows the least privilege principle to be followed.

Table 2: Oracle Database Roles

GLUENT_OFFLOAD_ROLE
  • READ and EXECUTE on OFFLOAD_BIN directory
  • READ on OFFLOAD_CACHE directory
  • READ on OFFLOAD_DATA directory
  • READ and WRITE on OFFLOAD_LOG directory
  • GLUENT_OFFLOAD_SQLMON_ROLE role

GLUENT_OFFLOAD_REPO_ROLE
  • SELECT on GLUENT_REPO tables and views
  • EXECUTE on GLUENT_REPO.OFFLOAD_REPO
  • EXECUTE on GLUENT_REPO.OFFLOAD_METADATA_OT

GLUENT_OFFLOAD_SQLMON_ROLE
  • SELECT on GLUENT_ADM.OFFLOAD_SQLMON_SUMMARY
  • SELECT on GLUENT_ADM.OFFLOAD_SQLMON_HYBRID_OBJECTS
  • EXECUTE on GLUENT_ADM.OFFLOAD_TOOLS
Table 3: Oracle Database User Privileges

GLUENT_ADM
  • CREATE SESSION
  • SELECT ANY DICTIONARY
  • GRANT ANY OBJECT PRIVILEGE
  • SELECT ANY TABLE
  • ANALYZE ANY
  • EXECUTE on SYS.DBMS_LOCK
  • EXECUTE on SYS.DBMS_FLASHBACK
  • SELECT_CATALOG_ROLE
  • GLUENT_OFFLOAD_ROLE
  • GLUENT_OFFLOAD_REPO_ROLE

GLUENT_APP
  • CREATE SESSION
  • SELECT ANY DICTIONARY
  • SELECT ANY TABLE
  • FLASHBACK ANY TABLE
  • GLUENT_OFFLOAD_ROLE

GLUENT_REPO
  • CREATE SESSION
  • SELECT ANY DICTIONARY
Oracle Database Server

Gluent Data Platform must be installed as the same user that owns the Oracle software (typically oracle).

Metadata Daemon runs on the Oracle Database server and under certain conditions is required to run as the root user. Refer to Metadata Daemon OS User.

Cloudera Data Hub / Cloudera Data Platform Private Cloud

A Gluent Data Platform OS user (typically named gluent, although any valid operating system username is supported) is required on the node(s) on which Gluent Offload Engine will run. There are no specific group membership requirements for this user. Refer to Provision a Gluent Data Platform OS User (CDH) and Provision a Gluent Data Platform OS User (CDP Private Cloud).

A Kerberos principal may be required either for password-less SSH or for authentication with a Kerberized cluster.

An LDAP user may be required for authentication with an LDAP-enabled Impala.

Cloudera Data Platform Public Cloud

A Cloudera Data Platform user for Gluent Data Platform is required. This user requires access to the Cloudera Data Platform environment and to cloud storage. Refer to Provision a CDP User for Gluent Data Platform.

Google BigQuery

A service account is required for use by Gluent Data Platform.

A role named GLUENT_OFFLOAD_ROLE should be created with the privileges listed below and assigned to the service account for use with Gluent Data Platform:

bigquery.datasets.create
bigquery.datasets.get
bigquery.jobs.create
bigquery.readsessions.create
bigquery.readsessions.getData
bigquery.tables.create
bigquery.tables.delete
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.tables.update
bigquery.tables.updateData
cloudkms.cryptoKeys.get
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list

Note

The cloudkms.cryptoKeys.get privilege is only required if customer-managed encryption keys (CMEK) for BigQuery will be used.

The creation of the service account and role allows the least privilege principle to be followed.
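To confirm that a custom role matches this list, you can export its permissions (for example, from the includedPermissions field printed by `gcloud iam roles describe`) and compare them with the list above. The sketch below performs that comparison in Python. It treats cloudkms.cryptoKeys.get as required only when CMEK is in use. The function name is hypothetical.

```python
# Permissions this guide lists for GLUENT_OFFLOAD_ROLE; the CMEK permission is kept separate.
REQUIRED_PERMISSIONS = frozenset({
    "bigquery.datasets.create", "bigquery.datasets.get", "bigquery.jobs.create",
    "bigquery.readsessions.create", "bigquery.readsessions.getData",
    "bigquery.tables.create", "bigquery.tables.delete", "bigquery.tables.get",
    "bigquery.tables.getData", "bigquery.tables.list", "bigquery.tables.update",
    "bigquery.tables.updateData",
    "storage.buckets.get", "storage.objects.create", "storage.objects.delete",
    "storage.objects.get", "storage.objects.list",
})
CMEK_PERMISSIONS = frozenset({"cloudkms.cryptoKeys.get"})


def missing_permissions(role_permissions, use_cmek: bool) -> list[str]:
    """Return, sorted, the permissions the role lacks for the chosen configuration."""
    required = (REQUIRED_PERMISSIONS | CMEK_PERMISSIONS) if use_cmek else REQUIRED_PERMISSIONS
    return sorted(required - set(role_permissions))
```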

Snowflake

A user is required for use by Gluent Data Platform.

A role must be created and granted sufficient privileges to operate with databases, integrations, stages, file formats, schemas, tables and views. The steps below assume the role name GLUENT_OFFLOAD_ROLE.

Some privileges are mandatory regardless of the preferred granularity of access.

Table 4: Mandatory Snowflake Privileges

GLUENT_OFFLOAD_ROLE
  • USAGE ON DATABASE <database>
  • USAGE ON INTEGRATION <integration>
  • USAGE ON WAREHOUSE <warehouse>
There are additional privileges that depend on how fine-grained the access requirements are.

One of the following, or a combination of the two, is required:

  1. Permissive access privileges to cover all existing and future requirements

  2. Restricted access privileges to cover specific existing requirements

Permissive Access Privileges

Allows Gluent Data Platform to:

  • Have global read access to all existing schemas, tables or views in the database

  • Have the ability to offload new tables to any new or existing schema in the Snowflake database, including the ability to create schemas as required

  • Present any existing table or view to the RDBMS for querying

Table 5: Permissive Snowflake Privileges

GLUENT_OFFLOAD_ROLE
  • CREATE SCHEMA ON DATABASE <database>
  • USAGE, CREATE STAGE, CREATE FILE FORMAT, CREATE TABLE, CREATE VIEW ON ALL SCHEMAS IN DATABASE <database>
  • USAGE, CREATE STAGE, CREATE FILE FORMAT, CREATE TABLE, CREATE VIEW ON FUTURE SCHEMAS IN DATABASE <database>
  • SELECT ON FUTURE TABLES IN DATABASE <database>
  • SELECT ON FUTURE VIEWS IN DATABASE <database>
  • SELECT ON ALL TABLES IN SCHEMA <existing_schema> [1]
  • SELECT ON ALL VIEWS IN SCHEMA <existing_schema> [1]

[1] For each existing schema that is to be used with Gluent Data Platform.
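You can sketch the permissive grants in the same way. The hypothetical helper below, like its identifier check, emits the database-wide statements from Table 5 once. It then emits the two SELECT statements from the footnote for each existing schema, using fully qualified <database>.<schema> names. Future grants apply only to objects created after the grant is made, so existing tables and views need the separate ALL TABLES and ALL VIEWS statements. Snowflake also documents that schema-level future grants take precedence over database-level ones for the same object type.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")
SCHEMA_PRIVILEGES = "USAGE, CREATE STAGE, CREATE FILE FORMAT, CREATE TABLE, CREATE VIEW"


def ident(name: str) -> str:
    """Accept only unquoted Snowflake identifiers (sketch assumption)."""
    if not IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def permissive_grants(database: str, existing_schemas: list[str],
                      role: str = "GLUENT_OFFLOAD_ROLE") -> list[str]:
    """Return the Table 5 grants for one database and its existing schemas."""
    db, r = ident(database), ident(role)
    statements = [
        f"GRANT CREATE SCHEMA ON DATABASE {db} TO ROLE {r};",
        f"GRANT {SCHEMA_PRIVILEGES} ON ALL SCHEMAS IN DATABASE {db} TO ROLE {r};",
        f"GRANT {SCHEMA_PRIVILEGES} ON FUTURE SCHEMAS IN DATABASE {db} TO ROLE {r};",
        f"GRANT SELECT ON FUTURE TABLES IN DATABASE {db} TO ROLE {r};",
        f"GRANT SELECT ON FUTURE VIEWS IN DATABASE {db} TO ROLE {r};",
    ]
    # Footnote [1]: existing tables and views need an explicit grant per schema.
    for schema in existing_schemas:
        qualified = f"{db}.{ident(schema)}"
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {qualified} TO ROLE {r};")
        statements.append(f"GRANT SELECT ON ALL VIEWS IN SCHEMA {qualified} TO ROLE {r};")
    return statements
```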

Restricted Access Privileges

Gluent Data Platform access may need to adhere to the principle of least privilege, for example by:

  • Having read access to a limited set of existing schemas and/or a limited set of existing tables or views

  • Offloading data to a limited set of new or existing schemas

In that case, any or all of the permissive privileges can be replaced with the restricted equivalents below.

If the CREATE SCHEMA privilege is not permitted, the following actions must be performed for every schema that Gluent Data Platform is to operate with.

For each new schema to be offloaded from the RDBMS to Snowflake:

CREATE SCHEMA <rdbms_schema_name> COMMENT = 'Offload schema for Gluent Data Platform';
GRANT OWNERSHIP ON SCHEMA <rdbms_schema_name> TO ROLE GLUENT_OFFLOAD_ROLE;

CREATE SCHEMA <rdbms_schema_name>_LOAD COMMENT = 'Staging schema for Gluent Data Platform';
GRANT OWNERSHIP ON SCHEMA <rdbms_schema_name>_LOAD TO ROLE GLUENT_OFFLOAD_ROLE;

For every existing Snowflake schema with tables or views to be presented to the RDBMS:

CREATE SCHEMA <existing_schema_name>_LOAD COMMENT = 'Staging schema for Gluent Data Platform';
GRANT OWNERSHIP ON SCHEMA <existing_schema_name>_LOAD TO ROLE GLUENT_OFFLOAD_ROLE;

Table 6: Restricted Snowflake Privileges

GLUENT_OFFLOAD_ROLE
  • USAGE, CREATE STAGE, CREATE FILE FORMAT, CREATE TABLE, CREATE VIEW ON SCHEMA <schema_name> [2]
  • SELECT ON TABLE|VIEW <schema_name>.<table|view_name> [3]

[2] For every schema that is to be used with Gluent Data Platform.

[3] For every existing table or view that is to be presented to an RDBMS with Gluent Data Platform.
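In the restricted model, you can generate the per-schema and per-object grants in Table 6 from an explicit allow-list. This sketch uses hypothetical helper names and assumes unquoted identifiers. Each object is given as a (kind, name) pair whose kind must be TABLE or VIEW, so the sketch grants nothing outside the list.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def ident(name: str) -> str:
    """Accept only unquoted Snowflake identifiers (sketch assumption)."""
    if not IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def restricted_grants(schema: str, objects: list[tuple[str, str]],
                      role: str = "GLUENT_OFFLOAD_ROLE") -> list[str]:
    """Return Table 6 grants for one schema and an allow-list of tables/views."""
    s, r = ident(schema), ident(role)
    statements = [
        f"GRANT USAGE, CREATE STAGE, CREATE FILE FORMAT, CREATE TABLE, CREATE VIEW "
        f"ON SCHEMA {s} TO ROLE {r};"  # footnote [2]
    ]
    for kind, name in objects:  # footnote [3]
        kind = kind.upper()
        if kind not in ("TABLE", "VIEW"):
            raise ValueError(f"kind must be TABLE or VIEW, not {kind!r}")
        statements.append(f"GRANT SELECT ON {kind} {s}.{ident(name)} TO ROLE {r};")
    return statements
```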

Hybrid Schemas

Gluent Data Platform creates a hybrid schema in the Oracle Database instance for each application schema that is offloaded. These accounts are created with 30-character random string passwords that are not stored.

The following grants are made to each hybrid schema:

Table 7: Hybrid Schema Grants

Hybrid schema
  • CREATE SESSION
  • CREATE ANY TRIGGER
  • CREATE MATERIALIZED VIEW
  • CREATE SEQUENCE
  • CREATE TABLE
  • CREATE VIEW
  • GLOBAL QUERY REWRITE
  • QUERY REWRITE
  • SELECT ANY TABLE
  • EXECUTE on SYS.DBMS_ADVANCED_REWRITE
  • EXECUTE on SYS.DBMS_FLASHBACK
  • GLUENT_OFFLOAD_ROLE

Note

GLUENT_ADM is granted CONNECT THROUGH for each hybrid schema.
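Hybrid schema passwords are not stored, so GLUENT_ADM presumably uses proxy authentication to work in those schemas. Oracle's `ALTER USER ... GRANT CONNECT THROUGH` allows a proxy user to open a session as the target user with the proxy's own credentials, using the proxy_user[target_user] connect syntax. The sketch builds both strings. The hybrid schema name SH_H used below is a hypothetical example.

```python
import re

# Unquoted Oracle identifiers: a letter first, then letters, digits, _, $ or #.
ORACLE_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")


def ident(name: str) -> str:
    """Validate an unquoted identifier and return it upper-cased, as Oracle stores it."""
    if not ORACLE_IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name.upper()


def proxy_grant_sql(hybrid_schema: str, proxy_user: str = "GLUENT_ADM") -> str:
    """Statement allowing proxy_user to connect as hybrid_schema."""
    return f"ALTER USER {ident(hybrid_schema)} GRANT CONNECT THROUGH {ident(proxy_user)}"


def proxy_login(hybrid_schema: str, proxy_user: str = "GLUENT_ADM") -> str:
    """User name for a proxy connection, e.g. in SQL*Plus: proxy_user[target_user]."""
    return f"{ident(proxy_user)}[{ident(hybrid_schema)}]"
```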

Encryption

Password Encryption

Gluent Data Platform supports password encryption as follows.

Table 8: Password Encryption Support

  • Backend scope: All
    Password source: Gluent Data Platform Environment File
    Details: Clear-text passwords and passphrases stored in this file can be encrypted. Refer to Gluent Data Platform Environment File Passwords.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Password source: Sqoop Command Line
    Details: Using the Hadoop Credential Provider API to authenticate to Oracle Database instances during the data transport phase of Offload prevents the clear-text password from being exposed on the Sqoop command line. Refer to Hadoop Credential Provider API.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Password source: Sqoop Command Line
    Details: Using Oracle Wallet to authenticate to Oracle Database instances during the data transport phase of Offload prevents the clear-text password from being exposed on the Sqoop command line. Refer to Oracle Wallet.

  • Backend scope: Google BigQuery, Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud, Snowflake
    Password source: Spark Command Line
    Details: Using Oracle Wallet to authenticate to Oracle Database instances during the data transport phase of Offload prevents the clear-text password from being exposed on the Spark command line. Refer to Oracle Wallet.
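As an illustration of the Hadoop Credential Provider approach: the Hadoop command `hadoop credential create <alias> -provider <path>` stores a password under an alias in a keystore. When no -value option is supplied, the command prompts for the password, so the secret never appears in the process list. The Python sketch below only assembles that command. The alias and the jceks:// provider path are hypothetical. Hadoop Credential Provider API describes how to point Gluent Data Platform at the alias.

```python
import shlex


def credential_create_argv(alias: str, provider_path: str) -> list[str]:
    """Build `hadoop credential create`; omitting -value makes Hadoop prompt for the password."""
    if not alias or any(ch.isspace() for ch in alias):
        raise ValueError("alias must be a non-empty token without whitespace")
    return ["hadoop", "credential", "create", alias, "-provider", provider_path]


def display(argv: list[str]) -> str:
    """Render an argv list as a shell-safe command line."""
    return shlex.join(argv)


if __name__ == "__main__":
    argv = credential_create_argv("gluent.app.password",
                                  "jceks://hdfs/user/gluent/gluent_app.jceks")
    print(display(argv))
    # Passing argv to subprocess.run(argv, check=True) would run it interactively.
```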

Network Encryption

Gluent Data Platform supports network encryption as follows.

Table 9: Network Encryption Support

  • Backend scope: All
    Software engine / component: Data Daemon ↔ Metadata Daemon, Smart Connector
    Details: TLS encryption in transit can be enabled (not enabled by default). Refer to Securing Data Daemon.

  • Backend scope: All
    Software engine / component: Gluent Offload Engine ↔ Oracle Database
    Details: Oracle Native Network Encryption can be enabled (not enabled by default). Refer to Oracle Native Network Encryption.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Software engine / component: Sqoop, YARN, Spark ↔ Oracle Database
    Details: Oracle Native Network Encryption can be enabled (not enabled by default). Refer to Oracle Native Network Encryption, Sqoop Encryption and Data Integrity in Transit, and Spark Encryption and Data Integrity in Transit.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Software engine / component: Data Daemon, Gluent Offload Engine ↔ Impala
    Details: TLS encryption in transit to Impala is supported. Refer to SSL_ACTIVE and SSL_TRUSTED_CERTS.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud
    Software engine / component: Gluent Offload Engine ↔ WebHDFS
    Details: TLS encryption in transit to WebHDFS is supported. Refer to WEBHDFS_VERIFY_SSL.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud
    Software engine / component: Data Daemon ↔ HDFS
    Details: TLS encryption in transit to Secure DataNodes is supported. Refer to HDFS Client Configuration File (CDH) and HDFS Client Configuration File (CDP Private Cloud).

  • Backend scope: Google BigQuery
    Software engine / component: Data Daemon ↔ Google BigQuery
    Details: TLS encryption in transit is enabled by default for Google Cloud Platform. No Gluent Data Platform configuration is required.

  • Backend scope: Google BigQuery, Snowflake
    Software engine / component: Spark ↔ Oracle Database
    Details: Oracle Native Network Encryption can be enabled (not enabled by default). Refer to Oracle Native Network Encryption and Spark Encryption and Data Integrity in Transit.

  • Backend scope: Google BigQuery
    Software engine / component: Spark ↔ Google Cloud Storage
    Details: TLS encryption in transit is enabled by default for Google Cloud Platform. No Gluent Data Platform configuration is required.

  • Backend scope: Google BigQuery
    Software engine / component: Gluent Offload Engine ↔ Google BigQuery, Google Cloud Storage
    Details: TLS encryption in transit is enabled by default for Google Cloud Platform. No Gluent Data Platform configuration is required.

  • Backend scope: Snowflake
    Software engine / component: Gluent Offload Engine, Data Daemon ↔ Snowflake
    Details: TLS encryption in transit is enabled by default for Snowflake. No Gluent Data Platform configuration is required.

  • Backend scope: Snowflake
    Software engine / component: Gluent Offload Engine, Spark ↔ Amazon S3
    Details: TLS encryption in transit is enabled by default for Amazon S3. No Gluent Data Platform configuration is required.

  • Backend scope: Snowflake
    Software engine / component: Gluent Offload Engine, Spark ↔ Google Cloud Storage
    Details: TLS encryption in transit is enabled by default for Google Cloud Platform. No Gluent Data Platform configuration is required.

  • Backend scope: Snowflake
    Software engine / component: Gluent Offload Engine, Spark ↔ Microsoft Azure Storage
    Details: TLS encryption in transit is enabled by default for Microsoft Azure Storage. No Gluent Data Platform configuration is required.

Note

In addition to Oracle Native Encryption, Gluent Data Platform supports TLS enabled connections to Oracle Database instances for both Gluent Data Platform and backend components acting on behalf of Gluent Data Platform. Contact Gluent Support for further details.
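For Oracle Native Network Encryption, the client side is typically controlled by sqlnet.ora parameters such as SQLNET.ENCRYPTION_CLIENT (ACCEPTED, REJECTED, REQUESTED or REQUIRED) and SQLNET.ENCRYPTION_TYPES_CLIENT. The sketch below renders those two lines and validates the inputs. REQUIRED and AES256 are example choices. The negotiated outcome also depends on the server's SQLNET.ENCRYPTION_SERVER setting. Refer to Oracle Native Network Encryption for the supported Gluent Data Platform configuration.

```python
VALID_LEVELS = ("ACCEPTED", "REJECTED", "REQUESTED", "REQUIRED")


def sqlnet_encryption(level: str = "REQUIRED",
                      algorithms: tuple[str, ...] = ("AES256",)) -> str:
    """Return client-side sqlnet.ora lines for native network encryption."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}")
    if not algorithms:
        raise ValueError("at least one algorithm is required")
    return (
        f"SQLNET.ENCRYPTION_CLIENT = {level}\n"
        f"SQLNET.ENCRYPTION_TYPES_CLIENT = ({', '.join(algorithms)})\n"
    )
```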

Encryption at Rest

Gluent Data Platform supports encryption at rest as follows.

Table 10: Encryption at Rest Support

  • Scope: Oracle Database
    Details: Oracle Transparent Data Encryption (TDE). No Gluent Data Platform configuration is required.

  • Scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Details: HDFS Transparent Encryption (Encryption Zones). Configuration may be required depending on the HDFS_LOAD location. Refer to HDFS Client Configuration File (CDH) and HDFS Client Configuration File (CDP Private Cloud).

  • Scope: Google BigQuery
    Details: Google Cloud Platform encrypts and decrypts all data written to disk by default. Encryption keys can be managed by Google or by customers. Using customer-managed encryption keys (CMEK) with Gluent Data Platform requires values for GOOGLE_KMS_KEY_NAME, GOOGLE_KMS_KEY_RING_NAME and GOOGLE_KMS_KEY_RING_LOCATION. For Google-managed encryption, no Gluent Data Platform configuration is required.

  • Scope: Snowflake
    Details: Snowflake encrypts and decrypts all data written to disk by default. Encryption keys can be managed by Snowflake or by customers. No Gluent Data Platform configuration is required.

  • Scope: Amazon S3
    Details: Amazon S3 server-side bucket encryption transparently encrypts and decrypts all data written to disk by default. No Gluent Data Platform configuration is required.

  • Scope: Google Cloud Storage
    Details: Google Cloud Platform transparently encrypts and decrypts all data written to disk by default. No Gluent Data Platform configuration is required.

  • Scope: Microsoft Azure Storage
    Details: Microsoft Azure Storage Service Encryption (SSE) transparently encrypts and decrypts all data written to disk by default. No Gluent Data Platform configuration is required.
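Cloud KMS identifies a key by a resource name of the form projects/PROJECT/locations/LOCATION/keyRings/RING/cryptoKeys/KEY. The sketch below shows how the three Gluent Data Platform settings could map onto that name. It assumes that GOOGLE_KMS_KEY_NAME holds the short key name. The project ID is an extra input, because it is not one of the three variables, and the environment lookup is only an illustration. BigQuery also expects the key's location to match the dataset location.

```python
import os
from typing import Mapping, Optional


def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Compose a Cloud KMS crypto key resource name."""
    for part in (project, location, key_ring, key):
        if not part or "/" in part:
            raise ValueError(f"invalid resource name component: {part!r}")
    return f"projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{key}"


def kms_key_from_env(project: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build the key name from the GOOGLE_KMS_* values (project supplied separately)."""
    env = os.environ if env is None else env
    return kms_key_name(
        project,
        env["GOOGLE_KMS_KEY_RING_LOCATION"],
        env["GOOGLE_KMS_KEY_RING_NAME"],
        env["GOOGLE_KMS_KEY_NAME"],
    )
```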

Data Integrity

Gluent Data Platform supports Oracle Network Data Integrity for ensuring the integrity of data in transit for connections to Oracle Database instances for both Gluent Offload Engine and Sqoop or Spark components acting on behalf of Gluent Data Platform. Refer to Oracle Network Data Integrity.
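For JDBC-based transport (Sqoop or Spark), the Oracle thin driver accepts data integrity settings as connection properties, notably oracle.net.crypto_checksum_client and oracle.net.crypto_checksum_types_client. The sketch below assembles these properties and renders them as key=value lines, the format of the properties file that Sqoop's --connection-param-file option reads. REQUIRED and SHA256 are example values. Oracle Network Data Integrity describes the supported Gluent Data Platform setup.

```python
VALID_LEVELS = ("ACCEPTED", "REJECTED", "REQUESTED", "REQUIRED")


def integrity_properties(level: str = "REQUIRED",
                         algorithms: tuple[str, ...] = ("SHA256",)) -> dict[str, str]:
    """Oracle JDBC thin properties requesting network data integrity (checksums)."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}")
    if not algorithms:
        raise ValueError("at least one checksum algorithm is required")
    return {
        "oracle.net.crypto_checksum_client": level,
        "oracle.net.crypto_checksum_types_client": f"({', '.join(algorithms)})",
    }


def to_properties_file(props: dict[str, str]) -> str:
    """Render key=value lines, sorted by key for stable output."""
    return "".join(f"{key}={value}\n" for key, value in sorted(props.items()))
```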

Authentication

Gluent Data Platform supports authentication as follows.

Table 11: Authentication Support

  • Backend scope: All
    Software engine / component: Metadata Daemon, Gluent Offload Engine → Oracle Database
    Details: Password-based and Oracle Wallet-based authentication to Oracle Database instances. Refer to Gluent Data Platform Oracle Wallet Authentication.

  • Backend scope: All
    Software engine / component: Data Daemon
    Details: Password-based authentication to the web interface. Refer to Configuring Data Daemon Web Interface.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud
    Software engine / component: Data Daemon, Gluent Offload Engine → HDFS, Impala
    Details: SASL/GSSAPI (Kerberos) authentication to HDFS (including Secure DataNodes) and Impala. Refer to KERBEROS_KEYTAB, KERBEROS_PRINCIPAL, KERBEROS_SERVICE, KERBEROS_TICKET_CACHE_PATH, HDFS Client Configuration File (CDH) and HDFS Client Configuration File (CDP Private Cloud).

  • Backend scope: Cloudera Data Platform Public Cloud
    Software engine / component: Data Daemon, Gluent Offload Engine → Impala
    Details: Password-based authentication directly to Data Warehouse, and to Data Hub via Apache Knox Gateway.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Software engine / component: Sqoop → Oracle Database
    Details: Sqoop password file authentication to Oracle Database instances during the data transport phase of Offload. Refer to Sqoop Password File.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Software engine / component: Sqoop → Oracle Database
    Details: Hadoop Credential Provider API authentication to Oracle Database instances during the data transport phase of Offload. Refer to Hadoop Credential Provider API.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Software engine / component: Sqoop, Spark → Oracle Database
    Details: Oracle Wallet authentication to Oracle Database instances during the data transport phase of Offload. Refer to the Oracle Wallet sections for Sqoop and for Spark.

  • Backend scope: Google BigQuery
    Software engine / component: Gluent Offload Engine, Data Daemon → Google BigQuery, Google Cloud Storage
    Details: Private key-based authentication to Google Cloud Platform.

  • Backend scope: Google BigQuery, Snowflake
    Software engine / component: Spark → Oracle Database
    Details: Oracle Wallet authentication to Oracle Database instances during the data transport phase of Offload. Refer to Oracle Wallet.

  • Backend scope: Google BigQuery
    Software engine / component: Spark ↔ Google Cloud Storage
    Details: Private key-based authentication to Google Cloud Platform.

  • Backend scope: Snowflake
    Software engine / component: Gluent Offload Engine, Data Daemon → Snowflake
    Details: Authentication to Snowflake by password, by RSA key or by passphrase-protected RSA key.

  • Backend scope: Snowflake
    Software engine / component: Gluent Offload Engine, Spark ↔ Amazon S3
    Details: IAM role authentication, or access key-based authentication to Amazon S3 using either environment variables or an AWS credentials file.

  • Backend scope: Snowflake
    Software engine / component: Gluent Offload Engine, Spark ↔ Google Cloud Storage
    Details: Private key-based authentication to Google Cloud Platform.

  • Backend scope: Snowflake
    Software engine / component: Gluent Offload Engine, Spark ↔ Microsoft Azure Storage
    Details: Storage account key-based authentication to Microsoft Azure.
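Where access keys are used for Amazon S3, it can help to confirm which source will supply them. The sketch below follows the usual AWS SDK order: it checks the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables first, then a profile in the shared credentials file (~/.aws/credentials, INI format). It cannot detect IAM role credentials, which come from the instance or cluster rather than from local settings. The function name and the default profile are assumptions.

```python
import configparser
import os
from typing import Mapping, Optional


def s3_credential_source(env: Optional[Mapping[str, str]] = None,
                         credentials_text: Optional[str] = None,
                         profile: str = "default") -> Optional[str]:
    """Return "environment", "credentials file" or None if no access key is found."""
    env = os.environ if env is None else env
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    if credentials_text is None:
        path = os.path.expanduser("~/.aws/credentials")
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                credentials_text = fh.read()
    if credentials_text:
        parser = configparser.ConfigParser()
        parser.read_string(credentials_text)
        if (parser.has_section(profile)
                and parser.has_option(profile, "aws_access_key_id")
                and parser.has_option(profile, "aws_secret_access_key")):
            return "credentials file"
    return None
```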

Authorization

Gluent Data Platform supports authorization as follows.

Table 12: Authorization Support

  • Backend scope: All
    Software engine / component: Oracle Database
    Details: Oracle Database’s authorization mechanism is the primary access control in an environment where Gluent Data Platform is used, because Smart Connector is invoked by accessing a table in an Oracle Database instance that has previously been offloaded or presented. The principle of least privilege is followed. Refer to Oracle Database.

  • Backend scope: Cloudera Data Hub
    Software engine / component: Impala
    Details: Authorization in Impala is controlled by Sentry. The Gluent Data Platform user requires Sentry privileges to function. Refer to Sentry.

  • Backend scope: Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Software engine / component: Impala
    Details: Authorization in Impala is controlled by Ranger. The Gluent Data Platform user requires Ranger privileges to function. Refer to Ranger Privileges.

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Software engine / component: HDFS
    Details: The Gluent Data Platform user must have read and write access to HDFS locations. Refer to Create HDFS Directories (CDH), Create HDFS Directories (CDP Private Cloud) and Create HDFS Directories (CDP Public Cloud).

  • Backend scope: Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud
    Software engine / component: User-Defined Functions (UDFs)
    Details: A user with authorization to create UDFs in Impala is required. This does not have to be the Gluent Data Platform user, but the Gluent Data Platform user must be able to access the UDFs once they are installed. Refer to Creation of User-Defined Functions (CDH), Creation of User-Defined Functions (CDP Private Cloud) and Creation of User-Defined Functions (CDP Public Cloud).

  • Backend scope: Google BigQuery
    Software engine / component: Google BigQuery API
    Details: The service account used by Gluent Data Platform requires authorization to interact with Google BigQuery datasets. The principle of least privilege is followed. Refer to Google BigQuery.

  • Backend scope: Google BigQuery
    Software engine / component: Google Cloud Storage
    Details: The service account used by Gluent Data Platform requires authorization to interact with the Google Cloud Storage underlying Google BigQuery tables. The principle of least privilege is followed. Refer to Google BigQuery.

  • Backend scope: Google BigQuery
    Software engine / component: Spark
    Details: The service account used by Gluent Data Platform requires authorization to write to a Google Cloud Storage bucket during the data transport phase of Offload. Refer to Bucket.

  • Backend scope: Snowflake
    Software engine / component: Snowflake API
    Details: The Gluent Data Platform user requires authorization to interact with Snowflake databases, integrations, stages, file formats, schemas, tables and views. The principle of least privilege can be followed. Refer to Snowflake.

  • Backend scope: Snowflake
    Software engine / component: Spark
    Details: The account or IAM role used by Gluent Data Platform requires authorization to write to a cloud storage provider bucket or container during the data transport phase of Offload. Refer to Cloud Storage.

Documentation Feedback

Send feedback on this documentation to: feedback@gluent.com