Security¶
Documentation Conventions¶
Commands and keywords are in this font.
$OFFLOAD_HOME is set when the environment file (offload.env) is sourced, unless already set, and refers to the directory named offload that is created when the software is unpacked. This is also referred to as <OFFLOAD_HOME> in sections of this guide where the environment file has not been created/sourced.
Third party vendor product names might be aliased or shortened for simplicity. See Third Party Vendor Products for cross-references to full product names and trademarks.
Introduction¶
Gluent Data Platform acts as an interface between traditional proprietary relational database systems and backend data platforms. As such, Gluent Data Platform utilizes the security features of these systems.
In a data lake architecture, where large volumes of business data are brought together in a single system, it is more important than ever to apply a diligent approach to security. The remainder of this guide covers security functionality in Gluent Data Platform and how Gluent Data Platform interacts with other systems.
Detailed configuration of security features in Oracle Database and backend data platforms is beyond the scope of this guide.
System Accounts¶
In order for Gluent Data Platform to operate, system accounts are required in both the RDBMS (such as Oracle Database) and backend data platforms (such as Cloudera Data Hub and Google BigQuery).
Oracle Database¶
In the source Oracle Database instance the following users, roles and privileges are provisioned during installation (see Install Oracle Database Components):
Table 1: Oracle Database Users¶
| Username | Purpose |
|---|---|
| GLUENT_ADM | An administrative user with access to create objects in the hybrid schema |
| GLUENT_APP | A read-only user |
| GLUENT_REPO | Owner schema of Gluent Metadata Repository |

Note

GLUENT is the default system user prefix in the Oracle Database instance but this can be changed during installation if desired. The suffixes of _ADM, _APP and _REPO are mandatory.
At installation time the above accounts are created with 30-character random string passwords. The passwords are not retained and the installing administrator is expected to set the passwords for the “ADM” and “APP” accounts to known values as part of the configuration process.
The creation of GLUENT_ADM, GLUENT_APP and GLUENT_REPO allows the least privilege principle to be followed.
Table 2: Oracle Database Roles¶
| Role | Grants |
|---|---|
Table 3: Oracle Database User Privileges¶
| User | Grants |
|---|---|
Oracle Database Server¶
Gluent Data Platform must be installed as the same user that owns the Oracle software (typically oracle).
Metadata Daemon runs on the Oracle Database server and under certain conditions is required to run as the root user. Refer to Metadata Daemon OS User.
Azure Synapse Analytics¶
A user is required for use by Gluent Data Platform.
A role must be created and granted sufficient privileges to be able to operate with data sources, file formats, schemas, tables and views. The role name gluent_offload_role is assumed in the steps below.
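Creating the role and adding the Gluent Data Platform user to it can be sketched as follows (an illustrative sketch only; gluent_user is a placeholder name, not from this guide):

```sql
-- Illustrative sketch: create the role and add the Gluent Data Platform
-- user to it ([gluent_user] is a placeholder name).
CREATE ROLE gluent_offload_role;
ALTER ROLE gluent_offload_role ADD MEMBER [gluent_user];
```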
There are mandatory privileges that are needed regardless of the granularity of access preferred.
Table 4: Mandatory Azure Synapse Analytics Privileges¶
| Role | Grants |
|---|---|
There are additional privileges that depend on how fine-grained the access requirements are.
Either one of the following is required:

- Permissive access privileges for schema creation
- Restricted access privileges for schema creation
Permissive Access Privileges¶
Allows Gluent Data Platform to create schemas as required.
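The key grant in this mode is database-level schema creation. A minimal sketch, assuming the role name used above (the full privilege list may differ):

```sql
-- Illustrative sketch: allow the role to create schemas as required.
GRANT CREATE SCHEMA TO gluent_offload_role;
```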
Restricted Access Privileges¶
If Gluent Data Platform access needs to adhere to the principle of least privilege, such that it is not permitted to create schemas as required, then the schemas must be created manually as follows.
For each new schema to be offloaded from the RDBMS to Synapse:
CREATE SCHEMA <rdbms_schema_name> AUTHORIZATION <synapse_user>;
CREATE SCHEMA <rdbms_schema_name>_load AUTHORIZATION <synapse_user>;
For every existing Azure Synapse Analytics schema with tables or views to be presented to the RDBMS:
CREATE SCHEMA <existing_schema_name>_load AUTHORIZATION <synapse_user>;
If the use of the AUTHORIZATION clause when creating schemas is not permitted, then the following actions must be performed after creating the schemas.
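One equivalent approach (an illustrative sketch, not necessarily the documented procedure) is to transfer ownership of each schema to the Gluent Data Platform user after creation:

```sql
-- Illustrative sketch: transfer ownership of schemas created without
-- the AUTHORIZATION clause to the Gluent Data Platform user.
ALTER AUTHORIZATION ON SCHEMA::<rdbms_schema_name> TO <synapse_user>;
ALTER AUTHORIZATION ON SCHEMA::<rdbms_schema_name>_load TO <synapse_user>;
```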
Cloudera Data Hub / Cloudera Data Platform Private Cloud¶
A Gluent Data Platform OS user (typically named gluent, although any valid operating system username is supported) is required on the node(s) on which Gluent Offload Engine will be run. There are no specific group membership requirements for this user. Refer to Provision a Gluent Data Platform OS User (CDH) and Provision a Gluent Data Platform OS User (CDP Private Cloud).
A Kerberos principal may be required either for password-less SSH or for authentication with a Kerberized cluster.
An LDAP user may be required for authentication with an LDAP-enabled Impala.
Cloudera Data Platform Public Cloud¶
A Cloudera Data Platform user for Gluent Data Platform is required. This user requires access to the Cloudera Data Platform environment and to cloud storage. Refer to Provision a CDP User for Gluent Data Platform.
Google BigQuery¶
A service account is required for use by Gluent Data Platform.
A role named GLUENT_OFFLOAD_ROLE should be created with the privileges listed below and assigned to the service account for use with Gluent Data Platform:
bigquery.datasets.create
bigquery.datasets.get
bigquery.jobs.create
bigquery.readsessions.create
bigquery.readsessions.getData
bigquery.tables.create
bigquery.tables.delete
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.tables.update
bigquery.tables.updateData
cloudkms.cryptoKeys.get
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list
Note

The cloudkms.cryptoKeys.get privilege is only required if customer-managed encryption keys (CMEK) for BigQuery will be used.
The creation of the service account and role allows the least privilege principle to be followed.
Snowflake¶
A user is required for use by Gluent Data Platform.
A role must be created and granted sufficient privileges to be able to operate with databases, integrations, stages, file formats, schemas, tables and views. The role name GLUENT_OFFLOAD_ROLE is assumed in the steps below.
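Creating the role and granting it to the Gluent Data Platform user can be sketched as follows (an illustrative sketch; <snowflake_user> is a placeholder name):

```sql
-- Illustrative sketch: create the role and grant it to the Gluent
-- Data Platform user (<snowflake_user> is a placeholder name).
CREATE ROLE GLUENT_OFFLOAD_ROLE;
GRANT ROLE GLUENT_OFFLOAD_ROLE TO USER <snowflake_user>;
```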
There are mandatory privileges that are needed regardless of the granularity of access preferred.
Table 7: Mandatory Snowflake Privileges¶
| Role | Grants |
|---|---|
There are additional privileges that depend on how fine-grained the access requirements are.
Either one of the following, or a combination of the two, is required:

- Permissive access privileges to cover all existing and future requirements
- Restricted access privileges to cover specific existing requirements
Permissive Access Privileges¶
Allows Gluent Data Platform to:

- Have global read access to all existing schemas, tables or views in the database
- Offload new tables to any new or existing schema in the Snowflake database, including the ability to create schemas as required
- Present any existing table or view to the RDBMS for querying
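The capabilities above correspond to database-wide grants. A minimal sketch (illustrative only; <database> is a placeholder and the exact grant list may differ):

```sql
-- Illustrative sketch of database-wide grants for permissive access.
GRANT USAGE ON DATABASE <database> TO ROLE GLUENT_OFFLOAD_ROLE;
GRANT CREATE SCHEMA ON DATABASE <database> TO ROLE GLUENT_OFFLOAD_ROLE;
GRANT USAGE ON ALL SCHEMAS IN DATABASE <database> TO ROLE GLUENT_OFFLOAD_ROLE;
GRANT SELECT ON ALL TABLES IN DATABASE <database> TO ROLE GLUENT_OFFLOAD_ROLE;
GRANT SELECT ON FUTURE TABLES IN DATABASE <database> TO ROLE GLUENT_OFFLOAD_ROLE;
```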
Table 8: Permissive Snowflake Privileges¶
| Role | Grants |
|---|---|
Restricted Access Privileges¶
If Gluent Data Platform access needs to adhere to the principle of least privilege, such as:

- Having read access to limited existing schemas and/or limited existing tables or views
- Offloading data to a limited set of new or existing schemas
Any or all of the permissive privileges can be replaced with the restricted equivalent below.
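For example, database-wide read grants can be replaced with schema- or table-scoped equivalents (an illustrative sketch with placeholder names):

```sql
-- Illustrative sketch: scope read access to a specific schema and table
-- instead of the whole database.
GRANT USAGE ON SCHEMA <database>.<schema> TO ROLE GLUENT_OFFLOAD_ROLE;
GRANT SELECT ON TABLE <database>.<schema>.<table> TO ROLE GLUENT_OFFLOAD_ROLE;
```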
If the CREATE SCHEMA privilege is not permitted, the following actions must be performed for every schema that Gluent Data Platform is to operate with.
For each new schema to be offloaded from the RDBMS to Snowflake:
CREATE SCHEMA <rdbms_schema_name> COMMENT = 'Offload schema for Gluent Data Platform';
GRANT OWNERSHIP ON SCHEMA <rdbms_schema_name> TO GLUENT_OFFLOAD_ROLE;
CREATE SCHEMA <rdbms_schema_name>_LOAD COMMENT = 'Staging schema for Gluent Data Platform';
GRANT OWNERSHIP ON SCHEMA <rdbms_schema_name>_LOAD TO GLUENT_OFFLOAD_ROLE;
For every existing Snowflake schema with tables or views to be presented to the RDBMS:
CREATE SCHEMA <existing_schema_name>_LOAD COMMENT = 'Staging schema for Gluent Data Platform';
GRANT OWNERSHIP ON SCHEMA <existing_schema_name>_LOAD TO GLUENT_OFFLOAD_ROLE;
Hybrid Schemas¶
Gluent Data Platform creates a hybrid schema in the Oracle Database instance for each application schema that is offloaded. These accounts are created with 30-character random string passwords that are not stored.
The following grants are made to each hybrid schema:
Table 10: Hybrid Schema Grants¶
| User | Grants |
|---|---|
| Hybrid schema | |
Note

GLUENT_ADM is granted CONNECT THROUGH for each hybrid schema.
Encryption¶
Password Encryption¶
Gluent Data Platform supports password encryption as follows.
Table 11: Password Encryption Support¶
| Backend Scope | Password Source | Details |
|---|---|---|
| All | Gluent Data Platform Environment File | Clear-text passwords and passphrases stored in this file can be encrypted using Gluent Data Platform Environment File Passwords |
| All | Spark Command Line | Oracle Wallet authentication to Oracle Database instances during the data transport phase of Offload prevents clear-text passwords from being exposed on the Spark command line. Refer to Oracle Wallet |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | Sqoop Command Line | Hadoop Credential Provider API authentication to Oracle Database instances during the data transport phase of Offload prevents clear-text passwords from being exposed on the Sqoop command line. Refer to Hadoop Credential Provider API |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | Sqoop Command Line | Oracle Wallet authentication to Oracle Database instances during the data transport phase of Offload prevents clear-text passwords from being exposed on the Sqoop command line. Refer to Oracle Wallet |
Network Encryption¶
Gluent Data Platform supports network encryption as follows.
Table 12: Network Encryption Support¶
| Backend Scope | Software Engine / Component | Details |
|---|---|---|
| All | Data Daemon ↔ Metadata Daemon, Smart Connector | TLS encryption in transit can be enabled (not enabled by default). Refer to Securing Data Daemon |
| All | Gluent Offload Engine ↔ Oracle Database | Oracle Native Encryption connections can be enabled (not enabled by default). Refer to Oracle Native Network Encryption |
| Azure Synapse Analytics | Gluent Offload Engine, Data Daemon ↔ SQL Pool | TLS encryption in transit is enabled by default for Azure Synapse Analytics. No Gluent Data Platform configuration is required |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | Sqoop, YARN, Spark ↔ Oracle Database | Oracle Native Encryption connections can be enabled (not enabled by default). Refer to Oracle Native Network Encryption, Sqoop Encryption and Data Integrity in Transit and Spark Encryption and Data Integrity in Transit |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | Data Daemon, Gluent Offload Engine ↔ Impala | TLS encryption in transit to Impala is supported |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud | Gluent Offload Engine ↔ WebHDFS | TLS encryption in transit to WebHDFS is supported |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud | Data Daemon ↔ HDFS | TLS encryption in transit to Secure DataNodes is supported. Refer to HDFS Client Configuration File (CDH) and HDFS Client Configuration File (CDP Private Cloud) |
| Google BigQuery | Data Daemon ↔ Google BigQuery | TLS encryption in transit is enabled by default for Google Cloud Platform. No Gluent Data Platform configuration is required |
| Azure Synapse Analytics, Google BigQuery, Snowflake | Spark ↔ Oracle Database | Oracle Native Encryption can be enabled (not enabled by default). Refer to Oracle Native Network Encryption and Spark Encryption and Data Integrity in Transit |
| Google BigQuery | Spark ↔ Google Cloud Storage | TLS encryption in transit is enabled by default for Google Cloud Platform. No Gluent Data Platform configuration is required |
| Google BigQuery | Gluent Offload Engine ↔ Google BigQuery, Google Cloud Storage | TLS encryption in transit is enabled by default for Google Cloud Platform. No Gluent Data Platform configuration is required |
| Snowflake | Gluent Offload Engine, Data Daemon ↔ Snowflake | TLS encryption in transit is enabled by default for Snowflake. No Gluent Data Platform configuration is required |
| Snowflake | Gluent Offload Engine, Spark ↔ Amazon S3 | TLS encryption in transit is enabled by default for Amazon S3. No Gluent Data Platform configuration is required |
| Snowflake | Gluent Offload Engine, Spark ↔ Google Cloud Storage | TLS encryption in transit is enabled by default for Google Cloud Platform. No Gluent Data Platform configuration is required |
| Azure Synapse Analytics, Snowflake | Gluent Offload Engine, Spark ↔ Microsoft Azure Storage | TLS encryption in transit is enabled by default for Microsoft Azure Storage. No Gluent Data Platform configuration is required |
Note
In addition to Oracle Native Encryption, Gluent Data Platform supports TLS enabled connections to Oracle Database instances for both Gluent Data Platform and backend components acting on behalf of Gluent Data Platform. Contact Gluent Support for further details.
Encryption at Rest¶
Gluent Data Platform supports encryption at rest as follows.
Table 13: Encryption at Rest Support¶
| Scope | Details |
|---|---|
| Oracle Database | Oracle Transparent Data Encryption (TDE). No Gluent Data Platform configuration is required |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | HDFS Transparent Encryption (Encryption Zones). Configuration may be required depending on the |
| Azure Synapse Analytics | Azure Synapse Analytics SQL Pool Transparent Data Encryption (TDE). No Gluent Data Platform configuration is required |
| Google BigQuery | Google Cloud Platform encrypts and decrypts all data written to disk by default. Encryption keys can be managed by Google or by customers. The use of customer-managed encryption keys (CMEK) by Gluent Data Platform requires values for |
| Snowflake | Snowflake encrypts and decrypts all data written to disk by default. Encryption keys can be managed by Snowflake or by customers. No Gluent Data Platform configuration is required |
| Amazon S3 | Amazon S3 server-side bucket encryption transparently encrypts and decrypts all data written to disk by default. No Gluent Data Platform configuration is required |
| Google Cloud Storage | Google Cloud Platform transparently encrypts and decrypts all data written to disk by default. No Gluent Data Platform configuration is required |
| Microsoft Azure Storage | Microsoft Azure Storage Service Encryption (SSE) transparently encrypts and decrypts all data written to disk by default. No Gluent Data Platform configuration is required |
Data Integrity¶
Gluent Data Platform supports Oracle Network Data Integrity for ensuring the integrity of data in transit for connections to Oracle Database instances for both Gluent Offload Engine and Sqoop or Spark components acting on behalf of Gluent Data Platform. Refer to Oracle Network Data Integrity.
Authentication¶
Gluent Data Platform supports authentication as follows.
Table 14: Authentication Support¶
| Backend Scope | Software Engine / Component | Details |
|---|---|---|
| All | Metadata Daemon, Gluent Offload Engine → Oracle Database | Password and Oracle Wallet based authentication to Oracle Database instances. Refer to Gluent Data Platform Oracle Wallet Authentication |
| All | Data Daemon | Password based authentication to web interface. Refer to Configuring Data Daemon Web Interface |
| Azure Synapse Analytics | Data Daemon, Gluent Offload Engine → SQL Pool | Any of the following authentication mechanisms: 1. SQL authentication; 2. Azure Active Directory password authentication; 3. Azure Active Directory managed service identity authentication; 4. Azure Active Directory service principal authentication |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud | Data Daemon, Gluent Offload Engine → HDFS, Impala | SASL/GSSAPI (Kerberos) authentication to HDFS (including Secure DataNodes) and Impala |
| Cloudera Data Platform Public Cloud | Data Daemon, Gluent Offload Engine → Impala | Password based authentication directly to Data Warehouse, and to Data Hub via Apache Knox Gateway |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | Sqoop → Oracle Database | Sqoop password file authentication to Oracle Database instances during the data transport phase of Offload. Refer to Sqoop Password File |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | Sqoop → Oracle Database | Hadoop Credential Provider API authentication to Oracle Database instances during the data transport phase of Offload. Refer to Hadoop Credential Provider API |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | Sqoop, Spark → Oracle Database | Oracle Wallet authentication to Oracle Database instances during the data transport phase of Offload. Refer to Oracle Wallet |
| Google BigQuery | Gluent Offload Engine, Data Daemon → Google BigQuery, Google Cloud Storage | Private key based authentication to Google Cloud Platform |
| Google BigQuery, Snowflake | Spark → Oracle Database | Oracle Wallet authentication to Oracle Database instances during the data transport phase of Offload. Refer to Oracle Wallet |
| Google BigQuery | Spark ↔ Google Cloud Storage | Private key based authentication to Google Cloud Platform |
| Snowflake | Gluent Offload Engine, Data Daemon → Snowflake | Authentication to Snowflake either by password, by RSA key or by passphrase-protected RSA key |
| Snowflake | Gluent Offload Engine, Spark ↔ Amazon S3 | IAM role authentication, or access key based authentication to Amazon S3 using either environment variables or an AWS credentials file |
| Snowflake | Gluent Offload Engine, Spark ↔ Google Cloud Storage | Private key based authentication to Google Cloud Platform |
| Azure Synapse Analytics, Snowflake | Gluent Offload Engine, Spark ↔ Microsoft Azure Storage | Storage account key based authentication to Microsoft Azure |
Authorization¶
Gluent Data Platform supports authorization as follows.
Table 15: Authorization Support¶
| Backend Scope | Software Engine / Component | Details |
|---|---|---|
| All | Oracle Database | Oracle Database’s authorization mechanism is the primary access control in an environment where Gluent Data Platform is used. This is because Smart Connector is invoked by accessing a table in an Oracle Database instance that has either been previously offloaded or presented. The principle of least privilege is followed. Refer to Oracle Database |
| Azure Synapse Analytics | SQL Pool | The Gluent Data Platform user requires authorization to interact with data sources, file formats, schemas, tables and views. The principle of least privilege can be followed. Refer to Azure Synapse Analytics |
| Azure Synapse Analytics | Spark | The account used by Gluent Data Platform requires authorization to write to an Azure storage account container during the data transport phase of Offload. Refer to Storage Container |
| Cloudera Data Hub | Impala | Authorization in Impala is controlled by Sentry. The Gluent Data Platform user requires Sentry privileges to function. Refer to Sentry |
| Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | Impala | Authorization in Impala is controlled by Ranger. The Gluent Data Platform user requires Ranger privileges to function. Refer to Ranger Privileges |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | HDFS | The Gluent Data Platform user must have read and write access to HDFS locations. Refer to Create HDFS Directories (CDH), Create HDFS Directories (CDP Private Cloud) and Create HDFS Directories (CDP Public Cloud) |
| Cloudera Data Hub, Cloudera Data Platform Private Cloud, Cloudera Data Platform Public Cloud | User-Defined Functions (UDFs) | A user with authorization to create UDFs in Impala is required. This does not have to be the Gluent Data Platform user, but the Gluent Data Platform user must be able to access the UDFs once installed. Refer to Creation of User-Defined Functions (CDH), Creation of User-Defined Functions (CDP Private Cloud) and Creation of User-Defined Functions (CDP Public Cloud) |
| Google BigQuery | Google BigQuery API | The service account used by Gluent Data Platform requires authorization to interact with Google BigQuery datasets. The principle of least privilege is followed. Refer to Google BigQuery |
| Google BigQuery | Google Cloud Storage | The service account used by Gluent Data Platform requires authorization to interact with Google Cloud Storage underlying Google BigQuery tables. The principle of least privilege is followed. Refer to Google BigQuery |
| Google BigQuery | Spark | The service account used by Gluent Data Platform requires authorization to write to a Google Cloud Storage bucket during the data transport phase of Offload. Refer to Bucket |
| Snowflake | Snowflake API | The Gluent Data Platform user requires authorization to interact with Snowflake databases, integrations, stages, file formats, schemas, tables and views. The principle of least privilege can be followed. Refer to Snowflake |
| Snowflake | Spark | The account or IAM role used by Gluent Data Platform requires authorization to write to a cloud storage provider bucket or container during the data transport phase of Offload. Refer to Cloud Storage |