Azure Synapse Analytics Prerequisites

Introduction

This document includes the prerequisite steps for Azure Synapse Analytics.

SQL Pool Database

A database in a new or existing dedicated SQL pool needs to be allocated for use by Gluent Data Platform.

ODBC Driver

The Microsoft ODBC 17 driver must be installed on all servers on which Gluent Offload Engine commands will be initiated.

Refer to the Microsoft ODBC driver installation guide.

Role

A role should be created for use by the Synapse user created for Gluent Data Platform. The default name for this role is GLUENT_OFFLOAD_ROLE but, if required, this can be reconfigured in the Gluent Data Platform environment file.

The name of the created role is assumed to be GLUENT_OFFLOAD_ROLE from this point forward:

CREATE ROLE GLUENT_OFFLOAD_ROLE;

User

A Synapse user should be created in the dedicated SQL pool database for use by Gluent Data Platform.

The user can be authenticated by any of the following mechanisms:

  • SQL authentication

  • Azure Active Directory password authentication

  • Azure Active Directory managed service identity authentication

  • Azure Active Directory service principal authentication

The user must be granted the Role created:

EXEC sp_addrolemember 'GLUENT_OFFLOAD_ROLE', '<synapse_user>';

Access and Privileges

The GLUENT_OFFLOAD_ROLE needs sufficient privileges to be able to operate with data sources, file formats, schemas, tables and views.

There are mandatory privileges that are needed regardless of the granularity of access preferred, and there are additional privileges that depend on how fine-grained the access requirements are.

The following sets of privileges required for Gluent Data Platform to operate are covered below:

  1. Mandatory global privileges

  2. Either one of the following:

    1. Permissive access privileges for schema creation

    2. Restricted access privileges for schema creation

Mandatory Global Privileges

Gluent Data Platform requires the following mandatory privileges, regardless of which permissive/restrictive option is taken for schema and object access:

GRANT ALTER ANY EXTERNAL DATA SOURCE TO GLUENT_OFFLOAD_ROLE;
GRANT ALTER ANY EXTERNAL FILE FORMAT TO GLUENT_OFFLOAD_ROLE;
GRANT CREATE TABLE TO GLUENT_OFFLOAD_ROLE;
GRANT CREATE VIEW TO GLUENT_OFFLOAD_ROLE;
GRANT VIEW DATABASE STATE TO GLUENT_OFFLOAD_ROLE;
GRANT SHOWPLAN to GLUENT_OFFLOAD_ROLE;

Object Access

For every existing Synapse schema with tables or views to be presented to the RDBMS:

GRANT SELECT ON SCHEMA::<existing_schema_name> TO GLUENT_OFFLOAD_ROLE;

Permissive Access Privileges

Use this option if it is acceptable for Gluent Data Platform to create schemas as required.

Create Schemas

GRANT CREATE SCHEMA TO GLUENT_OFFLOAD_ROLE;

Restricted Access Privileges

Use this option if it is not acceptable for Gluent Data Platform to create schemas as required.

Create Schemas

If the CREATE SCHEMA privilege is not permitted, the following actions must be done for every schema that Gluent Data Platform is to operate with.

For each new schema to be offloaded from the RDBMS to Synapse:

CREATE SCHEMA <rdbms_schema_name> AUTHORIZATION <synapse_user>;
CREATE SCHEMA <rdbms_schema_name>_load AUTHORIZATION <synapse_user>;

For every existing Synapse schema with tables or views to be presented to the RDBMS:

CREATE SCHEMA <existing_schema_name>_load AUTHORIZATION <synapse_user>;

Object Access

If the use of the AUTHORIZATION clause when creating schemas is not permitted, then the following actions must be performed after creating the schemas.

For each new schema to be offloaded from the RDBMS to Synapse:

GRANT ALTER, SELECT, INSERT ON SCHEMA::<rdbms_schema_name> TO GLUENT_OFFLOAD_ROLE;
GRANT ALTER, SELECT ON SCHEMA::<rdbms_schema_name>_load TO GLUENT_OFFLOAD_ROLE;

For every existing Synapse schema with tables or views to be presented to the RDBMS:

GRANT ALTER, SELECT ON SCHEMA::<existing_schema_name>_load TO GLUENT_OFFLOAD_ROLE;

Storage Container

A new or existing container in an Azure storage account needs to be allocated for use by Gluent Data Platform.

During an offload Gluent Data Platform will stage data in Azure storage in this container.

The access key for the Azure storage account is needed to provide read and write access to Gluent Data Platform.

Data Source

An external data source needs to be created in the dedicated SQL pool database for use by Gluent Data Platform.

The following parameters are required when creating the external data source.

Parameter

Details

LOCATION 1

'<prefix>://<container>@<storage_account>.[blob|dfs].core.windows.net'

CREDENTIAL

If the data in the LOCATION is secured (i.e. does not allow anonymous access) then a database scoped credential must be created and referenced here. It is beyond the scope of this guide to document the creation of a scoped credential. Refer to Microsoft documentation

TYPE

HADOOP

1

For Azure Data Lake Store Gen 2 data sources, the <prefix> should be set to abfss and the endpoint of the location should be set to dfs.core.windows.net. For other supported data sources the <prefix> should be set to wasbs and the endpoint of the location should be set to blob.core.windows.net. The <container> and <storage_account> must match the Storage Container provisioned.

File Format

An Parquet file format needs to be created in the dedicated SQL pool database for use by Gluent Data Platform.

This example creates the file format with a name of gluent_parquet but any name can be chosen:

CREATE EXTERNAL FILE FORMAT [gluent_parquet]
WITH (
    FORMAT_TYPE = PARQUET
  , DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

Documentation Feedback

Send feedback on this documentation to: feedback@gluent.com