Azure Synapse Analytics Prerequisites¶
Table of Contents
Introduction¶
This document includes the prerequisite steps for Azure Synapse Analytics.
SQL Pool Database¶
A database in a new or existing dedicated SQL pool needs to be allocated for use by Gluent Data Platform.
ODBC Driver¶
The Microsoft ODBC 17 driver must be installed on all servers on which Gluent Offload Engine commands will be initiated.
Refer to the Microsoft ODBC driver installation guide.
Role¶
A role should be created for use by the Synapse user created for Gluent Data Platform. The default name for this role is GLUENT_OFFLOAD_ROLE but, if required, this can be reconfigured in the Gluent Data Platform environment file.
The name of the created role is assumed to be GLUENT_OFFLOAD_ROLE from this point forward:
CREATE ROLE GLUENT_OFFLOAD_ROLE;
User¶
A Synapse user should be created in the dedicated SQL pool database for use by Gluent Data Platform.
The user can be authenticated by any of the following mechanisms:
SQL authentication
Azure Active Directory password authentication
Azure Active Directory managed service identity authentication
Azure Active Directory service principal authentication
The user must be granted the Role created:
EXEC sp_addrolemember 'GLUENT_OFFLOAD_ROLE', '<synapse_user>';
Access and Privileges¶
The GLUENT_OFFLOAD_ROLE needs sufficient privileges to be able to operate with data sources, file formats, schemas, tables and views.
There are mandatory privileges that are needed regardless of the granularity of access preferred, and there are additional privileges that depend on how fine-grained the access requirements are.
The following sets of privileges required for Gluent Data Platform to operate are covered below:
Mandatory global privileges
Either one of the following:
Permissive access privileges for schema creation
Restricted access privileges for schema creation
Mandatory Global Privileges¶
Gluent Data Platform requires the following mandatory privileges, regardless of which permissive/restrictive option is taken for schema and object access:
GRANT ALTER ANY EXTERNAL DATA SOURCE TO GLUENT_OFFLOAD_ROLE;
GRANT ALTER ANY EXTERNAL FILE FORMAT TO GLUENT_OFFLOAD_ROLE;
GRANT CREATE TABLE TO GLUENT_OFFLOAD_ROLE;
GRANT CREATE VIEW TO GLUENT_OFFLOAD_ROLE;
GRANT VIEW DATABASE STATE TO GLUENT_OFFLOAD_ROLE;
GRANT SHOWPLAN to GLUENT_OFFLOAD_ROLE;
Object Access¶
For every existing Synapse schema with tables or views to be presented to the RDBMS:
GRANT SELECT ON SCHEMA::<existing_schema_name> TO GLUENT_OFFLOAD_ROLE;
Permissive Access Privileges¶
Use this option if it is acceptable for Gluent Data Platform to create schemas as required.
Create Schemas¶
GRANT CREATE SCHEMA TO GLUENT_OFFLOAD_ROLE;
Restricted Access Privileges¶
Use this option if it is not acceptable for Gluent Data Platform to create schemas as required.
Create Schemas¶
If the CREATE SCHEMA
privilege is not permitted, the following actions must be done for every schema that Gluent Data Platform is to operate with.
For each new schema to be offloaded from the RDBMS to Synapse:
CREATE SCHEMA <rdbms_schema_name> AUTHORIZATION <synapse_user>;
CREATE SCHEMA <rdbms_schema_name>_load AUTHORIZATION <synapse_user>;
For every existing Synapse schema with tables or views to be presented to the RDBMS:
CREATE SCHEMA <existing_schema_name>_load AUTHORIZATION <synapse_user>;
Object Access¶
If the use of the AUTHORIZATION
clause when creating schemas is not permitted, then the following actions must be performed after creating the schemas.
For each new schema to be offloaded from the RDBMS to Synapse:
GRANT ALTER, SELECT, INSERT ON SCHEMA::<rdbms_schema_name> TO GLUENT_OFFLOAD_ROLE;
GRANT ALTER, SELECT ON SCHEMA::<rdbms_schema_name>_load TO GLUENT_OFFLOAD_ROLE;
For every existing Synapse schema with tables or views to be presented to the RDBMS:
GRANT ALTER, SELECT ON SCHEMA::<existing_schema_name>_load TO GLUENT_OFFLOAD_ROLE;
Storage Container¶
A new or existing container in an Azure storage account needs to be allocated for use by Gluent Data Platform.
During an offload Gluent Data Platform will stage data in Azure storage in this container.
The access key for the Azure storage account is needed to provide read and write access to Gluent Data Platform.
Data Source¶
An external data source needs to be created in the dedicated SQL pool database for use by Gluent Data Platform.
The following parameters are required when creating the external data source.
Parameter |
Details |
---|---|
LOCATION 1 |
|
CREDENTIAL |
If the data in the LOCATION is secured (i.e. does not allow anonymous access) then a database scoped credential must be created and referenced here. It is beyond the scope of this guide to document the creation of a scoped credential. Refer to Microsoft documentation |
TYPE |
HADOOP |
- 1
For Azure Data Lake Store Gen 2 data sources, the <prefix> should be set to
abfss
and the endpoint of the location should be set todfs.core.windows.net
. For other supported data sources the <prefix> should be set towasbs
and the endpoint of the location should be set toblob.core.windows.net
. The <container> and <storage_account> must match the Storage Container provisioned.
File Format¶
An Parquet file format needs to be created in the dedicated SQL pool database for use by Gluent Data Platform.
This example creates the file format with a name of gluent_parquet but any name can be chosen:
CREATE EXTERNAL FILE FORMAT [gluent_parquet]
WITH (
FORMAT_TYPE = PARQUET
, DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);