Using Third Party Tools to Connect to AWS Athena via JDBC

Amazon AWS Athena is an interactive query service that allows you to use standard SQL to view and analyze data in your organization's Tetra Data Lake S3 bucket. This article explains to connect to your Athena tables using a third-party tool.

Amazon Athena JDBC Driver

Amazon Athena provides a JDBC driver for connections. Any tool that can use JDBC driver can connect to Amazon Athena (SQL Workbench, IntelliJ IDEA,...). Also, Java can be used to access Athena programmatically. JDBC drivers download address and more details on JDBC usage can be found here: https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html

📘

NOTE:

The JDBC driver that you choose to use depends on your Java version (e.g. JDK 8.0 or JDK 9.0).

Get Connection Details from the TDP

Connection details for Amazon Athena can be easily obtained from the TDP platform.

Click your profile icon, then go to account settings menu in the upper right corner of the screen and select the Account menu item:

Then, from the left menu, select menu item Data:

The system displays the Amazon user credentials created especially for your organization. The username and password are displayed. Note that password appears only initially when organization is created for the first time. After that, it is hidden. The password is stored for later usage. If you do not know the password, contact your organizational administrator.

Note that credentials can be re-created again by clicking on the Replace Credentials button. However, by clicking this button, a new username is generated and a new password is generated and is displayed. Be aware that new credentials will replace the old ones; the old ones cannot be used again.

Under the username and password sections, the URL address for Athena is displayed with required connection parameter named S3OutputLocation.

Write down username, password, URL and S3OutputLocation to connect to Amazon Athena used by platform and query the data.

Under connection settings, the system displays all the tables in Athena tablespace that belong to your organization:

Connecting to Athena Tables Using Third Party Tools

This section provides information on how to connect to Amazon Athena from various third party tools.

Connect to Athena from IDE Tools (IntellJ IDEA)

  1. Add the JDBC driver for Athena in DataGrip.
    a) Once DataGrip is launched click on the Add button (+ sign)

b) Click on Driver. Name your Driver. In this case I named mine “Athena - Demo”.
c) Add the Athena driver file downloaded in step 1 by clicking on the + sign below the Driver files box, select “Custom JARs”, navigate to the downloaded file, upload.
d) Once the driver file is uploaded, you can select the right Class “com.simba.athena.jdbc42.Driver”.

e) Under Options, select the Dialect to “Generic SQL”.

f) Under Advanced, fill up “AwsRegion”, and the “S3OutputLocation”. The AWS Region is the region your Athena bucket is found, in our case it is “us-east-2”. For the “S3OutputLocation”, you obtained that field in step 1.

g) Once finished filling those fields up, click on “Apply”, then “OK”.

  1. Add a new Data Source.
    a) Click on the Add button (+ sign). Select “Data Source” and find the driver you created.

b) Fill up the User, Password and URL fields using the data obtained in step 1. Click Test Connection. Once you are connected (the green checkmark), click OK.

c) Now you are ready to access your data using DataGrip’s console.

Connect to Athena from Java Spring

To connect to Athena from the Spring application, a datasource must be configured. After the dataSource is configured, it can be used in combination with jdbcTemplate as any other jdbc data source.

Example of Java-based Spring configuration

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

import javax.sql.DataSource;
import java.util.Properties;

@Configuration
public class ApplicationConfig {
    @Autowired
    private Environment env;

    @Bean
    public DataSource dataSource(){
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("com.simba.athena.jdbc.Driver));
        dataSource.setUrl(env.getProperty("jdbc.url"));
        dataSource.setUsername(env.getProperty("jdbc.username"));
        dataSource.setPassword(env.getProperty("jdbc.password"));
        // Without this property, Athena will not work
        Properties props = new Properties();
        props.put("S3OutputLocation",env.getProperty("jdbc.S3OutputLocation"));
        dataSource.setConnectionProperties(props);

        return dataSource;
    }
}

Connect to Athena from TIBCO Spotfire

Before connecting Spotfire to Amazon Athena, Athena JDBC driver must be deployed to TIBCO Spotfire. Download JDBC drive from Amazon AWS Athena page for appropriate Java versions, deploy downloaded JDBC JAR file to Spotfire's Tomcat, restart Spotfire server and configure Data Source Template. All these steps are described in Spotfire documentation at following URL:

https://community.tibco.com/wiki/tibco-spotfire-jdbc-data-access-connectivity-details#toc-11

Once TIBCO Spotfire server is configured, in order to connect to Amazon Athena and use it as data source, follow the steps below from TIBCO Spotfire client.

  1. In Spotfire Client, from the Tools menu, select Information Designer item
  1. Select Data Sources tab and on the right side click on Setup Data Source
  1. Provide connection details obtained from the Tetra Data Platform. Note that connection URL contains also parameters and it should be in following form:

jdbc:awsathena://AwsRegion=us-east-2;S3OutputLocation=s3://ts-athena-results-us-east-2-staging/demo;athena.us-east-2.amazonaws.com:443

  1. When all fields are filled-in, click on Save As... and save as new Data Source at appropriate location.

NOTE: If there is an error in connection (wrong URL, username, password, missing parameter), system will not allow saving the data source and error message will be displayed.

  1. Remaining steps give an example on how to create simple visualization using Athena tables as data source. The first step is to map table column to Spotfire column. We can either map column one by one by clicking on Column link, or, we can map multiple table columns at once by clicking on Multiple Columns. In this scenario, we will map multiple columns at once.

Click on Multiple Columns.

  1. Dialog is changed and on the left side, we have to navigate to Athena table we want to map to Spotfire. Select one column, or multiple columns, or whole table and click on Add > button.
  1. Click on button Create Columns.... Spotfire will display dialog where to save mapped columns. For better organization, it is preferable to save the columns in dedicated subfolder for given project.
    Click on Create Columns.
  1. Next step is to create information link - it is actual data source - SQL query that is executed to fetch the data from Athena. Click on Create Information Link.
  1. New dialog is displayed. Information link accepts only mapped columns that are saved in Spotfire library. Therefore, on the left side, click on Elements tab and navigate to columns created in previous step. Select desired columns and add them to the right side by clicking on the button Add >.
    This screen contains additional options, like setting parameters, prompt values, join paths, filters, caching, etc. Click on SQL... button displays SQL statement that will be executed to fetch the data.

SQL is editable and can be modified if needed. Close the SQL dialog and click on Save As... button.

  1. System will ask for the location where to save information link. Again, it is recommended to save the link at appropriate location.
  1. Next step in creating data visualization is to select Add Data Tables... from File menu.
  1. Data table supports multiple options as data source. In this case, we want to use information link we created previously. Therefore, click on Add button and select Information Link... from the menu.
  1. In the library, navigate to location where information link is saved in step 10.
  1. System will provide several options for data table. Click OK button.

At this point, Spotfire will trigger SQL and will fetch the data from Athena. Depending on the source size and provided SQL, query can take time.

  1. Once Spotfire loaded data from the source, it will show dialog with visualization suggestions. Suggestions depend on selected columns. We can either accept suggested visualization, or we can click on Close button and manually add visualizations.
  1. All available visualizations can be found on palette at the top.
  1. At the beginning, we can add simple table. Table will include all the columns and is basically the purest representation of what we have in table.
  1. At this point we can also add Filter from palette. Filter includes all columns and system will recognize text and numerical columns. For numerical columns, filters are in form of range bar. For text columns, depending on number of different values filter will display either checkboxes, or select list, or simple edit box. Changing the filters will automatically change the values in table and other visualizations.

Connect to Athena from Tableau

Please refer to the URL below. Please note that all the information necessary to connect is found in step 1 of this documentation. You need to have the driver downloaded and saved in the folder they indicate in the blog before setting up the connection.

Connect to Athena using Python

Below is a Python code snippet you can copy and paste into your Python environment and run. All you need to do is:

  1. Install necessary Python modules
  2. Replace everything in '<>' with the value from your account
# install necessary modules
!pipenv install sqlalchemy
!pipenv install PyAthena
!pipenv install PyAthena[SQLAlchemy]

# import necessary modules
from sqlalchemy.engine import create_engine
from sqlalchemy.sql.schema import Table, MetaData
from sqlalchemy.orm import sessionmaker

# define connection
conn_str = 'awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={s3_staging_dir}'
engine = create_engine(
  conn_str.format(
    aws_access_key_id='<Access Key ID found in Account -> Data>', # shared across the entire org
    aws_secret_access_key='<Access Key found in Account -> Data>', # shared across the entire org
    region_name='us-east-1',
    schema_name='<Org Slug found in Account -> Organization>',
    s3_staging_dir='<S3OutputLocation in Account -> Data -> additional conenction parameters>'
  )
)

# create session
Session = sessionmaker(bind=engine)
session = Session()

# query data
file_id = '<File ID from TetraScience platform>' #
root_table = Table('<table name>', MetaData(bind=engine), autoload=True)
results = session.query(root_table).filter_by(uuid=file_id)

# print the first row
print(results.first())

Security

Every organization on Tetra Data Platform is automatically provisioned with a separate AWS KMS (Key Management Service) Key.

The query result is encrypted using SSE_KMS (The driver uses server-side encryption with an AWS KMS-management Key).

Additionally, the JDBC driver automatically applies SSL encryption to all connections. SSL encryption protects data and credentials when they are transferred over the network, and provides stronger security than authentication alone. The SSL indicates both TLS (Transport Layer Security) and SSL (Secure Socket Layer). The drier support industry-standard versions of TLS/SSL.

The TLS version that the driver support depending on the JVM version installed. The TLS version used for the connections is the highest version that is supported by the both driver and the server.

Reference:
Oracle Blog: Diagnosing TLS, SSL, and HTTPS


Did this page help you?