Connect to Athena Tables from Python

Amazon Athena is an interactive query service that allows you to use standard SQL to view and analyze data in your organization's Tetra Data Lake S3 bucket. This page describes how to connect to your Athena tables from Python.

Before You Begin

Before you can use Python to connect to Athena tables, you must have:

Connect to Athena Tables from Python

Below is a Python code snippet you can copy and paste into your Python environment and run. All you need to do is:

  1. Install the necessary Python modules.
  2. Replace everything in '<>' with the your account connection details from the Tetra Data Platform (TDP).
# install necessary modules
!pipenv install sqlalchemy
!pipenv install PyAthena
!pipenv install PyAthena[SQLAlchemy]

# import necessary modules
from sqlalchemy.engine import create_engine
from sqlalchemy.sql.schema import Table, MetaData
from sqlalchemy.orm import sessionmaker

# define connection
conn_str = 'awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={s3_staging_dir}'
engine = create_engine(
  conn_str.format(
    aws_access_key_id='<Access Key ID found in Account ->  SQL Access -> Additional connection parameters>>', # shared across the entire org
    aws_secret_access_key='<Access Key found in Account ->  SQL Access -> Additional connection parameters>>', # shared across the entire org
    region_name='us-east-1',
    schema_name='<Org Slug found in Account -> Organization>'.replace('-','_'), # replace dashes from org slug to underscores,
    s3_staging_dir='<S3OutputLocation in Account -> SQL Access -> Additional connection parameters>'
  )
)
# create session
Session = sessionmaker(bind=engine)
session = Session()

# query data
file_id = '<File ID from TetraScience platform>' #
root_table = Table('<table name>', MetaData(bind=engine), autoload=True)
results = session.query(root_table).filter_by(uuid=file_id)

# print the first row
print(results.first())