Connect to Athena SQL Tables by Using Python
Amazon Athena is an interactive query service that allows you to analyze data in the Tetra Data Lake or Data Lakehouse using standard SQL.
The following procedure shows how to connect to your Tetra Data Platform (TDP) organization's Athena SQL tables by using Python.
Before You Begin
Before you can use Python to connect to your organization's Athena SQL tables, make sure that you do the following:
For more information about setting up the prerequisites, see Use Third-Party Tools to Connect to Athena SQL Tables.
Use Python to Connect to Athena SQL Tables
Copy and paste the following Python code into your Python environment. Then update the following in the example code and run the script:
- Install the necessary Python modules.
- Replace everything in
'<>'
with your SQL credentials from the TDP.
# install necessary modules
!pipenv install sqlalchemy
!pipenv install PyAthena
!pipenv install PyAthena[SQLAlchemy]
# import necessary modules
from sqlalchemy.engine import create_engine
from sqlalchemy.sql.schema import Table, MetaData
from sqlalchemy.orm import sessionmaker
# define connection
conn_str = 'awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={s3_staging_dir}'
engine = create_engine(
conn_str.format(
aws_access_key_id='<Access Key ID>', # shared across the entire org
aws_secret_access_key='<Access Key>', # shared across the entire org
region_name='<AWS Region>', # shared across the entire org
schema_name='<Org Slug>'.replace('-','_'), # replace dashes from org slug to underscores,
s3_staging_dir='<S3OutputLocation>'
)
)
# create session
Session = sessionmaker(bind=engine)
session = Session()
# query data
metadata = MetaData()
metadata.reflect(bind=engine, only=['<table name>'])
root_table = metadata.tables['<table name>']
results = session.query(root_table).filter_by(location='<location>')
# print the first row
print(results.first())
Create Joins and Aggregate Query Results by Using SQL
To add joins and aggregate your query's search results by using standard SQL syntax, use the following formatting.
NOTE
Make sure that you replace every variable in
'<>'
with information specific to your query.
from pyathena import connect
# Connect to Athena
conn = connect(s3_staging_dir='s3://<athena-staging-dir>/',
region_name='<AWS Region>')
# Run a query
cursor = conn.cursor()
cursor.execute("SELECT * FROM <some_table> WHERE <some_column> = <some_value> LIMIT 10")
# Fetch results
results = cursor.fetchall()
for row in results:
print(row)
Create Joins and Aggregate Query Results by Using SQLAlchemy
To add joins and aggregate your query's search results by using SQLAlchemy, use the following formatting.
NOTE
Make sure that you replace every variable in
'<>'
with information specific to your query.
SQLAlchemy Joins Formatting
metadata.reflect(bind=engine, only=['<table1>', '<table2>'])
table1 = metadata.tables['<table1>']
table2 = metadata.tables['<table2>']
from sqlalchemy import select, join
j = join(table1, table2, table1.c.key == <table2.c.key>)
stmt = select([<table1>, <table2>]).select_from(j)
results = session.execute(stmt)
for row in results:
print(row)
SQLAlchemy Aggregation Formatting
from sqlalchemy import func
# Assuming you have already set up your engine, session, and reflected metadata as shown in the docs:
metadata.reflect(bind=engine, only=['<table name>'])
root_table = metadata.tables['<table name>']
# Example: Count the number of rows per value in a column
results = session.query(root_table.c.<column_name>, func.count()).group_by(root_table.c.<column_name>)
for row in results:
print(row)
# Example: Get the sum of a column
results = session.query(func.sum(root_table.c.<numeric_column>)).all()
print(results)
Documentation Feedback
Do you have questions about our documentation or suggestions for how we can improve it? Start a discussion in TetraConnect Hub. For access, see Access the TetraConnect Hub.
NOTE
Feedback isn't part of the official TetraScience product documentation. TetraScience doesn't warrant or make any guarantees about the feedback provided, including its accuracy, relevance, or reliability. All feedback is subject to the terms set forth in the TetraConnect Hub Community Guidelines.
Updated 2 days ago