Box is a service that offers secure file sharing. The Tetra Data Platform has a built-in integration that allows users to pull raw data files from their secure Box storage into the Tetra Data Lake.
How does the integration work
TetraScience uses Box's API to continuously detect file change events in your Box account, upload the affected files into our Data Lake, and then trigger Data Pipelines.
Our Box integration currently tracks file creation events, including different versions of the same file. If you remove a file from Box, the TetraScience Data Lake does not mirror the removal: files we have already collected are not deleted.
The Box integration tracks (listens for) three types of events in your Box account:
File has been uploaded (create event)
File has been changed (update event)
File has been copied from another Box location (copy event)
The integration checks Box for new file events every 60 seconds.
When you first create a Box integration, it will pull all existing files that match the provided file pattern and put the files in our Data Lake.
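The event handling described above can be sketched in a few lines of Python. This is only an illustration, not the integration's actual code: the event labels ("create", "update", "copy", "delete"), the function name should_ingest, and the assumption that the file pattern is a shell-style glob are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical labels for the three tracked event types; the names used by
# the Box API or by the integration itself may differ.
TRACKED_EVENTS = {"create", "update", "copy"}


def should_ingest(event_type: str, path: str, pattern: str) -> bool:
    """Return True if a file event should lead to an upload to the Data Lake."""
    # Removals (and any other untracked event) are ignored, so files that
    # were already collected are never deleted.
    if event_type not in TRACKED_EVENTS:
        return False
    # Assumption: the configured file pattern is a shell-style glob.
    return fnmatchcase(path, pattern)


# Example events, as (event type, file path) pairs with made-up paths.
events = [
    ("create", "study-01/hplc/run1.csv"),
    ("update", "study-01/hplc/run1.csv"),
    ("copy", "study-01/hplc/notes.txt"),
    ("delete", "study-01/hplc/run1.csv"),
]
ingested = [(kind, path) for kind, path in events
            if should_ingest(kind, path, "*.csv")]
print(ingested)
```

With the pattern "*.csv", only the create and update events for run1.csv pass the filter: the copied notes.txt does not match the pattern, and the delete event is not tracked.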
How to configure the integration
Set up your Box account
First, create an API user for this integration. For production usage, the best practice is to make this a standard user dedicated to the integration. We recommend you name it: [email protected].
After the user is created, share the Box folder that you would like the integration to track with the API user, granting read-only permissions.
Organize your Box folders
It is always a good idea to use the folder structure to organize your data. The best practice is to include details such as your study number, project name/id, and instrument name/id in the folder path. For example: