Tetra File-Log Agent

Overview

Introduction

The Tetra File-Log Agent is a high-speed, instrument-agnostic agent that detects changes in file-based outputs generated by instruments. It requires either a Tetra Generic Data Connector (GDC) installed on a DataHub or a User Defined Integration (UDI) for file uploads.

Main Features

The Tetra File-Log Agent does the following:

  • Supports two services, the Log Watcher service and the File Watcher service, which upload new content incrementally or the entire file/folder, respectively.
  • Monitors the outputs from multiple folder paths
  • Monitors local and network drives
  • Supports glob patterns to select folders or files (see the pattern sketch after this list)
  • Applies Least Privilege Access using Service Account without Local Logon permission
  • Supports customizable time intervals for detecting changes
  • Supports a Start Date to exclude files/folders created before that date
  • Supports large file uploads (up to 5 TB) using the S3 direct upload option (v3.0.0 and above)
  • Runs automatically in the background without user interaction
  • Auto-starts when the host machine is started
  • Can auto-restart up to three times if it crashes
  • Provides file processing summary
  • Provides a full operational audit trail
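
For illustration only, here is a minimal sketch of glob-style file selection combined with a Start Date filter. It uses Python's pathlib as an analogy; the watch path, pattern, and date are hypothetical placeholders, the Agent's actual patterns and Start Date are configured in the Windows Management Console, and the exact attribute compared against the Start Date is an assumption here.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical watch configuration; in practice this is set in the Management Console.
watch_path = Path(r"C:\Instruments\HPLC-01\Results")
pattern = "**/*.csv"                                   # glob pattern: all CSV files, including subfolders
start_date = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Select files that match the pattern and were last written on or after the Start Date.
for candidate in watch_path.glob(pattern):
    if not candidate.is_file():
        continue
    last_write = datetime.fromtimestamp(candidate.stat().st_mtime, tz=timezone.utc)
    if last_write >= start_date:
        print(f"would monitor: {candidate}")
```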

Requirements

The following are the requirements for the File-Log Agent.

  1. Windows 10 or Windows Server 2016
  2. The server should have 8 GB of RAM at minimum, though 16 GB is recommended.
  3. The CPU should be at minimum an Intel Xeon 2.5 GHz (or equivalent)
  4. 50 GB of free disk space on the C drive. The Agent copies the source file to the Group User temp folder before uploading it to TDP, so the available space must be larger than the size of any file being retained temporarily.
  5. .NET Framework 4.8 (Download Link).
  6. An Agent has been created from a Generic Data Connector (GDC) or User Defined Integration (UDI) on the Tetra Data Platform (TDP). For more information on which connection type to select and how to create a connector, see this page.
  7. The Windows server hosting the Agent has network access to the DataHub host machine (HTTP(S) traffic to the port selected when configuring the Generic Data Connector), or direct internet access (HTTP(S) traffic to the TetraScience cloud API).
  8. The Windows server hosting the Agent has network access (SMB over port 445, TCP and UDP) to any computers whose shared folders need to be monitored, as well as the Group User Account with the necessary access (see below for details).
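
For items 7 and 8, a quick way to sanity-check connectivity from the Windows server is to attempt TCP connections to the relevant ports. The sketch below is an illustration only: the host names and the GDC port are placeholders, and it checks TCP reachability only (SMB also uses UDP on port 445).

```python
import socket

# Placeholder hosts and ports; substitute your DataHub host, the port chosen for the
# Generic Data Connector, your TDP API endpoint, and the computers hosting shared folders.
targets = [
    ("datahub.example.local", 8443),    # HTTP(S) to the GDC port on the DataHub
    ("tdp-api.example.com", 443),       # or direct HTTPS to the TetraScience cloud API
    ("labshare01.example.local", 445),  # SMB to a computer with monitored shared folders
]

for host, port in targets:
    try:
        with socket.create_connection((host, port), timeout=5):
            print(f"OK   {host}:{port}")
    except OSError as exc:
        print(f"FAIL {host}:{port} ({exc})")
```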

Additional Notes

Please note the following.

  1. The Agent doesn't support file paths exceeding the default maximum length of 260 characters. This is a Windows OS limitation.
  2. The Agent is optimized for the latest Windows OS versions. It also supports earlier versions, such as Windows 7 or Windows Server 2012, as long as the requirements above are met, but it may not perform optimally on them.
  3. The Agent's scanning speed depends on the number of folders and files it scans. When the Agent scans a network drive, the network speed also affects scanning performance.
  4. If the network share or the network redirector is non-Windows based, the Agent's scanning speed is also impacted.
  5. If the files you want to upload are approximately 500 MB or larger, you must add an L7 proxy connector to the DataHub.
  6. By default, a Generic Data Connector is provisioned with 500 MB of memory. If your file is close to or larger than 500 MB, the agent might not be able to upload it. While the memory allocation for the GDC can be adjusted, we recommend installing the L7 proxy connector instead.
  7. If your file system has a permissions system on top of it, please discuss with TetraScience before installing the agent so we can assist you with agent configuration.

Log Watcher Service

The Log Watcher service detects and extracts new content from the files monitored by the File-Log Agent. The Agent monitors files or folders using the paths, the associated glob patterns, and the Start Date defined in the Windows Management Console. The Start Date is used to exclude files created before that date.

The Log Watcher service periodically checks for file content changes at the time interval defined in the Log Watcher section of the Windows Management Console UI. The change detection logic is based on the following two criteria:

If either of those file attributes has changed, the Agent outputs the new content starting from the last line it generated the previous time.

The maximum length of an output file is 5000 lines by default. A System Administrator can modify this setting (Max Batch Size) from the LogWatcher Path Editor window in the Management Console. If the total new content exceeds 5000 lines, the Agent splits the output into multiple smaller files of at most 5000 lines each.

The naming convention for output files is the following:
original_file_name_last_line_number_timestamp.file_extension

For example, suppose the original filename is 20200224-1228-error.log, the file contains 5530 data lines, and the Max Batch Size is 5000. When the file is output for the first time, the Agent generates two files. Note that the timestamp is based on the current time when each output file is created.

  • 20200224-1228-error_5000_20200607T101703.384Z.log
  • 20200224-1228-error_5530_20200607T101703.572Z.log
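
The following is a minimal sketch of that splitting and naming behaviour, not the Agent's implementation; the function name write_output_batches and the state handling are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

MAX_BATCH_SIZE = 5000  # default Max Batch Size (lines per output file)

def write_output_batches(source: Path, last_line: int, output_dir: Path) -> int:
    """Write lines after `last_line` into output files of at most MAX_BATCH_SIZE lines each."""
    lines = source.read_text().splitlines(keepends=True)
    new_lines = lines[last_line:]
    output_dir.mkdir(parents=True, exist_ok=True)

    for start in range(0, len(new_lines), MAX_BATCH_SIZE):
        batch = new_lines[start:start + MAX_BATCH_SIZE]
        last_line_number = last_line + start + len(batch)
        # The timestamp is the current time when the output file is created,
        # e.g. 20200607T101703.384Z.
        now = datetime.now(timezone.utc)
        stamp = now.strftime("%Y%m%dT%H%M%S.") + f"{now.microsecond // 1000:03d}Z"
        out_name = f"{source.stem}_{last_line_number}_{stamp}{source.suffix}"
        (output_dir / out_name).write_text("".join(batch))

    # The new last line number is remembered and used in the next time interval.
    return last_line + len(new_lines)
```

Run against the example above (a 5530-line 20200224-1228-error.log, starting from line 0), this sketch would produce one file ending in _5000_<timestamp>.log and a second ending in _5530_<timestamp>.log, matching the filenames shown.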

The output files are saved into the LogWatcher Output Folder, defined in the Log Watcher section of the Windows Management Console. To avoid potential file naming collisions, the output files related to the same source file are placed together in a subfolder whose name is a unique UUID.

If the instrument appends 50 more data lines to that file, the Agent generates an additional file named 20200224-1228-error_5580_20200607T103353.521Z.log. The output file is saved to the same subfolder as the files generated previously. The content of the output folder then looks similar to the following:
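
(Hypothetical layout for illustration; the UUID subfolder name is made up.)

```
LogWatcher Output Folder\
└── 3f6c2a1e-8b4d-4c9a-9e2f-7d5a1b0c4e21\
    ├── 20200224-1228-error_5000_20200607T101703.384Z.log
    ├── 20200224-1228-error_5530_20200607T101703.572Z.log
    └── 20200224-1228-error_5580_20200607T103353.521Z.log
```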

The Agent uploads the output files to the Tetra Data Platform periodically until the uploads succeed.

Once the output files are uploaded successfully to the TetraScience Platform, they are moved to the archive folder (LogWatcher Archive Folder). If an output file fails to upload to the Tetra Data Platform, the Agent retries the upload in the next time interval.

File Watcher Service

Unlike the Log Watcher service, which uploads only new content, the File Watcher service uploads the entire file or folder to the TetraScience Platform. If a file has been changed multiple times, the TetraScience Platform may contain multiple versions of the same file.

The Agent monitors files or folders using the paths, the associated glob patterns, and the Start Date defined in the File Watcher section of the Windows Management Console. The Start Date is used to exclude files created before that date.

The File Watcher service has two modes, File Mode and Folder Mode. The change detection logic differs slightly between the two modes.

File Mode

This mode detects changes to an individual file. The File Watcher service periodically checks the following file metadata and stores the result in the local SQLite database:

If either of them has changed, the Agent marks that file as changed.
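
A minimal sketch of this kind of check, assuming the tracked metadata are the file size and Last Write Time (an assumption; the concrete fields are defined by the Agent) and using a local SQLite table in the spirit described above:

```python
import os
import sqlite3

# Hypothetical local state store; the Agent keeps its own SQLite database.
db = sqlite3.connect("filewatcher_state.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS file_state (path TEXT PRIMARY KEY, size INTEGER, mtime REAL)"
)

def file_changed(path: str) -> bool:
    """Compare current metadata with the stored snapshot; update the snapshot and report changes."""
    stat = os.stat(path)
    current = (stat.st_size, stat.st_mtime)

    row = db.execute("SELECT size, mtime FROM file_state WHERE path = ?", (path,)).fetchone()
    changed = row is None or (row[0], row[1]) != current

    db.execute(
        "INSERT OR REPLACE INTO file_state (path, size, mtime) VALUES (?, ?, ?)",
        (path, *current),
    )
    db.commit()
    return changed
```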

Folder Mode

This mode detects changes across all of the files in a folder, including its subfolders. If any file in the folder has been changed, the Agent uploads the entire folder as a compressed file. The Agent periodically checks and stores the following attributes for the folder:

  • Total number of files in the folder
  • Total size of the files in the folder
  • The Last Write Time of the most recently modified file in the folder

If any of the criteria has changed, the Agent marks that folder as changed.
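
A minimal sketch of computing those three attributes for one polling interval (the folder path shown is hypothetical; in practice the snapshot would be compared against the one stored from the previous interval):

```python
from pathlib import Path

def folder_snapshot(folder: Path) -> tuple[int, int, float]:
    """Return (total file count, total size in bytes, latest Last Write Time) for a folder tree."""
    files = [p for p in folder.rglob("*") if p.is_file()]
    total_count = len(files)
    total_size = sum(p.stat().st_size for p in files)
    latest_write = max((p.stat().st_mtime for p in files), default=0.0)
    return total_count, total_size, latest_write

# Hypothetical usage: compare this interval's snapshot with the previous one;
# any difference marks the folder as changed and queues it for upload.
current = folder_snapshot(Path(r"C:\Instruments\HPLC-01\Run42"))
```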

The path of the changed file or folder is put into a processing queue. Items in the queue are processed sequentially.

For each file or folder it monitors, the Agent checks the file or folder attributes in every time interval, compares the result with the one from the previous interval, and determines whether it has changed and should be uploaded to the Tetra Data Platform.

When the Agent determines that the file or folder is ready to be uploaded, it copies the file or folder to the Windows temp folder. In Folder Mode, the folder is compressed into a zip file.

The output files in the temp folder are uploaded to the Tetra Data Platform, with retries until they succeed. Once uploaded successfully, the files are removed from the temp folder. The file upload time interval can be specified in the Advanced Settings of the Agent Configuration section.
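
A rough sketch of that staging, compression, and retry flow follows. It is illustrative only: upload_to_tdp is a hypothetical placeholder for the Agent's upload via the GDC/UDI connection, and the temp location and interval shown are made-up values rather than the Agent's actual settings.

```python
import shutil
import time
from pathlib import Path

TEMP_DIR = Path(r"C:\Windows\Temp\FileLogAgent")  # hypothetical staging location
UPLOAD_INTERVAL_SECONDS = 60                      # stands in for the configured upload interval

def upload_to_tdp(path: Path) -> bool:
    """Hypothetical placeholder for the Agent's upload via the GDC/UDI connection."""
    raise NotImplementedError("replace with the real upload call")

def stage_and_upload(source: Path, folder_mode: bool) -> None:
    TEMP_DIR.mkdir(parents=True, exist_ok=True)
    if folder_mode:
        # Folder Mode: compress the whole folder into a zip file before upload.
        staged = Path(shutil.make_archive(str(TEMP_DIR / source.name), "zip", root_dir=source))
    else:
        # File Mode: copy the single file into the temp folder.
        staged = Path(shutil.copy2(source, TEMP_DIR / source.name))

    # Retry in each upload interval until the upload succeeds, then remove the staged copy.
    while not upload_to_tdp(staged):
        time.sleep(UPLOAD_INTERVAL_SECONDS)
    staged.unlink()
```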
