Tetra File-Log Agent

The Tetra File-Log Agent is a high-speed, instrument-agnostic agent that detects changes in file-based outputs generated by instruments. It requires either a Tetra Generic Data Connector (GDC) installed on a Data Hub or an API Agent for file uploads.

Tetra File-Log Agent Features

📘

NOTE:

The File-Log Agent Log Watcher service was removed in version 4.1.0 of the software.

These are the main features of the Tetra File-Log Agent:

  • Supports the File Watcher service, which can upload new content incrementally or upload the entire file or folder
  • Monitors the outputs from multiple folder paths
  • Monitors local and network drives
  • Supports glob patterns to select folders or files
  • Applies least-privilege access by using a Service Account without Local Logon permission
  • Supports customizable time intervals for detecting changes
  • Supports a Start Date for selecting which files and folders to monitor
  • Supports large file uploads (up to 5 TB as of v4.1.0) through the S3 Direct Upload option (available in v3.0.0 and later)
  • Runs automatically in the background without user interaction
  • Automatically retries uploading files to the Tetra Data Platform (TDP) after file upload errors
  • Auto-starts when the host machine is started
  • Can auto-restart up to three times if it crashes
  • Provides a file processing summary
  • Provides a full operational audit trail

📘

NOTE

For performance information, see Tetra File-Log Agent Performance Guidelines.

Tetra File-Log Agent Requirements

These are the Tetra File-Log Agent requirements:

  1. Windows 7 Service Pack 1 (SP1) or later, or Windows Server 2008 R2 SP1 or later, with TLS 1.2 enabled in either case
  2. The server should have at least 8 GB of RAM; 16 GB is recommended.
  3. The CPU should be at least an Intel Xeon 2.5 GHz (or equivalent).
  4. The Agent copies the source file to the Group User temp folder before uploading it to the TDP. The available space in the temp folder should be larger than the largest file being uploaded, so that the file can be retained temporarily.
  5. .NET Framework 4.8 (Download Link).
  6. An Agent has been created on the Tetra Data Platform (TDP), either from a Generic Data Connector (GDC) or as a "No Connector" Agent. For details about which connection to select and how to create a connector, see this page.
  7. The Windows server hosting the Agent requires network access according to its connection mode (chosen in requirement 6):
    1. When using a "No Connector" Agent, HTTP(S) access to the TetraScience cloud API - for example, https://api.your-infrastructure-name.tetrascience.com/v1/uda.
    2. When using a GDC, HTTP(S) traffic to the DataHub port selected when configuring the GDC - for example, https://192.168.1.1:8443/generic-connector/v1/agent.
  8. To support S3 Direct Upload or Receive Commands, the Agent requires HTTP(S) access to the AWS endpoints listed in 'Endpoints used by agents' on this page.
    1. When using our API, this access must be direct.
    2. When using a GDC, this access can be provided by installing an L7 Proxy connector in the same DataHub as the GDC. In this case the Agent also requires access to the port selected when configuring the L7 Proxy - for example, https://192.168.1.1:3129.
  9. The Windows server hosting the Agent has network access (SMB over port 445, TCP and UDP) to any computers whose shared folders need to be monitored, and the Group User Account has the necessary access to those folders (see below for details).
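
As a rough illustration of requirement 4 above, the following Python sketch checks whether a temp folder has more free space than the largest file expected to be uploaded. It is not part of the Agent: the function name temp_folder_has_room is hypothetical, and the process temp directory is only assumed to stand in for the Group User temp folder.

```python
import shutil
import tempfile
from typing import Optional


def temp_folder_has_room(largest_file_bytes: int, temp_dir: Optional[str] = None) -> bool:
    """Return True if the temp folder has more free space than the largest expected upload."""
    # Assumption: the current process temp directory stands in for the
    # Group User temp folder used by the Agent.
    target = temp_dir or tempfile.gettempdir()
    free_bytes = shutil.disk_usage(target).free
    return free_bytes > largest_file_bytes
```

Running a check like this under the same account as the Agent, before enabling large uploads, gives an early warning when the temp volume is too small.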

Known Limitations

Characters in File Path

  • The File-Log Agent does not support case-sensitive directory or file names. All paths are converted to lower case when stored in the Tetra Data Platform.
  • In versions prior to 4.3.0, the File-Log Agent only supports ASCII characters in directory and file names. Version 4.3.0 and later supports non-ASCII characters in directory and file names.
  • The File-Log Agent does not support a colon “:” or backslash “\” in directory or file names.

Character Conversion Rules

When uploading files to Amazon Simple Storage Service (Amazon S3), the Tetra File-Log Agent modifies file paths in the following ways:

  • The Amazon S3 file path is lower case, except for local machine names. If the DNS name for the local machine is upper case, the Tetra File-Log Agent uses the original upper-case name without converting it to lower case.
  • Microsoft Windows folder path backslashes (\) are changed to forward slashes (/) in the Amazon S3 path.
  • Windows folder path dots (.) are changed to colons (:) in the Amazon S3 path, unless the dot is in the file name.

Universal Naming Convention (UNC) Path Conversion Example

\\10.10.10.10\folder.1\filename.test.json

-changes to-

10:10:10:10/folder:1/filename.test.json

Local Path Conversion Example

c:\folder.1\filename.test.json

-changes to-

<MACHINE_NAME>/c/folder:1/filename.test.json

📘

NOTE

For local machine names, the Windows operating system isn't case sensitive. Windows treats two machine names with different cases that are otherwise spelled the same as being identical.
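
The conversion rules and the two examples above can be sketched in a few lines of Python. This is an illustration only, not the Agent's implementation: the function name to_s3_path and the machine_name parameter are hypothetical, and lower-casing the host part of a UNC path is an assumption, since the text only states that local machine names keep their case.

```python
def to_s3_path(windows_path: str, machine_name: str) -> str:
    """Apply the documented path conversion rules to a Windows path (illustrative sketch)."""
    if windows_path.startswith("\\\\"):
        # UNC path: \\host\share\...\file (no machine name is prepended).
        segments = windows_path[2:].split("\\")
        prefix = []
    else:
        # Local path: <drive>:\folder\...\file. The drive colon is dropped and
        # the machine name is prepended in its original case.
        drive, _, rest = windows_path.partition(":")
        segments = [drive] + rest.split("\\")
        prefix = [machine_name]
    segments = [s for s in segments if s]
    *folders, file_name = segments
    # Folder segments: lower case, dots become colons.
    converted = [f.lower().replace(".", ":") for f in folders]
    # File name: lower case, dots are kept.
    return "/".join(prefix + converted + [file_name.lower()])
```

For example, to_s3_path(r"c:\folder.1\filename.test.json", "LABPC") returns "LABPC/c/folder:1/filename.test.json", which matches the local path example when the machine is named LABPC.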

File Systems and Protocols

  • The File-Log Agent can access any local file system, and any file system reachable through a supported Windows file share protocol (SMB/CIFS, NFS, Windows DFS), as long as the directories and files do not violate the character restrictions above.
    • Depending on the protocol and file system used, file system events may not be supported. In that case, the File-Log Agent relies on polling to detect changes.
  • Mapped network drives are not supported.

File Size

  • When using the Generic Data Connector or Agent without S3 Direct Upload, the maximum file size that can be uploaded is 500 MB.
  • When using S3 Direct Upload, the maximum file size that can be uploaded is 5 TB.
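
The two limits above can be expressed as a small pre-flight check. The sketch below is an assumption-laden illustration: the constant and function names are hypothetical, and decimal units (1 MB = 10^6 bytes) are assumed because the text does not say whether the limits are decimal or binary.

```python
# Documented limits; decimal units are an assumption.
MAX_STANDARD_UPLOAD_BYTES = 500 * 10**6   # 500 MB without S3 Direct Upload
MAX_S3_DIRECT_UPLOAD_BYTES = 5 * 10**12   # 5 TB with S3 Direct Upload


def can_upload(size_bytes: int, s3_direct_upload: bool) -> bool:
    """Return True if a file of this size fits the limit for the chosen upload path."""
    if s3_direct_upload:
        limit = MAX_S3_DIRECT_UPLOAD_BYTES
    else:
        limit = MAX_STANDARD_UPLOAD_BYTES
    return size_bytes <= limit
```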

Additional Notes

  1. The Tetra File-Log Agent does not support file paths longer than the default maximum length of 260 characters. This is a Windows OS limitation. To enable long paths in the host Windows OS, refer to the FAQ: How to enable long path for Windows.
  2. The Tetra File-Log Agent is optimized for the latest Windows OS. It supports earlier Windows versions as long as the requirements above are met, but it will not perform optimally on them.
  3. The Tetra File-Log Agent scanning speed depends on the number of files and folders it scans. When the Agent scans a network drive, the network speed also affects the scanning speed.
  4. If the network share or the network redirector is not Windows based, the Tetra File-Log Agent scanning speed is also affected.
  5. If the files to upload are approximately 500 MB or larger, you must add an L7 proxy connector to the Data Hub.
  6. By default, a Generic Data Connector is provisioned to have 500 MB of memory. If your file is close to or larger than 500 MB, then the Tetra File-Log Agent may not be able to upload it. While you can adjust the memory allocation for GDC, TetraScience recommends installing the L7 proxy connector.
  7. If your file system has an additional permissions system, contact TetraScience before installing the Tetra File-Log Agent so that we can assist you with agent configuration.
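
To find paths affected by the 260-character limit in note 1 before the Agent scans them, a check along these lines may help. It is a minimal sketch: the function name is hypothetical, and treating a path of exactly 260 characters as acceptable is an assumption, because the text does not say whether the limit is inclusive.

```python
WINDOWS_DEFAULT_MAX_PATH = 260  # default maximum path length cited above


def paths_exceeding_limit(paths, limit=WINDOWS_DEFAULT_MAX_PATH):
    """Return the paths whose full length is greater than the limit."""
    # Assumption: a path of exactly `limit` characters is treated as acceptable.
    return [p for p in paths if len(p) > limit]
```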

File Watcher Service

The File Watcher service uploads the entire file or folder to the Tetra Scientific Data Cloud. If the file has been changed multiple times, the Data Lake could contain multiple versions of the same file.

The Agent monitors files or folders by using the paths, the associated glob patterns, and the Start Date defined in the File Watcher section of the Windows Management Console. The Start Date is used to exclude files dated before it.
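
The selection described above (path, glob pattern, Start Date) can be sketched in Python as follows. This is not the Agent's code: select_files is a hypothetical name, Python's glob syntax may differ in detail from the Agent's, and comparing the Start Date against the last-modified time is an assumption, since the text does not say which file date is used.

```python
from datetime import datetime
from pathlib import Path


def select_files(root: str, pattern: str, start_date: datetime) -> list:
    """Return files under root that match the glob pattern and are not older than start_date."""
    cutoff = start_date.timestamp()
    return sorted(
        p for p in Path(root).glob(pattern)
        # Assumption: "before that date" is judged by the last-modified time.
        if p.is_file() and p.stat().st_mtime >= cutoff
    )
```

A recursive pattern such as "**/*.csv" would select CSV files in all subfolders of the root path.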

The File Watcher Service has two modes: File Mode and Folder Mode. The change-detection logic differs slightly between the two.

File Mode

File Mode monitors specified paths to detect changes in individual files and then uploads those files to the Data Lake. The File Watcher Service periodically checks the following file metadata and stores the results in the local SQLite database:

If the values of either of these metadata types change, the Agent marks that file as changed.

Folder Mode

Folder Mode monitors the subdirectories of the specified paths to detect changes and then uploads those directories to the Data Lake. Each directory is uploaded as a compressed zip file containing all of the files in the directory, including its subdirectories. If any file in the folder changes, the Agent uploads the entire folder again as a compressed file. The Agent periodically checks and stores the following attributes for that folder:

  • Total number of files in the folder
  • Total size of the files in the folder
  • The Last Write Time from the youngest file in the folder

If any of these attributes has changed, the Agent marks that folder as changed.
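
A minimal sketch of the Folder Mode check, using the three attributes listed above, could look like this. The names FolderSnapshot, snapshot_folder, and folder_changed are hypothetical, and "youngest file" is interpreted as the most recently written file, which is an assumption.

```python
import os
from typing import NamedTuple


class FolderSnapshot(NamedTuple):
    file_count: int           # total number of files in the folder
    total_size: int           # total size of those files, in bytes
    latest_write_time: float  # last write time of the most recently written file


def snapshot_folder(folder: str) -> FolderSnapshot:
    """Collect the three folder attributes, including files in subdirectories."""
    count, size, latest = 0, 0, 0.0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            info = os.stat(os.path.join(dirpath, name))
            count += 1
            size += info.st_size
            latest = max(latest, info.st_mtime)
    return FolderSnapshot(count, size, latest)


def folder_changed(previous: FolderSnapshot, current: FolderSnapshot) -> bool:
    """A folder counts as changed if any of the three attributes differs."""
    return previous != current
```

Comparing whole snapshots keeps the check cheap, at the cost of missing an edit that leaves the file count, total size, and latest write time all unchanged.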

The path of the changed file or folder is added to a processing queue, and the items in the queue are processed sequentially.

For each monitored file or folder, the Agent checks its attributes at every time interval, compares them with the values from the previous interval, and determines whether the item has changed and needs to be uploaded to the Data Lake.
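
One polling interval, as described above, can be sketched as a comparison between two attribute maps that feeds a first-in, first-out queue. The function name poll_once and the shape of the attribute values are hypothetical, and treating a path seen for the first time as changed is an assumption.

```python
from collections import deque


def poll_once(previous: dict, current: dict, queue: deque) -> dict:
    """Enqueue every path whose attributes differ from the previous interval.

    Both dicts map a path to its recorded attributes. The current dict is
    returned so that it can serve as the baseline for the next interval.
    """
    for path, attributes in current.items():
        # Assumption: a path with no previous record is treated as changed.
        if previous.get(path) != attributes:
            queue.append(path)
    return current
```

Draining the queue with queue.popleft() in a loop then processes the changed paths one at a time, in the order they were detected.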

When the Agent decides that a file or folder is ready to be uploaded, it copies the file or folder to the Windows temp folder. In Folder Mode, the folder is compressed into a zip file.
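
The Folder Mode staging step can be approximated with Python's standard library. This is a sketch under assumptions, not the Agent's implementation: stage_folder_as_zip is a hypothetical name, and the archive is named after the folder, which the text does not specify.

```python
import os
import shutil
import tempfile
from typing import Optional


def stage_folder_as_zip(folder: str, staging_dir: Optional[str] = None) -> str:
    """Compress a folder, including subdirectories, into a zip file in the staging folder."""
    # Assumption: the process temp directory stands in for the Windows temp folder.
    staging_dir = staging_dir or tempfile.gettempdir()
    # Assumption: the archive takes the folder's own name.
    base_name = os.path.join(staging_dir, os.path.basename(os.path.normpath(folder)))
    # make_archive appends ".zip" and returns the path of the archive it created.
    return shutil.make_archive(base_name, "zip", root_dir=folder)
```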

The Agent keeps trying to upload the output files in the temp folder until the upload succeeds. After a successful upload, the files are removed from the temp folder. The file upload time interval can be specified in the Advanced Settings in the Agent Configuration section.

📘

NOTE

File upload time interval settings are available in Tetra File-Log Agent versions 4.2.3 and earlier. In versions 4.3 and higher, the setting is no longer required, because the upload job runs continuously.

Retry Behavior

If a file upload error occurs, the Tetra File-Log Agent automatically retries uploading the file to the TDP. The Agent retries uploading the file until the upload is successful or the file can't be found. If the Agent can't find a file, that file's status shows as Inactive.
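
The retry rule above (retry until the upload succeeds or the file can no longer be found) can be sketched as a simple loop. Everything except the "Inactive" status is hypothetical: the function name, the upload callable, the "Uploaded" label, the fixed delay between attempts, and the choice to treat OSError and its subclasses as upload errors are all assumptions made for illustration.

```python
import os
import time
from typing import Callable


def upload_with_retry(path: str, upload: Callable[[str], None], delay_seconds: float = 5.0) -> str:
    """Retry an upload until it succeeds or the file disappears; return a status label."""
    while True:
        if not os.path.exists(path):
            # The file can't be found, so retrying stops.
            return "Inactive"
        try:
            upload(path)
            return "Uploaded"  # hypothetical label for a successful upload
        except OSError:
            # Covers ConnectionError and TimeoutError; wait, then try again.
            time.sleep(delay_seconds)
```

Because the loop has no attempt limit, it stops only on success or when the file is gone, which mirrors the behavior described above.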