Tetra File-Log Agent
The Tetra File-Log Agent is a high-speed, on-premises, instrument-agnostic application that detects changes in file-based outputs generated by instruments. The Agent can connect to the Tetra Data Platform (TDP) by using either an on-premises Tetra Hub or the TDP (No Connector) option. To determine which data connection type is required for your use case, see Agent Deployment Options.
For more information and best practices, see File-Log Agent in the TetraConnect Hub. For access, see Access the TetraConnect Hub.
Tetra File-Log Agent Features
NOTE:
The File-Log Agent Log Watcher service was removed in version 4.1.0 of the software.
These are the main features of the Tetra File-Log Agent:
- Supports the File Watcher service, which can upload the new content incrementally or the entire file/folder, respectively.
- Monitors the outputs from multiple folder paths
- Monitors local and network drives
- Supports glob patterns to select folders or files
- Applies Least Privilege Access by using a Service Account without Local Logon permission
- Customizes time intervals to detect the changes
- Specifies the Start Date to select the files/folders
- Supports large file (up to 5 TB as of v4.1.0) upload using the S3 Direct Upload option (feature for v3.0.0 and later)
- Runs automatically in the background without user interaction
- Automatically retries uploading files to the Tetra Data Platform (TDP) after file upload errors
- Auto-starts when the host machine is started
- Can auto-restart up to three times if it crashes
- Provides file processing summary
- Provides a full operational audit trail
NOTE
For performance information, see Tetra File-Log Agent Performance Guidelines.
Tetra File-Log Agent Requirements
These are the Tetra File-Log Agent requirements:
- One of the following operating systems: Windows 7 Service Pack 1 (SP1) or later with TLS 1.2 enabled, Windows Server 2008 R2 SP1 or later with TLS 1.2 enabled, or Windows Server 2022.
- The server should have at least 8 GB of RAM, though 16 GB is recommended.
- The CPU should be at minimum an Intel Xeon 2.5 GHz (or equivalent).
- The Agent copies the source file to the Group User's temp folder before uploading it to the TDP. The available space in the temp folder should be larger than the largest file being uploaded, so that the file can be held there temporarily.
- .NET Framework 4.8 (Download Link).
- An Agent has been created on the Tetra Data Platform (TDP). For instructions, see Cloud Configuration of Tetra Agents. To determine which data connection type is required for your use case, see Agent Deployment Options.
- The Windows server hosting the Agent requires network access according to its connection mode (chosen in step 6). For details, see Agent Connection Types.
- If you select the Enable S3 Direct Upload or Receive Commands option when you configure a Tetra Agent, then you must add the following endpoints to your organization's allow list before you can use a Tetra Hub:
- For required Data Hub endpoints, see Endpoint Allow List for Tetra Agents When Using a Tetra Data Hub.
- For required Hub endpoints, see Endpoint Allow List for Tetra Agents When Using a Tetra Hub.
NOTE
When using the TetraScience API, this access must be direct. When using a Tetra Data Hub with a Generic Data Connector (GDC), this access can be provided by installing an L7 Proxy Connector in the same Data Hub as the GDC. In this case, the Agent also requires access to the port selected when configuring the L7 Proxy (for example, https://192.168.1.1:3129).
- The Windows server hosting the Agent has network access (SMB over port 445, TCP and UDP) to any computers whose shared folders need to be monitored, as well as the Group User Account with the necessary access (see below for details).
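The temp folder requirement listed above can be checked before a large upload. The following is a minimal sketch, assuming Python is available on the host and that the check runs under the same account as the Agent (otherwise `tempfile.gettempdir()` points to a different user's temp folder). The function names are hypothetical and are not part of the Agent.

```python
import shutil
import tempfile


def has_room_for(file_size_bytes: int, free_bytes: int) -> bool:
    """Return True if the free space is strictly larger than the file size."""
    return free_bytes > file_size_bytes


def temp_folder_has_room_for(file_size_bytes: int, temp_dir: str = None) -> bool:
    """Check the free space of a temp folder against a file size.

    temp_dir defaults to the current user's temp folder, which is an
    assumption: pass the Group User's temp folder explicitly if this
    runs under a different account.
    """
    temp_dir = temp_dir or tempfile.gettempdir()
    free_bytes = shutil.disk_usage(temp_dir).free
    return has_room_for(file_size_bytes, free_bytes)
```

For example, `temp_folder_has_room_for(os.path.getsize(path))` (after `import os`) could be run against the largest file an instrument is expected to produce.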
Known Limitations
Characters in File Path
- The File-Log Agent does not support case-sensitive directory or file names. All paths are converted to lowercase when stored in the TDP.
- In versions prior to 4.3.0, the File-Log Agent only supports ASCII characters in directory and file names. Versions 4.3.0 and higher support non-ASCII characters in directory and file names.
- The File-Log Agent does not support any of the following characters in directory or file names:
< > : " / \ | ? *
- Other file systems may support these characters, but the agent running on Windows and accessing the network share may not be able to correctly scan and read files with these characters in their name.
NOTE
If you're having problems uploading files with special characters in their path or file name, you can do either of the following:
Modify the application producing the files to use only valid characters in the path and file name. This is often the simplest and best solution.
- or -
Network Attached Storage systems often have support for a name translation layer (for example, the vfs_catia module for Samba). You can use these storage systems to configure mappings for characters to make sure the share presents the files with valid characters only. Care must be taken when doing this, however, to avoid creating duplicate file names.
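The character restrictions above can be checked ahead of time, for example in the application that produces the files. The following is a minimal sketch, not the Agent's own validation logic. The function name and the `ascii_only` flag (for Agent versions prior to 4.3.0) are hypothetical. It checks a single directory or file name, not a full path, because `\` and `:` legitimately appear in full Windows paths.

```python
# Characters the File-Log Agent does not support in directory or file names.
FORBIDDEN_CHARACTERS = set('<>:"/\\|?*')


def unsupported_characters(name: str, ascii_only: bool = False) -> list:
    """Return the unsupported characters found in one directory or file name.

    Set ascii_only=True to also flag non-ASCII characters, which Agent
    versions prior to 4.3.0 do not support.
    """
    found = [ch for ch in name if ch in FORBIDDEN_CHARACTERS]
    if ascii_only:
        found += [ch for ch in name if not ch.isascii()]
    return found
```

An empty list means the name passes these checks. A non-empty list shows which characters to replace or map.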
Character Conversion Rules
When uploading files to Amazon Simple Storage Service (Amazon S3), the Tetra File-Log Agent modifies file paths in the following ways:
- The Amazon S3 file path is lower case, except for local machine names. If the DNS name for the local machine is upper case, the Tetra File-Log Agent uses the original upper-case name without converting it to lower case.
- Microsoft Windows folder path backslashes (\) are changed to forward slashes (/) in the Amazon S3 path.
- Windows folder path dots (.) are changed to colons (:) in the Amazon S3 path, unless the dot is in the file name.
Universal Naming Convention (UNC) Path Conversion Example
\\10.10.10.10\folder.1\filename.test.json
-changes to-
10:10:10:10/folder:1/filename.test.json
Local Path Conversion Example
c:\folder.1\filename.test.json
-changes to-
<MACHINE_NAME>/c/folder:1/filename.test.json
NOTE
For local machine names, the Windows operating system isn't case sensitive. Windows treats two machine names with different cases that are otherwise spelled the same as being identical.
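The conversion rules and the two examples above can be sketched as a small function. This illustrates the documented rules only and is not the Agent's actual implementation. The function name is hypothetical. The sketch assumes that the last path component is the file name, that the caller supplies the local machine name, and that the colon after a drive letter is dropped, as the local path example suggests.

```python
def to_s3_path(windows_path: str, machine_name: str) -> str:
    """Convert a Windows file path to the documented Amazon S3 path form."""
    is_unc = windows_path.startswith("\\\\")
    # Split on backslashes and drop empty parts (leading or doubled slashes).
    parts = [part for part in windows_path.split("\\") if part]
    *folders, file_name = parts

    if not is_unc and folders:
        # Local path: "c:" becomes "c" (assumption based on the example).
        folders[0] = folders[0].rstrip(":")

    # Folder components: lower case, dots changed to colons.
    folders = [folder.lower().replace(".", ":") for folder in folders]
    # File name: lower case, dots kept.
    file_name = file_name.lower()

    # Local paths are prefixed with the machine name, whose case is preserved.
    prefix = [] if is_unc else [machine_name]
    return "/".join(prefix + folders + [file_name])
```

With a hypothetical machine name `LABPC01`, `to_s3_path(r"c:\folder.1\filename.test.json", "LABPC01")` follows the local path example, and `to_s3_path(r"\\10.10.10.10\folder.1\filename.test.json", "LABPC01")` follows the UNC example, where the machine name is not used.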
File Systems and Protocols
- The File-Log Agent can access any local file system, and any file system that can be accessed through supported Windows file share protocols (SMB/CIFS, NFS, Windows DFS), as long as the directories and files do not violate the character restrictions above.
- Depending on the protocol and file system used, file system events may not be supported and the File-Log Agent will rely on polling to detect changes.
- The Windows operating system determines the specific network protocol and version the Agent uses to access files.
- Mapped network drives are not supported.
File Size
- When using the Generic Data Connector or Agent without S3 Direct Upload, the maximum file size that can be uploaded is 500 MB.
- When using S3 Direct Upload, the maximum file size that can be uploaded is 5 TB.
Additional Notes
- The Tetra File-Log Agent does not support a file path exceeding the default maximum length, which is 260 characters. This is a Windows OS limitation. To enable long paths in the host Windows OS, see the FAQ: How to enable long path for Windows.
- The Tetra File-Log Agent is optimized for the latest Windows OS. It supports earlier Windows versions as long as the requirements above are met, but it will not perform optimally on them.
- The Tetra File-Log Agent scanning speed depends on the number of files and the number of folders it scans. When the Tetra File-Log Agent scans a network drive, the network speed also affects the scanning speed.
- If the network share is non-Windows based, or the network redirector is non-Windows based, the Tetra File-Log Agent scanning speed is also impacted.
- If the files to upload are approximately 500 MB or larger, you must add an L7 proxy connector to the Data Hub.
- By default, if you're using a Data Hub with a Generic Data Connector (GDC), the GDC is provisioned to have 500 MB of memory. If your file is close to or larger than 500 MB, then the Tetra File-Log Agent may not be able to upload it. While you can adjust the memory allocation for GDC, TetraScience recommends installing the L7 Proxy Connector and setting the Enable S3 Direct Upload option to Yes.
- If your file system has an additional permissions system, please discuss with TetraScience before installing the Tetra File-Log Agent so we can assist you with agent configuration.
File Watcher Service
The File Watcher service uploads the entire file or folder to the Tetra Scientific Data Cloud. If the file has been changed multiple times, the Data Lake could contain multiple versions of the same file.
The Agent monitors the files or folders by using the paths, the associated patterns (glob patterns), and the Start Date defined in the File Watcher section of the Windows Management Console. The Start Date is used to exclude files from before that date.
There are two modes, File Mode and Folder Mode, in the File Watcher Service. The logic of detecting the changes is slightly different between these two modes.
File Mode
File Mode monitors specified paths to detect changes in individual files and then uploads those files to the Data Lake. The File Watcher Service periodically checks the following file metadata and stores the results in the local SQLite database:
If the values of either of these metadata types change, the Agent marks that file as changed.
Folder Mode
Folder Mode monitors the subdirectories of the specified paths to detect changes and then uploads those directories to the Data Lake. Directories are uploaded as a compressed zip file containing all of the files in the directory, including the subdirectories. If any file in the folder changes, the Agent uploads the entire folder as a compressed file again. The Agent periodically checks and stores the following attributes for that folder:
- Total number of files in the folder
- Total size of the files in the folder
- The Last Write Time of the most recently written file in the folder
If any of these attributes has changed, the Agent marks that folder as changed.
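The Folder Mode check described above can be sketched as a comparison of two folder snapshots. This is a minimal illustration, not the Agent's implementation. The names `FolderSnapshot`, `take_snapshot`, and `has_changed` are hypothetical, and the sketch assumes that files in subdirectories count toward all three attributes.

```python
import os
from typing import NamedTuple


class FolderSnapshot(NamedTuple):
    file_count: int           # total number of files in the folder
    total_size: int           # total size of the files, in bytes
    latest_write_time: float  # last write time of the most recently written file


def take_snapshot(folder: str) -> FolderSnapshot:
    """Walk the folder (including subdirectories) and collect the attributes."""
    file_count = 0
    total_size = 0
    latest_write_time = 0.0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            info = os.stat(os.path.join(root, name))
            file_count += 1
            total_size += info.st_size
            latest_write_time = max(latest_write_time, info.st_mtime)
    return FolderSnapshot(file_count, total_size, latest_write_time)


def has_changed(previous: FolderSnapshot, current: FolderSnapshot) -> bool:
    """A folder counts as changed if any of the three attributes differs."""
    return previous != current
```

A caller would take a snapshot at each time interval and compare it with the one stored from the previous interval.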
The path of the changed file or the folder is put into a processing queue. The items in the queue will be processed sequentially.
For each file or folder that it monitors, the Agent checks the attributes at every time interval, compares them with the values from the previous interval, and determines whether the item has changed and needs to be uploaded to the Data Lake.
When the Agent decides that a file or folder is ready to be uploaded, it copies the file or folder to the Windows temp folder. In Folder Mode, the folder is compressed into a zip file.
The Agent keeps trying to upload the output files in the temp folder until the upload succeeds. After a successful upload, the files are removed from the temp folder. The file upload time interval can be specified in the Advanced Settings in the Agent Configuration section.
NOTE
File upload time interval settings are available in Tetra File-Log Agent versions 4.2.3 and earlier. In versions 4.3 and higher, the setting is no longer required, because the upload job runs continuously.
Retry Behavior
NOTE
The exponential backoff retry behavior is available in Tetra File-Log Agent v4.4.0 and higher only.
File upload, archive, and delete operations have limited, delayed retries (a pattern also known as retry with exponential backoff). When any of these operations fails, the Agent enqueues the file for retry after a delay that starts at 1 minute and doubles with every retry, up to a maximum of 24 hours. Each operation is attempted up to 40 times by default (configurable through Advanced Settings). This behavior results in 10 attempts in the first 34 hours, then one attempt each day for the next 30 days.
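The delay pattern described above can be sketched as follows. This is an illustration of exponential backoff with a cap, not the Agent's code. The function names are hypothetical, and the sketch assumes that the delay before the first retry is 1 minute and that the 24-hour cap applies as soon as doubling would exceed it.

```python
from datetime import timedelta

INITIAL_DELAY_MINUTES = 1     # delay before the first retry
MAX_DELAY_MINUTES = 24 * 60   # delays never exceed 24 hours


def retry_delay(retry_number: int) -> timedelta:
    """Return the delay before the given retry (1 = first retry)."""
    if retry_number < 1:
        raise ValueError("retry_number must be 1 or greater")
    minutes = INITIAL_DELAY_MINUTES * 2 ** (retry_number - 1)
    return timedelta(minutes=min(minutes, MAX_DELAY_MINUTES))


def retry_schedule(retries: int) -> list:
    """Return the delays for the first `retries` retries, in order."""
    return [retry_delay(n) for n in range(1, retries + 1)]
```

Under these assumptions, the uncapped delays run from 1 minute to 1,024 minutes and add up to 2,047 minutes, or about 34 hours. Every later delay is the 24-hour maximum.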