Tetra File-Log Agent Installation Guide (Version 3.5)
Version
This guide is for version 3.5.0 of the software.
NOTE:
The File-Log Agent can only retrieve files stored on the NTFS file system. Other file systems are not supported. Also, when using NTFS, do not make any directory you want to watch case-sensitive.
NOTE:
Before you install the File-Log Agent, make sure that the requirements have been met.
Installation
To install the File-Log Agent, complete the following steps.
- Download the installation package (Tetrascience.Agent.File-Log.v3.5.0.msi) on the server. You can get the installation package from TetraScience.
- Double-click the installation file to start the installation wizard.
- Follow the prompts. Note that the default installation path is C:\Tetrascience\Tetrascience.Agent.File-Log.v3.5.0.
- When the Wizard is complete, click Close to exit.
- The link to the File-Log Agent appears in the Windows Program menu.
Configuration
Once the File-Log Agent has been installed, do the following to configure it.
Open the File-Log Agent
- Click the File-Log Agent icon in the Windows Program menu.
- The File-Log Agent Management Console appears.
1. Scheduled Windows Task
If this option is enabled, the File-Log Agent creates a Windows Task that checks the status of the Agent service daily, at a time you specify.
- If the Agent is stopped, the scheduled Windows Task will automatically restart the Agent service.
- If the Agent is running, nothing further happens.
If you manually stop the File-Log Agent, the Windows Task will be removed. The Windows Task that is created runs under the LocalSystem account.
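The daily check described above can be sketched as follows. This is only an illustration of the decision logic, not the Agent's actual implementation; the function name and the two callables are hypothetical stand-ins for querying and starting the Windows service.

```python
from typing import Callable


def ensure_agent_running(is_running: Callable[[], bool],
                         start_service: Callable[[], None]) -> str:
    """One run of the scheduled check: restart the service only if it is stopped."""
    if is_running():
        return "already running"  # Agent is running, nothing further happens
    start_service()               # Agent is stopped, so restart it
    return "restarted"


# Stand-in state instead of a real Windows service (hypothetical).
state = {"running": False}
result_when_stopped = ensure_agent_running(
    lambda: state["running"],
    lambda: state.update(running=True),
)
result_when_running = ensure_agent_running(
    lambda: state["running"],
    lambda: state.update(running=True),
)
```

Passing the status check and the start action in as callables keeps the sketch testable without touching a real service.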
2. File-Log Group User
The File-Log Group user is the service account that runs the Agent.
- If the account is left blank, the Agent will run using the Windows predefined LocalSystem account.
- If the Agent needs to monitor network shared folders, we suggest that you provide a service account that fulfills the following requirements:
- The service account has at least Read and List folder contents permissions on the network shared folders (including the subfolders and files contained in those folders)
- The service account is part of the local user group of the host server
- The service account needs the Log on as service right
When the user enters the account user name and password, the Agent validates the account immediately. If the user name and password are correct, the Agent shows Valid next to the account.
The Agent verifies the account's permissions on the folder paths when the Agent starts. Error messages pop up if the account doesn't have sufficient permissions.
3. Agent Configuration
This section specifies how the Agent connects to the Tetra Data Platform. TetraScience provides various ways to connect to the Tetra Data Platform. Please refer to this link to select the option that best fits your needs.
A. S3 Direct Upload
When this option is turned on, the Agent uploads files directly to the AWS S3 bucket, bypassing the GDC or UDI. Be aware that if you use the S3 Direct Upload option with a GDC, an L7 Proxy Data Connector must be created in the same Data Hub where the GDC is set up. Also make sure that the port of the L7 Proxy Data Connector is open.
With this option turned on, the user can upload files of up to 1 TB.
NOTE:
To allow the agent to automatically perform regular backups of the SQLite database file, ensure that the S3 Direct Upload option is enabled. When this option is enabled, the SQLite database file, which stores agent configuration data, is uploaded to the backup bucket in the data lake. Should an agent failure occur, you can restore the database file from the backup bucket so that processing can continue.
B. Advanced Settings
Displays a window that allows the user to specify the time intervals for sending the Agent Heartbeat and uploading output files. The default values are 30 seconds.
- Data Connection Status - Indicates how often the software checks the status of the connection between the Tetra Data Platform and the File-Log Agent. If the Tetra Data Platform does not receive a message indicating that the File-Log Agent is "alive" for more than 5 minutes, it is assumed that the agent is offline.
- File Upload Job - Indicates how often to upload files to the data lake. This field differs from the File Change Interval field, which indicates the minimum amount of time that a file must be unchanged before it is uploaded. For example, if the File Upload Job field is set to 5 minutes and the File Change Interval field is set to 1 minute, the upload job runs every 5 minutes and uploads only files that have been unchanged for at least 1 minute.
- Agent Log File - Indicates how often to upload log files to the data lake.
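To make the timing rules in this list concrete, here is a minimal sketch. It assumes the upload job compares each file's Last Write Time against the file-change interval at the moment the job runs; the function names and sample file names are hypothetical, and only the 5-minute offline threshold comes from this guide.

```python
from datetime import datetime, timedelta

OFFLINE_AFTER = timedelta(minutes=5)  # threshold stated in this guide


def is_agent_offline(last_heartbeat: datetime, now: datetime) -> bool:
    """True when no heartbeat has been received for more than 5 minutes."""
    return now - last_heartbeat > OFFLINE_AFTER


def ready_for_upload(last_write_time: datetime, now: datetime,
                     file_change_interval: timedelta) -> bool:
    """True when the file has been unchanged for at least the interval."""
    return now - last_write_time >= file_change_interval


def files_picked_up(last_write_times: dict, job_run_time: datetime,
                    file_change_interval: timedelta) -> list:
    """Names of the files that one run of the upload job would upload."""
    return sorted(
        name for name, written in last_write_times.items()
        if ready_for_upload(written, job_run_time, file_change_interval)
    )


# The upload job runs at 12:05 with a 1-minute file-change interval.
job_run = datetime(2024, 1, 1, 12, 5, 0)
last_writes = {
    "a.csv": datetime(2024, 1, 1, 12, 3, 0),   # unchanged for 2 minutes
    "b.csv": datetime(2024, 1, 1, 12, 4, 30),  # unchanged for 30 seconds
}
picked = files_picked_up(last_writes, job_run, timedelta(minutes=1))
```

In this sketch `b.csv` is skipped at 12:05 and would be picked up by a later run, provided it is not modified again in the meantime.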
NOTE:
Starting from version 3.4 of the software, the "After a watched file is modified, wait for it to stop changing" field was renamed "File Upload Job Runs Every". Also, the "Upload Log File" field was renamed "Agent Log File Uploads Every". The field names were changed to improve readability.
C. Data Connection Agent ID and URL
To fill in these fields, you'll need the information from the GDC or UDI connector that you created as part of the installation prerequisites. If you did not create a connector yet, see the Tetra Agent Integration and Connectors section for more details.
- Agent ID is a UUID
- Examples of the URL from GDC or UDI are shown below:
GDC URL is http://10.100.1.1:8888/generic-connector/v1/agent
UDI URL is https://api.tetrascience-dev.com/v1/uda/
We strongly recommend that you verify the Agent ID and URL with a TetraScience Delivery Engineer before using them.
Please make sure to enter the full URL in the Connection Url field.
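Before entering these values, a quick local sanity check can catch copy-and-paste mistakes. The sketch below is not part of the Agent; the function names and the sample Agent ID are hypothetical, and "full URL" is interpreted here as an http or https URL that includes a host and a path, as in the two examples above.

```python
import uuid
from urllib.parse import urlparse


def is_valid_agent_id(value: str) -> bool:
    """True if value is a UUID in canonical 8-4-4-4-12 form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def is_full_url(value: str) -> bool:
    """True if value has an http(s) scheme, a host, and a non-empty path."""
    parts = urlparse(value)
    return (
        parts.scheme in ("http", "https")
        and bool(parts.netloc)
        and parts.path not in ("", "/")
    )


# Sample values: the Agent ID is made up, the URLs are the examples from this guide.
agent_id_ok = is_valid_agent_id("3f2b8c1e-5d4a-4e6f-9a7b-1c2d3e4f5a6b")
gdc_url_ok = is_full_url("http://10.100.1.1:8888/generic-connector/v1/agent")
udi_url_ok = is_full_url("https://api.tetrascience-dev.com/v1/uda/")
host_only_ok = is_full_url("10.100.1.1:8888")
```

A check like this only confirms the shape of the values. It does not replace verifying them with a TetraScience Delivery Engineer.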
Additional note when using UDI:
If the user uses the UDI URL, a valid JWT token must be attached to the header.
Please refer to the UDI documentation to learn how to get a JWT token.
When the user clicks the Add/Edit button, a modal window pops up for the user to paste the JWT token.
Note that the Org Slug field is required when using a JWT token. When the user clicks the Save button, the token is encrypted and saved.
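Because an expired token is a common cause of connection failures, it can help to check a token's expiry before pasting it. The sketch below relies only on the standard JWT layout (three base64url segments, with an optional `exp` claim in the payload). It does not verify the signature and is not how the Agent itself processes the token; the function name is hypothetical.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional


def jwt_expiry(token: str) -> Optional[datetime]:
    """Return the token's `exp` claim as a UTC datetime, or None if it has none.

    The signature is NOT verified; this is only a structural check.
    """
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated segments")
    payload_b64 = parts[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = payload.get("exp")
    if exp is None:
        return None
    return datetime.fromtimestamp(exp, tz=timezone.utc)


def is_expired(token: str, now: datetime) -> bool:
    """True if the token carries an `exp` claim that is not later than `now`."""
    expiry = jwt_expiry(token)
    return expiry is not None and expiry <= now
```

For example, `is_expired(token, datetime.now(timezone.utc))` reports whether a pasted token has already expired.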
4. Service Settings
The user can enable either of the services or both.
Functionality Change
When the Tetra File-Log Agent sends data to the TDP, the data is stored in AWS S3. The Tetra File-Log Agent reads data from Windows file storage, which is not case sensitive, and uploads it to S3 storage, which is case sensitive. To avoid unintentional duplication of data based on case, the File-Log Agent normalizes all file path information to lower case. This gives users lineage behavior similar to what they would expect in their Windows file storage. This minor functionality change was introduced in version 3.4 of the software.
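The effect of this normalization can be illustrated with a short sketch. The exact key format the Agent uses in S3 is not described in this guide, so the example only shows the lower-casing step; the function name and sample paths are hypothetical.

```python
def normalize_path(path: str) -> str:
    """Lower-case a file path so that case variants map to a single value."""
    return path.lower()


# Three spellings that Windows treats as the same file.
windows_paths = [
    r"C:\Data\Run01\Result.CSV",
    r"c:\data\run01\result.csv",
    r"C:\DATA\RUN01\RESULT.csv",
]

# After normalization only one distinct path remains, so no duplicates arise.
normalized = {normalize_path(p) for p in windows_paths}
```

Without this step, a case-sensitive store would treat the three spellings as three different objects.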
FileWatcher Service Setting
When the FileWatcher Service is enabled, the user can configure the service by providing the path to monitor the file or folder, file source type, glob pattern, file metadata, and tags.
NOTE:
You can specify up to 500 paths in File Watcher.
- File Change Interval: Specifies the time span the Agent uses to determine whether a file should be uploaded, by comparing it against the file's Last Write Time.
- Start Date: Causes the agent to ignore files/folders older than the selected date.
- Use Path Configuration: Indicates whether the Agent should apply the File Change Interval and Start Date to all of the file paths, or to each file path individually (this feature is available in v3.3.1 and above).
- Path: Allows the user to specify which folder path to monitor, the glob pattern used to select the files/folders, and the additional information (such as source type, metadata, and tags) to associate with the file when uploading to the TetraScience Platform. The user can add a new path, delete a path, or edit an existing path.
If the Use Path Configuration option is turned on, the user can specify the File Change Interval and Start Date for that file path (v3.3.1 and above).
Among the data fields used by FileWatcher service, the Path, Patterns, and File Watcher Mode are mandatory.
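One way to picture how the global and per-path values relate is the sketch below. The class, field, and function names are hypothetical, and falling back to the global value when a path has no value of its own is an assumption rather than documented Agent behavior.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass
class WatchedPath:
    path: str                                             # mandatory
    patterns: list = field(default_factory=lambda: ["*"])  # mandatory glob patterns
    file_change_interval: Optional[timedelta] = None       # per-path value, if any
    start_date: Optional[datetime] = None                  # per-path value, if any


def effective_settings(watched: WatchedPath, use_path_configuration: bool,
                       global_interval: timedelta,
                       global_start_date: datetime) -> Tuple[timedelta, datetime]:
    """Pick the File Change Interval and Start Date that apply to one path."""
    if not use_path_configuration:
        return global_interval, global_start_date
    # Assumption: a path without its own value falls back to the global one.
    interval = (watched.file_change_interval
                if watched.file_change_interval is not None else global_interval)
    start = watched.start_date if watched.start_date is not None else global_start_date
    return interval, start


global_interval = timedelta(seconds=30)
global_start = datetime(2024, 1, 1)
slow_path = WatchedPath(r"D:\instrument\slow", ["*.raw"],
                        file_change_interval=timedelta(minutes=10))

with_path_config = effective_settings(slow_path, True, global_interval, global_start)
without_path_config = effective_settings(slow_path, False, global_interval, global_start)
```

Comparing against `None` rather than relying on truthiness matters here, because a zero-length `timedelta` is falsy in Python.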
NOTE:
For more information on metadata and tags, including which characters can be used, see this topic.
Re-upload: The Re-upload button is associated with every file path. When the user clicks the button, a modal window appears asking for the Start and End Date. When the user provides the date-time range and clicks the Save button, the Agent re-uploads the files whose Last Write Time falls within the specified date-time range.
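The selection rule behind Re-upload can be sketched as a simple range filter on Last Write Time. Treating both ends of the range as inclusive is an assumption, and the function and file names are hypothetical.

```python
from datetime import datetime


def files_to_reupload(last_write_times: dict, start: datetime, end: datetime) -> list:
    """Names of files whose Last Write Time lies within [start, end]."""
    return sorted(
        name for name, written in last_write_times.items()
        if start <= written <= end  # assumption: both bounds are inclusive
    )


last_writes = {
    "jan.csv": datetime(2024, 1, 15, 9, 0),
    "feb.csv": datetime(2024, 2, 10, 14, 30),
    "mar.csv": datetime(2024, 3, 5, 8, 45),
}
selected = files_to_reupload(
    last_writes,
    start=datetime(2024, 2, 1),
    end=datetime(2024, 2, 29, 23, 59, 59),
)
```

Only `feb.csv` falls inside the February range in this example, so it is the only file selected.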
NOTE:
Starting with v3.5.0, the Agent doesn't validate whether the folder path exists or the Agent Group User has the proper permission on the folder before the Agent starts.
Instead, the Agent verifies the folder path and the user permission when iterating the folder paths. The Agent will display an icon in front of every folder path to indicate whether the folder path is valid.
For example, in the sample below, the following two folders are invalid:
\\tsempowercli2.tsempower.local\Shared-Folder\Performance_Test_01\folder3\
\\tsempowercli2.tsempower.local\Shared-Folder\Performance_Test_01\folder4\
The Agent keeps checking those folder paths at each interval and updates the status if anything changes.
LogWatcher Service Setting
When the LogWatcher Service is enabled, the user can configure the service by providing the path of the file or folder to monitor, the file source type, glob pattern, file metadata, and tags.
- Scan Interval to File Change: Specifies how often the LogWatcher Service runs to gather each file's metadata, detect changes, and generate the new content if the files were updated.
- LogWatcher Output Folder: Specifies the local folder where the new content is stored initially before upload. That folder will be automatically created by the Agent if it doesn't exist.
- Start Date: Causes the Agent to ignore files/folders older than the selected date.
- Path: Specifies which folder to monitor, the glob pattern used to filter the files, and the additional information (such as source type, metadata, and tags) to associate with the file when uploading to the TetraScience Platform. The user can add a new path, delete a path, or edit an existing path.
Among the data fields used by FileWatcher service, the Path, Patterns, and File Watcher Mode are mandatory. If Source Type is left empty, the Agent uses unknown as the default.
Starting from v3.0.0, if the user updates the Source Type, Metadata, or Tags of a folder path that has already been processed, the Agent re-uploads the files to the TetraScience Data Platform with the updated Source Type, Metadata, or Tags attached. Details can be found in FAQ - 7. How the Agent behaves if Source Type, Metadata or Tag is changed?
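As a rough mental model of what the LogWatcher Service does on each scan, the sketch below reads only the bytes appended to a log file since the previous scan and stores them in an output folder that is created on demand. It assumes append-only logs tracked by byte offset, which may differ from how the Agent actually detects changes; all function, file, and folder names are hypothetical.

```python
import os
import tempfile


def read_new_content(path: str, last_offset: int) -> tuple:
    """Return (new_bytes, new_offset) for content added since last_offset."""
    if os.path.getsize(path) < last_offset:
        last_offset = 0  # file was truncated or replaced, so start over
    with open(path, "rb") as fh:
        fh.seek(last_offset)
        data = fh.read()
    return data, last_offset + len(data)


def store_chunk(output_dir: str, name: str, data: bytes) -> str:
    """Write one chunk of new content, creating the output folder if needed."""
    os.makedirs(output_dir, exist_ok=True)
    target = os.path.join(output_dir, name)
    with open(target, "wb") as fh:
        fh.write(data)
    return target


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "instrument.log")
    output_dir = os.path.join(tmp, "logwatcher-output")

    with open(log_path, "wb") as fh:
        fh.write(b"line 1\n")
    first, offset = read_new_content(log_path, 0)        # first scan

    with open(log_path, "ab") as fh:
        fh.write(b"line 2\n")
    second, offset = read_new_content(log_path, offset)  # second scan
    stored_exists = os.path.isfile(store_chunk(output_dir, "chunk-2.log", second))

    third, offset = read_new_content(log_path, offset)   # nothing new
```

Each scan yields only the newly appended lines, which is why a third scan with no changes returns empty content.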
Start the Agent
When the configuration is complete, the user needs to save the changes and click the Start button to start the Agent.
Before the Agent starts, it validates whether the File-Log Group User account has read privileges on the folder paths the user selected. A popup window lists any paths that have issues.
When all of the errors are fixed, the Agent can start successfully.
Questions or Issues
If you run into an issue, check the Troubleshooting Issues and Solutions section.
If you would like to learn more about the TetraScience File-Log Agent, read the FAQ section or contact TetraScience directly.