Tetra File-Log Agent Installation Guide (Version 3.4)

Version

This guide is for version 3.4.0 of the software.

📘

NOTE:

The File-Log Agent can only retrieve files stored on the NTFS file system. Other file systems are not supported.

📘

NOTE:

Before you install the File-Log Agent, make sure that the requirements have been met.

Installation

To install the File-Log Agent, complete the following steps.

  1. Download the installation package (Tetrascience.Agent.File-Log.vx.x.x.msi, where "vx.x.x" is the version number) to the server. You can get the installation package from TetraScience.
  2. Double-click the installation file to start the installation wizard.

Tetra File-Log Agent Installation Wizard

  3. Follow the prompts. Note that the default installation path is C:\Tetrascience\Tetrascience.Agent.File-Log.v3.x.x.
  4. When the wizard is complete, click Close to exit.
  5. Confirm that the link to the File-Log Agent appears in the Windows Program menu.

Configuration

Once the File-Log Agent has been installed, do the following to configure it.

Open the File-Log Agent

  1. Click the File-Log Agent icon in the Windows Program menu.
  2. The File-Log Agent Management Console appears.

1. File-Log Group User

The File-Log Group user is the service account that runs the Agent.

  • If the account is left blank, the Agent runs under the predefined Windows LocalSystem account.

  • If the Agent needs to monitor network shared folders, we suggest that you provide a service account that meets the following requirements:

    • The service account has at least the Read and List folder contents permissions on the network shared folders (including the subfolders and files that they contain).
    • The service account is a member of the local user group on the host server.
    • The service account has the Log on as a service right.

When the user enters the account user name and password, the Agent validates the account immediately. If the user name and password are correct, the Agent shows Valid next to the account.

When the Agent starts, it verifies the account's permissions on the configured folder paths. Error messages pop up if the account does not have sufficient permissions.

2. Agent Configuration

This section specifies how the Agent connects to the Tetra Data Platform. TetraScience provides various ways to connect to the Tetra Data Platform. Please reference this link to select the option that best fits your needs.

A. S3 Direct Upload

When this option is turned on, the Agent uploads files directly to the AWS S3 bucket, bypassing the GDC or UDI. Be aware that if you use the S3 Direct Upload option with GDC, an L7 Proxy Data Connector must be created in the same Data Hub where the GDC is set up, and the port of the L7 Proxy Data Connector must be open.

With this option turned on, the user can upload files of up to 1 TB.

📘

NOTE:

To allow the agent to automatically perform regular backups of the SQLite database file, ensure that the S3 Direct Upload option is enabled. When this option is enabled, the SQLite database file, which stores agent configuration data, is uploaded to the backup bucket in the data lake. Should an agent failure occur, you can restore the database file from the backup bucket so that processing can continue.

B. Advanced Settings

Displays a window that allows the user to specify the time intervals for sending the Agent heartbeat and uploading output files. The default value of each interval is 30 seconds.

  • Data Connection Status - Indicates how often the software checks the status of the connection between the Tetra Data Platform and the File-Log Agent. If the Tetra Data Platform does not receive a message indicating that the File-Log Agent is "alive" for more than 5 minutes, it is assumed that the agent is offline.
  • File Upload Job - Indicates how often to upload files to the data lake. The difference between this field and the File Interval Change field is that the File Interval Change field indicates the minimum amount of time that a file must be unchanged before it is uploaded. For example, if the File Upload Job field is set to 5 minutes and the File Interval Change field is set to 1 minute, the upload job runs every 5 minutes and uploads only the files that have been unchanged for at least 1 minute.
  • Agent Log File - Indicates how often to upload log files to the data lake.
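
The interaction between the two fields in the example above can be sketched in a few lines of Python. This is only an illustration of the timing rule as described here, not the Agent's actual implementation; the function name, constants, and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical values mirroring the example in the text.
UPLOAD_JOB_EVERY = timedelta(minutes=5)  # "File Upload Job" field
MIN_UNCHANGED = timedelta(minutes=1)     # "File Interval Change" field


def uploaded_in_run(last_write: datetime, run_time: datetime) -> bool:
    """Return True if a file is picked up by the upload job at run_time.

    Assumption: a file qualifies only when it has been unchanged for at
    least MIN_UNCHANGED at the moment the job runs.
    """
    return run_time - last_write >= MIN_UNCHANGED


run = datetime(2024, 1, 1, 10, 5)               # a job run at 10:05
stable_file = datetime(2024, 1, 1, 10, 3, 30)   # unchanged for 90 seconds
busy_file = datetime(2024, 1, 1, 10, 4, 30)     # unchanged for 30 seconds

print(uploaded_in_run(stable_file, run))                   # True
print(uploaded_in_run(busy_file, run))                     # False
print(uploaded_in_run(busy_file, run + UPLOAD_JOB_EVERY))  # True
```

In this sketch, the file that changed 30 seconds before the 10:05 run is skipped and, provided it does not change again, is uploaded by the next run at 10:10.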

📘

NOTE

In version 3.4 of the software, the "After a watched file is modified, wait for it to stop changing" field was renamed "File Upload Job Runs Every". Also, the "Upload Log File" field was renamed "Agent Log File Changes Every". The field names were changed to improve readability.

C. Data Connection Agent ID and URL

The prerequisite is to set up an Agent from GDC or UDI. (Please check here if you have any questions.)

  • The Agent ID is a UUID.
  • Examples of the URL from GDC or UDI are shown below:
    GDC URL: http://10.100.1.1:8888/generic-connector/v1/agent
    UDI URL: https://api.tetrascience-dev.com/v1/uda/
    We strongly recommend that you verify the Agent ID and URL with a TetraScience Delivery Engineer before using them.

Please make sure to enter the full URL in the Connection Url field.
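
Before pasting the values into the console, a quick sanity check can catch a malformed Agent ID or a URL that is missing its scheme or path. The following Python sketch is not part of the Agent; the helper names are hypothetical, and it assumes that a "full URL" means an http or https scheme, a host, and a non-empty path, as in the examples above.

```python
import uuid
from urllib.parse import urlparse


def looks_like_agent_id(value: str) -> bool:
    """Return True if the value parses as a UUID."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def looks_like_full_url(value: str) -> bool:
    """Return True if the value has an http(s) scheme, a host, and a path.

    Assumption: this is what "full URL" means for the Connection Url field.
    """
    parts = urlparse(value)
    return (
        parts.scheme in ("http", "https")
        and bool(parts.netloc)
        and parts.path not in ("", "/")
    )


print(looks_like_full_url("http://10.100.1.1:8888/generic-connector/v1/agent"))  # True
print(looks_like_full_url("10.100.1.1:8888/generic-connector/v1/agent"))         # False
print(looks_like_agent_id("not-a-uuid"))                                         # False
```

A check like this only confirms the shape of the values. It does not replace verifying them with TetraScience.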

Additional note when using UDI:

If the user uses the UDI URL, a valid JWT token must be attached to the header.
Please reference the doc regarding UDI to learn how to get a JWT token.

When the user clicks the Add/Edit button, a modal window pops up so that the user can paste the JWT token:

Note that the Org Slug field is required when using a JWT token. When the user clicks the Save button, the token is encrypted and saved.

3. Service Settings

The user can enable either of the services or both.

❗️

Functionality Change

When the Tetra File-Log Agent sends data to the TDP, the data is stored in AWS S3. The Agent reads data from Windows file storage, which is not case sensitive, and uploads it to S3 storage, which is case sensitive. To avoid unintentional duplication of data based on case, the File-Log Agent normalizes all file path information to lower case. This gives users lineage behavior similar to what they expect from their Windows file storage. This minor functionality change was introduced in version 3.4 of the software.
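
The effect of the normalization can be illustrated with a short Python sketch. It only demonstrates the idea of lower-casing paths; the function name and the sample paths are hypothetical, and the Agent's real path handling may involve more than this.

```python
def normalize_path(windows_path: str) -> str:
    """Lower-case a path so that case-only variants compare equal."""
    return windows_path.lower()


# Two spellings that Windows treats as the same file.
paths = [
    r"C:\Data\Run01\Result.CSV",
    r"c:\data\run01\result.csv",
]

raw_keys = set(paths)
normalized_keys = {normalize_path(p) for p in paths}

print(len(raw_keys))         # 2: a case-sensitive store sees two objects
print(len(normalized_keys))  # 1: after normalization there is only one
```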

FileWatcher Service Setting

When the FileWatcher Service is enabled, the user can configure the service by providing the path of the file or folder to monitor, the file source type, glob pattern, file metadata, tags, and so on.

πŸ“˜

NOTE:

You can specify up to 500 paths in File Watcher.

  • File Change Interval - Specifies the time span that the Agent uses to determine whether a file should be uploaded, by comparing it against the file's Last Write Time.

  • Start Date - The Agent ignores files and folders older than the selected date.

  • Use Path Configuration - Indicates whether the Agent applies File Change Interval and Start Date to all of the file paths or to each file path individually. (This feature is available in v3.3.1 and above.)

  • Path - The user can specify which folder path to monitor, the glob pattern that selects the files/folders, and additional information, such as source type, metadata, and tags, that should be associated with the file when it is uploaded to the TetraScience Platform. The user can add a new path, delete a path, or edit an existing path.

    If the Use Path Configuration option is turned on, the user can specify the File Change Interval and Start Date for that file path (v3.3.1 and above).
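
One way to picture how these settings combine is the following Python sketch. It is a simplified model under stated assumptions, not the Agent's code: the class and function names are hypothetical, "older than the selected date" is assumed to refer to the file's Last Write Time, and per-path values are assumed to replace the global ones only when Use Path Configuration is turned on.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class WatchSettings:
    change_interval: timedelta  # File Change Interval
    start_date: datetime        # Start Date


@dataclass
class WatchedPath:
    path: str
    override: Optional[WatchSettings] = None  # per-path values, if any


def effective_settings(watched: WatchedPath, global_settings: WatchSettings,
                       use_path_configuration: bool) -> WatchSettings:
    """Pick the per-path settings only when Use Path Configuration is on."""
    if use_path_configuration and watched.override is not None:
        return watched.override
    return global_settings


def should_upload(file_path: str, settings: WatchSettings, now: datetime) -> bool:
    """Apply Start Date and File Change Interval to one file."""
    last_write = datetime.fromtimestamp(os.path.getmtime(file_path))
    if last_write < settings.start_date:
        return False  # older than Start Date: ignored
    # Upload only if the file has stopped changing for the whole interval.
    return now - last_write >= settings.change_interval
```

For example, `should_upload(path, effective_settings(watched, global_settings, True), datetime.now())` would evaluate a file against the per-path values when they exist and against the global values otherwise.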

Among the data fields used by the FileWatcher service, Path, Patterns, and File Watcher Mode are mandatory. If Source Type is left empty, the Agent uses unknown as the default.

❗️

Functionality changes

From v3.4.0, the behavior in which the Agent re-uploads existing files when any of Source Type, Metadata, or Tags changes has been reverted. Instead, two changes have been applied:

  1. When the Metadata, Source Type, or Tags are updated, the change is applied only to files that are new or updated after the change.

  2. The Agent provides a Re-upload button so that users can explicitly choose which files should be re-uploaded.

Re-upload - The Re-upload button is associated with every file path. When the user clicks the button, a modal window appears and asks for the Start and End Date. When the user provides the date-time range and clicks the Save button, the Agent re-uploads the files whose Last Write Time is within the specified date-time range.
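
To preview which files a given date-time range would cover before clicking Re-upload, a selection like the following Python sketch can be run against the folder. It is an approximation only: the function name is hypothetical, and it assumes that the range is inclusive at both ends and that subfolders are included, neither of which is stated in this guide.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterator


def files_to_reupload(folder: str, start: datetime, end: datetime) -> Iterator[Path]:
    """Yield files under folder whose Last Write Time is within [start, end]."""
    for entry in Path(folder).rglob("*"):
        if entry.is_file():
            last_write = datetime.fromtimestamp(entry.stat().st_mtime)
            if start <= last_write <= end:
                yield entry
```

Calling `list(files_to_reupload(folder, start, end))` returns the matching files, which can be compared with what appears on the platform after the re-upload.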

LogWatcher Service Setting

When the LogWatcher Service is enabled, the user can configure the service by providing the path of the file or folder to monitor, the file source type, glob pattern, file metadata, tags, and so on.

  • Scan Interval to File Change - Specifies how often the LogWatcher Service runs to gather the files' metadata, determine what has changed, and generate the new content if the files were updated.

  • LogWatcher Output Folder - Specifies the local folder where the new content is initially stored before upload. The Agent creates this folder automatically if it doesn't exist.

  • Start Date - The Agent ignores files and folders older than the selected date.

  • Path - The user can specify which folder to monitor, the glob pattern that filters the files, and additional information, such as source type, metadata, and tags, that should be associated with the file when it is uploaded to the TetraScience Platform. The user can add a new path, delete a path, or edit an existing path.
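
The idea of generating only the new content of a growing log file can be sketched as follows. This is a generic offset-tracking technique shown for illustration; how the LogWatcher Service actually detects changes is not described in this guide, and the function name is hypothetical.

```python
import os
from typing import Tuple


def read_new_content(path: str, last_offset: int) -> Tuple[bytes, int]:
    """Return the bytes appended since last_offset and the new offset.

    Assumption: if the file is now shorter than the stored offset, it was
    truncated or replaced, so reading starts again from the beginning.
    """
    size = os.path.getsize(path)
    if size < last_offset:
        last_offset = 0
    with open(path, "rb") as handle:
        handle.seek(last_offset)
        data = handle.read()
    return data, last_offset + len(data)
```

On each scan, the caller would store the returned offset and pass it back on the next scan, so that only the lines written in between are produced.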

Among the data fields used by the FileWatcher service, Path, Patterns, and File Watcher Mode are mandatory. If Source Type is left empty, the Agent uses unknown as the default.

Starting from v3.0.0, if the user updates the Source Type, Metadata, or Tags of a folder path that has already been processed, the Agent re-uploads the files to the TetraScience Data Platform with the updated Source Type, Metadata, or Tags attached. For details, see FAQ - 7. How the Agent behaves if Source Type, Metadata or Tag is changed?

Start the Agent

When the configuration is complete, the user needs to save the changes and click the Start button to start the Agent.

Before the Agent starts, it validates whether the File-Log Group User account has the read privilege on the folder paths that the user selected. If any path has an issue, a popup window notifies the user of the affected paths.

When all of the errors are fixed, the Agent can start successfully.
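
If the popup reports permission problems, it can help to reproduce the check outside the Agent. The Python sketch below simply tries to list each folder (or open each file) and collects the paths that fail. It is not the Agent's own validation, the function name is hypothetical, and the result is only meaningful when the script runs under the same service account as the Agent.

```python
import os
from typing import Iterable, List, Tuple


def unreadable_paths(paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (path, error message) pairs for paths that cannot be read."""
    problems = []
    for path in paths:
        try:
            if os.path.isdir(path):
                # Listing the folder requires the list-contents permission.
                with os.scandir(path) as entries:
                    next(entries, None)
            else:
                # Opening the file requires the read permission.
                with open(path, "rb"):
                    pass
        except OSError as error:
            problems.append((path, str(error)))
    return problems
```

An empty result means that every path could be listed or opened. Missing paths are reported as well, because they raise the same kind of error.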

🚧

NOTE

Once installation and your initial configuration are complete, use a Windows task script to start the agent daily (such as at 1:00 a.m.) to ensure that the agent keeps running. For information on how to do this, view this topic.

Questions or Issues

If you run into an issue, check the Troubleshooting Issues and Solutions section.

If you would like to learn more about the TetraScience File-Log Agent, please read the FAQ section or contact TetraScience directly.

