Verification Provider

Overview

Trados provides verification functionality on translatable content (source and/or target) in several ways:

Batch Verification Task (set in project workflows)
Editor Document Verification Task
Editor Segment-level Verification operation

The first two verification tasks validate the whole document, whereas the last operation validates a single segment.

For document validation, native verification is also possible. This allows for validation of a native file or a native annotated file. The difference between an annotated file and a non-annotated file is that the first allows for the generation of location information and the second does not. For native verification, a verification resource package may also be retrieved by the extension. This would contain ancillary files such as schemas, for example, in the case of XML validation.

Trados supports a set of internal verifiers. These are fixed in nature, and the user decides through settings what they verify. Examples include:

QA Checker
Tag Verifier

To allow the verification functionality to be extended, Trados supports the creation of extensible (external) verifiers. These are hosted in apps, each of which is essentially a self-contained service offering the verification functionality.

Extensible verifiers would normally support only a small set of verification types, for example a target segment length check (against the length of the source) or a punctuation checker.
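To illustrate how narrow such a check can be, here is a minimal sketch of a length check on plain strings. The function name check_length, the limit parameter, the "Warning" level value and the shape of the returned dictionary are all assumptions made for illustration; a real verifier would read the segment text from the BCM document and take its threshold from the verifier settings.

```python
from typing import Optional


def check_length(source: str, target: str, limit: int = 20) -> Optional[dict]:
    """Return a warning when the target is longer than the source by more than `limit` characters.

    Hypothetical helper: the returned dict is illustrative only and is not
    the message model used by the Trados Cloud Platform API.
    """
    excess = len(target) - len(source)
    if excess > limit:
        return {"level": "Warning", "excess": excess}
    # Within the allowed limit: nothing to report.
    return None
```

A call such as check_length(source_text, target_text, limit=20) returns None when the target is within the allowed excess.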

There are two types of verification which are supported by extensibility:

Bilingual Document Verification
Native File Verification

With Bilingual Document Verification, the verifier has access to the BCM document and can validate each segment as required.

Native File Verification validates the native source or target file. This file might also be annotated to allow for the easy determination of segment identifiers. An example of native file verification would be schema validation on an XML file. The aim of native verification is to establish if the translation process has broken any of the document structure and rendered an invalid document.

Each verifier will generate a set of error or warning messages based on the type of validation they perform. The user can change validation settings in the LC UI Project Settings which will affect how or when the messages are generated. External verifiers also have the option to specify resource packages in the settings which will allow the user to upload support files required during the validation process. This option is verifier specific as not all verifiers might require a package.

The user can see generated verification messages in the Editor UI and can choose to ignore certain messages by message type or individually, depending on their needs. They can subsequently un-ignore them at a later stage if desired. Verifiers may also provide localized messages which can be displayed in the same language the Editor UI is presenting in.

Flows:

Verification Flow 1

Verification Flow 2

Notes on the Flow Diagrams

The StartVerification request initiates the background job on the extension. The background job will be responsible for generating the verification messages.

The DownloadDependencyFiles calls to the Trados Cloud Platform API use the URLs and IDs supplied in the StartVerification request described below. The resources can include native files, native annotated files, BCM documents, verification resource packages, Language Resource Templates and Translation Engine resources.

The PublishMessages endpoint is used to publish batches of verification messages as they are generated by the extension to the Trados Cloud Platform API. See below for more details. The publishMessageUrl is defined in the initial StartVerification request.

The PublishEndResult is a call to the callbackUrl which is defined in the StartVerification request. This is called to indicate that all messages have been generated by the extension. Please see below for more details.

App implementation overview

The app receives a StartVerification request from Trados. The request specifies URLs for the resources the app needs to perform its verification task, together with the URLs used for publishing verification messages and finalizing the operation.

These resources are downloaded to the app via the Trados Cloud Platform API and can be stored locally during the verification operation.

When the request is received, the app responds with 201 (Created) and starts a background job which generates the verification messages. The app must send each verification error or warning it creates back to Trados via the Trados Cloud Platform API. These messages may be grouped into batches to reduce the number of calls between the app and the Trados Cloud Platform API.

The app may also receive a request to verify a single segment. All interactions between the app and Trados Cloud Platform API are via REST calls.
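The accept-then-process pattern described above could be sketched as follows. This is only an outline under stated assumptions: handle_start_verification and run_job are hypothetical names, no web framework is shown, and the actual download, publish and callback calls are left to the supplied job function.

```python
import threading
from typing import Callable, Dict, Tuple


def handle_start_verification(
    request: Dict, run_job: Callable[[Dict], None]
) -> Tuple[int, threading.Thread]:
    """Accept the request, start the job in the background, and answer 201.

    `run_job` is a hypothetical callable that downloads the resources,
    generates the messages, publishes them in batches, and finally calls
    the callback URL.
    """
    worker = threading.Thread(target=run_job, args=(request,), daemon=True)
    worker.start()
    # Respond immediately; the verification work continues on the worker thread.
    return 201, worker
```

Returning the thread alongside the status code is a convenience for testing; a production app would more likely hand the work to a job queue.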

API Overview

The extensible verifier API consists of a number of endpoints - an overview of their purpose is given here:

A verification extension defines the following in its descriptor:

{
  "extensionPointId": "lc.verificationprovider",
  "id": "string",
  "name": "string",
  "description": "string",
  "extensionPointVersion": "1.0",
  "configuration": {
    "endpoints": {
      "lc.verification.startverification": "string",
      "lc.verification.verifysegment": "string",
      "lc.verification.getmessagesbyculture": "string",
      "lc.verification.getsettingsschema": "string"
    },
    "validationInputType": "string"
  }
}

The extensionPointId must always be "lc.verificationprovider".

The validationInputType defines which type of validation will happen based on the input document:

BilingualDocument
NativeSource
NativeTarget
NativeAnnotatedTarget

The respective download URL(s) will be provided in the StartVerification request and the extension must download and store these resources as required.

endpoints - the endpoints required for the verification extension. Each should be relative to your baseUrl.
- lc.verification.startverification - the endpoint used to start the verification operation. For example: /verify/document.
- lc.verification.verifysegment - the endpoint used to verify a single segment. For example: /verify/segment.
- lc.verification.getmessagesbyculture - the endpoint used to retrieve localized messages. For example: /messages.
- lc.verification.getsettingsschema - the endpoint used to retrieve the settings schema. For example: /schemas.
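The descriptor rules above can be checked before an app is registered. The sketch below is a hypothetical self-check, not part of any Trados tooling: the function name descriptor_problems is invented, and it assumes that a relative endpoint is written with a leading slash, as in the examples.

```python
ALLOWED_INPUT_TYPES = {
    "BilingualDocument",
    "NativeSource",
    "NativeTarget",
    "NativeAnnotatedTarget",
}

REQUIRED_ENDPOINTS = {
    "lc.verification.startverification",
    "lc.verification.verifysegment",
    "lc.verification.getmessagesbyculture",
    "lc.verification.getsettingsschema",
}


def descriptor_problems(extension: dict) -> list:
    """Return a list of human-readable problems found in a descriptor entry."""
    problems = []
    if extension.get("extensionPointId") != "lc.verificationprovider":
        problems.append("extensionPointId must be 'lc.verificationprovider'")
    config = extension.get("configuration", {})
    if config.get("validationInputType") not in ALLOWED_INPUT_TYPES:
        problems.append("validationInputType is not one of the supported values")
    endpoints = config.get("endpoints", {})
    for key in sorted(REQUIRED_ENDPOINTS):
        path = endpoints.get(key)
        if not isinstance(path, str) or not path.startswith("/"):
            # Assumption: a relative endpoint is written with a leading slash,
            # as in the examples above (/verify/document).
            problems.append(f"{key} should be a path relative to baseUrl")
    return problems
```

An empty list means the entry passed these checks; it does not guarantee that the platform will accept the descriptor.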

There are four endpoints which the extension supports:

StartVerification
VerifySegment
GetMessagesByCulture
GetSettingsSchema

These are described in a little more detail here:

StartVerification

This endpoint is called by Trados to initiate the document verification operation. Generally, a background job is started when this call is made and the endpoint returns 201 (Created). The background job then prepares batches of verification messages and publishes them to Trados via the Trados Cloud Platform API, using the PublishMessages endpoint URL provided in the initial StartVerification request. Please see the API documentation for more details: StartVerification

Example:

POST https://your-app.com/verify/document

The request for StartVerification is shown below:

{
  "inputResourceDetails": {
    "nativeFileUrl": "string",
    "bilingualDocumentUrl": "string",
    "bilingualDocumentVersion": 0,
    "nativeAnnotatedFileUrl": "string",
    "languageResourceTemplateId": "string",
    "verificationResourcePackageUrl": "string",
    "translationProfileId": "string"
  },
  "callbackUrl": "string",
  "sourceLanguage": "string",
  "targetLanguage": "string",
  "publishMessageUrl": "string",
  "verifierSettings": {},
  "sessionId": "string"
}

inputResourceDetails - contains all relevant fields relating to the inputs for the verification operation - this includes:

nativeFileUrl - a download URL for retrieving the native file. This is only used in the context of native file verification such as XML schema validation

bilingualDocumentUrl - a download URL for retrieving the bilingual document (BCM) - this is used in the context of document validation

bilingualDocumentVersion - the version of the document which is to be used for validation

nativeAnnotatedFileUrl - a download URL for retrieving the native annotated file. This is only used in the context of native file verification where annotated information is also required. This annotated information can be used to determine the segment IDs used in reporting the error location

languageResourceTemplateId - Id used when retrieving a Language Resource Template. Certain validations might require access to this.

verificationResourcePackageUrl - a download URL for retrieving the verification resource package. This would be used in the context of native file validation, for instance, where a schema and any ancillary files might be needed for XML validation.

translationProfileId - this ID can be used to retrieve TranslationEngine resources. An example of its use might be for a terminology verifier.

callbackUrl - this endpoint is called when the validation operation is completed.

sourceLanguage - source language code

targetLanguage - target language code

publishMessageUrl - endpoint called to publish batches of verification messages to the Trados Cloud Platform API

verifierSettings - a JSON object representing any settings for this extension - these are extension specific and are defined in a JSON schema as detailed below

sessionId - a unique ID associated with this particular verification request
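As a rough sketch of how an extension might decide what to download, the helper below collects the URL fields that are populated in a StartVerification request. It assumes that fields which do not apply to the current verification type are absent, null or empty, which the description above does not state explicitly; resources_to_download is a hypothetical name.

```python
URL_FIELDS = (
    "nativeFileUrl",
    "bilingualDocumentUrl",
    "nativeAnnotatedFileUrl",
    "verificationResourcePackageUrl",
)


def resources_to_download(request: dict) -> dict:
    """Map each populated URL field of inputResourceDetails to its URL."""
    details = request.get("inputResourceDetails") or {}
    # Skip fields that are missing, null, or empty strings.
    return {name: details[name] for name in URL_FIELDS if details.get(name)}
```

The ID fields (languageResourceTemplateId, translationProfileId) are left out on purpose, since they identify resources rather than point at a download URL.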

PublishMessages Body

The PublishMessages endpoint should be called with batches of verification messages as they are generated by the background job. The model used for defining these batches is shown below:

{
  "messages": [
    {
      "id": 1,
      "messageType": "string",
      "verifier": "string",
      "level": "string",
      "segmentId": "string",
      "tagId": "string",
      "isSource": false,
      "messageArguments": [
        "string"
      ],
      "segmentLocation": {
        "fileId": "string",
        "paragraphUnitId": "string",
        "segmentNumber": "string"
      },
      "messageLocation": {
        "fromLocation": 0,
        "toLocation": 0
      }
    }
  ],
  "sessionId": "string"
}

messages - an array of messages

Each message contains the following fields:

id - a one-based index of the message generated by the extension. The first message should have an id of 1 and subsequent messages should be numbered consecutively (2, 3, 4 and so on).

messageType - the message type identifier. This should be a string ID prefixed with the extension ID

verifier - this will be the same as the extension ID

level - error level of this message

segmentId - the segment ID in which the error occurs

tagId - the id of the tag which the error relates to

isSource - a boolean indicating if the error/warning relates to the source or target

messageArguments - an array of the arguments for the relevant message defined for this messageType. The message is usually stored as a resource with placeholders in the extension and is returned by the GetMessagesByCulture endpoint described below - for each placeholder in the message, an argument should exist in this array

segmentLocation - the location of the segment

messageLocation - the offsets within the segment representing the error span
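The batching and numbering rules for the PublishMessages body can be sketched as below. Several details are assumptions rather than documented behavior: the "." separator between the extension ID and the message type, the batch size of 50, the reading that ids stay consecutive across batches, and the shape of the input findings (hypothetical dicts carrying a short type name plus any other message fields).

```python
from typing import Iterable, Iterator, List


def build_batches(
    findings: Iterable[dict], extension_id: str, session_id: str, batch_size: int = 50
) -> Iterator[dict]:
    """Yield PublishMessages bodies with consecutive one-based message ids.

    Each finding is a hypothetical dict holding a short `type` name plus any
    other message fields (level, segmentId, messageArguments, ...).
    """
    batch: List[dict] = []
    for number, finding in enumerate(findings, start=1):
        fields = {k: v for k, v in finding.items() if k != "type"}
        message = {
            "id": number,  # one-based and consecutive across all batches
            "messageType": f"{extension_id}.{finding['type']}",
            "verifier": extension_id,
            **fields,
        }
        batch.append(message)
        if len(batch) == batch_size:
            yield {"messages": batch, "sessionId": session_id}
            batch = []
    if batch:
        # Flush the final, possibly partial, batch.
        yield {"messages": batch, "sessionId": session_id}
```

Each yielded dictionary would be serialized to JSON and posted to the publishMessageUrl from the StartVerification request, carrying the same sessionId.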

VerifySegment

This endpoint is called by Trados to validate an individual segment. This endpoint responds with a collection of messages which relate to any issues found for this segment. Please see the API documentation for more details: VerifySegment

Example:

POST https://your-app.com/verify/segment

The request is structured as follows:

{
  "fragment": {},
  "languageResourceTemplateId": "4db79181-4ff4-4d01-8e33-44e7520ac6a6",
  "translationProfileId": "string",
  "verifierSettings": {},
  "sourceLanguage": "string",
  "targetLanguage": "string",
  "segmentLocation": {
    "fileId": "string",
    "paragraphUnitId": "string",
    "segmentNumber": "string"
  }
}

fragment - a json object containing a BCM fragment

languageResourceTemplateId - Id of the language resource template to be requested for this session

translationProfileId - Id used to retrieve translation engine resources

verifierSettings - a json object containing the settings to be applied during the segment validation operation

sourceLanguage - the source language code

targetLanguage - the target language code

segmentLocation - location details of the segment - these are:

fileId - id of file

paragraphUnitId - id of paragraph unit in file

segmentNumber - the segment number within the paragraph (source/target)

GetMessagesByCulture

This endpoint is called by Trados to retrieve localized resources for the messages. The culture is specified in the call as a URL parameter. The app can support localized resources for various languages which will allow the messages to be displayed in the native language of the user in the UI. Please see the API documentation for more details: GetMessagesByCulture

Example:

GET https://your-app.com/messages/es-ES
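One way an app might organize its localized resources is sketched below. Everything here is an assumption for illustration: the in-memory MESSAGES table, the "{0}" placeholder syntax, the fallback to en-US for unknown cultures and the "my-ext.TooLong" message type. The actual response model is defined in the API documentation, and the render function only shows how messageArguments line up with placeholders.

```python
MESSAGES = {
    "en-US": {"my-ext.TooLong": "Target exceeds source by {0} characters."},
    "es-ES": {"my-ext.TooLong": "El destino supera al origen en {0} caracteres."},
}


def messages_for_culture(culture: str, default: str = "en-US") -> dict:
    """Return the message templates for `culture`, falling back to `default`."""
    return MESSAGES.get(culture, MESSAGES[default])


def render(template: str, arguments: list) -> str:
    """Fill positional placeholders with the message arguments, in order."""
    return template.format(*arguments)
```

With one argument per placeholder, the order of the messageArguments array determines which value fills which position.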

GetSettingsSchema

This endpoint is called by Trados to retrieve the schema for any settings which the extension supports. A note on schemas: once you define the settings schema for the extension, it should only be modified in a backwards-compatible way, i.e. fields may be added, but existing fields must not be removed or renamed. Please see the API documentation for more details: GetSettingsSchema

Example:

GET https://your-app.com/schemas

An example schema is shown below:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://rws.com/verification-sample-extension-settings.schema.json",
  "title": "Length-check Verifier",
  "description": "Validates if the target segment length exceeds the source segment length by a given amount.",
  "type": "object",
  "properties": {
    "enabled": {
      "type": "boolean",
      "default": true
    },
    "verificationResourcePackage": {
      "type": "object",
      "properties": {
        "platformSettingType": {
          "type": "string",
          "enum": [
            "file"
          ]
        },
        "platformSettingValue": {
          "type": "string"
        }
      }
    },
    "dateTest": {
      "type": "string",
      "format": "date"
    },
    "dateTimeTest": {
      "type": "string",
      "format": "date-time"
    },
    "lengthCheckCharacterLimit": {
      "type": "integer",
      "default": 20,
      "minimum": 1,
      "maximum": 1000
    },
    "numberTest": {
      "type": "number",
      "default": 10,
      "minimum": 1,
      "maximum": 1000
    },
    "stringTest": {
      "type": "string",
      "default": "testSetting",
      "minLength": 10,
      "maxLength": 100
    },
    "gender": {
      "type": "string",
      "enum": [ "Female", "Male" ]
    }
  },

  "required": [
    "enabled",
    "dateTimeTest",
    "gender",
    "lengthCheckCharacterLimit"
  ]
}
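The backwards-compatibility rule for settings schemas lends itself to an automated check when a new schema version is prepared. The following is a minimal sketch that compares only the top-level properties of two schema versions; breaking_changes is a hypothetical helper, and treating a changed type as breaking is an additional assumption beyond the stated rule about removing or renaming fields.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List settings that were removed, renamed, or retyped between two schema versions."""
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    problems = []
    for name, definition in old_props.items():
        if name not in new_props:
            # A rename looks the same as a removal from the old schema's point of view.
            problems.append(f"{name}: removed or renamed")
        elif new_props[name].get("type") != definition.get("type"):
            problems.append(f"{name}: type changed")
    return problems
```

An empty result means every existing top-level setting is still present with the same type; nested properties and constraints such as minimum or enum are not compared.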

Relevant IDs which need to be passed from requests to responses

The only ID which is passed through from the request to the response is the sessionId. It is received in the StartVerification request and sent back in the PublishMessages request.