The Bilingual File Parser
The bilingual parser is used for localizable content extraction. While the native parser only lets you extract source-related localizable content, the bilingual parser lets you set both source and target localizable content and group that content into paragraph units and segments.
Writing a bilingual parser
When writing filters that use the Bilingual API, you must first derive your parser class from the IBilingualParser and IBilingualContentProcessor interfaces. These two interfaces form the core of the work that is done by a Bilingual parser.
Other types may also be required, such as the AbstractBilingualFileTypeComponent class or the INativeContentCycleAware interface.
However, the simplest way to implement a Bilingual parser is to derive from the AbstractBilingualFileTypeComponent class, implement only the IBilingualParser and INativeContentCycleAware interfaces, and use the helper functions and properties of the AbstractBilingualFileTypeComponent class. A parser normally derives from the INativeContentCycleAware interface, which provides the parser with additional information such as the original file path name, the source and target languages and the encoding. It also declares methods that are called during key phases of the parsing process so that your filter can manage its initialisation, flow control and clean-up. You should also derive from the ISettingsAware interface if you have any settings associated with this parser. In that case you will also need to create a UI for editing the settings. Please see Filter UI Settings for more information.
Deriving from AbstractBilingualFileTypeComponent
The AbstractBilingualFileTypeComponent base class provides an implementation of the IFileTypeComponentBuilder and IBilingualFileTypeComponent interfaces, leaving only the IBilingualParser and INativeContentCycleAware interfaces to be implemented by the derived class. If this class is used as a base class, most of the properties and methods that are required to implement a Bilingual parser are already implemented in AbstractBilingualFileTypeComponent or its base class. This leaves the members of the following interfaces to be implemented in your bilingual parser class: IBilingualParser, IParser and INativeContentCycleAware.
Implementing IBilingualParser
The IBilingualParser interface contains two essential properties for the bilingual parser.
The first property, DocumentProperties, holds an IDocumentProperties reference that is set by the Bilingual API during parser initialisation. These document properties are used to initialise the source and target languages.
The second property, Output, of type IBilingualContentProcessor, is used to tell the Bilingual API about all major file processing events that are encountered while parsing a bilingual document. The IBilingualContentProcessor interface is implemented by content processors within the Bilingual API that work on the bilingual content model. To allow processing in a streaming manner, without holding the entire document object in memory at any time, the parser feeds paragraph-units one by one through calls to Output.ProcessParagraphUnit(). Before any paragraph-units are processed, however, the parser must provide the document properties to the framework by calling Output.Initialize() and the properties of each file by calling Output.SetFileProperties(). These events are outlined in the next few sections.
IBilingualParser.DocumentProperties
The DocumentProperties property, of type IDocumentProperties, is set by the File Type Support Framework when it calls a bilingual parser, but you need to define this property and a private member to store its value. The document properties are later used to store the source and target languages and then to initialise the output stream of the bilingual content processors.
IBilingualParser.Output
The Output property, of type IBilingualContentProcessor, is also initialised by the File Type Support Framework. It couples the Bilingual Parser to all Bilingual Content Processors further down the processing chain during the extract conversion phase, in which a bilingual file format is converted to the default bilingual SDLXliff (.xliff) persistent file format. The interface exposes several methods that the parser calls throughout the file parsing operation.
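As a rough illustration, the two properties could be declared along the following lines. This is only a minimal sketch: the empty interfaces are stand-ins declared here so that the snippet compiles on its own (in a real filter they come from the SDK assemblies), and MyBilingualParser is a hypothetical class name.

```csharp
// Stand-ins for the SDK interfaces named in the text (assumption: declared
// here only so the sketch is self-contained; use the real SDK types instead).
public interface IDocumentProperties { }
public interface IBilingualContentProcessor { }

// Hypothetical parser class showing only the two properties.
public class MyBilingualParser
{
    // Private members that store the values assigned by the framework.
    private IDocumentProperties _documentProperties;
    private IBilingualContentProcessor _output;

    // Set by the framework during parser initialisation.
    public IDocumentProperties DocumentProperties
    {
        get { return _documentProperties; }
        set { _documentProperties = value; }
    }

    // Set by the framework; the parser reports parsing events through it.
    public IBilingualContentProcessor Output
    {
        get { return _output; }
        set { _output = value; }
    }
}
```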
Output.Initialize()
The Bilingual Parser should call the Initialize method to forward the reference to the DocumentProperties object for the document being processed. This is normally done after the SourceLanguage and TargetLanguage have been set using information from the source file being parsed.
Note
This method should always be called, and always before any other calls on the Output interface.
Output.SetFileProperties()
The Bilingual Parser should call the SetFileProperties method to provide the framework with a reference to the properties of each file in the document, before the paragraph-units of that file are processed by calling ProcessParagraphUnit as outlined below.
The SetFileProperties method takes an interface reference of type IFileProperties that needs to be created by the Bilingual Parser itself. This can be done using the CreateFileProperties method. However, a Bilingual parser will normally need to set the IPersistentFileConversionProperties property of the created IFileProperties object before passing it to SetFileProperties. When your parser class is derived from INativeContentCycleAware, this property can normally be obtained from the parameter passed to the parser's own SetFileProperties method. You may also need to update other properties of the IFileProperties object, such as the source and target languages, the tool name and version, and the creation date.
Output.ProcessParagraphUnit()
The Bilingual Parser should call the ProcessParagraphUnit method for each paragraph-unit found in the source file being parsed. The parameter of type IParagraphUnit must be created using a call to CreateParagraphUnit. When creating a paragraph-unit, the source and target languages are specified together with a lock type. Normally paragraph-units are created either as structure paragraph-units with lock type Structure or as translatable paragraph-units with lock type Unlocked.
Output.FileComplete()
The Bilingual Parser should call the FileComplete method after all paragraph-units in a file have been processed by calling ProcessParagraphUnit for each one. After calling FileComplete, the file properties should not be changed any further.
Output.Complete()
The Bilingual Parser should call the Complete method after all content in all files has been processed. After calling Complete, the document properties should not be changed any further.
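The call order described in the sections above can be sketched as follows for a document that contains a single file. This is an illustration under stated assumptions, not SDK code: the interfaces are simplified stand-ins that carry only the five method names from the text (the parameter names are guesses), and RecordingProcessor, the Stub classes and CallOrderSketch are hypothetical helpers that exist only to make the order visible.

```csharp
using System.Collections.Generic;

// Simplified stand-ins for the SDK types (assumption: real signatures may differ).
public interface IDocumentProperties { }
public interface IFileProperties { }
public interface IParagraphUnit { }

public interface IBilingualContentProcessor
{
    void Initialize(IDocumentProperties documentInfo);
    void SetFileProperties(IFileProperties fileInfo);
    void ProcessParagraphUnit(IParagraphUnit paragraphUnit);
    void FileComplete();
    void Complete();
}

// Hypothetical stubs used only to drive the sketch.
public sealed class StubDocumentProperties : IDocumentProperties { }
public sealed class StubFileProperties : IFileProperties { }
public sealed class StubParagraphUnit : IParagraphUnit { }

// Hypothetical processor that records the name of each call it receives.
public sealed class RecordingProcessor : IBilingualContentProcessor
{
    public List<string> Calls { get; } = new List<string>();

    public void Initialize(IDocumentProperties documentInfo) { Calls.Add("Initialize"); }
    public void SetFileProperties(IFileProperties fileInfo) { Calls.Add("SetFileProperties"); }
    public void ProcessParagraphUnit(IParagraphUnit paragraphUnit) { Calls.Add("ProcessParagraphUnit"); }
    public void FileComplete() { Calls.Add("FileComplete"); }
    public void Complete() { Calls.Add("Complete"); }
}

public static class CallOrderSketch
{
    // Emits one file in the order the text prescribes.
    public static void EmitSingleFile(
        IBilingualContentProcessor output,
        IDocumentProperties documentProperties,
        IFileProperties fileProperties,
        IEnumerable<IParagraphUnit> paragraphUnits)
    {
        output.Initialize(documentProperties);      // always first
        output.SetFileProperties(fileProperties);   // before any paragraph-unit of the file
        foreach (IParagraphUnit unit in paragraphUnits)
        {
            output.ProcessParagraphUnit(unit);      // one call per paragraph-unit
        }
        output.FileComplete();                      // file properties are final from here on
        output.Complete();                          // document properties are final from here on
    }
}
```

For a document with several files, the SetFileProperties, ProcessParagraphUnit and FileComplete steps would repeat per file, with Initialize still called once at the start and Complete once at the end.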
Implementing IParser
IParser.OnProgress
The IParser interface contains an event, OnProgress, of type ProgressEventArgs that must be defined in classes deriving from it. This event can be used to notify the framework, and hence the user, of the file parsing progress. In addition to raising OnProgress with a suitable percentage while the file is being processed, a parser is normally expected to raise it with 0% before opening a file and with 100% when file reading is complete.
IParser.ParseNext()
The IParser interface contains a ParseNext method which must also be implemented. This method is called repeatedly by the framework to process the next chunk of input from the source bilingual document.
The implementation should parse a suitable (preferably small) chunk of the input and return a bool that indicates whether there is more work to be done before the file is completely parsed. When there is no more of the source file to process, ParseNext should return false.
Typically it is in this method, or in methods called by it, that the methods of the Output property are called to inform the framework of the entire source file's content.
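The chunk-by-chunk contract of ParseNext, together with the 0% and 100% progress notifications, can be illustrated with a small self-contained sketch. It makes several assumptions: ProgressEventArgs is a simplified stand-in for the framework type (with a hypothetical Percent property), LineChunkParser is a hypothetical class that treats one line of text as one chunk, and the Emitted list stands in for the calls that a real parser would make on Output.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the framework's event args (assumption).
public sealed class ProgressEventArgs : EventArgs
{
    public ProgressEventArgs(int percent) { Percent = percent; }
    public int Percent { get; }
}

// Hypothetical parser that treats each input line as one chunk.
public sealed class LineChunkParser
{
    private readonly IList<string> _lines;
    private int _next;
    private bool _started;

    public LineChunkParser(IList<string> lines) { _lines = lines; }

    public event EventHandler<ProgressEventArgs> OnProgress;

    // Stands in for the content a real parser would send to Output.
    public List<string> Emitted { get; } = new List<string>();

    // Processes at most one chunk; returns true while more input remains.
    public bool ParseNext()
    {
        if (!_started)
        {
            _started = true;
            Raise(0); // report 0% before any content is read
        }

        if (_next < _lines.Count)
        {
            Emitted.Add(_lines[_next]);
            _next++;
        }

        bool more = _next < _lines.Count;
        // Integer percentage of chunks consumed; 100 once the input is exhausted.
        Raise(_lines.Count == 0 ? 100 : _next * 100 / _lines.Count);
        return more;
    }

    private void Raise(int percent)
    {
        EventHandler<ProgressEventArgs> handler = OnProgress;
        if (handler != null) handler(this, new ProgressEventArgs(percent));
    }
}
```

A caller playing the role of the framework would simply loop with while (parser.ParseNext()) { } until the method returns false.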
Implementing INativeContentCycleAware
If deriving from the INativeContentCycleAware interface you must implement the following three methods.
SetFileProperties()
The standard implementation of SetFileProperties saves its parameter, properties, of type IFileProperties, to a class variable. The bilingual parser can then use these file properties to supplement the information available from the source file contents where that information may not otherwise be known, such as the original file encoding.
StartOfInput()
This is called by the framework after component initialisation, i.e. after SetFileProperties, but before any content is parsed and passed to any of the File Type Support Framework components.
EndOfInput()
This is called by the framework after processing of the bilingual content has finished.
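A skeleton of the three methods might look like the sketch below. The interfaces are stand-ins declared only so that the snippet is self-contained, and CycleAwareParser, StubFileProperties and the Log list are hypothetical. The guard in StartOfInput and the reset in EndOfInput are illustrative choices, not behaviour required by the framework.

```csharp
using System;
using System.Collections.Generic;

// Stand-ins for the SDK interfaces (assumption: the real ones come from the SDK).
public interface IFileProperties { }

public interface INativeContentCycleAware
{
    void SetFileProperties(IFileProperties properties);
    void StartOfInput();
    void EndOfInput();
}

// Hypothetical stub used only to drive the sketch.
public sealed class StubFileProperties : IFileProperties { }

// Hypothetical parser fragment showing only the life-cycle methods.
public class CycleAwareParser : INativeContentCycleAware
{
    // Class variable that keeps the properties handed over by the framework.
    private IFileProperties _fileProperties;

    // Records the calls so the order can be inspected in the sketch.
    public List<string> Log { get; } = new List<string>();

    public void SetFileProperties(IFileProperties properties)
    {
        _fileProperties = properties;
        Log.Add("SetFileProperties");
    }

    public void StartOfInput()
    {
        // Defensive check (illustrative): the text says SetFileProperties comes first.
        if (_fileProperties == null)
            throw new InvalidOperationException("SetFileProperties must be called first.");
        Log.Add("StartOfInput"); // open readers, reset counters, and so on
    }

    public void EndOfInput()
    {
        Log.Add("EndOfInput");   // release per-file resources
        _fileProperties = null;
    }
}
```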
Implementing ISettingsAware
If deriving from the ISettingsAware interface you must implement the following method.
InitializeSettings()
InitializeSettings passes in an ISettingsBundle object and a configurationId (the FileTypeConfigurationId). These can be used to populate the required settings object used by the parser:
public void InitializeSettings(Sdl.Core.Settings.ISettingsBundle settingsBundle, string configurationId)
{
    // Load this file type's settings from the bundle supplied by the framework.
    UserSettings userSettings = new UserSettings();
    userSettings.PopulateFromSettingsBundle(settingsBundle, configurationId);

    // Copy the value the parser needs into its own member.
    LockPrdCodes = userSettings.LockPrdCodes;
}
Note
This content may be out-of-date. To check the latest information on this topic, inspect the libraries using the Visual Studio Object Browser.