Namespace Sdl.LanguagePlatform.Core.Tokenization
Classes
AutoLocalizationSettings
Contains specialized settings for auto-localization of tokens.
CurrencyFormat
Defines a currency symbol (e.g. $, £, USD) along with the permissible options for its position and separator.
CustomUnitDefinition
Provides additional metadata for a custom unit when creating a recognizer for Measurement.
DateTimeToken
A Token which represents a date or time expression.
GenericPlaceableToken
Represents a generic placeable token which is not one of the predefined placeable token classes. Generic placeable tokens can never be auto-localized, but may be auto-substitutable.
Match
A match object returned by an FST (finite-state transducer), FSA (finite-state automaton), or regular-expression matcher.
MeasureToken
A Token which represents a measurement, which consists of a numeric value and a unit.
NumberToken
A Token which represents a numeric value.
PrioritizedToken
A Token with an assigned priority, usually originating from a recognizer's priority. This class is for internal purposes only and should not be used in third-party applications.
SimpleToken
A Token which represents a simple token, such as a word, whitespace, or punctuation.
TagToken
A Token which encapsulates a tag in the input.
Token
Represents a generic, abstract token, which is a sequence of characters in the input. A token is identified using a tokenizer, which breaks up the sequence of characters in the input into a sequence of tokens. That token sequence is non-overlapping, but not necessarily contiguous.
TokenBundle
A special Token which represents a set of alternatives (i.e. an ambiguous analysis) of other tokens which cover exactly the same input span.
TokenizationContext
Holds additional tokenization metadata for a given culture, such as custom formats for numbers, dates, etc.
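
To illustrate the Token and TokenBundle descriptions above, here is a minimal, language-neutral sketch in Python. It is not the SDL API: Span, SketchToken, SketchBundle, and is_valid_sequence are hypothetical names, and the half-open span convention is an assumption made for the example. It shows a token sequence that is non-overlapping but leaves gaps, and a bundle whose alternatives must all cover the same span.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical half-open character range [start, end) in the input."""
    start: int
    end: int


@dataclass
class SketchToken:
    """Hypothetical stand-in for a token: its text, its kind, and its span."""
    text: str
    kind: str
    span: Span


@dataclass
class SketchBundle(SketchToken):
    """Hypothetical stand-in for a bundle of alternative analyses of one span."""
    alternatives: List[SketchToken] = field(default_factory=list)

    def add(self, token: SketchToken) -> None:
        # Every alternative has to cover exactly the span of the bundle.
        if token.span != self.span:
            raise ValueError("alternatives must cover exactly the same span")
        self.alternatives.append(token)


def is_valid_sequence(tokens: List[SketchToken]) -> bool:
    """True if tokens are in order and never overlap; gaps are allowed."""
    previous_end = 0
    for token in tokens:
        if token.span.end <= token.span.start or token.span.start < previous_end:
            return False
        previous_end = token.span.end
    return True


source = "Due 1/2 now"

# "1/2" is ambiguous in this sketch: it could be a date or a number.
ambiguous = SketchBundle("1/2", "bundle", Span(4, 7))
ambiguous.add(SketchToken("1/2", "date", Span(4, 7)))
ambiguous.add(SketchToken("1/2", "number", Span(4, 7)))

# The spaces at offsets 3 and 7 are deliberately left uncovered.
tokens = [
    SketchToken("Due", "word", Span(0, 3)),
    ambiguous,
    SketchToken("now", "word", Span(8, 11)),
]
```

In this sketch, is_valid_sequence(tokens) accepts the sequence even though two characters of the input are covered by no token, while any two tokens whose spans overlap would be rejected.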
Interfaces
ILocalizableToken
Defines the interface for auto-localizable tokens. Localizable tokens have a value, and their surface representation ("text") can be automatically converted into a target culture representation, given the token's value and the target culture.
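
The idea behind ILocalizableToken can be sketched as follows. This is a language-neutral illustration in Python, not the SDL interface: SketchNumberToken, its localize method, and the SEPARATORS table are hypothetical, and the two-decimal rendering is an assumption chosen to keep the example short. The point is that the target text is produced from the token's value and the target culture, not by editing the source text.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical culture table: (group separator, decimal separator).
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
}


@dataclass
class SketchNumberToken:
    """Hypothetical localizable token: a surface text plus its parsed value."""
    text: str
    value: Decimal

    def localize(self, culture: str) -> str:
        """Render the value, not the source text, with the target separators."""
        group, decimal_point = SEPARATORS[culture]
        # Format with fixed "," and "." first, then substitute per culture.
        integer_part, fraction = format(self.value, ",.2f").split(".")
        return integer_part.replace(",", group) + decimal_point + fraction


token = SketchNumberToken(text="1,234.56", value=Decimal("1234.56"))
german_text = token.localize("de-DE")
```

Here german_text is "1.234,56", while token.text keeps the original surface form "1,234.56".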
Enums
BuiltinRecognizers
Enumerates the known types of special token recognizers.
CurrencySymbolPosition
Defines the permissible positions for a currency symbol with respect to the currency amount.
DateTimePatternType
Enumerates the different types of a date or time pattern.
LocalizationParametersSource
Controls which tokens are used to obtain detailed localization parameters, such as the numeric group separator override, or whitespace handling between a number and the unit in measurements.
NumericSeparator
The types of numeric separator which can occur in a number token.
Sign
The sign of a number.
TokenizerFlags
Flags controlling tokenizer behaviour.
TokenType
The type of a token, e.g. whether the token represents a word, punctuation, etc.
Unit
Enumerates the units known by the system. Only units that may require cross-system conversion (not yet implemented) are listed.
UnitSeparationMode
Controls how units are separated from the numeric value in measurements.