Class Segment
Represents a segment, which is a sequence of SegmentElements, in a particular language.
Namespace: Sdl.LanguagePlatform.Core
Assembly: Sdl.LanguagePlatform.Core.dll
Syntax
public class Segment
Constructors
Segment()
Initializes a new instance with the invariant culture (System.Globalization.CultureInfo.InvariantCulture) and an empty list of elements.
Declaration
public Segment()
Segment(CultureCode)
Initializes a new instance with the specified culture and an empty list of elements.
Declaration
public Segment(CultureCode culture)
Parameters
| Type | Name | Description |
|---|---|---|
| Sdl.Core.Globalization.CultureCode | culture | The CultureCode object representing the language. |
Properties
Culture
Gets or sets the culture for this segment.
Declaration
public CultureCode Culture { get; set; }
Property Value
| Type | Description |
|---|---|
| Sdl.Core.Globalization.CultureCode | The culture of this segment. |
CultureName
Gets or sets the culture name for this segment. The culture name must be resolvable through CultureInfoExtensions.GetCultureInfo(string), or an exception will be thrown.
Declaration
public string CultureName { get; set; }
Property Value
| Type | Description |
|---|---|
| System.String | The culture name of this segment. |
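As a minimal sketch of assigning the language by name, assuming the project references Sdl.LanguagePlatform.Core.dll and that "de-DE" can be resolved on the machine. The names CultureNameExample and CreateGermanSegment are hypothetical and used for illustration only.

```csharp
using Sdl.LanguagePlatform.Core;

public static class CultureNameExample
{
    // Creates an empty segment and assigns its language by culture name.
    public static Segment CreateGermanSegment()
    {
        var segment = new Segment();    // starts with the invariant culture
        segment.CultureName = "de-DE";  // throws if the name cannot be resolved
        return segment;
    }
}
```

Setting the Culture property instead would avoid the name lookup when a CultureCode instance is already available.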
Elements
Gets or sets the collection of elements in this segment.
Declaration
public List<SegmentElement> Elements { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Collections.Generic.List<SegmentElement> | The elements in this segment. |
HasPairedTags
Gets a value which indicates whether this segment contains any paired tags. Only start tags are checked; the tag structure is assumed to be valid.
Declaration
public bool HasPairedTags { get; }
Property Value
| Type | Description |
|---|---|
| System.Boolean | true if the segment contains any paired tags; otherwise, false. |
HasPlaceables
Gets a value which indicates whether this segment contains any placeables. Note that the return value is only valid if the segment is tokenized.
Declaration
public bool HasPlaceables { get; }
Property Value
| Type | Description |
|---|---|
| System.Boolean | true if the segment contains any placeables; otherwise, false. |
HasTags
Gets a value which indicates whether this segment contains any tags.
Declaration
public bool HasTags { get; }
Property Value
| Type | Description |
|---|---|
| System.Boolean | true if the segment contains any tags; otherwise, false. |
IsEmpty
Gets a value indicating whether this instance is empty, that is, whether it contains no elements.
Declaration
public bool IsEmpty { get; }
Property Value
| Type | Description |
|---|---|
| System.Boolean | true if this instance contains no elements; false if it contains any elements. |
LastElement
Gets or sets the last element of this segment.
Declaration
public SegmentElement LastElement { get; set; }
Property Value
| Type | Description |
|---|---|
| SegmentElement | The last element of this segment. |
Tokens
Gets or sets the collection of tokens in this segment.
Declaration
public List<Token> Tokens { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Collections.Generic.List<Token> | The tokens in this segment. |
Methods
Add(SegmentElement)
Adds the provided segment element to the segment's list of elements. If the element is a text element and the last segment element is a text element as well, the two are merged.
Declaration
public void Add(SegmentElement element)
Parameters
| Type | Name | Description |
|---|---|---|
| SegmentElement | element | The element to append |
Add(String)
Adds the provided string as a new text element to the segment's list of elements. If the last segment element is a text element as well, the two are merged.
Declaration
public void Add(string text)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | text | The text to append |
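The merge behavior described for Add(String) can be sketched as follows. This assumes a reference to Sdl.LanguagePlatform.Core.dll; AddTextExample and BuildGreeting are hypothetical names.

```csharp
using Sdl.LanguagePlatform.Core;

public static class AddTextExample
{
    // Appends two strings in a row. Per the description above, the second
    // string is merged into the text element created by the first call.
    public static Segment BuildGreeting()
    {
        var segment = new Segment();
        segment.Add("Hello, ");
        segment.Add("world");
        return segment;
    }
}
```

Given the documented merging, the resulting segment is expected to hold a single text element whose plain text is the concatenation of both strings.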
AddRange(IEnumerable<SegmentElement>)
Adds all segment elements in the collection to this segment.
Declaration
public void AddRange(IEnumerable<SegmentElement> elements)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Collections.Generic.IEnumerable<SegmentElement> | elements | The elements to add |
AnchorDanglingTags()
Sets the anchor for any tags which are not yet anchored (including standalone/placeholder tags). Does not modify tag IDs or alignment anchors.
Declaration
public void AnchorDanglingTags()
Clear()
Empties the list of segment elements.
Declaration
public void Clear()
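A short sketch of how Clear() relates to IsEmpty, assuming a reference to Sdl.LanguagePlatform.Core.dll. ClearExample and ClearAndCheck are hypothetical names.

```csharp
using Sdl.LanguagePlatform.Core;

public static class ClearExample
{
    // Adds some text, empties the element list, and reports IsEmpty.
    public static bool ClearAndCheck()
    {
        var segment = new Segment();
        segment.Add("temporary text");
        segment.Clear();            // removes all segment elements
        return segment.IsEmpty;     // no elements remain after Clear()
    }
}
```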
ComputeStrictIdentityStringAsync()
Computes a strict identity string for this segment. Use with GetStrictHash().
Declaration
public Task<string> ComputeStrictIdentityStringAsync()
Returns
| Type | Description |
|---|---|
| System.Threading.Tasks.Task<System.String> | A task which produces the strict identity string. |
ComputeStrictIdentityStringAsync(IEnumerable<Token>)
Generates a strict identity string from the provided tokens (not intended for fuzzy matching).
Declaration
public static Task<string> ComputeStrictIdentityStringAsync(IEnumerable<Token> tokens)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Collections.Generic.IEnumerable<Token> | tokens | The tokens from which the identity string is generated. |
Returns
| Type | Description |
|---|---|
| System.Threading.Tasks.Task<System.String> | A task which produces the strict identity string. |
DeleteEmptyTagPairs(Boolean)
Deletes empty tag pairs (a start tag directly followed by the end tag with the same tag anchor) from the segment.
Declaration
public bool DeleteEmptyTagPairs(bool onlyInPeripheralPositions)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Boolean | onlyInPeripheralPositions | If true, will delete empty tag pairs only if they appear in peripheral positions (leading, trailing). |
Returns
| Type | Description |
|---|---|
| System.Boolean | true if any tags were deleted, and false otherwise. |
DeleteTags()
Removes all tags from the segment, applying the DeleteAll tag deletion mode.
Declaration
public bool DeleteTags()
Returns
| Type | Description |
|---|---|
| System.Boolean | |
DeleteTags(Segment.DeleteTagsAction)
Removes all tags from the segment, applying the specified tag deletion mode.
Declaration
public bool DeleteTags(Segment.DeleteTagsAction mode)
Parameters
| Type | Name | Description |
|---|---|---|
| Segment.DeleteTagsAction | mode | The tag deletion mode |
Returns
| Type | Description |
|---|---|
| System.Boolean | |
Duplicate()
Creates a new instance that is a deep copy of this instance.
Declaration
public Segment Duplicate()
Returns
| Type | Description |
|---|---|
| Segment | A new instance that is a deep copy of this instance. |
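Because Duplicate() returns a deep copy, changes to the copy should leave the original untouched. A minimal sketch, assuming a reference to Sdl.LanguagePlatform.Core.dll (DuplicateExample and ModifyCopyOnly are hypothetical names):

```csharp
using Sdl.LanguagePlatform.Core;

public static class DuplicateExample
{
    // Modifies a duplicate and returns the plain text of the original.
    public static string ModifyCopyOnly()
    {
        var original = new Segment();
        original.Add("source text");

        Segment copy = original.Duplicate();
        copy.Add(" (edited)");      // merged into the copy's text element only

        return original.ToPlain();  // the original keeps its own elements
    }
}
```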
Equals(Segment)
Compares this instance to another Segment object.
Declaration
public bool Equals(Segment other)
Parameters
| Type | Name | Description |
|---|---|---|
| Segment | other | The other instance. |
Returns
| Type | Description |
|---|---|
| System.Boolean | true if the language and all the elements are the same, otherwise false. |
FillUnmatchedStartAndEndTags()
Inserts the missing counterparts of unmatched tags into the segment. For unmatched end tags, the corresponding start tags are inserted at the beginning of the segment. For unmatched start tags, the corresponding end tags are added at the end. In certain cases not all dangling tags can be filled; to obtain a valid segment without any unmatched tags, call RemoveUnmatchedStartAndEndTags(Boolean) after calling this method. Note that only the tag type is checked, not whether there are start or end tags without a corresponding tag having the same tag anchor.
The method stops if the tag pairing structure is incorrect (i.e. if there are overlapping tags).
Declaration
public bool FillUnmatchedStartAndEndTags()
Returns
| Type | Description |
|---|---|
| System.Boolean | |
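The fill-then-remove sequence described above might be wrapped as follows. This is a sketch only: it assumes a reference to Sdl.LanguagePlatform.Core.dll, TagRepairExample and RepairUnmatchedTags are hypothetical names, and it uses the parameterless RemoveUnmatchedStartAndEndTags() overload for the cleanup step rather than the Boolean overload named in the description, since that overload is documented to delete all unmatched tags.

```csharp
using Sdl.LanguagePlatform.Core;

public static class TagRepairExample
{
    // Tries to complete unmatched tags, then deletes any that remain.
    // Returns true if no unmatched start or end tags are left afterwards.
    public static bool RepairUnmatchedTags(Segment segment)
    {
        if (segment.HasUnmatchedStartOrEndTags())
        {
            // Insert the missing start/end counterparts where possible.
            segment.FillUnmatchedStartAndEndTags();

            // Delete whatever could not be filled.
            segment.RemoveUnmatchedStartAndEndTags();
        }

        return !segment.HasUnmatchedStartOrEndTags();
    }
}
```

All three methods test only the tag type, so this helper does not detect paired tags whose counterpart is missing.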
FindTag(TagType, Int32)
Finds and returns the tag with the provided type and the provided tag anchor, or null if no such tag exists in the segment.
Declaration
public Tag FindTag(TagType type, int anchor)
Parameters
| Type | Name | Description |
|---|---|---|
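| TagType | type | The tag type to look for. |
| System.Int32 | anchor | The tag anchor to look for. |
Returns
| Type | Description |
|---|---|
| Tag | The matching tag, or null if no such tag exists in the segment. |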
GetHashCode()
Declaration
public override int GetHashCode()
Returns
| Type | Description |
|---|---|
| System.Int32 | A hash code for this object |
Overrides
System.Object.GetHashCode()
GetMaxTagAnchor()
Returns the highest tag anchor used in the segment, or 0 if no tags are present.
Declaration
public int GetMaxTagAnchor()
Returns
| Type | Description |
|---|---|
| System.Int32 | The highest tag anchor used in the segment, or 0 if no tags are present. |
GetMinMaxTagAnchor(out Int32, out Int32)
Returns the smallest and largest tag anchor used in the segment. Both default to 0.
Declaration
public void GetMinMaxTagAnchor(out int min, out int max)
Parameters
| Type | Name | Description |
|---|---|---|
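| System.Int32 | min | Returns the smallest tag anchor used in the segment. |
| System.Int32 | max | Returns the largest tag anchor used in the segment. |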
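A minimal sketch of reading both out parameters, assuming a reference to Sdl.LanguagePlatform.Core.dll and a compiler that supports C# 7 tuples. TagAnchorExample and AnchorRange are hypothetical names.

```csharp
using Sdl.LanguagePlatform.Core;

public static class TagAnchorExample
{
    // Returns the smallest and largest tag anchor of the segment as a tuple.
    public static (int Min, int Max) AnchorRange(Segment segment)
    {
        segment.GetMinMaxTagAnchor(out int min, out int max);
        return (min, max);
    }
}
```

For a segment without tags, both values are expected to be 0, matching the documented defaults.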
GetTagCount()
Returns the number of tags in the segment. Paired tags are counted only once.
Declaration
public int GetTagCount()
Returns
| Type | Description |
|---|---|
| System.Int32 | The segment's tag count |
GetTagIdGroups()
Computes a mapping from the start tag token index to that tag's tag ID. Only start and standalone/placeholder tags are included in the mapping. The mapping may be n:1. The segment must be tokenized, or an exception is thrown.
Declaration
public Dictionary<int, string> GetTagIdGroups()
Returns
| Type | Description |
|---|---|
| System.Collections.Generic.Dictionary<System.Int32, System.String> | A mapping from tag token index to tag ID. |
GetTagPairings()
Returns a dictionary of paired tag token indices, mapping from the start tag's token index to the end tag's token index. The segment must be tokenized, or an exception is thrown.
Declaration
public Dictionary<int, int> GetTagPairings()
Returns
| Type | Description |
|---|---|
| System.Collections.Generic.Dictionary<System.Int32, System.Int32> | A mapping from each start tag's token index to the token index of its end tag. |
GetTokenIndex(SegmentPosition)
Returns the index of the token at the specified position.
Declaration
public int GetTokenIndex(SegmentPosition p)
Parameters
| Type | Name | Description |
|---|---|---|
| SegmentPosition | p | The position in the segment. |
Returns
| Type | Description |
|---|---|
| System.Int32 | The index of the token at the specified position, or -1 if it is not found, or if the segment is not tokenized. |
GetWeakHashCode()
Returns a hash code which does not depend on tag anchors in the segment. This can be used for translation tracking in bilingual documents.
Declaration
public int GetWeakHashCode()
Returns
| Type | Description |
|---|---|
| System.Int32 | A hash code which is independent of tag anchors. |
HasPeripheralWhitespace()
Determines whether the segment starts or ends with at least one whitespace character.
Declaration
public bool HasPeripheralWhitespace()
Returns
| Type | Description |
|---|---|
| System.Boolean | true if the segment starts or ends with at least one whitespace character; otherwise, false. |
HasTokenBundles()
Returns true if any of the segment's tokens is a TokenBundle (i.e. an ambiguous tokenization), and false otherwise. Token bundles should only be used inside the TM Kernel and not be returned through the TM API.
Declaration
public bool HasTokenBundles()
Returns
| Type | Description |
|---|---|
| System.Boolean | true if any token is a TokenBundle; otherwise, false. |
HasUnmatchedStartOrEndTags()
Determines whether the segment has any unmatched start or end tags. Note that this method only tests the tag type, and does not handle paired tags where the start or end tag is missing.
Declaration
public bool HasUnmatchedStartOrEndTags()
Returns
| Type | Description |
|---|---|
| System.Boolean | true if the segment has any unmatched start or end tags; otherwise, false. |
IsValid()
Determines if this segment is valid.
Declaration
public bool IsValid()
Returns
| Type | Description |
|---|---|
| System.Boolean | true if the segment is valid, false otherwise. |
MergeAdjacentTextRuns()
Merges adjacent text runs.
Declaration
public void MergeAdjacentTextRuns()
RemoveTokenBundles()
Replaces token bundles with the "best" token in that bundle. Returns true if any replacement has been done, and false otherwise.
Declaration
public bool RemoveTokenBundles()
Returns
| Type | Description |
|---|---|
| System.Boolean | true if any replacement has been done, and false otherwise. |
RemoveUnmatchedStartAndEndTags()
Deletes all tags from the segment which have a tag type of Core.TagType.UnmatchedStart or Core.TagType.UnmatchedEnd. Note that this method only tests the tag type, and does not handle paired tags where the start or end tag is missing.
Declaration
public bool RemoveUnmatchedStartAndEndTags()
Returns
| Type | Description |
|---|---|
| System.Boolean | |
RemoveUnmatchedStartAndEndTags(Boolean)
Deletes all tags from the segment which have a tag type of Core.TagType.UnmatchedStart or Core.TagType.UnmatchedEnd, if these tags occur in peripheral positions. This means that dangling end tags are only removed if they appear at the start of the segment, with no other tags or text preceding them, and dangling start tags are only removed if they appear at the end of the segment, with no other tags or text following them.
Note that this method only tests the tag type, and does not handle paired tags where the start or end tag is missing.
Declaration
public bool RemoveUnmatchedStartAndEndTags(bool peripheralPositionsOnly)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Boolean | peripheralPositionsOnly | |
Returns
| Type | Description |
|---|---|
| System.Boolean | |
RenumberTagAnchors(Int32, ref Int32)
Renumbers tag anchors, starting at nextTagAnchor, in a consecutive manner. Although tag anchors have no semantics for standalone tags, they are also anchored in the same manner. Errors in tag numbering will be ignored (but preserved, i.e. invalid tag anchors will be mapped to potentially new, also invalid tag anchors).
Declaration
public bool RenumberTagAnchors(int nextTagAnchor, ref int maxAlignmentAnchor)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Int32 | nextTagAnchor | The first anchor to assign (must be larger than zero) |
| System.Int32 | maxAlignmentAnchor | Returns the highest alignment anchor in the renumbered segment. |
Returns
| Type | Description |
|---|---|
| System.Boolean | true if any anchors were reassigned, and false otherwise. |
RenumberTagAnchors(ref Int32)
Renumbers tag anchors so that they start at 1 and are consecutive. Although tag anchors have no semantics for standalone tags, they are also anchored in the same manner. Errors in tag numbering will be ignored (but preserved, i.e. invalid tag anchors will be mapped to potentially new, also invalid tag anchors).
Declaration
public bool RenumberTagAnchors(ref int maxAlignmentAnchor)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Int32 | maxAlignmentAnchor | Returns the highest alignment anchor in the renumbered segment. |
Returns
| Type | Description |
|---|---|
| System.Boolean | true if any anchors were reassigned, and false otherwise. |
ToPlain()
Returns a string containing only the plain text in this segment. Note that text placeholders will be replaced with their text equivalent.
Declaration
public string ToPlain()
Returns
| Type | Description |
|---|---|
| System.String | A string containing only the plain text in this segment. |
ToPlain(SegmentRange)
Computes the plain-text version of the part of the segment specified by the provided range.
Declaration
public string ToPlain(SegmentRange range)
Parameters
| Type | Name | Description |
|---|---|---|
| SegmentRange | range | The range of the segment to convert |
Returns
| Type | Description |
|---|---|
| System.String | The plain-text string corresponding to the provided range. |
ToPlain(Boolean, Boolean, out List<SegmentPosition>)
Computes the plain-text version of the segment and returns, in the ranges list, the segment range of each character of the result string. The number of elements in that collection will be equal to the length of the string in characters.
Declaration
public string ToPlain(bool tolower, bool tobase, out List<SegmentPosition> ranges)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Boolean | tolower | If true, the returned string will be lower-cased |
| System.Boolean | tobase | If true, all letters will be mapped to their base character (i.e. diacritics will be stripped) |
| System.Collections.Generic.List<SegmentPosition> | ranges | A reference to the list of segment ranges which will be returned upon completion. The list includes, for each character in the result string, the position in the original segment. |
Returns
| Type | Description |
|---|---|
| System.String | The plain-text version of the segment. |
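The relationship between the returned string and the ranges list can be sketched as below, assuming a reference to Sdl.LanguagePlatform.Core.dll. PlainTextMappingExample and PositionsMatchLength are hypothetical names, and the sketch makes no claim about the exact normalized text that is produced.

```csharp
using System.Collections.Generic;
using Sdl.LanguagePlatform.Core;

public static class PlainTextMappingExample
{
    // Requests a lower-cased, base-character plain text together with the
    // per-character positions, and checks that both have the same length.
    public static bool PositionsMatchLength(Segment segment)
    {
        string plain = segment.ToPlain(true, true, out List<SegmentPosition> positions);

        // One position per character of the result string is documented.
        return positions.Count == plain.Length;
    }
}
```

The position at index i in the list refers back to the place in the original segment from which character i of the result was taken, which is useful for mapping matches in the normalized text back to the segment.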
ToPlain(Int32, Int32)
Returns a string containing only the plain text in this segment, covering the given token range. An exception will be thrown if the segment's tokens are not set or the token range is outside the bounds.
Declaration
public string ToPlain(int fromToken, int intoToken)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Int32 | fromToken | The index of the first token |
| System.Int32 | intoToken | The index of the last token (inclusive, i.e. "into" semantics) |
Returns
| Type | Description |
|---|---|
| System.String | A plain text string covering the specified token range |
ToString()
Declaration
public override string ToString()
Returns
| Type | Description |
|---|---|
| System.String | A string representation of the object, for display purposes. |
Overrides
System.Object.ToString()
Trim()
Removes leading whitespace from the first segment element, if that is a text element, and trailing whitespace from the last segment element, if that is a text element. If the first/last segment element is not a text element, it will not be altered. Also, leading (trailing) whitespace will not be removed from a text element if it is preceded (followed) only by non-text elements. Also deletes any null elements.
Declaration
public void Trim()
TrimEnd()
Removes trailing whitespace from the last segment element, if that is a text element. If the last segment element is not a text element, nothing will happen. Hence, trailing whitespace will not be removed from a text element if it is followed by non-text elements. The number of elements may be altered by this method. Empty (null) elements will also be removed.
Declaration
public string TrimEnd()
Returns
| Type | Description |
|---|---|
| System.String | A string consisting of the trimmed-off characters, or |
TrimStart()
Removes leading whitespace from the first segment element, if that is a text element. If the first segment element is not a text element, nothing will happen. Hence, leading whitespace will not be removed from a text element if it is preceded by non-text elements. The number of elements may be altered by this method. Empty (null) elements will also be removed.
Declaration
public string TrimStart()
Returns
| Type | Description |
|---|---|
| System.String | A string consisting of the trimmed-off characters, or |
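A sketch of trimming both ends while keeping the removed whitespace, assuming a reference to Sdl.LanguagePlatform.Core.dll and a compiler that supports C# 7 tuples. TrimExample and TrimBothEnds are hypothetical names.

```csharp
using Sdl.LanguagePlatform.Core;

public static class TrimExample
{
    // Trims a text-only segment and returns the removed whitespace
    // together with the remaining plain text.
    public static (string Leading, string Text, string Trailing) TrimBothEnds()
    {
        var segment = new Segment();
        segment.Add("  padded text \t");

        string leading = segment.TrimStart();  // trimmed-off leading characters
        string trailing = segment.TrimEnd();   // trimmed-off trailing characters

        return (leading, segment.ToPlain(), trailing);
    }
}
```

Keeping the trimmed-off strings makes it possible to restore the original whitespace later, for example after the trimmed text has been processed. When the return values are not needed, Trim() handles both ends in one call.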
UpdateFromTokenIndices(ICollection<Int32>)
Updates the segment's text from the tokens, and adjusts span indices accordingly. An exception is thrown if the segment is not tokenized.
Declaration
public bool UpdateFromTokenIndices(ICollection<int> tokenIndices)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Collections.Generic.ICollection<System.Int32> | tokenIndices | The list of tokens to update. |
Returns
| Type | Description |
|---|---|
| System.Boolean | true if the segment was changed, and false otherwise. |
Validate()
Validates the current instance, with the ReportAllErrors validation mode.
Declaration
public ErrorCode Validate()
Returns
| Type | Description |
|---|---|
| ErrorCode | An error code (which may be OK, indicating the segment is valid). |
Validate(Segment.ValidationMode)
Performs validation checks on this instance, applying the specified validation mode.
Declaration
public ErrorCode Validate(Segment.ValidationMode mode)
Parameters
| Type | Name | Description |
|---|---|---|
| Segment.ValidationMode | mode | The validation mode to apply |
Returns
| Type | Description |
|---|---|
| ErrorCode | An error code (which may be OK, indicating the segment is valid). |
VerifyTokenSpans()
Verifies whether the spans of the segment's tokens are correct and reflect the segment's text. Note that the segment should be tokenized. If not, true is returned.
Declaration
public bool VerifyTokenSpans()
Returns
| Type | Description |
|---|---|
| System.Boolean | true if the verification was successful or the segment is not tokenized, and false otherwise. |
WeakEquals(Segment)
Computes weak equality with another segment.
Weak equality does not check culture compatibility, and tag anchors do not need to be identical, but text elements must match, as well as the order of tags (element similarity must not be None).
Declaration
public bool WeakEquals(Segment other)
Parameters
| Type | Name | Description |
|---|---|---|
| Segment | other | The other instance. |
Returns
| Type | Description |
|---|---|
| System.Boolean | true if the segments are weakly equal; otherwise, false. |
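To illustrate the difference from Equals(Segment), the sketch below compares two segments with identical text but different cultures. It assumes a reference to Sdl.LanguagePlatform.Core.dll and that "en-GB" and "en-US" can be resolved. WeakEqualsExample and SameTextDifferentCulture are hypothetical names.

```csharp
using Sdl.LanguagePlatform.Core;

public static class WeakEqualsExample
{
    // Builds two text-only segments that differ only in their culture
    // and compares them with weak equality.
    public static bool SameTextDifferentCulture()
    {
        var british = new Segment();
        british.CultureName = "en-GB";
        british.Add("colour chart");

        var american = new Segment();
        american.CultureName = "en-US";
        american.Add("colour chart");

        // Culture is not checked by WeakEquals; the text elements match.
        return british.WeakEquals(american);
    }
}
```

By contrast, Equals(Segment) is documented to require the same language as well, so it would be expected to treat these two segments as different.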