Class Segment
Represents a segment, which is a sequence of SegmentElements, in a particular language.
Namespace: Sdl.LanguagePlatform.Core
Assembly: Sdl.LanguagePlatform.Core.dll
Syntax
[DataContract]
public class Segment
Constructors
Segment()
Initializes a new instance with the InvariantCulture and an empty list of elements.
Declaration
public Segment()
Segment(CultureCode)
Initializes a new instance with the specified culture and an empty list of elements.
Declaration
public Segment(CultureCode culture)
Parameters
Type | Name | Description |
---|---|---|
CultureCode | culture | The CultureCode object representing the language. |
Properties
Culture
Gets or sets the culture for this segment.
Declaration
public CultureCode Culture { get; set; }
Property Value
Type | Description |
---|---|
CultureCode |
CultureName
Gets or sets the culture name for this segment. The culture name must be resolvable through CultureInfoExtensions.GetCultureInfo(string), or an exception will be thrown.
Declaration
[DataMember]
public string CultureName { get; set; }
Property Value
Type | Description |
---|---|
string |
Elements
Gets or sets the collection of elements in this segment.
Declaration
[DataMember]
public List<SegmentElement> Elements { get; set; }
Property Value
Type | Description |
---|---|
List<SegmentElement> |
HasPairedTags
Gets a value which indicates whether this segment contains any paired tags. Only start tags are checked; the tag structure is assumed to be valid.
Declaration
public bool HasPairedTags { get; }
Property Value
Type | Description |
---|---|
bool |
HasPlaceables
Gets a value which indicates whether this segment contains any placeables. The return value is only valid if the segment is tokenized.
Declaration
public bool HasPlaceables { get; }
Property Value
Type | Description |
---|---|
bool |
HasTags
Gets a value which indicates whether this segment contains any tags.
Declaration
public bool HasTags { get; }
Property Value
Type | Description |
---|---|
bool |
IsEmpty
Gets a value indicating whether this segment is empty: true if it contains no elements, and false otherwise.
Declaration
public bool IsEmpty { get; }
Property Value
Type | Description |
---|---|
bool |
LastElement
Gets or sets the last element of this segment.
Declaration
public SegmentElement LastElement { get; set; }
Property Value
Type | Description |
---|---|
SegmentElement |
Tokens
Gets or sets the collection of tokens in this segment.
Declaration
[DataMember]
public List<Token> Tokens { get; set; }
Property Value
Type | Description |
---|---|
List<Token> |
Methods
Add(SegmentElement)
Adds the provided segment element to the segment's list of elements. If both the added element and the current last element are text elements, the two are merged.
Declaration
public void Add(SegmentElement element)
Parameters
Type | Name | Description |
---|---|---|
SegmentElement | element | The element to append |
Add(string)
Adds the provided string as a new text element to the segment's list of elements. If the last segment element is a Text element as well, they will be merged.
Declaration
public void Add(string text)
Parameters
Type | Name | Description |
---|---|---|
string | text | The text to append |
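The text-merging rule shared by both Add overloads can be illustrated with a minimal, self-contained model. SketchSegment, SketchText and SketchTag are hypothetical stand-ins, not the Sdl.LanguagePlatform.Core types. This sketch merges by concatenating into the existing last element, which may differ from how the library builds the merged text element.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified stand-ins for SegmentElement, Text and Tag;
// they are NOT the Sdl.LanguagePlatform.Core types.
public abstract class SketchElement { }

public sealed class SketchText : SketchElement
{
    public string Value;
    public SketchText(string value) { Value = value; }
}

public sealed class SketchTag : SketchElement
{
    public int Anchor;
    public SketchTag(int anchor) { Anchor = anchor; }
}

public sealed class SketchSegment
{
    public List<SketchElement> Elements { get; } = new List<SketchElement>();

    // Mirrors the documented rule for Add(SegmentElement): a text element
    // appended directly after a text element is merged into it.
    public void Add(SketchElement element)
    {
        if (element is SketchText text
            && Elements.Count > 0
            && Elements[Elements.Count - 1] is SketchText last)
        {
            // Merge by concatenation instead of appending a new element.
            last.Value += text.Value;
            return;
        }
        Elements.Add(element);
    }

    // Mirrors Add(string): the string becomes a text element and is merged the same way.
    public void Add(string text) => Add(new SketchText(text));
}
```

Adding "Hello, ", then a text element "world", then a tag, then "!" yields three elements: one merged text element, the tag, and a separate "!" text element, because a tag sits between the last two text additions.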
AddRange(IEnumerable<SegmentElement>)
Adds all segment elements in the collection to this segment.
Declaration
public void AddRange(IEnumerable<SegmentElement> elements)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<SegmentElement> | elements | The elements to add |
AnchorDanglingTags()
Sets the anchor for any tags which are not yet anchored (including standalone/placeholder tags). Does not modify tag IDs or alignment anchors.
Declaration
public void AnchorDanglingTags()
Clear()
Empties the list of segment elements.
Declaration
public void Clear()
ComputeStrictIdentityStringAsync()
Computes a strict identity string for this segment, for use with GetStrictHash().
Declaration
public Task<string> ComputeStrictIdentityStringAsync()
Returns
Type | Description |
---|---|
Task<string> |
ComputeStrictIdentityStringAsync(IEnumerable<Token>)
Generates a strict identity string from the provided tokens (not intended for fuzzy matching).
Declaration
public static Task<string> ComputeStrictIdentityStringAsync(IEnumerable<Token> tokens)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<Token> | tokens |
Returns
Type | Description |
---|---|
Task<string> |
DeleteEmptyTagPairs(bool)
Deletes empty tag pairs (a start tag directly followed by the end tag with the same tag anchor) from the segment.
Declaration
public bool DeleteEmptyTagPairs(bool onlyInPeripheralPositions)
Parameters
Type | Name | Description |
---|---|---|
bool | onlyInPeripheralPositions | If true, will delete empty tag pairs only if they appear in peripheral positions (leading, trailing). |
Returns
Type | Description |
---|---|
bool | true if any tags were deleted, and false otherwise. |
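The non-peripheral mode (onlyInPeripheralPositions = false) can be sketched as a single scan that removes each start tag immediately followed by the end tag with the same anchor. PairTag, PairTagKind and EmptyTagPairSketch are hypothetical; text elements are modeled as plain strings. This sketch does not revisit an outer pair that becomes empty after an inner pair is removed; whether the library does so is not stated here.

```csharp
using System.Collections.Generic;

public enum PairTagKind { Start, End, Standalone }

// Hypothetical element model (not the SDL types): text elements are strings,
// tags are PairTag instances.
public sealed class PairTag
{
    public PairTagKind Kind { get; }
    public int Anchor { get; }
    public PairTag(PairTagKind kind, int anchor) { Kind = kind; Anchor = anchor; }
}

public static class EmptyTagPairSketch
{
    // Removes every start tag that is directly followed by the end tag with
    // the same anchor. Returns true if anything was removed.
    public static bool DeleteEmptyTagPairs(List<object> elements)
    {
        bool deleted = false;
        int i = 0;
        while (i < elements.Count - 1)
        {
            if (elements[i] is PairTag start && start.Kind == PairTagKind.Start
                && elements[i + 1] is PairTag end && end.Kind == PairTagKind.End
                && end.Anchor == start.Anchor)
            {
                elements.RemoveRange(i, 2);
                deleted = true;
                // Do not advance: the element now at index i has not been checked yet.
            }
            else
            {
                i++;
            }
        }
        return deleted;
    }
}
```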
DeleteTags()
Removes all tags from the segment, applying the DeleteAll tag deletion mode.
Declaration
public bool DeleteTags()
Returns
Type | Description |
---|---|
bool |
DeleteTags(DeleteTagsAction)
Removes all tags from the segment, applying the specified tag deletion mode.
Declaration
public bool DeleteTags(Segment.DeleteTagsAction mode)
Parameters
Type | Name | Description |
---|---|---|
Segment.DeleteTagsAction | mode | The tag deletion mode |
Returns
Type | Description |
---|---|
bool |
Duplicate()
Creates a new instance that is a deep copy of this instance.
Declaration
public Segment Duplicate()
Returns
Type | Description |
---|---|
Segment | A new instance that is a deep copy of this instance. |
Equals(Segment)
Compares this instance to another Segment object.
Declaration
public bool Equals(Segment other)
Parameters
Type | Name | Description |
---|---|---|
Segment | other | The other instance. |
Returns
Type | Description |
---|---|
bool | true if the language and all the elements are the same, otherwise false. |
FillUnmatchedStartAndEndTags()
Inserts the missing counterparts of unmatched tags into the segment: for each unmatched end tag, a corresponding start tag is inserted at the beginning of the segment, and for each unmatched start tag, a corresponding end tag is appended at the end. In certain cases, not all dangling tags can be filled; to obtain a valid segment without any unmatched tags, call RemoveUnmatchedStartAndEndTags(bool) after calling this method. Note that only the tag type is checked, not whether there are start or end tags without a corresponding tag having the same tag anchor.
The method stops if the tag pairing structure is invalid (i.e. if there are overlapping tags).
Declaration
public bool FillUnmatchedStartAndEndTags()
Returns
Type | Description |
---|---|
bool |
FindTag(TagType, int)
Finds and returns the tag with the provided type and the provided tag anchor, or null if no such tag exists in the segment.
Declaration
public Tag FindTag(TagType type, int anchor)
Parameters
Type | Name | Description |
---|---|---|
TagType | type |
int | anchor |
Returns
Type | Description |
---|---|
Tag |
GetHashCode()
Declaration
public override int GetHashCode()
Returns
Type | Description |
---|---|
int | A hash code for this object |
GetMaxTagAnchor()
Returns the highest tag anchor used in the segment, or 0 if no tags are present.
Declaration
public int GetMaxTagAnchor()
Returns
Type | Description |
---|---|
int |
GetMinMaxTagAnchor(out int, out int)
Returns, through the out parameters, the smallest and largest tag anchors used in the segment. Both are 0 if the segment contains no tags.
Declaration
public void GetMinMaxTagAnchor(out int min, out int max)
Parameters
Type | Name | Description |
---|---|---|
int | min |
int | max |
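The default-to-0 rule can be sketched over a plain sequence of anchor values. TagAnchorRangeSketch is a hypothetical helper that stands in for the segment's tags; it is not part of the library.

```csharp
using System.Collections.Generic;

public static class TagAnchorRangeSketch
{
    // Hypothetical: the segment's tags are represented only by their anchors.
    // Both outputs stay 0 when there are no tags.
    public static void GetMinMaxTagAnchor(IEnumerable<int> tagAnchors, out int min, out int max)
    {
        min = 0;
        max = 0;
        bool first = true;
        foreach (int anchor in tagAnchors)
        {
            if (first)
            {
                // The first anchor seen initializes both bounds.
                min = max = anchor;
                first = false;
            }
            else
            {
                if (anchor < min) min = anchor;
                if (anchor > max) max = anchor;
            }
        }
    }
}
```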
GetTagCount()
Returns the number of tags in the segment. Paired tags are counted only once.
Declaration
public int GetTagCount()
Returns
Type | Description |
---|---|
int | The segment's tag count |
GetTagIdGroups()
Computes a mapping from the token index of each start tag to that tag's ID. Only start and standalone/placeholder tags are included in the mapping. The mapping may be many-to-one (n:1). The segment must be tokenized, or an exception is thrown.
Declaration
public Dictionary<int, string> GetTagIdGroups()
Returns
Type | Description |
---|---|
Dictionary<int, string> |
GetTagPairings()
Returns a dictionary of paired tag token indices, mapping from the start tag's token index to the end tag's token index. The segment must be tokenized, or an exception is thrown.
Declaration
public Dictionary<int, int> GetTagPairings()
Returns
Type | Description |
---|---|
Dictionary<int, int> |
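The shape of the returned dictionary can be sketched with a hypothetical token model in which start and end tags are paired by equal anchors. PairingToken, TokTagKind and TagPairingSketch are illustrative names, not library types. An untokenized segment is modeled here as a null token list.

```csharp
using System;
using System.Collections.Generic;

public enum TokTagKind { Start, End, Standalone }

// Hypothetical token model: a token is either a word (IsTag == false) or a tag.
public sealed class PairingToken
{
    public bool IsTag { get; }
    public TokTagKind Kind { get; }
    public int Anchor { get; }
    public string Text { get; }

    private PairingToken(bool isTag, TokTagKind kind, int anchor, string text)
    {
        IsTag = isTag; Kind = kind; Anchor = anchor; Text = text;
    }

    public static PairingToken Word(string text) => new PairingToken(false, default(TokTagKind), 0, text);
    public static PairingToken Tag(TokTagKind kind, int anchor) => new PairingToken(true, kind, anchor, null);
}

public static class TagPairingSketch
{
    // Maps the token index of each start tag to the token index of the end
    // tag with the same anchor. Standalone tags and words are skipped.
    public static Dictionary<int, int> GetTagPairings(IList<PairingToken> tokens)
    {
        if (tokens == null)
            throw new InvalidOperationException("Segment is not tokenized.");

        var openStarts = new Dictionary<int, int>();   // anchor -> start token index
        var pairings = new Dictionary<int, int>();
        for (int i = 0; i < tokens.Count; i++)
        {
            PairingToken t = tokens[i];
            if (!t.IsTag)
                continue;
            if (t.Kind == TokTagKind.Start)
            {
                openStarts[t.Anchor] = i;
            }
            else if (t.Kind == TokTagKind.End && openStarts.TryGetValue(t.Anchor, out int startIndex))
            {
                pairings[startIndex] = i;
                openStarts.Remove(t.Anchor);
            }
        }
        return pairings;
    }
}
```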
GetTokenIndex(SegmentPosition)
Returns the index of the token at the specified position.
Declaration
public int GetTokenIndex(SegmentPosition p)
Parameters
Type | Name | Description |
---|---|---|
SegmentPosition | p |
Returns
Type | Description |
---|---|
int | The index of the token at the specified position, or -1 if it is not found, or if the segment is not tokenized. |
GetWeakHashCode()
Returns a hash code which does not depend on tag anchors in the segment. This can be used for translation tracking in bilingual documents.
Declaration
public int GetWeakHashCode()
Returns
Type | Description |
---|---|
int | A hash code which is independent of tag anchors. |
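The idea of an anchor-independent hash can be sketched by feeding text and tag kinds, but not anchors, into System.HashCode. WeakTag, WeakTagKind and WeakHashSketch are hypothetical, and ordinal string hashing is an assumption; the library's actual hash inputs are not documented here.

```csharp
using System;
using System.Collections.Generic;

public enum WeakTagKind { Start, End, Standalone }

// Hypothetical tag model; text elements are plain strings.
public sealed class WeakTag
{
    public WeakTagKind Kind { get; }
    public int Anchor { get; }
    public WeakTag(WeakTagKind kind, int anchor) { Kind = kind; Anchor = anchor; }
}

public static class WeakHashSketch
{
    // Combines text content and tag kinds but deliberately skips anchors,
    // so segments that differ only in tag numbering hash the same.
    public static int GetWeakHashCode(IEnumerable<object> elements)
    {
        var hash = new HashCode();
        foreach (object e in elements)
        {
            switch (e)
            {
                case string text:
                    hash.Add(text, StringComparer.Ordinal);
                    break;
                case WeakTag tag:
                    hash.Add(tag.Kind);   // anchor intentionally ignored
                    break;
            }
        }
        return hash.ToHashCode();
    }
}
```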
HasPeripheralWhitespace()
Determines whether the segment starts or ends with at least one whitespace character.
Declaration
public bool HasPeripheralWhitespace()
Returns
Type | Description |
---|---|
bool |
HasTokenBundles()
Returns true if any of the segment's tokens is a TokenBundle (i.e. an ambiguous tokenization), and false otherwise. Token bundles should only be used inside the TM Kernel and should not be returned through the TM API.
Declaration
public bool HasTokenBundles()
Returns
Type | Description |
---|---|
bool |
HasUnmatchedStartOrEndTags()
Determines whether the segment has any unmatched start or end tags. Note that this method only tests the tag type, and does not handle paired tags where the start or end tag is missing.
Declaration
public bool HasUnmatchedStartOrEndTags()
Returns
Type | Description |
---|---|
bool |
IsValid()
Determines if this segment is valid.
Declaration
public bool IsValid()
Returns
Type | Description |
---|---|
bool | true if the segment is valid, false otherwise. |
MergeAdjacentTextRuns()
Merges adjacent text runs.
Declaration
public void MergeAdjacentTextRuns()
RemoveTokenBundles()
Replaces token bundles with the "best" token in that bundle. Returns true if any replacement has been done, and false otherwise.
Declaration
public bool RemoveTokenBundles()
Returns
Type | Description |
---|---|
bool |
RemoveUnmatchedStartAndEndTags()
Deletes all tags from the segment which have a tag type of Core.TagType.UnmatchedStart or Core.TagType.UnmatchedEnd. Note that this method only tests the tag type, and does not handle paired tags where the start or end tag is missing.
Declaration
public bool RemoveUnmatchedStartAndEndTags()
Returns
Type | Description |
---|---|
bool |
RemoveUnmatchedStartAndEndTags(bool)
Deletes all tags from the segment which have a tag type of Core.TagType.UnmatchedStart or Core.TagType.UnmatchedEnd, provided they occur in peripheral positions. A dangling end tag is removed only if it appears at the start of the segment with no other tags or text preceding it, and a dangling start tag is removed only if it appears at the end of the segment with no other tags or text following it.
Note that this method only tests the tag type, and does not handle paired tags where the start or end tag is missing.
Declaration
public bool RemoveUnmatchedStartAndEndTags(bool peripheralPositionsOnly)
Parameters
Type | Name | Description |
---|---|---|
bool | peripheralPositionsOnly |
Returns
Type | Description |
---|---|
bool |
RenumberTagAnchors(int, ref int)
Renumbers tag anchors consecutively, starting at nextTagAnchor. Although tag anchors carry no meaning for standalone tags, those tags are renumbered in the same way. Errors in tag numbering are ignored but preserved: invalid tag anchors are mapped to potentially new, but still invalid, tag anchors.
Declaration
public bool RenumberTagAnchors(int nextTagAnchor, ref int maxAlignmentAnchor)
Parameters
Type | Name | Description |
---|---|---|
int | nextTagAnchor | The first anchor to assign (must be larger than zero) |
int | maxAlignmentAnchor | Returns the highest alignment anchor in the renumbered segment. |
Returns
Type | Description |
---|---|
bool | true if any anchors were reassigned, and false otherwise. |
RenumberTagAnchors(ref int)
Renumbers tag anchors so that they start at 1 and are consecutive. Although tag anchors carry no meaning for standalone tags, those tags are renumbered in the same way. Errors in tag numbering are ignored but preserved: invalid tag anchors are mapped to potentially new, but still invalid, tag anchors.
Declaration
public bool RenumberTagAnchors(ref int maxAlignmentAnchor)
Parameters
Type | Name | Description |
---|---|---|
int | maxAlignmentAnchor |
Returns
Type | Description |
---|---|
bool | true if any anchors were reassigned, and false otherwise. |
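Consecutive renumbering can be sketched as a first-appearance mapping from old to new anchors. Because a start tag and its end tag share an anchor, they also share the new anchor, and an invalid anchor is simply remapped rather than repaired. RenumberSketch is hypothetical, operates on a bare array of tag anchors in segment order, and does not model alignment anchors.

```csharp
using System;
using System.Collections.Generic;

public static class RenumberSketch
{
    // Hypothetical model: `anchors` holds the anchor of every tag in segment
    // order (start, end and standalone tags alike); it is rewritten in place.
    public static bool RenumberTagAnchors(int[] anchors, int nextTagAnchor = 1)
    {
        if (nextTagAnchor <= 0)
            throw new ArgumentOutOfRangeException(nameof(nextTagAnchor), "Must be larger than zero.");

        var map = new Dictionary<int, int>();   // old anchor -> new anchor
        bool changed = false;
        for (int i = 0; i < anchors.Length; i++)
        {
            int oldAnchor = anchors[i];
            if (!map.TryGetValue(oldAnchor, out int newAnchor))
            {
                // First time this anchor is seen: assign the next consecutive value.
                newAnchor = nextTagAnchor++;
                map.Add(oldAnchor, newAnchor);
            }
            if (newAnchor != oldAnchor)
            {
                anchors[i] = newAnchor;
                changed = true;
            }
        }
        return changed;
    }
}
```

For the anchors 5, 2, 2, 5, 9 the sketch produces 1, 2, 2, 1, 3; calling it again changes nothing and returns false.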
ToPlain()
Returns a string containing only the plain text in this segment. Note that text placeholders will be replaced with their text equivalent.
Declaration
public string ToPlain()
Returns
Type | Description |
---|---|
string | A string containing only the plain text in this segment. |
ToPlain(SegmentRange)
Computes the plain-text version of the part of the segment specified by the provided range.
Declaration
public string ToPlain(SegmentRange range)
Parameters
Type | Name | Description |
---|---|---|
SegmentRange | range | The range of the segment to convert |
Returns
Type | Description |
---|---|
string | The plain-text string corresponding to the provided range. |
ToPlain(bool, bool, out List<SegmentPosition>)
Computes the plain-text version of the segment and returns, in the ranges list, the segment position of each character of the result string. The number of elements in that list equals the length of the result string in characters.
Declaration
public string ToPlain(bool tolower, bool tobase, out List<SegmentPosition> ranges)
Parameters
Type | Name | Description |
---|---|---|
bool | tolower | If true, the returned string will be lower-cased |
bool | tobase | If true, all letters will be mapped to their base character (i.e. diacritics will be stripped) |
List<SegmentPosition> | ranges | The list of segment positions returned upon completion. For each character in the result string, it holds the corresponding position in the original segment. |
Returns
Type | Description |
---|---|
string |
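The one-position-per-character contract can be sketched over a simplified element list. SketchPosition and PlainTextSketch are hypothetical. Unlike the real method, non-text elements are skipped entirely, so text placeholders do not contribute their text equivalent. Lower-casing with char.ToLowerInvariant and base-character mapping through canonical (NFD) decomposition are assumptions; the library's mappings may differ.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical stand-in for SegmentPosition: element index plus character offset.
public readonly struct SketchPosition
{
    public SketchPosition(int index, int position) { Index = index; Position = position; }
    public int Index { get; }      // element index in the segment
    public int Position { get; }   // character offset within that element
}

public static class PlainTextSketch
{
    // Strings are text elements; any other object (e.g. a tag) contributes nothing here.
    public static string ToPlain(IReadOnlyList<object> elements, bool tolower, bool tobase,
                                 out List<SketchPosition> positions)
    {
        var sb = new StringBuilder();
        positions = new List<SketchPosition>();
        for (int e = 0; e < elements.Count; e++)
        {
            if (!(elements[e] is string text))
                continue;
            for (int i = 0; i < text.Length; i++)
            {
                char c = text[i];
                if (tobase) c = BaseChar(c);
                if (tolower) c = char.ToLowerInvariant(c);
                sb.Append(c);
                positions.Add(new SketchPosition(e, i));   // exactly one position per output char
            }
        }
        return sb.ToString();
    }

    // Strips diacritics by NFD decomposition, keeping the first non-mark character.
    private static char BaseChar(char c)
    {
        if (char.IsSurrogate(c))
            return c;   // a lone surrogate cannot be normalized on its own
        foreach (char d in c.ToString().Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(d) != UnicodeCategory.NonSpacingMark)
                return d;
        }
        return c;
    }
}
```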
ToPlain(int, int)
Returns a string containing only the plain text in this segment, covering the given token range. An exception will be thrown if the segment's tokens are not set or the token range is outside the bounds.
Declaration
public string ToPlain(int fromToken, int intoToken)
Parameters
Type | Name | Description |
---|---|---|
int | fromToken | The index of the first token |
int | intoToken | The index of the last token (inclusive, i.e. "into" semantics) |
Returns
Type | Description |
---|---|
string | A plain text string covering the specified token range |
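The inclusive "into" semantics mean the range covers intoToken - fromToken + 1 tokens. TokenRangeSketch is hypothetical and represents each token only by its surface text. It models an untokenized segment as a null list, and treats an inverted range (fromToken greater than intoToken) as out of bounds, which is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TokenRangeSketch
{
    // Hypothetical: each token is represented by its surface text.
    public static string ToPlain(IReadOnlyList<string> tokens, int fromToken, int intoToken)
    {
        if (tokens == null)
            throw new InvalidOperationException("Segment is not tokenized.");
        if (fromToken < 0 || intoToken >= tokens.Count || fromToken > intoToken)
            throw new ArgumentOutOfRangeException(nameof(fromToken), "Token range is out of bounds.");

        var sb = new StringBuilder();
        for (int i = fromToken; i <= intoToken; i++)   // <= because intoToken is inclusive
            sb.Append(tokens[i]);
        return sb.ToString();
    }
}
```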
ToString()
Declaration
public override string ToString()
Returns
Type | Description |
---|---|
string | A string representation of the object, for display purposes. |
Trim()
Removes leading whitespace from the first segment element and trailing whitespace from the last segment element, provided that element is a text element; a first or last element that is not a text element is left unchanged. Consequently, leading (trailing) whitespace is not removed from a text element that is preceded (followed) only by non-text elements. Null elements are also deleted.
Declaration
public void Trim()
TrimEnd()
Removes trailing whitespace from the last segment element, if that is a text element. If the last segment element is not a text element, nothing will happen. Hence, trailing whitespace will not be removed from a text element if it is followed by non-text elements. The number of elements may be altered by this method. Empty (null) elements will also be removed.
Declaration
public string TrimEnd()
Returns
Type | Description |
---|---|
string | A string consisting of the trimmed-off characters, or |
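The TrimEnd behavior can be sketched over a simplified element list: strings are text elements, any other object is a non-text element, and null marks an empty slot. TrimEndSketch is hypothetical. Dropping null elements before inspecting the last element, and returning an empty string when nothing is trimmed, are both assumptions of this sketch.

```csharp
using System.Collections.Generic;

public static class TrimEndSketch
{
    public static string TrimEnd(List<object> elements)
    {
        // Assumption: null elements are dropped before the last element is inspected.
        elements.RemoveAll(e => e == null);

        if (elements.Count == 0 || !(elements[elements.Count - 1] is string last))
            return string.Empty;   // last element is not text: nothing is trimmed

        string trimmed = last.TrimEnd();
        string removed = last.Substring(trimmed.Length);   // the trimmed-off characters

        if (trimmed.Length == 0)
            elements.RemoveAt(elements.Count - 1);   // a whitespace-only text element disappears
        else
            elements[elements.Count - 1] = trimmed;

        return removed;
    }
}
```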
TrimStart()
Removes leading whitespace from the first segment element, if that is a text element. If the first segment element is not a text element, nothing will happen. Hence, leading whitespace will not be removed from a text element if it is preceded by non-text elements. The number of elements may be altered by this method. Empty (null) elements will also be removed.
Declaration
public string TrimStart()
Returns
Type | Description |
---|---|
string | A string consisting of the trimmed-off characters, or |
UpdateFromTokenIndices(ICollection<int>)
Updates the segment's text from the tokens, and adjusts span indices accordingly. An exception is thrown if the segment is not tokenized.
Declaration
public bool UpdateFromTokenIndices(ICollection<int> tokenIndices)
Parameters
Type | Name | Description |
---|---|---|
ICollection<int> | tokenIndices | The indices of the tokens to update. |
Returns
Type | Description |
---|---|
bool | true if the segment was changed, and false otherwise. |
Validate()
Validates the current instance, with the ReportAllErrors validation mode.
Declaration
public ErrorCode Validate()
Returns
Type | Description |
---|---|
ErrorCode | An error code (which may be OK, indicating the segment is valid). |
Validate(ValidationMode)
Performs validation checks on this instance, applying the specified validation mode.
Declaration
public ErrorCode Validate(Segment.ValidationMode mode)
Parameters
Type | Name | Description |
---|---|---|
Segment.ValidationMode | mode | The validation mode to apply |
Returns
Type | Description |
---|---|
ErrorCode | An error code (which may be OK, indicating the segment is valid). |
VerifyTokenSpans()
Verifies whether the spans of the segment's tokens are correct and reflect the segment's text. The segment should be tokenized; if it is not, true is returned.
Declaration
public bool VerifyTokenSpans()
Returns
Type | Description |
---|---|
bool | true if the verification was successful or the segment is not tokenized, and false otherwise. |
WeakEquals(Segment)
Computes weak equality with another segment.
Weak equality does not check culture compatibility, and tag anchors need not be identical; however, text elements must match, as must the order of tags (element similarity must not be None).
Declaration
public bool WeakEquals(Segment other)
Parameters
Type | Name | Description |
---|---|---|
Segment | other |
Returns
Type | Description |
---|---|
bool |