XUIStrings

Description

A module containing many extension methods and regular methods for manipulating strings.

Includes comprehensive support for Unicode characters such as emoji.

Properties

| Name | Type | Read-Only |
| --- | --- | --- |
| CodePointCategoryDictionary | Dictionary | |

Methods

| Name | Parameters | Returns |
| --- | --- | --- |
| CategoryForLatin1 | codePoint As Integer | XUIStrings.UnicodeCategories |
| CharacterArray | s As String | String() |
| CharacterCount | s As String | Integer |
| CheckLetter | uc As XUIStrings.UnicodeCategories | Boolean |
| CheckLetterOrDigit | uc As XUIStrings.UnicodeCategories | Boolean |
| Chop | s As String, numChars As Integer | String |
| Contains | s As String, what As String, caseSensitive As Boolean | Boolean |
| FromArray | chars() As String, start As Integer, length As Integer | String |
| GetLatin1UnicodeCharacter | character As String | XUIStrings.UnicodeCategories |
| GetUnicodeCategory | s As String | XUIStrings.UnicodeCategories |
| InitialiseCodepointCategoryDictionary | | Dictionary |
| IsASCII | character As String | Boolean |
| IsASCIILetter | letter As String | Boolean |
| IsASCIILetterOrDigit | letter As String | Boolean |
| IsASCIILetterOrDigitOrHyphen | letter As String | Boolean |
| IsASCIILetterOrDigitOrUnderscore | letter As String | Boolean |
| IsASCIILetterOrUnderscore | letter As String | Boolean |
| IsBinaryDigit | s As String | Boolean |
| IsDigit | s As String | Boolean |
| IsExactly | char As String, ParamArray characters() As String | Boolean |
| IsHexDigit | s As String | Boolean |
| IsLatin1 | character As String | Boolean |
| IsLetter | s As String | Boolean |
| IsLetterDigitOrUnderscore | s As String | Boolean |
| IsLetterOrDigit | s As String | Boolean |
| IsLowercaseCharacter | character As String | Boolean |
| IsOctalDigit | s As String | Boolean |
| IsRGBA | s As String | Boolean |
| IsSpaceOrTab | character As String | Boolean |
| IsSpaceOrTabOrNewline | character As String | Boolean |
| IsUppercaseASCIICharacter | s As String | Boolean |
| IsUppercaseASCIILetter | s As String | Boolean |
| IsWhiteSpace | s As String | Boolean |
| JustifyLeft | s As String, width As Integer, char As String | String |
| JustifyRight | s As String, width As Integer, char As String | String |
| LeftCharacters | s As String, count As Integer | String |
| MiddleCharacters | s As String, start As Integer | String |
| MiddleCharacters | s As String, start As Integer, count As Integer | String |
| ReplaceInvisibleCharacters | s As String | String |
| RightCharacters | s As String, count As Integer | String |

Constants

| Name | Type |
| --- | --- |
| TAB | String |
| UNICODE_CODEPOINT_CATEGORY_PAIRS | String |

TAB As String

The horizontal tab character.


UNICODE_CODEPOINT_CATEGORY_PAIRS As String

Contains parsed data from the Unicode standard where each line represents a codepoint/category pairing.

Each line has the format codepoint:category, where codepoint is the codepoint as a hexadecimal value and category is a two-character category code.
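
As a rough illustration of this format, a single line can be split and converted as shown below. This is a sketch using standard Xojo string functions, not the module's own parser, and the sample entry "0041:Lu" is a hypothetical value chosen for illustration.

```xojo
// Hypothetical sample entry; the real constant holds one such pair per line.
Var entryLine As String = "0041:Lu"

// NthField is 1-based: field 1 is the codepoint, field 2 is the category.
Var codepointHex As String = entryLine.NthField(":", 1)
Var categoryCode As String = entryLine.NthField(":", 2)

// Convert the hexadecimal text to an Integer (65 for "0041").
Var codepoint As Integer = Integer.FromHex(codepointHex)
```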


Enumerations

UnicodeCategories

The different Unicode categories.

Name
ClosePunctuation
ConnectorPunctuation
Control
CurrencySymbol
DashPunctuation
DecimalDigitNumber
EnclosingMark
FinalQuotePunctuation
Format
InitialQuotePunctuation
LetterNumber
LineSeparator
LowercaseLetter
MathSymbol
ModifierLetter
ModifierSymbol
NonSpacingMark
OpenPunctuation
OtherLetter
OtherNotAssigned
OtherNumber
OtherPunctuation
OtherSymbol
ParagraphSeparator
PrivateUse
SpaceSeparator
SpaceCombiningMark
Surrogate
TitlecaseLetter
UppercaseLetter
None

Property Descriptions

CodePointCategoryDictionary As Dictionary

Maps a Unicode codepoint to its category. Key = Unicode codepoint, Value = Unicode category.


Method Descriptions

CategoryForLatin1(codePoint As Integer) As XUIStrings.UnicodeCategories

Returns the Unicode category for a Latin-1 character.

Assumes that codePoint is within the range &u0000 to &u00FF.


CharacterArray(s As String) As String()

Returns the individual characters in s as an array.

It's at least 4x faster to use Text to split into characters and then iterate over that array than to use the native String.Characters() method that returns an Iterable.
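
A minimal usage sketch of the pattern described above. The call style is an assumption: it treats CharacterArray as an extension method on String, as the JustifyLeft example in this document suggests. If it is not declared with Extends, call XUIStrings.CharacterArray(greeting) instead.

```xojo
// Assumption: CharacterArray is declared with Extends on its String parameter.
Var greeting As String = "Hi 👋"
Var chars() As String = greeting.CharacterArray

// Iterate over the array rather than over greeting.Characters.
For Each c As String In chars
  System.DebugLog(c)
Next c
```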


CharacterCount(s As String) As Integer

Returns the number of characters in the passed string (including multibyte characters).


CheckLetter(uc As XUIStrings.UnicodeCategories) As Boolean

Checks if uc belongs to the letter category.


CheckLetterOrDigit(uc As XUIStrings.UnicodeCategories) As Boolean

Checks if uc belongs to the letter or digit categories.


Chop(s As String, numChars As Integer) As String

Removes numChars characters from s.

If numChars is greater than the length of s, "" is returned.


Contains(s As String, what As String, caseSensitive As Boolean) As Boolean

True if s contains what. The caseSensitive parameter determines whether the comparison is case-sensitive.


FromArray(chars() As String, start As Integer, length As Integer) As String

Returns a string from chars beginning at index start for length characters. Assumes chars is an array of individual characters.

If length exceeds the number of characters remaining from start, or if length = -1, then all characters from start to the end of chars are returned.
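
The three cases described above might look like this in use. The sketch assumes FromArray is called as a module method and that start is a zero-based array index. The expected values in the comments follow from the description, not from a test run.

```xojo
// Assumption: FromArray is a module method and start is zero-based.
Var chars() As String = Array("H", "e", "l", "l", "o")

Var firstTwo As String = XUIStrings.FromArray(chars, 0, 2)   // "He"
Var toTheEnd As String = XUIStrings.FromArray(chars, 2, -1)  // "llo": -1 means "to the end"
Var clamped As String = XUIStrings.FromArray(chars, 3, 10)   // "lo": length runs past the end
```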


GetLatin1UnicodeCharacter(character As String) As XUIStrings.UnicodeCategories

Returns the Unicode category for Unicode characters <= &h00ff.

Assumes that character is one character long.


GetUnicodeCategory(s As String) As XUIStrings.UnicodeCategories

Returns the Unicode category that s belongs to.

If s is empty or is more than one character in length then we return a special None category.
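
A short sketch of how the None category can be used to detect invalid input. It assumes GetUnicodeCategory and CheckLetter are called as module methods. If GetUnicodeCategory is declared with Extends, the first call would be written on the string itself.

```xojo
// Assumption: both methods are called as module methods.
Var category As XUIStrings.UnicodeCategories = XUIStrings.GetUnicodeCategory("é")

If category = XUIStrings.UnicodeCategories.None Then
  // Reached only when the input is empty or longer than one character.
  System.DebugLog("Not a single character")
ElseIf XUIStrings.CheckLetter(category) Then
  System.DebugLog("Letter")
End If
```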


InitialiseCodepointCategoryDictionary() As Dictionary

Returns a dictionary mapping Unicode codepoints to Unicode categories.

UNICODE_CODEPOINT_CATEGORY_PAIRS contains parsed data from the Unicode standard where each line represents a codepoint/category pairing. Each line has the format codepoint:category, where codepoint is the codepoint as a hexadecimal value and category is a two-character category code.


IsASCII(character As String) As Boolean

True if character is in the ASCII range.

Assumes that character is one character in length.


IsASCIILetter(letter As String) As Boolean

Returns True if the letter is A-Z or a-z.


IsASCIILetterOrDigit(letter As String) As Boolean

Returns True if the letter is A-Z, a-z or 0-9.


IsASCIILetterOrDigitOrHyphen(letter As String) As Boolean

Returns True if the letter is A-Z, a-z, 0-9 or "-".


IsASCIILetterOrDigitOrUnderscore(letter As String) As Boolean

Returns True if the letter is A-Z, a-z, 0-9 or the underscore.


IsASCIILetterOrUnderscore(letter As String) As Boolean

Returns True if the letter is A-Z, a-z or the underscore.


IsBinaryDigit(s As String) As Boolean

True if s is 0 or 1.


IsDigit(s As String) As Boolean

True if s is a single digit in the range 0-9.

We could use GetUnicodeCategory but a Select...Case is faster.
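
The Select...Case approach mentioned above can be sketched as follows. IsDigitSketch is a hypothetical stand-in written for illustration, not the module's actual source.

```xojo
// Hypothetical stand-in for illustration; not the module's source.
Function IsDigitSketch(s As String) As Boolean
  // A direct comparison against the ten digits avoids a category lookup.
  Select Case s
  Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
    Return True
  Case Else
    Return False
  End Select
End Function
```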


IsExactly(char As String, ParamArray characters() As String) As Boolean

True if char exactly matches (case-sensitive) any of the passed characters.

Assumes that char is a single character in length.
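
A brief usage sketch. It assumes IsExactly is declared with Extends on its first parameter. If it is not, call XUIStrings.IsExactly(c, ...) instead.

```xojo
// Assumption: IsExactly is declared with Extends on its first parameter.
Var c As String = "a"

Var isVowel As Boolean = c.IsExactly("a", "e", "i", "o", "u")  // True
Var isUpperA As Boolean = c.IsExactly("A")                      // False: the match is case-sensitive
```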


IsHexDigit(s As String) As Boolean

True if s is a valid hexadecimal digit (0-9, a-f, A-F).


IsLatin1(character As String) As Boolean

Returns True if character is in the ASCII or Latin-1 Supplement range.

Assumes that character is one character in length.


IsLetter(s As String) As Boolean

True if s is a letter.

Based on code from .NET Core.


IsLetterDigitOrUnderscore(s As String) As Boolean

Determines whether s is a letter, a digit or an underscore.

Based on code from .NET Core.


IsLetterOrDigit(s As String) As Boolean

Determines whether s is a letter or a digit.

Based on code from .NET Core.


IsLowercaseCharacter(character As String) As Boolean

True if character is lowercase.

Assumes that character is one character long.


IsOctalDigit(s As String) As Boolean

True if s is a valid octal digit (0-7).


IsRGBA(s As String) As Boolean

Returns True if s is a valid RGBA hex string.

Valid formats are:


IsSpaceOrTab(character As String) As Boolean

True if character is a space or horizontal tab.


IsSpaceOrTabOrNewline(character As String) As Boolean

True if character is a space, horizontal tab or UNIX newline (&u0A).


IsUppercaseASCIICharacter(s As String) As Boolean

True if s is an uppercase ASCII character.


IsUppercaseASCIILetter(s As String) As Boolean

True if s is an uppercase ASCII letter.


IsWhiteSpace(s As String) As Boolean

True if s is Unicode whitespace.

Assumes s is a single character. String.Asc returns the codepoint for the first character in s so if this method is passed a string comprising more than one character, it'll break.

&u0009 = <control> HORIZONTAL TAB
&u000a = <control> LINE FEED
&u000b = <control> VERTICAL TAB
&u000c = <control> FORM FEED
&u000d = <control> CARRIAGE RETURN
&u0085 = <control> NEXT LINE
&u00a0 = NO-BREAK SPACE
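
Because the method only inspects the first character, a caller may want to guard against longer input first. The sketch below assumes CharacterCount and IsWhiteSpace are declared with Extends and that the TAB constant is accessible as XUIStrings.TAB.

```xojo
// Assumption: CharacterCount and IsWhiteSpace are declared with Extends.
Var token As String = XUIStrings.TAB

// Check the length first so IsWhiteSpace never sees a multi-character string.
If token.CharacterCount = 1 Then
  If token.IsWhiteSpace Then
    System.DebugLog("Single whitespace character")
  End If
End If
```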

JustifyLeft(s As String, width As Integer, char As String) As String

Left justifies s to width characters using char to pad the right edge if required.

"Hello".JustifyLeft(10) // Becomes "Hello     "

JustifyRight(s As String, width As Integer, char As String) As String

Right justifies s to width characters using char to pad the left edge if required.

"Hello".JustifyRight(10) // Becomes "     Hello"
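
The examples above rely on the default padding. A sketch with an explicit padding character, using the same extension-method call style, might look like this. The expected values in the comments follow from the descriptions above, not from a test run.

```xojo
// Pad on the left with zeros up to a width of 5.
Var padded As String = "42".JustifyRight(5, "0")    // "00042"

// Pad on the right with dots up to a width of 8.
Var dotted As String = "Name".JustifyLeft(8, ".")   // "Name...."
```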

LeftCharacters(s As String, count As Integer) As String

Returns the count left-most characters of s.


MiddleCharacters(s As String, start As Integer) As String

Returns all of the characters from start to the end of s. The start position is zero-based.


MiddleCharacters(s As String, start As Integer, count As Integer) As String

Returns count characters from s, beginning at start. Handles multibyte characters like emoji.
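
A usage sketch for both overloads. It assumes MiddleCharacters is declared with Extends and that start is zero-based in the three-parameter overload as well, which the description does not state explicitly.

```xojo
// Assumption: extension-method call style; start is zero-based in both overloads.
Var word As String = "Hello"
Var tail As String = word.MiddleCharacters(1)       // "ello"
Var part As String = word.MiddleCharacters(1, 3)    // "ell"

// Each emoji counts as one character.
Var face As String = "a😀b".MiddleCharacters(1, 1)  // "😀"
```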


ReplaceInvisibleCharacters(s As String) As String

Replaces certain invisible characters with a visible representation.


RightCharacters(s As String, count As Integer) As String

Returns the count right-most characters of s.
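
A short sketch of LeftCharacters and RightCharacters together, assuming both are declared with Extends. If they are not, call them as XUIStrings.LeftCharacters(word, 2) and XUIStrings.RightCharacters(word, 3).

```xojo
// Assumption: both methods are declared with Extends on their String parameter.
Var word As String = "Hello"

Var head As String = word.LeftCharacters(2)    // "He"
Var tail As String = word.RightCharacters(3)   // "llo"
```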