
MKInlineHTMLScanner

Description

Used internally by the parser to scan inline HTML blocks.

Properties

| Name | Type | Read-Only |
| --- | --- | --- |
| EmailPartOneCharacters | Dictionary | |

Methods

| Name | Parameters | Returns |
| --- | --- | --- |
| FindClosingTag | chars() As MKCharacter, pos As Integer, tagName As String | Integer |
| FindOpenTag | chars() As MKCharacter, pos As Integer, tagName As String | Integer |
| GetHtmlTagName | chars() As MKCharacter, pos As Integer | String |
| MatchAnythingExcept | chars() As MKCharacter, pos As Integer, currentChar As String, invalidChar As String | Boolean |
| MatchAnythingExceptInvalidAndWhitespace | chars() As MKCharacter, pos As Integer, currentChar As String, ParamArray invalidChars() As String | Boolean |
| MatchASCIILetterOrDigit | chars() As MKCharacter, pos As Integer, currentChar As String, ParamArray validChars() As String | Boolean |
| MatchASCIILetterOrDigitOrHyphen | chars() As MKCharacter, pos As Integer, maxCount As Integer | Integer |
| MatchASCIILetterOrValidCharacter | chars() As MKCharacter, pos As Integer, currentChar As String, ParamArray validChars() As String | Boolean |
| ScanAutoLink | chars() As MKCharacter, startPos As Integer, uri As String | Integer |
| ScanDeclarationCommentOrCData | chars() As MKCharacter, startPos As Integer | Integer |
| ScanEmailLink | chars() As MKCharacter, startPos As Integer, uri As String | Integer |
| ScanLinkScheme | chars() As MKCharacter, pos As Integer | Integer |
| ScanProcessingInstruction | chars() As MKCharacter, startPos As Integer | Integer |
| SkipWhitespace | chars() As MKCharacter, pos As Integer, currentChar As String | Boolean |

Property Descriptions

EmailPartOneCharacters As Dictionary

This property is shared.

Stores the characters that are valid in the first part of an email autolink (the part before the @): a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-
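
As an illustration of how such a lookup can be used, here is a minimal Python sketch. It is not the Xojo implementation: the names EMAIL_PART_ONE_CHARACTERS and is_email_part_one_char are hypothetical, and a frozenset stands in for the shared Dictionary.

```python
import string

# Hypothetical stand-in for the shared Dictionary: a set of single characters.
# The punctuation list is copied from the character class documented above.
EMAIL_PART_ONE_CHARACTERS = frozenset(
    string.ascii_letters + string.digits + ".!#$%&'*+/=?^_`{|}~-"
)


def is_email_part_one_char(ch: str) -> bool:
    """Return True if ch may appear before the @ of an email autolink."""
    return ch in EMAIL_PART_ONE_CHARACTERS
```

A set (or Dictionary) gives a constant-time membership test per character, which is presumably why the original stores the characters this way.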


Method Descriptions

FindClosingTag(chars() As MKCharacter, pos As Integer, tagName As String) As Integer

This method is shared.

Scans chars for a valid HTML closingTag beginning at pos. Returns the 0-based index of the character immediately after the closing >, or 0 if no valid closingTag is found.

Assumes that pos points to the character immediately following "</".

closingTag: </, tagName, optional whitespace, >
tagName: ASCII letter, >= 0 ASCII letter|digit|-

Also sets the ByRef tagName parameter to the detected tagName (if present) or "" if no valid tagName is found.
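
The closingTag grammar can be sketched in Python as follows. This is an illustration under stated assumptions, not the Xojo source: chars is modelled as a list of one-character strings, the ByRef tagName is modelled as a second return value, and find_closing_tag is a hypothetical name.

```python
from typing import List, Tuple

ASCII_LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"
WHITESPACE = " \t\n\r\f\v"


def find_closing_tag(chars: List[str], pos: int) -> Tuple[int, str]:
    """Return (index after '>', tag name), or (0, name-or-'') on failure.

    pos must point at the character immediately after "</".
    """
    n = len(chars)
    start = pos
    # tagName: one ASCII letter, then zero or more letters, digits or hyphens.
    if pos >= n or chars[pos] not in ASCII_LETTERS:
        return 0, ""
    pos += 1
    while pos < n and chars[pos] in ASCII_LETTERS + DIGITS + "-":
        pos += 1
    tag_name = "".join(chars[start:pos])
    # Optional whitespace before the closing ">".
    while pos < n and chars[pos] in WHITESPACE:
        pos += 1
    if pos < n and chars[pos] == ">":
        return pos + 1, tag_name
    # Assumption: the detected name is still reported when ">" is missing.
    return 0, tag_name
```

Because 0 doubles as the "not found" value, a caller has to treat a return of 0 as failure rather than as a position.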


FindOpenTag(chars() As MKCharacter, pos As Integer, tagName As String) As Integer

This method is shared.

Returns the 0-based index in chars of the end of a valid HTML opening tag beginning at pos, or 0 if none is found.

Assumes that pos points to the character immediately following "<". Sets the ByRef parameter tagName to the detected tag name (if present) or "" if none is found.

openTag: "<", a tagname, >= 0 attributes, optional whitespace, optional "/", and a ">".
tagName: ASCII letter, >= 0 ASCII letter|digit|-
attribute: whitespace, attributeName, optional attributeValueSpec
attributeName: ASCII letter|-|:, >=0 ASCII letter|digit|_|.|:|-
attributeValueSpec: optional whitespace, =, optional whitespace, attributeValue
attributeValue: unQuotedAttValue | singleQuotedAttValue | doubleQuotedAttValue
unQuotedAttValue: > 0 characters NOT including whitespace, ", ', =, <, >, or `.
singleQuotedAttValue: ', >= 0 characters NOT including ', then a final '
doubleQuotedAttValue: ", >= 0 characters NOT including ", then a final "
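
The grammar above translates fairly directly into a regular expression. The Python sketch below transcribes each rule as written; it illustrates the grammar and is not the Xojo implementation. Assumptions: chars is modelled as a plain string, tagName is returned as a second value and is "" whenever the whole tag fails to match, the returned index is the one just past ">", and all names are hypothetical.

```python
import re
from typing import Tuple

TAG_NAME = r"[A-Za-z][A-Za-z0-9-]*"
# attributeName exactly as documented: letter, "-" or ":" first.
ATTR_NAME = r"[A-Za-z:-][A-Za-z0-9_.:-]*"
UNQUOTED = r"[^\s\"'=<>`]+"
SINGLE_QUOTED = r"'[^']*'"
DOUBLE_QUOTED = r'"[^"]*"'
ATTR_VALUE_SPEC = rf"\s*=\s*(?:{UNQUOTED}|{SINGLE_QUOTED}|{DOUBLE_QUOTED})"
ATTRIBUTE = rf"\s+{ATTR_NAME}(?:{ATTR_VALUE_SPEC})?"
# Matches everything that follows the opening "<".
OPEN_TAG_REST = re.compile(rf"({TAG_NAME})(?:{ATTRIBUTE})*\s*/?>")


def find_open_tag(text: str, pos: int) -> Tuple[int, str]:
    """Return (index just past '>', tag name), or (0, '') if nothing matches.

    pos must point at the character immediately after "<".
    """
    m = OPEN_TAG_REST.match(text, pos)
    if m is None:
        return 0, ""
    return m.end(), m.group(1)
```

For example, `find_open_tag('<a href="x">', 1)` reports the tag name "a", while `<a href=>` is rejected because an unquoted attribute value needs at least one character.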

GetHtmlTagName(chars() As MKCharacter, pos As Integer) As String

This method is shared.

Starting at pos, reads an HTML tag name from chars and returns it. Returns "" if no valid tagName is found.

Note: pos is passed ByRef and is adjusted to point to the character immediately after the tag name.

tagName: ASCII letter, >= 0 ASCII letter|digit|-


MatchAnythingExcept(chars() As MKCharacter, pos As Integer, currentChar As String, invalidChar As String) As Boolean

This method is shared.

Advances past the characters in chars, starting at pos, until invalidChar is encountered. Returns True if pos advanced. Mutates pos and currentChar.


MatchAnythingExceptInvalidAndWhitespace(chars() As MKCharacter, pos As Integer, currentChar As String, ParamArray invalidChars() As String) As Boolean

This method is shared.

Advances past the characters in chars, starting at pos, until whitespace or one of invalidChars is found. Returns True if pos advanced. Mutates pos and currentChar.
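
The Match* helpers share one pattern: advance a cursor while a predicate holds, then report whether it moved. A minimal Python sketch of that pattern for this method is given below. It is an illustration only: Python has no ByRef, so the updated pos and currentChar are returned alongside the Boolean, the function name is hypothetical, and setting currentChar to "" at the end of input is an assumption.

```python
from typing import Tuple


def match_anything_except_invalid_and_whitespace(
    chars: str, pos: int, *invalid_chars: str
) -> Tuple[bool, int, str]:
    """Return (advanced, new_pos, current_char)."""
    start = pos
    # Consume characters until whitespace, an invalid character, or the end.
    while (
        pos < len(chars)
        and not chars[pos].isspace()
        and chars[pos] not in invalid_chars
    ):
        pos += 1
    # Assumption: current_char is "" once the input is exhausted.
    current_char = chars[pos] if pos < len(chars) else ""
    return pos > start, pos, current_char
```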


MatchASCIILetterOrDigit(chars() As MKCharacter, pos As Integer, currentChar As String, ParamArray validChars() As String) As Boolean

This method is shared.

Advances through chars, starting at pos, as long as the character is an ASCII letter, a digit, or one of validChars. Mutates pos and currentChar. Returns True if pos changed.


MatchASCIILetterOrDigitOrHyphen(chars() As MKCharacter, pos As Integer, maxCount As Integer) As Integer

This method is shared.

Advances through chars as long as the current character is an ASCII letter, digit, or hyphen, stopping once maxCount characters have matched. Returns the number of matched characters.
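
The counting behaviour with an upper bound can be sketched in Python as follows. This is a hedged illustration, not the Xojo code: the function name is hypothetical, chars is modelled as a string, and the sketch leaves it to the caller to advance pos by the returned count.

```python
LETTER_DIGIT_HYPHEN = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-"
)


def match_ascii_letter_or_digit_or_hyphen(chars: str, pos: int, max_count: int) -> int:
    """Count consecutive ASCII letters, digits or hyphens from pos, capped at max_count."""
    count = 0
    while (
        count < max_count
        and pos + count < len(chars)
        and chars[pos + count] in LETTER_DIGIT_HYPHEN
    ):
        count += 1
    return count
```

A cap of this kind is useful for length-limited tokens, such as the 63-character labels implied by the {0,61} bound in the email autolink grammar.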


MatchASCIILetterOrValidCharacter(chars() As MKCharacter, pos As Integer, currentChar As String, ParamArray validChars() As String) As Boolean

This method is shared.

Advances through chars, starting at pos, as long as the character is an ASCII letter or one of validChars. Mutates pos and currentChar. Returns True if pos changed.


ScanAutoLink(chars() As MKCharacter, startPos As Integer, uri As String) As Integer

This method is shared.

Scans chars for a valid autolink. Returns the index of the character immediately following the autolink, or 0 if none is found. Sets uri to the absolute URI.

Assumes chars(startPos - 1) = "<".

Valid autolink:

  "<", absolute URI, ">"
  Absolute URI = scheme, :, >= 0 characters (not whitespace, <, >)
  Scheme = [A-Za-z]{1}[A-Za-z0-9\+\.\-]{1,31}
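
The autolink grammar above can be expressed as a single regular expression. The Python sketch below illustrates the grammar and is not the Xojo implementation: chars is modelled as a string, the ByRef uri is returned as a second value, and scan_auto_link is a hypothetical name.

```python
import re
from typing import Tuple

# scheme (2 to 32 characters), ":", then any run without whitespace, "<" or ">",
# followed by the closing ">".
AUTOLINK_REST = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]{1,31}:[^\s<>]*)>")


def scan_auto_link(text: str, start_pos: int) -> Tuple[int, str]:
    """Return (index after '>', uri), or (0, '') if there is no autolink.

    start_pos must point at the character immediately after "<".
    """
    m = AUTOLINK_REST.match(text, start_pos)
    if m is None:
        return 0, ""
    return m.end(), m.group(1)
```

Note that the scheme needs at least two characters, so `<a:b>` is not an autolink under this grammar.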

ScanDeclarationCommentOrCData(chars() As MKCharacter, startPos As Integer) As Integer

This method is shared.

Scans chars for a valid HTML declaration, comment or CDATA section. Returns the index of the character after the closing character or 0 if not found.

Assumes startPos points at the index of the character immediately following <!.

CDATA:
-----
  "<![CDATA[", >= 0 characters, then "]]>"

Declaration:
-----------
  "<!", >= 1 uppercase ASCII letters, whitespace, >= 1 characters not including ">", then ">"

Comment:
-------
  "<!--" + text + "-->"
  Where text does not start with ">" or "->", does not end with "-", and does not contain "--"

Starting assumptions:
  <![CDATA[X]]>
  0123456789012
    ^

  <!X X>
  012345
    ^

  <!--a-->
  01234567
    ^
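
The three constructs can be sketched in Python as below, with start_pos pointing just after "<!" as in the diagrams. This is an illustration of the documented rules rather than the Xojo implementation: chars is modelled as a string, all names are hypothetical, and "whitespace" in the declaration rule is read as one or more whitespace characters.

```python
import re

# Both patterns are applied at start_pos, i.e. just after "<!".
CDATA_REST = re.compile(r"\[CDATA\[.*?\]\]>", re.DOTALL)
DECLARATION_REST = re.compile(r"[A-Z]+\s+[^>]+>")


def scan_comment(text: str, pos: int) -> int:
    """pos points at the "--" that follows "<!". Return index after "-->" or 0."""
    if not text.startswith("--", pos):
        return 0
    body_start = pos + 2
    # The body may not contain "--", so the first "--" must begin the closer.
    end = text.find("--", body_start)
    if end == -1 or not text.startswith("-->", end):
        return 0
    body = text[body_start:end]
    if body.startswith(">") or body.startswith("->") or body.endswith("-"):
        return 0
    return end + 3


def scan_declaration_comment_or_cdata(text: str, start_pos: int) -> int:
    """Return the index after the closing character, or 0 if nothing matches."""
    if text.startswith("--", start_pos):
        return scan_comment(text, start_pos)
    for pattern in (CDATA_REST, DECLARATION_REST):
        m = pattern.match(text, start_pos)
        if m is not None:
            return m.end()
    return 0
```

Each of the three starting-assumption examples above is accepted with start_pos = 2.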

ScanEmailLink(chars() As MKCharacter, startPos As Integer, uri As String) As Integer

This method is shared.

Scans chars from startPos for a valid email autolink. Returns the index of the character after the autolink, or 0 if none is found.

Assumes chars(startPos - 1) = "<". Sets the ByRef parameter uri to the absolute URI.

Valid email autolink:

 "<", email address, ">"
  Email address:
     [a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?
     (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*
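
The email address pattern above can be used almost verbatim in Python. The sketch below is illustrative only and is not the Xojo implementation: chars is modelled as a string, scan_email_link is a hypothetical name, and returning the bare address as uri (rather than, say, a mailto: form) is an assumption, since the source does not say which form is stored.

```python
import re
from typing import Tuple

# The documented pattern, followed by the closing ">".
EMAIL_REST = re.compile(
    r"([a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*)>"
)


def scan_email_link(text: str, start_pos: int) -> Tuple[int, str]:
    """Return (index after '>', address), or (0, '') if there is no email autolink.

    start_pos must point at the character immediately after "<".
    """
    m = EMAIL_REST.match(text, start_pos)
    if m is None:
        return 0, ""
    return m.end(), m.group(1)
```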

ScanLinkScheme(chars() As MKCharacter, pos As Integer) As Integer

This method is shared.

Scans chars beginning at pos for an inline link scheme, returning the index of the character following the scheme or 0 if none is found.

Valid scheme = [A-Za-z]{1}[A-Za-z0-9+.-]{1,31}


ScanProcessingInstruction(chars() As MKCharacter, startPos As Integer) As Integer

This method is shared.

Scans for an inline HTML "processing instruction". Returns the index in chars of the character after the closing ?> or 0 if not found.

A processing instruction consists of the string <?, a string of characters not including the string ?>, and the string ?>.

Assumes startPos points at the index in chars of the character immediately following an opening <?.
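
Since the body is simply "anything up to the first ?>", the scan reduces to a substring search. A minimal Python sketch follows; it is an illustration rather than the Xojo implementation, with chars modelled as a string and scan_processing_instruction as a hypothetical name.

```python
def scan_processing_instruction(text: str, start_pos: int) -> int:
    """Return the index after the closing "?>", or 0 if it never appears.

    start_pos must point at the character immediately after "<?".
    """
    end = text.find("?>", start_pos)
    return 0 if end == -1 else end + 2
```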


SkipWhitespace(chars() As MKCharacter, pos As Integer, currentChar As String) As Boolean

This method is shared.

Skips over whitespace in chars beginning at pos, updating pos and currentChar. Returns True if any whitespace was skipped.