MKParser

Description

This class is responsible for parsing Markdown input into an abstract syntax tree (AST). The AST can then be transformed into other representations (e.g. HTML).

Properties

| Name | Type | Read-Only |
|------|------|-----------|
| mAllMatched | Boolean | |
| mCharsLastIndex | Integer | |
| mContainer | MKBlock | |
| mCurrentBlock | MKBlock | |
| mCurrentChar | String | |
| mCurrentColumn | Integer | |
| mCurrentIndent | Integer | |
| mCurrentLine | XUITextLine | |
| mCurrentOffset | Integer | |
| mDoc | MKDocument | |
| mLastMatchedContainer | MKBlock | |
| mLines() | XUITextLine | |
| mMaybeLazy | Boolean | |
| mNextNWS | Integer | |
| mNextNWSColumn | Integer | |
| mRemainingSpaces | Integer | |

Methods

| Name | Parameters | Returns |
|------|------------|---------|
| AcceptsLines | b As MKBlock | Boolean |
| AdvanceOffset | count As Integer, columns As Boolean | |
| AdvanceOptionalSpace | | Boolean |
| CanContain | parentType As MKBlockTypes, childType As MKBlockTypes | Boolean |
| ConvertParagraphBlockToSetextHeading | paragraph As MarkdownKit.MKBlock, line As XUITextLine | MKSetextHeadingBlock |
| CreateChildBlock | parent As MKBlock, line As XUITextLine, type As MKBlockTypes, blockStartOffset As Integer | MKBlock |
| FindNextNonWhitespace | | |
| FirstNonBlankIndex | | Integer |
| IsATXHeader | data As Dictionary | Boolean |
| IsClosingCodeFence | length As Integer | Boolean |
| IsCodeFenceOpening | fenceChar As String, data As Dictionary | Boolean |
| IsCorrectHtmlBlockEnd | type As MKHTMLBlockTypes, line As XUITextLine, pos As Integer | Boolean |
| IsHtmlBlockStart | pos As Integer, data As Dictionary | Boolean |
| IsHtmlBlockType7Start | pos As Integer, data As Dictionary | Boolean |
| IsSetextHeadingLine | data As Dictionary | Boolean |
| IsThematicBreak | chars() As String, pos As Integer | Boolean |
| LastNonBlankIndex | firstNonBlank As Integer | Integer |
| MatchWhitespaceCharacters | line As XUITextLine, pos As Integer | Integer |
| ParseBlockStructure | | |
| ParseInlines | | |
| ParseLines | lines() As XUITextLine | MKDocument |
| ParseListMarker | indented As Boolean, line As XUITextLine, pos As Integer, interruptsParagraph As Boolean, data As MarkdownKit.MKListData | Boolean |
| ParseSource | markdown As String | MKDocument |
| ProcessLine | line As XUITextLine | |
| ProcessRemainderOfLine | | |
| Reset | lines() As XUITextLine | |
| ResetLine | line As XUITextLine | |
| TryNewBlocks | | |
| TryOpenBlocks | | |

Constants

| Name | Type |
|------|------|
| CODE_INDENT | Double |
| TAB_SIZE | Double |

CODE_INDENT As Double

The number of spaces of indentation required to start an indented code block.


TAB_SIZE As Double

The number of spaces a tab is considered equivalent to.


Property Descriptions

mAllMatched As Boolean

Used internally when parsing the block structure.


mCharsLastIndex As Integer

A cache of [mCurrentLine.Characters.LastIndex].


mContainer As MKBlock

The block we are currently considering.


mCurrentBlock As MKBlock

The block currently being evaluated.


mCurrentChar As String

The current character we are evaluating.


mCurrentColumn As Integer

The 0-based virtual position in the line that takes tab expansion into account.


mCurrentIndent As Integer

The current indentation, expressed as a number of spaces (accounts for tab stops).


mCurrentLine As XUITextLine

The line we are currently processing.


mCurrentOffset As Integer

The 0-based position of the character considered as the start of the current line being evaluated once indentation and block starters have been consumed.


mDoc As MKDocument

The document the parser is currently constructing.


mLastMatchedContainer As MKBlock

The last matching container.


mLines() As XUITextLine

A reference to the array of text lines being parsed. Should be considered read-only.


mMaybeLazy As Boolean

True if the current line might be a lazy continuation line.


mNextNWS As Integer

The zero-based index of the next non-whitespace character in the line, assuming that the line begins at mCurrentOffset.


mNextNWSColumn As Integer

The 0-based virtual position of the next non-whitespace character on [mCurrentLine] that takes tab expansion into account.


mRemainingSpaces As Integer

Internally used to compute additional remaining spaces.


Method Descriptions

AcceptsLines(b As MKBlock) As Boolean

Returns True if b accepts lines.


AdvanceOffset(count As Integer, columns As Boolean)

Advances the current offset by count places.

If columns is True then we need to take into consideration tab stops. The offset relates to the location on the current line that is considered the start of the line once indentation and block openers are taken into consideration.
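The difference between advancing by characters and advancing by columns is easiest to see in code. The following Python sketch is illustrative only and is not the class's own code: the function name, its explicit state arguments, and the returned `remaining_spaces` value are hypothetical, and a tab stop of 4 is assumed because the value of TAB_SIZE is not given here.

```python
TAB_SIZE = 4  # assumption: tab stop of 4; the real constant's value is not shown


def advance_offset(line, offset, column, count, columns):
    """Return (offset, column, remaining_spaces) after advancing `count` places.

    If `columns` is True, `count` is measured in virtual columns, so a tab may
    be only partially consumed. Otherwise `count` is measured in characters.
    """
    remaining_spaces = 0
    while count > 0 and offset < len(line):
        if line[offset] == "\t":
            # Distance from the current column to the next tab stop.
            chars_to_tab = TAB_SIZE - (column % TAB_SIZE)
            if columns:
                step = min(count, chars_to_tab)
                column += step
                count -= step
                if step < chars_to_tab:
                    # The tab was only partially consumed: stay on it.
                    remaining_spaces = chars_to_tab - step
                else:
                    offset += 1
            else:
                column += chars_to_tab
                offset += 1
                count -= 1
        else:
            offset += 1
            column += 1
            count -= 1
    return offset, column, remaining_spaces
```

Advancing two columns into a leading tab leaves the offset on the tab and reports two columns still owed, whereas advancing one character skips the whole tab.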


AdvanceOptionalSpace() As Boolean

Advances past a single space or tab if the next character is one. Returns True if a space or tab was consumed.


CanContain(parentType As MKBlockTypes, childType As MKBlockTypes) As Boolean

This method is shared.

Returns True if a block of type parentType can contain a block of type childType.


ConvertParagraphBlockToSetextHeading(paragraph As MarkdownKit.MKBlock, line As XUITextLine) As MKSetextHeadingBlock

Removes the passed paragraph from its parent and replaces it with a new SetextHeading block with the same children. Returns the SetextHeading block.


CreateChildBlock(parent As MKBlock, line As XUITextLine, type As MKBlockTypes, blockStartOffset As Integer) As MKBlock

Creates a new block of the given type and adds it as a child of parent.

blockStartOffset will be applied to mCurrentOffset when determining the absolute start position of this block.


FindNextNonWhitespace()

Finds the next non-whitespace (NWS) character on this line.
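As a rough illustration of how the next non-whitespace index, its tab-expanded column, and the resulting indent relate to each other, here is a Python sketch. It is not the class's own code: the function name and the tuple it returns are hypothetical, and a tab stop of 4 is assumed.

```python
TAB_SIZE = 4  # assumption: tab stop of 4; the real constant's value is not shown


def find_next_non_whitespace(line, offset, column):
    """Return (next_nws, next_nws_column, indent) scanning from `offset`.

    `column` is the tab-expanded column that corresponds to `offset`.
    """
    nws = offset
    nws_column = column
    while nws < len(line):
        c = line[nws]
        if c == " ":
            nws_column += 1
        elif c == "\t":
            # A tab moves the column to the next tab stop.
            nws_column += TAB_SIZE - (nws_column % TAB_SIZE)
        else:
            break
        nws += 1
    # The indent is the number of columns skipped, not the character count.
    return nws, nws_column, nws_column - column
```

For the line `"  \tfoo"` the first non-whitespace character sits at index 3 but at column 4, so the indent is 4 even though only three characters were skipped.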


FirstNonBlankIndex() As Integer

Finds the index in mLines of the first non-blank line or returns -1 if there are only blank lines.


IsATXHeader(data As Dictionary) As Boolean

Returns True if mCurrentLine, beginning at mNextNWS, is a valid ATX heading. If True then data is a new valid dictionary, otherwise data is set to Nil.

Assumes that mNextNWS points to a "#" in mCurrentLine. Sets data.Value("level") to the header level (1 to 6). Sets data.Value("length") to the number of characters from the start of the opening sequence to the first character of the heading content. Sets data.Value("closingSequenceCount") to the number of trailing # characters (may be zero). Sets data.Value("closingSequenceStart") to the index of the first # character in the closing sequence if there is one, otherwise data.Value("closingSequenceStart") is absent.
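The dictionary keys described above can be illustrated with a small Python sketch that follows the CommonMark ATX heading rules. It is not the class's own code: the function name is hypothetical, the line is passed as a plain string, the result is returned rather than written into a parameter, and omitting `closingSequenceStart` when there is no closing sequence is an assumption.

```python
def is_atx_header(line, pos):
    """Return a dict describing the ATX heading at `pos`, or None if invalid."""
    n = len(line)
    i = pos
    while i < n and line[i] == "#":
        i += 1
    level = i - pos
    if level < 1 or level > 6:
        return None
    # The opening sequence must be followed by a space, a tab or end of line.
    if i < n and line[i] not in " \t":
        return None
    while i < n and line[i] in " \t":
        i += 1
    data = {"level": level, "length": i - pos, "closingSequenceCount": 0}

    # Look for an optional closing sequence: trailing #s, then only whitespace.
    end = n
    while end > i and line[end - 1] in " \t":
        end -= 1
    start = end
    while start > i and line[start - 1] == "#":
        start -= 1
    # A closing sequence must be preceded by whitespace (or empty content).
    if start < end and (start == i or line[start - 1] in " \t"):
        data["closingSequenceCount"] = end - start
        data["closingSequenceStart"] = start
    return data
```

With this sketch, `"# Title ##"` yields level 1 with a two-character closing sequence starting at index 8, while `"## foo#"` yields level 2 with no closing sequence because the trailing `#` is not preceded by whitespace.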


IsClosingCodeFence(length As Integer) As Boolean

Returns True if mCurrentLine, beginning at mNextNWS, is a closing fence of at least length characters.


IsCodeFenceOpening(fenceChar As String, data As Dictionary) As Boolean

Returns True if mCurrentLine, beginning at mNextNWS, is a fenced code opening. Populates data with the "fenceLength".

Assumes that mCurrentChar = fenceChar and mCurrentLine.Characters(mNextNWS) = fenceChar as this method is only called from TryNewBlocks.

Also assumes that fenceChar is either "`" or "~".

We don't capture the (optional) info string here as it gets added later as a MKTextBlock child of this block.
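The opening and closing fence checks can be sketched in Python following the CommonMark fenced code block rules. This is not the class's own code: the function names and signatures are hypothetical, the line is passed as a plain string, and the closing check takes the fence character as an explicit argument because this sketch keeps no parser state.

```python
def is_code_fence_opening(line, pos, fence_char):
    """Return {"fenceLength": n} if a fence opens at `pos`, otherwise None."""
    i = pos
    while i < len(line) and line[i] == fence_char:
        i += 1
    length = i - pos
    if length < 3:
        return None
    # CommonMark: the info string of a backtick fence may not contain backticks.
    if fence_char == "`" and "`" in line[i:]:
        return None
    return {"fenceLength": length}


def is_closing_code_fence(line, pos, fence_char, length):
    """Return True if `line` from `pos` closes a fence of `length` characters."""
    i = pos
    while i < len(line) and line[i] == fence_char:
        i += 1
    # At least as long as the opening fence, followed only by spaces or tabs.
    return (i - pos) >= length and line[i:].strip(" \t") == ""
```

A closing fence may be longer than the opening fence but never shorter, which is why the opening fence length is recorded.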


IsCorrectHtmlBlockEnd(type As MKHTMLBlockTypes, line As XUITextLine, pos As Integer) As Boolean

Returns True if we find the correct ending condition for the specified HTML block type.

There are 7 kinds of HTML block (CommonMark spec 0.29, section 4.6).


IsHtmlBlockStart(pos As Integer, data As Dictionary) As Boolean

Returns True if there is an HTML block starting at pos on mCurrentLine. Puts the "type" of HTML block in data.

There are 7 kinds of HTML block. See the note "HTML Block Types" in this class for more detail.


IsHtmlBlockType7Start(pos As Integer, data As Dictionary) As Boolean

Returns True if mCurrentLine from pos is a type 7 HTML block start. Sets data.Value("type") to the type 7 enumeration value if it is, otherwise to the none value.

Type 7:
{openTag NOT script|style|pre}[•→]+|⮐$   or
{closingTag}[•→]+|⮐$

IsSetextHeadingLine(data As Dictionary) As Boolean

Returns True if mCurrentLine, beginning at mNextNWS, is a setext heading line.

Sets data.Value("level") to the heading level (1 or 2) or 0 if this is not a setext heading line.

  ^[=]+[ ]*$
  ^[-]+[ ]*$
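The two patterns above can be exercised with a short Python sketch. It is illustrative only and not the class's own code: the function name is hypothetical and the level is returned directly instead of being stored in a dictionary.

```python
import re

# One or more "=" or one or more "-", then optional trailing spaces only.
_SETEXT_UNDERLINE = re.compile(r"(=+|-+)[ ]*")


def setext_heading_level(line, pos):
    """Return 1 for an "=" underline, 2 for a "-" underline, otherwise 0."""
    match = _SETEXT_UNDERLINE.fullmatch(line, pos)
    if match is None:
        return 0
    return 1 if match.group(1)[0] == "=" else 2
```

Mixed or interrupted runs such as `"=-"` or `"= ="` give 0, because the underline must be a single unbroken run of one character.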

IsThematicBreak(chars() As String, pos As Integer) As Boolean

Returns True if the line starting at pos is a thematic break.

Valid thematic break lines consist of three or more dashes, underscores or asterisks, optionally separated by any number of spaces or tabs. The characters must all match.
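The rule can be expressed as a single pass over the characters. The Python sketch below is illustrative only and not the class's own code; the function name is hypothetical and `chars` is taken to be a list of single-character strings.

```python
def is_thematic_break(chars, pos):
    """Return True if `chars` from `pos` forms a thematic break."""
    marker = None
    count = 0
    for c in chars[pos:]:
        if c in (" ", "\t"):
            continue  # whitespace may appear anywhere between markers
        if marker is None:
            if c not in ("-", "_", "*"):
                return False
            marker = c
        elif c != marker:
            return False  # every marker character must be the same
        count += 1
    return count >= 3
```

Under this sketch `- - -` and `***` qualify, while `--`, `*-*` and `___ a` do not.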


LastNonBlankIndex(firstNonBlank As Integer) As Integer

Finds the index in mLines of the last non-blank line or returns -1 if there are only blank lines.

firstNonBlank should be the index of a valid non-blank line in mLines (i.e. FirstNonBlankIndex has been called prior to this method).
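A minimal Python sketch of the two blank-line scans follows. It is not the class's own code: the function names are hypothetical, lines are plain strings, and a line is assumed to be blank when it contains only spaces and tabs.

```python
def first_non_blank_index(lines):
    """Return the index of the first non-blank line, or -1 if all are blank."""
    for i, line in enumerate(lines):
        if line.strip(" \t") != "":
            return i
    return -1


def last_non_blank_index(lines, first_non_blank):
    """Return the index of the last non-blank line, scanning backwards.

    The scan stops at `first_non_blank`, so lines before it are never
    re-examined.
    """
    for i in range(len(lines) - 1, first_non_blank - 1, -1):
        if lines[i].strip(" \t") != "":
            return i
    return -1
```

Passing the first non-blank index to the second scan bounds the backwards walk, which is presumably why the two methods are called in that order.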


MatchWhitespaceCharacters(line As XUITextLine, pos As Integer) As Integer

Matches whitespace on line beginning at pos and returns how many characters were matched.


ParseBlockStructure()

Parses mLines into a block structure.

This is part 1 of the parsing process. It gives us the overall structure of the Markdown document. Assumes the parser has been reset before this method is invoked.


ParseInlines()

Walks the document parsing inline content.

Assumes that ParseBlockStructure was called immediately prior to this method.


ParseLines(lines() As XUITextLine) As MKDocument

Parses lines into a Markdown document.


ParseListMarker(indented As Boolean, line As XUITextLine, pos As Integer, interruptsParagraph As Boolean, data As MarkdownKit.MKListData) As Boolean

Returns True if able to parse a ListItem marker, populating data with the details.


ParseSource(markdown As String) As MKDocument

Parses markdown into a Markdown document.


ProcessLine(line As XUITextLine)

Processes a line of Markdown and incorporates it into the document tree.


ProcessRemainderOfLine()

Processes what's left of the current line.

We've tried matching against the open blocks and we've opened any required new blocks. What now remains at the offset is a text line. Add it to the appropriate container.


Reset(lines() As XUITextLine)

Resets all properties, ready to parse again.


ResetLine(line As XUITextLine)

Sets line to be the current line for processing, clears the line's tokens and marks it as dirty.


TryNewBlocks()

Tries to start a new container block.


TryOpenBlocks()

Iterates through the open blocks, descending through their last children down to the last open block.

For each open block, checks whether mCurrentLine meets the condition required to keep the block open.

mContainer will be set to the block which last had a match to the line.
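The descent through open blocks can be sketched as follows. This Python example is illustrative only and not the class's own code: the `Block` type, the `can_continue` rules, and the returned tuple are hypothetical and heavily simplified (a real parser would also consume block markers from the line as it descends).

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str
    is_open: bool = True
    children: list = field(default_factory=list)


def can_continue(block, line):
    """Hypothetical, simplified continuation rules for two block kinds."""
    if block.kind == "blockquote":
        return line.lstrip(" ").startswith(">")
    if block.kind == "paragraph":
        return line.strip() != ""
    return True


def try_open_blocks(root, line):
    """Return (container, all_matched) after descending the open blocks."""
    container = root
    all_matched = True
    # Follow the chain of last children for as long as they are open.
    while container.children and container.children[-1].is_open:
        child = container.children[-1]
        if not can_continue(child, line):
            all_matched = False
            break
        container = child  # the deepest block that matched so far
    return container, all_matched
```

When the descent stops early, the container is the last block that matched the line and the remaining open blocks are candidates for closing or for lazy continuation.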


HTML Block Types

Type 1: MKHTMLBlockTypes.InterruptingBlockWithEmptyLines

Start condition: The line begins with the string "<script", "<pre", or "<style" (case-insensitive), followed by whitespace, the string ">", or the end of the line.

End condition: The line contains an end tag "</script>", "</pre>", or "</style>" (case-insensitive). It need not match the start tag.

Type 2: MKHTMLBlockTypes.Comment

Start condition: The line begins with the string "<!--".

End condition: The line contains the string "-->".

Type 3: MKHTMLBlockTypes.ProcessingInstruction

Start condition: The line begins with the string "<?".

End condition: The line contains the string "?>".

Type 4: MKHTMLBlockTypes.DocumentType

Start condition: The line begins with the string "<!" followed by an uppercase ASCII letter.

End condition: The line contains the character ">".

Type 5: MKHTMLBlockTypes.CData

Start condition: The line begins with the string "<![CDATA[".

End condition: The line contains the string "]]>".

Type 6: MKHTMLBlockTypes.InterruptingBlock

Start condition: The line begins with the string "<" or "</" followed by one of the strings (case-insensitive) "address", "article", "aside", "base", "basefont", "blockquote", "body", "caption", "center", "col", "colgroup", "dd", "details", "dialog", "dir", "div", "dl", "dt", "fieldset", "figcaption", "figure", "footer", "form", "frame", "frameset", "h1", "h2", "h3", "h4", "h5", "h6", "head", "header", "hr", "html", "iframe", "legend", "li", "link", "main", "menu", "menuitem", "nav", "noframes", "ol", "optgroup", "option", "p", "param", "section", "source", "summary", "table", "tbody", "td", "tfoot", "th", "thead", "title", "tr", "track", "ul", followed by whitespace, the end of the line, the string ">" or the string "/>".

End condition: The line is followed by a blank line.

Type 7: MKHTMLBlockTypes.NonInterruptingBlock

Start condition: The line begins with a complete open tag (with any tag name other than script, style, or pre) or a complete closing tag, followed only by whitespace or the end of the line.

End condition: The line is followed by a blank line.
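The start conditions for types 1 to 6 can be checked with simple prefix tests. The Python sketch below is illustrative only and not the class's own code: the function name is hypothetical, it returns a plain integer rather than an MKHTMLBlockTypes value, and it matches type 6 as "<" or "</" followed by a listed tag name, as in the CommonMark spec. Type 7 is left out because it requires a full open or closing tag grammar.

```python
import re

BLOCK_TAGS = {
    "address", "article", "aside", "base", "basefont", "blockquote", "body",
    "caption", "center", "col", "colgroup", "dd", "details", "dialog", "dir",
    "div", "dl", "dt", "fieldset", "figcaption", "figure", "footer", "form",
    "frame", "frameset", "h1", "h2", "h3", "h4", "h5", "h6", "head", "header",
    "hr", "html", "iframe", "legend", "li", "link", "main", "menu", "menuitem",
    "nav", "noframes", "ol", "optgroup", "option", "p", "param", "section",
    "source", "summary", "table", "tbody", "td", "tfoot", "th", "thead",
    "title", "tr", "track", "ul",
}


def html_block_start_type(line):
    """Return 1 to 6 for the matching start condition, or 0 if none matches."""
    lower = line.lower()
    if re.match(r"<(script|pre|style)(\s|>|$)", lower):
        return 1
    if line.startswith("<!--"):
        return 2
    if line.startswith("<?"):
        return 3
    if re.match(r"<![A-Z]", line):
        return 4
    if line.startswith("<![CDATA["):
        return 5
    # "<" or "</", a tag name, then whitespace, ">", "/>" or end of line.
    match = re.match(r"</?([a-z][a-z0-9]*)(\s|/?>|$)", lower)
    if match and match.group(1) in BLOCK_TAGS:
        return 6
    return 0
```

A line such as `<span>` returns 0 here; in a full implementation it would fall through to the type 7 check.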