MKParser
Description
This class is responsible for parsing Markdown input into an abstract syntax tree (AST). The AST can then be transformed into other representations (e.g. HTML).
Properties
Name | Type | Read-Only |
---|---|---|
mAllMatched | Boolean | |
mCharsLastIndex | Integer | |
mContainer | MKBlock | |
mCurrentBlock | MKBlock | |
mCurrentChar | String | |
mCurrentColumn | Integer | |
mCurrentIndent | Integer | |
mCurrentLine | XUITextLine | |
mCurrentOffset | Integer | |
mDoc | MKDocument | |
mLastMatchedContainer | MKBlock | |
mLines() | XUITextLine | |
mMaybeLazy | Boolean | |
mNextNWS | Integer | |
mNextNWSColumn | Integer | |
mRemainingSpaces | Integer | |
Methods
Name | Parameters | Returns |
---|---|---|
AcceptsLines | b As MKBlock | Boolean |
AdvanceOffset | count As Integer, columns As Boolean | |
AdvanceOptionalSpace | | Boolean |
CanContain | parentType As MKBlockTypes, childType As MKBlockTypes | Boolean |
ConvertParagraphBlockToSetextHeading | paragraph As MarkdownKit.MKBlock, line As XUITextLine | MKSetextHeadingBlock |
CreateChildBlock | parent As MKBlock, line As XUITextLine, type As MKBlockTypes, blockStartOffset As Integer | MKBlock |
FindNextNonWhitespace | | |
FirstNonBlankIndex | | Integer |
IsATXHeader | data As Dictionary | Boolean |
IsClosingCodeFence | length As Integer | Boolean |
IsCodeFenceOpening | fenceChar As String, data As Dictionary | Boolean |
IsCorrectHtmlBlockEnd | type As MKHTMLBlockTypes, line As XUITextLine, pos As Integer | Boolean |
IsHtmlBlockStart | pos As Integer, data As Dictionary | Boolean |
IsHtmlBlockType7Start | pos As Integer, data As Dictionary | Boolean |
IsSetextHeadingLine | data As Dictionary | Boolean |
IsThematicBreak | chars() As String, pos As Integer | Boolean |
LastNonBlankIndex | firstNonBlank As Integer | Integer |
MatchWhitespaceCharacters | line As XUITextLine, pos As Integer | Integer |
ParseBlockStructure | | |
ParseInlines | | |
ParseLines | lines() As XUITextLine | MKDocument |
ParseListMarker | indented As Boolean, line As XUITextLine, pos As Integer, interruptsParagraph As Boolean, data As MarkdownKit.MKListData | Boolean |
ParseSource | markdown As String | MKDocument |
ProcessLine | line As XUITextLine | |
ProcessRemainderOfLine | | |
Reset | lines() As XUITextLine | |
ResetLine | line As XUITextLine | |
TryNewBlocks | | |
TryOpenBlocks | | |
Constants
Name | Type |
---|---|
CODE_INDENT | Double |
TAB_SIZE | Double |
CODE_INDENT As Double The number of spaces required for a code indentation.
TAB_SIZE As Double The number of spaces a tab is considered equivalent to.
Property Descriptions
mAllMatched As Boolean
Used internally when parsing the block structure.
mCharsLastIndex As Integer
A cache of [mCurrentLine.Characters.LastIndex].
mContainer As MKBlock
The block we are currently considering.
mCurrentBlock As MKBlock
The block currently being evaluated.
mCurrentChar As String
The current character we are evaluating.
mCurrentColumn As Integer
The 0-based virtual position in the line that takes tab expansion into account.
mCurrentIndent As Integer
The current indent number expressed as spaces (accounts for tab stops).
mCurrentLine As XUITextLine
The line we are currently processing.
mCurrentOffset As Integer
The 0-based position of the character considered as the start of the current line being evaluated once indentation and block starters have been consumed.
mDoc As MKDocument
The document the parser is currently constructing.
mLastMatchedContainer As MKBlock
The last matching container.
mLines() As XUITextLine
A reference to the array of text lines being parsed. Should be considered read-only.
mMaybeLazy As Boolean
True if the current line might be a lazy continuation line.
mNextNWS As Integer
The 0-based index of the next non-whitespace character in the line, assuming that the line begins at mCurrentOffset.
mNextNWSColumn As Integer
The 0-based virtual position of the next non-whitespace character on [mCurrentLine] that takes tab expansion into account.
mRemainingSpaces As Integer
Internally used to compute additional remaining spaces.
Method Descriptions
AcceptsLines(b As MKBlock) As Boolean
Returns True if b accepts lines.
AdvanceOffset(count As Integer, columns As Boolean)
Advances the current offset by count places.
If columns is True then tab stops are taken into consideration.
The offset relates to the location on the current line that is considered the start of the line once indentation and block openers are taken into consideration.
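The interaction between count, columns and tab stops can be easier to follow in code. The following is a minimal Python sketch, not the class's Xojo implementation: the function name, the returned tuple and the tab size of 4 are assumptions made for illustration (the CommonMark specification uses tab stops of 4 columns).

```python
TAB_SIZE = 4  # assumed value; CommonMark treats tab stops as 4 columns wide


def advance_offset(line, offset, column, count, columns):
    """Return the new (offset, column, remaining_spaces) after advancing.

    offset is a character index into line; column is the tab-expanded position.
    When columns is True, count is measured in columns and a tab may be only
    partially consumed; the unconsumed part is returned as remaining_spaces.
    When columns is False, count is measured in characters.
    """
    remaining_spaces = 0
    while count > 0 and offset < len(line):
        if line[offset] == "\t":
            # Number of columns between here and the next tab stop.
            to_tab = TAB_SIZE - (column % TAB_SIZE)
            if columns:
                step = min(count, to_tab)
                remaining_spaces = to_tab - step
                column += step
                # Only move past the tab once it has been fully consumed.
                if remaining_spaces == 0:
                    offset += 1
                count -= step
            else:
                column += to_tab
                offset += 1
                count -= 1
        else:
            offset += 1
            column += 1
            count -= 1
    return offset, column, remaining_spaces
```

In this sketch, advancing two columns into a leading tab leaves the character offset on the tab and reports two remaining spaces, which is the kind of bookkeeping a property such as mRemainingSpaces would be expected to hold.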
AdvanceOptionalSpace() As Boolean
Advances past a single space or tab if the next character is one, returning True if a character was consumed.
CanContain(parentType As MKBlockTypes, childType As MKBlockTypes) As Boolean This method is shared.
Returns True if a block of type parentType can contain a block of type childType.
ConvertParagraphBlockToSetextHeading(paragraph As MarkdownKit.MKBlock, line As XUITextLine) As MKSetextHeadingBlock
Removes the passed paragraph from its parent and replaces it with a new SetextHeading block with the same children. Returns the SetextHeading block.
CreateChildBlock(parent As MKBlock, line As XUITextLine, type As MKBlockTypes, blockStartOffset As Integer) As MKBlock
Creates a new block of the given type and adds it as a child of parent.
blockStartOffset will be applied to mCurrentOffset when determining the absolute start position of this block.
FindNextNonWhitespace()
Finds the next non-whitespace (NWS) character on this line.
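As an illustration of how the index, the tab-expanded column and the indent relate to each other, here is a small Python sketch. It is not the Xojo implementation: the function name, its return shape and the tab size of 4 are assumptions.

```python
TAB_SIZE = 4  # assumed value; CommonMark treats tab stops as 4 columns wide


def find_next_non_whitespace(line, offset, column):
    """Scan from offset and return (next_nws, next_nws_column, indent).

    next_nws is the index of the first character that is not a space or tab
    (or len(line) if there is none), next_nws_column is its tab-expanded
    column, and indent is the number of columns skipped.
    """
    i = offset
    col = column
    while i < len(line):
        ch = line[i]
        if ch == " ":
            col += 1
        elif ch == "\t":
            # Jump to the next tab stop.
            col += TAB_SIZE - (col % TAB_SIZE)
        else:
            break
        i += 1
    return i, col, col - column
```

The three returned values correspond loosely to what the properties mNextNWS, mNextNWSColumn and mCurrentIndent are described as holding.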
FirstNonBlankIndex() As Integer
Finds the index in mLines of the first non-blank line or returns -1 if there are only blank lines.
IsATXHeader(data As Dictionary) As Boolean
Returns True if mCurrentLine, beginning at mNextNWS, is a valid ATX heading.
If True then data is a new valid dictionary, otherwise data is set to Nil.
Assumes that mNextNWS points to a "#" in mCurrentLine.
Sets data.Value("level") to the header level (1 to 6).
Sets data.Value("length") to the number of characters from the start of the opening sequence to the first character of the heading content.
Sets data.Value("closingSequenceCount") to the number of trailing # characters (may be zero).
Sets data.Value("closingSequenceStart") to the index of the first # character in the closing sequence if there is one, otherwise data.Value("closingSequenceStart") is absent.
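The keys described above can be illustrated with a short Python sketch. This is a simplified reading of the CommonMark ATX heading rules, not the class's Xojo code: the function name is hypothetical, a plain dict stands in for the Dictionary, and None stands in for Nil.

```python
def is_atx_heading(line, pos):
    """Return a dict describing the ATX heading starting at line[pos], or None."""
    n = len(line)
    # Count the opening run of '#' characters.
    i = pos
    while i < n and line[i] == "#":
        i += 1
    level = i - pos
    if level < 1 or level > 6:
        return None
    # The opening sequence must be followed by a space, a tab or the line end.
    if i < n and line[i] not in " \t":
        return None
    # Skip the whitespace separating the opening sequence from the content.
    while i < n and line[i] in " \t":
        i += 1
    data = {"level": level, "length": i - pos, "closingSequenceCount": 0}

    # Look for an optional closing sequence: trailing '#' characters that are
    # preceded by whitespace (or that start right where the content would).
    end = n
    while end > i and line[end - 1] in " \t":
        end -= 1
    start = end
    while start > i and line[start - 1] == "#":
        start -= 1
    if start < end and (start == i or line[start - 1] in " \t"):
        data["closingSequenceCount"] = end - start
        data["closingSequenceStart"] = start
    return data
```

Note that in this sketch "closingSequenceStart" is only present when a closing sequence was found, while "closingSequenceCount" is always set.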
IsClosingCodeFence(length As Integer) As Boolean
Returns True if mCurrentLine, beginning at mNextNWS, is a closing fence of at least length characters.
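A closing fence check can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration rather than the Xojo implementation: the function name is hypothetical, and the fence character is passed explicitly here even though the documented method only takes length.

```python
def is_closing_code_fence(line, pos, fence_char, length):
    """Return True if line[pos:] is a run of at least `length` fence characters
    followed only by spaces or tabs."""
    i = pos
    while i < len(line) and line[i] == fence_char:
        i += 1
    run_length = i - pos
    # Nothing but optional trailing whitespace may follow the fence.
    return run_length >= length and line[i:].strip(" \t") == ""
```

A closing fence may be longer than the opening fence but not shorter, which is why the comparison is "at least length".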
IsCodeFenceOpening(fenceChar As String, data As Dictionary) As Boolean
Returns True if mCurrentLine, beginning at mNextNWS, is a fenced code opening. Populates data with the "fenceLength".
Assumes that mCurrentChar = fenceChar and mCurrentLine.Characters(mNextNWS) = fenceChar as this method is only called from TryNewBlocks.
Also assumes that fenceChar is either "`" or "~".
We don't capture the (optional) info string here as it gets added later as a MKTextBlock child of this block.
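The opening check might look like the following Python sketch. It follows the CommonMark fenced code block rules (at least three fence characters, and no backtick in the info string of a backtick fence); the function name and the use of a plain dict with None for failure are assumptions, not the class's actual code.

```python
def is_code_fence_opening(line, pos, fence_char):
    """Return {"fenceLength": n} if line[pos:] opens a fenced code block, else None."""
    if fence_char not in ("`", "~"):
        return None
    i = pos
    while i < len(line) and line[i] == fence_char:
        i += 1
    fence_length = i - pos
    if fence_length < 3:
        return None
    # The info string of a backtick fence may not itself contain a backtick.
    if fence_char == "`" and "`" in line[i:]:
        return None
    # The info string is deliberately not captured, mirroring the description.
    return {"fenceLength": fence_length}
```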
IsCorrectHtmlBlockEnd(type As MKHTMLBlockTypes, line As XUITextLine, pos As Integer) As Boolean
Returns True if we find the correct ending condition for the specified HTML block type.
There are 7 kinds of HTML blocks (CommonMark spec 0.29 4.6).
IsHtmlBlockStart(pos As Integer, data As Dictionary) As Boolean
Returns True if there is an HTML block starting at pos on mCurrentLine.
Puts the "type" of HTML block in data.
There are 7 kinds of HTML block. See the note "HTML Block Types" in this class for more detail.
IsHtmlBlockType7Start(pos As Integer, data As Dictionary) As Boolean
Returns True if mCurrentLine from pos is a type 7 HTML block start. Sets data.Value("type") to either the none or the type 7 enumeration value.
Type 7:
{openTag NOT script|style|pre}[•→]+|⮐$ or
{closingTag}[•→]+|⮐$
IsSetextHeadingLine(data As Dictionary) As Boolean
Returns True if mCurrentLine, beginning at mNextNWS, is a setext heading line.
Sets data.Value("level") to the heading level (1 or 2) or 0 if this is not a setext heading line.
^[=]+[ ]*$
^[-]+[ ]*$
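The two patterns above can be applied directly. The Python sketch below is illustrative only: the function name is hypothetical and it returns the level as an integer rather than populating a Dictionary.

```python
import re

# The two patterns quoted in the description above.
SETEXT_LEVEL_1 = re.compile(r"^[=]+[ ]*$")
SETEXT_LEVEL_2 = re.compile(r"^[-]+[ ]*$")


def setext_heading_level(line, pos):
    """Return 1 or 2 if line[pos:] is a setext heading underline, otherwise 0."""
    rest = line[pos:]
    if SETEXT_LEVEL_1.match(rest):
        return 1
    if SETEXT_LEVEL_2.match(rest):
        return 2
    return 0
```

A run of "=" gives a level 1 heading and a run of "-" gives level 2; mixed or interrupted runs match neither pattern.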
IsThematicBreak(chars() As String, pos As Integer) As Boolean
Returns True if the line, passed as chars and starting at pos, is a thematic break.
Valid thematic break lines consist of three or more dashes, underscores or asterisks, which may optionally be separated by any amount of spaces or tabs. The characters must all be the same.
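The rule can be expressed as a single pass over the characters. This Python sketch is an illustration under the assumption that chars is a list of single-character strings; the function name is hypothetical and this is not the Xojo code.

```python
def is_thematic_break(chars, pos):
    """Return True if chars[pos:] forms a thematic break.

    Requires three or more of the same marker ('-', '_' or '*'), optionally
    separated by spaces or tabs, and nothing else on the line.
    """
    marker = None
    count = 0
    for ch in chars[pos:]:
        if ch in (" ", "\t"):
            continue  # whitespace between markers is ignored
        if marker is None:
            if ch not in ("-", "_", "*"):
                return False
            marker = ch  # the first marker fixes which character must repeat
        elif ch != marker:
            return False
        count += 1
    return count >= 3
```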
LastNonBlankIndex(firstNonBlank As Integer) As Integer
Finds the index in mLines of the last non-blank line or returns -1 if there are only blank lines.
firstNonBlank should be the index of a valid non-blank line in mLines (i.e. FirstNonBlankIndex has been called prior to this method).
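The pairing of FirstNonBlankIndex and LastNonBlankIndex suggests trimming blank lines from both ends of the input. A minimal Python sketch of that idea follows; it assumes lines are plain strings and that "blank" means empty or whitespace-only, neither of which is confirmed by this page.

```python
def first_non_blank_index(lines):
    """Return the index of the first non-blank line, or -1 if all are blank."""
    for i, line in enumerate(lines):
        if line.strip() != "":
            return i
    return -1


def last_non_blank_index(lines, first_non_blank):
    """Return the index of the last non-blank line, searching backwards and
    stopping at first_non_blank. Returns -1 if there are only blank lines."""
    if first_non_blank < 0:
        return -1
    for i in range(len(lines) - 1, first_non_blank - 1, -1):
        if lines[i].strip() != "":
            return i
    return -1
```

Passing the first index into the second function means the backwards scan never needs to look earlier than a line already known to be non-blank.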
MatchWhitespaceCharacters(line As XUITextLine, pos As Integer) As Integer
Matches whitespace on line beginning at pos and returns how many characters were matched.
ParseBlockStructure()
Parses mLines into a block structure.
This is part 1 of the parsing process. It gives us the overall structure of the Markdown document. Assumes the parser has been reset before this method is invoked.
ParseInlines()
Walks the document parsing inline content.
Assumes that ParseBlockStructure was called immediately prior to this method.
ParseLines(lines() As XUITextLine) As MKDocument
Parses lines into a Markdown document.
ParseListMarker(indented As Boolean, line As XUITextLine, pos As Integer, interruptsParagraph As Boolean, data As MarkdownKit.MKListData) As Boolean
Returns True if able to parse a ListItem marker, populating data with the details.
ParseSource(markdown As String) As MKDocument
Parses markdown into a Markdown document.
ProcessLine(line As XUITextLine)
Processes a line of Markdown and incorporates it into the document tree.
ProcessRemainderOfLine()
Processes what's left of the current line.
We've tried matching against the open blocks and we've opened any required new blocks. What now remains at the offset is a text line. Add it to the appropriate container.
Reset(lines() As XUITextLine)
Resets all properties, ready to parse again.
ResetLine(line As XUITextLine)
Sets line to be the current line for processing, clears the line's tokens and marks it as dirty.
TryNewBlocks()
Tries to start a new container block.
TryOpenBlocks()
Iterates through open blocks and descends through their last children down to the last open block.
For each open block, checks whether mCurrentLine meets the required condition to keep the block open.
mContainer will be set to the block which last had a match to the line.
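The descent through last children can be sketched as follows. This Python example is a heavily simplified illustration, not the Xojo implementation: the Block class, the can_continue rules (only block quotes and paragraphs are modelled) and the returned tuple are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str
    is_open: bool = True
    children: list = field(default_factory=list)


def can_continue(block, line):
    """Return the rest of the line if block stays open on it, otherwise None."""
    if block.kind == "block_quote":
        stripped = line.lstrip(" ")  # simplification: any number of spaces
        if stripped.startswith(">"):
            rest = stripped[1:]
            # One optional space after the marker belongs to the marker.
            return rest[1:] if rest.startswith(" ") else rest
        return None
    if block.kind == "paragraph":
        return line if line.strip() else None
    return line  # other kinds always match in this sketch


def try_open_blocks(doc, line):
    """Descend through the last open children of doc, matching each against line.

    Returns (container, all_matched, rest_of_line), where container is the
    deepest block that matched.
    """
    container = doc
    all_matched = True
    while container.children and container.children[-1].is_open:
        child = container.children[-1]
        rest = can_continue(child, line)
        if rest is None:
            all_matched = False
            break
        container = child
        line = rest
    return container, all_matched, line
```

When not every open block matches, the line may still be a lazy continuation of an open paragraph, which is presumably what properties such as mAllMatched and mMaybeLazy track.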
HTML Block Types
Type 1: MKHTMLBlockTypes.InterruptingBlockWithEmptyLines
Start condition: The line begins with the string "<script", "<pre", or "<style" (case-insensitive), followed by whitespace, the string ">", or the end of the line.
End condition: The line contains an end tag "</script>", "</pre>", or "</style>" (case-insensitive). It need not match the start tag.
Type 2: MKHTMLBlockTypes.Comment
Start condition: The line begins with the string "<!--".
End condition: The line contains the string "-->".
Type 3: MKHTMLBlockTypes.ProcessingInstruction
Start condition: The line begins with the string "<?".
End condition: The line contains the string "?>".
Type 4: MKHTMLBlockTypes.DocumentType
Start condition: The line begins with the string "<!" followed by an uppercase ASCII letter.
End condition: The line contains the character ">".
Type 5: MKHTMLBlockTypes.CData
Start condition: The line begins with the string "<![CDATA[".
End condition: The line contains the string "]]>".
Type 6: MKHTMLBlockTypes.InterruptingBlock
Start condition: The line begins with the string "<" or "</" followed by one of the strings (case-insensitive) "address", "article", "aside", "base", "basefont", "blockquote", "body", "caption", "center", "col", "colgroup", "dd", "details", "dialog", "dir", "div", "dl", "dt", "fieldset", "figcaption", "figure", "footer", "form", "frame", "frameset", "h1", "h2", "h3", "h4", "h5", "h6", "head", "header", "hr", "html", "iframe", "legend", "li", "link", "main", "menu", "menuitem", "nav", "noframes", "ol", "optgroup", "option", "p", "param", "section", "source", "summary", "table", "tbody", "td", "tfoot", "th", "thead", "title", "tr", "track", "ul", followed by whitespace, the end of the line, the string ">" or the string "/>".
End condition: The line is followed by a blank line.
Type 7: MKHTMLBlockTypes.NonInterruptingBlock
Start condition: The line begins with a complete open tag (with any tag name other than script, style, or pre) or a complete closing tag, followed only by whitespace or the end of the line.
End condition: The line is followed by a blank line.
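The start and end conditions for types 1 to 5 map naturally onto a pair of lookup tables. The Python sketch below illustrates that mapping only: the function names are hypothetical, integers stand in for the MKHTMLBlockTypes enumeration, and types 6 and 7 are omitted because they need the tag-name list and full tag parsing respectively.

```python
import re

# Start conditions, checked in order. Each must match at the given position.
START_CONDITIONS = [
    (1, re.compile(r"<(script|pre|style)([ \t>]|$)", re.IGNORECASE)),
    (2, re.compile(r"<!--")),
    (3, re.compile(r"<\?")),
    (4, re.compile(r"<![A-Z]")),
    (5, re.compile(r"<!\[CDATA\[")),
]

# End conditions. Each may match anywhere on the line at or after the position.
END_CONDITIONS = {
    1: re.compile(r"</(script|pre|style)>", re.IGNORECASE),
    2: re.compile(r"-->"),
    3: re.compile(r"\?>"),
    4: re.compile(r">"),
    5: re.compile(r"\]\]>"),
}


def html_block_start_type(line, pos):
    """Return the HTML block type (1 to 5) starting at line[pos], or 0 if none."""
    for block_type, pattern in START_CONDITIONS:
        if pattern.match(line, pos):
            return block_type
    return 0


def is_html_block_end(block_type, line, pos):
    """Return True if line, searched from pos, satisfies the end condition."""
    pattern = END_CONDITIONS.get(block_type)
    return pattern is not None and pattern.search(line, pos) is not None
```

Types 6 and 7 end when the line is followed by a blank line, so their end condition depends on the next line rather than on a pattern within the current one.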