# Parse MyST Markdown

## Parsing and rendering helper functions

The MyST Parser comes bundled with some helper functions to quickly parse MyST Markdown and render its output.

**Important:** These APIs are primarily intended for testing and development purposes. For proper parsing, see MyST with Sphinx and MyST with Docutils.

### Parse MyST Markdown to HTML

The following code parses markdown and renders it as HTML using only the markdown-it parser (i.e. no Sphinx- or docutils-specific processing is done):

```python
from myst_parser.main import to_html

to_html("some *text* {literal}`a`")
```

```text
'<p>some <em>text</em> <code class="myst role">{literal}[a]</code></p>\n'
```

### Parse MyST Markdown to docutils

The following function renders your text as docutils AST objects (for example, for use with the Sphinx ecosystem):

```python
from myst_parser.main import to_docutils

print(to_docutils("some *text* {literal}`a`").pformat())
```

```text
<document source="notset">
    <paragraph>
        some
        <emphasis>
            text

        <literal>
            a
```

**Note:** This function only performs the initial parse of the AST, without applying any transforms or post-processing. See, for example, the Sphinx core events.
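
Since the parse returns a standard docutils document node, you can inspect it with the usual docutils APIs. As a minimal sketch (the choice of `nodes.literal` here is just for illustration), this should collect the text of every literal node in the document:

```python
from docutils import nodes
from myst_parser.main import to_docutils

doc = to_docutils("some *text* {literal}`a`")
# gather the text content of every literal node in the parsed tree
[node.astext() for node in doc.traverse(nodes.literal)]
```

```text
['a']
```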

### Parse MyST Markdown as markdown-it tokens

The MyST Parser uses markdown-it-py tokens as an intermediate representation of your text. Normally these tokens are then rendered into various outputs. If you’d like direct access to the tokens, use the `to_tokens` function. Here’s an example of its use:

```python
from pprint import pprint
from myst_parser.main import to_tokens

for token in to_tokens("some *text*"):
    print(token, "\n")
```

```text
Token(type='paragraph_open', tag='p', nesting=1, attrs=None, map=[0, 1], level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)

Token(type='inline', tag='', nesting=0, attrs=None, map=[0, 1], level=1, children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=0, children=None, content='some ', markup='', info='', meta={}, block=False, hidden=False), Token(type='em_open', tag='em', nesting=1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), Token(type='text', tag='', nesting=0, attrs=None, map=None, level=1, children=None, content='text', markup='', info='', meta={}, block=False, hidden=False), Token(type='em_close', tag='em', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False)], content='some *text*', markup='', info='', meta={}, block=True, hidden=False)

Token(type='paragraph_close', tag='p', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)
```

Each token is an abstract representation of a piece of MyST Markdown syntax.
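
Each Token also carries metadata you can inspect directly, such as its type, the tag it renders to, and (for block tokens) the source line range in `map`. A quick sketch, reusing `to_tokens` from above:

```python
token = to_tokens("some *text*")[0]
# the token type, the tag it renders to, and its source line range
token.type, token.tag, token.map
```

```text
('paragraph_open', 'p', [0, 1])
```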

## Use the parser object for more control

The MyST Parser is actually a markdown-it-py parser, with several extensions pre-enabled to support the MyST syntax. If you’d like more control over the parsing process, you can use a markdown-it-py parser directly, with the MyST syntax extensions loaded.

**See also:** markdown-it-py is an extensible Python parser and renderer for flavors of markdown. It is inspired heavily by the markdown-it JavaScript package. See the documentation of these tools for more information.

### Load a parser

To load one of these parsers for your own use, use the `create_md_parser` function. Below we’ll create such a parser and show that it is an instance of a markdown-it-py parser:

```python
from markdown_it.renderer import RendererHTML
from myst_parser.main import create_md_parser, MdParserConfig

config = MdParserConfig()
parser = create_md_parser(config, RendererHTML)
parser
```

```text
markdown_it.main.MarkdownIt()
```
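
To make the instance check explicit:

```python
from markdown_it import MarkdownIt

isinstance(parser, MarkdownIt)
```

```text
True
```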

### List the active rules

We can list the currently active rules for this parser. Each rule maps onto a particular piece of markdown syntax, and a Token type. To list the active rules, use the `get_active_rules` method:

```python
pprint(parser.get_active_rules())
```

```text
{'block': ['front_matter',
           'table',
           'code',
           'math_block_label',
           'math_block',
           'fence',
           'myst_line_comment',
           'blockquote',
           'myst_block_break',
           'myst_target',
           'hr',
           'list',
           'footnote_def',
           'reference',
           'heading',
           'lheading',
           'html_block',
           'paragraph'],
 'core': ['normalize', 'block', 'inline'],
 'inline': ['text',
            'newline',
            'math_inline',
            'math_single',
            'escape',
            'myst_role',
            'backticks',
            'emphasis',
            'link',
            'image',
            'footnote_ref',
            'autolink',
            'html_inline',
            'entity'],
 'inline2': ['balance_pairs', 'emphasis', 'text_collapse']}
```
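
markdown-it-py also exposes every rule the parser knows about, enabled or not, via the `get_all_rules` method. As a small sketch, comparing the two shows which rules this configuration leaves switched off (the exact result depends on your markdown-it-py and myst-parser versions):

```python
active = parser.get_active_rules()
all_rules = parser.get_all_rules()
# rules that exist in the parser but are not currently enabled
{name: sorted(set(rules) - set(active.get(name, [])))
 for name, rules in all_rules.items()}
```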

### Parse and render markdown

Once we have a parser instance, we can use it to parse some markdown. Use its `render` method to do so:

parser.render("*abc*")
'<p><em>abc</em></p>\n'

### Disable and enable rules

You can disable and enable rules for a parser using the `disable` and `enable` methods. For example, below we’ll disable the emphasis rule (which is what detected the `*abc*` syntax above) and re-render the text:

```python
parser.disable("emphasis").render("*abc*")
```

```text
'<p>*abc*</p>\n'
```

As you can see, the parser no longer treats the `*<text>*` syntax as emphasis.

### Turn off all block-level syntax

If you’d like to use your parser only for inline content, you can skip all block-level syntax with the `renderInline` method (here we first re-enable the emphasis rule):

```python
parser.enable("emphasis").renderInline("- *abc*")
```

```text
'- <em>abc</em>'
```

## The Token Stream

When you parse markdown with the MyST Parser, the result is a flat stream of Tokens. These are abstract representations of each type of syntax that the parser has detected.

For example, below we’ll show the token stream for some simple markdown:

```python
from myst_parser.main import to_tokens

tokens = to_tokens("""
Here's some *text*

1. a list

> a *quote*""")
[t.type for t in tokens]
```

```text
['paragraph_open',
 'inline',
 'paragraph_close',
 'ordered_list_open',
 'list_item_open',
 'paragraph_open',
 'inline',
 'paragraph_close',
 'list_item_close',
 'ordered_list_close',
 'blockquote_open',
 'paragraph_open',
 'inline',
 'paragraph_close',
 'blockquote_close']
```

Note that these tokens are flat, although some of them refer to one another (for example, tokens with `_open` and `_close` suffixes represent the start and end of blocks).
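
Because each token records whether it opens a block (`nesting == 1`), closes one (`nesting == -1`), or stands alone (`nesting == 0`), the tree structure is easy to recover from the flat stream. As a minimal sketch, we can print the stream with indentation:

```python
depth = 0
for token in tokens:
    if token.nesting == -1:
        depth -= 1  # stepping out of the block this token closes
    print("  " * depth + token.type)
    if token.nesting == 1:
        depth += 1  # stepping into the block this token opens
```

```text
paragraph_open
  inline
paragraph_close
ordered_list_open
  list_item_open
    paragraph_open
      inline
    paragraph_close
  list_item_close
ordered_list_close
blockquote_open
  paragraph_open
    inline
  paragraph_close
blockquote_close
```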

Tokens of type `inline` have a `children` attribute that contains a list of the tokens they contain. For example:

```python
tokens[6]
```

```text
Token(type='inline', tag='', nesting=0, attrs=None, map=[3, 4], level=3, children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=0, children=None, content='a list', markup='', info='', meta={}, block=False, hidden=False)], content='a list', markup='', info='', meta={}, block=True, hidden=False)
```

### Rendering tokens

The list of Token objects can be rendered to a number of different outputs. This involves first processing the Tokens, and then defining how each should be rendered in an output format (e.g., HTML or Docutils).

For example, the Sphinx renderer first converts the tokens to a nested structure, collapsing the opening/closing tokens into single tokens:

```python
from markdown_it.token import nest_tokens

nested = nest_tokens(tokens)
[t.type for t in nested]
```

```text
['paragraph_open', 'ordered_list_open', 'blockquote_open']
```

```python
print(nested[0].opening, end="\n\n")
print(nested[0].closing, end="\n\n")
print(nested[0].children, end="\n\n")
```

```text
Token(type='paragraph_open', tag='p', nesting=1, attrs=None, map=[1, 2], level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)

Token(type='paragraph_close', tag='p', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)

[Token(type='inline', tag='', nesting=0, attrs=None, map=[1, 2], level=1, children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=0, children=None, content="Here's some ", markup='', info='', meta={}, block=False, hidden=False), NestedTokens(opening=Token(type='em_open', tag='em', nesting=1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), closing=Token(type='em_close', tag='em', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=1, children=None, content='text', markup='', info='', meta={}, block=False, hidden=False)])], content="Here's some *text*", markup='', info='', meta={}, block=True, hidden=False)]
```

It then renders each token to a Sphinx-based docutils object. See the renderers section for more information about rendering tokens.
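
For HTML output, the same idea works directly on the flat stream: a markdown-it-py renderer accepts a token list, the parser options, and an environment dict. As a rough sketch (reusing the `parser` and `tokens` defined above), this should return the rendered HTML string:

```python
# render the flat token stream with the RendererHTML attached to our parser
parser.renderer.render(tokens, parser.options, {})
```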