Python API¶
MyST-Parser also has a Python API via the myst_parser
package.
See also
The markdown-it-py package
The raw text is first parsed to syntax ‘tokens’, then these are converted to other formats using ‘renderers’.
Quick-Start¶
The simplest way to understand how text will be parsed is using:
from myst_parser.main import to_html
to_html("some *text*")
'<p>some <em>text</em></p>\n'
from myst_parser.main import to_docutils
print(to_docutils("some *text*").pformat())
<document source="notset">
<paragraph>
some
<emphasis>
text
from pprint import pprint
from myst_parser.main import to_tokens
for token in to_tokens("some *text*"):
print(token)
print()
Token(type='paragraph_open', tag='p', nesting=1, attrs=None, map=[0, 1], level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)
Token(type='inline', tag='', nesting=0, attrs=None, map=[0, 1], level=1, children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=0, children=None, content='some ', markup='', info='', meta={}, block=False, hidden=False), Token(type='em_open', tag='em', nesting=1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), Token(type='text', tag='', nesting=0, attrs=None, map=None, level=1, children=None, content='text', markup='', info='', meta={}, block=False, hidden=False), Token(type='em_close', tag='em', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False)], content='some *text*', markup='', info='', meta={}, block=True, hidden=False)
Token(type='paragraph_close', tag='p', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)
The Parser¶
The default_parser
function loads a standard markdown-it parser with the default syntax rules for MyST.
from myst_parser.main import default_parser, MdParserConfig
config = MdParserConfig(renderer="html")
parser = default_parser(config)
parser
markdown_it.main.MarkdownIt()
pprint(parser.get_active_rules())
{'block': ['front_matter',
'table',
'code',
'math_block_eqno',
'math_block',
'fence',
'myst_line_comment',
'blockquote',
'myst_block_break',
'myst_target',
'hr',
'list',
'footnote_def',
'reference',
'heading',
'lheading',
'html_block',
'paragraph'],
'core': ['normalize', 'block', 'inline'],
'inline': ['text',
'newline',
'math_inline',
'math_single',
'escape',
'myst_role',
'backticks',
'emphasis',
'link',
'image',
'footnote_ref',
'autolink',
'html_inline',
'entity'],
'inline2': ['balance_pairs', 'emphasis', 'text_collapse']}
parser.render("*abc*")
'<p><em>abc</em></p>\n'
Any of these rules can be disabled:
parser.disable("emphasis").render("*abc*")
'<p>*abc*</p>\n'
renderInline
turns off any block syntax rules.
parser.enable("emphasis").renderInline("- *abc*")
'- <em>abc</em>'
The Token Stream¶
The text is parsed to a flat token stream:
from myst_parser.main import to_tokens
tokens = to_tokens("""
Here's some *text*
1. a list
> a *quote*""")
[t.type for t in tokens]
['paragraph_open',
'inline',
'paragraph_close',
'ordered_list_open',
'list_item_open',
'paragraph_open',
'inline',
'paragraph_close',
'list_item_close',
'ordered_list_close',
'blockquote_open',
'paragraph_open',
'inline',
'paragraph_close',
'blockquote_close']
Inline type tokens contain the inline tokens as children:
tokens[6]
Token(type='inline', tag='', nesting=0, attrs=None, map=[3, 4], level=3, children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=0, children=None, content='a list', markup='', info='', meta={}, block=False, hidden=False)], content='a list', markup='', info='', meta={}, block=True, hidden=False)
The sphinx renderer first converts the token to a nested structure, collapsing the opening/closing tokens into single tokens:
from markdown_it.token import nest_tokens
nested = nest_tokens(tokens)
[t.type for t in nested]
['paragraph_open', 'ordered_list_open', 'blockquote_open']
print(nested[0].opening, end="\n\n")
print(nested[0].closing, end="\n\n")
print(nested[0].children, end="\n\n")
Token(type='paragraph_open', tag='p', nesting=1, attrs=None, map=[1, 2], level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)
Token(type='paragraph_close', tag='p', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)
[Token(type='inline', tag='', nesting=0, attrs=None, map=[1, 2], level=1, children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=0, children=None, content="Here's some ", markup='', info='', meta={}, block=False, hidden=False), NestedTokens(opening=Token(type='em_open', tag='em', nesting=1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), closing=Token(type='em_close', tag='em', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=1, children=None, content='text', markup='', info='', meta={}, block=False, hidden=False)])], content="Here's some *text*", markup='', info='', meta={}, block=True, hidden=False)]
Renderers¶
The myst_parser.docutils_renderer.DocutilsRenderer
converts a token directly to the docutils.document
representation of the document, converting roles and directives to a docutils.nodes
if a converter can be found for the given name.
from myst_parser.main import to_docutils
document = to_docutils("""
Here's some *text*
1. a list
> a quote
{emphasis}`content`
```{sidebar} my sidebar
content
```
""")
print(document.pformat())
<document source="notset">
<paragraph>
Here's some
<emphasis>
text
<enumerated_list>
<list_item>
<paragraph>
a list
<block_quote>
<paragraph>
a quote
<paragraph>
<emphasis>
content
<sidebar>
<title>
my sidebar
<paragraph>
content
The myst_parser.sphinx_renderer.SphinxRenderer
builds on the DocutilsRenderer
to add sphinx specific nodes, e.g. for cross-referencing between documents.
To use the sphinx specific roles and directives outside of a sphinx-build
, they must first be loaded with the in_sphinx_env
option.
document = to_docutils("""
Here's some *text*
1. a list
> a quote
{ref}`target`
```{glossary} my gloassary
name
definition
```
""",
in_sphinx_env=True)
print(document.pformat())
<document source="notset">
<paragraph>
Here's some
<emphasis>
text
<enumerated_list>
<list_item>
<paragraph>
a list
<block_quote>
<paragraph>
a quote
<paragraph>
<pending_xref refdoc="mock_docname" refdomain="std" refexplicit="False" reftarget="target" reftype="ref" refwarn="True">
<inline classes="xref std std-ref">
target
<glossary>
<definition_list classes="glossary">
<definition_list_item>
<term ids="term-my-gloassary">
my gloassary
<index entries="('single',\ 'my\ gloassary',\ 'term-my-gloassary',\ 'main',\ None)">
<term ids="term-name">
name
<index entries="('single',\ 'name',\ 'term-name',\ 'main',\ None)">
<definition>
<paragraph>
definition
You can also set Sphinx configuration via sphinx_conf
. This is a dictionary representation of the contents of the Sphinx conf.py
.
Warning
This feature is only meant for simple testing. It will fail for extensions that require the full Sphinx build process and/or access to external files.
document = to_docutils("""
````{tabs}
```{tab} Apples
Apples are green, or sometimes red.
```
````
""",
in_sphinx_env=True,
conf={"extensions": ["sphinx_tabs.tabs"]}
)
print(document.pformat())
<document source="notset">
<container classes="sphinx-tabs">
<container>
<a classes="item">
<container>
<paragraph>
Apples
<container classes="ui bottom attached sphinx-tab tab segment sphinx-data-tab-0-0 active">
<paragraph>
Apples are green, or sometimes red.