# Parse MyST Markdown

```{seealso}
- The MyST Parser package heavily uses the [markdown-it-py](https://github.com/executablebooks/markdown-it-py) package.
- The MyST-Parser API reference contains a more complete reference of this API.
```

## Parsing and rendering helper functions

The MyST Parser comes bundled with some helper functions to quickly parse MyST Markdown and render its output.

:::{important}
These APIs are primarily intended for testing and development purposes.
For proper parsing see {ref}`myst-sphinx` and {ref}`myst-docutils`.
:::

### Parse MyST Markdown to HTML

The following code parses Markdown and renders it as HTML using only the markdown-it parser (i.e. no Sphinx- or docutils-specific processing is done):

```python
from myst_parser.main import to_html

to_html("some *text* {literal}`a`")
```

```html
'<p>some <em>text</em> <code class="myst role">{literal}[a]</code></p>\n'
```
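
Because `to_html` simply returns a string, it is easy to use for quick one-off conversions. As a minimal sketch (the file names here are hypothetical, for illustration only):

```python
from pathlib import Path

from myst_parser.main import to_html

# hypothetical input/output paths, for illustration only
text = Path("example.md").read_text(encoding="utf8")
Path("example.html").write_text(to_html(text), encoding="utf8")
```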

### Parse MyST Markdown to docutils

The following function renders your text as **docutils AST objects** (for example, for use with the Sphinx ecosystem):

```python
from myst_parser.main import to_docutils

print(to_docutils("some *text* {literal}`a`").pformat())
```

```xml
<document source="notset">
    <paragraph>
        some 
        <emphasis>
            text
        <literal>
            a
```

:::{note}
This function only performs the initial parse of the AST, without applying any transforms or post-processing.
See, for example, the [Sphinx core events](https://www.sphinx-doc.org/en/master/extdev/appapi.html?highlight=config-inited#sphinx-core-events).
:::

### Parse MyST Markdown as `markdown-it` tokens

The MyST Parser uses `markdown-it-py` tokens as an intermediate representation of your text.
Normally these tokens are then *rendered* into various outputs.
If you'd like direct access to the tokens, use the `to_tokens` function.
Here's an example of its use:

```python
from myst_parser.main import to_tokens

for token in to_tokens("some *text*"):
    print(token, "\n")
```

```python
Token(type='paragraph_open', tag='p', nesting=1, attrs=None, map=[0, 1], level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)

Token(type='inline', tag='', nesting=0, attrs=None, map=[0, 1], level=1, children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=0, children=None, content='some ', markup='', info='', meta={}, block=False, hidden=False), Token(type='em_open', tag='em', nesting=1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), Token(type='text', tag='', nesting=0, attrs=None, map=None, level=1, children=None, content='text', markup='', info='', meta={}, block=False, hidden=False), Token(type='em_close', tag='em', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False)], content='some *text*', markup='', info='', meta={}, block=True, hidden=False)

Token(type='paragraph_close', tag='p', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)
```

Each token is an abstract representation of a piece of MyST Markdown syntax.

## Use the parser object for more control

The MyST Parser is actually a `markdown-it-py` parser with several extensions pre-enabled that support the MyST syntax.
If you'd like more control over the parsing process, you can directly use a `markdown-it-py` parser with the MyST syntax extensions loaded.

:::{seealso}
[`markdown-it-py`](https://markdown-it-py.readthedocs.io/) is an extensible Python parser and renderer for flavors of Markdown.
It is heavily inspired by the [`markdown-it`](https://github.com/markdown-it/markdown-it) JavaScript package.
See the documentation of these tools for more information.
:::

### Load a parser

To load one of these parsers for your own use, use the `default_parser` function.
Below we'll create such a parser and show that it is an instance of a `markdown-it-py` parser:

```python
from myst_parser.main import default_parser, MdParserConfig

config = MdParserConfig(renderer="html")
parser = default_parser(config)
parser
```

```python
markdown_it.main.MarkdownIt()
```

### List the active rules

We can list the **currently active rules** for this parser.
Each rule maps onto a particular Markdown syntax and a Token.
To list the active rules, use the `get_active_rules` method:

```python
from pprint import pprint

pprint(parser.get_active_rules())
```

```python
{'block': ['front_matter', 'table', 'code', 'math_block_label', 'math_block',
           'fence', 'myst_line_comment', 'blockquote', 'myst_block_break',
           'myst_target', 'hr', 'list', 'footnote_def', 'reference',
           'heading', 'lheading', 'html_block', 'paragraph'],
 'core': ['normalize', 'block', 'inline'],
 'inline': ['text', 'newline', 'math_inline', 'math_single', 'escape',
            'myst_role', 'backticks', 'emphasis', 'link', 'image',
            'footnote_ref', 'autolink', 'html_inline', 'entity'],
 'inline2': ['balance_pairs', 'emphasis', 'text_collapse']}
```
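
Since `get_active_rules` returns a plain dictionary keyed by parsing chain, you can query it directly. For example (an illustrative check, not a MyST-specific API):

```python
# check whether the MyST role syntax is enabled in the inline chain
rules = parser.get_active_rules()
"myst_role" in rules["inline"]
```

```python
True
```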

### Parse and render markdown

Once we have a parser instance, we can use it to parse some Markdown.
Use the `render` method to do so:

```python
parser.render("*abc*")
```

```html
'<p><em>abc</em></p>\n'
```
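
Under the hood, `render` parses the text to a token stream and then renders those tokens. If you want that intermediate stream from the same parser instance, the standard `markdown-it-py` `parse` method is also available; a small illustration:

```python
# `parse` exposes the token stream that `render` would pass to its renderer
[t.type for t in parser.parse("*abc*")]
```

```python
['paragraph_open', 'inline', 'paragraph_close']
```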

### Disable and enable rules

You can disable and enable rules for a parser using the `disable` and `enable` methods.
For example, below we'll disable the `emphasis` rule (which is what detected the `*abc*` syntax above) and re-render the text:

```python
parser.disable("emphasis").render("*abc*")
```

```html
'<p>*abc*</p>\n'
```

As you can see, the parser no longer treats the `*` syntax as _emphasis_.

### Turn off all block-level syntax

If you'd like to use your parser *only* for inline content, you may turn off all block-level syntax with the `renderInline` method:

```python
parser.enable("emphasis").renderInline("- *abc*")
```

```html
'- <em>abc</em>'
```

## The Token Stream

When you parse Markdown with the MyST Parser, the result is a flat stream of **Tokens**.
These are abstract representations of each type of syntax that the parser has detected.
For example, below we'll show the token stream for some simple Markdown:

```python
from myst_parser.main import to_tokens

tokens = to_tokens("""
Here's some *text*

1. a list

> a *quote*""")
[t.type for t in tokens]
```

```python
['paragraph_open', 'inline', 'paragraph_close',
 'ordered_list_open', 'list_item_open', 'paragraph_open', 'inline',
 'paragraph_close', 'list_item_close', 'ordered_list_close',
 'blockquote_open', 'paragraph_open', 'inline', 'paragraph_close',
 'blockquote_close']
```

Note that these tokens are **flat**, although some of them refer to one another (for example, Tokens with `_open` and `_close` suffixes represent the start and end of blocks).

Tokens of type `inline` have a `children` attribute that contains a list of the Tokens they contain. For example:

```python
tokens[6]
```

```python
Token(type='inline', tag='', nesting=0, attrs=None, map=[3, 4], level=3, children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=0, children=None, content='a list', markup='', info='', meta={}, block=False, hidden=False)], content='a list', markup='', info='', meta={}, block=True, hidden=False)
```
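
To make the implied hierarchy visible, you can walk the flat stream using each Token's `nesting` attribute (`1` for `_open` tokens, `-1` for `_close` tokens, `0` otherwise, as seen in the reprs above). This is just an illustrative sketch, not part of the MyST Parser API:

```python
# pretty-print the flat stream as an indented tree, driven by `nesting`
depth = 0
for token in tokens:
    if token.nesting == -1:
        depth -= 1
    print("  " * depth + token.type)
    if token.nesting == 1:
        depth += 1
```

```python
paragraph_open
  inline
paragraph_close
ordered_list_open
  list_item_open
    paragraph_open
      inline
    paragraph_close
  list_item_close
ordered_list_close
blockquote_open
  paragraph_open
    inline
  paragraph_close
blockquote_close
```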

### Rendering tokens

The list of Token objects can be rendered to a number of different outputs.
This involves first processing the Tokens, and then defining how each should be rendered in an output format (e.g. HTML or docutils).

For example, the Sphinx renderer first converts the tokens to a nested structure, collapsing the opening/closing tokens into single tokens:

```python
from markdown_it.token import nest_tokens

nested = nest_tokens(tokens)
[t.type for t in nested]
```

```python
['paragraph_open', 'ordered_list_open', 'blockquote_open']
```

```python
print(nested[0].opening, end="\n\n")
print(nested[0].closing, end="\n\n")
print(nested[0].children, end="\n\n")
```

```python
Token(type='paragraph_open', tag='p', nesting=1, attrs=None, map=[1, 2], level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)

Token(type='paragraph_close', tag='p', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='', info='', meta={}, block=True, hidden=False)

[Token(type='inline', tag='', nesting=0, attrs=None, map=[1, 2], level=1, children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=0, children=None, content="Here's some ", markup='', info='', meta={}, block=False, hidden=False), NestedTokens(opening=Token(type='em_open', tag='em', nesting=1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), closing=Token(type='em_close', tag='em', nesting=-1, attrs=None, map=None, level=0, children=None, content='', markup='*', info='', meta={}, block=False, hidden=False), children=[Token(type='text', tag='', nesting=0, attrs=None, map=None, level=1, children=None, content='text', markup='', info='', meta={}, block=False, hidden=False)])], content="Here's some *text*", markup='', info='', meta={}, block=True, hidden=False)]
```

It then renders each token to a Sphinx-based docutils object.
See [the renderers section](renderers.md) for more information about rendering tokens.
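
Since both `Token` and `NestedTokens` objects expose `type` and `children` (as the output above shows), the nested structure can be traversed with a simple recursive walk; again, this is a sketch rather than part of the MyST Parser API:

```python
def walk(items, depth=0):
    """Recursively print the type of each (possibly nested) token."""
    for item in items:
        print("  " * depth + item.type)
        # `children` is None for leaf Tokens, a list otherwise
        if item.children:
            walk(item.children, depth + 1)

walk(nested)
```

In this nested form the hierarchy is explicit, which is what makes it straightforward to write renderers for different output formats.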