Md4qt

Detailed Description
md4qt is a header-only C++ library for parsing Markdown.
md4qt supports the CommonMark 0.31.2 Spec and some GitHub extensions, such as tables, footnotes, task lists, strikethrough, LaTeX math injections, and GitHub's auto-links.
md4qt can be built with Qt6 or with ICU.
This library parses Markdown into a tree structure.
- Example
- Benchmark
- Playground
- Release notes
- Known issues
- Q/A
  - Why another AST Markdown parser?
  - What should I know about links in the document?
  - What is the second argument of `MD::Parser::parse()`?
  - What is an `MD::Anchor`?
  - Does the library throw exceptions?
  - Why are `MD::Parser` and `MD::Document` templates?
  - So, how can I use `md4qt` with `Qt6` and `ICU`?
  - `ICU` is slower than `Qt6`? Really?
  - Why is parsing wrong on Windows with `std::ifstream`?
  - How can I convert `MD::Document` into `HTML`?
  - How can I obtain positions of blocks/elements in a `Markdown` file?
  - How can I easily traverse through the `MD::Document`?
  - Why don't you have an implementation for pure `STL` with `std::string`?
  - Is it possible to write a custom text plugin for this parser?
  - Is it possible to find a `Markdown` item by its position?
  - How can I walk through the document and find all items of a given type?
  - How can I add and process a custom (user-defined) item in `MD::Document`?
Example
Benchmark
An approximate benchmark against cmark-gfm shows that the Qt6 version of md4qt is about 13 times slower. In exchange, you get a complete C++ tree structure of the Markdown document with all major extensions, plus some sugar and a cherry on the cake.
| Markdown library | Result |
|---|---|
| cmark-gfm | 0.22 ms |
| md4qt with Qt6 | 2.9 ms |
| md4qt with Qt6 without GitHub auto-links extension | 2.5 ms |
Playground
You can see md4qt in action in Markdown Tools, which provides a Markdown editor/viewer and a converter to PDF.
KleverNotes from KDE uses md4qt too.
Release notes
- Note that version 4.0.0 is API-incompatible with 3.0.0. Version 4.0.0 changed the rules for spaces and now fully supports the CommonMark standard in this respect. The methods `isSpaceBefore()` and `isSpaceAfter()` were removed, and spaces are represented as they are in the Markdown source, so keep this in mind.
Known issues
You can find a list of known issues here. These issues are somewhat controversial, so for now they remain as they are in md4qt. But if you'd like to see any of them resolved, you are welcome to join the discussion.
Q/A
Why another AST Markdown parser?
When I wrote this library I knew about the `md4c` parser, but not about `cmark-gfm`. `md4c` was not suitable for my purposes, whereas `cmark-gfm` could do everything I needed. But it happened this way: I wrote `md4qt` and only later learned about `cmark-gfm`. OK, the code is written and tested. Let it be. What I can add is that this library is C++, and for some people it can be easier to use C++ code than C with manual memory freeing. Qt makes things easier by handling text encoding... So let it be, guys.
And one more cherry on the cake: `md4qt` can parse Markdown recursively, as described below.
What should I know about links in the document?
In some cases a link's URL in Markdown refers to something defined in the document itself. So, when you get an `MD::Link` in the document, check whether the labeled links of the document contain a key equal to the link's URL, and if so, use the URL from the labeled links, look:

```cpp
MD::Link<MD::QStringTrait> *item = ...;

QString url = item->url();

const auto it = doc->labeledLinks().find(url);

if (it != doc->labeledLinks().cend()) {
    url = it->second->url();
}
```
What is the second argument of MD::Parser::parse()?
- The second argument of `MD::Parser::parse()` is a flag that tells the parser whether to process Markdown files recursively. If parsing is recursive and the target Markdown file contains links to other Markdown files, those files are parsed too and included in the resulting document.
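A small std-only model can make the effect of the recursive flag concrete. It is not md4qt code: the hypothetical `LinkGraph` maps a file name to the Markdown files it links to, and the hypothetical `collectFiles()` follows those links only when `recursive` is true, skipping files it has already visited so that cyclic links terminate. The breadth-first order is a choice of this sketch, not necessarily the order md4qt uses.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical link graph: file name -> Markdown files it links to.
using LinkGraph = std::map<std::string, std::vector<std::string>>;

// Returns the files that would end up in the resulting document,
// in breadth-first order (a choice of this sketch).
// With recursive == false only the root file is included.
inline std::vector<std::string> collectFiles(const LinkGraph &links,
                                             const std::string &root,
                                             bool recursive)
{
    std::vector<std::string> order;
    std::set<std::string> visited;
    std::vector<std::string> queue{root};

    while (!queue.empty()) {
        const std::string file = queue.front();
        queue.erase(queue.begin());

        // Skip files that were already parsed, so cyclic links terminate.
        if (!visited.insert(file).second)
            continue;

        order.push_back(file);

        if (!recursive)
            break;

        const auto it = links.find(file);
        if (it != links.end())
            queue.insert(queue.end(), it->second.begin(), it->second.end());
    }

    return order;
}
```

With `a.md` linking to `b.md` and `c.md`, and `b.md` linking back to `a.md`, the recursive call yields `a.md`, `b.md`, `c.md`, while the non-recursive call yields only `a.md`.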
What is an MD::Anchor?
- As `md4qt` supports recursive Markdown parsing, the resulting document can contain more than one Markdown file. Each file in the document starts with an `MD::Anchor`; it simply marks that, while traversing the document, you have reached a new file.
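Because all files end up in one document, anchors are what let you tell them apart. The sketch below assumes a hypothetical flattened item list (`Kind`, `Item`, and `groupByFile()` are illustrative names, not md4qt types) and groups items by the most recent anchor seen while traversing.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened document; md4qt's real item classes are richer.
enum class Kind { Anchor, Paragraph, Heading };

struct Item {
    Kind kind;
    std::string text; // For an Anchor: the file label; otherwise: content.
};

// Groups items by the file they came from, treating each Anchor
// as the boundary where a new file starts.
inline std::map<std::string, std::vector<std::string>>
groupByFile(const std::vector<Item> &items)
{
    std::map<std::string, std::vector<std::string>> files;
    std::string current;

    for (const auto &item : items) {
        if (item.kind == Kind::Anchor) {
            current = item.text;
            files[current]; // Create an entry even if the file turns out empty.
        } else {
            files[current].push_back(item.text);
        }
    }

    return files;
}
```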
Does the library throw exceptions?
- No. This library doesn't use exceptions. Any text is valid Markdown, so there are no errors to report to the user. Qt itself doesn't use exceptions either. So you can only catch standard C++ exceptions, such as `std::bad_alloc`. With `MD::UnicodeStringTrait` you may catch more standard exceptions; I may have missed something somewhere, but I tried to avoid all possible exceptions.
Why are MD::Parser and MD::Document templates?
- Since version `2.0.0`, `md4qt` can be built not only with `Qt6` but also with `STL`. The parser code is the same in both cases; I just added two ready-made traits to support different C++ worlds. With `STL` I use the `ICU` library for Unicode handling and the `uriparser` library to parse and check URLs. These dependencies can be installed with the Conan package manager.
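The trait idea can be shown in miniature: one algorithm is written once against a template parameter, and each trait supplies the string and character types. `NarrowTrait`, `WideTrait`, and `HeadingCounter` below are hypothetical stand-ins for md4qt's real traits and parser, which carry many more helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Two hypothetical traits, each bundling the types an algorithm needs.
struct NarrowTrait {
    using String = std::string;
    using Char = char;
};

struct WideTrait {
    using String = std::u16string;
    using Char = char16_t;
};

// Written once against Trait; works unchanged with either trait.
template<class Trait>
class HeadingCounter {
public:
    // Counts lines starting with '#', a crude stand-in for real parsing.
    std::size_t count(const typename Trait::String &text) const
    {
        using Char = typename Trait::Char;
        std::size_t n = 0;
        bool atLineStart = true;

        for (std::size_t i = 0; i < text.size(); ++i) {
            const Char c = text[i];
            if (atLineStart && c == static_cast<Char>('#'))
                ++n;
            atLineStart = (c == static_cast<Char>('\n'));
        }

        return n;
    }
};
```

The design choice mirrors the one described above: the parsing logic stays identical, and only the trait decides which string class (and therefore which Unicode library) is underneath.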
So, how can I use md4qt with Qt6 and ICU?
To build with `ICU` support you need to define `MD4QT_ICU_STL_SUPPORT` before including `parser.h`. In this case you will get access to `MD::UnicodeStringTrait`, which can be passed to `MD::Parser` as a template parameter. Your dependencies will be `C++ STL`, `ICU` and `uriparser`.

To build with `Qt6` support you need to define `MD4QT_QT_SUPPORT`. In this case you will get access to `MD::QStringTrait` to work with Qt's classes and functions, and your dependency will be `Qt6`.

You can define both to be able to use `md4qt` with both `Qt6` and `ICU`.
ICU is slower than Qt6? Really?
- Don't believe anybody; just build the built-in `md_benchmark` and have a look. Dry numbers say that `Qt6`'s `QString` is about 2 times faster than `icu::UnicodeString` in such tasks. Markdown parsing requires checking every symbol, which means accessing each character in the string with `operator [] (...)` or the member `at(...)`. I do this very often in the parser's code, and the profiler says that most of the run time is spent on such operations. `QString` is simply more optimized for accessing separate characters than `icu::UnicodeString`...
Why is parsing wrong on Windows with std::ifstream?
- Such a problem can occur on Windows with MSVC if you open the file in text mode, so for `MD::Parser` always open `std::ifstream` with the `std::ios::binary` flag. And yes, I expect to receive UTF-8 encoded content...
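A minimal helper for reading a Markdown file in binary mode, assuming the file is UTF-8 encoded; `readUtf8File()` is a hypothetical name. Opening with `std::ios::binary` prevents the text-mode newline translation that MSVC applies, so the bytes you get are exactly the bytes on disk. The sketch stops at reading the content; it does not call md4qt.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Reads a whole file byte-for-byte. std::ios::binary disables text-mode
// newline translation, so "\r\n" stays "\r\n".
// Returns an empty string if the file cannot be opened.
inline std::string readUtf8File(const std::string &path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```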
How can I convert MD::Document into HTML?
In version `2.0.5`, the `MD::toHtml()` function was implemented. You can do the following:

```cpp
#define MD4QT_QT_SUPPORT
#include <md4qt/traits.h>
#include <md4qt/parser.h>
#include <md4qt/html.h>

int main()
{
    MD::Parser<MD::QStringTrait> p;

    auto doc = p.parse(QStringLiteral("your_markdown.md"));

    const auto html = MD::toHtml(doc);

    return 0;
}
```

The function is declared in `html.h` as `Trait::String toHtml(std::shared_ptr<Document<Trait>> doc, bool wrapInBodyTag = true, const typename Trait::String &hrefForRefBackImage = {}, bool wrapInArticle = true)` and converts a `Document` to `HTML`.
How can I obtain positions of blocks/elements in a Markdown file?
- Done in version `2.0.5`. Remember that all positions in `md4qt` start at 0: the first symbol on the first line has coordinates `(0,0)`. Another important point is that all position ranges in `md4qt` are inclusive, meaning that the last column of any element points to the last symbol of that element.
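The inclusive convention means a range's length is `end - start + 1`, not `end - start`. The sketch below uses a hypothetical `Range` struct (not md4qt's position class) to extract the text of a single-line element from the source lines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical position record following the convention above:
// 0-based lines and columns, end column inclusive.
struct Range {
    long long startColumn;
    long long startLine;
    long long endColumn;
    long long endLine;
};

// Returns the text covered by a single-line range (startLine == endLine).
inline std::string textOf(const std::vector<std::string> &lines, const Range &r)
{
    // Inclusive ends: the element spans end - start + 1 characters.
    const long long length = r.endColumn - r.startColumn + 1;
    return lines.at(static_cast<std::size_t>(r.startLine))
        .substr(static_cast<std::size_t>(r.startColumn),
                static_cast<std::size_t>(length));
}
```

For the line `Some **bold** text`, the range `{5, 0, 12, 0}` covers `**bold**`: eight characters, from column 5 through column 12 inclusive.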
How can I easily traverse through the MD::Document?
Since version `2.6.0`, the `visitor.h` header provides the `MD::Visitor` interface, with which you can easily walk through the document. All you need is to implement/override virtual methods to handle one or another element of the document.
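To illustrate the approach, here is a self-contained sketch of the visitor pattern over a hypothetical miniature AST. `Visitor`, `Node`, `Text`, `Heading`, and `HeadingCollector` are illustrative names, not md4qt's API; the real `MD::Visitor` defines its own set of callbacks and walks nested items for you, while `walk()` here only dispatches top-level nodes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature AST used only to illustrate the visitor idea.
struct Text;
struct Heading;

struct Visitor {
    virtual ~Visitor() = default;
    // Default implementations do nothing; override what you need.
    virtual void onText(const Text &) {}
    virtual void onHeading(const Heading &) {}
};

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor &v) const = 0;
};

struct Text : Node {
    std::string text;
    explicit Text(std::string t) : text(std::move(t)) {}
    void accept(Visitor &v) const override { v.onText(*this); }
};

struct Heading : Node {
    int level;
    std::vector<std::unique_ptr<Node>> children;
    explicit Heading(int l) : level(l) {}
    void accept(Visitor &v) const override { v.onHeading(*this); }
};

// Dispatches each top-level node to the matching visitor callback.
inline void walk(const std::vector<std::unique_ptr<Node>> &nodes, Visitor &v)
{
    for (const auto &node : nodes)
        node->accept(v);
}

// Collects heading titles by concatenating their Text children.
struct HeadingCollector : Visitor {
    std::vector<std::string> titles;

    void onHeading(const Heading &h) override
    {
        std::string title;
        for (const auto &child : h.children)
            if (const auto *t = dynamic_cast<const Text *>(child.get()))
                title += t->text;
        titles.push_back(title);
    }
};
```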
Why don't you have an implementation for pure STL with std::string?
- Because of performance. I made a pure `STL` implementation where the string class was `std::string` with a small third-party library to handle `UTF8`, and the benchmark showed that its performance was about the same as with `Qt6`'s `QString`, so I decided not to support a third trait. Maybe because I am lazy?
Is it possible to write a custom text plugin for this parser?
Since version `3.0.0`, `MD::Parser` has a method for adding custom text plugins:

```cpp
//! Add text plugin.
void addTextPlugin(
    //! ID of a plugin. Use TextPlugin::UserDefinedPluginID value for start ID.
    int id,
    //! Function of a plugin, that will be invoked to process raw text.
    MD::TextPluginFunc<Trait> plugin,
    //! Should this plugin be used in parsing of internals of links?
    bool processInLinks,
    //! User data that will be passed to plugin function.
    const typename Trait::StringList &userData);
```

`MD::TextPluginFunc<Trait>` is the functor type for a text plugin, defined in `parser.h`.
What is the ID of a plugin?
The `ID` of a plugin is a regular `int` that should (though this is not mandatory) start from the `MD::UserDefinedPluginID` value:

```cpp
enum TextPlugin : int {
    UnknownPluginID = 0,
    GitHubAutoLinkPluginID = 1,
    UserDefinedPluginID = 255
}; // enum TextPlugin
```

Note that plugins are invoked in order of their `ID`, from smallest to largest, so a developer can control the order of text plugins.
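The ordering rule can be modelled with an ordered map keyed by ID. `PluginRegistry` and `Plugin` below are hypothetical simplifications (a plugin only rewrites a string instead of working on a paragraph and `MD::TextParsingOpts`), but they show why registration order does not matter: only the IDs do.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified plugin: rewrites a string, receives user data.
using Plugin = std::function<void(std::string &text,
                                  const std::vector<std::string> &userData)>;

class PluginRegistry {
public:
    // Registering the same ID twice replaces the earlier plugin
    // (a choice made for this sketch).
    void add(int id, Plugin func, std::vector<std::string> userData)
    {
        m_plugins[id] = Entry{std::move(func), std::move(userData)};
    }

    // std::map iterates keys in ascending order, so plugins run from
    // the smallest ID to the largest.
    void run(std::string &text) const
    {
        for (const auto &kv : m_plugins)
            kv.second.func(text, kv.second.userData);
    }

private:
    struct Entry {
        Plugin func;
        std::vector<std::string> userData;
    };

    std::map<int, Entry> m_plugins;
};
```

Registering ID 256 before ID 255 still runs the plugin with ID 255 first.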
What is an MD::TextPluginFunc<Trait>?
A text plugin is an ordinary function with the signature

```cpp
template<class Trait>
using TextPluginFunc = std::function<void(std::shared_ptr<Paragraph<Trait>>,
                                          TextParsingOpts<Trait> &,
                                          const typename Trait::StringList &)>;
```

You will get an already parsed `MD::Paragraph` with all items in it, and you can process the remaining raw text data and check it for what you need. `MD::TextParsingOpts` is an auxiliary structure with some data. You are interested in `bool collectRefLinks;`: when this flag is `true`, the parser is in the state of collecting reference links, and at this stage a plugin can simply do nothing.

The last argument of the plugin function is the user data that was passed to the `MD::Parser::addTextPlugin()` method.

The most important thing in the `MD::TextParsingOpts` structure is `std::vector<TextData> m_rawTextData;`. This vector contains unprocessed raw text data from `Markdown`. The size of `m_rawTextData` is the same as the count of `MD::Text` items in `MD::Paragraph`, and these sizes must remain equal. So, if you replace one of the text items with something else, for example a link, the corresponding text item should be removed from both `MD::Paragraph` and `m_rawTextData`. If you replace just a part of a text item, it should be modified in both `MD::Paragraph` and `m_rawTextData`. Be careful: a mistake here is UB, and you will possibly crash.

One more thing: don't forget to set new positions for elements in `MD::Document` if you change something, and don't forget about things like `MD::ItemWithOpts::openStyles()` and `MD::ItemWithOpts::closeStyles()`. The document should remain correct after your manipulations, so that any syntax highlighter, for example, won't make a mistake.

Note that `MD::TextData` is

```cpp
struct TextData {
    typename Trait::String m_str;
    long long int m_pos = -1;
    long long int m_line = -1;
};
```

`m_pos` and `m_line` here are relative to the `MD::MdBlock<Trait> &fr;` member of `MD::TextParsingOpts`, but the document requires absolute positions in the `Markdown` text. So when you set positions on new items, use, for example, the following code:

```cpp
setEndColumn(po.fr.data.at(s.line).first.virginPos(s.pos));
```

where `s` is an object of `MD::TextData` type.
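The relative-to-absolute conversion can be modelled with a fragment whose lines remember how many leading characters were stripped and which file line they came from. This is a simplification: `FragmentLine`, `AbsolutePos`, and `toAbsolute()` are hypothetical, and md4qt's real `virginPos()` bookkeeping handles more than a single prefix offset.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fragment line: the text the parser works on, how many
// leading characters (e.g. a "> " blockquote marker) were stripped from
// the original line, and the line's number in the whole Markdown file.
struct FragmentLine {
    std::string text;
    long long strippedPrefix;
    long long originalLine;

    // Maps a column in `text` back to a column in the original line.
    long long virginPos(long long localPos) const { return localPos + strippedPrefix; }
};

struct AbsolutePos {
    long long column;
    long long line;
};

// Converts a fragment-relative (pos, line) pair, as stored in raw text
// data, into absolute coordinates in the Markdown file.
inline AbsolutePos toAbsolute(const std::vector<FragmentLine> &fragment,
                              long long localPos, long long localLine)
{
    const FragmentLine &l = fragment.at(static_cast<std::size_t>(localLine));
    return {l.virginPos(localPos), l.originalLine};
}
```

If lines 10 and 11 of the file are `> Hello @X` and `> bye`, the fragment holds `Hello @X` and `bye` with a two-character prefix stripped, and the local position `(6, 0)` of `@X` maps to column 8 on line 10.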
What is the processInLinks flag for?
The `processInLinks` flag should be set to `false` if you don't want your plugin to process link captions. For example, links can't contain other links, so if you are implementing a plugin for a new kind of link, this flag should be set to `false` for your plugin.
What is the userData argument for?
- This list of strings is passed to the plugin function. It is auxiliary data that can be handy for a plugin implementation.
Could you show an example of a plugin?
`md4qt` already has one text plugin, which handles GitHub's auto-links. The plugin function is quite simple, look:

```cpp
template<class Trait>
inline void
githubAutolinkPlugin(std::shared_ptr<Paragraph<Trait>> p,
                     TextParsingOpts<Trait> &po,
                     // User data (unused by this plugin).
                     const typename Trait::StringList &)
{
    if (!po.collectRefLinks) {
        long long int i = 0;

        while (i >= 0 && i < (long long int)po.rawTextData.size()) {
            i = processGitHubAutolinkExtension(p, po, i);

            ++i;
        }
    }
}
```

But `MD::processGitHubAutolinkExtension()` is not so trivial :) Have a look at its implementation for a good example; it's placed in `parser.h`. Good luck with writing plugins. :)
I didn't understand how raw text data correlates with a paragraph.
Let me show you with an example how raw text data correlates with a paragraph. Just two diagrams and you won't have any more questions. Look.
Suppose we want to replace every occurrence of `@X` with some kind of link. Before the modifications, we had the following.
And after your plugin has run, we should have the following.
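The before/after states can be sketched with a simplified, hypothetical model in which a paragraph is a vector of items and the k-th `Text` item corresponds to the k-th raw entry. `Kind`, `Item`, `RawText`, and `replaceAtX()` are illustrative names, and the model assumes each text item's content equals its raw text (no escapes); real plugin code must also set positions and styles on the new items.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: the k-th Text item corresponds to the k-th raw entry.
enum class Kind { Text, Link };

struct Item {
    Kind kind;
    std::string text;
};

struct RawText {
    std::string str;
    long long pos = -1;  // Column of str within its fragment line.
    long long line = -1; // Fragment line of str.
};

// Counts Text items; after any edit this must equal the raw data size.
inline std::size_t countText(const std::vector<Item> &items)
{
    std::size_t n = 0;
    for (const auto &item : items)
        if (item.kind == Kind::Text)
            ++n;
    return n;
}

// Replaces every "@X" with a Link item, splitting the surrounding text so
// that Text items and raw entries stay in one-to-one correspondence.
inline void replaceAtX(std::vector<Item> &items, std::vector<RawText> &raw)
{
    std::vector<Item> newItems;
    std::vector<RawText> newRaw;
    std::size_t k = 0; // Raw entry matching the current Text item.

    // Emits a Text item and its raw entry for r.str[from, from + len).
    auto emitText = [&](const RawText &r, std::size_t from, std::size_t len) {
        const std::string part = r.str.substr(from, len);
        newItems.push_back({Kind::Text, part});
        newRaw.push_back({part, r.pos + static_cast<long long>(from), r.line});
    };

    for (const auto &item : items) {
        if (item.kind != Kind::Text) {
            newItems.push_back(item);
            continue;
        }

        const RawText &r = raw.at(k++);
        std::size_t from = 0;
        std::size_t at = 0;

        while ((at = r.str.find("@X", from)) != std::string::npos) {
            if (at > from)
                emitText(r, from, at - from);
            newItems.push_back({Kind::Link, "@X"});
            from = at + 2;
        }

        if (from < r.str.size())
            emitText(r, from, r.str.size() - from);
    }

    items = std::move(newItems);
    raw = std::move(newRaw);
}
```

For the single text item `Hi @X and @X!`, the result is five items (`Hi `, link, ` and `, link, `!`) and three raw entries at positions 0, 5 and 12, so the count of text items still equals the number of raw entries.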
How can I get a string of MD::StyleDelim?
Since version `3.0.0`, there is a function to get a substring from a text fragment by given virgin positions:

```cpp
template<class Trait>
inline typename Trait::String
virginSubstr(const MdBlock<Trait> &fr, const WithPosition &virginPos);
```

And a function to get a local position from a virgin one:

```cpp
template<class Trait>
inline std::pair<long long int, long long int>
localPosFromVirgin(const MdBlock<Trait> &fr,
                   long long int virginColumn,
                   long long int virginLine);
```
Is it possible to find a Markdown item by its position?
- Since version `3.0.0`, there is a new structure, `MD::PosCache`. You can pass an `MD::Document` into its `MD::PosCache::initialize()` method and then, with the `MD::PosCache::findFirstInCache()` method, find the first item at a given position together with all its nested first children.
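The lookup described above can be sketched over a hypothetical tree whose nodes carry inclusive, 0-based ranges. `Node` and `findFirst()` are illustrative names; md4qt's `MD::PosCache` works on a real `MD::Document` after `initialize()`, whereas this sketch simply scans each level linearly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node with an inclusive, 0-based position range.
struct Node {
    std::string name;
    long long startColumn;
    long long startLine;
    long long endColumn;
    long long endLine;
    std::vector<Node> children;

    // True if (column, line) lies inside the inclusive range.
    bool contains(long long column, long long line) const
    {
        const bool afterStart =
            line > startLine || (line == startLine && column >= startColumn);
        const bool beforeEnd =
            line < endLine || (line == endLine && column <= endColumn);
        return afterStart && beforeEnd;
    }
};

// Returns the names of the first top-level item containing (column, line),
// followed by its nested first children that also contain it.
inline std::vector<std::string> findFirst(const std::vector<Node> &items,
                                          long long column, long long line)
{
    std::vector<std::string> chain;
    const std::vector<Node> *level = &items;
    bool found = true;

    while (found) {
        found = false;
        for (const Node &node : *level) {
            if (node.contains(column, line)) {
                chain.push_back(node.name);
                level = &node.children; // Descend into this node's children.
                found = true;
                break;
            }
        }
    }

    return chain;
}
```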
How can I walk through the document and find all items of a given type?
Since version `3.0.0`, there is the `MD::forEach()` algorithm:

```cpp
//! Calls function for each item in the document with the given type.
template<class Trait>
inline void
forEach(
    //! Vector of item's types to be processed.
    const typename Trait::template Vector<ItemType> &types,
    //! Document.
    std::shared_ptr<Document<Trait>> doc,
    //! Functor object.
    ItemFunctor<Trait> func,
    //! Maximum nesting level.
    //! 0 means infinity, 1 - only top level items...
    unsigned int maxNestingLevel = 0);
```
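A recursive sketch of the same contract, under stated assumptions: `ItemType`, `Item`, and `forEachItem()` are hypothetical stand-ins for md4qt's types, and nesting levels are counted from 1 for top-level items, so `maxNestingLevel == 1` visits only top-level items and 0 means no limit, as in the documented parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical tree; md4qt's MD::forEach works on MD::Document items.
enum class ItemType { Paragraph, Text, Link, List };

struct Item {
    ItemType type;
    std::string name;
    std::vector<Item> children;
};

// Calls func for every item whose type is listed in `types`.
// maxNestingLevel: 0 means unlimited, 1 means top-level items only, etc.
// `level` is the depth of `items` (1 for the top level); callers normally
// leave it at its default.
inline void forEachItem(const std::vector<ItemType> &types,
                        const std::vector<Item> &items,
                        const std::function<void(const Item &)> &func,
                        unsigned int maxNestingLevel = 0,
                        unsigned int level = 1)
{
    for (const Item &item : items) {
        if (std::find(types.begin(), types.end(), item.type) != types.end())
            func(item);

        if (maxNestingLevel == 0 || level < maxNestingLevel)
            forEachItem(types, item.children, func, maxNestingLevel, level + 1);
    }
}
```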
How can I add and process a custom (user-defined) item in MD::Document?
- Since version `3.0.0`, the `MD::ItemType` enum has a `UserDefined` enumerator. So you can inherit from any `MD::Item` class and return from the `type()` method a value greater than or equal to `MD::ItemType::UserDefined`. To handle user-defined item types, the `MD::Visitor` class now has the method `void onUserDefined(MD::Item<Trait> *item)`. So you can handle your custom items and do what you need.
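A minimal model of this extension point, with hypothetical names throughout (`ItemType`, `Item`, `Mention`, `Visitor`, `MentionCollector`): built-in items return types below `UserDefined`, a custom item returns a value at or above it, and the visitor routes such items to `onUserDefined()`. In md4qt the values are `MD::ItemType` enumerators and the documented `onUserDefined()` takes an `MD::Item<Trait> *`; this sketch uses plain `int` values and references for brevity.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type codes: built-ins below UserDefined, custom items at or
// above it. The numeric values are illustrative only.
enum class ItemType : int { Text = 0, Paragraph = 1, UserDefined = 255 };

struct Item {
    virtual ~Item() = default;
    virtual int type() const = 0;
};

struct Text : Item {
    int type() const override { return static_cast<int>(ItemType::Text); }
};

// A custom item: its type lies at or above UserDefined.
struct Mention : Item {
    std::string user;
    explicit Mention(std::string u) : user(std::move(u)) {}
    int type() const override { return static_cast<int>(ItemType::UserDefined) + 1; }
};

struct Visitor {
    virtual ~Visitor() = default;
    virtual void onText(const Text &) {}
    // Receives every item whose type is user-defined.
    virtual void onUserDefined(const Item &) {}

    void visit(const Item &item)
    {
        if (item.type() >= static_cast<int>(ItemType::UserDefined))
            onUserDefined(item);
        else if (item.type() == static_cast<int>(ItemType::Text))
            onText(static_cast<const Text &>(item));
    }
};

// Collects the user names of Mention items.
struct MentionCollector : Visitor {
    std::vector<std::string> users;

    void onUserDefined(const Item &item) override
    {
        if (const auto *m = dynamic_cast<const Mention *>(&item))
            users.push_back(m->user);
    }
};
```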
Documentation copyright © 1996-2025 The KDE developers.
Generated on Fri May 2 2025 12:05:26 by doxygen 1.13.2 written by Dimitri van Heesch, © 1997-2006