Much slower than MD4C #50

nuttyartist · 2023-08-29T16:24:38Z

Hello! Thanks for this library. I was wondering why I get such a large performance difference for the same text:

Maddy took 5304 milliseconds
Qt took 5 milliseconds

Maddy code:

std::stringstream markdownInput("some text...");
m_markdownParser->Parse(markdownInput);

Qt code:

QString markdownInput("some text...");
QTextDocument textDoc;
textDoc.setMarkdown(markdownInput);
textDoc.toHtml();

EDIT: I labeled this as a feature request by mistake.

progsource · 2023-08-29T16:44:04Z

When it comes to performance tests, several factors play into the results, for example:

Operating system
Other processes currently running on the system, which can slow down a test
How many times did you run the tests?

So currently it is difficult to know the exact reasons for your results.
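
To reduce the influence of the factors listed above, a measurement can be repeated and summarized instead of taken once. The following is only a minimal sketch, not something from maddy or MD4C: `medianMilliseconds` is a hypothetical helper name, and the callable passed to it stands in for whatever parser call is being measured.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` `runs` times and returns the median
// wall-clock duration in milliseconds (the upper middle sample when `runs`
// is even). A median is less sensitive to one-off stalls than a single run.
template <typename Work>
double medianMilliseconds(Work&& work, std::size_t runs)
{
    assert(runs > 0);

    std::vector<double> samples;
    samples.reserve(runs);

    for (std::size_t i = 0; i < runs; ++i) {
        // steady_clock is monotonic, so it is not affected by system clock changes.
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Each parser would be wrapped in its own lambda and measured with the same input and the same build type (Debug or Release), so that the numbers are comparable.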

Besides that, maddy's regex-based approach might be slowing down Markdown processing. In version 2 I plan to drop regex in favor of another approach, which will hopefully speed maddy up (and which I will, of course, benchmark).
Until then, maddy might not be the fastest solution.

I work on version 2 every now and then, but I cannot commit to a release date yet, because of real life and maddy being a side project.

Of course, if somebody finds a way to speed things up a little in the meantime, I'm always happy to receive contributions.

nuttyartist · 2023-09-01T07:24:43Z

Excuse my late reply. Here's a reproducible test with the first chapter of Moby Dick in Markdown: https://gist.github.com/nuttyartist/cb0053ccda823ac98a7ce58f296269cc

I got fairly consistent results:
In Debug mode:

Maddy took 84380 milliseconds
MD4C took 0 milliseconds

In Release mode:

Maddy took 17552 milliseconds
MD4C took 0 milliseconds

EDIT: I edited the title after realizing Qt is using MD4C underneath.

vedderb · 2023-11-13T14:25:22Z

I ran into the performance issue too, and for me it makes maddy almost unusable. After some profiling and testing, I found that the culprits are the following parsers:

EMPHASIZED_PARSER
ITALIC_PARSER
STRIKETHROUGH_PARSER
STRONG_PARSER

What they have in common is a long regexp that seems to take a long time to evaluate. I don't know if this breaks anything, but I replaced them with the following loops:

EmphasizedParser

void
  Parse(std::string& line) override
  {
      const std::string pattern = "_";
      const std::string newPattern = "em";

      // Replace each "_..._" pair with <em>...</em> until no pair is left.
      for (;;) {
          const std::size_t patlen = pattern.size();

          const auto pos1 = line.find(pattern);
          if (pos1 == std::string::npos) {
              break;
          }

          // An opening delimiter without a closing one is left untouched.
          const auto pos2 = line.find(pattern, pos1 + patlen);
          if (pos2 == std::string::npos) {
              break;
          }

          const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
          line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
      }
  }

ItalicParser

void
  Parse(std::string& line) override
  {
      const std::string pattern = "*";
      const std::string newPattern = "i";

      // Replace each "*...*" pair with <i>...</i> until no pair is left.
      for (;;) {
          const std::size_t patlen = pattern.size();

          const auto pos1 = line.find(pattern);
          if (pos1 == std::string::npos) {
              break;
          }

          // An opening delimiter without a closing one is left untouched.
          const auto pos2 = line.find(pattern, pos1 + patlen);
          if (pos2 == std::string::npos) {
              break;
          }

          const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
          line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
      }
  }

StrikeThroughParser

void
  Parse(std::string& line) override
  {
      const std::string pattern = "~~";
      const std::string newPattern = "s";

      // Replace each "~~...~~" pair with <s>...</s> until no pair is left.
      for (;;) {
          const std::size_t patlen = pattern.size();

          const auto pos1 = line.find(pattern);
          if (pos1 == std::string::npos) {
              break;
          }

          // An opening delimiter without a closing one is left untouched.
          const auto pos2 = line.find(pattern, pos1 + patlen);
          if (pos2 == std::string::npos) {
              break;
          }

          const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
          line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
      }
  }

StrongParser

void
  Parse(std::string& line) override
  {
      std::string pattern = "**";
      const std::string newPattern = "strong";

      // First pass: replace each "**...**" pair with <strong>...</strong>.
      for (;;) {
          const std::size_t patlen = pattern.size();

          const auto pos1 = line.find(pattern);
          if (pos1 == std::string::npos) {
              break;
          }

          // An opening delimiter without a closing one is left untouched.
          const auto pos2 = line.find(pattern, pos1 + patlen);
          if (pos2 == std::string::npos) {
              break;
          }

          const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
          line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
      }

      // Second pass: the same replacement for "__...__" pairs.
      pattern = "__";

      for (;;) {
          const std::size_t patlen = pattern.size();

          const auto pos1 = line.find(pattern);
          if (pos1 == std::string::npos) {
              break;
          }

          const auto pos2 = line.find(pattern, pos1 + patlen);
          if (pos2 == std::string::npos) {
              break;
          }

          const std::string word = line.substr(pos1 + patlen, pos2 - pos1 - patlen);
          line.replace(pos1, (patlen + pos2) - pos1, "<" + newPattern + ">" + word + "</" + newPattern + ">");
      }
  }

I didn't measure how much faster this is, but my application went from being very laggy when parsing Markdown files to having no noticeable lag at all.

This is just a quick fix and I don't have time at the moment to clean it up and test it more, otherwise I would make a pull request. Just sharing it hoping that it is useful.
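
Since the loops above differ only in the delimiter and the tag name, they could probably be folded into one shared function. The following is a rough sketch under that assumption, not code from maddy: `replaceDelimiterPairs` is a hypothetical name, and unlike the loops above it resumes each search after the text it just inserted instead of rescanning the line from the start. Like the loops above, it does plain substring matching and ignores Markdown's finer emphasis rules.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of maddy): wraps the text between each pair
// of `delimiter` occurrences in <tag>...</tag>, modifying `line` in place.
inline void replaceDelimiterPairs(std::string& line,
                                  const std::string& delimiter,
                                  const std::string& tag)
{
    if (delimiter.empty()) {
        return;  // an empty delimiter would match everywhere
    }

    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::size_t len = delimiter.size();

    std::size_t searchFrom = 0;
    for (;;) {
        const std::size_t first = line.find(delimiter, searchFrom);
        if (first == std::string::npos) {
            break;
        }

        // An opening delimiter without a closing one is left untouched.
        const std::size_t second = line.find(delimiter, first + len);
        if (second == std::string::npos) {
            break;
        }

        const std::string replacement =
            open + line.substr(first + len, second - first - len) + close;
        line.replace(first, second + len - first, replacement);

        // Continue after the inserted text; it contains no further delimiter.
        searchFrom = first + replacement.size();
    }
}
```

With such a helper, each `Parse` override would shrink to one call, for example `replaceDelimiterPairs(line, "~~", "s");`, and the strong parser to two calls, one for `**` and one for `__`.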

nuttyartist added the feature Feature Request label Aug 29, 2023

progsource removed the feature Feature Request label Aug 29, 2023

progsource added the question label Aug 29, 2023

nuttyartist changed the title ~~Much slower than Qt~~ Much slower than MD4C Sep 1, 2023

progsource mentioned this issue Sep 3, 2023: Maddy performance #52 (Open)

vedderb mentioned this issue Nov 14, 2023: Package - Images in description aren't being displayed at all vedderb/vesc_tool#337 (Open)
