Adventures in Markdown, RegExes, and Line Termination

Posted on November 13, 2022

I’m a big fan of markdown. From pull request descriptions to blog posts, I use it almost daily.

Teradata, my current employer, writes its in-product help in markdown and displays that markdown to the customer with the markdown-navigator component from our open source UI platform, Covalent.

Last week, a bug came in from our integration testing team complaining that a list of hyperlinks was not properly rendering in a page of our in-product help.

Essentially, the list of links looked like:

[Link 1 title](www.example.com)
------------------------------------
[Link 2 title](www.teradata.com)
------------------------------------
[Link 3 title](docs.teradata.com)

If you are familiar with markdown, you’d recognize the usage of the [title](url) syntax.

Syntax Error?

After a cursory investigation, I replicated the bug and also found that lists of links on other pages were working properly. I immediately jumped to the conclusion that a syntax error was causing the problem, but I found none upon inspection.

I was stumped. How could two files with the exact same syntax render so differently?

Digging Deeper

I dove into the source code of Covalent’s markdown component, which uses showdown.js to render markdown as HTML. However, before passing the markdown content to showdown, Covalent replaces specific markdown elements with custom components to fit Teradata brand and style guidelines. This includes code blocks and lists.

These custom components are pulled out of the markdown with a few regular expressions. It must be that the faulty lists aren’t being identified by the RegExes! Right?

Nope! I was surprised to find that the broken lists of links were actually properly identified by the regex, while the properly rendered lists were missed by the regex and thus showed up as vanilla markdown instead of our custom component. Everything is backwards!
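To make the extract-then-render flow concrete, here is a minimal sketch of the idea. The pattern, the placeholder format, and the `extractLists` name are all my own assumptions for illustration, not Covalent's actual code: list blocks are pulled out with a regex so a custom component can handle them, and whatever is left would go to the regular markdown renderer.

```typescript
// Hypothetical pattern: a run of bullet lines, each terminated by a bare "\n".
// Covalent's real regexes and placeholder format may differ.
const listBlock = /^(?:[-*+] .*\n)+/gm;

function extractLists(markdown: string): { remainder: string; lists: string[] } {
  const lists: string[] = [];
  // Swap each matched list block for a placeholder and remember the block,
  // so a custom component could render it separately later.
  const remainder = markdown.replace(listBlock, (block) => {
    lists.push(block);
    return `<!-- custom-list-${lists.length - 1} -->\n`;
  });
  return { remainder, lists };
}

const extracted = extractLists("Intro\n\n- a\n- b\n\nOutro\n");
```

Anything the regex fails to match simply stays in `remainder` and gets rendered as ordinary markdown, which is why a missed list still looks fine.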

Finding the guilty party

Ultimately, I discovered the two culprits of all this wonkiness:

  1. Our custom list component didn’t render internal markdown; it just displayed list content as plaintext.
  2. The regexes used to pull out custom lists didn’t account for DOS-style (Windows) line terminators.

💡 Modern *nix systems (Linux and macOS) use the newline character `\n` to denote line termination, while DOS systems (Microsoft Windows) use a carriage return and line feed in conjunction, `\r\n` (abbreviated CRLF).
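The second culprit is easy to reproduce in isolation. The pattern below is a hypothetical stand-in I wrote for this post (it is not the regex from Covalent), but it shows the general failure: a pattern that expects each list line to end in a bare `\n` matches LF files and silently misses the same content saved with CRLF.

```typescript
// Hypothetical list pattern that only accounts for "\n" line endings.
const lfOnlyList = /^(?:[-*+] .*\n)+/m;

const lfMarkdown =
  "- [Link 1 title](www.example.com)\n- [Link 2 title](www.teradata.com)\n";
// The same content as it would look when saved with Windows line endings.
const crlfMarkdown = lfMarkdown.replace(/\n/g, "\r\n");

// In JavaScript, "." does not match "\r", so ".*" stops just before the
// carriage return and the following "\n" in the pattern cannot match it.
const matchesLf = lfOnlyList.test(lfMarkdown);
const matchesCrlf = lfOnlyList.test(crlfMarkdown);
```

Here `matchesLf` is `true` and `matchesCrlf` is `false`, even though both strings look identical in an editor.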

Therefore, markdown files with CRLF line terminators (~70% of our in-product help files) were skipped by the regexes and rendered as vanilla markdown, links and all. Files with plain `\n` terminators were matched, and the custom list component displayed their list item content as plaintext instead of converting links to <a> tags in the HTML output.

Root Cause Analysis

After asking around, it seemed that neither the other UI engineers nor the Help Content Team were aware of the custom list component, and I couldn’t get an answer for why it existed in the first place. So I checked the git blame to find its inception. The PR that added the custom list component illuminated its use: the list component was only meant for Covalent’s own documentation site, to style API inputs and outputs. The custom list was never meant to be shipped as part of our in-product help.

Implementing a fix

To fix this bug, I added an optional input parameter to Covalent’s markdown component to enable the custom list, and defaulted it to false. That way, I could toggle the list on for Covalent’s documentation, but leave it disabled for our in-product help. You can see the full extent of my changes here: https://github.com/Teradata/covalent/pull/1984
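The shape of that fix is roughly an opt-in flag. The sketch below uses names I made up for illustration (`useCustomList`, `chooseListRenderer`); the real input name and wiring live in the PR, and this is only meant to show the "optional, defaults to false" idea.

```typescript
// Hypothetical option and function names, for illustration only.
interface MarkdownRenderOptions {
  // Opt-in flag: when omitted, the custom list component stays disabled.
  useCustomList?: boolean;
}

type ListRenderer = "custom-list" | "vanilla-markdown";

function chooseListRenderer(options: MarkdownRenderOptions = {}): ListRenderer {
  // Defaulting to false means existing callers keep plain markdown lists.
  const { useCustomList = false } = options;
  return useCustomList ? "custom-list" : "vanilla-markdown";
}

const docsSite = chooseListRenderer({ useCustomList: true });
const inProductHelp = chooseListRenderer();
```

Defaulting to the vanilla path means consumers who never asked for the custom list get standard markdown behavior without changing anything on their side.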

Takeaways

When writing RegExes to look for newline characters, make sure you account for those pesky \rs!
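Two common ways to do that, sketched here with hypothetical names rather than anything taken from Covalent: make the carriage return optional in the pattern itself, or normalize line endings once before any regex runs.

```typescript
// Option 1: allow an optional "\r" before each "\n" in the pattern.
const crlfAwareList = /^(?:[-*+] .*\r?\n)+/m;

// Option 2: normalize CRLF (and lone CR) to LF up front, so every
// later pattern only ever has to think about "\n".
function normalizeLineEndings(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

const windowsList = "- first\r\n- second\r\n";
const unixList = "- first\n- second\n";

const awareMatchesWindows = crlfAwareList.test(windowsList);
const awareMatchesUnix = crlfAwareList.test(unixList);
const normalized = normalizeLineEndings(windowsList);
```

Normalizing up front tends to be the lower-maintenance choice when there are several regexes, since none of them need to be touched individually.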

Thanks for reading!
