Parsing Rich Text

If you use the rich text features without our rich text rendering (that is, headless), you may need to process the content yourself. This guide also works as a refresher on recursion: our rich text representation is a hierarchical structure that can be processed recursively (some would call it an AST, an Abstract Syntax Tree).

We start by taking a closer look at an example that was created with our Rich Text Editor component and extracted from the database. From there we can work out the steps needed to produce output such as HTML.

If you received your event via the API, you may want to present the description as HTML but be unsure how to do that. Start by deserializing the description string. In JavaScript, you can use JSON.parse(event.description) for that, because the serialization format is JSON. In this guide we will develop a simple toHTML function that receives the deserialized content and outputs HTML.
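As a minimal sketch of that deserialization step, the snippet below parses a description string. The apiEvent object and its content are made up for illustration; only the JSON.parse call on the description comes from the text above.

```typescript
// Hypothetical event as it might arrive from the API. The description
// is a JSON string, not yet a structure we can walk.
const apiEvent = {
    description:
        '[{"type":"paragraph","children":[{"text":"Doors open at 7pm","bold":true}]}]',
};

// Deserialize the description into plain JavaScript values.
const content = JSON.parse(apiEvent.description);

console.log(Array.isArray(content)); // true
console.log(content[0].type); // paragraph
```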

Discovering the data structure

After deserializing the input, you will discover that the root of the structure is an array that contains all other objects (which we will call nodes from here on). Your specific description may contain a variety of nodes, some nested inside each other, so we will go through every kind of node.

{
    "text": "This is a text fragment",
    "bold": true,
    "italic": true
}

The most basic node is called a Text Node. Such a node holds text with some basic styling (bold and italic). It only ever appears as a leaf in our AST, so it has no children.

{
    "type": "link",
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "children": []
}

Because we might want to link to other resources (we are on the Internet after all), there is the Link Node. It is also the first kind of node that can have children. These two are the only kinds of nodes that carry information the user will actually see. All the following nodes exist to structure the content that Link Nodes and Text Nodes provide.

The following node is called List Item Node and cannot occur without specific parents (so you won't find this node in the root array). Because the following nodes are all pretty simple in the properties they hold, we will present them like so:

{ "type": "list-item", "children": [] }

The kind of node that can be the parent of a List Item Node is the List Node, either as an Unordered List Node or an Ordered List Node. In fact, these nodes can only hold List Item Nodes, so they are restricted in the kind of node they can have as children. The List Item Node itself is restricted only in the kind of parent it can have, not in the kind of node it may have as children.

{ "type": "ordered-list", "children": [] }
{ "type": "unordered-list", "children": [] }
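To make the nesting rule concrete, here is a small hand-written ordered list, typed in TypeScript. The content is invented for illustration, and the item children are simplified to Text Nodes to keep the sketch short.

```typescript
type TextNode = { text: string; bold?: boolean; italic?: boolean };
// Simplification for this sketch: items only hold Text Nodes here.
type ItemNode = { type: "list-item"; children: TextNode[] };
type ListNode = { type: "ordered-list" | "unordered-list"; children: ItemNode[] };

const agenda: ListNode = {
    type: "ordered-list",
    children: [
        { type: "list-item", children: [{ text: "Welcome" }] },
        { type: "list-item", children: [{ text: "Keynote", bold: true }] },
    ],
};

// Every direct child of a list is a list item.
const onlyItems = agenda.children.every((child) => child.type === "list-item");
console.log(onlyItems); // true
```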

The next kind of node is the Subheading Node, which comes in two levels.

{ "type": "subheading-1", "children": [] }
{ "type": "subheading-2", "children": [] }

The last kind of node is the Paragraph Node, a node that creates structure.

{ "type": "paragraph", "children": [] }

type TextNode = {
    text: string;
    bold?: boolean;
    italic?: boolean;
};

type LinkNode = {
    type: "link";
    url: string;
    children: Node[];
};

type ItemNode = {
    type: "list-item";
    children: Node[];
};

type ListNode = {
    type: "ordered-list" | "unordered-list";
    children: ItemNode[];
};

type ParagraphNode = {
    type: "paragraph";
    children: Node[];
}

type SubheadingNode = { 
    type: "subheading-1" | "subheading-2";
    children: Node[];
}

type Node = 
    | TextNode 
    | LinkNode 
    | ListNode 
    | ParagraphNode
    | SubheadingNode;
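Since the structure is recursive, any traversal follows the same shape: handle the leaf, otherwise recurse into the children. As a warm-up before rendering HTML, here is a sketch that collects only the plain text of a document. The plainText function and the sample content are illustrative additions, not part of the editor's API. The union is named RichTextNode here only so that the snippet does not clash with the DOM's global Node type when it is compiled on its own.

```typescript
type TextNode = { text: string; bold?: boolean; italic?: boolean };
type LinkNode = { type: "link"; url: string; children: RichTextNode[] };
type ItemNode = { type: "list-item"; children: RichTextNode[] };
type ListNode = { type: "ordered-list" | "unordered-list"; children: ItemNode[] };
type ParagraphNode = { type: "paragraph"; children: RichTextNode[] };
type SubheadingNode = { type: "subheading-1" | "subheading-2"; children: RichTextNode[] };
type RichTextNode = TextNode | LinkNode | ListNode | ParagraphNode | SubheadingNode;

// Illustrative helper: concatenates the text of all Text Nodes, depth first.
function plainText(nodes: (RichTextNode | ItemNode)[]): string {
    return nodes
        .map((node) => ("text" in node ? node.text : plainText(node.children)))
        .join("");
}

const sample: RichTextNode[] = [
    { type: "subheading-1", children: [{ text: "Line-up" }] },
    {
        type: "paragraph",
        children: [
            { text: "Tickets at " },
            { type: "link", url: "https://example.com", children: [{ text: "our shop", bold: true }] },
        ],
    },
];

console.log(plainText(sample)); // Line-upTickets at our shop
```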

Rendering Text Nodes

We want to serialize a Text Node into a simple HTML representation. To keep things simple, we will ignore any attributes we could apply to the generated element and just wrap the text (which we will call innerHTML) in a pair of matching HTML tags. Because the text can be bold, italic, both, or not styled at all, handling every combination explicitly could lead to pretty complicated code.

With two helper functions, one that wraps content in an <i> tag (which browsers render in italics) and one that wraps it in a <strong> tag (which browsers render in bold), we can simply combine the requested stylings according to the properties present on the node.

function renderTextNode(node: TextNode): string {
    let text = node.text;
    if (node.italic) {
        text = i(text);
    }
    if (node.bold) {
        text = strong(text);
    }
    return text;
}
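To see how the stylings combine, the following sketch calls renderTextNode with a few hand-written nodes. It repeats the type and the two wrapper helpers so that it runs on its own; the sample texts are made up.

```typescript
type TextNode = { text: string; bold?: boolean; italic?: boolean };

function strong(innerHTML: string): string {
    return `<strong>${innerHTML}</strong>`;
}

function i(innerHTML: string): string {
    return `<i>${innerHTML}</i>`;
}

function renderTextNode(node: TextNode): string {
    let text = node.text;
    if (node.italic) {
        text = i(text); // italic is applied first, so it ends up innermost
    }
    if (node.bold) {
        text = strong(text);
    }
    return text;
}

console.log(renderTextNode({ text: "plain" })); // plain
console.log(renderTextNode({ text: "loud", bold: true })); // <strong>loud</strong>
console.log(renderTextNode({ text: "both", bold: true, italic: true })); // <strong><i>both</i></strong>
```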

For a later step we will need to distinguish between normal nodes and text nodes, so we will implement that right away. The difference between a text node and all other nodes is the existence of the property "text", so we will use that to our advantage.

function isTextNode(node: any): node is TextNode {
    return node.text != null;
}
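A short check shows why the guard compares against null instead of testing for a truthy value: a Text Node with an empty string is still a Text Node. The sample objects below are made up for illustration.

```typescript
type TextNode = { text: string; bold?: boolean; italic?: boolean };

function isTextNode(node: any): node is TextNode {
    // `!= null` is false only for null and undefined, so "" still passes.
    return node.text != null;
}

console.log(isTextNode({ text: "Hello" })); // true
console.log(isTextNode({ text: "" })); // true
console.log(isTextNode({ type: "paragraph", children: [] })); // false
```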

function strong(innerHTML: string) {
    return `<strong>${innerHTML}</strong>`;
}

function i(innerHTML: string) {
    return `<i>${innerHTML}</i>`;
}
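One thing the helpers above do not do is escape the text itself, so a Text Node containing characters such as < or & would be emitted as raw markup. Whether that matters depends on where your content comes from. The following is a cautious variant, not the guide's own implementation, that escapes the text before wrapping it. escapeText and renderTextNodeEscaped are hypothetical names introduced for this sketch.

```typescript
type TextNode = { text: string; bold?: boolean; italic?: boolean };

// Hypothetical helper: replaces the characters that would otherwise be
// interpreted as markup inside element content.
function escapeText(unsafe: string): string {
    return unsafe
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

function renderTextNodeEscaped(node: TextNode): string {
    let text = escapeText(node.text);
    if (node.italic) {
        text = `<i>${text}</i>`;
    }
    if (node.bold) {
        text = `<strong>${text}</strong>`;
    }
    return text;
}

console.log(renderTextNodeEscaped({ text: "Tom & Jerry <live>", bold: true }));
// <strong>Tom &amp; Jerry &lt;live&gt;</strong>
```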

Linking to resources

Now let's look at the first node that can have children, the Link Node. The a() helper further down shows how we want to serialize it into an <a> tag. Two comments on the HTML attributes we inject besides the href attribute:

target="_blank" opens the link in a new tab, so that our users do not leave our site. Because the description may be shown on the event's shop page, we do not want to encourage leaving it.
rel="nofollow" ensures that our pages' SEO ranking does not suffer if the event description links to a less reputable resource (we are signaling to search engines that we do not endorse this link).

Looking at the signature of the a() function, we see that it needs two parameters. The url parameter can be taken more or less straight from the Link Node's url property. Because the value ends up inside an HTML attribute, we pass it through escapeHtml, which replaces the characters that have a special meaning in HTML (&, <, >, and both kinds of quotes) with their entities. Note that this is HTML escaping, not URL encoding: it keeps the URL from breaking out of the href attribute but does not otherwise change it. But how do we construct the innerHTML parameter?

For that we have to keep in mind that we want to create a toHTML function in this guide. It will take an array of nodes and return a single string that encodes the nodes as HTML. Anticipating that we will succeed in implementing it, we will already use that function here in a recursive fashion. Because the children property is exactly what the toHTML function needs as a parameter, we can implement the rendering of a Link Node like so:

function renderLinkNode(node: LinkNode) {
    const innerHTML = toHTML(node.children);
    return a(innerHTML, node.url);
}

function escapeHtml(unsafe: string) {
    if(!unsafe) {
        return ""
    }
    return unsafe
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#039;");
}

function a(innerHTML: string, url: string) {
    return `<a rel="nofollow" href="${escapeHtml(url)}" target="_blank">${innerHTML}</a>`
}
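To illustrate what the escaping buys us, the sketch below feeds two hand-written URLs through a(). Both URLs are made up; the second one tries to smuggle an extra attribute into the tag.

```typescript
function escapeHtml(unsafe: string): string {
    if (!unsafe) {
        return "";
    }
    return unsafe
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#039;");
}

function a(innerHTML: string, url: string): string {
    return `<a rel="nofollow" href="${escapeHtml(url)}" target="_blank">${innerHTML}</a>`;
}

// An ordinary URL: only the ampersand is rewritten.
console.log(a("Tickets", "https://example.com/?a=1&b=2"));
// <a rel="nofollow" href="https://example.com/?a=1&amp;b=2" target="_blank">Tickets</a>

// A malicious URL: the quotes are escaped, so it stays inside href.
console.log(a("x", 'https://example.com/" onclick="alert(1)'));
// <a rel="nofollow" href="https://example.com/&quot; onclick=&quot;alert(1)" target="_blank">x</a>
```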

Rendering Subheading Nodes

function renderSubheadingOneNode(node: SubheadingNode): string {
  const innerHTML = toHTML(node.children);
  return h1(innerHTML);
}

function renderSubheadingTwoNode(node: SubheadingNode): string {
  const innerHTML = toHTML(node.children);
  return h2(innerHTML);
}
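The two functions differ only in the tag they emit, so one could also pick the tag from the node's type. A possible sketch follows; headingTag and renderSubheading are names invented for this example, and the children are assumed to have been rendered to innerHTML already.

```typescript
type SubheadingType = "subheading-1" | "subheading-2";

// Maps each subheading type to the tag used in this guide.
const headingTag: Record<SubheadingType, "h1" | "h2"> = {
    "subheading-1": "h1",
    "subheading-2": "h2",
};

function renderSubheading(type: SubheadingType, innerHTML: string): string {
    const tag = headingTag[type];
    return `<${tag}>${innerHTML}</${tag}>`;
}

console.log(renderSubheading("subheading-1", "Line-up")); // <h1>Line-up</h1>
console.log(renderSubheading("subheading-2", "Agenda")); // <h2>Agenda</h2>
```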

Building container elements

The next thing to do is to build the render functions for the remaining nodes, namely the List Item Node, the List Nodes and the Paragraph Node. While they all need innerHTML and thus might seem complicated, they follow the same pattern as the previous nodes: the innerHTML is generated recursively with toHTML and then interpolated between the respective tags.

function renderItemNode(node: ItemNode) {
    const innerHTML = toHTML(node.children);
    return li(innerHTML);
}

function renderUnorderedListNode(node: ListNode) {
    const innerHTML = toHTML(node.children);
    return ul(innerHTML);
}

function renderOrderedListNode(node: ListNode) {
    const innerHTML = toHTML(node.children);
    return ol(innerHTML);
}

function renderParagraphNode(node: ParagraphNode) {
    const innerHTML = toHTML(node.children);
    return p(innerHTML);
}

function renderSubheadingOne(node: SubheadingNode) {
    const innerHTML = toHTML(node.children);
    return h1(innerHTML);
}

function renderSubheadingTwo(node: SubheadingNode) {
    const innerHTML = toHTML(node.children);
    return h2(innerHTML);
}

function li(innerHTML: string) {
    return `<li>${innerHTML}</li>`
}

function ul(innerHTML: string) {
    return `<ul>${innerHTML}</ul>`
}

function ol(innerHTML: string) {
    return `<ol>${innerHTML}</ol>`
}

function p(innerHTML: string) {
    return `<p>${innerHTML}</p>`
}

function h1(innerHTML: string) {
    return `<h1>${innerHTML}</h1>`
}

function h2(innerHTML: string) {
    return `<h2>${innerHTML}</h2>`
}

Refactoring and ✨completing the code✨

Now that we have seen how our functions are expected to behave, we will refactor them into a single function called renderNode. Each function so far builds its innerHTML in exactly the same way, which is repetitive and adds unnecessary complexity. The renderNode function receives any kind of node and renders the correct markup. Let's see what that might look like.

function renderNode(node: Node | ItemNode): string {
    if (isTextNode(node)) {
        return renderTextNode(node);
    }

    const innerHTML = toHTML(node.children);
    switch (node.type) {
        case "link":
            return a(innerHTML, node.url);
        case "list-item":
            return li(innerHTML);
        case "ordered-list":
            return ol(innerHTML);
        case "paragraph":
            return p(innerHTML);
        case "unordered-list":
            return ul(innerHTML);
        case "subheading-1":
            return h1(innerHTML);
        case "subheading-2":
            return h2(innerHTML);
        default:
            return innerHTML;
    }
}

The last thing we need to build is the toHTML function. As already stated, the only difference from the renderNode function is that it receives an array of nodes instead of a single node. Because sibling elements in HTML simply follow one another, we can render each node with renderNode and concatenate the results. This can look like the following:

function toHTML(nodes: Node[] | ItemNode[]) {
    return nodes.map(renderNode).join("");
}
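To check that the pieces fit together, here is a condensed end-to-end sketch with a small made-up document. It is not a replacement for the full listing: a generic wrap helper stands in for the individual tag helpers, and the union is named RichTextNode only to avoid clashing with the DOM's global Node type when the snippet is compiled on its own.

```typescript
type TextNode = { text: string; bold?: boolean; italic?: boolean };
type LinkNode = { type: "link"; url: string; children: RichTextNode[] };
type ItemNode = { type: "list-item"; children: RichTextNode[] };
type ListNode = { type: "ordered-list" | "unordered-list"; children: ItemNode[] };
type ParagraphNode = { type: "paragraph"; children: RichTextNode[] };
type SubheadingNode = { type: "subheading-1" | "subheading-2"; children: RichTextNode[] };
type RichTextNode = TextNode | LinkNode | ListNode | ParagraphNode | SubheadingNode;

// Stand-in for the individual tag helpers (p, li, ul, ol, h1, h2, ...).
const wrap = (tag: string, innerHTML: string): string => `<${tag}>${innerHTML}</${tag}>`;

function escapeHtml(unsafe: string): string {
    return unsafe
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#039;");
}

function renderNode(node: RichTextNode | ItemNode): string {
    if ("text" in node) {
        const text = node.italic ? wrap("i", node.text) : node.text;
        return node.bold ? wrap("strong", text) : text;
    }
    const innerHTML = toHTML(node.children);
    switch (node.type) {
        case "link":
            return `<a rel="nofollow" href="${escapeHtml(node.url)}" target="_blank">${innerHTML}</a>`;
        case "list-item":
            return wrap("li", innerHTML);
        case "ordered-list":
            return wrap("ol", innerHTML);
        case "unordered-list":
            return wrap("ul", innerHTML);
        case "paragraph":
            return wrap("p", innerHTML);
        case "subheading-1":
            return wrap("h1", innerHTML);
        case "subheading-2":
            return wrap("h2", innerHTML);
        default:
            return innerHTML; // unknown node types fall back to their children
    }
}

function toHTML(nodes: (RichTextNode | ItemNode)[]): string {
    return nodes.map(renderNode).join("");
}

const sample: RichTextNode[] = [
    { type: "subheading-1", children: [{ text: "Line-up" }] },
    {
        type: "unordered-list",
        children: [
            { type: "list-item", children: [{ text: "Opening act", italic: true }] },
            { type: "list-item", children: [{ type: "link", url: "https://example.com/?a=1&b=2", children: [{ text: "Headliner", bold: true }] }] },
        ],
    },
];

const html = toHTML(sample);
console.log(html);
```

For this input, the heading becomes an h1 element, followed by a ul with two li children; the second item contains the link with its ampersand escaped in the href attribute.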

🎉**Aaaand that's it!**🎉 Thank you for following this guide and happy hacking!

The complete guide in TypeScript

type TextNode = {
    text: string;
    bold?: boolean;
    italic?: boolean;
};

function strong(innerHTML: string) {
    return `<strong>${innerHTML}</strong>`;
}

function i(innerHTML: string) {
    return `<i>${innerHTML}</i>`;
}

function renderTextNode(node: TextNode): string {
    let text = node.text;
    if (node.italic) {
        text = i(text);
    }
    if (node.bold) {
        text = strong(text);
    }
    return text;
}

function isTextNode(node: any): node is TextNode {
    return node.text != null;
}

type LinkNode = {
    type: "link";
    url: string;
    children: Node[];
};

function escapeHtml(unsafe: string) {
    if (!unsafe) {
        return ""
    }
    return unsafe
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#039;");
}

function a(innerHTML: string, url: string) {
    return `<a rel="nofollow" href="${escapeHtml(url)}" target="_blank">${innerHTML}</a>`
}

type ItemNode = {
    type: "list-item";
    children: Node[];
};

function li(innerHTML: string) {
    return `<li>${innerHTML}</li>`
}

type ListNode = {
    type: "ordered-list" | "unordered-list";
    children: ItemNode[];
};

function ul(innerHTML: string) {
    return `<ul>${innerHTML}</ul>`
}

function ol(innerHTML: string) {
    return `<ol>${innerHTML}</ol>`
}

type ParagraphNode = {
    type: "paragraph";
    children: Node[];
}

function p(innerHTML: string) {
    return `<p>${innerHTML}</p>`
}

type SubheadingNode = {
    type: "subheading-1" | "subheading-2";
    children: Node[];
}

function h1(innerHTML: string) {
    return `<h1>${innerHTML}</h1>`
}

function h2(innerHTML: string) {
    return `<h2>${innerHTML}</h2>`
}

type Node = 
    | TextNode
    | LinkNode
    | ListNode
    | ParagraphNode
    | SubheadingNode;

function renderNode(node: Node | ItemNode): string {
    if (isTextNode(node)) {
        return renderTextNode(node);
    }

    const innerHTML = toHTML(node.children);
    switch (node.type) {
        case "link":
            return a(innerHTML, node.url);
        case "list-item":
            return li(innerHTML);
        case "ordered-list":
            return ol(innerHTML);
        case "paragraph":
            return p(innerHTML);
        case "unordered-list":
            return ul(innerHTML);
        case "subheading-1":
            return h1(innerHTML);
        case "subheading-2":
            return h2(innerHTML);
        default:
            return innerHTML
    }
}

function toHTML(nodes: Node[] | ItemNode[]) {
    return nodes.map(renderNode).join("");
}

The complete guide in JavaScript

function strong(innerHTML) {
    return `<strong>${innerHTML}</strong>`;
}

function i(innerHTML) {
    return `<i>${innerHTML}</i>`;
}

function renderTextNode(node) {
    let text = node.text;
    if (node.italic) {
        text = i(text);
    }
    if (node.bold) {
        text = strong(text);
    }
    return text;
}

function isTextNode(node) {
    return node.text != null;
}

function escapeHtml(unsafe) {
    if (!unsafe) {
        return ""
    }
    return unsafe
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#039;");
}

function a(innerHTML, url) {
    return `<a rel="nofollow" href="${escapeHtml(url)}" target="_blank">${innerHTML}</a>`
}

function li(innerHTML) {
    return `<li>${innerHTML}</li>`
}

function ul(innerHTML) {
    return `<ul>${innerHTML}</ul>`
}

function ol(innerHTML) {
    return `<ol>${innerHTML}</ol>`
}

function p(innerHTML) {
    return `<p>${innerHTML}</p>`
}

function h1(innerHTML) {
    return `<h1>${innerHTML}</h1>`
}

function h2(innerHTML) {
    return `<h2>${innerHTML}</h2>`
}

function renderNode(node) {
    if (isTextNode(node)) {
        return renderTextNode(node);
    }

    const innerHTML = toHTML(node.children);
    switch (node.type) {
        case "link":
            return a(innerHTML, node.url);
        case "list-item":
            return li(innerHTML);
        case "ordered-list":
            return ol(innerHTML);
        case "paragraph":
            return p(innerHTML);
        case "unordered-list":
            return ul(innerHTML);
        case "subheading-1":
            return h1(innerHTML);
        case "subheading-2":
            return h2(innerHTML);
        default:
            return innerHTML;
    }
}

function toHTML(nodes) {
    return nodes.map(renderNode).join("");
}