Turnerj

What is Microdata and why should I care?

Jan 29, 2020

To get this out of the way: Microdata is NOT related to Microservices. It's not some paradigm shift in handling or processing data. Microdata is one of three popular formats used for describing content within a web page, the other two being RDFa (Resource Description Framework in Attributes) and JSON-LD (JavaScript Object Notation for Linked Data). All three are primarily used for Search Engine Optimization (SEO); however, that isn't their sole purpose.

Alongside Microdata and RDFa, there is also Open Graph, which was made popular by Facebook. While it does allow describing data and is widely used by social media websites, it is more limited in what it can describe and doesn't flow into the natural HTML of the page like Microdata or RDFa do.

In this post, we will walk through an example of Microdata as seen on dev.to - specifically, the syndicated version of this very post. A typical post on DEV uses Microdata to describe this article, the cover image, the author and the publisher. Let's have a closer look...

<article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity">
      <meta itemprop="url" content="https://dev.to/turnerj/what-is-microdata-and-why-should-i-care-23jk">
      <meta itemprop="image" content="https://res.cloudinary.com/practicaldev/image/fetch/s--dVn-CraX--/c_imagga_scale,f_auto,fl_progressive,h_500,q_auto,w_1000/https://dev-to-uploads.s3.amazonaws.com/i/9ofe2id7kzynypzdkdps.png">
      <div itemprop="publisher" itemscope itemtype="https://schema.org/Organization">
        <div itemprop="logo" itemscope itemtype="https://schema.org/ImageObject">
          <meta itemprop="url" content="https://practicaldev-herokuapp-com.freetls.fastly.net/assets/android-icon-192x192-0409854849dca4043b26f85039b8c3d42cbac2bd8793fec1004eb389fa153877.png">
          <meta itemprop="width" content="192">
          <meta itemprop="height" content="192">
        </div>
        <meta itemprop="name" content="DEV Community">
      </div>
      <header class="title" id="main-title">
        <h1 class="medium" itemprop="name headline">
          What is Microdata and why should I care?
        </h1>
        <h3>
          <span itemprop="author" itemscope itemtype="http://schema.org/Person">
            <meta itemprop="url" content="https://dev.to/turnerj">
            <a href="/turnerj" class="author">
              <img class="profile-pic" src="https://res.cloudinary.com/practicaldev/image/fetch/s--erE_cpgk--/c_fill,f_auto,fl_progressive,h_50,q_auto,w_50/https://dev-to-uploads.s3.amazonaws.com/uploads/user/profile_image/95629/bd0aa8b6-0c56-4a69-a2cf-77e6d484e77c.jpeg" alt="turnerj profile image" />
              <span itemprop="name">James Turner</span>
            </a>
          </span>
        </h3>
          <div class="tags">
              <a class="tag" href="/t/webdev" style="background-color:#562765;color:#ffffff">#webdev</a>
          </div>
      </header>
      <div class="body" data-article-id="250783" id="article-body" itemprop="articleBody">
        <p>To get this out of the way, no, Microdata is not related to Microservices. Its not some paradigm shift with handling or processing data. Microdata is one of 3 distinct formats used for describing content within a web page - the two others being RDFa and JSON-LD.</p>

...

There is a lot of stuff going on there! Let's break it down into smaller chunks.

<article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity">

We have our <article> HTML tag with a few interesting attributes.

  • itemscope: Think of this as saying "I'm an object with sub-properties". Inside the article tag, we will find more properties.
  • itemtype: This describes the type that the child properties belong to. It is only valid on an element that also has itemscope, so you'll usually see itemscope and itemtype on the same tag.
  • itemprop: This names the property of the enclosing object that this object or value is assigned to.

You might have a few questions like "What is schema.org?" and "Why does the article have an itemprop set - what is it even set to?".

Firstly, schema.org is a vocabulary for describing types: it defines the types you can choose from and the properties you can set on them. It is a community effort founded by Google, Microsoft, Yahoo and Yandex to help describe the web. While you will likely find many examples of Microdata, RDFa and JSON-LD using schema.org, these formats aren't tied to it. They can use any vocabulary, as long as the third party consuming the data understands it. However, for the purposes of this article, I will keep referring to types as defined by schema.org.

As for itemprop existing on the article tag when there is no parent element with an itemscope: a web page can be thought of as implicitly being a schema.org WebPage. The mainEntity property marks the article as the primary object of that web page.

So what do we know now? We have an article of type Article, as defined by schema.org, which is the main entity of the web page. Right now that isn't a lot of information, so let's keep digging...

<article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity">
      <meta itemprop="url" content="https://dev.to/turnerj/what-is-microdata-and-why-should-i-care-3o3o">
      <meta itemprop="image" content="https://res.cloudinary.com/practicaldev/image/fetch/s--dVn-CraX--/c_imagga_scale,f_auto,fl_progressive,h_500,q_auto,w_1000/https://dev-to-uploads.s3.amazonaws.com/i/9ofe2id7kzynypzdkdps.png">

Underneath the article tag there are... meta tags?! It might seem unusual, but meta tags used this way let Microdata (or RDFa) describe information that isn't displayed on the page. The second of the two meta tags actually points at the cover image (which is displayed), so whether to use a meta tag or annotate the visible element really comes down to personal preference. Either way, these two tags describe the url and image of the article.
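To make the meta tag behaviour a little more concrete, here is a rough Python sketch of how a consumer might decide where a property's value comes from. The function name property_value and the lookup table are hypothetical names made up for this illustration. The tag-to-attribute mapping follows the HTML Microdata rules as I understand them, simplified: URLs are not resolved, text is trimmed for convenience, and elements that carry itemscope (whose value is the nested object itself) are not handled here.

```python
# Simplified sketch of the Microdata "property value" rule.
# Names here are hypothetical; this is not a complete implementation.
ATTRIBUTE_FOR_TAG = {
    "meta": "content",
    "a": "href", "area": "href", "link": "href",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "object": "data",
    "data": "value", "meter": "value",
}


def property_value(tag, attrs, text=""):
    """Return the value a consumer would read from one itemprop element."""
    tag = tag.lower()
    if tag == "time":
        # <time> prefers its datetime attribute but falls back to its text.
        return attrs.get("datetime", text.strip())
    if tag in ATTRIBUTE_FOR_TAG:
        # These elements carry their value in an attribute, not in text.
        return attrs.get(ATTRIBUTE_FOR_TAG[tag], "")
    # Everything else uses the text inside the element.
    return text.strip()


print(property_value("meta", {"itemprop": "width", "content": "192"}))
print(property_value("span", {"itemprop": "name"}, "James Turner"))
```

This is why a meta tag with a content attribute and a visible element with text can describe the same kind of property.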

      <div itemprop="publisher" itemscope itemtype="https://schema.org/Organization">
        <div itemprop="logo" itemscope itemtype="https://schema.org/ImageObject">
          <meta itemprop="url" content="https://practicaldev-herokuapp-com.freetls.fastly.net/assets/android-icon-192x192-0409854849dca4043b26f85039b8c3d42cbac2bd8793fec1004eb389fa153877.png">
          <meta itemprop="width" content="192">
          <meta itemprop="height" content="192">
        </div>
        <meta itemprop="name" content="DEV Community">
      </div>

What we have here is the publisher property defined as an Organization type. Because it has the itemscope attribute, we know it's an object with its own properties (though as I noted earlier, having itemtype effectively gives this away too).

This Organization has a logo (an ImageObject type), which has a number of its own properties too, including the url, width and height of the logo.

We can also see the name of the Organization is "DEV Community".

      <header class="title" id="main-title">
        <h1 class="medium" itemprop="name headline">
          What is Microdata and why should I care?
        </h1>
        <h3>
          <span itemprop="author" itemscope itemtype="http://schema.org/Person">
            <meta itemprop="url" content="https://dev.to/turnerj">
            <a href="/turnerj" class="author">
              <img class="profile-pic" src="https://res.cloudinary.com/practicaldev/image/fetch/s--erE_cpgk--/c_fill,f_auto,fl_progressive,h_50,q_auto,w_50/https://dev-to-uploads.s3.amazonaws.com/uploads/user/profile_image/95629/bd0aa8b6-0c56-4a69-a2cf-77e6d484e77c.jpeg" alt="turnerj profile image" />
              <span itemprop="name">James Turner</span>
            </a>
          </span>
        </h3>

Skipping right along, we see the H1 tag's itemprop describe what looks like two different properties. Yes, that is right: itemprop accepts a space-separated list of names, which lets one element set several properties at once. In this case, "What is Microdata and why should I care?" is set as both the name and the headline of the Article.

Then we have another property, author, as an object with nested properties of its own: url and name.
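As a tiny illustration of the space-separated itemprop, a consumer could split the attribute and store the same value under each name. The helper assign_property below is a hypothetical name, and keeping values in lists is just one possible design (it leaves room for a property that appears more than once).

```python
def assign_property(item, itemprop, value):
    """Store value under every name listed in a space-separated itemprop."""
    for name in itemprop.split():
        item.setdefault(name, []).append(value)
    return item


article = assign_property({}, "name headline", "What is Microdata and why should I care?")
print(sorted(article))  # the two property names that were set
```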

Finally on our journey of discovery, we have this:

      <div class="body" data-article-id="250783" id="article-body" itemprop="articleBody">
        <p>To get this out of the way, no, Microdata is not related to Microservices. Its not some paradigm shift with handling or processing data. Microdata is one of 3 distinct formats used for describing content within a web page - the two others being RDFa and JSON-LD.</p>

A nice simple property, articleBody, which marks the content of that element as the entire body of the article.

What does this all mean? If we parsed this page with an understanding of Microdata, we'd know a lot of specific details about the article, the author, the publisher and the content. Because the web page points these details out in a standardised fashion, it is easier for anyone who would benefit from this detailed data to get at it.
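To give a feel for what "parsing with an understanding of Microdata" could look like, here is a minimal sketch using Python's standard html.parser module. The class MicrodataSketch, the trimmed-down SAMPLE markup and the dictionary layout are all assumptions made for illustration. It expects well-formed HTML, reads values from only a handful of attributes, and treats the page as an implicit root object, following the mental model used earlier in this post. A real consumer would want a proper Microdata library.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose value comes from an attribute instead of their text (subset).
VALUE_ATTRIBUTE = {"meta": "content", "a": "href", "link": "href", "img": "src"}


class MicrodataSketch(HTMLParser):
    """Collect Microdata items from well-formed HTML into nested dicts."""

    def __init__(self):
        super().__init__()
        self.page = {"type": None, "properties": {}}  # implicit root (assumption)
        self.top_level = []       # itemscope elements that have no itemprop
        self.items = [self.page]  # stack of items currently open
        self.open_tags = []       # one entry per open non-void element

    def _add(self, names, value):
        for name in names:
            self.items[-1]["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        names = (attrs.get("itemprop") or "").split()
        entry = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if names:
                self._add(names, item)
            else:
                self.top_level.append(item)
            if tag not in VOID_TAGS:
                self.items.append(item)
                entry = "item"
        elif names and tag in VALUE_ATTRIBUTE:
            self._add(names, attrs.get(VALUE_ATTRIBUTE[tag]) or "")
        elif names:
            entry = (names, [])  # gather text until the matching end tag
        if tag not in VOID_TAGS:
            self.open_tags.append(entry)

    def handle_data(self, data):
        for entry in self.open_tags:
            if isinstance(entry, tuple):
                entry[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_tags:
            return
        entry = self.open_tags.pop()
        if entry == "item":
            self.items.pop()
        elif isinstance(entry, tuple):
            names, chunks = entry
            self._add(names, " ".join("".join(chunks).split()))


# Trimmed-down version of the markup discussed above.
SAMPLE = """
<article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity">
  <div itemprop="publisher" itemscope itemtype="https://schema.org/Organization">
    <meta itemprop="name" content="DEV Community">
  </div>
  <h1 itemprop="name headline">
    What is Microdata and why should I care?
  </h1>
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <meta itemprop="url" content="https://dev.to/turnerj">
    <a href="/turnerj"><span itemprop="name">James Turner</span></a>
  </span>
</article>
"""

parser = MicrodataSketch()
parser.feed(SAMPLE)
parser.close()
article = parser.page["properties"]["mainEntity"][0]
print(article["type"])
print(article["properties"]["headline"])
print(article["properties"]["author"][0]["properties"])
```

The stack of open items is what lets nested objects such as the publisher and author collect their own properties without leaking them into the article.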

But who does benefit from this data? Why should I care about Microdata?

Have you used Siri, Alexa or Google's voice assistant? Have you used shopping/price tracking websites to find the lowest price for a product? Have you searched for something on Google and seen the details pane on the right?

Google Search Details Pane

Services and features like these rely on data, and while some (particularly voice assistants) might rely on dedicated APIs for their data, others need to effectively scrape the web for it. With every website having different HTML, class names and structures, pulling valuable information out of a page is difficult.

Microdata, RDFa and JSON-LD are ways to communicate valuable information from a website in a format that other systems can interpret. As these formats embed directly in regular HTML, communicating this detail doesn't require a paradigm shift in how things are built.

One of the biggest benefits I personally see with structured data is the decentralization of data: individual websites can publish their data in a more structured way, allowing any number of third-party tools to consume it.

Whether it is to build more advanced voice assistants, better price trackers or smarter search engines, structured data (through formats like Microdata) provides the tools of the future with a standardised way to read the web.
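For comparison, here is a sketch of how some of the same details might be expressed as JSON-LD, built in Python and wrapped in the script tag that JSON-LD normally lives in. The selection of properties is my own and only uses values from the example above. This is an illustration, not what DEV actually emits.

```python
import json

# A few of the details from the Microdata example, as a JSON-LD document.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "What is Microdata and why should I care?",
    "author": {
        "@type": "Person",
        "name": "James Turner",
        "url": "https://dev.to/turnerj",
    },
    "publisher": {"@type": "Organization", "name": "DEV Community"},
}

json_ld = json.dumps(article, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

Unlike Microdata, the JSON-LD block sits apart from the visible markup, so the same facts end up stated twice on the page: once for readers and once for machines.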

Additional Resources