Week 4: Probability Manipulation in SEO / by Valzorra


During our Building the World Session on Tuesday, James gave a fantastic example of how Markov Chains are used in Marketing. He illustrated the flow of customers between two brands based on data that Brand A had supposedly gathered about its customers and marketing strategies. This got me thinking about other potential uses of Markov Chains within Marketing and Software Engineering. A very exciting meeting point between the two is Search Engine Optimisation (SEO). SEO could be incredibly useful to us as students heading into an extremely competitive environment after our studies, since learning its fundamentals has the practical benefit of potentially increasing our own websites' popularity. However, that's not the most exciting part of this research. What I want to explore is which factors feed into SEO, how that data can be manipulated through the use of Markov Chains, and how the data can best be visualised. But before getting to the more fun parts, I first need to gain a better understanding of SEO.
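To make the two-brand example concrete, here is a minimal Python sketch of that kind of customer-flow model. The transition probabilities and the 50/50 starting split are invented for illustration and are not the figures James used. Each row of `P` gives the chance that a customer of one brand stays with it or switches in the next period.

```python
# A minimal two-brand Markov chain. The probabilities are invented for
# illustration; they are not the figures from the session.

def step(shares, transition):
    """Advance market shares one period: new[j] = sum over i of shares[i] * P[i][j]."""
    n = len(shares)
    return [sum(shares[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Row i: where this period's customers of brand i are next period. Rows sum to 1.
P = [
    [0.8, 0.2],  # Brand A keeps 80% of its customers and loses 20% to Brand B
    [0.3, 0.7],  # Brand B loses 30% of its customers to Brand A and keeps 70%
]

shares = [0.5, 0.5]  # assumed starting split: [Brand A, Brand B]
for period in range(1, 6):
    shares = step(shares, P)
    print(f"Period {period}: Brand A {shares[0]:.1%}, Brand B {shares[1]:.1%}")
```

With these made-up numbers the shares drift towards a steady state of 60% for Brand A and 40% for Brand B. That is the split at which the 20% of A's customers leaving each period exactly balances the 30% of B's customers arriving (0.2 × 0.6 = 0.3 × 0.4).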

On-Page Search Engine Optimisation (SEO) refers to the practice of editing individual web pages to help them gain relevance and rank higher in search engine results. On-Page SEO relates to the type of content published on a page, how user-friendly and well designed that page is, whether HTML has been used to its full potential, and more. Additionally, On-Page SEO is almost entirely under the control of the page's publisher. Although dozens of different factors go into On-Page SEO, the most significant ones are listed and examined below. It's important to note that no single factor will ensure search engine success; rather, all of these components must be utilised and well developed together.

Content

The content of a website is the single most important factor that determines its success within search engines. No matter how many Search Engine Optimisation techniques are applied, poorly researched and low-quality content is highly unlikely to climb to the top of the search results. Judging what constitutes good content can be both subjective and difficult; however, its three most noteworthy aspects are detailed below.

Quality

Quality indicates how valuable the contents of a page are to its visitors. High-quality content goes beyond what similar sites offer, satisfies users, and encourages them to stay on the page for longer. What does the page provide that users would not be able to find elsewhere? Is the information on the site distinct or useful? Those are some of the questions one may ask when judging the quality of a site's content.

Search engines use a variety of techniques to determine whether a page contains high-quality content, and user engagement metrics are key to making that judgement. When a user searches for a query, the search engine lists a series of results. If the user clicks on the first result, then immediately clicks back and moves on to the second, that suggests the first result did not satisfy them. By gathering millions of data points about the time visitors spend on a page, search engines can estimate how valuable its content is. High-quality content engages visitors and keeps them on the page for more than a few seconds. In addition to user engagement metrics, search engines also use machine learning to judge the quality of a page's content. Google's 2011 Panda update significantly changed its ranking algorithm: the company had human evaluators rate the quality of thousands of websites, then used machine learning to mirror those human evaluations. Once the system could evaluate websites in the same manner as humans, Panda was rolled out across the web, assessing every web page. The key point to remember is that Panda attempts to mimic a human point of view. Therefore, the content of a website must be designed to be valuable and useful to humans, rather than to rank higher artificially. More on this will follow shortly.
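The "click back to the results" signal described above can be sketched as a short-click rate: the share of visits to a result that end with the user returning almost immediately. This is only an illustration of the idea. Search engines do not publish how they measure engagement, so the click log, the example URLs, and the 10-second threshold below are all assumptions.

```python
from collections import defaultdict

# Hypothetical click log: (result clicked, seconds on the page before the user
# returned to the results page). None means the user never came back.
clicks = [
    ("site-a.example/donuts", 4),
    ("site-b.example/donuts", None),
    ("site-a.example/donuts", 3),
    ("site-b.example/donuts", 95),
    ("site-a.example/donuts", 120),
]

SHORT_CLICK_SECONDS = 10  # assumed threshold for an "immediate" return

def short_click_rate(log, threshold=SHORT_CLICK_SECONDS):
    """Per result, the fraction of clicks where the user bounced back within `threshold` seconds."""
    totals = defaultdict(int)
    shorts = defaultdict(int)
    for url, dwell in log:
        totals[url] += 1
        if dwell is not None and dwell < threshold:
            shorts[url] += 1
    return {url: shorts[url] / totals[url] for url in totals}

for url, rate in short_click_rate(clicks).items():
    print(f"{url}: {rate:.0%} short clicks")
```

A result that users keep abandoning within seconds (site A here) looks less satisfying than one they stay on, which is the intuition behind engagement metrics.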

Keywords and Word Choice

Keyword research, if done correctly, is an essential and high-return factor in search engine success. A website must strive to rank for the correct keywords based on its market. Researching the market's demand for specific keywords not only provides a target for search engine optimisation, but also reveals what users want and need, and how those needs change, enabling websites to adapt. Additionally, appropriate keywords are more likely to direct interested visitors to the site, rather than general users who are more likely to click away. In this way, well-chosen keywords also feed into content quality, because interested users stay on the page longer. In addition to finding the right keywords for an individual site, it's also important to use them throughout its pages in a natural manner. Flooding a page with keywords in an effort to rank higher artificially in the search results is highly ineffective. Keywords should instead flow naturally, avoiding unnecessary repetition. This makes the page easier for humans to read, making the site more user-friendly and improving the quality of its content.
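A quick way to spot keyword flooding is to measure keyword density, the share of a page's words that are the target keyword. The sketch below is a rough diagnostic only: search engines publish no "correct" density, and the two sample sentences are invented.

```python
import re

def keyword_density(text, keyword):
    """Fraction of the words in `text` that equal the single-word `keyword`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

# Hypothetical copy for the same page, written two ways.
natural = "Our donuts are baked fresh every morning with seasonal fillings."
stuffed = "Donuts donuts donuts: the best donuts, cheap donuts, donuts near you."

for label, text in [("natural", natural), ("stuffed", stuffed)]:
    print(f"{label}: {keyword_density(text, 'donuts'):.1%} of words are 'donuts'")
```

The stuffed sentence has a density above 50%, and it reads as unnatural to a human long before any crawler is involved.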

Vertical Search

A search engine performs a vertical search when it looks only for specific types of results to display. For example, Google Images is a specialised search engine that only returns images. A web page is likely to rank higher if it incorporates a variety of relevant media that vertical searches can pick up efficiently, such as images, video, news, maps, and other forms of media. However, as with keywords, all of these elements should follow the natural and logical flow of the page, and they should not be included if they are irrelevant.

Design and Architecture

The structure of a web page refers to how easy it is for both search engines and humans to read and understand. Even if a website is filled with high-quality content, inadequate structure and architecture can harm its search engine ranking.

Crawlability

Crawlability refers to how easy or difficult it is for a search engine to go through a web page and store a copy of it in its index. When a user searches for something, the search engine goes through that index to provide the most relevant results. Therefore, if the engine has had difficulty crawling a page, it may not show that result to the user. The easiest type of information for a search engine to index is HTML text, so most of the content on a web page should be in that format. JavaScript, Flash, and even images are often ignored or devalued by crawlers. However, there are ways to include a variety of visual content and still have great crawlability. Using alt text for images, plugins for Flash and JavaScript, and transcripts for videos are all ways to make that information easier to index.
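Missing alt text is one of the easier crawlability gaps to check automatically. This minimal sketch uses Python's standard-library `html.parser` to list images with no `alt` attribute at all; the sample markup is hypothetical. An empty `alt=""` is deliberately not flagged, since that is the usual way to mark a purely decorative image.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Records the src of every <img> tag that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # alt="" is allowed: it is the usual way to mark a decorative image.
        if "alt" not in attributes:
            self.missing.append(attributes.get("src", "(no src)"))

# Hypothetical page fragment.
page = """
<h1>Bubble Unicorn Donuts</h1>
<img src="glazed.jpg" alt="Glazed ring donut with rainbow sprinkles">
<img src="logo.png">
<img src="divider.png" alt="">
"""

finder = MissingAltFinder()
finder.feed(page)
finder.close()
print("Images missing alt text:", finder.missing)
```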

User Experience and Interface

While Crawlability refers to how the search engine interprets the data on a web page, User Experience and Interface refers to how easy it is for humans to read and understand its content. The page needs to be intuitive to use and navigate, while also providing direct and relevant information for the query. Additionally, a professionally designed website with a well-structured layout is likely to fare better in the search engine rankings. Users tend to favour content that is not only useful and innovative, but also aesthetically pleasing and clear, which is why the overall design of a web page must account for that.

Mobile Version

In 2015, it was reported that more Google searches were taking place on mobile devices than on desktop computers. Because of this large share of mobile searches, mobile-friendly websites tend to rank higher than those without mobile support. Not only that, but websites optimised for mobile devices also look and feel better to users, which feeds into the Content factors of On-Page SEO. Where a website also has an app, both Google and Bing offer app indexing and linking, which means that users can be directed from the search results straight into the app.

HTML

HTML is the underlying code of all websites and web pages. It's important to understand HTML, because it is how a web page's publisher can communicate efficiently with search engines and thus boost the page's position in the results. Dozens of HTML tags send specific signals to search engines about the importance and hierarchy of the content. Below is a summary of some of the most important tags and ways to approach HTML when optimising a site for search engines.

Title Tag

The Title Tag is arguably the most important tag when it comes to Search Engine Optimisation. It clearly states what each individual page of a website is about and what sort of content users are likely to find on that page. For optimal results, titles should be clear and descriptive, and should ideally include specifics about what users will find on the page. Titles should also include keywords from the keyword research mentioned above, to take full advantage of the title tag and its visibility on the search results page.

Overall Structure

This section covers other HTML tags that are less significant to SEO success, but are still worth noting and managing correctly. The meta description of a page serves as a short blurb about that page's content, and it typically appears directly underneath the title on the search engine results page. To take full advantage of the meta description, it should use the same keywords as the title. This continuity helps search engines understand what the page is about, which helps them rank it more accurately. Header tags are also a good way of naturally including keywords in the content while giving search engines more information about the page. Not only that, but header tags also break up large blocks of text, making the page easier for humans to read. However, as with keywords, headers should be used naturally within a page rather than being artificially structured and overused. Good UX and UI take priority over efficient use of headers and meta descriptions.
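The title, meta description, and header advice above can be checked with a small script. This is a sketch under simple assumptions: it uses the standard-library `html.parser`, treats keywords as plain substrings, and the page markup and keyword list are hypothetical. It reports which target keywords appear in both the title and the meta description, plus the text of each header tag.

```python
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class PageSummary(HTMLParser):
    """Collects the <title> text, the meta description, and the text of h1-h6 tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.headers = []
        self._capturing = None  # "title" or a header tag while inside one

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""
        elif tag == "title":
            self._capturing = "title"
        elif tag in HEADER_TAGS:
            self._capturing = tag
            self.headers.append("")

    def handle_endtag(self, tag):
        if tag == self._capturing:
            self._capturing = None

    def handle_data(self, data):
        if self._capturing == "title":
            self.title += data
        elif self._capturing in HEADER_TAGS:
            self.headers[-1] += data

def shared_keywords(title, description, keywords):
    """Target keywords (plain substring match) found in both the title and the description."""
    t, d = title.lower(), description.lower()
    return [k for k in keywords if k.lower() in t and k.lower() in d]

# Hypothetical page and keyword list.
page = """
<head>
  <title>Handmade Donuts | Bubble Unicorn Donuts</title>
  <meta name="description" content="Handmade donuts baked fresh daily, with seasonal fillings.">
</head>
<body>
  <h1>Handmade Donuts</h1>
  <h2>Seasonal Fillings</h2>
</body>
"""

summary = PageSummary()
summary.feed(page)
summary.close()
print("Title:", summary.title)
print("Keywords in title and description:",
      shared_keywords(summary.title, summary.description, ["donuts", "handmade", "rainbow"]))
print("Headers:", summary.headers)
```

A keyword that shows up in the title but not the description (or the reverse) is a hint that the two are telling slightly different stories about the page.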

Relationships between On-Page SEO Factors

Now that the most important On-Page SEO success factors have been detailed and explained, it's important to examine how they relate to each other and how significant each one is to the overall ranking of a page. The key thing to remember is that no individual factor can ensure the success of a page by itself; they must all work together for optimal results. Nonetheless, some factors carry more weight than others, which can give publishers an idea of what to focus on. The relationships and weights of each of the discussed factors are summarised in the following charts.

SEO factors rarely work in isolation, and efforts to improve one factor usually have a positive impact on others too. For example, excellent market research on the most appropriate Keywords for a web page can also improve that page's Vertical Search, Content Quality, Title, and Overall HTML Structure.

The relative impact of each On-Page SEO factor is rated on a scale of 1 to 5. Content Quality and Keywords are the most influential factors for ranking higher in search results, while Vertical Search and HTML Structure are less crucial to search engine success.
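The two charts can be combined into a rough scoring model. This is not a Markov Chain, just a weighted average, and every number in it is hypothetical. Only the ordering of the weights (Content Quality and Keywords high, Vertical Search and HTML Structure low) and the list of factors that benefit from better Keywords come from the text above; the exact weights, the 3-out-of-5 starting ratings, and the 50% spill-over fraction are assumptions.

```python
# Hypothetical impact weights on the 1-5 scale. Only their ordering follows the chart.
IMPACT = {
    "Content Quality": 5,
    "Keywords": 5,
    "Crawlability": 4,
    "User Experience": 4,
    "Mobile Version": 4,
    "Title": 4,
    "Vertical Search": 2,
    "HTML Structure": 2,
}

# Factors that also benefit when the key factor improves (the Keywords entry follows
# the example in the text). The spill-over fraction is an assumption.
SPILLOVER = {"Keywords": ["Vertical Search", "Content Quality", "Title", "HTML Structure"]}
SPILL_FRACTION = 0.5

def weighted_score(ratings):
    """Impact-weighted average of 1-5 ratings, scaled to the range 0-1."""
    total = sum(IMPACT[f] * ratings[f] for f in IMPACT)
    return total / sum(5 * w for w in IMPACT.values())

def improve(ratings, factor, amount):
    """Return new ratings after improving `factor`, with partial spill-over, capped at 5."""
    new = dict(ratings)
    new[factor] = min(5, new[factor] + amount)
    for related in SPILLOVER.get(factor, []):
        new[related] = min(5, new[related] + amount * SPILL_FRACTION)
    return new

current = {f: 3 for f in IMPACT}  # assumed: every factor currently rated 3 out of 5
better = improve(current, "Keywords", 1)
print(f"Score now: {weighted_score(current):.3f}")
print(f"Score after one more point of Keywords: {weighted_score(better):.3f}")
```

With these numbers, one extra point of Keywords lifts the score from 0.600 to about 0.677, much more than the same point would add to Vertical Search on its own (about 0.613).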

Manipulating Probabilities through Markov Chains

Having established the key variables that feed into On-Page SEO, how they affect each other, and what their relative weights are, it becomes apparent that all of this complex data could be fed into a Markov Chain. The Start State Matrix would contain the publisher's estimated values for each of the On-Page SEO variables, while the Transition Matrix would show how much each of those variables is expected to improve. What's more, by changing certain values in the Transition Matrix, one could estimate which strategies to implement to help a website rank higher.

Let's look at an example with a website we will call Bubble Unicorn Donuts. When one searches “donuts”, Bubble Unicorn Donuts currently ranks 10th on the 2nd page of Google. As that does not bring in much traffic, its publisher would like to boost its rank for the keyword “donuts”. To do so, the publisher would review all of the On-Page SEO variables mentioned above, enter values for them in the Start State Matrix, and then decide which variables to improve. It turns out that Bubble Unicorn Donuts does not have a Mobile Version of its website. By increasing the value for Mobile Version in the Transition Matrix, the publisher could calculate the probability of traffic increasing if they added mobile support. Suppose that, after running the numbers, a Mobile Version is most likely to boost the site's rank by a couple of positions. The publisher would then look at the other variables, adjust them further, and could theoretically estimate how to reach the top five results for “donuts”.
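One way to make the Bubble Unicorn Donuts example concrete is a standard Markov Chain in which the states are ranking bands. The start state is a probability vector over those bands, and the transition matrix holds the monthly chance of moving from one band to another. Modelling a strategy then means swapping in a different transition matrix and recomputing. This is a sketch under that assumption, and every probability in it is invented; a real publisher would have to estimate them from their own ranking history.

```python
# Hypothetical ranking bands used as Markov states.
STATES = ["Top 5", "Positions 6-10", "Page 2 or lower"]

def check_stochastic(matrix):
    """Raise ValueError if any row of a transition matrix does not sum to 1."""
    for row in matrix:
        if abs(sum(row) - 1) > 1e-9:
            raise ValueError(f"row {row} sums to {sum(row)}, not 1")

def distribution_after(start, transition, steps):
    """Apply dist[j] <- sum over i of dist[i] * P[i][j] once per step (one step = one month)."""
    dist = list(start)
    n = len(dist)
    for _ in range(steps):
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return dist

# Start state: the site is certainly on page 2 today (10th result on page 2).
start = [0.0, 0.0, 1.0]

# Invented monthly transition probabilities; row = current band, column = next band.
baseline = [
    [0.85, 0.10, 0.05],
    [0.10, 0.75, 0.15],
    [0.02, 0.08, 0.90],
]
# Same model after adding a Mobile Version: assumed to make upward moves likelier.
with_mobile = [
    [0.90, 0.08, 0.02],
    [0.15, 0.75, 0.10],
    [0.05, 0.20, 0.75],
]

for name, P in [("baseline", baseline), ("with Mobile Version", with_mobile)]:
    check_stochastic(P)
    dist = distribution_after(start, P, steps=6)
    summary = ", ".join(f"{s}: {p:.0%}" for s, p in zip(STATES, dist))
    print(f"After 6 months ({name}): {summary}")
```

Comparing the Top 5 probability under the two matrices is the "change a value in the Transition Matrix and recompute" step from the example; other strategies would be modelled as further matrices.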

This manipulation of probability, together with the predictive power of Markov Chains, could allow publishers to turn the tide when it comes to the performance of a site or marketing strategy. Beyond marketing, Markov Chains can also help make gameplay varied, exciting, and difficult to predict by dictating the behaviour of NPCs based on their individual stats and personalities. They are a method by which we can control chance and manipulate probability based on the results we would like and the factors we wish to take into account.
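The NPC idea can be sketched the same way: each personality gets its own transition table, and the NPC's next action is sampled from the row for its current state. The states, personalities, and probabilities below are hypothetical, and individual stats (which could, for example, scale the Attack weights) are left out to keep the sketch short. A seeded `random.Random` makes runs reproducible.

```python
import random

# Hypothetical NPC behaviour: each row gives the probabilities of the next state
# given the current one. The cautious personality shifts weight towards Flee.
PERSONALITIES = {
    "aggressive": {
        "Idle":   {"Idle": 0.2, "Patrol": 0.4, "Attack": 0.4, "Flee": 0.0},
        "Patrol": {"Idle": 0.1, "Patrol": 0.4, "Attack": 0.5, "Flee": 0.0},
        "Attack": {"Idle": 0.0, "Patrol": 0.2, "Attack": 0.7, "Flee": 0.1},
        "Flee":   {"Idle": 0.3, "Patrol": 0.3, "Attack": 0.3, "Flee": 0.1},
    },
    "cautious": {
        "Idle":   {"Idle": 0.5, "Patrol": 0.4, "Attack": 0.05, "Flee": 0.05},
        "Patrol": {"Idle": 0.3, "Patrol": 0.5, "Attack": 0.1, "Flee": 0.1},
        "Attack": {"Idle": 0.0, "Patrol": 0.2, "Attack": 0.3, "Flee": 0.5},
        "Flee":   {"Idle": 0.6, "Patrol": 0.2, "Attack": 0.0, "Flee": 0.2},
    },
}

def simulate(personality, start="Idle", steps=10, seed=None):
    """Walk the personality's Markov chain for `steps` transitions; return every state visited."""
    rng = random.Random(seed)  # seeded generator so runs can be reproduced
    table = PERSONALITIES[personality]
    state = start
    path = [state]
    for _ in range(steps):
        row = table[state]
        # Sample the next state using the current row's probabilities as weights.
        state = rng.choices(list(row), weights=list(row.values()))[0]
        path.append(state)
    return path

for personality in PERSONALITIES:
    print(personality, simulate(personality, steps=8, seed=42))
```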