Part of Datacop's Blog Series on Data Science in the Digital Economy (#4)
A Uniform Resource Locator, better known by its acronym ‘URL’, is a reference to the location of a web resource on the internet. Think of it as the telephone number for that location. It is the string value entered into a browser’s “address bar”. URLs can be quite long and contain up to 4 elements: the scheme and the domain are always present, while the path and the query are optional.
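To make the four elements concrete, here is a minimal sketch that splits a URL with Python's standard urllib.parse module. The helper name split_url and the dictionary keys are illustrative choices, not part of any analytics tool. The URL is assembled from the Wall Street Journal parts broken down in this post, with the fbclid value left out for brevity.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Split a URL into the four elements described above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,  # protocol, e.g. "https"
        "domain": parts.netloc,  # e.g. "wsj.com"
        "path": parts.path,      # "" or "/" when the URL points at the homepage
        "query": parts.query,    # text after "?", "" when there is no query
    }


# Assembled from the parts shown in this post; fbclid omitted for brevity.
WSJ_URL = (
    "https://wsj.com/articles/"
    "march-jobs-report-unemployment-rate-2021-11617314225-11610326800"
    "?mod=e2fb"
)

if __name__ == "__main__":
    for name, value in split_url(WSJ_URL).items():
        print(f"{name}: {value}")
```

Note that urlsplit returns the scheme without the trailing "://" and the query without the leading "?".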
In this blog post, we will consider three common types of URLs you may encounter. We will break them down into their component parts and examine why they are there and what function they serve for web marketers and analysts.
A) Wall Street Journal Promoting Specific Article URL
B) The Economist Promoting Spring Sale URL
C) Typical E-shop Category Page URL
1. Scheme - a) https:// b) https:// c) https://
The scheme of a URL identifies the protocol to be used in accessing a web resource. The best-known example is the http(s) application protocol used to access websites. Other common schemes include ftp for transferring files, mailto for sending emails, and data for embedding content such as a table or an image directly in the URL.
2. Domain - a) wsj.com b) subscribenow.economist.com c) finlaysonshop.com
The domain name is the unique reference of the location. A given domain name can point to only one website. Because each combination of letters can be registered only once, commonly known words, acronyms, etc. have become digital assets with high valuations. For instance, in 2004 the domain name Beer.com was sold for 7 million USD. There are at least 37 publicly known purchases of domain names with transaction values of more than 3 million USD. You can view the full list here:
https://en.wikipedia.org/wiki/List_of_most_expensive_domain_names
3. Path - a) /articles/march-jobs-report-unemployment-rate-2021-11617314225-11610326800 b) / c) /collections/bathroom/
The path is the secondary location reference within the URL. It refers to the location of the web resource within the domain itself. Sometimes the path is “empty”, which analytics tools show as “/”. This signifies that the location is the homepage of the website.
At Datacop we also recommend using the path, not the whole URL, as the primary sub-location by which to split your session_starts, session_ends and page_visits. This is because, as we will see in the Query part of the URL, many extra values can be attached after the scheme, domain and path. Splitting by the whole URL produces results that seem correct but are skewed: for example, sessions that reached the homepage through a Google Ads (formerly AdWords) click are split out into a separate URL value instead of being counted with the other homepage sessions. Splitting your session events by path avoids this problem.
Path is also very valuable for breaking down session activity within the website itself. Almost every digital analytics tool tracks each page load as an event, usually called something like “page_visit”. By 2020 websites have become quite complex, and a single website may have many “areas”. A typical e-commerce shop has around 8 areas: category pages, product pages, the home page, checkout pages, the orders section, the blog, the user’s account and search. Grouping page visits by area provides an insight into where most of the visitors’ web activity is taking place. This kind of analysis is similar to an offline supermarket recording which parts of the store customers spend time in. Below you can see an example of the distribution of activity on such an e-shop. These numbers may vary from site to site.
For instance, an e-shop with a much larger catalogue would likely see more search activity. On a telecom website, the account area may make up as much as a third of all activity on the site. However, in most e-commerce situations we can expect product and category page visits to make up the majority of activity on the site. In this example, we can see that approximately three quarters of all activity in that week was on category and product pages.
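The two ideas above, counting events by path rather than by whole URL and grouping paths into site areas, can be sketched in a few lines of Python. Everything here is a toy illustration: the path prefixes in AREA_PREFIXES, the page_visits list and the gclid and utm_source values are assumptions made up for the sketch, not figures from the e-shop discussed above, and a real site would need its own prefix rules.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical mapping from path prefix to site area; adjust per site.
AREA_PREFIXES = [
    ("/collections/", "category"),
    ("/products/", "product"),
    ("/checkout", "checkout"),
    ("/orders", "orders"),
    ("/blog", "blog"),
    ("/account", "account"),
    ("/search", "search"),
]


def path_of(url: str) -> str:
    """Return the path of a URL, treating an empty path as the homepage "/"."""
    return urlsplit(url).path or "/"


def count_by_path(urls) -> Counter:
    """Count page visits per path, ignoring any query attached to the URL."""
    return Counter(path_of(u) for u in urls)


def classify_area(url: str) -> str:
    """Map a URL to a site area using the hypothetical prefixes above."""
    path = path_of(url)
    if path == "/":
        return "home"
    for prefix, area in AREA_PREFIXES:
        if path.startswith(prefix):
            return area
    return "other"


def area_shares(urls) -> dict:
    """Return the share of page visits that falls into each area."""
    counts = Counter(classify_area(u) for u in urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {area: n / total for area, n in counts.items()}


# Toy page_visit URLs; the query values are invented for illustration.
page_visits = [
    "https://finlaysonshop.com/",
    "https://finlaysonshop.com/?gclid=abc123",
    "https://finlaysonshop.com/collections/bathroom/",
    "https://finlaysonshop.com/collections/bathroom/?utm_source=facebook",
]

if __name__ == "__main__":
    print(len(set(page_visits)), "distinct URLs")  # whole-URL split: 4 values
    print(count_by_path(page_visits))              # path split: 2 values
    print(area_shares(page_visits))
```

Counting by whole URL yields four separate values for what are really two pages, while counting by path merges them as intended.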
4. Query - a) ?mod=e2fb&fbclid=IwAR2ks5DNstcvuysT1BPy2RQMYFXKGqDZsNgfMBoc-Fy8UeuuffOga0v0I_s b) ?&utm_campaign=a_21springsale_FY2122_Q1_conversion-bau-21springsale_hot&utm_medium=social-paid&utm_source=facebook-instagram&utm_content=dr_staticlinkad_np-wallstreet-w-apr_nonotebook-50off1yrdo&utm_term=rt-websub-1v90d&fbclid=IwAR2Iv5837bZVWp4BUATFhV_7d_yN-2AfxMmUdzTxcvh93mmrF7QGblCvN-w c) N/a
The query is an optional component of the URL. If present, it is always preceded by a question mark symbol (?) and consists of name=value pairs separated by ampersands (&). Queries are widely used for tagging digital marketing campaigns in e-mail, online advertising, push notifications, etc. At Datacop we work frequently with these tagging queries. There are two types of tags web analysts and marketers should be aware of: Click Identifiers and UTM Tags.
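As a rough sketch of how such a query can be read programmatically, the snippet below parses the query of example b) with Python's standard urllib.parse module and separates the UTM tags from everything else. The helper names query_params and split_tags are illustrative assumptions.

```python
from urllib.parse import parse_qsl, urlsplit

# Example b) from this post: domain, path and query joined into one URL.
ECONOMIST_URL = (
    "https://subscribenow.economist.com/"
    "?&utm_campaign=a_21springsale_FY2122_Q1_conversion-bau-21springsale_hot"
    "&utm_medium=social-paid"
    "&utm_source=facebook-instagram"
    "&utm_content=dr_staticlinkad_np-wallstreet-w-apr_nonotebook-50off1yrdo"
    "&utm_term=rt-websub-1v90d"
    "&fbclid=IwAR2Iv5837bZVWp4BUATFhV_7d_yN-2AfxMmUdzTxcvh93mmrF7QGblCvN-w"
)


def query_params(url: str) -> dict:
    """Return the query as a {name: value} dict.

    If a name is repeated, the last value wins; empty pairs such as the
    leading "&" in the example are skipped by parse_qsl.
    """
    return dict(parse_qsl(urlsplit(url).query))


def split_tags(params: dict) -> tuple:
    """Separate UTM tags from all other query parameters."""
    utm = {k: v for k, v in params.items() if k.startswith("utm_")}
    other = {k: v for k, v in params.items() if not k.startswith("utm_")}
    return utm, other


if __name__ == "__main__":
    utm, other = split_tags(query_params(ECONOMIST_URL))
    for name, value in utm.items():
        print(f"{name} = {value}")
    print("other parameters:", list(other))
```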
Click Identifiers are used by online advertising providers such as Facebook, Google and Bing to attribute traffic and conversions to the specific campaigns that were launched. This also enables campaign evaluation. Each provider generates its own opaque, encoded identifier, such as the fbclid value in examples a) and b). Therefore, if a URL contains one of these click identifiers, it is a strong sign that the visit came from a click on that provider's platform, typically through an online advertising campaign.
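A simple way to flag such traffic is to look for known click identifier names in the query. The sketch below assumes three parameter names: fbclid appears in the examples in this post, while gclid (Google Ads) and msclkid (Microsoft Advertising, which serves Bing) are added here as commonly seen names and should be checked against each provider's documentation. The function name click_id_providers is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Click identifier parameter -> provider. Only fbclid comes from this post;
# gclid and msclkid are assumptions added for illustration.
CLICK_ID_PROVIDERS = {
    "fbclid": "Facebook",
    "gclid": "Google Ads",
    "msclkid": "Microsoft Advertising (Bing)",
}


def click_id_providers(url: str) -> list:
    """Return the providers whose click identifier appears in the URL query."""
    params = parse_qs(urlsplit(url).query)
    return [
        provider
        for name, provider in CLICK_ID_PROVIDERS.items()
        if name in params
    ]


# Example a) from this post: domain, path and query joined into one URL.
WSJ_URL = (
    "https://wsj.com/articles/"
    "march-jobs-report-unemployment-rate-2021-11617314225-11610326800"
    "?mod=e2fb&fbclid=IwAR2ks5DNstcvuysT1BPy2RQMYFXKGqDZsNgfMBoc-Fy8UeuuffOga0v0I_s"
)

if __name__ == "__main__":
    print(click_id_providers(WSJ_URL))
    print(click_id_providers("https://finlaysonshop.com/collections/bathroom/"))
```

The identifier value itself is not decoded here; only its presence is used as a signal.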
UTM Tags were invented by Urchin, a company later acquired by Google, to resolve the issue of manual digital campaign tagging (UTM stands for Urchin Tracking Module). Most marketing automation providers set certain UTM values by default; however, a well thought out UTM structure helps any marketing team make sense of its campaign data. If marketing campaigns are set up with an appropriate UTM structure, the marketer and analyst can trace the traffic and revenue that resulted from investments in online advertising.
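One way to keep a UTM structure consistent is to build tagged links in code instead of by hand. The following is a minimal sketch under stated assumptions: the helper add_utm_tags is hypothetical, the campaign name spring_sale is invented, and the five allowed tag names are the ones that appear in example b).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The five UTM tag names seen in example b) of this post.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def add_utm_tags(url: str, **tags: str) -> str:
    """Return the URL with the given UTM tags added to its query.

    Existing query parameters are kept; a tag that is already present is
    overwritten. Unknown tag names raise ValueError so typos surface early.
    """
    unknown = set(tags) - set(UTM_KEYS)
    if unknown:
        raise ValueError(f"unknown UTM tags: {sorted(unknown)}")
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    params.update(tags)
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    tagged = add_utm_tags(
        "https://finlaysonshop.com/collections/bathroom/",
        utm_source="facebook",
        utm_medium="social-paid",
        utm_campaign="spring_sale",  # hypothetical campaign name
    )
    print(tagged)
```

Rejecting unknown tag names is a design choice for this sketch; a team with its own extra parameters would extend UTM_KEYS instead.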