RAT data structure

Download study results as an Excel file. Here is what you will find in this file.

The Result Assessment Tool allows for automatically scraping search results from search engines, classifying them with existing classifiers and having them assessed by human jurors. In the assessments, the study participants answer questions about the respective search result (e.g. whether the search result is relevant to the query). 

When you download your data from RAT, the system generates a comprehensive Excel file. The tables below describe the structure and the variables in that file. You can also view the RAT sample output to see what the output looks like.

The following table lists all possible variables in a RAT dataset. The table is divided into four sections:

  1. Main data (this is what is generated in all cases, regardless of which classifiers or assessments you use)
  2. Source data (saved HTML source code and a screenshot of the search result)
  3. Evaluation data (data generated by participants when answering questions about a particular search result)
  4. Classification data (data generated with the help of a classifier, e.g., the SEO classifier)

Main data

Variable Description Type
id Unique identifier for each row in the table number
query Keywords of the query. text
url URL of the search result. text
main Main URL of the search result (e.g. google.com instead of google.com/imghp?hl=en). text
search engine Name of the search engine. text
position Rank position of the search result at the time of scraping. number
title Title of the search result on the search engine result page, given by the search engine. text
description Description of the search result on the search engine result page, given by the search engine. text
ip The IP address of the search result. text
timestamp The timestamp of scraping. timestamp
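
As an illustration of how main relates to url, the following Python sketch reduces a full URL to its host part. The helper name `main_url` is hypothetical, and RAT's own derivation may differ (for example in how it treats `www.` prefixes or subdomains).

```python
from urllib.parse import urlparse


def main_url(url: str) -> str:
    """Return the host part of a URL (hypothetical helper, not RAT's code)."""
    parsed = urlparse(url)
    if not parsed.netloc:
        # urlparse only recognizes a host after '//', so retry for
        # scheme-less inputs such as 'google.com/imghp?hl=en'.
        parsed = urlparse("//" + url)
    return parsed.netloc


print(main_url("google.com/imghp?hl=en"))          # google.com
print(main_url("https://google.com/imghp?hl=en"))  # google.com
```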

Source data

Variable Description Type
html HTML source code text
screenshot Screenshot of the search result. byte
screenshot_timestamp The timestamp of the screenshot timestamp

Evaluation data

Variable Description Type
participant Unique identifier for the study participant number
id_question Unique identifier for the question asked text
question Text of the question text
value Answer of the participant to the question. text
question_timestamp The timestamp of the answer timestamp
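
To illustrate how the evaluation variables might be used, the following Python sketch counts the answers given to each question for each search result. It rests on assumptions that are not stated above: that the export holds one row per answer, and that a search result can be identified by the combination of query, search engine and url. The function name `answers_per_result` and the example rows are made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up example rows; only the column names come from the tables above.
rows = [
    {"query": "rat tool", "search engine": "Google", "url": "https://example.org/a",
     "participant": 10, "id_question": "q1", "value": "yes"},
    {"query": "rat tool", "search engine": "Google", "url": "https://example.org/a",
     "participant": 11, "id_question": "q1", "value": "no"},
    {"query": "rat tool", "search engine": "Google", "url": "https://example.org/a",
     "participant": 12, "id_question": "q1", "value": "yes"},
    {"query": "rat tool", "search engine": "Google", "url": "https://example.org/b",
     "participant": 10, "id_question": "q1", "value": "no"},
]


def answers_per_result(rows):
    """Count answers per (query, search engine, url, id_question)."""
    counts = defaultdict(Counter)
    for row in rows:
        # Assumed key for "one search result, one question".
        key = (row["query"], row["search engine"], row["url"], row["id_question"])
        counts[key][row["value"]] += 1
    return {key: dict(counter) for key, counter in counts.items()}


summary = answers_per_result(rows)
for key, answer_counts in summary.items():
    print(key, answer_counts)
```

If your export identifies search results differently, only the `key` tuple needs to change.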

Classification data

Please note that the table describes the output of the SEO classifier. Other classifiers will output different fields; see the documentation.

Variable Description Type
seo_class Estimated probability that search engine optimization methods were applied to the search result (not optimized; probably not optimized; probably optimized; most probably optimized). text
seo_timestamp The timestamp of classification of SEO probability of the search result. timestamp
ads Matches the domain to a list of websites that use ad services. (1 = match; 0 = no match). number
canonical Check whether canonical links were found in the source code of the search result. number
company Matching the domain with a list of known companies. Check if the domain could be found. number
description Check if a description was found in the source code. number
external Counter of external links in a document. number
h1 Counter of h1 headings in a document. number
https Check if the website uses https. number
internal Counter of internal links in a document. number
keyword_density Calculate the keyword density in a document. number
keywords_in_source Counter of keywords found in the source code. number
keywords_in_url Counter of keywords in the URL. number
loading_time Calculated loading time of a web page in seconds. number
micros Check whether microdata formats like JSON-LD are used in a document. number
news Matching the domain with a list of known news services. Check if the domain could be found. number
nofollow Counter of nofollow links in a document. text
not_optimized Matching the domain with a list of known non SEO optimized documents. number
og Counter of elements in a document with an og: tag. number
robots_txt Check if any references to SEO were found in the robots.txt file of a domain. number
search_engine_services Matching the domain with a list of known search engine services like books.google.com. number
shops Matching of the domain with a list of known shops. number
sitemap Check whether a sitemap was used on the search result web page. number
title Check whether a title was used on the search result web page. number
tools ads List of identified ad services found on the web page. list
tools analytics List of identified analytics services found on the web page. list
tools caching List of identified caching plugins found on the web page. list
tools content List of identified content services found on the web page. list
tools seo List of identified SEO plugins found on the web page. list
tools social List of identified social plugins found on the web page. list
url_length Length of the URL. number
viewport Check whether a viewport tag was found in the document. number
wordpress Check whether WordPress is used as the CMS. number
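
Because seo_class is stored as text, it can help to treat its four labels as an ordered scale when summarizing results. The following Python sketch does that. It assumes the labels appear in the export exactly as spelled in the table above; the names `SEO_CLASSES`, `SEO_RANK` and `seo_distribution` are made up for illustration and are not part of RAT.

```python
from collections import Counter

# The four seo_class labels from the table, ordered from least to most
# likely optimized. Exact spelling in the export is an assumption.
SEO_CLASSES = [
    "not optimized",
    "probably not optimized",
    "probably optimized",
    "most probably optimized",
]

# Ordinal rank for each label (0 = not optimized, 3 = most probably optimized).
SEO_RANK = {label: rank for rank, label in enumerate(SEO_CLASSES)}


def seo_distribution(labels):
    """Count how often each seo_class label occurs, in scale order."""
    counts = Counter(labels)
    return {label: counts.get(label, 0) for label in SEO_CLASSES}


# Made-up example values for the seo_class column.
example = ["probably optimized", "not optimized", "probably optimized"]
print(seo_distribution(example))
print(sorted(example, key=SEO_RANK.__getitem__))
```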

Further information about the implementation of the SEO classifier is available at https://osf.io/u8d62.

 

Contact information for questions about the data set: sebastian.suenkler@haw-hamburg.de.