Download study results as an Excel file: here's what you will find in it
The Result Assessment Tool (RAT) allows for automatically scraping search results from search engines, classifying them with existing classifiers, and having them assessed by human jurors. In the assessments, the study participants answer questions about each search result (e.g. whether the search result is relevant to the query).
When you download your data from RAT, the system generates a comprehensive Excel file. The tables below describe the structure and the variables of that file and give a detailed overview of the data. You can also view a RAT sample output to see what the output looks like.
The following table lists all possible variables in a RAT dataset. The table is divided into four sections:
- Main data (this is what is generated in all cases, regardless of which classifiers or assessments you use)
- Source data (saved HTML source code and a screenshot of the search result)
- Evaluation data (data generated by participants when answering questions about a particular search result)
- Classification data (data generated with the help of a classifier, e.g., the SEO classifier)
Main data
Variable | Description | Type |
id | Unique identifier for each row in the table | number |
query | Keywords of the query. | text |
url | URL of the search result. | text |
main | Main URL of the search result (e.g. google.com instead of google.com/imghp?hl=en). | text |
search engine | Name of the search engine. | text |
position | Rank position of the search result at the time of scraping. | number |
title | Title of the search result on the search engine result page, given by the search engine. | text |
description | Description of the search result on the search engine result page, given by the search engine. | text |
ip | The IP address of the search result. | text |
timestamp | The timestamp of scraping. | timestamp |
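A minimal sketch of working with the main data, assuming each exported row has already been read into a Python dict keyed by the column names in the table above (the reading step is not shown, and the exact header strings in your file may differ). It rebuilds the ranked list of main URLs for each search engine and query from the search engine, query, position and main columns. The function name ranked_domains and the sample rows are hypothetical.

```python
from collections import defaultdict


def ranked_domains(rows):
    """Return the main URLs of each (search engine, query) pair in rank order.

    `rows` is assumed to be a list of dicts keyed by the RAT column names.
    """
    groups = defaultdict(set)
    for row in rows:
        key = (row["search engine"], row["query"])
        # A set drops repeated rows for the same result (assumption: a result
        # may occur on several rows, e.g. once per participant answer).
        groups[key].add((int(row["position"]), row["main"]))
    # Sort each group by rank position and keep only the main URLs.
    return {key: [main for _, main in sorted(items)] for key, items in groups.items()}


# Hypothetical sample rows, for illustration only.
rows = [
    {"search engine": "Google", "query": "climate change", "position": 2, "main": "example.org"},
    {"search engine": "Google", "query": "climate change", "position": 1, "main": "example.com"},
    {"search engine": "Google", "query": "climate change", "position": 1, "main": "example.com"},
    {"search engine": "Bing", "query": "climate change", "position": 1, "main": "example.net"},
]

print(ranked_domains(rows))
# {('Google', 'climate change'): ['example.com', 'example.org'], ('Bing', 'climate change'): ['example.net']}
```

Because position records the rank at the time of scraping, comparing these lists across search engines or scraping timestamps is a simple way to inspect how rankings differ.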
Source data
Variable | Description | Type |
html | HTML source code of the search result | text |
screenshot | Screenshot of the search result | byte |
screenshot_timestamp | The timestamp of the screenshot | timestamp |
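The html column can also be reused for your own checks. The following minimal sketch uses Python's standard html.parser module to count h1 headings and to detect a canonical link, two signals that correspond in spirit to the h1 and canonical fields of the SEO classifier. It is not the classifier's implementation (see https://osf.io/u8d62 for that); the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SimpleSeoSignals(HTMLParser):
    """Collect the number of <h1> elements and whether a
    <link rel="canonical"> element is present."""

    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.has_canonical = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "h1":
            self.h1_count += 1
        elif tag == "link":
            rel = dict(attrs).get("rel") or ""
            if "canonical" in rel.lower().split():
                self.has_canonical = True


def seo_signals(html):
    """Return the h1 count and a 1/0 canonical flag for an HTML string."""
    parser = SimpleSeoSignals()
    parser.feed(html)
    parser.close()
    return {"h1": parser.h1_count, "canonical": int(parser.has_canonical)}


sample = ('<html><head><link rel="canonical" href="https://example.com/"></head>'
          "<body><h1>A</h1><h1>B</h1></body></html>")
print(seo_signals(sample))  # {'h1': 2, 'canonical': 1}
```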
Evaluation data
Variable | Description | Type |
participant | Unique identifier for the study participant | number |
id_question | Unique identifier for the question asked | text |
question | Text of the question | text |
value | Answer of the participant to the question. | text |
question_timestamp | The timestamp of the answer | timestamp |
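A minimal sketch for aggregating the evaluation data, assuming one row per answer with the query, url, id_question and value columns read into Python dicts. Because value is stored as text, the sketch counts the answers per search result and question and averages only the answers that parse as numbers (for example, a relevance scale); how answers are actually encoded depends on the questions in your study. The function name answer_summary and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def answer_summary(rows):
    """Summarize participant answers per (query, url, id_question).

    `value` is stored as text; answers that parse as numbers are also averaged.
    """
    answers = defaultdict(list)
    for row in rows:
        value = row.get("value")
        if value is None or str(value).strip() == "":
            continue  # no answer recorded on this row
        key = (row["query"], row["url"], row["id_question"])
        answers[key].append(str(value).strip())

    summary = {}
    for key, values in answers.items():
        numeric = []
        for v in values:
            try:
                numeric.append(float(v))
            except ValueError:
                pass  # non-numeric answer, e.g. a text label
        summary[key] = {
            "n": len(values),
            "counts": Counter(values),
            "mean": mean(numeric) if numeric else None,
        }
    return summary


# Hypothetical answers from three participants to one question.
rows = [
    {"query": "climate change", "url": "https://example.com/", "id_question": "q1", "participant": 1, "value": "1"},
    {"query": "climate change", "url": "https://example.com/", "id_question": "q1", "participant": 2, "value": "0"},
    {"query": "climate change", "url": "https://example.com/", "id_question": "q1", "participant": 3, "value": "1"},
]

s = answer_summary(rows)[("climate change", "https://example.com/", "q1")]
print(s["n"], s["counts"].most_common(1), round(s["mean"], 2))  # 3 [('1', 2)] 0.67
```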
Classification data
Please note that the table describes the output of the SEO classifier. Other classifiers will output different fields; see the documentation.
Variable | Description | Type |
seo_class | Estimated likelihood that search engine optimization methods were applied to the search result, given as one of four classes (not optimized; probably not optimized; probably optimized; most probably optimized). | text |
seo_timestamp | Timestamp of the SEO classification of the search result. | timestamp |
ads | Matches the domain to a list of websites that use ad services. (1 = match; 0 = no match). | number |
canonical | Check whether canonical links were found in the source code of the search result. | number |
company | Matching the domain with a list of known companies. Check if the domain could be found. | number |
description | Check if a description was found in the source code. | number |
external | Number of external links in the document. | number |
h1 | Number of h1 headings in the document. | number |
https | Check whether the website uses HTTPS. | number |
internal | Number of internal links in the document. | number |
keyword_density | Keyword density of the document. | number |
keywords_in_source | Number of keywords found in the source code. | number |
keywords_in_url | Number of keywords in the URL. | number |
loading_time | Calculated loading time of the web page in seconds. | number |
micros | Check whether microdata formats like JSON-LD are used in a document. | number |
news | Matching the domain with a list of known news services. Check if the domain could be found. | number |
nofollow | Number of nofollow links in the document. | text |
not_optimized | Matching the domain with a list of websites known not to be search-engine optimized. | number |
og | Number of elements in the document with an og: (Open Graph) tag. | number |
robots_txt | Check if any references to SEO were found in the robots.txt file of a domain. | number |
search_engine_services | Matching the domain with a list of known search engine services like books.google.com. | number |
shops | Matching the domain with a list of known shops. | number |
sitemap | Check whether a sitemap was used on the search result's web page. | number |
title | Check whether a title was used on the search result web page. | number |
tools ads | List of identified ad services found on the web page. | list |
tools analytics | List of identified analytics services found on the web page. | list |
tools caching | List of identified caching plugins found on the web page. | list |
tools content | List of identified content services found on the web page. | list |
tools seo | List of identified SEO plugins found on the web page. | list |
tools social | List of identified social plugins found on the web page. | list |
url_length | Length of the URL. | number |
viewport | Check whether a viewport tag was found in the document. | number |
wordpress | Check whether WordPress is used as the CMS. | number |
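Since the four seo_class labels form an ordered scale, a common first step is to compare their distribution across search engines. A minimal sketch, assuming rows read into Python dicts and assuming the export uses exactly the four label strings listed for seo_class; rows without a recognized label are skipped, and each (search engine, query, url) result is counted once. The function name seo_distribution and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

# The four seo_class labels in increasing order of optimization likelihood.
# Assumption: the export uses exactly these strings.
SEO_CLASSES = [
    "not optimized",
    "probably not optimized",
    "probably optimized",
    "most probably optimized",
]


def seo_distribution(rows):
    """Share of each seo_class label per search engine, in class order."""
    counts = defaultdict(Counter)
    seen = set()
    for row in rows:
        label = row.get("seo_class")
        if label not in SEO_CLASSES:
            continue  # unclassified row or unexpected label
        result_key = (row["search engine"], row["query"], row["url"])
        if result_key in seen:
            continue  # count each search result once
        seen.add(result_key)
        counts[row["search engine"]][label] += 1

    return {
        engine: {label: c[label] / sum(c.values()) for label in SEO_CLASSES}
        for engine, c in counts.items()
    }


# Hypothetical classified results, for illustration only.
rows = [
    {"search engine": "Google", "query": "q", "url": "https://a.example/", "seo_class": "probably optimized"},
    {"search engine": "Google", "query": "q", "url": "https://b.example/", "seo_class": "not optimized"},
    {"search engine": "Bing", "query": "q", "url": "https://a.example/", "seo_class": "most probably optimized"},
]

print(seo_distribution(rows)["Google"]["probably optimized"])  # 0.5
```

Keeping SEO_CLASSES as an ordered list also makes it easy to map the labels to ranks 0 to 3 (for example with SEO_CLASSES.index(label)) if an ordinal analysis is needed.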
Further information about the implementation of the SEO classifier is available at https://osf.io/u8d62.
Contact information for questions about the data set: sebastian.suenkler@haw-hamburg.de.