The Searchmetrics Suite lets you run a technical analysis of your website, helping you identify optimization potential as well as quickly spot and minimize risks. To run a technical analysis, a crawl must first be created in the Site Experience (Site Experience > Crawl Overview > Create Crawl).
Things that should be considered when creating a crawl:
- Make sure that the stored URL does not block any crawler; ideally, whitelist the Searchmetrics bot (see the sketch after this list).
- If a start page is defined, it must include the protocol ("https://" or "http://").
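A quick way to verify this is to test your robots.txt against the relevant user agents before starting the crawl. Below is a minimal sketch using Python's built-in robots.txt parser; the user-agent token "SearchmetricsBot" is an assumption used for illustration, so check the exact token with Searchmetrics.

    # Check whether a crawler user agent may fetch the intended start URL.
    from urllib.robotparser import RobotFileParser

    start_url = "https://www.example.com/"   # your crawl start page
    parser = RobotFileParser()
    parser.set_url("https://www.example.com/robots.txt")
    parser.read()                            # fetches and parses robots.txt

    # "SearchmetricsBot" is an assumed token; "*" covers all other bots.
    for agent in ("SearchmetricsBot", "Googlebot", "*"):
        allowed = parser.can_fetch(agent, start_url)
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")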
Crawl Set-Up
General
First, select the project for which the analysis should be run. Then name the crawl and choose a search engine.
Caution: A crawl is always started within a specific project, so only the search engines defined in that project are available. If you want to add more search engines to the project, see here.
The project URL is automatically set as the starting page of the crawl but can still be edited in the second step (Crawler). Next, the maximum number of pages to crawl must be selected.
A tick in the field provided additionally specifies whether it is a JavaScript page. If you also want to analyze parts of the page that are normally not crawled, such as internal nofollow links (see the sketch below), you can set a checkmark in the respective field. If the crawl is to be performed regularly, a frequency and a start date can also be selected here.
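For context, an internal nofollow link is simply an anchor on your own domain whose rel attribute contains "nofollow"; crawlers normally skip such links. A minimal sketch of how they can be spotted in a page's HTML:

    # Collect links marked rel="nofollow" from an HTML snippet.
    from html.parser import HTMLParser

    class NofollowCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag != "a":
                return
            attrs = dict(attrs)
            rel = (attrs.get("rel") or "").lower()
            if "nofollow" in rel and attrs.get("href"):
                self.links.append(attrs["href"])

    collector = NofollowCollector()
    collector.feed('<a href="/de/a" rel="nofollow">a</a><a href="/de/b">b</a>')
    print(collector.links)   # ['/de/a'] - skipped unless the option is set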
Once all settings have been made, the estimated duration and the estimated number of page credits per crawl are shown in the window on the right side.
In the example below, a crawl is set up in the Amazon (DE) project for Google Germany (Desktop). It starts at amazon.de, and a maximum of 1,000 pages will be crawled. It is a JavaScript crawl, the crawler is instructed to follow internal nofollow links, and the crawl is set to repeat monthly from 28.09.2020.
If everything goes as planned, the estimated duration is one hour and about 3,000 page credits will be charged.
Crawler
If the desired start page does not correspond to the project domain, it can be entered in the Crawler start page field. For example, if www.amazon.com/de/ is selected as the start page, only the pages in the /de/ subdirectory of www.amazon.com will be crawled.
Important: The URL must be inserted with the protocol (https:// or http://)!
If only pages under this start page should be crawled, place the corresponding checkmark in the field.
The Searchmetrics Bot is set as the user agent by default, but it can be changed. For example, if the Google Bot is selected, the crawler identifies itself as the Google Bot and crawls the content released to that agent (see the sketch below). You should also set the region of the crawler and the maximum level, i.e. how many layers of the website should be crawled. The maximum crawl speed defines the number of URL requests per second sent to your website; please note that a higher number can increase your server load.
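If you are unsure what a given user agent will see, you can preview your server's response before crawling. A hedged sketch: the Googlebot string below is Google's documented one, while the SearchmetricsBot token is only an assumed placeholder.

    # Request the same URL with different User-Agent headers and compare.
    import urllib.request

    USER_AGENTS = {
        "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "SearchmetricsBot": "SearchmetricsBot",   # assumed placeholder token
    }

    url = "https://www.example.com/"
    for name, ua in USER_AGENTS.items():
        req = urllib.request.Request(url, headers={"User-Agent": ua})
        with urllib.request.urlopen(req) as resp:
            print(name, resp.status, resp.headers.get("Content-Type"))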
If the crawl result should be compared with a previous one, the respective result can be selected in the bottom bar. When crawl results are compared, a trend becomes visible for each issue on the overview page. It indicates the changes since the last crawl, which gives a good overview of the effect of your work, especially after URL improvements.
Advanced
In this step additional settings can be made.
If a static IP should be used to bypass possible barriers, this can be enabled with a checkmark in the corresponding field. If, for example, a test environment that only allows access with login credentials is to be crawled, these credentials can be entered in the corresponding fields.
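In this context, access data typically means HTTP Basic Auth credentials protecting a staging host. For illustration only (the hostname and credentials below are placeholders), this is the kind of login such an environment expects:

    # Access a staging environment protected by HTTP Basic Auth.
    import urllib.request

    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, "https://staging.example.com/",
                              "crawl-user", "crawl-password")   # placeholders
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr))

    with opener.open("https://staging.example.com/") as resp:
        print(resp.status)   # 200 once the credentials are accepted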
Furthermore, in this step individual URL parameters can be removed from crawled URLs or excluded from the crawl. Following the same principle, entire URLs can also be excluded.
If, for example, you want to exclude everything under the /de/ subdirectory, simply enter it in the corresponding field.
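To make the principle concrete, here is a sketch of the kind of filtering these settings apply; the exact pattern syntax of the Searchmetrics fields may differ from this simplified illustration.

    # Filtering principle behind parameter removal and URL exclusion.
    from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

    EXCLUDED_PREFIXES = ("/de/",)                  # exclude these URLs entirely
    EXCLUDED_PARAMS = {"sessionid", "utm_source"}  # strip these parameters

    def normalize(url):
        parts = urlparse(url)
        if parts.path.startswith(EXCLUDED_PREFIXES):
            return None                            # URL is skipped by the crawl
        query = [(k, v) for k, v in parse_qsl(parts.query)
                 if k not in EXCLUDED_PARAMS]
        return urlunparse(parts._replace(query=urlencode(query)))

    print(normalize("https://www.example.com/de/page"))              # None
    print(normalize("https://www.example.com/en/page?utm_source=x&q=1"))
    # https://www.example.com/en/page?q=1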
In the last step, user-defined HTTP headers can be specified, which the crawler will send with each request to a URL.
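For illustration, such a request looks like the sketch below; the header name and value are hypothetical examples, not Searchmetrics defaults.

    # Send user-defined headers with a request, as the crawler would.
    import urllib.request

    custom_headers = {
        "X-Crawl-Token": "example-token",   # hypothetical header name/value
        "Accept-Language": "de-DE",
    }

    req = urllib.request.Request("https://www.example.com/",
                                 headers=custom_headers)
    with urllib.request.urlopen(req) as resp:
        print(resp.status)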
Saved URL Groups
If saved URL groups are to be included in the crawl, they can be easily selected and added in the last tab.
After all settings have been made, the crawl can be started: select Start Crawl (field on the right).
Note: A crawl costs Page Credits. The estimated number is displayed before the start, but may differ from the number actually used. By clicking on "Start Crawl" you agree to use your existing Page Credits.
All completed crawls are listed in the Crawl Overview page of the corresponding project.