Not every failed crawl is caused by a bug. Below is a list of other possible causes of a failed crawl, together with their solutions.
Crawls fail because the URL is blocking the Searchmetrics Bot
The URL you want to crawl is blocking the Searchmetrics Bot, which causes the error.
Solutions
1) Use a different Bot
The crawl setup lets you change the user agent for a crawl (e.g. to Google). To do so, navigate to Crawl Setup > Crawler and choose a different user agent. If you do not change it, the Searchmetrics Bot is used by default.
2) Best solution: Whitelist the Searchmetrics Bot
If it is your own URL (you have access to the code or are in contact with the developer), you can whitelist the Searchmetrics Bot so that we can crawl your pages. This prevents failed crawls in the future.
Some websites use automated systems that block suspicious activity on their servers, which can cause problems when our crawler tries to access those pages. In this case, whitelist our IPs on your server and then go to Crawl Setup > Advanced Settings and choose to use the static IP.
EU IPs | US IPs |
145.14.137.0 | 64.140.129.126 |
145.14.138.10 | 64.140.129.128 |
145.14.142.110 | 64.140.129.141 |
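If your server's blocking logic is something you can script, the whitelist check could look roughly like the following. This is only a minimal sketch: the IP addresses are copied from the table above, while the function name `is_allowed_crawler_ip` and the idea of passing in the remote address as a string are assumptions made for illustration, not part of any Searchmetrics tooling. In practice the whitelist is often configured in a firewall or WAF instead.

```python
import ipaddress

# Static crawler IPs copied from the table above.
EU_IPS = ["145.14.137.0", "145.14.138.10", "145.14.142.110"]
US_IPS = ["64.140.129.126", "64.140.129.128", "64.140.129.141"]

# Parse once so that lookups compare address objects, not raw strings.
ALLOWED_IPS = frozenset(ipaddress.ip_address(ip) for ip in EU_IPS + US_IPS)


def is_allowed_crawler_ip(remote_addr: str) -> bool:
    """Return True if remote_addr is one of the listed static IPs.

    Hypothetical helper: how your server obtains remote_addr depends
    on your own setup (web server, proxy, firewall).
    """
    try:
        return ipaddress.ip_address(remote_addr.strip()) in ALLOWED_IPS
    except ValueError:
        # Not a valid IP address at all, so never whitelisted.
        return False
```

A request from one of the listed addresses, for example `is_allowed_crawler_ip("145.14.138.10")`, would pass this check, while any other address would still go through your normal blocking rules.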
The start page has been chosen without a protocol
The chosen start page contains neither "https://" nor "http://".
Solution
By default, the custom start page field is empty. This means that the crawler starts on the start page of the project URL. If you only want to crawl individual directories, enter these URLs in the field including the protocol ("https://" or "http://") and activate the checkbox "Crawl only under the start page".
Example: If https://www.searchmetrics.com/en/ is selected as the start page, only the pages in the /en subdirectory of www.searchmetrics.com will be crawled. This setting can be found in Crawl Setup > Crawler under custom start page.
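Before entering a custom start page, you can check whether it includes a protocol. The following is a minimal sketch of such a check. The function name `has_protocol` is hypothetical, and the check only illustrates the rule described above; it does not reproduce how the crawler itself validates the field.

```python
from urllib.parse import urlsplit


def has_protocol(start_page: str) -> bool:
    """Return True if start_page begins with http:// or https://
    and is followed by a host name.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(start_page.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# A start page without a protocol is rejected by this check,
# the same value with "https://" in front is accepted.
print(has_protocol("searchmetrics.com/en/"))
print(has_protocol("https://www.searchmetrics.com/en/"))
```

If the check fails, add "https://" or "http://" in front of the URL before saving the crawl setup.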
If none of the causes above applies, or the problem persists, please contact your CSM or our support team.
support@searchmetrics.com