Pages blocked by robots.txt can still be indexed by Google, but the process is indirect. A page blocked by robots.txt is unlikely to rank highly, because Google cannot read its content and so cannot be sure what the page is about. Before explaining how search engines handle blocked pages, let’s first look at what robots.txt does.
What is Robots.txt?
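Robots.txt is a plain-text file, placed at the root of a domain (for example, https://example.com/robots.txt), that tells search engine crawlers not to crawl your entire website or specific pages within it. It works much like a ‘Do not enter’ sign on a door: reputable search engines such as Google and Bing obey it and will not crawl the blocked pages. Compliance is voluntary, though, and blocking a page from being crawled does not guarantee it stays out of the index, as explained below. Websites commonly use the file to keep crawlers away from pages they don’t want in search results, such as admin areas or pages with sensitive information. The file itself is publicly readable, however, so it does not protect those pages from people.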
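To make the rules concrete, here is a minimal sketch using Python’s standard-library `urllib.robotparser` module to check which URLs a crawler may fetch under a sample robots.txt. The domain example.com, the /private/ and /drafts/ paths, and the file contents are all hypothetical. This parser only does simple prefix matching, so it approximates rather than reproduces Googlebot’s own parser (which also understands `*` and `$` wildcards). Also note that `can_fetch()` answers whether crawling is allowed, not whether a URL can be indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com. A crawler follows only the group
# that names it most specifically, so Googlebot obeys its own group and
# ignores the "*" group; /private/ is therefore listed in both groups.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
Disallow: /drafts/

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "https://example.com/blog/post.html"),
        ("Googlebot", "https://example.com/drafts/idea.html"),
        ("Bingbot", "https://example.com/drafts/idea.html"),
        ("Bingbot", "https://example.com/private/report.html"),
    ]
    for agent, url in checks:
        # can_fetch() reports whether crawling is permitted, nothing more.
        status = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent:<10} {url} -> {status}")
```

Here Googlebot may crawl the blog post but not the draft, while Bingbot, which falls back to the `*` group, may crawl the draft but not the private report. A `Disallow` rule only stops crawling; the blocked URLs can still end up in Google’s index, as the next section describes.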
Indexing Blocked Pages
Although Google cannot read the content of a blocked page, it can still infer what kind of information the page carries from signals that don’t require crawling it, such as the words in its URL and the anchor text of links on other sites that point to it. If those signals match a search query closely enough, the page may be shown for that search, usually with little or no description in the result.
However, such pages usually rank poorly, especially when many well-written articles target the same query. Google favors pages that use the relevant keywords in their titles and subheadings, offer substantial, high-quality, informative, user-focused and original content, and attract many backlinks. If your page is blocked by robots.txt, the search engine cannot judge how good the content is or whether it answers the query.
A blocked page’s best chance of ranking highly for a given query is to have many backlinks. Many backlinks suggest that other sites have found your content useful for that query, which Google may take as a sign that the page deserves a spot on the first page of the search results.
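That said, you will have a much better chance of ranking highly if you don’t block your pages with robots.txt. Don’t block pages you want to appear in search results, since blocked pages are likely to rank very poorly; reserve the file for pages you don’t want crawled. Keep in mind that robots.txt does not reliably keep a page out of the index or hide private information. To keep a page out of search results, let Google crawl it and add a noindex directive, or protect it with a password.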
The robots.txt file is important to website owners. Most websites use it for at least some pages, since not every page needs to be found through search. Pages blocked by the file can still be indexed by Google, although the process is much less direct. One signal the search engine checks is the page’s URL: if it, together with the anchor text of links pointing to the page, matches a given query, the page may appear in the results. Another important signal is the number of backlinks the page has. A large number of backlinks increases its chance of ranking.