
Zhihu has completely banned Google and Bing from crawling its content. Is it really afraid that the content will be used to train AI?


Yesterday, Blue Dot Network reported that Zhihu has begun forcing users to log in to an account; otherwise, they are prohibited from viewing the complete content of Zhihu questions, answers, and columns. Blocking the Zhihu login pop-up window with a script, as was possible before, is now pointless, because the login window still pops up as soon as you try to view the content.

It is not clear why Zhihu is forcing users to log in. Zhihu ranks very highly in search engines, so its pages often appear near the top of the results when searching for questions in the major search engines. Forced login will therefore affect a large number of users.

One possible reason is that Zhihu does not want its content to be captured by major search engines or other crawlers and used to train AI models. Once login is mandatory, it becomes very easy to restrict crawling by technical means. For example, an account that visits a large number of pages in a short period of time is clearly abnormal.
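As an illustration of how per-account throttling could work once every request is tied to a login, here is a minimal sliding-window sketch. It is not Zhihu's actual implementation: the class name `SlidingWindowLimiter`, the account ID, and the thresholds (3 pages per 10 seconds) are all hypothetical values chosen for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Reject an account once it exceeds max_requests page views per window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # account_id -> timestamps of recently accepted requests
        self._hits = defaultdict(deque)

    def allow(self, account_id, now):
        hits = self._hits[account_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many pages in a short time: treat as abnormal
        hits.append(now)
        return True


# Hypothetical thresholds: at most 3 page views per 10 seconds.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
results = [limiter.allow("user-1", t) for t in (0.0, 1.0, 2.0, 3.0, 12.5)]
print(results)
```

Timestamps are passed in explicitly so the behavior is easy to follow. In this run the fourth request is rejected because three views already fall inside the window, and the fifth is accepted again once the earlier ones have expired.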

Search engines such as Google and Bing are also banned:

It is worth noting that this morning Blue Dot Network received feedback from netizens that Zhihu's robots.txt file had been modified sometime in April or May (the specific date may be May 22, which is close to when Zhihu began forcing login). After this modification, Zhihu allows only Baidu Search and Sogou to crawl its content (Sogou is a new addition; it was previously prohibited from crawling), and no longer allows any other search engine to do so.

Zhihu reached a cooperation agreement with Baidu a few years ago, so Zhihu pages rank very highly in Baidu search and get more clicks. Now only Baidu and Sogou are allowed and all other search engines are prohibited. It is unclear whether Zhihu has reached some agreement with Baidu.

The current situation is that any new content posted by users on Zhihu will not be indexed by Google or Bing, which means that new Zhihu content can no longer be found through those search engines.

Most likely it’s an AI training problem:

Whether it is the forced login or the ban on search engine crawling, these measures look very much like Zhihu trying to prevent its content from being crawled and used to train artificial intelligence. This is essentially similar to what Elon Musk did with X/Twitter.

For large content websites, selling data during the AI boom is indeed a good way to monetize, but banning crawling also means that the open Internet is gradually becoming closed.

In the future, more websites may prohibit search engines or other crawlers from crawling content and may even force login. This is definitely not good news for Internet users.

Attached is Zhihu's previous robots.txt file (2024-04-28):

User-agent: Googlebot
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-agent: Googlebot-Image
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-agent: Baiduspider-news
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-agent: Baiduspider
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-agent: Baiduspider-render
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-agent: Baiduspider-image
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-agent: bingbot
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-Agent: *
Disallow: /

The latest robots.txt file (2024-05-27):

User-agent: Baiduspider-news
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-agent: Baiduspider
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-agent: Baiduspider-render
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-agent: Baiduspider-image
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /search-special
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-agent: Sogou web spider
Disallow: /appview/
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Allow: /tardis/sogou/
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /*?guide*

User-Agent: *
Disallow: /
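The effect of the two files above can be checked with Python's standard `urllib.robotparser` module. The sketch below uses shortened excerpts of each file (only two `Disallow` lines per crawler, and only some of the listed crawlers), and the question URL is a hypothetical path used purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the 2024-04-28 rules.
OLD_RULES = """\
User-agent: Googlebot
Disallow: /login
Disallow: /search

User-agent: bingbot
Disallow: /login
Disallow: /search

User-agent: Baiduspider
Disallow: /login
Disallow: /search

User-Agent: *
Disallow: /
"""

# Shortened excerpt of the 2024-05-27 rules.
NEW_RULES = """\
User-agent: Baiduspider
Disallow: /login
Disallow: /search

User-agent: Sogou web spider
Disallow: /login
Disallow: /search

User-Agent: *
Disallow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent: may_fetch} for the given robots.txt text and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


AGENTS = ["Googlebot", "bingbot", "Baiduspider", "Sogou web spider"]
URL = "https://www.zhihu.com/question/1"  # hypothetical page, for illustration

before = allowed_agents(OLD_RULES, AGENTS, URL)
after = allowed_agents(NEW_RULES, AGENTS, URL)
for agent in AGENTS:
    print(f"{agent}: before={before[agent]} after={after[agent]}")
```

Any crawler without its own `User-agent` group falls through to the final `User-Agent: *` group, whose `Disallow: /` blocks every path. That is why removing the Googlebot and bingbot groups is enough to shut those crawlers out, while adding a Sogou group lets Sogou in.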

Thanks to netizen Yan Liming for submitting this tip.

Copyright Statement: Thank you for your reading. If you need to reprint this article, please indicate the source as Blue Dot Network and mark the hyperlink to this article. Thank you!
