robots.txt


2 replies to this topic

#1 bluekey


Posted 03 September 2010 - 02:36 PM

Hi guys

I understand that there is a convention, namely robots.txt, that prevents search engines from indexing files.

However, from what I have read online, a few search engines do not follow this convention. How can I keep them out as well?

Is the following

User-agent: *
Disallow: /

able to prevent all search engines from indexing all parts of the website?

bluekey

#2 groovicus


Posted 03 September 2010 - 03:37 PM

Spiders can crawl and index your site whether you want them to or not. It is only by polite convention that some developers create spiders that abide by the contents of robots.txt; it is, after all, merely a text file.
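
To make the "polite convention" concrete, here is a minimal sketch of the check a well-behaved spider runs before fetching a page, using Python's standard urllib.robotparser module. The helper name is_allowed, the bot names, and the example.com URL are made up for illustration. Nothing here is enforced by the web server: a spider that skips this check can still request every page.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the original question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `url`.

    `is_allowed` is a hypothetical helper for this example, not part of
    any crawler's real code.
    """
    parser = RobotFileParser()
    # parse() takes the file's contents as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder bot names and URL; any compliant bot gets the same answer.
    for agent in ("Googlebot", "ExampleBot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "http://example.com/private/page.html")
        print(agent, "may fetch:", allowed)
```

So for spiders that honor the file, those two lines do cover the whole site. For the ones that do not, the file has no effect and the blocking has to happen on the server side.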

#3 Romeo29


Posted 04 September 2010 - 06:21 AM

If your web host is using the Apache web server, you can place an .htaccess file inside the folder you do not want anyone to access. Inside it, put the line Options -Indexes, which turns off automatic directory listings so that a crawler cannot enumerate the files in that folder. Also, do not link to the pages inside that folder anywhere on your site (or on any other site), so a crawler has no way to find them.

If your web host is using anything other than Apache, then I am sorry, I do not know.
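
As a rough sketch, the .htaccess file described above could look like the fragment below. It assumes the host lets .htaccess files override these settings (AllowOverride). The commented-out Require line is an addition that was not part of the suggestion above: it assumes Apache 2.4 and refuses every request for the folder, which is stricter than just hiding the listing.

```apache
# .htaccess placed inside the folder you want to keep crawlers out of

# Turn off automatic directory listings, so nobody can browse the file names.
# Files can still be fetched by anyone who already knows their exact URL.
Options -Indexes

# Assumption: Apache 2.4 or later. Uncomment to refuse all requests
# for this folder outright.
# Require all denied
```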

Edited by Romeo29, 04 September 2010 - 06:26 AM.




