Preventing Robots


9 replies to this topic

#1 lisa

lisa

  • Members
  • 4 posts
  • OFFLINE
  •  
  • Local time:06:19 PM

Posted 25 June 2006 - 09:46 AM

I want to remove a site from search engines. This is the information I have:
=========

To remove your site from search engines and prevent all robots from crawling it in the future, place the following robots.txt file in your server root:

User-agent: *
Disallow: /
==========

Where is the 'server root'? Does anyone know how this is done?

Thanks!
Lisa

(Moderator edit: moved post to more appropriate forum. jgweed)

Edited by jgweed, 25 June 2006 - 02:14 PM.




#2 groovicus

groovicus

  • Security Colleague
  • 9,963 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Centerville, SD
  • Local time:04:19 PM

Posted 25 June 2006 - 05:38 PM

The server root is just that, the root directory. What sort of server are you running?

#3 nosnhoj#3

nosnhoj#3

  • Members
  • 245 posts
  • OFFLINE
  •  
  • Location:127.0.0.1
  • Local time:03:19 PM

Posted 25 June 2006 - 09:22 PM

Hello,

Just to add a bit to what has already been suggested: you should upload the robots.txt file (the name must be all lowercase) to the same directory in which your default "home" or "index" page resides. For example:

Your root directory may look something like this:

index.html
contact.html
page1.html
page2.html
robots.txt

Images
    image1.jpg
    image2.jpg

Stylesheets
    master.css
    contact.css



Images and Stylesheets represent folders inside the root directory that contain images and stylesheets respectively.
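A minimal sketch of the layout described above, assuming Python is available on the machine that holds the site files. The helper names `write_robots_txt` and `is_blocked` are made up for illustration; only `urllib.robotparser` comes from the Python standard library. It writes the two-line file next to the index page and then checks that the rules really do block every path:

```python
import tempfile
from pathlib import Path
from urllib.robotparser import RobotFileParser

# The two rules quoted in the first post: every robot, every path.
ROBOTS_RULES = "User-agent: *\nDisallow: /\n"


def write_robots_txt(site_root: Path) -> Path:
    """Write a block-everything robots.txt into the site's root directory."""
    target = site_root / "robots.txt"  # the file name must be all lowercase
    target.write_text(ROBOTS_RULES, encoding="ascii")
    return target


def is_blocked(robots_file: Path, path: str, agent: str = "*") -> bool:
    """Return True if the rules in robots_file forbid `agent` from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_file.read_text(encoding="ascii").splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    # A throwaway directory stands in for the real root directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "index.html").write_text("<html></html>", encoding="ascii")
        robots = write_robots_txt(root)
        print(robots.name, is_blocked(robots, "/index.html"))
```

On a hosted site the file would normally be uploaded with whatever tool is already used for index.html; the script only shows where the file sits and what it contains.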


Hope this helps,


nos :thumbsup:
When I'm right, I'm right....
And when I'm wrong, I could have been right....
So I'm still right, cause I could have been wrong.

#4 lisa

lisa
  • Topic Starter

  • Members
  • 4 posts
  • OFFLINE
  •  
  • Local time:06:19 PM

Posted 27 June 2006 - 07:18 PM

groovicus and nosnhoj#3
I wish I knew how to answer what server I am running. I have worked with computers a lot but, as you can tell from that answer, not enough. This info is showing up on a search engine from another site that I visit, which is not owned by me. I have contacted the site and asked how to block that info from going out to the search engines, but I was wondering if it's something I need to do as well. So, bearing in mind that I don't have a clue right now: do I go into my home page's root directory on my computer and place the robots.txt file in it, and that will block the search engines? Is that what you mean?
Thanks,
Lisa :thumbsup:

#5 groovicus

groovicus

  • Security Colleague
  • 9,963 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Centerville, SD
  • Local time:04:19 PM

Posted 27 June 2006 - 07:31 PM

OK, then let's start simply. Is the server in your possession? What operating system is running on it?

#6 medab1

medab1

  • Members
  • 698 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:earth
  • Local time:05:19 PM

Posted 28 June 2006 - 01:42 AM

Do you use a website host like Angelfire or Tripod or Geocities or Lycos?

Or do you actually own a server machine & run the website from it in your home?

Or do you use Apache or something similar?

#7 lisa

lisa
  • Topic Starter

  • Members
  • 4 posts
  • OFFLINE
  •  
  • Local time:06:19 PM

Posted 03 July 2006 - 05:54 PM

Okay, just got back....
It's my computer in my home. DSL connection. Web site is not mine, info is being pulled off of it regarding my posts and searches from that site and shows up on Google. There is one spam post which is not good, listed on Google that has nothing to do with me. Plus the other stuff doesn't need to be out there. Nothing incriminating but just don't want it there.
Thanks,
Lisa

#8 nyiddle

nyiddle

  • Members
  • 64 posts
  • OFFLINE
  •  
  • Location:New Jerseh.
  • Local time:06:19 PM

Posted 05 July 2006 - 01:03 AM

Okay, just got back....
It's my computer in my home. DSL connection. Web site is not mine, info is being pulled off of it regarding my posts and searches from that site and shows up on Google. There is one spam post which is not good, listed on Google that has nothing to do with me. Plus the other stuff doesn't need to be out there. Nothing incriminating but just don't want it there.
Thanks,
Lisa

I think I know what you're talking about, but you can't really fix that, since the website isn't yours. You could ask the person who owns the site to put a robots.txt file in the root of the website. As for the spam post showing up on Google: if whoever posted it is a registered user, it may have got in because the site has no verification code for registering users. If it isn't a user's post and is just an ad, then you can't stop it, because the ad could be what's helping to pay for the website.
So it's out of your hands, either way.
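To make "the root of the website" concrete: robots look for the file at a fixed address, the site's scheme and host followed by /robots.txt, no matter which page they started from. A small sketch, assuming Python; the function name `robots_txt_url` and the forum.example.com address are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Keep the scheme and host, replace the path, drop the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Prints http://forum.example.com/robots.txt
    print(robots_txt_url("http://forum.example.com/topic/123?page=2"))
```

Opening that address in a browser shows whether the site owner has already published any rules.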



#9 lisa

lisa
  • Topic Starter

  • Members
  • 4 posts
  • OFFLINE
  •  
  • Local time:06:19 PM

Posted 15 July 2006 - 09:58 PM

Thanks, everyone! The other site loves Google and won't do anything. I will work with Google some more and see if I can get the specific problem removed from its search results. I wonder if there is a time period after which it comes off Google. Anyway, thank you so much for your time!
Lisa

#10 ussr1943

ussr1943

  • Members
  • 490 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:USA
  • Local time:06:19 PM

Posted 07 August 2006 - 11:11 AM

There is another, maybe slightly easier, way to keep robots out. In the head section of your pages, put this:
<META NAME="robots" CONTENT="NOINDEX,NOFOLLOW">

When a search engine's robot stumbles upon your page (assuming your page is pretty new and you actually own it), it will not index the page and will not follow any of the links you have on the page.
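A minimal sketch for checking that a page really carries the tag, assuming Python. The class `RobotsMetaFinder` and the function `robots_directives` are made-up names; `html.parser` is part of the Python standard library. Tag and attribute names are matched case-insensitively, so the upper-case form above is found as well:

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for item in attr.get("content", "").split(","):
            if item.strip():
                self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lowercased robots directives found in an HTML document."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="robots" CONTENT="NOINDEX,NOFOLLOW"></head></html>'
    print(sorted(robots_directives(page)))
```

A page is covered when the result contains both `noindex` and `nofollow`; an empty set means the tag is missing from that page.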

I hope that helps. It looks like people have already helped, but I thought this might be an easier way.
"Ideas are far more powerful than guns."
"The only truly secure system is one that is powered off, cast in a block of concrete and sealed in a lead-lined room with armed guards -- and even then I have my doubts." --Eugene H. Spafford
"One man's terrorist is another's freedom fighter"



