Search Engine


7 replies to this topic

#1 KamakaZ

KamakaZ

  • Members
  • 739 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Victoria
  • Local time:09:22 AM

Posted 17 May 2009 - 10:39 PM

Hey, I'm in need of a challenge... Now I know I'm not going to be creating a Google or Yahoo, as I would need datacenters FULL of servers, but I'm more interested in the theory behind it and possibly creating a small search engine.

From what I've gathered, thanks to Google, it will be written in PHP, and I'm guessing I'll need a database to store the indexed pages in, but how do I index the pages? Would I need another server acting as a spider gathering data and filling the table, or could it be run off the same server?

I did find this pre-made program, Sphider, but I don't think it automatically indexes pages, which is what I'm looking to do...

Any ideas?

There's no place like 127.0.0.1
There are 10 types of people in the world, those that can read binary, and those who can't.




#2 groovicus

groovicus

  • Security Colleague
  • 9,963 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Centerville, SD
  • Local time:04:22 PM

Posted 17 May 2009 - 11:11 PM

You crack me up..... you ask some of the hardest questions.

Ok, so here goes. In order to do your own search engine, you will first have to be able to crawl websites (which is not PHP, btw). While you are building this spider, you are going to have to determine what it is from each web page that you care about. Google is pretty good at allowing plain questions. Wolfram|Alpha is trying to do a bit better by trying to suss out the intent of one's question, but failing miserably (IMHO).
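To make "what you care about from each page" a bit more concrete, here is a rough sketch in Python using only the standard library. It assumes the things of interest are the page title, the visible text, and the outgoing links; `PageExtractor` and `extract` are made-up names for illustration, and a real page would need more care (encodings, malformed markup, and so on).

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title>, visible text, and outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            # Keep only text a visitor would actually see.
            self.text_parts.append(data.strip())


def extract(html):
    """Return (title, visible_text, links) for one HTML document."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), " ".join(parser.text_parts), parser.links
```

Calling `extract(html)` on a downloaded page gives you the three pieces separately, so you can decide later which of them to store.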

but how do i index the pages?

How do you want to index the pages? That is sort of up to you. From a technical standpoint, you need to decide what you want to index before you decide how to index it. It is sort of a chicken or egg question.
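As one possible answer, assuming you decide to index the plain words on each page, a common structure is an inverted index: a map from each word to the set of pages that contain it. The sketch below is only an illustration in Python; `tokenize`, `build_index`, and `search` are hypothetical names, and it ignores ranking, stemming, and storage in a database.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages maps URL -> page text; returns word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

With a structure like this, a query is just a few set intersections, which is why deciding what goes into the index comes before deciding how to build it.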

Would i need another server acting as a spider gathering data and filling the table, or could it be run off the same server?

A spider is just a small program. What you want to do with it is up to you; how you want to handle it is up to you. This is sort of a big question, because there are about an infinite number of ways to do what you want to do. Of course, most of them probably make no sense, but that is what computer science is for. :thumbsup:
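To show how small such a program can be, here is a minimal sketch of a breadth-first spider in Python's standard library. It is one of the many possible ways, not the way any real search engine does it. The names `crawl`, `fetch_http`, and `LinkParser` are made up for this example, it only follows links whose URL starts with the start URL, and it leaves out things a polite crawler needs, such as honouring robots.txt and rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_http(url):
    """Download a page over HTTP and return it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_http, max_pages=50):
    """Breadth-first crawl; returns a dict of URL -> HTML."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Nothing here needs a second server: the same machine that serves the search page could run this on a schedule and write the results into the database.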

#3 KamakaZ

KamakaZ
  • Topic Starter

  • Members
  • 739 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Victoria
  • Local time:09:22 AM

Posted 17 May 2009 - 11:26 PM

lol, I'm glad I can crack you up...

Still, I don't really know how to CREATE it...



#4 groovicus

groovicus

  • Security Colleague
  • 9,963 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Centerville, SD
  • Local time:04:22 PM

Posted 17 May 2009 - 11:43 PM

Create what? The spider? The database? The first step is to determine what it is you want to do. The major search engines employ thousands of people to accomplish what you want to do. I am not trying to discourage you or disparage you; I'm just wondering if you have any idea what it entails to do what you want. Have you any experience with software development at all? Do you have a unique slant on the problem of searching, or would it just be easier to use the results from Live Search or Google?

That's all I am asking.

#5 KamakaZ

KamakaZ
  • Topic Starter

  • Members
  • 739 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Victoria
  • Local time:09:22 AM

Posted 18 May 2009 - 12:27 AM

Create what? The spider? the database? The first step is to determine what it is you want to do.


I could probably create the database easily enough; it'd be the spider that I'd find hard. I'm mainly looking to learn the theory behind it, although it would be good to put it into practice.

The major search engines employ thousands of people to accomplish what you want to do. Have you any experience with software development at all?


I realise this is a big job. I have had a bit of experience with Visual Basic, other than that not too much, but I am eager to learn.

What do you think of that link I posted earlier?



#6 KamakaZ

KamakaZ
  • Topic Starter

  • Members
  • 739 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Victoria
  • Local time:09:22 AM

Posted 18 May 2009 - 04:53 AM

Finished... I ended up using the pre-made one I mentioned above... Sphider.

Very handy for working out how search engines work, on a small scale... that solves that question then!



#7 KamakaZ

KamakaZ
  • Topic Starter

  • Members
  • 739 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Victoria
  • Local time:09:22 AM

Posted 18 May 2009 - 08:12 PM

I had no idea that a crawler was not actually PHP... maybe I should look into coding with VB?

What are your suggestions? Are there many other languages that are used with websites to add functionality to them?
(For some reason I have the feeling there's going to be a massive list)...



#8 harish kumar

harish kumar

  • Members
  • 18 posts
  • OFFLINE
  •  
  • Local time:03:52 AM

Posted 21 May 2009 - 01:20 AM

You should read this article; I hope it will help you a lot.
http://www.webdevelopersnotes.com/resource...rch_engine.php3



