BleepingComputer.com: Search Engine


Search Engine

#1 KamakaZ

  • Senior Member
  • Group: Members
  • Posts: 548
  • Joined: 26-August 08
  • Gender:Male
  • Location:Victoria

Posted 17 May 2009 - 10:39 PM

Hey, I'm in need of a challenge... Now I know I'm not going to be creating a Google or Yahoo, as I would need datacenters FULL of servers, but I'm more interested in the theory behind it and possibly creating a small search engine.

From what I've gathered, thanks to Google, it will be written in PHP, and I'm guessing I'll need a database to store the indexed pages in, but how do I index the pages? Would I need another server acting as a spider gathering data and filling the table, or could it be run off the same server?

I did find this pre-made program - Sphider - but I don't think it automatically indexes pages, which is what I'm looking to do...

Any ideas?
If I am helping you and don't reply in 24 hours please send me a PM

There's no place like 127.0.0.1
There are 10 types of people in the world, those that can read binary, and those who can't.

#2 groovicus

  • Hail Groovicus!
  • Group: Moderator
  • Posts: 9,522
  • Joined: 05-June 04
  • Gender:Male
  • Location:Centerville, SD

Posted 17 May 2009 - 11:11 PM

You crack me up..... you ask some of the hardest questions.

Ok, so here goes. In order to do your own search engine, you will first have to be able to crawl websites {which is not PHP, btw}. While you are building this spider, you are going to have to determine what it is on each web page that you care about. Google is pretty good at handling plain questions. Wolfram|Alpha is trying to do a bit better by trying to suss out the intent of one's question, but failing miserably (IMHO).

Quote

but how do i index the pages?

How do you want to index the pages? That is sort of up to you. From a technical standpoint, you need to decide what you want to index before you decide how to index it. It is sort of a chicken or egg question.
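As one concrete possibility, a common choice is an inverted index: a map from each word to the pages that contain it. Here is a minimal sketch in Python, assuming the page text has already been fetched. The `pages` sample data and the function names are made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it.

    `pages` is a dict of {url: page_text}.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical sample data standing in for crawled pages.
pages = {
    "http://example.com/a": "How to build a small search engine",
    "http://example.com/b": "Search tips for beginners",
}
index = build_index(pages)
```

What counts as a "word", and whether you also store titles, positions, or link counts, is exactly the "what do you want to index" decision.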

Quote

Would i need another server acting as a spider gathering data and filling the table, or could it be run off the same server?

A spider is just a small program. What you want to do with it is up to you; how you want to handle it is up to you. This is sort of a big question, because there are about an infinite number of ways to do what you want to do. Of course, most of them probably make no sense, but that is what computer science is for. :thumbsup:
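To give a sense of how small, here is a rough sketch of a spider in Python using only the standard library. It is an illustration under assumptions, not how Sphider or any real engine works: the names `LinkParser`, `fetch`, and `crawl` are invented here, the demo runs against a fake in-memory site instead of the network, and it ignores robots.txt and rate limiting, which a real crawler should respect.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def fetch(url):
    """Download a page and return its HTML as text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl from start_url; returns {url: html}."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

# Offline demo: a hypothetical three-page site held in a dict.
sample_site = {
    "http://example.test/": '<a href="/a">A</a> <a href="http://example.test/b">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": "no links here",
}
pages = crawl("http://example.test/", fetch_page=lambda url: sample_site[url])
```

Because the fetch function is passed in, the same loop can run on the web server itself or on a separate machine; it only needs somewhere to store what it finds.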
"Take the risk of thinking for yourself, much more happiness, truth, beauty, and wisdom will come to you that way" - Christopher Hitchens

#3 KamakaZ

Posted 17 May 2009 - 11:26 PM

lol, I'm glad I can crack you up...

Still, I don't really know how to CREATE it...

#4 groovicus

Posted 17 May 2009 - 11:43 PM

Create what? The spider? The database? The first step is to determine what it is you want to do. The major search engines employ thousands of people to accomplish what you want to do. I am not trying to discourage you or disparage you; I'm just wondering if you have any idea what it entails to do what you want. Have you any experience with software development at all? Do you have a unique slant on the problem of searching, or would it just be easier to use the results from Live Search or Google?

That's all I am asking.

#5 KamakaZ

Posted 18 May 2009 - 12:27 AM

Quote

Create what? The spider? the database? The first step is to determine what it is you want to do.


I could probably create the database easily enough; it'd be the spider that I'd find hard. I'm mainly only looking to learn the theory behind it, although it would be good to put it into practice.
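For the database side, a minimal sketch using Python's built-in `sqlite3` module might look like the following. The table names (`pages`, `postings`) and the helper functions are assumptions for illustration, not Sphider's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
conn.executescript("""
    CREATE TABLE pages (
        id  INTEGER PRIMARY KEY,
        url TEXT UNIQUE NOT NULL
    );
    CREATE TABLE postings (
        word    TEXT NOT NULL,
        page_id INTEGER NOT NULL REFERENCES pages(id),
        PRIMARY KEY (word, page_id)
    );
""")

def add_page(conn, url, words):
    """Store a page and one row per distinct word found on it."""
    conn.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
    page_id = conn.execute(
        "SELECT id FROM pages WHERE url = ?", (url,)
    ).fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO postings (word, page_id) VALUES (?, ?)",
        [(word, page_id) for word in set(words)],
    )
    conn.commit()

def lookup(conn, word):
    """Return the URLs of every page that contains the word, sorted."""
    rows = conn.execute(
        "SELECT p.url FROM pages p JOIN postings i ON i.page_id = p.id "
        "WHERE i.word = ? ORDER BY p.url",
        (word,),
    )
    return [url for (url,) in rows]

# Hypothetical sample rows standing in for what a spider would collect.
add_page(conn, "http://example.com/a", ["small", "search", "engine"])
add_page(conn, "http://example.com/b", ["search", "tips"])
```

The spider's only job is then to call something like `add_page` for each page it visits; the search form just runs `lookup`.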

Quote

The major search engines employ thousands of people to accomplish what you want to do. Have you any experience with software development at all?


I realise this is a big job. I have had a bit of experience with Visual Basic; other than that, not too much, but I am eager to learn.

What do you think of that link I posted earlier?

#6 KamakaZ

Posted 18 May 2009 - 04:53 AM

Finished... I ended up using the pre-made one I mentioned above... Sphider.

Very handy for working out how they work, on a small scale... that solves that question then!

#7 KamakaZ

Posted 18 May 2009 - 08:12 PM

I had no idea that a crawler was not actually PHP... maybe I should look into coding with VB?

What are your suggestions? Are there many other languages that are used with web sites to add functionality to them?
(For some reason I have the feeling there's going to be a massive list)...

#8 harish kumar

  • Member
  • Group: Members
  • Posts: 18
  • Joined: 18-May 09

Posted 21 May 2009 - 01:20 AM

You should read this article; I hope it will help you a lot.
http://www.webdevelopersnotes.com/resource...rch_engine.php3
