Hey, I'm in need of a challenge... Now I know I'm not going to be creating a Google or Yahoo, as I would need datacenters FULL of servers, but I'm more interested in the theory behind it and possibly creating a small search engine.
From what I've gathered, thanks to Google, it will be written in PHP, and I'm guessing I'll need a database to store the indexed pages in, but how do I index the pages? Would I need another server acting as a spider gathering data and filling the table, or could it be run off the same server?
I did find this pre-made program, Sphider, but I don't think it automatically indexes pages, which is what I'm looking to do...
Any ideas?
Search Engine
#2
Posted 17 May 2009 - 11:11 PM
You crack me up..... you ask some of the hardest questions.
Ok, so here goes. In order to do your own search engine, you will first have to be able to crawl websites (which is not PHP, btw). While you are building this spider, you are going to have to determine what it is from each web page that you care about. Google is pretty good at allowing plain questions. Wolfram|Alpha is trying to do a bit better by trying to suss out the intent of one's question, but failing miserably (IMHO).
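To make "determine what it is from each web page that you care about" a bit more concrete, here is a minimal sketch in Python using only the standard library. It assumes you care about three things per page: the title, the visible text, and the outgoing links. The names `PageParser`, `parse_page`, and `fetch` are made up for illustration, and a real spider would also need error handling and should respect robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects the title, visible text, and outgoing links of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.text_parts = []
        self.links = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(base_url, html):
    """Hypothetical helper: reduce raw HTML to the fields we chose to keep."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return {
        "url": base_url,
        "title": parser.title.strip(),
        "text": " ".join(parser.text_parts),
        "links": parser.links,
    }


def fetch(url):
    """Download one page as text (assumes UTF-8; bad bytes are replaced)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Something like `parse_page(url, fetch(url))` would then give you one record per page. Which fields you keep is exactly the decision described above; this sketch just picks an obvious set.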
Quote
but how do I index the pages?
How do you want to index the pages? That is sort of up to you. From a technical standpoint, you need to decide what you want to index before you decide how to index it. It is sort of a chicken-or-egg question.
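As one possible answer to "how", here is a small sketch of an inverted index in Python: a map from each word to the set of pages that contain it. This is only one common approach, not the only one, and it assumes you decided to index plain words from the page text. The function names `tokenize`, `build_index`, and `search` are invented for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages maps url -> page text; returns word -> set of urls."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the urls that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    pages = {
        "http://example.com/a": "PHP search engine",
        "http://example.com/b": "Python search spider",
    }
    index = build_index(pages)
    print(search(index, "search spider"))
```

There is no ranking here at all; a page either contains all the query words or it does not. Deciding how to order the results is a separate (and much harder) problem.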
Quote
Would I need another server acting as a spider gathering data and filling the table, or could it be run off the same server?
A spider is just a small program. What you want to do with it is up to you; how you want to handle it is up to you. This is sort of a big question, because there are about an infinite number of ways to do what you want to do. Of course, most of them probably make no sense, but that is what computer science is for.
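To show how small such a program can be, here is a rough Python sketch of a crawl loop that fills a table in a local SQLite database, so the spider and the database live on the same machine. The `pages` table layout, the `crawl` function, and the `fetch_page` callback are all assumptions made for this example; `fetch_page(url)` is expected to return a dict with `title`, `text`, and `links`, or `None` if the page could not be read.

```python
import sqlite3
from collections import deque


def crawl(start_url, fetch_page, db, max_pages=50):
    """Breadth-first crawl from start_url, storing each page in SQLite."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    queue = deque([start_url])
    seen = {start_url}  # never queue the same URL twice
    stored = 0
    while queue and stored < max_pages:
        url = queue.popleft()
        page = fetch_page(url)
        if page is None:
            continue  # unreachable or unreadable page, skip it
        db.execute(
            "INSERT OR REPLACE INTO pages (url, title, body) VALUES (?, ?, ?)",
            (url, page["title"], page["text"]),
        )
        stored += 1
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    db.commit()
    return stored


if __name__ == "__main__":
    # A fake two-page "web" so the loop can run without any network access.
    site = {
        "a": {"title": "A", "text": "first page", "links": ["b", "c"]},
        "b": {"title": "B", "text": "second page", "links": ["a"]},
    }
    db = sqlite3.connect(":memory:")
    print(crawl("a", site.get, db))
```

For a small site, running this on the same server as the search page is fine. A second machine only starts to matter when the crawl gets big enough to compete with the site for CPU, disk, or bandwidth.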
"Take the risk of thinking for yourself, much more happiness, truth, beauty, and wisdom will come to you that way" - Christopher Hitchens
#3
Posted 17 May 2009 - 11:26 PM
lol, I'm glad I can crack you up...
Still, I don't really know how to CREATE it...
If I am helping you and don't reply in 24 hours please send me a PM
There's no place like 127.0.0.1
There are 10 types of people in the world, those that can read binary, and those who can't.
#4
Posted 17 May 2009 - 11:43 PM
Create what? The spider? The database? The first step is to determine what it is you want to do. The major search engines employ thousands of people to accomplish what you want to do. I am not trying to discourage you or disparage you; I'm just wondering if you have any idea what it entails to do what you want. Have you any experience with software development at all? Do you have a unique slant on the problem of searching, or would it just be easier to use the results from Live Search or Google?
That's all I am asking.
#5
Posted 18 May 2009 - 12:27 AM
Quote
Create what? The spider? The database? The first step is to determine what it is you want to do.
I could probably create the database easily enough; it'd be the spider that I'd find hard. I'm mainly looking to learn the theory behind it, although it would be good to put it into practice.
Quote
The major search engines employ thousands of people to accomplish what you want to do. Have you any experience with software development at all?
I realise this is a big job. I have had a bit of experience with Visual Basic, other than that not too much, but I am eager to learn.
What do you think of that link I posted earlier?
#6
Posted 18 May 2009 - 04:53 AM
Finished... I ended up using the pre-made one I mentioned above, Sphider.
Very handy for working out how they work on a small scale... that solves that question then!
#7
Posted 18 May 2009 - 08:12 PM
I had no idea that a crawler was not actually PHP... maybe I should look into coding with VB?
What are your suggestions? Are there many other languages that are used with web sites to add functionality to them?
(For some reason I have the feeling there's going to be a massive list)...
#8
Posted 21 May 2009 - 01:20 AM
You should read this article; I hope it will help you a lot.
http://www.webdevelopersnotes.com/resource...rch_engine.php3