There are four basic functions that a search engine must perform:
1. Gather a list of websites to crawl.
2. Download the content of each of those websites, and build up a mapping of “keywords” to pages.
3. Allow users to type in keywords, then match those keywords against the mapping you built in step #2.
4. Display the results from step #3 in an order that is relevant to the user.
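To make steps 2 and 3 concrete, here is a minimal sketch of a keyword-to-page mapping (often called an inverted index) and a lookup against it. It is only an illustration: the `PAGES` dictionary stands in for content you would actually have downloaded, and the names `tokenize`, `build_index` and `search` are made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for downloaded content: URL -> page text.
PAGES = {
    "http://example.com/a": "Stack Overflow is a question and answer site",
    "http://example.com/b": "A stack is a last-in first-out data structure",
    "http://example.com/c": "Integer overflow happens when a value is too large",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Step 2: map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Step 3: return the URLs that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first set so the intersection below never mutates the index.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "stack overflow")))
```

With three pages this fits comfortably in memory. The same idea at billions of pages needs the index split across many machines, which is where the real engineering effort goes.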
It sounds simple, and if you have a small number of pages to search then it typically is. The difficulty comes from scaling from hundreds of pages to the billions of pages on the internet today.
Most of the difficulty – and what makes Google better than many other engines – is not the technical ability to “search” billions of pages (that is, steps 1-3), but deciding which of those billions of pages to show at (or near) the top of the results (that’s step #4).
For example, when you type “stack overflow” into Google, there are 2.1 million pages in its index that match those keywords. The thing that makes Google good is its algorithm for deciding that this site, Stack Overflow, should appear as the first result (as opposed to, say, the Wikipedia article on the subject).
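To show what step 4 involves at its very simplest, here is a naive sketch that orders matching pages by how often the query keywords appear in them. This is emphatically not how Google ranks results; counting keywords is just the easiest scoring rule to write down. The `PAGES` data and the `rank` function are hypothetical and exist only for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(pages, query):
    """Order matching pages by how often the query keywords occur in them."""
    words = tokenize(query)
    scored = []
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        # Keep only pages that contain every keyword in the query.
        if words and all(counts[w] > 0 for w in words):
            score = sum(counts[w] for w in words)
            scored.append((score, url))
    # Highest score first; ties are broken by URL so the order is stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for score, url in scored]

# Hypothetical pages: both match the query, but one mentions it more often.
PAGES = {
    "http://example.com/site": "Stack Overflow: ask on Stack Overflow, answer on Stack Overflow",
    "http://example.com/wiki": "A stack overflow occurs when the call stack exceeds its limit",
}

print(rank(PAGES, "stack overflow"))
```

A rule this simple is trivial to game by repeating keywords, which is one reason real engines combine many more signals than keyword counts.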
The way they do that is the subject of many university dissertations, white papers, books and much speculation. Rest assured the actual algorithm is a closely guarded secret at Google, and I doubt there are many who know the intimate details of every aspect of it. It’s also something that’s constantly changing.