Building a Simple Search Engine
I wrote most of the search engine script in PHP, which is a good place to start if you are brand new to programming.
The raw data consisted of three fields: the ticker symbol, the security name, and the exchange.
I loaded this into the database with code rather than through the host's admin tools. Why? Frankly, I struggled with the interface (GoDaddy) for some time and could not get the data to load properly. Since it wasn't a big job anyway, I just wrote a little program to do it. For every row it entered into the database, it printed a confirmation line. This actually turned out to be a great warm-up to make sure I had a handle on database connections and basic MySQL code.
This was actually quite enjoyable! I'd never loaded a large database onto a remote server, and I was amazed at how fast the data loaded: all 9000+ symbols in under 15 seconds.
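The original loader was a small PHP program talking to MySQL, and its code is not shown here. As a rough illustration only, the sketch below does the same job in Python with the standard-library `sqlite3` module and an in-memory database so it runs anywhere. The table name `securities`, its column names, and the sample rows are hypothetical.

```python
import sqlite3

# Made-up sample rows (symbol, security name, exchange), not real listings.
# The real data set had 9000+ rows.
ROWS = [
    ("ACME", "Acme Widgets", "NYSE"),
    ("SLVR", "Silverline Metals", "NYSE"),
    ("SICN", "Silicon Works", "NASDAQ"),
]

def load_securities(conn, rows):
    """Create the (hypothetical) securities table, insert each row, and
    print one confirmation line per row. Returns the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS securities ("
        " symbol TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " exchange TEXT NOT NULL)"
    )
    count = 0
    for symbol, name, exchange in rows:
        # The ? placeholders let the driver handle quoting, so names with
        # apostrophes or other special characters load safely.
        conn.execute(
            "INSERT INTO securities (symbol, name, exchange) VALUES (?, ?, ?)",
            (symbol, name, exchange),
        )
        count += 1
        print(f"Loaded row {count}: {symbol} ({exchange})")
    conn.commit()  # make all inserts permanent in one step
    return count
```

Usage: `load_securities(sqlite3.connect(":memory:"), ROWS)` prints three confirmation lines. In PHP the same pattern would use a prepared statement executed once per row.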
After the data was loaded, the question was: how do I get from a raw database full of stock names and symbols to a true search engine?
Well, the first thing I had to do was make an input box for the user's query.
So I started with a super-simple HTML page with a bare-bones form containing the kind of familiar input box used for everything from a Google search to a captcha confirmation. It takes the user's input and calls another page called process_query.php.
That is where most of the "action" takes place.
Now, unlike the home page, which has the familiar .html extension, the process_query page has the .php extension. This essentially tells the host (the computer that 'serves' the pages to your browser) to expect special PHP code.
Roughly speaking, PHP extends the capabilities of HTML, though of course it is a fully functional language in its own right. A page with the .php extension will still correctly display code written entirely in HTML. However, with the .php extension you have all the power of PHP at your fingertips. All you have to do is invoke the PHP engine with the proper opening tag (<?php).
That is where all the fun starts!
Once the form on the web page invoked the process_query page, I needed to convert the raw values entered by the user into something the server could use. It turns out that this point of interaction between user and server can be the launch point for a whole lot of bugs and security issues. So I added several functions to clean the values and make them safe and palatable to the host.
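The author's actual cleaning functions are not shown. A minimal sketch of one common approach, whitelisting allowed characters, trimming, and capping the length, might look like this in Python. The character whitelist and the 40-character limit are assumptions chosen for a stock-name search, not values from the original script.

```python
import re

# Hypothetical limit: ticker symbols and name fragments rarely need more.
MAX_QUERY_LENGTH = 40
# Keep letters, digits, spaces, and a few characters that appear in security
# names (period, ampersand, apostrophe, hyphen); everything else is removed.
DISALLOWED = re.compile(r"[^A-Za-z0-9 .&'\-]")

def clean_query(raw):
    """Turn raw form input into a trimmed, whitelisted search string."""
    text = raw.strip()
    text = DISALLOWED.sub("", text)    # drop tags, quotes-free symbols, etc.
    text = re.sub(r" +", " ", text)    # collapse runs of spaces
    return text[:MAX_QUERY_LENGTH].strip()
```

Whitelisting like this is only a first line of defense. The query itself should still receive the value as a bound parameter (a prepared statement in PHP's PDO or mysqli) rather than by string concatenation, which is what actually prevents SQL injection.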
But now came the difficult part.
I had to query the MySQL database for whatever the user entered into the input box. I had no MySQL expertise and found relational database lingo arcane and incomprehensible. So simplicity was going to be key. A simple search was obviously not my ultimate aim, but you gotta walk before you can run, as they say.
Honestly, I figured it would take me a day or two of internet searching and code trials just to do this, but surprisingly, it took about 10 minutes!
That's right. Ten minutes.
It turns out that querying the database on a single field is the MySQL equivalent of a child learning to say "Da Da".
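Here is a rough Python/sqlite3 stand-in for a single-field search, assuming a hypothetical `securities` table with `symbol` and `name` columns and made-up rows. It matches the term anywhere in the ticker symbol with `LIKE`, passing the pattern as a bound parameter. SQLite's `LIKE` is case-insensitive for ASCII letters by default, much like MySQL's default collations.

```python
import sqlite3

def make_demo_db():
    """Build an in-memory database with made-up rows (not real listings)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE securities (symbol TEXT, name TEXT, exchange TEXT)")
    conn.executemany(
        "INSERT INTO securities VALUES (?, ?, ?)",
        [
            ("ACME", "Acme Widgets", "NYSE"),
            ("SLVR", "Silverline Metals", "NYSE"),
            ("SICN", "Silicon Works", "NASDAQ"),
            ("FSSL", "Fossil Fuels Holdings", "NYSE"),
            ("ISIL", "Intersil Corp", "NASDAQ"),
        ],
    )
    return conn

def search_symbol(conn, term):
    """Single-field search: rows whose symbol contains the term."""
    # The % wildcards are added in Python; the term itself is a bound
    # parameter, so it is never spliced into the SQL text.
    cur = conn.execute(
        "SELECT symbol, name FROM securities WHERE symbol LIKE ? ORDER BY symbol",
        (f"%{term}%",),
    )
    return cur.fetchall()
```

With the demo rows, `search_symbol(make_demo_db(), "sil")` finds only `ISIL`, since no other ticker contains "sil".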
Within half an hour I had expanded it into a two-field search and was happily searching and retrieving stock symbols from the database.
This worked well at returning a list of somewhat filtered results, but it vastly increased the number of results (since I used the OR operator) and put all the more strain on the next phase of the operation: sorting the results by relevance.
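A two-field search can be sketched by matching the term against both the symbol and the name with `OR`. This is again a Python/sqlite3 stand-in, with a hypothetical table and made-up rows. Because a row qualifies if either field matches, the result set grows, which is exactly the effect described above. `ORDER BY symbol` only gives a stable order; it says nothing about relevance.

```python
import sqlite3

def make_demo_db():
    """Build an in-memory database with made-up rows (not real listings)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE securities (symbol TEXT, name TEXT, exchange TEXT)")
    conn.executemany(
        "INSERT INTO securities VALUES (?, ?, ?)",
        [
            ("ACME", "Acme Widgets", "NYSE"),
            ("SLVR", "Silverline Metals", "NYSE"),
            ("SICN", "Silicon Works", "NASDAQ"),
            ("FSSL", "Fossil Fuels Holdings", "NYSE"),
            ("ISIL", "Intersil Corp", "NASDAQ"),
        ],
    )
    return conn

def search_two_fields(conn, term):
    """Two-field search: rows whose symbol OR name contains the term."""
    pattern = f"%{term}%"
    cur = conn.execute(
        "SELECT symbol, name FROM securities"
        " WHERE symbol LIKE ? OR name LIKE ?"
        " ORDER BY symbol",
        (pattern, pattern),  # one bound value per placeholder
    )
    return cur.fetchall()
```

For "sil", a symbol-only search over these rows finds one match, while the two-field version returns four: FSSL, ISIL, SICN, and SLVR.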
Now, I can't speak to the trials and tribulations of the big search engine companies, but at the relatively small scale of a stock ticker search, relevance sorting is a much bigger task than just pulling the items from the database.
First of all, relevance depends on where a given group of characters appears within a word. If a user types "sil", we'd naturally assume the user was looking for a word that starts with those letters. It is more likely the user was searching for "silver" or "silicon" than "fossil" or "intersil".
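One way to express the "starts with beats contains" idea is a tiered score computed in application code after the query. The tiers below (exact ticker, ticker prefix, word prefix in the name, match buried inside a word) are an assumption for illustration, not the author's scheme, and the rows are made up.

```python
def relevance(term, symbol, name):
    """Score one row; lower is more relevant. Hypothetical tiers."""
    t = term.lower()
    sym = symbol.lower()
    words = name.lower().split()
    if sym == t:
        return 0  # exact ticker match
    if sym.startswith(t):
        return 1  # ticker starts with the term
    if any(w.startswith(t) for w in words):
        return 2  # a word in the name starts with it ("silver", "silicon")
    if t in sym or t in name.lower():
        return 3  # term buried inside a word ("fossil", "intersil")
    return 4      # no match at all

def rank(term, rows):
    """Sort (symbol, name) rows by score, breaking ties by symbol."""
    return sorted(rows, key=lambda r: (relevance(term, r[0], r[1]), r[0]))

# Made-up rows, such as a two-field OR search might return for "sil".
ROWS = [
    ("FSSL", "Fossil Fuels Holdings"),
    ("ISIL", "Intersil Corp"),
    ("SICN", "Silicon Works"),
    ("SLVR", "Silverline Metals"),
]
```

`rank("sil", ROWS)` puts Silicon Works and Silverline Metals ahead of Fossil Fuels Holdings and Intersil Corp. The same tiers could instead be pushed into SQL with an `ORDER BY CASE WHEN name LIKE 'sil%' THEN ... END` expression; doing it in code keeps the scoring easy to tweak.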