Simple text search in PHP
To search my music collection, I currently browse my web-based interface and rely on the browser’s own search.
I point Firefox at http://fileserver/music/ and get the FancyIndexed list of music on the M-partition. Although this list has grown quite large, I can still find albums by ctrl-f’ing to my desired destination; this doesn’t work for individual tracks, though.
I’ve been building a text-based index of all tracks and albums on the disk, via a daily scheduled bat-file which exports its result to a plain text file.
To search this text file, I’ve created a small search-script in PHP, which can search all entries for a certain string. All is well up until this point.
Now here’s the catch: I want to implement a boolean search with the basic operators such as AND, OR and NOT available to be used in the search-query.
This would be my ideal form of searching with multiple arguments. I’ve got multi-argument search implemented at the moment; however, it searches for the entire query as one concatenated string, instead of checking whether all, or at least one, of the substrings occur in an entry.
For instance: I want my algorithm to find “(09) – Technohead – I Wanna Be A Hippy.mp3” when I search for “technohead hippy”. The algorithm should match both “technohead” and “hippy”, regardless of their location in the string. At the moment, that search doesn’t return this track, because the exact string “technohead hippy” does not occur in the entry. Searching for just “technohead” or just “hippy” would find the track, but that is not ideal, as many other tracks also match the widened search.
Now going back to why I’m posting this: Is there anyone who can point me to a proper, text-based solution?
I know it can quite easily be solved with a fulltext search in a database, but I would like to stay away from the DB as long as possible.
Also, re-searching all hits on the first argument (when AND is given in the search) with all other arguments is a possibility. The issue is that the other operators, like OR and NOT, are a little harder to implement.
I feel a nudge in the right direction would help me a lot, so I’m appealing to the better nature of all readers here. 😉
EDIT:
I started implementing an auto-AND functionality, while still working from my plain-text file.
I created the following code.
// insert some form-stuff
$haystack = file("filelist.txt");
// Split the query on spaces; every word has to match (implicit AND).
$needles = array_reverse(explode(" ", chop($formData['needle'])));
$i = sizeof($haystack);
foreach ($needles as $needle) {
    // Each pass narrows the remaining entries down to those containing $needle.
    $haystack = searchStack($needle, $haystack);
}
$j = sizeof($haystack);
if ($j == 0) {
    $sOut .= "Helaas is er niets gevonden. "; // Dutch for "Unfortunately, nothing was found."
} else {
    $sOut .= formatFound($haystack);
}
$sOut .= ' Found: '.$j.' items. Searched '.$i.' files and folders.';

// Return only the entries that contain $needle (case-insensitive).
function searchStack($needle, $haystack) {
    $tempstack = array();
    foreach ($haystack as $straw) {
        if (stristr($straw, $needle) !== false) {
            array_push($tempstack, $straw);
        }
    }
    return $tempstack;
}

function formatFound($haystack) {
    $sOut = "";
    foreach ($haystack as $straw) {
        // do some magical tricks in modifying the filename & location to a URL
    }
    return $sOut;
}
Any comments are always welcome.
Part of being a good programmer is picking the best tool for the job.
Most databases offer various search solutions for exactly the job you want to do. Staying away from it would, then, make you…?
Free cookie for the person to guess the correct answer. No deliveries, unfortunately.
As for nudges, can’t you ‘just’ perform two separate searches for the OR followed by a merge and do a filter for the NOT operator? I know it’ll increase load and decrease performance, but seriously, that’s the consequence of not picking the best tool for the job.
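A minimal sketch of that merge-and-filter idea, assuming the entries are plain strings as in filelist.txt. The helper names (findEntries, searchAny, excludeTerms) and the sample entries are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: keep the entries that contain $needle (case-insensitive).
function findEntries(array $entries, string $needle): array
{
    return array_values(array_filter($entries, function (string $entry) use ($needle): bool {
        return stripos($entry, $needle) !== false;
    }));
}

// OR: run one separate search per term, then merge the hits without duplicates.
function searchAny(array $entries, array $terms): array
{
    $hits = [];
    foreach ($terms as $term) {
        $hits = array_merge($hits, findEntries($entries, $term));
    }
    return array_values(array_unique($hits));
}

// NOT: filter out every hit that contains one of the excluded terms.
function excludeTerms(array $hits, array $excluded): array
{
    foreach ($excluded as $term) {
        $hits = array_values(array_filter($hits, function (string $hit) use ($term): bool {
            return stripos($hit, $term) === false;
        }));
    }
    return $hits;
}

// Made-up sample entries, standing in for the lines of filelist.txt.
$entries = [
    '(09) - Technohead - I Wanna Be A Hippy.mp3',
    '(01) - Technohead - Headsex.mp3',
    '(03) - Some Band - Hippy Song.mp3',
];

// "technohead OR hippy, NOT headsex"
$result = excludeTerms(searchAny($entries, ['technohead', 'hippy']), ['headsex']);
```

Every OR term costs one extra pass over the full list, which is the performance hit mentioned above; for a text file of modest size that is probably acceptable.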
In addition, thinking in Java-ey terms, you could, instead of concatenating the search string, make it a set of ‘search objects’, consisting of string and operator, and ‘iterate’ over all song titles for all the search objects. But that’s a bit of work for something simple…
Expanding on the search objects idea:
Three types of search objects:
AND
OR
NOT
For every word in your search query you create an object with the desired operator. Then you iterate over all song titles and include a title only if it contains ALL the search objects of type AND, or at least one of the search objects of type OR, and ONLY if it contains none of the objects of type NOT.
As said, it’s a bit complex for a simple tool, but eh.
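The search-object idea could be sketched in PHP roughly as follows. This is one possible reading of the rules above, not a definitive implementation: the SearchTerm class, the titleMatches function, and the choice to let a query with only NOT terms match everything that is not excluded are all assumptions. The second and third sample titles are made up.

```php
<?php
// One search word plus its operator ('AND', 'OR' or 'NOT').
// Class name and shape are illustrative assumptions.
final class SearchTerm
{
    public $text;
    public $operator;

    public function __construct(string $text, string $operator = 'AND')
    {
        $this->text = $text;
        $this->operator = $operator;
    }

    public function isFoundIn(string $title): bool
    {
        return stripos($title, $this->text) !== false;
    }
}

// Decide whether one title satisfies the whole list of SearchTerm objects.
function titleMatches(string $title, array $terms): bool
{
    $hasAnd = false; // did the query contain any AND term?
    $allAnd = true;  // did every AND term match so far?
    $hasOr = false;  // did the query contain any OR term?
    $anyOr = false;  // did at least one OR term match?

    foreach ($terms as $term) {
        $found = $term->isFoundIn($title);
        if ($term->operator === 'NOT') {
            if ($found) {
                return false; // a matching NOT term vetoes the title outright
            }
        } elseif ($term->operator === 'OR') {
            $hasOr = true;
            $anyOr = $anyOr || $found;
        } else {
            $hasAnd = true;
            $allAnd = $allAnd && $found;
        }
    }

    if (!$hasAnd && !$hasOr) {
        return true; // only NOT terms were given and none of them matched
    }
    return ($hasAnd && $allAnd) || $anyOr;
}

// "technohead AND hippy NOT remix"
$terms = [
    new SearchTerm('technohead', 'AND'),
    new SearchTerm('hippy', 'AND'),
    new SearchTerm('remix', 'NOT'),
];

$titles = [
    '(09) – Technohead – I Wanna Be A Hippy.mp3',
    '(10) – Technohead – I Wanna Be A Hippy (Remix).mp3',
    '(01) – Technohead – Headsex.mp3',
];

$matches = array_values(array_filter($titles, function (string $title) use ($terms): bool {
    return titleMatches($title, $terms);
}));
```

Each title is checked in a single pass over the terms, so the file list itself is only walked once per query, however many operators the query contains.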
I really like your idea of the Java-tool, and heck: I even considered building one before posting this. Extensibility and the sheer fun of building it were its pros; the cons, it being a little overdone and needing integration with Apache, held me back from creating such a tool.
The reason for me not to use MySQL or the like is that I really like the simple idea of building my own search-tool which does its work on a plain text file, without the overhead of bigger platforms.
Blah, I guess it’ll be the DB then; incrementally updating the DB will be my first task 😉
I’m still open to any additions or nudges though 😀