search using like

When I search for something like "ebeam" versus "e-beam", LIKE doesn't return both. If I search for ebeam, I would also like to find items such as e-beam. This is even when using LIKE. Is there a better way to do this?
Solution
Hello Jason.

You would have to implement stemming on your system - in information retrieval, this is a process that reduces words to a common root. You can think of a stemmer as a map from the real-world vocabulary to a "root dictionary" (which would only include the "roots" or the concepts of each real-world term). A stemmer usually reduces plurals (cats and cat map to cat), verb tenses (am and was could map to be), and synonyms (advocate could map to lawyer).

For example, in your case, your stemmer could map e-beam and ebeam to the root beam. It could also map i-beam and ibeam to beam.

You can implement generic stemmers (there are some well-known algorithms for English text), or rule-based stemmers. But you will most likely have to tune your stemmer based on actual user queries and expected results.
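
To make the rule-based idea concrete, here is a minimal sketch in Python. It is a punctuation-stripping normalizer rather than a true stemmer: it does not reduce e-beam to beam, it only makes e-beam and ebeam come out the same. The RULES table and the normalize function are hypothetical names chosen for illustration, and the rules would need tuning against real queries.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement), applied in order.
RULES = [
    (re.compile(r"[^a-z0-9\s]"), ""),  # drop punctuation: "e-beam" -> "ebeam"
    (re.compile(r"\s+"), " "),         # collapse runs of whitespace
]

def normalize(text: str) -> str:
    """Reduce text to a canonical form by applying each rule in turn."""
    result = text.lower()
    for pattern, replacement in RULES:
        result = pattern.sub(replacement, result)
    return result.strip()
```

With these rules, e-beam and ebeam both become ebeam, and s.e.m. becomes sem. A query typed with a space (e beam) stays as two words, so it would need an extra rule if that case matters.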

Now what you would have to do is to store the stemmed terms in another database attribute (let's name it StemmedText). So if you have a record containing the text e-beam, in your StemmedText attribute it will store the text beam.
When the user searches with the text e-beam, you will have to stem the query text (which will give you beam), and then use a LIKE on the StemmedText attribute. Note, however, that if the user enters the text i-beam, he/she will find results containing e-beam (because both words stem to the same root).
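
The store-then-query flow described above could look roughly like the following sketch, which uses Python's built-in sqlite3 module. The Equipment table, its Name column, and the normalize and search functions are assumptions made for the example, and normalize is only a stand-in for a real stemmer (it lowercases and strips non-alphanumeric characters instead of finding a root).

```python
import re
import sqlite3

def normalize(text: str) -> str:
    # Stand-in for a real stemmer: lowercase, drop non-alphanumeric characters.
    return re.sub(r"[^a-z0-9]", "", text.lower())

conn = sqlite3.connect(":memory:")
# Hypothetical table: the original text plus its normalized form.
conn.execute("CREATE TABLE Equipment (Name TEXT, StemmedText TEXT)")
for name in ["e-beam evaporator", "Ebeam welder", "i-beam clamp"]:
    conn.execute("INSERT INTO Equipment VALUES (?, ?)", (name, normalize(name)))

def search(query: str) -> list[str]:
    # Apply the same normalization to the query, then LIKE on the stored form.
    pattern = f"%{normalize(query)}%"
    rows = conn.execute(
        "SELECT Name FROM Equipment WHERE StemmedText LIKE ? ORDER BY Name",
        (pattern,),
    )
    return [row[0] for row in rows]
```

Here search("ebeam") and search("e-beam") return the same two records. Because this stand-in keeps the leading letter, i-beam does not collide with e-beam the way it would with a stemmer that maps both to beam. A query with no letters or digits normalizes to an empty string and would match every row, so a real implementation should guard against that.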

Hope this helps.
Solution
I can see that that would be a large undertaking.  Is there any kind of add-on, forge item or commercial product that would do this automatically?
What about doing it yourself?

Just replace the non-alphanumeric characters with %?
It's not great, but you would probably get better results.

True, when you search for ebeam you still don't get e-beam.


J - what if I did a combination of the above? For all my equipment, whenever someone modifies a field I am using for the search, I save a collection of "search words", which is those fields with any non-alphanumeric characters removed.  Then on my search, when they enter their search term, I take out any non-alphanumeric characters and replace them with %.  So if on the piece of equipment they had entered e-beam, it would be saved in my "search words" as ebeam.  Then if someone searches for ebeam, obviously they will find it.  If they enter e-beam as a search term, before I do my actual search it would be replaced such that the actual search word is e%beam.  Wouldn't that then return what I want?
Probably yes, since % matches zero or more characters.

That's basically the approach I suggested. Except that you want to apply the exact same procedure on the users' query.

If e-beam is saved as ebeam on your "search words", then a user querying for e-beam would want a LIKE '%ebeam%' - not a LIKE 'e%beam'.

Querying for 'e%beam' would retrieve unwanted search results - such as "electricity-beam" or "eleven inches i-beam".
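
A small sketch using Python's sqlite3 module shows the difference between the two patterns. The SearchWords table and the like helper are hypothetical, and the stored values are the examples above with their non-alphanumeric characters already removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "search words" table holding already-normalized text.
conn.execute("CREATE TABLE SearchWords (Words TEXT)")
conn.executemany(
    "INSERT INTO SearchWords VALUES (?)",
    [("ebeam",), ("electricitybeam",), ("eleveninchesibeam",)],
)

def like(pattern: str) -> list[str]:
    # Return every stored value matching the given LIKE pattern.
    rows = conn.execute(
        "SELECT Words FROM SearchWords WHERE Words LIKE ? ORDER BY Words",
        (pattern,),
    )
    return [row[0] for row in rows]

print(like("e%beam"))   # the % in the middle lets anything sit between e and beam
print(like("%ebeam%"))  # only values that contain ebeam as one unbroken run
```

The first pattern matches all three rows, while the second matches only ebeam.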
Ah ok!  Thank you Leonardo and J!
Hi Jason. I don't know if you still need to address this issue, but I have made this forge component to provide an alternative search method. Make sure to check it out.

Leonardo Fernandes
Leonardo,

Yes, we are using your forge component and I was the person who posted on the forge site how great your documentation is.  The issue here wasn't so much the autocomplete widget itself (I really like yours btw, and moving forward we are implementing it across our site) but how the search itself handles special characters.  In other words, if I put in sem vs. s.e.m., will the search still match? Handling that in both directions (whichever is the search term and whichever is the existing value in the database) is getting complicated.
Yes, Jason, I understand.

The new version of the Search and Autocomplete includes a feature called Ranked Search. It uses a special index that matches keywords, instead of using LIKE conditions. This improves performance and delivers more meaningful search results. Check here or here for a demo. Check also its license terms.

This component does include a very generic stemmer that works as expected with hyphenated words. In your original example, if a document contains the text e-beam, it will match all the following queries: e-beam, ebeam, and e beam.
The same doesn't happen with abbreviations, because I didn't think about them. But now it is likely that I will include support for them.

Best regards,
Leonardo Fernandes
NICE!!!  I will be trying this hopefully this week.  I did not know about that feature.
Ah, just saw that the ranked search includes a license fee of $800.
Yes, I'm afraid it does. However, you can use it for free on development environments.