Autocomplete with massive dataset

Hi all,

Does anybody know how to go about using the autocomplete on a massive dataset? Let's say 100,000 records ...

I've seen this, but I don't know how to create that trie (no misspelling...)

Hi Joop,

I looked at that info about tries, and it seems a bit like over-engineering.

First, how many results do you want to display at a given time? A good autocomplete should only display 5-7 suggestions for a given input. (Yes, I've seen autocompletes with drop-downs of 5000+ results ... no one will ever find anything there.)

If the suggestions are good enough, how many characters would the user write?

In a given period of time, how many different inputs are sent to the server before users get their full result?

If the input -> results pairs are not a huge size, I would consider a very simple approach, without going into tries or complex algorithms.
And if database performance is an issue, use caching in the Actions to reduce the load.
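Purely as an illustration of that caching idea (in Python, not OutSystems, and every name below is a placeholder): keep recent input -> results pairs in memory so repeated lookups skip the database.

    # Hypothetical sketch: cache input -> results pairs so repeated
    # autocomplete requests don't hit the database again.
    from functools import lru_cache

    def query_database(prefix, limit):
        # Placeholder standing in for the real database query.
        data = ["Amsterdam", "Arnhem", "Almere", "Alkmaar", "Apeldoorn"]
        return [d for d in data if d.lower().startswith(prefix.lower())][:limit]

    @lru_cache(maxsize=1024)        # keep up to 1024 recent prefixes cached
    def suggest(prefix):
        return tuple(query_database(prefix, limit=7))

    print(suggest("al"))   # first call runs the query
    print(suggest("al"))   # second call is answered from the cache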

João Rosado
Hi Joop, 

I believe you already tried this one: Index(TextToSearchWhatYouWant,"["+Entity.Attribute+"]")>-1

You can use this where clause in a simple query to get better performance when searching text attributes. After all, 100k isn't 1M ;)

It might be difficult to implement a trie using OutSystems only, because it doesn't allow recursive data structures.

This is just a guess, but what about implementing a few rules to reduce those 100k records? (A small sketch of both rules follows the list.)

  • Limit query maximum rows.
  • Only query for the auto complete when the search string has more than N characters, where N is whatever you think is right.
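A minimal sketch of both rules in Python, just to make them concrete; MIN_CHARS, MAX_ROWS and run_query are made-up names, not OutSystems APIs:

    # Sketch of the two rules: a minimum input length and a capped row count.
    MIN_CHARS = 3    # "N characters": pick whatever you think is right
    MAX_ROWS = 10    # limit query maximum rows

    def autocomplete(search, run_query):
        # Rule 2: skip the query entirely for very short inputs.
        if len(search) < MIN_CHARS:
            return []
        # Rule 1: let the query stop after MAX_ROWS matches.
        return run_query(search, max_rows=MAX_ROWS)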
Hi Joop,

"Only query for the auto complete when the search string has more than N characters, where N is whatever you think is right."

I would go along with António on this one; it's the simplest to implement, and you avoid the really heavy first queries, where you search for anything matching just the first letter the user wrote.

Hermínio Mira

Merge both solutions, António's and mine: don't use LIKE in the query, as you probably know ;)

Usually tries are used when the dataset is loaded into memory of your application or some kind of data structure where you have the full dataset. This isn't usually the case with web applications that feed from the database. In this case, having a good structure for a fast lookup is the database's responsibility, not the programmers' business logic.
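Just to make the idea concrete, here is a minimal, purely illustrative trie in Python (nothing you can drop into OutSystems, and not code from anyone in this thread): a prefix tree where each node maps a character to a child, and suggestions come from walking the typed prefix.

    # Minimal in-memory trie (prefix tree); purely illustrative.
    class TrieNode:
        def __init__(self):
            self.children = {}    # character -> TrieNode
            self.is_word = False

    class Trie:
        def __init__(self):
            self.root = TrieNode()

        def insert(self, word):
            node = self.root
            for ch in word.lower():
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

        def suggest(self, prefix, limit=7):
            node = self.root
            for ch in prefix.lower():         # walk down the typed prefix
                if ch not in node.children:
                    return []
                node = node.children[ch]
            results = []
            self._collect(node, prefix.lower(), results, limit)
            return results

        def _collect(self, node, prefix, results, limit):
            if len(results) >= limit:         # stop once we have enough suggestions
                return
            if node.is_word:
                results.append(prefix)
            for ch, child in sorted(node.children.items()):
                self._collect(child, prefix + ch, results, limit)

    t = Trie()
    for street in ["Keizersgracht", "Kerkstraat", "Kalverstraat"]:
        t.insert(street)
    print(t.suggest("ke"))    # ['keizersgracht', 'kerkstraat']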

Implementing an autocomplete over a massive dataset stored in a database may actually be as simple as performing the infamous LIKE query with a limited number of results and no "order by" clause. If you limit your query to 11 results and only pass 10 to the autocomplete, you may use the existence of the 11th record to signal "there are more results" somewhere, tipping the user to refine their search query.
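A tiny sketch of that trick, in Python, with run_query standing in for whatever actually fetches the rows:

    # "Limit + 1" trick: fetch one extra row and use it only as a
    # "there are more results" flag, never showing it to the user.
    LIMIT = 10

    def autocomplete(search, run_query):
        rows = run_query(search, max_rows=LIMIT + 1)   # ask for 11
        has_more = len(rows) > LIMIT                   # the 11th row is the signal
        return rows[:LIMIT], has_more                  # display only 10

    # suggestions, has_more = autocomplete("kerk", my_query)
    # If has_more is True, hint the user to refine the search instead of paging.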

I'd really advise you to try this simpler solution before digging into more complex data structures and algorithms that may not actually be required to solve your particular problem.

@Ricardo: please explain where that should be used?
@Hermínio: did that already, 3 chars before searching
@João: the pity is that it needs to search for addresses in a major city of the Netherlands :-), and it uses a LIKE %xxx%


You don't need to use the "LIKE %InputParameter%".

Replace the "Entity.Address LIKE %InputParameter%" condition with what Ricardo suggested: Index(InputParameter,"["+Entity.Address+"]")>-1

Best regards.
Hey Guys,

Tried this Index stuff, but it doesn't work.
Can you please provide a sample OML to see this working?

Here's a screenshot, Joop. Hope it helps.
Duh ... the parameters are the other way around :-)

Index("["+Entity.Attribute+"]", TextToSearchWhatYouWant) > -1
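To make the argument order concrete, here is a rough Python analogue of that expression (str.find plays the role of Index and returns -1 when the text isn't found; this is just an illustration, not OutSystems code):

    # Python analogue of Index("["+Entity.Attribute+"]", TextToSearchWhatYouWant) > -1:
    # the first argument is the text searched IN, the second is the text searched FOR.
    def matches(attribute_value, text_to_search):
        return ("[" + attribute_value + "]").find(text_to_search) > -1

    print(matches("Kerkstraat", "erk"))   # True  -> the row would match
    print(matches("Kerkstraat", "xyz"))   # False -> the row would be filtered out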
Hi Joop,
Check this image:

These two conditions return the same value, but Index is actually a little bit faster than LIKE.

Carlos Rocha