[FuzzyStringMatching - Web] Inconsistency between formula and code

Forge Component
Published on 2019-01-04 by Mukul Varshney

Hello Mukul,

The component is amazing and really helpful!

But something is bothering me about the function from the extension. If you compare "john" with "john" using Threshold = 0.7 and MinMatchingChars = 3 (the default values), Proximity returns 0.73. Whatever value you put in Threshold (from 0.1 to 0.9), Proximity is still 0.73. But when Threshold is 1, Proximity also returns 1. This means that something changes the proximity during the computation when the jaroDistance is not <= threshold.

I think the final assignment in the extension function has a mistake in the formula:

On the Wikipedia page you mentioned as a reference, the formula is:

sim_w = sim_j + l * p * (1 - sim_j)

which in code translates to:

jaroDistance + SCALING_FACTOR * pos * (1 - jaroDistance)

This means that, on top of the proximity returned by the Jaro distance, we add some extra points for a common prefix, scaled by the remaining gap to 1 (a perfect match). So when jaroDistance is already 1, the result must stay 1.
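
To make the expected behaviour concrete, here is a minimal Python sketch of the formula as I read it on the Wikipedia page. It is not the extension's .NET code: the function names, the `scaling_factor` and `max_prefix` parameters, and their defaults (0.1 and 4, the values usually quoted for Jaro-Winkler) are my own assumptions, and the Threshold and MinMatchingChars inputs of the component are deliberately left out.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 scaling_factor: float = 0.1,  # assumed default
                 max_prefix: int = 4) -> float:  # assumed default
    """Jaro similarity plus a bonus for a shared prefix."""
    jaro_sim = jaro(s1, s2)

    # Length of the common prefix, capped at max_prefix.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1

    # Same shape as: jaroDistance + SCALING_FACTOR * pos * (1 - jaroDistance)
    return jaro_sim + scaling_factor * prefix * (1 - jaro_sim)
```

With this sketch, `jaro_winkler("john", "john")` returns 1.0, because the bonus term is multiplied by `(1 - jaro_sim)`, which is 0 for identical strings. That is the result I would expect from the component for any Threshold.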


Was this intentional? Maybe I understood the formula differently.


With kind regards,

Ana Bocaniciu

Thanks Ana.

I wrote this .NET code a long time ago. To look into it, my company's IT team needs to install Visual Studio so that I can open the extension. Once VS is installed, I will look into it.