[FuzzyStringMatching - Web] Inconsistency between formula and code
Question
Forge component by Mukul Varshney

Hello Mukul,

The component is amazing and really helpful!

But something is bothering me about the function from the extension. If you compare "john" with "john" using Threshold = 0.7 and MinMatchingChars = 3 (the default values), Proximity returns 0.73. Whatever value you put in Threshold (from 0.1 to 0.9), Proximity is still 0.73. But when Threshold is 1, Proximity also returns 1. This suggests that something changes the proximity during the computation when jaroDistance is not <= Threshold.

I think the formula in the final assignment of the extension function has a mistake:

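In the Wikipedia page you mentioned as a reference, the formula is:

sim_w = sim_j + l * p * (1 - sim_j)

(where sim_j is the Jaro similarity, l is the length of the common prefix, up to 4 characters, and p is the scaling factor)

which in the code translates to: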

jaroDistance + SCALING_FACTOR * pos *(1 - jaroDistance)

which means that, for having a common prefix, we add some extra points to the proximity returned by the Jaro distance, moving it toward 1 (perfect match).
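To make the expected behaviour concrete, here is a minimal Python sketch of the Jaro-Winkler formula as described on Wikipedia. It is not the extension's C# code: the function names, the prefix cap of 4, and the 0.1 scaling factor are assumptions taken from the standard definition, and the Threshold and MinMatchingChars parameters of the extension are deliberately left out.

```python
SCALING_FACTOR = 0.1  # assumed standard Winkler scaling factor p
MAX_PREFIX = 4        # assumed standard cap on the common prefix length


def jaro_similarity(s1: str, s2: str) -> float:
    """Plain Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order are
    # counted in pairs: two out-of-order characters = one transposition.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = SCALING_FACTOR) -> float:
    """Jaro similarity plus a bonus for a shared prefix."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == MAX_PREFIX:
            break
        prefix += 1
    # Same shape as: jaroDistance + SCALING_FACTOR * pos * (1 - jaroDistance)
    return jaro + scaling * prefix * (1 - jaro)
```

With this sketch, comparing "john" with "john" gives 1.0, because the (1 - jaro) term is zero for identical strings and the prefix bonus can never push the result above 1.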


Was it intentional? Maybe I understood the formula differently.


With kind regards,

Ana Bocaniciu

Thanks Ana.

I wrote this .NET code a long time ago. To look into it, my company's IT team needs to install Visual Studio so that I can open the extension. Once VS is installed, I will look into it.


Hi Ana,

Today, I was able to look into the C# code. It took a long time to get VS installed.


I have fixed the problem and will publish the updated solution.

Thanks for bringing this to my notice.

Best regards,

Mukul.
