[FuzzyStringMatching - Web] Inconsistency between formula and code

Ana Bocaniciu

Hello Mukul,

The component is amazing and really helpful!

But one thing bothers me about the function in the extension. If you compare "john" with "john" using Threshold = 0.7 and MinMatchingChars = 3 (the default values), Proximity returns 0.73. Whatever value you put in Threshold (from 0.1 to 0.9), Proximity is still 0.73. But when Threshold is 1, Proximity returns 1. This suggests that the computation performed when jaroDistance is greater than Threshold is what alters the proximity.

I think the formula in the final assignment of the extension function has a mistake.

On the Wikipedia page you mentioned as a reference, the formula is:

$sim_w = sim_j + \ell \, p \, (1 - sim_j)$

which translates in code to:

jaroDistance + SCALING_FACTOR * pos * (1 - jaroDistance)

This means that we add to the proximity returned by the Jaro distance a bonus for the common prefix, which moves the result toward 1 (a perfect match).

Was it intentional? Maybe I understood the formula differently.
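
To make the expected behavior concrete, here is a minimal Python sketch of the Jaro-Winkler formula as given on Wikipedia. It is not the extension's C# code. The names `jaro_distance`, `SCALING_FACTOR`, and `pos` mirror the expression quoted above; `MAX_PREFIX = 4`, `SCALING_FACTOR = 0.1`, and the rule that the prefix bonus is skipped when the Jaro value is at or below `threshold` are assumptions based on the standard definition and on the behavior described in this post. MinMatchingChars is left out because its role in the extension is not clear from this thread.

```python
SCALING_FACTOR = 0.1  # assumed: Winkler's standard scaling factor p
MAX_PREFIX = 4        # assumed: standard cap on the common-prefix length


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters count as a match only within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, threshold: float = 0.7) -> float:
    """Jaro value plus a common-prefix bonus, applied only above threshold."""
    jaro_distance = jaro(s1, s2)
    if jaro_distance <= threshold:  # assumed gate, see the note above
        return jaro_distance
    # pos is the length of the common prefix, capped at MAX_PREFIX.
    pos = 0
    for a, b in zip(s1[:MAX_PREFIX], s2[:MAX_PREFIX]):
        if a != b:
            break
        pos += 1
    return jaro_distance + SCALING_FACTOR * pos * (1 - jaro_distance)
```

With this formula, `jaro_winkler("john", "john", threshold)` is 1.0 for every threshold: the Jaro value is already 1, so the bonus term `(1 - jaro_distance)` is zero and the prefix cannot lower the result.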

With kind regards,

Ana Bocaniciu

05 Feb 2020

Mukul Varshney

Thanks Ana.

I wrote this .NET code a long time ago. To look into it, my company's IT team needs to install Visual Studio so that I can open the extension. Once VS is installed, I will look into it.

08 Feb 2020

Mukul Varshney

Hi Ana,

Today, I was able to look into the C# code. It took a long time to get VS installed.

I have fixed the problem and will publish the updated solution.

Thanks for bringing this to my notice.

Best regards,

Mukul.

14 Apr 2020
