[Solved] Edit distance: Ignore start/end [closed]

Question

The code to do this is simple in concept. It’s your idea of what you’d like to ignore that you can add on your own:

#!perl
use v5.22;
use feature qw(signatures);
no warnings qw(experimental::signatures);

use Text::Levenshtein qw(distance);

say edit( "four", "foor" );
say edit( "four", "noise fo or blur" );

sub edit ( $start, $target ) {
    # transform strings to ignore what you want
    # ...
    distance( $start, $target )
    }

Maybe you want to check all substrings of the same length:

use v5.22;
use feature qw(signatures);
no warnings qw(experimental::signatures);

use Text::Levenshtein qw(distance);

say edit( "four", "foar" );
say edit( "four", "noise fo or blur" );

sub edit ( $start, $target ) {
    my $start_length = length $start;
    $target =~ s/\s+//g;
    my @all_n_chars = map {
        substr $target, $_, 4
        } 0 .. ( length($target) - $start_length );

    my $closest;
    my $closest_distance = $start_length + 1;
    foreach ( @all_n_chars ) {
        my $distance = distance( $start, $_ );
        if( $distance < $closest_distance ) {
            $closest = $_;
            $closest_distance = $distance;
            say "closest: $closest Distance: $distance";
            last if $distance == 0;
            }
        }

    return $closest_distance;
    }

This very simpleminded implementation finds what you want. However, realize that other random strings might accidentally have an edit distance that is lower.

closest: foar Distance: 1
1
closest: nois Distance: 3
closest: foor Distance: 1
1

You could extend this to remember the true starting positions of each string so you can find it again in the original, but this should be enough to send you on your way. If you wanted to use Python, I think the program might look very similar.

Accepted Answer

The code to do this is simple in concept. It’s your idea of what you’d like to ignore that you can add on your own:

#!perl
use v5.22;
use feature qw(signatures);
no warnings qw(experimental::signatures);

use Text::Levenshtein qw(distance);

say edit( "four", "foor" );
say edit( "four", "noise fo or blur" );

sub edit ( $start, $target ) {
    # transform strings to ignore what you want
    # ...
    distance( $start, $target )
    }

Maybe you want to check all substrings of the same length:

use v5.22;
use feature qw(signatures);
no warnings qw(experimental::signatures);

use Text::Levenshtein qw(distance);

say edit( "four", "foar" );
say edit( "four", "noise fo or blur" );

sub edit ( $start, $target ) {
    my $start_length = length $start;
    $target =~ s/\s+//g;
    my @all_n_chars = map {
        substr $target, $_, 4
        } 0 .. ( length($target) - $start_length );

    my $closest;
    my $closest_distance = $start_length + 1;
    foreach ( @all_n_chars ) {
        my $distance = distance( $start, $_ );
        if( $distance < $closest_distance ) {
            $closest = $_;
            $closest_distance = $distance;
            say "closest: $closest Distance: $distance";
            last if $distance == 0;
            }
        }

    return $closest_distance;
    }

This very simpleminded implementation finds what you want. However, realize that other random strings might accidentally have an edit distance that is lower.

closest: foar Distance: 1
1
closest: nois Distance: 3
closest: foor Distance: 1
1

You could extend this to remember the true starting positions of each string so you can find it again in the original, but this should be enough to send you on your way. If you wanted to use Python, I think the program might look very similar.