[Solved] Extract string-token objects from string?


On request, here is how I would do this myself:

Screenshot of application

First, I want to create a function that performs this operation, so it can be reused every time we need to do this.

I could have this function return or populate a TList<TPhraseMatch>, but that would be tiresome to work with, because every caller would then need a try..finally block to free the list. Instead, I let it return a TArray<TPhraseMatch>. By definition, this is simply an array of TPhraseMatch, that is, a dynamic array of TPhraseMatch records, whose memory is managed automatically.

But how do you populate such an array efficiently? You shouldn’t increase the length of a dynamic array one element at a time, since each resize may reallocate and copy the whole array; besides, that requires a lot of code. Instead, I populate a local TList<TPhraseMatch> and then, as a last step, create an array from it:

// Needs System.SysUtils and System.Generics.Collections in the uses clause.
type
  TPhraseMatch = record
    Position: Integer;
    Text: string;
  end;

function GetPhraseMatches(const AText, APhrase: string): TArray<TPhraseMatch>;
begin

  var TextLower := AText.ToLower;
  var PhraseLower := APhrase.ToLower;

  var List := TList<TPhraseMatch>.Create;
  try

    var p := 0;
    repeat
      p := Pos(PhraseLower, TextLower, p + 1);
      if p <> 0 then
      begin
        var Match: TPhraseMatch;
        Match.Position := p - 1 {since the OP wants 0-based string indexing};
        Match.Text := Copy(AText, p, APhrase.Length);
        List.Add(Match);
      end;
    until p = 0;

    Result := List.ToArray;

  finally
    List.Free;
  end;

end;
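To see the design choice in isolation, here is a minimal console sketch that contrasts the two calling styles. FillSquares and GetSquares are hypothetical helpers invented for this illustration (they are not part of the answer's code), and the inline variable declarations assume Delphi 10.3 or later.

```pascal
program ArrayVsListDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

// Hypothetical variant A: the caller owns the list and must free it.
procedure FillSquares(AList: TList<Integer>; ACount: Integer);
begin
  for var i := 1 to ACount do
    AList.Add(i * i);
end;

// Hypothetical variant B: build in a local list, hand back a dynamic array.
function GetSquares(ACount: Integer): TArray<Integer>;
begin
  var List := TList<Integer>.Create;
  try
    FillSquares(List, ACount);
    Result := List.ToArray; // copies the items into a managed dynamic array
  finally
    List.Free; // the list is an object, so it still needs explicit cleanup here
  end;
end;

begin
  // Variant A: every call site needs its own try..finally.
  var List := TList<Integer>.Create;
  try
    FillSquares(List, 3);
    for var Sq in List do
      Writeln(Sq);
  finally
    List.Free;
  end;

  // Variant B: the array is released automatically, so no cleanup code.
  for var Sq in GetSquares(3) do
    Writeln(Sq);
end.
```

Both loops print 1, 4 and 9; the difference is only in how much housekeeping the caller has to write.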

Notice that I chose an alternative to the regular expression approach, just for educational reasons. I also believe this approach is faster. Also notice how easy it is to work with the TList<TPhraseMatch>: it’s just like a TStringList but with match records instead of strings!
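For comparison, a regular expression version could look roughly like the sketch below. This is my own assumption of how one might write it with System.RegularExpressions, not necessarily the regex approach referred to above; GetPhraseMatchesRegEx is a hypothetical name, and the sample text is made up. The record type is repeated so the program compiles on its own.

```pascal
program RegExMatchesDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

type
  TPhraseMatch = record
    Position: Integer;
    Text: string;
  end;

// Hypothetical regex-based counterpart of GetPhraseMatches.
function GetPhraseMatchesRegEx(const AText, APhrase: string): TArray<TPhraseMatch>;
begin
  Result := nil;
  if APhrase = '' then
    Exit; // an empty pattern would match at every position

  // Escape the phrase so that characters such as '.' or '*' match literally.
  var Matches := TRegEx.Matches(AText, TRegEx.Escape(APhrase), [roIgnoreCase]);

  // The number of matches is known up front, so the array is sized once.
  SetLength(Result, Matches.Count);
  for var i := 0 to Matches.Count - 1 do
  begin
    Result[i].Position := Matches[i].Index - 1; // TMatch.Index is 1-based
    Result[i].Text := Matches[i].Value;
  end;
end;

begin
  for var Match in GetPhraseMatchesRegEx('The cat sat on the mat', 'the') do
    Writeln(Match.Position, ': ', Match.Text);
  // Prints "0: The" and "15: the"
end.
```

One behavioral difference worth keeping in mind: the regex engine returns non-overlapping matches, whereas the Pos loop advances one character at a time and therefore also reports overlapping ones. Searching for 'aa' in 'aaa' gives positions 0 and 1 with the Pos loop, but only position 0 with this sketch.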

Now, let’s use this function:

procedure TWordFinderForm.ePhraseChange(Sender: TObject);
begin

  lbMatches.Items.BeginUpdate;
  try
    lbMatches.Items.Clear;
    for var Match in GetPhraseMatches(mText.Text, ePhrase.Text) do
      lbMatches.Items.Add(Match.Position.ToString + ': ' + Match.Text);
  finally
    lbMatches.Items.EndUpdate;
  end;

end;

Had I not chosen to use a function, but placed all the code in one block, I could have iterated over the TList<TPhraseMatch> in exactly the same way:

for var Match in List do
