[Solved] How to cut up a string from an input file with Regex or String Split, C#?


Description

This regex will:

  • capture the field values for Subject, From, To, and Read
  • allow the fields to be listed in the source string in any order
  • capture the date on the second line, which starts with a single #

^\#(?!\#)([^\r\n]*)(?=.*?^Subject=([^\r\n]*))(?=.*?^From=([^\r\n]*))(?=.*?^To=([^\r\n]*))(?=.*?^Read=([^\r\n]*))


Expanded

  • ^\#(?!\#) match the first line which only has a single #
  • ([^\r\n]*) capture all characters on the line which are not new line characters
  • (?= start the lookahead for the subject. This format is the same for each field being looked for. Because each field is matched inside a lookahead, the fields can appear in any order
    • .*?^Subject= travel through the string until the first occurrence of Subject at the start of a line
    • ([^\r\n]*) capture all non new line characters
    • ) close the lookahead
  • (?=.*?^From=([^\r\n]*)) repeat the process for from
  • (?=.*?^To=([^\r\n]*)) repeat for to
  • (?=.*?^Read=([^\r\n]*)) repeat for read
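
To make the "any order" point concrete, here is a minimal sketch that runs a shortened two-field version of the pattern against two inputs whose field lines are swapped. The class name OrderCheck, the helper Describe, and the tiny test strings are made up for illustration; only the pattern structure comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class OrderCheck
{
    // Same structure as the full pattern, reduced to two fields to keep the sketch short.
    static readonly Regex Re = new Regex(
        @"^\#(?!\#)([^\r\n]*)(?=.*?^Subject=([^\r\n]*))(?=.*?^From=([^\r\n]*))",
        RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline);

    // Returns "subject|from", or "(no match)" if the line or a field is missing.
    public static string Describe(string text)
    {
        Match m = Re.Match(text);
        return m.Success ? m.Groups[2].Value + "|" + m.Groups[3].Value : "(no match)";
    }

    static void Main()
    {
        string a = "#stamp\nSubject=Hi\nFrom=bob\n";
        string b = "#stamp\nFrom=bob\nSubject=Hi\n"; // same fields, swapped order
        Console.WriteLine(Describe(a)); // Hi|bob
        Console.WriteLine(Describe(b)); // Hi|bob
    }
}
```

Each lookahead starts scanning from the same position (the end of the # line), which is why neither field has to come first.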

The order in which the lookaheads occur in the regular expression dictates the order in which the values are captured. You can add more fields or rearrange them as desired.
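
As a sketch of adding or rearranging fields, the helper below builds the pattern from a list of field names and gives each lookahead a named group, so the values can be read by name instead of by position. HeaderPattern, Build, and the group name Stamp are hypothetical names, not part of the original answer, and the sketch assumes every field name is a valid regex group name (letters, digits, underscore).

```csharp
using System;
using System.Text.RegularExpressions;

static class HeaderPattern
{
    // Builds one lookahead per field, in the order given.
    // Assumption: each field name is also usable as a group name.
    public static Regex Build(params string[] fields)
    {
        string pattern = @"^\#(?!\#)(?<Stamp>[^\r\n]*)";
        foreach (string field in fields)
        {
            pattern += @"(?=.*?^" + Regex.Escape(field) + "=(?<" + field + @">[^\r\n]*))";
        }
        return new Regex(pattern,
            RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline);
    }
}

class NamedGroupDemo
{
    static void Main()
    {
        string text = "##File List\n#Tue Dec 13 14:27:43 CST 2011\nSubject=Research paper.\n"
                    + "From=zmeinecke\nRead=true\nDate=1323748746221\nTo=ksanger;\n";

        // Date is the added field; the rest mirror the original pattern.
        Regex re = HeaderPattern.Build("Subject", "From", "To", "Read", "Date");
        Match m = re.Match(text);
        if (m.Success)
        {
            Console.WriteLine(m.Groups["Stamp"].Value); // Tue Dec 13 14:27:43 CST 2011
            Console.WriteLine(m.Groups["Date"].Value);  // 1323748746221
        }
    }
}
```

With named groups, rearranging the arguments to Build changes the group numbers but not the code that reads the values.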

Example

Live example: http://www.rubular.com/r/8y1xQZYj05

Sample Text

##File List
#Tue Dec 13 14:27:43 CST 2011
Subject=Research paper.
From=zmeinecke
Cc=
AttachmentName=ADHD Medication Research Paper.docx
Read=true
parentId=
Bcc=
Date=1323748746221
format=blackboard.base.FormattedText$Type\:HTML
AttachmentId=b2cb1016f0b847a3bfae636988aa3f6a
To=ksanger;

Code

using System;
using System.Text.RegularExpressions;

namespace myapp
{
    class Class1
    {
        static void Main(string[] args)
        {
            // Placeholder: replace with the file contents, e.g. the sample text above.
            String sourcestring = "source string to match with pattern";

            // Multiline makes ^ match at the start of every line; Singleline lets . cross
            // line breaks, so each lookahead can scan ahead to its field.
            Regex re = new Regex(@"^\#(?!\#)([^\r\n]*)(?=.*?^Subject=([^\r\n]*))(?=.*?^From=([^\r\n]*))(?=.*?^To=([^\r\n]*))(?=.*?^Read=([^\r\n]*))", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline | RegexOptions.Singleline);
            Match m = re.Match(sourcestring);

            // Group 0 is the whole match; groups 1 to 5 hold the captured values.
            for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
            {
                Console.WriteLine("[{0}] = {1}", re.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
            }
        }
    }
}

Capture Groups

  1. gets the date
  2. gets subject
  3. gets from
  4. gets to
  5. gets read

[1] => Tue Dec 13 14:27:43 CST 2011
[2] => Research paper.
[3] => zmeinecke
[4] => ksanger;
[5] => true
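
Since the question also asks about String.Split, here is a rough alternative sketch that skips the regex and splits the text into lines, then splits each line on its first = only. PropertyLines, Parse, and the dictionary key "Timestamp" are invented for this example; it assumes one key=value pair per line, as in the sample text.

```csharp
using System;
using System.Collections.Generic;

static class PropertyLines
{
    // Parses "key=value" lines into a case-insensitive dictionary.
    public static Dictionary<string, string> Parse(string text)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            if (line.StartsWith("##", StringComparison.Ordinal))
            {
                continue; // header line such as "##File List"
            }
            if (line.StartsWith("#", StringComparison.Ordinal))
            {
                fields["Timestamp"] = line.Substring(1); // hypothetical key for the date line
                continue;
            }
            // Limit of 2 keeps any later '=' characters inside the value.
            string[] parts = line.Split(new[] { '=' }, 2);
            if (parts.Length == 2)
            {
                fields[parts[0]] = parts[1];
            }
        }
        return fields;
    }
}

class SplitDemo
{
    static void Main()
    {
        string text = "##File List\n#Tue Dec 13 14:27:43 CST 2011\nSubject=Research paper.\n"
                    + "From=zmeinecke\nCc=\nRead=true\nTo=ksanger;\n";
        Dictionary<string, string> fields = PropertyLines.Parse(text);
        Console.WriteLine(fields["Timestamp"]); // Tue Dec 13 14:27:43 CST 2011
        Console.WriteLine(fields["Subject"]);   // Research paper.
        Console.WriteLine(fields["To"]);        // ksanger;
    }
}
```

Unlike the regex, this collects every field in the file, including empty ones such as Cc, so the caller picks out the ones it needs.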
