[Solved] Preg_match on multiline text


The idea is to sidestep the newline problem by not using the dot, which by default does not match a newline ($subject is your string):

$pattern = '~
    Project\hNo\.\h\d++\hDATE\h
    (?<date>\d{1,2}\/\d{1,2}\/\d{1,2})
    \s++No\.\hQUESTION\hANSWER\s++
    (?<No>\d++)\s++

    # all characters but D or D not followed by "ate Required"
    (?<desc>(?>[^D]++|D(?!ate\hRequired))+)

    \D++
    (?<date_required>\d{1,2}\/\d{1,2}\/\d{1,2})
~x';

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

print_r($matches);
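To see the shape of the result, here is a minimal sketch that runs the same pattern on a made-up input. The sample text is an assumption (your real string is not reproduced here); only its layout follows what the pattern expects.

```php
<?php
// Hypothetical input, invented for illustration; only the layout matters.
$subject = "Project No. 42 DATE 1/2/13\n"
         . "No. QUESTION ANSWER\n"
         . "7 Does the design handle load?\n"
         . "Date Required 3/4/13\n";

$pattern = '~
    Project\hNo\.\h\d++\hDATE\h
    (?<date>\d{1,2}\/\d{1,2}\/\d{1,2})
    \s++No\.\hQUESTION\hANSWER\s++
    (?<No>\d++)\s++
    (?<desc>(?>[^D]++|D(?!ate\hRequired))+)
    \D++
    (?<date_required>\d{1,2}\/\d{1,2}\/\d{1,2})
~x';

$count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER, each element of $matches describes one project.
foreach ($matches as $m) {
    // desc keeps its trailing newline, hence the trim().
    printf("%s | %s | %s | %s\n",
        $m['date'], $m['No'], trim($m['desc']), $m['date_required']);
}
```

Each named group is available by name in every match set, so you do not need to count parentheses to find a field.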

Note that I use possessive quantifiers and atomic groups to avoid backtracking as much as possible.
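If these two constructs are new to you, the following toy comparison may help. It is only a sketch on a three-digit string, not part of the pattern above: it shows that a possessive quantifier or an atomic group never gives back what it has matched.

```php
<?php
$subject = '123';

// Greedy: \d+ first takes "123", then gives back "3" so the final \d can match.
$greedy = preg_match('~\d+\d~', $subject);

// Possessive: \d++ keeps everything it took, so the final \d has nothing left.
$possessive = preg_match('~\d++\d~', $subject);

// Atomic group: same effect as the possessive quantifier.
$atomic = preg_match('~(?>\d+)\d~', $subject);

var_dump($greedy, $possessive, $atomic); // int(1), int(0), int(0)
```

In the pattern above this behaviour is wanted: once a run of characters has been consumed, the engine fails fast instead of retrying shorter runs.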

EDIT:

For your new example string, here is a different kind of pattern (in a lex-like style) that is more readable and easier to edit:

$pattern = <<<'LOD'
~
 # Raw types
 (?(DEFINE)(?<uint>  \d++                      ))
 (?(DEFINE)(?<date>  \d{1,2}\/\d{1,2}\/\d{1,2} ))

 # Custom types
 (?(DEFINE)(?<void>  (?>\s++|<br\b[^>]*+>)*           ))
 (?(DEFINE)(?<desc>  (?>[^D]++|D(?!ate\h++Required))+ ))

 # Anchors
 (?(DEFINE)(?<A_prj_date>      PROJECT(?>[^D]++|D(?!ATE\b))+DATE\h*+    ))
 (?(DEFINE)(?<A_prj_number>    \g<void>No\.\h++QUESTION\h++ANSWER\b\D++ ))
 (?(DEFINE)(?<A_prj_desc>      \g<void>                                 ))
 (?(DEFINE)(?<A_prj_date_req>  Date\h++Required\D++                     ))

 # Pattern
 \g<A_prj_date>     (?<prj_date>      \g<date> )
 \g<A_prj_number>   (?<prj_number>    \g<uint> )
 \g<A_prj_desc>     (?<prj_desc>      \g<desc> )
 \g<A_prj_date_req> (?<prj_date_req>  \g<date> )    

~xi
LOD;

It begins with the definition of each component you need.

  1. Raw types: generic subpatterns that you can reuse anywhere
  2. Custom types: subpatterns specific to your project
  3. Anchors: subpatterns that describe the transition between required fields

After that comes the pattern itself, composed of these elements.

The result is highly editable: you can adapt any subpattern to your needs, add new subpatterns, and compose new subpatterns from existing ones.
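As a small illustration of composing subpatterns, here is a sketch in which a `date` type is built from a `uint` type. The names `A_due` and `due` and the input string are invented for this example.

```php
<?php
// Nowdoc (<<<'LOD') keeps every backslash literal. Names and input are invented.
$pattern = <<<'LOD'
~
 # Raw type
 (?(DEFINE)(?<uint>  \d++ ))

 # Custom type composed from the raw type
 (?(DEFINE)(?<date>  \g<uint> / \g<uint> / \g<uint> ))

 # Anchor: the text that leads to the field
 (?(DEFINE)(?<A_due> Due \h*+ : \h*+ ))

 # Pattern
 \g<A_due> (?<due> \g<date> )
~x
LOD;

preg_match($pattern, 'Due: 3/4/13', $m);
echo $m['due'], "\n"; // 3/4/13
```

Changing what counts as a `uint` (for example, limiting the number of digits) then changes every subpattern built on top of it.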

For example, you can try replacing the A_prj_number subpattern with \D++, which seems good enough for your example string:

(?(DEFINE)(?<A_prj_number>\D++))
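Here is a sketch of the whole pattern with that substitution applied, run on an invented input. The sample string and its `<br />` tags are assumptions about what your data looks like, not your actual text.

```php
<?php
// Hypothetical input with <br /> tags, invented for illustration.
$subject = "PROJECT No. 42 DATE 1/2/13<br />\n"
         . "No. QUESTION ANSWER<br />\n"
         . "7 Does the design handle load?<br />\n"
         . "Date Required: 3/4/13";

$pattern = <<<'LOD'
~
 # Raw types
 (?(DEFINE)(?<uint>  \d++                    ))
 (?(DEFINE)(?<date>  \d{1,2}/\d{1,2}/\d{1,2} ))

 # Custom types
 (?(DEFINE)(?<void>  (?>\s++|<br\b[^>]*+>)*           ))
 (?(DEFINE)(?<desc>  (?>[^D]++|D(?!ate\h++Required))+ ))

 # Anchors, with A_prj_number simplified
 (?(DEFINE)(?<A_prj_date>      PROJECT(?>[^D]++|D(?!ATE\b))+DATE\h*+ ))
 (?(DEFINE)(?<A_prj_number>    \D++                                  ))
 (?(DEFINE)(?<A_prj_desc>      \g<void>                              ))
 (?(DEFINE)(?<A_prj_date_req>  Date\h++Required\D++                  ))

 # Pattern
 \g<A_prj_date>     (?<prj_date>      \g<date> )
 \g<A_prj_number>   (?<prj_number>    \g<uint> )
 \g<A_prj_desc>     (?<prj_desc>      \g<desc> )
 \g<A_prj_date_req> (?<prj_date_req>  \g<date> )
~xi
LOD;

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // prj_desc still contains the trailing <br /> of its line.
    printf("%s | %s | %s | %s\n",
        $m['prj_date'], $m['prj_number'], trim($m['prj_desc']), $m['prj_date_req']);
}
```

Note that the DEFINE groups also show up as (empty) keys in each match set; only the `prj_*` keys carry data.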

Another advantage of this syntax is that you can easily debug your pattern by commenting out the elements of the pattern section one by one, from last to first, until you obtain a match:

# Pattern
 \g<A_prj_date>     (?<prj_date>      \g<date> )
 \g<A_prj_number>   (?<prj_number>    \g<uint> )
 # \g<A_prj_desc>     (?<prj_desc>      \g<desc> )
 # \g<A_prj_date_req> (?<prj_date_req>  \g<date> ) 

Note: if each string contains only one project, use preg_match instead of preg_match_all.
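The difference between the two functions is mostly the shape of the result. A minimal sketch, using a deliberately tiny pattern and an invented sample string:

```php
<?php
$pattern = '~(?<date>\d{1,2}/\d{1,2}/\d{1,2})~';
$subject = 'DATE 1/2/13 ... Date Required 3/4/13'; // invented sample

// preg_match stops at the first match and fills a flat array.
preg_match($pattern, $subject, $first);
echo $first['date'], "\n"; // 1/2/13

// preg_match_all collects every match; PREG_SET_ORDER gives one sub-array per match.
preg_match_all($pattern, $subject, $all, PREG_SET_ORDER);
echo count($all), ' matches, last: ', $all[1]['date'], "\n"; // 2 matches, last: 3/4/13
```

With preg_match you read the fields directly from the result array, without the extra `[0]` index.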
