[Solved] Href URL matching, [duplicate]


First use the method described here to retrieve all hrefs; then you can use a regex or strpos to filter out those that don't start with /download/.
The reason to use a parser instead of a regex is discussed in many other posts on Stack Overflow (see this). Once you have parsed the document and collected the hrefs, you can filter them with simple string functions.

A little code:

$dom = new DOMDocument;
// $html is assumed to hold the HTML string you want to parse
$dom->loadHTML($html);
// by the end of the loop this array holds only the filtered hrefs
$hrefs = array();
foreach( $dom->getElementsByTagName('a') as $node ) {
    //look for href attribute
    if( $node->hasAttribute( 'href' ) ) {
        $href = $node->getAttribute( 'href' );
        // keep only hrefs that start with /download/ (position 0)
        if( strpos( $href, "/download/" ) === 0 ) {
            $hrefs[] = $href; // store href
        }
    }
}
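
For the regex route mentioned above, here is a minimal self-contained sketch. The sample markup in $html is made up for illustration, and the libxml_use_internal_errors() call is an addition (not part of the snippet above) that only keeps parser warnings quiet on imperfect HTML. The parsing is still done by DOMDocument; the regex is applied only to the extracted href strings.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = '<p><a href="/download/a.zip">A</a> <a href="/about">About</a> '
      . '<a name="top">No href</a> <a href="/download/b.zip">B</a></p>';

$dom = new DOMDocument;
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Gather every href found on an <a> element.
$all = array();
foreach ($dom->getElementsByTagName('a') as $node) {
    if ($node->hasAttribute('href')) {
        $all[] = $node->getAttribute('href');
    }
}

// Regex variant of the filter: keep hrefs that start with /download/.
// '#' is the delimiter, so the slashes in the pattern need no escaping.
// preg_grep() preserves keys, hence array_values() to reindex from 0.
$hrefs = array_values(preg_grep('#^/download/#', $all));

print_r($hrefs); // lists /download/a.zip and /download/b.zip
```

For a fixed prefix like this, strpos is simpler and equivalent; the regex version is mainly useful if the pattern later becomes less trivial.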
