Using DOMDocument and DOMXPath you can load the entire HTML and query for the li elements. Then it's simply a matter of outputting each element's nodeValue.

The $xpath->query() call below searches for all li elements that are direct children of a ul whose class attribute contains breadcrumb.
Example
// Single quotes are used so the double quotes inside the markup
// do not terminate the PHP string.
$html = '
<html>
<body>
<div class="container">
<ul itemprop="breadcrumb" class="breadcrumb">
<li><a href="https://stackoverflow.com/">Home</a><i class="ico-breadcrumb"></i></li>
<li><a href="http://stackoverflow.com/inspiration/0.iroot">Inspiration</a><i class="ico-breadcrumb"></i></li>
<li><a href="http://stackoverflow.com/inspiration/loft/CC_npccat_100031.icat">Loft</a><i class="ico-breadcrumb"></i></li>
<li>First impressions count - bringing your hallway to life</li>
</ul>
</div>
</body>
</html>';
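// Parse the markup into a DOM tree.
$doc = new DOMDocument();
$doc->loadHTML($html);

// Select every li directly under a ul whose class contains "breadcrumb".
$xpath = new DOMXPath($doc);
$categories = $xpath->query('//ul[contains(@class,"breadcrumb")]/li');

// nodeValue is the text content of the node, with all tags stripped.
foreach ($categories as $category) {
    print $category->nodeValue . PHP_EOL;
}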
This will output:
Home
Inspiration
Loft
First impressions count - bringing your hallway to life
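
Note that contains(@class,"breadcrumb") is a substring test, so it would also match a ul whose class is, say, ico-breadcrumb. If that matters for your markup, a stricter variant could look like the sketch below. The function name extractBreadcrumb and the shortened sample markup are made up for illustration and are not part of the original question. It matches breadcrumb as a whole class token, silences libxml parse warnings (on the assumption that real pages are rarely valid HTML), and returns the trimmed labels as an array instead of printing them.

```php
<?php
// Hypothetical helper (name chosen for this example): returns the
// breadcrumb labels as an array of trimmed strings.
function extractBreadcrumb(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Pad the class list with spaces and look for " breadcrumb ", so only
    // the whole class token matches ("ico-breadcrumb" does not).
    $query = '//ul[contains(concat(" ", normalize-space(@class), " "), " breadcrumb ")]/li';

    $labels = [];
    foreach ($xpath->query($query) as $li) {
        $labels[] = trim($li->nodeValue);
    }

    return $labels;
}

// Shortened sample markup, assumed for this example only.
$html = '<ul class="breadcrumb"><li><a href="/">Home</a><i class="ico-breadcrumb"></i></li><li> Loft </li></ul>'
      . '<ul class="ico-breadcrumb"><li>Ignored</li></ul>';

print implode(' > ', extractBreadcrumb($html)) . PHP_EOL; // prints "Home > Loft"
```

Returning an array rather than printing inside the loop leaves it up to the caller whether to join the labels, drop the first or last item, or escape them for output.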