[Solved] Parsing list items from html with Go


You likely want to use the golang.org/x/net/html package.
It is not part of the Go standard library; it lives in the Go sub-repositories. (The sub-repositories are part of the Go Project but outside the main Go tree. They are developed under looser compatibility requirements than the Go core.)

The package documentation includes an example that may be close to what you want.

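If you need to stick with the Go standard packages for some reason, then
for “typical HTML” you can use encoding/xml with its decoder in non-strict mode (Strict set to false, AutoClose set to xml.HTMLAutoClose, Entity set to xml.HTMLEntity). Markup that is far from well-formed may still fail to parse.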
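As a rough sketch of the encoding/xml route, the following collects the text of each `<li>` element. The names `listItems` and `listItemsFromString` and the sample markup are made up for illustration, and the sketch assumes reasonably well-formed input.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// listItems returns the trimmed text of every top-level <li> read from r.
// Markup nested inside an item (including nested lists) is flattened to text.
func listItems(r io.Reader) ([]string, error) {
	dec := xml.NewDecoder(r)
	// Non-strict settings suggested by the encoding/xml docs for HTML input.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var items []string
	var cur strings.Builder
	depth := 0 // greater than zero while inside an <li>
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return items, nil
		}
		if err != nil {
			return items, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "li" {
				depth++
			}
		case xml.CharData:
			if depth > 0 {
				cur.Write(t)
			}
		case xml.EndElement:
			if t.Name.Local == "li" && depth > 0 {
				depth--
				if depth == 0 {
					items = append(items, strings.TrimSpace(cur.String()))
					cur.Reset()
				}
			}
		}
	}
}

// listItemsFromString is a convenience wrapper for in-memory markup.
func listItemsFromString(s string) ([]string, error) {
	return listItems(strings.NewReader(s))
}

func main() {
	items, err := listItemsFromString(`<ul><li>one</li><li>two &amp; <b>three</b></li></ul>`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, it := range items {
		fmt.Println(it)
	}
}
```

The decoder works token by token, so nothing here builds a tree; if you need to navigate parents and siblings, golang.org/x/net/html is the better fit.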

Both packages take an io.Reader as input. If you have a string or []byte variable, you can wrap it with strings.NewReader or bytes.NewReader (or bytes.NewBuffer) to get an io.Reader.
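To illustrate the wrapping, here is a small sketch in which one function accepts an io.Reader and two thin helpers feed it a string or a []byte. The names `countTags`, `countTagsInString`, and `countTagsInBytes` are hypothetical, and the consumer uses encoding/xml only so the example stays within the standard library; html.Parse accepts an io.Reader in the same way.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// countTags reports how many start tags called name appear in the input.
func countTags(r io.Reader, name string) (int, error) {
	dec := xml.NewDecoder(r)
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	n := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == name {
			n++
		}
	}
}

// countTagsInString wraps a string with strings.NewReader.
func countTagsInString(s, name string) (int, error) {
	return countTags(strings.NewReader(s), name)
}

// countTagsInBytes wraps a byte slice with bytes.NewReader.
func countTagsInBytes(b []byte, name string) (int, error) {
	return countTags(bytes.NewReader(b), name)
}

func main() {
	const page = `<ul><li>a</li><li>b</li></ul>`

	fromString, err := countTagsInString(page, "li")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fromBytes, err := countTagsInBytes([]byte(page), "li")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(fromString, fromBytes)
}
```

Writing the parsing code against io.Reader means the same function also works unchanged on a file or an HTTP response body.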

With HTML, the input more often comes from an http.Response body
(make sure to close it when done).
Perhaps something like:

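    import (
        "fmt"
        "net/http"

        "golang.org/x/net/html"
    )

    // printLinks (an illustrative name) fetches someURL and prints the
    // href attribute of every <a> element on the page.
    func printLinks(someURL string) error {
        resp, err := http.Get(someURL)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        doc, err := html.Parse(resp.Body)
        if err != nil {
            return err
        }
        // Recursively visit nodes in the parse tree.
        var f func(*html.Node)
        f = func(n *html.Node) {
            if n.Type == html.ElementNode && n.Data == "a" {
                for _, a := range n.Attr {
                    if a.Key == "href" {
                        fmt.Println(a.Val)
                        break
                    }
                }
            }
            // Descend into the children, then move on to the siblings.
            for c := n.FirstChild; c != nil; c = c.NextSibling {
                f(c)
            }
        }
        f(doc)
        return nil
    }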

Of course, parsing fetched web pages won’t work for pages that build or modify their contents with JavaScript on the client side, since only the HTML sent by the server is parsed.
