Suggested readings
Static pages:
- jsoup: HTML parser and content manipulation library
Mind you, many pages create their content dynamically with JavaScript after loading. In that case the ‘static page’ approach won’t help; you will need to look for tools in the “Web automation” category.
Selenium is one such toolset. It lets you command a common browser to open and navigate pages, and you may even be able to use a ‘headless browser’ (no UI) such as PhantomJS.
Good luck, there’s lots of reading and coding ahead of you.
[edited for examples]
This technique is called web scraping; search for that term on Google to find examples. The following are examples of what my searches turned up. I offer no warranties or endorsements for them.
For “static webpage scraping”: here’s an example using jsoup
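To give a feel for the static case, here is a minimal sketch, assuming jsoup is on your classpath. The class name `StaticScrape`, the helper `extractLinks`, and the `example.com` / `example.org` addresses are made up for illustration. It parses an HTML string rather than a live page so that it runs without network access.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class StaticScrape {

    // Parses an HTML string and returns the absolute URL of every <a href="..."> in it.
    // baseUri is what relative links such as "/about" are resolved against.
    static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        // "a[href]" is a CSS selector: anchor elements that carry an href attribute.
        for (Element anchor : doc.select("a[href]")) {
            links.add(anchor.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Placeholder markup standing in for a downloaded page.
        String html = "<html><head><title>Demo</title></head><body>"
                + "<a href=\"/about\">About</a>"
                + "<a href=\"https://example.org/x\">External</a>"
                + "</body></html>";
        for (String link : extractLinks(html, "https://example.com/")) {
            System.out.println(link);
        }
    }
}
```

For a real page you would typically replace `Jsoup.parse(html, baseUri)` with `Jsoup.connect(url).get()`, which downloads the page first and throws an `IOException` on failure. Remember that this only sees the HTML the server sends; anything added later by JavaScript will be missing.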
For “dynamic pages” – here’s an example using Selenium
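For the dynamic case, a rough sketch along the following lines may help. It assumes Selenium 4 for Java on the classpath and a local Chrome installation, and it uses headless Chrome rather than PhantomJS, which is my own substitution. The class name `DynamicScrape`, the helpers `headlessOptions` and `scrapeTexts`, the URL, and the `h1` selector are all placeholders. The `--headless=new` flag applies to recent Chrome versions; older ones may need plain `--headless`.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class DynamicScrape {

    // Builds Chrome options that run the browser without a visible window.
    // Assumption: a recent Chrome; older versions may need "--headless" instead.
    static ChromeOptions headlessOptions() {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");
        return options;
    }

    // Opens the page in a real browser, waits until at least one element matching
    // cssSelector exists (i.e. the page's JavaScript has produced it), then returns
    // the visible text of every matching element.
    static List<String> scrapeTexts(String url, String cssSelector) {
        WebDriver driver = new ChromeDriver(headlessOptions());
        try {
            driver.get(url);
            new WebDriverWait(driver, Duration.ofSeconds(10))
                    .until(ExpectedConditions.presenceOfElementLocated(By.cssSelector(cssSelector)));
            List<String> texts = new ArrayList<>();
            for (WebElement element : driver.findElements(By.cssSelector(cssSelector))) {
                texts.add(element.getText());
            }
            return texts;
        } finally {
            // Always close the browser, even if the wait times out.
            driver.quit();
        }
    }

    public static void main(String[] args) {
        // Placeholder URL and selector; substitute the page and elements you care about.
        for (String text : scrapeTexts("https://example.com/", "h1")) {
            System.out.println(text);
        }
    }
}
```

The explicit wait is the important part: unlike the static approach, the content you want may not exist the moment the page finishes loading, so the code polls for it for up to ten seconds before giving up.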
solved Parsing webpages to extract contents