[Solved] Where are my mistakes in my scrapy codes?

The main issue with your code is the use of .select instead of .css. Here is what you need, though I'm not sure about the titles part (you may need it on other pages): def parse(self, response): titles = response.xpath('//div[@class="artist"]') # items = [] for title in titles: item = ArtistlistItem() item["artist"] = title.css("h2::text").get() item["biograpy"] … Read more

[Solved] How to build a powerful crawler like google’s? [closed]

For Python you could go with Frontera by Scrapinghub: https://github.com/scrapinghub/frontera https://github.com/scrapinghub/frontera/blob/distributed/docs/source/topics/distributed-architecture.rst They're the same people who make Scrapy. There's also Apache Nutch, which is a much older project: http://nutch.apache.org/

[Solved] Parsing webpages to extract contents

Suggested readings. Static pages: java.net.URLConnection and java.net.HttpURLConnection; jsoup, an HTML parser and content manipulation library. Mind you, many pages create content dynamically using JavaScript after loading. In such cases the ‘static page’ approach won’t help, and you will need to look for tools in the “Web automation” category. Selenium is such a toolset. … Read more
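As a rough illustration of the ‘static page’ approach, here is a minimal sketch in Python. The suggested readings above are Java libraries; Python's standard-library html.parser is swapped in here purely for illustration. The LinkCollector class, the extract_links helper, and the sample HTML string are all hypothetical. The parser only sees markup that is present in the HTML source, so anything a page injects later with JavaScript would be missed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in static HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current start tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Parse an HTML string and return the hrefs found, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Sample markup standing in for a downloaded page (an assumption, not real data)
SAMPLE_HTML = (
    '<html><body><a href="/a">A</a><p>text</p>'
    '<a href="/b">B</a></body></html>'
)

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
```

For a real page, the HTML text would first have to be downloaded (for example with urllib.request.urlopen) and decoded before being passed to extract_links.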

[Solved] How to solve Mysql to mysql as I have some problems [duplicate]

The MySQL extension was deprecated in PHP 5.5.0 and removed in PHP 7.0.0. The MySQLi or PDO_MySQL extension should be used instead. Use mysqli_connect instead of mysql_connect, and mysqli_select_db instead of mysql_select_db. EDIT 01: mysqli_connect can select the database too: $link = mysqli_connect("127.0.0.1", "my_user", "my_password", "my_db"); … Read more

[Solved] How to make this crawler more efficient [closed]

Provided your intentions are not nefarious: as mentioned in the comment, one way to achieve this is to run the crawler in parallel (as several background processes) rather than doing one domain at a time. Something like: exec('php crawler.php > /dev/null 2>&1 &'); exec('php crawler.php > /dev/null 2>&1 &'); exec('php crawler.php > /dev/null 2>&1 &'); exec('php crawler.php > /dev/null … Read more
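The idea of working on several domains at once, rather than one after another, can also be sketched in Python with a thread pool. This is a different mechanism from the background PHP processes in the answer and is shown only to illustrate the idea. The names crawl_domain and crawl_all and the domain list are hypothetical placeholders; the stub would be replaced by the real fetching logic.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_domain(domain):
    # Placeholder for the real crawl (assumption): just report the domain
    return f"crawled {domain}"


def crawl_all(domains, worker=crawl_domain, max_workers=4):
    """Run worker on every domain concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input iterable
        return list(pool.map(worker, domains))


if __name__ == "__main__":
    for line in crawl_all(["example.com", "example.org", "example.net"]):
        print(line)
```

Capping max_workers keeps the number of simultaneous requests bounded, which matters for being polite to the sites being crawled.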

[Solved] Check if element in for each loop is empty

Assuming your images is an available ArrayList<>, you should do it like this: if (images.size() > 0) { for (Element src : images) { if (src != null) { System.out.println("Source " + src.attr("abs:src")); } } } else { System.out.println("There are no elements in ArrayList<> images"); } First you check if there are elements in the … Read more