[Solved] What are the approaches to the Big-Data problems? [closed]

Accepted Answer

I will approach your question like this: I assume you already have a real need for a big data database, so instead of repeating textbook material, I will highlight a few tools that meet your 5 requirements, mainly Cassandra and Hadoop.

1) The first requirement: we want to be able to write to and read from the database quickly.

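You’ll want to explore NoSQL databases, which are often used for storing “unstructured” Big Data. Open-source options include Cassandra and Hadoop (strictly speaking, Hadoop is a distributed storage and processing framework rather than a database; HBase is the database that runs on top of it). Regarding Cassandra,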

Facebook needed something fast and cheap to handle the billions of status updates, so it started this project and eventually moved it to Apache where it’s found plenty of support in many communities (ref).

2) We also want to have a web interface to the database

The list of 150 NoSQL databases shows the various interfaces available for each, including web interfaces.

Cassandra has a cluster admin, a web-based environment, a web-admin based on AngularJS, and even GUI clients.

3) We want to be able to run different data-analysis algorithms on the data

Cassandra, Hive, and Hadoop are well-suited for data analytics. For example, eBay uses Cassandra for managing time-series data.
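To make the time-series case concrete, here is a minimal sketch in plain Python of a pattern commonly used when modelling time series in a wide-column store such as Cassandra: readings are grouped under a partition key made of the source and a time bucket, and the analysis then runs per bucket. The names `partition_key` and `daily_averages` and the sample readings are hypothetical, and nothing here reflects how eBay or any other company actually lays out its data.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(sensor_id, ts):
    """Bucket readings by (sensor, UTC day) so a single partition stays bounded."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))


def daily_averages(readings):
    """Group (sensor_id, timestamp, value) rows by partition key and average each bucket."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        buckets[partition_key(sensor_id, ts)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


# Hypothetical sample data: three readings from one sensor over two days.
readings = [
    ("s1", datetime(2015, 3, 1, 8, 0, tzinfo=timezone.utc), 20.0),
    ("s1", datetime(2015, 3, 1, 9, 0, tzinfo=timezone.utc), 22.0),
    ("s1", datetime(2015, 3, 2, 8, 0, tzinfo=timezone.utc), 18.0),
]
averages = daily_averages(readings)
```

In a real deployment the grouping would be done by the database's partitioning, and the averaging by a query or a job running next to the data, rather than in a single Python process.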

4) We want to run machine learning algorithms on the data to be able to learn “relations”

Again, Cassandra and Hadoop are well-suited. Regarding Apache Spark + Cassandra,

Spark was developed in 2009 at UC Berkeley AMPLab, open sourced in 2010, and became a top-level Apache project in February, 2014. It has since become one of the largest open source communities in big data, with over 200 contributors in 50+ organizations (ref).

Regarding Hadoop,

With the rapid adoption of Apache Hadoop, enterprises use machine learning as a key technology to extract tangible business value from their massive data assets.
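As a rough illustration of what learning “relations” can look like in the map/reduce style that Hadoop popularized, the sketch below counts how often pairs of items appear together. It is plain Python running in one process, not Hadoop or Spark code; the function names `map_pairs` and `reduce_counts` and the sample baskets are made up for the example, and pair counting is only one simple stand-in for a real machine learning algorithm.

```python
from collections import Counter
from itertools import combinations


def map_pairs(basket):
    """Map step: emit every unordered item pair that occurs in one record."""
    return list(combinations(sorted(set(basket)), 2))


def reduce_counts(mapped):
    """Reduce step: sum the emitted pairs across all records."""
    counts = Counter()
    for pairs in mapped:
        counts.update(pairs)
    return counts


# Hypothetical input: each list is one record (for example, one purchase).
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["eggs", "bread"],
    ["bread", "milk"],
]
pair_counts = reduce_counts(map_pairs(b) for b in baskets)
strongest_pair, strongest_count = pair_counts.most_common(1)[0]
```

On a cluster, the map step would run in parallel on each node's share of the records and the reduce step would merge the partial counts; the logic per record stays the same.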

5) Finally, we want to have a nice click-based interface that visualizes the data.

Visualization tools (paid) that work with the above databases include Pentaho, JasperReports, and Datameer Analytics Solutions. Alternatively, there are several open-source interactive visualization tools such as D3 and Dygraphs (for big data sets).
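Browser-based charting libraries cannot draw every row of a very large table, so the data is usually reduced on the server first. The sketch below, in plain Python, thins a series and writes it as CSV with a header row and the x value in the first column, a layout that Dygraphs can read. The helper names `downsample` and `to_dygraphs_csv`, the column labels, and the sample points are assumptions for illustration; taking every n-th point is the crudest possible reduction, and averaging per bucket may suit your data better.

```python
def downsample(points, max_points):
    """Keep at most max_points samples by taking every n-th (x, y) pair."""
    step = max(1, -(-len(points) // max_points))  # ceiling division
    return points[::step]


def to_dygraphs_csv(points, x_label="Date", y_label="Value"):
    """Render (x, y) pairs as CSV text: one header line, then one row per point."""
    lines = [f"{x_label},{y_label}"]
    lines += [f"{x},{y}" for x, y in points]
    return "\n".join(lines) + "\n"


# Hypothetical series: ten daily values, thinned to five before charting.
points = [(f"2015-03-{day:02d}", day * 10) for day in range(1, 11)]
csv_text = to_dygraphs_csv(downsample(points, 5))
```

The resulting text can be served from a URL or embedded in the page and handed to the charting library as its data source.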
