[Solved] Data mining on MySQL [closed]


SQL databases play little role in data mining. (That is, unless you count computing business reports involving averages as “data mining”; IMHO these should at most be called “business analytics”.)

The reason is that the advanced statistics performed for data mining can’t be accelerated by the database indexes. And usually, they also take much longer than interactive users would be willing to wait.

So in the end, most actual data mining happens “offline”, outside of a database. The database may serve as the initial data store, but the actual data mining process is then usually: 1. load data from the database, 2. preprocess the data, 3. analyze the data, 4. present the results.
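To make the four steps concrete, here is a minimal sketch in Python. It is only an illustration under several assumptions: an in-memory SQLite database stands in for MySQL so the example is self-contained, the `measurements` table and its values are made up, and the "analysis" is just a Pearson correlation rather than any real mining algorithm.

```python
import sqlite3
import statistics


def load_rows(conn):
    """Step 1: pull the raw rows out of the database in one pass."""
    return conn.execute("SELECT x, y FROM measurements").fetchall()


def preprocess(rows):
    """Step 2: drop incomplete rows, then standardize each column (z-scores)."""
    complete = [row for row in rows if None not in row]
    scaled_columns = []
    for column in zip(*complete):
        mean = statistics.fmean(column)
        stdev = statistics.pstdev(column) or 1.0  # guard against constant columns
        scaled_columns.append([(value - mean) / stdev for value in column])
    return list(zip(*scaled_columns))


def analyze(points):
    """Step 3: toy analysis. On z-scored data the mean of x*y is Pearson's r."""
    return sum(x * y for x, y in points) / len(points)


def present(result):
    """Step 4: format the result for a human reader."""
    return f"correlation = {result:.3f}"


# Hypothetical demo data; SQLite stands in for the MySQL server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1.0, 2.0), (2.0, 4.1), (3.0, None), (4.0, 8.2), (5.0, 9.9)],
)

rows = load_rows(conn)
points = preprocess(rows)
correlation = analyze(points)
report = present(correlation)
print(report)
```

Note that the database is touched exactly once, in step 1. Everything after that runs on an in-memory copy, which is where the customization happens.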

I know that there are SQL extensions such as DMX (“Data Mining Extensions”). But seriously, that isn’t really data mining. It is an interface for invoking some basic prediction functionality, but nothing general. Any good data mining requires customizing the process, and you can’t do that with a DMX one-liner.

Fact is, the most important tools for data mining are R and SciPy, followed by specialized tools such as RapidMiner, Weka and ELKI. Why? Because R and Python are best for scripting. It’s ALL about customization of the process. Forget any push-button solution; they just don’t work reasonably well yet.

You just can’t reasonably train e.g. a support vector machine “inside” of a SQL database (and even less so inside a NoSQL database, which usually is not much more than a key-value store). Also, don’t underestimate the need to preprocess your data. So in fact, you will be training on a copy of the data set anyway. You might as well store this copy in the data format that is most efficient for your actual data mining process, instead of keeping it in a random-access, general-purpose database store.
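As a rough sketch of what “a copy in a more efficient format” can mean, the following Python snippet dumps the numeric columns of a table into a flat binary file of doubles and reads it back. The assumptions here are mine: SQLite again stands in for MySQL, the `samples` table and the helper names `export_table` and `load_matrix` are hypothetical, and the file uses the machine’s native byte order, so it is only meant to be read on the same machine. In practice you would more likely use whatever format your mining tool reads natively.

```python
import array
import os
import sqlite3
import tempfile


def export_table(conn, path):
    """Copy the numeric columns into a flat file of 8-byte doubles, row by row."""
    rows = conn.execute("SELECT x, y FROM samples ORDER BY id").fetchall()
    flat = array.array("d", (value for row in rows for value in row))
    with open(path, "wb") as handle:
        flat.tofile(handle)
    return len(rows)


def load_matrix(path, n_columns):
    """Read the flat file back as a list of rows, without touching the database."""
    values = array.array("d")
    with open(path, "rb") as handle:
        values.frombytes(handle.read())
    return [
        tuple(values[i:i + n_columns])
        for i in range(0, len(values), n_columns)
    ]


# Hypothetical demo data; SQLite stands in for the MySQL server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO samples (x, y) VALUES (?, ?)",
    [(0.5, 1.5), (2.0, 3.0), (4.25, 5.75)],
)

path = os.path.join(tempfile.mkdtemp(), "samples.bin")
row_count = export_table(conn, path)
matrix = load_matrix(path, n_columns=2)
os.remove(path)
print(row_count, matrix)
```

The export runs once; every later training run reads the flat file sequentially instead of going through the database’s query layer.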
