[Solved] What are the approaches to the Big-Data problems? [closed]

I will approach your question like this: I assume you are already set on using a big data database and have a real need for one, so instead of repeating textbook material about them, I will highlight a few that meet your 5 requirements, mainly Cassandra and Hadoop. 1) The first requirement … Read more

[Solved] What is the appropriate machine learning algorithm for a restaurant’s sales prediction? [closed]

This is a pretty general question, requiring more than a Stack Overflow answer. The first thing I’d consider is setting up a predictive model like the linear regression you mentioned. You can also add a constant to it, as in mx + b, where b is the known quantity of food for reservations. So you would run linear … Read more
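To make the mx + b idea concrete, here is a minimal sketch, assuming b is fixed in advance (the portions already committed to reservations) and only the slope m is fitted. The variable names and the numbers are made up for illustration; they are not from the question.

```python
import numpy as np

# Hypothetical, noise-free data for illustration only.
walk_ins = np.array([10.0, 20.0, 30.0, 40.0])   # x: walk-in customers per day
reserved_portions = 15.0                         # b: known quantity for reservations
portions_sold = 2.0 * walk_ins + reserved_portions  # y: synthetic target

# With b fixed, least squares for m is a one-parameter fit of (y - b)
# against x through the origin: m = sum(x * (y - b)) / sum(x^2).
slope = np.sum(walk_ins * (portions_sold - reserved_portions)) / np.sum(walk_ins ** 2)


def predict(x):
    """Predicted portions for x walk-in customers, using the fixed offset b."""
    return slope * x + reserved_portions


print(slope, predict(50.0))
```

If b is not actually known, an ordinary linear regression that fits both m and b would be the usual choice instead.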

[Solved] Multiclassification task using keras [closed]

The keyword is “multilabel classification”. In the output layer you have multiple neurons, each representing one of your classes. Now you should apply binary classification to each neuron independently. So if you have 3 classes, the output of your network could be [0.1, 0.8, 0.99] which means the following: The first class is … Read more
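As a rough sketch of what "binary classification for each neuron independently" means, the snippet below thresholds each output on its own and computes a per-neuron binary cross-entropy with NumPy. The 0.5 threshold and the helper name are my assumptions. In Keras this setup usually corresponds to a Dense output layer with activation="sigmoid" trained with loss="binary_crossentropy".

```python
import numpy as np

# Example network output from the text: one independent score per class.
probs = np.array([0.1, 0.8, 0.99])

# Assumption: a class counts as "present" when its score is at least 0.5.
threshold = 0.5
predicted_labels = (probs >= threshold).astype(int)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of the per-neuron binary cross-entropy terms (hypothetical helper)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1 - y_true) * np.log(1 - y_prob)))


loss = binary_cross_entropy(predicted_labels, probs)
print(predicted_labels, loss)
```

Note that the scores do not need to sum to 1, which is the difference from a softmax output over mutually exclusive classes.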

[Solved] Getting through in Machine Learning [closed]

In my view, the best book with Python implementations of machine learning algorithms is “Introduction to Machine Learning with Python: A Guide for Data Scientists” by Andreas C. Müller. Machine learning algorithms are available in Python through a package called scikit-learn. This package has everything you need for machine learning. All the algorithms, … Read more

[Solved] How to calculate precision and recall for two nested arrays [closed]

You have to flatten your lists as shown here, and then use classification_report from scikit-learn: correct = [['*','*'],['*','PER','*','GPE','ORG'],['GPE','*','*','*','ORG']] predicted = [['PER','*'],['*','ORG','*','GPE','ORG'],['PER','*','*','*','MISC']] target_names = ['PER','ORG','MISC','LOC','GPE'] # leave out '*' correct_flat = [item for sublist in correct for item in sublist] predicted_flat = [item for sublist in predicted for item in sublist] from sklearn.metrics import classification_report print(classification_report(correct_flat, … Read more
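Since the excerpt is cut off, here is a self-contained sketch of the same flatten-then-score idea. Passing labels=target_names to leave out '*' and setting zero_division=0 are my choices, not necessarily what the original answer did.

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

correct = [['*', '*'], ['*', 'PER', '*', 'GPE', 'ORG'], ['GPE', '*', '*', '*', 'ORG']]
predicted = [['PER', '*'], ['*', 'ORG', '*', 'GPE', 'ORG'], ['PER', '*', '*', '*', 'MISC']]
target_names = ['PER', 'ORG', 'MISC', 'LOC', 'GPE']  # leave out '*'

# Flatten the nested lists so both become one flat sequence of tags.
correct_flat = [item for sublist in correct for item in sublist]
predicted_flat = [item for sublist in predicted for item in sublist]

# Restrict scoring to the entity labels; zero_division=0 reports 0 (without
# warnings) for labels that never occur, such as 'LOC' here.
precision, recall, f1, support = precision_recall_fscore_support(
    correct_flat, predicted_flat, labels=target_names, zero_division=0
)

print(classification_report(
    correct_flat, predicted_flat, labels=target_names, zero_division=0
))
```

The arrays returned by precision_recall_fscore_support follow the order of target_names, which is handy if you need the numbers rather than the printed table.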

[Solved] Machine Learning on financial big data [closed]

Take the ML course on Coursera. It is a good introduction to ML algorithms that will tell you what ML can do and cover some general approaches: https://www.coursera.org/course/ml Also, to get a broader picture, I suggest Coursera’s Data Science course: https://www.coursera.org/course/datasci Finally, a good book is Mahout in Action; it is more about solving practical matters with Mahout and … Read more

[Solved] Machine Learning Two class classification [closed]

Outputs of a neural network are not probabilities (generally), so that could be a reason that you’re not getting the “1 – P” result you’re looking for. Now, if it’s simple logistic regression, you’d get probabilities as output, but I’m assuming what you said is true and you’re using a super-simple neural network. Also, what … Read more
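To illustrate the point, here is a small sketch, assuming a network with two raw (unnormalized) output units. The raw scores below are made up. Passing them through a softmax is one common way to get values that behave like P and 1 - P.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of raw scores."""
    shifted = logits - np.max(logits)  # subtract the max to avoid overflow
    exp = np.exp(shifted)
    return exp / exp.sum()


# Hypothetical raw outputs of the two output units; they do not sum to 1.
raw_outputs = np.array([2.0, 0.5])

probs = softmax(raw_outputs)
print(raw_outputs.sum(), probs, probs.sum())
```

After the softmax the two values sum to 1, so the second is exactly one minus the first (up to floating-point error).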

[Solved] ValueError: shapes (4155,1445) and (4587,7) not aligned: 1445 (dim 1) != 4587 (dim 0)

Have a look at the scikit-learn documentation for Multinomial NB. It clearly specifies that the structure of the input data used for training with model.fit() must match the structure of the input data used for testing or scoring with model.predict(). This means that you cannot use the same model on a dataset with a different structure. The only way this is possible is … Read more
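One common cause of this kind of mismatch (an assumption on my part, since the question's code is not shown here) is fitting a separate text vectorizer on the test data, which produces a different number of columns. A minimal sketch with made-up texts, where the vectorizer is fitted once and reused:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data for illustration only.
train_texts = ["cheap pills now", "meeting at noon", "cheap offer now", "lunch at noon"]
train_labels = [1, 0, 1, 0]
test_texts = ["cheap lunch offer", "meeting now"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # learns the vocabulary here
X_test = vectorizer.transform(test_texts)        # reuses it: same columns, no refit

model = MultinomialNB()
model.fit(X_train, train_labels)
predictions = model.predict(X_test)

print(X_train.shape, X_test.shape, predictions)
```

Calling fit_transform again on the test texts would build a new vocabulary, and the column counts would no longer line up with what the model was trained on.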

[Solved] interpreting the confusion matrix [closed]

Remove the id feature, and also check for and remove any features that you think add no value to the prediction (other id-like features, or features whose values are all unique). Also check whether there is any class imbalance (how many samples of each class are present in the data, and are the classes reasonably balanced?). Then try applying … Read more
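The two checks above can be sketched in plain Python. The rows, column names and helper function are hypothetical, just to show the idea.

```python
from collections import Counter

# Hypothetical toy dataset; "label" is the target column.
rows = [
    {"id": 1, "age": 34, "plan": "basic", "label": "no"},
    {"id": 2, "age": 34, "plan": "pro", "label": "no"},
    {"id": 3, "age": 51, "plan": "basic", "label": "no"},
    {"id": 4, "age": 29, "plan": "pro", "label": "yes"},
    {"id": 5, "age": 29, "plan": "basic", "label": "no"},
    {"id": 6, "age": 45, "plan": "pro", "label": "no"},
]


def unique_valued_columns(rows, exclude=()):
    """Columns whose values are all distinct, i.e. id-like candidates.

    Caveat: a continuous numeric feature can also be all-unique, so treat
    the result as a list to review rather than to drop blindly.
    """
    columns = [c for c in rows[0] if c not in exclude]
    return [c for c in columns if len({r[c] for r in rows}) == len(rows)]


id_like = unique_valued_columns(rows, exclude=("label",))
features = [
    {k: v for k, v in r.items() if k not in id_like and k != "label"}
    for r in rows
]

# Class balance: samples per class and the majority/minority ratio.
class_counts = Counter(r["label"] for r in rows)
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

print(id_like, dict(class_counts), imbalance_ratio)
```

A large ratio is a hint that accuracy alone will be misleading and that the confusion matrix should be read per class.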

[Solved] sklearn.metrics.roc_curve only shows 5 fprs, tprs, thresholds [closed]

This probably comes from the drop_intermediate parameter of roc_curve() (it defaults to True), which is meant for dropping suboptimal thresholds; see the documentation. You can prevent this behaviour by passing drop_intermediate=False instead. Here’s an example: import numpy as np try: from sklearn.datasets import fetch_openml mnist = fetch_openml('mnist_784', version=1, cache=True) mnist["target"] = mnist["target"].astype(np.int8) except … Read more
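The MNIST example is cut off above, so here is a smaller self-contained sketch of the same effect on synthetic scores (the data is randomly generated for illustration and is not from the original answer):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Synthetic binary labels and weakly informative scores.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
scores = 0.3 * y_true + rng.random(200)

# Default behaviour: suboptimal thresholds are dropped.
fpr_default, tpr_default, thr_default = roc_curve(y_true, scores)

# Keep every threshold instead.
fpr_all, tpr_all, thr_all = roc_curve(y_true, scores, drop_intermediate=False)

print(len(thr_default), len(thr_all))
```

The curve itself has the same shape either way; drop_intermediate only removes points that would not change it, so the default is mostly a way to keep the arrays small.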