[Solved] Working with Python scikit-learn: ModuleNotFoundError: No module named ‘sklearn.mixture.gmm’. How do I solve this without downgrading scikit-learn?

[Solved] Python Machine Learning

Maybe do some exploratory data analysis first to see whether you can find a pattern between your target variable and the features. It would also be good to extract some features from your date/time variables rather than using them as integers (for example weekday_or_not, seasons, etc.). You can also try transforming your features (log, sqrt) to …
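
The date/time suggestion can be sketched with pandas. This is a minimal illustration under assumptions: the column names `timestamp` and `amount`, the sample rows, and the Northern-Hemisphere month-to-season mapping are all hypothetical, not taken from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a timestamp column and a skewed numeric feature.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-06", "2023-01-07", "2023-04-15", "2023-07-03"]),
    "amount": [120.0, 3400.0, 95.0, 41000.0],
})

ts = df["timestamp"]
df["weekday"] = ts.dt.dayofweek                      # Monday=0 ... Sunday=6
df["is_weekend"] = (df["weekday"] >= 5).astype(int)  # 1 on Saturday/Sunday

# Assumed Northern-Hemisphere seasons; adjust the mapping for your region.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
df["season"] = ts.dt.month.map(SEASON_OF_MONTH)

# Log and square-root transforms compress a long right tail.
df["log_amount"] = np.log1p(df["amount"])  # log(1 + x), safe when x == 0
df["sqrt_amount"] = np.sqrt(df["amount"])

print(df.drop(columns="timestamp"))
```

The categorical `season` column would still need encoding (for example one-hot) before most scikit-learn models can use it.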

[Solved] Why couldn’t I predict directly using Features Matrix?

You are using this method in both training and testing:

    import numpy as np
    from sklearn import preprocessing

    def encode_string(cat_features):
        enc = preprocessing.LabelEncoder()
        enc.fit(cat_features)
        enc_cat_features = enc.transform(cat_features)
        ohe = preprocessing.OneHotEncoder()
        encoded = ohe.fit(enc_cat_features.reshape(-1, 1))
        return encoded.transform(enc_cat_features.reshape(-1, 1)).toarray()

by calling:

    Features = encode_string(combined_custs['CountryRegionName'])
    for col in categorical_columns:
        temp = encode_string(combined_custs[col])
        Features = np.concatenate([Features, temp], axis=1)

But as I said in my comment above, you need to apply …
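
One way to avoid encoding the training and testing data separately is to fit a single encoder on the training rows only and reuse it for everything else. The sketch below assumes scikit-learn 0.20 or later, where OneHotEncoder accepts string categories directly, so the LabelEncoder step is not needed. The `Occupation` column and all sample values are hypothetical; only `CountryRegionName` comes from the question.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical train/test split; "Occupation" is an illustrative second column.
categorical_columns = ["CountryRegionName", "Occupation"]
train = pd.DataFrame({
    "CountryRegionName": ["Canada", "France", "Canada"],
    "Occupation": ["Clerical", "Management", "Skilled Manual"],
})
test = pd.DataFrame({
    "CountryRegionName": ["France"],
    "Occupation": ["Clerical"],
})

# Learn the category vocabulary from the training rows only.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train[categorical_columns])

# Reuse the same fitted encoder, so both matrices share the same columns.
train_features = encoder.transform(train[categorical_columns]).toarray()
test_features = encoder.transform(test[categorical_columns]).toarray()

print(encoder.categories_)
print(train_features.shape, test_features.shape)
```

With `handle_unknown="ignore"`, a category never seen during fitting is encoded as all zeros instead of raising an error; leave it out if you would rather fail loudly.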

[Solved] What representation of chat text data should I use for user classification? [closed]

You're asking what ML representation you should use for user classification of chat text. Bag-of-words and word vectors are the main representations generally used in text processing. However, user classification of chat is not the usual text-processing task: we look for telltale features indicative of a specific user. Here are some: character length, word length, sentence length of each …
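
The telltale features listed above can be computed per message with plain Python. This is a hedged sketch: the function name `style_features`, the regex-based word and sentence splitting, and the feature names are illustrative choices, not a standard API.

```python
import re


def style_features(message: str) -> dict:
    """Simple stylometric features for one chat message (illustrative)."""
    words = re.findall(r"\w+", message)  # word tokens, punctuation dropped
    # Naive sentence split on runs of ., ! or ?; empty fragments are discarded.
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    return {
        "char_length": len(message),
        "mean_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


features = style_features("Hi there. How are you?")
print(features)
```

Vectors like this could be aggregated over each user's messages and fed to a classifier alongside, or instead of, bag-of-words counts.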

[Solved] Error: __init__() got an unexpected keyword argument ‘n_splits’

You are mixing up two different modules. Before 0.18, ShuffleSplit lived in cross_validation, and there it had no n_splits parameter: it took n (the total number of samples) and n_iter (the number of re-shuffling and splitting iterations). But since you have updated to 0.18, cross_validation and grid_search have been deprecated in favor of model_selection. This is mentioned in the docs, and these …
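
With scikit-learn 0.18 or later, the equivalent class lives in `sklearn.model_selection`, where the number of iterations is `n_splits` and the number of samples is taken from the data passed to `split()`. A minimal sketch with made-up data:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(20).reshape(10, 2)  # 10 made-up samples, 2 features

# n_splits replaces the old n_iter; the sample count comes from X at split time.
ss = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)

splits = list(ss.split(X))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```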

[Solved] Hello, two questions about sklearn.Pipeline with custom transformer for timeseries [closed]

You cannot use target, predicted = pipe.fit_predict(df) with your defined pipeline, because the pipeline's fit_predict() method can only be used if the final estimator implements fit_predict as well. From the documentation: "Valid only if the final estimator implements fit_predict." Also, it would only return the predictions, so you cannot use target, predicted = …
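
A quick way to see which pipelines expose `fit_predict` is `hasattr`. The sketch below uses made-up data and stand-in estimators (KMeans and LinearRegression are assumptions, not the asker's custom transformer): a clusterer as the final step supports `fit_predict`, while a regressor needs `fit()` followed by `predict()`, with the target kept as a separate variable rather than returned by the pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))          # made-up features
y = X @ np.array([1.0, -2.0, 0.5])    # made-up target

# Final step is a clusterer, which implements fit_predict.
cluster_pipe = make_pipeline(StandardScaler(), KMeans(n_clusters=2, n_init=10, random_state=0))
# Final step is a regressor, which does not implement fit_predict.
reg_pipe = make_pipeline(StandardScaler(), LinearRegression())

print(hasattr(cluster_pipe, "fit_predict"), hasattr(reg_pipe, "fit_predict"))

labels = cluster_pipe.fit_predict(X)  # returns only the cluster labels

# For the regressor: fit, then predict; the target stays a separate variable.
reg_pipe.fit(X, y)
target, predicted = y, reg_pipe.predict(X)
```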

[Solved] applying onehotencoder on numpy array

Don't use a new OneHotEncoder on test_data; reuse the first one and only call transform() on it. Do this:

    test_data = onehotencoder_1.transform(test_data).toarray()

Never use fit() (or fit_transform()) on testing data. A different number of columns is entirely possible, because the test data may not contain some categories that are present in the training data. …
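
The column-count mismatch is easy to reproduce. In this sketch with made-up colours, refitting a fresh encoder on the test set learns only the categories it happens to contain, whereas reusing the encoder fitted on the training data keeps the columns aligned. `handle_unknown="ignore"` is an optional choice, not part of the original answer; it encodes categories unseen during training as all zeros instead of raising an error.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train_data = np.array([["red"], ["green"], ["blue"], ["green"]])
test_data = np.array([["green"], ["red"]])  # "blue" never appears in the test set

# Fit once, on the training data only.
onehotencoder_1 = OneHotEncoder(handle_unknown="ignore")
train_encoded = onehotencoder_1.fit_transform(train_data).toarray()

# Correct: reuse the fitted encoder and only call transform().
test_encoded = onehotencoder_1.transform(test_data).toarray()

# Wrong: a new encoder fitted on the test data sees only two categories.
refit_encoded = OneHotEncoder().fit_transform(test_data).toarray()

print(train_encoded.shape, test_encoded.shape, refit_encoded.shape)
```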

[Solved] What is the appropriate machine learning algorithm for a restaurant’s sales prediction? [closed]

This is a pretty general question, requiring more than a Stack Overflow response. The first thing I'd consider is setting up a predictive algorithm like the linear regression you mentioned. You can also add a constant to it, as in y = mx + b, where b is the known quantity of food for reservations. So you would run linear …
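
One reading of the y = mx + b suggestion, where b is known in advance rather than learned, is to subtract the reserved quantity from the target and fit only the slope. This is a sketch under assumptions: the single predictor, the variable names, and all numbers are hypothetical, and the target is generated as exactly 2x + b so the fitted slope can be checked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical daily data: one predictor x and the known food quantity
# already committed to reservations (the b term).
x = np.array([[10.0], [20.0], [30.0], [40.0]])
reservations = np.array([5.0, 8.0, 6.0, 7.0])
food_sold = 2.0 * x.ravel() + reservations  # synthetic target: m = 2

# b is known, so remove it from the target and fit only the slope m.
model = LinearRegression(fit_intercept=False)
model.fit(x, food_sold - reservations)


def predict_food(x_new, reservations_new):
    """Prediction = m * x + b, with b taken from the reservation book."""
    return model.predict(x_new) + reservations_new


print(model.coef_, predict_food(np.array([[50.0]]), np.array([4.0])))
```

If b should instead be estimated from the data, leave `fit_intercept=True` (the default) and read the learned value from `model.intercept_`.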

[Solved] How to calculate precision and recall for two nested arrays [closed]

You have to flatten your lists and then use classification_report from scikit-learn:

    correct = [['*', '*'], ['*', 'PER', '*', 'GPE', 'ORG'], ['GPE', '*', '*', '*', 'ORG']]
    predicted = [['PER', '*'], ['*', 'ORG', '*', 'GPE', 'ORG'], ['PER', '*', '*', '*', 'MISC']]
    target_names = ['PER', 'ORG', 'MISC', 'LOC', 'GPE']  # leave out '*'

    correct_flat = [item for sublist in correct for item in sublist]
    predicted_flat = [item for sublist in predicted for item in sublist]

    from sklearn.metrics import classification_report
    print(classification_report(correct_flat, …
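
A complete, runnable version of the flattening approach is sketched below. Passing `labels=` restricts the metrics to the entity tags so that `'*'` is excluded, and `zero_division=0` (available in scikit-learn 0.22 and later) reports 0 instead of a warning for tags such as LOC that never occur. These parameter choices are assumptions, since the original snippet is cut off.

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

correct = [['*', '*'], ['*', 'PER', '*', 'GPE', 'ORG'], ['GPE', '*', '*', '*', 'ORG']]
predicted = [['PER', '*'], ['*', 'ORG', '*', 'GPE', 'ORG'], ['PER', '*', '*', '*', 'MISC']]
labels = ['PER', 'ORG', 'MISC', 'LOC', 'GPE']  # leave out '*'

# Flatten the nested lists so each token becomes one sample.
correct_flat = [item for sublist in correct for item in sublist]
predicted_flat = [item for sublist in predicted for item in sublist]

# Per-tag precision and recall, in the order given by `labels`.
precision, recall, _, support = precision_recall_fscore_support(
    correct_flat, predicted_flat, labels=labels, zero_division=0
)

print(classification_report(correct_flat, predicted_flat, labels=labels, zero_division=0))
```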

[Solved] ValueError: shapes (4155,1445) and (4587,7) not aligned: 1445 (dim 1) != 4587 (dim 0)

Have a look at the scikit-learn documentation for MultinomialNB. It clearly specifies that the structure of the input data used for training with model.fit() must match the structure of the input data used for testing or scoring with model.predict(); in particular, the number of feature columns must be the same. This means that you cannot use the same model on a dataset with a different set of features. The only way this is possible is …
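
A frequent cause of this error with text data (an assumption about the asker's setup, since the question body is not shown) is fitting a second vectorizer on the test texts, which produces a different vocabulary and therefore a different number of columns. The sketch below, with made-up texts and labels, fits the vectorizer once on the training data and reuses it with transform().

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training and test texts.
train_texts = ["cheap pills now", "meeting at noon", "cheap offer now", "lunch meeting today"]
train_labels = ["spam", "ham", "spam", "ham"]
test_texts = ["cheap lunch offer", "team meeting"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # learns the vocabulary
X_test = vectorizer.transform(test_texts)        # same columns; unseen words are dropped

model = MultinomialNB()
model.fit(X_train, train_labels)
predictions = model.predict(X_test)
print(X_train.shape, X_test.shape, predictions)
```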

[Solved] interpreting the confusion matrix [closed]

Remove the id feature. Also check for and remove any other features that you think add no value to the prediction (such as other identifier-like columns), or features in which every value is unique. Also check whether there is any class imbalance: how many samples of each class are present in the data, and are the classes properly balanced? Then try applying …
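
The id check and the class-balance check can be sketched in pandas. The DataFrame, its column names, and the target `churned` are hypothetical. Note that the "every value is unique" test also flags continuous numeric features whose values happen never to repeat, so review the flagged list before dropping anything.

```python
import pandas as pd

# Hypothetical dataset: an id column, two real features, and a binary target.
df = pd.DataFrame({
    "id": [101, 102, 103, 104, 105, 106],
    "age": [25, 32, 25, 47, 32, 51],
    "plan": ["a", "b", "a", "a", "b", "a"],
    "churned": [0, 0, 0, 1, 0, 1],
})
X = df.drop(columns="churned")
y = df["churned"]

# Columns with one distinct value per row behave like identifiers.
id_like = [col for col in X.columns if X[col].nunique() == len(X)]
print("identifier-like columns:", id_like)
X = X.drop(columns=id_like)

# Class balance: number and share of samples per class.
class_counts = y.value_counts()
print(class_counts)
print(y.value_counts(normalize=True))
```

If one class dominates the counts, per-class precision and recall read from the confusion matrix are more informative than overall accuracy.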

[Solved] How to cluster with K-means, when number of clusters and their sizes are known [closed]

It won't be k-means anymore. K-means is variance minimization, and it seems your objective is to produce partitions of a predefined size, not of minimum variance. However, here is a tutorial that shows how to modify k-means to produce clusters of the same size. You can easily extend this to produce clusters of the desired …
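
One way to enforce exact sizes (not necessarily the method from the tutorial mentioned above, which is not reproduced here) is to replace the nearest-centroid step of k-means with a minimum-cost assignment: cluster k gets `sizes[k]` slots, and SciPy's `linear_sum_assignment` assigns each point to exactly one slot at minimum total squared distance. This is a sketch for small data sets only, since the cost matrix is n × n; the function name and parameters are hypothetical, and like k-means it can converge to a local optimum that depends on the initial centers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def size_constrained_kmeans(X, sizes, n_iter=20, seed=0):
    """k-means-style clustering where cluster k receives exactly sizes[k] points."""
    X = np.asarray(X, dtype=float)
    sizes = np.asarray(sizes, dtype=int)
    if sizes.sum() != len(X) or (sizes <= 0).any():
        raise ValueError("sizes must be positive and sum to the number of points")

    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=len(sizes), replace=False)]
    # slot_owner[j] is the cluster that slot j belongs to.
    slot_owner = np.repeat(np.arange(len(sizes)), sizes)

    labels = None
    for _ in range(n_iter):
        # Squared distance from every point to the center owning every slot (n x n).
        slot_centers = centers[slot_owner]
        cost = ((X[:, None, :] - slot_centers[None, :, :]) ** 2).sum(axis=2)
        rows, cols = linear_sum_assignment(cost)
        new_labels = np.empty(len(X), dtype=int)
        new_labels[rows] = slot_owner[cols]
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step, as in ordinary k-means.
        centers = np.array([X[labels == k].mean(axis=0) for k in range(len(sizes))])
    return labels, centers


points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [5.2, 5.1], [4.9, 5.2]])
labels, centers = size_constrained_kmeans(points, sizes=[2, 4])
print(labels, np.bincount(labels))
```

Running it from several seeds and keeping the lowest total squared distance is a cheap way to reduce the effect of a poor start.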