Python machine learning error… how do I fix it?

Sorry, I wasn't sure how best to word the title…

I have only just started with machine learning in Python and am still finding my way. I have a dataset (ML_TEST):

    Sale ID  Amount in $  Region  Product     Salesperson  Win_Lose
    1        500          North   ink         Jon          1
    2        250          North   ink         Jon          0
    3        250          North   ink         Jon          0
    4        750          North   paper       Jon          0
    5        800          North   ink         Bill         0
    6        250          North   paper       Bill         1
    7        750          North   paper       Jon          1
    8        250          North   ink         Bill         1
    9        250          North   paper       Dave         0
    10       800          North   desk chair  Bill         1
    11       750          South   paper       Dave         0
    12       500          South   desk chair  Dave         1
    13       500          South   ink         Bill         1
    14       500          South   ink         Bill         0
    15       400          South   paper       Jon          0
    16       250          South   paper       Jon          0
    17       250          South   ink         Jon          1
    18       250          East    ink         Dave         1
    19       250          East    ink         Bill         1
    20       400          East    ink         Jon          0
    21       400          East    paper       Dave         1
    22       500          West    desk chair  Bill         0
    23       750          West    desk chair  Jon          1
    24       800          West    desk chair  Jon          0
    25       450          West    paper       Jon          0
    26       250          West    ink         Dave         1
    27       250          West    paper       Dave         1
    28       250          West    paper       Bill         1
    29       250          West    paper       Bill         0
    30       400          West    ink         Bill         1

I am trying to understand the error I get when I run the following code:

    # Load libraries
    import pandas as pd
    from pandas.tools.plotting import scatter_matrix
    import matplotlib.pyplot as plt
    from sklearn import model_selection
    from sklearn.metrics import classification_report
    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import accuracy_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    import pyodbc

    # Read the ML_TEST table into a DataFrame
    conn = pyodbc.connect('')
    sql = "Select * from TMP.ML_TEST"
    dataset = pd.read_sql(sql, conn)

    # Split into features (first five columns) and target (Win_Lose)
    array = dataset.values
    X = array[:,0:5]
    Y = array[:,5]
    validation_size = 0.20
    seed = 7
    X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)
    print(Y)

    seed = 7
    scoring = 'accuracy'
    models = []
    models.append(('LR', LogisticRegression()))
    models.append(('LDA', LinearDiscriminantAnalysis()))
    models.append(('KNN', KNeighborsClassifier()))
    models.append(('CART', DecisionTreeClassifier()))
    models.append(('NB', GaussianNB()))
    models.append(('SVM', SVC()))

    # evaluate each model in turn
    results = []
    names = []
    for name, model in models:
        kfold = model_selection.KFold(n_splits=12, random_state=seed)
        cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
        results.append(cv_results)
        names.append(name)
        msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
        print(msg)

This is the error I get:

    ValueError                                Traceback (most recent call last)
    <ipython-input-119-86bed78dded1> in <module>()
         12 for name, model in models:
         13         kfold = model_selection.KFold(n_splits=12, random_state=seed)
    ---> 14         cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
         15         results.append(cv_results)
         16         names.append(name)

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
        138                                               train, test, verbose, None,
        139                                               fit_params)
    --> 140                       for train, test in cv_iter)
        141     return np.array(scores)[:, 0]
        142

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
        756             # was dispatched. In particular this covers the edge
        757             # case of Parallel used with an exhausted iterator.
    --> 758             while self.dispatch_one_batch(iterator):
        759                 self._iterating = True
        760             else:

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator)
        606                 return False
        607             else:
    --> 608                 self._dispatch(tasks)
        609                 return True
        610

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in _dispatch(self, batch)
        569         dispatch_timestamp = time.time()
        570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
    --> 571         job = self._backend.apply_async(batch, callback=cb)
        572         self._jobs.append(job)
        573

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in apply_async(self, func, callback)
        107     def apply_async(self, func, callback=None):
        108         """Schedule a func to be run"""
    --> 109         result = ImmediateResult(func)
        110         if callback:
        111             callback(result)

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py in __init__(self, batch)
        324         # Don't delay the application, to avoid keeping the input
        325         # arguments in memory
    --> 326         self.results = batch()
        327
        328     def get(self):

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self)
        129
        130     def __call__(self):
    --> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        132
        133     def __len__(self):

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in <listcomp>(.0)
        129
        130     def __call__(self):
    --> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        132
        133     def __len__(self):

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
        236             estimator.fit(X_train, **fit_params)
        237         else:
    --> 238             estimator.fit(X_train, y_train, **fit_params)
        239
        240     except Exception as e:

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py in fit(self, X, y, sample_weight)
       1171
       1172         X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
    -> 1173                          order="C")
       1174         check_classification_targets(y)
       1175         self.classes_ = np.unique(y)

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
        519     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
        520                     ensure_2d, allow_nd, ensure_min_samples,
    --> 521                     ensure_min_features, warn_on_dtype, estimator)
        522     if multi_output:
        523         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
        380                                       force_all_finite)
        381     else:
    --> 382         array = np.array(array, dtype=dtype, order=order, copy=copy)
        383
        384         if ensure_2d:

    ValueError: could not convert string to float: 'Jon'

I would really like to use a Naive Bayes model, since many of my features are text, but I can't even get past this error 🙁

I am trying to build a model that predicts, from these features, whether a sale will be won or lost.


Answer:
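The traceback ends in `could not convert string to float: 'Jon'`, which suggests the estimators are being handed the raw text columns (Region, Product, Salesperson) and cannot turn them into numbers. One common way around this is to one-hot encode those columns before fitting. The following is a minimal sketch under several assumptions, not a definitive fix: it rebuilds the first 12 rows of ML_TEST by hand instead of reading from the database, shortens the column names, leaves out Sale ID on the assumption that it is only a row identifier, and uses 3 shuffled folds because the sample is so small. `pandas.get_dummies` is used here for brevity; scikit-learn's `OneHotEncoder` would be a comparable choice.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# First 12 rows of the ML_TEST table from the question.
# Column names are shortened here for illustration (an assumption).
rows = [
    (1, 500, "North", "ink", "Jon", 1),
    (2, 250, "North", "ink", "Jon", 0),
    (3, 250, "North", "ink", "Jon", 0),
    (4, 750, "North", "paper", "Jon", 0),
    (5, 800, "North", "ink", "Bill", 0),
    (6, 250, "North", "paper", "Bill", 1),
    (7, 750, "North", "paper", "Jon", 1),
    (8, 250, "North", "ink", "Bill", 1),
    (9, 250, "North", "paper", "Dave", 0),
    (10, 800, "North", "desk chair", "Bill", 1),
    (11, 750, "South", "paper", "Dave", 0),
    (12, 500, "South", "desk chair", "Dave", 1),
]
columns = ["Sale_ID", "Amount", "Region", "Product", "Salesperson", "Win_Lose"]
dataset = pd.DataFrame(rows, columns=columns)

# Assumption: Sale_ID is just an identifier, so it is not used as a feature.
features = dataset[["Amount", "Region", "Product", "Salesperson"]]
y = dataset["Win_Lose"]

# One-hot encode the text columns; Amount stays numeric.
X = pd.get_dummies(
    features, columns=["Region", "Product", "Salesperson"], dtype=float
)

# Two of the models from the question, now fitted on purely numeric input.
models = [
    ("LR", LogisticRegression(max_iter=1000)),
    ("NB", GaussianNB()),
]

kfold = KFold(n_splits=3, shuffle=True, random_state=7)
results = {}
for name, model in models:
    scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
    results[name] = scores
    print("%s: %f (%f)" % (name, scores.mean(), scores.std()))
```

With so few rows the printed accuracies are not meaningful; the point is only that every model now receives numeric input, so the conversion error should no longer be raised. The same encoding step could be applied to the full `dataset` returned by `pd.read_sql` before the train/validation split.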
