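I have created the features X and the labels y for the dataset I am working on.
I now want to train a random forest classifier on this dataset, but fitting the training data raises a ValueError: setting an array element with a sequence.
Below are X and y, along with the error details: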
X:
(array([-8.1530527e-10, 8.9952795e-10, -9.1185753e-10, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([-0.00050612, -0.00057967, -0.00035985, ..., 0. , 0. , 0. ], dtype=float32), array([ 6.8139506e-08, -2.3837963e-05, -2.4622474e-05, ..., 3.1678758e-06, -2.4535689e-06, 0.0000000e+00], dtype=float32), array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 6.9306935e-07, -6.6020442e-07, 0.0000000e+00], dtype=float32), array([-7.30260945e-05, -1.18022966e-04, -1.08280736e-04, ..., 8.83421380e-05, 4.97258679e-06, 0.00000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([ 2.3406714e-05, 3.1186773e-05, 4.9467826e-06, ..., 1.2180173e-07, -9.2944845e-08, 0.0000000e+00], dtype=float32), array([ 1.1845550e-06, -1.6399191e-06, 2.5565218e-06, ..., -8.7445065e-09, 5.9859917e-09, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([-1.3284328e-05, -7.4090644e-07, 7.2679302e-07, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32), array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 5.0694009e-08, -3.4546797e-08, 0.0000000e+00], dtype=float32), array([ 1.5591205e-07, -1.5845627e-07, 1.5362870e-07, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 1.1608539e-05, 8.2463991e-09, 0.0000000e+00], dtype=float32), array([-3.6192148e-07, -1.4590451e-05, -5.3999561e-06, ..., -1.9935460e-05, -3.4417746e-05, 0.0000000e+00], dtype=float32), array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., -2.5319534e-07, 2.6521766e-07, 0.0000000e+00], dtype=float32), array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., -2.5055220e-08, 1.2936166e-08, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([ 
1.3387315e-05, 6.0913658e-07, -5.6471418e-07, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32), array([ 1.7200684e-02, 3.2272514e-02, 3.2961801e-02, ..., -1.6286784e-06, -8.5592075e-07, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., -3.3923173e-11, 2.8026699e-11, 0.0000000e+00], dtype=float32), array([-0.00103188, -0.00075814, -0.00051426, ..., 0. , 0. , 0. ], dtype=float32), array([ 7.6278877e-07, 2.1624428e-05, 1.1150542e-05, ..., 1.8263392e-09, -1.5558380e-09, 0.0000000e+00], dtype=float32), array([-1.2111740e-07, 6.3130176e-07, -1.8378003e-06, ..., 1.1309878e-05, 5.4562256e-06, 0.0000000e+00], dtype=float32), array([0.00026949, 0.00028119, 0.00020081, ..., 0.00032586, 0.00046612, 0. ], dtype=float32), array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., -7.8796054e-09, 1.7431153e-08, 0.0000000e+00], dtype=float32), array([1.42000988e-06, 1.30781755e-05, 2.77493709e-05, ..., 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], dtype=float32), array([ 2.9161662e-10, -6.3629275e-11, -3.0565092e-10, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32), array([ 2.2051008e-05, 1.6838792e-05, 3.5639907e-05, ..., 4.5767497e-06, -1.2002213e-05, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., -2.0104826e-10, 1.6824393e-10, 0.0000000e+00], dtype=float32), array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., -4.8303300e-06, -1.2008861e-05, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., -2.7673337e-07, 2.8604177e-07, 0.0000000e+00], dtype=float32), array([-0.00066044, -0.0009837 , -0.00090796, ..., -0.00171516, -0.0017666 , 0. 
], dtype=float32), array([ 3.2218946e-11, -5.5296181e-11, 8.9530647e-11, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([-1.3284328e-05, -7.4090644e-07, 7.2679302e-07, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32), array([ 4.9886359e-05, 1.4642075e-04, 4.4365996e-04, ..., 6.3584002e-07, -6.2395281e-07, 0.0000000e+00], dtype=float32), array([-3.2826196e-04, 4.5522624e-03, -8.2306744e-04, ..., -2.2519816e-07, -6.2417300e-08, 0.0000000e+00], dtype=float32), array([ 3.1686827e-04, 4.6282235e-04, 1.0160641e-04, ..., -1.4605960e-05, 6.6572487e-05, 0.0000000e+00], dtype=float32), array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., -7.1763244e-09, -2.8297892e-08, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([-2.5870585e-07, 4.6514080e-07, -9.5607948e-07, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32), array([ 5.788035e-07, -6.493598e-07, 7.111379e-07, ..., 0.000000e+00, 0.000000e+00, 0.000000e+00], dtype=float32), array([ 2.5118000e-04, 1.4220485e-03, 3.9536849e-04, ..., 4.5242754e-04, -3.1405249e-05, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([ 1.1985266e-07, 2.1360799e-07, -1.1951373e-06, ..., -1.3043609e-04, 1.2107374e-06, 0.0000000e+00], dtype=float32), array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 2.5944988e-08, 1.2123945e-07, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), array([-2.4280996e-06, -1.2362683e-05, -8.5034850e-07, ..., -1.0113516e-11, 5.1403621e-12, 0.0000000e+00], dtype=float32), array([9.6098862e-05, 1.6449913e-04, 1.1942573e-04, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32), array([ 1.3284328e-05, 7.4090644e-07, -7.2679302e-07, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32), array([ 2.4700081e-05, 
2.9454704e-05, 8.0751715e-06, ..., 1.2746801e-07, -1.6574201e-06, 0.0000000e+00], dtype=float32), array([8.4619669e-06, 9.7476968e-06, 2.0182479e-05, ..., 2.1081217e-11, 4.0220186e-10, 0.0000000e+00], dtype=float32), array([0., 0., 0., ..., 0., 0., 0.], dtype=float32))
y is as follows:
('08', '08', '06', '05', '05', '04', '06', '07', '01', '04', '03', '07', '03', '01', '03', '03', '02', '02', '02', '02', '05', '06', '04', '08', '07', '06', '04', '05', '07', '02', '08', '01', '08', '03', '08', '02', '03', '06', '04', '07', '04', '07', '05', '06', '08', '08', '04', '05', '05', '04', '06', '07', '05', '07', '01', '06', '02', '02', '03', '03')
Code for the classifier and the train/test split:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)
Error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-b6417fbfb8de> in <module>()
      1 from sklearn.tree import DecisionTreeClassifier
      2 dtree = DecisionTreeClassifier()
----> 3 dtree.fit(X_train, y_train)

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    788             sample_weight=sample_weight,
    789             check_input=check_input,
--> 790             X_idx_sorted=X_idx_sorted)
    791         return self
    792 

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    114         random_state = check_random_state(self.random_state)
    115         if check_input:
--> 116             X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    117             y = check_array(y, ensure_2d=False, dtype=None)
    118             if issparse(X):

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    431                                       force_all_finite)
    432     else:
--> 433         array = np.array(array, dtype=dtype, order=order, copy=copy)
    434 
    435     if ensure_2d:

ValueError: setting an array element with a sequence.
EDIT1: I have converted both X and y to NumPy arrays, but I still get the same error. Details below:
import numpy as np
X = np.asarray(X)
y = np.asarray(y)
X.shape, y.shape
Output:
((60,), (60,))
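Answer:
The problem appears to be with your X. The arrays that make it up probably do not all have the same length, so NumPy cannot stack your tuple into a 2-D float matrix. What you get instead is a 1-D array of dtype object (one array per element), and scikit-learn's conversion of that array to floats fails with the error you see. The shape (60,) in your edit, rather than (60, n_features), is consistent with this.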
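One way to check this hypothesis is to list the distinct lengths of the arrays in X. The sketch below uses a small made-up tuple, `X_demo`, in place of the real data, and `row_lengths` is a hypothetical helper name, not part of NumPy or scikit-learn.

```python
import numpy as np

def row_lengths(rows):
    """Return the sorted distinct lengths of the 1-D arrays in `rows`."""
    return sorted({len(r) for r in rows})

# Hypothetical stand-in for X: two rows of length 6 and one of length 7.
X_demo = (
    np.zeros(6, dtype=np.float32),
    np.zeros(7, dtype=np.float32),
    np.zeros(6, dtype=np.float32),
)

lengths = row_lengths(X_demo)
# More than one distinct length means the rows are ragged and
# cannot be stacked into a 2-D array as they are.
print(lengths)
```

Running `row_lengths(X)` on the real data should return a single value if every sample has the same number of features.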
Take a look at the following snippet:
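import numpy as np
from numpy import array

# X1: all three rows have length 6.
X1 = (array([-8.1530527e-10, 8.9952795e-10, -9.1185753e-10, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype='float32'),
      array([0., 0., 0., 0., 0., 0.], dtype='float32'),
      array([0., 0., 0., 0., 0., 0.], dtype='float32'))

# X2: the second row has one extra element (length 7).
X2 = (array([-8.1530527e-10, 8.9952795e-10, -9.1185753e-10, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype='float32'),
      array([0., 0., 0., 0., 0., 0., 1], dtype='float32'),
      array([0., 0., 0., 0., 0., 0.], dtype='float32'))

# NumPy 1.24 and later refuse to build a ragged array unless dtype=object is
# passed explicitly; older versions fall back to an object array on their own.
a1 = np.array(X1)
a2 = np.array(X2, dtype=object)
print("X1:", a1.dtype, a1.shape, "\nX2:", a2.dtype, a2.shape)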
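Just adding one extra number to the second element of X2 is enough to turn X2 into an object-dtype array (a 1-D array whose elements are themselves arrays) instead of a 2-D float32 array.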
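If the lengths do turn out to differ, one possible fix is to bring every row to a common length before fitting. The sketch below zero-pads shorter rows at the end. This is an assumption about what suits your data, not the only option (truncating to the shortest row or extracting fixed-size features are alternatives). `pad_to_matrix` and the sample values are hypothetical.

```python
import numpy as np

def pad_to_matrix(rows, dtype=np.float32):
    """Stack 1-D arrays of varying length into a 2-D array.

    Shorter rows are padded with trailing zeros up to the longest length.
    """
    max_len = max(len(r) for r in rows)
    out = np.zeros((len(rows), max_len), dtype=dtype)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r  # copy the values, leave the tail as zeros
    return out

# Hypothetical ragged data standing in for X.
X_demo = (
    np.array([1.0, 2.0, 3.0]),
    np.array([4.0, 5.0]),
    np.array([6.0, 7.0, 8.0, 9.0]),
    np.array([0.5]),
)

X_mat = pad_to_matrix(X_demo)
print(X_mat.shape, X_mat.dtype)
```

The resulting `X_mat` is a regular 2-D float array, so it can be passed to `train_test_split` and `dtree.fit` in place of the original tuple.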