Understanding scikit-learn's "ValueError: setting an array element with a sequence" caused by the shape of my data

For the past two days I have been struggling to find the right way to reshape and feed my data into the fit_transform method of a sklearn.preprocessing scaler. My data is parsed from thousands of quantum-chemistry calculations. I have a few features and, at the end, a 25×25 matrix related to the atomic positions. The final pandas DataFrame looks like this:

Feature A (int), Feature B (float), Feature C (float), Feature D (float), [matrix]

In practice the matrix is zero-padded and flattened, so what the DataFrame actually holds in that column is a 625-element NumPy array.
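(For illustration, here is roughly what that padding/flattening step looks like; the helper name pad_and_flatten and the hard-coded size of 25 are just a sketch of my preprocessing, not the exact code:)

import numpy as np

MAX_ATOMS = 25  # assumed fixed target size, giving a 25x25 padded matrix

def pad_and_flatten(mat, size=MAX_ATOMS):
    # zero-pad an (n, n) matrix to (size, size) and flatten it to a (size*size,) vector
    padded = np.zeros((size, size))
    n = mat.shape[0]
    padded[:n, :n] = mat
    return padded.ravel()

flat = pad_and_flatten(np.random.rand(9, 9))  # e.g. a molecule with 9 atoms
print(flat.shape)  # (625,)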

The problem is that when I try to split the data and fit it to train my Sequential model, X_train = scaler.fit_transform(X_train) throws an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-18-a0e62fa4eda4> in <module>()
----> 1 X_train = scaler.fit_transform(X_train)

4 frames
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86
     87
ValueError: setting an array element with a sequence.

It seems that if my data contains a NumPy array (or even a list) as one of its features, I cannot use it for fitting. What should I do in this case?
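(As far as I can tell, the error itself comes from NumPy trying to cast an object array containing a nested array to floats; a tiny made-up example that reproduces the same TypeError/ValueError pair, though the exact message may vary slightly across NumPy versions:)

import numpy as np

row = np.array([2, np.random.rand(625), 0.0048], dtype=object)  # one "feature" is itself an array
X = np.array([row, row])                                        # shape (2, 3), dtype=object
np.asarray(X, dtype=float)  # ValueError: setting an array element with a sequence.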

P.S.:

In case it helps, here is one row of X_train (you can see the sub-array, which is what is giving me a headache):

print(X_train[0])

array([2,
       array([1.36220678e+03, 1.05000000e+02, 1.05000000e+02, 1.12460036e+01,
              1.12460877e+01, 1.12461039e+01, 1.12460515e+01, 1.12460803e+01,
              1.12460599e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
              ...,
              0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
              0.00000000e+00]),
       0.004850999999999999, 0.24708899999999998, 0.28475300000000003,
       0.285264, -0.00352, -0.00028, -0.00072], dtype=object)

Answer:

First, let's build a DataFrame that reproduces the ValueError:

import numpy as np
import pandas as pd

test_df = pd.DataFrame({'prop_float_%i' % ind: np.random.rand(100) for ind in range(3)})
test_df['prop_int'] = np.random.randint(0, 100, 100)

# create the matrix column: one (625,) array per row
test_df['matrix'] = np.random.rand(100).astype('object')
for ind in range(100):
    test_df.at[ind, 'matrix'] = np.random.rand(625)

The first rows of test_df look like this:

   prop_float_0  prop_float_1  prop_float_2  prop_int                                             matrix
0      0.748796      0.413757      0.750549        87  [0.0013304191112200048, 0.8335838936187671, 0....
1      0.982136      0.014367      0.072711        62  [0.13101366609934562, 0.3455947047272854, 0.67...
2      0.767685      0.289047      0.376070        67  [0.5403591994226811, 0.20985464836499557, 0.47...
3      0.894771      0.008032      0.458049        11  [0.5520592944741991, 0.1013914150023918, 0.522...
4      0.174076      0.493082      0.045521        10  [0.3383177346302546, 0.8874405729210008, 0.169...
5      0.701766      0.232873      0.905511        11  [0.42878413331053633, 0.4555373221498983, 0.19...

where the entries of the matrix column are (625,) arrays.
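(As a quick check that this mirrors your data, the column holds (625,) arrays and is stored with dtype object; continuing from the snippet above:)

print(test_df['matrix'].map(np.shape).unique())  # [(625,)]
print(test_df['matrix'].dtype)                   # object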

I assume you want to standardize the scalar features together with matrix (careful: these are atomic positions, and atomic positions do not have the same units as properties such as ground-state energies!). In that case you need to expand matrix into separate columns before applying fit_transform:

df_mat = pd.DataFrame(test_df['matrix'].values.tolist()).astype('float')
test_df = test_df.drop('matrix', axis=1)
test_df.reset_index(drop=True, inplace=True)
df_mat.reset_index(drop=True, inplace=True)
flattened_df = pd.concat([test_df, df_mat], axis=1)
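(Continuing the example, a quick sanity check of the expansion; the column count assumes the 4 scalar features plus the 625 matrix entries:)

print(flattened_df.shape)            # (100, 629): 100 rows, 4 scalar features + 625 matrix columns
print(flattened_df.dtypes.unique())  # only numeric dtypes remain, no 'object' column left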

As you can see, the original matrix column has been expanded into 625 separate columns in flattened_df, and to flattened_df you can now apply

from sklearn.preprocessing import MinMaxScaler

mm_scalar = MinMaxScaler()
transformed_array = mm_scalar.fit_transform(flattened_df)
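(The result is now an ordinary 2-D float array, which is what a Keras Sequential model expects as input; a quick check, continuing the example above:)

print(transformed_array.shape)   # (100, 629)
print(transformed_array.dtype)   # float64
print(transformed_array.min(), transformed_array.max())  # each column scaled into [0, 1]

On your real data, remember to fit the scaler on X_train only and then reuse it with scaler.transform on the test set.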
