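I have a list, X_train (more than 20,000 elements), in which each element is a sparse scipy csr_matrix created by HashingVectorizer.transform(). I call HashingVectorizer.transform() on the input file one line at a time and append each result to X_train.
I am trying to train an SGDClassifier on X_train, but I get the following error:
ValueError: setting an array element with a sequence.
How can I train the SGDClassifier without CPU- or memory-intensive operations?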
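Answer:
Here is a list of sparse matrices, along with several ways of converting it (or failing to convert it) into an array or a sparse matrix. The session assumes `import numpy as np` and `from scipy import sparse`: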
In [916]: alist = [sparse.random(1, 10, .2, format='csr') for _ in range(3)]
In [917]: alist
Out[917]:
[<1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
 <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
 <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>]
Stacking the list produces a proper 2-D sparse matrix:
In [918]: sparse.vstack(alist)
Out[918]:
<3x10 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in Compressed Sparse Row format>
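Applied to the question, the same `sparse.vstack` call can turn X_train into a single matrix that SGDClassifier accepts. The following is a minimal sketch, not the asker's actual code: the sample lines, the labels, and `n_features=2**10` are made up for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical stand-ins for the lines of the input file and their labels.
lines = [
    "the cat sat on the mat",
    "dogs chase cats",
    "stock prices fell today",
    "markets rallied on earnings",
]
labels = np.array([0, 0, 1, 1])

# n_features is an arbitrary small value chosen for this sketch.
vectorizer = HashingVectorizer(n_features=2**10)

# Mirrors the question: transform one line at a time, collecting 1 x n_features rows.
X_train = [vectorizer.transform([line]) for line in lines]

# Stack the rows into one (n_samples, n_features) CSR matrix; it stays sparse.
X = sparse.vstack(X_train, format="csr")

clf = SGDClassifier(random_state=0)
clf.fit(X, labels)

# Alternative: HashingVectorizer is stateless, so transforming all lines in one
# call should give the same matrix without building the intermediate list.
X_direct = vectorizer.transform(lines)
```

Stacking only concatenates the stored nonzero entries, so it should remain inexpensive for a list of about 20,000 single-row matrices, and no dense array is created at any point.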
An object array of matrices (not good):
In [919]: np.array(alist)
Out[919]:
array([ <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
       <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>,
       <1x10 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>], dtype=object)
Trying to create a float array instead reproduces your error:
In [920]: np.array(alist, float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-920-52d4689fa7b3> in <module>()
----> 1 np.array(alist, float)

ValueError: setting an array element with a sequence.
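On the memory concern in the question: if even one stacked matrix feels too large, a possible variation is to stack and train in chunks with `SGDClassifier.partial_fit`. This goes beyond what the session above demonstrates, so treat it as a sketch under assumptions: the random rows, the random labels, and `batch_size = 64` are placeholders rather than values taken from the question.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Placeholder data: 200 sparse 1 x 50 rows standing in for X_train,
# with random binary labels standing in for the real targets.
X_train = [sparse.random(1, 50, density=0.2, format="csr") for _ in range(200)]
y_train = rng.integers(0, 2, size=200)

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # partial_fit needs every class label on the first call
batch_size = 64             # arbitrary chunk size for this sketch

for start in range(0, len(X_train), batch_size):
    # Stack only the current chunk, so just one small matrix exists at a time.
    X_batch = sparse.vstack(X_train[start:start + batch_size], format="csr")
    y_batch = y_train[start:start + batch_size]
    clf.partial_fit(X_batch, y_batch, classes=classes)
```

Each `partial_fit` call makes a single pass over its chunk, so the result generally differs from a full `fit`, which runs multiple epochs over the whole dataset.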