I'm running this convolutional neural network model on Google Colab; my goal is text classification. Here are my code and the error:
```python
# imports (assumed from earlier notebook cells, reproduced here for completeness)
from numpy import array
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, Flatten, Dense

# sequence encode
encoded_docs = tokenizer.texts_to_sequences(train_docs)
# pad sequences
max_length = max([len(s.split()) for s in train_docs])
Xtrain = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
# define training labels
ytrain = array([0 for _ in range(900)] + [1 for _ in range(900)])

# load all test reviews
food_docs = process_docs('/content/drive/MyDrive/CNN_moviedata/data/food', vocab, False)
location_docs = process_docs('/content/drive/MyDrive/CNN_moviedata/data/location', vocab, False)
price_docs = process_docs('/content/drive/MyDrive/CNN_moviedata/data/price', vocab, False)
service_docs = process_docs('/content/drive/MyDrive/CNN_moviedata/data/service', vocab, False)
time_docs = process_docs('/content/drive/MyDrive/CNN_moviedata/data/time', vocab, False)
test_docs = food_docs + location_docs + price_docs + service_docs + time_docs
# sequence encode
encoded_docs = tokenizer.texts_to_sequences(test_docs)
# pad sequences
Xtest = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
# define test labels
ytest = array([0 for _ in range(100)] + [1 for _ in range(100)])

# define vocabulary size (largest integer value)
vocab_size = len(tokenizer.word_index) + 1

# define model
model = Sequential()
model.add(Embedding(vocab_size, 100, input_length=max_length))
model.add(Conv1D(filters=32, kernel_size=8, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())
# compile network
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(Xtrain, ytrain, epochs=10, verbose=2)
```
Here is my model summary output:
```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 41, 100)           415400
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 34, 32)            25632
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 17, 32)            0
_________________________________________________________________
flatten_1 (Flatten)          (None, 544)               0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5450
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 11
=================================================================
Total params: 446,493
Trainable params: 446,493
Non-trainable params: 0
_________________________________________________________________
None
```
And here is the error that occurs when I run the last cell:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-fa9c5ed3e39a> in <module>()
      2 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
      3 # fit network
----> 4 model.fit(Xtrain, ytrain, epochs=10, verbose=2)

3 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/data_adapter.py in _check_data_cardinality(data)
   1527         label, ", ".join(str(i.shape[0]) for i in nest.flatten(single_data)))
   1528     msg += "Make sure all arrays contain the same number of samples."
-> 1529     raise ValueError(msg)
   1530
   1531

ValueError: Data cardinality is ambiguous:
    x sizes: 9473
    y sizes: 1800
Make sure all arrays contain the same number of samples.
```
I'm fairly new to using CNNs, so any help would be greatly appreciated! Thank you.
Answer:
Your training data has only 1,800 labels, but your training input has 9,473 samples:
```python
>>> ytrain = np.array([0 for _ in range(900)] + [1 for _ in range(900)])
>>> ytrain.shape
(1800,)
```
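The input side, for comparison (the row count comes from the traceback's `x sizes`, and the sequence length from the model summary):

```python
>>> Xtrain.shape
(9473, 41)
```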
Assuming you actually want your labels to be 50% zeros and 50% ones, you need to change this to something like:
```python
ytrain = np.array([0 for _ in range(len(Xtrain)//2)] + [1 for _ in range(len(Xtrain)//2)])
```
This creates an array where the first half of Xtrain is labeled 0 and the other half is labeled 1.
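One catch for the dataset in the question: 9,473 is an odd count, so two halves of `len(Xtrain)//2` still come up one label short, which is what motivates the update below:

```python
>>> len(Xtrain)
9473
>>> len(Xtrain) // 2 * 2   # 4736 zeros + 4736 ones
9472
```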
Update
For a dataset that doesn't divide evenly, the following may work better: it splits at the middle index, so it should also handle odd lengths:
```python
length = len(Xtrain)
middle_index = length // 2
ytrain = np.array([0 for _ in range(len(Xtrain[:middle_index]))] +
                  [1 for _ in range(len(Xtrain[middle_index:]))])
```
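To sanity-check the split end to end, here is a minimal sketch; the zeros matrix is a stand-in for the real padded sequences, with its shape taken from the traceback and the model summary:

```python
import numpy as np

# Stand-in for the real padded training matrix: 9473 rows (traceback)
# by 41 columns (sequence length in the model summary).
Xtrain = np.zeros((9473, 41))

length = len(Xtrain)
middle_index = length // 2
ytrain = np.array([0 for _ in range(len(Xtrain[:middle_index]))] +
                  [1 for _ in range(len(Xtrain[middle_index:]))])

# The first half gets 4736 zeros, the second half 4737 ones,
# so the cardinalities now match and model.fit() will accept them.
print(Xtrain.shape[0], ytrain.shape[0])  # 9473 9473
```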