Fitting arrays into a data generator and then passing it as a parameter to KerasClassifier

I have a convolutional neural network that classifies cats and dogs using KerasClassifier. Because of how my data is organized, I have to use custom cross-validation: I have n groups of different cat and dog breeds, with 200 images per breed and 600 images per pet category. Now I am trying to create augmented data (on-the-fly/in-place data augmentation) and concatenate it onto my original group arrays. However, when I try to iterate over the datagen, I get the following error:

TypeError: __init__() got an unexpected keyword argument 'y'

Here is my attempt:

for i in range(0, 2):
    datagen = ImageDataGenerator(
        groups[i],
        y=labels[i],
        batch_size=32,
        save_to_dir=None,
        save_prefix="",
        save_format="png",
        rotation_range=20,
        zoom_range=0.2
    )
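For context, the error arises because the ImageDataGenerator constructor accepts only transformation settings; arrays and labels are handed to its .flow() method, which returns a batch iterator. A minimal sketch with hypothetical random data:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical small batch: 8 RGB images of 32x32 with binary labels.
X = np.random.rand(8, 32, 32, 3).astype("float32")
y = np.array([0, 1] * 4)

# The constructor takes only transformation settings...
datagen = ImageDataGenerator(rotation_range=20, zoom_range=0.2)

# ...while the data and labels go to .flow(), which yields augmented batches.
flow = datagen.flow(X, y, batch_size=4, shuffle=False)
batch_x, batch_y = next(flow)
print(batch_x.shape)  # (4, 32, 32, 3)
```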

I use custom cross-validation to find the best weights and pass them as parameters to the KerasClassifier function, like this:

X = np.concatenate([group_1, group_2, group_3], axis=0)[..., np.newaxis]
y = np.concatenate([y_g_1, y_g_2, y_g_3], axis=0)

def n_fold_cv():
    lengths = [0] + list(accumulate(map(len, [y_g_1, y_g_2, y_g_3])))
    i = 1
    while i <= 3:
        min_length = lengths[i - 1]
        max_length = lengths[i]
        idx = np.arange(min_length, max_length, dtype=int)
        yield idx, idx
        i += 1

keras_clf = KerasClassifier(build_fn=build_model, epochs=100, batch_size=8, verbose=0)
accuracies = cross_val_score(estimator=keras_clf, scoring="accuracy", X=X, y=y, cv=n_fold_cv())
print(accuracies)
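To make explicit what this generator hands to cross_val_score, here is a numpy-only sketch of the same index logic, using hypothetical group sizes of 4 in place of the real 400-image groups (note it yields the same indices for train and test, so each group is scored against itself):

```python
import numpy as np
from itertools import accumulate

# Hypothetical group sizes standing in for len(y_g_1), len(y_g_2), len(y_g_3).
group_sizes = [4, 4, 4]

def n_fold_cv():
    # Cumulative boundaries over the concatenated array: [0, 4, 8, 12].
    lengths = [0] + list(accumulate(group_sizes))
    for i in range(1, len(group_sizes) + 1):
        idx = np.arange(lengths[i - 1], lengths[i], dtype=int)
        # Same indices for the train and test split, as in the question.
        yield idx, idx

splits = list(n_fold_cv())
print([s[0].tolist() for s in splits])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```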

I have group 1, group 2, and group 3. I split them into k folds with a custom k-fold function before concatenating.

I want to augment each group once before concatenating and running cross-validation. I have not quite grasped how to fit data augmentation into a Keras classifier.

Public Kaggle notebook and dataset

Code only, with the dataset

from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
from tensorflow.keras.layers import *
from tensorflow.keras import Sequential
from itertools import accumulate
import tensorflow as tf
from keras import backend as K
from keras.preprocessing import image
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.callbacks import ModelCheckpoint
import os
import cv2
from PIL import Image as PImage
from os import listdir
from keras.preprocessing.image import ImageDataGenerator

img_width, img_height = 128, 160

def load_dataset(path):
    imagesList = listdir(path)
    loadedImages = []
    for root, dirs, files in os.walk(path):
        for i, name in enumerate(files):
            image_path = os.path.join(root, name)
            img = PImage.open(image_path)
            arr = np.array(img)
            loadedImages.append(arr)
    return loadedImages

def n_fold_cv():
    lengths = [0] + list(accumulate(map(len, [y_g_1, y_g_2, y_g_3])))
    i = 1
    while i <= 3:
        min_length = lengths[i - 1]
        max_length = lengths[i]
        idx = np.arange(min_length, max_length, dtype=int)
        yield idx, idx
        i += 1

def generateLabel(sober, drunk):
    label = []
    for i in range(0, 1):
        for idx in range(1):
            label.extend([idx] * sober)
            for idx in range(1):
                label.extend([idx + 1] * drunk)
    label = np.array(label)
    return label

y_g_1 = generateLabel(200, 200)
y_g_2 = generateLabel(200, 200)
y_g_3 = generateLabel(200, 200)

group_1 = np.asarray(load_dataset('../input/cats-and-dogs-dataset/Pets 1'))
group_2 = np.asarray(load_dataset('../input/cats-and-dogs-dataset/Pets 2'))
group_3 = np.asarray(load_dataset('../input/cats-and-dogs-dataset/Pets 3'))

groups = np.stack((group_1, group_2, group_3), axis=0)
labels = np.stack((y_g_1, y_g_2, y_g_3), axis=0)

for i in range(0, 2):
    datagen = ImageDataGenerator(
        groups[i],
        y=labels[i],
        batch_size=32,
        save_to_dir=None,
        save_prefix="",
        save_format="png",
        rotation_range=20,
        zoom_range=0.2
    )

X = np.concatenate([group_1, group_2, group_3], axis=0)[..., np.newaxis]
y = np.concatenate([y_g_1, y_g_2, y_g_3], axis=0)

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)

def build_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.summary()
    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['mse', 'accuracy'])
    return model

keras_clf = KerasClassifier(build_fn=build_model, epochs=100, batch_size=8, verbose=0)
accuracies = cross_val_score(estimator=keras_clf, scoring="accuracy", X=X, y=y, cv=n_fold_cv())
print(accuracies)

Answer:

I think at this point you should build a custom cross-validation loop, since you need the extra flexibility. That way you can apply any transformation you want. For example, I used this transformation:

img = tf.image.random_contrast(img, .2, .5)

But you can apply any transformation you need.
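Other tf.image ops can be chained the same way inside the load() function below; a small sketch on a hypothetical random image tensor:

```python
import tensorflow as tf

# Hypothetical 100x100 RGB image with values in [0, 1].
img = tf.random.uniform((100, 100, 3))

# Chain any mix of random augmentations; each returns a tensor of the same shape.
img = tf.image.random_flip_left_right(img)
img = tf.image.random_brightness(img, max_delta=0.2)
img = tf.image.random_contrast(img, 0.2, 0.5)

print(tuple(img.shape))  # (100, 100, 3)
```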

import os
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras import Sequential
from glob2 import glob
from collections import deque

group1 = glob('group1\\*\\*.jpg')
group2 = glob('group2\\*\\*.jpg')
group3 = glob('group3\\*\\*.jpg')

groups = [group1, group2, group3]
assert all(map(len, groups))

def load(file_path):
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, size=(100, 100))
    img = tf.image.random_contrast(img, .2, .5)
    label = tf.strings.split(file_path, os.sep)[1]
    label = tf.cast(tf.equal(label, 'dogs'), tf.int32)
    return img, label

accuracies_on_test_set = {}

for i in range(len(groups)):
    d = deque(groups)
    d.rotate(i)
    train1, train2, test1 = d
    train_ds = tf.data.Dataset.from_tensor_slices(train1 + train2).\
        shuffle(len(train1) + len(train2)).map(load).batch(4)
    test_ds = tf.data.Dataset.from_tensor_slices(test1).\
        shuffle(len(test1)).map(load).batch(4)
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(100, 100, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.25))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['mse', 'accuracy'])
    model.fit(train_ds, validation_data=test_ds, epochs=5, verbose=0)
    loss, mse, accuracy = model.evaluate(test_ds, verbose=0)
    accuracies_on_test_set[f'fold_{i + 1}_accuracy'] = accuracy

print(accuracies_on_test_set)
