Zipping two separate datasets together with TensorFlow

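I have two dummy image datasets. The first contains three elements and the second contains six.

For example, the images in the first dataset are named [1.png, 2.png, 3.png]

The images in the second dataset are named [1_1.png, 1_2.png, 2_1.png, 2_2.png, 3_1.png, 3_2.png]

I am trying to figure out how to zip these datasets (with tf.data.Dataset.zip) so that they are mapped as follows: 1.png is paired with 1_1.png and 1_2.png, 2.png is paired with 2_1.png and 2_2.png, and so on. Is this possible? The code I tried is below, but I really don't know how to do it.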

Code

    import os
    import tensorflow as tf

    X = tf.data.Dataset.list_files('D:/test/clear/*.png', shuffle=False)
    Y = tf.data.Dataset.list_files('D:/test/haze/*.png', shuffle=False)
    paired = tf.data.Dataset.zip((X, Y))
    for x in paired:
        print(x)

Result

    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\1.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_1.png'>)
    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\2.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_2.png'>)

Result I want

    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\1.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_1.png'>)
    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\1.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_2.png'>)
    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\2.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\2_1.png'>)
    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\2.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\2_2.png'>)

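Answer:

(This is my first answer on StackOverflow, so I hope it is clear enough and doesn't have too many formatting mistakes.)

The simplest approach I can think of right now is to repeat each filename in X.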

These are the dummy file path lists I used:

    files_x = ["D:\\test\\clear\\1.png", "D:\\test\\clear\\2.png", "D:\\test\\clear\\3.png"]
    files_y = ["D:\\test\\haze\\1_1.png", "D:\\test\\haze\\1_2.png",
               "D:\\test\\haze\\2_1.png", "D:\\test\\haze\\2_2.png",
               "D:\\test\\haze\\3_1.png", "D:\\test\\haze\\3_2.png"]

First, create a dataset from the list of file paths.

    ds_files_x_dup = tf.data.Dataset.from_tensor_slices(files_x)

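You can then repeat the elements by applying tf.repeat to each one through the map function. However, this packs the repeated copies into a single sample (a vector of two strings). To get a dataset with one element per sample, you need to apply flat_map to the dataset.

    ds_files_x_dup = ds_files_x_dup.map(lambda x: tf.repeat(x, 2))
    ds_files_x_dup = ds_files_x_dup.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))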
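To make the difference between the two steps concrete, here is a minimal sketch that inspects the elements after map and again after flat_map. The shortened file names and the variable names (`repeated`, `flat`, `map_shapes`, `flat_names`) are only for illustration and are not part of the original answer.

```python
import tensorflow as tf

# Shortened, illustrative file names (not the real paths).
files_x = ["1.png", "2.png", "3.png"]

ds = tf.data.Dataset.from_tensor_slices(files_x)

# After map: every element is a vector holding two copies of one name.
repeated = ds.map(lambda x: tf.repeat(x, 2))
map_shapes = [e.numpy().shape for e in repeated]

# After flat_map: each vector is unpacked into scalar elements again.
flat = repeated.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))
flat_names = [e.numpy().decode() for e in flat]

print(map_shapes)
print(flat_names)
```

The map step alone still yields three elements, each of shape (2,). Only after flat_map does the dataset contain six scalar strings that line up one-to-one with the six entries of files_y.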

Now just create a dataset from files_y:

    ds_files_y = tf.data.Dataset.from_tensor_slices(files_y)

Then zip the two together:

    paired = tf.data.Dataset.zip((ds_files_x_dup, ds_files_y))
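As a possible shortcut, when the file names are already available as a Python list (as in this example), tf.repeat can be applied to the whole list before the dataset is built, which avoids the map and flat_map steps. This is a sketch of an alternative, not the approach above. The names `ds_x`, `ds_y` and `pairs` are illustrative.

```python
import tensorflow as tf

files_x = ["D:\\test\\clear\\1.png", "D:\\test\\clear\\2.png", "D:\\test\\clear\\3.png"]
files_y = ["D:\\test\\haze\\1_1.png", "D:\\test\\haze\\1_2.png",
           "D:\\test\\haze\\2_1.png", "D:\\test\\haze\\2_2.png",
           "D:\\test\\haze\\3_1.png", "D:\\test\\haze\\3_2.png"]

# Repeat every clear path twice up front: 1, 1, 2, 2, 3, 3.
ds_x = tf.data.Dataset.from_tensor_slices(tf.repeat(files_x, 2))
ds_y = tf.data.Dataset.from_tensor_slices(files_y)
paired = tf.data.Dataset.zip((ds_x, ds_y))

# Collect the pairs as plain Python strings for inspection.
pairs = [(x.numpy().decode(), y.numpy().decode()) for x, y in paired]
for pair in pairs:
    print(pair)
```

This version assumes the list fits in memory and that every clear image has exactly two haze images. The map and flat_map version is the one to use when the filenames come from an existing dataset such as list_files.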

The elements of paired are then:

    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\1.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_1.png'>)
    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\1.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\1_2.png'>)
    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\2.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\2_1.png'>)
    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\2.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\2_2.png'>)
    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\3.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\3_1.png'>)
    (<tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\clear\\3.png'>, <tf.Tensor: shape=(), dtype=string, numpy=b'D:\\test\\haze\\3_2.png'>)
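Pairing by position depends on both file listings being in matching order and on there being exactly two haze images per clear image. If that might not hold (for instance, a lexicographic sort puts 10.png before 2.png), one option is to derive the clear path from each haze filename instead. The sketch below assumes the naming scheme `<n>_<k>.png` from the question. The helper `clear_path_for` is hypothetical and introduced only for this illustration. It uses the standard-library ntpath module so that the Windows-style paths are parsed the same way on any operating system.

```python
import ntpath  # Windows path rules, independent of the host OS


def clear_path_for(haze_path, clear_dir):
    """Return the clear-image path for a haze path named '<n>_<k>.png'.

    Assumes the part of the filename before the first underscore is the
    name of the matching clear image.
    """
    stem, ext = ntpath.splitext(ntpath.basename(haze_path))
    base = stem.split("_")[0]
    return ntpath.join(clear_dir, base + ext)


files_y = ["D:\\test\\haze\\1_1.png", "D:\\test\\haze\\1_2.png",
           "D:\\test\\haze\\2_1.png", "D:\\test\\haze\\2_2.png",
           "D:\\test\\haze\\3_1.png", "D:\\test\\haze\\3_2.png"]

# One clear path per haze path, regardless of how many haze images exist.
files_x_dup = [clear_path_for(p, "D:\\test\\clear") for p in files_y]

for x, y in zip(files_x_dup, files_y):
    print(x, y)
```

The two resulting lists have equal length, so they can be passed together as a tuple to tf.data.Dataset.from_tensor_slices to obtain the same pairs without repeating or zipping.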
