Udacity Deep Learning course: Problem 2 of Assignment 1 (notMNIST)

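After reading this post and working through the course material, I got stuck on Problem 2 of Assignment 1 (notMNIST):

Let's verify that the data still looks good. Display a sample of the labels and images from the ndarray. Hint: you can use matplotlib.pyplot.

Here is what I tried: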

    import random
    rand_smpl = [ train_datasets[i] for i in sorted(random.sample(xrange(len(train_datasets)), 1)) ]
    print(rand_smpl)
    filename = rand_smpl[0]
    import pickle
    loaded_pickle = pickle.load( open( filename, "r" ) )
    image_size = 28  # Pixel width and height.
    import numpy as np
    dataset = np.ndarray(shape=(len(loaded_pickle), image_size, image_size),
                             dtype=np.float32)
    import matplotlib.pyplot as plt
    plt.plot(dataset[2])
    plt.ylabel('some numbers')
    plt.show()

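But this is what I got:

That doesn't look very sensible. To be honest, my code probably doesn't either, because I'm not really sure how to approach this problem!

The pickle files were created as follows: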

    image_size = 28  # Pixel width and height.
    pixel_depth = 255.0  # Number of levels per pixel.

    def load_letter(folder, min_num_images):
      """Load the data for a single letter label."""
      image_files = os.listdir(folder)
      dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                             dtype=np.float32)
      print(folder)
      num_images = 0
      for image in image_files:
        image_file = os.path.join(folder, image)
        try:
          image_data = (ndimage.imread(image_file).astype(float) -
                        pixel_depth / 2) / pixel_depth
          if image_data.shape != (image_size, image_size):
            raise Exception('Unexpected image shape: %s' % str(image_data.shape))
          dataset[num_images, :, :] = image_data
          num_images = num_images + 1
        except IOError as e:
          print('Could not read:', image_file, ':', e, '- it\'s ok, skipping.')

      dataset = dataset[0:num_images, :, :]
      if num_images < min_num_images:
        raise Exception('Many fewer images than expected: %d < %d' %
                        (num_images, min_num_images))

      print('Full dataset tensor:', dataset.shape)
      print('Mean:', np.mean(dataset))
      print('Standard deviation:', np.std(dataset))
      return dataset

That function is called like this:

    dataset = load_letter(folder, min_num_images_per_class)
    try:
      with open(set_filename, 'wb') as f:
        pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)

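The idea here is:

Now let's load the data in a more manageable format. Since, depending on your computer setup, you might not be able to fit it all in memory, we'll load each class into a separate dataset, store them on disk and curate them independently. Later we'll merge them into a single dataset of manageable size.

We'll convert the entire dataset into a 3D array (image index, x, y) of floating point values, normalized to have approximately zero mean and a standard deviation of about 0.5, to make training easier down the road.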
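To make the normalization in the quoted text concrete, here is a small sketch of the same arithmetic that load_letter applies, run on a made-up 2x2 image. The array `raw` is purely illustrative and is not part of the assignment.

```python
import numpy as np

pixel_depth = 255.0  # same constant as in load_letter

# Hypothetical 8-bit image covering both extremes and the two middle values.
raw = np.array([[0, 255], [127, 128]], dtype=np.uint8)

# Same arithmetic as load_letter: shift by half the depth, then scale.
# A pixel value of 0 maps to -0.5 and a value of 255 maps to 0.5.
normalized = (raw.astype(float) - pixel_depth / 2) / pixel_depth

print(normalized.min(), normalized.max())
```

Every pixel therefore lands in the range [-0.5, 0.5]. The mean and standard deviation that load_letter prints depend on the actual images, so this sketch makes no claim about them.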

Answer:

Do it like this:

    # Define a function that converts a label to a letter
    def letter(i):
        return 'abcdefghij'[i]

    # You need %matplotlib inline to display images in a Python notebook
    %matplotlib inline

    # Pick a random index within the length of the dataset
    sample_idx = np.random.randint(0, len(train_dataset))

    # Now display it
    plt.imshow(train_dataset[sample_idx])
    plt.title("Char " + letter(train_labels[sample_idx]))

Your code actually changes the type of the dataset; it is not an ndarray of size (220000, 28, 28).
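One hedged reading of what goes wrong in the question's code: np.ndarray(shape=...) allocates a brand-new, uninitialized array, so the unpickled data never reaches the plot. The sketch below illustrates this with a made-up stand-in array (`loaded`) instead of a real pickle file.

```python
import numpy as np

image_size = 28

# Hypothetical stand-in for the array returned by pickle.load in the question.
loaded = np.full((3, image_size, image_size), 0.25, dtype=np.float32)

# What the question's code does: allocate a new array of the same shape.
# Its contents are whatever happened to be in memory, not the loaded images.
fresh = np.ndarray(shape=(len(loaded), image_size, image_size), dtype=np.float32)

# Same shape, but no shared memory: nothing was copied from `loaded`.
same_shape = fresh.shape == loaded.shape
shares = np.shares_memory(fresh, loaded)
print(same_shape, shares)
```

Plotting `fresh` therefore shows arbitrary values, whatever the pickle contained.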

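In general, a pickle is a file that holds some objects, not the array itself. You should use the object from the pickle directly to get your training dataset (using the notation from your code snippet):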

    # This gives you train_dataset and train_labels
    train_dataset = loaded_pickle['train_dataset']
    train_labels = loaded_pickle['train_labels']
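The same idea applies if loaded_pickle comes from one of the per-letter files in the question. Those were written with pickle.dump(dataset, ...), so they presumably hold a bare array rather than a dict. Below is a minimal sketch under that assumption; the helper names load_letter_array and show_sample, and the temporary file, are hypothetical and stand in for the real data.

```python
import os
import pickle
import tempfile

import numpy as np


def load_letter_array(path):
    """Load one per-letter pickle, assumed to contain a bare ndarray."""
    # Pickles are binary, so open with 'rb' (text mode 'r' fails on Python 3).
    with open(path, "rb") as f:
        return pickle.load(f)


def show_sample(dataset, index, title=""):
    """Render one image as a picture instead of as line plots."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for display
    plt.imshow(dataset[index], cmap="gray")
    plt.title(title)
    plt.show()


# Stand-in for a real per-letter file: a small random array with the same
# layout (image index, x, y) and value range as the notebook's data.
fake_letters = np.random.uniform(-0.5, 0.5, size=(5, 28, 28)).astype(np.float32)
tmp_path = os.path.join(tempfile.mkdtemp(), "fake_letter.pickle")
with open(tmp_path, "wb") as f:
    pickle.dump(fake_letters, f, pickle.HIGHEST_PROTOCOL)

# Use the unpickled object directly instead of allocating a new array.
loaded = load_letter_array(tmp_path)
sample_idx = np.random.randint(0, len(loaded))
# In a notebook: show_sample(loaded, sample_idx, "sample %d" % sample_idx)
```

plt.imshow treats the 28x28 slice as an image, whereas plt.plot on a 2D array draws one line per column, which is likely why the plot in the question looked odd.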

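Update:

At @gsarmas's request, the link to my solution for the whole of Assignment 1 is here.

The code is commented and mostly self-explanatory, but if you have any questions, feel free to contact me on GitHub in whatever way you prefer.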
