Exporting a model with tf Estimator and the export_savedmodel function

I am building a deep neural network regressor with TensorFlow, following this tutorial. When I try to save the model with tf.estimator's export_savedmodel, I get the following error:

    raise ValueError('Feature {} is not in features dictionary.'.format(key))
    ValueError: Feature ad_provider is not in features dictionary.

I need to export the model so that I can deploy it on Google Cloud Platform to serve predictions.

This is where I define the columns:

    CSV_COLUMNS = ["ad_provider", "device", "split_group", "gold", "secret_areas",
                   "scored_enemies", "tutorial_sec", "video_success"]
    FEATURES = ["ad_provider", "device", "split_group", "gold", "secret_areas",
                "scored_enemies", "tutorial_sec"]
    LABEL = "video_success"

    ad_provider = tf.feature_column.categorical_column_with_vocabulary_list(
        "ad_provider",
        ["Organic", "Apple Search Ads", "googleadwords_int", "Facebook Ads", "website"])
    split_group = tf.feature_column.categorical_column_with_vocabulary_list(
        "split_group", [1, 2, 3, 4])
    device = tf.feature_column.categorical_column_with_hash_bucket(
        "device", hash_bucket_size=100)
    secret_areas = tf.feature_column.numeric_column("secret_areas")
    gold = tf.feature_column.numeric_column("gold")
    scored_enemies = tf.feature_column.numeric_column("scored_enemies")
    finish_tutorial_sec = tf.feature_column.numeric_column("tutorial_sec")
    video_success = tf.feature_column.numeric_column("video_success")

    feature_columns = [
        tf.feature_column.indicator_column(ad_provider),
        tf.feature_column.embedding_column(device, dimension=8),
        tf.feature_column.indicator_column(split_group),
        tf.feature_column.numeric_column(key="gold"),
        tf.feature_column.numeric_column(key="secret_areas"),
        tf.feature_column.numeric_column(key="scored_enemies"),
        tf.feature_column.numeric_column(key="tutorial_sec"),
    ]

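After that, I wrote a function to export my model so that it accepts input as a JSON dictionary. I am not sure whether my serving function is correct.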

    def json_serving_input_fn():
        """Build the serving inputs."""
        inputs = {}
        for feat in feature_columns:
            inputs[feat.name] = tf.placeholder(
                shape=[None],
                dtype=feat.dtype if hasattr(feat, 'dtype') else tf.string)

        features = {
            key: tf.expand_dims(tensor, -1)
            for key, tensor in inputs.items()
        }
        return tf.contrib.learn.InputFnOps(features, None, inputs)

This is the rest of my code:

    def main(unused_argv):
        # Normalize columns 'gold', 'tutorial_sec' and 'scored_enemies' for the training set
        train_n = training_set
        train_n['gold'] = (train_n['gold'] - train_n['gold'].mean()) / (train_n['gold'].max() - train_n['gold'].min())
        train_n['tutorial_sec'] = (train_n['tutorial_sec'] - train_n['tutorial_sec'].mean()) / (train_n['tutorial_sec'].max() - train_n['tutorial_sec'].min())
        train_n['scored_enemies'] = (train_n['scored_enemies'] - train_n['scored_enemies'].mean()) / (train_n['scored_enemies'].max() - train_n['scored_enemies'].min())

        test_n = test_set
        test_n['gold'] = (test_n['gold'] - test_n['gold'].mean()) / (test_n['gold'].max() - test_n['gold'].min())
        test_n['tutorial_sec'] = (test_n['tutorial_sec'] - test_n['tutorial_sec'].mean()) / (test_n['tutorial_sec'].max() - test_n['tutorial_sec'].min())
        test_n['scored_enemies'] = (test_n['scored_enemies'] - test_n['scored_enemies'].mean()) / (test_n['scored_enemies'].max() - test_n['scored_enemies'].min())

        train_input_fn = tf.estimator.inputs.pandas_input_fn(
            x=train_n,
            y=pd.Series(train_n[LABEL].values),
            batch_size=100,
            num_epochs=None,
            shuffle=True)

        test_input_fn = tf.estimator.inputs.pandas_input_fn(
            x=test_n,
            y=pd.Series(test_n[LABEL].values),
            batch_size=100,
            num_epochs=1,
            shuffle=False)

        regressor = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                              hidden_units=[40, 30, 20],
                                              model_dir="model1",
                                              optimizer='RMSProp')

        # Train
        regressor.train(input_fn=train_input_fn, steps=5)
        regressor.export_savedmodel("test", json_serving_input_fn)

        # Evaluate loss over one epoch of test_set.
        # For each step, calls `input_fn`, which returns one batch of data.
        ev = regressor.evaluate(input_fn=test_input_fn)
        loss_score = ev["loss"]
        print("Loss: {0:f}".format(loss_score))
        for key in sorted(ev):
            print("%s: %s" % (key, ev[key]))

        # Print out predictions over a slice of prediction_set.
        y = regressor.predict(input_fn=test_input_fn)
        # Array with prediction list!
        predictions = list(p["predictions"] for p in y)
        # real = list(p["real"] for p in pd.Series(training_set[LABEL].values))
        real = test_set[LABEL].values
        diff = np.subtract(real, predictions)
        diff = np.absolute(diff)
        diff = np.mean(diff)
        print("Mean Square Error of Test Set = ", diff * diff)

Answer:

Besides the problem you mention, I foresee several other issues you are likely to run into:

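1. You are using tf.estimator.DNNRegressor, which was introduced in TensorFlow 1.3. At the time of writing, CloudML Engine only officially supports TF 1.2.
2. You normalize the features in the pandas DataFrame, and that step will not happen at serving time (unless you do it on the client side). This introduces skew between training and serving, and you will get poor predictions.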

So let's start by using tf.contrib.learn.DNNRegressor instead, which requires only a few minor changes:

    regressor = tf.contrib.learn.DNNRegressor(
        feature_columns=feature_columns,
        hidden_units=[40, 30, 20],
        model_dir="model1",
        optimizer='RMSProp')
    regressor.fit(input_fn=train_input_fn, steps=5)
    regressor.export_savedmodel("test", json_serving_input_fn)

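Note the use of fit rather than train here.

(Note: your json_serving_input_fn was in fact already written for TF 1.2 and is not compatible with TF 1.3, which is fine for now.)

Now, the root cause of the error you are seeing is that the column/feature ad_provider is not in the inputs and features dictionaries (although ad_provider_indicator is). This happens because you iterate over feature_columns rather than over the list of raw input columns, so the placeholders are keyed by the transformed column names instead of the raw feature names the model looks up. To fix it, iterate over the actual inputs instead of the feature columns. However, we then also need to know each input's type (the version below is simplified and shows only a couple of columns):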

    CSV_COLUMNS = ["ad_provider", "gold", "video_success"]
    FEATURES = ["ad_provider", "gold"]
    TYPES = [tf.string, tf.float32]
    LABEL = "video_success"

    def json_serving_input_fn():
        """Build the serving inputs."""
        inputs = {}
        for feat, dtype in zip(FEATURES, TYPES):
            inputs[feat] = tf.placeholder(shape=[None], dtype=dtype)

        features = {
            key: tf.expand_dims(tensor, -1)
            for key, tensor in inputs.items()
        }
        return tf.contrib.learn.InputFnOps(features, None, inputs)
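
Once the serving function is keyed by the raw feature names, each prediction request has to use those same keys. As a minimal sketch, assuming the simplified FEATURES list above and made-up example values, the following writes instances as newline-delimited JSON, which is the file format accepted by `gcloud ml-engine predict --json-instances`. The helper name `to_json_lines` is hypothetical and not part of any library.

```python
import json

FEATURES = ["ad_provider", "gold"]  # raw input names, as in the simplified example


def to_json_lines(instances, feature_names=FEATURES):
    """Serialise instances to newline-delimited JSON, one instance per line.

    Raises ValueError if an instance's keys differ from the raw feature names,
    catching the same kind of key mismatch before a request is sent.
    """
    lines = []
    for instance in instances:
        if set(instance) != set(feature_names):
            raise ValueError(
                "expected keys {}, got {}".format(
                    sorted(feature_names), sorted(instance)))
        lines.append(json.dumps(instance, sort_keys=True))
    return "\n".join(lines)


# Made-up example values, for illustration only.
example_instances = [
    {"ad_provider": "Organic", "gold": 0.25},
    {"ad_provider": "Facebook Ads", "gold": -0.1},
]
print(to_json_lines(example_instances))
```

An instance keyed by "ad_provider_indicator" instead of "ad_provider" would be rejected by this check.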

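Finally, to normalize your data you will probably want to do it inside the graph. You could try tf.transform, or, alternatively, write a custom estimator that performs the transformation and delegates the actual model implementation to DNNRegressor.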
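
Whichever route you take, the key point is that the statistics are computed once from the training data and then applied unchanged at serving time. Here is a minimal plain-Python sketch of that idea, using the same (x - mean) / (max - min) scaling as the question. It is not tf.transform, and the helper name `make_normalizer` and the sample values are hypothetical.

```python
def make_normalizer(training_values):
    """Return a function that scales with statistics frozen from the training data."""
    mean = sum(training_values) / len(training_values)
    value_range = max(training_values) - min(training_values)
    if value_range == 0:
        raise ValueError("training values are constant; range is zero")

    def normalize(x):
        # Same formula as the question: (x - mean) / (max - min),
        # but mean and range always come from the training set.
        return (x - mean) / value_range

    return normalize


# Made-up training values for the "gold" feature.
train_gold = [0.0, 50.0, 100.0, 250.0]
normalize_gold = make_normalizer(train_gold)

# A single serving-time request is scaled with the training statistics;
# it has no meaningful mean or range of its own.
print(normalize_gold(100.0))  # 0.0, since 100.0 is the training mean
```

In a real export, the equivalent constants would need to live inside the serving graph, so that clients can send raw values.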
