I'm building a deep neural network regressor with TensorFlow, following this tutorial. When I try to save the model with tf.estimator's export_savedmodel, I get the following error:
raise ValueError('Feature {} is not in features dictionary.'.format(key))
ValueError: Feature ad_provider is not in features dictionary.
I need to export the model so that I can deploy it on Google Cloud Platform and serve predictions.
This is where I define my columns:
import tensorflow as tf

CSV_COLUMNS = ["ad_provider", "device", "split_group", "gold", "secret_areas",
               "scored_enemies", "tutorial_sec", "video_success"]
FEATURES = ["ad_provider", "device", "split_group", "gold", "secret_areas",
            "scored_enemies", "tutorial_sec"]
LABEL = "video_success"

ad_provider = tf.feature_column.categorical_column_with_vocabulary_list(
    "ad_provider",
    ["Organic", "Apple Search Ads", "googleadwords_int", "Facebook Ads", "website"])
split_group = tf.feature_column.categorical_column_with_vocabulary_list(
    "split_group", [1, 2, 3, 4])
device = tf.feature_column.categorical_column_with_hash_bucket(
    "device", hash_bucket_size=100)
secret_areas = tf.feature_column.numeric_column("secret_areas")
gold = tf.feature_column.numeric_column("gold")
scored_enemies = tf.feature_column.numeric_column("scored_enemies")
finish_tutorial_sec = tf.feature_column.numeric_column("tutorial_sec")
video_success = tf.feature_column.numeric_column("video_success")

feature_columns = [
    tf.feature_column.indicator_column(ad_provider),
    tf.feature_column.embedding_column(device, dimension=8),
    tf.feature_column.indicator_column(split_group),
    tf.feature_column.numeric_column(key="gold"),
    tf.feature_column.numeric_column(key="secret_areas"),
    tf.feature_column.numeric_column(key="scored_enemies"),
    tf.feature_column.numeric_column(key="tutorial_sec"),
]
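After that, I wrote a function to export my model so that it accepts a JSON dictionary as input. I'm not sure whether my serving function is correct.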
def json_serving_input_fn():
    """Build the serving inputs."""
    inputs = {}
    for feat in feature_columns:
        inputs[feat.name] = tf.placeholder(
            shape=[None],
            dtype=feat.dtype if hasattr(feat, 'dtype') else tf.string)

    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in inputs.items()
    }
    return tf.contrib.learn.InputFnOps(features, None, inputs)
Here is the rest of my code:
def main(unused_argv):
    # Normalize columns 'gold', 'tutorial_sec' and 'scored_enemies' for the training set
    train_n = training_set
    train_n['gold'] = (train_n['gold'] - train_n['gold'].mean()) / (train_n['gold'].max() - train_n['gold'].min())
    train_n['tutorial_sec'] = (train_n['tutorial_sec'] - train_n['tutorial_sec'].mean()) / (train_n['tutorial_sec'].max() - train_n['tutorial_sec'].min())
    train_n['scored_enemies'] = (train_n['scored_enemies'] - train_n['scored_enemies'].mean()) / (train_n['scored_enemies'].max() - train_n['scored_enemies'].min())

    # Same normalization for the test set
    test_n = test_set
    test_n['gold'] = (test_n['gold'] - test_n['gold'].mean()) / (test_n['gold'].max() - test_n['gold'].min())
    test_n['tutorial_sec'] = (test_n['tutorial_sec'] - test_n['tutorial_sec'].mean()) / (test_n['tutorial_sec'].max() - test_n['tutorial_sec'].min())
    test_n['scored_enemies'] = (test_n['scored_enemies'] - test_n['scored_enemies'].mean()) / (test_n['scored_enemies'].max() - test_n['scored_enemies'].min())

    train_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=train_n,
        y=pd.Series(train_n[LABEL].values),
        batch_size=100,
        num_epochs=None,
        shuffle=True)

    test_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=test_n,
        y=pd.Series(test_n[LABEL].values),
        batch_size=100,
        num_epochs=1,
        shuffle=False)

    regressor = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[40, 30, 20],
                                          model_dir="model1",
                                          optimizer='RMSProp')

    # Train
    regressor.train(input_fn=train_input_fn, steps=5)

    regressor.export_savedmodel("test", json_serving_input_fn)

    # Evaluate loss over one epoch of test_set.
    # For each step, calls `input_fn`, which returns one batch of data.
    ev = regressor.evaluate(input_fn=test_input_fn)
    loss_score = ev["loss"]
    print("Loss: {0:f}".format(loss_score))
    for key in sorted(ev):
        print("%s: %s" % (key, ev[key]))

    # Print out predictions over a slice of prediction_set.
    y = regressor.predict(input_fn=test_input_fn)
    # Array with prediction list!
    predictions = list(p["predictions"] for p in y)
    # real = list(p["real"] for p in pd.Series(training_set[LABEL].values))
    real = test_set[LABEL].values
    diff = np.subtract(real, predictions)
    diff = np.absolute(diff)
    diff = np.mean(diff)
    print("Mean Square Error of Test Set = ", diff * diff)
Answer:
Besides the problem you mentioned, I foresee several other issues you will run into:
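- You are using tf.estimator.DNNRegressor, which was introduced in TensorFlow 1.3, but Cloud ML Engine only officially supports TF 1.2.
- You normalize the features in the pandas DataFrame, which won't happen at serving time (unless you do it on the client). This introduces training/serving skew, and you will get poor predictions.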
So let's start by switching to tf.contrib.learn.DNNRegressor, which requires only minor changes:
regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[40, 30, 20],
    model_dir="model1",
    optimizer='RMSProp')
regressor.fit(input_fn=train_input_fn, steps=5)
regressor.export_savedmodel("test", json_serving_input_fn)
Note that this uses fit instead of train.
(Note: your json_serving_input_fn is actually already written for TF 1.2 and is not compatible with TF 1.3. That's fine for now.)
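Now, the root cause of the error you're seeing is that the ad_provider column/feature is not in the dictionary of inputs and features (you do have ad_provider_indicator, though). This happens because you iterate over feature_columns rather than over the list of raw input columns. To fix it, iterate over the actual inputs instead of the feature columns; however, we then also need to know each input's type (the following is a simplified version showing only a few columns):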
CSV_COLUMNS = ["ad_provider", "gold", "video_success"]
FEATURES = ["ad_provider", "gold"]
TYPES = [tf.string, tf.float32]
LABEL = "video_success"

def json_serving_input_fn():
    """Build the serving inputs."""
    inputs = {}
    for feat, dtype in zip(FEATURES, TYPES):
        inputs[feat] = tf.placeholder(shape=[None], dtype=dtype)

    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in inputs.items()
    }
    return tf.contrib.learn.InputFnOps(features, None, inputs)
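With this serving function, the raw names in FEATURES become the input names of the exported model, so prediction requests must use exactly those keys (not derived names such as ad_provider_indicator). Below is a minimal sketch in plain Python of building an online prediction request body. It assumes Cloud ML Engine's JSON format of {"instances": [...]} with one dict per example; build_request is a hypothetical helper that only checks keys and serializes, and it calls no Google Cloud API.

```python
import json

FEATURES = ["ad_provider", "gold"]  # must match the serving placeholders


def build_request(examples):
    """Build a JSON prediction body whose keys match the serving inputs.

    Hypothetical helper: rejects examples with missing or unexpected keys,
    which would otherwise fail on the server side.
    """
    for example in examples:
        missing = [f for f in FEATURES if f not in example]
        extra = [k for k in example if k not in FEATURES]
        if missing or extra:
            raise ValueError("missing {}, unexpected {}".format(missing, extra))
    return json.dumps({"instances": examples})


body = build_request([{"ad_provider": "Organic", "gold": 0.25}])
print(body)
```

Using a derived name, for example build_request([{"ad_provider_indicator": "Organic", "gold": 0.25}]), raises ValueError here, which mirrors the mismatch behind the original error.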
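Finally, you will probably want to normalize your data inside the graph. You could try tf.transform, or alternatively write a custom estimator that performs the transformation and delegates the actual model implementation to DNNRegressor.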
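One lightweight way to get the normalization into the graph, short of tf.transform or a custom estimator, is to freeze the training statistics into a function that is applied to the feature. The following is a minimal sketch under that assumption: make_normalizer is a hypothetical helper that closes over constants computed from the training data. Because it uses only - and /, the same closure works on Python floats (shown here) and on TensorFlow tensors. Whether your TF version lets you attach it to a column (for example via a normalizer_fn argument on tf.feature_column.numeric_column in releases that support it) should be checked against the documentation for the version you deploy.

```python
def make_normalizer(train_values):
    """Return a function computing (x - mean) / (max - min) with training statistics.

    Hypothetical helper: the statistics are frozen as Python constants, so the
    returned function only uses - and / and could also be applied to a tensor.
    """
    mean = sum(train_values) / len(train_values)
    value_range = max(train_values) - min(train_values)

    def normalize(x):
        return (x - mean) / value_range

    return normalize


# Made-up training values for the 'gold' column.
gold_normalizer = make_normalizer([10.0, 20.0, 30.0, 40.0])
print(gold_normalizer(40.0))
```

Since the constants are part of the function rather than of the DataFrame, a transformation applied inside the model is exported with the SavedModel, and serving requests can send raw, unnormalized values.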