Expected binary or unicode string, got nan – tensorflow/pandas

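I'm fairly new to TensorFlow and machine learning, so I've run into some trouble. I have a dataset in CSV format, which can be found here, and I want to read it with pandas as is done here. This worked on another dataset, but I have since modified and extended the code, and I think I'm missing something important. Basically, all I want is to predict the "overall" rating from the given dataset. Here are my code and the traceback I get: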

import pandas as pd
import tensorflow as tf
import tempfile

COLUMNS = ["reviewerID", "asin", "reviewerName", "helpful_0", "helpful_1", "reviewText",
           "overall", "summary", "unixReviewTime"]
CATEGORICAL_COLUMNS = ["reviewerID", "reviewerName", "reviewText", "summary"]
CONTINUOUS_COLUMNS = ["helpful_0", "helpful_1", "unixReviewTime"]

df_train = pd.read_csv('Digital_Music_5.csv', names=COLUMNS, skipinitialspace=True,
                       low_memory=False, skiprows=1)
df_test = pd.read_csv('Digital_Music_5_test.csv', names=COLUMNS,
                      skipinitialspace=True, skiprows=1)

LABEL_COLUMN = "label"
df_train[LABEL_COLUMN] = df_train["overall"]
df_test[LABEL_COLUMN] = df_train["overall"]

print(df_train)


def input_fn(df):
    # Creates a dictionary mapping from each continuous feature column name (k)
    # to the values of that column stored in a constant Tensor.
    continuous_cols = {k: tf.constant(df[k].values)
                       for k in CONTINUOUS_COLUMNS}
    # Creates a dictionary mapping from each categorical feature column name
    # (k) to the values of that column stored in a tf.SparseTensor.
    categorical_cols = {k: tf.SparseTensor(
        indices=[[i, 0] for i in range(df[k].size)],
        values=df[k].values,
        dense_shape=[df[k].size, 1],) for k in CATEGORICAL_COLUMNS}
    # Merges the two dictionaries into one.
    feature_cols = dict(continuous_cols)
    feature_cols.update(categorical_cols)
    # Converts the label column into a constant Tensor.
    label = tf.constant(df[LABEL_COLUMN].values)
    # Returns the feature columns and the label.
    return feature_cols, label


def train_input_fn():
    return input_fn(df_train)


def eval_input_fn():
    return input_fn(df_test)


reviewText = tf.contrib.layers.sparse_column_with_hash_bucket("reviewText", hash_bucket_size=100000)
reviewerID = tf.contrib.layers.sparse_column_with_hash_bucket("reviewerID", hash_bucket_size=100000)
reviewerName = tf.contrib.layers.sparse_column_with_hash_bucket("reviewerName", hash_bucket_size=100000)
summary = tf.contrib.layers.sparse_column_with_hash_bucket("summary", hash_bucket_size=100000)

asin = tf.contrib.layers.real_valued_column("asin")
helpful_0 = tf.contrib.layers.real_valued_column("helpful_0")
helpful_1 = tf.contrib.layers.real_valued_column("helpful_1")
unixReviewTime = tf.contrib.layers.real_valued_column("unixReviewTime")

# reviewText_x_summary = tf.contrib.layers.crossed_column([reviewText, summary], hash_bucket_size=100000)
# reviewerID_x_reviewerName = tf.contrib.layers.crossed_column([reviewerID, reviewerName], hash_bucket_size=100000)
# reviewText_x_reviewerID_x_reviewerName = tf.contrib.layers.crossed_column([reviewText, reviewerID, reviewerName], hash_bucket_size=100000)

model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.LinearClassifier(
    feature_columns=[reviewText, reviewerName, summary,
                     asin, helpful_0, helpful_1, unixReviewTime],
    optimizer=tf.train.FtrlOptimizer(
        learning_rate=0.1,
        l1_regularization_strength=1.0,
        l2_regularization_strength=1.0),
    model_dir=model_dir)

m.fit(input_fn=train_input_fn, steps=200)
# results = m.evaluate(input_fn=eval_input_fn, steps=1)
# for key in sorted(results):
#     print("{}: {}".format(key, results[key]))

Traceback:

Traceback (most recent call last):
  File "amazon_reviews.py", line 78, in <module>
    m.fit(input_fn=train_input_fn, steps=200)
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
    return func(*args, **kwargs)
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 426, in fit
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 932, in _train_model
    features, labels = input_fn()
  File "amazon_reviews.py", line 47, in train_input_fn
    return input_fn(df_train)
  File "amazon_reviews.py", line 36, in input_fn
    dense_shape=[df[k].size, 1],) for k in CATEGORICAL_COLUMNS}
  File "amazon_reviews.py", line 36, in <dictcomp>
    dense_shape=[df[k].size, 1],) for k in CATEGORICAL_COLUMNS}
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/sparse_tensor.py", line 125, in __init__
    values, name="values", as_ref=True)
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 702, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 110, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 99, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 451, in make_tensor_proto
    append_fn(tensor_proto, proto_values)
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 109, in SlowAppendObjectArrayToTensorProto
    tensor_proto.string_val.extend([compat.as_bytes(x) for x in proto_values])
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 109, in <listcomp>
    tensor_proto.string_val.extend([compat.as_bytes(x) for x in proto_values])
  File "/home/cfritz/virtualenvs/tensorflow/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
    (bytes_or_text,))
TypeError: Expected binary or unicode string, got nan

Answer:

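Your input DataFrame contains empty reviewer names and review texts. pd.read_csv() maps these empty cells to NaN, but TensorFlow expects every value in a string column to be a string, not NaN.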
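To make the cause concrete, here is a minimal sketch with pandas only. The two-row CSV text is made up for illustration and is not taken from the actual dataset; it only shows that an empty field comes back as a NaN value rather than as a string.

```python
import io

import pandas as pd

# Hypothetical sample data: the second row has an empty reviewerName.
csv_text = "reviewerID,reviewerName,overall\nA1,Alice,5\nA2,,4\n"
df = pd.read_csv(io.StringIO(csv_text))

names = df["reviewerName"].tolist()
missing = names[1]

# The empty field is a missing value (NaN), not a Python string,
# which is what a bytes/unicode conversion would choke on.
print(type(missing), pd.isna(missing), isinstance(missing, str))
```

A value like this inside an otherwise string-valued column is what the `as_bytes` call at the bottom of the traceback rejects.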

Check for empty cells with:

df_train[df_train.isnull().any(axis=1)]
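As a rough illustration of what this check returns, the sketch below runs it on a small made-up frame (the CSV text is an assumption, not the real data) and also counts missing cells per column, which can help decide which columns need cleaning.

```python
import io

import pandas as pd

# Hypothetical sample data with one empty reviewerName and one empty reviewText.
csv_text = (
    "reviewerID,reviewerName,reviewText,overall\n"
    "A1,Alice,Great album,5\n"
    "A2,,Not bad,4\n"
    "A3,Carol,,3\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Rows that contain at least one missing cell.
rows_with_nan = df[df.isnull().any(axis=1)]

# Number of missing cells in each column.
nan_per_column = df.isnull().sum()

print(rows_with_nan)
print(nan_per_column)
```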

You can simply convert these NaN values to empty strings with

df_train.fillna('', inplace=True)
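One caveat: calling fillna('') on the whole DataFrame also touches numeric columns, so a missing value there would become an empty string as well. Whether the real dataset has gaps in its numeric columns is not known here. A possible variant, sketched on made-up data, fills only the string columns; the shortened CATEGORICAL_COLUMNS list is an assumption for the example.

```python
import io

import pandas as pd

# Shortened, hypothetical column list for this example only.
CATEGORICAL_COLUMNS = ["reviewerName", "reviewText"]

# Hypothetical sample data: gaps in both string columns and in helpful_0.
csv_text = "reviewerName,reviewText,helpful_0\nAlice,,2\n,Nice,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Replace NaN with '' in the string columns only; numeric columns keep NaN
# and stay numeric, so they can be handled separately.
df[CATEGORICAL_COLUMNS] = df[CATEGORICAL_COLUMNS].fillna("")

print(df)
print(df.dtypes)
```

Missing numeric values would still need their own treatment (for example a default value or dropping the row) before being passed to tf.constant.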

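Alternatively, have pd.read_csv() produce empty strings instead of NaN in the first place. Note that na_values=[] alone is not enough: pandas appends na_values to its default list of NaN markers, so empty fields are still read as NaN unless keep_default_na=False is passed as well:

df_train = pd.read_csv('Digital_Music_5.csv', names=COLUMNS,
                       skipinitialspace=True, low_memory=False,
                       skiprows=1, na_values=[], keep_default_na=False)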
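The effect of these read_csv options can be checked on a small in-memory CSV. The sample text below is made up for illustration. The sketch compares na_values=[] on its own with na_values=[] combined with keep_default_na=False.

```python
import io

import pandas as pd

# Hypothetical sample data: the second row has an empty reviewerName.
csv_text = "reviewerID,reviewerName,overall\nA1,Alice,5\nA2,,4\n"

# na_values=[] only adds to the default NaN markers, so '' is still NaN.
default_df = pd.read_csv(io.StringIO(csv_text), na_values=[])

# Disabling the default markers keeps the empty field as an empty string.
kept_df = pd.read_csv(io.StringIO(csv_text), na_values=[],
                      keep_default_na=False)

print(default_df["reviewerName"].isnull().sum())
print(kept_df["reviewerName"].tolist())
```

With keep_default_na=False, empty cells in numeric columns are no longer treated as missing either, so such columns may end up being read as strings. It is worth checking df_train.dtypes after loading.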
