TypeError: Could not build a TypeSpec for a column

I am trying to predict global sales from the 'Name', 'Platform', 'Genre', 'Publisher' and 'Year' values in this dataset: https://www.kaggle.com/gregorut/videogamesales

This is the code I use to train the model:

from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib

import tensorflow as tf

dftrain = pd.read_csv('./vgsales_eval.csv')
dfeval = pd.read_csv('./vgsales_train.csv')
print(dftrain[dftrain.isnull().any(axis=1)])
y_train = dftrain.pop('Global_Sales')
y_eval = dfeval.pop('Global_Sales')

CATEGORICAL_COLUMNS = ['Name', 'Platform', 'Genre', 'Publisher']
NUMERIC_COLUMNS = ['Year']

feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = dftrain[feature_name].unique()  # gets a list of all unique values from given feature column
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.int64))

print(feature_columns)

def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(1000)
        ds = ds.batch(batch_size).repeat(num_epochs)
        return ds
    return input_function

train_input_fn = make_input_fn(dftrain, y_train)
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)

linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn)

I get the following error:

Traceback (most recent call last):
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\util\structure.py", line 93, in normalize_element
    spec = type_spec_from_value(t, use_fallback=False)
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\util\structure.py", line 466, in type_spec_from_value
    (element, type(element).__name__))
TypeError: Could not build a TypeSpec for 0                 Tecmo Koei
1       Nippon Ichi Software
2                    Ubisoft
3                 Activision
4                      Atari
                ...
6594                   Kemco
6595              Infogrames
6596              Activision
6597                7G//AMES
6598                 Wanadoo
Name: Publisher, Length: 6599, dtype: object with type Series

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\kuhn-\Documents\Github\Tensorflow_Test\VideoGameSales_Test\main.py", line 45, in <module>
    linear_est.train(train_input_fn)
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 349, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1175, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1201, in _train_model_default
    self._get_features_and_labels_from_input_fn(input_fn, ModeKeys.TRAIN))
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1037, in _get_features_and_labels_from_input_fn
    self._call_input_fn(input_fn, mode))
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1130, in _call_input_fn
    return input_fn(**kwargs)
  File "c:\Users\kuhn-\Documents\Github\Tensorflow_Test\VideoGameSales_Test\main.py", line 34, in input_function
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 682, in from_tensor_slices
    return TensorSliceDataset(tensors)
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 3001, in __init__
    element = structure.normalize_element(element)
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\util\structure.py", line 98, in normalize_element
    ops.convert_to_tensor(t, name="component_%d" % i))
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1499, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 338, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 264, in constant
    allow_broadcast=True)
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 282, in _constant_impl
    allow_broadcast=allow_broadcast))
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 563, in make_tensor_proto
    append_fn(tensor_proto, proto_values)
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 155, in SlowAppendObjectArrayToTensorProto
    tensor_proto.string_val.extend([compat.as_bytes(x) for x in proto_values])
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 155, in <listcomp>
    tensor_proto.string_val.extend([compat.as_bytes(x) for x in proto_values])
  File "C:\Users\kuhn-\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\util\compat.py", line 87, in as_bytes
    (bytes_or_text,))
TypeError: Expected binary or unicode string, got nan

What am I doing wrong here? Is it a problem with the dataset, or do I need to read the values in a different way?


Answer:

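The error comes from null values in the data you are using. The last line of the traceback ("Expected binary or unicode string, got nan") shows TensorFlow hitting a NaN where it expected a string in the Publisher column, so these values need to be handled when the data is loaded.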
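To see which columns are affected before training, a quick check along these lines may help. This is only a minimal sketch: the tiny hand-made frame below stands in for the real CSV, and its rows are made up for illustration rather than taken from the Kaggle file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: two rows carry missing values.
df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Publisher": ["Ubisoft", np.nan, "Atari"],
    "Year": [2006.0, 2008.0, np.nan],
    "Global_Sales": [1.2, 0.4, 0.1],
})

# Count missing values per column; any non-zero entry needs handling.
null_counts = df.isnull().sum()
print(null_counts)

# A missing string is represented by a float NaN rather than a str, which is
# what TensorFlow trips over when it tries to encode the column as bytes.
missing_publisher = df.loc[df["Publisher"].isnull(), "Publisher"].iloc[0]
print(type(missing_publisher))
```

On the real data, the same `isnull().sum()` call on `dftrain` should point at the columns that need cleaning.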

I made a few changes:

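  1. I drop the records that contain null values. Alternatively, you can fill them with df.fillna, choosing the fill value for each column according to its data type.
  2. I changed the dtype of the Year column from float to int, because the float values cause another problem in from_tensor_slices.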
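The two options in the list above could look roughly like this. It is a sketch under stated assumptions: the sample rows, the "Unknown" placeholder and the choice to drop rows without a Year are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for vgsales.csv.
df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Publisher": ["Ubisoft", np.nan, "Atari"],
    "Year": [2006.0, 2008.0, np.nan],
})

# Option A: drop every row that has at least one missing value.
dropped = df.dropna(how="any").copy()
dropped["Year"] = dropped["Year"].astype(int)

# Option B: fill per column, with a value that matches the column's type.
filled = df.copy()
filled["Publisher"] = filled["Publisher"].fillna("Unknown")  # assumed placeholder
filled = filled.dropna(subset=["Year"])  # no sensible default year, so drop
filled["Year"] = filled["Year"].astype(int)

print(dropped)
print(filled)
```

Option B keeps more rows, at the cost of adding an artificial "Unknown" category to the vocabulary; which trade-off is better depends on how many values are missing.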

Here is the modified code, using the same data you are using.

import pandas as pd
import tensorflow as tf

df = pd.read_csv('/content/vgsales.csv')
# print(df.head())
print(df[df.isnull().any(axis=1)])  # show the rows that contain nulls

# df.fillna('', inplace=True)  # alternative: fill instead of dropping
df.dropna(how="any", inplace=True)
df.Year = df.Year.astype(int)

# Separate the label; this line is not in the original snippet, which uses y_train without defining it.
y_train = df.pop('Global_Sales')

CATEGORICAL_COLUMNS = ['Name', 'Platform', 'Genre', 'Publisher']
NUMERIC_COLUMNS = ['Year']

feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = df[feature_name].unique()  # gets a list of all unique values from given feature column
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.int64))

print(feature_columns)

def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(1000)
        ds = ds.batch(batch_size).repeat(num_epochs)
        return ds
    return input_function

train_input_fn = make_input_fn(df, y_train)
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
