TypeError when training a TensorFlow LinearClassifier

I get the following error when I try to train the model in a Jupyter notebook:

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Error reported to Coordinator: <class 'SystemError'>, <built-in function TF_Run> returned a result with an error set
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpodutz9be/model.ckpt.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: expected bytes, float found

During handling of the above exception, another exception occurred:

SystemError                               Traceback (most recent call last)
<ipython-input-12-44daeaf784e5> in <module>()
----> 1 model.train(input_fn=input_func,steps=200)

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py in train(self, input_fn, hooks, steps, max_steps)
    239       hooks.append(training.StopAtStepHook(steps, max_steps))
    240 
--> 241     loss = self._train_model(input_fn=input_fn, hooks=hooks)
    242     logging.info('Loss for final step: %s.', loss)
    243     return self

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py in _train_model(self, input_fn, hooks)
    684         loss = None
    685         while not mon_sess.should_stop():
--> 686           _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
    687       return loss
    688 

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py in __exit__(self, exception_type, exception_value, traceback)
    532     if exception_type in [errors.OutOfRangeError, StopIteration]:
    533       exception_type = None
--> 534     self._close_internal(exception_type)
    535     # __exit__ should return True to suppress an exception.
    536     return exception_type is None

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py in _close_internal(self, exception_type)
    567     finally:
    568       try:
--> 569         self._sess.close()
    570       finally:
    571         self._sess = None

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py in close(self)
    809     if self._sess:
    810       try:
--> 811         self._sess.close()
    812       except _PREEMPTION_ERRORS:
    813         pass

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py in close(self)
    906       self._coord.join(
    907           stop_grace_period_secs=self._stop_grace_period_secs,
--> 908           ignore_live_threads=True)
    909     finally:
    910       try:

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/training/coordinator.py in join(self, threads, stop_grace_period_secs, ignore_live_threads)
    387       self._registered_threads = set()
    388       if self._exc_info_to_raise:
--> 389         six.reraise(*self._exc_info_to_raise)
    390       elif stragglers:
    391         if ignore_live_threads:

~/.local/lib/python3.5/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/estimator/inputs/queues/feeding_queue_runner.py in _run(self, sess, enqueue_op, feed_fn, coord)
     92         try:
     93           feed_dict = None if feed_fn is None else feed_fn()
---> 94           sess.run(enqueue_op, feed_dict=feed_dict)
     95         except (errors.OutOfRangeError, errors.CancelledError):
     96           # This exception indicates that a queue was closed.

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1122     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1123       results = self._do_run(handle, final_targets, final_fetches,
-> 1124                              feed_dict_tensor, options, run_metadata)
   1125     else:
   1126       results = []

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1319     if handle is None:
   1320       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1321                            options, run_metadata)
   1322     else:
   1323       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1325   def _do_call(self, fn, *args):
   1326     try:
-> 1327       return fn(*args)
   1328     except errors.OpError as e:
   1329       message = compat.as_text(e.message)

~/anaconda3/envs/tfdeeplearning/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1304           return tf_session.TF_Run(session, options,
   1305                                    feed_dict, fetch_list, target_list,
-> 1306                                    status, run_metadata)
   1307 
   1308     def _prun_fn(session, handle, feed_dict, fetch_list):

SystemError: <built-in function TF_Run> returned a result with an error set

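I have tried specifying and not specifying the dtypes of the feature columns, excluding the categorical columns, restarting the kernel, and changing the batch size and the number of epochs. I am probably missing something very simple, but I have already spent hours trying to figure out what is wrong :(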

Here is the code itself. Thanks in advance for taking a look:

import pandas as pd
import tensorflow as tf

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
test.columns
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB

# Separate the label from the features and drop the identifier-like columns.
y = train['Survived']
X = train.drop(['Name', 'Survived', 'Ticket', 'PassengerId'], axis=1)
X_test = test

# Min-max scale the Fare column to the [0, 1] range.
cols_to_norm = ['Fare']
X[cols_to_norm] = X[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

# Feature columns.
pclass = tf.feature_column.numeric_column('Pclass', dtype=tf.int64)
sex = tf.feature_column.categorical_column_with_vocabulary_list(key="Sex", vocabulary_list=["male", "female"])
age = tf.feature_column.numeric_column('Age', dtype=tf.float64)
sibsp = tf.feature_column.numeric_column('SibSp', dtype=tf.int64)
fare = tf.feature_column.numeric_column('Fare', dtype=tf.float64)
parch = tf.feature_column.numeric_column('Parch', dtype=tf.int64)
embarked = tf.feature_column.categorical_column_with_hash_bucket('Embarked', hash_bucket_size=10000)
age_buckets = tf.feature_column.bucketized_column(age, boundaries=[0, 10, 20, 30, 40, 50, 60, 70, 80])

feat_cols = [pclass, age_buckets, sex, sibsp, parch, embarked]

# Input function, model, and training call.
input_func = tf.estimator.inputs.pandas_input_fn(x=X, y=y, batch_size=4, num_epochs=None, shuffle=True)
model = tf.estimator.LinearClassifier(feature_columns=feat_cols)
model.train(input_fn=input_func, steps=200)
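The `train.info()` output above reports fewer than 891 non-null entries for `Age`, `Cabin`, and `Embarked`, so those columns contain missing values. As a minimal sketch (the small `sample` frame and its values are hypothetical, not the real data), the following lists the columns with missing values and flags the non-numeric ones. In an object column pandas represents a missing entry as NaN, which is a Python float, so a text column with gaps holds a mix of strings and floats. Whether that is connected to the `expected bytes, float found` message is an assumption that the original post does not confirm.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic frame; the values are made up.
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Embarked": ["S", "C", np.nan, "S"],
})

# Count missing values per column and keep only columns that have any.
missing = sample.isnull().sum()
missing = missing[missing > 0]

# Among those, pick out the non-numeric (text) columns.
non_numeric_with_nan = [
    col for col in missing.index
    if not pd.api.types.is_numeric_dtype(sample[col])
]

print(missing)
print(non_numeric_with_nan)
```

If such columns turn up in the real data, filling them (for example with `fillna`) or dropping them before building the input function is one way to rule this possibility out.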

By the way, when TF crashed I saw this error in the terminal:

[I 22:37:15.691 NotebookApp] Saving file at /kaggle/titanic/titanic_nb.ipynb
[2408:2444:0428/225656.234325:ERROR:upload_data_presenter.cc(73)] Not implemented reached in virtual void extensions::RawDataPresenter::FeedNext(const net::UploadElementReader &)
[2408:2444:0428/225656.322176:ERROR:upload_data_presenter.cc(73)] Not implemented reached in virtual void extensions::RawDataPresenter::FeedNext(const net::UploadElementReader &)
[2408:2444:0428/225656.442632:ERROR:upload_data_presenter.cc(73)] Not implemented reached in virtual void extensions::RawDataPresenter::FeedNext(const net::UploadElementReader &)
[2408:2444:0428/225656.831056:ERROR:upload_data_presenter.cc(73)] Not implemented reached in virtual void extensions::RawDataPresenter::FeedNext(const net::UploadElementReader &)
[2408:2408:0428/225656.895872:ERROR:CONSOLE(6)] "Uncaught ReferenceError: gbar is not defined", source: https://clients5.google.com/pagead/drt/dn/ (6)


Answer:

Oh my, the solution turned out to be so simple 🙂

I manually set the dtype of the integer columns to int8, since the numbers in these columns fit within the range of a byte:

pclass = tf.feature_column.numeric_column('Pclass', dtype=tf.int8)

parch = tf.feature_column.numeric_column('Parch', dtype=tf.int8)

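Shrinking the data size does make sense, although I still wonder why I did not hit this problem before with other datasets that also contained only byte-sized numbers.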
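Whether a column really fits in a byte can be checked before changing its dtype. The following is a minimal sketch, assuming a small hypothetical frame in place of the real `train` data; `fits_int8` is an illustrative helper, not part of TensorFlow or pandas. `tf.int8` is a signed 8-bit type, so the valid range is -128 to 127.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the Pclass and Parch columns.
sample = pd.DataFrame({"Pclass": [1, 2, 3, 3], "Parch": [0, 1, 0, 6]})

# Signed 8-bit integer limits: min = -128, max = 127.
int8_range = np.iinfo(np.int8)


def fits_int8(series):
    """Return True when every value lies within the signed 8-bit range."""
    return bool(series.between(int8_range.min, int8_range.max).all())


# Check each column before assigning it an int8 feature-column dtype.
checks = {col: fits_int8(sample[col]) for col in sample.columns}
print(checks)
```

A column that fails this check would overflow if it were cast to int8, so it is worth running on every column before narrowing its dtype.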
