y should be a 1d array, got an array of shape () instead

I have trained and saved a model. I am trying to train it further on new data, but I get an error. The relevant part of the code is:

    from tensorflow.keras.preprocessing.text import Tokenizer

    # The maximum number of words to be used (most frequent).
    MAX_NB_WORDS = 50000
    # Max number of words in each complaint.
    MAX_SEQUENCE_LENGTH = 250
    # This is fixed.
    EMBEDDING_DIM = 100

    tokenizer = Tokenizer(num_words=MAX_NB_WORDS, filters='!"#$%&()*+,-./:;<=>?@[\]^_`{|}~', lower=True)
    tokenizer.fit_on_texts(master_df['Observation'].values)
    word_index = tokenizer.word_index

    from sklearn.feature_extraction.text import CountVectorizer
    cv=CountVectorizer(max_df=1.0,min_df=1, stop_words=stop_words, max_features=10000, ngram_range=(1,3))
    X=cv.fit_transform(X)

    with open("../sgd.pickle", 'rb') as f:
        sgd = pickle.load(f)

    def output_sample(sentence):
        test=preprocess_text(sentence)
        test=test.lower()
        #print(test)
        test=[test]
        tokenizer.fit_on_sequences(test)
        new_words= tokenizer.word_index
        #print(word_index)
        test1=cv.transform(test)
        #print(test1)
        output=sgd.predict(test1)
        return output[0]

    def retrain(X,y):
        X=preprocess_text(X)
        X=X.lower()
        X=[X]
        tokenizer.fit_on_texts(X)
        new_words=tokenizer.word_index
        X=cv.fit_transform(X)
        sgd.fit(X,y)
        with open('sgd.pickle', 'wb') as f:
            pickle.dump(sgd, f)
        print("Model trained on new data")

    sentence=input("\n\nEnter your observation:\n\n")
    output=output_sample(sentence)
    print("\n\nThe risk prediction is",preprocess_text(output),"\n\n")
    print("Is the above prediction correct?\n")
    corr=input("Press 'y' for yes and 'n' for no.\n")
    if corr=='y':
        newy=np.array(output)
        retrain(sentence,newy)
    elif corr=='n':
        print("What is the correct risk?\n1. Low\n2. Medium\n")
        r=input("Enter the corresponding number: ")
        if r=='1':
            newy=np.array('Low')
            retrain(sentence,newy)
        elif r=='2':
            newy=np.array('Medium')
            retrain(sentence,newy)
        else:
            print("Incorrect input. Please restart the application.")
    else:
        print("Incorrect input. Please restart the application.")

When I run the program, the error occurs at sgd.fit(X,y). The error message is:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_11300/3528077041.py in <module>
          5     newy=[output]
          6     print(newy)
    ----> 7     retrain(sentence,newy)
          8
          9 elif corr=='n':

    ~\AppData\Local\Temp/ipykernel_11300/2433836763.py in retrain(X, y)
          7     X=cv.fit_transform(X)
          8     #y = y.reshape((-1, 1))
    ----> 9     sgd.fit(X,y)
         10     with open('sgd.pickle', 'wb') as f:
         11         pickle.dump(sgd, f)

    ~\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
        344             if self._final_estimator != 'passthrough':
        345                 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
    --> 346                 self._final_estimator.fit(Xt, y, **fit_params_last_step)
        347
        348         return self

    ~\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\linear_model\_stochastic_gradient.py in fit(self, X, y, coef_init, intercept_init, sample_weight)
        727             Returns an instance of self.
        728         """
    --> 729         return self._fit(X, y, alpha=self.alpha, C=1.0,
        730                          loss=self.loss, learning_rate=self.learning_rate,
        731                          coef_init=coef_init, intercept_init=intercept_init,

    ~\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\linear_model\_stochastic_gradient.py in _fit(self, X, y, alpha, C, loss, learning_rate, coef_init, intercept_init, sample_weight)
        567         self.t_ = 1.0
        568
    --> 569         self._partial_fit(X, y, alpha, C, loss, learning_rate, self.max_iter,
        570                           classes, sample_weight, coef_init, intercept_init)
        571

    ~\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\linear_model\_stochastic_gradient.py in _partial_fit(self, X, y, alpha, C, loss, learning_rate, max_iter, classes, sample_weight, coef_init, intercept_init)
        529                              max_iter=max_iter)
        530         else:
    --> 531             raise ValueError(
        532                 "The number of classes has to be greater than one;"
        533                 " got %d class" % n_classes)

    ValueError: The number of classes has to be greater than one; got 1 class

A sample of the data:

         Observation                                         Risk
    0    A separate road for light vehicle should be ma...   Low
    2    All benches were not having sufficient berm.        Low
    3    As light arrangement is not adequate.               Low
    4    As light arrangement is not adequate.               Low
    5    As contractor's equipment record is not availa...   Low
    77   First aid Room is not established.                  Medium
    98   Heavy dust on haul road is found with in suffi...   Medium
    79   First aid station is maintained in the Rest sh...   Medium
    171  Presently explosive van is not available with ...   Medium
    79   First aid station is maintained in the Rest sh...   Medium

Ideally it should accept the input, but I don't know why this error occurs.


Answer:

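I cleaned up the code and made several changes to the retrain function. It now adds the new string and label to the training set and fits the classifier again. The rest of your code is logically unchanged!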
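For context, here is a minimal sketch of why both error messages can appear, assuming only NumPy and scikit-learn are installed. The example sentence is taken from the sample data; the variable names are made up for illustration. np.array('Low') is a zero-dimensional array, which is what the message in the title complains about, and fitting on a single sentence gives the classifier only one class to learn from.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# A bare string wrapped in np.array is 0-dimensional: its shape is ().
scalar_label = np.array('Low')
print(scalar_label.shape)  # ()

# Wrapping the label in a list gives the 1-d shape scikit-learn expects.
list_label = np.array(['Low'])
print(list_label.shape)  # (1,)

# Fitting on one sentence with one label still fails, because a
# classifier needs at least two distinct classes in y.
X_single = CountVectorizer().fit_transform(['first aid room is not established'])
try:
    SGDClassifier().fit(X_single, list_label)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Fixing the label shape alone is therefore not enough, which is why the new sample has to be fitted together with the existing training data.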

Utility functions:

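    def output_sample(sentence):
        test = preprocess_text(sentence)
        test = test.lower()
        test = [test]
        tokenizer.fit_on_sequences(test)
        new_words = tokenizer.word_index
        test1 = cv.transform(test)
        output = sgd.predict(test1)
        return output[0]


    def preprocess_text(string):
        # Do whatever you want here, but return a string afterwards ;)
        return string


    def retrain(X, y):
        X = preprocess_text(X)
        X = X.lower()
        # Append the new sentence and its label to the full training set.
        # Plain lists are concatenated here; Series + list would add element-wise.
        texts = master_df['Observation'].to_list() + [X]
        # str() accepts either a plain string or a 0-d NumPy array.
        labels = master_df['Risk'].to_list() + [str(y)]
        X = cv.fit_transform(texts)
        new_words = tokenizer.word_index
        sgd.fit(X, labels)
        with open('sgd.pickle', 'wb') as f:
            pickle.dump(sgd, f)
        print("Model trained on new data")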

Actual flow:

    import pickle

    import nltk
    import numpy as np
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import SGDClassifier

    stopwords = nltk.corpus.stopwords.words('english')
    cv = CountVectorizer(max_df=1.0, min_df=1, stop_words=stopwords, max_features=10000, ngram_range=(1, 3))
    master_df = pd.read_csv('classification.tsv')
    X = cv.fit_transform(master_df['Observation'])

    # Load the saved classifier if it can be read, otherwise start from a new one.
    try:
        with open("./sgd.pickle", 'rb') as f:
            sgd = pickle.load(f)
    except (OSError, pickle.UnpicklingError):
        sgd = SGDClassifier()
    sgd.fit(X, master_df['Risk'].to_list())

    sentence = input("\n\nEnter your observation:\n\n")
    output = output_sample(sentence)
    print("\n\nThe risk prediction is", preprocess_text(output), "\n\n")
    print("Is the above prediction correct?\n")
    corr = input("Press 'y' for yes and 'n' for no.\n")
    if corr == 'y':
        newy = np.array(output)
        retrain(sentence, newy)
    elif corr == 'n':
        print("What is the correct risk?\n1. Low\n2. Medium\n")
        r = input("Enter the corresponding number: ")
        if r == '1':
            newy = np.array('Low')
            retrain(sentence, newy)
        elif r == '2':
            newy = np.array('Medium')
            retrain(sentence, newy)
        else:
            print("Incorrect input. Please restart the application.")
    else:
        print("Incorrect input. Please restart the application.")
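If you want to try the idea without the original files, the following is a self-contained sketch of the same approach: append the new labelled sentence to the training data, then refit both the vectorizer and the classifier. It assumes pandas and scikit-learn are installed. The names toy_df and retrain_with_sample, the new sentence and the test sentence are hypothetical, and the three training rows are the untruncated ones from the sample data in the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Toy stand-in for master_df, using complete rows from the question's sample.
toy_df = pd.DataFrame({
    'Observation': [
        'All benches were not having sufficient berm.',
        'As light arrangement is not adequate.',
        'First aid Room is not established.',
    ],
    'Risk': ['Low', 'Low', 'Medium'],
})


def retrain_with_sample(df, sentence, label):
    """Append one labelled sentence to df, then refit vectorizer and classifier."""
    new_row = pd.DataFrame({'Observation': [sentence.lower()], 'Risk': [label]})
    df = pd.concat([df, new_row], ignore_index=True)
    # The vocabulary changes when new text is added, so both parts are refitted.
    vectorizer = CountVectorizer(ngram_range=(1, 3))
    features = vectorizer.fit_transform(df['Observation'])
    classifier = SGDClassifier(random_state=0)
    classifier.fit(features, df['Risk'])
    return df, vectorizer, classifier


# Hypothetical new observation with a user-confirmed label.
updated_df, vectorizer, classifier = retrain_with_sample(
    toy_df, 'No first aid box in the rest shelter', 'Medium')
prediction = classifier.predict(vectorizer.transform(['first aid room is missing']))[0]
print(len(updated_df), prediction)
```

Because refitting CountVectorizer changes the feature columns, a classifier trained on the old vocabulary cannot simply be updated in place, so this sketch rebuilds it from the full data each time. That is slower than incremental training but keeps the features and the model consistent.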
