I have a DataFrame of text that I need to classify, but first I need to oversample it. Here is some sample data:

import pandas as pd
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

df = [['I am going to class today','I am going to class today','I am going to class today','I am going to class today','I am going to class today','I am going to class today','I am going to class today','I am going to class today','I am going to class today','I am going to class today','I am not going to class today','I am not going to class today','I am not going to class today','I am not going to class today'],
      ['Positive','Positive','Positive','Positive','Positive','Positive','Positive','Positive','Positive','Positive','Negative','Negative','Negative','Negative']]
df = pd.DataFrame(df)
df = df.transpose()
df.columns = ['Features', 'Class']
df

                         Features     Class
0       I am going to class today  Positive
1       I am going to class today  Positive
2       I am going to class today  Positive
3       I am going to class today  Positive
4       I am going to class today  Positive
5       I am going to class today  Positive
6       I am going to class today  Positive
7       I am going to class today  Positive
8       I am going to class today  Positive
9       I am going to class today  Positive
10  I am not going to class today  Negative
11  I am not going to class today  Negative
12  I am not going to class today  Negative
13  I am not going to class today  Negative

oversample = RandomOverSampler(sampling_strategy='minority')
# fit and apply the transform
X_over, y_over = oversample.fit_resample(df['Features'], df['Class'])
# summarize class distribution
print(Counter(y_over))

However, this fails with the error "ValueError: Expected 2D array, got 1D array instead:". How can I oversample this data?
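Answer:
I found the problem: I needed to reshape my data. fit_resample expects X to be a 2D array of shape (n_samples, n_features), but df['Features'] is a 1D Series.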
X_over, y_over = oversample.fit_resample(df['Features'].values.reshape(-1,1), df['Class'])
This works now:
Counter({'Positive': 10, 'Negative': 10})
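
For readers wondering what random oversampling of the minority class does to the data, here is a minimal standard-library sketch of the idea. It is not imbalanced-learn's implementation; the function name `random_oversample` and its `seed` parameter are hypothetical and exist only for illustration. It duplicates randomly chosen minority rows until the minority class is as large as the majority class.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Hypothetical helper: grow the smallest class to the size of the largest
    by appending randomly chosen duplicates of its rows."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_size = max(counts.values())
    minority_label = min(counts, key=counts.get)

    # Collect the rows that belong to the minority class.
    minority_rows = [f for f, l in zip(features, labels) if l == minority_label]

    # Sample with replacement to make up the difference.
    n_extra = majority_size - counts[minority_label]
    extra = rng.choices(minority_rows, k=n_extra)

    return list(features) + extra, list(labels) + [minority_label] * n_extra


features = ['I am going to class today'] * 10 + ['I am not going to class today'] * 4
labels = ['Positive'] * 10 + ['Negative'] * 4

X_over, y_over = random_oversample(features, labels)
print(Counter(y_over))
```

With the sample data from the question, this ends up with 10 rows per class, matching the class counts shown above.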