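I'm working through a machine learning practice problem on heart disease, using a dataset from Kaggle. I split the data into training and test sets and wrapped the models in a single function that fits and scores them, and I got this error in Jupyter Notebook.
Here is my code: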
# Split data into X and y
X = df.drop("target", axis=1)
y = df["target"]
Split
import numpy as np
from sklearn.model_selection import train_test_split

# Split data into train and test sets
np.random.seed(42)

# Split into train & test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Prediction function
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Put models in a dictionary
models = {"Logistic Regression": LogisticRegression(),
          "KNN": KNeighborsClassifier(),
          "Random Forest": RandomForestClassifier()}

# Create a function to fit and score models
def fit_and_score(models, X_train, X_test, y_train, y_test):
    """
    Fits and evaluates given machine learning models.
    models : a dict of different Scikit-Learn machine learning models
    X_train : training data (no labels)
    X_test : testing data (no labels)
    y_train : training labels
    y_test : test labels
    """
    # Set random seed
    np.random.seed(42)
    # Make a dictionary to keep model scores
    model_scores = {}
    # Loop through models
    for name, model in models.items():
        # Fit the model to the data
        model.fit(X_train, y_train)
        # Evaluate the model and store its score in model_scores
        model_scores[name] = model.score(X_test, y_test)
    return model_scores
The error appears when I run this code:
model_scores = fit_and_score(models=models,
                             X_train=X_train,
                             X_test=X_test,
                             y_train=y_train,
                             y_test=y_test)
model_scores
Answer:
Your X_train, y_train, or both seem to contain entries that are not floats.
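One way to check this before changing anything is to look at the column dtypes. The sketch below uses a small made-up DataFrame as a stand-in for X_train (the column names and values are assumptions, not taken from the Kaggle dataset) and lists the columns that pandas does not treat as numeric.

```python
import pandas as pd

# Hypothetical stand-in for X_train; the "sex" column holds strings.
X_train = pd.DataFrame({
    "age": [63, 37, 41],
    "sex": ["male", "female", "male"],
    "chol": [233.0, 250.0, 204.0],
})

# Columns with a non-numeric dtype are the likely source of the error.
non_numeric = X_train.select_dtypes(exclude="number").columns.tolist()

print(X_train.dtypes)
print("Non-numeric columns:", non_numeric)
```

Running the same two lines on the real X_train (and checking y_train.dtype) should show which columns need attention.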
At some point in your code, before calling fit_and_score, try
X_train = X_train.astype(float)
y_train = y_train.astype(float)
X_test = X_test.astype(float)
y_test = y_test.astype(float)
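Either this fixes the problem and the error goes away, or one of the conversions fails, in which case you need to decide how (or whether) to encode that data as floats.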
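If a conversion does fail, one common option is one-hot encoding of the string columns. The following is a minimal sketch under the assumption that the offending column is categorical text; the DataFrame and the "sex" column are hypothetical, and pd.get_dummies is just one of several possible encodings.

```python
import pandas as pd

# Hypothetical feature frame with one string column.
X = pd.DataFrame({
    "age": [63, 37, 41],
    "sex": ["male", "female", "male"],
})

# A plain cast fails on the string column, which confirms the diagnosis.
try:
    X.astype(float)
    cast_ok = True
except (ValueError, TypeError):
    cast_ok = False

# One-hot encode the string column, then cast every column to float.
X_encoded = pd.get_dummies(X, columns=["sex"]).astype(float)

print("Plain cast succeeded:", cast_ok)
print(X_encoded.dtypes)
```

Encoding X before calling train_test_split keeps the training and test sets on the same set of columns.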