I am trying to build a deep learning model that predicts whether a person has chronic kidney disease (CKD). Could you tell me how I should design the neural network? How many neurons should each layer have? Or is there another way to do this in Keras? Dataset link: https://github.com/Samar-080301/Python_Project/blob/master/ckd_full.csv
import tensorflow as tf
from tensorflow import keras
import pandas as pd
import numpy as np  # needed for np.nan below
from sklearn.model_selection import train_test_split
import os
from matplotlib import pyplot as plt

os.chdir(r'C:\Users\samar\OneDrive\desktop\projects\Chronic_Kidney_Disease')
os.getcwd()

x = pd.read_csv('ckd_full.csv')
y = x[['class']]
y['class'] = y['class'].replace(to_replace=(r'ckd', r'notckd'), value=(1, 0))
x = x.drop(columns=['class'])
x['rbc'] = x['rbc'].replace(to_replace=(r'normal', r'abnormal'), value=(1, 0))
x['pcc'] = x['pcc'].replace(to_replace=(r'present', r'notpresent'), value=(1, 0))
x['ba'] = x['ba'].replace(to_replace=(r'present', r'notpresent'), value=(1, 0))
x['pc'] = x['pc'].replace(to_replace=(r'normal', r'abnormal'), value=(1, 0))
x['htn'] = x['htn'].replace(to_replace=(r'yes', r'no'), value=(1, 0))
x['dm'] = x['dm'].replace(to_replace=(r'yes', r'no'), value=(1, 0))
x['cad'] = x['cad'].replace(to_replace=(r'yes', r'no'), value=(1, 0))
x['pe'] = x['pe'].replace(to_replace=(r'yes', r'no'), value=(1, 0))
x['ane'] = x['ane'].replace(to_replace=(r'yes', r'no'), value=(1, 0))
x['appet'] = x['appet'].replace(to_replace=(r'good', r'poor'), value=(1, 0))
x[x == "?"] = np.nan

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.01)

# begin the model
model = keras.models.Sequential()
model.add(keras.layers.Dense(128, input_dim=24, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))  # adding a layer with 128 nodes and relu activation function
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))  # adding a layer with 128 nodes and relu activation function
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))  # adding a layer with 128 nodes and relu activation function
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))  # adding a layer with 128 nodes and relu activation function
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))  # adding a layer with 128 nodes and relu activation function
model.add(tf.keras.layers.Dense(2, activation=tf.nn.softmax))  # adding a layer with 2 nodes and softmax activation function
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])  # specifying hyperparameters
model.fit(xtrain, ytrain, epochs=5)  # train the model
model.save('Nephrologist')  # save the model with a unique name
myModel = tf.keras.models.load_model('Nephrologist')  # load the model back as an object
prediction = myModel.predict(xtest)

C:\Users\samar\anaconda3\lib\site-packages\ipykernel_launcher.py:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if sys.path[0] == '':
Epoch 1/5
396/396 [==============================] - 0s 969us/sample - loss: nan - acc: 0.3561
Epoch 2/5
396/396 [==============================] - 0s 343us/sample - loss: nan - acc: 0.3763
Epoch 3/5
396/396 [==============================] - 0s 323us/sample - loss: nan - acc: 0.3763
Epoch 4/5
396/396 [==============================] - 0s 283us/sample - loss: nan - acc: 0.3763
Epoch 5/5
396/396 [==============================] - 0s 303us/sample - loss: nan - acc: 0.3763
Answer:
This is the architecture with which I reached 100% test accuracy:
model = keras.models.Sequential()
model.add(keras.layers.Dense(200, input_dim=24, activation=tf.nn.tanh))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])  # specifying hyperparameters

xtrain_tensor = tf.convert_to_tensor(xtrain, dtype=tf.float32)
ytrain_tensor = tf.convert_to_tensor(ytrain, dtype=tf.float32)
model.fit(xtrain_tensor, ytrain_tensor, epochs=500, batch_size=128, validation_split=0.15, shuffle=True, verbose=2)

results = model.evaluate(xtest, ytest, batch_size=128)  # evaluate on the held-out test set
Output:
3/3 - 0s - loss: 0.2560 - accuracy: 0.9412 - val_loss: 0.2227 - val_accuracy: 0.9815
Epoch 500/500
3/3 - 0s - loss: 0.2225 - accuracy: 0.9673 - val_loss: 0.2224 - val_accuracy: 0.9815
1/1 [==============================] - 0s 0s/step - loss: 0.1871 - accuracy: 1.0000
The last line shows the model's evaluation on the test dataset. It appears to generalize well 🙂
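A likely cause of the loss: nan in the question's original run is that the "?" placeholders were converted to NaN but never imputed or dropped, and the affected columns kept string dtypes. Below is a minimal cleaning sketch on toy data; the column names sg and al are borrowed from the dataset, and mean imputation is only one of several reasonable choices (dropna() is another):

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the '?' placeholders found in ckd_full.csv
x = pd.DataFrame({'sg': ['1.020', '?', '1.010'],
                  'al': ['1', '0', '?']})

x = x.replace('?', np.nan)    # turn placeholder strings into real NaN
x = x.apply(pd.to_numeric)    # convert object columns to numeric dtypes for Keras
x = x.fillna(x.mean())        # simple per-column mean imputation

print(x.isna().sum().sum())   # no NaN values remain
```

After a cleaning step like this the features can be safely converted to float32 tensors, as in the answer above.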
————————————————— The original answer follows —————————————————
I suggest starting with a logistic regression model to check whether your dataset has any predictive value.
model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_dim=24, activation=tf.nn.sigmoid))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])  # specifying hyperparameters
model.fit(xtrain, ytrain, epochs=100)  # may need more or fewer epochs, depending on the amount of noise in your dataset
If the accuracy score you see satisfies you, I suggest trying to add one or two more dense hidden layers with 10 to 40 nodes each. It is important to mention that my advice is based purely on my experience.
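The suggestion above can be sketched as follows; the specific width of 20 is a hypothetical pick from the suggested 10-40 range, not a tuned value:

```python
import tensorflow as tf
from tensorflow import keras

# Sketch: logistic-regression baseline extended with two small hidden layers
model = keras.models.Sequential([
    keras.Input(shape=(24,)),                    # 24 input features, as in the question
    keras.layers.Dense(20, activation='relu'),   # first hidden layer (10-40 nodes suggested)
    keras.layers.Dense(20, activation='relu'),   # optional second hidden layer
    keras.layers.Dense(1, activation='sigmoid'), # single-node binary output
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

With a single sigmoid output node, binary_crossentropy is the matching loss; softmax with two nodes plus a categorical loss is the equivalent alternative.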
I strongly (!!!!) recommend converting the y_label into binary values, where 1 represents the positive class (the record belongs to a CKD patient) and 0 the negative class. Let me know if it works, and if it doesn't, I will also try to work with your dataset.
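For the label conversion, a Series.map with an explicit dictionary is a clean alternative to the tuple-based replace in the question; the toy labels below stand in for the 'class' column of ckd_full.csv:

```python
import pandas as pd

# Toy stand-in for the 'class' column of ckd_full.csv
y_raw = pd.Series(['ckd', 'notckd', 'ckd', 'ckd'])
y = y_raw.map({'ckd': 1, 'notckd': 0})  # 1 = positive class (CKD patient)

print(y.tolist())  # [1, 0, 1, 1]
```

An explicit mapping also surfaces unexpected label values immediately: anything not in the dictionary becomes NaN rather than silently passing through.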