I am working with the following dataset:
http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
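The data can be found by clicking the Data Folder link. The dataset comes in two parts, a training set and a test set. The file I am using contains both sets combined.
I am trying to apply Linear Discriminant Analysis (LDA) to obtain two components, but when my code runs it produces only one. Even when I set "n_components" to 3, I still get only one component.
I just tested PCA, and it works for whatever value of "n" I supply, as long as "n" is less than or equal to the number of features in the X array at transform time.
I am not sure why LDA behaves so strangely. Here is my code: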
# Load libraries
import pandas
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# The file is semicolon-separated
dataset = pandas.read_csv('bank-full.csv', delimiter=';')

# Output Basic Dataset Info
print(dataset.shape)
print(dataset.head(20))
print(dataset.describe())

# Split-out validation dataset
X = dataset.iloc[:, [0, 5, 9, 11, 12, 13, 14]]  # we are selecting only the "clean data" w/o preprocessing
Y = dataset.iloc[:, 16]
validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(
    X, Y, test_size=validation_size, random_state=seed)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_temp = X_train
X_validation = sc_X.transform(X_validation)

'''
# Applying PCA
from sklearn.decomposition import PCA
pca = PCA(n_components = 5)
X_train = pca.fit_transform(X_train)
X_validation = pca.transform(X_validation)
explained_variance = pca.explained_variance_ratio_
'''

# Applying LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components = 2)
X_train = lda.fit_transform(X_train, Y_train)
X_validation = lda.transform(X_validation)
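Answer:
LDA (at least the sklearn implementation) can produce at most k-1 components, where k is the number of classes. More precisely, the limit is min(n_features, k-1): once the k class means are centered, they span a subspace of at most k-1 dimensions, so there are no more discriminant directions to find. So if you are dealing with a binary classification problem, you end up with a single dimension. PCA has no such limit because it ignores the class labels and is bounded only by the number of features.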
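To see the k-1 limit for yourself, here is a minimal sketch. It does not use the bank data: it assumes synthetic Gaussian blobs from `make_blobs` with 7 features, and `lda_output_dims` is a hypothetical helper name chosen for illustration. Leaving `n_components` at its default lets LDA keep as many components as it can.

```python
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def lda_output_dims(n_classes, n_features=7):
    """Fit LDA on synthetic blobs and return how many components it produces."""
    # Synthetic stand-in data (an assumption, not the bank dataset)
    X, y = make_blobs(n_samples=300, centers=n_classes,
                      n_features=n_features, random_state=0)
    # n_components=None (the default) keeps the maximum number allowed
    lda = LinearDiscriminantAnalysis()
    return lda.fit_transform(X, y).shape[1]


dims_binary = lda_output_dims(2)  # two classes, like the yes/no target here
dims_three = lda_output_dims(3)   # three classes
print(dims_binary, dims_three)
```

With two classes the transformed data has one column, and with three classes it has two, even though the input has 7 features in both cases.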
Related: Python (scikit learn) lda collapsing to single dimension