How can I perform inverse locally linear embedding (LLE) with sklearn or another Python package?
I want to train a classification machine-learning algorithm (SVM, neural network, etc.) on some tabular data X, where y is the target class variable.
The usual procedure is as follows:
Split X and y into X_train, y_train, X_test, y_test. Since I have a large number of parameters (columns), I can reduce their number by applying LLE to X_train, obtaining X_train_lle. The target y is left untransformed. I can then simply train a model on X_train_lle. The problem arises when I want to apply the trained model to the test set. If LLE is performed on X_test together with X_train, data leakage is introduced. Alternatively, if LLE is performed on X_test alone, the resulting X_test_lle may turn out completely different, because the algorithm uses k nearest neighbours. I believe the correct procedure would be to perform an inverse LLE on X_test using the parameters obtained on X_train, and then run the classification model on X_test_lle.
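As a side note on the forward direction: a fitted scikit-learn LocallyLinearEmbedding object exposes a transform method that maps unseen points into the training embedding via barycenter weights, without refitting, which avoids the leakage described above. A minimal sketch on random data (shapes and parameters are illustrative):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(1)
X = rng.random_sample((300, 20))
X_train, X_test = train_test_split(X, test_size=0.5, random_state=1)

# fit LLE on the training split only
lle = LocallyLinearEmbedding(n_neighbors=15, n_components=5, eigen_solver='dense')
X_train_lle = lle.fit_transform(X_train)

# map the held-out points into the same embedding (no refitting, no leakage)
X_test_lle = lle.transform(X_test)

print(X_train_lle.shape, X_test_lle.shape)  # (150, 5) (150, 5)
```

This out-of-sample extension is not the inverse mapping asked about below, but it covers the "where do the test points fit in the training manifold" part of the problem.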
I checked some references; Section 2.4.1 of this paper deals with inverse LLE: https://arxiv.org/pdf/2011.10925.pdf
How can inverse LLE be performed in Python (preferably with sklearn)?
Here is a code example:
import numpy as np
from sklearn import preprocessing
from sklearn import svm, datasets, model_selection
from sklearn.manifold import LocallyLinearEmbedding

### Generating dummy data ###
n_row = 10000  # these numbers are much bigger for the real problem
n_col = 50
X = np.random.random((n_row, n_col))
y = np.random.randint(5, size=n_row)  # five different classes labeled from 0 to 4

### Preprocessing ###
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.5, random_state=1)
# standardization: fit StandardScaler on X_train, then scale X_train and X_test
scaler = preprocessing.StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

### Here is the part with LLE ###
# we reduce the parameter space to 10 with 15 nearest neighbours
embedding = LocallyLinearEmbedding(n_neighbors=15, n_components=10, method='modified', eigen_solver='dense')
X_train_lle = embedding.fit_transform(X_train)

### Here is the training part ###
# we want to apply SVM to the transformed data X_train_lle
clf = svm.SVC(kernel='linear')  # linear kernel
# train the model using the training sets
clf.fit(X_train_lle, y_train)

# Here should go the code to do inverse LLE on X_test,
# i.e. where do the values of X_test fit in the manifold X_train_lle
### after the previous part of the code is successfully solved by the stackoverflow community :)
# predict the response for the test dataset
y_pred = clf.predict(X_test_lle)
Answer:
The inverse transform can be done with the method from the paper you mention (Ghojogh et al. (2020), Section 2.4.1), as well as from other papers (e.g. Franz et al. (2014), Section 4.1). The basic idea is to find the k nearest neighbours in the embedded space and express each point as a linear combination of its neighbours there. The obtained weights are then kept, and each point is expressed as the same weighted combination of its k nearest neighbours in the original space. Obviously, the same number of neighbours should be used as in the original forward LLE.
Using the barycenter_kneighbors_graph function, the code looks like this:
from sklearn.manifold._locally_linear import barycenter_kneighbors_graph

# calculate the weights for expressing each point in the embedded space
# as a linear combination of its neighbors
W = barycenter_kneighbors_graph(Y, n_neighbors=k, reg=1e-3)

# reconstruct the data points in the high-dimensional space from their
# neighbors, using the weights calculated in the embedded space
X_reconstructed = W @ X
where Y is the result of the original LLE embedding (X_train_lle in your code snippet), X is the original data matrix, and k is the number of nearest neighbours.
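Putting it together, here is a self-contained sketch of the reconstruction step on dummy data (the data, shapes, and parameters are illustrative, not taken from the question):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.manifold._locally_linear import barycenter_kneighbors_graph

rng = np.random.RandomState(0)
X = rng.random_sample((200, 10))  # 200 points in a 10-dimensional space
k = 15

# forward LLE: embed into 3 dimensions
emb = LocallyLinearEmbedding(n_neighbors=k, n_components=3, eigen_solver='dense')
Y = emb.fit_transform(X)

# barycenter weights of each embedded point w.r.t. its k neighbours in Y;
# W is a sparse (200, 200) matrix whose rows sum to 1
W = barycenter_kneighbors_graph(Y, n_neighbors=k, reg=1e-3)

# apply those weights to the original-space points to reconstruct them
X_reconstructed = W @ X
print(X_reconstructed.shape)  # (200, 10)
```

Note that barycenter_kneighbors_graph lives in the private module sklearn.manifold._locally_linear, so the import may break across sklearn versions.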