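I successfully converted a single SMILES string to a one-hot encoding using the RDKit library, but I get an error when I try to convert an entire .csv file of SMILES.
Successful attempt: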
    new = 'O=C(O)C1=C(N2N=CC=N2)C=CC(N)=N1'

output:

    array([[0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           ...,
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.]])
But when I try to convert multiple SMILES, I get the following error:
TypeError: No registered converter was able to produce a C++ rvalue of type class std::basic_string<wchar_t,struct std::char_traits<wchar_t>,class std::allocator<wchar_t> > from this Python object of type DataFrame
I have shared my code file; you can look at it as a demo.
If anyone can help, please let me know.
Answer:
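Chem.MolToSmiles(Chem.MolFromSmiles(smiles))
converts only one SMILES string at a time, but you passed it the entire DataFrame, which is why RDKit cannot convert the argument to a string. You need to loop over the SMILES in the DataFrame.
This should work: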
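    import pandas as pd
    from rdkit import Chem

    # smiles_encoder and smiles_decoder are the functions from your own code
    df = pd.read_csv('RouteSynthesisPrediction_o2h.csv')

    for smi in df['Target']:
        smiles = Chem.CanonSmiles(smi)  # canonicalize one SMILES string at a time
        mat = smiles_encoder(smiles)
        dec = smiles_decoder(mat)
        print(mat)
        print(smi)
        print(smiles)
        print(dec)
        print()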
Output:
    [[0. 0. 0. ... 0. 0. 0.]
     [0. 0. 0. ... 0. 0. 0.]
     [0. 0. 0. ... 0. 0. 0.]
     ...
     [0. 0. 0. ... 0. 0. 0.]
     [0. 0. 0. ... 0. 0. 0.]
     [0. 0. 0. ... 0. 0. 0.]]
    O=C(O)C1=C(N2N=CC=N2)C=CC(N)=N1
    Nc1ccc(-n2nccn2)c(C(=O)O)n1
    Nc1ccc(-n2nccn2)c(C(=O)O)n1

    [[0. 0. 0. ... 0. 0. 0.]
     [0. 0. 0. ... 0. 0. 0.]
     [0. 0. 0. ... 0. 0. 0.]
     ...
     [0. 0. 0. ... 0. 0. 0.]
     [0. 0. 0. ... 0. 0. 0.]
     [0. 0. 0. ... 0. 0. 0.]]
    O=C(OC)C1=C(N2N=CC=N2)C=CC(N)=N1
    COC(=O)c1nc(N)ccc1-n1nccn1
    COC(=O)c1nc(N)ccc1-n1nccn1
    ...
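The loop above depends on smiles_encoder and smiles_decoder from the asker's file, which is not shown here. For readers without that file, the following is a minimal sketch of the same row-by-row pattern using only the Python standard library. Everything in it is an assumption made for illustration: the names one_hot_encode, one_hot_decode and encode_column, the character set SMILES_CHARS, the fixed width MAX_LEN, and the inline CSV text are hypothetical stand-ins, not the asker's implementation and not part of RDKit. It also reads the CSV with the csv module rather than pandas, and it encodes character by character, so two-letter atoms such as Cl would be split into two rows.

```python
import csv
import io

# Hypothetical character set; a real encoder must list every symbol in the data.
SMILES_CHARS = ['#', '(', ')', '-', '=', '1', '2', '3', '4',
                'C', 'N', 'O', 'c', 'n', 'o']
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(SMILES_CHARS)}
MAX_LEN = 40  # hypothetical fixed number of rows; shorter SMILES are zero-padded


def one_hot_encode(smiles, max_len=MAX_LEN):
    """Encode ONE SMILES string as a max_len x len(SMILES_CHARS) list of lists."""
    if not isinstance(smiles, str):
        # Mirrors the situation in the question: a whole table is not a string.
        raise TypeError(
            f"expected a single SMILES string, got {type(smiles).__name__}")
    if len(smiles) > max_len:
        raise ValueError(f"SMILES longer than {max_len} characters")
    matrix = [[0.0] * len(SMILES_CHARS) for _ in range(max_len)]
    for row, ch in enumerate(smiles):
        # Raises KeyError for a character missing from SMILES_CHARS.
        matrix[row][CHAR_TO_INDEX[ch]] = 1.0
    return matrix


def one_hot_decode(matrix):
    """Rebuild the string from the matrix, skipping all-zero padding rows."""
    chars = []
    for row in matrix:
        if 1.0 in row:
            chars.append(SMILES_CHARS[row.index(1.0)])
    return ''.join(chars)


def encode_column(csv_file, column='Target'):
    """Encode each value of one CSV column separately, one row at a time."""
    reader = csv.DictReader(csv_file)
    return [(row[column], one_hot_encode(row[column])) for row in reader]


# Inline stand-in for the CSV file, reusing the two SMILES shown in the output.
CSV_TEXT = (
    "Target\n"
    "O=C(O)C1=C(N2N=CC=N2)C=CC(N)=N1\n"
    "O=C(OC)C1=C(N2N=CC=N2)C=CC(N)=N1\n"
)

encoded = encode_column(io.StringIO(CSV_TEXT))
for smi, mat in encoded:
    print(smi, '->', one_hot_decode(mat))
```

The explicit isinstance check is a design choice: it makes the "one string at a time" requirement fail early with a readable message instead of deep inside the encoder. With pandas, the same idea would be to loop over df['Target'] as in the answer, rather than passing df itself.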