My DataFrame looks like this:
SNP A B S1 S2 S3 S4 S5 S6 S7
0 rs123 T C 001 100 100 100 001 100 100
2 rs126 G A 010 100 010 100 010 100 010
I want the output to look like this:
SNP A B S1 S2 S3 S4 S5 S6 S7
0 rs123 T C CC TT TT TT CC TT TT
2 rs126 G A GA GG GA GG GA GG GA
The conditions are:
if '001' --> df['B'] + df['B']
if '010' --> df['A'] + df['B']
if '100' --> df['A'] + df['A']
My code:
for col in df.iloc[:,3:].columns:
    df[col] = df[col].apply(lambda x: myfunc(x))

def myfunc(x):
    if x == '001':
        return df['B'] + df['B']
    elif x == '010':
        return df['A'] + df['B']
    elif x == '100':
        return df['A'] + df['A']
But I'm not getting the expected output 🙁 Can anyone help me?
Answer:
Add the three concatenated allele pairs to the DataFrame as new columns:
df["B+B"] = df["B"] + df["B"]
df["A+B"] = df["A"] + df["B"]
df["A+A"] = df["A"] + df["A"]
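Loop over the sample columns and apply the logic with a boolean mask:
for i in range(1, 8):
    col = f"S{i}"
    # .loc with a mask avoids chained assignment, which may silently write to a copy
    df.loc[df[col] == "001", col] = df["B+B"]
    df.loc[df[col] == "010", col] = df["A+B"]
    df.loc[df[col] == "100", col] = df["A+A"]
# drop returns a new DataFrame, so assign the result back
df = df.drop(["B+B", "A+B", "A+A"], axis=1)
df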
SNP A B S1 S2 S3 S4 S5 S6 S7
0 rs123 T C CC TT TT TT CC TT TT
1 rs126 G A GA GG GA GG GA GG GA
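One likely reason the code in the question misbehaves is that myfunc returns df['B'] + df['B'], which is a whole column rather than the value for the current row. As an alternative that needs no helper columns, here is a minimal sketch that decodes each row with its own alleles. It assumes the codes are stored as strings (for example '001', not the integer 1), and the names decode_row and sample_cols are made up for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question; the codes are kept as strings
df = pd.DataFrame(
    {
        "SNP": ["rs123", "rs126"],
        "A": ["T", "G"],
        "B": ["C", "A"],
        "S1": ["001", "010"],
        "S2": ["100", "100"],
        "S3": ["100", "010"],
        "S4": ["100", "100"],
        "S5": ["001", "010"],
        "S6": ["100", "100"],
        "S7": ["100", "010"],
    }
)

# Everything after SNP, A and B is treated as a sample column (assumption)
sample_cols = df.columns[3:]


def decode_row(row):
    # Build the code -> genotype lookup from this row's own alleles
    lookup = {
        "001": row["B"] + row["B"],
        "010": row["A"] + row["B"],
        "100": row["A"] + row["A"],
    }
    # Codes that are not in the lookup become NaN
    return row[sample_cols].map(lookup)


result = df.copy()
result[sample_cols] = df.apply(decode_row, axis=1)
print(result)
```

Row-wise apply is usually slower than the masked assignment above on large frames, so this is mostly worth it for readability when the table is small.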