How do I define a custom kernel function for sklearn.svm.SVC?

I am trying to build a stock prediction system in Python using scikit-learn. Here is my code:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import svm,preprocessing
from sklearn.metrics import precision_recall_fscore_support
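from sklearn.metrics.pairwise import check_pairwise_arrays, euclidean_distances
from sklearn.utils.extmath import safe_sparse_dot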
import time
##import statistics
def my_kernel(X, Y):
    """
    We create a custom kernel:
                 (2  0)
    k(X, Y) = X  (    ) Y.T
                 (0  1)
    """
    M = np.array([[2, 0], [0, 1.0]])
    return np.dot(np.dot(X, M), Y.T)
FEATURES =  ['DE Ratio',
             'Trailing P/E',
             'Price/Sales',
             'Price/Book',
             'Profit Margin',
             'Operating Margin',
             'Return on Assets',
             'Return on Equity',
             'Revenue Per Share',
             'Market Cap',
             'Enterprise Value',
             'Forward P/E',
             'PEG Ratio',
             'Enterprise Value/Revenue',
             'Enterprise Value/EBITDA',
             'Revenue',
             'Gross Profit',
             'EBITDA',
             'Net Income Avl to Common ',
             'Diluted EPS',
             'Earnings Growth',
             'Revenue Growth',
             'Total Cash',
             'Total Cash Per Share',
             'Total Debt',
             'Current Ratio',
             'Book Value Per Share',
             'Cash Flow',
             'Beta',
             'Held by Insiders',
             'Held by Institutions',
             'Shares Short (as of',
             'Short Ratio',
             'Short % of Float',
             'Shares Short (prior ']
def Build_Data_Set():
    data_df = pd.DataFrame.from_csv("key_stats.csv")
    data_df = data_df.reindex(np.random.permutation(data_df.index))
    ##print data_df
    X = np.array(data_df[FEATURES].values)
    y = (data_df["Status"]
         .replace("underperform",0)
         .replace("outperform",1)
         .values.tolist())
    X = preprocessing.scale(X)
    X = StandardScaler().fit_transform(X)
    Z0 = np.array(data_df["stock_p_hancge"])
    Z1 = np.array(data_df["sp500_p_change"])
    return X,y,Z0,Z1
def mykernel(X, Y,gamma=None):
    X, Y = check_pairwise_arrays(X, Y)
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    K = euclidean_distances(X, Y, squared=True)
    K *= -gamma
    np.exp(K, K)  # exponentiate K in place
    return safe_sparse_dot(X, Y.T, dense_output=True) + K
size = 2094
invest_amount = 10000
total_invests = 0
if_market = 0
if_strat = 0
X, y , Z0,Z1= Build_Data_Set()
print(len(X))
test_size = len(X) - size -1
start = time.clock()
clf = svm.SVC(kernel="mykernel")
clf.fit(X[:size],y[:size])
y_pred = clf.predict(X[size+1:])
y_true = y[size+1:]
time_taken = time.clock()-start
print time_taken,"Seconds"
for x in range(1, test_size+1):
    if y_pred[-x] == 1:
        invest_return = invest_amount + (invest_amount * (Z0[-x]/100))
        market_return = invest_amount + (invest_amount * (Z1[-x]/100))
        total_invests += 1
        if_market += market_return
        if_strat += invest_return
print accuracy_score(y_true, y_pred)
print precision_recall_fscore_support(y_true, y_pred, average='macro')
print "Total Trades:", total_invests
print "Ending with Strategy:",if_strat
print "Ending with Market:",if_market
compared = ((if_strat - if_market) / if_market) * 100.0
do_nothing = total_invests * invest_amount
avg_market = ((if_market - do_nothing) / do_nothing) * 100.0
avg_strat = ((if_strat - do_nothing) / do_nothing) * 100.0
print "Compared to market, we earn",str(compared)+"% more"
print "Average investment return:", str(avg_strat)+"%"
print "Average market return:", str(avg_market)+"%"

The predefined kernels work fine, but with my custom kernel I get the following error:

ValueError: 'mykernel' is not in list

According to the official documentation, the code above looks like it should work.


Answer:

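You need to pass the kernel function itself to the kernel= parameter, not its name as a string. A string is looked up among the built-in kernel names ('linear', 'poly', 'rbf', 'sigmoid', 'precomputed'), which is why 'mykernel' is reported as not in the list. That is, write: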

clf = svm.SVC(kernel=mykernel)

instead of

clf = svm.SVC(kernel="mykernel")
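Once the callable is passed, SVC actually invokes it, so it has to accept two arrays of shape (n_samples_1, n_features) and (n_samples_2, n_features) and return a Gram matrix of shape (n_samples_1, n_samples_2). The following is only a minimal sketch of a linear-plus-RBF kernel like the one in the question. The name linear_plus_rbf and the synthetic data are assumptions made for illustration; they stand in for mykernel and key_stats.csv and are not taken from the original code.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel


def linear_plus_rbf(X, Y, gamma=None):
    """Hypothetical custom kernel: linear kernel plus RBF kernel.

    Returns a Gram matrix of shape (n_samples_X, n_samples_Y).
    """
    # rbf_kernel falls back to gamma = 1 / n_features when gamma is None
    return linear_kernel(X, Y) + rbf_kernel(X, Y, gamma=gamma)


# Synthetic stand-in data (assumption: not the question's dataset)
rng = np.random.RandomState(0)
X_demo = rng.randn(60, 4)
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# Pass the function object, not the string "linear_plus_rbf"
clf_demo = svm.SVC(kernel=linear_plus_rbf)
clf_demo.fit(X_demo[:40], y_demo[:40])
pred_demo = clf_demo.predict(X_demo[40:])
print(pred_demo.shape)
```

With a callable kernel, fit calls the function on the training rows against themselves, and predict calls it on the test rows against the training rows, so the function must handle both X and Y having different numbers of samples.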
