Transform primitives work fine with additional arguments. Here is an example:
# Imports assumed for the featuretools 0.x API used in this question
import featuretools as ft
from featuretools.primitives import make_trans_primitive
from featuretools.variable_types import Categorical, Numeric


def string_count(column, string=None):
    '''
    ..note:: this is a naive implementation used for clarity
    '''
    assert string is not None, "string to count needs to be defined"
    # One count per element: the transform returns a list as long as the column
    counts = [str(element).lower().count(string) for element in column]
    return counts


def string_count_generate_name(self):
    return u"STRING_COUNT(%s, %s)" % (self.base_features[0].get_name(),
                                      '"' + str(self.kwargs['string']) + '"')


StringCount = make_trans_primitive(
    function=string_count,
    input_types=[Categorical],
    return_type=Numeric,
    cls_attributes={
        "generate_name": string_count_generate_name
    })

es = ft.demo.load_mock_customer(return_entityset=True)
count_the_feat = StringCount(es['transactions']['product_id'], string="5")
fm, fd = ft.dfs(
    entityset=es,
    target_entity='transactions',
    max_depth=1,
    features_only=False,
    seed_features=[count_the_feat])
Output:
                product_id  STRING_COUNT(product_id, "5")
transaction_id
1                        5                              1
2                        4                              0
3                        3                              0
4                        3                              0
5                        4                              0
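As a quick sanity check outside featuretools, the same function can be called directly on a plain list. This is only a sketch: the input values are copied from the five product_id rows shown above, and `functools.partial` merely stands in for the way the extra `string` argument gets bound. It is not meant to describe how featuretools binds it internally.

```python
from functools import partial


def string_count(column, string=None):
    """Count occurrences of `string` in each element of `column` (naive)."""
    assert string is not None, "string to count needs to be defined"
    return [str(element).lower().count(string) for element in column]


# Bind the additional argument once, then apply it to a column of values.
count_fives = partial(string_count, string="5")

# product_id values of the first five transactions in the output above
product_ids = [5, 4, 3, 3, 4]
per_row_counts = count_fives(product_ids)
print(per_row_counts)
```

The result has one entry per input row, which is what a transform primitive is expected to return.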
However, if I modify it to be an aggregation primitive, like so:
# Imports assumed for the featuretools 0.x API used in this question
import featuretools as ft
from featuretools.primitives import make_agg_primitive
from featuretools.variable_types import Categorical, Numeric


def string_count(column, string=None):
    '''
    ..note:: this is a naive implementation used for clarity
    '''
    assert string is not None, "string to count needs to be defined"
    counts = [str(element).lower().count(string) for element in column]
    # Aggregation: collapse the per-element counts into a single value
    return sum(counts)


def string_count_generate_name(self):
    return u"STRING_COUNT(%s, %s)" % (self.base_features[0].get_name(),
                                      '"' + str(self.kwargs['string']) + '"')


StringCount = make_agg_primitive(
    function=string_count,
    input_types=[Categorical],
    return_type=Numeric,
    cls_attributes={
        "generate_name": string_count_generate_name
    })

es = ft.demo.load_mock_customer(return_entityset=True)
count_the_feat = StringCount(es['transactions']['product_id'], string="5")
I get the following error:
TypeError: new_class_init() missing 1 required positional argument: 'parent_entity'
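Does featuretools support custom aggregation primitives with additional arguments?
Answer:
The issue here is that your seed feature is missing an argument. For aggregation primitives, you need to specify the entity to aggregate over. In this case, change the construction of your aggregation seed feature to: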
count_the_feat = StringCount(es['transactions']['product_id'], es['sessions'], string="5")
This creates the feature as expected:
sessions.STRING_COUNT(product_id, "5")
This feature gives how often the string "5" appears for each session ID.
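To make the aggregation concrete, here is a plain-Python sketch of what such a feature computes: one value per session rather than one per transaction. The session assignments below are made up for illustration and are not the mock customer data, and `string_count_per_session` is a hypothetical helper, not a featuretools function.

```python
from collections import defaultdict


def string_count(column, string=None):
    """Total occurrences of `string` across all elements of `column`."""
    assert string is not None, "string to count needs to be defined"
    return sum(str(element).lower().count(string) for element in column)


def string_count_per_session(rows, string):
    """Group product ids by session id, then aggregate each group.

    Hypothetical helper: `rows` is an iterable of (session_id, product_id).
    """
    grouped = defaultdict(list)
    for session_id, product_id in rows:
        grouped[session_id].append(product_id)
    return {sid: string_count(pids, string=string)
            for sid, pids in grouped.items()}


# Made-up (session_id, product_id) pairs, not the real demo data
rows = [(1, 5), (1, 4), (1, 5), (2, 3), (2, 4), (3, 5)]
per_session = string_count_per_session(rows, "5")
print(per_session)
```

The grouping step plays the role of the second positional argument (`es['sessions']`) in the corrected call: the aggregation needs to know which parent entity defines the groups.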