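I ran into a problem when creating relationships between the entities of an entity set (using my own data). No error is raised, but no features are generated for one of my entities (the "prods" entity), even though everything should be connected correctly.
I can't share my data, but I put together a minimal example with mock data: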
import pandas as pd
import featuretools as ft
Create mock data
cust = pd.DataFrame([[1,50],[2,60]], columns=['CUST_ID','AGE'])
orders = pd.DataFrame([[1,1,50,33.0],[2,1,60,20],[3,2,66,999.9]], columns=['ORD_ID','CUST_ID','QTY','PRICE'])
order_items = pd.DataFrame([[1,1,1,2,3.0],[2,2,2,8,5.0],[3,2,1,2,3.0],[4,3,3,2,3.0]], columns=['ORD_ITM_ID','ORD_ID','PROD_ID','QTY','PRICE'])
prods = pd.DataFrame([[1,3.0],[2,5.0],[3,3.0]], columns=['PROD_ID','PRICE'])
Define the entity set
es = ft.EntitySet('test')
## Adding Customers Entity
es.entity_from_dataframe(dataframe=cust, entity_id='cust', index='CUST_ID')
## Adding Orders Entity
es.entity_from_dataframe(dataframe=orders, entity_id='orders', index='ORD_ID')
## Adding Order Items Entity
es.entity_from_dataframe(dataframe=order_items, entity_id='order_items', index='ORD_ITM_ID')
## Adding Products Entity
es.entity_from_dataframe(dataframe=prods, entity_id='prods', index='PROD_ID')
Create the relationships
customer_relationship = ft.Relationship(es["cust"]["CUST_ID"], es["orders"]["CUST_ID"])
orderitems_relationship = ft.Relationship(es["orders"]["ORD_ID"], es["order_items"]["ORD_ID"])
products_relationship = ft.Relationship(es["prods"]["PROD_ID"], es["order_items"]["PROD_ID"])
### Add Relationships
es = es.add_relationship(customer_relationship)
es = es.add_relationship(orderitems_relationship)
es = es.add_relationship(products_relationship)
Generate the features
feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      verbose=True,
                      features_only=True)
## Show features
feature_defs
Output:
Built 7 features
[<Feature: AGE>, <Feature: COUNT(order_items)>, <Feature: SUM(orders.QTY)>, <Feature: SUM(orders.PRICE)>, <Feature: SUM(order_items.QTY)>, <Feature: COUNT(orders)>, <Feature: SUM(order_items.PRICE)>]
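This should also include features built from the product variables, but it doesn't.
What I expected was a SUM that aggregates the product prices per customer. Instead, there is nothing.
Ultimately, I want to create features for interesting values. But because the product variables don't show up, adding interesting values doesn't work either.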
## Get All Product IDs
interesting_products = es["prods"].df.PROD_ID.unique()
es["prods"]["PROD_ID"].interesting_values = interesting_products
feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      where_primitives=["count", "sum"],
                      verbose=True,
                      features_only=True)
## Show features
feature_defs
Output:
Built 7 features
[<Feature: AGE>, <Feature: COUNT(order_items)>, <Feature: SUM(orders.QTY)>, <Feature: SUM(orders.PRICE)>, <Feature: SUM(order_items.QTY)>, <Feature: COUNT(orders)>, <Feature: SUM(order_items.PRICE)>]
I hope someone can help me 🙂
Answer:
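The product features don't show up because any feature created from the prods entity would have depth 3, while ft.dfs uses max_depth=2 by default. You can control the depth with the max_depth parameter of ft.dfs, like this: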
feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      verbose=True,
                      max_depth=3,  # add max_depth
                      features_only=True)
The features returned are now:
[<Feature: AGE>, <Feature: SUM(order_items.QTY)>, <Feature: SUM(order_items.PRICE)>, <Feature: SUM(orders.PRICE)>, <Feature: SUM(orders.QTY)>, <Feature: COUNT(order_items)>, <Feature: COUNT(orders)>, <Feature: SUM(order_items.prods.PRICE)>]
As you can see, the last feature, SUM(order_items.prods.PRICE), uses the product price.
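To check what SUM(order_items.prods.PRICE) is expected to hold for each customer, the same aggregation can be reproduced by hand. This is only a pandas sketch on the mock data from the question, not how featuretools computes the feature internally; the names joined, PROD_PRICE and prod_price_per_cust are made up for this illustration.

```python
import pandas as pd

# Mock data copied from the question.
orders = pd.DataFrame([[1, 1, 50, 33.0], [2, 1, 60, 20], [3, 2, 66, 999.9]],
                      columns=['ORD_ID', 'CUST_ID', 'QTY', 'PRICE'])
order_items = pd.DataFrame(
    [[1, 1, 1, 2, 3.0], [2, 2, 2, 8, 5.0], [3, 2, 1, 2, 3.0], [4, 3, 3, 2, 3.0]],
    columns=['ORD_ITM_ID', 'ORD_ID', 'PROD_ID', 'QTY', 'PRICE'])
prods = pd.DataFrame([[1, 3.0], [2, 5.0], [3, 3.0]], columns=['PROD_ID', 'PRICE'])

# Walk the same path as the feature: order_items -> prods for the price,
# then order_items -> orders for the customer.
# PROD_PRICE is a hypothetical column name used to avoid clashing with order_items.PRICE.
joined = (
    order_items[['ORD_ITM_ID', 'ORD_ID', 'PROD_ID']]
    .merge(prods.rename(columns={'PRICE': 'PROD_PRICE'}), on='PROD_ID')
    .merge(orders[['ORD_ID', 'CUST_ID']], on='ORD_ID')
)

# Sum the product price over all order items of each customer.
prod_price_per_cust = joined.groupby('CUST_ID')['PROD_PRICE'].sum()
print(prod_price_per_cust)
```

On this data, customer 1 has three order items (product prices 3.0, 5.0 and 3.0) and customer 2 has one (3.0), so the sums are 11.0 and 3.0.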
To make the where clauses work, add the interesting values to the PROD_ID variable of the order_items entity instead.
interesting_products = es["prods"].df.PROD_ID.unique()
es["order_items"]["PROD_ID"].interesting_values = interesting_products
feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      where_primitives=["count", "sum"],
                      verbose=True,
                      max_depth=3,
                      features_only=True)
This creates 20 features, shown below:
[<Feature: AGE>, <Feature: SUM(order_items.QTY)>, <Feature: SUM(order_items.PRICE)>, <Feature: SUM(orders.PRICE)>, <Feature: SUM(orders.QTY)>, <Feature: COUNT(order_items)>, <Feature: COUNT(orders)>, <Feature: SUM(order_items.prods.PRICE WHERE PROD_ID = 2)>, <Feature: SUM(order_items.QTY WHERE PROD_ID = 2)>, <Feature: SUM(order_items.QTY WHERE PROD_ID = 3)>, <Feature: SUM(order_items.prods.PRICE)>, <Feature: COUNT(order_items WHERE PROD_ID = 2)>, <Feature: SUM(order_items.prods.PRICE WHERE PROD_ID = 1)>, <Feature: SUM(order_items.PRICE WHERE PROD_ID = 3)>, <Feature: COUNT(order_items WHERE PROD_ID = 1)>, <Feature: COUNT(order_items WHERE PROD_ID = 3)>, <Feature: SUM(order_items.prods.PRICE WHERE PROD_ID = 3)>, <Feature: SUM(order_items.QTY WHERE PROD_ID = 1)>, <Feature: SUM(order_items.PRICE WHERE PROD_ID = 2)>, <Feature: SUM(order_items.PRICE WHERE PROD_ID = 1)>]
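The WHERE features above are conditional aggregations: each one restricts the order items to a single PROD_ID before counting or summing per customer. As a rough cross-check, the same numbers can be produced with plain pandas on the mock data. This is a sketch only, not featuretools' implementation; items_with_cust, count_by_prod and qty_by_prod are illustrative names.

```python
import pandas as pd

# Mock data copied from the question.
orders = pd.DataFrame([[1, 1, 50, 33.0], [2, 1, 60, 20], [3, 2, 66, 999.9]],
                      columns=['ORD_ID', 'CUST_ID', 'QTY', 'PRICE'])
order_items = pd.DataFrame(
    [[1, 1, 1, 2, 3.0], [2, 2, 2, 8, 5.0], [3, 2, 1, 2, 3.0], [4, 3, 3, 2, 3.0]],
    columns=['ORD_ITM_ID', 'ORD_ID', 'PROD_ID', 'QTY', 'PRICE'])

# Attach the customer to every order item (only the key columns of orders
# are merged, so order_items.QTY and order_items.PRICE keep their names).
items_with_cust = order_items.merge(orders[['ORD_ID', 'CUST_ID']], on='ORD_ID')

# Rough equivalent of COUNT(order_items WHERE PROD_ID = k): rows are customers, columns are k.
count_by_prod = pd.crosstab(items_with_cust['CUST_ID'], items_with_cust['PROD_ID'])

# Rough equivalent of SUM(order_items.QTY WHERE PROD_ID = k).
qty_by_prod = items_with_cust.pivot_table(index='CUST_ID', columns='PROD_ID',
                                          values='QTY', aggfunc='sum', fill_value=0)

print(count_by_prod)
print(qty_by_prod)
```

Each column of these tables corresponds to one interesting value of PROD_ID, which is why the number of features grows with the number of interesting values you register.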