featuretools: my entity set setup is not generating features for one entity

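I'm running into a problem when I create relationships between the entities in my entity set (using my own data). There is no error, but no features are generated for one of my entities (the "prods" entity), even though everything should be connected correctly.

I can't share my data, but I created a minimal example with mock data: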

import pandas as pd
import featuretools as ft

Create the mock data

cust = pd.DataFrame([[1, 50], [2, 60]],
                    columns=['CUST_ID', 'AGE'])

orders = pd.DataFrame([[1, 1, 50, 33.0], [2, 1, 60, 20], [3, 2, 66, 999.9]],
                      columns=['ORD_ID', 'CUST_ID', 'QTY', 'PRICE'])

order_items = pd.DataFrame([[1, 1, 1, 2, 3.0], [2, 2, 2, 8, 5.0], [3, 2, 1, 2, 3.0], [4, 3, 3, 2, 3.0]],
                           columns=['ORD_ITM_ID', 'ORD_ID', 'PROD_ID', 'QTY', 'PRICE'])

prods = pd.DataFrame([[1, 3.0], [2, 5.0], [3, 3.0]],
                     columns=['PROD_ID', 'PRICE'])

Define the entity set

es = ft.EntitySet('test')

## Adding Customers Entity
es.entity_from_dataframe(dataframe=cust,
                         entity_id='cust',
                         index='CUST_ID')

## Adding Orders Entity
es.entity_from_dataframe(dataframe=orders,
                         entity_id='orders',
                         index='ORD_ID')

## Adding Order Items Entity
es.entity_from_dataframe(dataframe=order_items,
                         entity_id='order_items',
                         index='ORD_ITM_ID')

## Adding Products Entity
es.entity_from_dataframe(dataframe=prods,
                         entity_id='prods',
                         index='PROD_ID')

Create the relationships

customer_relationship = ft.Relationship(es["cust"]["CUST_ID"],
                                        es["orders"]["CUST_ID"])
orderitems_relationship = ft.Relationship(es["orders"]["ORD_ID"],
                                          es["order_items"]["ORD_ID"])
products_relationship = ft.Relationship(es["prods"]["PROD_ID"],
                                        es["order_items"]["PROD_ID"])

### Add Relationships
es = es.add_relationship(customer_relationship)
es = es.add_relationship(orderitems_relationship)
es = es.add_relationship(products_relationship)

Generate the features

feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      verbose=True,
                      features_only=True)

## Show features
feature_defs

Output:

Built 7 features
[<Feature: AGE>, <Feature: COUNT(order_items)>, <Feature: SUM(orders.QTY)>, <Feature: SUM(orders.PRICE)>, <Feature: SUM(order_items.QTY)>, <Feature: COUNT(orders)>, <Feature: SUM(order_items.PRICE)>]

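This should also include features built from the product variables, but it does not.

What I expected is a SUM that aggregates the product prices per customer. Instead, there is nothing.

Ultimately, I want to create features for interesting values. But because the product variables do not show up, adding interesting values has no effect either.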

## Get All Product IDs
interesting_products = es["prods"].df.PROD_ID.unique()
es["prods"]["PROD_ID"].interesting_values = interesting_products

feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      where_primitives=["count", "sum"],
                      verbose=True,
                      features_only=True)

## Show features
feature_defs

Output:

Built 7 features
[<Feature: AGE>, <Feature: COUNT(order_items)>, <Feature: SUM(orders.QTY)>, <Feature: SUM(orders.PRICE)>, <Feature: SUM(order_items.QTY)>, <Feature: COUNT(orders)>, <Feature: SUM(order_items.PRICE)>]

Hopefully someone can help me with this 🙂


Answer:

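The product features do not show up because any feature built from them would have a depth of 3, which is above the default max_depth of 2. You can control the depth with the max_depth parameter of ft.dfs, like this: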

feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      verbose=True,
                      max_depth=3,  # add max_depth
                      features_only=True)

The features returned are now:

[<Feature: AGE>, <Feature: SUM(order_items.QTY)>, <Feature: SUM(order_items.PRICE)>, <Feature: SUM(orders.PRICE)>, <Feature: SUM(orders.QTY)>, <Feature: COUNT(order_items)>, <Feature: COUNT(orders)>, <Feature: SUM(order_items.prods.PRICE)>]

The last one, SUM(order_items.prods.PRICE), is the feature that uses the product price.
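To make it concrete what SUM(order_items.prods.PRICE) aggregates, here is a rough plain-pandas sketch on the mock data from the question. It is not how featuretools computes the feature internally; it only follows the same relationship path by hand (prods to order_items to orders to cust). The column name PROD_PRICE and the variable names are made up for this illustration.

```python
import pandas as pd

# Mock data copied from the question (only the tables needed here).
orders = pd.DataFrame([[1, 1, 50, 33.0], [2, 1, 60, 20], [3, 2, 66, 999.9]],
                      columns=['ORD_ID', 'CUST_ID', 'QTY', 'PRICE'])
order_items = pd.DataFrame([[1, 1, 1, 2, 3.0], [2, 2, 2, 8, 5.0],
                            [3, 2, 1, 2, 3.0], [4, 3, 3, 2, 3.0]],
                           columns=['ORD_ITM_ID', 'ORD_ID', 'PROD_ID', 'QTY', 'PRICE'])
prods = pd.DataFrame([[1, 3.0], [2, 5.0], [3, 3.0]],
                     columns=['PROD_ID', 'PRICE'])

# Step 1: attach the product price to each order item (order_items -> prods).
# PROD_PRICE is a hypothetical name chosen to avoid clashing with order_items.PRICE.
items = order_items[['ORD_ID', 'PROD_ID']].merge(
    prods.rename(columns={'PRICE': 'PROD_PRICE'}), on='PROD_ID')

# Step 2: attach the customer to each order item (order_items -> orders).
items = items.merge(orders[['ORD_ID', 'CUST_ID']], on='ORD_ID')

# Step 3: sum the product prices per customer.
sum_prod_price = items.groupby('CUST_ID')['PROD_PRICE'].sum()
print(sum_prod_price)
```

For this mock data, customer 1 ends up with 11.0 (3.0 + 5.0 + 3.0) and customer 2 with 3.0. The three steps line up with the three hops that sit between cust and prods in the entity set.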

To make the where clauses work, set the interesting values on the PROD_ID variable of the order_items entity instead of on prods:

interesting_products = es["prods"].df.PROD_ID.unique()
es["order_items"]["PROD_ID"].interesting_values = interesting_products

feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      where_primitives=["count", "sum"],
                      verbose=True,
                      max_depth=3,
                      features_only=True)

This creates 20 features, as shown below:

[<Feature: AGE>, <Feature: SUM(order_items.QTY)>, <Feature: SUM(order_items.PRICE)>, <Feature: SUM(orders.PRICE)>, <Feature: SUM(orders.QTY)>, <Feature: COUNT(order_items)>, <Feature: COUNT(orders)>, <Feature: SUM(order_items.prods.PRICE WHERE PROD_ID = 2)>, <Feature: SUM(order_items.QTY WHERE PROD_ID = 2)>, <Feature: SUM(order_items.QTY WHERE PROD_ID = 3)>, <Feature: SUM(order_items.prods.PRICE)>, <Feature: COUNT(order_items WHERE PROD_ID = 2)>, <Feature: SUM(order_items.prods.PRICE WHERE PROD_ID = 1)>, <Feature: SUM(order_items.PRICE WHERE PROD_ID = 3)>, <Feature: COUNT(order_items WHERE PROD_ID = 1)>, <Feature: COUNT(order_items WHERE PROD_ID = 3)>, <Feature: SUM(order_items.prods.PRICE WHERE PROD_ID = 3)>, <Feature: SUM(order_items.QTY WHERE PROD_ID = 1)>, <Feature: SUM(order_items.PRICE WHERE PROD_ID = 2)>, <Feature: SUM(order_items.PRICE WHERE PROD_ID = 1)>]
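As a rough illustration of what one of these where features means, the following plain-pandas sketch reproduces the idea behind SUM(order_items.QTY WHERE PROD_ID = 2) on the mock data: filter the order items first, then aggregate per customer. This is only a hand-written equivalent under that reading, not the featuretools implementation, and the variable names are made up.

```python
import pandas as pd

# Mock data copied from the question (only the tables needed here).
orders = pd.DataFrame([[1, 1, 50, 33.0], [2, 1, 60, 20], [3, 2, 66, 999.9]],
                      columns=['ORD_ID', 'CUST_ID', 'QTY', 'PRICE'])
order_items = pd.DataFrame([[1, 1, 1, 2, 3.0], [2, 2, 2, 8, 5.0],
                            [3, 2, 1, 2, 3.0], [4, 3, 3, 2, 3.0]],
                           columns=['ORD_ITM_ID', 'ORD_ID', 'PROD_ID', 'QTY', 'PRICE'])

# The "WHERE PROD_ID = 2" part: keep only the matching order items.
only_prod_2 = order_items[order_items['PROD_ID'] == 2]

# Attach the customer of each remaining order item via its order.
with_cust = only_prod_2.merge(orders[['ORD_ID', 'CUST_ID']], on='ORD_ID')

# The "SUM(order_items.QTY)" part: total quantity per customer.
qty_where_prod_2 = with_cust.groupby('CUST_ID')['QTY'].sum()
print(qty_where_prod_2)
```

Here customer 1 gets 8, because only order item 2 refers to product 2. Customer 2 bought no product 2, so it does not appear in this pandas result at all; how featuretools fills that case in the final feature matrix is not covered by this sketch. The filter is applied to order_items rows, which matches the answer's advice to put the interesting values on order_items.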
