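Isolation Forest – TypeError: invalid type promotion

I am trying to apply Isolation Forest to data converted from an event log, but I get the error "TypeError: invalid type promotion". Is this caused by the datetime columns? I don't understand what I'm doing wrong!

Part of my table (after preprocessing):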

+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
| org:resource | lifecycle:transition | concept:name |   time:timestamp   |   case:REG_DATE    | case:concept:name | case:AMOUNT_REQ |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
|           52 |                    0 |            9 | 2011 10-01 38:44.5 | 2011 10-01 38:44.5 |                 0 |           20000 |
|           52 |                    0 |            6 | 2011 10-01 38:44.9 | 2011 10-01 38:44.5 |                 2 |           20000 |
|           52 |                    0 |            7 | 2011 10-01 39:37.9 | 2011 10-01 38:44.5 |                 0 |           20000 |
|           52 |                    1 |           19 | 2011 10-01 39:38.9 | 2011 10-01 38:44.5 |                 1 |           20000 |
|           68 |                    2 |           19 | 2011 10-01 36:46.4 | 2011 10-01 38:44.5 |                 3 |           20000 |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+

Printing the DataFrame summary gives:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 262200 entries, 0 to 262199
Data columns (total 7 columns):
 #   Column                Non-Null Count   Dtype
---  ------                --------------   -----
 0   org:resource          262200 non-null  int64
 1   lifecycle:transition  262200 non-null  int64
 2   concept:name          262200 non-null  int64
 3   time:timestamp        262200 non-null  datetime64[ns]
 4   case:REG_DATE         262200 non-null  datetime64[ns]
 5   case:concept:name     262200 non-null  int64
 6   case:AMOUNT_REQ       262200 non-null  int32
dtypes: datetime64[ns](2), int32(1), int64(4)
memory usage: 13.0 MB

My code is:

from sklearn.ensemble import IsolationForest

contamination = 0.05

model = IsolationForest(contamination=contamination, n_estimators=10000)
model.fit(df)

df["iforest"] = pd.Series(model.predict(df))
df["iforest"] = df["iforest"].map({1: 0, -1: 1})
df["score"] = model.decision_function(df)
df.sort_values("score")

However, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-5edb86351ac8> in <module>
      4
      5 model = IsolationForest(contamination=contamination, n_estimators=10000)
----> 6 model.fit(df)
      7
      8 df["iforest"] = pd.Series(model.predict(df))

~\.conda\envs\process_mining\lib\site-packages\sklearn\ensemble\_iforest.py in fit(self, X, y, sample_weight)
    261                 )
    262
--> 263         X = check_array(X, accept_sparse=['csc'])
    264         if issparse(X):
    265             # Pre-sort indices to avoid that each individual tree of the

~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74

~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    531
    532         if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
--> 533             dtype_orig = np.result_type(*dtypes_orig)
    534
    535     if dtype_numeric:

<__array_function__ internals> in result_type(*args, **kwargs)

TypeError: invalid type promotion

Answer:

I found the solution through this answer: Python – linear regression TypeError: invalid type promotion
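The last frame of the traceback, np.result_type(*dtypes_orig), can be reproduced on its own, which suggests that the datetime columns are indeed the cause. The sketch below is an illustration rather than scikit-learn's actual code path: it passes NumPy the same dtypes that df.info() reports. The variable names are made up for the example, and the exact wording of the error message depends on the NumPy version.

```python
import numpy as np

# The dtypes reported by df.info() in the question: integers plus datetime64[ns].
mixed_dtypes = [np.dtype("int64"), np.dtype("datetime64[ns]"), np.dtype("int32")]

try:
    np.result_type(*mixed_dtypes)
    promotion_failed = False
except TypeError as exc:
    # Older NumPy reports "invalid type promotion"; newer releases raise a
    # TypeError subclass with a different message.
    promotion_failed = True
    print(f"{type(exc).__name__}: {exc}")

# Without the datetime column, the remaining dtypes share a common numeric type.
integer_only = np.result_type(np.dtype("int64"), np.dtype("int32"))
print(integer_only)
```

Integers and datetimes have no common dtype, so the check fails before any tree is built.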

Technically, you need to convert the timestamps to ordinals for this to work. I did the conversion with the following code:

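import datetime as dt

# Replace each timestamp with its proleptic Gregorian ordinal (a day number); the time of day is discarded.
df['time:timestamp'] = df['time:timestamp'].map(dt.datetime.toordinal)
df['case:REG_DATE'] = df['case:REG_DATE'].map(dt.datetime.toordinal)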
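One caveat: toordinal keeps only the calendar date, so events that happen on the same day end up with identical values. If the time of day matters for the anomaly scores, a possible alternative is to convert each datetime column to seconds since the Unix epoch. The sketch below is only one way to do that. The helper name datetimes_to_epoch_seconds and the two-row example frame are hypothetical, and the example timestamps are made-up values, not data from the question.

```python
import numpy as np
import pandas as pd


def datetimes_to_epoch_seconds(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every datetime64 column is float seconds since 1970-01-01."""
    out = frame.copy()
    epoch = pd.Timestamp("1970-01-01")
    for col in out.columns:
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            # Timedelta divided by a one-second Timedelta yields float64 seconds.
            out[col] = (out[col] - epoch) / pd.Timedelta(seconds=1)
    return out


# Hypothetical two-row frame using column names from the question.
example = pd.DataFrame({
    "time:timestamp": pd.to_datetime(["2011-10-01 00:38:44.5", "2011-10-01 00:39:37.9"]),
    "case:AMOUNT_REQ": np.array([20000, 20000], dtype="int32"),
})

numeric = datetimes_to_epoch_seconds(example)

# All columns are numeric now, so the dtype promotion that failed in fit() succeeds.
common_dtype = np.result_type(*numeric.dtypes)
print(numeric.dtypes)
print(common_dtype)
```

The converted frame can then be passed to model.fit in place of the original df. Whether seconds, days, or a derived feature such as the elapsed time since case:REG_DATE is most useful depends on the data.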
