How can I speed up the scaling step in Python?

IT Technology · xiaolong · May 31, 2025

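I have a large text dataset and use MinMaxScaler to transform one of its features. The code works, but it takes more than 3 minutes, and I would like to reduce the time this step takes. Are there any suggestions for speeding it up, or alternative approaches that perform this transformation faster?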

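```python
from sklearn.preprocessing import MinMaxScaler

df = cleanData('data.csv')  # asker's own loading/cleaning helper (not shown)
scaler = MinMaxScaler(feature_range=(0, 5))
# Scalers expect 2-D input, so select 'year' as a one-column DataFrame
scaler.fit(df[['year']])
df['year'] = scaler.transform(df[['year']]).ravel()  # flatten (n, 1) -> (n,)
```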
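For reference, the scikit-learn documentation describes MinMaxScaler with `feature_range=(a, b)` as the per-column mapping below; with the settings in the question, a = 0 and b = 5. Computing it needs one pass to find the minimum and maximum and one elementwise pass to apply the mapping, so the scaling itself is linear in the number of rows.

$$
x' = \frac{x - \min(x)}{\max(x) - \min(x)}\,(b - a) + a
$$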

Answer:

You could try doing this with dask-ml:

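```python
import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler

# or read directly from csv with: ddf = dd.read_csv('data.csv')
ddf = dd.from_pandas(df, npartitions=10)
scaler = MinMaxScaler(feature_range=(0, 5))
# Pass a one-column DataFrame, matching the 2-D input scalers expect
scaler.fit(ddf[['year']])
ddf['year'] = scaler.transform(ddf[['year']])['year']
# dask is lazy: call ddf.compute() (or write out with ddf.to_csv) to get results
```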

Other preprocessing tools are also available in dask_ml; see https://ml.dask.org/modules/generated/dask_ml.preprocessing.MinMaxScaler.html?highlight=minmaxscaler
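If the data already fit in memory as a pandas DataFrame, the same min-max formula can also be applied with plain vectorized pandas arithmetic instead of a scaler object. This is a swapped-in alternative, not part of the dask-ml answer above. It is a minimal sketch that assumes `year` holds numeric values; the helper name `minmax_scale_column` and the sample data are hypothetical. Unlike a fitted scaler, it does not keep the minimum and maximum for reuse on new data.

```python
import pandas as pd


def minmax_scale_column(s: pd.Series, low: float = 0.0, high: float = 5.0) -> pd.Series:
    """Hypothetical helper: min-max scale a numeric Series into [low, high].

    Uses the same formula as MinMaxScaler. Like scikit-learn, a constant
    column (max == min) is mapped to `low` instead of dividing by zero.
    """
    s = s.astype("float64")
    s_min, s_max = s.min(), s.max()  # NaN values are skipped, as in MinMaxScaler
    data_range = s_max - s_min
    if data_range == 0:
        # All values identical: every entry maps to the lower bound
        return pd.Series(low, index=s.index, name=s.name)
    return (s - s_min) / data_range * (high - low) + low


# Small hypothetical frame standing in for the output of cleanData()
df = pd.DataFrame({"year": [1990, 2000, 2010, 2020]})
df["year"] = minmax_scale_column(df["year"], 0, 5)
print(df["year"].tolist())
```

Because the whole column is processed in two vectorized passes, there is no per-row Python work; whether this is faster than MinMaxScaler on a given dataset would need to be measured.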

