I was following this tutorial https://mapr.com/blog/churn-prediction-sparkml/ and found that the CSV schema has to be written out by hand, like this:
val schema = StructType(Array( StructField("state", StringType, true), StructField("len", IntegerType, true), StructField("acode", StringType, true), StructField("intlplan", StringType, true), StructField("vplan", StringType, true), StructField("numvmail", DoubleType, true), StructField("tdmins", DoubleType, true), StructField("tdcalls", DoubleType, true), StructField("tdcharge", DoubleType, true), StructField("temins", DoubleType, true), StructField("tecalls", DoubleType, true), StructField("techarge", DoubleType, true), StructField("tnmins", DoubleType, true), StructField("tncalls", DoubleType, true), StructField("tncharge", DoubleType, true), StructField("timins", DoubleType, true), StructField("ticalls", DoubleType, true), StructField("ticharge", DoubleType, true), StructField("numcs", DoubleType, true), StructField("churn", StringType, true)
However, my dataset has 335 features and I don't want to write them all out by hand… Is there a simple way to pick up the feature names and define the schema accordingly?
Answer:
I found the solution here: https://dzone.com/articles/using-apache-spark-dataframes-for-processing-of-ta It was simpler than I expected.
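For anyone finding this later, the basic idea is to let Spark infer the schema from the data instead of spelling out every StructField. A minimal sketch, assuming Spark 2.x, a CSV file with a header row, and a placeholder path "churn.csv" (on Spark 1.x the databricks spark-csv package exposes the same header/inferSchema options):

// Sketch: read the CSV and let Spark infer column names and types.
// "churn.csv" is a placeholder path, not from the tutorial.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("churn").getOrCreate()

val df = spark.read
  .option("header", "true")       // take column names from the first line
  .option("inferSchema", "true")  // sample the data to guess column types
  .csv("churn.csv")

df.printSchema()                  // inspect the inferred schema

df.schema gives back a StructType just like the hand-written one, so the rest of the tutorial's code can stay the same.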