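Hi everyone, I'm trying to run clusterdump on the output of a k-means clustering job, but it doesn't work. Any ideas? This is an example from the book Mahout in Action, run on a pseudo-distributed cluster.
Also, is there any tool or method for visualizing the clusterdump output or the k-means output?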
[186946@01HW534064 bin]$ ./mahout clusterdump -dt sequencefile -d /home/186946/reuters-vectors/dictionary.file-0 -i reuters-fkmeans-clusters/clusters-3 -o /home/186946/clusters.txt -b 10 -n 10
Running on hadoop, using HADOOP_HOME=/home/186946/hadoop-0.20.2-cdh3u5
No HADOOP_CONF_DIR set, using /home/186946/hadoop-0.20.2-cdh3u5/src/conf
MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar
MAHOUT-JOB: /home/186946/mahout-0.5-cdh3u5/mahout-examples-0.5-cdh3u5-job.jar
13/03/08 17:26:11 ERROR common.AbstractJob: Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
 -archives <paths>              comma separated archives to be unarchived on the compute machines.
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -files <paths>                 comma separated files to be copied to the map reduce cluster
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -libjars <paths>               comma separated jar files to include in the classpath.
 -tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected reuters-fkmeans-clusters/clusters-3 while processing Job-Specific Options:
Usage:
 [--seqFileDir <seqFileDir> --output <output> --substring <substring> --numWords <numWords> --pointsDir <pointsDir> --dictionary <dictionary> --dictionaryType <dictionaryType> --help --tempDir <tempDir> --startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
  --seqFileDir (-s) seqFileDir             The directory containing Sequence Files for the Clusters
  --output (-o) output                     Optional output directory. Default is to output to the console.
  --substring (-b) substring               The number of chars of the asFormatString() to print
  --numWords (-n) numWords                 The number of top terms to print
  --pointsDir (-p) pointsDir               The directory containing points sequence files mapping input vectors to their cluster. If specified, then the program will output the points associated with a cluster
  --dictionary (-d) dictionary             The dictionary file
  --dictionaryType (-dt) dictionaryType    The dictionary file type (text|sequencefile)
  --help (-h)                              Print out help
  --tempDir tempDir                        Intermediate output directory
  --startPhase startPhase                  First phase to run
  --endPhase endPhase                      Last phase to run
13/03/08 17:26:11 INFO driver.MahoutDriver: Program took 133 ms
Thanks.
Answer:
mahout clusterdump \
  -d output/vectors/dictionary.file-0 \
  -dt sequencefile \
  -i output/clusters/clusters-2-final/part-00000 \
  -n 20 \
  -b 100 \
  -o cdump.txt \
  -p output/clusters/clusteredPoints/
Just copy and paste the whole command above into a text editor and carefully set the arguments for -d, -dt, -i, and -p, the same way I set them here.
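One thing worth checking against the question's log: the usage text printed with the error lists --seqFileDir (-s) for the cluster directory and does not list -i at all, which would explain the "Unexpected reuters-fkmeans-clusters/clusters-3" message on that Mahout 0.5 build. Below is a minimal sketch of the asker's command with -s in place of -i, assuming that usage text is accurate for the installed version. The paths are copied from the question; the RUN variable and the clusterdump_cmd helper are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Sketch only: the asker's clusterdump call with -s (--seqFileDir), the flag
# named in the usage text from the question, instead of -i.
# All paths are copied from the question; adjust them to your own setup.

CLUSTERS=reuters-fkmeans-clusters/clusters-3
DICT=/home/186946/reuters-vectors/dictionary.file-0
OUT=/home/186946/clusters.txt

# Hypothetical helper: prints the full command line as a single string.
clusterdump_cmd() {
    echo "./mahout clusterdump -dt sequencefile -d $DICT -s $CLUSTERS -o $OUT -b 10 -n 10"
}

# By default only print the command; set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
    eval "$(clusterdump_cmd)"
else
    clusterdump_cmd
fi
```

Run it once without RUN=1 from Mahout's bin directory to review the printed command before executing it. On Mahout versions whose usage text lists -i, keep -i as in the answer's command.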
P.S. The paths are HDFS paths.
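Since the paths are resolved on HDFS, it can help to confirm that they exist there before running clusterdump. The following is a rough sketch that uses `hadoop fs -test -e` for the check. The helper names hdfs_exists and check_paths and the HADOOP override variable are assumptions made for this example; the paths in the usage comment are the ones from the command above.

```shell
#!/bin/sh
# Sketch only: report which of the given paths exist on HDFS.
# HADOOP can be overridden if the hadoop launcher is not on PATH.
HADOOP=${HADOOP:-hadoop}

# Returns 0 if the path exists on HDFS, non-zero otherwise.
hdfs_exists() {
    "$HADOOP" fs -test -e "$1"
}

# Prints one status line per path; returns 1 if any path is missing.
check_paths() {
    missing=0
    for p in "$@"; do
        if hdfs_exists "$p"; then
            echo "ok      $p"
        else
            echo "missing $p"
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the paths from the answer's command:
# check_paths output/vectors/dictionary.file-0 \
#             output/clusters/clusters-2-final/part-00000 \
#             output/clusters/clusteredPoints/
```

A "missing" line for the dictionary or the clusters directory usually means a local path was passed where an HDFS path was expected, or the other way around.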