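I have been trying to implement a spam classifier in Clojure, following the book Programming Collective Intelligence. This is the train function used to train the classifier: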
(defn train [t cat]
  (incc cat)
  (let [ws (keys (getwords t))]
    (for [w ws] (incf w cat))))
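This is the sampletrain function I wrote, which simply loads some training data into the classifier so that I don't have to train it by hand every time.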
(defn sampletrain []
  (do
    (train "Nobody owns the water." "good")
    (train "the quick rabit jumps fences" "good")
    (train "buy pharmaceuticals now" "bad")
    (train "make quick money at the online casino" "bad")
    (train "the quick brown fox jumps" "good")))
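Unfortunately, sampletrain only trains my classifier on the last item, the sentence "the quick brown fox jumps" with the category "good". Afterwards my classifier looks like this: {"the" {"good" 1}, "quick" {"good" 1}, "brown" {"good" 1}, "fox" {"good" 1}, "jumps" {"good" 1}}. As you can see, it was trained on the last item only. I wrapped everything in a "do" to avoid this, but I can't figure out why only the last call to train takes effect.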
Answer:
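Clojure uses implicit return, and so does the do form: train is called for every sentence, but only the value of the last expression is returned. Because the for inside train is lazy, its incf calls run only when the returned sequence is realized (for example, when the REPL prints it), so only the last call's results show up. You can wrap the calls in a data structure to return all of the results.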
Wrapping the results in a vector:
(defn sampletrain []
  [(train "Nobody owns the water." "good")
   (train "the quick rabit jumps fences" "good")
   (train "buy pharmaceuticals now" "bad")
   (train "make quick money at the online casino" "bad")
   (train "the quick brown fox jumps" "good")])
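As an alternative to returning the lazy sequences so that the REPL realizes them, train can use doseq, which is eager and meant for side effects. The following is only a minimal sketch: the question does not show getwords, incc, or incf, so the versions here (and the atoms category-counts and feature-counts) are hypothetical stand-ins, and this getwords is simpler than the one in the book.

```clojure
(require '[clojure.string :as str])

;; Hypothetical state; the question does not show how the counts are stored.
(def category-counts (atom {}))   ; category -> number of documents
(def feature-counts  (atom {}))   ; word -> {category -> count}

;; Assumed helper: maps each distinct lowercase word of t to 1.
(defn getwords [t]
  (into {} (for [w (re-seq #"\w+" (str/lower-case t))] [w 1])))

;; Assumed helper: increments the document count of a category.
(defn incc [cat]
  (swap! category-counts update cat (fnil inc 0)))

;; Assumed helper: increments the count of word w in category cat.
(defn incf [w cat]
  (swap! feature-counts update-in [w cat] (fnil inc 0)))

(defn train [t cat]
  (incc cat)
  ;; doseq runs its body immediately for every word and returns nil,
  ;; so the updates happen even when the caller discards the result.
  (doseq [w (keys (getwords t))]
    (incf w cat)))

(defn sampletrain []
  (train "Nobody owns the water." "good")
  (train "the quick rabit jumps fences" "good")
  (train "buy pharmaceuticals now" "bad")
  (train "make quick money at the online casino" "bad")
  (train "the quick brown fox jumps" "good"))
```

With these stand-ins, calling (sampletrain) once updates the counts for all five sentences, so (get-in @feature-counts ["quick" "good"]) is 2 and (get-in @feature-counts ["quick" "bad"]) is 1. Neither the do nor the vector is needed in this version, because a defn body already evaluates its forms in order.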