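I am trying to convert several comments into a character matrix so that I can extract the numbers they contain; the end goal is to build a character-based neural network that recognizes monetary amounts. The comments are stored as characters, which I can verify with strsplit, but the mstrsplit function from iotools does not seem to be able to put them into a matrix.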
library("tm")
library("iotools")
sample1 <- "This is a number 2,000$ presented properly."
sample2 <- "This could be another representation $ 2,000."
sample3 <- "Often times there is 400 many $20 numbers 3.75$ in a single 1025.50 sentence."
sample4 <- "Frequently a data 21/02/2017 precedes a number 5 000 or follows it February 21, 2017."
sample5 <- "There are many 50 000 possible ways 20, 400$ that numbers can be presented."
sample6 <- "Creating an exhaustive list is probably impossible at 9:52 int he morning."
sample7 <- " use of different characters $ might be confusing."
text1 <- c(sample1, sample2, sample3, sample4, sample5, sample6, sample7)
text <- as.character(text1)
str(text)
characters <- strsplit(text, "")
text
characters
matrixc <- mstrsplit(characters, sep=NA)
matrixc
matrixc[1,19]
Answer:
I found an alternative, which is to use matrix(unlist()).
text1 <- c(sample1, sample2, sample3, sample4, sample5, sample6, sample7)
text2 <- as.character(text1)
str(text2)
char1 <- strsplit(text2, character(0))
mat1 <- matrix(unlist(char1))
mat1
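Note that matrix(unlist(char1)) stacks every character of every comment into a single column, so the boundaries between comments are lost. If a matrix with one row per comment is what is needed, one possible approach (a sketch, not necessarily what the original poster intended) is to pad each character vector with NA up to the length of the longest comment and bind the rows together. The names comments, chars, padded and char_mat below are hypothetical, and two short stand-in strings are used in place of the sample sentences.

```r
# Two short stand-in comments (hypothetical; substitute the real text vector)
comments <- c("Pay 2,000$ now.", "Only $20.")

# Split each comment into single characters (a list of character vectors)
chars <- strsplit(comments, "")

# Pad every character vector with NA up to the length of the longest comment
max_len <- max(lengths(chars))
padded <- lapply(chars, function(x) {
  c(x, rep(NA_character_, max_len - length(x)))
})

# Bind row-wise: one row per comment, one column per character position
char_mat <- do.call(rbind, padded)

dim(char_mat)    # rows = number of comments, columns = longest comment
char_mat[1, 5]   # fifth character of the first comment
```

With this layout, char_mat[i, j] is the j-th character of the i-th comment, and shorter comments are filled with NA on the right. Whether NA or some other padding symbol is appropriate depends on how the matrix is later fed to the network.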