A Hidden Markov Model (HMM) is a model in which you can observe a sequence of observations, but you do not know the sequence of states the model passed through to generate them. Analysis of a Hidden Markov Model aims to recover the sequence of hidden states from the observed data.
I have a data set containing observations and hidden states (the observations are continuous values), where the hidden states were tagged by an expert. I would like to train an HMM so that it can recover the corresponding hidden states from a (previously unseen) sequence of observations.
Is there any R package that can do this? Studying the existing packages (depmixS4, HMM, seqHMM – for categorical data only) only allows you to specify the number of hidden states.
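For reference, this is roughly the kind of call these packages expect; the data frame name obs and the intercept-only formulas are placeholders of mine, and the point is that only the number of states (nstates) can be declared, not the expert labels:

library(depmixS4)
# obs is assumed to contain the continuous observation columns sensor1..sensor3
mod <- depmix(list(sensor1 ~ 1, sensor2 ~ 1, sensor3 ~ 1),
              data = obs,
              nstates = 4,
              family = list(gaussian(), gaussian(), gaussian()))
fitted.mod <- fit(mod)   # EM estimation: the states are learned, not taken from labels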
Edit:
Example:
data.tagged.by.expert = data.frame(
  hidden.state = c("Wake", "REM", "REM", "NonREM1", "NonREM2", "REM", "REM", "Wake"),
  sensor1 = c(1, 1.2, 1.2, 1.3, 4, 2, 1.78, 0.65),
  sensor2 = c(7.2, 5.3, 5.1, 1.2, 2.3, 7.5, 7.8, 2.1),
  sensor3 = c(0.01, 0.02, 0.08, 0.8, 0.03, 0.01, 0.15, 0.45))

data.newly.measured = data.frame(
  sensor1 = c(2, 3, 4, 5, 2, 1, 2, 4, 5, 8, 4, 6, 1, 2, 5, 3, 2, 1, 4),
  sensor2 = c(2.1, 2.3, 2.2, 4.2, 4.2, 2.2, 2.2, 5.3, 2.4, 1.0, 2.5, 2.4, 1.2, 8.4, 5.2, 5.5, 5.2, 4.3, 7.8),
  sensor3 = c(0.23, 0.25, 0.23, 0.54, 0.36, 0.85, 0.01, 0.52, 0.09, 0.12, 0.85, 0.45, 0.26, 0.08, 0.01, 0.55, 0.67, 0.82, 0.35))
I would like to create an HMM with discrete time t, where the random variable x(t) represents the hidden state at time t, x(t) ∈ {"Wake", "REM", "NonREM1", "NonREM2"}, and the three continuous random variables sensor1(t), sensor2(t), sensor3(t) represent the observations at time t.
model.hmm = learn.model(data.tagged.by.expert)
I would then like to use the created model to estimate the hidden states corresponding to the newly measured observations:
hidden.states = estimate.hidden.states(model.hmm, data.newly.measured)
Answer:
Data (training/testing)
To be able to run the learning method for the Naive Bayes classifier, we need a longer data set.
states = c("NonREM1", "NonREM2", "NonREM3", "REM", "Wake")

# artificial hypnogram: a sequence of state indices with the run length of each stage
artificial.hypnogram = rep(c(5,4,1,2,3,4,5), times = c(40,150,200,300,50,90,30))

data.tagged.by.expert = data.frame(
  hidden.state = states[artificial.hypnogram],
  sensor1 = log(artificial.hypnogram) + runif(n = length(artificial.hypnogram), min = 0.2, max = 0.5),
  sensor2 = 10*artificial.hypnogram + sample(c(-8:8), size = length(artificial.hypnogram), replace = T),
  sensor3 = sample(1:100, size = length(artificial.hypnogram), replace = T))

# hypnogram underlying the test data; the model never sees it during training
hidden.hypnogram = rep(c(5,4,1,2,4,5), times = c(10,10,15,10,10,3))

data.newly.measured = data.frame(
  sensor1 = log(hidden.hypnogram) + runif(n = length(hidden.hypnogram), min = 0.2, max = 0.5),
  sensor2 = 10*hidden.hypnogram + sample(c(-8:8), size = length(hidden.hypnogram), replace = T),
  sensor3 = sample(1:100, size = length(hidden.hypnogram), replace = T))
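As an optional sanity check on the simulated training set (exact values differ between runs, since no random seed is fixed):

# distribution of the expert-tagged stages in the artificial training data
table(data.tagged.by.expert$hidden.state)
# first few rows of the simulated sensor readings
head(data.tagged.by.expert)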
Solution
In the solution, we use the Viterbi algorithm combined with a Naive Bayes classifier.
At each clock time t, the Hidden Markov Model consists of
- an unobserved state (denoted hidden.state in this case), which takes a finite number of states
states = c("NonREM1", "NonREM2", "NonREM3", "REM", "Wake")
- a set of observed variables (sensor1, sensor2, sensor3 in this case)
Transition matrix
Transition to a new state happens according to a transition probability distribution (the transition matrix). It can easily be computed from data.tagged.by.expert, e.g. using
library(markovchain)

# transition matrix estimated from the expert-tagged state sequence
emit_p <- markovchainFit(data.tagged.by.expert$hidden.state)$estimate
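If you prefer to avoid the extra dependency, an equivalent row-stochastic estimate can be built by counting consecutive state pairs; this is a sketch of mine, and trans_manual is not used in the rest of the answer:

# count transitions between consecutive expert-tagged states and row-normalize
s <- as.character(data.tagged.by.expert$hidden.state)
counts <- table(from = head(s, -1), to = tail(s, -1))
trans_manual <- prop.table(counts, margin = 1)
round(trans_manual, 3)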
Emission matrix
After each transition, an observation (sensor_i) is produced according to a conditional probability distribution (the emission matrix) that depends only on the current state H of hidden.state. We will replace the emission matrix with a Naive Bayes classifier.
library(caret)
library(klaR)
library(e1071)

model = train(hidden.state ~ .,
              data = data.tagged.by.expert,
              method = 'nb',
              trControl = trainControl(method = 'cv', number = 10))
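The fitted object stores the underlying klaR NaiveBayes model in model$finalModel; the Viterbi step below only needs its per-class posteriors, which can be inspected like this (a quick check, not part of the original procedure):

# posterior class probabilities for one new observation;
# the columns are named after the sleep stages in states
p <- predict(model$finalModel, newdata = data.newly.measured[1, ])
p$posterior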
Viterbi algorithm
To solve the problem, we use the Viterbi algorithm with an initial probability of 1 for the "Wake" state and 0 for the other states. (We expect the patient to be awake at the beginning of the experiment.)
# we expect the patient to be awake at the beginning
start_p = c(NonREM1 = 0, NonREM2 = 0, NonREM3 = 0, REM = 0, Wake = 1)

# Naive Bayes model
model_nb = model$finalModel

# observations
observations = data.newly.measured
nObs <- nrow(observations)     # number of observations
nStates <- length(states)      # number of states

# T1, T2 initialization
T1 <- matrix(0, nrow = nStates, ncol = nObs)   # define two 2-dimensional tables
row.names(T1) <- states
T2 <- T1

Byj <- predict(model_nb, newdata = observations[1,])$posterior

# initialize the first column of T1
for(s in states)
  T1[s,1] = start_p[s] * Byj[1,s]

# fill the T1 and T2 tables
for(j in 2:nObs) {
  Byj <- predict(model_nb, newdata = observations[j,])$posterior
  for(s in states) {
    res <- (T1[,j-1] * emit_p[,s]) * Byj[1,s]
    T2[s,j] <- states[which.max(res)]
    T1[s,j] <- max(res)
  }
}

# backtrack the best path
result <- rep("", times = nObs)
result[nObs] <- names(which.max(T1[,nObs]))
for (j in nObs:2) {
  result[j-1] <- T2[result[j], j]
}

# show the result
result
# show the original artificial data
states[hidden.hypnogram]
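Since the test data were simulated from hidden.hypnogram, the decoded path can be compared with the ground truth; this evaluation is my addition, not part of the original answer:

# confusion table and overall agreement between the Viterbi path and the truth
confusion <- table(predicted = result, truth = states[hidden.hypnogram])
confusion
mean(result == states[hidden.hypnogram])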
References
To learn more about this problem, see Vomlel Jiří, Kratochvíl Václav: Dynamic Bayesian Networks for the Classification of Sleep Stages, Proceedings of the 11th Workshop on Uncertainty Processing (WUPES'18), pp. 205-215, Eds: Kratochvíl Václav, Vejnarová Jiřina (Třeboň, CZ, 2018/06/06), 2018.