
LDA with topicmodels: how can I see which topics different documents belong to?


How about using a built-in dataset? This shows which topic each document most likely belongs to.

library(topicmodels)
data("AssociatedPress", package = "topicmodels")

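k <- 5 # set number of topics
# generate model from the first 20 documents; the VEM fit starts from a random
# initialization, so results differ between runs unless a seed is set in control
lda <- LDA(AssociatedPress[1:20, ], k = k, control = list(alpha = 0.1))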
# Now we have a topic model with 20 docs and five topics

# make a data frame with topics as cols, docs as rows and
# cell values as posterior topic distribution for each document
gammaDF <- as.data.frame(lda@gamma) 
names(gammaDF) <- 1:k
# inspect...
gammaDF
              1            2            3            4            5
1  8.979807e-05 8.979807e-05 9.996408e-01 8.979807e-05 8.979807e-05
2  8.714836e-05 8.714836e-05 8.714836e-05 8.714836e-05 9.996514e-01
3  9.261396e-05 9.996295e-01 9.261396e-05 9.261396e-05 9.261396e-05
4  9.995437e-01 1.140774e-04 1.140774e-04 1.140774e-04 1.140774e-04
5  3.573528e-04 3.573528e-04 9.985706e-01 3.573528e-04 3.573528e-04
6  5.610659e-05 5.610659e-05 5.610659e-05 5.610659e-05 9.997756e-01
7  9.994345e-01 1.413820e-04 1.413820e-04 1.413820e-04 1.413820e-04
8  4.286702e-04 4.286702e-04 4.286702e-04 9.982853e-01 4.286702e-04
9  3.319338e-03 3.319338e-03 9.867226e-01 3.319338e-03 3.319338e-03
10 2.034781e-04 2.034781e-04 9.991861e-01 2.034781e-04 2.034781e-04
11 4.810342e-04 9.980759e-01 4.810342e-04 4.810342e-04 4.810342e-04
12 2.651256e-04 9.989395e-01 2.651256e-04 2.651256e-04 2.651256e-04
13 1.430945e-04 1.430945e-04 1.430945e-04 9.994276e-01 1.430945e-04
14 8.402940e-04 8.402940e-04 8.402940e-04 9.966388e-01 8.402940e-04
15 8.404830e-05 9.996638e-01 8.404830e-05 8.404830e-05 8.404830e-05
16 1.903630e-04 9.992385e-01 1.903630e-04 1.903630e-04 1.903630e-04
17 1.297372e-04 1.297372e-04 9.994811e-01 1.297372e-04 1.297372e-04
18 6.906241e-05 6.906241e-05 6.906241e-05 9.997238e-01 6.906241e-05
19 1.242780e-04 1.242780e-04 1.242780e-04 1.242780e-04 9.995029e-01
20 9.997361e-01 6.597684e-05 6.597684e-05 6.597684e-05 6.597684e-05
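Each row of `gammaDF` is one document's posterior distribution over the k topics, so it sums to 1, and a document can in principle belong partly to several topics. A minimal base-R sketch of listing every topic above a cut-off, assuming a toy matrix in place of `lda@gamma`: rows `doc1` and `doc9` copy values from the printout above, `mixed` is a hypothetical document invented for illustration, and the 0.1 threshold is an arbitrary choice.

```r
# Toy stand-in for lda@gamma: doc1 and doc9 are copied from the printout;
# "mixed" is a hypothetical document split between two topics.
gamma <- rbind(
  doc1  = c(8.979807e-05, 8.979807e-05, 9.996408e-01, 8.979807e-05, 8.979807e-05),
  doc9  = c(3.319338e-03, 3.319338e-03, 9.867226e-01, 3.319338e-03, 3.319338e-03),
  mixed = c(0.45, 0.45, 0.05, 0.03, 0.02)
)
colnames(gamma) <- 1:5

# Each row is a probability distribution over the k topics
stopifnot(all(abs(rowSums(gamma) - 1) < 1e-6))

# For each document, every topic whose share exceeds the (arbitrary) threshold;
# lapply keeps one list element per document even when lengths differ
threshold <- 0.1
memberships <- lapply(seq_len(nrow(gamma)), function(i) which(gamma[i, ] > threshold))
names(memberships) <- rownames(gamma)
print(memberships)
```

`lapply` is used instead of `apply` because `apply` silently simplifies to a vector or matrix when every document happens to get the same number of topics.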


# Now for each doc, find just the top-ranked topic
# (which.max returns exactly one index, even if two topics tie)
toptopics <- data.frame(document = row.names(gammaDF),
                        topic = names(gammaDF)[apply(gammaDF, 1, which.max)])
# inspect...
toptopics   
       document topic
1         1     2
2         2     5
3         3     1
4         4     4
5         5     4
6         6     5
7         7     2
8         8     4
9         9     1
10       10     2
11       11     3
12       12     1
13       13     1
14       14     2
15       15     1
16       16     4
17       17     4
18       18     3
19       19     4
20       20     3
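The topicmodels package also provides an accessor, `topics()`, that returns the most likely topic(s) per document directly. A sketch, assuming topicmodels is installed (the block does nothing otherwise); `seed = 1234` is an arbitrary value added so the run is repeatable, and `top_builtin` / `top_manual` are illustrative names, not part of the original answer.

```r
if (requireNamespace("topicmodels", quietly = TRUE)) {
  library(topicmodels)
  data("AssociatedPress", package = "topicmodels")

  k <- 5
  # Same model as above, with a fixed seed so repeated runs give the same topics
  lda <- LDA(AssociatedPress[1:20, ], k = k,
             control = list(alpha = 0.1, seed = 1234))

  top_builtin <- topics(lda, 1)                   # most likely topic per document
  top_manual  <- apply(lda@gamma, 1, which.max)   # same idea, computed by hand

  print(table(top_builtin))                       # number of documents per topic
}
```

Because the gamma values are continuous, exact ties are practically impossible, so the two vectors should agree; `table()` then shows how the 20 documents spread across the five topics.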

Is that what you wanted to do? Hat tip for this answer: https://stat.ethz.ch/pipermail/r-help/2010-August/247706.html

Other · 2022/1/1 18:37:33 · 555 views
