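There is no ready-made Postgres text search feature for finding the most popular phrases. For two-word phrases, you can use ts_stat() to find the most popular words, eliminate particles, prepositions and the like, and then cross join those words to find the most popular word pairs.
For real data, you will need to change the values marked with --> parameter. On larger data sets the query can be quite expensive.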
with popular_words as (
    -- per-word statistics over all rows of table a
    select word
    from ts_stat('select value::tsvector from a')
    where nentry > 1                                --> parameter
    and word not in ('to', 'the', 'at', 'in', 'a')  --> parameter
)
-- pair every popular word with every other one and count the rows containing the pair
select concat_ws(' ', a1.word, a2.word) as phrase, count(*)
from popular_words as a1
cross join popular_words as a2
cross join a
where value ilike format('%%%s %s%%', a1.word, a2.word)
group by 1
having count(*) > 1                                 --> parameter
order by 2 desc;
        phrase         | count
-----------------------+-------
 movie theater         |     3
 learning disabilities |     2
(2 rows)
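If you want to try this on something small, here is a minimal sketch. The sample rows are hypothetical (they are not the data behind the output above), and the temp table is named a only to match the query. It also swaps the plain value::tsvector cast for to_tsvector('simple', value). This is my own variation, on the assumption that your text is free-form: the cast reads the text as tsvector input syntax, so it keeps case and may misparse or reject text containing quotes or colons, while the simple configuration lowercases tokens without stemming them, so the words still match the raw text in the ilike filter.

```sql
-- Hypothetical sample data; the table and column names follow the query above.
create temp table a (value text);

insert into a (value) values
    ('We went to the movie theater'),
    ('The movie theater was closed'),
    ('A new movie theater opened in town'),
    ('Support for learning disabilities'),
    ('Students with learning disabilities');

with popular_words as (
    select word
    -- dollar quoting avoids doubling the quotes around 'simple'
    from ts_stat($$select to_tsvector('simple', value) from a$$)
    where nentry > 1                                --> parameter
    and word not in ('to', 'the', 'at', 'in', 'a')  --> parameter
)
select concat_ws(' ', a1.word, a2.word) as phrase, count(*)
from popular_words as a1
cross join popular_words as a2
cross join a
-- case-insensitive substring match of "word1 word2" against each row
where value ilike format('%%%s %s%%', a1.word, a2.word)
group by 1
having count(*) > 1                                 --> parameter
order by 2 desc;
```

The cost still grows with the square of the number of popular words times the number of rows, so tightening the nentry threshold is the first thing to adjust on real data.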