Is there a way to optimise this query?
I am trying to calculate various tf-idf measures for a database with 4 million documents. I've created three tables:
CREATE TABLE document
(
    id INTEGER NOT NULL,
    title VARCHAR(20) NOT NULL,
    body VARCHAR NOT NULL
);
CREATE TABLE token
(
    id serial NOT NULL,
    word VARCHAR NOT NULL,
    idf INTEGER NOT NULL
);
CREATE TABLE token_count
(
    docId INTEGER NOT NULL,
    tokenId INTEGER NOT NULL,
    amount INTEGER NOT NULL
);
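(An assumption on my part, not something stated in the post: the populate query below looks up every word in token one row at a time, so primary keys and a unique index on token(word) would likely help. A minimal sketch:)

```sql
-- Sketch: constraints and an index assumed useful for this workload.
ALTER TABLE document ADD PRIMARY KEY (id);
ALTER TABLE token ADD PRIMARY KEY (id);
-- The populate query matches on token.word, so index that column:
CREATE UNIQUE INDEX token_word_idx ON token (word);
ALTER TABLE token_count ADD PRIMARY KEY (docId, tokenId);
```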
I am using the following code to populate the token_count
table. It works, but it's pretty slow; is there a way to optimise it?
with temp_data as (
    select id,
        (ts_stat('select to_tsvector(''english'', body) from document where id = ' || id)).*
    from document
)
insert into token_count (docid, tokenid, amount)
select
    id,
    (select id from token where word = temp_data.word limit 1),
    nentry
from temp_data;
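(One possible rewrite, a sketch rather than a tested answer, assuming PostgreSQL 9.6+ where unnest(tsvector) is available: instead of building and planning a separate ts_stat query for every single document, expand each document's tsvector in place with a lateral join and join token directly:)

```sql
-- Sketch, not from the original post: per-document lexeme counts via
-- unnest(tsvector) (PostgreSQL 9.6+) instead of one dynamic ts_stat
-- query per row. unnest(tsvector) yields (lexeme, positions, weights).
INSERT INTO token_count (docid, tokenid, amount)
SELECT d.id,
       tok.id,
       coalesce(array_length(lex.positions, 1), 1)  -- occurrences in this document
FROM document d
CROSS JOIN LATERAL unnest(to_tsvector('english', d.body)) AS lex
JOIN token tok ON tok.word = lex.lexeme;
```

This reads each body once, avoids the per-row dynamic SQL entirely, and replaces the correlated subquery with a plain join; a unique index on token(word) keeps that join cheap.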
from Recent Questions - Stack Overflow https://ift.tt/3nYVLXI