Updating Lucene payloads without a full re-index

In Lucene, I'm using payloads to store information for each token in a document (a float value in my case). From time to time, those payloads may need to be updated. If I know the docID, termID, offset, etc., is there any way for me to update the payloads in place without having to re-index the whole document?

Best answer

I'm not aware of any Lucene API that supports this. Even an "update" operation is executed under the hood as a "delete" followed by an "add".

A workaround that requires more storage but reduces IO and latency is to store the whole source of each document, either in the Lucene index itself or in a dedicated data store on the same node as the Lucene index. You can then send only the updated payload information to your application, which rebuilds the document locally from the stored source. The whole document still needs to be re-indexed, though.
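A minimal sketch of that workaround, in plain Java without any Lucene dependency. Everything here is hypothetical and only for illustration: the class `PayloadPatchSketch`, the in-memory `sourceStore`, and the `reindexer` callback. The callback stands in for whatever actually writes to the index; in a real application it would probably wrap a call such as `IndexWriter.updateDocument`, which deletes the old document and adds the rebuilt one.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch: patch payloads against a locally stored source, then re-index the whole document. */
final class PayloadPatchSketch {

    /** Local source store (assumption): document id -> (token position -> float payload). */
    private final Map<String, Map<Integer, Float>> sourceStore = new HashMap<>();

    /** Stand-in for the index writer; receives the complete document on every change. */
    private final BiConsumer<String, Map<Integer, Float>> reindexer;

    PayloadPatchSketch(BiConsumer<String, Map<Integer, Float>> reindexer) {
        this.reindexer = reindexer;
    }

    /** Stores the full source locally and indexes the document for the first time. */
    void add(String docId, Map<Integer, Float> payloads) {
        Map<Integer, Float> copy = new LinkedHashMap<>(payloads);
        sourceStore.put(docId, copy);
        reindexer.accept(docId, new LinkedHashMap<>(copy));
    }

    /**
     * Applies a small patch (only the changed payloads travel over the wire),
     * then hands the complete, rebuilt document to the reindexer.
     */
    Map<Integer, Float> applyPatch(String docId, Map<Integer, Float> patch) {
        Map<Integer, Float> full = sourceStore.get(docId);
        if (full == null) {
            throw new IllegalArgumentException("Unknown document: " + docId);
        }
        full.putAll(patch);                       // update the stored source in place
        Map<Integer, Float> snapshot = new LinkedHashMap<>(full);
        reindexer.accept(docId, snapshot);        // still a full delete + add of the document
        return snapshot;
    }
}
```

The saving is in what the client has to send, not in what the index has to do: the patch contains only the changed payloads, but the reindexer is always handed the full document.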

See also: "How to set a field to keep a row unique in lucene?"