Indexed I/O users have a new way to find the right set of documents and narrow a collection down to what they really want to see. We are excited to announce that we have integrated “More Like This” (“MLT”) automatic search functionality.
Unlike ‘near duplicate’ detection, MLT searches for items that have similar content. Users can find items that are ‘like’ the item currently being viewed, without those items needing to be nearly identical. Because MLT finds similar items, you don’t miss important, highly relevant documents in your search.
When a user opens a document in the document viewer, a ‘More Like This’ search is automatically executed on that document. All items that meet the ‘More Like This’ threshold are automatically returned under the ‘Related Items’ tab in the ‘Similar’ section. They are sorted top to bottom, with the most ‘like’ item at the top.
So how does it work?
The standard similarity algorithm used by our search engine is known as TF/IDF, or Term Frequency/Inverse Document Frequency, which takes the following factors into account:
Term frequency
How often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
Inverse document frequency
How often does each term appear in the entire set/index? The more often, the less relevant. Terms that appear in many documents have a lower weight than more uncommon terms.
Document length
How long is the text of the document? The longer it is, the less likely it is that any given word in the document will be relevant. A term appearing in a document with less text carries more weight than the same term appearing in a document with a lot of text.
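To make these factors concrete, here is a minimal Python sketch of how they could combine into a single similarity score. It is an illustration only, not the code our search engine runs: the function names, the naive whitespace tokenizer, and the specific formulas (square-root term frequency, logarithmic inverse document frequency, and a one-over-square-root length factor, one common choice of weighting) are all assumptions made for this example.

```python
import math


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_frequency(term, tokens):
    # More occurrences -> higher weight; the square root dampens the growth.
    return math.sqrt(tokens.count(term))


def inverse_document_frequency(term, corpus_tokens):
    # Terms that appear in many documents get a lower weight.
    doc_freq = sum(1 for tokens in corpus_tokens if term in tokens)
    return 1.0 + math.log(len(corpus_tokens) / (doc_freq + 1))


def length_factor(tokens):
    # Shorter documents give each matching term more weight.
    return 1.0 / math.sqrt(len(tokens)) if tokens else 0.0


def similar_items(current, others, threshold=0.0):
    """Score each document in `others` against the terms of `current`.

    Returns (score, document) pairs above `threshold` (a hypothetical
    cut-off), sorted with the most 'like' item first.
    """
    query_terms = set(tokenize(current))
    corpus = [tokenize(doc) for doc in others]
    scored = []
    for doc, tokens in zip(others, corpus):
        score = sum(
            term_frequency(term, tokens)
            * inverse_document_frequency(term, corpus)
            * length_factor(tokens)
            for term in query_terms
        )
        if score > threshold:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    viewed = "merger agreement between acme and globex"
    collection = [
        "acme globex merger agreement draft",
        "lunch menu for friday",
    ]
    for score, doc in similar_items(viewed, collection):
        print(f"{score:.3f}  {doc}")
```

In this sketch, a document that shares no terms with the one being viewed scores zero and is dropped, while a short document that shares several of its terms rises to the top of the list.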
The great part is this happens automatically when you view a document. All ‘Similar’ items are returned and made available under the ‘Related Items’ tab.
“More Like This” or “MLT” gives our users yet another great tool to narrow and focus on the important stuff.