[clangd] Introduce Dex symbol index search tokens
This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol: examples of
search token types (Token Namespaces) are
* Trigrams - these are essential for unqualified symbol name fuzzy
  search
* Scopes for filtering the symbols by the namespace
* Paths, e.g. these can be used to uprank symbols defined close to the
  edited file
This patch outlines the generic structure for such token namespaces, but
only implements trigram generation.
The intuition behind the trigram generation algorithm is that each
extracted trigram is a valid sequence for Fuzzy Matcher jumps; the proposed
implementation utilizes the existing FuzzyMatcher API for segmentation and
trigram extraction.
However, the trigram generation algorithm for the query string is different
from the previous one: it simply yields sequences of 3 consecutive
lowercased valid characters (letters, digits).
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: sammccall, ioeric, ilya-biryukovA
Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman
Differential Revision: https://reviews.llvm.org/D49591
llvm-svn: 337901
2018-07-25 10:34:57 +00:00
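The query-side rule described in the commit message above (sequences of 3 consecutive lowercased letters and digits) can be sketched roughly as follows. This is a minimal illustration, not the clangd implementation: `queryTrigramsSketch` is a hypothetical name, and the sketch assumes that characters other than letters and digits are dropped before the 3-character window is applied.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration only; this is not the clangd API.
// Assumption: characters other than letters and digits are dropped before
// the 3-character window slides over the remaining text.
std::vector<std::string> queryTrigramsSketch(const std::string &Query) {
  // Keep only letters and digits, lowercased.
  std::string Valid;
  for (unsigned char C : Query)
    if (std::isalnum(C))
      Valid.push_back(static_cast<char>(std::tolower(C)));

  // Slide a window of width 3 over the filtered string.
  std::vector<std::string> Trigrams;
  for (std::size_t I = 0; I + 3 <= Valid.size(); ++I) {
    std::string Trigram = Valid.substr(I, 3);
    // Keep each trigram once, in order of first appearance.
    if (std::find(Trigrams.begin(), Trigrams.end(), Trigram) == Trigrams.end())
      Trigrams.push_back(Trigram);
  }
  return Trigrams;
}
```

Under these assumptions a query shorter than three valid characters produces no trigrams at all, which is one reason the identifier side in the tests below also emits padded forms such as "n$$".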
//===-- DexIndexTests.cpp ----------------------------*- C++ -*-----------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#include "TestIndex.h"
#include "index/Index.h"
#include "index/Merge.h"
#include "index/dex/DexIndex.h"
#include "index/dex/Iterator.h"
#include "index/dex/Token.h"
#include "index/dex/Trigram.h"
#include "llvm/Support/ScopedPrinter.h"
#include "llvm/Support/raw_ostream.h"
#include "gmock/gmock.h"
#include "gtest/gtest.h"
#include <string>
#include <vector>

using ::testing::ElementsAre;
using ::testing::UnorderedElementsAre;
[clangd] Factor out the data-swapping functionality from MemIndex/DexIndex.
Summary:
This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.
Old and busted:
I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).
New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which
holds a MemIndex, which holds a shared_ptr<void>, which keeps backing
data alive.
I update by building a new MemIndex and calling SwapIndex::reset().
Reviewers: kbobyrev, ioeric
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51422
llvm-svn: 341318
2018-09-03 14:37:43 +00:00
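The ownership chain described in the commit message above can be sketched as follows. This is a simplified illustration under stated assumptions, not the clangd classes: `IndexSketch`, `ImmutableIndexSketch`, and `SwapIndexSketch` are hypothetical names, the "index" only reports its size, and the wrapper keeps a `shared_ptr` plus a mutex internally so that a reader could hold on to the old index while a new one is swapped in.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an index interface.
struct IndexSketch {
  virtual ~IndexSketch() = default;
  virtual std::size_t size() const = 0;
};

// Immutable index: it reads the symbols through a raw pointer and keeps the
// backing data alive through a type-erased shared_ptr<void>.
class ImmutableIndexSketch : public IndexSketch {
public:
  explicit ImmutableIndexSketch(std::shared_ptr<std::vector<std::string>> Data)
      : Symbols(Data.get()), KeepAlive(std::move(Data)) {}
  std::size_t size() const override { return Symbols->size(); }

private:
  // Declared before KeepAlive so it is initialized before Data is moved from.
  const std::vector<std::string> *Symbols;
  std::shared_ptr<void> KeepAlive;
};

// Wrapper that forwards to the current index and allows replacing it.
class SwapIndexSketch : public IndexSketch {
public:
  explicit SwapIndexSketch(std::unique_ptr<IndexSketch> Initial)
      : Index(std::move(Initial)) {}

  // Install a freshly built index; the old one is released after the lock.
  void reset(std::unique_ptr<IndexSketch> New) {
    std::shared_ptr<IndexSketch> Shared = std::move(New);
    std::lock_guard<std::mutex> Lock(Mutex);
    Index.swap(Shared);
  }

  std::size_t size() const override { return snapshot()->size(); }

private:
  // Copy the pointer under the lock so the index stays alive while in use.
  std::shared_ptr<IndexSketch> snapshot() const {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Index;
  }

  mutable std::mutex Mutex;
  std::shared_ptr<IndexSketch> Index;
};
```

With this shape, an update never mutates an existing index: the caller builds a new immutable index over new backing data and hands it to `reset()`.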
using namespace llvm;

namespace clang {
namespace clangd {
namespace dex {
namespace {
[clangd] Proof-of-concept query iterators for Dex symbol index
This patch introduces three essential types of query iterators:
`DocumentIterator`, `AndIterator`, `OrIterator`. It provides a
convenient API for query tree generation and serves as a building block
for the next generation symbol index - Dex. Currently, many
optimizations are missed to improve code readability and to serve as the
reference implementation. Potential improvements are briefly mentioned
in `FIXME`s and will be addressed in the following patches.
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
Iterators, their applications and potential extensions are explained in
detail in the design proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: ioeric, sammccall, ilya-biryukov
Subscribers: cfe-commits, klimek, jfb, mgrang, mgorny, MaskRay, jkorous,
arphaman
Differential Revision: https://reviews.llvm.org/D49546
llvm-svn: 338017
2018-07-26 10:42:31 +00:00
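What the AND and OR iterators from the commit message above compute over sorted posting lists can be sketched with two eager merge functions. This is only an illustration: the Dex iterators are lazy objects with `peek()`, `advance()`, and `advanceTo()`, whereas `intersectSketch` and `uniteSketch` (hypothetical names, as are the type aliases) materialize the whole result at once.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical aliases for illustration; inputs are assumed to be sorted and
// free of duplicates, like the posting lists in the tests below.
using DocIDSketch = std::uint32_t;
using PostingListSketch = std::vector<DocIDSketch>;

// AND: keep the IDs present in both lists.
PostingListSketch intersectSketch(const PostingListSketch &A,
                                  const PostingListSketch &B) {
  PostingListSketch Result;
  std::size_t I = 0, J = 0;
  while (I < A.size() && J < B.size()) {
    if (A[I] < B[J]) {
      ++I; // A is behind, catch up.
    } else if (B[J] < A[I]) {
      ++J; // B is behind, catch up.
    } else {
      Result.push_back(A[I]); // Both lists agree on this ID.
      ++I;
      ++J;
    }
  }
  return Result;
}

// OR: keep the IDs present in at least one list, each reported once.
PostingListSketch uniteSketch(const PostingListSketch &A,
                              const PostingListSketch &B) {
  PostingListSketch Result;
  std::size_t I = 0, J = 0;
  while (I < A.size() || J < B.size()) {
    if (J == B.size() || (I < A.size() && A[I] < B[J])) {
      Result.push_back(A[I++]);
    } else if (I == A.size() || B[J] < A[I]) {
      Result.push_back(B[J++]);
    } else {
      Result.push_back(A[I]); // Same ID in both lists.
      ++I;
      ++J;
    }
  }
  return Result;
}
```

Applied to the two lists used in the AndTwoLists and OrTwoLists tests, these functions produce the same ID sequences that the tests expect from `createAnd` and `createOr`.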

std::vector<DocID> consumeIDs(Iterator &It) {
  auto IDAndScore = consume(It);
  std::vector<DocID> IDs(IDAndScore.size());
  for (size_t I = 0; I < IDAndScore.size(); ++I)
    IDs[I] = IDAndScore[I].first;
  return IDs;
}

TEST(DexIndexIterators, DocumentIterator) {
  const PostingList L = {4, 7, 8, 20, 42, 100};
  auto DocIterator = create(L);

  EXPECT_EQ(DocIterator->peek(), 4U);
  EXPECT_FALSE(DocIterator->reachedEnd());

  DocIterator->advance();
  EXPECT_EQ(DocIterator->peek(), 7U);
  EXPECT_FALSE(DocIterator->reachedEnd());

  DocIterator->advanceTo(20);
  EXPECT_EQ(DocIterator->peek(), 20U);
  EXPECT_FALSE(DocIterator->reachedEnd());

  DocIterator->advanceTo(65);
  EXPECT_EQ(DocIterator->peek(), 100U);
  EXPECT_FALSE(DocIterator->reachedEnd());

  DocIterator->advanceTo(420);
  EXPECT_TRUE(DocIterator->reachedEnd());
}

TEST(DexIndexIterators, AndWithEmpty) {
  const PostingList L0;
  const PostingList L1 = {0, 5, 7, 10, 42, 320, 9000};

  auto AndEmpty = createAnd(create(L0));
  EXPECT_TRUE(AndEmpty->reachedEnd());

  auto AndWithEmpty = createAnd(create(L0), create(L1));
  EXPECT_TRUE(AndWithEmpty->reachedEnd());

  EXPECT_THAT(consumeIDs(*AndWithEmpty), ElementsAre());
}

TEST(DexIndexIterators, AndTwoLists) {
  const PostingList L0 = {0, 5, 7, 10, 42, 320, 9000};
  const PostingList L1 = {0, 4, 7, 10, 30, 60, 320, 9000};

  auto And = createAnd(create(L1), create(L0));

  EXPECT_FALSE(And->reachedEnd());
  EXPECT_THAT(consumeIDs(*And), ElementsAre(0U, 7U, 10U, 320U, 9000U));

  And = createAnd(create(L0), create(L1));

  And->advanceTo(0);
  EXPECT_EQ(And->peek(), 0U);
  And->advanceTo(5);
  EXPECT_EQ(And->peek(), 7U);
  And->advanceTo(10);
  EXPECT_EQ(And->peek(), 10U);
  And->advanceTo(42);
  EXPECT_EQ(And->peek(), 320U);
  And->advanceTo(8999);
  EXPECT_EQ(And->peek(), 9000U);
  And->advanceTo(9001);
}

TEST(DexIndexIterators, AndThreeLists) {
  const PostingList L0 = {0, 5, 7, 10, 42, 320, 9000};
  const PostingList L1 = {0, 4, 7, 10, 30, 60, 320, 9000};
  const PostingList L2 = {1, 4, 7, 11, 30, 60, 320, 9000};

  auto And = createAnd(create(L0), create(L1), create(L2));
  EXPECT_EQ(And->peek(), 7U);
  And->advanceTo(300);
  EXPECT_EQ(And->peek(), 320U);
  And->advanceTo(100000);

  EXPECT_TRUE(And->reachedEnd());
}

TEST(DexIndexIterators, OrWithEmpty) {
  const PostingList L0;
  const PostingList L1 = {0, 5, 7, 10, 42, 320, 9000};

  auto OrEmpty = createOr(create(L0));
  EXPECT_TRUE(OrEmpty->reachedEnd());

  auto OrWithEmpty = createOr(create(L0), create(L1));
  EXPECT_FALSE(OrWithEmpty->reachedEnd());

  EXPECT_THAT(consumeIDs(*OrWithEmpty),
              ElementsAre(0U, 5U, 7U, 10U, 42U, 320U, 9000U));
}

TEST(DexIndexIterators, OrTwoLists) {
  const PostingList L0 = {0, 5, 7, 10, 42, 320, 9000};
  const PostingList L1 = {0, 4, 7, 10, 30, 60, 320, 9000};

  auto Or = createOr(create(L0), create(L1));

  EXPECT_FALSE(Or->reachedEnd());
  EXPECT_EQ(Or->peek(), 0U);
  Or->advance();
  EXPECT_EQ(Or->peek(), 4U);
  Or->advance();
  EXPECT_EQ(Or->peek(), 5U);
  Or->advance();
  EXPECT_EQ(Or->peek(), 7U);
  Or->advance();
  EXPECT_EQ(Or->peek(), 10U);
  Or->advance();
  EXPECT_EQ(Or->peek(), 30U);
  Or->advanceTo(42);
  EXPECT_EQ(Or->peek(), 42U);
  Or->advanceTo(300);
  EXPECT_EQ(Or->peek(), 320U);
  Or->advanceTo(9000);
  EXPECT_EQ(Or->peek(), 9000U);
  Or->advanceTo(9001);
  EXPECT_TRUE(Or->reachedEnd());

  Or = createOr(create(L0), create(L1));

  EXPECT_THAT(consumeIDs(*Or),
              ElementsAre(0U, 4U, 5U, 7U, 10U, 30U, 42U, 60U, 320U, 9000U));
}

TEST(DexIndexIterators, OrThreeLists) {
  const PostingList L0 = {0, 5, 7, 10, 42, 320, 9000};
  const PostingList L1 = {0, 4, 7, 10, 30, 60, 320, 9000};
  const PostingList L2 = {1, 4, 7, 11, 30, 60, 320, 9000};

  auto Or = createOr(create(L0), create(L1), create(L2));

  EXPECT_FALSE(Or->reachedEnd());
  EXPECT_EQ(Or->peek(), 0U);

  Or->advance();
  EXPECT_EQ(Or->peek(), 1U);

  Or->advance();
  EXPECT_EQ(Or->peek(), 4U);

  Or->advanceTo(7);

  Or->advanceTo(59);
  EXPECT_EQ(Or->peek(), 60U);

  Or->advanceTo(9001);
  EXPECT_TRUE(Or->reachedEnd());
}

// FIXME(kbobyrev): The testcase below is similar to what is expected in real
// queries. It should be updated once new iterators (such as boosting, limiting,
// etc iterators) appear. However, it is not exhaustive and it would be
// beneficial to implement automatic generation (e.g. fuzzing) of query trees
// for more comprehensive testing.
TEST(DexIndexIterators, QueryTree) {
  //
  //                     +-----------------+
  //                     |And Iterator:1, 5|
  //                     +--------+--------+
  //                              |
  //                              |
  //                +-------------+----------------------+
  //                |                                    |
  //                |                                    |
  //     +----------v----------+              +----------v------------+
  //     |And Iterator: 1, 5, 9|              |Or Iterator: 0, 1, 3, 5|
  //     +----------+----------+              +----------+------------+
  //                |                                    |
  //         +------+-----+                   +---------------------+
  //         |            |                   |         |           |
  // +-------v-----+ +----+---+            +--v--+  +---v----+ +----v---+
  // |1, 3, 5, 8, 9| |Boost: 2|            |Empty|  |Boost: 3| |Boost: 4|
  // +-------------+ +----+---+            +-----+  +---+----+ +----+---+
  //                      |                             |           |
  //                 +----v-----+                     +-v--+    +---v---+
  //                 |1, 5, 7, 9|                     |1, 5|    |0, 3, 5|
  //                 +----------+                     +----+    +-------+
  //
  const PostingList L0 = {1, 3, 5, 8, 9};
  const PostingList L1 = {1, 5, 7, 9};
  const PostingList L3;
  const PostingList L4 = {1, 5};
  const PostingList L5 = {0, 3, 5};

  // Root of the query tree: [1, 5]
  auto Root = createAnd(
      // Lower And Iterator: [1, 5, 9]
      createAnd(create(L0), createBoost(create(L1), 2U)),
      // Lower Or Iterator: [0, 1, 3, 5]
      createOr(create(L3), createBoost(create(L4), 3U),
               createBoost(create(L5), 4U)));

  EXPECT_FALSE(Root->reachedEnd());
  EXPECT_EQ(Root->peek(), 1U);
  Root->advanceTo(0);
  // Advance multiple times. Shouldn't do anything.
  Root->advanceTo(1);
  Root->advanceTo(0);
  EXPECT_EQ(Root->peek(), 1U);
  auto ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 6);
  Root->advance();
  EXPECT_EQ(Root->peek(), 5U);
  Root->advanceTo(5);
  EXPECT_EQ(Root->peek(), 5U);
  ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 8);
  Root->advanceTo(9000);
  EXPECT_TRUE(Root->reachedEnd());
}

TEST(DexIndexIterators, StringRepresentation) {
  const PostingList L0 = {4, 7, 8, 20, 42, 100};
  const PostingList L1 = {1, 3, 5, 8, 9};
  const PostingList L2 = {1, 5, 7, 9};
  const PostingList L3 = {0, 5};
  const PostingList L4 = {0, 1, 5};
  const PostingList L5;

  EXPECT_EQ(llvm::to_string(*(create(L0))), "[{4}, 7, 8, 20, 42, 100, END]");

  auto Nested = createAnd(createAnd(create(L1), create(L2)),
                          createOr(create(L3), create(L4), create(L5)));

  EXPECT_EQ(llvm::to_string(*Nested),
            "(& (| [0, {5}, END] [0, {1}, 5, END] [{END}]) (& [{1}, 5, 7, 9, "
            "END] [{1}, 3, 5, 8, 9, END]))");
}

TEST(DexIndexIterators, Limit) {
  const PostingList L0 = {3, 6, 7, 20, 42, 100};
  const PostingList L1 = {1, 3, 5, 6, 7, 30, 100};
  const PostingList L2 = {0, 3, 5, 7, 8, 100};

  auto DocIterator = createLimit(create(L0), 42);
  EXPECT_THAT(consumeIDs(*DocIterator), ElementsAre(3, 6, 7, 20, 42, 100));

  DocIterator = createLimit(create(L0), 3);
  EXPECT_THAT(consumeIDs(*DocIterator), ElementsAre(3, 6, 7));

  DocIterator = createLimit(create(L0), 0);
  EXPECT_THAT(consumeIDs(*DocIterator), ElementsAre());

  auto AndIterator =
      createAnd(createLimit(createTrue(9000), 343), createLimit(create(L0), 2),
                createLimit(create(L1), 3), createLimit(create(L2), 42));
  EXPECT_THAT(consumeIDs(*AndIterator), ElementsAre(3, 7));
}

TEST(DexIndexIterators, True) {
  auto TrueIterator = createTrue(0U);
  EXPECT_TRUE(TrueIterator->reachedEnd());
  EXPECT_THAT(consumeIDs(*TrueIterator), ElementsAre());

  PostingList L0 = {1, 2, 5, 7};
  TrueIterator = createTrue(7U);
  EXPECT_THAT(TrueIterator->peek(), 0);
  auto AndIterator = createAnd(create(L0), move(TrueIterator));
  EXPECT_FALSE(AndIterator->reachedEnd());
  EXPECT_THAT(consumeIDs(*AndIterator), ElementsAre(1, 2, 5));
}

TEST(DexIndexIterators, Boost) {
  auto BoostIterator = createBoost(createTrue(5U), 42U);
  EXPECT_FALSE(BoostIterator->reachedEnd());
  auto ElementBoost = BoostIterator->consume();
  EXPECT_THAT(ElementBoost, 42U);

  const PostingList L0 = {2, 4};
  const PostingList L1 = {1, 4};
  auto Root = createOr(createTrue(5U), createBoost(create(L0), 2U),
                       createBoost(create(L1), 3U));

  ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, Iterator::DEFAULT_BOOST_SCORE);
  Root->advance();
  EXPECT_THAT(Root->peek(), 1U);
  ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 3);

  Root->advance();
  EXPECT_THAT(Root->peek(), 2U);
  ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 2);

  Root->advanceTo(4);
  ElementBoost = Root->consume();
  EXPECT_THAT(ElementBoost, 3);
}

testing::Matcher<std::vector<Token>>
trigramsAre(std::initializer_list<std::string> Trigrams) {
  std::vector<Token> Tokens;
  for (const auto &Symbols : Trigrams) {
    Tokens.push_back(Token(Token::Kind::Trigram, Symbols));
  }
  return testing::UnorderedElementsAreArray(Tokens);
}

TEST(DexIndexTrigrams, IdentifierTrigrams) {
  EXPECT_THAT(generateIdentifierTrigrams("X86"),
              trigramsAre({"x86", "x$$", "x8$"}));

  EXPECT_THAT(generateIdentifierTrigrams("nl"), trigramsAre({"nl$", "n$$"}));

  EXPECT_THAT(generateIdentifierTrigrams("n"), trigramsAre({"n$$"}));

  EXPECT_THAT(generateIdentifierTrigrams("clangd"),
              trigramsAre({"c$$", "cl$", "cla", "lan", "ang", "ngd"}));

  EXPECT_THAT(generateIdentifierTrigrams("abc_def"),
              trigramsAre({"a$$", "abc", "abd", "ade", "bcd", "bde", "cde",
                           "def", "ab$", "ad$"}));

  EXPECT_THAT(generateIdentifierTrigrams("a_b_c_d_e_"),
              trigramsAre({"a$$", "a_$", "a_b", "abc", "abd", "acd", "ace",
                           "bcd", "bce", "bde", "cde", "ab$"}));
|
|
|
|
2018-08-13 08:57:06 +00:00
|
|
|
EXPECT_THAT(generateIdentifierTrigrams("unique_ptr"),
|
|
|
|
trigramsAre({"u$$", "uni", "unp", "upt", "niq", "nip", "npt",
|
|
|
|
"iqu", "iqp", "ipt", "que", "qup", "qpt", "uep",
|
2018-08-27 17:26:43 +00:00
|
|
|
"ept", "ptr", "un$", "up$"}));
|
[clangd] Introduce Dex symbol index search tokens
This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol: examples of
search token types (Token Namespaces) are
* Trigrams - these are essential for unqualified symbol name fuzzy
search * Scopes for filtering the symbols by the namespace * Paths, e.g.
these can be used to uprank symbols defined close to the edited file
This patch outlines the generic for such token namespaces, but only
implements trigram generation.
The intuition behind trigram generation algorithm is that each extracted
trigram is a valid sequence for Fuzzy Matcher jumps, proposed
implementation utilize existing FuzzyMatcher API for segmentation and
trigram extraction.
However, trigrams generation algorithm for the query string is different
from the previous one: it simply yields sequences of 3 consecutive
lowercased valid characters (letters, digits).
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: sammccall, ioeric, ilya-biryukovA
Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman
Differential Revision: https://reviews.llvm.org/D49591
llvm-svn: 337901
2018-07-25 10:34:57 +00:00
|
|
|
|
2018-08-27 17:26:43 +00:00
|
|
|
EXPECT_THAT(
|
|
|
|
generateIdentifierTrigrams("TUDecl"),
|
|
|
|
trigramsAre({"t$$", "tud", "tde", "ude", "dec", "ecl", "tu$", "td$"}));
|
[clangd] Introduce Dex symbol index search tokens
This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol: examples of
search token types (Token Namespaces) are
* Trigrams - these are essential for unqualified symbol name fuzzy
search * Scopes for filtering the symbols by the namespace * Paths, e.g.
these can be used to uprank symbols defined close to the edited file
This patch outlines the generic for such token namespaces, but only
implements trigram generation.
The intuition behind trigram generation algorithm is that each extracted
trigram is a valid sequence for Fuzzy Matcher jumps, proposed
implementation utilize existing FuzzyMatcher API for segmentation and
trigram extraction.
However, trigrams generation algorithm for the query string is different
from the previous one: it simply yields sequences of 3 consecutive
lowercased valid characters (letters, digits).
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: sammccall, ioeric, ilya-biryukovA
Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman
Differential Revision: https://reviews.llvm.org/D49591
llvm-svn: 337901
2018-07-25 10:34:57 +00:00
|
|
|
|
|
|
|
EXPECT_THAT(generateIdentifierTrigrams("IsOK"),
|
2018-08-27 17:26:43 +00:00
|
|
|
trigramsAre({"i$$", "iso", "iok", "sok", "is$", "io$"}));
|
2018-08-13 08:57:06 +00:00
|
|
|
|
|
|
|
EXPECT_THAT(
|
|
|
|
generateIdentifierTrigrams("abc_defGhij__klm"),
|
|
|
|
trigramsAre({"a$$", "abc", "abd", "abg", "ade", "adg", "adk", "agh",
|
|
|
|
"agk", "bcd", "bcg", "bde", "bdg", "bdk", "bgh", "bgk",
|
|
|
|
"cde", "cdg", "cdk", "cgh", "cgk", "def", "deg", "dek",
|
|
|
|
"dgh", "dgk", "dkl", "efg", "efk", "egh", "egk", "ekl",
|
|
|
|
"fgh", "fgk", "fkl", "ghi", "ghk", "gkl", "hij", "hik",
|
2018-08-27 17:26:43 +00:00
|
|
|
"hkl", "ijk", "ikl", "jkl", "klm", "ab$", "ad$"}));
|
[clangd] Introduce Dex symbol index search tokens
This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol: examples of
search token types (Token Namespaces) are
* Trigrams - these are essential for unqualified symbol name fuzzy
search * Scopes for filtering the symbols by the namespace * Paths, e.g.
these can be used to uprank symbols defined close to the edited file
This patch outlines the generic for such token namespaces, but only
implements trigram generation.
The intuition behind trigram generation algorithm is that each extracted
trigram is a valid sequence for Fuzzy Matcher jumps, proposed
implementation utilize existing FuzzyMatcher API for segmentation and
trigram extraction.
However, trigrams generation algorithm for the query string is different
from the previous one: it simply yields sequences of 3 consecutive
lowercased valid characters (letters, digits).
Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html
The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj
Reviewers: sammccall, ioeric, ilya-biryukovA
Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman
Differential Revision: https://reviews.llvm.org/D49591
llvm-svn: 337901
2018-07-25 10:34:57 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
TEST(DexIndexTrigrams, QueryTrigrams) {
  EXPECT_THAT(generateQueryTrigrams("c"), trigramsAre({"c$$"}));
  EXPECT_THAT(generateQueryTrigrams("cl"), trigramsAre({"cl$"}));
  EXPECT_THAT(generateQueryTrigrams("cla"), trigramsAre({"cla"}));

  EXPECT_THAT(generateQueryTrigrams("_"), trigramsAre({"_$$"}));
  EXPECT_THAT(generateQueryTrigrams("__"), trigramsAre({"__$"}));
  EXPECT_THAT(generateQueryTrigrams("___"), trigramsAre({"___"}));

  EXPECT_THAT(generateQueryTrigrams("X86"), trigramsAre({"x86"}));

  EXPECT_THAT(generateQueryTrigrams("clangd"),
              trigramsAre({"cla", "lan", "ang", "ngd"}));

  EXPECT_THAT(generateQueryTrigrams("abc_def"),
              trigramsAre({"abc", "bcd", "cde", "def"}));

  EXPECT_THAT(generateQueryTrigrams("a_b_c_d_e_"),
              trigramsAre({"abc", "bcd", "cde"}));

  EXPECT_THAT(generateQueryTrigrams("unique_ptr"),
              trigramsAre({"uni", "niq", "iqu", "que", "uep", "ept", "ptr"}));

  EXPECT_THAT(generateQueryTrigrams("TUDecl"),
              trigramsAre({"tud", "ude", "dec", "ecl"}));

  EXPECT_THAT(generateQueryTrigrams("IsOK"), trigramsAre({"iso", "sok"}));

  EXPECT_THAT(generateQueryTrigrams("abc_defGhij__klm"),
              trigramsAre({"abc", "bcd", "cde", "def", "efg", "fgh", "ghi",
                           "hij", "ijk", "jkl", "klm"}));
}
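
The QueryTrigrams expectations above can be reproduced with a small standalone function. What follows is a minimal sketch, not clangd's implementation: `sketchQueryTrigrams` and `checkSketchAgainstTests` are hypothetical names, the result is a set of plain strings rather than Dex tokens, and only ASCII letters and digits are assumed to count as valid identifier characters. The behaviour for an empty query is a choice made by this sketch; the test above does not cover that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical stand-in for generateQueryTrigrams(): returns the trigram
// strings the QueryTrigrams test expects for a given query.
std::set<std::string> sketchQueryTrigrams(const std::string &Query) {
  std::set<std::string> Result;
  if (Query.empty())
    return Result; // assumption of this sketch, not covered by the test

  std::string Lower;
  for (unsigned char C : Query)
    Lower.push_back(static_cast<char>(std::tolower(C)));

  // Count the characters that can take part in a full trigram.
  std::size_t Valid = 0;
  for (unsigned char C : Lower)
    if (std::isalnum(C))
      ++Valid;

  if (Valid < 3) {
    // Too few valid characters: keep up to three leading characters
    // (separators included) and pad with the '$' end marker.
    std::string Chars = Lower.substr(0, std::min<std::size_t>(3, Lower.size()));
    Chars.append(3 - Chars.size(), '$');
    Result.insert(Chars);
    return Result;
  }

  // Otherwise slide a window of three over the valid characters only.
  std::string Window;
  for (unsigned char C : Lower) {
    if (!std::isalnum(C))
      continue; // skip separators such as '_'
    Window.push_back(static_cast<char>(C));
    if (Window.size() > 3)
      Window.erase(Window.begin());
    if (Window.size() == 3)
      Result.insert(Window);
  }
  return Result;
}

// Usage: a few of the cases from the test, checked with plain assert().
void checkSketchAgainstTests() {
  using Set = std::set<std::string>;
  assert((sketchQueryTrigrams("cl") == Set{"cl$"}));
  assert((sketchQueryTrigrams("___") == Set{"___"}));
  assert((sketchQueryTrigrams("X86") == Set{"x86"}));
  assert((sketchQueryTrigrams("abc_def") == Set{"abc", "bcd", "cde", "def"}));
}
```

The two branches mirror the split visible in the expectations: short or separator-only queries yield a single padded trigram, while longer queries ignore separators entirely, which is why "abc_def" produces "bcd" and "cde".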
TEST(DexIndex, Lookup) {
[clangd] Factor out the data-swapping functionality from MemIndex/DexIndex.
Summary:
This is now handled by a wrapper class SwapIndex, so MemIndex/DexIndex can be
immutable and focus on their job.
Old and busted:
I have a MemIndex, which holds a shared_ptr<vector<Symbol*>>, which keeps the
symbol slab alive. I update by calling build(shared_ptr<vector<Symbol*>>).
New hotness: I have a SwapIndex, which holds a unique_ptr<SymbolIndex>, which
holds a MemIndex, which holds a shared_ptr<void>, which keeps backing
data alive.
I update by building a new MemIndex and calling SwapIndex::reset().
Reviewers: kbobyrev, ioeric
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51422
llvm-svn: 341318
2018-09-03 14:37:43 +00:00
  auto I = DexIndex::build(generateSymbols({"ns::abc", "ns::xyz"}));
  EXPECT_THAT(lookup(*I, SymbolID("ns::abc")), UnorderedElementsAre("ns::abc"));
  EXPECT_THAT(lookup(*I, {SymbolID("ns::abc"), SymbolID("ns::xyz")}),
              UnorderedElementsAre("ns::abc", "ns::xyz"));
  EXPECT_THAT(lookup(*I, {SymbolID("ns::nonono"), SymbolID("ns::xyz")}),
              UnorderedElementsAre("ns::xyz"));
  EXPECT_THAT(lookup(*I, SymbolID("ns::nonono")), UnorderedElementsAre());
}
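
The commit message attached to the Lookup test describes an immutable index that is replaced wholesale through a wrapper rather than mutated in place. A minimal sketch of that ownership pattern follows. `ToyIndex`, `ToySwapIndex` and `demoSwap` are hypothetical names and not clangd's classes. Holding the current index in a `std::shared_ptr` guarded by a mutex is an assumption of this sketch, chosen so that a reader who already took a snapshot keeps the old data alive.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical immutable index: its contents are fixed at construction.
class ToyIndex {
public:
  explicit ToyIndex(std::vector<std::string> Names) : Names(std::move(Names)) {}

  bool contains(const std::string &Name) const {
    for (const std::string &N : Names)
      if (N == Name)
        return true;
    return false;
  }

private:
  const std::vector<std::string> Names;
};

// Hypothetical wrapper in the spirit of SwapIndex: updates never touch an
// existing index, they install a freshly built one. Assumes a non-null index.
class ToySwapIndex {
public:
  explicit ToySwapIndex(std::shared_ptr<const ToyIndex> Initial)
      : Current(std::move(Initial)) {}

  // Install a new index; snapshots taken earlier remain valid.
  void reset(std::shared_ptr<const ToyIndex> New) {
    std::lock_guard<std::mutex> Lock(Mutex);
    Current = std::move(New);
  }

  // Copy the pointer under the lock, then query without holding it.
  std::shared_ptr<const ToyIndex> snapshot() const {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Current;
  }

  bool contains(const std::string &Name) const {
    return snapshot()->contains(Name);
  }

private:
  mutable std::mutex Mutex;
  std::shared_ptr<const ToyIndex> Current;
};

// Usage: build, snapshot, swap, and observe that the old snapshot survives.
void demoSwap() {
  ToySwapIndex Index(
      std::make_shared<ToyIndex>(std::vector<std::string>{"ns::abc"}));
  std::shared_ptr<const ToyIndex> Old = Index.snapshot();
  Index.reset(std::make_shared<ToyIndex>(std::vector<std::string>{"ns::xyz"}));
  assert(Index.contains("ns::xyz"));
  assert(!Index.contains("ns::abc"));
  assert(Old->contains("ns::abc"));
}
```

This is also why the tests in this file call a static `build` and dereference the result: each index is constructed once from its symbols and never updated afterwards.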
TEST(DexIndex, FuzzyFind) {
  auto Index = DexIndex::build(
      generateSymbols({"ns::ABC", "ns::BCD", "::ABC", "ns::nested::ABC",
                       "other::ABC", "other::A"}));
  FuzzyFindRequest Req;
  Req.Query = "ABC";
  Req.Scopes = {"ns::"};
  EXPECT_THAT(match(*Index, Req), UnorderedElementsAre("ns::ABC"));
  Req.Scopes = {"ns::", "ns::nested::"};
  EXPECT_THAT(match(*Index, Req),
              UnorderedElementsAre("ns::ABC", "ns::nested::ABC"));
  Req.Query = "A";
  Req.Scopes = {"other::"};
  EXPECT_THAT(match(*Index, Req),
              UnorderedElementsAre("other::A", "other::ABC"));
  Req.Query = "";
  Req.Scopes = {};
  EXPECT_THAT(match(*Index, Req),
              UnorderedElementsAre("ns::ABC", "ns::BCD", "::ABC",
                                   "ns::nested::ABC", "other::ABC",
                                   "other::A"));
}

TEST(DexIndexTest, FuzzyMatchQ) {
  auto I = DexIndex::build(
      generateSymbols({"LaughingOutLoud", "LionPopulation", "LittleOldLady"}));
  FuzzyFindRequest Req;
  Req.Query = "lol";
  Req.MaxCandidateCount = 2;
  EXPECT_THAT(match(*I, Req),
              UnorderedElementsAre("LaughingOutLoud", "LittleOldLady"));
}

// FIXME(kbobyrev): This test is different for DexIndex and MemIndex: while
// MemIndex manages response deduplication, DexIndex simply returns all matched
// symbols which means there might be equivalent symbols in the response.
// Before drop-in replacement of MemIndex with DexIndex happens, FileIndex
// should handle deduplication instead.
TEST(DexIndexTest, DexIndexDeduplicate) {
  std::vector<Symbol> Symbols = {symbol("1"), symbol("2"), symbol("3"),
                                 symbol("2") /* duplicate */};
  FuzzyFindRequest Req;
  Req.Query = "2";
  DexIndex I(Symbols);
  EXPECT_THAT(match(I, Req), ElementsAre("2", "2"));
}

TEST(DexIndexTest, DexIndexLimitedNumMatches) {
  auto I = DexIndex::build(generateNumSymbols(0, 100));
  FuzzyFindRequest Req;
  Req.Query = "5";
  Req.MaxCandidateCount = 3;
  bool Incomplete;
  auto Matches = match(*I, Req, &Incomplete);
  EXPECT_EQ(Matches.size(), Req.MaxCandidateCount);
  EXPECT_TRUE(Incomplete);
}

TEST(DexIndexTest, FuzzyMatch) {
  auto I = DexIndex::build(
      generateSymbols({"LaughingOutLoud", "LionPopulation", "LittleOldLady"}));
  FuzzyFindRequest Req;
  Req.Query = "lol";
  Req.MaxCandidateCount = 2;
  EXPECT_THAT(match(*I, Req),
              UnorderedElementsAre("LaughingOutLoud", "LittleOldLady"));
}

TEST(DexIndexTest, MatchQualifiedNamesWithoutSpecificScope) {
  auto I = DexIndex::build(generateSymbols({"a::y1", "b::y2", "y3"}));
  FuzzyFindRequest Req;
  Req.Query = "y";
  EXPECT_THAT(match(*I, Req), UnorderedElementsAre("a::y1", "b::y2", "y3"));
}

TEST(DexIndexTest, MatchQualifiedNamesWithGlobalScope) {
  auto I = DexIndex::build(generateSymbols({"a::y1", "b::y2", "y3"}));
  FuzzyFindRequest Req;
  Req.Query = "y";
  Req.Scopes = {""};
  EXPECT_THAT(match(*I, Req), UnorderedElementsAre("y3"));
}

TEST(DexIndexTest, MatchQualifiedNamesWithOneScope) {
|
|
|
|
auto I = DexIndex::build(
|
|
|
|
generateSymbols({"a::y1", "a::y2", "a::x", "b::y2", "y3"}));
|
2018-08-20 14:39:32 +00:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "y";
|
|
|
|
Req.Scopes = {"a::"};
|
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("a::y1", "a::y2"));
|
2018-08-20 14:39:32 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
TEST(DexIndexTest, MatchQualifiedNamesWithMultipleScopes) {
|
|
|
|
auto I = DexIndex::build(
|
|
|
|
generateSymbols({"a::y1", "a::y2", "a::x", "b::y3", "y3"}));
|
2018-08-20 14:39:32 +00:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "y";
|
|
|
|
Req.Scopes = {"a::", "b::"};
|
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("a::y1", "a::y2", "b::y3"));
|
2018-08-20 14:39:32 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
TEST(DexIndexTest, NoMatchNestedScopes) {
|
|
|
|
auto I = DexIndex::build(generateSymbols({"a::y1", "a::b::y2"}));
|
2018-08-20 14:39:32 +00:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "y";
|
|
|
|
Req.Scopes = {"a::"};
|
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("a::y1"));
|
2018-08-20 14:39:32 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
TEST(DexIndexTest, IgnoreCases) {
|
|
|
|
auto I = DexIndex::build(generateSymbols({"ns::ABC", "ns::abc"}));
|
2018-08-20 14:39:32 +00:00
|
|
|
FuzzyFindRequest Req;
|
|
|
|
Req.Query = "AB";
|
|
|
|
Req.Scopes = {"ns::"};
|
|
|
|
EXPECT_THAT(match(*I, Req), UnorderedElementsAre("ns::ABC", "ns::abc"));
|
2018-08-20 14:39:32 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
TEST(DexIndexTest, Lookup) {
|
|
|
|
auto I = DexIndex::build(generateSymbols({"ns::abc", "ns::xyz"}));
|
|
|
|
EXPECT_THAT(lookup(*I, SymbolID("ns::abc")), UnorderedElementsAre("ns::abc"));
|
|
|
|
EXPECT_THAT(lookup(*I, {SymbolID("ns::abc"), SymbolID("ns::xyz")}),
|
2018-08-20 14:39:32 +00:00
|
|
|
UnorderedElementsAre("ns::abc", "ns::xyz"));
|
|
|
|
EXPECT_THAT(lookup(*I, {SymbolID("ns::nonono"), SymbolID("ns::xyz")}),
|
2018-08-20 14:39:32 +00:00
|
|
|
UnorderedElementsAre("ns::xyz"));
|
|
|
|
EXPECT_THAT(lookup(*I, SymbolID("ns::nonono")), UnorderedElementsAre());
|
2018-08-20 14:39:32 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
} // namespace
|
|
|
|
} // namespace dex
|
|
|
|
} // namespace clangd
|
|
|
|
} // namespace clang
|