Parallel Programming: Assignment

Edwin Carlinet

2019

[UPDATES]

Updates will appear HERE.

Parallel Nearest Neighbor Retrieval

Given an initial set of 3D-points P, you have to design a datastucture that enables concurrent:

The data structure to implement is a the Quad-Tree than splits the search space in quads to enable efficient retrieval. An example in 2D is given below:

In this example, we have a quad-tree with Capacity = 1 which means each leaf can contain at most 1 point. A quad-tree of capacity = N means that a cell is split into 4 children when it contains more than N points. As a matter of performance, you may choose a proper N or limit the depth of tree.

The interface of the tree is the following:

static constexpr int K = 4; 
struct point
{
  short x, y, z;
};

using result_t = std::array<point, K>;


class IQuadTree
{
  public:
    IQuadTree() = default;

    // Initialize the tree with this list of points
    void init(const std::vector<point>& points);

    result_t    query(point p) const noexcept;
    void        insert(point p) noexcept;
    void        erase(point p) noexcept;
};

The points coordinates are short (16-bits signed int).

Query

The query operation must return the K-closest points w.r.t. euclidean distance. If there is a tie between two points, you can return any of them.

(Quoting from: https://stackoverflow.com/questions/41306122/nearest-neighbor-search-in-octree)

To find the point closest to a search point, or to get list of points in order of increasing distance, you can use a priority queue that can hold both points and internal nodes of the tree, which lets you remove them in order of distance.

For points (leaves), the distance is just the distance of the point from the search point. For internal nodes (octants), the distance is the smallest distance from the search point to any point that could possibly be in the octant.

Now, to search, you just put the root of the tree in the priority queue, and repeat:

1. Remove the head of the priority queue;
2. If the head of the queue was a point, then it is the closest point in the tree that you have not yet returned. You know this, because if any internal node could possibly have a closer point, then it would have been returned first from the priority queue;
3. If the head of the queue was an internal node, then put its children back into the queue

Insertion

The insertion must be able to run concurrently with the other operations. If the point is already in the set, this is a no-op.

Deletion

The deletion must be able to concurrently with the other operations. If the point is not in the set, this is a no-op.

Async interface

The async quad tree is exactly as a the normal quadtree except that each operation might be non-blocking.

class IAsyncQuadTree
{
  public:
    IAsyncQuadtree();

    // Initialize the tree with this list of points
    void init(const std::vector<point>& points);

    std::future<result_t> query(point p) const noexcept;
    std::future<void>     insert(point p) noexcept;
    std::future<void>     erase(point p) noexcept;
}

Note that the async executions may be non-deterministic, e.g.:

IAsyncQuadTree set;

// Run asynchronously (maybe in parallel)
auto q1 = set.insert({1,2,3});
auto q2 = set.erase({1,2,3});
auto q3 = set.query({1,2,3});

q1.get();
q2.get();
auto res = q3.get(); 

res may or may not contain the point (1,2,3) depending on the order of execution. However, with:

IAsyncQuadTree set;

auto q1 = set.insert({1,2,3});
q1.get(); // Block
auto q2 = set.erase({1,2,3});
q2.get(); // Block
auto q3 = set.query({1,2,3});
auto res = q3.get(); 

res does not contain the point (1,2,3).

Evaluation protocol

The objective is to process a list of operations as fast as possible.

Given a set of n points, and a sequence of m queries:

Hardware/software limitations

Where to code

Retrieve the archive prpa-login.tar.gz. A naive version of the project is already coded and can be used as a reference for tests and benchmarks (it uses std::vector with a global lock and the async version is actually blocking, nothing runs in parallel).

When building you get two executables: ./tests and ./bench

Tools

Some tool functions are provided:

Tests

You must adapt/augment the provided tests and add new ones!

$ ./tests 
Running main() from /builddir/build/BUILD/googletest-release-1.8.1/googletest/src/gtest_main.cc
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from Quadtree
[ RUN      ] Quadtree.Basic
[       OK ] Quadtree.Basic (0 ms)
[ RUN      ] Quadtree.ConcurrentOperations
[       OK ] Quadtree.ConcurrentOperations (1723 ms)
[ RUN      ] Quadtree.SimpleScenario
I: {
(13868,-18620,-293), (-6994,-18375,-2183), (-25777,-8008,736), (1303,-21291,-12444), (10215,16237,2111), (-24545,-14811,7682), (1946,20037,-3995), (-20190,-4542,-2674), }
S: (-22059,-5302,-1345) -> [(-20190,-4542,-2674) (-25777,-8008,736) (-24545,-14811,7682) (-6994,-18375,-2183) (1303,-21291,-12444) ]
A: (1303,-21291,-12444)
S: (3643,-25228,-11939) -> [(1303,-21291,-12444) (-6994,-18375,-2183) (13868,-18620,-293) (-20190,-4542,-2674) (-24545,-14811,7682) ]
S: (3411,21358,-3046) -> [(1946,20037,-3995) (10215,16237,2111) (-20190,-4542,-2674) (-6994,-18375,-2183) (13868,-18620,-293) ]
S: (17738,14622,-8341) -> [(10215,16237,2111) (1946,20037,-3995) (13868,-18620,-293) (1303,-21291,-12444) (-6994,-18375,-2183) ]
S: (14086,-17994,-90) -> [(13868,-18620,-293) (1303,-21291,-12444) (-6994,-18375,-2183) (10215,16237,2111) (-20190,-4542,-2674) ]
S: (-18809,-10450,-8719) -> [(-20190,-4542,-2674) (-25777,-8008,736) (-6994,-18375,-2183) (-24545,-14811,7682) (1303,-21291,-12444) ]
S: (-8953,-20278,1564) -> [(-6994,-18375,-2183) (1303,-21291,-12444) (-24545,-14811,7682) (-20190,-4542,-2674) (-25777,-8008,736) ]
D: (-24545,-14811,7682)
S: (5921,21258,-3475) -> [(1946,20037,-3995) (10215,16237,2111) (-20190,-4542,-2674) (13868,-18620,-293) (-6994,-18375,-2183) ]
A: (-14656,-14324,-6656)
S: (-24500,-6682,792) -> [(-25777,-8008,736) (-20190,-4542,-2674) (-14656,-14324,-6656) (-6994,-18375,-2183) (1303,-21291,-12444) ]
S: (-29892,-7905,-197) -> [(-25777,-8008,736) (-20190,-4542,-2674) (-14656,-14324,-6656) (-6994,-18375,-2183) (1303,-21291,-12444) ]
S: (10542,18196,3181) -> [(10215,16237,2111) (1946,20037,-3995) (13868,-18620,-293) (-20190,-4542,-2674) (-6994,-18375,-2183) ]
D: (1303,-21291,-12444)
D: (-14656,-14324,-6656)
S: (16962,20328,-6415) -> [(10215,16237,2111) (1946,20037,-3995) (13868,-18620,-293) (-20190,-4542,-2674) (-6994,-18375,-2183) ]
S: (4359,-22708,-12506) -> [(-6994,-18375,-2183) (13868,-18620,-293) (-20190,-4542,-2674) (-25777,-8008,736) (10215,16237,2111) ]
S: (-11353,-13944,-6799) -> [(-6994,-18375,-2183) (-20190,-4542,-2674) (-25777,-8008,736) (13868,-18620,-293) (1946,20037,-3995) ]
A: (-24545,-14811,7682)
[       OK ] Quadtree.SimpleScenario (3 ms)
[ RUN      ] Quadtree.AsyncConsistency
[       OK ] Quadtree.AsyncConsistency (2115 ms)
[----------] 4 tests from Quadtree (3841 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (3841 ms total)
[  PASSED  ] 4 tests.

Benchmarks

Benchmark will use Scenario::execute with 10000 queries whose behavior is the following.

$ ./bench

----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
BMScenario/Naive_NoAsync      11332 ms      11309 ms          1    8.84215 items/s
BMScenario/Naive_Async        11488 ms      11466 ms          1    8.72126 items/s

Submission

By group of 2 or 3 people! (not 1, not 4!)

You must submit to moodle.cri.epita.fr

Oral defense and due date