# Parallel_preorder sample Example that uses `parallel_for_each` to do parallel preorder traversal of a sparse graph. Each vertex in the graph is called a "cell". Each cell has a value. The value is a matrix. Some of the cells have operators that compute the cell's value, using other cell's values as input. A cell that uses the value of cell `x` is called a successor of `x`. The algorithm works as follows. 1. Compute the set of cells that have no inputs. This set is called `root_set`. 2. Each cell has an associated field `ref_count` that is an atomic integer. Initialize `ref_count` to the number of inputs for the `Cell`. 3. Update each cell in `root_set`, by applying a `parallel_for_each` to a `root_set`. 4. After updating a cell, for each of its successors 1. Atomically decrement the successor's ref_count 2. If the count became zero, add the cell to the set of cells to be updated, by calling `feeder::add`. The times printed are for the traversal and update, and do not include time for computing the `root_set`. The example is using custom synchronization via ref_count atomic variable. Correctness checking tools might not take this into account, and report data races between different tasks that are actually synchronized. **Note:** It is important to understand that this example is unlikely to show speedup if the cell values are changed to type "float". The reason is twofold. * The smaller value type causes each `Cell` to be significantly smaller than a cache line, which leads to false sharing conflicts. * The time to update the cells becomes very small, and consequently the overhead of `parallel_for_each` swamps the useful work. ## Building the example ``` cmake cmake --build . ``` ## Running the sample ### Predefined make targets * `make run_parallel_preorder` - executes the example with predefined parameters * `make perf_run_parallel_preorder` - executes the example with suggested parameters to measure the oneTBB performance * `make light_test_parallel_preorder` - executes the example with suggested parameters to reduce execution time. ### Application parameters Usage: ``` parallel_preorder [n-of-threads=value] [n-of-nodes=value] [n-of-traversals=value] [silent] [-h] [n-of-threads [n-of-nodes [n-of-traversals]]] ``` * `-h` - prints the help for command line options. * `n-of-threads` - the number of threads to use; a range of the form low\[:high\], where low and optional high are non-negative integers or `auto` for a platform-specific default number. * `n-of-nodes` - the number of nodes in the graph. Default value is 1000. * `n-of-traversals` - the number of times to evaluate the graph. Default value is 500. * `silent` - no output except elapsed time.