More Tilings

Let's look at more tilings that Gern allows programmers to write:

    // Outputs can be double tiled!
    Composable program({
        // More tiling!
        Tile(output["size"], t)(
            Tile(output["size"], t2)(
                add_1(input, temp),
                add_1(temp, output))),
    });

The program produces the output, where the computation is tiled twice:

void function_9(library::impl::ArrayCPU &input, library::impl::ArrayCPU &output, int64_t t, int64_t t2) {
    for (int64_t _gern_x_2_6_8 = 0; (_gern_x_2_6_8 < output.size); _gern_x_2_6_8 = (_gern_x_2_6_8 + t)) {
        for (int64_t _gern_x_2_6 = 0; (_gern_x_2_6 < t); _gern_x_2_6 = (_gern_x_2_6 + t2)) {
            auto _query_output_10 = output.query((_gern_x_2_6 + _gern_x_2_6_8), t2);

            library::impl::ArrayCPU temp = library::impl::ArrayCPU::allocate((_gern_x_2_6 + _gern_x_2_6_8), t2);

            auto _query_input_11 = input.query((_gern_x_2_6 + _gern_x_2_6_8), t2);

            library::impl::add_1(_query_input_11, temp);

            library::impl::add_1(temp, _query_output_10);

            temp.destroy();
        }
    }
}

Nested pipelines can also be written, for example:

    Composable program({
        // More tiling!
        Tile(output["size"], t)(
            Tile(temp["size"], t2)(
                add_1(input, temp)),
            add_1(temp, output)),
    });

prododuces the following program:


void function_9(library::impl::ArrayCPU &input, library::impl::ArrayCPU &output, int64_t t, int64_t t2) {
    for (int64_t _gern_x_2_8 = 0; (_gern_x_2_8 < output.size); _gern_x_2_8 = (_gern_x_2_8 + t)) {
        auto _query_output_10 = output.query(_gern_x_2_8, t);

        library::impl::ArrayCPU temp = library::impl::ArrayCPU::allocate(_gern_x_2_8, t);

        for (int64_t _gern_x_4_6 = 0; (_gern_x_4_6 < temp.size); _gern_x_4_6 = (_gern_x_4_6 + t2)) {
            auto _query_temp_11 = temp.query((_gern_x_4_6 + 0), t2);

            auto _query_input_12 = input.query((_gern_x_4_6 + _gern_x_2_8), t2);

            library::impl::add_1(_query_input_12, _query_temp_11);
        }

        library::impl::add_1(temp, _query_output_10);

        temp.destroy();
    }
}

At each point in the program, Fern has a notion of the data structure the user intends to produce, and each statement yields a data structure. Tilings are permitted only for the data structure that is the final output of the current program scope.