Last time we saw how to use pre-defined MLIR traits to enable upstream MLIR passes like loop-invariant-code-motion to apply to `poly`

programs. We left out `-sccp`

(sparse conditional constant propagation), and so this time we’ll add what is needed to make that pass work. It requires the concept of *folding*.

The code for this article is in this pull request, and as usual the commits are organized to be read in order.

## Constant Propagation vs Canonicalization

`-sccp`

is sparse conditional constant propagation, which attempts to infer when an operation has a constant output, and then replaces the operation with the constant value. Repeating this, it “propagates” the constants as far as possible through the program. You can think of it like eagerly computing values it can during compile time, and then sticking them into the compiled program as constants.

Here’s what it looks like for arith, where all the needed tools are implemented. For an input like:

```
func.func @test_arith_sccp() -> i32 {
%0 = arith.constant 7 : i32
%1 = arith.constant 8 : i32
%2 = arith.addi %0, %0 : i32
%3 = arith.muli %0, %0 : i32
%4 = arith.addi %2, %3 : i32
return %2 : i32
}
```

The output of `tutorial-opt --sccp`

is

```
func.func @test_arith_sccp() -> i32 {
%c63_i32 = arith.constant 63 : i32
%c49_i32 = arith.constant 49 : i32
%c14_i32 = arith.constant 14 : i32
%c8_i32 = arith.constant 8 : i32
%c7_i32 = arith.constant 7 : i32
return %c14_i32 : i32
}
```

Note two additional facts: sccp doesn’t delete dead code, and what is *not* shown here is the main novelty in `sccp`

, which is that it can propagate constants through control flow (ifs and loops).

A related concept is the idea of *canonicalization*, which gets its own `--canonicalize`

pass, and which hides a lot of the heavy lifting in MLIR. Canonicalize overlaps a little bit with `sccp`

, in that it also computes constants and materializes them in the IR. Take, for example, the `--canonicalize`

pass on the same IR:

```
func.func @test_arith_sccp() -> i32 {
%c14_i32 = arith.constant 14 : i32
return %c14_i32 : i32
}
```

The intermediate constants are all pruned, and all that remains is the return value and no operations. Canonicalize cannot propagate constants through control flow, and as such should be thought of as more “local” than `sccp`

.

Both of these, however, are supported via *folding*, which is the process of taking series of ops and merging them together into simpler ops. It also requires our dialect has some sort of *constant* operation, which is inserted (“materialized”) with the results of a fold. Folding and canonicalization are more general than what I’m showing here, so we’ll come back to what else they can do in a future article.

The rough outline of what is needed to support folding in this way is:

- Adding a constant operation
- Adding a materialization hook
- Adding folders for each op

This would result in a situation (test case commit) as follows. Starting from

```
%0 = arith.constant dense<[1, 2, 3]> : tensor<3xi32>
%1 = poly.from_tensor %0 : tensor<3xi32> -> !poly.poly<10>
%2 = poly.mul %1, %1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
%3 = poly.mul %1, %1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
%4 = poly.add %2, %3 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
```

We would get

```
%0 = poly.constant dense<[2, 8, 20, 24, 18]> : tensor<5xi32> : <10>
%1 = poly.constant dense<[1, 4, 10, 12, 9]> : tensor<5xi32> : <10>
%2 = poly.constant dense<[1, 2, 3]> : tensor<3xi32> : <10>
```

## Making a constant op

Currently we’re imitating a constant polynomial construction by combing a `from_tensor`

op with `arith.constant`

. Like this:

```
%0 = arith.constant dense<[1, 2, 3]> : tensor<3xi32>
%p0 = poly.from_tensor %0 : tensor<3xi32> -> !poly.poly<10>
```

While a constant operation might combine them into one op.

```
%0 = poly.constant dense<[2, 8, 20, 24, 18]> : !poly.poly<10>
```

The `from_tensor`

op can also be used to build a polynomial from data, not just constants, so it’s worth having around even after we implement `poly.constant`

.

Having a dedicated constant operation has benefits explained in the MLIR documentation on folding. What’s relevant here is that `fold`

can be used to signal to passes like `sccp`

that the result of an op is constant (statically known), or it can be used to say that the result of an op is equivalent to a pre-existing value created by a different op. For the constant case, a `materializeConstant`

hook is also needed to tell MLIR how to take the constant result and turn it into a proper IR op.

The constant op itself, in this commit, comes with two new concepts, the `ConstantLike`

trait and an argument that is an *attribute constraint.*

```
def Poly_ConstantOp : Op<Poly_Dialect, "constant", [Pure, ConstantLike]> { // new
let summary = "Define a constant polynomial via an attribute.";
let arguments = (ins AnyIntElementsAttr:$coefficients); // new
let results = (outs Polynomial:$output);
let assemblyFormat = "$coefficients attr-dict `:` type($output)";
}
```

The `ConstantLike`

attribute is checked here during folding via the constant op matcher as an assertion. [Aside: I’m not sure why the trait is specifically required, so long as the materialization function is present on the dialect; it just seems like this check is used for assertions. Perhaps it’s just a safeguard.]

Next we have the line `let arguments = (ins AnyIntElementsAttr:$coefficients);`

This defines the input to the op as an attribute (statically defined data) rather than a previous SSA value. The `AnyIntElementsAttr`

is itself an attribute constraint, allowing any attribute that is has the `IntElementsAttrBase`

as a base class to be used (e.g., 32-bit or 64-bit integer attributes). This means that we could use all of the following syntax forms:

```
%10 = poly.constant dense<[2, 3, 4]> : tensor<3xi32> : !poly.poly<10>
%11 = poly.constant dense<[2, 3, 4]> : tensor<3xi8> : !poly.poly<10>
%12 = poly.constant dense<"0x020304"> : tensor<3xi8> : !poly.poly<10>
%13 = poly.constant dense<4> : tensor<100xi32> : !poly.poly<10>
```

## Adding folders

We add folders in these commits:

Each has the same structure: add `let hasFolder = 1;`

to the tablegen for the op, which adds a header declaration of the following form (noting that the signature would be different if the op has more than one result value, see docs).

```
OpFoldResult <OpName>::fold(<OpName>::FoldAdaptor adaptor);
```

Then we implement it in `PolyOps.cpp`

. The semantics of this function are such that, if the fold decides the op should be replaced with a constant, it must return an attribute representing that constant, which can be given as the input to a `poly.constant`

. The `FoldAdaptor`

is a shim that has the same method names as an instance of the op’s C++ class, but arguments that have been folded themselves are replaced with `Attribute`

instances representing the constants they were folded with. This will be relevant for folding add and mul, since the body needs to actually compute the result eagerly, and needs access to the actual values to do so.

For `poly.constant`

the implementation is trivial: you just return the input attribute.

```
OpFoldResult ConstantOp::fold(ConstantOp::FoldAdaptor adaptor) {
return adaptor.getCoefficients();
}
```

The `from_tensor`

op is similar, but has an extra cast that acts as an assertion, since the tensor might have been constructed with weird types we don’t want as input. If the `dyn_cast`

fails, the result is `nullptr`

, which is cast by MLIR to a failed `OpFoldResult`

.

```
OpFoldResult FromTensorOp::fold(FromTensorOp::FoldAdaptor adaptor) {
// Returns null if the cast failed, which corresponds to a failed fold.
return dyn_cast<DenseIntElementsAttr>(adaptor.getInput());
}
```

The poly binary ops are slightly more complicated since they are actually doing work. Each of these `fold`

methods effectively takes as input two `DenseIntElementsAttr`

for each operand, and expects us to return another `DenseIntElementsAttr`

for the result.

For add/sub which are elementwise operations on the coefficients, we get to use an existing upstream helper method, constFoldBinaryOp, which through some template metaprogramming wizardry, allows us to specify only the elementwise operation itself.

```
OpFoldResult AddOp::fold(AddOp::FoldAdaptor adaptor) {
return constFoldBinaryOp<IntegerAttr, APInt>(
adaptor.getOperands(), [&](APInt a, APInt b) { return a + b; });
}
```

For mul, we have to write out the multiplication routine manually. In what’s below, I’m implementing the naive textbook polymul algorithm, which could be optimized if one expects people to start compiling programs with large, static polynomials in them.

```
OpFoldResult MulOp::fold(MulOp::FoldAdaptor adaptor) {
auto lhs = cast<DenseIntElementsAttr>(adaptor.getOperands()[0]);
auto rhs = cast<DenseIntElementsAttr>(adaptor.getOperands()[1]);
auto degree = getResult().getType().cast<PolynomialType>().getDegreeBound();
auto maxIndex = lhs.size() + rhs.size() - 1;
SmallVector<APInt, 8> result;
result.reserve(maxIndex);
for (int i = 0; i < maxIndex; ++i) {
result.push_back(APInt((*lhs.begin()).getBitWidth(), 0));
}
int i = 0;
for (auto lhsIt = lhs.value_begin<APInt>(); lhsIt != lhs.value_end<APInt>();
++lhsIt) {
int j = 0;
for (auto rhsIt = rhs.value_begin<APInt>(); rhsIt != rhs.value_end<APInt>();
++rhsIt) {
// index is modulo degree because poly's semantics are defined modulo x^N = 1.
result[(i + j) % degree] += *rhsIt * (*lhsIt);
++j;
}
++i;
}
return DenseIntElementsAttr::get(
RankedTensorType::get(static_cast<int64_t>(result.size()),
IntegerType::get(getContext(), 32)),
result);
}
```

## Adding a constant materializer

Finally, we add a constant materializer. This is a dialect-level feature, so we start by adding `let hasConstantMaterializer = 1;`

to the dialect tablegen, and observing the newly generated header signature:

```
Operation *PolyDialect::materializeConstant(
OpBuilder &builder, Attribute value, Type type, Location loc);
```

The `Attribute`

input represents the result of each folding step above. The `Type`

is the desired result type of the op, which is needed in cases like `arith.constant`

where the same attribute can generate multiple different types via different interpretations of a hex string or splatting with a result tensor that has different dimensions.

In our case the implementation is trivial: just construct a constant op from the attribute.

```
Operation *PolyDialect::materializeConstant(
OpBuilder &builder, Attribute value, Type type, Location loc) {
auto coeffs = dyn_cast<DenseIntElementsAttr>(value);
if (!coeffs)
return nullptr;
return builder.create<ConstantOp>(loc, type, coeffs);
}
```

## Other kinds of folding

While this has demonstrated a generic kind of folding with respect to static constants, many folding functions in MLIR use simple matches to determine when an op can be replaced with a value from a previously computed op.

Take, for example, the complex dialect (for complex numbers). A `complex.create`

op constructs a complex number from real and imaginary parts. A folder in that dialect checks for a pattern like `complex.create(complex.re(%z), complex.im(%z))`

, and replaces it with `%z`

directly. The `arith`

dialect similarly has folds for things like `a-b + b -> a`

and `a + 0 -> a`

.

However, most work on simplifying an IR according to algebraic rules belongs in the canonicalization pass, since while it supports folding, it also supports general rewrite patterns that are allowed to delete and create ops as needed to simplify the IR. We’ll cover canonicalization in more detail in a future article. But just remember, folds may only modify the single operation being folded, use existing SSA values, and may not create new ops. So they are limited in power and decidedly local operations.

Want to respond? Send me an email, post a webmention, or find me elsewhere on the internet.