.. _writing_matchers:

Writing Matchers
================

This page explains what a matcher is and then walks through an example of developing a simple matcher iteratively.

Types of Matchers
-----------------

There are three types of matchers: Node, Narrowing, and Traversal. There isn't always a clear separation or distinction between them, so treat this explanation as illustrative rather than definitive. Here is the documentation on matchers: `https://clang.llvm.org/docs/LibASTMatchersReference.html <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_

It is not obvious on that page, so we want to note that **clicking on the name of a matcher expands help about that matcher.** Example:

.. image:: documentation-expanded.png

Node Matchers
~~~~~~~~~~~~~

Node matchers can be thought of as 'Nouns'. They specify a **type** of node you want to match, that is, a particular *thing*: a function, a binary operation, a variable, a type.

A full list of `node matchers is given in the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#node-matchers>`_. Some common ones are ``functionDecl()``, ``binaryOperator()``, and ``stmt()``.

Narrowing Matchers
~~~~~~~~~~~~~~~~~~

Narrowing matchers can be thought of as 'Adjectives'. They narrow, or describe, a node, and therefore must be applied to a Node Matcher. For instance, a node matcher may be a ``functionDecl``, and the narrowing matcher applied to it may be ``parameterCountIs``.

The `table in the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#narrowing-matchers>`_ lists all the narrowing matchers, which nodes they apply to, and how to use them. Here is how to read the table:

.. image:: narrowing-matcher.png

And some examples:

::

    m functionDecl(parameterCountIs(1))
    m functionDecl(anyOf(isDefinition(), isVariadic()))


As you can see, the Narrowing Matcher goes inside the parens of the Node Matcher, and **each of these examples passes exactly one**. In the first example, the matcher is ``parameterCountIs``; in the second it is ``anyOf``. Node matchers also accept several comma-separated matchers, which are combined as if by ``allOf()``.

In the second example, we use the single ``anyOf`` matcher to match any of several other Narrowing Matchers: ``isDefinition`` or ``isVariadic``. The other two common combining narrowing matchers are ``allOf()`` and ``unless()``.

If you *need* to specify a narrowing matcher (because it's a required argument to some other matcher), you can use the ``anything()`` narrowing matcher as a no-op.

Traversal Matchers
~~~~~~~~~~~~~~~~~~

Traversal Matchers can *also* be thought of as adjectives, or at least most of them can. They also describe a specific node, but the difference from a narrowing matcher is that the scope of the description is broader than the individual node. A narrowing matcher says something about the node in isolation (e.g. the number of arguments it has), while a traversal matcher says something about the node's contents or place in the program.

Again, `the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#traversal-matchers>`_ is the best place to explore and understand these, but here is a simple example for the traversal matcher ``hasArraySize()``:

::

    Given:
      class MyClass { };
      MyClass *p1 = new MyClass[10];


    cxxNewExpr()
      matches the expression 'new MyClass[10]'.

    cxxNewExpr(hasArraySize(integerLiteral(equals(9))))
      does not match anything

    cxxNewExpr(hasArraySize(integerLiteral(equals(10))))
      matches the expression 'new MyClass[10]'.
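
To experiment with the queries from this section, it can help to have a small input file to load into clang-query. The sketch below is a hypothetical example; the function names ``one``, ``two``, ``log_all`` and ``combine`` are made up for illustration. Going by the matcher descriptions above, ``functionDecl(parameterCountIs(1))`` would be expected to match ``one`` and ``log_all``, and ``functionDecl(anyOf(isDefinition(), isVariadic()))`` would be expected to match every declaration except the body-less declaration of ``two``. The traversal matcher ``hasAnyParameter()`` (taken from the reference, not used elsewhere on this page) looks inside the node instead: ``functionDecl(hasAnyParameter(hasName("fmt")))`` should single out ``log_all``. Treat these expectations as assumptions to check against your own clang-query output.

::

    // Hypothetical input file for trying out narrowing and traversal matchers.
    // All names here are illustrative, not part of any real project.

    int one(int a) { return a; }             // definition, one parameter
    int two(int a, int b);                   // declaration only, two parameters
    int log_all(const char *fmt, ...);       // declaration only, variadic, one named parameter
    int two(int a, int b) { return a + b; }  // definition of two

    // Definition with no parameters; calls the functions defined above.
    int combine() {
      return one(two(1, 2));
    }

Note that the declaration and the definition of ``two`` are separate nodes, so a ``functionDecl`` matcher is tested against each of them independently.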

Example of Iterative Matcher Development
----------------------------------------

When developing matchers, it will be much easier if you do the following:

1. Write out the code you want to match. Write it out in as many different ways as you can. For example, for some value in the code, use a variable, a constant, and a function that returns a value. Put the code you want to match inside of a function, inside of a conditional, inside of a function call, and inside of an inline function definition.
2. Write out the code you *don't* want to match but that looks like code you do. Write out benign function calls, benign assignments, etc.
3. Iterate on your matcher and treat it as *code* you're writing. Indent it, copy it somewhere in case your browser crashes, even stick it in a tiny temporary version-controlled file.

As an example of the above, below is a sample iterative development process for a more complicated matcher.

**Goal**: Match function calls where one of the arguments is an assignment expression with an integer literal on the right-hand side, and the corresponding function parameter has a default value in the function definition.

::

    int add1(int a, int b) { return a + b; }
    int add2(int c, int d = 8) { return c + d; }

    int main() {
      int x, y, z;

      add1(x, y);      // <- No match, no assignment
      add1(3 + 4, y);  // <- No match, no assignment
      add1(z = x, y);  // <- No match, assignment, but not an integer literal
      add1(z = 2, y);  // <- No match, assignment, integer literal, but function parameter lacks default value
      add2(3, z = 2);  // <- Match
    }


Here is the iterative development process:

::

    //-------------------------------------
    // Step 1: Find all the function calls
    m callExpr()
    // Matches all calls, as expected.

    //-------------------------------------
    // Step 2: Start refining based on the arguments to the call
    m callExpr(forEachArgumentWithParam())
    // Error: forEachArgumentWithParam expects two parameters

    //-------------------------------------
    // Step 3: Figure out the syntax for matching all the calls with this new matcher
    m callExpr(
        forEachArgumentWithParam(
            anything(),
            anything()
        )
    )
    // Matches all calls, as expected

    //-------------------------------------
    // Step 4: Find the calls with a binary operator of any kind
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(),
            anything()
        )
    )
    // Does not match the first call, but matches the others

    //-------------------------------------
    // Step 5: Limit the binary operator to assignments
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(isAssignmentOperator()),
            anything()
        )
    )
    // Now matches the final three calls

    //-------------------------------------
    // Step 6: Start refining the match on the right-hand side of the assignment
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS()
            )),
            anything()
        )
    )
    // Error: hasRHS expects a parameter

    //-------------------------------------
    // Step 7: Give hasRHS a no-op argument
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS(anything())
            )),
            anything()
        )
    )
    // Okay, back to matching the final three calls

    //-------------------------------------
    // Step 8: Refine to just integer literals
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS(integerLiteral())
            )),
            anything()
        )
    )
    // Now we match the final two calls

    //-------------------------------------
    // Step 9: Apply a restriction to the parameter definition
    m callExpr(
        forEachArgumentWithParam(
            binaryOperator(
                allOf(
                    isAssignmentOperator(),
                    hasRHS(integerLiteral())
            )),
            hasDefaultArgument()
        )
    )
    // Now we match the final call
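
The advice about writing the target code in many different ways, alongside look-alikes that should not match, can be applied to this example as well. The following is a minimal sketch of such an extended input file, assuming the same ``add1`` and ``add2`` as above; ``pick`` and ``exercise`` are hypothetical helpers added only for illustration. The comments record what the goal statement *intends* to match, not verified clang-query output.

::

    // Hypothetical extended test input for the matcher developed above.
    int add1(int a, int b) { return a + b; }
    int add2(int c, int d = 8) { return c + d; }
    int pick(int e = 1, int f = 2) { return e * 10 + f; }  // illustrative helper

    int exercise() {
      int x = 1, y = 2, z = 0;
      int total = 0;

      // Intended to match: an integer literal is assigned in an argument
      // whose parameter declares a default value.
      total += add2(x, z = 2);             // plain statement
      if (add2(y, z = 3) > 0) {            // inside a conditional
        total += add1(add2(x, z = 4), y);  // inner add2 call nested in another call
      }
      total += pick(z = 5);                // defaulted parameter in first position

      // Look-alikes intended NOT to match.
      total += add2(x, z = y);  // assignment, but the right-hand side is a variable
      total += add2(x, 2);      // integer literal, but no assignment
      total += add2(z = 6, y);  // literal assignment, but parameter c has no default
      total += add2(x);         // default value used, nothing written at the call site
      z = 7;                    // literal assignment outside of any call
      return total + z;
    }

Running the Step 9 matcher against a file like this gives a quick check in both directions: every call in the first group should be reported, and any call reported from the second group points at the next refinement to make.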