.. _writing_matchers:

Writing Matchers
================

This page explains what a matcher is, then walks through developing a simple matcher iteratively.

Types of Matchers
-----------------

There are three types of matchers: Node, Narrowing, and Traversal.  There isn't always a clear separation or distinction between them, so treat this explanation as illustrative rather than definitive.  The full matcher documentation is at `https://clang.llvm.org/docs/LibASTMatchersReference.html <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_.

It is not obvious on that page, so note that **clicking on the name of a matcher expands help about that matcher.** Example:

.. image:: documentation-expanded.png

Node Matchers
~~~~~~~~~~~~~

Node matchers can be thought of as 'Nouns'. They specify a **type** of node you want to match, that is, a particular *thing*: a function, a binary operation, a variable, a type.

The `full list of node matchers <https://clang.llvm.org/docs/LibASTMatchersReference.html#node-matchers>`_ is in the documentation. Some common ones are ``functionDecl()``, ``binaryOperator()``, and ``stmt()``.
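
As a rough illustration of the 'noun' idea, the sketch below is a tiny C++ file you could load into clang-query. The names ``kFactor`` and ``scale`` are made up for this example, and the comments describe what each node matcher is expected to report, not captured tool output.

```cpp
// Hypothetical fixture for experimenting with node matchers in clang-query.

// varDecl() is expected to match this variable declaration.
constexpr int kFactor = 3;

// functionDecl() is expected to match 'scale'.
// The parameter 'value' is a parameter declaration, which varDecl() also covers.
constexpr int scale(int value) {
  // stmt() matches the return statement and every expression inside it,
  // because expressions are statements in Clang's AST.
  // binaryOperator() matches only 'value * kFactor'.
  return value * kFactor;
}
```

Running ``m binaryOperator()`` against this file should report one node, while ``m stmt()`` reports several, which is why ``stmt()`` is usually refined with other matchers.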

Narrowing Matchers
~~~~~~~~~~~~~~~~~~

Narrowing matchers can be thought of as 'Adjectives'. They narrow, or describe, a node, and therefore must be applied to a Node Matcher.  For instance, a node matcher may be ``functionDecl``, and the narrowing matcher applied to it may be ``parameterCountIs``.

The `table in the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#narrowing-matchers>`_ lists all the narrowing matchers, which nodes they apply to, and how to use them.  Here is how to read the table:

.. image:: narrowing-matcher.png

And some examples:

::

 m functionDecl(parameterCountIs(1))
 m functionDecl(anyOf(isDefinition(), isVariadic()))


In both examples a single Narrowing Matcher goes inside the parentheses of the Node Matcher. In the first example the matcher is ``parameterCountIs``; in the second it is ``anyOf``.

In the second example, the single ``anyOf`` matcher succeeds if any of the Narrowing Matchers passed to it match: ``isDefinition`` or ``isVariadic``. The other two common combining narrowing matchers are ``allOf()`` and ``unless()``. Node Matchers also accept several comma-separated matchers, which are combined as if wrapped in ``allOf()``, but an explicit combining matcher keeps the intent visible.

If you *need* to specify a narrowing matcher (because it's a required argument to some other matcher), you can use the ``anything()`` matcher as a no-op.
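
The two example matchers above can be tried against a small file like the following sketch. All function names are hypothetical, and the match results in the comments are expectations based on the matcher reference, not recorded output.

```cpp
// Hypothetical fixture for the narrowing-matcher examples.

// Declaration only, two parameters:
//   functionDecl(parameterCountIs(1))                  -> no match
//   functionDecl(anyOf(isDefinition(), isVariadic()))  -> no match
//   functionDecl(unless(isDefinition()))               -> expected match
int sum(int a, int b);

// Definition, one parameter:
//   functionDecl(parameterCountIs(1))                  -> expected match
//   functionDecl(anyOf(isDefinition(), isVariadic()))  -> expected match
constexpr int negate(int a) { return -a; }

// Definition, two parameters:
//   functionDecl(parameterCountIs(1))                        -> no match
//   functionDecl(allOf(isDefinition(), parameterCountIs(2))) -> expected match
constexpr int difference(int a, int b) { return a - b; }

// Variadic declaration:
//   functionDecl(anyOf(isDefinition(), isVariadic()))  -> expected match via isVariadic()
//   functionDecl(parameterCountIs(1))                  -> expected match, since the
//                                                         ellipsis is not a parameter
int log_values(const char *format, ...);
```

Keeping a declaration-only function and a variadic function in the file makes it easy to see which branch of ``anyOf`` produced a given match.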

Traversal Matchers
~~~~~~~~~~~~~~~~~~

Traversal Matchers can *also* be thought of as adjectives, at least most of them.  They also describe a specific node, but unlike a narrowing matcher, the scope of the description is broader than the individual node.  A narrowing matcher says something about the node in isolation (e.g. the number of arguments it has), while a traversal matcher says something about the node's contents or its place in the program.

Again, `the documentation <https://clang.llvm.org/docs/LibASTMatchersReference.html#traversal-matchers>`_ is the best place to explore and understand these, but here is a simple example for the traversal matcher ``hasArraySize()``:

::

 Given:
   class MyClass { };
   MyClass *p1 = new MyClass[10];


 cxxNewExpr()
   matches the expression 'new MyClass[10]'.

 cxxNewExpr(hasArraySize(integerLiteral(equals(9))))
   does not match anything

 cxxNewExpr(hasArraySize(integerLiteral(equals(10))))
   matches the expression 'new MyClass[10]'.
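
The difference between the two kinds of adjective can be sketched with a pair of matchers from the reference: ``argumentCountIs`` is listed there as a narrowing matcher and ``hasArgument`` as a traversal matcher. The functions ``add`` and ``demo`` are hypothetical, and the comments state expected results, not tool output.

```cpp
// Hypothetical fixture contrasting a narrowing matcher with a traversal matcher.
constexpr int add(int a, int b) { return a + b; }

constexpr int demo() {
  int x = 1;

  // callExpr(argumentCountIs(2)): expected match. This is a property of the
  // call node itself.
  // callExpr(hasArgument(0, integerLiteral())): no match. The matcher looks
  // into the call's first argument, which is the variable 'x'.
  int first = add(x, 2);

  // Both matchers are expected to match here: the call has two arguments and
  // the first one is the integer literal 3.
  int second = add(3, x);

  return first + second;
}
```

The narrowing matcher only needs the call node, while the traversal matcher takes another matcher as an argument and applies it to a related node.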

Example of Iterative Matcher Development
----------------------------------------

When developing matchers, it will be much easier if you do the following:

1. Write out the code you want to match, in as many different ways as you can. For example, for some value in the code, use a variable, a constant, and a function that returns a value. Put the code you want to match inside of a function, inside of a conditional, inside of a function call, and inside of an inline function definition.
2. Write out code you *don't* want to match but that looks like code you do. Write out benign function calls, benign assignments, etc.
3. Iterate on your matcher and treat it as *code* you're writing. Indent it, copy it somewhere in case your browser crashes, even stick it in a tiny temporary version-controlled file.
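
The first two tips can be sketched as a test file like the one below. It assumes a made-up goal, matching calls to a hypothetical ``store()`` function, and every name in it is illustrative only.

```cpp
// Hypothetical target: calls to store(). Write the call in many shapes.
constexpr int store(int value) { return value; }
constexpr int compute() { return 7; }
constexpr int kLimit = 5;

constexpr int variations(int input) {
  int total = 0;
  total += store(input);      // argument is a variable
  total += store(kLimit);     // argument is a named constant
  total += store(42);         // argument is a literal
  total += store(compute());  // argument is the result of another call
  if (input > 0) {
    total += store(input);    // call sits inside a conditional
  }
  total += store(store(1));   // call sits inside another call
  return total;
}

// Look-alikes that should NOT match: same shape, different callee.
constexpr int keep(int value) { return value; }
constexpr int lookalikes(int input) { return keep(input) + keep(kLimit); }
```

A matcher such as ``callExpr(callee(functionDecl(hasName("store"))))`` would be expected to report every call to ``store()`` in ``variations`` and nothing in ``lookalikes``; if it reports anything else, the fixture has done its job.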

As an example of the above, below is a sample iterative development process for a more complicated matcher.

**Goal**: Match function calls where one of the arguments is an assignment expression with an integer literal on the right-hand side, and the corresponding function parameter has a default value in the function definition.

::

 int add1(int a, int b) { return a + b; }
 int add2(int c, int d = 8) { return c + d; }

 int main() {
   int x = 0, y = 0, z = 0;

   add1(x, y);     // <- No match, no assignment
   add1(3 + 4, y); // <- No match, no assignment
   add1(z = x, y); // <- No match, assignment, but not an integer literal
   add1(z = 2, y); // <- No match, assignment, integer literal, but function parameter lacks default value
   add2(3, z = 2); // <- Match
 }


Here is the iterative development process:

::

 //-------------------------------------
 // Step 1: Find all the function calls
 m callExpr()
 // Matches all calls, as expected.

 //-------------------------------------
 // Step 2: Start refining based on the arguments to the call
 m callExpr(forEachArgumentWithParam())
 // Error: forEachArgumentWithParam expects two parameters

 //-------------------------------------
 // Step 3: Figure out the syntax for matching all the calls with this new matcher
 m callExpr(
   forEachArgumentWithParam(
     anything(),
     anything()
   )
 )
 // Matches all calls, as expected

 //-------------------------------------
 // Step 4: Find the calls with a binary operator of any kind
 m callExpr(
   forEachArgumentWithParam(
     binaryOperator(),
     anything()
   )
 )
 // Does not match the first call, but matches the others

 //-------------------------------------
 // Step 5: Limit the binary operator to assignments
 m callExpr(
   forEachArgumentWithParam(
     binaryOperator(isAssignmentOperator()),
     anything()
   )
 )
 // Now matches the final three calls

 //-------------------------------------
 // Step 6: Start refining the match on the right-hand side of the assignment
 m callExpr(
   forEachArgumentWithParam(
     binaryOperator(
       allOf(
         isAssignmentOperator(),
         hasRHS()
       )),
     anything()
   )
 )
 // Error: hasRHS expects a parameter

 //-------------------------------------
 // Step 7: Give hasRHS a placeholder argument
 m callExpr(
   forEachArgumentWithParam(
     binaryOperator(
       allOf(
         isAssignmentOperator(),
         hasRHS(anything())
       )),
     anything()
   )
 )
 // Okay, back to matching the final three calls

 //-------------------------------------
 // Step 8: Refine to just integer literals
 m callExpr(
   forEachArgumentWithParam(
     binaryOperator(
       allOf(
         isAssignmentOperator(),
         hasRHS(integerLiteral())
       )),
     anything()
   )
 )
 // Now we match the final two calls

 //-------------------------------------
 // Step 9: Apply a restriction to the parameter definition
 m callExpr(
   forEachArgumentWithParam(
     binaryOperator(
       allOf(
         isAssignmentOperator(),
         hasRHS(integerLiteral())
       )),
     hasDefaultArgument()
   )
 )
 // Now we match the final call
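
Before trusting the final matcher, it may help to extend the test file with a few edge cases, in the spirit of the tips earlier on this page. The sketch below redefines ``add2`` so it stands alone; ``extra_cases`` is a hypothetical helper, and the match results in the comments are expectations that should be confirmed in clang-query, not recorded output.

```cpp
// Hypothetical extra cases for the Step 9 matcher.
constexpr int add2(int c, int d = 8) { return c + d; }

constexpr int extra_cases() {
  int x = 1, y = 2, z = 0;
  int total = 0;

  // No match expected: the assignment is bound to 'c', which has no default.
  total += add2(z = 2, y);

  // No match expected: the default is used, so no assignment is written.
  total += add2(3);

  // Match expected, assuming forEachArgumentWithParam looks through the
  // extra parentheses around the argument.
  total += add2(3, (z = 2));

  // Probably a match: the reference describes isAssignmentOperator() as
  // matching all kinds of assignment, including compound ones like '+='.
  total += add2(x, z += 2);

  return total;
}
```

If the compound-assignment case should not count, one option is to swap ``isAssignmentOperator()`` for ``hasOperatorName("=")`` and rerun the same file.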