Emitting Diagnostics in Clang

One of the major strong points of the clang compiler project has always been its commitment to produce clear, correct, precise and especially human-readable diagnostics. Given that clang is the frontend for a language that is infamous for error messages that make even the bravest and most experienced of programmers lose their last ounce of hope (and switch to Rust), emitting understandable error messages is a big deal. One really comes to appreciate clang’s clarity when we compare it to other compilers. Take this simple snippet of code as an example:

int x
int y;

If we compare what gcc (6.3) says about this:

f.cpp:2:2: error: expected initializer before 'int'
  int y;
  ^~~

as opposed clang (4.0):

f.cpp:1:6: error: expected ';' after top level declarator
int x
     ^
     ;
1 error generated.

we can see that even when it comes to the little things, clang pays attention and points out the actual problem, whereas gcc just complains about the consequence.

There are numerous other ways in which the clang development community’s hard work to get diagnostics right (from the start) shine out, from dealing with macros and typedefs all the way to printing pretty trees for the enigmatic templates C++ is famous for. You can read more about this here.

Since clang and LLVM as a whole are not designed as monolithic blobs of code, but rather as a set of (reasonably) standalone libraries, the best thing about diagnostics in clang is that we can make use of them ourselves when building our own clang tools. Documentation and practical examples of using clang’s diagnostics APIs are sparse on the web, so the next few paragraphs will discuss some basic concepts as well as the internal source code and implementation. We’ll begin by looking at emitting plain diagnostics and later investigate FixIt hints to provide useful suggestions to the users of our tools.

Note that this post is best read with some prior experience with clang tooling. You can still read it if you’re just starting out, I just won’t go into the details of every line that’s not so relevant to diagnostics.

Diagnostics

One of the most exciting aspects of my clang tooling journey so far has been diving deep into the clang (and LLVM) source tree and learning about the project by looking at how stuff is implemented. Fortunately, LLVM is probably one of the cleanest and best maintained large-scale codebases in the wild, so popping up a source file is usually fun rather than painful. As such we’ll learn about diagnostics in clang by discussing the classes that make up the API. We begin with the DiagnosticsEngine class, which is the ASTContext of the diagnostics world. We then take a closer look at the DiagnosticBuilder class, which is always used alongside the engine to build diagnostics. Finally we’ll briefly delve into the DiagnosticsConsumer side of things and see how clang’s diagnostics end up where they do.

DiagnosticsEngine

The DiagnosticsEngine class is one of the main actors in clang’s diagnostic-reporting architecture and your first stop when wanting to emit a diagnostic. Its primary purpose is to provide a way to report a diagnostic via its Report method and then to allow configuration of various “meta” aspects of diagnostic reporting. For example, you can configure what diagnostics will be suppressed, whether warnings should show as errors, what the limit on showing nested template declarations is before they are elided, whether colors should be shown or if errors should be printed as a tree. You can find the declaration (and a lot of the definition) of the DiagnosticsEngine class in include/clang/Basic/Diagnostic.h (relative to the clang root), around line 150. In here, for example, we see that diagnostics can be emitted at a certain level:

enum Level {
  Ignored = DiagnosticIDs::Ignored,
  Note = DiagnosticIDs::Note,
  Remark = DiagnosticIDs::Remark,
  Warning = DiagnosticIDs::Warning,
  Error = DiagnosticIDs::Error,
  Fatal = DiagnosticIDs::Fatal
};

Before being able to report a diagnostic yourself, you must request a custom ID for the warning or error message you want to emit, as every diagnostic in clang has a unique ID. This could look like so:

// Typical clang tooling context
void run(clang::ASTContext& Context) {
  // Types shown for clarity
  clang::DiagnosticsEngine &DE = Context.getDiagnostics();
  const unsigned ID = DE.getCustomDiagID(clang::DiagnosticsEngine::Warning,
                                         "I findz a badness");
  // ...

}

The idea of the unique identifier is that you can reuse diagnostics that you define once (maybe using a complex format) in the various contexts that you may find them. This is also how the clang compiler organizes its warnings. If we take a look at include/clang/Basic/DiagnosticASTKinds.td, we can see the following error definition for accessing volatile expressions in constexpr contexts, which is not allowed:

def note_constexpr_access_volatile_obj : Note<
  "%select{read of|assignment to|increment of|decrement of}0 volatile "
  "%select{temporary|object %2|member %2}1 is not allowed in "
  "a constant expression">;

which can then be referenced via diag::note_constexpr_access_volatile_obj in clang code (using its internal code generation mechanisms, which turns the .td file into a .inc header).

Diagnostic Formatting

As you can see from the internal clang warning about volatile accesses in constexpr expressions, clang has a relatively powerful formatting language for its diagnostics that you can use for templatization. The simplest way we can use it is by passing the names of variables or other source constructs to the message in a printf-like fashion. Say, for example, we were writing a tool to ensure that all pointer variables in our code base are prefixed with p_. If we find a rebellious variable without such a prefix, we could emit a nice warning like so:

const unsigned ID = DE.getCustomDiagID(
  clang::DiagnosticsEngine::Warning,
  "Pointer variable '%0' should have a 'p_' prefix");

Sneak-peaking at the DiagnosticBuilder (discussed below), we could later populate this message in the following way:

DiagnosticsEngine.Report(InsertionLoc, ID).AddString(NamedDecl.getName());

Besides this simple example, clang’s diagnostic formatting language has quite a few more features. You can find a full description of them here, but to name a few:

  • To handle the problem of plural names, we can make use of the %s<n> construct. For example, when we write Found %1 foo%s1, this will turn into either Found 1 foo or Found 42 foos, (i.e. the plural “s” is appended automatically), removing the need to write (a) thing(s) like this.
  • By writing %select{a|b|...}<n> we can later pass an index into this list, allowing us to reuse the same error message for different words or phrases.
  • If given a NamedDecl*, writing %q<n> will print he fully qualified name (e.g. std::string) instead of just the unqualified name (string).

The DiagnosticsEngine can also be used to query information about a diagnostics. For example, given that we have created a custom diagnostic ID, we can use the following method to get information about the level of a diagnostic:

clang::DiagnosticsEngine::Level level = DE.getDiagosticLevel(ID, SourceLocation());

and more …

DE.setConstexprBacktraceLimit();
DE.setErrorLimit(10);
DE.getPrintTemplateTree();
DE.setElideType(true);

DiagnosticBuilder

The second piece of clang’s diagnostics puzzle, after the DiagnosticsEngine class, is the DiagnosticBuilder. This is the type of the object returned by DiagnosticsEngine::Report. It is a very lightweight class designed to be used immediately (i.e. not stored) to build a diagnostic, optionally taking appropriate formatting arguments, and then causes the diagnostic to be emitted in its destructor (usually at the end of the scope or expression). To be more precise, once you have created a custom diagnostic ID from an error level and message, formatted with placeholders like %0 and %1, you will use the DiagnosticBuilder to pass the actual values of those arguments. For example:

const auto ID = DE.getCustomDiagID(clang::DiagnosticsEngine::Warning,
                                   "variable %0 is bad, it appears %1 time%s1");

clang::DiagnosticBuilder DB = DE.Report(VarDecl->getLocStart(), ID);
DB.AddString(VarDecl->getName())
DB.AddTaggedVal(6, clang::DiagnosticsEngine::ArgumentKind::ak_uint);

Given a snippet int x; and given that we are matching on integers, this would produce the following diagnostic when the DB variable goes out of scope:

file.cpp:1:5: warning: variable x is bad, it appears 6 times
int x;
    ^

As you can see, the DiagnosticBuilder has quite a few ways of passing values to substitute for the formatting placeholders. These add values in order, i.e. the first argument you pass will set the value for %0, the second for %1 and so on. The first important method is AddString, which sets a string argument. The second function is AddTaggedVal, which takes an integer and formats it according to the clang::DiagnosticsEngine::ArgumentKind enum member, which is the second argument. Above, we specified that it was an unsigned integer (ak_unit). The type AddTaggedVal takes is actually an intptr_t, such that the integer you pass can either be a number, or a pointer (in the usual pragmatic fashion in which clang and LLVM interpret integers), such as to a NamedDecl* for the %q<n> formatter. For example, when matching against a type declaration inside a namespace, like namespace NS { class X { }; }, we can write something like this:

void diagnose(clang::DiagnosticsEngine& DE, clang::CXXRecordDecl* RecordDecl) {
  const auto ID = DE.getCustomDiagID(clang::DiagnosticsEngine::Error,
                                     "detected evil type %q0");
  const auto Pointer = reinterpret_cast<intptr_t>(RecordDecl);
  const auto Kind = clang::DiagnosticsEngine::ArgumentKind::ak_nameddecl;

  DE.Report(RecordDecl->getLocation(), ID);
    .AddTaggedVal(Pointer, Kind);

  // Diagnostic emitted here
}

which will result in:

file.cpp:1:21: error: detected evil type 'A::X'
namespace A { class X { }; }
                    ^

doge clang

Source Ranges

Another cool thing you can do with diagnostics is highlight particular source ranges, like clang when it complains about certain invalid expressions. For example, when we try to add a function to an integer, clang will put markers underneath the entire expression:

// file.c
void f();
int x = 5 + f;
$ clang file.c
file.c:3:13: error: invalid operands to binary expression ('const char *' and 'void (*)()')
int x = "5" + f;
        ~~~ ^ ~
1 error generated.

We can do the same, pretty easily. In fact, we could replicate this exact error message, just that we can’t match on an invalid AST, so we’ll have to take an example that actually compiles. Given this code:

int x = 3 + .14;

we can use the following clang::ast_matchers matcher to find all plus operations adding an integer to a float:

binaryOperator(
  hasOperatorName("+"),
  hasLHS(implicitCastExpr(hasSourceExpression(integerLiteral()))),
  hasRHS(floatLiteral()))

and then emit a diagnostic that warns about the range of this expression. For this, we can simply pass a clang::CharSourceRange to the DiagnosticBuilder’s AddSourceRange method. It’s as simple as that:

void diagnose(clang::BinaryOperator* Op, clang::ASTContext& Context) {
  auto& DE = Context.getDiagnostics();
  const auto ID = DE.getCustomDiagID(
      clang::DiagnosticsEngine::Remark,
      "operands to binary expression ('%0' and '%1') are just fine");

  auto DB = DE.Report(Op->getOperatorLoc(), ID);
  DB.AddString(Op->getLHS()->IgnoreImpCasts()->getType().getAsString());
  DB.AddString(Op->getRHS()->getType().getAsString());

  const auto Range =
      clang::CharSourceRange::getCharRange(Op->getSourceRange());
  DB.AddSourceRange(Range);
}

which gives us:

file.cpp:1:11: remark: operands to binary expression ('int' and 'double') are just fine
int x = 3 + .14;
        ~~^~~~~

Sweet.

DiagnosticsConsumer

Another member of clang’s diagnostics family is the DiagnosticsConsumer class. As you can imagine, clang’s diagnostics are distributed via the typical publisher-subscriber/observer pattern, so subclasses of this class are responsible for actually printing error messages to the console, storing them in a log file or processing them in another way. In a typical clang tool, you’ll most likely not have to interact with this aspect of the infrastructure, but I’ll nevertheless talk you through it briefly.

A DiagnosticConsumer subclass has the primary responsibility of overriding the HandleDiagnostic method, which takes a Diagnostic instance and a DiagnosticEngine::Level value (which is just the Diagnostic’s level, for convenience). A Diagnostic is a container for all information relevant to a diagnostic. Its implementation consists just of an llvm::StringRef to the unformatted message and then a pointer back to the engine, so that it can get more information when necessary. Next to HandleDiagnostic, we can also override BeginSourceFile and EndSourceFile, which lets us do pre- and post-processing, just like we can in a FrontEndAction or an ASTConsumer.

To get an idea of what subclassing DiagnosticConsumer looks like, we can just look at clang’s own IgnoringDiagConsumer inside Basic/Diagnostic.h:

/// \brief A diagnostic client that ignores all diagnostics.
class IgnoringDiagConsumer : public DiagnosticConsumer {
  virtual void anchor();

  void HandleDiagnostic(DiagnosticsEngine::Level DiagLevel,
                        const Diagnostic &Info) override {
    // Just ignore it.
  }
};

Besides the anchor() function, which does nothing, there’s not much to see here. We can now make our own little consumer following this class’ example:

class MyDiagnosticConsumer : public clang::DiagnosticConsumer {
public:
  void HandleDiagnostic(clang::DiagnosticsEngine::Level DiagLevel,
                        const clang::Diagnostic& Info) override {
    llvm::errs() << Info.getID() << '\n';
  }
};

which we can then just register with the DiagnosticsEngine via its setClient method:

auto& DE = Context.getDiagnostics();
DE.setClient(new MyDiagnosticConsumer(), /*ShouldOwnClient=*/true);

To get access to the actual diagnostic string we can use Diagnostic::FormatDiagnostic, which will append the formatted message (i.e. after substitution of %<n> arguments) to the output buffer you provide:

void HandleDiagnostic(clang::DiagnosticsEngine::Level DiagLevel,
                      const clang::Diagnostic& Info) override {
  llvm::SmallVector<char, 128> message;
  Info.FormatDiagnostic(message);
  llvm::errs() << message << '\n';
}

FixIt Hints

The last tool in clang’s diagnostics kit I’d like to talk about is FixIt hints, which are low false-positive rate suggestions we can provide to the user as to how a particular issue could be fixed. At the top of the post you can see an example of a FixIt hint where clang suggests that a semicolon should be added where one is missing. For the following discussion, we’ll be using the following simple snippet as a rolling example:

int var = 4 + 2;

which could be matched with the following ast_matchers matcher:

varDecl(
  hasType(isInteger()),
  hasInitializer(binaryOperator(
    hasOperatorName("+"),
    hasLHS(integerLiteral().bind("lhs")),
    hasRHS(integerLiteral().bind("rhs"))).bind("op"))).bind("var");

Creating FixIt Hints

FixIt hints are created via various static factory functions and can then be supplied to the DiagnosticBuilder, which will cause the FixIt hint to be emitted alongside any error message we provide. More precisely, there are four kinds of FixIt hints we can emit, each created via their respective factory function: insertions, replacements, removals and moves (i.e. moving existing code somewhere else). I’ll discuss each in more detail in the following paragraphs.

Insertions

clang::FixItHint::CreateInsertion() takes a source location and a snippet of code whose insertion should be suggested at the given location. Optionally, you can specify whether the insertion should be placed before or after other insertions at that point (the third argument), which defaults to false. You could use it like so to suggest the renaming of the matched variable:

void run(const MatchResult& Result) {
  const auto* VarDecl = Result.Nodes.getNodeAs<clang::VarDecl>("var");
  const auto FixIt = clang::FixItHint::CreateInsertion(VarDecl->getLocation(), "added_");

  auto& DE = Context.getDiagnostics();
  const auto ID = DE.getCustomDiagID(clang::DiagnosticsEngine::Remark,
                                     "Please rename this");
  DE.Report(VarDecl->GetLocation(), ID).AddFixItHint(FixIt);
}

Running a tool with this code on the above snippet will produce:

file.cpp:1:5: remark: Please rename this
int var = 4 + 2;
    ^
    added_

Replacements

To replace code with a suggested snippet, you can use the clang::FixItHint::CreateReplacement() function. It takes a source range in the code to replace, and the snippet you want to replace it with. Let’s use it to warn that the plus should better be a minus, just for reasons:

void run(const MatchResult& Result) {
  const auto* Op = Result.Nodes.getNodeAs<clang::BinaryOperator>("op");

  const auto Start = Op->getOperatorLoc();
  const auto End = Start.getLocWithOffset(+1);
  const auto SourceRange = clang::CharSourceRange::getCharRange(Start, End);
  const auto FixIt =
      clang::FixItHint::CreateReplacement(SourceRange, "-");

  auto& DE = Context.getDiagnostics();
  const auto ID = DE.getCustomDiagID(clang::DiagnosticsEngine::Remark,
                                     "This should probably be a minus");
  DE.Report(Start, ID).AddFixItHint(FixIt);
}

which produces:

file.cpp:1:11: remark: This should probably be a minus
int var = 4 + 2;
            ^
            -

Removals

To suggest that a redundant or illegal range of code be removed, we can call clang::FixItHint::CreateRemoval(), which takes a CharSourceRange (or a SourceRange) and provides the user with an appropriate hint. We could use it to ensure that all our variables are only a single character, because anything else will be too much to type:

void run(const MatchResult& Result) {
  const auto* VarDecl = Result.Nodes.getNodeAs<clang::VarDecl>("var");

  const auto NameLength = VarDecl->getName().size();
  if (NameLength <= 1) return;

  const auto Start = VarDecl->getLocation().getLocWithOffset(+1);
  const auto End = VarDecl->getLocation().getLocWithOffset(NameLength);
  const auto FixIt =
      clang::FixItHint::CreateRemoval(clang::SourceRange(Start, End));

  auto& DE = Context.getDiagnostics();
  const auto ID = DE.getCustomDiagID(clang::DiagnosticsEngine::Remark,
                                     "Why so readable?");
  DE.Report(VarDecl->getLocation(), ID).AddFixItHint(FixIt);
}

This gives us:

file.cpp:1:5: remark: Why so readable?
int var = 4 + 2;
    ^~~

Moves

Finally, CreateInsertionFromRange() allows us to suggest that a range of existing code should be moved somewhere else. The function takes the location where to recommend the insertion, the original range of code to move to that location and finally again a boolean BeforePreviousInsertions that defaults to false and handles collisions. We can use it to suggest that the two operands of the expression be swapped:

void run(const MatchResult& Result) {
  const auto* Left = Result.Nodes.getNodeAs<clang::IntegerLiteral>("lhs");
  const auto* Right = Result.Nodes.getNodeAs<clang::IntegerLiteral>("rhs");

  const auto LeftRange = clang::CharSourceRange::getCharRange(
      Left->getLocStart(),
      Left->getLocEnd().getLocWithOffset(+1)
  );

  const auto RightRange = clang::CharSourceRange::getCharRange(
      Right->getLocStart(),
      Right->getLocEnd().getLocWithOffset(+1)
  );

  const auto LeftFixIt = clang::FixItHint::CreateInsertionFromRange(
      Left->getLocation(),
      RightRange
  );

  const auto RightFixIt = clang::FixItHint::CreateInsertionFromRange(
      Right->getLocation(),
      LeftRange
  );

  auto& DE = Context.getDiagnostics();
  const auto ID =
      DE.getCustomDiagID(clang::DiagnosticsEngine::Remark,
                         "Integer plus is commutative, so let's swap these");

  DE.Report(Left->getLocation(), ID).AddFixItHint(LeftFixIt);
  DE.Report(Right->getLocation(), ID).AddFixItHint(RightFixIt);
}

which results in the following diagnostics:

file.cpp:1:11: remark: Integer plus is commutative, so let's swap these
int var = 4 + 2;
          ^
          2
file.cpp:1:15: remark: Integer plus is commutative, so let's swap these
int var = 4 + 2;
              ^
              4

Acting on FixIts

Often, we’ll not only want to emit our FixIt hints in diagnostics, but also provide the user the ability to act on the hints and apply them, either in-place or by writing the resulting code to a new file. Clang has nice facilities for this in form of the clang::FixItRewriter. It’s reasonably simple to use, but required quite a lot of digging through clang’s source code to understand entirely. I’ll spare you from that.

Basically, the FixItRewriter, which you can find in include/clang/Rewrite/Frontend/FixItRewriter.h and lib/Frontend/Rewrite/FixItRewriter.cpp (no, the folder swap is not my typo), is a subclass of the clang::DiagnosticConsumer, meaning we can register it with our DiagnosticsEngine and then tell it to rewrite files. This will look like this:

auto& DE = Context.getDiagnostics();
OurFixItRewriterOptions Options;
clang::FixItRewriter Rewriter(DE,
                              Context->getSourceManager(),
                              Context->getLangOpts(),
                              &Options);

DE.setClient(&Rewriter, /*ShouldOwnClient=*/false);

// ...

Rewriter.WriteFixedFiles();

Before discussing the mysterious OurFixItRewriterOptions in the above code, let me explain in a bit more detail what the FixItRewriter does. As it is a DiagnosticsConsumer subclass, it overrides the HandleDiagnostic method. Inside, it will first forward the diagnostic to its own client, which is the one the DiagnosticsEngine has registered when you construct the FixItRewriter. This means that the diagnostics we report will still be emitted to the console as before, just that, additionally, the rewriter will do some work on the FixIts that are registered with each Diagnostic passed to HandleDiagnostic. This “work” looks something like this (from FixItRewriter.cpp):

edit::Commit commit(Editor);
for (unsigned Idx = 0, Last = Info.getNumFixItHints();
     Idx < Last; ++Idx) {
  const FixItHint &Hint = Info.getFixItHint(Idx);

  if (Hint.CodeToInsert.empty()) {
    if (Hint.InsertFromRange.isValid())
      commit.insertFromRange(Hint.RemoveRange.getBegin(),
                         Hint.InsertFromRange, /*afterToken=*/false,
                         Hint.BeforePreviousInsertions);
    else
      commit.remove(Hint.RemoveRange);
  } else {
    if (Hint.RemoveRange.isTokenRange() ||
        Hint.RemoveRange.getBegin() != Hint.RemoveRange.getEnd())
      commit.replace(Hint.RemoveRange, Hint.CodeToInsert);
    else
      commit.insert(Hint.RemoveRange.getBegin(), Hint.CodeToInsert,
                  /*afterToken=*/false, Hint.BeforePreviousInsertions);
  }
}

Besides the transaction system clang has built for its source file modifications, what you see here is basically a loop over the FixIts registered for the diagnostic. Each FixIt results in one commit to the source file the FixIt references. However, these changes will not be performed inside HandleDiagnostic. For this to happen, you must call FixItRewriter::WriteFixedFiles. Inside this method, the rewriter will query its FixItOptions to determine where to store the resulting source buffer. This can be the same file for in-place changes, or a new file otherwise.

As such, what’s missing from the above example is the following subclass of clang:FixItOptions, which we have to define as well:

class OurFixItRewriterOptions : public clang::FixItOptions {
 public:
  MyFixItOptions() {
    InPlace = true;
  }

  std::string RewriteFilename(const std::string& Filename, int& fd) override {
    llvm_unreachable("RewriteFilename should not be called when InPlace = true");
  }
};

It may seem a bit weird that a configuration object like FixItOptions has to be subclassed just to set a single option. The reason for this is that one part of the configuration is the RewriteFilename method, where we can dynamically choose the name of the file a changed source buffer should be written to (it may have been wiser to opt for simply passing a std::function object and sparing us from all the boilerplate of subclassing, but that’s another discussion). If we want the changes to be applied in-place, we can simply set the FixItOptions’s InPlace field to true. In that case, RewriteFilename is not called at all. If we instead would like the changes to be stored in a new file, we can write something like this:


class MyFixItOptions : public clang::FixItOptions {
 public:
  MyFixItOptions() {
    InPlace = false;
    FixWhatYouCan = false;
    FixOnlyWarnings = false;
    Silent = false;
  }

  std::string RewriteFilename(const std::string& Filename, int& fd) override {
    const auto NewFilename = Filename + ".fixed";
    llvm::errs() << "Rewriting FixIts from " << Filename
                 << " to " << NewFilename
                 << "\n";
    fd = -1;
    return NewFilename;
  }
};

In this example, we first set InPlace to false and FixWhatYouCan to true. The latter means that even if there are errors for some rewrites, clang should nevertheless perform those modifications that it can (you probably wouldn’t want this in practice, it’s just to show what other options exist). Furthermore, by setting FixOnlyWarnings to false (its default), also errors will be fixed. Lastly, if the Silent property were true, only errors or those warnings for which FixIts were applied are forwarded to the FixItRewriter’s client (i.e. diagnostics whose level is below error and that don’t have a FixIt won’t be shown).

Inside RewriteFilename, I give an example of how you might want to store a rewritten file with the same name, but with a .fixed suffix. Note that the fd parameter can be set to an open file descriptor if you want to. In that case, the changes will be sent to that file descriptor (maybe useful if it’s a socket?). In most cases, you’ll probably just want to set it to -1, which disables that functionality.

To wrap up, I’ll provide you with an example of a complete clang tool that has the option to rewrite Fixits. If we take our previous example of the plus operation:

// file.cpp
int x = 4 + 2;

And want to rewrite the plus to a minus and store the change in a file, the following tool will do:

// Clang includes
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ASTContext.h"
#include "clang/AST/Expr.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Basic/Diagnostic.h"
#include "clang/Basic/SourceLocation.h"
#include "clang/Frontend/FrontendAction.h"
#include "clang/Rewrite/Frontend/FixItRewriter.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"

// LLVM includes
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"

// Standard includes
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

namespace MinusTool {

class FixItRewriterOptions : public clang::FixItOptions {
 public:
  using super = clang::FixItOptions;

  /// Constructor.
  ///
  /// The \p RewriteSuffix is the option from the command line.
  explicit FixItRewriterOptions(const std::string& RewriteSuffix)
  : RewriteSuffix(RewriteSuffix) {
    super::InPlace = false;
  }

  /// For a file to be rewritten, returns the (possibly) new filename.
  ///
  /// If the \c RewriteSuffix is empty, returns the \p Filename, causing
  /// in-place rewriting. If it is not empty, the \p Filename with that suffix
  /// is returned.
  std::string RewriteFilename(const std::string& Filename, int& fd) override {
    fd = -1;

    llvm::errs() << "Rewriting FixIts ";

    if (RewriteSuffix.empty()) {
      llvm::errs() << "in-place\n";
      return Filename;
    }

    const auto NewFilename = Filename + RewriteSuffix;
    llvm::errs() << "from " << Filename << " to " << NewFilename << "\n";

    return NewFilename;
  }

 private:
  /// The suffix appended to rewritten files.
  std::string RewriteSuffix;
};

class MatchHandler : public clang::ast_matchers::MatchFinder::MatchCallback {
 public:
  using MatchResult = clang::ast_matchers::MatchFinder::MatchResult;
  using RewriterPointer = std::unique_ptr<clang::FixItRewriter>;

  /// Constructor.
  ///
  /// \p DoRewrite and \p RewriteSuffix are the command line options passed
  /// to the tool.
  MatchHandler(bool DoRewrite, const std::string& RewriteSuffix)
  : FixItOptions(RewriteSuffix), DoRewrite(DoRewrite) {
  }

  /// Runs the MatchHandler's action.
  ///
  /// Emits a diagnostic for each matched expression, optionally rewriting the
  /// file in-place or to another file, depending on the command line options.
  void run(const MatchResult& Result) {
    auto& Context = *Result.Context;

    const auto& Op = Result.Nodes.getNodeAs<clang::BinaryOperator>("op");
    assert(Op != nullptr);

    const auto StartLocation = Op->getOperatorLoc();
    const auto EndLocation = StartLocation.getLocWithOffset(+1);
    const clang::SourceRange SourceRange(StartLocation, EndLocation);
    const auto FixIt = clang::FixItHint::CreateReplacement(SourceRange, "-");

    auto& DiagnosticsEngine = Context.getDiagnostics();

    // The FixItRewriter is quite a heavy object, so let's
    // not create it unless we really have to.
    RewriterPointer Rewriter;
    if (DoRewrite) {
      Rewriter = createRewriter(DiagnosticsEngine, Context);
    }

    const auto ID =
        DiagnosticsEngine.getCustomDiagID(clang::DiagnosticsEngine::Warning,
                                          "This should probably be a minus");

    DiagnosticsEngine.Report(StartLocation, ID).AddFixItHint(FixIt);

    if (DoRewrite) {
      assert(Rewriter != nullptr);
      Rewriter->WriteFixedFiles();
    }
  }

 private:
  /// Allocates a \c FixItRewriter and sets it as the client of the given \p
  /// DiagnosticsEngine.
  ///
  /// The \p Context is forwarded to the constructor of the \c FixItRewriter.
  RewriterPointer createRewriter(clang::DiagnosticsEngine& DiagnosticsEngine,
                                 clang::ASTContext& Context) {
    auto Rewriter =
        std::make_unique<clang::FixItRewriter>(DiagnosticsEngine,
                                               Context.getSourceManager(),
                                               Context.getLangOpts(),
                                               &FixItOptions);

   // Note: it would make more sense to just create a raw pointer and have the
   // DiagnosticEngine own it. However, the FixItRewriter stores a pointer to
   // the client of the DiagnosticsEngine when it gets constructed with it.
   // If we then set the rewriter to be the client of the engine, the old
   // client gets destroyed, leading to happy segfaults when the rewriter
   // handles a diagnostic.
    DiagnosticsEngine.setClient(Rewriter.get(), /*ShouldOwnClient=*/false);

    return Rewriter;
  }

  FixItRewriterOptions FixItOptions;
  bool DoRewrite;
};

/// Consumes an AST and attempts to match for the
/// kinds of nodes we are looking for.
class Consumer : public clang::ASTConsumer {
 public:
  /// Constructor.
  ///
  /// All arguments are forwarded to the \c MatchHandler.
  template <typename... Args>
  explicit Consumer(Args&&... args) : Handler(std::forward<Args>(args)...) {
    using namespace clang::ast_matchers;

    // Want to match:
    // int x = 4   +   2;
    //     ^   ^   ^   ^
    //   var  lhs op  rhs

    // clang-format off
    const auto Matcher = varDecl(
       hasType(isInteger()),
       hasInitializer(binaryOperator(
         hasOperatorName("+"),
         hasLHS(integerLiteral().bind("lhs")),
         hasRHS(integerLiteral().bind("rhs"))).bind("op"))).bind("var");
    // clang-format on

    MatchFinder.addMatcher(Matcher, &Handler);
  }

  /// Attempts to match the match expression defined in the constructor.
  void HandleTranslationUnit(clang::ASTContext& Context) override {
    MatchFinder.matchAST(Context);
  }

 private:
  /// Our callback for matches.
  MatchHandler Handler;

  /// The MatchFinder we use for matching on the AST.
  clang::ast_matchers::MatchFinder MatchFinder;
};

class Action : public clang::ASTFrontendAction {
 public:
  using ASTConsumerPointer = std::unique_ptr<clang::ASTConsumer>;

  /// Constructor, taking the \p RewriteOption and \p RewriteSuffixOption.
  Action(bool DoRewrite, const std::string& RewriteSuffix)
  : DoRewrite(DoRewrite), RewriteSuffix(RewriteSuffix) {
  }

  /// Creates the Consumer instance, forwarding the command line options.
  ASTConsumerPointer CreateASTConsumer(clang::CompilerInstance& Compiler,
                                       llvm::StringRef Filename) override {
    return std::make_unique<Consumer>(DoRewrite, RewriteSuffix);
  }

 private:
  /// Whether we want to rewrite files. Forwarded to the consumer.
  bool DoRewrite;

  /// The suffix for rewritten files. Forwarded to the consumer.
  std::string RewriteSuffix;
};
}  // namespace MinusTool

namespace {
llvm::cl::OptionCategory MinusToolCategory("minus-tool options");

llvm::cl::extrahelp MinusToolCategoryHelp(R"(
This tool turns all your plusses into minuses, because why not.
Given a binary plus operation with two integer operands:

int x = 4 + 2;

This tool will rewrite the code to change the plus into a minus:

int x = 4 - 2;

You're welcome.
)");

llvm::cl::opt<bool>
    RewriteOption("rewrite",
                  llvm::cl::init(false),
                  llvm::cl::desc("If set, emits rewritten source code"),
                  llvm::cl::cat(MinusToolCategory));

llvm::cl::opt<std::string> RewriteSuffixOption(
    "rewrite-suffix",
    llvm::cl::desc("If -rewrite is set, changes will be rewritten to a file "
                   "with the same name, but this suffix"),
    llvm::cl::cat(MinusToolCategory));

llvm::cl::extrahelp
    CommonHelp(clang::tooling::CommonOptionsParser::HelpMessage);
}  // namespace

/// A custom \c FrontendActionFactory so that we can pass the options
/// to the constructor of the tool.
struct ToolFactory : public clang::tooling::FrontendActionFactory {
  clang::FrontendAction* create() override {
    return new MinusTool::Action(RewriteOption, RewriteSuffixOption);
  }
};

auto main(int argc, const char* argv[]) -> int {
  using namespace clang::tooling;

  CommonOptionsParser OptionsParser(argc, argv, MinusToolCategory);
  ClangTool Tool(OptionsParser.getCompilations(),
                 OptionsParser.getSourcePathList());

  return Tool.run(new ToolFactory());
}

Does it do what we want it to?

$ ./minus-tool file.cpp -rewrite -suffix=".fixed"
file.cpp:1:11: warning: This should probably be a minus
int x = 4 + 2;
          ^
          -
file.cpp:1:11: note: FIX-IT applied suggested code changes
Rewriting FixIts from file.cpp to file.cpp.fixed
1 warning generated.

$ cat file.cpp.fixed
int x = 4 - 2;

Outro

Clang is an incredibly exciting set of libraries for building not only a full-fledged compiler, but also your own tools for static analysis, linting or source-to-source transformation. As part of this, you’ll almost always want to emit some diagnostics, hopefully in an equally expressive manner as clang does itself. I hope that with this post, I’ve shown you all you need to know about emitting diagnostics with clang. As such, the only thing left to do is …


clang-all-the-things