JWCC is a minimal extension to the widely used JSON file format with (1) optional commas after the final element of arrays and objects and (2) C/C++ style comments. These two features make it more suitable for human-editable configuration files, without adding so many features that it’s incompatible with numerous other (deliberate and accidental) existing JSON extensions.
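
To make the two extensions concrete, here is a minimal Python sketch that reduces a JWCC document to plain JSON by dropping comments and trailing commas outside of string literals. The function name `strip_jwcc` and the sample document are made up for illustration; this is not the reference implementation, and it is more permissive than a real parser would be (for example, it would also accept `[,]`).

```python
import json


def strip_jwcc(text: str) -> str:
    """Remove // and /* */ comments and trailing commas outside of strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # copy the escaped character untouched
                i += 1
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end  # keep the newline itself
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            out.append(" ")  # a comment separates tokens like whitespace
            i = end + 2
        elif ch in "]}":
            # Look back past whitespace and drop one trailing comma, if any.
            j = len(out) - 1
            while j >= 0 and out[j].isspace():
                j -= 1
            if j >= 0 and out[j] == ",":
                del out[j]
            out.append(ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = """
{
  // line comment
  "name": "demo", /* block comment */
  "url": "http://example.com/a,b",
  "ports": [80, 443,],
}
"""
config = json.loads(strip_jwcc(sample))
```

The `//` inside the URL string is left alone because the scanner tracks whether it is inside a string literal.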

Native Type Theory is a new paper by myself and Mike Stay. We propose a unifying method of reasoning for programming languages: model a language as a theory, form the category of presheaves, and use the internal language of the topos.

As programmers, we spend a lot of time just carting data from one place to another. Sometimes that’s the entire purpose of a program or library (data conversion whatevers), but more often it’s just something that needs to happen in the course of getting a certain task done. When we’re sending a request, using a library, executing templates or whatever, it’s important to be 100% clear on the format of the data, which is a fancy way of saying how the data is encoded.

Here’s a list of reasons why SELECT * is bad for SQL performance, assuming that your application doesn’t actually need all the columns. When I write production code, I explicitly specify the columns of interest in the select-list (projection), not only for performance reasons, but also for application reliability reasons. For example, will your application’s data processing code suddenly break when a new column has been added or the column order has changed in a table?
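
The reliability point can be sketched in a few lines. The example below uses Python with an in-memory SQLite database rather than Oracle, and the `users` table and both loader functions are hypothetical; the idea is only to show positional row handling breaking after a column is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")


def load_with_star(conn):
    # Unpacks each row positionally, so it depends on the table's exact shape.
    return [(user_id, name) for user_id, name in conn.execute("SELECT * FROM users")]


def load_with_columns(conn):
    # Names the columns it needs, so extra columns in the table are irrelevant.
    return [(user_id, name) for user_id, name in conn.execute("SELECT id, name FROM users")]


before = (load_with_star(conn), load_with_columns(conn))

# A schema change that has nothing to do with this code path.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

after_columns = load_with_columns(conn)
try:
    load_with_star(conn)
    star_still_works = True
except ValueError:  # rows now have three values, the code unpacks two
    star_still_works = False
```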

I’ll focus only on the SQL performance aspects in this article. I’m using examples based on Oracle, but most of this reasoning applies to other modern relational databases too.

In Curried Elixir, I said this:

We’re going to do this by wrapping my_fun in a chain of anonymous functions, each binding exactly one name:

curried_fun =
  fn (a) ->
    fn (b) ->
      fn (c) ->
        my_fun.(a, b, c)
      end
    end
  end

And then went on implementing curry as a recursive function. But what if I actually want my curried function to be a chain of anonymous functions, and get rid of this recursive call?

The obvious thing to do would be to try to implement this as a macro instead. But I think we can do better than this.
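
For readers who do not use Elixir, the chain of single-argument anonymous functions quoted above looks like this in Python. This only mirrors the hand-written chain; it is not the macro-free solution the post goes on to build, and `my_fun` here is a stand-in three-argument function.

```python
def my_fun(a, b, c):
    # Stand-in for the three-argument function being curried.
    return f"{a}-{b}-{c}"


# Each lambda binds exactly one name and returns the next link in the chain.
curried_fun = lambda a: lambda b: lambda c: my_fun(a, b, c)

with_a = curried_fun("x")   # still waiting for b and c
with_ab = with_a("y")       # still waiting for c
result = with_ab("z")       # all three names bound, my_fun is finally called
```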

Welcome to the fantastic world of nerdy regex fun! Start playing by selecting one of the puzzle challenges below. There is a wide range of difficulties, from beginner to expert.

Anyone who has ever tried to cross-compile a C/C++ program knows how big a PITA the whole process can be. The main reasons for this sorry state of things are generally how byzantine build systems tend to be when configuring for cross-compilation, and how messy it is to set up your cross toolchain in the first place…

While simple I/O code in Haskell looks very similar to its equivalents in imperative languages, attempts to write somewhat more complex code often result in a total mess. This is because Haskell I/O is really very different in how it actually works.

The following text is an attempt to explain the details of Haskell I/O implementations. This explanation should help you eventually learn all the smart I/O tips. Moreover, I’ve added a detailed explanation of various traps you might encounter along the way. After reading this text, you will be well on your way towards mastering I/O in Haskell.

Small and portable Regular Expression (regex) library written in C.

Design is inspired by Rob Pike’s regex-code for the book “Beautiful Code” available online here.

Supports a subset of the syntax and semantics of the Python standard library implementation (the re-module).
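
To give a feel for the design that inspired it, here is a Python transliteration of the well-known matcher structure from Rob Pike's “Beautiful Code” chapter, supporting only literal characters, `.`, `*`, `^` and `$`. It is a sketch of the idea, not this library's code or API, and it covers far less syntax than the library does.

```python
def match(regexp: str, text: str) -> bool:
    """Search for regexp anywhere in text."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text)
    # Try every starting position, including the empty suffix at the end.
    for start in range(len(text) + 1):
        if match_here(regexp, text[start:]):
            return True
    return False


def match_here(regexp: str, text: str) -> bool:
    """Match regexp at the beginning of text."""
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":
        return not text
    if text and (regexp[0] == "." or regexp[0] == text[0]):
        return match_here(regexp[1:], text[1:])
    return False


def match_star(c: str, regexp: str, text: str) -> bool:
    """Match zero or more occurrences of c, followed by regexp."""
    i = 0
    while True:
        if match_here(regexp, text[i:]):
            return True
        if i < len(text) and (c == "." or text[i] == c):
            i += 1
        else:
            return False
```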

One of the first things a novice SQL developer learns about is called “thinking in SQL”, which is usually contrasted with “procedural thinking”.

Let’s see which part of the brain this intellectual activity takes and how to use it.

Two features distinguish SQL from the other languages you learned as an 11-year-old kid on your first PC, like BASIC or perl or maybe even C++ if you’re such a Wunderkind.

First, SQL is set-based. It does things with sets.

Every tool is designed to do things with something else. Like, you use a hammer to do things with nails, or use a screwdriver to do things with screws, or use an oven to do things with food.

Same with computer languages.

BASIC does things with variables. perl does things with scalars, arrays, hashes and file streams. Assembly does things with registers and memory.

You should not be confused by something like “registers are just a special case of variables”, or “a hash is just a generalized container which exposes this and this method” or something like that. No.

A hash is a hash, a variable is a variable and a register is a register.

Like, an egg is a food and rice is a food and it’s possible to cook some eggs in a rice cooker and vice versa, but they are just the wrong tools to do that.

In my last post I discussed inline caching as a technique for runtime optimization. I ended the post with some extensions to the basic technique, like quickening. If you have not read the previous post, I recommend it. This post will make many references to it.

Quickening involves bytecode rewriting (self-modifying code) to remove some branches and indirection in the common path. Stefan Brunthaler writes about it in his papers Efficient Interpretation using Quickening and Inline Caching Meets Quickening…
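
A toy sketch of the quickening idea in Python, assuming a made-up four-opcode stack machine: the generic `ADD` instruction rewrites itself in place to `ADD_INT` once it has seen two integers, and falls back if that guess later turns out wrong. The opcode names and the `run` function are invented for illustration and do not come from the post or from Brunthaler's papers.

```python
import operator

PUSH, ADD, ADD_INT, HALT = "PUSH", "ADD", "ADD_INT", "HALT"


def run(code):
    """Interpret a mutable list of (opcode, argument) pairs."""
    stack = []
    pc = 0
    while True:
        op, arg = code[pc]
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            if type(left) is int and type(right) is int:
                code[pc] = (ADD_INT, None)  # quicken: rewrite this instruction
            stack.append(operator.add(left, right))  # generic, dispatching path
        elif op == ADD_INT:
            right, left = stack.pop(), stack.pop()
            if type(left) is int and type(right) is int:
                stack.append(left + right)  # specialised fast path
            else:
                code[pc] = (ADD, None)  # guess was wrong: undo the rewrite
                stack.append(operator.add(left, right))
        elif op == HALT:
            return stack.pop()
        pc += 1


program = [(PUSH, 1), (PUSH, 2), (ADD, None), (HALT, None)]
first = run(program)    # takes the generic path and quickens the ADD
second = run(program)   # now executes ADD_INT directly
```

In Python the two paths cost about the same, so this only shows the mechanics; the payoff appears in interpreters where the generic path involves real lookups and indirection.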

In the previous post we’ve seen a very nice technique to use value semantics with inheritance and virtual methods, which was made possible by std::any.

Given its usefulness, it would be interesting to better understand std::any. Indeed, std::any is sometimes said to be “the modern void*”. But it does much more than a void*.

If you use JSON anywhere, I want you to try something. Pop open the developer tools in your web browser and ask it to tell you what the result of ‘Math.pow(2,60)’ is. Just stuff that there and look at what you get back. It’s a big number, right?

Now, using any other reference source, look up what 2 to the power of 60 is (hint: Wikipedia “power of two” if your Google search only gives you scientific notation). Compare it to what you got from your browser.

What did you find? I assume if you didn’t notice the “000” on the end of what your web browser told you before, you will now.

Feel free to try this with other large numbers. You should find that anything above 2^53 starts getting squirrelly.

Exactly WHY this happens is not important for this specific post - floating point, mantissa, yadda yadda. I’ve covered it elsewhere, but I don’t think people really appreciated it for the problem that it is. This method of approaching it should get the general concept around to a wider audience, or at least, I hope it will.
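
The same experiment can be run outside the browser. The sketch below uses Python, whose integers are arbitrary precision, and forces the integer through a 64-bit float to imitate a consumer that stores every JSON number as a double. The `"id"` field is a made-up example.

```python
import json

LIMIT = 2**53  # 9007199254740992

# Integers up to the limit survive a round trip through a double.
exact_at_limit = int(float(LIMIT)) == LIMIT

# The very next integer does not: it is rounded back down to the limit.
neighbour = LIMIT + 1
rounded = int(float(neighbour))

# The same thing over the wire: the JSON text carries every digit, and
# what comes out depends on how the receiving parser represents numbers.
wire = json.dumps({"id": neighbour})
as_python_int = json.loads(wire)["id"]
as_double = json.loads(wire, parse_int=float)["id"]  # imitates a double-only parser
```

Nothing in the JSON text is wrong here; the digits are lost only when the reader stores the number as a double.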

My team writes a lot of command line tools, and we like to assume that people aren’t using a literal VT100 (meaning: we liberally use colours, italics, and basically every other terminal feature available to us). This tends to result in strings in our code that look a little like this:

"\x1b[A\r\x1b[K\x1b[1;32mopened \x1b[1;4;34m%s\x1b[0;1;32m in your browser.\x1b[0m\n"

If you’re like most people, your face just melted, but it’s actually really simple. This page is a crash course in what all of these things mean, and how to learn to read and write them effectively.
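
As a preview, the string above can be rebuilt from named pieces. This Python sketch uses the standard ANSI meanings of each sequence; the constant names and the `sgr` helper are invented here for readability and are not part of the article or of any library.

```python
CSI = "\x1b["  # Control Sequence Introducer: ESC followed by "["


def sgr(*codes: int) -> str:
    """Build a Select Graphic Rendition sequence, e.g. sgr(1, 32) -> ESC[1;32m."""
    return CSI + ";".join(str(code) for code in codes) + "m"


CURSOR_UP = CSI + "A"         # move the cursor up one line
ERASE_TO_EOL = CSI + "K"      # erase from the cursor to the end of the line
BOLD_GREEN = sgr(1, 32)       # 1 = bold, 32 = green foreground
BOLD_UNDERLINE_BLUE = sgr(1, 4, 34)  # 4 = underline, 34 = blue foreground
RESET_BOLD_GREEN = sgr(0, 1, 32)     # 0 resets everything, then bold green again
RESET = sgr(0)

template = (
    CURSOR_UP + "\r" + ERASE_TO_EOL   # go back and clear the previous line
    + BOLD_GREEN + "opened "
    + BOLD_UNDERLINE_BLUE + "%s"
    + RESET_BOLD_GREEN + " in your browser."
    + RESET + "\n"
)
```

Read this way, the string overwrites the previous line with a bold green message whose `%s` placeholder is bold, underlined and blue.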

I work at Red Hat on the GNU Compiler Collection (GCC). In GCC 10, I added the new -fanalyzer option, a static analysis pass for identifying various problems at compile-time, rather than at runtime. The initial implementation was aimed at early adopters, who found a few bugs, including a security vulnerability: CVE-2020-1967. Bernd Edlinger, who discovered the issue, had to wade through many false positives accompanying the real issue. Other users also managed to get the analyzer to crash on their code.

I’ve been rewriting the analyzer to address these issues in the next major release, GCC 11. In this article, I describe the steps I’m taking to reduce the number of false positives and make this static analysis tool more robust.

Inheritance is a useful but controversial technique in C++. There is even a famous talk by Sean Parent called Inheritance is the base class of evil. So inheritance is not the most popular feature in the C++ community.

Nevertheless, inheritance is useful, and widely used by C++ developers.

What is the problem with inheritance? It has several, and one of them is that it forces us to manipulate objects through pointers.

To illustrate, consider the following hierarchy of classes…

Type families in Haskell offer a flavor of dependent types: a function g or a type family G may have a result whose type F x depends on the argument x:

type family F (x :: Type) :: Type

g :: forall x. Proxy x -> F x  -- Proxy to avoid ambiguity
g = undefined  -- dummy

type family G (x :: Type) :: F x

But it is not quite clear how well features of other “truly” dependently typed languages translate to Haskell. The challenge we’ll face in this post is to do type-level pattern-matching on GADTs indexed by type families.
