KYC, Bitcoin, and the failed hopes of AML policies: Tracking funds on-chain

The cornerstone of the modern approach to money laundering is to prevent illicit funds from entering the financial system. The rationale is understandable: if criminals can’t use their money, they will eventually have to stop whatever they are doing and go get a 9-to-5 job.

However, after 20 years of ever tighter (and ever more expensive) AML regulations, the levels of organized crime, tax evasion, and drug use show no sign of decreasing. At the same time, the basic right to privacy is being unceremoniously violated on an everyday basis, with each financial operation, no matter how tiny, being subject to extensive verifications and tons of paperwork. Check Part 1 of this story for details and numbers.

This prompts a question: should we reconsider our approach to the AML strategy?

Two years ago, fintech author David G.W. Birch wrote an article for Forbes reflecting on the main principle of AML: gatekeeping. The key thought could be summarized as “instead of trying to prevent criminals from getting into the system, we let them in and monitor what they are up to.”

Indeed, why do we erect expensive AML gates and force the bad guys to turn to hardly traceable cash or works of art, when we could simply let them in and follow the money to hunt them down? To do so, we can use both the existing reporting system within traditional finance and on-chain analytics within the blockchain. However, while the former is more or less understood, the latter is still a mystery for most people. What’s more, politicians and bankers regularly accuse crypto of being a tool for criminals, tax evaders, and all sorts of Satan worshipers, further deepening the misunderstanding.

To shed more light on this matter, we need to better understand how on-chain analytics works. It is not an obvious task, though: blockchain analysis methods are often proprietary, and analytics companies that share them risk losing their business edge. However, some of them, like Chainalysis, publish rather detailed documentation, and the Luxembourgish firm Scorechain agreed to share some details of its trade for this story. Combining these sources gives us a good idea of the potential and limitations of on-chain analytics.

How does on-chain analytics work?

The blockchain is transparent and auditable by anyone. However, not everyone is capable of drawing meaningful conclusions from the vast amount of raw data it contains. Gathering data, identifying the entities, and putting the conclusions into a readable format is the specialty of on-chain analytics firms.

It all starts with getting a copy of the ledger, that is, synchronizing the firm’s internal software with each blockchain it covers.

Then, a tedious stage of mapping begins. How can we know that one address belongs to an exchange and another to a darknet marketplace? Analysts employ all their creativity and resourcefulness to try and de-pseudonymize the blockchain as much as they can. Any technique is good as long as it works: collecting open-source data from law enforcement, scraping websites, navigating X (formerly Twitter) and other social media, acquiring data from specialized blockchain explorers like Etherscan, following the trace of stolen funds upon requests from attorneys… Some services are identified by interacting with them, for example by sending funds to centralized exchanges to identify their addresses. To reduce errors, the data is often cross-checked against different sources.

Once the addresses are identified to the best of one’s ability, the maze of transaction hashes becomes a bit clearer. Yet the picture is still far from complete. On account-based blockchains like Ethereum, identifying an address makes tracking its funds rather straightforward; on UTXO blockchains like Bitcoin, the situation is much less obvious.

Indeed, unlike Ethereum, which keeps track of account balances, the Bitcoin blockchain keeps track of unspent transaction outputs (UTXOs). A transaction must spend each UTXO it consumes in full. If a person wishes to send only part of that value, the remainder, also known as change, is assigned to an address controlled by the sender, typically a newly created one.
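The change mechanism can be illustrated with a minimal Python sketch. It is a simplified model, not real Bitcoin code: the `UTXO` class, the `spend` function, and all addresses and transaction IDs are hypothetical names chosen for illustration, and a real transaction can consume several UTXOs at once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UTXO:
    txid: str     # transaction that created this output
    index: int    # position of the output within that transaction
    address: str  # address that can spend it
    amount: int   # value in satoshis

def spend(utxo, recipient, amount, change_address, fee, new_txid):
    """Consume one UTXO in full and return the outputs of the new transaction."""
    change = utxo.amount - amount - fee
    if change < 0:
        raise ValueError("UTXO too small to cover amount plus fee")
    outputs = [UTXO(new_txid, 0, recipient, amount)]
    if change > 0:
        # The remainder returns to the sender, here at a fresh address.
        outputs.append(UTXO(new_txid, 1, change_address, change))
    return outputs

# Alice holds one 100,000-satoshi UTXO and pays Bob 30,000 with a 1,000 fee.
coin = UTXO("tx_a", 0, "alice_addr_1", 100_000)
outputs = spend(coin, "bob_addr", 30_000, "alice_addr_2", 1_000, "tx_b")
```

After this payment, nothing on-chain states that "alice_addr_1" and "alice_addr_2" belong to the same person. An outside observer has to infer it.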

It is the job of on-chain analytics firms to make sense of these movements and determine clusters of UTXOs associated with the same entity.

Can on-chain analytics be trusted?

On-chain analytics is not an exact science. Both the mapping and the clustering of UTXOs rely on experience and a carefully calibrated set of heuristics that each company has developed for itself.

This issue was highlighted last July in a court hearing in the US v. Sterlingov case, for which Chainalysis had provided its forensic expertise. The firm’s representative admitted not only that its methods were not peer-reviewed or otherwise scientifically validated, but also that the firm did not keep track of its false positives. In Chainalysis’ defense, the first point is understandable: the methods that each firm uses to analyze the blockchain are closely guarded trade secrets. However, the issue of false positives must be tackled better, especially if it could end up sending someone to jail.

Scorechain uses a different approach, erring on the side of caution and choosing only methods that do not generate false positives in the clustering process, such as the multi-input heuristic (the assumption that all input addresses in a single transaction belong to one entity). Unlike Chainalysis, it does not use any change heuristics, which produce a lot of false positives. In some cases, its team can manually track UTXOs if a human operator has enough reasons to do so, but overall, this approach tolerates blind spots, counting on additional information in the future to fill them in.
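To make the multi-input heuristic concrete, here is a minimal Python sketch that merges addresses into clusters with a union-find structure. It is an illustration only, not Scorechain’s or any other firm’s actual method: the function name and the sample addresses are hypothetical, each transaction is reduced to its list of input addresses, and the sketch makes no attempt to detect collaborative transactions in which inputs come from different parties.

```python
def cluster_by_common_inputs(transactions):
    """Group addresses that appear together as inputs of the same transaction."""
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving keeps lookups fast
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_addresses in transactions:
        if not input_addresses:
            continue
        first = input_addresses[0]
        find(first)  # register single-input transactions as well
        for other in input_addresses[1:]:
            union(first, other)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())

# Each inner list holds the input addresses of one transaction.
txs = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
clusters = cluster_by_common_inputs(txs)
print(clusters)  # [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

Note that A and C never appear in the same transaction; they end up in one cluster only because both were co-spent with B. Change outputs are deliberately left unlinked here, which is the kind of blind spot the cautious approach accepts.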

The very notion of a heuristic, i.e. a strategy that takes a practical but not necessarily scientifically proven approach to problem-solving, implies that it cannot guarantee 100% reliability. Its effectiveness is measured by its outcomes. The FBI stating that Chainalysis’ methods are “generally reliable” could serve as proof of quality, but it would be better if all on-chain analytics firms started measuring and sharing their rates of false positives and false negatives.
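As an illustration of what such a measurement could look like, the Python sketch below compares the address links proposed by a heuristic with a labeled ground truth. Everything in it is assumed for the example: the function name, the sample pairs, and above all the existence of a ground-truth set, which in practice is the hard part to obtain.

```python
def error_rates(predicted, actual, candidates):
    """Return (false positive rate, false negative rate) for predicted address links."""
    negatives = candidates - actual          # pairs that truly belong to different entities
    false_positives = predicted & negatives  # linked by the heuristic, but wrongly
    false_negatives = actual - predicted     # true links the heuristic missed
    fpr = len(false_positives) / len(negatives) if negatives else 0.0
    fnr = len(false_negatives) / len(actual) if actual else 0.0
    return fpr, fnr

# Hypothetical ground truth: A and B share an owner, as do C and D; E stands alone.
candidates = {("A", "B"), ("A", "C"), ("B", "E"), ("C", "D"), ("C", "E"), ("D", "E")}
actual = {("A", "B"), ("C", "D")}
predicted = {("A", "B"), ("C", "E")}  # what the heuristic under test linked

fpr, fnr = error_rates(predicted, actual, candidates)
print(fpr, fnr)  # 0.25 0.5
```

In this toy case the heuristic wrongly links one of four unrelated pairs and misses one of two genuine links.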

Seeing through the fog

There are ways of obfuscating the trace of funds or making them more difficult to find. Crypto hackers and scammers are known to use all kinds of techniques: chain hopping, privacy blockchains, mixers…

Some of them, like swapping or bridging assets, can be traced by on-chain analytics firms. Others, like the privacy chain Monero, or various mixers and tumblers, often can’t. There have been, however, instances when Chainalysis claimed to de-mix transactions passed through a mixer, and most recently Finnish authorities announced that they had tracked Monero transactions as part of an investigation.

In any case, the very use of these masking techniques is highly visible and can serve as a red flag for AML purposes. The US Treasury adding the smart contract address of the Tornado Cash mixer to the OFAC list last year is one such example. Now, when the coins’ history is traced back to this mixer, the funds are suspected of belonging to illicit actors. This is not great news for privacy advocates, but it is rather reassuring for crypto AML.
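Tracing a coin’s history back to a flagged address amounts to a walk through the funding graph. The Python sketch below shows the idea with a breadth-first search; it is a simplification under stated assumptions: the `funded_by` mapping, the function name, and all address labels are hypothetical, and real tools work on full transaction data with amounts and timestamps rather than bare address links.

```python
from collections import deque

def traces_back_to(address, funded_by, flagged, max_hops=10):
    """Return the number of hops to the nearest flagged source, or None if none is found."""
    seen = {address}
    queue = deque([(address, 0)])
    while queue:
        current, hops = queue.popleft()
        if current in flagged:
            return hops
        if hops == max_hops:
            continue  # do not look further back than max_hops
        for source in funded_by.get(current, ()):
            if source not in seen:
                seen.add(source)
                queue.append((source, hops + 1))
    return None

# Hypothetical funding graph: each address maps to the addresses that sent it funds.
funded_by = {
    "exchange_deposit": ["wallet_2"],
    "wallet_2": ["wallet_1", "salary_payer"],
    "wallet_1": ["mixer_contract"],
}
flagged = {"mixer_contract"}

hops = traces_back_to("exchange_deposit", funded_by, flagged)
print(hops)  # 3
```

The `max_hops` limit is a design choice for the sketch: how far back a flag should reach, and how much of a balance must come from the flagged source, are policy questions this example leaves open.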

One might ask: what is the point of flagging mixed coins and tracing them across blockchains if, unlike in the banking system, we don’t have a concrete person to pin them to? Luckily, criminals have to interact with the non-criminal world, and the tainted money sooner or later ends up either with a goods or service provider or in a bank account, and this is where law enforcement can identify the actual persons. This is how the FBI got its biggest-ever seizure of $4.5 billion worth of Bitcoin (in 2022 prices) following the Bitfinex hack. This also works in reverse: if law enforcement gets access to a criminal’s private keys, they can work back through the blockchain history to identify the addresses that have interacted with the criminal’s addresses at some point. This is how the London Metropolitan Police uncovered a whole drug dealing network from one single arrest (source: Chainalysis’ Crypto Crime 2023 report).

Crime has existed since the dawn of humanity, and will probably accompany it till its end, using ever-evolving camouflaging techniques. Luckily, crime detection methods follow suit, and it so happens that the blockchain is an ideal environment for deploying digital forensics tools. After all, it is transparent and accessible to everyone (which, by the way, cannot be said of the banking sector).

One can argue that current on-chain analysis methods need to be improved, and that point holds true. However, it is clear that even in its imperfect form, on-chain analysis is already an effective tool for tracking bad guys. Perhaps, then, it’s time to reconsider our approach to AML and let the criminals into the blockchain?

A special thank you to the Scorechain team for sharing their knowledge.

This is a guest post by Marie Poteriaieva. Opinions expressed are entirely their own and do not necessarily reflect those of BTC Inc or Bitcoin Magazine.