Data Analytics Is Already Deciding Who Gets Investigated for Financial Crime

HMRC knows about your Airbnb listing. It knows about the car you posted on Instagram last month. It cross-referenced your Land Registry records against your declared rental income sometime around Tuesday and flagged a discrepancy that a human investigator will now review over coffee.

That isn’t speculation. HMRC’s Connect system, the data analytics platform sitting at the core of tax enforcement across Britain, analyses over fifty-five billion data points every year. It pulls from banks, utility companies, the Land Registry, Companies House, overseas tax authorities, online marketplaces like eBay and Etsy, and yes, publicly available social media. The system cost forty-five million pounds to build when it launched in 2010, and in the 2024/25 tax year alone the leads it generated helped recover an additional four point six billion pounds in tax revenue.

That number is about to get bigger. Chancellor Reeves identified a forty-seven billion pound tax gap in the 2025 Budget, and roughly seven billion of that is expected to be clawed back through expanded AI surveillance and compliance enforcement. HMRC isn’t hiring thousands of new investigators to do it; it is feeding Connect more data and letting the algorithms do the targeting.

How Connect Actually Decides You’re Worth Investigating

The system works by cross-matching data sources that most people assume exist in separate silos.

  • Your bank reports interest payments to HMRC.
  • The Land Registry shows property ownership.
  • Estate agents share client lists.
  • Airbnb and other platforms report booking income.
  • Employers submit payroll data.
  • Overseas jurisdictions share account information through automatic exchange agreements.

Connect layers all of this together and builds what amounts to a financial portrait. Then it compares that portrait against what you actually declared on your tax return.
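The cross-matching step can be sketched in a few lines. This is a hypothetical simplification, not HMRC’s implementation: aggregate third-party reports into a per-taxpayer portrait, then compare it against the declared figures. All names and amounts are invented.

```python
# Hypothetical sketch of cross-matching: third-party feeds are summed into
# a financial portrait, then compared against the tax return.

def build_portrait(third_party_reports):
    """Sum reported income per (taxpayer, income type) across all sources."""
    portrait = {}
    for report in third_party_reports:
        key = (report["taxpayer_id"], report["income_type"])
        portrait[key] = portrait.get(key, 0) + report["amount"]
    return portrait

def flag_discrepancies(portrait, declared, tolerance=100):
    """Flag cases where third-party data exceeds the declared figure."""
    flags = []
    for (taxpayer, income_type), reported in portrait.items():
        gap = reported - declared.get((taxpayer, income_type), 0)
        if gap > tolerance:
            flags.append((taxpayer, income_type, gap))
    return flags

reports = [
    {"taxpayer_id": "T1", "income_type": "rental", "amount": 18000},   # platform feed
    {"taxpayer_id": "T1", "income_type": "rental", "amount": 6000},    # letting agent
    {"taxpayer_id": "T1", "income_type": "interest", "amount": 250},   # bank feed
]
declared = {("T1", "rental"): 12000, ("T1", "interest"): 250}

print(flag_discrepancies(build_portrait(reports), declared))
# rental income reported at 24000 against 12000 declared -> flagged
```

The interest figure matches and is ignored; the rental gap surfaces as a lead for a human to review.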

Predictive modelling and risk scoring

The AI component does something genuinely sophisticated here: it doesn’t just look for missing income. It uses predictive modelling to score cases by risk, clustering taxpayers into groups and flagging outliers whose patterns look statistically unusual compared to others in their sector.

  • A landlord declaring twelve thousand pounds in rental income when every comparable property in the same postcode generates twenty-five thousand is going to show up.
  • A sole trader whose business expenses are three times the industry average is going to show up.
  • Someone whose social media suggests a lifestyle that doesn’t match their declared earnings is absolutely going to show up.
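Peer-group outlier scoring of this kind can be illustrated with a simple z-score against comparable taxpayers. This is a deliberately minimal sketch with invented figures; production systems use far richer models.

```python
# Minimal sketch of peer-group outlier scoring: z-score each declaration
# against the distribution for its cluster (e.g. landlords in one postcode).
# All figures are invented for illustration.
from statistics import mean, stdev

def risk_scores(declared_by_taxpayer):
    """Z-score each declared figure against the peer-group distribution."""
    values = list(declared_by_taxpayer.values())
    mu, sigma = mean(values), stdev(values)
    return {tp: (v - mu) / sigma for tp, v in declared_by_taxpayer.items()}

# Rental income declared by landlords in the same postcode
postcode_peers = {
    "L1": 24000, "L2": 26000, "L3": 25000, "L4": 23000,
    "L5": 12000,   # the low declaration from the bullet above
}
scores = risk_scores(postcode_peers)
flagged = [tp for tp, z in scores.items() if abs(z) > 1.5]
print(flagged)  # only the statistical outlier crosses the threshold
```

The threshold of 1.5 standard deviations is arbitrary; the point is that the outlier surfaces mechanically, with no human scanning the postcode.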

Social network analysis

Social network analysis adds another layer: Connect maps relationships between people, companies and properties to identify hidden connections. Consider a director running three companies that all transact with the same offshore entity in a jurisdiction known for secrecy; that pattern emerges without any human needing to spot it manually.
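The idea can be sketched as a small graph problem. The entities and transactions below are entirely hypothetical:

```python
# Hypothetical sketch of relationship mapping: directors link to companies,
# companies link to counterparties, and shared counterparties across a
# director's companies fall out of a set intersection.

directorships = {
    "DirectorA": ["Co1", "Co2", "Co3"],
    "DirectorB": ["Co4"],
}
transactions = {
    "Co1": {"OffshoreX"},
    "Co2": {"OffshoreX", "SupplierY"},
    "Co3": {"OffshoreX"},
    "Co4": {"SupplierY"},
}

def shared_counterparties(directorships, transactions, min_companies=2):
    """Find (director, counterparty) pairs linked through several companies."""
    hits = []
    for director, companies in directorships.items():
        if len(companies) < min_companies:
            continue
        common = set.intersection(*(transactions[c] for c in companies))
        for counterparty in sorted(common):
            hits.append((director, counterparty))
    return hits

print(shared_counterparties(directorships, transactions))
# DirectorA's three companies all transact with OffshoreX
```

Real platforms run this over millions of nodes with dedicated graph tooling, but the underlying query is the same shape.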

False positives and the cost of getting flagged

The system catches genuine evaders but it also flags innocent people. Kevin Igoe, Managing Director of PfP, has publicly noted that Connect can “easily produce false positives and trigger investigations into innocent individuals and businesses.”

When you’re one of those individuals receiving an HMRC enquiry letter because an algorithm decided your numbers looked odd, the experience is stressful and potentially expensive even if you’ve done nothing wrong. Professional representation during a Connect-triggered enquiry typically costs between two thousand and ten thousand pounds depending on complexity.

How prosecution teams are building cases now

It’s not just HMRC. The way financial crime gets investigated across all the major enforcement bodies has shifted fundamentally because of data analytics, and the change happened faster than most people in the legal profession expected.

The FCA and data-led regulation

The FCA published its 2025 strategy making clear that it is moving to what it calls “data-led regulation.” In practical terms, that means the FCA is using machine learning to monitor market activity in near real-time, identifying trading anomalies and potential insider dealing or market manipulation before opening formal investigations.

They’re commencing fewer investigations overall, but the ones they do open are landing harder because the evidential foundation is already built by the time the target knows they’re being looked at.

The Serious Fraud Office

The Serious Fraud Office takes a different approach to the same underlying principle. In complex multi-jurisdictional cases like Ultra Electronics, the SFO traces financial flows across countries by analysing millions of transaction records and cross-referencing them against communication metadata, compliance documentation and whistleblower reports. The kind of pattern recognition that would have taken a team of forensic accountants months to assemble manually can now be generated in days.

Cross-border intelligence sharing

Europol’s European Financial and Economic Crime Centre, established in 2020, coordinates cross-border financial intelligence sharing across EU member states using centralised analytics platforms. Even post-Brexit, data sharing arrangements mean that transaction patterns identified by European authorities can surface in investigations being run by the NCA or SFO on this side of the Channel.

How defence teams use the same tools

Here’s the part that doesn’t get discussed enough. The same analytical techniques that prosecutors use to build cases are available to defence teams, and competent defence work now involves running the prosecution’s dataset independently to find what they missed or got wrong.

Where Connect’s data gets it wrong

When HMRC’s Connect system flags someone based on a pattern match, that match is only as good as the data feeding it. Incomplete records, miscategorised transactions, timing differences between when income was received and when it was declared, data from third parties that contains errors — all of these can produce a narrative of evasion where the reality is administrative messiness.

Defence analysts will rebuild the transaction timeline from source documents and compare it against what Connect assembled. Discrepancies between the two versions become the foundation of the defence argument: the prosecution says the pattern suggests hidden income; the defence shows the pattern was created by a data entry error at the bank or a timing mismatch between tax years.
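The tax-year timing mismatch is easy to demonstrate. A minimal sketch with invented dates and amounts: UK tax years run from 6 April to 5 April, so two receipts a week apart can legitimately belong to different years.

```python
# Sketch of a common false-positive source: income received near the
# tax-year boundary. UK tax years run 6 April to 5 April.
# Dates and amounts are invented for illustration.
from datetime import date

def uk_tax_year(d):
    """Return the tax year (as its starting calendar year) for a date."""
    return d.year if d >= date(d.year, 4, 6) else d.year - 1

bank_receipts = [
    (date(2024, 4, 3), 5000),   # received before 6 April -> 2023/24
    (date(2024, 4, 10), 5000),  # received after 6 April  -> 2024/25
]

received_by_year = {}
for d, amount in bank_receipts:
    y = uk_tax_year(d)
    received_by_year[y] = received_by_year.get(y, 0) + amount

print(received_by_year)  # {2023: 5000, 2024: 5000}
# A feed that lumps both receipts into one tax year would overstate it
# by 5000 and understate the other, creating an apparent discrepancy.
```

Rebuilding this allocation from source documents is exactly the kind of reconciliation that turns an apparent discrepancy into an explainable timing difference.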

Risk modelling in defence work

Risk modelling has become part of the defence toolkit as well. Compliance officers and defence teams increasingly use methods like Monte Carlo simulation to assess exposure across multiple possible scenarios, running thousands of iterations to estimate the probability distribution of outcomes rather than relying on single-point estimates.

For a company facing a potential investigation the question isn’t just “are we liable” but “across all the variables we can model, what’s the range of financial exposure and which scenarios are most likely.” That probabilistic approach to risk is reshaping how organisations decide whether to self-report, how much to provision for potential penalties and how aggressively to contest HMRC’s interpretation of the data.
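A minimal Monte Carlo sketch of that exposure question might look like this. Every distribution and probability below is an illustrative assumption, not calibrated to any real case:

```python
# Illustrative Monte Carlo exposure model: draw uncertain inputs many times
# (disputed tax at stake, penalty rate, chance the challenge succeeds) and
# summarise the resulting distribution rather than a single-point estimate.
# All ranges and probabilities are invented assumptions.
import random

random.seed(42)  # reproducible illustration

def simulate_exposure(iterations=10000):
    outcomes = []
    for _ in range(iterations):
        tax_at_stake = random.uniform(80000, 120000)   # disputed amount
        penalty_rate = random.uniform(0.0, 0.7)        # assumed penalty range
        prevails = random.random() < 0.35              # assumed chance of winning
        outcomes.append(0.0 if prevails else tax_at_stake * (1 + penalty_rate))
    return sorted(outcomes)

outcomes = simulate_exposure()
median = outcomes[len(outcomes) // 2]
p95 = outcomes[int(len(outcomes) * 0.95)]
print(f"median exposure ~{median:,.0f}, 95th percentile ~{p95:,.0f}")
```

The output is a distribution, not a number: the median informs provisioning, while the tail percentiles inform the self-report and contest decisions the paragraph above describes.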

Algorithm-driven enforcement and legal safeguards

There’s an uncomfortable question sitting underneath all of this and it hasn’t been answered properly by any government or regulator.

When a system analyses fifty-five billion data points and uses predictive modelling to decide which taxpayers deserve investigation, what happens to the presumption of innocence? You haven’t been accused of anything. No human has reviewed your affairs. An algorithm assigned you a risk score based on statistical patterns and now you’re explaining your finances to an HMRC officer who arrived with a conclusion already half-formed by the data.

The EU AI Act and post-Brexit position

The EU’s AI Act, which entered into force in August 2024, classifies AI systems used in law enforcement and tax administration as “high risk,” requiring transparency obligations, human oversight and regular auditing. Post-Brexit, the framework doesn’t directly apply in Britain, but it sets a standard that regulators and courts will increasingly reference when the fairness of algorithm-driven enforcement decisions gets challenged.

HMRC maintains that Connect generates leads, not conclusions, and that human investigators make all final decisions. That’s technically true. But when the system has already constructed a financial portrait that looks damning before any human gets involved, the practical effect on how investigations proceed is significant. The defence community has started pushing back on this, arguing that algorithm-generated risk scores should be disclosable to the taxpayer under the same principles that require prosecution evidence to be shared.

That debate is going to get louder as the systems get more powerful and the revenue targets get more ambitious. Four point six billion recovered in one year is impressive. Seven billion is the new target. The algorithms will keep getting fed and the investigations will keep getting more targeted. Whether the safeguards keep pace with the capability is genuinely an open question.
