There are few industries in which the importance of data is as evident as in the world of financial technology. Behind every swipe, click, and transfer lies a universe of information waiting to be decoded. At the heart of this decoding process sits machine learning, a subfield of AI that is transforming how we process and interpret financial data.
Machine learning is fundamentally about teaching computers to learn from experience, much as humans do. Unlike traditional programming, where every rule must be explicitly coded, machine learning algorithms identify patterns, make decisions on their own, and improve their performance over time. It’s like training a detective who sharpens their skills with every case, becoming more adept at spotting the subtle clues others might miss.
In this article, our Data team dives into how AI – specifically machine learning – powers our product at its very core.
Do you want to join our engineering or data team?
Discover our current job openings at re:cap and become part of our journey!
Why is AI important for re:cap?
For financial technology and data companies like re:cap, AI is crucial for processing data. It enables us to manage and analyze over 250 million data points daily. The challenge? Finance demands precision, and getting data wrong can lead to flawed financial decisions.
One crucial process is the categorization of bank transactions. Traditionally, this task requires human intervention by a team of financial specialists. Such teams streamline the process of managing, delivering, and analyzing data. They’re the gatekeepers of accuracy and reliability. However, any such team would face the overwhelming challenge of manually processing hundreds of thousands of transactions every day, an approach that is neither scalable nor economically viable.
All eyes on bank transactions
Bank transactions are a financial X-ray of a company. They reveal multiple insights: cash flow patterns, spending habits, income sources, investment strategies, and potential liabilities. For our underwriting process, transaction categorization is critical to assessing a company’s financial health. It allows us to provide a precise risk analysis tailored to each company’s unique situation, ensuring reliable financial insights and funding decisions. Misclassified transactions distort key metrics and skew risk assessments. With the right approach, AI offers precision, context awareness, and the ability to make raw financial data useful for us and our customers.
Machine learning truly excels when it comes to categorizing bank transactions. By leveraging its ability to process vast amounts of data with precision and speed, we turn messy transaction lists into clear, actionable insights.
First approach: rule-based classification
Our first attempt at categorizing bank transactions followed a rule-based model, driven by predefined rules that determined how each transaction should be classified. We started by crafting rules based on transaction patterns and behaviors. The model then compared incoming transaction data against these rules to assign a category. Imagine a massive flowchart in which specific keywords trigger particular classifications.
For example, a transaction might be labeled "Salary" if
- the word "Gehalt" is in the purpose
- and the booking date is in the last 10 days of the month
- and the amount is above €400
If all three conditions hold, the rule-based model automatically classifies the bank transaction as "Salary".
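The example rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not re:cap's rule engine: the `Transaction` fields and the function names are hypothetical, the amount is assumed to be in EUR with incoming payments positive, and the keyword check is a simple case-insensitive substring match.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Transaction:
    purpose: str         # free-text purpose line of the booking (hypothetical field)
    booking_date: date
    amount: float        # EUR; positive = incoming (assumption of this sketch)


def in_last_days_of_month(d: date, days: int = 10) -> bool:
    """True if d falls within the last `days` days of its month."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return d.day > last_day - days


def is_salary(tx: Transaction) -> bool:
    # The three conditions of the example rule, combined with AND.
    return (
        "gehalt" in tx.purpose.lower()
        and in_last_days_of_month(tx.booking_date)
        and tx.amount > 400
    )


def classify(tx: Transaction) -> Optional[str]:
    return "Salary" if is_salary(tx) else None


print(classify(Transaction("Gehalt 03/2024", date(2024, 3, 28), 3200.0)))  # Salary
print(classify(Transaction("Gehalt 03/2024", date(2024, 3, 5), 3200.0)))   # None
```

Every additional category needs its own hand-written predicate like `is_salary`, which is exactly where the maintenance burden described below comes from.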
Non-scalable, rigid, and prone to conflict
The rule-based model, while effective in some areas, has clear limitations.
Scalability, for one, is a major challenge. Accurate classification requires massive keyword dictionaries with thousands of entries in multiple languages. Maintaining them is time-consuming: they must be constantly updated, checked, and expanded to capture every possible transaction variation. The rule-based model is also rigid. It can’t adapt to context or unexpected scenarios, because rules are designed to handle specific patterns and lack contextual understanding.
Finally, multiple rules can fire for the same transaction. When more than one rule matches, the system cannot execute all of the resulting actions at once. This exposes a fundamental flaw of rule-based systems: they struggle with overlapping conditions and conflicting instructions.
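To make the conflict concrete, here is a minimal sketch that evaluates every rule against a transaction and reports when more than one fires. The rule set, the category names, and the plain-dict transaction format are hypothetical examples chosen for illustration.

```python
from typing import Callable, Optional

# Hypothetical rules: (category, predicate over a transaction dict).
Rule = tuple[str, Callable[[dict], bool]]

RULES: list[Rule] = [
    ("Salary", lambda tx: "gehalt" in tx["purpose"].lower() and tx["amount"] > 400),
    ("Rent", lambda tx: "miete" in tx["purpose"].lower()),
    ("Office", lambda tx: "büro" in tx["purpose"].lower()),
]


def matching_categories(tx: dict, rules: list[Rule] = RULES) -> list[str]:
    """Return every category whose rule fires for this transaction."""
    return [category for category, predicate in rules if predicate(tx)]


def classify_or_flag(tx: dict) -> tuple[Optional[str], str]:
    matches = matching_categories(tx)
    if len(matches) == 1:
        return matches[0], "ok"
    if not matches:
        return None, "no rule matched"
    # More than one rule fired: the rule set alone cannot say which is right.
    return None, "conflict: " + ", ".join(matches)


print(classify_or_flag({"purpose": "Miete Büro März", "amount": 1500.0}))
# (None, 'conflict: Rent, Office')
```

Any fix, such as rule priorities, has to be designed and maintained by hand, which adds to the rigidity described above.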
Given these limitations, we went one step further and turned to machine learning.
Second approach: machine learning
Machine learning takes a different approach. Instead of programming every possible scenario, we let the system learn to recognize patterns and make classifications itself. It's a dynamic, iterative process: collecting and labeling data, engineering features, training models, and evaluating performance, then repeating the cycle with more data, new labels, and further refinement. This loop drives the system's constant evolution.
The foundation: labeled data and feature engineering
Machine learning models rely on labeled data to learn to make accurate classifications. Initially, we drew on our rule-based model to label thousands of transactions, establishing the “ground truth” needed to train the machine learning model effectively.
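One way to bootstrap labels from existing rules is sketched below. The filtering policy is an assumption of this sketch, not a description of re:cap's process: a rule output is accepted as a training label only when exactly one rule fires, and everything else is set aside for manual labeling. The rules and transactions are hypothetical.

```python
from typing import Callable

Rule = tuple[str, Callable[[dict], bool]]


def bootstrap_labels(transactions: list[dict], rules: list[Rule]):
    """Split transactions into (labeled, unlabeled) using rule outputs.

    Assumption of this sketch: a rule output is trusted as a training label
    only when exactly one rule fires; unmatched or conflicting transactions
    are left for manual labeling.
    """
    labeled, unlabeled = [], []
    for tx in transactions:
        hits = [category for category, predicate in rules if predicate(tx)]
        if len(hits) == 1:
            labeled.append((tx, hits[0]))
        else:
            unlabeled.append(tx)
    return labeled, unlabeled


rules: list[Rule] = [
    ("Salary", lambda tx: "gehalt" in tx["purpose"].lower()),
    ("Rent", lambda tx: "miete" in tx["purpose"].lower()),
]
transactions = [
    {"purpose": "Gehalt März"},
    {"purpose": "Miete Lager"},
    {"purpose": "AWS Rechnung"},      # no rule matches
    {"purpose": "Gehalt und Miete"},  # two rules match
]
labeled, unlabeled = bootstrap_labels(transactions, rules)
print([category for _, category in labeled])  # ['Salary', 'Rent']
print(len(unlabeled))                         # 2
```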
The next step? Defining the transaction features central to our approach. Feature engineering provides the raw material that allows machine learning models to decode data. These features are numerical representations of key characteristics that empower the system to spot patterns and uncover relationships.
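The article does not list the features re:cap uses, so the sketch below is purely illustrative: it turns the same three inputs the example rule looked at (purpose text, booking date, amount) into a fixed-length numeric vector. The feature choices, the bucket count `N_HASH_BUCKETS`, and the use of the hashing trick for text are all assumptions.

```python
import calendar
import math
import re
import zlib
from datetime import date

N_HASH_BUCKETS = 32  # hypothetical size of the hashed text feature space


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def featurize(purpose: str, booking_date: date, amount: float) -> list[float]:
    """Turn one transaction into a fixed-length numeric vector."""
    days_in_month = calendar.monthrange(booking_date.year, booking_date.month)[1]
    numeric = [
        math.copysign(1.0, amount) if amount else 0.0,  # direction: +1 in, -1 out
        math.log1p(abs(amount)),                         # compress large amounts
        booking_date.day / days_in_month,                # position within the month
    ]
    # Hashing trick: count tokens into a fixed number of buckets.
    # zlib.crc32 is used because Python's built-in hash() of str is salted per process.
    text = [0.0] * N_HASH_BUCKETS
    for token in tokenize(purpose):
        text[zlib.crc32(token.encode("utf-8")) % N_HASH_BUCKETS] += 1.0
    return numeric + text


vector = featurize("Gehalt März", date(2024, 3, 31), 3200.0)
print(len(vector))  # 35
```

Unlike a hand-written rule, these features do not encode any threshold such as "last 10 days" or "above €400"; a model trained on them has to learn which regions of the feature space matter.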
The daily workflow: automation and human oversight
Our process is designed to run with minimal human involvement. Transactions are processed daily, predictions are generated, and feedback is incorporated to refine the model continuously. When the system encounters transactions it cannot confidently classify, they are flagged for human review.
The machine learning model calculates confidence scores using statistical methods, and these scores determine when a transaction requires human intervention. To produce its predictions, the model draws on historical transaction data, trends, and patterns.
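The article does not specify which statistical method produces the confidence scores. A common choice, assumed here, is to convert the model's raw per-category scores into probabilities with a softmax and send a transaction to human review when the top probability falls below a threshold. `REVIEW_THRESHOLD` and its value are hypothetical.

```python
import math

REVIEW_THRESHOLD = 0.90  # hypothetical cut-off, would be tuned on validation data


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw per-category scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {category: math.exp(s - top) for category, s in scores.items()}
    total = sum(exps.values())
    return {category: e / total for category, e in exps.items()}


def route(scores: dict[str, float]) -> tuple[str, float, str]:
    """Return (predicted category, confidence, 'auto' or 'review')."""
    probs = softmax(scores)
    category = max(probs, key=probs.get)
    confidence = probs[category]
    decision = "auto" if confidence >= REVIEW_THRESHOLD else "review"
    return category, confidence, decision


print(route({"Salary": 5.0, "IT": 0.0, "Rent": 0.0})[2])  # auto
print(route({"Salary": 1.0, "IT": 0.8, "Rent": 0.0})[2])  # review
```

Raising the threshold sends more transactions to human review and fewer errors through automatically; lowering it does the opposite.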
A continuous learning cycle
The prediction process for the machine learning model follows a continuous cycle. When a new transaction arrives, it is transformed into numerical data, and the model generates a prediction. Because the transaction is new, its actual category is initially unknown, so the prediction is what feeds into operational applications and reports.
Our team of financial specialists, which we call "DataOps", then reviews these predictions, validating or correcting them. Once this human validation is complete, the corrected data is fed back into the system. This updated information allows the model to learn from its successes and mistakes, refining its understanding for future predictions.
The iterative process
The model’s prediction cycle is designed to adapt and evolve:
- Prediction: a new transaction is transformed into numerical data and categorized.
- Review: DataOps validates or corrects the categorization.
- Learning: validated data is fed back into the model for learning.
This process repeats daily with hundreds of thousands of transactions, making the model steadily more accurate. Each cycle builds on the previous day’s insights, creating a continuously improving system. By systematically setting aside unseen data and testing against it, we verify that the algorithms can handle new scenarios without being explicitly programmed for each one.
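The predict, review, and learn cycle, together with a held-out test set, can be sketched end to end. The classifier here is a tiny multinomial Naive Bayes over purpose-text tokens, chosen only because it fits in a few lines; the article does not say which model re:cap uses. The data, the `daily_cycle` helper, and the reviewer function standing in for DataOps are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes over purpose-text tokens (illustrative stand-in)."""

    def fit(self, examples: list[tuple[str, str]]) -> "NaiveBayes":
        self.class_counts = Counter(category for _, category in examples)
        self.token_counts = defaultdict(Counter)
        for purpose, category in examples:
            self.token_counts[category].update(tokens(purpose))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        self.total = len(examples)
        return self

    def predict(self, purpose: str) -> str:
        best, best_score = None, -math.inf
        for category, n in self.class_counts.items():
            counts = self.token_counts[category]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n / self.total)
            for t in tokens(purpose):
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


def accuracy(model: NaiveBayes, holdout: list[tuple[str, str]]) -> float:
    return sum(model.predict(p) == c for p, c in holdout) / len(holdout)


def daily_cycle(model, training, new_purposes, review):
    """One day: predict -> review (DataOps) -> learn (retrain on validated data)."""
    validated = [(p, review(p, model.predict(p))) for p in new_purposes]
    training = training + validated
    return NaiveBayes().fit(training), training


# Toy data. The holdout set is never added to training.
training = [("Gehalt März", "Salary"), ("Miete Büro", "Rent")]
holdout = [("Gehalt April", "Salary"), ("AWS Rechnung", "IT"), ("Miete April", "Rent")]
truth = {"AWS invoice": "IT", "Google Workspace": "IT", "Gehalt Mai": "Salary"}

model = NaiveBayes().fit(training)
before = accuracy(model, holdout)
# The reviewer stands in for DataOps: it returns the correct category.
model, training = daily_cycle(model, training, list(truth), lambda p, pred: truth[p])
after = accuracy(model, holdout)
print(f"holdout accuracy: {before:.2f} -> {after:.2f}")  # holdout accuracy: 0.67 -> 1.00
```

Because the holdout transactions never enter the training data, the rise in holdout accuracy reflects generalization to unseen transactions rather than memorization.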
Results: rule-based vs. machine learning model
After testing both models, it became clear that the machine learning model outperformed the rule-based model.
The rule-based model started at a strong 94% accuracy but gradually declined to 92.2%. The machine learning model, by contrast, has improved consistently, learning from daily transaction processing and user feedback, and now reaches 98.8% accuracy.
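Expressed as error rates, the gap is larger than the accuracy figures suggest. The calculation below uses the accuracies reported above; the daily volume of 100,000 transactions is an illustrative assumption, not a reported figure.

```latex
\begin{aligned}
\text{error}_{\text{rules}} &= 1 - 0.922 = 0.078 \\
\text{error}_{\text{ML}}    &= 1 - 0.988 = 0.012 \\
\frac{\text{error}_{\text{rules}}}{\text{error}_{\text{ML}}} &= \frac{0.078}{0.012} = 6.5 \\
100{,}000 \times 0.078 = 7{,}800 \quad &\text{vs.} \quad 100{,}000 \times 0.012 = 1{,}200 \ \text{misclassified per day}
\end{aligned}
```

In other words, the machine learning model makes roughly one-sixth as many mistakes, which directly reduces the number of transactions that need correcting.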
Rule-based models for routine transactions
Rule-based models excel at handling well-defined, frequently occurring categories. These tasks don’t require the depth of machine learning, making rules an efficient choice for:
- Common transaction types
- Predictable patterns
Machine learning for complex cases
For transactions that fall outside established rules, machine learning steps in. Using historical data and advanced algorithms, machine learning models:
- Classify complex or novel transactions
- Adapt to changing patterns over time
This allows the system to handle both routine and novel transactions effectively, without relying solely on human review.
However, the two models shouldn’t be viewed in isolation. Machine learning can also help refine rule-based systems. For example, if a rule-based model consistently miscategorizes certain transactions, machine learning can help identify patterns and suggest modifications to the rules. In this way, machine learning augments the rule-based system, improving its accuracy and reducing reliance on manual intervention.
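A hybrid setup like the one described can be sketched as a simple router: if exactly one rule applies, use it; otherwise ask the model, and queue low-confidence predictions for review. This is a sketch under assumptions, not re:cap's production logic. The rules, the `toy_model` stand-in for a trained classifier, and the 0.9 threshold are hypothetical.

```python
from typing import Callable

Transaction = dict
Rule = tuple[str, Callable[[Transaction], bool]]
Model = Callable[[Transaction], tuple[str, float]]  # -> (category, confidence)


def categorize(tx: Transaction, rules: list[Rule], model: Model,
               threshold: float = 0.9) -> tuple[str, str]:
    """Return (category, source), where source is 'rule', 'model' or 'review'."""
    hits = [category for category, predicate in rules if predicate(tx)]
    if len(hits) == 1:        # routine case: exactly one well-defined rule applies
        return hits[0], "rule"
    category, confidence = model(tx)   # novel or conflicting case: ask the model
    if confidence >= threshold:
        return category, "model"
    return category, "review"          # low confidence: queue for DataOps


RULES: list[Rule] = [
    ("Salary", lambda tx: "gehalt" in tx["purpose"].lower()),
    ("Rent", lambda tx: "miete" in tx["purpose"].lower()),
]


def toy_model(tx: Transaction) -> tuple[str, float]:
    # Hypothetical stand-in for a trained classifier.
    if "aws" in tx["purpose"].lower():
        return "IT", 0.95
    return "Other", 0.40


for purpose in ["Gehalt März", "AWS Rechnung", "Kaffee Bar"]:
    print(purpose, categorize({"purpose": purpose}, RULES, toy_model))
```

Putting rules first keeps routine cases cheap and predictable, while the model and the review queue absorb everything the rules cannot decide on their own, including rule conflicts.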
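The feedback signal for refining rules already exists in the DataOps reviews. The sketch below substitutes a deliberately simple error analysis for the machine learning step described above: it flags rules whose output is frequently corrected and lists the tokens shared by the miscategorized transactions as hints for a rule change. The review format, rule names, and thresholds are hypothetical.

```python
import re
from collections import Counter, defaultdict


def rules_to_revisit(reviews, min_count=3, max_error=0.2):
    """Flag rules whose output DataOps frequently corrects.

    reviews: iterable of (rule_name, purpose, rule_category, validated_category).
    Returns {rule_name: summary} for rules with at least `min_count` reviews
    and an error rate above `max_error` (both thresholds are hypothetical).
    """
    totals = Counter()
    errors = defaultdict(list)
    for rule, purpose, predicted, validated in reviews:
        totals[rule] += 1
        if predicted != validated:
            errors[rule].append((purpose, validated))

    report = {}
    for rule, n in totals.items():
        wrong = errors[rule]
        rate = len(wrong) / n
        if n >= min_count and rate > max_error:
            # Tokens shared by the miscategorized purposes hint at a missing exception.
            tokens = Counter(t for p, _ in wrong for t in re.findall(r"\w+", p.lower()))
            corrected = Counter(v for _, v in wrong)
            report[rule] = {
                "error_rate": rate,
                "common_tokens": tokens.most_common(3),
                "usually_corrected_to": corrected.most_common(1)[0][0],
            }
    return report


reviews = [
    ("salary_gehalt", "Gehalt März", "Salary", "Salary"),
    ("salary_gehalt", "Gehalt April", "Salary", "Salary"),
    ("salary_gehalt", "Rückzahlung Gehaltsvorschuss", "Salary", "Loan repayment"),
    ("salary_gehalt", "Rückzahlung Gehaltsvorschuss Mai", "Salary", "Loan repayment"),
    ("rent_miete", "Miete März", "Rent", "Rent"),
    ("rent_miete", "Miete April", "Rent", "Rent"),
    ("rent_miete", "Miete Mai", "Rent", "Rent"),
]
print(rules_to_revisit(reviews))
```

Here the substring rule for "Gehalt" also fires on "Gehaltsvorschuss" (salary advance), and the report surfaces that token as a candidate exception for the rule.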
What’s next: custom categorization
The next steps in developing transaction categorization models center on the challenge of handling custom categories defined by individual customers. Adding new categories requires extensive labeled data for training, which becomes impractical when only a small number of examples exist for a niche category.
Customers may want to create specific categories, such as distinguishing "Salesforce costs" from the broader "IT" category. However, with only a handful of transactions labeled as such, the current model struggles to learn and generalize effectively. Compounding the problem, a single shared model serves all customers, making it difficult to account for naming conventions or category definitions specific to individual customers.
The main challenge lies in developing a system that:
- Learns from minimal data: adapts to unique, customer-defined categories even when very few labeled examples exist.
- Handles customer-specific preferences: accommodates differing definitions and naming conventions for similar transaction types across customers.
- Adapts dynamically: enables specialized models or algorithms tailored for each customer, moving away from the “one-size-fits-all” approach of current models.
The challenges are clear, but the solution remains elusive. That’s why we’re pushing the boundaries, exploring entirely new approaches to seamlessly integrate these specialized, customer-specific models. For us, it’s time to unlock the next stage in the evolution of bank transaction categorization.