Before you begin with this guide, you should have the basic knowledge from the Getting started with Tapix tutorial.
In this guide, you will learn how to keep your data up-to-date so that it provides the most value to you and your customers. You will familiarize yourself with:
First, it is important to understand why keeping data up-to-date is necessary and why data can become outdated.
Tapix provides you with three types of information for each enriched transaction:
After a raw transaction is enriched, the results are stored. This information stays "frozen in time". However, it may need to change over time for several reasons:
To give a sense of the extent of these changes: 5% of logos change annually.
To correctly understand the extent of changes in data, it is crucial to know the term transaction mapping. Transaction mapping involves establishing connections between transactions and their associated shop identifiers (UIDs or handles), ensuring accurate linkage and representation of transactional data within the system.
There are three options for keeping data up-to-date, each with its own advantages and disadvantages outlined below.
| Solution | Keeps shops' and merchants' attributes up-to-date | Ensures that past transactions are mapped to the correct shop | API calls | Implementation requirements |
|---|---|---|---|---|
| Cache expiry | ✓ | ✗ | Endpoints /shops/{id} and /merchants/{id} are called for all database objects, resulting in thousands of redundant API calls. | No specific implementation is required. |
| Partial invalidations | ✓ | ✗ | Endpoints /shops/{id} and /merchants/{id} are called only for invalidated shops/merchants. | Simple, partial implementation of the /invalidations endpoint. |
| Full invalidations | ✓ | ✓ | Endpoints /shops/{id} and /merchants/{id} are called only for invalidated shops/merchants; in addition, the respective /findBy endpoint is called for invalidated transactions. | More complex implementation of the /invalidations endpoint, since all types of data changes must be handled. |
Note that if you utilize the complete response endpoint /shops/complete/findByCardTransaction, none of these scenarios are applicable, as they do not align with the intended functionality provided by this endpoint.
Initial enrichment setup: Every transaction is sent to the API, and links between shops and transactions and merchant and shop objects are established and stored in the database indefinitely.
How it works
The link between a shop and a transaction (the handle) remains unchanged during this process. This applies to past transactions; new transactions will already have the correct shop UID.
Cache expiry can handle changes such as updating or adding new merchant logos, but it will not reflect previously unresolved transactions.
If you choose not to implement invalidations, there is a basic method to keep your current data relatively up-to-date. For the endpoints /shops/{id} and /merchants/{id}, you can cache the data for a short period, such as one week. After this period, overwrite all cached data about shops and merchants by calling the /shops/{id} and /merchants/{id} endpoints again.
Please note that calls to the endpoints /shops/{id} and /merchants/{id} are not invoiced.
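As an illustration, a minimal sketch of such a weekly refresh job is shown below. The base URL, the authentication header, and the in-memory caches are assumptions for this example, not part of the Tapix API; in a real integration the caches would be database tables.

```python
import requests

BASE_URL = "https://api.example.com/v6"  # placeholder; use your Tapix base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # placeholder auth header

# Local caches keyed by UID; in a real integration these would be database tables.
shop_cache: dict[str, dict] = {}
merchant_cache: dict[str, dict] = {}

def refresh_all_cached_objects() -> None:
    """After the cache period (e.g. one week) expires, overwrite every cached
    shop and merchant by calling /shops/{id} and /merchants/{id} again."""
    for shop_id in list(shop_cache):
        resp = requests.get(f"{BASE_URL}/shops/{shop_id}", headers=HEADERS)
        resp.raise_for_status()
        shop_cache[shop_id] = resp.json()

    for merchant_id in list(merchant_cache):
        resp = requests.get(f"{BASE_URL}/merchants/{merchant_id}", headers=HEADERS)
        resp.raise_for_status()
        merchant_cache[merchant_id] = resp.json()
```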
Invalidations in the Tapix ecosystem allow you to keep data up-to-date. For example, we might add new attributes like a category or logo to a specific shop or merchant that you have already linked to transactions or transfers. Additionally, we may identify transactions that were previously unrecognized. In the first step, you can use this functionality to update shops and merchants (similar to cache expiry) more efficiently by updating only the objects marked as changed by the Invalidations API.
To reduce traffic and focus on current data, you can invalidate only recent payment data (e.g., from the past 1-2 years).
Initial enrichment setup: Every transaction is sent to the API, and links between shops and transactions and merchant and shop objects are established and stored in the database indefinitely.
How it works
/invalidations provides objects that need to be reloaded from the API. This approach updates the shop and merchant data without addressing deeper transaction links.
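A partial implementation can be sketched roughly as follows: fetch the invalidated range, then re-fetch only the shops and merchants that appear in shallow batches. The base URL and auth header are placeholders, pagination is omitted, and the /invalidations parameters used here are described in detail later in this guide.

```python
import requests

BASE_URL = "https://api.example.com/v6"  # placeholder; use your Tapix base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # placeholder auth header

def refresh_shallow_invalidations(from_time: str) -> None:
    """Partial invalidations: process only 'shallow' batches, i.e. refresh shop and
    merchant attributes while leaving existing transaction mappings untouched."""
    # Step 1: get the numeric ID range of objects invalidated since `from_time`.
    rng = requests.get(f"{BASE_URL}/invalidations/item/range",
                       params={"from": from_time}, headers=HEADERS).json()

    # Step 2: list the invalidated objects in that range (pagination omitted here;
    # it is covered later in this guide).
    page = requests.get(f"{BASE_URL}/invalidations",
                        params={"fromId": rng["fromId"], "toId": rng["toId"],
                                "pageSize": 500},
                        headers=HEADERS).json()

    for batch in page["data"]["batches"]:
        if batch["level"] != "shallow":
            continue  # a partial implementation ignores deep/solved batches
        endpoint = "shops" if batch["type"] == "shop" else "merchants"
        for uid in batch["ids"]:
            resp = requests.get(f"{BASE_URL}/{endpoint}/{uid}", headers=HEADERS)
            resp.raise_for_status()
            # overwrite the stored shop/merchant record here (database write omitted)
```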
When invalidations are implemented fully, any part of the provided data can be changed, offering an ultimate approach to keeping the data up-to-date.
Initial enrichment setup: Every transaction is sent to the API, and links between shops and transactions and merchant and shop objects are established and stored in the database indefinitely.
How it works
/invalidations provides objects that need to be reloaded from the API. Invalidations inform you of changes to data you've already received, such as new attributes (address, coordinates, logo) added to shops or merchants you've linked to transactions, or newly recognized transactions that were previously unrecognized.
We recommend implementing /invalidations if you want to ensure that all your historical data is consistently updated with the latest information. However, if maintaining historical data accuracy is not essential or is unwanted for your use case, you may choose not to implement them. To reduce traffic and focus on current data, it is important to set a time window, such as 6 or 12 months, within which you want to keep transactions up to date. This approach minimizes the impact of full invalidations, which can cause a significant increase in traffic. Alternatively, if you need to invalidate all historical payment data, you have the flexibility to do so.
Now, we move to the implementation of invalidations. Unlike cache expiry, this requires implementing additional logic. Both types of invalidations are implemented similarly; the difference is that only "shallow" invalidations are considered in the partial implementation. Here's the implementation overview:
Invalidations work on multiple levels for different entities. Within invalidations, we provide you with two primary attributes - level and type. Let's look at these more closely.
Levels = refer to different degrees or depths of changes that have occurred within the system.
- shallow – the attributes of a shop or merchant have changed (for example, a logo or category was added or edited); the affected objects only need to be re-fetched and overwritten
- deep (or transfer-deep) – the link between a transaction and a shop UID has changed; for a deep update, the system needs to delete the existing mapping of transactions to their corresponding shop UID and handle and then remap them to the updated records
- solved (or transfer-solved) – unlike deep, the link between a transaction and a shop UID is established for the first time, because a previously unrecognized transaction has been recognized; for a solved update, the system must delete the mapping of transactions or payments that were previously unsolved and then remap them to the updated records

Please be aware that the transfer-deep and transfer-solved levels specifically pertain to bank transfers, not card transactions.
Types = refer to different categories or classifications of entities that a change within the system has impacted.
Types delineate distinct groups of entities affected by changes, outlining the actions required to maintain data integrity and relevance.
- shop
- merchant
| | Definition | Mechanism | Theoretical change | Example | What needs to be done |
|---|---|---|---|---|---|
| Remap | Changing the link between a raw transaction and a shop | level: deep + type: shop (level: transfer-deep + type: shop); level: solved + type: shop (level: transfer-solved + type: shop) | The same transaction gets a different shop UID | | Update the existing mapping of transactions to their corresponding shop UID or handle and then remap them to the updated records |
| Update | Fixing or adding attributes to a shop or merchant | level: shallow + type: merchant | Edit of merchant attributes | Add/change logo; edit the merchant's name | Overwrite the existing records for merchants |
| | | level: shallow + type: shop | Edit of shop attributes | Add/change/delete tags; add/change category; edit address; add/edit URL address; add/change Google Place ID; add/change shop type | Overwrite the existing records for shops |
In case a shop or merchant is deleted from our database, the UID is included in shallow invalidations. If you call this UID and receive a 404 response, it means the shop or merchant has been deleted. You should then remove the corresponding entry from your system.
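For instance, handling a deleted shop might look like the following sketch. The base URL, auth header, and the local_shops store are illustrative assumptions, not part of the Tapix API.

```python
import requests

BASE_URL = "https://api.example.com/v6"  # placeholder; use your Tapix base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # placeholder auth header

def refresh_or_remove_shop(uid: str, local_shops: dict) -> None:
    """Re-fetch an invalidated shop; a 404 means it was deleted from the Tapix database."""
    resp = requests.get(f"{BASE_URL}/shops/{uid}", headers=HEADERS)
    if resp.status_code == 404:
        local_shops.pop(uid, None)   # remove the deleted shop from your own system
        return
    resp.raise_for_status()
    local_shops[uid] = resp.json()   # otherwise overwrite the stored record
```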
The whole logic behind invalidations has two steps, each handled by a different endpoint:

1. Call /invalidations/item/range to get the range of numeric IDs of all objects invalidated within a given time period.
2. Call the /invalidations service to get the list of all invalidated items within that numeric ID range.

This mechanism works best when used regularly, ideally every day. When calling these endpoints, set the parameter refresh = true to inform Tapix that you are retrieving existing transactions again and that they do not represent new transactions in your system. These API calls are not invoiced.
For this, you will use /invalidations/item/range
. A sample query may look something like this: /v6/invalidations/item/range?from=2018-06-08T10:15:08Z&to=2018-06-10T10:15:08Z
Note that the parameter from is mandatory, while the parameter to is optional. This is because you will most likely want all invalidated objects up to the moment of the call.
The best practice when using the from
parameter is to set its value to the exact time you obtained the first data from Tapix. The response for such a call would look like this:
{ "fromId": 78757, "toId": 78945, "itemCount": 188 }
The response contains a range of IDs of the invalidated objects and their total count. The range of IDs will be used in the following section to get information about transactions, transfers, shops, and merchants whose data needs to be updated or discarded.
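In code, retrieving this range might look like the following sketch; the base URL and auth header are placeholders.

```python
import requests

BASE_URL = "https://api.example.com/v6"  # placeholder; use your Tapix base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # placeholder auth header

# Best practice: `from` is the exact time you obtained your first data from Tapix;
# `to` is omitted so the range covers everything up to the moment of the call.
resp = requests.get(f"{BASE_URL}/invalidations/item/range",
                    params={"from": "2018-06-08T10:15:08Z"},
                    headers=HEADERS)
resp.raise_for_status()
id_range = resp.json()  # e.g. {"fromId": 78757, "toId": 78945, "itemCount": 188}
from_id, to_id = id_range["fromId"], id_range["toId"]
```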
Now that you have received a range of IDs of invalidated objects, it is time to find out what was invalidated and which actions need to happen in your database to keep it up-to-date.
For this, you will use the /invalidations endpoint, whose query you can adjust with the following parameters:

- fromId – mandatory parameter that specifies from which ID of an invalidated object to return the data
- toId – optional parameter that specifies up to which ID of an invalidated object to return the data
- level – optional parameter specifying which entity and level you want information about. Providing this parameter tells you what to do with the returned objects; if omitted, you will get all levels and entities contained within the range
- pageSize – required parameter that sets the number of returned IDs or handles in the response (maximum is 500,000)
- lastItemId – optional parameter used when the number of returned objects is greater than pageSize

Note that the /invalidations API can be called and parsed as a whole for objects with different type and level attributes. Let's have a look at the results of this query: /v6/invalidations?fromId=78757&toId=78798&pageSize=10
The response would look like this:

{
  "data": {
    "currentTime": "2018-06-09T10:00:00Z",
    "batches": [
      { "type": "shop", "level": "shallow", "ids": ["MmPdedgnvjXiRZnBJzJQKb", "MmPdedgnvjXiRZnBJzJQKx"] },
      { "type": "shop", "level": "deep", "ids": ["MmPdedgnvjXiRZnBJzJQKb", "dSgWnvCCjXiRZnBJVzJQKx"] },
      { "type": "shop", "level": "solved", "ids": ["!BzjVOpqibSzhZq7JJ8VNWT", "!Bn44ohYfep5O6B8vLJyRxX"] },
      { "type": "shop", "level": "transfer-deep", "ids": ["jXJzJMdediRZnBgnvQKbmP", "DfevFennBQKajXiRZxsJzJ"] },
      { "type": "shop", "level": "transfer-solved", "ids": ["!qibSVOpq7VzhZBzjNWTJJ8", "!5O68pBn44RxXBvLJyohYfe"] }
    ]
  },
  "paging": {
    "lastItemId": 78790,
    "lastPage": false,
    "pageSize": 10
  }
}
Notice that you receive information about all invalidated objects of all levels and types in one response. If it is convenient for you to proceed level by level, or even if you are not interested in some levels at all, you may request only some types/levels.
After receiving the respective IDs, follow these steps to retrieve updated data based on the different levels of invalidation (a dispatch sketch follows this list):

- level: shallow + type: merchant – obtain the merchant IDs that have been invalidated and call the /merchants/{id} endpoint for updates.
- level: shallow + type: shop – retrieve the shop IDs that have been invalidated and call the /shops/{id} endpoint for updates.
- level: deep – obtain the transaction IDs that have been invalidated and call the /shops/findByCardTransaction endpoint for updates.
- level: solved – retrieve the transaction IDs that have been invalidated and call the /shops/findByCardTransaction endpoint for updates.
- level: transfer-deep – obtain the transfer IDs that have been invalidated and call the respective transfer endpoint (e.g., /shops/findByBankTransfer/sepa) for updates.
- level: transfer-solved – retrieve the transfer IDs that have been invalidated and call the respective transfer endpoint (e.g., /shops/findByBankTransfer/sepa) for updates.
Before we dive into the specifics, let's get familiar with pagination. Depending on the size of the returned batch, you may not always get all invalidated objects at once. The response always contains the attributes you need to retrieve all paginated invalidated objects, namely:
- lastItemId – points to the last object in the response. This value is used as a parameter in the next call if the attribute lastPage is false.
- lastPage – tells you whether the call you made reached the last page. If the value is false, call the endpoint again using lastItemId as a parameter.

To give you an example, let's use this query: /v6/invalidations?fromId=78757&toId=79999&pageSize=2
For which the response would be this:
{ "data": { "currentTime": "2021-03-22T12:56:17Z", "batches": [ { "type": "shop", "level": "deep", "ids": ["qdeYYaLeRdlSV189dPpv54", "9Z52RlVPlGmUGJZebBKlr8"] } ] }, "paging": { "lastItemId": 78759, "lastPage": false, "pageSize": 2 } }
As you can see, these are not all the results; the pagination attribute lastPage therefore equals false. That means you should make another call, using the value of lastItemId as the fromId parameter in the next query, such as the following: /v6/invalidations?fromId=78759&toId=79999&pageSize=2

Repeat this until you have retrieved all the invalidated objects, which is indicated by the attribute lastPage equalling true.
Please make sure to store the value of the last received lastItemId, as you will need it for future invalidations when new data changes.
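A paging loop along these lines could tie the steps together. The base URL and auth header are placeholders, and the handle_batch callable stands in for whatever per-batch logic you implement (for example, the dispatch sketch shown earlier).

```python
from typing import Callable

import requests

BASE_URL = "https://api.example.com/v6"  # placeholder; use your Tapix base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # placeholder auth header

def process_invalidations(from_id: int, to_id: int,
                          handle_batch: Callable[[dict], None],
                          page_size: int = 500) -> int:
    """Page through /invalidations until lastPage is true.
    Returns the last received lastItemId, which should be stored for the next run."""
    current_from = from_id
    last_item_id = from_id
    while True:
        resp = requests.get(
            f"{BASE_URL}/invalidations",
            params={"fromId": current_from, "toId": to_id, "pageSize": page_size},
            headers=HEADERS,
        )
        resp.raise_for_status()
        body = resp.json()

        for batch in body["data"]["batches"]:
            handle_batch(batch)  # e.g. dispatch per type/level as described above

        paging = body["paging"]
        last_item_id = paging["lastItemId"]
        if paging["lastPage"]:
            return last_item_id  # persist this value for the next invalidation run
        current_from = last_item_id  # continue from the last returned item
```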
After implementation, we recommend performing end-to-end testing with assistance from Dateio.
Implementing invalidations is often seen among incumbent banks and teams with higher levels of expertise and capability in IT. On the other hand, cache expiry tends to be more common among fintechs and teams with limited capacity or lower levels of expertise. However, it's important to note that the choice of mechanism should align with the specific needs and capabilities of your team and project, and there are strategies available to optimize both approaches regardless of the level of expertise.