Before you begin with this guide, you should have the basic knowledge from the Getting started with Tapix tutorial.
In this guide, you will learn how to keep your data up-to-date so that it provides the most value to you and your customers. You will familiarize yourself with:
First, it is important to understand why keeping data up-to-date is necessary and why data can become outdated.
Tapix provides you with three types of information for each enriched transaction:
After a raw transaction is enriched, the results are stored. This information stays "frozen in time". However, it may need to change over time for several reasons:
To give a sense of the extent of these changes: 5% of logos change annually.
To correctly understand the extent of changes in data, it is crucial to know the term transaction mapping. Transaction mapping involves establishing connections between transactions and their associated shop identifiers (UIDs or handles), ensuring accurate linkage and representation of transactional data within the system.
There are three options for keeping data up-to-date, each with its own advantages and disadvantages outlined below.
| Solution | Keeps shops' and merchants' attributes up-to-date | Ensures that past transactions are mapped to the correct shop | API calls | Implementation requirements |
|---|---|---|---|---|
| Cache expiry | ✓ | ✗ | Endpoints /shops/{id} and /merchants/{id} are called for all database objects, resulting in thousands of redundant API calls. | No specific implementation is required. |
| Partial invalidations | ✓ | ✗ | Endpoints /shops/{id} and /merchants/{id} are called only for invalidated shops/merchants. | Simple, partial implementation of the /invalidations endpoint. |
| Full invalidations | ✓ | ✓ | Endpoints /shops/{id} and /merchants/{id} are called only for invalidated shops/merchants; in addition, the respective /findBy endpoint is called for invalidated transactions. | More complex implementation of the /invalidations endpoint, since all types of data changes must be handled. |
Note that if you utilize the complete response endpoint /shops/complete/findByCardTransaction, none of these scenarios are applicable, as they do not align with the intended functionality provided by this endpoint.
Initial enrichment setup: Every transaction is sent to the API, and links between shops and transactions and merchant and shop objects are established and stored in the database indefinitely.
How it works
The link between a shop and a transaction (the handle) remains unchanged during this process. This applies to past transactions; new transactions will already have the correct shop UID.
Cache expiry can handle changes such as updating or adding new merchant logos, but it will not reflect previously unresolved transactions.
If you choose not to implement invalidations, there is a basic method to keep your current data relatively up-to-date. For the endpoints /shops/{id} and /merchants/{id}, you can cache the data for a short period, such as one week. After this period, overwrite all cached data about shops and merchants by calling the /shops/{id} and /merchants/{id} endpoints again.
Please note that calls to the endpoints /shops/{id} and /merchants/{id} are not invoiced.
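As an illustration, a minimal sketch of such a weekly refresh job is shown below. The base URL, the authentication header, and the in-memory caches are assumptions for this example, not part of the Tapix API; in a real integration the caches would be database tables.

```python
import requests

BASE_URL = "https://api.example.com/v6"  # placeholder; use your Tapix base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # placeholder auth header

# Local caches keyed by UID; in a real integration these would be database tables.
shop_cache: dict[str, dict] = {}
merchant_cache: dict[str, dict] = {}

def refresh_all_cached_objects() -> None:
    """After the cache period (e.g. one week) expires, overwrite every cached
    shop and merchant by calling /shops/{id} and /merchants/{id} again."""
    for shop_id in list(shop_cache):
        resp = requests.get(f"{BASE_URL}/shops/{shop_id}", headers=HEADERS)
        resp.raise_for_status()
        shop_cache[shop_id] = resp.json()

    for merchant_id in list(merchant_cache):
        resp = requests.get(f"{BASE_URL}/merchants/{merchant_id}", headers=HEADERS)
        resp.raise_for_status()
        merchant_cache[merchant_id] = resp.json()
```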
Invalidations in the Tapix ecosystem allow you to keep data up-to-date. For example, we might add new attributes like a category or logo to a specific shop or merchant that you have already linked to transactions or transfers. Additionally, we may identify transactions that were previously unrecognized. In the first step, you can use this functionality to update shops and merchants (similar to cache expiry) more efficiently by updating only the objects marked as changed by the Invalidations API.
To reduce traffic and focus on current data, you can invalidate only recent payment data (e.g., from the past 1-2 years).
Initial enrichment setup: Every transaction is sent to the API, and links between shops and transactions and merchant and shop objects are established and stored in the database indefinitely.
How it works
/invalidations provides objects that need to be reloaded from the API. This approach updates the shop and merchant data without addressing deeper transaction links.
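A partial implementation can be sketched roughly as follows: fetch the invalidated range, then re-fetch only the shops and merchants that appear in shallow batches. The base URL and auth header are placeholders, pagination is omitted, and the /invalidations parameters used here are described in detail later in this guide.

```python
import requests

BASE_URL = "https://api.example.com/v6"  # placeholder; use your Tapix base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # placeholder auth header

def refresh_shallow_invalidations(from_time: str) -> None:
    """Partial invalidations: process only 'shallow' batches, i.e. refresh shop and
    merchant attributes while leaving existing transaction mappings untouched."""
    # Step 1: get the numeric ID range of objects invalidated since `from_time`.
    rng = requests.get(f"{BASE_URL}/invalidations/item/range",
                       params={"from": from_time}, headers=HEADERS).json()

    # Step 2: list the invalidated objects in that range (pagination omitted here;
    # it is covered later in this guide).
    page = requests.get(f"{BASE_URL}/invalidations",
                        params={"fromId": rng["fromId"], "toId": rng["toId"],
                                "pageSize": 500},
                        headers=HEADERS).json()

    for batch in page["data"]["batches"]:
        if batch["level"] != "shallow":
            continue  # a partial implementation ignores deep/solved batches
        endpoint = "shops" if batch["type"] == "shop" else "merchants"
        for uid in batch["ids"]:
            resp = requests.get(f"{BASE_URL}/{endpoint}/{uid}", headers=HEADERS)
            resp.raise_for_status()
            # overwrite the stored shop/merchant record here (database write omitted)
```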
When invalidations are implemented fully, any part of the provided data can be changed, offering an ultimate approach to keeping the data up-to-date.
Initial enrichment setup: Every transaction is sent to the API, and links between shops and transactions and merchant and shop objects are established and stored in the database indefinitely.
How it works
/invalidations provides objects that need to be reloaded from the API. Invalidations inform you of changes to data you've already received, such as new attributes (address, coordinates, logo) added to shops or merchants you've linked to transactions, or newly recognized transactions that were previously unrecognized.
We recommend implementing /invalidations if you want to ensure that all your historical data is consistently updated with the latest information. However, if maintaining historical data accuracy is not essential or is unwanted for your use case, you may choose not to implement them. To reduce traffic and focus on current data, it is important to set a time window, such as 6 or 12 months, within which you want to keep transactions up to date. This approach minimizes the impact of full invalidations, which can cause a significant increase in traffic. Alternatively, if you need to invalidate all historical payment data, you have the flexibility to do so.
Now, we move to the implementation of invalidations. Unlike cache expiry, this requires implementing additional logic. Both types of invalidations are implemented similarly; the difference is that only "shallow" invalidations are considered in the partial implementation. Here's the implementation overview:
Invalidations work on multiple levels for different entities. Within invalidations, we provide you with two primary attributes - level and type. Let's look at these more closely.
Levels = refer to different degrees or depths of changes that have occurred within the system.
- shallow – the attributes of a shop or merchant have changed (for example, a logo or category was added or edited); the affected objects only need to be re-fetched and overwritten
- deep (or transfer-deep) – the link between a transaction and a shop UID has changed; for a deep update, the system needs to delete the existing mapping of transactions to their corresponding shop UID and handle and then remap them to the updated records
- solved (or transfer-solved) – unlike deep, the link between a transaction and a shop UID is established for the first time, because a previously unrecognized transaction has been recognized; for a solved update, the system must delete the mapping of transactions or payments that were previously unsolved and then remap them to the updated records

Please be aware that the transfer-deep and transfer-solved levels specifically pertain to bank transfers, not card transactions.
Types = refer to different categories or classifications of entities that a change within the system has impacted.
Types delineate distinct groups of entities affected by changes, outlining the actions required to maintain data integrity and relevance.
- shop
- merchant
| | Definition | Mechanism | Theoretical change | Example | What needs to be done |
|---|---|---|---|---|---|
| Remap | Changing the link between a raw transaction and a shop | level: deep + type: shop (level: transfer-deep + type: shop); level: solved + type: shop (level: transfer-solved + type: shop) | The same transaction gets a different shop UID | | Update the existing mapping of transactions to their corresponding shop UID or handle and then remap them to the updated records |
| Update | Fixing or adding attributes to a shop or merchant | level: shallow + type: merchant | Edit of merchant attributes | Add/change logo; edit the merchant's name | Overwrite the existing records for merchants |
| | | level: shallow + type: shop | Edit of shop attributes | Add/change/delete tags; add/change category; edit address; add/edit URL address; add/change Google Place ID; add/change shop type | Overwrite the existing records for shops |
In case a shop or merchant is deleted from our database, the UID is included in shallow invalidations. If you call this UID and receive a 404 response, it means the shop or merchant has been deleted. You should then remove the corresponding entry from your system.
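For instance, handling a deleted shop might look like the following sketch. The base URL, auth header, and the local_shops store are illustrative assumptions, not part of the Tapix API.

```python
import requests

BASE_URL = "https://api.example.com/v6"  # placeholder; use your Tapix base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # placeholder auth header

def refresh_or_remove_shop(uid: str, local_shops: dict) -> None:
    """Re-fetch an invalidated shop; a 404 means it was deleted from the Tapix database."""
    resp = requests.get(f"{BASE_URL}/shops/{uid}", headers=HEADERS)
    if resp.status_code == 404:
        local_shops.pop(uid, None)   # remove the deleted shop from your own system
        return
    resp.raise_for_status()
    local_shops[uid] = resp.json()   # otherwise overwrite the stored record
```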
The whole logic behind invalidations has two steps, each handled by a different endpoint:

1. Call /invalidations/item/range to get the range of numeric IDs of all objects invalidated within a given time period.
2. Call the /invalidations service to get the list of all invalidated items within that numeric ID range.

This mechanism works best when used regularly, ideally every day. When calling these endpoints, set the parameter refresh = true to inform Tapix that you are retrieving existing transactions again and that they do not represent new transactions in your system. These API calls are not invoiced.
For this, you will use /invalidations/item/range
. A sample query may look something like this: /v6/invalidations/item/range?from=2018-06-08T10:15:08Z&to=2018-06-10T10:15:08Z
Note that the parameter from is mandatory, while the parameter to is optional. This is because you will most likely want all invalidated objects up to the moment of the call.
The best practice when using the from
parameter is to set its value to the exact time you obtained the first data from Tapix. The response for such a call would look like this:
{ "fromId": 78757, "toId": 78945, "itemCount": 188 }
The response contains a range of IDs of the invalidated objects and their total count. The range of IDs will be used in the following section to get information about transactions, transfers, shops, and merchants whose data needs to be updated or discarded.
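In code, retrieving this range might look like the following sketch; the base URL and auth header are placeholders.

```python
import requests

BASE_URL = "https://api.example.com/v6"  # placeholder; use your Tapix base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # placeholder auth header

# Best practice: `from` is the exact time you obtained your first data from Tapix;
# `to` is omitted so the range covers everything up to the moment of the call.
resp = requests.get(f"{BASE_URL}/invalidations/item/range",
                    params={"from": "2018-06-08T10:15:08Z"},
                    headers=HEADERS)
resp.raise_for_status()
id_range = resp.json()  # e.g. {"fromId": 78757, "toId": 78945, "itemCount": 188}
from_id, to_id = id_range["fromId"], id_range["toId"]
```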
Now that you have received a range of IDs of invalidated objects, it is time to find out what was invalidated and which actions need to happen in your database to keep it up-to-date.
For this, you will use the /invalidations endpoint, whose query you can adjust with the following parameters:

- fromId – mandatory parameter that specifies from which ID of an invalidated object to return the data
- toId – optional parameter that specifies up to which ID of an invalidated object to return the data
- level – optional parameter specifying which entity and level you want information about. Providing this parameter tells you what to do with the returned objects; if omitted, you will get all levels and entities contained within the range
- pageSize – required parameter that sets the number of returned IDs or handles in the response (maximum is 500,000)
- lastItemId – optional parameter used when the number of returned objects is greater than pageSize

Note that the /invalidations API can be called and parsed as a whole for objects with different type and level attributes. Let's have a look at the results of this query: /v6/invalidations?fromId=78757&toId=78798&pageSize=10
The response would look like this:

{
  "data": {
    "currentTime": "2018-06-09T10:00:00Z",
    "batches": [
      { "type": "shop", "level": "shallow", "ids": ["MmPdedgnvjXiRZnBJzJQKb", "MmPdedgnvjXiRZnBJzJQKx"] },
      { "type": "shop", "level": "deep", "ids": ["MmPdedgnvjXiRZnBJzJQKb", "dSgWnvCCjXiRZnBJVzJQKx"] },
      { "type": "shop", "level": "solved", "ids": ["!BzjVOpqibSzhZq7JJ8VNWT", "!Bn44ohYfep5O6B8vLJyRxX"] },
      { "type": "shop", "level": "transfer-deep", "ids": ["jXJzJMdediRZnBgnvQKbmP", "DfevFennBQKajXiRZxsJzJ"] },
      { "type": "shop", "level": "transfer-solved", "ids": ["!qibSVOpq7VzhZBzjNWTJJ8", "!5O68pBn44RxXBvLJyohYfe"] }
    ]
  },
  "paging": {
    "lastItemId": 78790,
    "lastPage": false,
    "pageSize": 10
  }
}
Notice that you receive information about all invalidated objects of all levels and types in one response. If it is convenient for you to proceed level by level, or even if you are not interested in some levels at all, you may request only some types/levels.
After receiving the respective IDs, follow these steps to retrieve updated data based on the different levels of invalidation (a dispatch sketch follows this list):

- level: shallow + type: merchant – obtain the merchant IDs that have been invalidated and call the /merchants/{id} endpoint for updates.
- level: shallow + type: shop – retrieve the shop IDs that have been invalidated and call the /shops/{id} endpoint for updates.
- level: deep – obtain the transaction IDs that have been invalidated and call the /shops/findByCardTransaction endpoint for updates.
- level: solved – retrieve the transaction IDs that have been invalidated and call the /shops/findByCardTransaction endpoint for updates.
- level: transfer-deep – obtain the transfer IDs that have been invalidated and call the respective transfer endpoint (e.g., /shops/findByBankTransfer/sepa) for updates.
- level: transfer-solved – retrieve the transfer IDs that have been invalidated and call the respective transfer endpoint (e.g., /shops/findByBankTransfer/sepa) for updates.
Before we dive into the specifics, let's get familiar with pagination. Depending on the size of the returned batch, you may not always get all invalidated objects at once. The response always contains the attributes you need to retrieve all paginated invalidated objects, namely:
- lastItemId – points to the last object in the response. This value is used as a parameter in the next call if the attribute lastPage is false.
- lastPage – tells you whether the call you made reached the last page. If the value is false, call the endpoint again using lastItemId as a parameter.

To give you an example, let's use this query: /v6/invalidations?fromId=78757&toId=79999&pageSize=2
For which the response would be this:
{ "data": { "currentTime": "2021-03-22T12:56:17Z", "batches": [ { "type": "shop", "level": "deep", "ids": ["qdeYYaLeRdlSV189dPpv54", "9Z52RlVPlGmUGJZebBKlr8"] } ] }, "paging": { "lastItemId": 78759, "lastPage": false, "pageSize": 2 } }
As you can see, these are not all the results; the pagination attribute lastPage therefore equals false. That means you should make another call, using the value of lastItemId as the fromId parameter in the next query, such as the following: /v6/invalidations?fromId=78759&toId=79999&pageSize=2

Repeat this until you have retrieved all the invalidated objects, which is indicated by the attribute lastPage equalling true.
Please make sure to store the value of the last received lastItemId, as you will need it for future invalidations when new data changes.
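A paging loop along these lines could tie the steps together. The base URL and auth header are placeholders, and the handle_batch callable stands in for whatever per-batch logic you implement (for example, the dispatch sketch shown earlier).

```python
from typing import Callable

import requests

BASE_URL = "https://api.example.com/v6"  # placeholder; use your Tapix base URL
HEADERS = {"Authorization": "Bearer <api-key>"}  # placeholder auth header

def process_invalidations(from_id: int, to_id: int,
                          handle_batch: Callable[[dict], None],
                          page_size: int = 500) -> int:
    """Page through /invalidations until lastPage is true.
    Returns the last received lastItemId, which should be stored for the next run."""
    current_from = from_id
    last_item_id = from_id
    while True:
        resp = requests.get(
            f"{BASE_URL}/invalidations",
            params={"fromId": current_from, "toId": to_id, "pageSize": page_size},
            headers=HEADERS,
        )
        resp.raise_for_status()
        body = resp.json()

        for batch in body["data"]["batches"]:
            handle_batch(batch)  # e.g. dispatch per type/level as described above

        paging = body["paging"]
        last_item_id = paging["lastItemId"]
        if paging["lastPage"]:
            return last_item_id  # persist this value for the next invalidation run
        current_from = last_item_id  # continue from the last returned item
```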
After implementation, we recommend performing end-to-end testing with assistance from Dateio.
Implementing invalidations is often seen among incumbent banks and teams with higher levels of expertise and capability in IT. On the other hand, cache expiry tends to be more common among fintechs and teams with limited capacity or lower levels of expertise. However, it's important to note that the choice of mechanism should align with the specific needs and capabilities of your team and project, and there are strategies available to optimize both approaches regardless of the level of expertise.