
How to Expire Objects in Data Storage

Posted on May 16, 2019


This post shows how to implement a kind of “expiration” for your data objects. The strategy is generic enough to apply to any resource you need to expire, including files, logs, and so on.

Since the database does not have a built-in expiration mechanism, we’ll have to implement it on our own. Fortunately, the task is pretty easy because we have access to Backendless Cloud Code. The idea is to create a Timer that runs at a fixed interval and deletes the resources matching your criteria.

Let’s list what we need to define to implement such a timer:

  • a period for the clean-ups
  • a resource we need to clean up
  • a criterion for deleting the resource

The period can be arbitrary: choose how often to clean up based on how fast the data is added. Once a day is a reasonable default for most apps.

The resource may be a data object, a data table, a file or folder, or any real or virtual entity you have in your app, not limited to Backendless storage.

The simplest criterion (and often the most suitable) is based on the created field of an object. This way, you only need to perform a bulkDelete API request from the timer with a where clause such as created < <day_ago_in_milliseconds>, where <day_ago_in_milliseconds> is a timestamp you calculate in your code. For example, in JavaScript you can get it with the expression Date.now() - 24 * 60 * 60 * 1000.
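As a minimal sketch of this age-based criterion, the helper below builds the where clause from a maximum age. The function name buildExpirationClause and the table name MyTable are hypothetical and used only for illustration; the bulkDelete call in the comment assumes it runs inside a timer's execute() method.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Builds a where clause matching objects created more than maxAgeMs ago.
// `now` is injectable so the function is easy to test (hypothetical helper).
function buildExpirationClause(maxAgeMs, now = Date.now()) {
  const cutoff = now - maxAgeMs; // timestamp in milliseconds
  return `created < ${cutoff}`;
}

// Possible use inside a timer's execute() method
// ('MyTable' is a placeholder table name):
//   return Backendless.Data.of('MyTable').bulkDelete(buildExpirationClause(DAY_MS));
```

Keeping the cutoff calculation in one small function makes it easy to change the retention period later, for example to 7 * DAY_MS for a week.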

For our case, let’s make something a bit more complex. Assume we have a table ActivityRecord with a relation to the Users table, so that each user has their own activity history. We want to keep only the two latest activity records for each user (you’ll probably need more, but the number is easy to change in the code, so for this sample we’ll keep things small). This makes our criterion a bit more complicated, but it can still be implemented in just a few minutes. Let’s go through it step by step.

Here’s our ActivityRecord table with some data:

[Screenshot: ActivityRecord data]

Now go to the Business Logic tab and create a Timer named expireOldActivity with the period set to once a day. We’ll use JavaScript in this example, but you’re free to implement the same logic in Java or Codeless.

[Screenshot: Creating a new timer]

Now click on the created timer to open the coding dropdown:

[Screenshot: Timer coding dropdown]

All that is left is to implement the logic for deleting the obsolete data. In our case, we need to delete everything except the two latest records for each user. Since the only ordering we have is the one given by the created field, we’ll perform two requests per user: the first retrieves the first record that should no longer be kept, and the second removes that record and everything older. Here’s what the code looks like:

execute(req) {
  const KEEP_COUNT = 2; // number of latest records to keep for each user

  return Backendless.Data.of( 'Users' ).find()
    .then( users => users.map( user => user.objectId ) )
    .then( userIds => Promise.all( userIds.map( userId => {
      const queryBuilder = Backendless.DataQueryBuilder.create();
      queryBuilder.setWhereClause( `user.objectId = '${userId}'` );
      queryBuilder.setSortBy( 'created desc' ); // newest first
      queryBuilder.setOffset( KEEP_COUNT ); // skip the records we want to keep
      queryBuilder.setPageSize( 1 ); // we only need the first obsolete record

      return Backendless.Data.of( 'ActivityRecord' ).find( queryBuilder )
        .then( data => {
          if( data.length > 0 ) {
            // delete the first obsolete record and everything older for this user
            return Backendless.Data.of( 'ActivityRecord' ).bulkDelete( `created <= ${data[0].created} and user.objectId = '${userId}'` );
          }
        })
        .catch( error => {
          // log the failure for this user without aborting the other users
          console.log( 'Server reported an error ' + error );
        });
    })));
}

The algorithm is as follows:

  1. Retrieve all users and their objectIds
  2. Find the first obsolete record for each user (this would be the 3rd element if we want to keep 2)
  3. Bulk delete all records for a user which are older than the found one

These three steps sum up the code above; the rest is plumbing with JavaScript Promises that lets the per-user requests run in parallel.
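To make the cutoff logic easier to follow, here is a small in-memory sketch of steps 2 and 3 for a single user. It does not call Backendless at all; the function findObsolete and the sample records are made up for illustration, and created is assumed to be a millisecond timestamp.

```javascript
// Returns the records the timer would delete for one user: everything at or
// below the cutoff, where the cutoff is the `created` value of the first
// record past the `keep` newest ones (hypothetical helper, in-memory only).
function findObsolete(records, keep) {
  const sorted = [...records].sort((a, b) => b.created - a.created); // 'created desc'
  const firstObsolete = sorted[keep]; // same as offset = keep, pageSize = 1
  if (!firstObsolete) {
    return []; // the user has no more than `keep` records
  }
  return records.filter(r => r.created <= firstObsolete.created);
}

// Made-up sample data for one user
const sample = [
  { objectId: 'a', created: 1000 },
  { objectId: 'b', created: 4000 },
  { objectId: 'c', created: 2000 },
  { objectId: 'd', created: 3000 },
];

const obsoleteIds = findObsolete(sample, 2).map(r => r.objectId);
console.log(obsoleteIds); // records 'a' and 'c' are obsolete; 'b' and 'd' are kept
```

One consequence of comparing by created is that records sharing exactly the same timestamp as the cutoff record are all deleted together, so in rare cases fewer than the intended number of records may remain.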

Note that a single find() request returns at most one page of users (no more than 100 objects), so to process more users you’ll need to use paging and perform more than one request. This example skips that for simplicity. Also, be aware that with a lot of users the timer’s code may not finish within the execution time limit, so you’ll probably need to split the users into segments and process them with multiple timers.
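Paging could be sketched roughly as follows. The helper fetchAll is hypothetical and works with any function that returns one page of results as a Promise; the commented usage relies on the same query-builder calls as the timer code (setPageSize and setOffset) and assumes the runtime supports async/await.

```javascript
// Collects all objects by requesting consecutive pages until a page shorter
// than pageSize signals the end (hypothetical helper).
// `fetchPage(offset, pageSize)` must return a Promise resolving to an array.
async function fetchAll(fetchPage, pageSize = 100) {
  const all = [];
  for (let offset = 0; ; offset += pageSize) {
    const page = await fetchPage(offset, pageSize);
    all.push(...page);
    if (page.length < pageSize) {
      return all; // last page reached
    }
  }
}

// Possible use with Backendless inside the timer:
// const fetchUsersPage = (offset, pageSize) => {
//   const queryBuilder = Backendless.DataQueryBuilder.create();
//   queryBuilder.setPageSize(pageSize);
//   queryBuilder.setOffset(offset);
//   return Backendless.Data.of('Users').find(queryBuilder);
// };
// fetchAll(fetchUsersPage).then(users => users.map(user => user.objectId));
```

The pages are requested one after another rather than in parallel, which keeps the logic simple but adds to the timer's execution time, so the time limit mentioned above still applies.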
