Refining search results

If you work with disparate data in a common business context and need to make that data easier to understand, you have probably already faced the challenge of refining raw data with complex rules that involve several data attributes.

The process can be split into two parts: a search query that returns data, and a step that identifies the subsets of that data most relevant to your needs.

BitCurb Crunch Engine fits into the second part of the process: narrowing down and filtering raw data into a subset that serves specific business needs and requirements.

The problem of refining search results (or data sets in general) appears in many industries, such as shopping search engines, event tickets, real estate, travel bookings, and the grouping and processing of bank transactions.

Let’s review how BitCurb Crunch Engine can help refine (programmatically or through MS Excel) shopping engine results for mobile phone searches. You can apply the same approach to your own data processing.

Product comparison and search result refinement

Comparison shopping engines give ecommerce merchants the opportunity to attract new customers, increase sales, and go head-to-head against the competition.

They usually collect product information, including pricing, from participating retailers and then display that collected information on a single results page in response to a shopper’s search query. Shoppers can compare each retailer’s price, shipping options, and service on a single page and choose the merchant that offers the best overall value. Often the overall value is a combination of many factors, such as price, product features, product availability and even shipping terms. Search refiners can help meet the need for a more capable search engine, but they are usually limited to a small number of properties because of the way the data is collected before the comparison starts. Refiners usually cannot apply complex logical expressions over parts of the content of several product properties (features).

BitCurb Crunch Engine is data source agnostic because it works only with data in JSON format. It is up to the consumer how the data is fetched from its source and converted to JSON. Refer to the BitCurb Crunch Engine API documentation for details.

Getting the raw data is outside the scope of this article because it is not a data processing concern. We used the fonoapi API to fetch data for leading mobile devices into one CSV file.
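
As an illustration of the conversion step, here is a minimal Python sketch that turns CSV text into JSON records. The {"Id", "Fields"} record shape simply mirrors the result JSON shown later in this article; whether the engine expects exactly this shape as input is an assumption, so check the API documentation. The function name csv_to_records and the third sample row are hypothetical.

```python
import csv
import io
import json

# Small excerpt standing in for the downloaded CSV file.
# The third row is a hypothetical device with an empty stand_by_hours cell.
SAMPLE_CSV = """DeviceName,Brand,status,sim,card_slot,stand_by_hours
Acer Liquid S1,Acer,"Available. Released 2013, August","Optional Dual SIM (Micro-SIM, dual stand-by)","microSD, 32 GB",450
Acer Liquid Z630,Acer,"Available. Released 2015, September","Optional Dual SIM (Micro-SIM, dual stand-by)",microSD,1030
Example Phone X,Example,Available,Single SIM,No,
"""


def csv_to_records(csv_text):
    """Turn CSV text into a list of {"Id", "Fields"} records (assumed shape)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"Id": row_id, "Fields": dict(row)}
        for row_id, row in enumerate(reader, start=1)
    ]


records = csv_to_records(SAMPLE_CSV)
# "$1" is used as the name of the single data set, as in the result JSON.
payload = json.dumps({"$1": records}, indent=3)
print(payload)
```

Note that every value stays a string and an empty CSV cell becomes an empty string, which is why the rules later in the article guard numeric comparisons with ISBLANK and TONUMBER.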

You can download the source file, and a code sample showing how BitCurb Crunch Engine can be used for filtering, from our GitHub code samples repository.

Below is a screenshot of the data in the source file that we will be processing:

Based on the data collected we would like to achieve the following objectives:

  • Show/Get all devices that are dual-SIM with external slots for memory cards
  • Show/Get all devices that have a stand-by time of at least 400 hours and a talk time of at least 10 hours
  • Show/Get all devices that have at least 8 GB of internal memory and a primary camera of at least 8 MP

If you review the file, you will notice that some properties are empty for some devices, which means that the information is not available in the original data source. This is often a challenging problem in data processing operations.

The objectives above also correspond to different target users: many people simply look for a dual-SIM phone with an external memory slot, others want the best battery life, and some are interested in shooting high-quality pictures and videos.

Let’s dive into the implementation and see how these objectives are fulfilled.

BitCurb Crunch Engine Filtering

The first thing we have to do is translate the human-language requirements into BitCurb rules, whose syntax is similar to MS Excel formulas. This is by design and aims to flatten the learning curve of mastering a whole new “language”. Still, complex processing operations may require some time to master the power and capacity of BitCurb Crunch Engine.

Filtering rules don’t have a type (1:1 or 1:M) or a direction ($1 to $2, $1 to $1). Since we are processing a single source of data and looking for the items (a subset of the starting data set) that fulfil our requirements, we are only filtering the data rows. If the consumer provides two sets to filter, BitCurb Crunch will run all the filtering rules and return two sets with the data matching the rule definitions. If the consumer provides only one set, the result will be a subset of data matching the rule definitions.

One or more rules can be applied against a given data source for filtering purposes.

Now it is time to define the syntax of the rules.

The rules, ordered to follow the defined objectives, are:

  • $1.status != "Discontinued" AND FIND("Dual SIM", $1.sim) >= 1 AND $1.card_slot != "No"

dual-SIM devices with external slots for memory cards

  • $1.status != "Discontinued" AND ISBLANK($1.stand_by_hours) == FALSE AND ISBLANK($1.talk_time_hours) == FALSE AND TONUMBER($1.stand_by_hours) >= 400 AND TONUMBER($1.talk_time_hours) >= 10

devices with a stand-by time of at least 400 hours and a talk time of at least 10 hours

  • $1.status != "Discontinued" AND ISBLANK($1.internal_memory_mb) == FALSE AND ISBLANK($1.primary_camera_mp) == FALSE AND TONUMBER($1.internal_memory_mb) >= 8192 AND TONUMBER($1.primary_camera_mp) >= 8

devices with at least 8 GB of internal memory and a primary camera of at least 8 MP
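
To make the logic of the three rules easier to follow, here is a plain Python rendering of them. This is not the BitCurb engine and does not call it; the helper functions find, is_blank and to_number only approximate what FIND, ISBLANK and TONUMBER are assumed to do (in particular, find is assumed to return a 1-based position, or 0 when the text is absent). The second sample device is hypothetical.

```python
def is_blank(value):
    # Assumed ISBLANK behaviour: missing or whitespace-only text is blank.
    return value is None or str(value).strip() == ""


def find(needle, text):
    # Assumed FIND behaviour: 1-based position of needle, 0 when absent.
    return str(text).find(needle) + 1


def to_number(value):
    return float(value)


def rule_dual_sim_card_slot(d):
    return (d.get("status") != "Discontinued"
            and find("Dual SIM", d.get("sim", "")) >= 1
            and d.get("card_slot") != "No")


def rule_battery(d):
    # The blank checks run first, so to_number never sees an empty string.
    return (d.get("status") != "Discontinued"
            and not is_blank(d.get("stand_by_hours"))
            and not is_blank(d.get("talk_time_hours"))
            and to_number(d["stand_by_hours"]) >= 400
            and to_number(d["talk_time_hours"]) >= 10)


def rule_memory_camera(d):
    return (d.get("status") != "Discontinued"
            and not is_blank(d.get("internal_memory_mb"))
            and not is_blank(d.get("primary_camera_mp"))
            and to_number(d["internal_memory_mb"]) >= 8192
            and to_number(d["primary_camera_mp"]) >= 8)


DEVICES = [
    # Values taken from the result shown at the end of this article.
    {"DeviceName": "Acer Liquid S1",
     "status": "Available. Released 2013, August",
     "sim": "Optional Dual SIM (Micro-SIM, dual stand-by)",
     "card_slot": "microSD, 32 GB",
     "stand_by_hours": "450", "talk_time_hours": "11",
     "internal_memory_mb": "8192", "primary_camera_mp": "8"},
    # Hypothetical device: no card slot, blank talk time, 4 GB of memory.
    {"DeviceName": "Example Phone X", "status": "Available",
     "sim": "Dual SIM", "card_slot": "No",
     "stand_by_hours": "500", "talk_time_hours": "",
     "internal_memory_mb": "4096", "primary_camera_mp": "13"},
]

RULES = [rule_dual_sim_card_slot, rule_battery, rule_memory_camera]

for device in DEVICES:
    passed = [rule.__name__ for rule in RULES if rule(device)]
    print(device["DeviceName"], passed)
```

The ISBLANK checks come before the TONUMBER comparisons in each rule so that devices with missing properties are excluded rather than causing a conversion problem.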

You can define the rule syntax in two different places, depending on whether you consume the API endpoint through the BitCurb Excel Application or through another consumer. The syntax is the same in both cases; only the look-and-feel of the rules designer differs slightly. Refer to the sections below for more details.

For each rule we must provide a definition list of the columns that are part of the rule’s body. These same columns will be processed at runtime by BitCurb Crunch Engine, so we must provide their names and data types in order for all formula expressions to be evaluated properly.

Find below a screenshot for each of the rules.

As you can see, each rule may contain a different row structure definition that applies to the context of the rule. It is not necessary to define all the fields from the source that you are trying to process.

Once defined, the rules can be assigned to a template. Order matters, since it defines the sequence in which the rules are executed. You cannot mix filtering rules with matching rules in one template.

The Template ID is the first column in the list of defined templates. It can be used as an input parameter for executing the Crunch operation.

BitCurb API

There are two options when you have more than one rule to execute against raw source data.

  • Execute the rules one after another and process the result of each rule in the caller before applying the next one. This involves multiple request-response operations against the BitCurb API endpoint. In this case, the consumer is responsible for processing the results and deciding whether one item can be part of more than one result group.
  • Execute all the rules at once and return a subset as the result in one request-response operation. In this case one item cannot be part of more than one result, which may happen if you run the rules one at a time.

To use the second option, you need to define a reusable template in your account, create the rules and then attach them to this template. Bulk rule execution is currently supported only by referencing the template ID in your request.
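
The difference between the two options can be sketched locally in Python. This is only one reading of the behaviour described above, not the engine's implementation: the template run is assumed to apply the rules in order, each on the rows kept by the previous rule, so that a single subset comes back. The sample rows, rule names and function names are hypothetical.

```python
def run_one_at_a_time(rows, rules):
    """Option 1: each rule is run separately against the full data set.

    Returns one result group per rule; a row may land in several groups.
    """
    return {name: [r for r in rows if rule(r)] for name, rule in rules.items()}


def run_as_template(rows, rules):
    """Option 2 (assumed reading): rules run in template order, each on the
    rows kept by the previous rule, producing a single subset."""
    result = list(rows)
    for rule in rules.values():
        result = [r for r in result if rule(r)]
    return result


# Hypothetical, simplified rows and rules for illustration only.
ROWS = [
    {"name": "A", "dual_sim": True, "stand_by": 450},
    {"name": "B", "dual_sim": True, "stand_by": 200},
    {"name": "C", "dual_sim": False, "stand_by": 500},
]
RULES = {
    "dual_sim": lambda r: r["dual_sim"],
    "battery": lambda r: r["stand_by"] >= 400,
}

groups = run_one_at_a_time(ROWS, RULES)
subset = run_as_template(ROWS, RULES)
print({name: [r["name"] for r in rows] for name, rows in groups.items()})
print([r["name"] for r in subset])
```

With the first option, row A appears in both groups and the caller has to decide what to do about that overlap; with the second, each row appears at most once in the single result.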

To be able to work with the API, your account must be assigned a licence and your connection settings must be set up on the Client Profile page of the Administration module.

Please note that if you consume the API through a client tool (any custom-developed program or tool), the Client Secret might need to be encoded. The same applies to your password when you acquire a token. In our code samples available on GitHub, the values for the settings must be provided in the app.config as they are.
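
As a rough illustration of why encoding matters, the sketch below assumes that the encoding in question is URL (percent) encoding of a form-style token request. That is an assumption, as are the field names, which are borrowed from the generic OAuth 2.0 password grant rather than taken from the BitCurb API; the credentials are made up. Check the API documentation for the actual request format.

```python
from urllib.parse import parse_qs, urlencode


def build_token_request_body(username, password, client_id, client_secret):
    """Build a URL-encoded form body (hypothetical field names)."""
    return urlencode({
        "grant_type": "password",
        "username": username,
        "password": password,
        "client_id": client_id,
        "client_secret": client_secret,
    })


# Made-up credentials containing characters that are unsafe in a form body.
body = build_token_request_body(
    "user@example.com", "p@ss&word=1", "my-client", "s3cr+et/=="
)
print(body)

# Decoding the body gives back the original values unchanged.
print(parse_qs(body)["client_secret"][0])
```

Without encoding, characters such as "&", "=" and "+" in a secret or password would be misread as separators or spaces by the receiving side.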

After you execute a rule, you receive a JSON result that you can consume according to your business process.

Refer to the Crunch Engine documentation for a syntax reference.

As you can see, only two devices match the requirements of all three rules we defined.

In a tabular representation the result above looks like this:

The raw result is the JSON object below:


{
   "$1": [
      {
         "Id": 53,
         "CrunchValue": 0,
         "Fields": {
            "DeviceName": "Acer Liquid S1",
            "Brand": "Acer",
            "technology": "GSM / HSPA",
            "gprs": "Yes",
            "edge": "Yes",
            "announced": "2013, June",
            "status": "Available. Released 2013, August",
            "dimensions": "163 x 83 x 9.6 mm (6.42 x 3.27 x 0.38 in)",
            "weight": "195 g (6.88 oz)",
            "sim": "Optional Dual SIM (Micro-SIM, dual stand-by)",
            "size": "5.7 inches (~66.2% screen-to-body ratio)",
            "card_slot": "microSD, 32 GB",
            "stand_by_hours": "450",
            "talk_time_hours": "11",
            "cpu": "Quad-core 1.5 GHz Cortex-A7",
            "internal_memory_mb": "8192",
            "os": "Android OS, v4.2 (Jelly Bean)",
            "primary_camera_mp": "8"
         }
      },
      {
         "Id": 90,
         "CrunchValue": 0,
         "Fields": {
            "DeviceName": "Acer Liquid Z630",
            "Brand": "Acer",
            "technology": "GSM / HSPA / LTE",
            "gprs": "Yes",
            "edge": "Yes",
            "announced": "2015, September",
            "status": "Available. Released 2015, September",
            "dimensions": "156.3 x 77.5 x 8.9 mm (6.15 x 3.05 x 0.35 in)",
            "weight": "165 g (5.82 oz)",
            "sim": "Optional Dual SIM (Micro-SIM, dual stand-by)",
            "size": "5.5 inches (~68.8% screen-to-body ratio)",
            "card_slot": "microSD",
            "stand_by_hours": "1030",
            "talk_time_hours": "22",
            "cpu": "Quad-core 1.3 GHz Cortex-A53",
            "internal_memory_mb": "8192",
            "os": "Android OS, v5.1 (Lollipop)",
            "primary_camera_mp": "8"
         }
      }
   ],
   "$2": []
}
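
A result like the one above can be flattened into table rows with a few lines of Python. The sketch below uses a shortened copy of the raw result (only three of the fields are kept); the function name to_rows and the choice of columns are illustrative.

```python
import json

# Shortened copy of the raw result above: only a few fields are kept.
RAW_RESULT = """
{
   "$1": [
      {"Id": 53, "CrunchValue": 0, "Fields": {
         "DeviceName": "Acer Liquid S1",
         "stand_by_hours": "450",
         "talk_time_hours": "11"}},
      {"Id": 90, "CrunchValue": 0, "Fields": {
         "DeviceName": "Acer Liquid Z630",
         "stand_by_hours": "1030",
         "talk_time_hours": "22"}}
   ],
   "$2": []
}
"""


def to_rows(result, set_name, columns):
    """Flatten one result set into rows: the Id first, then the chosen fields."""
    return [
        [item["Id"]] + [item["Fields"].get(col, "") for col in columns]
        for item in result.get(set_name, [])
    ]


result = json.loads(RAW_RESULT)
columns = ["DeviceName", "stand_by_hours", "talk_time_hours"]
for row in to_rows(result, "$1", columns):
    print(row)
```

The numeric properties come back as strings, exactly as they were sent, so convert them in the caller if you need to sort or aggregate on them.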

Conclusion

You can use the BitCurb engine filtering functionality to group items/records based on complex filtering criteria and formulas. You can execute a number of different conditions (BitCurb rules) one at a time, or simultaneously if you want to make sure that the result groups do not contain repeating items and that each record belongs to no more than one result group.

Adoption in other industries

There is no limit to the industries in which the BitCurb Crunch Engine filtering functionality can be applied.

The filtering functionality can fit into your data flow as a middleware processor, reducing the number of raw data rows based on complex correlations between the data’s attributes.

Another useful application of the filtering functionality is reporting, where the result is used for enhanced presentation.