Tue Jan 5 09:59:45 EST 2016
This document describes an algorithm for classifying both customers and products, and for predicting product preferences, using a unique layered Naive Bayes probability model. We first use a probability model to assign classifiers to products; this overcomes some unique problems brought about by the geographic isolation of choice sets in travel data. We then use these classifiers to create additional Bayesian probability models that can be used to define market segmentation and to provide good predictive power even given limited amounts of individual-level data.
Intelligent customer and product differentiation is a primary driver of many successful, future-oriented businesses today. Large online services like Google and Amazon have clearly made a name for themselves by gathering and successfully using customer-supplied data to enhance and extend the customer experience in new and exciting ways. Even mainline brick-and-mortar businesses gather large amounts of customer data and then use that data to target particular customers with particular products, lowering marketing costs and increasing customer satisfaction. The travel industry has to date been far less successful in providing this level of service and satisfaction to its customers. The reason for this shortcoming is simple: insufficient individual-level data. With more traditional goods, the frequency with which people make choices among products is much higher. Even frequent travelers may make only four or five choices per year, and this is simply not enough data for traditional customer differentiation techniques. Furthermore, even when they do make choices over time, they are often making choices in different markets, which makes it difficult to draw connections between, and inferences from, those choices without additional work in the domain.
Our eventual solution will use a traditional probability model (Naive Bayes) to classify products into qualitative categories, and then an additional Bayesian probability model to classify customers and anticipate their choices and needs based upon this classification. This also lets us identify groups of customers who buy similar things and hence might belong to a particular marketing category (luxury adventure traveler, budget adventure spiritual traveler, etc.).
First, however, we shall look at some attempts to solve this problem, what each approach brings to the table, and how they fail to solve the problem completely.
Some attempts to overcome these data limitations have been partially successful. We will discuss several of these: demographic inference, self-reporting, secondary market indicators, and statistical inference and probability models.
In some markets, you can often infer significant amounts of information about customers based on simple factors such as where they live, race, gender and income. The first and most obvious limitation is that some customers are not willing to give this information. Additionally, it is difficult to get a picture of an individual customer given only this information. You can narrow choices based on it, but it is by definition a broad statistical brush. A further limitation of this technique is that travel is an unusual product in that it often does not map neatly onto demographics. Travel is often planned on a long-term basis. A family of modest means may save for ten years to go on the ‘trip of a lifetime’, while their neighbor of similar means may travel very modestly every year.
The first set of individual-level attempts was to rely on self-reporting, e.g. tell us if you are a luxury or budget traveler, adventure or leisure, etc. While this self-reported data can be very useful, and we will be using it in other parts of ATLAS, it is not useful alone because these terms mean very different things to people from different socioeconomic strata. For instance, to someone from a poor background, moving from a Motel 6 to a Holiday Inn might be considered a luxury travel move, and a Marriott out of reach. To a wealthy customer, by contrast, the bottom of the range might be a Marriott and the top of the range the Park Plaza or the Burj al Arab. Combined with demographic data, an algorithm based on these factors can be a start, but it is at best a blunt instrument.
A third and more successful attempt to overcome this limitation has been to estimate a set of travel options based on purchases in other areas. If you know what types of products people buy in other contexts, you might be able to infer something about their travel choices. Given the type of car they buy, the type of house they own, etc., combined with demographic and other data, one can come closer to a real picture of a customer, properly classify them and, ipso facto, market to them. This data, however, is difficult to come by. It is expensive and closely guarded by its proprietors. It is, to be honest, a natural advantage for larger companies such as Google, Facebook or Amazon, who have this data (or proxies to it in the case of Facebook and Google), and they would be giving up competitive advantage by making it easily available to others. In summary, when this data is available, it does help form a more complete picture of a customer, especially when combined with easier-to-obtain self-reported and demographic data, but startups without easy access to it must look to more clever solutions to this problem.
The most promising of the previous attempts to solve this problem is to take all the decisions made by other customers and then, given a single choice made by a customer, attempt to infer other choices they might like. These algorithms are used extensively by companies like Amazon, where they can alert customers who buy product A that similar customers purchased product B. This is based on simple statistical inference using simple models like Naive Bayes in a data-dense but conceptually simple product space, i.e. the product space is continuous in the sense that if someone buys product A, there is no reason they could not at the same time buy any other product in any other product category, and so each product can be statistically and directly related to any other product no matter how different those products are.
The travel industry is not like this. Customer behavior is just as varied, but the product space is not continuous because it is place specific. This does not, as we shall see, mean we cannot make inferences from it, but it does mean that we cannot make simple inferences from product-to-product comparisons, per se. For instance, a person who books the Park Plaza in New York might indeed like the W in Austin or the Palmer House in Chicago, etc., but unless we have lots of choices made by lots of customers over time, we cannot make these inferences directly using product-by-product probability models; we can only make inferences, as we did above, imprecisely and intuitively. The data of most travel startups is just too shallow to use product-to-product probability models. Therefore, without additional innovation we cannot use these techniques out of the box.
The two specific problems that must be solved are as follows: lack of data and lack of depth in the data, and the problem of a disjointed product space because of geographical separation in the travel industry. Once these issues are solved, standard probability distribution algorithms can be used along with travel context to yield multiple levels of individual-focused recommendations, and, as a side-benefit, begin to create a complex and sophisticated categorization of each customer as a traveler.
We will attempt to solve our data quality issues sociologically rather than mathematically by providing compelling user interfaces for managing and sharing travel memories as well as tools to plan future trips and manage current trips regardless of where they were booked.
The problem of the disjointed product space will be solved in several stages. First, domain experts at souljourn will specify qualitatively meaningful categories, call these D. Examples of categories would be luxury, budget, adventure, cultural, high culture, pop culture, religious, spiritual. These may also be grouped into sibling and/or hierarchy relationships, though they need not be.
We will then use simple Bayesian probability models to estimate how likely specific travel products (e.g. locations, hotels, restaurants, events, sights, cultural institutions) are to fit into each category. Call the array of product probability vectors P.
Any customer activity at all, including that in the memories section, will immediately start being used to assess the probability a given customer falls into a given category D. We will call this array of probability vectors C.
Once we have an initial assessment of product and customer prior probabilities, we will adjust these every time a new piece of data is added to the system.
Once we have an initial guess of categories for products and customers (P and C) we can predict additional choices by examining actual choices by customers using an additional Bayesian probability matrix. If we know the categories of products bought by a customer in the past, we can attempt to predict which categories that customer will choose in different contexts involving the same categories. For instance, if a customer prefers luxury products in city A, then if he or she is traveling to city B it stands to reason he or she will prefer luxury products there as well.
Obviously, our eventual goal is to base our predictions on travel actually booked and/or managed by our software. In the early days of a new product, however, it is difficult to gather enough data to provide robust predictions, and without robust predictions, why would someone use our service over more established competitors? Direct customer data will of course be weighted the highest, but we will also provide two sociological innovations that will help us collect information about customers at early stages.
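As a rough illustration of this layering, the sketch below ranks products in a city the customer has never visited by comparing the customer's category vector (a row of C) with each product's category vector (a row of P). The category names, product names, probability values and the sum-of-products scoring rule are all assumptions made for illustration; this document does not prescribe a particular scoring formula.

```python
# Hypothetical categories D; real ones would come from domain experts.
CATEGORIES = ["luxury", "budget", "adventure"]

# P: product -> probability the product belongs to each category (made-up values).
P = {
    "hotel_b1": {"luxury": 0.9, "budget": 0.05, "adventure": 0.2},
    "hotel_b2": {"luxury": 0.1, "budget": 0.85, "adventure": 0.3},
    "hotel_b3": {"luxury": 0.6, "budget": 0.2, "adventure": 0.7},
}

# C: customer -> probability the customer falls into each category,
# estimated from choices made in other cities (made-up values).
C = {"alice": {"luxury": 0.8, "budget": 0.1, "adventure": 0.3}}


def score(customer_vec, product_vec):
    """Assumed scoring rule: sum of customer affinity times product membership."""
    return sum(customer_vec[d] * product_vec[d] for d in CATEGORIES)


def recommend(customer, products_in_city):
    """Rank the products available in one city for one customer, best first."""
    vec = C[customer]
    return sorted(products_in_city, key=lambda p: score(vec, P[p]), reverse=True)


# Alice has never booked in city B, yet her category vector still ranks its hotels.
ranking = recommend("alice", ["hotel_b1", "hotel_b2", "hotel_b3"])
print(ranking)
```

No product-to-product link between cities is needed here: the comparison happens entirely in category space, which is the point of the intermediate classification step.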
The first attempt to solve the problem is actually sociological -- travel memories. We provide tools on both the web and the app to allow users to document past trips they have taken, allowing them to build special image galleries, link historical notes, and make recommendations about previous travel to their social networks. While this increases the attractiveness and usability of the app and site in general, it also provides a free and easy way for us to collect data about previous choices made without requiring any trust-based information like credit cards or booking numbers.
The second set of tools we have for gathering information about customer preferences is a travel planning tool.
This functionality has two levels.
The first level is a wish list of places the user would like to go which can easily be converted to the more detailed levels described below.
The second is data about places they are going to go soon. This will allow them to store information about research into particular places, take notes about those resources, see attractions, restaurants, etc. for those places, and create a wish list of things they would like to see once they get there. This can easily transition to the trip management described below, of course.
We will also provide another sociology-inspired tool through trip management. This functionality will allow all users to manage their trips even if those trips were booked through other booking outlets. At the early stages of this project, we need information about customers rather than booking fees. We will provide compelling tools that give customers an incentive to volunteer their travel information and thus enhance our dataset without requiring any extremely sensitive information.
Each source of evidence carries a weight. Call the weight of real preferences expressed by users π (pi), the weight of choices made while managing a trip M, the weight of detailed planning choices S (for soon), and the weight of wish list items W.
The weights will be subject to the following rule:
1 ≥ π ≥ M > S > W ≥ 0
In other words, trips being managed or booked by the app (π and M) are weighted so they more directly impact the categorization of both the customer and products. Detailed planning has an impact but less than 'real' trip management, and wish list is weighted less (often very much less) than any other value.
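The weighting rule above can be sketched as follows. The specific weight values, the event format and the simplification that each chosen product carries exactly one category are assumptions for illustration only; the text fixes only the ordering of the weights.

```python
# Assumed example weights satisfying 1 >= pi >= M > S > W >= 0.
WEIGHTS = {"pi": 1.0, "M": 0.9, "S": 0.4, "W": 0.1}


def weights_are_valid(w):
    """Check the ordering rule stated in the text."""
    return 1 >= w["pi"] >= w["M"] > w["S"] > w["W"] >= 0


def customer_category_vector(events, categories):
    """Turn weighted events into a normalized category vector.

    events: list of (source, category) pairs, where source is one of
    "pi", "M", "S", "W" and category is the category of the chosen product.
    """
    totals = {d: 0.0 for d in categories}
    for source, category in events:
        totals[category] += WEIGHTS[source]
    mass = sum(totals.values())
    if mass == 0:
        # No evidence yet: fall back to a uniform vector.
        return {d: 1.0 / len(categories) for d in categories}
    return {d: v / mass for d, v in totals.items()}


# One real luxury booking and one planned luxury stay outweigh two budget wishes.
events = [("pi", "luxury"), ("W", "budget"), ("S", "luxury"), ("W", "budget")]
vec = customer_category_vector(events, ["luxury", "budget"])
print(vec)
```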
The second major problem to be solved is as follows: once we have data in the system, we are still unable to provide clear predictions using conventional means unless we have massive amounts of data. Why? Because traditional one-step probability models look at a given product and ask, if a customer chooses that product, what is the likelihood they will choose a different product. This works fine in traditional product domains because the choice set is, ceteris paribus, the same for all customers. In travel, this is not the case. Suppose customer A visits Dallas and customer B visits Dallas, and they make exactly the same choices of hotel, restaurant, airline section, etc. At this point a traditional algorithm with no additional processing steps would say that persons who enjoyed u, v and w also enjoyed x, y and z. In travel, however, person A may next be going to Washington, DC and person B may be headed to Chicago. Even though the only things we know about A and B are identical, x, y and z are different for A and for B because their choice sets are radically restricted by location.
In order to use traditional probability distribution algorithms we must solve this problem of multiple product spaces. We do so by preprocessing the data we have using traditional probability models not to predict behaviour, but to find similar products in different locations, viz. to apply categories supplied by domain experts to those products. Once products are assigned to categories using the probability distribution, we can examine them in an additional probability model that predicts customer choices based upon the preferred categories of the customer identified by the model. By providing reliable predictions based on product type, we have ipso facto normalized our product space. In this way, we can provide good choice sets in each new location based on minimal information and increasingly better choice sets as choices are made by the user.
The specific steps of the algorithm are as follows:
Domain experts will define categories that will be used to classify both products and the customers that use them. Examples of these might be Luxury, Only-the-richest, Economy, Barebones. Other examples might be festive/party, relaxation, religious, spiritual, cultural, gay cultural, black cultural, latino cultural, black gay cultural, etc.
These product attributes or categories may be hierarchical but need not be. The latter categories are qualitatively hierarchical, and if you define them as hierarchy terms, then matches for any of the cultural subcategories will aggregate up accordingly.
These must be fixed for any given assessment but can be easily changed before the next assessment begins.
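The roll-up of hierarchical categories might look something like the sketch below. The parent links are hypothetical, and giving each category a single parent is a simplification: a term such as "black gay cultural" could reasonably sit under more than one parent, which would call for a graph rather than a tree.

```python
# Hypothetical parent links; a category with no entry is a root.
PARENT = {
    "gay cultural": "cultural",
    "black cultural": "cultural",
    "black gay cultural": "gay cultural",
}


def ancestors(category):
    """Yield the category itself, then each ancestor up to the root."""
    while category is not None:
        yield category
        category = PARENT.get(category)


def aggregate(match_counts):
    """Roll match counts for specific categories up into their ancestors."""
    totals = {}
    for category, count in match_counts.items():
        for node in ancestors(category):
            totals[node] = totals.get(node, 0) + count
    return totals


totals = aggregate({"black gay cultural": 2, "black cultural": 1, "luxury": 4})
print(totals)
```

Flat categories such as "luxury" pass through unchanged, so the same function serves both the hierarchical and the non-hierarchical case.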
We begin by calculating a typical product-to-product probability matrix, that is, if a customer buys A, what is the probability he or she also bought B, C, ... Z. We only calculate this vector for products that have been explicitly categorized or whose automatic categorization has been confirmed by customer action. Domain experts and their trusted delegates start the process by manually assigning categories to particular products. For those categorized products we then presume, very naively, that any product i among B ... Z whose probability p(i) meets a threshold value also belongs to that category. We assign an initial probability p′(i) that this assignment is correct, based on a normalized value of p(i).
Early on, we have a panel of domain experts and trusted volunteers assess whether the predictions are correct and adjust the p′(i) accordingly. At this point we are still in the limited-predictions phase of the project where we are allowing travel memories, trip planning and trip management only. Any predictions made using this data should be assessed against a probabilistic assessment of choices made by customers.
It should also be noted that all products with unconfirmed values of p′ may end up in a category-based selection list to which they could belong. A mechanism to indicate "this thing is not like the others" will be provided to trusted users. For regular users, if an item is chosen at least as often as chance would predict, then it is probably a proper member of the group.
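A minimal sketch of this seeding step follows, assuming that each customer's history is available as a list of chosen products. The product names, the threshold of 0.5 and the use of the raw conditional probability as the tentative p′ are assumptions; the text calls for a normalized value without specifying the normalization.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence(baskets):
    """Estimate P(b | a): the share of customers who chose a and also chose b."""
    chose = defaultdict(int)
    both = defaultdict(int)
    for basket in baskets:
        items = set(basket)
        for a in items:
            chose[a] += 1
        for a, b in permutations(items, 2):
            both[(a, b)] += 1
    return {(a, b): n / chose[a] for (a, b), n in both.items()}


def propagate(seed_labels, cond_prob, threshold):
    """Give uncategorized products a tentative category from a categorized seed."""
    tentative = {}
    for (a, b), p in cond_prob.items():
        if a in seed_labels and b not in seed_labels and p >= threshold:
            # Keep the strongest link if several seeds point at the same product.
            if b not in tentative or p > tentative[b][1]:
                tentative[b] = (seed_labels[a], p)
    return tentative


# Hypothetical histories; "ritz" is the only expert-labelled product.
baskets = [["ritz", "opera"], ["ritz", "opera"], ["ritz", "diner"], ["hostel", "diner"]]
cond = cooccurrence(baskets)
tentative = propagate({"ritz": "luxury"}, cond, threshold=0.5)
print(tentative)
```

Here "opera" inherits the luxury label because two of the three customers who chose "ritz" also chose it, while "diner" falls below the threshold and stays uncategorized until more evidence arrives.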
Periodic review by domain experts or trusted users will also be carried out.
At the end of this process, given user interaction, our classifications and predictive power will improve over time, and we should end up with a complete categorization of all products.
Once we have normalized the place-specific product space into a non-place-specific product-category space, then we can trivially calculate probability models for new locations. Given past choices we can model which categories the current customer prefers, recommend items based on those categories to him or her, and adjust negatively or positively the model based on the actual choices made in context.
In order to update the model, we present predictions made by the current state of the model. We then adjust the probability up or down based upon whether or not the item is chosen at least as often as chance would indicate. If it is never chosen, or chosen significantly less often than chance, then either it does not belong in the choice set or it is simply a bad product. Regardless, we should set this product aside for review by a domain expert or delegate.
The math used here is straightforward and well-documented. It is not original. What is original is the application of a multilayered Bayesian analysis to reduce the problem of place-specific distinct choice sets unique to the travel industry. Once this issue is solved, we are able to make a large variety of predictions reliably and with minimal calculation.
Simply stated, Naive Bayes classifiers estimate the conditional probability of something we do not know from things we do know, revising the estimate as new evidence arrives.
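One way to sketch this update rule is shown below. The fixed step size, the definition of chance as one over the length of the list shown, and the review cutoff at a quarter of chance are all assumptions chosen for illustration; a real implementation might use a proper statistical test instead.

```python
def update_membership(prob, times_shown, times_chosen, list_size,
                      step=0.05, review_ratio=0.25):
    """Return (new_probability, needs_review) for one product in one category.

    Chance is taken to be 1 / list_size: the rate at which an item would be
    picked if customers chose uniformly at random from the list shown.
    """
    if times_shown == 0:
        return prob, False  # nothing observed yet
    chance = 1.0 / list_size
    rate = times_chosen / times_shown
    if rate >= chance:
        new_prob = min(1.0, prob + step)
    else:
        new_prob = max(0.0, prob - step)
    # Flag items chosen far less often than chance for expert review.
    needs_review = rate < review_ratio * chance
    return new_prob, needs_review


# Shown 100 times in lists of 5 (chance = 0.2).
print(update_membership(0.5, 100, 30, 5))  # chosen more often than chance
print(update_membership(0.5, 100, 1, 5))   # chosen far less often than chance
```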
P(A|B) = ( P(B|A) * P(A) ) / P(B)
We want to know the probability of A happening given B: if a person selects product B, what is the likelihood they will also like product A? The probability of B happening given A is trivial to compute. We know how many customers chose A and how many of those also chose B. P(A) is also trivial in that we know how many times A has been chosen in other contexts. We can also calculate P(B), but since this will be the same for every candidate A (A is what we care about), we can ignore this value when ranking candidates and focus on
P(A|B) ∝ P(B|A) * P(A)
There are many excellent sources out there that document the Naive Bayes algorithm, but many in the field rely heavily on a 2005 article from Dr Dobb's. The Wikipedia article is also a good place to start.
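The ranking rule above can be sketched directly from counts. This is a minimal illustration with a single piece of evidence B and hypothetical choice histories; it is not a full Naive Bayes classifier, which would multiply the likelihoods of several pieces of evidence together.

```python
def rank_candidates(histories, observed):
    """Rank products A by P(observed | A) * P(A), dropping the constant P(observed)."""
    n = len(histories)
    candidates = {item for h in histories for item in h} - {observed}
    scores = {}
    for a in candidates:
        with_a = [h for h in histories if a in h]
        p_a = len(with_a) / n  # P(A)
        p_b_given_a = sum(observed in h for h in with_a) / len(with_a)  # P(B|A)
        scores[a] = p_b_given_a * p_a
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical choice histories, one set of chosen products per customer.
histories = [{"B", "A1"}, {"B", "A1"}, {"B", "A2"}, {"A2"}, {"A2"}]
ranking = rank_candidates(histories, "B")
print(ranking)
```

Dividing each score by P(B) = 3/5 would recover the true posteriors (2/3 for A1 and 1/3 for A2), but the ordering is the same either way, which is why the denominator can be dropped when all we need is a ranking.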
At the end of this product and customer modeling, we will have a highly differentiated picture of any group of customers that interact with our software. The usefulness and value of this type of information is varied:
We have used this discussion to focus on simple products like hotels and restaurants, but it could easily be extended to things like planning steps. In other words, is there evidence that a certain type of trip requires different preparation? If you are a luxury traveler, you might require a suit to be packed in addition to your Bermudas. If you are an adventurer or hunter, you might require licenses and lotteries. We won't really know until we start seeing use cases.
It could also be used to predict whether people are interested in highly profitable room upgrades or hotel services like the spa or conference rooms.
It could also be used to predict which customers might be interested in expensive subscriptions that come with a concierge and/or a personal booking agent, or, alternatively, a specialty booking agent associated with certain types of trips (religious, cultural, adrenaline!).
We can also use this same technique to model not only the buying habits of customers, but also their likelihood to spend by combining frequency of travel with price point data of documented items.
Once the model is fully trained and we start having accurate customer classifications, we could easily extend the Bayesian model to use a simple geo-spatial model like k-means clustering to identify groups of customers that might be of interest to hotel chains, airlines, tourism boards, etc.
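A bare-bones version of such a clustering step is sketched below. It clusters customers on their category vectors only, which is an assumption; geographic coordinates could be appended as extra dimensions if location should influence the groups. Seeding the centroids with the first k points is a simplification that keeps the example deterministic, and the customer vectors are made up.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on equal-length tuples; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical customer vectors: (P(luxury), P(budget)).
customers = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
assignment, centroids = kmeans(customers, k=2)
print(assignment, centroids)
```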
Given a simple clustering extension like that discussed earlier, we could easily create specific versions of an app targeted to specific customers. This could be useful with the discussion below of cheap non-prime-time local ad buys or podcasts where we preload a virtual customer likely to watch a given television program or listen to a specific podcast. Since predictions are made based on categories and place, then this could also be customized to be locale-specific as well.
Furthermore, if our domain experts associate specific versions of the app with particular television segments, you could buy up much cheaper ad space in non-prime-time local markets. This, of course, would not be appropriate for an entire ad buy, but if you could refine the categorization of customers cheaply and correctly, then targeting similar customers with small targeted ad buys could greatly reduce the amount of money required to market to specific customers.
If the customer differentiation model identifies a specific audience that can be associated with a podcast then your ad buy becomes even cheaper and more targeted. Domain experts would be required to know which podcasts to sell to, but having a highly differentiated customer base means it would be worth it to figure out who are your targets and then to find them in a way that immediately draws them to your product.
We could easily extend this model to create two or more models per customer based upon different behavioral profiles like business or personal travel, alone or family, couple or family. The calculations are trivial given modern equipment so we are able to differentiate not infinitely but significantly and flexibly if new needs arise.
I would think this would be of great interest to city tourist boards who waste a significant amount of money with large non-specific appeals when they could offer more targeted marketing to the precise needs of potential customers.
It would be extremely cost prohibitive to produce a TV spot directed at specific demographics for each customer group, but if you had a generic spot, broadcast at a specific time, with a distinct activation keyword for the app that triggered the customization described above, then this could easily fit into an ad-buy budget, saving time and money over the short and long term.
TODO needs summary