Tue Jan 5 09:59:46 EST 2016
  1. We will describe some sociological methods (travel history/celebration and trip management) that will allow us to collect a lot more data about customers than would normally be possible in a traditional booking application.

  2. We will describe a unique method, complex in design but relatively easy to calculate and train, that classifies products (hotels, restaurants, locations) into preset but flexible categories using a Naive Bayes probability model. Our domain experts will define sets of attributes that can be assigned to products: things like luxury, budget, economy, adventure, cultural, relaxation, spiritual, etc. Each product can be associated with more than one classifier, of course.

  3. We will then describe a way to assign attributes to customers based on their interactions with products assigned classifiers in step 2.

  4. We will then describe a k-means clustering algorithm designed to identify customers who prefer certain combinations of products, which will allow us to:

  5. validate both the customer and product models
  6. better predict what choices will appeal to customers when they move to new locations

In short, we will describe a method that uses k-means clustering (or any spatial clustering algorithm, really) to identify customers who tend to buy products in particular categories.

The problem, as stated above, is that classifying individual customers into qualitatively meaningful and useful groups is difficult given the sparsity of data involved in travel. We have also noted several attempts to classify customers based on self-reported data, demographics, and clustered data. These techniques are too imprecise or require data that startups typically don't have. They also rely on very different product spaces: travel products are place-specific, which makes them inherently more disjointed and difficult to classify using traditional statistical inference. This section will qualitatively describe how we will solve this problem.

As we discussed above, the most promising of the previous solutions to this problem was statistical inference via probability models, but most startups do not have sufficient depth in their datasets for this to be useful. Our solution will attempt to solve two related problems: data depth and multiple product spaces.

The first problem is that the more data we have about choices customers have made, the more accurate our probability models will become. However, this becomes a chicken-and-egg problem: if you do not have enough data, you cannot make good predictions, and if you cannot make good predictions, you cannot get additional data to improve those predictions. We try to solve this problem by providing a set of tools that allow people to collect memories about past travel and manage current travel while never actually making any initial (and incorrect) suggestions. Once we have gathered enough data to build a solid customer model, then and only then will we attempt to make predictions and apply machine learning techniques to improve suggestions over time.

The second problem is that even when we have greater depth in our datasets, given the disjointed nature of travel product spaces, it would be very difficult to build probability models based on product-to-product comparisons. This is because in travel, each location is more or less its own product space. In order to predict what customers would like in a new product space, i.e. a new location, we have to normalize these disjoint product spaces into a single product space.

This is not an easy task and will require multiple layers of processing. To resolve the multiple product spaces into a single product space, we will use product-to-product comparisons not to make individual predictions but to assert similarity between products, so that we might create clusters of related products. Basically, a simple probability model would use aggregated data from any customers who have purchases across different markets to create clusters of related products. We will then provide feedback loops to assess and improve the quality of the clusters. Once we have a set of reliable and self-learning/self-improving clusters of products, we can use these as proxies for a continuous product space so we don't need to build a separate model for every location.
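
To make the cross-market aggregation concrete, here is a minimal sketch, assuming a hypothetical `purchases` mapping and an arbitrary co-purchase threshold; it only illustrates the similarity-assertion step, not a finished pipeline.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: customer -> list of (market, product) purchases.
purchases = {
    "cust1": [("paris", "hotel_a"), ("rome", "hotel_x")],
    "cust2": [("paris", "hotel_a"), ("rome", "hotel_x")],
    "cust3": [("paris", "hotel_b"), ("rome", "hotel_y")],
}

# Count how often two products from *different* markets were bought by
# the same customer; high counts assert similarity between products.
co_counts = defaultdict(int)
for items in purchases.values():
    for (m1, p1), (m2, p2) in combinations(items, 2):
        if m1 != m2:
            co_counts[frozenset((p1, p2))] += 1

# Pairs co-purchased by at least 2 customers become seed edges for a
# product-clustering pass (the threshold is an arbitrary illustration).
similar_pairs = [tuple(pair) for pair, n in co_counts.items() if n >= 2]
print(similar_pairs)  # e.g. [('hotel_a', 'hotel_x')]
```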


This is a data and subscription play, not a booking-fee play. The margins are too low on the booking side to matter at the beginning, and booking revenues will take off if and only if we can make qualitatively better recommendations than larger players.

Clustering with domain knowledge

In addition to the sociological aspects of the solution, our algorithm will combine two well-known techniques in computer science into a unique classification technique. The first is an extremely simple clustering algorithm known as k-means clustering. This algorithm uses distances in Euclidean space to separate different products into identifiable groups. It iteratively reassigns points and recomputes means until convergence is achieved, minimizing the distance between points within each of n clusters, where n is theoretically any positive integer but in practice is limited to a few well-defined but not a-priori categories.

The standard algorithm for k-means clustering is defined as the iterative minimization of the within-cluster sum of squared distances:

$$\underset{S}{\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

(source: Wikipedia)

There are well-defined and easy-to-calculate iterations: initialize with random means, then alternately partition the points and update the means by calculating the geometric centers of the resulting partitions.

The assignment step takes the last known means and partitions the points by minimizing the squared Euclidean distance to each mean:

$$S_i^{(t)} = \bigl\{ x_p : \lVert x_p - m_i^{(t)} \rVert^2 \le \lVert x_p - m_j^{(t)} \rVert^2 \;\forall j,\ 1 \le j \le k \bigr\}$$

(source: Wikipedia)

The update step then computes the mean for the next iteration as the geometric center (centroid) of the points assigned to each cluster:

$$m_i^{(t+1)} = \frac{1}{\lvert S_i^{(t)} \rvert} \sum_{x_j \in S_i^{(t)}} x_j$$

(source: Wikipedia)


While I include the standard arithmetic specifications of these algorithms, they are not unique and in fact not even typically coded by hand, as they are available in off-the-shelf statistical libraries. Furthermore, the design of this algorithm is black-boxed in such a way that any spatial clustering algorithm could be used if it provided more accurate results. In fact, one of the unstated innovations is that it would be quite simple to A/B test different clustering algorithms in the early days.
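
As a sketch of how little hand-coding this requires, here is an illustrative use of scikit-learn with a made-up feature matrix; the second clusterer shows how another spatial algorithm could be swapped in behind the same interface for an A/B test.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Made-up product features: rows = products, columns = e.g. price tier,
# historic character, service level.
X = np.array([[0.9, 0.8, 0.9],   # luxury-historic hotel
              [0.8, 0.9, 0.8],   # another luxury-historic hotel
              [0.1, 0.2, 0.3],   # budget hostel
              [0.2, 0.1, 0.2]])  # another budget option

# Off-the-shelf k-means: no hand-written iteration needed.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Because the clusterer is black-boxed, any spatial algorithm can be
# A/B-tested behind the same fit_predict interface.
alt_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels, alt_labels)
```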

Our innovation is in the application of additional techniques to the clusters themselves. One of the most difficult tasks in using clustering algorithms is to determine how many clusters to define in the first place. Through interactive software we will define a variable number of arbitrary clusters, and allow customers to choose the level of granularity they want and provide up- or down-votes as to whether the cluster itself is useful and why. Our domain experts will then use qualitative assessment to determine whether these clusters are representative of real-world entities and, as such, become a part of how we represent the world of travel to our customers.

The second innovation is that we will define multiple cluster sets of the data using different feature sets -- price, cultural properties, service levels, etc. Then, we will allow the user to imply the weights they apply to each feature set using an intuitive narrowing-of-choices user interface. We then combine these weights in a very simple arithmetic fashion to get results that are simple from a user-interface and computational point-of-view but sophisticated from a customer-differentiation point-of-view.
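
A minimal sketch of that arithmetic, with hard-coded weights standing in for what the narrowing-of-choices UI would infer; the scores and weights are illustrative assumptions.

```python
import numpy as np

# Hypothetical per-feature-set scores for three candidate products: how
# well each product's cluster matches the user's preferred cluster in
# that feature space (higher is better).
scores = {
    "price":   np.array([0.9, 0.4, 0.2]),
    "culture": np.array([0.3, 0.8, 0.5]),
    "service": np.array([0.6, 0.6, 0.9]),
}
weights = {"price": 0.5, "culture": 0.3, "service": 0.2}  # implied by UI

# The "very simple arithmetic": a weighted sum across feature sets.
combined = sum(w * scores[f] for f, w in weights.items())
ranking = np.argsort(combined)[::-1]  # best candidates first
print(np.round(combined, 2), ranking)
```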

The third innovation is to use shallow choices made by a user as an approximation of a vector of choices that might appeal to that user across categories. So, for instance, suppose we have determined that a person has in the past chosen a hotel that falls into the luxury-historic cluster, and we have also classified restaurants into luxury-historic categories. It is not obvious to us that the preference would always carry over, so it is an experimental question whether or not persons who like the hotels also like the restaurants. We will use simple statistical correlations via a Naive Bayes model to link customers who like items within one cluster to items in another cluster they might also like. We will begin with that assumption, but through user action or inaction we will be able to customize those assumptions for customers in general and tailor the experience to a particular customer as well. We will come to the latter in a moment.

If these calculations are built from aggregated data and connections between clusters, then how are we improving the choices shown to an individual?

First, take a vector of clusters of hotels and a vector of clusters of restaurants. For each item in the hotel vector, we calculate the probability that a person who chooses a product from that cluster will also like a product in each cluster in the restaurant vector. We show items from the highest-probability cluster, then the next highest, etc., and then track each individual's preferences amongst clusters over time. If they choose cluster A 90% of the time, then we go with A; if they choose A and B equally, we adjust accordingly; and we also feed this back into the overall probabilities for people who have not specified any preference at all (and refine those of people who have).
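
A sketch of this ranking-and-feedback loop with made-up counts; the cluster names, the counts, and the "surface the user's modal cluster" rule are illustrative assumptions, not the final design.

```python
from collections import Counter

# cooccur[h][r] = customers who chose a hotel in cluster h and a
# restaurant in cluster r (illustrative counts).
cooccur = {
    "lux_historic": Counter({"lux_historic": 80,
                             "casual_local": 15,
                             "street_food": 5}),
}

def restaurant_ranking(hotel_cluster):
    """Restaurant clusters ordered by P(restaurant cluster | hotel cluster)."""
    counts = cooccur[hotel_cluster]
    total = sum(counts.values())
    return sorted(((r, n / total) for r, n in counts.items()),
                  key=lambda kv: -kv[1])

print(restaurant_ranking("lux_historic"))

# Individual history overrides the aggregate: if this user picks one
# cluster most of the time, surface it first, and feed the observation
# back into the aggregate counts for everyone else.
user_history = Counter({"casual_local": 9, "lux_historic": 1})
personal_first = user_history.most_common(1)[0][0]
cooccur["lux_historic"][personal_first] += 1  # feedback loop
```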

Also, k-means was chosen for both technical and substantive reasons. It is easy to calculate and provides reasonable clusters if the data is well-behaved. It was also chosen because it is spatial. Given that it is spatial, if our user does not like the predicted cluster, and we are left with a large set of undifferentiated other clusters from our Bayesian model linking clusters, we can turn to geometric distance, i.e. clusters that are adjacent in space to our first predicted category, to provide a second level of approximation of user preference. Of course this may not work either, given the variety of human preference and the shallowness of travel data, but these innovations provide a much higher chance of predicting the right values.
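
A sketch of that spatial fallback, with illustrative centroids; in practice they would come from the fitted model (e.g. scikit-learn's KMeans.cluster_centers_).

```python
import numpy as np

# Illustrative cluster centroids in the product feature space.
centroids = {
    "lux_historic":  np.array([0.9, 0.8]),
    "lux_modern":    np.array([0.9, 0.2]),
    "budget_hostel": np.array([0.1, 0.1]),
}

def adjacent_clusters(rejected):
    """Other clusters ordered by centroid distance from the rejected one."""
    origin = centroids[rejected]
    others = ((name, float(np.linalg.norm(c - origin)))
              for name, c in centroids.items() if name != rejected)
    return sorted(others, key=lambda kv: kv[1])  # nearest first

# If the user rejects lux_historic, try the geometrically nearest
# cluster (lux_modern) before the distant budget_hostel.
print(adjacent_clusters("lux_historic"))
```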

In summary, we can move from aggregated data to individual data by modeling connections between vectors of clusters and, because we are using Bayesian predictors, constantly improving these initial predictors for new users while targeting the individual preferences of users whose preferences diverge from the population. In this way, individual choices constantly update the presumptions made for new users, new users can tweak the pre-ordained clusters to their preferences, and that feedback can better shape choices made by future users.

You might be wondering at this point why all this emphasis on multiple cluster sets and connections between clusters, and, of course, you would be right to do so. The first reason is that travel predictions are not book predictions. Book predictions are done in a single choice space. So if a person likes a particular book, a company like Amazon can use a single Bayesian model to say that if a person likes book A, they have a certain probability of liking book B, a slightly lower probability of liking book C, etc. This focus on individual products is not really possible in travel, since each location is a separate choice space. So what we really need to do is cluster items into as precise a type as possible, and then build a single choice space out of those clusters; at that point, we can build Bayesian models similar to what single-choice-space models can do.

We can, of course, combine these with other techniques described here: self-reported data, demographic data, etc. But in our model, while these are very useful at finding a starting point, they are just and only that: a starting point. This is a data-driven project, and, as such, having a good starting position is important, but once started it is far less relevant than the actual choices made by individuals and by all individuals aggregated into clusters.

The math used is also simple. It is a Naive Bayes classifier. Basically, if we take all the choices made by other people in a given choice space, we can calculate the conditional probability that a person who likes choice A will also like choice B, then choice C, ... and then choice i, given all the limited information we have about that person. We then make a prediction. Through user interaction we determine if that prediction was correct. Now we have information about that person, and, given that new information, we adjust all the other probabilities accordingly. You can read about the math here: https://en.wikipedia.org/wiki/Naive_Bayes_classifier
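
A minimal sketch of that loop using scikit-learn's BernoulliNB over made-up binary "liked an item in cluster X" features; the features, labels, and update flow are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Made-up training data: columns are binary "liked an item in cluster X"
# features; labels are customer categories (0 = budget, 1 = luxury).
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]])
y = np.array([1, 1, 0, 0])

model = BernoulliNB(alpha=1.0)  # Laplace smoothing for sparse data
model.partial_fit(X, y, classes=[0, 1])

# A new customer about whom we know almost nothing except one choice:
print(model.predict_proba([[1, 0, 0]]))

# When interaction confirms or contradicts the prediction, fold the new
# observation back in; the probabilities adjust accordingly.
model.partial_fit(np.array([[1, 0, 1]]), [1])
print(model.predict_proba([[1, 0, 0]]))
```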

Implications

Described here in terms of product and customer classification, these techniques will be useful in other domains, such as trip planning, where different kinds of travelers require different steps to suit them: for instance, reservations that must be made months ahead for haute cuisine restaurants, etc.

In short, we can use these product classifications to approximate the categories users fall into.

The problem of the disjointed product space will be solved in several stages. First, domain experts at souljourn will specify qualitatively meaningful categories; call these D. Examples of categories would be luxury, budget, adventure, cultural, high culture, pop culture, religious, and spiritual. These may also be grouped into sibling and/or hierarchical relationships, though they need not be.

We will then use simple Bayesian probability models to estimate how likely it is that specific travel products (e.g. locations, hotels, restaurants, events, sights, cultural institutions) fit into each category. Call the array of product probability vectors P.
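
A sketch of constructing P from hypothetical evidence counts (expert tags, behavioral signals), using Laplace smoothing so no category is ever assigned zero probability; the counts and categories are illustrative.

```python
import numpy as np

# Categories D, as defined by domain experts.
D = ["luxury", "budget", "adventure", "cultural", "spiritual"]

# counts[i][j] = pieces of evidence that product i fits category j
# (hypothetical numbers for two products).
counts = np.array([[8, 0, 1, 3, 0],    # a historic grand hotel
                   [0, 9, 2, 0, 0]])   # a backpacker hostel

alpha = 1.0  # Laplace smoothing: no category ever gets probability 0
smoothed = counts + alpha
P = smoothed / smoothed.sum(axis=1, keepdims=True)
print(np.round(P, 2))  # each row is a probability vector over D
```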

Any customer activity at all, including activity in the memories section, will immediately start being used to assign the probability that the customer falls into a given category in D. We will call this array of probability vectors C.

Once we have an initial assessment of product and customer prior probabilities, we will adjust these every time a new piece of data is added to the system.
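
A sketch of one way such an adjustment could work for a customer vector in C: treat the product's row of P as a likelihood and renormalize. The specific update rule is an assumption for illustration, not a settled choice.

```python
import numpy as np

def update_customer(c, product_p, eps=1e-9):
    """One Bayesian-style step: prior * likelihood per category, renormalized."""
    posterior = c * (product_p + eps)
    return posterior / posterior.sum()

c = np.full(5, 0.2)  # uninformative prior over the 5 categories in D
hotel_p = np.array([0.6, 0.05, 0.1, 0.2, 0.05])  # row of P for a saved hotel
c = update_customer(c, hotel_p)
print(np.round(c, 2))  # mass shifts toward luxury and cultural
```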

Once we have an initial guess of categories for products and customers (P and C), we can examine actual choices by customers using a clustering algorithm. The clustering algorithm will examine the categories of products bought by a customer. This will be used as quality control: that is, if the cluster analysis converges to the same categories as the probability analysis, then we can be pretty sure we have the product categories right, or at least right enough. Furthermore, we can use the cluster analysis to provide secondary classifications of previously uncategorized products. Call this secondary product classification Q. We will always prefer the predictions based on actual choices (P) but, barring this, being in a well-defined cluster with members of P is pretty good evidence that the members of Q belong to the same categories. We will not, however, use Q when calculating customer profiles, since this would curve-fit our data quality check, rendering it meaningless.
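
A sketch of the quality-control check with made-up cluster memberships: measure how strongly each behavioral cluster agrees with the probability model's categories, and let high-purity clusters lend their dominant category to uncategorized members (Q).

```python
from collections import Counter

# Illustrative behavioral clusters and the probability model's category
# call for each member.
cluster_members = {
    0: ["hotel_a", "hotel_b", "rest_c"],
    1: ["hostel_x", "rest_y"],
}
predicted_category = {
    "hotel_a": "luxury", "hotel_b": "luxury", "rest_c": "luxury",
    "hostel_x": "budget", "rest_y": "adventure",
}

for cid, members in cluster_members.items():
    cats = Counter(predicted_category[m] for m in members)
    top, n = cats.most_common(1)[0]
    purity = n / len(members)
    # High purity = cluster analysis converges with the probability
    # analysis; an uncategorized member of a high-purity cluster would
    # inherit `top` as its secondary classification (Q).
    print(f"cluster {cid}: dominant={top} purity={purity:.2f}")
```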
