Sunday, March 16, 2008

Tools for analytical consulting.

In a previous post I talked a little about the kind of analytical work I used to do in my previous life as a consultant at Cluster (then DiamondCluster, then Mercer, now OliverWyman). Most of this work was done using SAS, a statistical package I personally dislike.

The reasons for using SAS were mostly historical. It was used by one of our first clients in this area, and then a certain myth developed around the tool: it has been proven to work, so it has to be used. Consultants are a little risk-averse.

The whole concept is built around disk-based datasets: you read them, you sort them, you transpose them... and you always move from one dataset to another. It is actually not so different from a relational database, and for all my dislike, SAS provides data manipulation capabilities that are at the very least comparable to, and quite often better than, those of any database you may find, if harder to use.
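As a rough illustration of that dataset-to-dataset style (read, summarise, write a new dataset), here is the relational equivalent sketched with Python's built-in sqlite3 as a stand-in for a real database; the table and column names are made up for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE usage (customer TEXT, month TEXT, minutes REAL)")
con.executemany("INSERT INTO usage VALUES (?, ?, ?)",
                [("A", "jan", 100), ("A", "feb", 120), ("B", "jan", 80)])

# A SAS data step that sorts and summarises one dataset into another
# maps naturally onto a CREATE TABLE ... AS SELECT:
con.execute("""
    CREATE TABLE usage_by_customer AS
    SELECT customer, SUM(minutes) AS total_minutes
    FROM usage
    GROUP BY customer
    ORDER BY customer
""")
rows = con.execute("SELECT * FROM usage_by_customer").fetchall()
print(rows)  # [('A', 220.0), ('B', 80.0)]
```

The point is simply that each step produces a new table from old ones, which is exactly the mental model a SAS programmer already has.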

That explains why people use SAS and are quite happy with it, and they have sound reasons for that. SAS provides an unparalleled ability to perform all types of manipulations and statistical operations on very large datasets. In the right environment, that is, with the right people (like some of the ones I met in the US, who were extremely knowledgeable about statistics and had spent a lot of time programming in this environment), you have a winning combination.

But I still dislike the tool for three main reasons:
-I find it error-prone (the syntax is horrible), and I would not like to be judged on all the SAS code I have had to produce
-It is extremely programmer-unfriendly (did I mention that the syntax is horrible?). It is always really hard to debug code written by somebody else, especially by people not trained in basic programming practices, but SAS makes it even harder
-SAS is not for casual programmers. If you are hiring people to offer them a career in consultancy, they will get burnt if you force them into this. And finding good SAS programmers for contracting is not easy. To me this is reason enough to look for alternatives

So if I am to replace SAS I need to find something that provides similar flexibility in a more productive environment. And I have to say that so far I have not found it in a single package, so I am in the process of building it from what is available.

My current setup involves three different elements:
-A database, currently MySQL, though I will probably move to Postgres
-A statistical package, currently R. It is as good as SAS for my needs, and I like the fact that I can focus on the statistics and forget about the data part (done in the database)
-An external programming environment to act as the glue. I am currently using AppleScript and Perl (actually the Perl part has been done by one of my colleagues, Sergio). If I feel geeky enough I may even go to Objective-C on my Mac
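A minimal sketch of what the glue layer does: pull the dataset the statistician needs out of the database, dump it to a flat file, and hand it to R for the actual statistics. Everything here is hypothetical (file names, tables, the R script), sqlite3 stands in for MySQL, and I use Python rather than Perl/AppleScript purely for illustration:

```python
import csv
import io
import sqlite3

# Hypothetical data; sqlite3 stands in for the MySQL server.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE calls (handset TEXT, revenue REAL)")
con.executemany("INSERT INTO calls VALUES (?, ?)",
                [("N95", 42.0), ("E65", 17.5)])

# Step 1: the glue script extracts the dataset for the analysis...
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["handset", "revenue"])
writer.writerows(con.execute("SELECT handset, revenue FROM calls"))
csv_text = buf.getvalue()

# Step 2: ...and hands it over to R for the statistics, e.g. via
#   Rscript analyse.R calls.csv          (not executed here)
cmd = ["Rscript", "analyse.R", "calls.csv"]
print(csv_text)
```

The design point is the separation of concerns: the database does the heavy data manipulation, R only ever sees a clean extract.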

My bet is that it will be easier to find people who know how to deal with SQL databases (a commodity nowadays) and a language like Perl. R is also quite popular. I also like the fact that all of them are publicly available software, giving me a broad pool of talent to tap (and access to the sources if required). And I love the fact that all of this runs on my Mac, a much nicer environment to work in. I know people think Apple machines are mostly for "creative" guys who do media production, but they can quite easily be used as scientific workstations. This is much better than doing the processing in MS Access, as I have had to a few times.

Of course my laptop is not the best environment for running a database server holding several gigabytes, but this is easily scalable.

I will take a look at how the different elements are doing in a later post.

Friday, March 14, 2008

Springtime in Europe

I have spent the last few weeks helping my partners in Europe with a project for one of the largest mobile operators, where we are looking at how to improve the handset strategy.

This is another striking difference between Europe and the ME in how companies operate. In the ME people are used to paying the full price for their mobile phones; in Europe handsets were subsidized almost from the beginning. That means that operators make a substantial investment (up to 100 euros) upfront to capture a new customer and expect to recover the cost of the handset over the following months. The amounts involved have grown huge; just do the maths: 100 euros per subscriber times 10 million handsets per year (or 60 million if you look at the whole operation) and you are quickly talking serious money.
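Spelling out that back-of-the-envelope calculation (these are the rough round figures above, not actual client numbers):

```python
subsidy_eur = 100                  # upfront subsidy per handset, rough figure
handsets_per_year = 10_000_000     # one market
group_handsets_per_year = 60_000_000  # the whole operation

# One billion euros a year in one market, six across the group.
print(subsidy_eur * handsets_per_year)        # 1000000000
print(subsidy_eur * group_handsets_per_year)  # 6000000000
```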

As operators face increasing pressure on voice tariffs, they are starting to look more and more into content-related services to provide extra revenue. New services (games, music, video...) require handsets with more advanced capabilities that are, not surprisingly, more expensive. Mobile operators need to understand how these new features translate into additional revenue through the consumption of these content-rich services.

One way of doing this would be to ask people what they want and why, and then tailor the offer to meet their needs. The problem with this approach is that people are not good at forecasting their own consumption (that is why we spend so much effort designing pricing plans that give the perception of low tariffs while protecting the operator's revenue).

The way we prefer is to look at the actual usage patterns. We have extracted quite a large sample of historical usage data and crossed it with the information about the handset used. This involves processing a large volume of data in relatively sophisticated ways (at least for what consultants usually do).
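At its core, that crossing step is a join between the usage sample and a handset reference table, followed by an aggregation. A toy version (all table and column names invented, sqlite3 again standing in for the real database):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE usage (msisdn TEXT, data_mb REAL);
    CREATE TABLE handsets (msisdn TEXT, model TEXT, has_3g INTEGER);
    INSERT INTO usage VALUES ('0600001', 150.0), ('0600002', 5.0);
    INSERT INTO handsets VALUES ('0600001', 'N95', 1), ('0600002', 'E65', 0);
""")

# Average data usage by handset capability: join usage to the handset
# attributes, then aggregate over the feature of interest.
rows = con.execute("""
    SELECT h.has_3g, AVG(u.data_mb) AS avg_mb
    FROM usage u
    JOIN handsets h ON u.msisdn = h.msisdn
    GROUP BY h.has_3g
    ORDER BY h.has_3g
""").fetchall()
print(rows)  # [(0, 5.0), (1, 150.0)]
```

The real exercise works the same way, only on millions of rows and with many more handset attributes than a single capability flag.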

In the past we used SAS for this, but I have never been happy about that: the syntax is horrible, the licensing process is totally unfriendly and qualified programmers are seriously scarce. So this time I decided to give some alternative software a go. So far we are using MySQL and R, running of course on my Mac.

I will post a more detailed assessment once the project is over but so far I am quite happy.