Sunday, 5 July 2009

The future of BI? It has nothing to do with business…

I've been reading and re-reading Stuart Sutherland's excellent book Irrationality for several weeks (review to come - promise). One of the things he talks about is "making the wrong connections". His point is that humans are poor at mentally evaluating evidence and making connections. We focus on the elements that are unusual or different and we massively overvalue our initial guesses.

That really resonates with me. After all, that's what Business Intelligence is about, right? We provide factual, numeric, and clean data in a format that allows the user to make reasonable, rational decisions. We lambast the BI nay-sayers who operate on "gut instinct" and rightly so. But we leave that hyper-rational approach at the office door and conduct the rest of our lives in our normal irrational way.

In truth we conduct 95% of our working lives that way as well. The minute-to-minute stuff that business is *really* made of is unrecorded, unanalysed and (of course) irrational. All those conversations, relationships, emails, phone calls and meaningful looks are dealt with by instinct.

Outside the office we're seeing an explosion in personal monitoring and self-surveillance. Devices like the iPhone can track every interaction, accessories like Nike+ allow us to track every step we take, software like RescueTime continuously monitors our computer usage. Even Facebook is a way to monitor your relationships, something that seemed completely intangible a few years ago. Etc, etc, etc.

This is the future of BI: Rational Augmentation. Using tracking data to make faster, better and more rational decisions about everything in our lives. It's about dealing with huge volumes of hyper-personal data and finding the patterns that matter. It lives outside the office and outside the corporation. It's a dash of text-mining, a pinch of regression, a dollop of aggregation, a spoonful of advanced analytics and a heap of basic statistics.

Many people will feel uncomfortable about this but the young will adopt it without question and those who adopt it will do better. Let's face it, it's a sub-optimal world out there and an edge in rationality could be a very big edge indeed.


As a final thought, this has the makings of a classic innovator's dilemma for the current BI players. Rational Augmentation (I'm loving this phrase but call it what you like…) is going to need to deal with large data volumes very cheaply and very locally. It will probably be service based. It will probably be free for at least some users. But ultimately it will be a huge market, dwarfing the current BI market. The current players may have the skills to take this on but they've been swallowed by the corporate quicksand and they will sit and watch it pass them by. C'est la vie.

Wednesday, 20 May 2009

How to Fix the Newspaper Industry - everybody else is doing it…

NOTE: Don't expect me to be doing multiple posts per day. I don't know what's come over me!

Everyone seems to agree that newspapers are dead. Even here in the UK they're not doing great, although our papers seem to 'get' the web a lot more. One of the things that I hear quite a bit from the pundits is that they should make the physical paper free as well as the online version.

I was just reading a post on Tim Ferriss' blog about Alan Webber and his "RULE #24 - If you want to change the game, change the economics of how the game is played." In it he mentions the free paper theory.

This triggered a thought for me that giving the paper away is nowhere near a bold enough strategy. The problem with the paper is not that it costs too much (except on Sundays - £2! who are they kidding?). For a lot of people, especially the core newspaper market, the cost is not an issue. The issue is having to go get the damn thing, cart it around all day and then filter through the ads just to find a few interesting tidbits.

So here is my "fix": force people to take the paper. Stick it through *everyone's* mailbox every single day. Become *the* alternative delivery provider. I haven't bought a paper in ages but I can *guarantee* that if it came through my door I would look at it.

In the UK (and most of Europe) we have fairly strong opt-out regulation against so-called junk mail. However there is a huge loophole called the "door drop". Marketers are still allowed to put whatever they want through all of the doors in a given area. This allows a lot of room for targeting. Millionaires all live in the same neighborhood, right? There is a big business around this. When I was involved (~2yrs ago) it cost about £0.05 per door. Now I get 3 or 4 drops a week, about 20 pieces in total. Hmm… that sounds like £1 of revenue per house per week, minus delivery costs. Seems workable.
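To make that back-of-the-envelope sum explicit, here is a minimal sketch. The £0.05 per piece and the 20 pieces a week are the figures quoted above; the delivery cost is a made-up placeholder, since I don't have a real number for it. Amounts are kept in whole pence to avoid rounding surprises.

```python
# Figures quoted in the post.
PENCE_PER_PIECE = 5    # "about £0.05 per door"
PIECES_PER_WEEK = 20   # "about 20 pieces in total"


def weekly_revenue_pence(pieces_per_week, pence_per_piece):
    """Door-drop revenue per house per week, in pence."""
    return pieces_per_week * pence_per_piece


def weekly_margin_pence(pieces_per_week, pence_per_piece, delivery_cost_pence):
    """Revenue minus delivery cost per house per week, in pence."""
    revenue = weekly_revenue_pence(pieces_per_week, pence_per_piece)
    return revenue - delivery_cost_pence


revenue = weekly_revenue_pence(PIECES_PER_WEEK, PENCE_PER_PIECE)
print(f"Revenue per house per week: £{revenue / 100:.2f}")

# HYPOTHETICAL delivery cost, purely for illustration.
assumed_delivery_cost_pence = 60
margin = weekly_margin_pence(PIECES_PER_WEEK, PENCE_PER_PIECE, assumed_delivery_cost_pence)
print(f"Margin per house per week: £{margin / 100:.2f}")
```

The whole idea hinges on that delivery cost number, which is exactly why partnering with someone who already walks the route matters.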

Now you wouldn't want to push your paper on literally everyone. You would target the exact slice of the population that already reads you. Plus your economics are now much more predictable. You know exactly how many papers to print and you can streamline your distribution arm. In fact you'd want to buy or partner with someone like DHL or TNT who are already doing alternative deliveries. You also need to get your deliveries done *very* early to catch the commuters.

This is a winner-takes-all play. There is only room for a handful of players in a market like this. Once they have your paper in their hands why would they buy a competing paper? If you get it right it should pay back in spades.

I don't really see anyone brave enough to make the switch right now. But they'll get more adventurous (desperate) as time goes on and profits dwindle.

Perhaps TNT should think about buying a newspaper group to beef up the delivery pipeline…

How To Fix Twitter - it came to me in the shower…

UPDATE: One Sentence Summary - It's possible to know in advance who will need to receive messages and therefore to structure the Twitter application and tweet data in a way that makes it much faster to deliver them.

So I got to thinking about Twitter and the ongoing problems they have keeping the service up and running smoothly. This line of thought was triggered by Twitter removing the ability to see all @ replies. This follows a long history of removing features to "streamline" the service (Google it if you care).

It's worth remembering that Twitter started out as a 'plain vanilla' Ruby On Rails app. Which is great, 'cuz RoR is great. But it means that Twitter was conceived as a database-backed, single-instance app. There are tons of articles out there about the architecture you need to scale such an app. Some of them were written by Twitter people who have since been ejected. (Again, Google it if you care).

The other thing to remember is that Twitter are only keeping a few weeks of tweets online (6-8 at last reporting). This may be a practical measure but it's also insane! There is huge value in all those old Tweets. I suspect they are doing this to limit the size of their databases. Which is a clue that they are still using a database (or probably a number of sharded databases) as the back-end.

Here's the thing though: Twitter is not a database app. It's a messaging platform. This is not an insight but it is important. We (the IT industry) know how to run messaging platforms at scale. We know how to run huge email services. We know how to run huge IM platforms. We know how to run huge IRC instances.

Of course Twitter is not exactly like any of those things. It's an asynchronous, asymmetric, instant micro-message stream. It's asynchronous because messages are simply pushed out (like email). It's asymmetric because there is no way to guarantee or confirm receipt (like IRC). But it's the instant streaming aspect that is key. That's what makes the experience unique.

My "fix" is based on the following observation: Twitter usage forms naturally into cliques. My wife tried out Twitter and found it boring. She didn't find a tribe that she connected with. I, on the other hand, love it because I can talk trash about Bikes, Business Intelligence and Data Warehousing all day long. What could be better?

Here's the architecture:
  • Load all of the data into a huge data warehouse (MPP of course!).
  • Cluster users into their natural cliques using data mining algorithms.
  • The cliques I follow might be:
    • BI-DW (~2,000)
    • UK Mountain Biking (~1,000)
    • Web 2.0 (~5,000)
    • Twitter Celebs (~1,000)
    • Of course cliques wouldn't really have names…
  • The backend database only contains users info, not tweets.
    • Following, Followers, Clique memberships, Bloom filter of following, etc.
  • Tweets are stored in "clique streams": all tweets for a clique in reverse chronological order.
    • New tweets are added to the top/front of the stream.
    • Tweets can exist in multiple streams as required.
    • Streams have a maximum message age.
  • To provide an update the system only has to filter a small number of streams.
    • This has got to be a 1000x reduction. (60m users to 60k possibles)
  • The system stores a Bloom filter of people a user follows as the first filter for streams.
    • Probably another 10x reduction, removes bulk of non-following clique messages.
  • The detailed filter should now be running over a very small dataset.
  • Final step is to combine the filtered streams and remove any duplicates.
  • It should go without saying that all tweets are added to the data warehouse in real time. ;-)
  • This also answers the question of how Twitter can make money: sell access to the data in that killer data warehouse.
{I have refrained from naming any specific technologies or products in the post because that's not really what it's about. Very restrained of me, don't you think?

I also haven't talked about DMs, mentions, etc. because I think that they can easily fit in this architecture and this post doesn't need to be any longer.}
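For the curious, here is a rough sketch of the read path described in the list above: clique streams held newest first, a Bloom filter as the cheap first pass, an exact check as the detailed filter, then a merge with duplicates removed. This is an illustration only, not how Twitter actually works. Every name in it (`Tweet`, `BloomFilter`, `CliqueStream`, `timeline`) is invented for the example, the clique clustering and the data warehouse are assumed to exist elsewhere, and the streams assume tweets arrive in timestamp order.

```python
import hashlib
import heapq
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author: str
    timestamp: int  # seconds; larger means newer
    text: str


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class CliqueStream:
    """All tweets for one clique, newest first, with a maximum message age."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.tweets = deque()

    def add(self, tweet):
        self.tweets.appendleft(tweet)  # new tweets go on the front
        # Expire anything older than max_age relative to the newest tweet.
        while tweet.timestamp - self.tweets[-1].timestamp > self.max_age:
            self.tweets.pop()


def timeline(following, bloom, streams, limit=20):
    """Merge a user's clique streams into one de-duplicated update."""
    merged = heapq.merge(
        *(stream.tweets for stream in streams),
        key=lambda t: t.timestamp,
        reverse=True,  # every stream is already newest first
    )
    seen, result = set(), []
    for tweet in merged:
        if tweet.author not in bloom:      # cheap first filter
            continue
        if tweet.author not in following:  # detailed filter catches false positives
            continue
        if tweet.tweet_id in seen:         # same tweet can sit in several streams
            continue
        seen.add(tweet.tweet_id)
        result.append(tweet)
        if len(result) == limit:
            break
    return result


# Usage: two cliques, one tweet shared between them.
bi_dw, mtb = CliqueStream(max_age=3600), CliqueStream(max_age=3600)
t1 = Tweet(1, "alice", 100, "MPP of course!")
t2 = Tweet(2, "bob", 200, "someone I don't follow")
t3 = Tweet(3, "carol", 300, "new trail this weekend")
for t in (t1, t2):
    bi_dw.add(t)
for t in (t1, t3):
    mtb.add(t)

following = {"alice", "carol"}
bloom = BloomFilter()
for name in following:
    bloom.add(name)

print([t.tweet_id for t in timeline(following, bloom, [bi_dw, mtb])])  # → [3, 1]
```

The Bloom filter only ever errs towards letting a tweet through, so the exact set lookup behind it keeps the result correct while still skipping most of the non-following traffic cheaply.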

UPDATE 2: This approach also makes it a lot easier to spot spam accounts. Someone may *actually* want to follow 4,000 people but they will only be in a few cliques. A spam account would be following too many different cliques.
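A minimal sketch of that spam check, assuming the clustering step has already produced a user-to-clique lookup. The `clique_of` mapping, the single clique per user and the `max_cliques` threshold are all simplifications I've made up for the example.

```python
def cliques_followed(following, clique_of):
    """Distinct cliques among the accounts a user follows."""
    return {clique_of[user] for user in following if user in clique_of}


def looks_like_spam(following, clique_of, max_cliques=10):
    """Flag accounts whose follows are spread across too many cliques.

    max_cliques is a HYPOTHETICAL threshold; a real one would be tuned
    against the data.
    """
    return len(cliques_followed(following, clique_of)) > max_cliques


# 4,000 accounts spread evenly over 400 cliques (made-up data).
clique_of = {f"user{i}": i % 400 for i in range(4000)}

# A keen but genuine user: 4,000 follows, all inside three cliques.
genuine = {f"user{i}" for i in range(4000) if i % 400 < 3}
genuine |= {f"extra{i}" for i in range(3970)}
clique_of.update({name: 0 for name in genuine if name.startswith("extra")})

# A spam account: 4,000 follows sprayed across every clique.
spammer = {f"user{i}" for i in range(4000)}

print(len(genuine), looks_like_spam(genuine, clique_of))
print(len(spammer), looks_like_spam(spammer, clique_of))
```

Both accounts follow the same number of people; only the spread across cliques tells them apart.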


Wednesday, 29 April 2009

Let the macro-blogging begin...

I'm setting this blog up as a place to put thoughts that don't fit into Twitter's 140-character limit.

I've made a couple of abortive blog starts in the past so… no promises! I'll also be putting up some essays that I've written in the past, probably reworked to save embarrassment.

Friday, 29 February 2008

Data can never be perfect...

[This was originally posted on an old blog: Data can never be perfect... ]

This is a re-post of a comment I made on Doug Henschen's article about data governance and the subprime crisis.

Doug's position (quoting from a lot of data governance vendors) is: "A first step toward avoiding such calamities... is an integrated, overarching data governance program that addresses data security, data privacy and data quality so that risks can be better understood and outcomes anticipated."

Basically, if the banks had better data they would have made better decisions and not got themselves into this mess.

The problem is not a lack of governance but an unshakeable belief in the data and risk models. Interested readers should look at "Did Black-Scholes Cause the Housing Bubble?" in Portfolio.

I'm a data guy, but every executive needs to understand that data is merely a map and the map is not the territory. If an explorer has a map that does not match the territory they can see, they would do well to question the map, rather than ask the territory to change.

The credit score is simply another map. There is evidence that credit scores were significantly weakened by new financial products over the last 7 years. Again, see "Credit Scores: Not-So-Magic Numbers" for details.

Data quality, data governance, etc. are all **super** important. However, as data professionals we need to build systems that incorporate common sense, human-based checks and balances. Trusting too much in software will eventually get you fired or indicted for criminal negligence.

Friday, 18 January 2008

Social Networking: The new Rock n' Roll

[ This was originally posted on an old blog: Social Networking: The new Rock n' Roll ]

In the 1950s Rock n' Roll swept the (western) world and created a youth culture impact that still reverberates today. It also created a complete break between the "kids" who loved it and the older generation who just didn't get it. Some said Rock n' Roll was undermining the moral fiber of the nation, some just thought it was a bunch of noise. The point is that music and culture were fundamentally changed by Rock n' Roll and everyone over a certain age was simply left behind. All of their objections and concerns simply became irrelevant. The kids just didn't care and they quickly found that they could define the world on their own terms.

Social Networking is the new Rock n' Roll: it creates a complete break between generations.

In our era Social Networking (Facebook, MySpace, Bebo, et al) is going to generate another break. Our culture will be fundamentally changed and a new generation will re-define how the world works. You don't have to look far (or listen long) to hear complaints about lost productivity in the workplace, moral decline, "infantile" behaviour and kids not understanding the "real world". Well, once again, the kids just don't care. They grew up with IM and email. Everyone they know or care to know is online all the time and they expect to be there too.

I suppose that some of you who are 30+ (like me) will be thinking "I get it, I'm there". I congratulate you for being so hip, but the truth is that you don't/can't really get it. No matter how much you try to engage with this new paradigm it will always be an effort. Some of your friends just won't participate. Those that do won't be totally open, or totally engaged. You may feel like you're joining in but it will never be the same.

We're like old jazz fans who appreciate Rock n' Roll for its roots in Jazz but really we long for something more nuanced and sophisticated. Roll on you cool cats.

Blog Interrupted: It's been a while...

[This was originally posted on an old blog: Blog Interrupted: It's been a while... ]

Well it's been almost a year and a half since my last post.  You would have been right to assume that this blog was dead but, just like Lazarus, it's alive again. Resurrected and better than ever.

I've been out of the Business Intelligence / Data Warehouse arena working in Database Marketing. It's definitely given me an extra perspective on data warehousing and the value of data for its own sake. The first lesson was that it's not polite to laugh at people when they call a million row table "big" ;-). The second lesson was that sub-queries in SQL Server do not work and will never return.

Anyway, I'm back in the industry working as a Data Warehouse Designer / Architect so I'll be using this blog as a place to crystallise my thoughts on how data warehousing should be done. I also love to research new products in this space so I'll be putting company and product summaries here as well with links to loads of useful resources.

Enjoy.
