A Critique of Pure Data: Part 2

Please see Part 1 here.

Enter Big Data

In the June 2013 issue of Foreign Affairs (“The Rise of Big Data”), Kenneth Cukier and Viktor Mayer-Schoenberger describe the phenomenon as more than larger sets of data. It is also the digitization of information previously stored in non-digital formats, and the capture of data, such as location and personal connections, that was never available before.

They describe three profound changes in how we approach data.

  1. We collect complete sets of data, rather than samples that must be interpreted with traditional techniques of statistics.
  2. We are trading our preference for curated, high-quality data sets for variable, messy ones whose benefits outweigh the costs of forgoing curation.
  3. We tolerate correlation in the absence of causation. In other words, we accept the likelihood of what will happen without knowing why it will happen.

Big data has demonstrated significant gains, and a notable one is language translation. Formal models of language never progressed to a usable point, despite decades of effort. In the 1990s IBM broke through using statistical translation based on a French-English dictionary gleaned from high-quality Canadian parliamentary transcripts. Then progress stalled until Google applied massive memory and processing power to much larger and messier data sets measuring in the billions of words. Machine translation is now much more accurate and covers 65 languages, which the system can detect automatically when most humans could not.

Another notable success was the 2011 victory of IBM’s Watson over former champions in the quiz show Jeopardy!. Like Google Translate, the victory was based primarily on the statistical analysis of 200 million pages of structured and unstructured content, not on a model of the human brain. Watson falls short of passing a true Turing Test, but it is significant nonetheless.

The loss of causality is not, by definition, a loss of useful information. UPS uses sensors to diagnose likely engine failures without understanding the cause of failure, reducing time spent on the roadside. Medical researchers in Canada have correlated small changes in large data streams of vital statistics to serious health problems, without understanding why those changes occur.
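
The UPS and hospital examples boil down to correlation hunting: find a signal that moves with failures, and act on it without a causal model. A minimal sketch in pure Python (the sensor readings and the temperature-failure relationship are hypothetical, invented for illustration):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical sensor readings: engine temperature vs. the number of
# days until that engine actually failed.
temperature = [90, 95, 102, 110, 118, 125]
days_to_failure = [60, 48, 35, 22, 11, 3]
r = pearson(temperature, days_to_failure)   # strongly negative
```

Here a strong negative correlation alone justifies scheduling maintenance for hot-running engines, even though nothing in the calculation says why heat and failure travel together.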

Given these successes, and the presence of influential political movements that attempt to discredit the validity of scientific models in areas such as evolutionary biology and climate science, it is tempting to announce the death of models. Indeed, many pundits have lately written obituaries for causation.

I believe these proclamations are premature. For starters, models in the form of data structures and algorithms are the backbone of big data. The rise of big data derives not only from the increased availability of processing power, memory, and storage, but also from the algorithms that use these resources more efficiently and enable new methods of identifying correlations. Some of these techniques are implicit, such as the rise of NoSQL databases that eliminate structured data tables and table joins. Others are innovative ways to find patterns in the data. Regardless, understanding which algorithms to apply to which data sets requires understanding them as abstract models of reality.

As practitioners discover correlations that were never known before, researchers will ask more and better questions about why those correlations exist. We won’t get away from the why entirely, in part because the new correlations will be so intriguing that the causation behind them will become more important. Researchers will not only ask better questions; they will also have new computational techniques and larger data sets with which to establish the validity of new models. In other words, the same advances that enable big data will enable the generation of new models, albeit with a time lag.

Moreover, as we press for more answers from the large data sets, we will find it increasingly harder to establish correlations. Analysts will solve this in part by finding new sets of data, and there will always be more data generated. However, much of the data will be redundant with existing data sets, or of poorer quality. As the correlations become more ambiguous, analysts will have to work harder to ask why. Analysts will inevitably have to establish causation in order to improve the quality of their predictions.

Please note that I don’t discount the successes of big data. This is one of the most important developments in the industry. Instead I conclude the availability of new data sources and means to process them does not mean the death of modeling. It is leading instead to a great renaissance of model creation that advances hand-in-hand with big data.

A Critique of Pure Data: Part 1

Rationalism was a European philosophy popular in the 18th and 19th centuries that emphasized discovering knowledge through the use of pure reason, independent of experience. It rejected the assertion of Empiricism that no knowledge can be deduced a priori. At the center of the dispute was cause and effect–whether effects could ever be determined from causes, whether causes could ever be deduced from effects, or whether both had to be learned through experimentation. Kant, writing in the Rationalist tradition, observed that both positions are necessary to understanding.

Modern science descended from Empiricism but, like Kant, is pragmatic, neither accepting nor rejecting either position entirely. Scientists observe nature, deduce models, make predictions using the models, and test the predictions against observations. They describe the assumptions and limits of the models, and refine the models to adapt to new observations.

The old quip says all models are wrong, but some are useful. Scientific models are useful only to the extent they are demonstrated to be useful. At their simplest, they are abstract representations of the real world that are simpler and easier to comprehend than the complex phenomena they attempt to explain. They can be intuited from pure thought, or induced from observation. The benefit of models is their simplicity–they are easier to manipulate and analyze than their real-world counterparts.

Models are useful in some situations and not useful in others. Good models are fertile, meaning they apply to several fields of study beyond those originally envisioned. For example, agent models have demonstrated how cities segregate despite widespread tolerance of variation. Colonel Blotto outcomes can be applied to electoral college politics, sports, legal strategies, and screening of candidates.
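
The segregation result comes from Schelling-style agent models. A minimal one-dimensional sketch (the grid size, tolerance rule, and move rule are my own simplifications, not Schelling's original parameters):

```python
import random

random.seed(7)

# A 1-D Schelling-style sketch: two agent types on a line with a few
# empty cells. An agent is unhappy only when *no* immediate neighbor
# shares its type -- a very tolerant rule -- yet clusters still form.
N = 40
cells = ["A", "B"] * 18 + [None] * 4   # perfectly mixed start

def happy(i):
    nbrs = [cells[j] for j in (i - 1, i + 1) if 0 <= j < N and cells[j]]
    return not nbrs or cells[i] in nbrs

def same_type_fraction():
    # fraction of adjacent occupied pairs that share a type
    pairs = [(cells[i], cells[i + 1]) for i in range(N - 1)
             if cells[i] and cells[i + 1]]
    return sum(a == b for a, b in pairs) / len(pairs)

before = same_type_fraction()          # 0.0 for the alternating start
for _ in range(5000):
    i = random.randrange(N)
    if cells[i] and not happy(i):      # unhappy agents move at random
        j = random.choice([k for k in range(N) if cells[k] is None])
        cells[j], cells[i] = cells[i], None
after = same_type_fraction()
```

Under this rule a same-type pair, once formed, never breaks up (both members are happy and happy agents never move), so clustering only accumulates even though every agent tolerates being in the minority.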

To be useful, models are predictive, meaning they can infer effects from causes. For example, a model can predict that a given force (i.e. a rocket) applied to an object of a given mass (i.e. a payload) will cause a given amount of acceleration, which causes an increase in velocity over time. Models predict that clocks on orbiting Earth satellites run slightly faster than those on the surface, a result of the gravitational time dilation predicted by general relativity. Models may be useful in one domain but not appropriate for another. Users have to be aware of their capabilities and limitations.
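
The rocket example is just Newton's second law plus constant acceleration; a quick sketch (the thrust and mass figures are illustrative, not any particular vehicle's):

```python
def acceleration(force_newtons, mass_kg):
    """Newton's second law: a = F / m."""
    return force_newtons / mass_kg

def velocity(v0, force_newtons, mass_kg, seconds):
    """Velocity after applying a constant force for a given time."""
    return v0 + acceleration(force_newtons, mass_kg) * seconds

# Illustrative numbers: 7,600 kN of thrust on a 550,000 kg vehicle
a = acceleration(7_600_000, 550_000)       # roughly 13.8 m/s^2
v = velocity(0, 7_600_000, 550_000, 10)    # velocity after 10 seconds
```

The point is not the arithmetic but the direction of inference: given the causes (force, mass), the model yields the effect.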

Models give us the ability to distinguish causation from correlation. We may correlate schools running equestrian programs with higher academic performance, but we would be unwise to accept causation. We would have to create a model to show how aspects of equestrian activities improve cognitive development, and to discount the relevance of other models that may show causation from other factors. We would then search out data that can confirm or deny the effects of equestrian activity on cognition. (It is more likely there are other causal factors acting on both equestrian programs and academic performance.) Whether or not models can show causal connections to all world phenomena, they can guide us to better questions.
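
The parenthetical point, that a hidden factor can drive both variables, is easy to demonstrate by simulation. In this sketch a latent "resources" factor (entirely made up, along with all the coefficients) determines both whether a school runs an equestrian program and its test scores; the two correlate even though neither causes the other:

```python
import random

random.seed(42)

# Hypothetical data: a latent "resources" factor drives both whether a
# school runs an equestrian program and its average test score.
# Neither variable causes the other.
schools = []
for _ in range(1000):
    resources = random.gauss(0, 1)
    has_program = resources + random.gauss(0, 0.5) > 1.0
    score = 70 + 5 * resources + random.gauss(0, 2)
    schools.append((has_program, score))

def mean(xs):
    return sum(xs) / len(xs)

with_program = mean([s for p, s in schools if p])
without_program = mean([s for p, s in schools if not p])
gap = with_program - without_program   # positive, with no causal link
```

The score gap between the two groups is large and reliable, yet cutting the equestrian program would do nothing to test scores, because the simulation contains no causal path between them.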

For this discussion we are interested in computation, and that means Alan Turing, who in 1936 devised the Universal Turing Machine (UTM), a simple model for a computer. Turing showed the UTM can be used to compute any computable sequence. At the time this conclusion was astonishing. The benefit of the UTM lay not in its practicality–it is not a practical device–but in the simplicity of the model. To prove a problem is computable, you need only demonstrate a program for it on the UTM. Separately, Turing also gave us the Turing Test, an approximate model of intelligence.
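
To make the UTM concrete, here is a tiny simulator, a sketch rather than Turing's original formalism. The sample machine increments a binary number: it flips trailing 1s to 0 while moving left, then writes the final carry:

```python
def run_tm(tape, transitions, state, head):
    """Run a Turing machine until it reaches the 'halt' state."""
    tape = dict(enumerate(tape))          # sparse tape; '_' is blank
    while state != "halt":
        symbol = tape.get(head, "_")
        write, move, state = transitions[(state, symbol)]
        tape[head] = write
        head += {"L": -1, "R": 1, "N": 0}[move]
    lo, hi = min(tape), max(tape)
    return "".join(tape.get(i, "_") for i in range(lo, hi + 1))

INCREMENT = {
    ("inc", "1"): ("0", "L", "inc"),      # carry propagates left
    ("inc", "0"): ("1", "N", "halt"),     # absorb the carry
    ("inc", "_"): ("1", "N", "halt"),     # overflow into a new digit
}

result = run_tm("1011", INCREMENT, "inc", head=3)  # 11 + 1 -> "1100"
```

Likewise `run_tm("111", INCREMENT, "inc", head=2)` yields `"1000"` (7 + 1 = 8). Exhibiting a program like this is exactly what it means to demonstrate that a problem is computable.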

Those who use models to make predictions have been demonstrated to be more accurate than experts or non-experts relying on intuition. This last point is the most important, and it is the main reason we develop and use models.

The IT Service Management industry lacks academic rigor because it has never been modeled. Most academic research focuses on largely vain attempts to measure satisfaction and financial returns. Lacking a model, it is impossible to predict the effect of an “ITIL Implementation Project” on an organization or how changes to the frameworks will affect industry performance. Is ITIL 2011 any better than ITIL V2? We presume it is, but we don’t know.

Continued in Part 2

Service Management Is Dead

“Service Management is dead.”

That was my first thought when I read McKinsey Quarterly’s “Capturing value from IT infrastructure innovation” from October 2012.

That was going to be the point of this blog post.

Then I read it again.

Conclusion 1: Innovation is more than just technology.

Conclusion 3: The path to end-user productivity is still evolving.

Conclusion 5: Proactive Engagement with the business is required.

Conclusion 6: Getting the right talent is increasingly critical.

Conclusion 7: Vendor relationships must focus on innovation.

Getting the most from IT infrastructure has never been about technology (though technology is an important capability of IT). Innovating, maximizing productivity, and managing complexity evoke the mundane, at the expense of the sexy.

It engages users.

It demands service.

It depends on process and automation.

It focuses on data and knowledge.

It understands and balances the needs of all stakeholders.

Technology is fun. Where technologists hang out are fun places to be. I know this may sound strange to those outside the industry, but the people who move technology are fascinating.

The most boring business events involve Project Managers and Risk and Compliance Officers. I have been to many meetings, and they are yawners, even for me.

That’s because project managers and auditors focus on the boring stuff.

Who are the stakeholders?

Who makes what decisions?

What do they want?

What kind of data do we have?

What kind of data do we need?

Where is the data?

How do we use the data most effectively?

What are the risks, and how do we mitigate them?


For better or worse, this is the stuff that underpins business value; the foundation on which innovation is built.

Long live Service Management.

The Role of COBIT5 in IT Service Management

In Improvement in COBIT5 I discussed my preference for the Continual Improvement life cycle.

Recently I was fact-checking a post on ITIL (priorities in Incident Management) and I became curious about the guidance in COBIT5.

The relevant location is “DSS02.02 Record, classify and prioritize requests and incidents” in “DSS02 Manage Service Requests and Incidents”. Here is what it says:

3. Prioritise service requests and incidents based on SLA service definition of business impact and urgency.

Yes, that’s all it says. Clearly COBIT5 has some room for improvement.
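
For comparison, the impact-and-urgency guidance both frameworks gesture at is usually rendered as a priority matrix. A minimal sketch (the matrix values below are a common convention, not quoted from COBIT5 or ITIL):

```python
# Priority 1 is highest. The mapping is illustrative: organizations
# define their own impact/urgency levels and priority targets.
PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "medium"): 2, ("medium", "high"): 2,
    ("high", "low"): 3, ("medium", "medium"): 3, ("low", "high"): 3,
    ("medium", "low"): 4, ("low", "medium"): 4,
    ("low", "low"): 5,
}

def prioritize(impact, urgency):
    """Map an incident's business impact and urgency to a priority."""
    return PRIORITY_MATRIX[(impact, urgency)]
```

Even a table this small carries more actionable guidance than the single sentence COBIT5 offers, which is rather the point.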

COBIT5 is an excellent resource that complements several frameworks, including ITIL, without being able to replace them. For the record, the COBIT5 framework says it serves as a “reference and framework to integrate multiple frameworks,” including ITIL. COBIT5 never claims to replace other frameworks.

We shouldn’t expect to throw away ITIL books for a while. Damn! I was hoping to clear up some shelf space.

HP’s $10 billion SKMS

In August 2011 HP announced the acquisition of enterprise search firm, Autonomy, for $10 billion.

It is possible HP was just crazy, and that former CEO Leo Apotheker was desperate to juice up HP’s stock price. With Knowledge Management.

Within ITSM the potential value is huge. Value can be seen in tailored services and improved usage, faster resolution of Incidents, improved availability, faster on-boarding of new employees, and reduction of turnover. (Ironically, improved access to knowledge can reduce loss through employee attrition).

In 2011 Client X asked me for some background on Knowledge Management. I did prepare some background information on ITIL’s Knowledge Management that was never acted on. It seemed like too much work for too little benefit.

ITIL’s description does seem daunting. The process is riddled with abstractions like the Data → Information → Knowledge → Wisdom lifecycle. It elaborates on diverse sources of data such as issue and customer history, reporting, structured and unstructured databases, and IT processes and procedures. ITIL overwhelms one with integration points between the Service Desk system, the Known Error Database, the Configuration Management Database, and the Service Catalog. Finally, ITIL defines a whole new improvement cycle (Analysis, Strategy, Architecture, Share/Use, and Evaluate), a continual improvement method distinct from the CSI 7-Step Method.

Is ITIL’s method realistic? Not really. It is unnecessarily complex. It focuses too much on architecture and integrating diverse data sources. It doesn’t focus enough on use-cases and quantifying value.

What are typical adoption barriers? Here are some:

  1. Data is stored in a variety of structured, semi-structured, and unstructured formats. Unlocking this data requires disparate methods and tools.
  2. Much of the data sits inside individual heads. Recording it requires time and effort.
  3. Publishing this data requires yet another tool or multiple tools.
  4. Rapid growth of data and complexity outpaces our ability to keep up.
  5. Thinking about this requires way too much management bandwidth.

In retrospect, my approach with Client X was completely wrong. If I could, I would go back and change that conversation. What should I have done?

  1. Establish the potential benefits.
  2. Identify the most promising use cases.
  3. Quantify the value.
  4. Identify the low hanging fruit.
  5. Choose the most promising set of solutions to address the low hanging fruit and long-term growth potential.

What we need is a big, red button that says “Smartenize”. Maybe HP knew Autonomy was on to something. There is a lot of value in extracting knowledge from information, meaning from data. The rest of the world hasn’t caught up yet, but it will soon.

The 17 Step Expert

Originally written to a friend struggling with career direction:

I was chatting with an old friend a few days ago who is struggling with career direction and what she wants to do. Here is my advice, compiled from a variety of “expert” sources and personal experience. Keep in mind these pertain to expertise in a knowledge-based industry.

  1. Decide what you want to do. I am not a fan of 1 year, 2 year, 5 year, 10 year, and 20 year plans. Basically think big–end of lifetime goals, the stuff you would write on your gravestone. Then make short-term plans that get you there. Otherwise, the economy and your current circumstances change too much to predict where you will be in 5 years, or 10 years.
  2. Choose the area in which you want to be an expert. Choose a subject that is sufficiently narrow. “Expert in Information Technology” or “teaching” is too broad. However, “Expert in Agile software methods” is probably right.
  3. It takes one hour per day for 3 years to be an expert on a subject. This represents approximately 1,000 hours of effort.
  4. One hour per day for 5 years will make you a nationally recognized expert. This represents approximately 1,800 hours of effort.
  5. One hour per day for 7 years will make you an internationally recognized expert. This represents approximately 2,500 hours of effort. (Among performers, for example professional musicians or athletes, the general rule of thumb is 10,000 hours of practice. Malcolm Gladwell in Outliers also estimated that Bill Gates spent 10,000 hours programming computers before he started Microsoft. Please note that our field of study is more narrowly defined than Bill Gates’s. In addition, we are avoiding areas that involve significant eye-hand coordination and muscular development.)
  6. Motivation will be an issue. Staying at something daily for this long is a challenge. Try to find ways to reward yourself along the way. If you achieve a certain milestone, then reward yourself with a vacation. This is personal, so take some time to think about this. On the other hand, some aspects of this are self-rewarding.
  7. Buy and read all the books on that subject. Summarize. Condense. Publish book reviews on Amazon or on a Blog about each book. If you are able to read and repeat the contents of three books on the subject, you are probably qualified to present at college-level seminars on the subject.
  8. Present at seminars and conferences.
  9. Find all the academic articles you can. Outline them. Summarize the arguments. Compare and contrast the findings. If possible, write and publish your own academic paper.
  10. Connect with the experts in the field: email, LinkedIn, FaceBook, Twitter, etc. The world of social networks has made it easier than ever to identify and connect with the world’s experts.
  11. Start a weblog. Try to write something twice a week. You won’t make money on a weblog, but that isn’t the point. It is about publishing your thoughts and expertise. Respond to comments. Engage with readers. There are methods to improving the readership and popularity of your weblog. I am not an expert in them, but they are there and you should research them.
  12. Cross-post your Blog posts on the social networks.
  13. Find online chat groups. Participate: ask questions, answer questions.
  14. Identify the conferences on the subject. Attend them if possible.
  15. Even better, present at the conference.
  16. If you are entrepreneurial, start your own company doing just that. If not, it helps to be working in or near that field, even if for someone else.
  17. Invent your own theories and methods. Publish them. Try them out in the real world.

Everyone struggles with money, but try not to worry about that in the short-term. After you have achieved expertise and recognition, the money will follow. But you need to focus every day, at least an hour. And try to do all of the above every week. It is difficult, but don’t let one aspect slip for too long.
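
The effort figures in steps 3 through 5 are just one-hour-a-day arithmetic, which is easy to verify:

```python
def hours_of_practice(years, hours_per_day=1):
    """Total practice hours at a steady daily rate."""
    return years * 365 * hours_per_day

totals = {years: hours_of_practice(years) for years in (3, 5, 7)}
# 3 years -> 1095 hours, 5 years -> 1825 hours, 7 years -> 2555 hours
```

These roughly match the rounded figures above, and they show how far the 10,000-hour performer benchmark sits beyond them.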

Empowered: More FLAWs than an Uncelebrated HERO

If nothing else, Empowered: Unleash Your Employees, Energize Your Customers, Transform Your Business has given the world several new FLAWs (four letter acronym words). At last reckoning there were three: HERO, IDEA, and POST, but one of these was introduced in an earlier book, Groundswell.

Empowered has given the world a lot more than that. My title is unfair perhaps, because I liked this book, and the further I read the more I liked it. You cannot read a business book these days that doesn’t introduce a new acronym, and I have come to see acronym-coining as a proxy for the absence of strong knowledge or good writing. Fortunately Bernoff and Schadler are both knowledgeable and good writers, so I wish they wouldn’t resort to gimmicks.

The best part of the book is the specific examples of real companies doing real projects, mostly Forrester customers. Empowered ties together many trends that I was aware of individually but had not seen so closely interlinked. Social media (Twitter, Facebook, and LinkedIn), mobile computing, project management, information security, and the traditional roles of customer service are among the topics addressed. The hero of the story is, of course, the HERO, or highly empowered resourceful operatives, who are dragging companies, kicking and screaming, into the 21st century.

HERO means more than it seems. Imagine a 2-dimensional matrix forming a quadrant—yes, this quadrant is in the book, but not until chapter 8. On the X-axis (from left to right) is empowerment. On the Y-axis (from bottom to top) is resourcefulness. At the bottom left of the quadrant are disenfranchised employees who are neither empowered nor resourceful, making up approximately one-third of most companies. The next one-third of employees are those who are locked down—empowered but not resourceful. The smallest group, maybe one-eighth, are the rogues, who are resourceful but not empowered. The rest are HEROs. The goal of organizations, then, is not to expand that quadrant as far as possible, but to get the best people into the HERO roles and to get the organization behind them. Easier said than done, but there is a lot of substance in Empowered to help on the journey.

The book is divided roughly in half. Part one discusses HEROs and HERO projects in detail, including how they have saved organizations and how the lack of a HERO has led to substandard responses and embarrassing situations. Prominent here are the realities of social media and mobile technologies. Part two discusses actions organizations can take to enable the HERO. Similar themes run through the book, and this is not a collection of random blog posts.

Part one did turn me off in many places. The authors seemed to target me, an IT professional, and my colleagues as the chief disablers of HERO behaviors. I hope that we can be forgiven. We understand as well as anyone the complexity behind modern businesses, and how frail they really are under the hood. We are the individuals whose heads get beat whenever a server crashes or data is compromised, regardless of whether we had anything to do with the initial implementation. We’ve been SOX’ed, mandated, legislated, and audited to death. A little more respect would be nice.

Fortunately, the book delivers some of that respect in part two. It recognizes some of the issues faced by IT and provides some guidance for IT professionals. It spends time on a couple of IT leaders who have reached out to other business units to build creative and innovative solutions. Ultimately this is not about IT, but about business leaders understanding that the borders of the organization are no longer around its physical premises and its high-walled data centers. The borders of the organization are around its people. Employees and customers are using Twitter and YouTube, and the conduits for leakage are unfathomable. Employees have to exercise common sense and be professional. The emphasis of the Information Security office has to migrate from applying technical band-aids to engaging leaders and employees. It will happen, and I predict IT will be a leader in this process, not an inhibitor.

McKinsey on Automating Service Operations

Good article from McKinsey about automating customer service with IT. Some key takeaways: run pilots, do field tests, don’t over-plan or over-build, and try to change processes and mindsets in conjunction with the technology rollout. These are common recommendations in consulting circles, but the article illustrates them with some good and original examples. Plus here is a new one (for me): new applications allow for full-scale simulations of business process changes. Cool stuff that might warrant some additional research.


W00t: The ITSM podcasts are back

I posted almost two years ago about the dearth of ITSM podcasts. The IT Skeptic blog is alive and well, but the podcasts are dead.

Fortunately a new batch of podcasts has arisen. Three that appear to be alive and well are:

I think the first organization needs no introduction: Connect, Learn, Grow! The itSMF USA Podcast

A second is ITSM Weekly The Podcast, from the folks at ServiceSphere.

And last, but not least, the ITSM Manager: the IT Service Management Podcast, from ITSM Manager.

All are linked from the iTunes Store.

Wall Street & Technology CIO Round Panel

Here is an interesting round panel discussion on 2009 challenges in the financial industry from Wall Street & Technology. 

There is a major challenge for CIOs because they know what’s coming in the next year or two. They know they are going to be called upon to do much, much more with potentially much, much less. So potentially driving the efficiency in the IT organization is no small feat. They are going to have to figure out ways that technology can help.

The observations are specific to the financial industry, and as such are not very surprising. The industry remains beset by toxic assets on its balance sheets and declining (or highly volatile) asset values. Profitability challenges will constrain spending throughout 2009, on the top and bottom lines. Economists widely blame the current recession on “creative” instruments and lack of regulatory oversight of the financial industry. In this environment leaders have little choice but to retrench on spending and deal with increasing regulatory scrutiny.

These observations don’t necessarily apply to other industries. Although the economic climate is challenged, there is opportunity in change, and 2009 may become a “breakout” year for aligning IT with organizational drivers. Alignment is a two-way street, and for many organizations it will not occur until IT is more successful in helping define strategic plans, rather than merely reflecting the organizational strategic plan in the IT strategic plan.