Should Backups be Tracked as CIs?

Short answer: probably not.

Is there any business value in tracking backups? In other words, can you improve operational efficiency, increase revenue or defer costs in excess of expenses by doing so?

One scenario for tracking backups as CIs arises when the cost of loss (e.g., to reputation) significantly exceeds the cost of tracking, and when the status of the CIs can be updated throughout their life cycle. This is especially true if tracking can be automated.

I am not aware of any companies doing this. In most cases a process or operating manual provides sufficient control of backups.

Six Steps to Successful CMDB Implementations

Have you been asked to implement a CMDB? Here are a few pointers for doing it successfully.


  1. Find the “low hanging fruit” where you will obtain the most benefit for the least cost. Implement that.
  2. Configuration Management should be focused on improving processes, not implementing a database. A database is the presumed tool, but you need to look at how your processes will be improved.
  3. The leading candidates for improvement include: Request Fulfillment, Incident Management, Change Management, Problem Management, Availability Management, and Capacity Management (not necessarily in that order).
  4. Configuration Management is not about building a database. Your CMDB can be a spreadsheet, if that provides the most benefit for the lowest cost. If necessary, you can move to a more robust database later. However, see the next caveat.
  5. Maintaining the CMDB will be costly. This leads to two points: 1) make sure you understand what data you need and why, and 2) automate data collection as much as possible.
  6. Implement your CMDB in phases, in conjunction with Continuous Service Improvement.

Don’t expect your CMDB to include everything that it could conceivably contain according to ITIL. That would be too costly for the value provided to most organizations.
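
To illustrate point 5 above (automating data collection), here is a minimal, hypothetical sketch of how CI records might be gathered and appended to a spreadsheet-style CMDB. The attribute names, CI classification, and CSV layout are my own inventions, not anything prescribed by ITIL; a real implementation would pull from your inventory, monitoring, or cloud provider sources rather than the local machine.

```python
import csv
import platform
import socket
from datetime import datetime, timezone

def collect_local_ci():
    """Collect a few basic attributes of the local host as a CI record.

    A real collector would query inventory, monitoring, or provider APIs
    instead of the machine it runs on.
    """
    return {
        "ci_name": socket.gethostname(),
        "ci_type": "server",  # hypothetical classification
        "os": f"{platform.system()} {platform.release()}",
        "last_verified": datetime.now(timezone.utc).isoformat(),
    }

def append_to_cmdb(record, path="cmdb.csv"):
    """Append a CI record to a CSV file acting as the CMDB."""
    fieldnames = ["ci_name", "ci_type", "os", "last_verified"]
    try:
        # First run: create the file and write a header row.
        with open(path, "x", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerow(record)
    except FileExistsError:
        with open(path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=fieldnames).writerow(record)

if __name__ == "__main__":
    append_to_cmdb(collect_local_ci())
```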

A Critique of Pure Data: Part 2

Please see Part 1 here.

Enter Big Data

In the June 2013 issue of Foreign Affairs (“The Rise of Big Data”), Kenneth Cukier and Viktor Mayer-Schoenberger describe the phenomenon as more than larger sets of data. It is also the digitization of information previously stored in non-digital formats, and the availability of data, such as location and personal connections, that was never collected before.

They describe three profound changes in how we approach data.

  1. We collect complete sets of data, rather than samples that must be interpreted with traditional techniques of statistics.
  2. We are trading our preference for curated, high-quality data sets for variable, messy ones whose benefits outweigh the cost of curation.
  3. We tolerate correlation in the absence of causation. In other words, we accept the likelihood of what will happen without knowing why it will happen.

Big data has demonstrated significant gains, and a notable one is language translation. Formal models of language never progressed to a usable point, despite decades of effort. In the 1990s IBM broke through with statistical translation, using a French-English dictionary gleaned from high-quality Canadian parliamentary transcripts. Then progress stalled until Google applied massive memory and processing power to much larger and messier data sets of words numbering in the billions. Machine translations are now much more accurate, and Google Translate covers some 65 languages (which it can detect automatically, something most humans cannot do).

Another notable success was the 2011 victory of IBM’s Watson over former champions on the quiz show Jeopardy. Like Google Translate, the victory was based primarily on the statistical analysis of 200 million pages of structured and unstructured content, not on a model of the human brain. Watson falls short of passing a true Turing Test, but the achievement is significant nonetheless.

The loss of causality is not, by definition, a loss of useful information. UPS uses sensors to diagnose likely engine failures without understanding the cause of failure, reducing time spent on the roadside. Medical researchers in Canada have correlated small changes in large data streams of vital statistics to serious health problems, without understanding why those changes occur.
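
As a toy illustration of that correlation-without-causation approach (not the UPS or Canadian systems themselves), the sketch below flags likely failures purely from a statistical association in synthetic data; every signal, value, and threshold here is invented.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "engine vibration" stream and a failure indicator that happens
# to move with it; nothing here explains *why* the two are related.
hours = np.arange(1_000)
vibration = 0.5 * np.sin(hours / 50) + rng.normal(0, 0.1, size=hours.size)
failures = (vibration + rng.normal(0, 0.1, size=hours.size)) > 0.4

# Pearson correlation between the raw signal and the failure indicator.
r = np.corrcoef(vibration, failures.astype(float))[0, 1]
print(f"correlation = {r:.2f}")

# A purely correlational rule: flag maintenance whenever vibration runs high.
flagged = vibration > 0.4
caught = np.sum(flagged & failures)
print(f"caught {caught} of {failures.sum()} failure hours, no causal model required")
```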

Given these successes, and the presence of influential political movements that attempt to discredit the validity of scientific models in areas such as evolutionary biology and climate science, it is tempting to announce the death of models. Indeed, many pundits have lately written obituaries for causation.

I believe these proclamations are premature. For starters, models in the form of data structures and algorithms are the backbone of big data. The rise of big data derives not only from the increased availability of processing power, memory, and storage, but also from the algorithms that use these resources more efficiently and enable new methods of identifying correlations. Some of these techniques are implicit, such as the rise of NoSQL databases that eliminate structured data tables and table joins. Others are innovative ways to find patterns in the data. Regardless, understanding which algorithms to apply to which data sets requires understanding them as abstract models of reality.

As practitioners discover correlations that were never known before, researchers will ask more and better questions about why those correlations exist. We won’t get away from the why entirely, in part because the new correlations will be so intriguing that causation will become more important. Researchers will not only ask better questions; they will also have new computational techniques and larger data sets with which to establish the validity of new models. In other words, the same advances that enable big data will enable the generation of new models, albeit with a time lag.

Moreover, as we press for more answers from the large data sets, we will find it increasingly hard to establish correlations. Analysts will solve this in part by finding new sets of data, and there will always be more data generated. However, much of it will be redundant with existing data sets, or of poorer quality. As the correlations become more ambiguous, analysts will have to work harder to ask why. They will inevitably have to establish causation in order to improve the quality of their predictions.

Please note that I don’t discount the successes of big data; it is one of the most important developments in the industry. Instead, I conclude that the availability of new data sources and the means to process them does not spell the death of modeling. It is leading instead to a great renaissance of model creation that advances hand-in-hand with big data.

Managing to Design

I realized yesterday that I almost didn’t buy my iPhone 5 because of a cable.


Apple introduced the new Lightning connector with the iPhone 5 and the fourth-generation iPad to replace the older 30-pin iPod connector. The new cable is smaller and reversible but, unlike some Android devices, supports only the slower USB 2.0 standard.

Lack of higher speeds and incompatibility with earlier peripherals are flimsy excuses to switch platforms. The write speed of the phone’s flash storage does not exceed USB 2.0 performance, and I don’t have any incompatible peripherals anyway. Perhaps it simply seemed that Apple was advancing its agenda of incompatibility with the rest of the tech world.

The episode is a reminder of how easy it is to fixate on shiny objects and small road bumps, and to take our eyes off the goal. Whatever our specific intentions, our broad goals are similar: to improve our lives, and to make our businesses more productive in pursuit of their goals.

Technological developments are important. Like Apple, we build technical architectures to maximize the current use of technology as a function of cost, while also maximizing our ability to adapt to change as a function of cost. Good design considers both current and expected future use of technology.

In my opinion, one of our worst behaviors is over-responding to user requests in ways that compromise a designed service. I know this statement is heresy in our “customer service” culture, in which “the customer is always right.” In The Real Business of IT, Richard Hunter and George Westerman explain it this way:

In the absence of a larger design, delivering without question on every request is a value trap. Over time, setting up IT as an order taker produces the complicated, brittle, and expensive legacy environments that most mature enterprises have. It hurts the business’s ability to deliver what’s needed for the future.

Our colleagues outside of IT are not customers. Our colleagues are just that–colleagues. We collaborate with each other as colleagues to create outcomes that deliver value to the customers who purchase our organization’s products. Like IT, our colleagues in HR, sales, marketing, and accounting consider the short-term and long-term ramifications of their decisions in the execution of their services.

This is not an excuse or empowerment to simply say no to our colleagues. There are correct and incorrect ways to refuse a request, and our reflexive action should not be to push back. One way is to explain why we do things the way we do, and to offer alternatives that meet the immediate objectives without compromising longer-term ones. We can also offer to review our modes of operation if they appear incompatible with the changing needs of the organization.

As a Project Manager for a call center and data center relocation, I remember the back-and-forth discussions between management and the construction firm, with me in the middle, over the layout of the new facility. The construction firm held traditional mindsets, but did not blindly refuse requests. Instead they politely and patiently described the byzantine fire and construction regulations and the cost implications of various design trade-offs.

We eventually achieved a design that met most current and future needs. Whatever Apple’s specific intentions, I prefer the Lightning connector.

A Critique of Pure Data: Part 1

Rationalism was a European philosophy popular in the 18th and 19th centuries that emphasized discovering knowledge through the use of pure reason, independent of experience. It rejected the assertion of Empiricism that no knowledge can be deduced a priori. At the center of the dispute was cause and effect–whether effects could ever be determined from causes, whether causes could ever be deduced from effects, or whether they had to be learned through experimentation. Kant, a Rationalist, observed that both positions are necessary to understanding.

Modern science descended from Empiricism, but like Kant is pragmatic, neither accepting nor rejecting either position entirely. Scientists observe nature, deduce models, make predictions using the models, and test the predictions against observations. They describe the assumptions and limits of the models, and refine the models to adapt to new observations.
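
As a minimal sketch of that observe-model-predict-test loop (my own toy example, not one drawn from the essay), the code below induces a simple model from noisy observations of a falling object and then tests its predictions against observations it never saw.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Observations": a falling object's distance over time, with measurement noise.
t = np.linspace(0, 2, 40)                              # seconds
d = 0.5 * 9.81 * t**2 + rng.normal(0, 0.05, t.size)    # metres

# Deduce a simple model (quadratic in t) from the first half of the data...
train, test = slice(0, 20), slice(20, 40)
coeffs = np.polyfit(t[train], d[train], deg=2)

# ...then test its predictions against observations it has not seen.
predicted = np.polyval(coeffs, t[test])
rmse = np.sqrt(np.mean((predicted - d[test]) ** 2))
print(f"fitted acceleration ≈ {2 * coeffs[0]:.2f} m/s², prediction RMSE = {rmse:.3f} m")
```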

The old quip says all models are wrong, but some are useful. Scientific models are useful only to the extent they are demonstrated useful. At their simplest, they are abstract representations of the real world that are simpler and easier to comprehend than the complex phenomena they attempt to explain. They can be intuited from pure thought, or induced from observation. The benefit of models is their simplicity–they are easier to manipulate and analyze than their real-world counterparts.

Models are useful in some situations and not useful in others. Good models are fertile, meaning they apply to several fields of study beyond those originally envisioned. For example, agent models have demonstrated how cities segregate despite widespread tolerance of variation. Colonel Blotto outcomes can be applied to electoral college politics, sports, legal strategies, and screening of candidates.

To be useful, models are predictive, meaning they can infer effects from causes. For example, a model can predict that a given force (e.g., a rocket) applied to an object of a given mass (e.g., a payload) will cause a given amount of acceleration, which causes an increase in velocity over time. Models predict that clocks in orbit on Earth satellites run slightly faster than those on the surface, a result of the gravitational time dilation predicted by general relativity. Models may be useful in one domain but not appropriate for another. Users have to be aware of their capabilities and limitations.
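
As a back-of-the-envelope check of that satellite-clock prediction, here is a short calculation assuming standard published values for Earth's gravitational parameter and a GPS-like orbit, and ignoring the smaller, opposing velocity effect from special relativity.

```python
# Weak-field gravitational time dilation: d(tau)/dt ≈ 1 - GM/(r c^2).
GM = 3.986004418e14       # Earth's gravitational parameter, m^3/s^2
c = 299_792_458.0         # speed of light, m/s
r_surface = 6.371e6       # mean Earth radius, m
r_orbit = 2.6571e7        # GPS-like orbital radius (~20,200 km altitude), m

# Fractional rate difference between an orbiting clock and a surface clock.
delta = (GM / c**2) * (1 / r_surface - 1 / r_orbit)

seconds_per_day = 86_400
print(f"orbiting clock runs fast by ~{delta * seconds_per_day * 1e6:.1f} microseconds/day")
# Prints roughly 45-46 microseconds/day from gravity alone; the velocity term
# subtracts about 7, leaving the familiar ~38 microseconds/day net offset.
```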

Models give us the ability to distinguish causation from correlation. We may correlate schools running equestrian programs with higher academic performance, but we would be unwise to accept causation. We would have to create a model to show how aspects of equestrian activities improve cognitive development, and to discount the relevance of other models that may show causation from other factors. We would then seek out data that can confirm or deny the effects of equestrian activity on cognition. (It is more likely there are other causal factors acting on both equestrian programs and academic performance.) Whether or not models can show causal connections for every phenomenon, they can guide us to better questions.

For this discussion we are interested in computation, and that means Alan Turing, who in 1936 devised the Universal Turing Machine (UTM), a simple model of a computer. Turing showed the UTM can be used to compute any computable sequence. At the time this conclusion was astonishing. The benefit of the UTM lay not in its practicality–it is not a practical device–but in the simplicity of the model. To prove a problem is computable, you just need to demonstrate a program for the UTM. Separately, Turing also gave us the Turing Test, an approximate model of intelligence.
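
To make the idea concrete, here is a minimal sketch of a Turing-machine simulator with a small example program that adds one to a binary number. The state names, tape encoding, and halting convention are my own choices for illustration, not Turing's original formulation; the simulator is "universal" only in the loose sense that it will run any machine description handed to it.

```python
def run_turing_machine(program, tape, state="start", blank="_", max_steps=10_000):
    """Run a single-tape Turing machine.

    `program` maps (state, symbol) -> (new_symbol, move, new_state),
    where move is "L" or "R". The machine halts when no rule matches.
    """
    tape = dict(enumerate(tape))   # sparse tape indexed by position
    head = 0
    for _ in range(max_steps):
        symbol = tape.get(head, blank)
        if (state, symbol) not in program:
            break
        new_symbol, move, state = program[(state, symbol)]
        tape[head] = new_symbol
        head += 1 if move == "R" else -1
    cells = [tape.get(i, blank) for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(blank)

# Example program: move to the right end of a binary number, then add 1.
increment = {
    ("start", "0"): ("0", "R", "start"),
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("_", "L", "carry"),
    ("carry", "1"): ("0", "L", "carry"),   # 1 plus carry -> 0, keep carrying
    ("carry", "0"): ("1", "L", "done"),    # 0 plus carry -> 1, stop
    ("carry", "_"): ("1", "L", "done"),    # ran off the left edge: prepend 1
}

print(run_turing_machine(increment, "1011"))   # prints "1100" (11 + 1 = 12)
```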

Those who use models to make predictions have been demonstrated to be more accurate than experts or non-experts relying on intuition. This last point is the most important, and is the main reason we develop and use models.

The IT Service Management industry lacks academic rigor because it has never been modeled. Most academic research consists of largely vain attempts to measure satisfaction and financial returns. Lacking a model, it is impossible to predict the effect of an “ITIL Implementation Project” on an organization, or how changes to the frameworks will affect industry performance. Is ITIL 2011 any better than ITIL V2? We presume it is, but we don’t know.

Continued in Part 2

Should Incidents Be Re-Opened?

Should Incidents be re-opened? The simple answer is: yes, if the Incident was Closed incorrectly. Incorrect closure may include incorrect or incomplete testing, or failure to confirm service restoration with the customer or user. However, IT environments are complex and reality is seldom so simple. Instead, I advocate against reopening Incidents once a 2-3 day Resolved period has elapsed.

The best trade-off, in general, is to allow a 2-3 day burn-in period, during which the request is fulfilled or the Incident is Resolved. Resolved means service has been restored, the affected parties have been notified, and all records have been updated. The contact then has 2-3 days to test and validate before the Incident record is Closed, generally automatically by the tool workflow. Once Closed, the Incident cannot be reopened.
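
As a minimal sketch of how such an automatic closure might look inside a tool workflow (the record fields, status names, and three-day window here are illustrative, not any particular product's API):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RESOLVED_PERIOD = timedelta(days=3)    # the "burn-in" window before auto-closure

@dataclass
class Incident:
    number: str
    status: str                        # "Open", "Resolved", or "Closed"
    resolved_at: Optional[datetime] = None

def auto_close(incidents, now=None):
    """Close any Incident that has sat in Resolved longer than the burn-in window."""
    now = now or datetime.now(timezone.utc)
    for inc in incidents:
        if inc.status == "Resolved" and inc.resolved_at is not None:
            if now - inc.resolved_at >= RESOLVED_PERIOD:
                inc.status = "Closed"  # once Closed, it stays Closed
    return incidents

# Example: a nightly job sweeping the queue.
queue = [
    Incident("INC001", "Resolved", datetime.now(timezone.utc) - timedelta(days=4)),
    Incident("INC002", "Resolved", datetime.now(timezone.utc) - timedelta(days=1)),
]
print([(i.number, i.status) for i in auto_close(queue)])
# -> [('INC001', 'Closed'), ('INC002', 'Resolved')]
```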

There exist perverse incentives to create multiple Incidents, particularly in a pay-per-issue billing model. On the other hand, there is also the opposite perverse incentive to re-open Incidents for new Incidents or requests, and to include multiple, unrelated requests in the same issue. Sometimes this happens simply out of laziness: it is faster to reply to an existing email than to open a new one.

In addition, there is a gray area between what is a new Incident and what is an existing one. Some errors are intermittent. Restarting the device or application may restore service, but the Incident may occur again in a few hours, days, or weeks. In this case a Problem record should be raised, but the Incident may recur before the Problem Management and Change Management processes can run their course. Are these repeat Incidents new or existing? Every organization should have its own answer, and it depends on the Incident. A 2-3 day separation between recurrences is a good general policy for distinguishing new Incidents from existing ones.

Organizations that choose to re-open Incidents should track them. An independent party should verify they were re-opened appropriately, and any inappropriate activity should be managed through administrative or disciplinary action, hand-slapping, or public humiliation. If this sounds bureaucratic or paternalistic, it is. In general, it is easier to define the policy in terms of time and enforce it with a tool.

The 2-3 day Resolved period is not perfect for all situations and not suitable for all organizations. However, I have found through experience it is a good solution that is flexible, widely applicable, unbureaucratic, conceptually simple, and generally fair to all parties. Once Closed, the Incident should remain Closed.

2012 ITIL Exam Statistics

APM Group has released their ITIL exam statistics for the whole year 2012. I have compiled their statistics and present them with a little more context. 1

ITIL Foundation

  • Over 263,000 exams were administered in 2012, up 5% from 2011. Over 236,000 certificates were issued. 
  • This number finally exceeded the previous annual high which occurred in 2008 at 255,000. Annual exam registrations have climbed steadily since the global financial meltdown.
  • Overall pass rate was 90% in 2012, up steadily from 85% in 2010.
  • We have witnessed a shift in the geographic distribution of Foundation certificates. North America’s representation in the global certificate pool dropped steadily from 25% in 2010 to 21.4% in 2012, while Asia’s rose steadily from 29% to 32.7%.
  • Using unverified but credible data from another source that dates back to 1994, I estimate just under 1.5 million ITIL Foundation certificates have been issued total worldwide.

ITIL Intermediate

  • Over 3,700 ITIL Experts were minted in 2012. No V2 or V3 Bridge certifications were issued. 
  • Just under 54,000 ITIL V3 Intermediate exams were administered in 2012, up 21% from 2011. Over 42,000 Intermediate certificates were awarded (including MALC, which qualifies one for ITIL Expert).
  • The pass rates averaged 79% for the Lifecycle exams, and 78% for the Capability exams. Individual exam pass rates ranged from 75% (SO, ST) to 83% (SS).
  • Pass rate for the MALC (alt. MATL) was 66% in 2012, up steadily from 58% in 2010.
  • The geographic shift in the distribution of Intermediate certificates was even more striking than for Foundation. North America’s share of certificates declined from 32% in 2010 to 20% in 2012, while Asia’s rose from 12% to 24%.
  • Europe’s representation of Intermediate certificates held steady at 47%.
  • Although interest in the ITIL Expert certification via MALC continues to climb, based on historical trends the annual total will not exceed the 5,000 ITIL V3 Experts minted in 2011 via the Managers Bridge exam until 2014 at the earliest.


1 Unless otherwise indicated, numbers are rounded to the thousands.

Driving IT in the Cloud Era

Kudos to @AndiMann for posting a link to a new blog post 1 by a vCloud architect for VMware with the strange title AWS: A Space Shuttle to Go Shopping. Go read it now. For brevity, here are the 4 myths:

  1. Cloud is not about virtualization
  2. It is about Pay as you go
  3. Cloud optimizes resource consumption
  4. Scalability and elasticity are primary drivers

The article demonstrates that these 4 myths of cloud adoption are decidedly false, or at the very least incomplete, at least as they apply to AWS. Virtualized servers are the primary driver. Organizations over-use pay-as-you-go pricing to their own financial loss. Resources are not better utilized, and few organizations need or use rapid scalability or elasticity.

It suggests that IT organizations can compete with cloud providers, and I believe they ultimately will be successful. Here are some drivers that I believe are more important to current adoption patterns.

  1. Transparent services and pricing: Customers and users of cloud-based services readily understand what service they are receiving and at what cost.
  2. Automated Provisioning: The provisioning of these services by the cloud providers is automated and generally completed within seconds or minutes.
  3. Bypassing controls: Business units can bypass traditional organizational policies, processes, procedures, or controls and order directly from cloud providers.

Information Technology is not helpless. In fact many of the tools IT will need have been at their disposal for several years and are described in frameworks of good practices.

  1. Service Catalog
  2. Financial Management
  3. Continuous Improvement

When cloud providers make their services and costs transparent, they are using a Service Catalog. IT organizations can do the same.

Financial Management goes hand-in-hand with the Service Catalog: it makes transparent not only the costs of services, but also how organizations are being charged and for which services. Here is the key point–this remains true whether or not there is any chargeback mechanism based on the services each business unit consumes.
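
A minimal, hypothetical sketch of that kind of transparency: a catalog that lists each service with its unit of charge and unit cost, plus a showback report for one business unit. The services, prices, and consumption figures are invented; the point is that the report works whether or not money actually changes hands.

```python
# Hypothetical service catalog: each entry exposes what the service is,
# the unit it is charged in, and the unit cost.
CATALOG = {
    "virtual-server-small": {"unit": "instance-month", "unit_cost": 45.00},
    "mailbox":              {"unit": "user-month",     "unit_cost": 6.50},
    "backup-storage":       {"unit": "GB-month",       "unit_cost": 0.08},
}

def showback(consumption):
    """Report what a business unit consumed and what it cost,
    even if no actual chargeback occurs."""
    lines, total = [], 0.0
    for service, qty in consumption.items():
        entry = CATALOG[service]
        cost = qty * entry["unit_cost"]
        total += cost
        lines.append(f"{service:22s} {qty:>8} {entry['unit']:<14s} ${cost:>9.2f}")
    lines.append(f"{'total':22s} {'':8} {'':14} ${total:>9.2f}")
    return "\n".join(lines)

# Example: monthly showback report for one business unit (figures invented).
print(showback({"virtual-server-small": 12, "mailbox": 250, "backup-storage": 4000}))
```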

I would argue that simplicity and transparency are two of the primary motivations behind cloud adoption. Moreover, I would argue that the longer-term adoption of outsourced services from organizations such as HP, IBM, or Infosys had more to do with transparency than with getting improved services for less cost. Arguably the latter seldom occurs.

In order to compete with the new breed of cloud provider, IT organizations will need to accelerate the delivery of services from days or weeks to minutes or hours. This will inevitably require, among other things, automation. These subjects will become recurring themes throughout this blog during the coming months.

1 This post has been sitting in my queue for a few weeks.

TFT13: Automating Service Management in the Cloud Era

Proposal for TFT13 Presentation 2013/06/18

Title: Automating Service Management in the Cloud Era

Summary: The environment surrounding IT is becoming increasingly complex. As cloud providers come to dominate IT service delivery, coordinating that delivery is rising in importance. Supporting services effectively through efficient, automated processes is also becoming more difficult. For example, the resolution of Incidents and Problems may span multiple providers who may lack accountability.

This presentation will examine the challenges that IT Service Management organizations face in their processes and in automating them. It will examine how organizations are dealing with these complexities, and what challenges remain to be solved. The presentation looks at these challenges from the perspective of automation.

You can vote for my application at TFT13.

Service Management Is Dead

“Service Management is dead.”

That was my first thought when I read McKinsey Quarterly’s “Capturing value from IT infrastructure innovation” from October 2012.

That was going to be the point of this blog post.

Then I read it again.

Conclusion 1: Innovation is more than just technology.

Conclusion 3: The path to end-user productivity is still evolving.

Conclusion 5: Proactive Engagement with the business is required.

Conclusion 6: Getting the right talent is increasingly critical.

Conclusion 7: Vendor relationships must focus on innovation.

Getting the most from IT infrastructure has never been about technology (though technology is an important capability of IT). Innovating, maximizing productivity, and managing complexity evoke the mundane at the expense of the sexy.

It engages users.

It demands service.

It depends on process and automation.

It focuses on data and knowledge.

It understands and balances the needs of all stakeholders.

Technology is fun. The places where technologists hang out are fun places to be. I know this may sound strange to those outside the industry, but the people who move technology forward are fascinating.

The most boring business events involve Project Managers and Risk and Compliance Officers. I have been to many meetings, and they are yawners, even for me.

That’s because project managers and auditors focus on the boring stuff.

Who are the stakeholders?

Who makes what decisions?

What do they want?

What kind of data do we have?

What kind of data do we need?

Where is the data?

How do we use the data most effectively?

What are the risks, and how do we mitigate them?

Yawn.

For better or worse, this is the stuff that underpins business value: the foundation on which innovation is built.

Long live Service Management.