User Experience Disasters

This week’s episode of my podcast Beneficial Intelligence is about User Experience disasters. Danes consistently rank among the happiest people in the world, but I can tell you for sure that it is not the public sector IT we use that makes us happy. We have a very expensive welfare state financed with very high taxes, but all that money does not buy us a good user experience.

Good User Experience (UX) is not expensive, but it does require that you can put yourself in the user’s place and that you talk to users. That is a separate IT specialty, and many teams try to do without it. It doesn’t end well. Systems with bad UX do not deliver the expected business value, and sometimes are not used at all. A system that is functionally OK but that the users can’t or won’t use is known as a user experience disaster.

We have a web application for booking coronavirus tests here in Denmark. First you choose a site, then you choose a date, and then you are told there are no times available at that site on that date. If a UX professional had been involved, the site would simply show the first available time at any testing center near you. We now also have a coronavirus vaccination booking site. It is just as bad.

As CIO or CTO, some of the systems you are responsible for offer the users a bad experience. To find them, look at usage statistics. If you are not gathering usage data, start now. If a system is under-utilized, the cause is most often a UX issue. Sometimes it is easy to fix; sometimes it is hard. But IT systems that are not used provide zero business value.
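One way to act on this is to compare active users against licensed seats and flag the outliers. A minimal sketch of that idea follows; the system names, numbers, and the 25% threshold are all invented for illustration:

```python
# Hypothetical sketch: flag under-utilized systems from usage statistics.
# All systems, counts, and the threshold are made up for illustration.

monthly_active_users = {
    "expense-reporting": 1840,
    "travel-booking": 12,      # licensed for hundreds of users
    "crm": 950,
}

licensed_seats = {
    "expense-reporting": 2000,
    "travel-booking": 500,
    "crm": 1000,
}

def underutilized(usage, seats, threshold=0.25):
    """Return systems whose active users fall below a fraction of seats."""
    return sorted(
        name for name, active in usage.items()
        if active / seats[name] < threshold
    )

print(underutilized(monthly_active_users, licensed_seats))
# travel-booking: 12 of 500 seats used -> a likely UX problem
```

Any system that shows up on such a list is a candidate for a conversation with its users before it is a candidate for decommissioning.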

Listen here or find “Beneficial Intelligence” wherever you get your podcasts.

Do You Know Where the Problems Are?

In Arizona, there are prisoners still behind bars who should have been released. The reason: the software that calculates their release dates has not been updated for a 2019 change in the law. Since this is just one of 14,000 bugs (!) reported against the system, these people could stay locked up for a long time yet. Officials claim there is no problem and that their manual workaround flawlessly implements a complicated rule estimated to take 2,000 hours to program.

Deciding what gets implemented first is a leadership decision. This one should be at the top of the list – right after the bug that means gang affiliation is not properly recorded, so members of warring gangs might end up in the same cell…

A desperate whistleblower finally went to a local radio station with this story after having been ignored internally for a year. As the CIO, do you have a method in place that ensures concerned programmers and users have a way to point out critical issues?

Contingency Plans

This week’s episode of my podcast Beneficial Intelligence is about contingency plans. Texas was not prepared for the cold, and millions lost power. Amid furious finger-pointing, it turns out that none of the recommendations from the report after the last power outage have been implemented, and suggestions from the report after the outage in 1989 were not implemented either.

As millions of Texans turned up the heat in their uninsulated homes, demand surged. At the same time, wind turbines froze. Then the natural gas wells and pipelines froze. Then the rivers from which the nuclear power plants draw cooling water froze. And finally the generators at the coal-fired plants froze. They could burn coal, but not generate electricity. You can build wind turbines that run in the cold, and you can winterize other equipment with insulation and special winter-capable lubricants. But that is more expensive, and Texas decided to save the money.

The problem could have been solved if Texas could get energy from its neighbors, but it can't. The US power grid is divided into three parts: Eastern, Western, and Texas. Texas chose to go it alone, but apparently ignored the risk that comes with isolation.

In all systems, including your IT systems, you can handle risks in two ways: You can reduce the probability of the event occurring, or you can reduce the impact when it occurs. For IT systems, we reduce the probability with redundancy. We have multiple power supplies, multiple internet connections, multiple servers, replicated databases, and mirrored disk drives. But we run into Texas-style problems when we believe the claims of vendors that their ingenious solutions have completely eliminated the risk. That leads to complacency where we do not create contingency plans for what to do if the event does happen.
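The difference between redundancy and a contingency plan can be made concrete in code. The sketch below is a hypothetical example, not a recommended production pattern: failover across replicas reduces the probability of an outage, while the explicit fallback branch is the contingency plan for the day all replicas fail at once.

```python
# Sketch: redundancy reduces the probability of failure, but a contingency
# plan handles the case where every redundant path fails at once.
# All replicas and messages here are hypothetical.

class AllReplicasDown(Exception):
    pass

def call_with_failover(replicas, request):
    """Try each redundant replica in turn; don't assume one always works."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:
            errors.append(exc)
    # The Texas scenario: every redundant system failed at the same time.
    raise AllReplicasDown(errors)

def handle(request, replicas):
    try:
        return call_with_failover(replicas, request)
    except AllReplicasDown:
        # Contingency plan: degrade gracefully instead of hoping
        # total failure "can't happen".
        return "service degraded: request queued for later processing"

def flaky(request):
    raise ConnectionError("replica down")

print(handle("order-42", [flaky, flaky]))
```

The point is the second `except` branch: it exists precisely because the vendor's redundancy claims are not a substitute for a plan.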

Texas did not reduce the probability, and was not prepared for the impact. Don’t be like Texas.

Listen here or find “Beneficial Intelligence” wherever you get your podcasts.

Use Real Intelligence Instead of the Artificial Kind

If you can leverage real user intelligence in your systems instead of the artificial kind, you get a better result with less effort. But it takes some intelligent thinking by your developers to get to that point.

The new Microsoft Edge (version 88), rolling out soon, crowdsources the difficult decision of which browser notification requests to allow. Users are tired of constant "Allow this website to send you notifications?" prompts, but simply making them less obtrusive didn't work. Microsoft tried that first with "quiet" notification requests, but many users then missed the notifications they did want. Instead, the upcoming version uses the decisions of all Edge users to decide which notification requests to show. If everybody else has refused notifications from a specific website, the Edge infrastructure learns that and defaults to not showing notification requests from that site.
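The underlying idea is simple to sketch. The following is a hypothetical illustration of the mechanism, not Microsoft's actual implementation; the site names, thresholds, and minimum-vote cutoff are all invented:

```python
# Sketch of the crowdsourcing idea: if enough users have already refused
# notifications from a site, default to suppressing the prompt.
# Sites, thresholds, and data are invented for illustration.

from collections import Counter

# (site, user_decision) pairs harvested from choices users already made
decisions = [
    ("news.example", "deny"), ("news.example", "deny"),
    ("news.example", "deny"), ("news.example", "deny"),
    ("news.example", "allow"),
    ("chat.example", "allow"), ("chat.example", "allow"),
]

def should_show_prompt(site, decisions, deny_ratio=0.8, min_votes=3):
    votes = [d for s, d in decisions if s == site]
    if len(votes) < min_votes:
        return True  # not enough signal; fall back to asking the user
    denies = Counter(votes)["deny"]
    return denies / len(votes) < deny_ratio

print(should_show_prompt("news.example", decisions))  # False: most refused
print(should_show_prompt("chat.example", decisions))  # True: too few votes
```

No machine learning model is trained here; the "intelligence" is just an aggregate of decisions real users already made.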

Do you have ways to harvest the decisions your users are already making and use that data to improve your systems? Put your data scientists to work on the challenge of using human intelligence instead of continuing to try to train AIs.

Are you Releasing Sub-Standard Systems?

Out of a sample of 5,000 apps, 80% did not live up to a reasonable standard. Are you releasing sub-standard apps or systems?

A company that reviews healthcare apps for the UK National Health Service found many bad examples, including apps that provided complex medical advice without any expert backup, and apps without security updates for several years. They have been through 5,000 apps, but there are 370,000 health-themed apps out there.

As a CIO, look in your systems list for information about applicable regulations. For every system, you should see a list of which regulations (GDPR, CCPA, HIPAA, etc.) apply to that system, and the name of the person who has certified that this list is complete. For every regulation, you should also see the name of the person who certifies that the system complies. If you don't have that information in your systems list, you are probably releasing sub-standard systems.
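A systems list like this can be audited mechanically. Here is a minimal sketch of such a check; the field names, systems, and people are hypothetical, and a real registry would live in a database or GRC tool rather than in code:

```python
# Hedged sketch of the systems-list check described above. Field names,
# systems, and certifier names are hypothetical. The point: every
# regulation needs a named person who certified compliance.

systems = [
    {
        "name": "payroll",
        "regulations": {"GDPR": "A. Jensen", "HIPAA": None},
        "regulation_list_certified_by": "B. Nielsen",
    },
    {
        "name": "intranet-wiki",
        "regulations": {"GDPR": "A. Jensen"},
        "regulation_list_certified_by": None,
    },
]

def compliance_gaps(systems):
    """Return (system, problem) pairs for missing certifications."""
    gaps = []
    for system in systems:
        if not system["regulation_list_certified_by"]:
            gaps.append((system["name"], "regulation list not certified"))
        for regulation, certifier in system["regulations"].items():
            if not certifier:
                gaps.append(
                    (system["name"], f"{regulation} compliance not certified")
                )
    return gaps

for name, problem in compliance_gaps(systems):
    print(f"{name}: {problem}")
```

Every entry this check prints is a system you are shipping without knowing whether it meets the rules that apply to it.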

Which Snow do you Shovel?

Which snow should you shovel? We’ve just had a couple of inches of snow here in Denmark, which means that I will have to get out the snow shovel and clear the sidewalk. But I live on a small private road where the snowplough doesn’t go. Should I shovel the snow from the road as well? Should I clear the patio? There is always more snow I could shovel.

In any IT organization, there is an infinite amount of possible work. It is constantly snowing new tasks – security patches, new cloud services, new integrations, enhancement requests, bug reports. You can easily run out of space for more post-its on your Kanban board, but you will never run out of tasks. As Elton John sang in The Lion King: “There’s more to do than can ever be done.” As an IT leader, it is your job to decide what gets done. Do you have a policy for what gets done first? If you don’t, write one and distribute it to your team. That makes it easier for them to find and do the most important jobs first.
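A written policy only works if it is unambiguous enough that the team can apply it without asking you. One way to test that is to see whether it can be encoded; the sketch below uses an invented four-category ranking purely as an example of what such a policy might look like:

```python
# A minimal sketch of a written "what gets done first" policy, encoded
# so the team can apply it mechanically. Categories, ranks, and tasks
# are invented for illustration.

POLICY_RANK = {          # lower rank = shovel this snow first
    "security": 0,
    "legal-deadline": 1,
    "bug": 2,
    "enhancement": 3,
}

tasks = [
    ("add dark mode", "enhancement"),
    ("patch OpenSSL", "security"),
    ("fix invoice rounding", "bug"),
]

def prioritize(tasks):
    """Order tasks by the written policy, most urgent category first."""
    return sorted(tasks, key=lambda task: POLICY_RANK[task[1]])

for title, category in prioritize(tasks):
    print(f"{category}: {title}")
```

If a task doesn't fit any category in the policy, that is a signal to extend the policy, not to guess.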

Risk and Reward

This week’s episode of my podcast Beneficial Intelligence is about risks and rewards. Humans are a successful species because we are good at calculating risks and rewards. Similarly, organizations are successful if they are good at calculating the risks they face and the rewards they can gain.

Different people have different risk profiles, and companies have different appetites for risk. Industries like aerospace and pharmaceuticals face large consequences if something goes wrong and have a low risk tolerance. Hedge funds, on the other hand, take big risks to reap large rewards.

It is easy to create incentives for building things fast and cheap, but it is harder to create incentives that reward quality. Most organizations don’t bother with quality incentives and try to ensure quality through QA processes instead. As Boeing found out, even a strong safety culture does not protect against misaligned incentives.

As an IT leader at any level, it is your job to consider the impact of your incentive structure. If you can figure out a way to incentivize user friendliness, robustness and other quality metrics, you can create a successful IT organization. If you depend on QA processes to counterbalance powerful incentives to ship software, corners will be cut.

Listen here or find “Beneficial Intelligence” wherever you get your podcasts.

Another Avoidable Disaster

Today’s totally avoidable IT disaster is found in the Slack app for Android. It turns out the app stored the user password in unencrypted plain text. That means that every other app on your phone had access to it, and it might now lurk in various log files on your device. Slack is red-facedly asking users to update their app and change their password.

This is an example of what happens when developers operate under tight deadlines and without adult supervision. Any competent IT development organization has code review procedures. If you are a large, high-profile organization that releases apps to millions of users, any new release should have a separate security review performed by a security professional. But Slack insisted on letting their team operate without any guardrails. That meant it was only a matter of time before they ran off the road.

If you are a CIO, take a look at your systems list. For every non-trivial or externally facing system, there should be a link to the latest security review, with a date and the name of a real person – outside the development team – who performed the security audit.
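That check, too, can be automated against the systems list. The sketch below is a hypothetical illustration; the record format, systems, reviewer names, and one-year freshness window are all assumptions, not a prescription:

```python
# Sketch of the audit described above: every externally facing system
# should link to a recent security review signed by a named reviewer
# outside the dev team. All records here are hypothetical.

from datetime import date

systems = [
    {"name": "mobile-app", "external": True,
     "security_review": {"date": date(2020, 1, 15), "reviewer": "C. Holm"}},
    {"name": "public-api", "external": True,
     "security_review": None},
    {"name": "build-server", "external": False,
     "security_review": None},
]

def overdue_reviews(systems, today, max_age_days=365):
    """Flag external systems with no review, or a review older than the window."""
    flagged = []
    for system in systems:
        if not system["external"]:
            continue
        review = system["security_review"]
        if review is None or (today - review["date"]).days > max_age_days:
            flagged.append(system["name"])
    return flagged

print(overdue_reviews(systems, today=date(2021, 3, 1)))
```

A system on this list is exactly the kind of place where the next plaintext-password surprise is waiting.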

Avoidable Disasters

Humans keep causing avoidable disasters. I'm a pilot qualified to fly under Visual Flight Rules (VFR), and I am acutely aware that the number one cause of deadly crashes for pilots like me is flying into clouds or fog. It turns out that it takes only 45 seconds for an untrained pilot to become completely disoriented in clouds. Professionals train long hours to learn to override their intuitive feeling of what is up and down and trust their instruments.

Nevertheless, a professional helicopter pilot who had only VFR training flew his helicopter into the ground after getting disoriented in a cloud, killing himself, basketball icon Kobe Bryant, and seven others.

In IT, we also know how to do things. As an industry, we have decades of experience building solid, user-friendly systems and running IT projects. But we mysteriously insist on doing it wrong, causing one IT disaster after another. We think we can take a shortcut in order to meet our deadline, just like the helicopter pilot taking the shortcut through a cloud. As the CIO, you need to make sure you have a process in place to prevent people working on critical systems from taking shortcuts.