Will CAP survive 5G?

5G will reshape networks and information system architectures through the emergence of radical new products and usage patterns at enormous scale. Will long-established principles of computer science remain unchanged?

The CAP theorem, first articulated by Eric Brewer (co-founder of Inktomi, now at Google; a good story in itself) in the late 1990s and presented formally in 2000, states that any distributed information system can provide at most two of the following three qualities:

  • Consistency: Every read reflects the most recently updated information, across the entire system
  • Availability: The system provides low-latency, non-error responses for all queries
  • Partition tolerance: The system continues to operate despite loss of communication between nodes

CAP has provided valuable guidance and a foundation for system architecture for 20 years. But the future may contradict it.

Here are the three top-level objectives for 5G:

  1. Extremely high data rates and low latency for huge numbers of concurrent users
  2. Enabling communication for potentially billions of network devices
  3. Extremely reliable and low-latency communication for a broad and emerging category of services that are dependent on availability, latency and reliability

Clearly, 5G architectures will be highly available, with high throughput, concurrency, and performance. To achieve this, 5G edge networks will move significantly closer to the end user. Systems must be highly distributed to exploit this topology, and they'll need robust partition tolerance to do so.

Interestingly, 5G objectives do not explicitly include consistency.

But consistency is required to transactionalize any system: when users pay for something, it must be delivered. Consistency will be central to making 5G systems profitable.

In other words, 5G systems will need all three CAP qualities: consistency, availability, and partition tolerance.

It's important to emphasize that CAP applies to full systems, not only to data storage components or technologies. That's an important detail because it's possible, for example, to build an information system on a consistent data tier and to provide availability or partition tolerance in the tiers above it.

And, more specifically, it's possible to create systems that are consistent and available in normal operation, but that trade away availability to preserve consistency when a partition occurs.

For several years we've witnessed the emergence of 'eventual consistency' (or optimistic consistency), meaning that consistency can be delayed as long as the delay is transparent to the user. New technologies are using this approach to simulate all three aspects of CAP.
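
As a concrete illustration of how systems tune this trade, here is a minimal sketch of quorum replication in the style of Dynamo-family stores. The store and its API are hypothetical: with N replicas, requiring W write acknowledgments and R read acknowledgments such that R + W > N guarantees that reads see the latest write, while smaller quorums trade consistency for availability.

```python
# Minimal quorum-replication sketch (hypothetical store, for illustration).
from dataclasses import dataclass

@dataclass
class Replica:
    value: str = ""
    version: int = 0

class QuorumStore:
    def __init__(self, n: int = 3, w: int = 2, r: int = 2):
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r

    def write(self, value: str, reachable: set) -> None:
        # A write succeeds only if at least W replicas acknowledge it.
        acked = [rep for i, rep in enumerate(self.replicas) if i in reachable]
        if len(acked) < self.w:
            raise RuntimeError("unavailable: write quorum not met")
        version = max(rep.version for rep in acked) + 1
        for rep in acked:
            rep.value, rep.version = value, version

    def read(self, reachable: set) -> str:
        # A read consults at least R replicas and returns the newest value.
        seen = [rep for i, rep in enumerate(self.replicas) if i in reachable]
        if len(seen) < self.r:
            raise RuntimeError("unavailable: read quorum not met")
        return max(seen, key=lambda rep: rep.version).value

store = QuorumStore(n=3, w=2, r=2)     # R + W > N: quorums must overlap
store.write("paid", reachable={0, 1})  # replica 2 is partitioned away
print(store.read(reachable={1, 2}))    # overlap guarantees we read "paid"
```

Lowering W or R keeps the system responding through wider partitions, at the cost of possibly stale reads: the constant motion between C and A described above, expressed as two parameters.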

In effect, new systems will be constantly in motion, trading between consistency, availability, and partition tolerance in order to simulate the presence of all three. What an amazing outcome, indicating that future technologies will be more radical and powerful than we imagined just five years ago.

It's a stark change from architectural predictions of the recent past. Guidelines such as "Centralize and consolidate" and "Keep state in reliable centralized stores" now seem simple, restrictive, and boilerplate. In reality, innovation will bend the rules and produce diverse solutions that enable systems to appear to break the CAP theorem.

Google Spanner is an example of the future of 5G systems. Its CAP choices are not static; instead, the balance of properties changes as needed. Eric Brewer comments:

Despite being a global distributed system, Spanner claims to be consistent and highly available… Does this mean that Spanner is a CA system as defined by CAP? The short answer is “no” technically, but “yes” in effect… during (some) partitions, Spanner chooses C and forfeits A. It is technically a CP system.

The CAP theorem will remain relevant, correct, and useful as 5G systems continue to emerge. But CAP is not a stopping point. Instead, it's a kind of signpost, guiding the future and predicting the ways we will bend and overcome established laws of computer science.


WiFi Calling shows the path to 5G

WiFi Calling refers to the ability to place and receive calls over a WiFi network, using the native dialing (and video and text) interface that comes with your mobile phone. Both Android and iOS devices support WiFi Calling, and it’s available through most major carriers in the US.

WiFi Calling lets mobile subscribers take advantage of the vast network of WiFi access points available all over the world. These become part of the calling network so that coverage is greatly expanded, including important low-signal areas such as the home.

Although from a user standpoint it may seem like a simple feature, it has a significant impact on networks and business models. And it's part of a much larger shift toward 5G.

One impact has happened already. In part due to WiFi Calling, subscriber minutes are now unlimited for most service plans. Users no longer need to track service time when using their phones.

It's also causing operators to recognize the strategic importance of their WiFi access networks. Many internet providers now include free or discounted modem/WiFi devices in their service plans that host both public and private access points. Subscribers receive a WiFi device with a private network for personal use, but that also broadcasts a public access point, expanding the provider's footprint. This industry practice was challenged in a 2014 class-action lawsuit and continues today.

It's not a coincidence that several internet providers have recently introduced mobile phone service. Although these are initially OTT products that leverage existing carrier cell networks, they will rely more and more on WiFi as 5G becomes mainstream and blurs the distinction between cell and WiFi service. WiFi Calling is a small, strategic pivot toward this future reality.

A critical aspect of WiFi Calling is that, although the user can connect through any available network and access point, service is ultimately still delivered over their subscribed carrier network (including OTT). That means all of the carrier's infrastructure is still in use. That's important for call quality, but it's most significant for billing. It's different from third-party calling apps like Skype and Tango, and this is a key aspect that will characterize new products in the 5G ecosystem.

WiFi Calling enables new business models, many of them transparent to the user. For example, coverage and call quality will continue to improve as devices become more sophisticated at seamlessly switching between cell and WiFi networks and at leveraging simultaneous connections.

Carrier networks will grow in sophistication as a result, as will the importance of their role in business models. 802.11ax, QoS, edge networks, and OTT models will continue to grow in importance, among many other operational aspects of carriers' business.

These industry tectonics are a glimpse of the future of 5G: hundreds of millions of discrete small-signal access points. Fast, continuous network switching. The emergence of brand-new, powerful business models, often transparent to the user, driven by large-scale changes in infrastructure and operations. And the continued convergence of cable and communications providers, mobile providers, and a growing sector of IoT industries, as companies coordinate more and more closely to deliver the future.

5G means the introduction of a generation of compelling new products, driven by huge changes in business models, internal systems, and infrastructure. WiFi Calling is a small example of what is coming.


More historical architectural guidelines

Here are more architecture guidelines from my work with a product development department in a Fortune 50 company five years ago. I wrote about “Optimize for last mile” previously. Some of the guidelines have survived the test of time; others clearly have not. Some are boilerplate now. But they're interesting because they show how the company saw the future at that time: correctly, incorrectly, and business-as-usual.

Printed on a big poster in front of the common area, the guidelines read:

  • Simplify system designs
  • Build loosely coupled systems that may evolve independently
  • Design systems for 100% availability
  • Build to survive in hostile and unreliable environments
  • Scale horizontally
  • Centralize and consolidate
  • Keep state in reliable and scalable centralized stores, not tied to physical instances or locations
  • Design around performance bottlenecks, e.g. CPU, network IO
  • Design for least privilege
  • Design for simple, non-intrusive upgrades & rollbacks
  • Build for re-use
  • Prefer standard protocols and commodity platforms
  • Build in telemetry and monitoring
  • Embrace modern automated tools and practices throughout the product lifecycle
  • Strive for maximum human readability in data and logging
  • Enable customer self-care
  • Optimize scale economics for CRAN/last mile network capacity & data center power/cooling

Visualizing the high performance organization

Many of us who have spent significant time working with large organizations are motivated by a sense of respect and optimism for the power of teams to achieve big, valuable objectives. But it can be surprisingly difficult to talk about this experience.

The symphony orchestra is a modern cultural institution that clearly demonstrates the amazing ability of humans to organize through communication, coordination, horizontal structure, and individual responsibility. The orchestra has 1,000 years of history and has been driven by technical evolution.

Enterprise systems, the systems that facilitate the work of large organizations, are much less about the technology than about the people who use them. Enterprise systems really are focal points for the coordination and cooperation of groups of people. They are literally the digitization of consensus. It's not about the technology; it's about the real evolution of the ability of large numbers of people to work together.

As an example of the parallel between enterprise systems and the orchestra, listen to the following section of the last movement of Beethoven's ninth symphony, performed by the Chicago Symphony Orchestra. Turn up the volume. Imagine the organizational analogy in your business.

The section is here.


Operations innovation is not product centric

Not all innovation is product based. In fact, product innovation is a very narrow slice of a much broader spectrum of innovation that’s possible.

Operations is particularly rich in non-product based innovation. Innovation in operations can take many diverse forms. For example:

  • A complex process is simplified by eliminating critical swivel-chair user interactions
  • An inventory system is modified to correctly track Work In Progress according to a key financial strategy
  • Response processing is improved through innovative dynamic indexing built into a storage schema
  • Vendors are motivated to increase the accuracy of accounts payable by reversing the swim lanes on an invoicing process

None of these are products. They are feature, process, and integration-based innovations.

Here's a real-world example. QA teams in an engineering organization I recently worked with are frequently challenged to track and resolve complex, intermittent issues that are not consistently reproducible. To help, developers added innovative functionality to the environment that lets an internal user click 'record' at the exact moment an issue occurs. This action creates a ticket in the tracking system and then, critically, pulls logs from the processing environment at that moment and attaches them to the ticket. A confirmation is then sent to the user with the ticket details.

The innovation required integration between several systems, including voice recognition and Jira. A new internal API provides container-level details like address, user email, etc. That service is integrated with infrastructure services to pull device logs from the processing environment and attach them to the Jira ticket. XAPI then sends an alert back to the user confirming that a ticket was created for the issue.
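
A rough sketch of how such a flow might be wired together is below. The hostnames, credentials, log-service endpoint, and project key are hypothetical placeholders; the ticket-creation and attachment paths follow Jira's standard REST API.

```python
# Sketch of the 'record' flow: create a ticket, attach point-in-time logs.
import requests

JIRA = "https://jira.example.com"       # hypothetical Jira host
LOGS = "https://logs.example.com"       # hypothetical internal log service
AUTH = ("svc-record", "app-password")   # placeholder credentials

def record_issue(user_email: str, container_id: str) -> str:
    # 1. Create the tracking ticket.
    issue = requests.post(
        f"{JIRA}/rest/api/2/issue",
        json={"fields": {
            "project": {"key": "QA"},   # placeholder project
            "summary": f"Intermittent issue recorded by {user_email}",
            "description": f"Auto-captured from container {container_id}",
            "issuetype": {"name": "Bug"},
        }},
        auth=AUTH,
    ).json()
    key = issue["key"]

    # 2. Pull logs from the processing environment at this moment.
    logs = requests.get(f"{LOGS}/containers/{container_id}/logs", auth=AUTH)

    # 3. Attach the logs to the ticket (Jira requires this header).
    requests.post(
        f"{JIRA}/rest/api/2/issue/{key}/attachments",
        headers={"X-Atlassian-Token": "no-check"},
        files={"file": ("device.log", logs.content)},
        auth=AUTH,
    )
    return key  # the caller alerts the user with these ticket details
```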

The large majority of innovation is not product based. The challenge is that, for most marketing and sales organizations, innovation is strictly defined as product innovation. This makes sense, since product innovation is directly visible to the customer. But it's critical that we measure and quantify the value of non-product innovation in order to drive data-driven conversations with business leadership about investment in potentially high-impact areas of the organization.


How will 5G change application architecture?

A few years ago I worked with a product development team in a large company. Posted on a large sign on a wall in the common area was a list of development and architecture guidelines. One of them read:

Optimize scale economics for CRAN/last mile network capacity & data center power/cooling.

I always thought that it was a strange guideline, oddly worded and difficult to follow. Develop enterprise software for the cooling needs of the data center? I could imagine a VP making this statement to score some credibility points.

But with 5G on the horizon and scheduled to be generally available in 2020, I'm reminded of that statement. Over time, the pendulum of client-server architecture has swung from fat client, to full server-side, to a hybrid document model. The current practice of frequent, incremental, asynchronous interaction and horizontal scaling is an evolution built on past models. What will happen as 5G becomes reality?

As complex as 5G is, the fundamental goals are clear:

  • Extreme Mobile BroadBand (xMBB), providing extremely high data rates and low latency for huge numbers of concurrent users
  • Massive Machine-Type Communication (mMTC), enabling communication for potentially billions of network devices
  • Ultra-reliable Machine-Type Communication (uMTC), targeting extremely reliable and low-latency communication for a broad and emerging category of services that are dependent on availability, latency and reliability

One important consequence is that 5G wireless will compete with the current landline business. Large-scale communication service providers operating physical networks today will be significantly impacted as consumers buy 5G for residential and commercial service. This means the behavior characteristics of these new networks will affect a lot of people. And these behaviors will certainly differ from those of current networks.

For example, 5G will likely rely heavily on edge caching to achieve responsiveness, throughput, and availability. This will distribute current hosting and data center architectures much further toward the edge. It's likely we will come to consider these fronthaul edges part of the server host network, even sending full containers and data shards to the edge cache to optimize user experience.
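
To sketch the idea, here is a minimal TTL-based edge cache of the kind a fronthaul node might run: serve locally while content is fresh, fall back to the origin otherwise. The origin fetch is a hypothetical placeholder.

```python
# Minimal TTL edge-cache sketch (illustrative, not a real 5G component).
import time

class EdgeCache:
    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (fetch_time, body)

    def get(self, key: str, fetch_origin) -> bytes:
        now = time.monotonic()
        hit = self.entries.get(key)
        if hit and now - hit[0] < self.ttl:
            return hit[1]               # fast path: served at the edge
        body = fetch_origin(key)        # slow path: round trip to origin
        self.entries[key] = (now, body)
        return body

cache = EdgeCache(ttl_seconds=30)
manifest = cache.get("/video/manifest", lambda k: b"...origin bytes...")
```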

Radical network topologies ahead!


Does SAFe correctly extend agile?

What’s great about agile is that its tenets emerged from the culture of real teams responsible for actual technical delivery. For example, horizontal team structures emerge again and again as a key characteristic of successful teams. The same is true about frequent communication. Agile ceremonies and guidelines resonate with people because they explicitly and openly strengthen important cultural values.

How to scale agile to have the same kind of impact at the enterprise level, with much larger, highly distributed, multicultural teams, is the big question that many of us are interested in answering.

Scaled Agile Framework (SAFe) is a framework for enterprise system production, created by Scaled Agile, Inc. SAFe builds on and extends agile principles to coordinate hundreds or thousands of team members. It is relatively new, now in its 4th revision after 6 years of development. The Scaled Agile team is composed of a number of smart, broadly experienced people.

Does it work? Scaled Agile lists many successful case studies.

SAFe is complicated. To paraphrase a recent comment by founder Dean Leffingwell at the 2016 Scaled Agile conference in Colorado,

SAFe may be complicated, only because the domains it addresses are complicated. It is as simple as it can be. If there’s a simpler way, we will change the framework.

Agile principles state that

Simplicity—the art of maximizing the amount of work not done—is essential.

SAFe, in its current form, is not simple. By Conway's law, what does this imply for the social architecture of organizations implementing it?

Scaled Agile makes extensive recommendations for roles and structure above XP/Scrum. But ultimately, the agile ideal is a purely horizontal organization. Agile principle #4:

Business people and developers must work together daily throughout the project.

SAFe documentation acknowledges this risk:

However, historical use of the waterfall model, coupled with the somewhat natural inclination to institute top-down control over software development, has caused the industry to adopt certain behaviors and mindsets that can seriously inhibit the adoption of more effective Lean and Agile paradigms.

Distributed teams are the reality for every organization building enterprise systems of any meaningful scale. This fact impacts every aspect of the delivery pipeline: vertical slicing, repository structures and versioning, test strategy, etc. The SAFe framework doesn't cover this fundamental in much depth, and where it does provide guidance, it is exclusively about how to overcome challenges. Yet distributed and multicultural teams can provide powerful advantages, such as a 24-hour production cycle and 360-degree visibility. These are fundamental aspects that must be first-class concepts to fully extend agile to large-scale production.

Along with distributed delivery, another relevant question is the number of teams SAFe can support. This is important because at some scale there are important overlaps with supply chain management. Apple and other companies successfully coordinate hundreds or even thousands of separate suppliers. They do not use the concept of a Program Increment (PI). Instead, production processes are designed to be as loosely coupled as possible. Is software different? In what ways?

What's great about agile is that individual ceremonies add clear, coherent value to delivery, providing an understandable, incremental way for teams to increase productivity and quality. In contrast, SAFe states:

Embracing a Lean-Agile mindset, understanding and applying the Lean-Agile Principles, and effectively implementing the SAFe practices all come before the business benefits.

SAFe has many good aspects and is the result of a lot of smart people collaborating. Does it correctly extend agile as an asymptote of Lean?


The importance of questions

In the introduction to the excellent book How To Measure Anything, Douglas Hubbard provides this guidance about the importance of statistics, performance indicators, and data-driven business:

Like many hard problems in business or life in general, seemingly impossible measurements start with asking the right questions.

Meanwhile, on a completely separate subject, motivational coaching, author Niurka writes in Supreme Influence:

Questions cause us to seek, expand, and learn. As humans we are growth-seeking beings. Questions open the way for us to explore beyond the boundaries of our previous thinking… Questions have manifesting power.

It’s an interesting, important parallel.


Distributed SDLC and Delivery Pipeline

I’d like to detail another key factor for successful software production in a distributed team environment: delivery pipeline.

Delivery pipeline refers to the process, infrastructure, roles, and tools an organization uses to deliver software from repository to production. Delivery pipeline varies between organizations and generally includes build, test, and deployment activities.

Delivering software to production is complex. For enterprise systems, complexity results from both the number and diversity of stakeholders and the extensive integration between systems that is characteristic of this domain. Highly distributed teams compound this complexity. Delivery efficiency is a critical aspect for the distributed organization to get right in order to leverage the strengths of distributed teams.

One subtle advantage of the pipeline concept is that the term itself helps focus people on the end goal of the delivery process: getting functional value into the hands of users. Complexities of interaction and interdependency between teams within the distributed organization are secondary to this goal. Each component of the delivery process must support this primary goal through independence and highly parallel execution. In this way, the concept of a delivery pipeline helps to align the organization. It's a valuable point of consensus.

That consensus provides important guidance in key aspects of the delivery process. One important challenge for highly distributed teams is merging code from multiple teams into a single version in preparation for release. Because communication and logistics are more complex for distributed teams, merging can be extremely complex and is an enormous source of loss for distributed teams. The delivery pipeline viewpoint provides clear guidance: short-lived branching is essential. Quality here is a key factor for successful production with distributed teams.

By accurately defining change sets prior to execution, teams can closely control and even eliminate branching. Changes that are well defined and narrowly scoped are easier to merge. This is another important guideline of delivery pipeline: successful execution depends on the organization’s ability to divide large efforts into small, incremental, idiomatic slices. Work breakdown that intelligently reflects the delivery pipeline contributes to the efficiency of its execution. Delivery pipeline provides critical guidance to the distributed organization in this breakdown.

Automation is fundamental to the delivery pipeline concept. Automation is a big subject with importance beyond the delivery process. But it supports delivery in several key ways, and is essential to distributed production.

First, automation facilitates consensus and standardization, which are more complex for distributed teams and are crucial for execution. Automation makes consensus permanent and actionable.

Automation is a central platform for continuous improvement. As the organization identifies and implements automation improvements that strengthen the delivery pipeline, it accelerates team maturity. The process itself facilitates a culture of ownership and responsibility within the organization.

An important aspect of automation is test automation. A small, well-designed, efficient suite of acceptance tests provides fast validation at key points in the delivery process and is a critical asset for distributed production. The entire organization must share ownership of and responsibility for creating and maintaining acceptance suites.
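
As an illustration, a pipeline gate might run a handful of acceptance checks like the following sketch; the staging URL and endpoints are hypothetical.

```python
# Sketch of acceptance checks run at a pipeline gate (pytest style).
import requests

BASE = "https://staging.example.com"    # hypothetical environment under test

def test_health_endpoint_responds():
    resp = requests.get(f"{BASE}/health", timeout=5)
    assert resp.status_code == 200

def test_order_lifecycle():
    # One end-to-end slice: create an order, then read it back.
    created = requests.post(f"{BASE}/orders", json={"sku": "A-1", "qty": 1})
    assert created.status_code == 201
    order = requests.get(f"{BASE}/orders/{created.json()['id']}")
    assert order.json()["sku"] == "A-1"
```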

Most importantly, automation is an excellent source of metrics. Metrics are an important and commonly overlooked pillar of modern SDLC, including Agile. Delivery automation is a primary opportunity to collect KPIs that indicate organizational performance and improvement. Measurement and analysis of outcomes are essential for successful delivery in a distributed environment.

Feature switching is another core characteristic of the pipeline concept. Feature switching enables newly delivered functionality to be activated or deactivated independently, decoupling deployment from the release of functionality. This provides critical flexibility, and it's an essential tool for highly distributed teams to manage delivery risk.
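
A minimal sketch of the idea follows; the flag names and in-memory store are hypothetical, and production systems typically load flags from a configuration service at runtime.

```python
# Feature-switching sketch: deployment is decoupled from activation.
FLAGS = {"new-checkout-flow": False}    # new code ships dark by default

def is_enabled(flag: str) -> bool:
    return FLAGS.get(flag, False)       # unknown flags default to off

def legacy_checkout(cart: list) -> str:
    return f"legacy checkout of {len(cart)} items"

def new_checkout(cart: list) -> str:
    return f"new checkout of {len(cart)} items"

def checkout(cart: list) -> str:
    # The new path is deployed but inactive until the flag is flipped.
    if is_enabled("new-checkout-flow"):
        return new_checkout(cart)
    return legacy_checkout(cart)

print(checkout(["sku-1"]))              # legacy path
FLAGS["new-checkout-flow"] = True       # activate without redeploying
print(checkout(["sku-1"]))              # new path
```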

Delivery pipeline is a central practice within continuous delivery. Focus here builds momentum in the ongoing journey towards increasingly refined and sophisticated SDLC. This is one more way the pipeline concept helps bridge the increased complexity of distributed teams with the powerful benefits of this advanced organization structure.


Horizontal Orientation

Following my recent post on distributed SDLC, I’d like to detail a key factor contributing to successful software production with highly distributed teams: horizontal orientation.

The Agile mindset is horizontal. Its purpose is to delight customers. Making money is the result, not the goal of its activities. Its focus is on continuous innovation. Its dynamic is enablement, rather than control. Its communications tend to be horizontal conversations. It aspires to liberate the full talents and capacities of those doing the work. It is oriented to understanding and creating the future.

Steve Denning, Forbes, January 26, 2015

Horizontally oriented organizations empower teams and individuals with latitude to self-direct toward outcomes rather than follow rote processes. This leads to productivity gains through shared goals, a sense of purpose, and joint responsibility. Teams have broader awareness of the direction, challenges, and opportunities of the overall effort.

In contrast, vertical organizations require greater levels of management approval and are command-and-control oriented. Decision-making power rolls up to higher levels in the organization. Groups are not highly empowered to make decisions and are generally unaware of efforts outside their individual domains.

Horizontal orientation must develop to build enterprise systems sustainably in a distributed environment.

Horizontal orientation empowers teams to find their own best practices while conforming to the broader SDLC, and to distinguish their working patterns from those of other teams. With guidance, this specialization leads to ownership, productive competition, and the strong peer relationships that are critical for successful software production in a distributed environment. Vertical organizations struggle to replicate these essential qualities.

Documentation is a critical aspect for distributed teams. User Stories are not contracts; they require trust and shared agreement on the outcome, without precise detail about how to achieve it. This is another major area of challenge for vertical organizations. In an environment of low empowerment, low trust, and low ownership, groups in a vertical organization must communicate using formal, contract-like documents that specify minute implementation detail. This is a huge efficiency loss for any organization, and a key reason why horizontal orientation is essential for distributed software production.

Vertical slicing, or delivery of incremental but complete units of functionality, is a key aspect of modern SDLC. For distributed teams producing enterprise systems, vertical slicing almost always requires multiple teams to work through complex interdependencies in order to deliver successfully. Horizontal orientation is the basis for the strong communication, coordination, and trust required to do so.

Distributed organizations are complex, and teams must have sufficient latitude to make mistakes and recover from them. This in turn requires trust, transparency, open communication, and strong peer networks. Horizontally oriented organizations have a much greater capacity to recover from errors than vertical organizations. In fact, vertical cultures can sometimes be so challenging that no level of error is acceptable because of political or bureaucratic factors. This is a particularly stark contrast, since one of the primary goals of modern SDLC, including Agile, is to encourage risk-taking and a culture of safety and support within the organization. This is even more important for distributed development.

Distributed teams have enormous potential to create value in the development of enterprise systems. Horizontal orientation is an important factor in turning this potential into reality.
