Environment strategies for growing enterprises

Nikola Schou
15 min read · Aug 17

Have you ever struggled in your organisation with a feeling that, in many software projects and IT organisations in general, environments seem to be among the most difficult things to get right? And did you ever wonder why it can be so hard? Read on!

I have asked myself this question many times and even though I always had an intuitive understanding of what is wrong and how it should be fixed, I never managed to verbalise it well. In this article I’ll try to do just that and hopefully you can both recognise the issues I’m pointing to as well as understand the solution I’m proposing and the consequences it implies.

Starting with the end

Let me start with the end and state what I believe to be the core problem and the core solution:

Core problem: the industry consensus is based on a set of fallacies of software environment strategies

Core solution: the fallacies must be recognised as such and a new set of environment strategy ground rules must be put in place

The fallacies are the following:

  • Fallacy 1: An environment strategy should define a number of environments in alignment with the testing strategy
  • Fallacy 2: The environments should be named according to the purpose of each testing phase.
  • Fallacy 3: There should be the same number of environments across all our systems so that they “fit together”.

We should put the following ground-rules in place instead:

  • Testability: All environments must be fully testable
  • Variation: Different systems shall be accepted to have a varying number of environments according to what is optimal considering the nature of the systems.
  • Growth: We must allow the number of environments to grow over time so that the testing strategy has the needed room to evolve
  • Naming: Environments should be generically named and not coupled to the testing strategy

I hope by the end of this article, you will agree with me and enjoy this picture:

Issues in current IT organisations

Let us first compile a list of typical issues you may find in your organisation. Issues typically come as complaints dropped over the coffee machine or from agitated people in large meetings pointing fingers at each other.

You may recognise the following as statements between different people from different parts of the IT organisation:

  • Issue 1 — Test Manager: “It is very confusing that what you call the INT environment, we call the QA environment. Everyone gets confused about where the Quality Assurance process and the Integration Testing really take place. Can’t you rename your environment so we can speak straight?”
  • Issue 2 — Test Manager: “We have 4 environments for our system, and you only have 3. That creates a problem for us!”
  • Issue 3 — CTO: “Setting up new environments takes 2 months and increases the cloud cost substantially. Although we would really like some additional environments, we simply can’t afford it in terms of timing and cost.”
  • Issue 4 — Project Manager: “The UAT environment is occupied by another project so we can’t go live as planned.”
  • Issue 5 — Developer: “It is so inefficient to develop this system as we can’t run and test it locally. It can only be tested properly in the shared environments.”
  • Issue 6 — Developer: “This feature is so hard to develop and test because we need to bring the code to the UAT environment to see the feature working and to get there, we need to wait for the current release to complete UAT testing. This makes any evolution in this feature almost impossible.”

And the most serious one:

  • Issue 7 — CEO: “We as a business are stuck with no time-to-market. We have lost all our business agility. Even the smallest change made to our platform takes 2 months to get to production.”

Let us now try to understand what causes these issues. We will apply a structured thinking process as shown next.

Seeing all issues as implications from the wrong ground rules

In order to understand the root causes of the environment mess found in most complex enterprises, we need to go through a systematic analysis. This analysis starts from what I believe to be industry consensus around environment strategies. I want to prove by deduction that the industry consensus creates all of the above issues. Based on this we can realise that the industry consensus is wrong and we can build a new and better set of ground rules.

We need a small digression here before we can proceed. By the term system I will refer to any kind of software-based system in a general sense. Think of your customer portal as one system, the ERP system as another system and the CRM system as a third system. Other examples are the Integration Bus, the Email Gateway Service or the Azure Active Directory. We can generalise and say that a system is any piece of software that is independently developed, released and configured.

I will now state what I claim to be the consensus in the industry around environment strategies. The rules expressed here may seem so obvious that they are hard even to question.

  • (S1) An environment strategy should define a number of environments in alignment with the testing strategy
  • (S2) The environments should be named according to the purpose of each testing phase.
    Elaboration: If the testing strategy defines a User Acceptance phase there should be a UAT environment. If the testing strategy defines a Business Acceptance Testing performed by another group of people, there should be a dedicated BAT environment.
  • (S3) There should be the same number of environments across all our systems so that they “fit together”.
    Motivation: It seems obvious that in order to have well-integrated environments, all systems in the enterprise must have the exact same environments

Let us now add some facts on which I believe we can all agree.

  • (S4) All our systems in our environments must be well-integrated so that all our business use cases can be properly tested in each environment.
  • (S5) The number of environments that is optimal for a given system depends on the circumstances around that system.
    Elaboration: For core systems like the Active Directory or an SMTP server, a single environment probably suffices. If you have a complex configuration of Active Directory, you may benefit from having two environments. For frontend applications where you have bi-weekly releases, you definitely need more environments.
  • (S6) The business and IT organisation will evolve over time, and so will the number of phases in the testing strategy.
    Elaboration: In the beginning it may suffice to have a single QA phase performed by the internal testers of the development team. Later you may need to add business acceptance testing, which brings another phase to the testing strategy. After this you may realise that you need to have a Release Test performed by a release manager. Finally you may decide to add a Pilot environment used to roll out the new release to a partial customer segment to mitigate the risk of regression errors coming with new releases.
  • (S7) Once an environment has been established under a given name, there is substantial friction and cost in renaming it, which makes us avoid such renaming.
    Elaboration: This stems from the fact that the environment name tends to be hard-wired in the naming of underlying infrastructure resources, ranging from resource groups to container names, database names, network names, DNS names, certificates and so forth. Although there may be cases where cloud automation has been perfected to the degree where setting up a new environment under a different name is easy in theory, it is far from trivial when you consider the integration with all other systems in the system landscape.
  • (S8) Systems tend to be integrated in such a way that proper testing requires more than one system to be present. This often implies that changes to certain features require changes made to several systems which must be developed, tested and released in coordination.
    Motivation: If two systems depend on each other, and complex changes are made to both as part of a new feature being developed, then very often the developer of one or the other system needs to apply various tricks to develop and test the new features. This may be done via mockups, manual invocation of certain endpoints on localhost etc.
  • (S9) There is a substantial difference in the speed by which various systems are able to release new features.
    Motivation: The speed of releasing new features depends on the degree of automation and the quality of the regression test suite in place. Systems will vary vastly in these respects.

Let us now apply common sense to prove how the issues listed in the beginning are direct consequences of the industry consensus, S1-S3.

  • Given that the environments are defined from the testing strategy (cf. S1), and that the testing strategy must evolve over time from a smaller set of environments to a larger set of environments due to evolution in the business (cf. S6), and given that environments are very often not renamed due to friction (cf. S7), we can derive Issue 1 and Issue 2 where environments across the system landscape end up in an inconsistent state, where we have a lack of consistency in naming and a lack of compatibility as we end up with a different number of environments across systems.
  • Given that there should be the same number of environments across all systems (cf. S3), we arrive at Issue 3: setting up a new environment becomes more expensive the more systems you have, and hence new environments are in reality not an option. This in turn implies environment congestion, where too few environments serve too many streams of work, as expressed in Issue 4 where the project manager complains that the UAT environment is occupied by another project.
  • Given the idea that we need the same number of environments across all systems (cf. S3), people tend to build bidirectional couplings between systems (because it seems completely natural). This implies that you need both systems running on your developer machine, which over time becomes a bigger and bigger problem until Issue 5 is a fact: certain systems and features cannot be properly tested on a developer machine. And this, as we all know, simply kills productivity. You may even end up in a situation where a feature can only be tested in an environment at the end of the entire release cycle, and to get there you need well-tested code so as not to block the release. You have a real catch-22. In such cases, you have ended up with features for which development is extremely costly, which is expressed in Issue 6.
  • Given that in many cases, changes made to different systems must be tested and released in coordination (cf. S8), and that there is a substantial difference in the speed by which various systems are able to release new features (cf. S9), we will see a “weakest link” effect where the slowest evolving system will dictate the speed of releasing features across all systems, and this leads to Issue 7 where the CEO complains about the lack of business agility.
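The two quantitative effects in this argument can be put into a few lines of Python. This is only a sketch under stated assumptions: the lead times and the count of 12 systems are made-up numbers, and the function names are hypothetical. It shows that a feature spanning several systems ships at the pace of the slowest one (S8 and S9), and that under S3 one extra environment for one system means an extra environment for every system.

```python
# Hypothetical lead times in days from "code complete" to production.
lead_time_days = {"customer-portal": 3, "crm": 14, "erp": 60}


def coordinated_lead_time(systems, lead_times):
    """A feature spanning several systems ships only when the slowest one does."""
    return max(lead_times[name] for name in systems)


def environments_to_provision(system_count, same_count_everywhere):
    """System environments to set up when ONE system needs one more environment."""
    return system_count if same_count_everywhere else 1


# Weakest link: the portal could ship in 3 days, but waits for the ERP system.
print(coordinated_lead_time(["customer-portal", "erp"], lead_time_days))  # 60

# Under S3 a landscape of 12 systems pays 12 times for one new environment.
print(environments_to_provision(12, same_count_everywhere=True))   # 12
print(environments_to_provision(12, same_count_everywhere=False))  # 1
```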

We have now seen that the industry consensus on environment strategies will take you directly to a long list of issues having substantial negative impact on your business.

This brings me to declare:

The 3 fallacies of environment strategies:

Fallacy 1: An environment strategy should define a number of environments in alignment with the testing strategy.

Fallacy 2: The environments should be named according to the purpose of each testing phase.

Fallacy 3: There should be the same number of environments across all our systems so that they “fit together”.

Working out a better set of ground rules

The next step is to propose a new set of ground rules for environment strategies and prove that they will resolve all of the above issues.

An environment strategy must fulfil the following ground rules:

  • (T1) Testability: All environments must be fully testable
  • (T2) Variability: Different systems shall be accepted to have a varying number of environments according to what is optimal considering the nature of the systems.
  • (T3) Growth: We must allow the number of environments to grow over time so that the testing strategy has the needed room to evolve
  • (T4) Naming: Environments should be generically named and not coupled to phases in the testing strategy.

Note: Being testable means by definition that all use cases the business cares about can be tested. (T1) says that all use cases can be tested in all environments. This is far from a trivial rule.
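To make ground rule T4 a bit more tangible, here is a minimal Python sketch. The generic names "env1" to "env4", the phase names and the helper `resource_name` are assumptions chosen for illustration; any neutral naming scheme would do. The point is that the testing strategy lives in a mapping that can be edited freely, while infrastructure names only ever contain the generic environment name.

```python
# Generic, stable environment names (an assumed scheme, not a prescribed one).
environments = ["env1", "env2", "env3", "env4"]

# The testing strategy is a separate, editable mapping: phase -> environment.
phase_to_environment = {
    "QA": "env1",
    "UAT": "env2",
}


def resource_name(system: str, environment: str, kind: str) -> str:
    """Infrastructure names embed only the generic environment name."""
    return f"{system}-{environment}-{kind}"


# The testing strategy evolves (cf. S6): a BAT phase is added and UAT moves.
phase_to_environment["BAT"] = "env2"
phase_to_environment["UAT"] = "env3"

# No infrastructure resource had to be renamed along the way.
print(resource_name("portal", "env2", "db"))  # portal-env2-db
```

Because the phase names never leak into resource names, the renaming friction described in S7 does not arise when phases are added, moved or retired.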

You may stop and reflect a bit on these ground rules. If you find them reasonable or perhaps even obvious at first sight, then congratulations! If not, keep on reading.

Two foundational relationships between systems

In order to better explain what follows, we need a small digression and introduce two abstract ideas that may seem a bit hard to grasp at first.

  • Definition 1: System X is said to depend upon System Y if an environment of System Y is needed in order for an environment of System X to be testable. We may occasionally use the notation “X → Y” to denote this relationship.
    Elaboration: As an example you can think of System X making API calls to System Y. Or sending commands. Or having a browser-based redirect flow to System Y and back again.
  • Definition 2: System X is said to be many-to-one with System Y when System X and Y are built and integrated in such a way that we have the opportunity to have multiple environments of System X fully testable with a single environment of System Y. We may occasionally use the handy notation “X > Y” to denote this relationship.
    Elaboration: Notice that the many-to-one relationship is a matter of having the opportunity of setting up multiple environments of System X with a single environment of System Y. It does not mean that we will necessarily utilise this opportunity.
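The two definitions can be modelled in a small Python sketch. The class `Environment`, the field `uses` and the function `is_testable` are hypothetical names introduced here, and the model only captures the wiring between environments. Whether System Y can really serve several environments of System X at once is an architectural question that this sketch does not answer.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    system: str
    name: str
    # For each system this one depends upon: the environment it is wired to.
    uses: dict = field(default_factory=dict)


def is_testable(env, depends_on, existing):
    """Definition 1: every dependency must be wired to an existing environment."""
    return all(
        (dep, env.uses.get(dep)) in existing
        for dep in depends_on.get(env.system, ())
    )


depends_on = {"X": ["Y"]}  # X -> Y
y_env = Environment("Y", "env2")
x_envs = [Environment("X", n, uses={"Y": "env2"}) for n in ("env1", "env2", "env3")]
existing = {(e.system, e.name) for e in [y_env, *x_envs]}

# Definition 2 (X > Y): three environments of X are wired to one of Y.
print(all(is_testable(e, depends_on, existing) for e in x_envs))  # True
```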

Let us make a couple of logical derivations from the ground rules using these definitions.

  • (T5) If a system depends upon another system, then it must also be many-to-one with that other system. We can also say, that if X → Y, then X > Y.
    Explanation: Given that System X may have a higher number of environments than System Y (cf. T2) and all these environments must be testable (cf. T1), it follows that System X must be many-to-one with System Y.
  • (T6) If a system depends upon another system, the opposite cannot hold. In other words, dependency must be unidirectional.
    Explanation: Assume the opposite was true — that two systems can depend upon each other mutually. In this case you end up with a tight constraint that the two systems must exist in exactly the same number of environments, hereby violating ground rule T2, that the number of environments of the two systems may differ.
  • (T7) The dependency graph between all the systems must be unidirectional and acyclic. The same is true for the “many-to-one”-relationship.
    Explanation: It follows from generalisation of T6 by observing that any cycle of dependent systems would force them to exist in the same number of environments which again violates ground rule T2.
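Whether a given system landscape satisfies T7 can be checked mechanically. The following sketch uses `graphlib` from the Python standard library (Python 3.9 or later). The function name `find_cycle` and the two example landscapes are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(depends_on):
    """Return a list of systems forming a dependency cycle, or None if T7 holds."""
    try:
        # prepare() raises CycleError when the graph contains a cycle.
        TopologicalSorter(depends_on).prepare()
    except CycleError as err:
        # The detected cycle is the second element of the exception arguments.
        return err.args[1]
    return None


# Each key depends upon the systems listed as its value.
acyclic_landscape = {"X": ["Y"], "Y": ["Z"], "Z": []}
mutual_landscape = {"X": ["Y"], "Y": ["X"]}  # violates T6

print(find_cycle(acyclic_landscape))  # None
print(find_cycle(mutual_landscape))   # a list that starts and ends with the same system
```

A check like this could run whenever a new integration between two systems is proposed, so that a violation of T6 or T7 is caught before it is built.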

What exactly have we proven? We can summarise it like this:

  • IF we have the ambition of adhering to the ground rules T1 and T2,
  • THEN we must respect the constraint that the whole system landscape must be organised in a unidirectional acyclic dependency graph.

Equivalently we can say:

  • IF we cannot organise the system landscape in a unidirectional acyclic dependency graph
  • THEN we cannot fulfil the ground rules T1 and T2

It is well-known that a directed acyclic graph (DAG) can be understood as an ordering — see for instance this article. It means that we can perceive the system landscape of the enterprise as having a notion of “higher systems” (those depending upon) and “lower systems” (those being depended upon). It is tempting to refer to them as upstream and downstream systems but that will be the topic for another article.

The following illustrates the idea of a hypothetical system landscape nicely organised with System X at the top, System Y and System V in the middle and System Z and System W at the bottom.
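The layering into higher and lower systems can also be computed from the dependency graph. In the sketch below, the exact edges are an assumption, since the text only fixes which system sits in which layer: X is assumed to depend upon Y and V, Y upon Z, and V upon W. The function `level` is a hypothetical helper and presumes the graph is acyclic, otherwise it would recurse forever.

```python
from functools import lru_cache

# Each key depends upon the systems listed as its value (assumed edges).
depends_on = {
    "X": ("Y", "V"),
    "Y": ("Z",),
    "V": ("W",),
    "Z": (),
    "W": (),
}


@lru_cache(maxsize=None)
def level(system):
    """0 for systems with no dependencies, otherwise 1 + the deepest dependency."""
    return 1 + max((level(dep) for dep in depends_on[system]), default=-1)


# Group the systems by level and print the highest layer first.
layers = {}
for system in depends_on:
    layers.setdefault(level(system), []).append(system)

for depth in sorted(layers, reverse=True):
    print(depth, sorted(layers[depth]))
# 2 ['X']
# 1 ['V', 'Y']
# 0 ['W', 'Z']
```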

Using T5 we know that dependency implies “many-to-one”-ness which we can illustrate as follows.

Please notice:

  • System X can only be fully tested with an environment of System Y which again can only be tested with an environment of System Z.
  • There can be any number of developers each having a local environment of System X — all using the same environment of System Y (in this example called Env 2). Similarly, all developers of System Y can use the same environment of System Z.
  • You may think of System X as the Customer Portal, System Y as the ERP system and System Z as Azure AD.
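The chain of environments behind a single local environment can be sketched as follows. The developer names and the environment name "env1" for System Z are assumptions; only "Env 2" for System Y comes from the example above. The dictionary `wiring` and the function `required_environments` are hypothetical.

```python
# Which environment of each lower system a given environment is wired to.
# Keys and values are (system, environment) pairs; names are illustrative.
wiring = {
    ("X", "local-alice"): [("Y", "env2")],
    ("X", "local-bob"): [("Y", "env2")],
    ("Y", "env2"): [("Z", "env1")],
    ("Z", "env1"): [],
}


def required_environments(start):
    """Collect every environment reachable from `start` through the wiring."""
    seen, stack = [], [start]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(wiring[current])
    return seen


print(required_environments(("X", "local-alice")))
# [('X', 'local-alice'), ('Y', 'env2'), ('Z', 'env1')]
```

Both developers end up on the same shared environments of System Y and System Z, and the only thing each of them has to run locally is System X.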

Verifying that all issues are resolved

At last we have reached the point where we can address the issues from the beginning of the article and prove that we have resolved them. Let us go over them one by one.

  • Issue 1 — Test Manager: “It is very confusing that what you call the INT environment, we call the QA environment. Everyone gets confused about where the Quality Assurance process and the Integration Testing really take place.”
    Resolution: Given that naming of environments is decoupled from the phases in the test strategy (cf. T4), there is a reduced risk of a Babylonian mess of environment names entangled with test phases.
  • Issue 2 — Test Manager: “We have 4 environments for our system, and you only have 3. That creates a problem for us!”
    Resolution: It is built into the ground rules that variation in the number of environments must be fully supported. We have shown that by having a directed acyclic dependency graph between all systems, systems may in fact have a varying number of environments while remaining fully testable.
  • Issue 3 — CTO: “The cost of setting up new test environments is simply too high.”
    Resolution: Setting up a new environment of a specific system is no longer a huge undertaking. We can limit the effort to setting up a new environment of the specific system under development.
  • Issue 4 — Project Manager: “The UAT environment is occupied by another project so we can’t go live as planned.”
    Resolution: If multiple projects work in parallel in the same system, and access to certain environments is on the critical path of one or more projects, the solution is simply to create new environments of that single system until environment congestion is no longer the constraint in the projects. This is possible due to ground rule T2.
  • Issue 5 — Developer: “It is so inefficient to develop this system as we can’t run and test it locally. It can only be tested properly in the shared environments.”
    Resolution: Relying on the many-to-one relationship, it will be possible to set up a local environment of a system while running against the systems it depends upon in shared environments. In other words, to develop a single system, you never need to set up more than that single system locally.
  • Issue 6 — Developer: “This integration is so hard to develop and test because we need to bring the code to the UAT environment to see the integration working and to get there, we need to wait for the next release to move to UAT testing. This makes any evolution in this integration almost impossible.”
    Resolution: The ground rule to make all environments fully testable addresses this issue. It should never be allowed that certain features can exclusively be tested in certain environments.

Next, let us put our attention to Issue 7:

  • Issue 7 — CEO: “We as a business are stuck with no time-to-market. We have lost all our business agility. Any change made to our platform takes 2 months to get to production.”
    Resolution: Although business agility depends on many factors, one necessary condition is that we avoid the “weakest link” effect coming from a tight coupling between the number of environments of different systems. As we know, in any business there will be systems of innovation, systems of differentiation and systems of record. Clearly systems of innovation need much more active development than the other ones, and for these systems we need a larger set of environments to make it possible.

Conclusions

We have seen that the current consensus in the industry, S1-S3, can be seen to logically imply the issues mentioned in the beginning. For this reason I call S1-S3 fallacies of software environment strategies.

We have also seen that a new set of ground rules for environment strategies will resolve all issues.

Finally I explained why a unidirectional acyclic dependency graph between all systems in the system landscape is a necessary condition to implement the ground rules.

Next steps will require a detailed explanation of how a unidirectional acyclic dependency graph can be achieved in a system landscape as well as an explanation of how the many-to-one relationship between systems can be realised. There is a small number of architectural design patterns involved for both of these things and this will be the topic of some future article.

To wrap up this article, I will emphasise that I have implemented these ideas in various large-scale companies. In reality it works out well, and in my opinion it provides the expected ultimate benefit of business agility at scale.

This article has also been posted here:

https://sprintingsoftware.com/environment-strategies-for-growing-enterprises/

Nikola Schou

Partner and Development Director of Sprinting Software, passionate about digital transformations, digital products and software development