Armchair Architects: So, you want to build a platform…

In this episode of the Azure Enablement Show, David Blank-Edelman talks with our Armchair Architects, Uli Homann and Eric Charran, about how architects think about platforms: What are the different kinds of platforms? When should you build one? What factors should be considered when designing them? What business problems can they solve?

What do we mean when we say platforms?

We get a lot of questions about platforms. We’ll start off our conversation with a level set about what we mean when we talk about building a platform.

A platform is a collection of technical capabilities that an organization has, intended to solve line-of-business problems. Typically, in medium- to large-sized organizations, the IT department will build a big data platform, an application hosting platform, and an inner source platform, with the goal of giving the lines of business specific value they can gain from each platform. It would help them in their jobs, bring ROI, and make their lives a lot easier. So when we say platforms, we usually mean an assembly or collection of technologies. Some of them are well engineered and managed like a product, and some are not.

The reason platforms are becoming key to many successful companies is that they host multiple business applications. In the past, we built a technology stack per business application, deployed it, and then maintained it as a unit. What we're now starting to see is people saying, no, I want a platform on which I can run multiple applications, and because I run multiple applications, I do not want to maintain a stack per application. When you think about it, that really started with the hypervisor.

With the hypervisor, we broke the link between hardware and software for the first time. Before that, the operating system had to run directly on the hardware and there was no mobility. The hypervisor lets you separate the operating system, the guest OS, from the hardware and move it around a cluster of machines, giving much more flexibility. The hypervisor was the first platform that people used in a truly general-purpose sense, and now we're starting to see Kubernetes as the next platform. People have also long looked at databases as platforms, and database as a service is something a lot of people have invested time in. The key thing is that they want to manage one artifact, be that a hypervisor, infrastructure, a database, or something like that, and then ideally run as many applications as possible against that artifact.

What are some of the challenges involved with designing a platform that customers want to use?

There are challenges associated with the concept of a platform. Here is one we see with customers: we get to the meeting, we're having a great technology discussion, and the C-suite person leans back and says, I'm not getting a lot of value out of the data platform. I just don't feel like it's actually doing anything for me. I know data is going into this enormous data lake, and I know I can hire engineers and people to unearth value out of the platform, but I just don't feel it.

And then platform owners typically say, yeah, we spent a lot of time, effort, and energy building this amazing platform, whether it's monolith or microservices based, hypervisor or container based, but no one's using it. The lines of business are all going out and buying their own SaaS solutions or subscribing to completely off-strategy cloud platforms, and then using those because they say they're faster and easier.

We should talk about how to avoid those pitfalls. How do you start building a platform that people are going to use, and then take it a step further and build a platform that people want to use, instead of going outside or doing a bunch of shadow IT?

What are you going to do to make a good platform?

The first thing is to make sure that you, as an architect and a technologist, are aware of the requirements and understand what this platform needs to be. The best way to think about this, and it's how I've seen the most successful organizations implement these platforms, is to treat the platform like a product you're building for the lines of business.

Now, the difficult thing is that you can't over-index; you can't build a lock-and-key solution for a single line of business. At each point in time, you need to understand what those lines of business, your consumers or customers, need, and then abstract out the common elements to share across those multiple lines of business. The danger is over-indexing and building something completely verticalized that isn't very useful to an adjacent line of business or the rest of the organization. Managing the platform as a product, though, lets you build a piece of software that is useful from day one and that functions as the platform multiple lines of business can plug into.

When should you build a new platform instead of using SaaS and PaaS on a commercially available cloud?

We can start with a somewhat controversial statement. Don't build platforms just because you can. The best platform you've built is the one you never built. Meaning, in the age of cloud, go with as much SaaS as you can, then go with PaaS, and only if there's a differentiated platform element that you really feel you can't get from a provider, whoever it might be, should you consider building a platform.

It has to be differentiating. It also has to be something that is genuinely common, meaning multiple applications need to use it; otherwise it makes no sense. So the best option is not to build it. If you have to build it, be very clear about which common attributes you need, and ask why people cannot use a commercially available platform instead. In the age of cloud, I think that's the first set of considerations you should think about.

What differentiates a platform from an operating system?

Platforms are all about balance. The platform itself should never be center stage; it should exist only to serve the lines of business and the value they extract from using it. As a result, the best platforms are the ones where you, as a consumer, don't know the platform is there. You're consuming some coarse-grained API, and because there's an abstraction layer, you don't have to worry about what happens beneath it. But when the platform develops a life of its own, and we do things for the platform's sake, we might be drifting out of balance. Now the platform has its own goals and desires, which may run counter to the needs of the lines of business or make consuming the platform's services extremely difficult for them.

Is a good platform one that you can bounce out of in favor of hosting your applications on another, or is there value in locking your apps into a specific platform?

This is interesting, and it's a multifaceted answer. The people writing the checks to build the platform want everybody in the world to use that platform. And if people aren't using it, why not? Let's go eradicate those people and get them onto the platform. If the platform is not good enough, then let's spend more money to rev it.

The other approach is, instead of tightening your hold and watching the platform slip through your grasp, to have open and honest conversations and figure out why people aren't excited to use the internal platform. Ask why they are going to the cloud, or why they are combining multicloud strategies with the internal platform. Then figure out whether that's okay: can they do it better than we can? After that examination, you can make improvements as needed.

Customers often look to build custom platforms to solve business-specific needs that aren’t addressed by commercial platforms.

Many platforms are actually built inside a specific company. At a customer this week, they had gone down a DevOps path for quite some time, where each team decided to build its own database, messaging, and so on. It turned out the teams had made common choices, and the DevOps team spent a lot of time maintaining common components like messaging and databases. After seeing the DevOps team spend more than 50% of its time maintaining platform-like components for their applications, the company decided to centralize them. They evaluated commercially available PaaS services, found them lacking from either a commercial or capability perspective, and decided to package the components up as what they call Custom PaaS. I think that's the majority of the platform efforts you will see inside a company, and that's perfectly good.

They at least did the due diligence, looked at commercial offerings, and found they wouldn't fit their scenario, so they're packaging up their own messaging and database capabilities as a PaaS platform. Very few people will even attempt to build a platform that is commercially viable across multiple clouds. There are some, and that makes sense, but most platforms are going to be built by commercial customers who just need specific capabilities related to their industry.

What are some of the business considerations when you decide to stand up a custom platform?

We can talk a little bit about what it means to be successful when it comes to building a platform.

If a corporation is building a platform for a specific need, it has to realize it is now serving customers, which it's not used to. Teams are used to being the customer themselves and writing the code. Now you have to think about the dependencies other people have on you, and your capability has to be available. You need service level objectives so that everybody who takes a dependency on you knows exactly what the rules of engagement are. You need to define things like throttling. You need to define service hours.
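To make service level objectives, throttling limits, and service hours concrete, here is a minimal Python sketch of a contract a platform team might publish, plus an SLO check. The `ServiceContract` type, its fields, and the availability definition (successful requests divided by total requests in a window) are illustrative assumptions, not a prescribed format; real platforms define SLOs in many different ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceContract:
    """Hypothetical contract a platform team publishes to its consumers."""
    name: str
    availability_slo: float         # target fraction of successful requests, e.g. 0.999
    max_requests_per_second: int    # throttling limit applied per consumer
    service_hours: tuple[int, int]  # supported (start, end) hours, 24-hour clock, end exclusive


def measured_availability(successful: int, total: int) -> float:
    """Fraction of requests that succeeded in one measurement window."""
    if total == 0:
        return 1.0  # no traffic in the window: treat it as meeting the objective
    return successful / total


def meets_slo(contract: ServiceContract, successful: int, total: int) -> bool:
    """True if the window's measured availability reaches the contract's target."""
    return measured_availability(successful, total) >= contract.availability_slo


def within_service_hours(contract: ServiceContract, hour: int) -> bool:
    """True if hour (0-23) is inside the window; windows wrapping past midnight are not handled."""
    start, end = contract.service_hours
    return start <= hour < end


messaging = ServiceContract("messaging", availability_slo=0.999,
                            max_requests_per_second=500, service_hours=(6, 22))
print(meets_slo(messaging, successful=999_500, total=1_000_000))  # True
print(within_service_hours(messaging, 23))                        # False
```

Publishing the contract as data, rather than only in a wiki page, makes it easy for consumers and monitoring to check against the same numbers.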

Effectively, you're becoming a service provider to other folks, and you've got to take that seriously. Then you need to think through high availability architecture, disaster recovery or business continuity, and failure detection. Many times, we don't spend enough time instrumenting code.

Instrumentation, and the resulting monitoring, is super critical. You should be able to detect errors or other problems before your customers do. That takes discipline.

Then you need to think about lifecycle management. You are going to update your platform to fix bugs and apply security patches, and dependencies like operating systems need to be patched too. How do you deal with that? What's your update strategy? Are you going to take the whole thing down so everybody is down, or is it a rolling upgrade? If it's a rolling upgrade, how much capacity do we have to do that? There are a bunch of these kinds of things that are often not that prominent if you just build an app. They are there, and you should think about all of them, but once you are a platform that many people depend on, all of them are amplified, and your change management, instrumentation, and similar practices come under more scrutiny.
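The question of how much capacity a rolling upgrade needs can be worked out with simple arithmetic. This sketch assumes identical nodes, a known per-node capacity, and a known peak load; the function names are hypothetical.

```python
import math


def max_upgrade_batch(nodes: int, capacity_per_node: float, peak_load: float) -> int:
    """Most nodes that can be offline at once while the rest still cover peak load."""
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    nodes_required = math.ceil(peak_load / capacity_per_node)
    return max(0, nodes - nodes_required)


def upgrade_waves(nodes: int, batch_size: int) -> list[list[int]]:
    """Split node indices into upgrade waves of at most batch_size nodes each."""
    if batch_size <= 0:
        raise ValueError("no spare capacity: add nodes or reduce load before a rolling upgrade")
    return [list(range(i, min(i + batch_size, nodes))) for i in range(0, nodes, batch_size)]


batch = max_upgrade_batch(nodes=10, capacity_per_node=1000, peak_load=7500)
print(batch)                          # 2
print(len(upgrade_waves(10, batch)))  # 5
```

With 10 nodes that each handle 1,000 requests per second and a peak of 7,500, eight nodes must stay online, so at most two can be upgraded at a time, which means five waves. A batch size of zero is the signal that the platform has no headroom for a rolling upgrade at all.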

What are the two measures of a successful platform?

If you are becoming a service provider, then you have dependents, and there are two measures of success. Ideally you want both. The first is that the platform should be an engineering success. The second is that people actually use it. The worst case is when it's an engineering success and nobody is using it.

The converse is also true. If it's an engineering mess but everybody's using it, you have a different challenge: it can't scale, it's down all the time, and you're letting everybody down. The key questions are: How can you measure success? How can you make sure you're building what people want? How can you anticipate demand? How do you handle chargebacks? If you are going to build this platform and make this investment, you may even want to implement internal financial chargebacks for the people who actually use the platform.
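One simple way to implement internal chargebacks is to split the platform's cost in proportion to each line of business's metered usage. The sketch below assumes usage is already measured in a single unit (for example, compute hours); real chargeback models often mix fixed and variable components, and the function name is hypothetical.

```python
def allocate_chargeback(total_cost: float, usage_by_team: dict[str, float]) -> dict[str, float]:
    """Split a platform's cost across teams in proportion to their metered usage."""
    total_usage = sum(usage_by_team.values())
    if total_usage == 0:
        # Nobody used the platform this period; how to recover the cost is a policy decision.
        return {team: 0.0 for team in usage_by_team}
    return {team: total_cost * usage / total_usage for team, usage in usage_by_team.items()}


bill = allocate_chargeback(12_000, {"retail": 600, "finance": 300, "hr": 100})
print(bill)  # {'retail': 7200.0, 'finance': 3600.0, 'hr': 1200.0}
```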

Finally, to avoid the scenario where somebody says they don't feel they're getting a lot of value out of the platform, you want data about the platform's utilization. Find out whether people are using it and, if they're not, why not. And you have to approach it from a position of humility. The temptation might be to force an organizational mandate; sometimes that works, but mostly it doesn't.
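Gathering utilization data can start as simply as counting platform calls per line of business and flagging the ones with none, which tells you where to start the "why not?" conversation. The `(line_of_business, api_name)` log format and the function names below are hypothetical assumptions.

```python
from collections import Counter


def adoption_report(calls: list[tuple[str, str]], lines_of_business: list[str]) -> dict[str, int]:
    """Count platform calls per line of business, including those with zero calls."""
    counts = Counter(lob for lob, _api in calls)
    return {lob: counts[lob] for lob in lines_of_business}


def non_adopters(report: dict[str, int]) -> list[str]:
    """Lines of business that never called the platform in this period."""
    return sorted(lob for lob, n in report.items() if n == 0)


calls = [("retail", "orders"), ("retail", "search"), ("finance", "ledger")]
report = adoption_report(calls, ["retail", "finance", "hr"])
print(report)                # {'retail': 2, 'finance': 1, 'hr': 0}
print(non_adopters(report))  # ['hr']
```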

What are some of the architectural, functional, and non-functional concerns that come up when designing a platform?

There are several, both functional and non-functional, such as capacity and availability. But I think the anatomy of the platform has to be constantly examined and focused on, and it has to change over time as the organization's needs change. The way I think about it is that there are three concentric circles.

The first circle is that the platform has to understand and have full command of the entities, core rules, and validations associated with what it acts upon. The second concentric circle is how you actually expose these things. If the core entities are transactional systems, analytical systems, or APIs in other microservice implementations, how do you implement a layer of abstraction that is coarse-grained enough to be a joy to consume rather than a pain? And if other people are consuming this API, how can you make it easier for them to stitch it together with yet other microservices?

Then, on top of that, there's a secondary consumption plane. If you're not integrating directly with the platform's API, how do you integrate with low-code/no-code business logic, applications, and experiences? How can you make sure that someone without a coding background can actually consume this platform?

Additional considerations include having a resource governor, creating thoughtful extensibility models, and applying concepts like isolation and compartmentalization.

You mentioned capacity. That's a good one. You always need to think about it, and not just for your own needs; now you need to think about everybody else's needs and how you service them. How do you get the signal from people? If it's an internal platform, it's probably fairly straightforward, but you still need to work through it. Then there's the problem called the noisy neighbor. Say you now have a platform, and all of a sudden my friend David is using it, and my friend Eric is using it. Eric has written very bad code; David has written clean code. Eric is hogging all of your resources, so you need to ask whether you need a resource governor: effectively, a technology that watches for abuse and either throttles or removes the noisy neighbor.
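A resource governor is often built on per-tenant rate limiting. The sketch below uses one token bucket per tenant, which is one common technique among several (others include concurrency caps and weighted fair queuing). The `ResourceGovernor` class and its parameters are hypothetical, and this version only throttles; it does not evict tenants.

```python
import time
from typing import Callable


class ResourceGovernor:
    """Per-tenant token-bucket throttling (a sketch, not a product API).

    Each tenant can burst up to `capacity` units of work, and its allowance
    refills at `rate` units per second. Requests beyond the allowance are rejected.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so tests can control time
        self._buckets: dict[str, tuple[float, float]] = {}  # tenant -> (tokens, last refill time)

    def allow(self, tenant: str, cost: float = 1.0) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(tenant, (self.capacity, now))
        # Refill for the time elapsed since this tenant's last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[tenant] = (tokens, now)
        return allowed


governor = ResourceGovernor(rate=5, capacity=10)
eric = [governor.allow("eric") for _ in range(50)]  # a burst from the noisy neighbor
print(eric.count(True))         # about 10; the rest are throttled
print(governor.allow("david"))  # True: David has his own bucket
```

Because each tenant has a separate bucket, Eric exhausting his allowance does not consume David's.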

Eric's point about extensibility is an important one. If you allow code to be built on top of your platform or integrated with its functions, like a SQL Server UDF, you need to think it through and ask: can this code potentially impact the availability of my platform? Because you're now serving many people, you suddenly have requirements that may rule out letting such code run freely inside the platform. If you go back to SharePoint, for example, before the cloud versions showed up, the extensibility model was built right into SharePoint itself, so upgrading the solution was very, very complicated. The SharePoint lifecycle and your code's lifecycle all had to work together. A lot of ERP systems still work like this.

Look at what SharePoint and the Dynamics team have done: they have now externalized custom code extensions, which call back into the core engine through APIs and events. That has a number of benefits, because you can still extend the platform, but now the platform and the custom code have separate lifecycles, so the platform can evolve independently of the custom code associated with it. I think those are some of the patterns you need to dig deeper into.
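The externalized extension model can be sketched as a core engine that publishes events and lets extensions subscribe. This is a minimal in-process Python illustration with hypothetical names (`CoreEngine`, `save_document`); a real platform would typically deliver events to out-of-process extensions through a queue or webhook, so each side keeps its own deployment lifecycle. The `try`/`except` around each handler shows a basic form of isolation: a failing extension is recorded rather than allowed to break the core operation.

```python
import logging
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class CoreEngine:
    """Hypothetical core engine that exposes events instead of hosting extension code inside itself."""

    def __init__(self) -> None:
        self.documents: list[str] = []
        self.failed_handlers: list[str] = []
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Extensions register here; they ship and version separately from the engine."""
        self._subscribers[event_type].append(handler)

    def save_document(self, doc_id: str) -> None:
        self.documents.append(doc_id)  # core behavior completes without any extension code
        self._publish("document_saved", {"id": doc_id})

    def _publish(self, event_type: str, payload: dict) -> None:
        for handler in self._subscribers[event_type]:
            try:
                handler(dict(payload))  # give each extension its own shallow copy of the event
            except Exception:
                # Isolation: log and record the failure instead of letting it break the engine.
                name = getattr(handler, "__name__", repr(handler))
                logging.exception("extension %s failed", name)
                self.failed_handlers.append(name)


audit_log: list[str] = []


def audit_extension(event: dict) -> None:
    audit_log.append(event["id"])


def buggy_extension(event: dict) -> None:
    raise RuntimeError("extension bug")


engine = CoreEngine()
engine.subscribe("document_saved", buggy_extension)
engine.subscribe("document_saved", audit_extension)
engine.save_document("doc-1")
print(engine.documents, audit_log, engine.failed_handlers)  # ['doc-1'] ['doc-1'] ['buggy_extension']
```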

You might also ask whether you want to allow extensibility at all. Originally, when we were calling Office 365 by another name, there was no extensibility because the team hadn't figured out the extensibility model yet. So we simply cut it all off, saying here's SharePoint, and you can use it or not, but you can't extend it. It might be okay for a V1 platform not to support that kind of thing and to grow into extensibility with more consideration. Isolation and compartmentalization are two of the qualities you really want to pay attention to.

Architects should also think of data as a platform.

One of the new platforms we are seeing is one that people don't think of as a platform, because it's not code. It's actually data.

We used to design data by use case. The use case requires some data, so we create a data model of some sort. Whether it's relational or non-relational doesn't really matter. There is a schema of some kind, you write code against that schema, and everybody lives happily ever after.

What is really happening now is that people are starting to turn this around: we built the first data set for one or two use cases, but now that we have the data set, what other use cases can it serve? Can I actually build on top of this data set? Instead of a one-to-one relationship between use case and data, you now ask how many use cases a data set can serve as a platform for. That way you avoid silos. That's another big problem platforms solve: there are too many silos everywhere that need to be maintained, and a platform lets you get more value out of the investments you've already made. Decoupling data from individual use cases makes that relationship very plastic.
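Turning the relationship around can be made measurable by indexing a data catalog by data set rather than by use case. The sketch assumes a hypothetical catalog export of `(use_case, dataset)` pairs; data sets that still serve only one use case are flagged as possible silos.

```python
from collections import defaultdict


def use_cases_per_dataset(catalog: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Invert (use_case, dataset) pairs so each data set lists every use case it serves."""
    index: dict[str, set[str]] = defaultdict(set)
    for use_case, dataset in catalog:
        index[dataset].add(use_case)
    return dict(index)


def silo_candidates(index: dict[str, set[str]]) -> list[str]:
    """Data sets that serve exactly one use case, i.e. still a one-to-one relationship."""
    return sorted(ds for ds, uses in index.items() if len(uses) == 1)


catalog = [
    ("churn_model", "customers"),
    ("billing", "customers"),
    ("marketing_campaigns", "customers"),
    ("warehouse_app", "stock_levels"),
]
index = use_cases_per_dataset(catalog)
print(len(index["customers"]))  # 3
print(silo_candidates(index))   # ['stock_levels']
```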

It's very interesting to have this two-way relationship between the platform dictating the applications and the applications dictating the platform.

To hear the whole conversation, you can watch the videos, Part 1 and Part 2.

Published on: December 08, 2022
