Bryan Friedman

The Evolving Technologist: Adventures of a Recovering Software Generalist

SpringOne Platform 2018 - Let’s Get Technical

SpringOne Platform is known for showcasing some of the most compelling customer stories you’ll find at any tech conference. Last year, we heard from many leading companies about how they are getting better at software. This year, there were more amazing tales of transformation from enterprise leaders. It’s safe to say that “it’s still about outcomes.”

But behind all these great outcomes is a lot of cool tech! I attended quite a few technology-focused sessions this year. They got me excited about the various announcements throughout the week. There was all the stuff you’d expect at a conference called “SpringOne Platform” — new versions of Spring components, Java 11 talk, and platform releases for PCF and PKS. Then there were so many other tech topics that showed up too. I found these five to be the most intriguing:

1. Continuous Everything (CI/CD)

We’ve heard the virtues of continuous integration and continuous delivery at SpringOne before. There’s plenty to be found on the power of Concourse as a CI tool. Even PCF operators are big on Concourse and its ability to provision and repave the platform. This year, there was talk of inviting a new friend to the party.

What Happened?

Pivotal announced that it “has a team working on contributing Cloud Foundry support to open source Spinnaker.” Spinnaker already supported AWS, Azure, GCP, Google App Engine, Kubernetes, and other platforms. Now, Pivotal is ensuring that Cloud Foundry is a first-class citizen of Spinnaker. Jon Schneider covered this on the main stage and in detail in a breakout session.

Why Is It Cool?

Spinnaker is one of the few true multi-cloud delivery platforms. Started by Netflix, it has contributions from Google, Amazon, Microsoft, and now Pivotal. There are two essential components: a multi-cloud application inventory and pipelines.

The inventory piece is critical, since applications rarely live on a single platform. Spinnaker presents an aggregate view of all your applications, clusters, and instances. (It can do this without even having deployed them.) This allows users to determine their application health and state across platforms. It also means Spinnaker is distinctly able to run out-of-band processes. As a result, it supports running things like vulnerability scanning or chaos engineering tooling at build time.

Along with the inventory, as you’d expect from a CD solution, Spinnaker offers pipelines. Even if you are a user of Concourse, Jenkins, or other CI tools, Spinnaker is best suited to help with these delivery aspects of your pipeline.

How Can I Get Started?

Check out Spinnaker on GitHub and at https://www.spinnaker.io/. Keep an eye out for the 1.10 release which will include an early version of Cloud Foundry support.

2. Secure Credentials

Presentations about security topics don’t always offer the most gripping demos. Still, I was very interested in a few of the breakout sessions on CredHub, the credential manager that’s baked right in to Cloud Foundry. It turns out, security can be seductive.

What Happened?

CredHub is the credential manager that’s baked in to Cloud Foundry. In his three sessions, Peter Blum offered a few different looks at CredHub. There was a great overview of how it works with PCF and Spring. His most fascinating session, though, brought the magic of CredHub together with Kubernetes.

In his example, a webhook object in Kubernetes injects CredHub into pods on the cluster. Then application code in the pods may access secrets from the credential store. It was a slick demo and an incredible way to show off CredHub’s simplicity and strong capabilities. Peter’s CredHub with Kubernetes code is on GitHub!

Why Is It Cool?

CredHub offers a secure way for humans and applications to interact with secrets. With Pivotal Application Service and the CredHub Service Broker, developers never have to know or see any passwords. Passwords are only available to application containers with authorized access. Each application container includes a signed certificate and key. This key provides identity when communicating with CredHub.

How Can I Get Started?

There are tons of amazing CredHub resources out there. Check out some of the recent blog posts from my colleagues:

You can also go straight to the CredHub Docs or the GitHub repo for more detailed info. For you Spring buffs, there’s even a Spring CredHub project.

3. Serverless

What’s a tech conference today without the mention of serverless? SpringOne definitely had its share of serverless moments. Of course, Pivotal Function Service (coming soon) got a shout-out from Onsi Fakhouri. Plus, there were plenty of other details covered about Knative and riff at the conference.

What Happened?

Mark Fisher did a live demo of riff on the main stage. There were also some very informative and demystifying sessions on Knative and riff. They ranged from YAML-heavy to YAML-free with one especially for Spring developers.

Why Is It Cool?

At SpringOne Platform last year, Pivotal announced riff, an open source serverless framework. Earlier this year, Pivotal revealed that riff was replatformed on top of Knative. This is the technology that is driving Pivotal's serverless future. Knative and riff will power the yet-to-be-released Pivotal Function Service.

How Can I Get Started?

Check out https://projectriff.io and https://pivotal.io/knative for more details. You can also find both riff and Knative on GitHub.

4. Buildpacks Everywhere

Containers and Kubernetes are hot topics at conferences. What's even hotter? Taking control of the application lifecycle in a container-centric world. Developers want a fast and secure way to get from source to container. It’s something the Cloud Foundry community has had solved for a while with buildpacks. Now this solution is expanding.

What Happened?

Day 1 main stage had a surprise ending from Stephen Levine from Pivotal and Terence Lee from Heroku. They introduced an effort to bring buildpacks to the broader cloud-native community. It's called Cloud Native Buildpacks, and it joins the CNCF as a sandbox project today.

Why Is It Cool?

Buildpacks are an “opinionated, source-centric way to build applications.” They are a big part of the magic behind Cloud Foundry’s `cf push` experience. Buildpacks detect the kind of app then fetch and install the tools needed to run it. For operators, the ability to manage a curated set of buildpacks is attractive. It also allows for rapid, secure patching en-mass using remote image layer rebasing. All the while, developers simply focus on delivering value for their own customers. The new specification and set of tools enable buildpacks to be used on any platform.

How Can I Get Started?

Check out https://buildpacks.io/ for more info. Meanwhile, use the `pack` CLI to experiment with Cloud Native Buildpacks.

5. Reactive Programming

Reactive programming is not a new concept for SpringOne Platform attendees. I vividly remember Phil Webb’s awesome keynote from last year comparing blocking with non-blocking. (Who can forget the swimming ducks and cats?) This year there was more Reactive-related fun.

What Happened?

There were two impressive keynotes relevant in the Reactive programming space. First, there was the introduction of the non-blocking relational database connectivity driver, R2DBC. We also learned about RSocket, a new message-based application network protocol.

Why Is It Cool?

In two articles on InfoQ, Charles Humble examines both R2DBC and RSocket. He does an amazing job explaining the advantages of Reactive programming. As Pivotal's Ben Hale explains in one article, "Reactive programming is the next frontier in Java for high efficiency applications." He points out two major roadblocks to Reactive programing: data access, and networking. R2DBC and RSocket aim to address these problems.

I found RSocket to be particularly fascinating. In the main stage presentation, Stephane Maldini gave a brief but helpful history of TCP and HTTP. He framed RSocket as an alternative to these protocols while sort of bringing the best of each to bear. Rather than simply request/response, RSocket offers four different interaction models. (They are Request/Void, Request/Response, Request/Stream, and Stream/Stream.) What's more, it's language-agnostic, bi-directional, multiplexed, message-based, and supports connection resumption. It kind of blew my mind.

How Can I Get Started?

As always there’s a .io site for RSocket (http://rsocket.io/) and an RSocket GitHub repo. R2DBC is on GitHub too. It’s also worth checking out the related content from the conference. Ben Hale covered both R2DBC and RSocket in his sessions.

Next Year in Austin!


My Interoperable Opinions of Cloud Foundry Summit 2018

Last week I visited Boston for the first time and attended my very first Cloud Foundry Summit. I also took the opportunity while I was there to make my first visit to Fenway Park. It was a fabulous week of firsts for me.

As with any conference, one measure of excellence is the amount of quality examples of customer success stories. It's also nice to see compelling demos of new and interesting technology. CF Summit 2018 did not disappoint in either of these departments. In fact, my colleagues have already written quite eloquently on these topics. So I'll spend some time on something else that was a key theme of the conference.

Interoperability FTW

Interoperability was an explicit thread through many of the keynotes and breakout sessions. Cloud Foundry Foundation CTO Chip Childers even hinted at this trend back in January.

To be sure, Cloud Foundry tech has always championed interoperability. It's multi-cloud. It's polyglot. It's OCI-compliant. The Open Service Broker API was even born of Cloud Foundry. (It's now been adopted by the Kubernetes community.) It was fantastic to see these concepts expand even more this year.

There was the introduction of Alibaba Cloud as a BOSH CPI. Some awesome advances in .NET support appeared (plus a whole conference track to go along with it). Kubernetes was also mentioned quite a bit as the Cloud Foundry Container Runtime continues to take hold.

Indeed, it's nice to see this interoperability movement flourish. Still, I couldn't help but think of how it relates to another critical part of Cloud Foundry's success.

Opinions Are Like... Everybody's Got One

Yes, it embraces interoperability. Yet Cloud Foundry has always been billed as an opinionated platform. So it's important to point out that "interoperable" and "opinionated" are not mutually exclusive. But they are equally important characteristics for an effective platform. Interoperability without opinions runs the risk of becoming complicated or difficult to use. But of course, opinions without interoperability may prove irrelevant. After all, a good platform has to be able to handle many types of workloads. It should integrate with the services and technologies that you need to use.

So both are important. But in my previous life working in IT, I'll admit I wasn't in the opinionated camp. I didn't even understand it as a concept. I generally went for selecting software with the ultimate flexibility. What I didn't realize was how often this led to analysis paralysis and decreased productivity.

I remember one of the last projects I worked on. We were selecting a software product for financial planning and reporting. Ideally, we'd have found a solution that did 80% of what was required. We should have reevaluated the actual importance of the other 20% we thought we needed. Instead, we focused on that 20% until we settled on something that could handle it. Then implementation details, changing requirements, and complex technology got in the way anyway. As I recently heard one industry analyst say, "Choice is not a differentiator."

Unfortunately, I had not yet learned about the value that opinionated software can bring. It's about a simplified user experience and increased productivity. I like how Duncan Winn describes it in his book, Cloud Foundry: The Definitive Guide:"

When you look at successful software, the greatest and most widely adopted technologies are incredibly opinionated. What this means is that they are built on, and adhere to, a set of well-defined principles employing best practices. They are proven to work in a practical way and reflect how things can and should be done when not constrained by the baggage of technical debt. Opinions produce contracts to ensure applications are constrained to do the right thing.

Platforms are opinionated because they make specific assumptions and optimizations to remove complexity and pain from the user. Opinionated platforms are designed to be consistent across environments, with every feature working as designed out of the box. For example, the Cloud Foundry platform provides the same user experience when deployed over different IaaS layers and the same developer experience regardless of the application language. Opinionated platforms such as Cloud Foundry can still be configurable and extended, but not to the extent that the nature of the platform changes...

That last part is key: "...can still be configurable and extended..." Remember, interoperability still matters. It just can't happen at the expense of complexity. That's why something like the Open Service Broker API is so elegant and powerful.

There's an interesting nugget there at the beginning of Duncan's description too: "...they are built on...well-defined principles..." It's not only how the software works but also what it's built on. The architecture is opinionated as well. A lot of times that means selecting a particular set of technologies or patterns and incorporating them together in a specific way. Basically: curation.

An Ounce of Productivity is Worth a Pound of Curation

Okay, so this play on a Benjamin Franklin quote isn't exactly a perfect analogy. But the point is, as I've recently heard a customer quoted: "Curation is how we get stuff done!"

In the consumer world, we enjoy the benefits of curation daily. We trust companies like Netflix to suggest movies and television we will like. We look to Amazon to tell us what we like to buy. Our Facebook and Twitter feeds are filtered for us. These are the modern giants of content curation. They use algorithms and AI to keep things relevant, but people still drive the behavior. Plus, think about traditional television or radio news, or even used bookstore or boutique owners. We embrace curation in our daily lives.

In the business and IT world, however, it seems like curation is often avoided. Remember the 20%? Sometimes the customer knows better and doesn't buy into an opinionated architecture. They insist on defining it themselves. It's true that curation may not be for everyone. Under the right circumstances, though, it can help save a lot of time and headaches. Determine where you are on the curation scale and pick the right solution. If you trust the curator, they can help.

At CF Summit, I attended many talks about Kubernetes and its role within the Cloud Foundry ecosystem. As Onsi Fakhouri spoke about at SpringOne Platform late last year, it's an and conversation, not or. It's not about Kubernetes vs. Cloud Foundry, but rather how can they interoperate? Or, more specifically (and more opinionated), how should they interoperate?

This was a popular topic at CF Summit this year. Right now, Cloud Foundry has a few ways it interoperates with Kubernetes. Most prominently it's a separate container runtime (as opposed to the application runtime). Some things fit better on the container runtime (like stateful workloads, ISV container images). Some are made for the application runtime (12-factor apps, microservices, etc.). The opinion right now is that it all depends on the use case.

Other examples and conversations about Kubernetes interoperability showed up at the conference too. There were products that include CF running on top of K8s and demos showing K8s running within CF. As a first-time attendee, it was amazing to see the open discussion and sharing of ideas. That's the beauty of open source software and its community. It can evolve to incorporate (read: "curate") other growing technologies and find the right (read: "opinionated") way to put it all together. (For Cloud Foundry, it doesn't just mean Kubernetes either. Look at how the code base has begun incorporating Envoy for another example.) It will all come together in the way that makes the most sense for the user experience. In the end, that's all that should matter.

It's All About the Outcomes

Technology is a great enabler. We can't do technology for technology's sake. Containers are cool. Machine Learning is fun. Yes, there are some amazing pieces of tech out there. Except it's not about the tech itself, but rather what it enables for its users. It's the user experience, the productivity gains, the value, that matters.

Ultimately, technology should be about doing things better, faster, more reliably. That's the level that all software curation conversations should arrive at: customer outcomes. Whatever the future of Cloud Foundry and Kubernetes brings, we can't forget the fundamental goal: build software better.


Five Things that Blew My Mind at SpringOne Platform 2017

Richard Watson from Gartner led a customer panel in the final round of keynotes at SpringOne. In it, he asked the company leaders what blew their minds during the conference. Of course, it got me thinking about what blew my mind at SpringOne Platform this year. Here's what I came up with, in no particular order.

1. A High Quality Event

I've been to a fair amount of conferences in my career, and this one was truly top notch. Conferences are often draining and it can be hard to keep up the excitement throughout the week. This event felt elevated from the moment I checked in at Moscone Center. You had to be there to feel it I guess, but all these things contributed to the greatness:

  • Signage and graphics looked amazing and were well themed. Complete with ASCII art and 8-bit renditions of the keynote speakers.
  • The main stage room was incredible, and the keynote speaker lineup was tremendous. It was a nice mix of tech talks, customer stories, and philosophy. Everyone seemed to engage for the full two hours. That's quite a feat.
  • The breakout sessions were right-sized, on point, and on schedule. And they were well attended! During sessions, the hallways were empty, with only a few stragglers at some booths or on laptops.
  • It had a fun vibe! Lots of discussion and socializing. Plenty of power strips everywhere. Coffee, drinks and food available at regular intervals. There were even old school arcade games!

2. Open Source is Thriving in the Enterprise!

When I saw links to GitHub repos in the Comcast and Intuit sessions, it was another mind blowing moment. It's been a long road, but we're finally there. Open source is in the enterprise for real. And I'm not talking about using open source software, though that's impressive too. I mean that enterprises are contributing code back to the open source community.

Comcast has a lot of stuff out there, including a BOSH release for telegraf. Intuit showed off a validator and inspector for Spring Cloud Config. Other companies using PCF like Home Depot and Mastercard seem to have thriving public GitHub repos as well. What a time to be alive.

3. Windows and .NET at a Spring Conference!?

The announcement of PCF 2.0 highlighted some key Windows-related features. First, native Windows Server 2016 containers for .NET workloads. In one demo, Richard Seroter showed off ssh-ing directly into a Windows container. Typing dir into an ssh window may feel weird, but what a relief for .NET developers.

Speaking of feeling weird, how about displaying hardware at a software conference? That's right, PCF 2.0 will have beta support for Azure Stack. The Microsoft booth had a working Dell EMC server cabinet to showcase it. Mind blown.

4. A Who's Who of Cloud Native Celebrities

There were plenty of cloud famous folks to be found in both the keynotes and breakout sessions. I'm not ashamed to admit I had my fair share of geek out moments during the week. I've followed a lot of these tech personalities on Twitter for a long time, even before I joined Pivotal. So getting to see or meet a lot of them in person for the first time was super cool. It's like bringing my Twitter feed to life.

People like James Watters, Andrew Clay Schafer, Onsi, Coté. Or legends of the Spring community like Juergen Hoeller, Phil Webb, Kenny Bastani and Josh Long. There was an entire panel of brilliant women — Cornelia Davis, Meagan Kjelland, Therese Stowell, Erin Schnabel, and Mathangi Venkatesan — talking about distributed systems. Other giants of the tech community outside of Pivotal even made appearances — Chip Childers from the Cloud Foundry Foundation, Erich Gamma of Microsoft, and Google's Eric Brewer.

I know that's a lot of name dropping. But it really was an incredible showing of very smart and talented professionals. The best part about all of this is how lucky I feel to be able to call so many of these people colleagues now. That realization is what blew my mind the most.

5. Thoughtful Analyst Community

Finally, I have to drop a few more names so I can share the amazing interactions I had with the analyst community. RedMonk's James Governor gave a thought-provoking keynote. Richard Watson of Gartner led the aforementioned customer panel. And Dave Bartoletti from Forrester gave a great session on cloud native ops superpowers.

But it was the personal interactions I had with these analysts this week that had the most impact for me. It's one of the great privileges I have in my role at Pivotal now. I get to have insightful, relevant conversations with these folks. Doing it in person is always an even more superior experience. The questions they had about platforms and the product landscape alone blew my mind. I appreciated their thoughts and observations this week. I look forward to more mind-blowing 😲 action next year in Washington, D.C.


My First SpringOne Platform

After two full days at my very first SpringOne Platform, my head is spinning. At times I've felt excited, lucky, proud, impressed, and overwhelmed -- sometimes all at the same time. So what's the best thing to do when I'm feeling all the feelings? Write about it!

I've been having lots of thoughts that I can't shake in two key areas so I want to share about them.

PCF 2.0: It's a Cloud!

During the keynote on Tuesday, among a slew of announcements, Onsi Fakhouri unveiled PCF 2.0. I'm not going to get into the details here, but you can (and should!) read all about it and watch Onsi's incredible presentation if you haven't already.

A few months ago, I caught a glimpse of what was coming with PCF 2.0. When I saw a rough sketch of the "four pillars" on a whiteboard, I thought "Hey! That's a cloud!" It sounds silly to me now. Of course it's a cloud! It's right there in the name. Pivotal Cloud Foundry. And PCF 2.0 is its natural evolution.

To be clear, I'm not interested in having a "what is the cloud?" discussion. (I already get that with my family when they ask me what it is that I do.) Still, it's fair to say that the cloud encompasses many things these days. Public clouds now offer such a breadth of products and services that it's hard for some customers to keep up. At the same time, customers have more and more types of workloads and want more and more choices.

All the public clouds have an app service, a container service, and a functions (serverless) service. Some have more than one of each! They all also offer many data persistence and messaging services. So the concept of Pivotal Cloud Foundry offering these same products makes total sense. PCF is staying just opinionated enough. Like Richard Seroter commented in his summary of Day 1, customers will have choices, but not too many. The reality is that customers are running in on-premise data centers. They need workloads to run in hybrid or multi-cloud environments. IaaS isn't enough to constitute a "private cloud" anymore. But PCF 2.0 sure is. (And it's not even limited to that. It runs on public clouds too, remember!)

Everyone at SpringOne Platform seems pretty pumped about the announcements. But I've heard of other folks wondering why Pivotal introduced PKS when they already have PAS. Some may wonder why anyone would still use PAS once they have PKS. And of course, there are many who don't yet understand what role serverless has to play and why PFS is even a thing. It's simple. They are all choices. Did anyone ask Amazon why they didn't kill their app service once they launched a container service? As Onsi said in his talk, the conversation is not an "OR" conversation. It's an "AND" conversation. PCF will be able to handle all customer workloads.

During Wednesday morning's keynote, I felt a little like a kid eating ice cream for the first time. It was riveting watching Kim Bannerman and Meaghan Kjelland do a PKS demo and seeing Mark Fisher show off riff. There is such an exciting future for PCF and I'm stoked I get to go along for the ride.

Did Somebody Say "Digital Transformation"?

SpringOne Platform is full of developers and technology enthusiasts. There are plenty of tech talks and deep dives into code and platform architecture. I love that stuff and I attended a few sessions like that. Mostly though, I opted to attend the more customer-driven sessions. I haven't yet gotten to talk to enough customers in my time here, so I wanted to see the success stories up close.

See, I worked in IT at a large enterprise for 11 years. I saw how things run in an organization like that. I've been gone for more than 3 years, but I still know people there. Not very much has changed. They can get VMs provisioned a little faster now, but that's about it. So while I work for a company whose mission is to "transform how the world builds software," my experience in enterprise IT is so tainted, it has still been hard to fathom that it's actually possible.

But believe me, it is. Digital transformation is real, and it's spectacular. It's true that "digital transformation" as a term may be over used. It's probably the phrase I heard the most during all the sessions (aside from maybe "we're hiring"). The thing is though, buzzword or not, companies are actually doing it. And Pivotal is making it possible.

I listened to industry giants from many sectors — telco, banking, insurance, government, automotive — all tell amazing stories. It was inspiring. Refreshing even. It was beautiful. I found myself feeling sorry for my younger self, stuck in the past and trapped in a cloud-foreign world. It may sound hyperbolic, but I'm not kidding when I say there were moments of shock and awe. It's like meeting Big Foot. You've heard the rumors, you know the legend, but it's not real until you see it.

Of course, these companies' journeys aren't over. Far from it. They know that. They all said it. But they know the path now. They have the confidence they need to move forward. Or at least to move. Pivotal showed them the way and continues to partner with them on their journey. Like Onsi said, it's all about learning.


Comparing Public PaaS Offerings - Part 2

In my last post, I explored what it's like to deploy a Spring Boot app to four different public PaaS products. Now that the app is live, we have to manage it. As promised, for this post, I'll build on the experience by going beyond the Day 1 deployment. I'll try out some of the features of each platform associated with Day 2 operations.

Once again I'll look at three major public cloud players (AWS Elastic Beanstalk, Microsoft Azure App Service, and Google App Engine), as well as a third-party option that can run on a public or private cloud (Pivotal Cloud Foundry).

FULL DISCLOSURE: I work for Pivotal. I’ve also worked in the IaaS product space for 3 years. I have more than 10 years of experience working in enterprise IT. I’d like to believe I can remain pragmatic and present a fair view of related technologies.

I'm still not going to pick a winner at the end or make any claims about price or performance along the way. I'm focusing on the ongoing maintenance and management of the app once it's deployed. As before, I'm only interested in exploring the experience that each service provides.

By no means will I (or could I) cover everything required to keep an app up and running. I'll examine three key areas of Day 2 operations — observability, resilience, and patching.

Observability

The term "observability" comes from control theory and linear dynamic systems. It's "a measure of how well internal states of a system can be inferred by knowledge of its external outputs." In the software world, there are probably some differing opinions about its meaning. It sometimes gets used synonymously with "monitoring" and "logging."

In her post "Monitoring and Observability", Cindy Sridharan does a nice job of breaking it down. She borrows from Twitter's Observability Engineering Team's charter, and I like the definition. Observability is a superset of things. It has monitoring, but it also includes log aggregation, alerting, tracing, and visualization.

For each PaaS, I'll review some of the features offered that relate to any of these areas.

Note: The public clouds do tend to have a standalone service that covers lots of these features. Amazon has CloudWatch, Azure has Azure Monitor, and Google has Stackdriver. I'll try not to stray too far from the PaaS itself, and won't go into a ton of detail on these offerings. I'll just highlight where it's relevant and integrates with the PaaS product.1

Spring Boot Actuator

Since I'm running a Spring Boot app, I would be remiss not to first mention its Actuator. The Spring Boot Actuator exposes handy built-in HTTP endpoints for monitoring an application. (It supports adding custom endpoints as well.) For example, a /health endpoint shows application health information. This can be useful when setting up alerts to notify operators if a status changes from UP to DOWN for instance.

(By default, most of the endpoints are not routable without authentication. This is easy to enable with Spring Security, but I disabled it for the purposes of this exercise.)

It's super simple to include the Spring Boot Actuator as part of a Spring Boot app. It only requires adding a dependency to the pom.xml file.

I'll take advantage of the /health endpoint when using various platform monitoring features.

AWS Elastic Beanstalk

From a UI perspective, AWS does a nice job of making the observability features easy to find. There are menu items on the left navigation for Logs, Health, Monitoring, and Alarms. The Health section shows an overview of the application status. It includes other metrics like response codes, latency, and CPU utilization. (A similar view is also available from the CLI by using eb health.)

By default, it uses a TCP connection on port 80 to determine application health. We can set it to use a specific HTTP request instead from the Health section of the Configuration area. We'll start by setting the "Application health check URL" to the /health endpoint.

This sets the health check URL for the load balancer that sits in front of the application. We could also modify the Elastic Load Balancer (ELB) health check settings directly if we chose to. It supports different timeout and interval durations as well as customizable thresholds. This is set through the Load Balancer area of the EC2 service.

Back in the EB console, the Monitoring dashboard offers some nice visualizations on metrics like requests, health, and latency. It's also customizable, depending on what CloudWatch metrics are available.

This is the place where Alarms can be set up as well to send notifications based on certain thresholds. Here, I've set up an Alarm to notify me when CPU usage goes above 90% for 5 minutes.

And finally, Logs. You can easily download the last 100 lines or the full set of logs through the UI. As I mentioned last time, there is no simple interface for streaming or viewing them. Not a big deal, but you're stuck downloading and viewing in your browser or favorite text editor.

The last 100 lines option concatenates the most common logs together, but only the last 100 lines of each. The full log download isn't aggregated at all. Web server, application server, errors, platform operations— they each have their own output.

Downloading logs as above (or using eb logs) is what I used to do basic troubleshooting to get the app up and running. These logs only stick around for 15 minutes, so there are some options for log persistence. For one, they can be rotated and published to Amazon S3. AWS also offers integration with their CloudWatch monitoring service. This is the way to get streaming logs if you want them.

After enabling log streaming in the Software Configuration section, logs appear in CloudWatch. From there, they can be viewed in real time or sent off to Elasticsearch or S3 as needed.

Azure App Service

As with everything in Azure, the user interface can feel overwhelming at times. To discover what monitoring options Azure offers, I found this overview document helpful. It's more about monitoring Azure services holistically, but can be applied here too. It describes three tools and gives examples of when to use which one, which is useful. The tools are Azure Monitor, Application Insights, and Log Analytics.

The Azure Monitor interface is the consolidated UI view for all these services. From there you can create and configure metrics and alerts. Remember, this is a global view across all Azure resources, so you always have to target the specific App Service. Alternatively, you could access these settings on the App Service blade. Metrics are available on the "Overview" dashboard linked to from the top of the App Service left menu.

By default, visualizations appear for key metrics like requests, response time, and errors. The dashboard is pretty customizable though, and you can pin charts as desired. Clicking on a chart lets you configure metrics and other simple options like type or time range.

From here you can set alerts as well.2 The "+ Add metric alert_"_ button will let you create an alert on various metrics like CPU time or response codes. It also supports alerts for events like successful stop or failed start. There doesn't seem to be a way to configure a health check endpoint though.

The Azure Monitor interface also provides access to Application Insights and Log Analytics. Application Insights is Azure's APM tool. (I'm not going to go into detail on APM in this post.1) Log Analytics supports capturing various details like infrastructure logs and Windows metrics. Unfortunately, it doesn't appear to support capturing application logs. (I did find this blog post on how to programatically push application logs to Log Analytics but I didn't try it.)

So Azure Monitor isn't super helpful for App Service users after all. We're back to the App Service blade menu to view application logs. You must first turn them on in the "Diagnostics Logs" section, and then you can view them in the "Log stream." You can also choose to store them in Azure storage.

The one other place I found worth exploring was the "Advanced Tools" option. It uses the Kudu project to provide various tools including a log stream, among other things.

While I didn't get into the CLI much, there are options there as well. Tail and download logs with with az webapp log or configure alerts and metrics with az monitor.

Google App Engine

Most of Google's Day 2 operational functions are part of GCP's Stackdriver product. With Stackdriver, you can set up Alerting and Uptime Checks and view dashboards. (It also supports Debugging and Tracing if you've enabled your project for them.1) From within the native GAE interface, we do have access to a few things. We have a basic dashboard for viewing certain metrics over time. We can also view streaming logs for each instance or version running. This links to the Stackdriver Logging area where we can stream and search through logs.

I used this interface plenty while deploying the app to discover problems. It's the best logging interface I saw from any of the public cloud PaaS products. It even supports exporting logs to GCP storage services and creating metrics based on the content of log entries. Of course the CLI offers gcloud app logs as well.

For anything beyond the logs, we have to move on to a full-fledged Stackdriver account. You have to select the project to monitor, since it applies to all GCP services, not just App Engine. There are two account types — free or premium. Premium provides longer retention times and more customizations, but costs extra. I signed up for a 30-day trial of premium features.

After activating a Stackdriver account, it provides instructions on installing a monitoring agent. This enables collecting even more information from the VM than is available from GAE alone. It would be nice if this were more automated or even already included. It wouldn't be too hard to script I suppose, but I didn't go through with it for this exercise.

Strackdriver lets you create alerting policies, uptime monitors and custom dashboards.

Alerting policies are super granular. They let you create notifications based on many different conditions. (Some are only for premium users.) It supports not only metrics, but health alerts as well.

Again, I created a basic condition to alert after 5 minutes of CPU usage above 90%.

After setting the condition, you set the notification mechanism. There are plenty of choices, particularly for premium users, but I opted for simple e-mail for now. Then you name the policy and optionally set a message to go along with the notification which is a nice feature.

Uptime monitors (health checks) are also configured in Stackdriver. You set the type, path and polling interval. It even supports advanced settings like custom headers, authentication, and response text matching. I set up a simple check against the Actuator's /health endpoint. From there you can set up an alert policy as above to notify when there's a problem.

Note: These settings are separate from the health checks you can set in the app.yaml file. Those seem to handle where to send load balancer traffic to as opposed to any kind of alerting.

Pivotal Cloud Foundry

By default, PCF performs health checks using a TCP port to determine whether to route traffic to a given instance. If a connection can be established within 1 second, it is considered healthy. For HTTP apps, PCF also supports setting a health check endpoint using either the CLI or the manifest. In this case, it expects to receive a 200 OK response within 1 second. (The PCF documentation provides great detail around how health checks work.)

From the command line you can set the health check at time of deployment with the -u parameter. You can also set it after the fact with the cf set-health-check command and an app restart. I've updated the manifest file in my GitHub repo to include the health check settings. Here, I also show how to use the CLI to set the endpoint:

cf set-health-check friedflix-media-tracker http --endpoint /health

As for logging and metrics, these are some of PCF's strongest areas. The Loggregator system collects all logs and metrics from apps and platform components and streams them to a single endpoint. The logs can be viewed from the Logs page in Apps Manager. They can also be retrieved using the cf logs command from the CLI.

There is also the option to launch PCF Metrics (from the Overview tab) for a closer look at the data. PCF Metrics stores logs, metrics data, and event data from the past two weeks.

PCF Metrics displays graphical representations of the logs, metrics, and event data. It includes data views for container and network metrics (CPU, latency, etc.), app events, and logs. This is very handy for helping operators and developers troubleshoot problems. For example, when the events view shows a crash, it can be correlated with corresponding container and network metrics and the log output for that same time period.

If these built-in logs and metrics tools aren't enough, there is yet another option. The endpoint where logs and metrics get sent is called the Firehose. PCF supports configuring plugins, called nozzles, for the Firehose. They can send custom data to the log stream, or have an external service consume data from the stream. Write a custom nozzle, or use one of the Marketplace offerings. Tools like New Relic or Datadog can be set up this way to perform more advanced monitoring and alerting.

Finally, since we used the Spring Boot Actuator, we have a few extra features to explore. PCF actually offers quite a few nice integrations with the Spring Boot Actuator. Thanks to the /health endpoint, we can view the app health right from within Apps Manager on the Overview tab:

The /dump and /trace endpoints allow us to view the thread dump and request traces right from the UI as well.

We can even configure logging levels and filter which loggers to show. This is all done right from Apps Manager without even redeploying the app.

Impressions

  • AWS Elastic Beanstalk has some handy options, and the interface is pretty easy to use. Most of the more advanced metrics and logging features require getting deeper into CloudWatch though.
  • Azure, as before, required me to hunt through a lot of documentation to figure things out. The user interface is at best inconsistent. The UI is definitely a pretty big weak spot for Azure in general. It's not enough to completely deter usage, but it's something to consider for heavy web portal users. Also, many of the Azure Monitor services seemed to offer nice integrations with other Azure services but not so much with App Service.
  • Aside from the free vs. premium features in Google, the GAE tools were very nice from an interface perspective. In particular, the alerting interface was very rich but still easy to use.
  • PCF's logging interface was a clear winner. Only Google's Stackdriver log interface even came close to what PCF's Loggregator provides. PCF Metrics is also quite nice for correlating metrics with logs. For Spring Boot apps, PCF has a clear advantage given the extra integrations and ease of use.

Resilience

What do I mean by resilience? I'm not doing a deep dive into HA best practices here. For the moment, I'm referring to features around scaling, autoscaling, and self-healing. Since our app itself is stateless, we can rely on multiple instances of the app to easily scale it out (or in). As for self-healing, I added a /crash endpoint to help test how each PaaS handles losing an instance.

AWS Elastic Beanstalk

As stated before, AWS EB uses an ELB in front of the app, so it offers some good scaling options. (This assumes the environment type is "load balanced, auto scaling" as opposed to "single instance.")

The eb scale command lets us quickly set a specific number of instances to be running using the CLI. In the UI, the Scaling box of the Configuration section has what we need.

For manual scaling, we can set instance counts and desired availability zones. We can also configure autoscaling pretty granularly with a nice selection of triggers. If we prefer time-based scaling for peak hours or days, it's available as well.

On the self-healing front, Elastic Beanstalk handles things gracefully without any additional configuration. I crashed the app when it was set to run one instance only. In this case, I experienced only a very brief period of downtime before it restarted. (I actually did it twice to make sure it worked as the first time I missed the window.) When crashing it with two instances running, there was no downtime. It started the second instance in the background.

Azure App Service

In Azure, the options for manual scaling actually include both horizontal and vertical. You can opt to scale up (or down) by selecting a different App Service Plan size. This replaces underlying VMs in favor of ones with specified CPU-Memory settings.

For scaling out, there is a way to specify the number of instances as well as configuring autoscaling. The autoscaling rules are granular and can be set trigger on a number of metrics.

Scaling options can be set from the CLI as well using the az appservice plan update command. It's clear that there is a load balancer component involved to enable the scale out features. Except, there isn't a way to access its settings or view the individual instance health like we saw with AWS.

When I tested the self-healing feature using the /crash endpoint, Azure handled things. I never saw a 4xx or 5xx error, but the app did take a little longer to load after a crash, even with multiple instances. The app eventually would come back, but it always seemed to take at least a minute to recover. (Maybe load balancer related? I'm not sure.)

This is a feature called Proactive Auto Heal and it was introduced not that long ago. It's turned on by default and will restart the app based on percent of memory used or percent of failed requests. Auto heal actions can also be set in the manifest file. This was the way to configure it before the proactive feature was implemented.

Google App Engine

When I deployed my app to GAE, I used a Flexible environment. Google does offer a Standard product option as an alternative. They provide documentation contrasting them with guidance on when to choose each. I mention this because there are some differences in how each type handles scaling. The documentation outlines this, so I'm not going to go into much detail on that here. I'll look at how scaling works for my app in the Flexible environment.

By default, GAE has autoscaling turned on. It starts with two instances minimum and will scale up with 50% CPU utilization. These settings can be changed in the app.yaml file. (You also set the instance size and resource settings here.) With the Flexible environment at least, there doesn't seem to be a way to trigger autoscale with anything other than CPU usage. Manual scaling happens the same way — in app.yaml. There's no clear way to change any scaling settings in the UI or CLI other than modifying app.yaml or using the API.

automatic_scaling:   min_num_instances: 5   max_num_instances: 20   cool_down_period_sec: 120 # default value   cpu_utilization:     target_utilization: 0.5

Finally, I tested the self-healing capabilities using the /crash endpoint. With no extra configuration and two instances, I experienced no downtime. When I crashed both instances in succession, it took less than a minute for at least one to come back.

Pivotal Cloud Foundry

PCF lets you easily scale number of instances as well as memory and disk limits via the UI or CLI. Using the CLI, the cf scale command lets you specify the scaling parameters:

cf scale -i 4

The same can be done through Apps Manager right on the Overview tab:

Pivotal Cloud Foundry offers autoscaling through the App Autoscaler available in the Marketplace. Add it with the standard cf create-service command or from the Marketplace UI.

cf create-service app-autoscaler standard friedflix-media-autoscale

From the Service page in Apps Manager, use the "Manage" link to control autoscaling. Minimum and maximum instances get specified. Rules can be set based on CPU utilization or HTTP throughput and latency. Thresholds are set as percentages for scaling down or up. Finally, similar to AWS, you also can set scheduled changes based on date and time.

Like all the platforms, PCF does self-healing and handles crashed instances gracefully. With multiple instances running, there's no downtime after hitting the /crash endpoint. Instances only took seconds to come back to life.

Impressions

  • Across all the Day 2 operations I examined, the platforms had the most parity in this area. Particularly with respect to self-healing, on all the platforms it just worked.
  • From a scaling perspective, GAE was the only real outlier in the sense that it didn't allow an easy way to scale instances from the UI or CLI. It was also the hardest to understand how autoscaling worked given the different types of environments.
  • Azure autoscaling options were fine, though they lacked a time-based option and as always had me hunting through documentation.
  • Other than having to know to find the Autoscaler in the Marketplace, PCF's interface was the most straightforward and understandable to configure.

Patching

For patching, I'm interested in how each platform handles software updates. In particular, I want to explore how they manage zero downtime deployments. Typically this gets handled by either rolling updates or blue-green deployments. The main difference between these two options is the number of environments.

With a rolling deployment, there is only one environment. Updates are first deployed to a subset of instances in that environment. After successful completion, deployment moves on to the next subset. In the blue-green scenario there are two complete environments. Only one gets updated at a time, and once confirmed working, traffic is directed to the new version.3

AWS Elastic Beanstalk

Elastic Beanstalk does a nice job handling zero downtime deploys. The easiest option is to use the eb deploy command while having more than one instance running. This is essentially the rolling update method. It deploys new code one instance at a time, removing it from the load balancer and only putting it back and moving on once deemed healthy. With only one instance running, there is a brief period of downtime though.

There is also a blue-green deployment method offered, and it's pretty simple to use. First, clone the environment, deploy new code, then swap the URLs. This can be done from the "Actions" menu or from the CLI using eb clone and eb swap.

One final feature worth noting is AWS EB calls Managed Platform Updates. This allows operators to configure scheduled upgrades of the underlying platform components. This will update the platform to include fixes or new features recently released. While maintenance windows are scheduled, applications remain in service during the update process.

Azure App Service

Azure App Service offers what they call Deployment Slots for doing blue-green deployments. While Deployment Slots enable isolated app hosting, they do share the same VM instance and server resources. They are also only supported at the Standard and Premium levels.

By default an application lives in the "production" slot. Creating a new slot allows for an App Service to be cloned.

Once created, deploy new code to the slot and swap URLs once verified.

Google App Engine

As I mentioned last time, GAE's project-app-version construct lends itself well to blue-green deployments. In fact, just deploying code through the CLI results in a blue-green deployment. GAE deploys to a new "version" and then cuts over traffic to that version. Versions stick around until deleted. Traffic can split across versions for slower rollouts or moved back in case of rollbacks.

Pivotal Cloud Foundry

The cf push command does stop and start an app during a deployment. PCF does support blue-green deployments though, and it's well documented. It's as easy as using cf push to deploy the new app code using a temporary name and route. Then, after verifying the deployment, use the cf map-route and cf unmap-route to get the hostnames correct.

The community-built plugin Autopilot also helps users orchestrate this process. It offers a cf zero-downtime-push command for hands-off, zero-downtime deploys.

Another important thing that PCF supports is rolling updates at the platform-level. This is a powerful feature enabled by PCF's underlying infrastructure orchestrator called BOSH. Operators can patch the platform components in place while still running apps. This doesn't bring down any apps in the process and even uses canaries to ensure success before moving on.

Impressions

  • AWS Elastic Beanstalk is pretty slick in this department. It's definitely the most straightforward blue-green deployment model of the group.
  • Google's solution is the most opinionated and unique but also quite powerful. The versions concept offers a lot of benefits even beyond the blue-green deployments.
  • Azure handles things okay, but in keeping with a theme, the interface isn't great. The Deployment Slots concept is perfectly good, but creating them and swapping URLs wasn't as straightforward as on the other platforms.
  • PCF's method is straightforward, if a bit manual. Of course, the community plugins help and it's all CLI-based so can be easily scripted.

Wrap Up

All platforms offer complete, feature rich experiences. The public clouds of course have some of the Day 2 operations wrapped up in separate products, as I pointed out. This is particularly true for the observability features. Even with tight integrations, the experience isn't always seamless. However, if you have workloads running on other services within that cloud, it's definitely convenient to have some shared capabilities here. (Some even offer cross-cloud integrations, like Stackdriver monitoring AWS resources.)

As before, each platform has strengths and weaknesses. In general, the more opinionated the platform, the easier to use. Even opinionated platforms offer some level of customization, typically with some complexity tradeoff. These posts should provide a nice high-level view into the key features of each platform. Consider the specific use case, workload, and cloud landscape of an organization when selecting the right PaaS for the job.

Footnotes
  1. In addition to not going into full-fledged detail about the native monitoring products, I'm also not going much into Application Performance Management tools. There are some native offerings for APM and lots of good third-party options. I decided it's outside the scope of this post, particularly because it may also involve additional code packages, etc.
  2. You can also use the "Alerts" link from further down on the App Service left menu to accomplish the same thing. The two "Add Rule" dialogs are a little different though, for some inexplicable reason.
  3. As I footnoted last time as well, any solution should support automation for building into CI/CD pipelines. That's really a whole other post and topic for another day though.