
Lori MacVittie


Minimizing the impact of code changes on multi-tenant applications requires a little devops “magic” and a broader architectural strategy

Ignoring the unavoidable “cloud outage” hysteria that accompanies any Web 2.0 application outage today, there’s been some very interesting analysis of how WordPress – and other multi-tenant Web 2.0 applications – can avoid a similar mistake. One such suggestion is the use of a “feathered release schedule”, which is really just a controlled roll-out of a new codebase as a means to minimize the impact of an error. We’d call this “fault isolation” in data center architecture 101. It turns out that such an architectural strategy is fairly easy to achieve, if you have the right components and the right perspective. But before we dive into how to implement such an architecture we need to understand what caused the outage.


WHAT WENT WRONG and WHY CONTROLLED RELEASE CYCLES MITIGATE the IMPACT

What happened was a code change that modified a database table and impacted a large number of blogs, including some very high-profile ones:


Matt Mullenweg [founding developer of WordPress] responded to our email.

"The cause of the outage was a very unfortunate code change that overwrote some key options in the options table for a number of blogs. We brought the site down to prevent damage and have been bringing blogs back after we've verified that they're 100% okay."

Wordpress's hosted service, WordPress.com, was down completely for about an hour, taking blogs like TechCrunch, GigaOm and CNN with it.

WordPress.com Down for the Count, ReadWriteWeb, June 2010

Bob Warfield has since analyzed the “why” of the outage, blaming not multi-tenancy but the operational architecture enabling that multi-tenancy. In good form he also suggests solutions for preventing such a scenario from occurring again, including the notion of a feathered release cycle. This approach might also accurately be referred to as a staged, phased, or controlled release cycle. The key component is the ability to exercise control over which users are “upgraded” at what time/phase/stage of the cycle.

Don’t get me wrong, I’m all for multitenancy. In fact, it’s essential for many SaaS operations. But, companies need to have a plan to manage the risks inherent in multitenancy. The primary risk is the rapidity with which rolling out a change can affect your customer base. When operations are set up so that every tenant is in the same “hotel”, this problem is compounded, because it means everyone gets hit.

What to do?

[…]

Last step: use a feathered release cycle.  When you roll out a code change, no matter how well-tested it is, don’t deploy to all the hotels.  A feathered release cycle delivers the code change to one hotel at a time, and waits an appropriate length of time to see that nothing catastrophic has occurred.

WordPress and the Dark Side of Multitenancy

Bob Warfield

This approach makes a great deal of sense. It is a standing joke amongst the Internet digerati that no Web 2.0 application ever really comes out of “beta”, probably because most Web 2.0 applications today are developed using an agile methodology that favors frequent, small functional releases over large, infrequent complete releases. What Bob is referring to as “feathered” is something Twitter appears to have implemented for some time, as it releases new functionality slowly – to a specific subset of users (the selection of which remains a mystery to the community at large) – and only when the new functionality is deemed stable will it be rolled out to the Twitter community at large.

This does not stop outages from occurring, as any dedicated Twitter user can attest, but it can mitigate the impact of an error hidden in a code release that could otherwise take the entire site down if it were rolled out to every user at once.

From an architectural perspective, both Twitter and WordPress are Web 2.0 applications. They are multi-tenant in the sense that they use a database containing user-specific configuration metadata and content to enable personalization of blogs (and Twitter pages) and separation of content. The code-base is the same for every user; the appearance of personalization is achieved by applying user-specific configuration at the application and presentation layers. Using the same code-base, of course, means that deploying a change to that code necessarily impacts every user.

Now, what Bob is suggesting WordPress do is what Twitter already does (somehow): further segment the code-base to isolate potential problems with code releases. This would allow operations to deploy a code change to segment X while segment Y remains running on the old code-base. The trick is doing so transparently, without impacting the entire site.


APPLICATION DELIVERY and CONTEXTUAL APPLICATION ROUTING

One way to accomplish this task is by leveraging the application delivery tier. That load balancer, if it isn’t just a load balancer and is, in fact, an application delivery controller, is more than capable of enabling this scenario.

There are two prerequisites:

1. You need to decide how to identify users who will be directed to the new codebase. One suggestion: a simple boolean flag in the database that is served up to the user as a cookie. You could also identify guinea pigs (er, beta testers) based on location, on a pre-determined list of accounts, or by programmatically asking users to participate. Any piece of data in the request or response is fair game; a robust network-side scripting implementation can extract information from any part of the application data, HTTP headers, and network layers.

2. You need to have your application delivery controller (a.k.a. load balancer) configured with two separate pools (farms, clusters) – one for each “version” of the codebase.

So let’s assume you’re using a cookie called “beta” that holds either a 1 or a 0, and that your application delivery controller (ADC) is configured with two pools called “newCodePool” and “oldCodePool”. When the ADC inspects the “beta” cookie and finds a value of 1, it makes sure the client is routed to one of the application instances in “newCodePool”, which one assumes is running the new version of the code. Similarly, if the cookie holds a value of 0, the ADC passes the request to one of the application instances in “oldCodePool”.
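To make that concrete, here is a minimal sketch of what such a network-side script might look like, written in the style of an F5 iRule (one common network-side scripting implementation). The cookie and pool names come from the hypothetical example above, and a production version would want additional handling for missing or malformed cookies:

    when HTTP_REQUEST {
        # Inspect the "beta" cookie: a value of 1 routes the request to
        # the new codebase; anything else (including no cookie) routes
        # to the old one
        if { [HTTP::cookie value "beta"] eq "1" } {
            pool newCodePool
        } else {
            pool oldCodePool
        }
    }

Because the routing decision happens in the delivery tier, neither codebase needs to know the other exists; the application instances themselves remain unchanged.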

At some point you’ll be satisfied that the new code is not going to take down your entire site or cause other undue harm to tenants (customers) and you can simply roll out the changes and start the cycle anew.


THIS is WHERE DEVOPS SHINES

It is exactly this kind of scenario in which the emerging devops discipline can show its real value. A developer who is also well-versed in the operational aspects of application delivery (load balancing) can design an architecture that leverages the operational components to support the application development and deployment lifecycle. Using network-side scripting, as in this example, allows devops to roll out code changes in a controlled and yet agile manner. It allows developers to isolate the impact of code changes (fault isolation) without sacrificing the benefits of a rapid development cycle in meeting the ever-increasing demand from customers and users for new features and functionality.

You can join the conversation regarding the emerging discipline known as “devops” on Twitter by following the #devops hashtag.
Taking advantage of context-aware and programmable application delivery platforms – in any datacenter model – enables devops to architect smarter, more flexible solutions. If devops rolls out a new codebase and immediately sees a problem, it can “roll back” to the stable version of the application by modifying a single network-side script, and all users are instantly back on the “old” codebase. Combining a flexible application delivery tier with the rapid provisioning capabilities associated with virtualization and cloud computing makes this process even easier, allowing devops to react immediately and potentially avoid the outages and downtime associated with a code release gone wrong.
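To illustrate, here is a hedged sketch of that rollback, again in iRule style and continuing the hypothetical example above: a single toggle in the script controls whether the “beta” cookie is honored at all, so flipping one value returns every user to the stable pool.

    when HTTP_REQUEST {
        # Hypothetical rollback toggle: set to 0 to ignore the beta
        # cookie entirely and send all users back to the stable codebase
        set beta_enabled 0

        if { $beta_enabled == 1 && [HTTP::cookie value "beta"] eq "1" } {
            pool newCodePool
        } else {
            pool oldCodePool
        }
    }

The same one-line change, flipped back to 1, resumes the feathered rollout once the problem is fixed.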

Organizations that encourage the development of a devops role and discipline will inevitably realize greater benefits from virtualization and cloud computing, because the discipline encourages a broader view of applications, extending the demesne of application architecture to include the application delivery tier. That tier is key today in enabling scalability of applications across a wide range of datacenter models through load balancing. The ability to intelligently route application requests based on application development needs is also available, but it is rarely leveraged today because most organizations do not have a devops role and instead remain siloed, with a wall between the network teams responsible for application delivery and the development teams responsible for application development. When the two disciplines meet in the middle and remove the wall that has long stood between them, more efficient and flexible methods of architecting and controlling application behavior can be achieved.

