Lori MacVittie


Scaling Security in the Cloud: Just Hit the Reset Button

Sometimes the best answer to a problem is to hit the reset button, but it should probably be the last answer, not the first


My cohort Pete Silva attended the 2009 Cloud Computing and Virtualization Conference & Expo and offered up a summary of one of the sessions he enjoyed in a recent post, “Virtualization is Real”:

One of the sessions I enjoyed was ‘Cloud Security - It's Nothing New; It Changes Everything!’ (pdf) from Glenn Brunette, a Distinguished Engineer and Chief Security Architect at Sun Microsystems.

Scale – Today Security administrators deal with 10’s, 100’s, even 1000’s of servers but what happens when potentially tens of thousands of VM’s get spun up and they are not the same as they were an hour ago. Security assessments like Tripwire, while work, inject load and what if those servers are only up for 30 minutes?  How can you be sure what was up and offering content was secure?  One idea he offered was to have servers only live for 30 minutes then drop it and replace.  If someone did compromise the unit, they’d only have a few moments to do anything and then it’s wiped. You can keep the logs but just replace the instance.  Or, use an Open Source equivalent every other time you load, so crooks can’t get a good feel for baseline system.

The “scale” we’re talking about is a combination of scaling processes and systems. We don’t often talk about the impact of large-scale environments on processes, but security processes are almost always the hardest hit as an environment grows because of the sheer volume of data and systems involved. That said, Glenn’s idea to only allow servers to “live” for 30 minutes is an interesting one, and I am going back and forth between “that’s a good idea,” “that’s a bad idea,” and “there’s got to be a better way.”


THE GOOD

One of the reasons this is a good idea is that virtualization provides a snapshot in time, a known state, a known security posture for the applications deployed within the virtual container. By releasing it and launching it anew, you are assured of the security of the application and environment because you essentially go back to the beginning. Any changes to the system since the last “launch” are effectively wiped out (logging to an external storage system would be a requirement, of course), and any back doors, trojans, malware, or rootkits dropped onto the system would be gone.
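The mechanics of such a recycle loop are easy to sketch. The snippet below is only an illustration of the idea; the client calls (launch_instance, terminate_instance, and friends) are hypothetical stand-ins for whatever provisioning API a given provider actually exposes, and shipping logs to external storage is assumed, per the caveat above.

```python
import time

RECYCLE_INTERVAL = 30 * 60            # seconds an instance is allowed to "live"
GOLDEN_IMAGE = "app-v1.2-known-good"  # hypothetical known-good image ID

def recycle_forever(cloud, log_store):
    """Continuously replace a running instance with a fresh copy of a known-good image.

    `cloud` is a hypothetical provisioning client and `log_store` is wherever logs
    are shipped so they survive the wipe; neither maps to a specific real product.
    """
    instance = cloud.launch_instance(image=GOLDEN_IMAGE)
    while True:
        time.sleep(RECYCLE_INTERVAL)

        # Bring up the replacement before tearing anything down.
        replacement = cloud.launch_instance(image=GOLDEN_IMAGE)
        cloud.wait_until_healthy(replacement)

        # Preserve the evidence, then discard the (possibly compromised) instance.
        log_store.archive(cloud.fetch_logs(instance))
        cloud.terminate_instance(instance)
        instance = replacement
```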



That would frustrate the heck out of an attacker, wouldn’t it?

But it would also likely frustrate the heck out of end-users who might have been using the application at the time it was released.


THE BAD

There are a couple of reasons this is just a bad idea, and the impact on availability to end-users is just the most obvious one. In a live environment it’s never a good idea to just “bring down” an instance of an application – virtual or traditional – that users might be accessing. Doing so severs their connections, wipes out any session state that might have been stored on the server, and forces them to “start again”. That said, if you knew this was part of your security strategy, you could ensure that developers understood this behavior and implemented a database-backed shared-session model for the applications. If session data is stored in a shared database – on a separate instance – then the potential damage to user sessions is mitigated because it does not rely on any given application instance.
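As a rough illustration of that shared-session model, the sketch below keeps session state in Redis rather than in the application process, so recycling any single instance doesn’t destroy it. The hostname, key names, and TTL are arbitrary choices for the example, not a recommendation of a particular product.

```python
import json
import uuid

import redis  # a shared store reachable by every application instance

# Assumption: a Redis endpoint every instance can reach; any shared database works.
store = redis.Redis(host="sessions.internal.example", port=6379)
SESSION_TTL = 30 * 60  # expire idle sessions after 30 minutes

def create_session(user_id):
    """Create a session whose state lives outside any single application instance."""
    session_id = str(uuid.uuid4())
    store.setex(f"session:{session_id}", SESSION_TTL, json.dumps({"user_id": user_id}))
    return session_id

def load_session(session_id):
    """Any instance, including a freshly recycled one, can pick the session back up."""
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```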

Assuming this is the case, you then have to be concerned about the loss of the connection to the application for users. Again, if you knew this was going to be one of your security techniques, then you’d best let the network or application delivery network folks know ahead of time, as they can ensure that users are seamlessly redirected to new (or other existing) instances as soon as the one they were connected to is released. Basically, you’d need a load balancing solution in place to ensure reliable access to the application.

This also means you should likely always have two instances of the application available, rotating them through this up-down-up-down schedule on offset time intervals.
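A deliberately simplified sketch of what that load balancing tier has to do is below: drain an instance before it is recycled so no new users land on it, while its partner, on an offset schedule, keeps serving. Real application delivery controllers do this with far more sophistication; the class and method names here are purely illustrative.

```python
class InstancePool:
    """Minimal pool that supports draining an instance before it is recycled."""

    def __init__(self, instances):
        self.active = list(instances)  # instances eligible for new connections
        self.draining = set()          # instances finishing existing work only

    def pick(self):
        """Route a new request to any active, non-draining instance."""
        candidates = [i for i in self.active if i not in self.draining]
        if not candidates:
            raise RuntimeError("no healthy instances available")
        return candidates[0]  # a real ADN would use round-robin, least connections, etc.

    def begin_recycle(self, instance):
        """Stop sending new users to an instance that is about to be reset."""
        self.draining.add(instance)

    def finish_recycle(self, old_instance, new_instance):
        """Swap the wiped instance for its freshly launched replacement."""
        self.active.remove(old_instance)
        self.draining.discard(old_instance)
        self.active.append(new_instance)

# With two instances on offset 30-minute schedules, one is always serving:
# instance A resets on the hour, instance B on the half hour.
```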

Overall you’re likely to incur higher costs with this kind of a strategy as well. It is typical for providers to charge “by the hour” and any partial hour is counted as a full hour. Rotating server/application instances every half-hour would likely incur charges for two instances per hour instead of one anyway.
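The cost math is straightforward to check. Assuming hourly billing that rounds any partial hour up and a hypothetical $0.10 per instance-hour rate, replacing an instance every 30 minutes bills two instance-hours for every wall-clock hour:

```python
HOURLY_RATE = 0.10         # hypothetical per-instance-hour price
HOURS_PER_MONTH = 24 * 30  # a 30-day month

# One long-lived instance: one billed hour per wall-clock hour.
steady_cost = HOURLY_RATE * HOURS_PER_MONTH

# Recycling every 30 minutes: each replacement's partial hour rounds up,
# so every wall-clock hour bills two instance-hours.
recycled_cost = HOURLY_RATE * HOURS_PER_MONTH * 2

print(f"steady: ${steady_cost:.2f}/month, recycled: ${recycled_cost:.2f}/month")
# steady: $72.00/month, recycled: $144.00/month
```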


THE UGLY

This strategy also does very little to address the most pressing security threat facing applications today: tainted user data. That’s going to hit the database, and unfortunately Glenn’s “go back to the beginning” approach to security would be disastrous when applied to virtual environments in which a database is running. You want them to change, to grow, to be modified. It is in their nature to store data and change over time.

So you can’t use this concept for a virtualized environment in which a database is deployed. It would be detrimental to the health of the business.

But there’s something to Glenn’s idea that’s certainly appealing when it’s part of a broader security strategy. What his “up-down-up” technique is designed to prevent is compromise of the system, i.e. trojans, worms, viruses, and malware inserted into the system that can be used for illegitimate access or as part of a larger botnet. His technique certainly addresses those security risks by effectively wiping them out on a regular basis. What’s not accounted for is the injection of malicious code into the database, which cannot be so easily “reset.”

Perhaps this is a job for Infrastructure 2.0?


INFRASTRUCTURE 2.0 IS MORE THAN JUST NETWORK STUFF

If we employ an infrastructure 2.0-capable application delivery network, we can use Glenn’s technique in conjunction with other security technology to provide better coverage in a more dynamic way. Consider that the integrated network and application network security capabilities of the application delivery network can protect application instances against web application attacks, especially those that are really targeting the database, e.g. SQL injection.
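To make that division of labor concrete, here is a toy version of the kind of request inspection an application delivery network performs in front of the instances. The patterns are drastically simplified compared to a real web application firewall and are shown only to illustrate where the check sits: in the network, not inside each short-lived instance.

```python
import re

# Grossly simplified signatures; a production WAF uses far richer rule sets
# and anomaly detection, not a three-item pattern list.
SQLI_PATTERNS = [
    re.compile(r"union\s+select", re.IGNORECASE),
    re.compile(r"or\s+1\s*=\s*1", re.IGNORECASE),
    re.compile(r";\s*drop\s+table", re.IGNORECASE),
]

def inspect_request(params):
    """Return True if the request looks safe enough to forward to an app instance."""
    for value in params.values():
        for pattern in SQLI_PATTERNS:
            if pattern.search(str(value)):
                return False  # block before it ever reaches the application or database
    return True

# Example: inspect_request({"user": "alice", "q": "1 OR 1=1"}) evaluates to False.
```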

Also consider that an application delivery solution can provide the failover capabilities required to assure availability in an environment in which instances may be going down and coming up in a highly volatile pattern.

That addresses the “bad” and the “ugly” impact on end-users resulting from Glenn’s “up-down-up” technique, leaving us only with the “good”.

But it really doesn’t address the root of the problem, the reason Glenn suggests going back to the beginning in the first place: volatility and change. Scaling security processes across thousands of virtual instances is problematic, I agree, but one of the reasons it’s so hard to scale is that you don’t know what’s going on. There’s currently no real collaboration across the entire infrastructure. Security folks can’t get a good feel for what’s going on in a large scale, dynamic environment because the information they need to correlate and assess the current security posture of the environment and applications is dispersed across the infrastructure.

What’s needed is an overarching system that can integrate security solutions with the rest of the infrastructure. When a virtual environment is brought online, the security infrastructure needs to know about it, not just to apply the proper policies but also to assess its current posture and ensure it is added to the pool of resources participating in the larger security scheme. If a HIPS (Host Intrusion Prevention System) is monitoring a system for intrusion and its alarm is triggered, that information needs to be imparted to the rest of the infrastructure; a potentially compromised virtual machine should be immediately removed from the available pool of resources. That requires collaboration across the entire infrastructure. If part of the launch process includes a vulnerability scan of the application and that scan comes back positive, perhaps the instance should not be allowed to launch, and the infrastructure should be notified immediately so that it can take whatever steps are necessary, such as automatically applying a virtual patch for the vulnerability if possible, allowing the instance to launch, and notifying security and developers that there’s a vulnerability in need of patching.
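Sketched as code, that collaboration boils down to a pair of event handlers. Everything named below (the scanner, the pool, the notifier, the virtual-patch call) is a hypothetical interface standing in for real products, but the control flow follows the paragraph above.

```python
def on_instance_launch(instance, scanner, waf, pool, notify):
    """Gate a new virtual instance on a vulnerability scan before it serves traffic."""
    findings = scanner.scan(instance)
    if not findings:
        pool.add(instance)
        return
    if all(waf.can_virtually_patch(f) for f in findings):
        for finding in findings:
            waf.apply_virtual_patch(instance, finding)  # mitigate at the network layer
        pool.add(instance)
        notify("security,developers",
               f"{instance} launched behind virtual patches; fix needed: {findings}")
    else:
        notify("security", f"{instance} blocked from launch: unmitigated findings {findings}")

def on_hips_alert(instance, pool, notify):
    """A host intrusion alert means the instance leaves the available pool immediately."""
    pool.remove(instance)
    notify("security", f"{instance} quarantined after HIPS alert")
```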

Cloud computing and virtualization are necessarily going to force integration and collaboration to the fore of architecture design. The scale of systems using virtualization is growing and becoming less and less manageable by hand, which will inevitably result in more automation and orchestration at the infrastructure layer.

Let’s not forget that the myriad pieces of security software providing valuable information and threat mitigation are also part of the “infrastructure 2.0” family, as it were. We need to start thinking more broadly and more strategically about how to leverage collaboration across the disparate functional silos within IT to come up with better solutions that address security and its associated scaling challenges in a cloud computing environment.

More Stories By Lori MacVittie

Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.