You’re talking to one of your users about hosting a new application on your OpenStack deployment. One of the first questions they ask you is: “I’m running my SQL database on 64 cores and 256GB of memory… can you host that for me on OpenStack?”
Databases on OpenStack
As I described in my last post, we’re building a huge private cloud at Symantec. Early on, before we had chosen OpenStack as our platform, we recognized that many of the Symantec product teams that would eventually become our customers shared a need for a horizontally scaling database service. Some of these teams were already running and operating NoSQL clusters. Others were pushing the upper limits of vertically scaling database technologies, buying ever larger servers or manually sharding their data across multiple physical servers.
This post isn’t about the pros and cons of relational and NoSQL databases; there are plenty of those out there. Let’s start by agreeing that both types of databases have their sweet spots, and both are necessary in an OpenStack environment. In this post I’ll be talking specifically about how to provide your OpenStack users with easy access to powerful NoSQL capabilities.
OpenStack provides some great storage technologies: Swift for large-scale object storage, Trove for database provisioning, and Cinder for block storage. However, we saw a need within Symantec for NoSQL as a service. Users needed a place to store and query large volumes of tabular data with very low latency, without the operational overhead of managing a database cluster. This is where MagnetoDB comes in.
MagnetoDB
MagnetoDB is a fully open source, high-performance, high-throughput NoSQL database service. It fills the same role on OpenStack that DynamoDB fills on AWS. It provides our users with most of the benefits of running their own NoSQL cluster without… well… having to run their own NoSQL cluster. Users create and populate their tables through a web services API, in a secure multi-tenant environment, while we worry about how to operate and scale the underlying NoSQL database.
A MagnetoDB table has a very flexible schema. The user defines only a primary key and any secondary indexes, and all other row attributes can be added dynamically. The user can then query the table by either the key or an index. Tables can grow to many terabytes and billions of rows while still maintaining excellent performance. MagnetoDB also supports more advanced features such as configurable consistency, conditional updates, and row expiration. We’re adding a streaming interface for faster bulk loading. And naturally, MagnetoDB is natively integrated with Keystone for authentication and multi-tenancy.
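To make the data model concrete, here is a minimal in-memory sketch of the table semantics described above. It shows a declared primary key and indexed attributes, free-form other attributes, lookup by key or index, conditional updates and per-row expiration. This is a hypothetical illustration, not MagnetoDB's API. The class `SketchTable` and its `put`, `get` and `query_index` methods are invented for this example. Real MagnetoDB tables are accessed through its REST API and persisted by a backend such as Cassandra.

```python
import time
from typing import Any, Dict, List, Optional


class SketchTable:
    """Hypothetical in-memory model of a flexible-schema table.

    Only the primary key and the secondary index attributes are declared
    up front; any other attribute may appear on any row.
    """

    def __init__(self, key_attr: str, index_attrs: List[str]):
        self.key_attr = key_attr
        self.index_attrs = set(index_attrs)
        self.rows: Dict[Any, Dict[str, Any]] = {}
        self.expires_at: Dict[Any, float] = {}

    def _alive(self, key: Any, now: float) -> bool:
        # Row expiration: a row whose deadline has passed is purged on access.
        deadline = self.expires_at.get(key)
        if deadline is not None and deadline <= now:
            self.rows.pop(key, None)
            del self.expires_at[key]
            return False
        return key in self.rows

    def put(self, row: Dict[str, Any], ttl: Optional[float] = None,
            expected: Optional[Dict[str, Any]] = None,
            now: Optional[float] = None) -> bool:
        """Insert or replace a row; returns False if the condition is not met."""
        now = time.time() if now is None else now
        key = row[self.key_attr]  # the primary key is the only required attribute
        if expected is not None:
            # Conditional update: the stored row must exist and match `expected`.
            current = self.rows.get(key) if self._alive(key, now) else None
            if current is None or any(current.get(k) != v for k, v in expected.items()):
                return False
        self.rows[key] = dict(row)
        if ttl is None:
            self.expires_at.pop(key, None)
        else:
            self.expires_at[key] = now + ttl
        return True

    def get(self, key: Any, now: Optional[float] = None) -> Optional[Dict[str, Any]]:
        now = time.time() if now is None else now
        return dict(self.rows[key]) if self._alive(key, now) else None

    def query_index(self, attr: str, value: Any,
                    now: Optional[float] = None) -> List[Dict[str, Any]]:
        if attr not in self.index_attrs:
            raise ValueError(f"{attr!r} is not an indexed attribute")
        now = time.time() if now is None else now
        # A linear scan stands in for a real index structure in this sketch.
        return [dict(row) for key, row in list(self.rows.items())
                if self._alive(key, now) and row.get(attr) == value]


users = SketchTable(key_attr="user_id", index_attrs=["email"])
users.put({"user_id": "u1", "email": "a@example.com", "plan": "free"}, now=0)
users.put({"user_id": "u2", "email": "b@example.com", "theme": "dark"}, ttl=60, now=0)
print(users.put({"user_id": "u1", "email": "a@example.com", "plan": "pro"},
                expected={"plan": "free"}, now=1))  # True
print(users.get("u2", now=61))  # None (expired)
```

Note that the two rows carry different non-key attributes (`plan` versus `theme`). Neither attribute had to be declared when the table was created.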
True to the OpenStack ethos, MagnetoDB provides an API layer and a driver abstraction, allowing you to plug in the NoSQL database of your choice. We’ve implemented a Cassandra driver, though HBase, MongoDB, and others may also be good options. Based on our users’ requirements, we’re building our MagnetoDB and Cassandra deployments to handle up to 10 TB of storage, billions of rows, and tens of thousands of requests per second for each tenant.
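The split between an API layer and pluggable drivers can be sketched roughly as follows. This is a hypothetical illustration, not MagnetoDB's actual driver interface. `StorageDriver`, `InMemoryDriver`, `TableApi` and their method names are invented here. The sketch assumes the caller's `tenant_id` has already been validated, which in a real deployment would come from a Keystone token. The API layer only namespaces tables per tenant and delegates, so it never needs to know which backend is underneath.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class StorageDriver(ABC):
    """Hypothetical backend interface: one implementation per NoSQL engine."""

    @abstractmethod
    def create_table(self, name: str, key_attr: str) -> None: ...

    @abstractmethod
    def put_row(self, name: str, row: Dict[str, Any]) -> None: ...

    @abstractmethod
    def get_row(self, name: str, key: Any) -> Optional[Dict[str, Any]]: ...


class InMemoryDriver(StorageDriver):
    """Stand-in backend; a Cassandra-backed driver would implement the same methods."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, Any]] = {}

    def create_table(self, name: str, key_attr: str) -> None:
        self.tables[name] = {"key_attr": key_attr, "rows": {}}

    def put_row(self, name: str, row: Dict[str, Any]) -> None:
        table = self.tables[name]
        table["rows"][row[table["key_attr"]]] = dict(row)

    def get_row(self, name: str, key: Any) -> Optional[Dict[str, Any]]:
        row = self.tables[name]["rows"].get(key)
        return dict(row) if row is not None else None


class TableApi:
    """API layer: scopes every table to the caller's tenant, then delegates."""

    def __init__(self, driver: StorageDriver) -> None:
        self.driver = driver

    @staticmethod
    def _scoped(tenant_id: str, table: str) -> str:
        # Reject "." in tenant IDs so that scoped names can never collide.
        if "." in tenant_id:
            raise ValueError("tenant_id must not contain '.'")
        return f"{tenant_id}.{table}"

    def create_table(self, tenant_id: str, table: str, key_attr: str) -> None:
        self.driver.create_table(self._scoped(tenant_id, table), key_attr)

    def put_item(self, tenant_id: str, table: str, row: Dict[str, Any]) -> None:
        self.driver.put_row(self._scoped(tenant_id, table), row)

    def get_item(self, tenant_id: str, table: str, key: Any) -> Optional[Dict[str, Any]]:
        return self.driver.get_row(self._scoped(tenant_id, table), key)
```

In this sketch, swapping backends means handing a different `StorageDriver` to `TableApi`. Two tenants can each own a table called `users` without seeing each other's rows.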
Do You Need MagnetoDB?
Is MagnetoDB something you need? Let’s get back to your user’s question about hosting a vertically scaling relational database on OpenStack. I generally answer this question with one of my own: “Can you help me understand what you’re storing?” Sometimes the answer is that the data is truly relational. However, often much of the data would be better suited to a different type of storage.
Large blob data can go into Swift, with just the object reference stored in the database. If most of the remaining data can be stored in tables where joins can be avoided, MagnetoDB is likely a good fit, removing the need for ever-larger machine instances. The remaining relational data may well fit in a small Trove-provisioned MySQL instance running on a much more manageably sized VM.
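The blob-splitting step above might look roughly like this. It is a sketch under stated assumptions. Plain dicts stand in for a Swift container and a MagnetoDB table; in practice each would be a call to the respective REST API. The 64 KiB threshold and the `<field>_ref` naming convention are hypothetical choices made only for this illustration. Naming objects by their SHA-256 digest is one option among several, and it makes identical blobs share one object.

```python
import hashlib
from typing import Any, Dict


def store_record(record: Dict[str, Any], blob_field: str,
                 object_store: Dict[str, bytes],
                 table: Dict[Any, Dict[str, Any]],
                 key_attr: str, blob_threshold: int = 64 * 1024) -> Dict[str, Any]:
    """Hypothetical helper: offload a large blob, keep only its reference in the row."""
    row = dict(record)
    blob = row.get(blob_field)
    if isinstance(blob, (bytes, bytearray)) and len(blob) >= blob_threshold:
        # Large payload: write it to object storage under a content-derived name...
        object_name = hashlib.sha256(blob).hexdigest()
        object_store[object_name] = bytes(blob)
        # ...and store only the reference alongside the row's other attributes.
        row[blob_field + "_ref"] = object_name
        del row[blob_field]
    table[row[key_attr]] = row
    return row
```

Small payloads stay inline in the row, so only genuinely large objects pay the cost of a second round trip to object storage.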
Some users have been interested in directly accessing the features of the underlying NoSQL database. To enforce authentication and multi-tenancy, we require users to interact with MagnetoDB only through the REST API. Folks who need raw Cassandra can deploy it on top of OpenStack, and in fact Trove is adding support for Cassandra provisioning. However, we’ve found that many users who initially want their own NoSQL cluster ultimately choose MagnetoDB instead, trading direct access for a simpler operational model.
Our product teams are already designing their applications around the use of MagnetoDB, and we’re starting to see some patterns:
- Storing, retrieving, and searching user profile data
- Supporting searchable metadata for objects stored in Swift
- Importing results from big data processes, for more interactive data mining
- Tracking application metrics in real-time
If you’d like to try MagnetoDB out, we’re integrated with DevStack. You can leave your questions and comments here, or drop me a note privately if you prefer. And look for more details in future posts.
Keith Newstadt
Symantec Cloud Platform Engineering
Follow: @knewstadt