October 25, 2011

Cool Stuff: OpenEdge 11 Multi-tenancy

Posted by Gus Bjorklund

One of the most important and useful new capabilities in the upcoming OpenEdge 11 release (planned for December 2011) is direct support for multi-tenancy in the database. What is multi-tenancy, and why would I want it? Read on and I'll tell you.

The notion of multi-tenancy arises in the field of Software-as-a-Service (aka SaaS). When a vendor offers an application as a "service", its customers do not have to buy a computer system to run the application on, nor do they need staff trained in the care and feeding of the system or in backing up the data. Instead, customers subscribe to the service and the vendor does all of that. The customer simply uses the application over the Internet and has no idea where it is or what computer it runs on. When you use a search engine, you don't know where it is and it doesn't matter. It's the same with SaaS applications. A SaaS application you are probably already familiar with is email. Google, Microsoft, Yahoo, and most ISPs run email servers for their subscribers. You don't have to know anything except where to log in. Someone else is responsible for everything to do with operating and maintaining the service. You just receive, read, compose, and send your mail.

All this is very easy, even trivial, for the Software-as-a-Service customer. But what about the poor vendor? The vendor has to do all that messy IT stuff. Is the vendor going to dedicate a computer to each customer, as each customer would if they were running the application themselves? No, of course not. The vendor is going to do everything possible to keep operating costs as low as they can. That is where multi-tenancy comes in. Each of the SaaS vendor's customers is called a "tenant", a word taken from the rental housing market. In an apartment block we can have many tenants in the same building, all living in separate spaces. Similarly, we can put many application tenants on the same computer.

With a number of tenants sharing the same computer, the SaaS vendor has fewer machines to buy, fewer machines to take care of, and less work to do. No tenant can see the other tenants or their data; in fact, tenants know nothing about one another, not even that the others exist. Quite a number of SaaS vendors do this. Since it is very simple to do with OpenEdge, many of them have created a separate database for each tenant. But that means you have to do backups, schema changes, and other maintenance functions individually for each tenant's database. It would be much better if tenants could share the database too. We call this database multi-tenancy.

In OpenEdge 10 and before, with quite a bit of work, you can achieve database multi-tenancy. Some of our partners have rolled up their sleeves and done it. What you need to do is this:

  • First, add a "tenant identifier" column to every table. This column contains a unique identification number, perhaps an integer, assigned to each tenant. Its value indicates which tenant owns the data in each row of the table.
  • Next, add the tenant id column to every index as the leading key component.
  • Create a table to store the tenant names and their tenant ids, and assign an id to each tenant.
  • Then, go through all the code in your application and everywhere that a new table row is created, assign the correct value to the row's tenant id column.
  • You also have to invent a way to keep track of which tenant id is currently in effect.
  • Finally, go through all the code in the application again and find all the queries. Modify each WHERE clause to add a term that says "(tenantId = currentTenant) and ". Don't forget CAN-FIND. And make sure to add the tenant id term for each table in a multi-table query.
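To make the manual approach concrete, here is a minimal sketch of those steps. It uses Python's built-in sqlite3 module as a stand-in for the database, because the point is the pattern rather than OpenEdge syntax. The `tenant` and `customer` tables, their columns, and the `TenantSession` class are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema for the manual approach; sqlite3 stands in for the database.
SCHEMA = """
CREATE TABLE tenant (               -- tenant names and their ids
    tenant_id   INTEGER PRIMARY KEY,
    tenant_name TEXT NOT NULL UNIQUE
);
CREATE TABLE customer (
    tenant_id INTEGER NOT NULL,     -- tenant id column added to every table
    cust_num  INTEGER NOT NULL,
    name      TEXT NOT NULL
);
-- tenant id is the leading key component of every index
CREATE UNIQUE INDEX customer_num ON customer (tenant_id, cust_num);
CREATE INDEX customer_name ON customer (tenant_id, name);
"""


class TenantSession:
    """Keeps track of which tenant id is currently in effect for one user."""

    def __init__(self, conn: sqlite3.Connection, tenant_name: str) -> None:
        row = conn.execute(
            "SELECT tenant_id FROM tenant WHERE tenant_name = ?", (tenant_name,)
        ).fetchone()
        if row is None:
            raise LookupError(f"unknown tenant: {tenant_name}")
        self.conn = conn
        self.tenant_id = row[0]

    def create_customer(self, cust_num: int, name: str) -> None:
        # Every place that creates a row must assign the current tenant id.
        self.conn.execute(
            "INSERT INTO customer (tenant_id, cust_num, name) VALUES (?, ?, ?)",
            (self.tenant_id, cust_num, name),
        )

    def find_customers(self, name_prefix: str) -> list[tuple[int, str]]:
        # Every WHERE clause needs the "tenant_id = current tenant AND" term.
        return self.conn.execute(
            "SELECT cust_num, name FROM customer"
            " WHERE tenant_id = ? AND name LIKE ? ORDER BY cust_num",
            (self.tenant_id, name_prefix + "%"),
        ).fetchall()


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Assign an id to each tenant.
    conn.executemany(
        "INSERT INTO tenant (tenant_id, tenant_name) VALUES (?, ?)",
        [(1, "acme"), (2, "globex")],
    )
    return conn


if __name__ == "__main__":
    conn = open_demo_db()
    acme = TenantSession(conn, "acme")
    globex = TenantSession(conn, "globex")
    acme.create_customer(1, "Alice")
    globex.create_customer(1, "Alfred")  # same cust_num, different tenant: allowed
    print(acme.find_customers("Al"))     # [(1, 'Alice')]
    print(globex.find_customers("Al"))   # [(1, 'Alfred')]
```

Nothing in this design stops a developer from writing a query that leaves out the `tenant_id = ?` term; isolation depends entirely on every piece of code remembering it.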

Once you do all those things, you can have database multi-tenancy. But in addition to the obvious fact that this approach is labor-intensive and invasive, it has a number of other disadvantages. I will list just a few here:

0) It is error prone. If you make a mistake when you change the code to do multi-tenancy, the wrong tenant's data will be returned. The same thing happens if you or another developer later forgets the tenant id term while fixing a bug.

1) Even if you use Type II data areas, rows from multiple tenants will be commingled in the same data blocks and in the same table's allocation clusters. This negates many of the advantages of using Type II data areas. You get lower I/O efficiency because one tenant will have to read data blocks that also contain other tenants' data. And your customers will probably have the perception (whether true or not) that commingling their data reduces its security.

2) You can't do per-tenant maintenance easily. How do you reindex just one tenant's data?

3) How do you restore one tenant's data when they do something foolish like run end-of-month processing in the middle of the month?

4) You can't do per-tenant disk space allocation or disk space usage tracking very easily, if at all.

5) There is lock interference among tenants. A table lock taken on behalf of one tenant can lock out all the other tenants.

Even so, I think the advantages far outweigh the disadvantages, and this approach is worth considering. But what if you could eliminate all the disadvantages? What if you could have your cake and eat it too? That's where OpenEdge 11 comes in. All that work I said you have to do? Gone. All those disadvantages I listed? All gone. OpenEdge 11 does all the hard work.

With the OpenEdge 11 RDBMS, database multi-tenancy is an inbuilt feature. The database knows what tenants are, who they are, and where their data are. It knows where to put new data and where to get existing data for each and every tenant. You do not have to modify all of the data access parts of your application. In fact, you shouldn't have to change much of anything! Most of your code should just work.

Well, all right, maybe you do have to make a few changes. Those changes have to do with how a user logs in to the application and the database and how the user's identity is verified. As I said, the database knows about tenants. But you will have to tell it which tenant a user belongs to. In the 4GL we use something called the CLIENT-PRINCIPAL to help determine that.

The CLIENT-PRINCIPAL (aka the "cp") is an inbuilt and extensible security token that we added to OpenEdge a few years ago, in the 10.1 release. The cp encapsulates a user's identity once it has been validated. In OpenEdge 11 we use the cp (with some enhancements) to encapsulate both user identity and tenant identity. Depending on which cp token is currently in effect in the 4GL runtime, the database uses the tenant id to decide what data to return for a query. For code running in AppServers and accessing the database on behalf of different users at different times, the AppServer can easily switch the cp that is in effect to that of the user that made the AppServer call.

To get ready for OpenEdge 11, you should learn about the CLIENT-PRINCIPAL. The name may sound a bit intimidating but it is really very easy to use. It takes only 3 lines of code to make one and to validate the user's identity. Go and watch the video of Sarah Marshall's Exchange Online 2010 talk over on PSDN.

In the OpenEdge 11 RDBMS, each tenant gets a separate data partition for each multi-tenant table (and not every table has to be made multi-tenant), and each data partition has its own associated index partitions. The tenant id in the cp is used to control which data partition to fetch table rows from and a tenant only gets to see their own data (and data in regular shared tables). We also have a special tenant called the "super tenant", conceptually similar to the UNIX root user, that is allowed to see all the data.
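As a rough mental model only (this is not how OpenEdge lays out its storage internally), the partitioning can be pictured as one row store plus one index per tenant for each multi-tenant table, with shared tables visible to everyone and a super tenant that sees every partition. All names below, including `SUPER_TENANT`, `Partition`, and the `reindex` method, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

SUPER_TENANT = "super"  # hypothetical name for the tenant allowed to see all data

Row = Dict[str, Any]


@dataclass
class Partition:
    """One tenant's data partition plus its own index partition."""
    rows: List[Row] = field(default_factory=list)
    index: Dict[Any, int] = field(default_factory=dict)  # key value -> position in rows


class MultiTenantTable:
    """Conceptual model of a multi-tenant table: one partition per tenant."""

    def __init__(self, name: str, key: str) -> None:
        self.name = name
        self.key = key  # column used as the unique index key
        self.partitions: Dict[str, Partition] = {}

    def add_tenant(self, tenant: str) -> None:
        self.partitions.setdefault(tenant, Partition())

    def create(self, tenant: str, row: Row) -> None:
        # New rows go only into the current tenant's partition.
        part = self.partitions[tenant]
        key_value = row[self.key]
        if key_value in part.index:
            raise KeyError(f"duplicate {self.key}={key_value!r} for tenant {tenant}")
        part.index[key_value] = len(part.rows)
        part.rows.append(row)

    def find(self, tenant: str, key_value: Any) -> Optional[Row]:
        # Lookups use only the current tenant's index partition.
        part = self.partitions.get(tenant)
        if part is None or key_value not in part.index:
            return None
        return part.rows[part.index[key_value]]

    def rows_for(self, tenant: str) -> List[Row]:
        # The super tenant sees every partition; other tenants see only their own.
        if tenant == SUPER_TENANT:
            return [row for part in self.partitions.values() for row in part.rows]
        part = self.partitions.get(tenant)
        return list(part.rows) if part else []

    def reindex(self, tenant: str) -> None:
        # Per-tenant maintenance: rebuild one tenant's index, leaving the others untouched.
        part = self.partitions[tenant]
        part.index = {row[self.key]: i for i, row in enumerate(part.rows)}


@dataclass
class SharedTable:
    """A regular table that is not multi-tenant: every tenant sees the same rows."""
    name: str
    rows: List[Row] = field(default_factory=list)

    def rows_for(self, tenant: str) -> List[Row]:
        return list(self.rows)


if __name__ == "__main__":
    customers = MultiTenantTable("customer", key="cust_num")
    for tenant in ("acme", "globex"):
        customers.add_tenant(tenant)
    customers.create("acme", {"cust_num": 1, "name": "Alice"})
    customers.create("globex", {"cust_num": 1, "name": "Alfred"})
    print(customers.find("acme", 1))              # {'cust_num': 1, 'name': 'Alice'}
    print(len(customers.rows_for(SUPER_TENANT)))  # 2
```

Because each tenant's rows and index live in their own partition, an operation like `reindex("acme")` touches only that tenant's data, which is exactly the kind of per-tenant maintenance that was hard to do when all tenants shared one set of rows and indexes.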

This scheme works really well, is very efficient, and requires very few application changes. There are of course a lot of other things in OpenEdge 11. But I don't have space to talk about them just now and we will have to do that another time.

I hope you will like the new release. It is really cool.

February 04, 2010

Check out the latest Forrester Consulting study

Posted by Nancy Haynes

We wanted to find out more about the economic impact that Progress Partners may realize by building applications with OpenEdge versus using a non-Progress platform. To help us, we enlisted Forrester Consulting to do some research. You can see what they discovered in this new report: The Total Economic Impact™ of Progress Software OpenEdge Platform. Among the revealing findings:

  • Developing with OpenEdge was shown to be 40% more productive than alternative platforms
  • ISVs choosing OpenEdge are able to deliver their application to market 30% faster than when using an alternative platform
  • Once deployed, the support staff productivity gain is 80% as compared to alternative platforms
Get the full report to learn more.

April 17, 2008

About That Scalability ...

Posted by Tom Harris

Does scalability only apply to very large deployments?

Try this: fold a piece of paper in half by bringing the top edge down to meet the bottom edge. Now imagine an OpenEdge application with a 20GB database, connected over either Wi-Fi or a wired network, running a small business. Imagine that this business system occupies that half-sheet of paper, uses 2 watts of power, weighs two pounds, and costs $400 US.

That's reality today. I bought one of those little Eee laptops recently and was delighted at how easy the Linux system is to use and how easily it integrates with my Windows-dominated infrastructure. Nice job. Linux even runs my spreadsheets and presentations unchanged (you can even switch the laptop to XP if you want). What a nice parallel this "mini-system flexibility" is to OpenEdge! The ability to scale down is partly a matter of architecture and partly one of discipline in the implementation.

But why scale down? One reason could be new products or new markets. Another is to focus on providing a consistent service at a high performance level. Have you seen applications that just seem to grow and grow? Application bloat leads to a loss of control over the key functions of the software, and poor usability and uneven performance seem to come right along for the ride. So maybe scaling down is another way of looking for clear boundaries between the layers of an application, as well as separation of functions in an application environment. That highly connectable two-pound workstation was no accident! It was drawn from the OLPC (One Laptop Per Child) design effort, which is spawning a lot of creativity in the Ultra-Mobile PC market. A solid software architecture has some similarities in terms of defined interfaces, guiding principles, and scalable design.

Does your application scale up? Or down? Or out? Sometimes a "readiness review" of your architecture from a scalability or service-oriented point of view can produce a plan that feeds into the business roadmap by opening up new options. We hope that the new capabilities in OpenEdge ABL can help you implement that roadmap smoothly, phasing in the stages you plan for your evolution. Are you trying this approach? How well is it working for you? The AutoEdge application gives some nice examples of how the ABL supports the scalable OpenEdge Reference Architecture. I was just reviewing a presentation about AutoEdge that will be given at Exchange in Paris. If you are attending, this session could be very useful. Another interesting session uses a racetrack example to illustrate how a SOA environment comes together with Sonic, OpenEdge, and other fine products. Check them out and let me know what you think.

What about Deployment?

From the deployment side, we still support efficient storage of data, so that 20GB database on the tiny laptop might hold a surprising amount of business data. We also still support database compatibility from the little laptop all the way up to large systems. With OpenEdge 10.1B, every database can grow to have very large tables, wide indexes, and a full range of datatypes. You even get built-in management features like automatic defragging now. A small database is easy to run "hands off". For large databases, you get online management of table space, schema additions, and the ability to increase some key performance parameters without taking downtime. OpenEdge needs less tuning than those big databases, and we are still the #1 pure-play database for applications that need a capable database "built right in". Our vision is to let your production environment evolve smoothly with a self-tuning, self-healing, and self-provisioning database that handles the transactional business load. What else? Well, improvements in AppServers keep coming, as do improvements in the reporting capabilities of OpenEdge SQL. Similarly, support for Sonic keeps getting easier with 10.1C, and we'll talk about that in a separate note. We do take performance and manageability seriously, and the roadmap for this area features a lot of customer-requested capabilities. We really appreciate your advice and insights into your needs. Please keep them coming! What are your top 5 needs in the database and AppServer area?

Whatever happens in the scalability area, one thing remains. Scalability is not about standing still. It's about a dynamic range and that means opportunity for you and for us. Together.

Progress Software