October 25, 2011

Cool Stuff: OpenEdge 11 Multi-tenancy

Posted by Gus Bjorklund

One of the most important and useful new capabilities in the upcoming OpenEdge 11 release (planned for December 2011) is direct support for multi-tenancy in the database. What is multi-tenancy and why would I want one? Read on and I'll tell you.

The notion of multi-tenancy arises in the field of Software-as-a-Service (aka SaaS). When a vendor offers an application to be used as a "service", its customers do not have to buy a computer system to run the application on, nor do they need staff trained in the care and feeding of the system or in backing up the data. Instead, customers subscribe to the service and the vendor does all of that. The customer simply uses the application over the Internet and has no idea where it is or what computer it is running on. When you use a search engine, you don't know where it is and it doesn't matter. It's the same with SaaS applications. A SaaS application you are probably already familiar with is email. Google, Microsoft, Yahoo, and most ISPs run email servers for their subscribers. You don't have to know anything except where to log in. Someone else is responsible for everything to do with operating and maintaining the service. You just receive, read, compose and send your mail.

All this is very easy, even trivial, for the Software-as-a-Service customer. But what about the poor vendor? The vendor has to do all that messy IT stuff. Is the vendor going to have a dedicated computer for each customer, as the customers would if they were running the application on their own computers? No, of course not. The vendor is going to do everything possible to hold operating costs as low as possible. That is where multi-tenancy comes in. Each of the SaaS vendor's customers is called a "tenant", a word taken from the rental housing market. In an apartment block we can have many tenants in the same building, all living in separate spaces. Similarly, we can put many application tenants into the same computer.

With a number of tenants sharing the same computer, the SaaS vendor has fewer machines to buy, fewer machines to take care of, and less work to do. No tenant can see the other tenants or their data, and in fact no tenant knows anything about the others or their existence. Quite a number of the SaaS vendors do this. Since it is very simple to do with OpenEdge, many of them have created a separate database for each tenant. But that means you have to do backups, schema changes, and other maintenance functions individually for each tenant's database. It would be much better if tenants could share the database too. We call this database multi-tenancy.

In OpenEdge 10 and before, with quite a bit of work, you can achieve database multi-tenancy. Some of our partners have rolled up their sleeves and done it. What you need to do is this:

  • First, add a "tenant identifier" column to every table. This tenant id column is a column that contains a unique identification number, perhaps an integer, assigned to each tenant. The value indicates which tenant owns the data in each row of the table.
  • Next, add the tenant id column to every index as the leading key component.
  • Create a table to store the tenant names and their tenant ids, and assign an id to each tenant.
  • Then, go through all the code in your application and everywhere that a new table row is created, assign the correct value to the row's tenant id column.
  • You also have to invent a way to keep track of which tenant id is currently in effect.
  • Finally, go through all the code in the application again and find all the queries. Modify each WHERE clause to add a term that says "(tenantId = currentTenant) and ". Don't forget CAN-FIND. And make sure to add the tenant id term for each table in a multi-table query.
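
To make the steps above concrete, here is a minimal sketch of the do-it-yourself approach. It is written in Python with SQLite standing in for the database, not in the 4GL, and every table, column, and function name in it is made up for illustration. It only shows the shape of the work: a tenant table, a tenant id column that leads every index, a "current tenant" that is tracked somewhere, and a tenant term in every create and every query.

```python
import sqlite3

# SQLite stands in for the database; all names here are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    -- Table that maps tenant names to tenant ids.
    CREATE TABLE tenant (tenantId INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- Every application table gets a tenantId column ...
    CREATE TABLE customer (tenantId INTEGER NOT NULL, custNum INTEGER, name TEXT);
    -- ... and every index gets tenantId as its leading key component.
    CREATE UNIQUE INDEX custNumIdx ON customer (tenantId, custNum);
""")
db.executemany("INSERT INTO tenant VALUES (?, ?)", [(1, "acme"), (2, "globex")])

# Some way to keep track of which tenant id is currently in effect.
session = {"tenantId": None}

def set_current_tenant(name):
    row = db.execute("SELECT tenantId FROM tenant WHERE name = ?", (name,)).fetchone()
    session["tenantId"] = row[0]

def create_customer(cust_num, name):
    # Every place that creates a row must assign the tenant id.
    db.execute("INSERT INTO customer VALUES (?, ?, ?)",
               (session["tenantId"], cust_num, name))

def find_customers():
    # Every query must carry the "tenantId = currentTenant" term.
    rows = db.execute(
        "SELECT custNum, name FROM customer WHERE tenantId = ? ORDER BY custNum",
        (session["tenantId"],))
    return rows.fetchall()

# Two tenants can both have a customer number 1 without colliding.
set_current_tenant("acme")
create_customer(1, "Lift Tours")
set_current_tenant("globex")
create_customer(1, "Urpon Frisbee")
```

Even in a toy like this, the tenant id shows up in the schema, the index, the insert, and the query. In a real application that is repeated for every table and every data access statement.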

Once you do all those things, you can have database multi-tenancy. But in addition to the obvious fact that taking this approach is labor-intensive and invasive, there are a number of other disadvantages. I will list just a few here:

0) It is error prone. If you make a mistake when you change the code to do multi-tenancy, the wrong tenant's data will be returned. Or if you forget when you or another developer is fixing a bug, the wrong tenant's data will be returned.

1) Even if you use Type II data areas, rows from multiple tenants will be commingled in the same data blocks and the same table's allocation clusters. This negates many of the advantages of using Type II data areas. You get lower I/O efficiency because one tenant will have to read a data block that contains other tenants' data. Your customers will probably have the perception (whether true or not) that commingling their data reduces its security.

2) You can't do per-tenant maintenance easily. How do you reindex just one tenant's data?

3) How do you restore one tenant's data when they do something foolish like run end of month processing in the middle of the month?

4) You can't do per-tenant disk space allocation or disk space usage tracking very easily, if at all.

5) There is lock interference among tenants. Table-locks can lock out all the other tenants.
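
The first disadvantage in the list is worth a small illustration. The sketch below is again Python with SQLite as a stand-in, with hypothetical table and function names. It shows why a forgotten tenant term is so dangerous: the query without it is still perfectly valid and raises no error. It simply answers with everybody's data.

```python
import sqlite3

# SQLite stands in for the database; the invoice table is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoice (tenantId INTEGER, invoiceNum INTEGER, amount REAL)")
db.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
               [(1, 100, 25.0), (1, 101, 40.0), (2, 100, 990.0)])

def total_correct(tenant_id):
    # Query with the tenant term in place.
    return db.execute("SELECT SUM(amount) FROM invoice WHERE tenantId = ?",
                      (tenant_id,)).fetchone()[0]

def total_buggy(tenant_id):
    # The same query after an imagined bug fix that dropped the tenant term.
    # It still runs without error; it just sums every tenant's invoices.
    return db.execute("SELECT SUM(amount) FROM invoice").fetchone()[0]
```

Nothing in the database can catch this for you, because as far as the database is concerned the tenant id is just another column.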

In spite of the disadvantages, I think the advantages far outweigh them and it is worth considering the use of this approach. But what if you could eliminate all the disadvantages? What if you could have your cake and eat it too? That's where OpenEdge 11 comes in. All that work I said you have to do? Gone. All those disadvantages I listed? All gone. OpenEdge 11 does all the hard work.

With the OpenEdge 11 RDBMS, database multi-tenancy is an inbuilt feature. The database knows what tenants are, who they are, and where their data are. It knows where to put new data and where to get existing data for each and every tenant. You do not have to modify all of the data access parts of your application. In fact, you shouldn't have to change much of anything! Most of your code should just work.

Well, all right, maybe you do have to make a few changes. Those changes have to do with how a user logs in to the application and the database and how the user's identity is verified. As I said, the database knows about tenants. But you will have to tell it which tenant a user belongs to. In the 4GL we use something called the CLIENT-PRINCIPAL to help in determining that.

The CLIENT-PRINCIPAL (aka the "cp") is an inbuilt and extensible security token that we added to OpenEdge a few years ago, in the 10.1 release. The cp encapsulates a user's identity once it has been validated. In OpenEdge 11 we use the cp (with some enhancements) to encapsulate both user identity and tenant identity. Depending on which cp token is currently in effect in the 4GL runtime, the database uses the tenant id to decide what data to return for a query. For code running in AppServers and accessing the database on behalf of different users at different times, the AppServer can easily switch the cp that is in effect to that of the user that made the AppServer call.
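
If the idea of a sealed identity token is new to you, the following conceptual sketch may help. It is Python, not 4GL, and it is not the CLIENT-PRINCIPAL API. The class, function, and key names are all invented, and the HMAC seal is just my stand-in for "a token that can be checked before it is trusted". What it shows is the pattern described above: a token carries user and tenant identity, a session holds whichever token is currently in effect, and a server switches tokens from one call to the next.

```python
import hashlib
import hmac
from collections import namedtuple

# Shared secret used to seal tokens; the value is made up for this sketch.
DOMAIN_KEY = b"hypothetical-domain-access-code"

# A token that carries both user identity and tenant identity.
Principal = namedtuple("Principal", "user tenant seal")

def _sign(user, tenant):
    message = f"{user}@{tenant}".encode()
    return hmac.new(DOMAIN_KEY, message, hashlib.sha256).hexdigest()

def seal_principal(user, tenant):
    """Create a token once the user's identity has been validated elsewhere."""
    return Principal(user, tenant, _sign(user, tenant))

class Session:
    """Stands in for the runtime that holds the token currently in effect."""

    def __init__(self):
        self.principal = None

    def set_principal(self, principal):
        # Refuse tokens whose seal does not match their contents.
        expected = _sign(principal.user, principal.tenant)
        if not hmac.compare_digest(principal.seal, expected):
            raise PermissionError("invalid seal")
        self.principal = principal

    def current_tenant(self):
        # The data layer would use this to decide whose data to return.
        return self.principal.tenant

# A server handling calls for different users switches the token per call.
session = Session()
for caller in (seal_principal("alice", "acme"), seal_principal("bob", "globex")):
    session.set_principal(caller)
```

The point of the design is that application code never passes a tenant id around. It only sets the token, and everything downstream takes the tenant from there.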

To get ready for OpenEdge 11, you should learn about the CLIENT-PRINCIPAL. The name may sound a bit intimidating but it is really very easy to use. It takes only 3 lines of code to make one and to validate the user's identity. Go and watch the video of Sarah Marshall's Exchange Online 2010 talk over on PSDN.

In the OpenEdge 11 RDBMS, each tenant gets a separate data partition for each multi-tenant table (and not every table has to be made multi-tenant), and each data partition has its own associated index partitions. The tenant id in the cp is used to control which data partition to fetch table rows from and a tenant only gets to see their own data (and data in regular shared tables). We also have a special tenant called the "super tenant", conceptually similar to the UNIX root user, that is allowed to see /all/ the data.
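
Here is a toy model of that visibility rule, again in Python and again with invented names. It says nothing about how the database actually stores partitions on disk. It only captures the behavior described above: one partition per tenant for each multi-tenant table, a single copy of each regular shared table, and a super tenant that can read across all partitions.

```python
from collections import defaultdict

SUPER_TENANT = "super"  # hypothetical name for the super tenant

class PartitionedStore:
    def __init__(self, multi_tenant_tables):
        self.multi_tenant_tables = set(multi_tenant_tables)
        # One list of rows per (table, tenant) for multi-tenant tables.
        self.partitions = defaultdict(list)
        # Regular shared tables have a single row list seen by everyone.
        self.shared = defaultdict(list)

    def create(self, tenant, table, row):
        if table in self.multi_tenant_tables:
            self.partitions[(table, tenant)].append(row)
        else:
            self.shared[table].append(row)

    def read(self, tenant, table):
        if table not in self.multi_tenant_tables:
            return list(self.shared[table])
        if tenant == SUPER_TENANT:
            # The super tenant sees every tenant's partition of the table.
            return [row
                    for (name, _), rows in sorted(self.partitions.items())
                    if name == table
                    for row in rows]
        # An ordinary tenant only ever touches its own partition.
        return list(self.partitions[(table, tenant)])

store = PartitionedStore(multi_tenant_tables=["customer"])
store.create("acme", "customer", "Lift Tours")
store.create("globex", "customer", "Urpon Frisbee")
store.create("acme", "currency", "EUR")  # "currency" is a regular shared table
```

Notice that the calling code never mentions a tenant id column or a WHERE clause. The tenant only selects which partition is used.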

This scheme works really well, is very efficient, and requires very few application changes. There are of course a lot of other things in OpenEdge 11. But I don't have space to talk about them just now and we will have to do that another time.

I hope you will like the new release. It is really cool.

September 13, 2010

Join us for a Discussion on Load Testing with Gus Bjorklund

Posted by Ken Wilner

 

Upon joining Progress in 1989, Gus Bjorklund fixed bugs in Progress 4.2N. Since then, Gus has joined the OpenEdge Best Practices team and he has worked on various initiatives, focusing primarily on the OpenEdge RDBMS. He has been a frequent speaker at Progress conferences and user group events and was an executive producer and session presenter during Progress' first online conference in 2009.

 

During Progress Exchange Online 2010, Gus will be returning as a producer and speaker and in his presentation, "Introduction 2 Load Testing", he will explain the real benefits of load testing OpenEdge applications and how to get started doing it.

 

Load testing is useful for a variety of purposes, including capacity planning, system sizing, regression testing, stress testing, and proving your application's performance and scalability to sales prospects. The key, as is the case with many other kinds of projects, is planning. Testing everything an application does is usually impractical so an important element of planning for load testing is deciding what to leave out.

 

Join Gus at his session "Introduction 2 Load Testing" at Exchange Online 2010 on Tuesday September 14, 2010 at 10:30 am. For more information and to register, please visit www.progress.com/exchange2010

 

July 20, 2010

DIY Cloud ?

Posted by Gus Bjorklund

I've mentioned Amazon EC2 a few times here. But did you know that you could build your own elastic cloud infrastructure instead of using Amazon's?

Eucalyptus is software that was originally developed as a research project in the Computer Science Department at the University of California Santa Barbara. Now it has been spun off as an open source commercial venture to continue its development. Eucalyptus is largely compatible with Amazon's APIs.

From the Eucalyptus FAQ: "Eucalyptus is a private cloud-computing platform that implements the Amazon specification for EC2, S3, and EBS. Eucalyptus conforms to both the syntax and the semantic definition of the Amazon API and tool suite, with few exceptions. Along with providing a REST and SOAP interface compatible with Amazon AWS, Eucalyptus also exposes administrative functionalities (e.g., user management, storage configuration, network management, hypervisor configuration, etc.) for managing and maintaining the cloud. Our implementation of the specification, however, is undoubtedly different than Amazon's implementations. These differences are related to several engineering design decisions. Our primary goal is to provide an open-source software tool for community distribution that is highly scalable and extensible, as well as easy to install and maintain."

Check it out and let me know what you think.

June 07, 2010

My First OpenEdge GUI Application, And In The Cloud Too !

Posted by Gus Bjorklund

Last week I built my very first OpenEdge GUI Application. I even used OpenEdge Architect to do it (I normally use vi for programming). The whole development environment was running up in the cloud. I performed this remarkable feat at a workshop during the Netherlands PUG meeting near Utrecht.

The workshop was ably directed by Peter van Dam, from Futureproof Software. He made it really easy for about 60 beginners to learn how to build a simple but functional GUI application with a login screen, an updateable data grid, ribbons, etc., with an Office-2007-like appearance.

Each workshop student was given a login for an AMI on Amazon EC2 where the OpenEdge development environment was installed and configured, along with a few image files for icons, and a database with customer records already in it.  This was accessed using Windows Remote Desktop. Nobody had to install anything on their own machine. Peter led everybody through the exercise in two 90 minute sessions with a half-hour break between them. Since he was speaking in Dutch, I had some trouble understanding him.

Nearly everybody finished their application. I had some cosmetic bugs I didn't bother to fix (labels in the wrong place and stuff like that) so mine was not 100% complete.

A few observations:

0) Amazon EC2 works really well for this sort of thing. Dev tools, code, database, everything was somewhere in the cloud. You could start as many machines as needed and run them as long as needed.

1) I was pleasantly surprised at how responsive the GUI was, running Remote Desktop over a hotel Internet connection to Amazon's data center.

2) Developing with the Eclipse-based OpenEdge Architect tool in the cloud is a lot nicer than using some crappy development tool running in a web browser.

3) The Infragistics UI controls have a lot of functionality and are very complicated.  Without a cookbook, I could never have figured out what to do and which properties did what.

4) Programming by clicking, clicking, and more clicking with the mouse is really boring.

5) Boring or not, it's powerful.  You can get a lot done quickly.

One more thing: at the very beginning of the workshop, the hotel's router had a meltdown. It took about 90 minutes or so to get a replacement. Being clever folks, the Dutch PUG meeting organizers had scheduled two talks for after the workshop. So we did these first, while the router was being dealt with.

All in all, an excellent outcome. If I can do this, you can too.

November 20, 2009

Exchange Online 2009 - Behind the Scenes

Posted by Gus Bjorklund

In early Spring 2009, when we began planning our first online conference, we had no idea what to do. We knew that we wanted to do something unique and different, but had no idea how long it would take to get things done, what would be hard or easy, or how many people we would need. We didn't even know the right questions to ask. I spent a number of weeks educating myself, looking at and participating in online conferences that other companies, universities, and organizations like the Smithsonian Museum were doing. Among other things, I spoke at BravePoint's Virtual InterChange conference, which allowed me to see what's involved in doing a live broadcast.

We eventually decided to do something akin to a TV "newsmagazine" style broadcast with a number of separate programs and "episodes", using a variety of media and techniques. What we produced in the end was, on the one hand, not too far from that, and on the other, not at all like that.

We had numerous discussions about what to do, how to do it, and who we could get to help.  Also many arguments about things which turned out not to be important in the end. Key to helping figure it out and get it all done was Cramer, a production company that has done a lot of work for us in the past for our F2F (face-to-face) conferences and all sorts of videos.

Among the things we discussed endlessly were how many technical sessions we should have and how long they should be. Would anyone come? How long an attention span would an online audience have? How many days would people be willing to "tune in"? For how long? 2 hours? 4? 6? Would people leave if there was "dead air"? What should we do about people in different timezones? How many sessions would we have time to produce? Is live or recorded better? Which is harder to do? We got many different answers to these questions. In the end we decided on two concurrent video "channels" starting with general sessions followed by a series of 30 minute technical sessions with a short question-and-answer period in the last few minutes of each one, and a live discussion channel. The sessions were recorded but the q&a period was live. Would that work? We wouldn't know until we actually did it.

When we opened the conference registration site, we got a huge surprise. After two days, we saw that about two hundred people had registered. Then, over the weekend, registrations spiked up over 3,000 and continued rising. At first we were thrilled. Then we learned that someone had posted a link to our registration page on several sites that enabled people to find places where they could get free stuff. We had free stuff. Anyone who registered would get a free T-shirt. Over 4,000 people who had no interest in the conference had signed up just for the free shirt. What else could go wrong? I loaded all the registration information into an OpenEdge database and wrote a 4GL program to rank all the registrations and identify the bogus ones so we could delete them.

Gradually, things came together and all the speakers got their materials prepared and practiced their talks. In early August, we began filming. I spent 4 weeks at Cramer's studios, working with the speakers while we were filming and then with Cramer's folks (thanks Theo!) editing video while others worked on the web site, registration, promotion, and many other things. It was a lot of work but it was fun.

Finally we were ready. The conference broadcast went live at 8:30 am on September 15. We held our breath. Everything worked, with only a few small technical glitches. Here's a picture of the control room for one of the channels.

[Image: Exchange control room]



When the broadcast was finally finished on the third day, we were relieved.  We made it.

As we discovered, 30 minute sessions worked pretty well. Almost. In hindsight, we learned that:

0) 23 minutes of content with 7 minutes of q&a time didn't allow enough time for questions for most of the sessions,

1) our speakers were not used to speaking for such short periods and had some difficulty with that,

2) there were (deliberately) no breaks between the end of one session and the start of the next. Feedback says we should have had them,

3) the "networking lounge" and discussions were hard to use and didn't work nearly as well as they should have. In spite of the difficulty, some good discussion did take place.

Still, I think those were relatively small problems.  In the grand scheme of things, I would say Exchange Online 2009 was a success.

Session videos for the Exchange Online 2009 conference are still available for viewing for another month.  If you haven't had a chance to visit, please point your browser to http://events.unisfair.com/rt/exchangeonline~sept2009
Progress Software