Making Software a Service
This excerpt, "Making Software a Service," is from the book Building Applications in the Cloud: Concepts, Patterns, and Projects, authored by Christopher M. Moyer. The book is published by Addison-Wesley, copyright 2011 Pearson Education, Inc. For additional information, visit the publisher's site.
Developing your Software as a Service (SaaS) takes you away from the dark ages of programming and into a new age in which copyright protection, DRM, and pirating don't exist. In the current age of computing, people don't expect to pay for software but instead prefer to pay for the support and other services that come with it. When was the last time anyone paid for a web browser? With the advent of open source applications, the majority of paid software is moving to hosted systems, which rely less on the user's physical machine. This means you don't need to support extra hardware and other software that may conflict with yours, for example, permissions, firewalls, and antivirus software.
Instead of developing a simple desktop application that you need to defend and protect against pirating and cloning, you can develop your software as a service, releasing updates and new content seamlessly while charging your users on a monthly basis. With this method, you can charge your customers a small monthly fee instead of making them pay a large amount for the program upfront, and you can make more money in the long run. For example, many people pirate Microsoft Office instead of shelling out $300 upfront for a legal copy, whereas if it were offered online in a format such as Google Docs, those same people might gladly pay $12.50 a month for the service.
Not only do they get a web-based version that they can use on any computer, but everything they save is stored online and backed up. After two years of paying for your service, a user has paid you as much as the desktop version would have earned, plus you're ensuring that they'll stay with you as long as they want access to those documents. However, if your users use the software for a month and decide they don't like it, they don't need to continue the subscription, and they have lost only a small amount of money. If you offer a trial-based subscription, users can test your software at no cost, which means they're more likely to sign up.
Tools Used in This Book
Take a look at some of the tools used throughout this book. For the examples, the boto Python library is used to communicate with Amazon Web Services. This library is currently the most full-featured Python library for interacting with AWS, and it's one I helped to develop. It's relatively easy to install and configure, so a few brief instructions are given here. boto currently works only with Python 2.5 to 2.7, not Python 3. It's recommended that you use Python 2.6 for the purposes of this book.
Signing Up for Amazon Web Services
Before installing the libraries required to communicate with Amazon Web Services, you need to sign up for an account and any services you need. This can be done by going to http://aws.amazon.com/, choosing Sign Up Now, and following the instructions. You need to provide a credit card, which is billed for your usage, but you won't actually be billed until the end of each month. You can log in here at any time to sign up for more services. You pay only for what you use, so don't worry about accidentally signing up for too many things. At a minimum, you need to sign up for the following services:
After you create your account, log in to your portal by clicking Account and then choosing Security Credentials. Here you can see your Access Credentials, which will be required in the configuration section later. At any given time you may have two Access Keys associated with your account; these are your private credentials to access Amazon Web Services. You may also deactivate either of these keys, which helps when migrating to a new set of credentials because you can keep two keys active until everything is migrated over to the new one.
You can install boto in several different ways, but the best way to make sure you're using the latest code is to download the source from GitHub at http://github.com/boto/boto. There are several ways to download this code, but the easiest is to just click the Downloads button and choose a version to download. Although the master branch is typically okay for development purposes, you probably want to download the latest tag because that's guaranteed to be stable, and all the tests have been run against it before bundling. You need to download that to your local disk and unpack it before continuing.
The next step is to actually install the boto package. As with any Python package, this is done using the setup.py file, with either the install or develop command. Open up a terminal, or command shell on Windows, change the directory to where you downloaded the boto source code, and run

% python setup.py install
Setting Up the Environment

Although there are many ways to set up your environment for boto, use the one that's also compatible with the downloaded Amazon tools, which you can find at http://aws.amazon.com/developertools. Each service has its own set of command-line-based developer tools written in Java, and most of them also enable you to use the configuration file shown here to set up your credentials. Name this file credentials.cfg and put it somewhere easily identified:
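A minimal sketch of this file, assuming the credential-file format used by Amazon's Java command-line tools (replace the placeholder values with the Access Credentials from your AWS portal):

```
# File: credentials.cfg
AWSAccessKeyId=YOUR_ACCESS_KEY_ID
AWSSecretKey=YOUR_SECRET_ACCESS_KEY
```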
Next, point the tools at this file by setting an environment variable in your shell's startup file. Add the following to your .bashrc or .zshrc, or add the equivalent to your .tcshrc if you use T-Shell instead:
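As a sketch, assuming the AWS_CREDENTIAL_FILE variable expected by Amazon's command-line tools (the path is a placeholder for wherever you saved the file):

```
# Bourne-style shells (.bashrc, .zshrc)
export AWS_CREDENTIAL_FILE=/full/path/to/credentials.cfg

# T-Shell (.tcshrc)
setenv AWS_CREDENTIAL_FILE /full/path/to/credentials.cfg
```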
For boto, create a boto.cfg that enables you to configure some of the more boto-specific aspects of your system. Just like in the previous example, you need to create this file and then set an environment variable, this time BOTO_CONFIG, to point to the full path of that file. Although this configuration file isn't strictly necessary, some of its options can be useful for debugging purposes, so go ahead and make your boto.cfg:
# File: boto.cfg
[Instance]
local-ipv4 = 127.0.0.1
local-hostname = localhost
security-groups = default
public-ipv4 = 127.0.0.1
public-hostname = my-public-hostname.local
hostname = localhost
instance-type = m1.small
instance-id = i-00000000

# Set the default SDB domain
[DB]
db_name = default

# Set up base logging
format=%(asctime)s [%(name)s] %(levelname)s %(message)s
The first thing to do here is set up an [Instance] section that makes your local environment act like an EC2 instance. This section is automatically added by the startup scripts that run when you launch a boto-based EC2 instance. These configuration options may be referenced by your scripts later, so adding this section means you can test those scripts locally before launching an EC2 instance.
Next, set the default SimpleDB domain to default, which will be used in the Object Relational Mappings you'll experiment with later in this chapter. For now, all you need to know is that this will store all your examples and tests in a domain called default, and that you'll create this domain in the following testing section. Finally, you set up a few configuration options for the Python logging module, which specify that all logging should go to standard output, so you'll see it when running from a console. These options can be customized to send the logging to a file, in any format you want, but for the basics here just dump it to your screen and show only log messages at the INFO level and above. If you encounter any issues, you can drop this down to DEBUG to see the raw queries being sent to AWS.
Testing It All

If you installed and configured boto as described in the previous steps, you should be able to launch a Python interpreter and run the following sequence of commands:
>>> import boto
>>> sdb = boto.connect_sdb()
>>> db = sdb.create_domain("default")
The preceding code tests your connectivity to SimpleDB and creates the default domain referenced in the previous configuration section. This will be useful in later sections of this chapter, so make sure you don't get any errors. If you get an error message indicating you haven't signed up for the service, you need to go to the AWS portal and make sure to sign up for SimpleDB. If you get another error, you may have configured something incorrectly, so check the error to see what the problem may have been. If you're having issues, you can always head over to the boto home page (http://github.com/boto/boto) or ask for help in the boto-users group (http://groups.google.com/group/boto-users).
What Does Your Application Need?

After you have the basic requirements for your application and decide what you need to implement, you can begin to describe what you need to build this application. Typically this is not a question you think about when creating smaller-scale applications because you have everything you need in a single box. Instead of looking at everything together as one complete unit or 'box,' you need to split out what you actually need and identify what cloud services you can use to fit these requirements. Typical applications need the following:
Think about this application as a typical nonstatic website that requires some sort of execution environment or web server, such as an e-commerce site or web blog. When a request comes in, you need to return an HTML page, or perhaps an XML or JSON representation of just the data, that may be either static or dynamically created. To determine this, you need to process the actual request using your compute power. This process also requires fast temporary storage to store the request and build the response. It may also require you to pull information about the users out of a queryable long-term storage location.
If you expand this simple website to include any service, you realize that all your applications need the same exact things. If you split this application into multiple layers, you can begin to understand what it truly means to build SaaS, instead of just a typical desktop application. One major advantage of SaaS is that it lends itself to subscription-based software, which doesn't require complex licensing or distribution points, which not only cuts cost but also ensures that you won't have to worry about pirating. Because you're actually providing a service, your clients pay you every time they want to use that service.
Taking a look back at your website, you can see that there are three main layers to this application. This is commonly referred to as a three-tier application pattern and has been used for years to develop SaaS. The three layers include the data layer to store all your long-term needs, the application layer to process your data, and the client or presentation layer to present the data and the processes you can perform for your client.
Data Layer

The data layer is the base of your entire application, storing all the dynamic information for your application. In most applications, this is actually split into two parts. One part is the large, slow storage used to store any file-like objects or any data that is too large to store in a smaller storage system. This is typically provided by a network-attached-storage type of system offered by your cloud hosting solution. In Amazon Web Services, this is called Simple Storage Service, or S3.
Another large part of this layer is the small, fast, and queryable information. In most typical systems, this is handled by a database. This is no different in cloud-based applications, except for how you host this database.
Introducing the AWS Databases

In Amazon Web Services, you actually have two different ways to host this database. One option is a nonrelational database, known as SimpleDB or SDB, which can be confusing to grasp initially but in general is much cheaper to run and scales automatically. This nonrelational database is currently the cheapest and easiest-to-scale database provided by Amazon Web Services because you don't pay for anything except what you actually use. As such, it can be considered a true cloud service, instead of just an adaptation on top of existing cloud services.
Additionally, this database scales up to one billion key-value pairs per domain automatically, and you don't have to worry about overusing it because it's built on the same architecture as S3. This database is quite efficient at storing and retrieving data if you build your application to work with it, but it doesn't handle complex queries well. If you can think of your application in simple terms relating directly to objects, you can most likely use this database. If, however, you need something more complex, you need to use a Relational DB (RDB).
RDB is Amazon's solution for applications that cannot be built using SDB, for systems with complex requirements of their databases, such as complex reporting, transactions, or stored procedures. If you need your application to do server-based reports that use complex select queries joining between multiple objects, or you need transactions or stored procedures, you probably need to use RDB.

If you can't figure out which solution you need to use, you can always use both. If you need the flexibility and power of SDB, use that for creating your objects, and then run scripts to push that data to MySQL for reporting purposes. In general, if you can use SDB, you probably should, because it is generally a lot easier to use.

SDB is split into a simple three-level hierarchy of domain, item, and key-value pairs.
Figure 2.1 illustrates the relation between the three levels. In Figure 2.1, the relation between items and key-value pairs is one-to-many, so you can have multiple key-value pairs for each item. Additionally, the keys are not unique, so you can have multiple key-value pairs with the same key, which is essentially the same thing as a key having multiple values.
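A small text sketch of that hierarchy (the domain, item, and key names are made up for illustration) shows a key appearing twice, which is how an item holds multiple values for one key:

```
domain: users
  item: "user-1"
    name  = "alice"
    color = "red"
    color = "blue"    # same key twice: one key, multiple values
  item: "user-2"
    name  = "bob"
```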
Connecting to SDB

Connecting to SDB is quite easy using the boto communication library. Assuming you already have your boto configuration environment set up, all you need to do is use the proper connection methods:
>>> import boto
>>> sdb = boto.connect_sdb()
>>> db = sdb.get_domain("my_domain_name")
>>> item = db.get_item("item_name")
This returns a single item by its name, which is logically equivalent to selecting all attributes by an ID from a standard database. You can also perform simple queries on the database, as shown here:
>>> db.select("SELECT * FROM `my_domain_name` WHERE `name` LIKE '%foo%' ORDER BY `name` DESC")
The preceding example works exactly like a standard relational DB query does, returning all attributes of any item that contains a key name with foo in any location, sorting by name in descending order. SDB sorts and operates by lexicographical comparison and handles only string values, so it doesn't understand that -2 is less than -1. The SDB documentation provides more details on this query language for more complex requests.

Using an Object Relational Mapping

boto also provides a simple persistence layer to translate all values so that they can be lexicographically sorted and searched for properly. This persistence layer operates much like the DB layer of Django, which it's based on. Designing an object is quite simple; you can read more about it in the boto documentation, but the basics can be seen here:
from boto.sdb.db.model import Model
from boto.sdb.db.property import (StringProperty, IntegerProperty,
                                  ListProperty, ReferenceProperty)

class SimpleObject(Model):
    """A simple object to show how SDB
    Persistence works in boto"""
    name = StringProperty()
    some_number = IntegerProperty()
    multi_value_property = ListProperty(str)

class AnotherObject(Model):
    """A second SDB object used to show how references work"""
    name = StringProperty()
    # The collection_name value here is illustrative
    object_link = ReferenceProperty(SimpleObject,
                                    collection_name="other_objects")
This code creates two classes (which can be thought of as tables): a SimpleObject, which contains a name, a number, and a multivalued property of strings, and an AnotherObject, which links back to it. The number is automatically converted by adding the proper value to the value set and properly loaded back by subtracting this number. This conversion ensures that the number stored in SDB is always positive, so lexicographical sorting and comparison always work. The multivalue property acts just like a standard Python list, enabling you to store multiple values in it and even remove values. Each time you save the object, everything that was in there is overridden. Each object also has an id property by default, which is actually the name of the item, because that is a unique ID.
It uses Python's UUID module to generate this ID automatically if you don't manually set it. This UUID module generates completely random and unique strings, so you don't rely on a single point of failure to generate sequential numbers. The collection_name attribute on the object_link property of AnotherObject is optional but enables you to specify the name of the property that is automatically created on SimpleObject. This reverse reference is generated for you automatically when you import the second object.
boto enables you to create and query these objects in the database in an equally simple manner. It provides a few unique methods that use the values available in the SDB connection objects of boto so that you don't have to worry about building your query. To create an object, you can use the following code:
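As a hedged sketch of what this looks like (it assumes the SimpleObject class defined earlier and valid AWS credentials, so it won't run offline):

```python
# Create and save a new object; put() writes it to SDB.
obj = SimpleObject()
obj.name = "example"
obj.some_number = 10
obj.multi_value_property = ["a", "b"]
obj.put()
```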
To select an object given an ID, you can use the following code:
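Similarly, a sketch of looking an object up by ID (again assuming AWS credentials; the ID string is a placeholder for a real item name):

```python
# Fetch a single object by its unique ID (the SDB item name).
obj = SimpleObject.get_by_id("some-object-id")
print obj.name
```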
Application Layer

The application layer is where you'll probably spend most of your time because it is the heart and soul of any SaaS system. This is where your code translates data and requests into actions, changing, manipulating, and returning data based on inputs from users or other systems. This is the only layer that you have to actually maintain and scale, and even then, some cloud providers offer unique solutions to handle that for you. In Google AppEngine, for example, scaling is handled entirely automatically.
In Amazon Web Services, this can be handled semi-automatically by using Auto-Scaling Groups, for which you can set rules on when to start and stop instances based on load averages or other metrics. Your application layer is built on top of a base image that you created and may also contain scripts that tell it to update or add more code to that running instance. It should be designed to be as modular as possible and enable you to launch new modules without impacting the old ones. This layer should sit behind a proxy system that hides how many actual modules are in existence. Amazon enables you to do this by providing a simple service known as Elastic Load Balancing, or ELB.
Using Elastic Load Balancing

Amazon's Elastic Load Balancing, or ELB, can be used simply and cheaply to proxy all requests to your modules based on their Instance ID. ELB is even smart enough to proxy only to systems that are actually live and processing, so you don't have to worry about server failures causing long-term service disruptions. ELB can be set up to proxy HTTP or standard TCP ports. This is simple to accomplish in code and can even be done on the actual instance as it starts, so the instance can register itself when it's ready to accept connections. This, combined with Auto-Scaling Groups, can quickly and easily scale your applications seamlessly in a matter of minutes without any human interaction. If, however, you want more control over your applications, you can use ELB without Auto-Scaling Groups and launch new modules manually.
You must pass at least one listener and one zone as arguments when creating the balancer. Each zone receives the same distribution of requests, so if you don't have the same number of servers in each zone, the requests will be distributed unevenly. For anything other than standard HTTP, use the tcp protocol instead of http. Note the DNS Name returned by this command, which can also be retrieved by using the elbadmin get command. That command can also be used later to retrieve all the zones and instances being proxied to by this specific ELB. The DNS Name can be pointed to by a CNAME in your own domain name. This must be a CNAME and not a standard A record because the domain name may point to multiple IP addresses, and those IP addresses may change over time.
Recently, Amazon also released support for adding SSL termination to an ELB by means of the HTTPS protocol. You can find instructions for how to do this on Amazon's web page. At the time of this writing, boto does not support this, so you need to use the command-line tools provided by Amazon to set it up. The most typical use is to terminate HTTPS on port 443 and proxy those requests to port 80 on your instances. Check the boto home page for updates on how to do this using the elbadmin command-line script.
Adding Servers to the Load Balancer

After you have your ELB created, it's easy to add a new instance to route your incoming requests to. This can be done using the elbadmin add command:
% elbadmin add test i-2308974
This instance must be in an enabled zone for requests to be proxied. You can add instances that are not in an enabled zone, but requests are not proxied until you enable it. This can be used for debugging purposes because you can disable a whole zone of instances if you suspect a problem in that zone. Amazon does offer a service level agreement (SLA) ensuring 99% availability, but this is not scoped to a single zone; thus, at any given time, three of the four zones could be down. (Although this has never happened.)
It's generally considered a good idea to use at least two different zones in the event one of them fails. This gives you the greatest flexibility because you can balance out your requests and even take down a single instance at a time without affecting the service. From a developer's perspective, this is the ideal situation: you can literally do upgrades in a matter of minutes with almost no impact on your customers by upgrading a single server at a time, taking it out of the load balancer while you perform the upgrade.
Although ELB can usually detect and stop proxying requests quickly when an instance fails, it's generally a good idea to remove an instance from the balancer before stopping it. If you're intentionally replacing an instance, you should first verify that the new instance is up and ready, add it to the load balancer, remove the old instance, and then kill it. This can be done with the following three commands provided in the boto package:
i-69c3e401 us-east-1a Wordpress ..compute-1.amazonaws.com
i-e4675a8c us-east-1c default ..compute-1.amazonaws.com
i-e6675a8e us-east-1d default ..compute-1.amazonaws.com
i-1a665b72 us-east-1a default ..compute-1.amazonaws.com
This command prints out the instance IDs, zone, security groups, and public hostname of all instances currently running in your account, sorted ascending by start date. The last instances launched will be at the bottom of this list, so be sure to get the right instance when you're adding the newest one to your ELB. The combination of these powerful yet simple tools makes it easy to manage your instances and ELB by hand.
Although the load balancer is cheap (about 2.5 cents per hour plus bandwidth usage), it's not free. After you finish with your load balancer, remove it with the following command:
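Assuming the delete subcommand follows the same pattern as the elbadmin add command shown earlier (verify the exact syntax against your boto version), removing the balancer named test would look like:

```
% elbadmin delete test
```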
Automatically Registering an Instance with a Load Balancer

If you use a boto pyami instance, you can easily tell when an instance has finished loading by checking for the email sent to the address you specify in the Notification section of the configuration metadata passed in at startup. An example of a configuration section using Gmail as the SMTP server is shown here:
[Notification]
smtp_host = smtp.gmail.com
smtp_port = 587
smtp_tls = True
smtp_user = firstname.lastname@example.org
smtp_pass = MY_PASSWORD
smtp_from = email@example.com
smtp_to = firstname.lastname@example.org
Assuming there were no error messages, your instance should be up and fully functional. If you want the instance to automatically register itself when it's finished loading, add an installer to your queue at the end of your other installers. Ensure that this is done after all your other installers finish so that you add the instance only if it's safe. A simple installer can be created like the one here for Ubuntu:
from boto.pyami.installers.ubuntu.installer import Installer
import boto

# Note: the class and method names here are illustrative
class ELBRegister(Installer):
    """Register this instance with a specific ELB"""

    def install(self):
        """Register with the ELB"""
        # code here to verify that you're
        # successfully installed and running
        elb_name = boto.config.get("ELB", "name")
        elb = boto.connect_elb()
        b = elb.get_all_load_balancers([elb_name])
        if len(b) < 1:
            raise Exception("No Load balancer found")
        b = b[0]
        # Register this instance with the balancer
        b.register_instances([boto.config.get("Instance", "instance-id")])
This requires your configuration file on boot to contain a section called ELB with one value, name, that contains the name of the balancer to register with. You could also easily adapt this installer to use multiple balancers if that's what you need. Although this installer will be called only if all the other installers before it succeed, it's still a good idea to test anything important before actually registering yourself with your balancer.
HTTP and REST

Now that you have your instances proxied and ready to accept requests, it's time to think about how to accept those requests. In general, it's a bad practice to reinvent the wheel when you can just use another protocol that's already been established and well tested. There are entire books on using HTTP and REST to build your own SaaS, but this section provides the basic details. Although you can use HTTP in many ways, including SOAP, the simplest of all these is Representational State Transfer (REST), which was officially defined in 2000 by Roy Fielding in a doctoral dissertation, "Architectural Styles and the Design of Network-based Software Architectures" (http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm). It uses HTTP as a communication medium and is designed around the fundamental idea that HTTP already defines how to handle method names, authentication, and many other things needed when working with these types of communications. An HTTP message is split into two different sections, the header and the body (not to be confused with the HTML <head> and <body> tags), each of which is fully used by REST.
This book uses REST and XML for most of the examples, but this is not the only option and may not even suit your specific needs. For example, SOAP is still quite popular for many people because of how well it integrates with Java. It also makes it easy for other developers to integrate with your APIs if you provide them with a Web Service Definition Language (WSDL) file that describes exactly how a system should use your API. The important point here is that the HTTP protocol is highly supported across systems and is one of the easiest to use in many applications because many of the lower-level details, such as authentication, are already taken care of.
The Header

The HTTP header describes exactly who the message is intended for and what method the user is invoking on the recipient end. REST uses this header for multiple purposes: the HTTP method name defines the method called, and the path defines the arguments sent to that method. The HTTP header also includes a host name, which can be used to differentiate between applications running on the same port. This shouldn't be used for anything other than differentiating between applications because it's actually the DNS name, a representation of the server's address.
The method name and path are both passed into the application. Typically you want to use the path to define the module, package, or object to use to call your function. The method name is typically used to determine what function to call on that module, package, or object. Lastly, the path also contains additional arguments after the question mark (?) that usually are passed in as arguments to your function. The following sections look at the most useful HTTP headers in detail.
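To see where these pieces live, here is a sketch of a typical HTTP request (the host, path, and header values are hypothetical): the first line carries the method name and path, and the remaining lines are headers.

```
GET /posts/12345/comments?year=2010 HTTP/1.1
Host: blog.example.com
Accept: application/xml
If-Modified-Since: Sat, 29 Oct 2010 19:43:31 GMT
```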
If-Match

The most interesting header you can support is the If-Match header. This header can be used on any method to indicate that the request should be performed only if the conditions in the header represent the current state of the object. It is exceptionally useful when you operate with databases that are eventually consistent; and in general, because requests can be made in rapid succession, it's a good idea to allow for this header so that requests don't overwrite each other. One possible solution is to provide a version number or memento on each object or resource, which the client sends back to prove it knows the current value before replacing it.
In some situations, it may be good to require this field and not accept the special * case for anyone other than an administrative user. If you require this field and receive a request that doesn't have it, you should respond with an error code of 412 (Precondition Failed) and tell the users what they need in order to fill in this header properly. If the conditions in this header do not match, you must also send back a 412 (Precondition Failed) response. This header is most often used when performing PUT operations because those operations override what's in the database with new values, and you don't know whether someone else has already overridden what you thought was there.
If-Modified-Since

The If-Modified-Since header is exceptionally useful when you want the client to keep copies of the data so that it can query locally. In general, this is part of the caching system used by most browsers and other clients to ensure that you don't have to send back all the data if it hasn't changed. The If-Modified-Since header takes an HTTP-date, which must be in GMT; if the resource has not been modified since that date, the server should return a 304 (Not Modified) response with no content.
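A sketch of the exchange when the resource is unchanged (the path and date are hypothetical):

```
GET /posts/12345 HTTP/1.1
If-Modified-Since: Sat, 29 Oct 2010 19:43:31 GMT

HTTP/1.1 304 Not Modified
```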
If-Unmodified-Since

If you don't have an easy way to generate a memento or version ID for your objects, you can also allow for an If-Unmodified-Since header. This header takes a simple HTTP date, formatted in GMT, which is the date the resource was last retrieved by the client. This puts a lot of trust in the client, however, to report the proper date. It's generally best to use the If-Match header instead, unless you have no other choice.
Accept

The Accept header is perhaps the most underestimated header in the entire arsenal. It can be used not only to specify what type of response to give (JSON, XML, and so on), but also to specify what API version you're dealing with. If you need to support multiple versions of your API, you can do so by attaching the version to the content type. This can be done by extending the standard content types to include the API version number:
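As a sketch, a vendor-specific content type carrying a hypothetical application name and the 1.0 revision might look like:

```
Accept: application/vnd.myapp-v1.0+xml
```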
This enables you to specify not only a revision number (in this case, 1.0) and content type, but also the name of the application, so that you can ensure the request came from a client that knew whom it was talking to. Traditionally, this header is used to request either HTML, XML, JSON, or some other format representing the resource or collection being returned.
Authorization

The Authorization header can be used just like a standard HTTP authentication request, encoding both the username and password in a base64 string, or it can optionally be used to pass in an authentication token that eventually expires. Authentication types vary greatly, so it's up to you to pick the right one for your application. The easiest method is basic HTTP authentication, but then you are sending the username and password in every single request, so you must ensure that you're using SSL if you're concerned about security.
In contrast, if you choose to use a token- or signing-based authentication method, the user has to sign the request based on some predetermined key shared between the client and server. In this event, you can hash the entire request into a short string that validates that the request did indeed come from the client. You also need to send the username or some other unique identifier in this header, but because no recoverable form of the password is sent, it's relatively safe to send over standard HTTP. We won't go into too much depth here about methods of hashing.
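As a minimal sketch of such a signing scheme (the function name, the choice of HMAC-SHA256, and the decision to sign only the method, path, and body are all illustrative, not a fixed standard):

```python
import base64
import hashlib
import hmac

def sign_request(secret_key, method, path, body=""):
    """Produce a base64 HMAC-SHA256 signature over the parts
    of the request that should not be tampered with."""
    message = "\n".join([method, path, body]).encode("utf-8")
    digest = hmac.new(secret_key.encode("utf-8"), message,
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")

# The client sends its username plus this signature in the
# Authorization header; the server recomputes and compares.
signature = sign_request("shared-secret", "GET", "/posts/12345")
```

Because only the keyed hash travels over the wire, an eavesdropper cannot recover the shared secret, although replay protection (for example, a timestamp included in the signed message) is still needed.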
The Body
The body of any REST call is typically either XML or JSON. In some situations, it's also possible to support both, depending on the Accept header. This process is fairly well documented and can be used not only to define what type of response to return, but also what version of the protocol the client is using. The body of any request to the system should be in the same format as the response body.
In my applications, I typically use XML, if only because there are some powerful tools, such as XSLT, that can be used as middleware for authorization purposes. Many clients, however, like the idea of using JSON because most languages serialize and deserialize it quite well. REST doesn't specifically require one form of representation over the other and even enables clients to choose which type they want, so it's up to you as the application developer to decide what to support.
Methods
REST has a few important definitions that you need to understand before continuing. A collection is a group of objects; in your case, this is usually synonymous with a class in object terms, or a table in database terms. A resource is a specific instantiation of a collection, which can be thought of as an instance in object terms or a row in a table in database terms. This book also uses the term property to define a single property or attribute on an instance in object terms, or a cell in database terms. Although you can indeed create your own methods, in general you can probably fit most of your needs into one of the method calls listed next.
GET
The GET method is the center of all requests for information. Just like a standard webpage, applications can use the URL in two parts: everything before the first question mark (?) is used as the resource to access, and everything after that is used as query parameters on that resource. The URL pattern can be seen here:
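Under the naming used in this chapter, the pattern looks something like this; the bracketed segments are placeholders indicating optional parts, not literal text:

```
/collection_name[/resource_id[/property_name]][?query]
```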
The resource_id, property_name, and query in the preceding example are all optional, and the query can be applied to any level of the tree. Additionally, this tree can expand further downward if the property is itself a reference to another resource. Now take a simple example of a request on a web blog to get all the comments, submitted in 2010, on a post identified by POST-ID. This query could look like this:
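Given that collection, property, and filter, the request might look like the following; the path segments are illustrative:

```
GET /posts/POST-ID/comments?submitted=2010%
```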
The preceding example queries the posts collection for a specific post identified as POST-ID. It then asks for just the property named comments and filters specifically for items with a submitted property that matches 2010%. Responses to this method call can result in a redirection if the resource is actually located at a different path. This can be achieved by sending a proper redirect response code and a Location header that points to the actual resource.
A GET on the root of the collection should return a list of all resources under that path. It can also take the optional ?query on the end to limit these results. It's also a good idea to implement some sort of paging system so that you can return all the results across multiple requests instead of truncating them because of HTTP timeouts. In general, it's never a good idea to have a request that takes longer than a few seconds to return on average, because most clients will assume this is an error. Most proxy systems will time out any connection after a few minutes, so if your result takes longer than a minute, it's time to implement paging.
If you use XML as your communication medium, think about implementing some sort of ATOM-style next links. These simple tags give you a cursor to the next page of results, so you can store a memento or token of your query and allow your application to pick up where it left off. This token can then be passed in via a query parameter to the same URL that was used in the original request. In general, your next link should be the full URL to the next page of results. By doing this, you leave yourself open to the largest range of possibilities for implementing your paging system, including the ability to cache the next page of results, so you can actually start building it before the client even asks for it.
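As a sketch, a result set carrying an ATOM-style next link might look like the following; the element names, hostname, and token value are illustrative assumptions:

```xml
<posts>
  <post id="1"><title>First post</title></post>
  <post id="2"><title>Second post</title></post>
  <link rel="next"
        href="http://api.example.com/posts?token=NEXT-PAGE-TOKEN"/>
</posts>
```

Because the href is a full URL, the client simply follows it verbatim; the server is free to change how tokens are encoded without breaking any client.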
PUT
The HTTP definition of a PUT is replace, so if you use this on the base URL of a resource collection, you're actually requesting that everything you don't pass in is deleted, anything new is created, and any existing resources passed in are modified. Because PUT is logically equivalent to a SET operation, this is typically not allowed on a collection.
A PUT on a resource should update or create the record with a specific ID. The section after the collection name is considered the ID of the resource, and if a GET is performed after the PUT, it should return the resource that was just created or updated. PUT is intended as an entire replacement, but in general, an attribute you don't pass in is assumed to mean "don't change," whereas passing in an empty value for an attribute is a request to remove it or set it to blank.
A PUT on a property should change just that specific property for the specified resource. This can be incredibly useful when you're putting files to a resource, because those typically won't serialize very well into XML without lots of base64 conversions. Most PUT requests should return either a 201 Created with the object that was just created, which may have been modified by server application logic, or a 204 No Content if the request changed nothing. If you operate with a database that has eventual consistency, it may be better to return a 202 Accepted to indicate that the client should try to fetch the resource at a later time; you should also return an estimate of how long it will be before the object is created.
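Those response rules can be sketched as a small decision function. The `EVENTUALLY_CONSISTENT` flag and the `changed` parameter are assumptions standing in for real application logic:

```python
# Choose a status code for a completed PUT, following the rules above.
# EVENTUALLY_CONSISTENT is an assumed configuration flag.
EVENTUALLY_CONSISTENT = False

def put_response(changed):
    """Return (status_code, reason) for a PUT that has been processed.
    `changed` is True if the stored object differs from what existed
    before (created, or modified by server application logic)."""
    if EVENTUALLY_CONSISTENT:
        return 202, "Accepted"      # tell the client to fetch it later
    if changed:
        return 201, "Created"       # return the (possibly modified) object
    return 204, "No Content"        # nothing differed from what was sent

print(put_response(changed=True))
```

With the 202 case, the response body is also a natural place to put the estimate of how long the object will take to appear.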
POST
A POST operation on a collection is defined as a creation request. This is the user requesting to add a new resource to the collection without specifying a specific ID. In general, you want to return either a redirection code and a Location header, or at least the ID of the object you just created (if not the whole object serialized in your representation).
A POST operation on a resource should actually create a subobject of that resource, although this is often not used. Traditionally, browsers don't handle changing form actions to PUT, so a POST is typically treated as a special form of a PUT that takes form-encoded values.
A POST operation on a property is typically used only for uploading files, but it could also be used to append a value to a list. In general, this can be considered an append operation, but if your property is a single value, it's probably safe to assume that the client wanted this to be a PUT, not a POST.
DELETE
A DELETE operation on a collection is used to drop the entire collection, so you probably don't want to allow this. If you do allow it, this request should be treated as a request to remove every single resource in the collection.
A DELETE operation on a specific resource is simply a request to remove that specific resource. If there are errors in this request, the client should be presented with a message explaining why the request failed. The resulting error code should explain whether the request can be issued again, or whether the user is required to perform another operation before reissuing it. The most typical error from this request is a 409 Conflict, which indicates that another resource is referencing this resource and the server is refusing to cascade the delete. A DELETE may also return a 202 Accepted response if the database has eventual consistency.

A DELETE operation on a specific property is identical to a PUT with an empty value for that property. It can be used to delete just a single property from a resource instead of having to send a PUT request. This can also be used to differentiate between setting a value to blank and removing the value entirely. In programming terms, this is the difference between an empty string and None or Null.
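That blank-versus-removed distinction can be illustrated with a resource modeled as a plain dictionary; the resource contents and helper names here are hypothetical:

```python
# Illustrate the convention described above: a PUT with "" leaves the
# property present but blank, while a DELETE removes it entirely.
resource = {"title": "My Post", "summary": "Old summary"}

def put_property(res, name, value):
    """PUT on a property: set it, even to an empty string."""
    res[name] = value

def delete_property(res, name):
    """DELETE on a property: remove it from the resource."""
    res.pop(name, None)

put_property(resource, "summary", "")   # blank, but the key survives
delete_property(resource, "title")      # gone entirely (None/Null)
print(resource)
```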
HEAD
A HEAD request on any URL should return the exact same headers as a standard GET request but shouldn't send back the body. This is typically used to get a count of the number of results in a response without retrieving the actual results. In my applications, I use this to send an additional header, X-Results, which contains the number of results that would have been retrieved.
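A minimal sketch of that arrangement follows; the handler names, the stand-in result list, and the serialization are assumptions, with the X-Results header carrying the count:

```python
# Sketch: a HEAD handler that reuses the GET handler's headers
# (including a custom X-Results count) while dropping the body.
def get_posts():
    results = ["post-1", "post-2", "post-3"]  # stand-in for a real query
    headers = {
        "Content-Type": "application/xml",
        "X-Results": str(len(results)),       # count without the payload
    }
    body = "<posts>...</posts>"               # stand-in serialization
    return headers, body

def head_posts():
    headers, _body = get_posts()              # identical headers, no body
    return headers, ""

print(head_posts())
```

A client that only needs the count can issue the HEAD, read X-Results, and skip transferring the results altogether.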
Authorization Layer
The authorization layer sits just above your application layer but is still on the server. In most application systems, this is actually integrated directly with the application layer. Although this is generally accepted, it doesn't provide as much flexibility, and it's a lot harder to code application logic and authorization in the same location. Additionally, if you change your authentication, you now have to worry about breaking both your application layer and your authentication layer. If you build this as a separate layer, you can most likely pull all authorization directly out of a database, so you don't have to worry about changing your code for a minor change in business logic.
If you use XML for your object representation, you can use XSLT for your authorization layer and use some custom tags to pull this authorization logic directly from your database. If you use an application framework such as Django or Ruby on Rails, chances are you already have this layer built for you, either directly or in a third-party module. Check your specific language for how to build your own extensions for XSLT. When you can build your own extensions into your XSLT processor, not only can your filters be retrieved from a shared filesystem, so that you can update them without redistributing them to each of your servers, but you can also pull the exact details about authorization from there. These XSLT filters can be used to specifically hide elements of the response XML that the user shouldn't see. The following example code assumes you've already built a function called hasAuth that takes three arguments: authorization type (read, write, delete), object type, and property name:
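A filter of this kind can be sketched roughly as follows; the auth namespace URI, the object/property element structure, and the type attribute are assumptions, with hasAuth registered as an extension function in your XSLT processor:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:auth="urn:example:authorization">

  <!-- Copy everything through unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Emit a property only if the user may read it. -->
  <xsl:template match="object/*">
    <xsl:if test="auth:hasAuth('read', ../@type, name())">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```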
The preceding example is for output from your server, but you could easily adapt this to any method you need by changing the first argument of each hasAuth call to whatever method this is filtering on. You could also easily use this as a base template and pass in the method name to the filter. This example assumes you have an input XML that looks something like the following example:
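A plausible shape for that input, using hypothetical element names, is an object element whose type identifies the collection and whose children are its properties:

```xml
<object type="post">
  <title>My First Post</title>
  <submitted>2010-06-01</submitted>
  <comments>
    <comment id="1">Nice post!</comment>
  </comments>
</object>
```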
Client Layer
After you build your complex application server, you need to focus on making that simple Web API usable to your clients. Although developers may be fine with talking XML or JSON to your service, the average user probably won't be, so you need to build a client layer. The biggest concept to understand about this layer is that you cannot trust anything it sends. To trust it would mean authenticating the client itself, and no matter what kind of protection you attempt, it's impossible to ensure that the information you receive from any client actually came from your client.
You must assume that everything you send to your client is viewable by the user. Don't assume the client will hide things that the user shouldn't see. Almost every web-service vulnerability comes from blindly trusting data from a client, data that can be crafted specifically to make your application do something you didn't intend. This is the entire basis of the SQL injection issues many websites still suffer from. Because of these types of security concerns, you have to ensure that all authentication and authorization happens outside of this layer.
You can develop this layer in several different ways, the easiest of which is to expose the API and allow third-party companies to build their own clients. This is how companies such as Twitter handled creating their clients. You can also make your own clients, but by having a well-documented, public-facing API, you open the door to other people and companies developing their own. In general, you have to be ready for this, so it's always a good idea to ensure that the client layer has only the access you want it to have. You can never tell for sure that the client you're talking to is one that you've developed.
Native Applications
If you've ever used Twitter, chances are you know that not only is there a web interface, but there are also dozens of native client applications that work on a wide variety of platforms. In your system, you can make as many different client applications as you want, so eventually you may want to branch out and make some native applications for your users. If you're building that wonderful new electronic filing cabinet to organize your entire world online, it might be a good idea to also provide some native apps, such as an iPhone application that communicates with the same API as every other client, to bring you the same content wherever you are in the world.
Even if you're not the one building native applications, it may still be a good idea to let third parties have this option. If you make your API public and let third-party companies make money from creating client layers, you not only give businesses a reason to buy into your app, but you also don't need to do any of the development work yourself to increase your growth. Although this may not seem intuitive, because people are making a profit off your system, every penny they earn is also bringing you business, and chances are they're marketing it as well. Not only does this give your users more options for clients, but it also gives your system more exposure and potential customers.
Summary
Just as cloud providers give you hardware as a service, developing your applications as Software as a Service gives you another level of benefit, expanding your applications beyond standard software. By making your application a service instead of just software, you're giving yourself a huge advantage over the competition. Services are always growing and have infinite potential to keep customers. SaaS gives your customers the same advantages the cloud gives you: low initial cost and a reason to keep paying. Everything from business-level applications to the newest games is being transformed from standalone, single-user applications into services. Don't let your development time go to waste by developing something that will be out of date by the time it's released.