Dealing with state in modern Apps (2/2)

My previous blog post covered the need for handling state across devices/users and introduced the different Windows Azure storage options. In this post, I want to discuss the approach to data architecture in more detail.

Why not just use SQL databases?
While SQL databases provide much of the functionality known from an RDBMS, they come with a higher price point and hard size limits (150 GB as of March 2013). This makes them a great fit for solutions with a predictable amount of data and for scenarios that benefit from RDBMS capabilities such as Transact-SQL support. Another benefit might be the reuse of your client libraries, because Tabular Data Stream (TDS) is the communication protocol for both SQL Server and SQL databases.

However, most services will need to store and query an increasing amount of data, which pushes a single database to its scale-up limits. Since cloud computing is based on scale-out, we’re soon confronted with the challenge of partitioning our data across multiple storage nodes or different storage technologies (such as Tables, Blobs, Hadoop on Azure, SQL databases, …).

[Figure: data partitioning]

While the traditional reasons for partitioning were predominantly about horizontal partitioning (e.g. sharding), the cloud provides new reasons for data partitioning, such as cost optimization through the use of different storage technologies or the ability to store data only temporarily (e.g. when running a Monte Carlo simulation on a Hadoop cluster on Windows Azure).

Horizontal partitioning
In horizontal partitioning, we spread all data across similar nodes to achieve massive scale-out of data and load. In such a scenario, all queries within a partition are fast and simple, while querying data across partitions becomes expensive. An example of horizontal partitioning is the distribution of an order table according to the customer who placed the order, that is, we use the customer as the partition key. This makes it very efficient to retrieve the orders that belong to a specific customer, but very inefficient to answer cross-customer queries such as “Which customers ordered product xyz?”.
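A minimal sketch of the order example, assuming an in-memory dictionary as a stand-in for the storage partitions. The names `partitions`, `insert_order`, `orders_for_customer` and `customers_who_ordered` are hypothetical and not part of any Windows Azure API; the sketch only illustrates why one query touches a single partition while the other has to scan all of them.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a set of storage partitions.
# The partition key is the customer; each key owns its own list of orders.
partitions = defaultdict(list)

def insert_order(order):
    """Store an order in the partition of the customer who placed it."""
    partitions[order["customer"]].append(order)

def orders_for_customer(customer):
    """Single-partition query: only one partition is read."""
    return list(partitions.get(customer, []))

def customers_who_ordered(product):
    """Cross-partition query: every partition has to be scanned."""
    return sorted(
        customer
        for customer, orders in partitions.items()
        if any(order["product"] == product for order in orders)
    )

insert_order({"customer": "alice", "product": "xyz", "qty": 1})
insert_order({"customer": "bob", "product": "abc", "qty": 2})
insert_order({"customer": "carol", "product": "xyz", "qty": 5})

print(orders_for_customer("alice"))
print(customers_who_ordered("xyz"))
```

In a real store, the scan in `customers_who_ordered` would turn into one request per partition (or a full table scan), which is where the cost comes from.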

Vertical partitioning
In vertical partitioning, we spread data across dissimilar nodes to take advantage of different storage capabilities within a logical dataset. By doing so, we can leverage more expensive indexed storage for frequently queried data but keep large data entities in cheaper storage (such as blobs and tables). For instance, we could store all order information in a SQL database except the order documents, which we store as PDF files in blob storage. The downside of this approach is that retrieving a whole row requires more than one query.
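A small sketch of the order-document example, assuming two plain dictionaries as stand-ins for the SQL database and the blob container. All names (`sql_orders`, `blob_store`, `save_order`, `load_full_order`) and the blob naming scheme are hypothetical; the point is only that a full read needs two lookups.

```python
# Hypothetical stand-ins for the two storage technologies.
sql_orders = {}   # order_id -> small, frequently queried fields
blob_store = {}   # blob name -> large binary document

def save_order(order_id, customer, total, pdf_bytes):
    """Split one logical order across indexed storage and blob storage."""
    blob_name = f"orders/{order_id}.pdf"  # assumed naming scheme
    blob_store[blob_name] = pdf_bytes
    sql_orders[order_id] = {
        "customer": customer,
        "total": total,
        "document": blob_name,  # reference to the blob, not the blob itself
    }

def load_order_summary(order_id):
    """Cheap read: one lookup in the indexed store."""
    return sql_orders[order_id]

def load_full_order(order_id):
    """Full read: a second lookup is needed to fetch the document."""
    row = sql_orders[order_id]
    return {**row, "pdf": blob_store[row["document"]]}

save_order(1, "alice", 99.0, b"%PDF-1.4 example")
print(load_order_summary(1))
print(len(load_full_order(1)["pdf"]))
```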

Hybrid partitioning
In hybrid partitioning, we take advantage of horizontal and vertical partitioning within the same logical dataset. For instance, we can leverage horizontal partitioning across multiple similar SQL databases (sharding) but use blob storage to store the order documents.
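As a rough sketch of such a hybrid layout, the following assumes a fixed number of shards and picks the shard by hashing the customer name, while the documents go to a separate blob-like store. `SHARD_COUNT`, `shard_for`, `save_order` and `find_order` are hypothetical names, and hashing is only one of several possible ways to map a partition key to a shard.

```python
import hashlib

SHARD_COUNT = 4  # assumed number of similar SQL databases
shards = [dict() for _ in range(SHARD_COUNT)]  # stand-ins for the databases
documents = {}                                 # stand-in for blob storage

def shard_for(customer):
    """Map a customer to a shard index using a stable hash."""
    digest = hashlib.sha256(customer.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % SHARD_COUNT

def save_order(order_id, customer, total, pdf_bytes):
    """Horizontal split for the row, vertical split for the document."""
    blob_name = f"orders/{order_id}.pdf"  # assumed naming scheme
    documents[blob_name] = pdf_bytes
    shards[shard_for(customer)][order_id] = {
        "customer": customer,
        "total": total,
        "document": blob_name,
    }

def find_order(customer, order_id):
    """Knowing the customer tells us which single shard to ask."""
    return shards[shard_for(customer)].get(order_id)

save_order(1, "alice", 99.0, b"%PDF-1.4 example")
print(find_order("alice", 1))
```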

To take advantage of cheap cloud storage we must partition our data.

[Figure: partitioning conclusions]

In all partitioned scenarios, it is cheap to query data within a partition but expensive to query it across multiple partitions or storage types. However, since storage is fairly cheap and available in practically unlimited capacity, it is a very common approach to aggressively duplicate data to ensure every query includes a partition key. For example, if we have an order table which is partitioned by customer, it is expensive to retrieve a list of customers who ordered product xyz, because we can’t provide the query with a partition key. One way to address this problem is to create a second table which duplicates the data but uses the product as the partition key. We basically optimize our service for data retrieval and not for data inserts, which is a fundamental change for many of us who are used to SQL databases.
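The duplication idea can be sketched as follows, again with in-memory dictionaries standing in for two tables. The names `orders_by_customer`, `orders_by_product`, `insert_order` and `customers_who_ordered` are hypothetical. Note that every insert now writes twice, which is the price paid for cheap reads.

```python
from collections import defaultdict

# Two hypothetical tables holding the same orders under different partition keys.
orders_by_customer = defaultdict(list)  # partition key: customer
orders_by_product = defaultdict(list)   # partition key: product

def insert_order(order):
    """Write the order twice so that both query shapes have a partition key."""
    orders_by_customer[order["customer"]].append(order)
    orders_by_product[order["product"]].append(order)

def orders_for_customer(customer):
    """Single-partition read on the customer-keyed table."""
    return list(orders_by_customer.get(customer, []))

def customers_who_ordered(product):
    """Single-partition read on the product-keyed table."""
    return sorted({o["customer"] for o in orders_by_product.get(product, [])})

insert_order({"customer": "alice", "product": "xyz"})
insert_order({"customer": "bob", "product": "abc"})
insert_order({"customer": "carol", "product": "xyz"})

print(customers_who_ordered("xyz"))
```

A real service would also have to decide what happens if the second write fails, which this sketch ignores.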

Dealing with state in modern Apps (1/2)

Too many Apps are designed for single-device usage and don’t allow me to share and store data across devices. This not only makes the configuration of a new or additional device painful; in my case, it also makes me decide against re-purchasing certain Apps. Take for instance Angry Birds: When I switched from my HTC to my Nokia, I basically lost all my unlocked levels. I wouldn’t mind purchasing the game a second time, but I definitely have no interest in replaying all the different levels again… this would be just too painful. While upgrading a device is normally not a daily routine, the inability to share data across devices becomes a painful show-stopper for sequential and simultaneous device usage. There are examples of Apps which preserve state across devices, and the trend will go towards seamless cross-device usage, both sequential and simultaneous:

For gaming/entertainment that means PLAY – PAUSE – RESUME
carry the game progress across screens

For productivity this means WORK – SAVE – SYNC
carry the workflow state across screens

Unfortunately, today’s reality looks different: Only a few Apps take advantage of services; most store their data directly on the device. The reason is either that the App doesn’t need to share or store information or, more likely, that the App is less complex to develop and test when it doesn’t have to communicate with a service and no user authentication/authorization is required. Besides not supporting cross-device and cross-App scenarios, many devices have limited storage capacity and query capabilities, so it might become tricky to store all collected data and/or make good use of it.

On the other hand, a service enables seamless cross-device and upgrade experiences and helps to overcome local storage constraints. This doesn’t mean that the only storage is in the cloud. It’s a best practice to reduce network dependency and leverage a combination of local storage and service capabilities.
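One possible way to combine local storage with a service is sketched below: the App always writes locally first and pushes pending changes whenever the service is reachable. The class `LocalFirstStore` and the `upload` callable are hypothetical; a real App would use its platform’s local storage and an actual service client instead of dictionaries.

```python
class LocalFirstStore:
    """Keeps data on the device and syncs it to a service when possible."""

    def __init__(self, upload):
        # upload is an assumed callable(key, value) that raises
        # ConnectionError while the service is unreachable.
        self._upload = upload
        self.local = {}    # stand-in for on-device storage
        self.pending = []  # keys changed since the last successful sync

    def save(self, key, value):
        """Write locally; no network access is needed."""
        self.local[key] = value
        if key not in self.pending:
            self.pending.append(key)

    def load(self, key):
        """Read from local storage only."""
        return self.local[key]

    def sync(self):
        """Push pending keys; keep the ones that fail for a later attempt."""
        still_pending = []
        for key in self.pending:
            try:
                self._upload(key, self.local[key])
            except ConnectionError:
                still_pending.append(key)
        self.pending = still_pending
        return not still_pending

service = {}  # stand-in for the remote service
store = LocalFirstStore(service.__setitem__)
store.save("unlocked_level", 12)
print(store.sync(), service)
```

The sketch deliberately leaves out downloading changes made on other devices and conflict handling.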

Windows Azure provides the following storage options:

[Figure: Windows Azure storage options]

  • Tables are designed for large-scale NoSQL data and have a very favorable price point (7 cents / GB). Storing data at large scale requires the developer to understand the concepts of data partitioning (more about this in a future post). Tables can store up to 100 TB and support either local or geographical redundancy. A unique storage account key grants access via REST and managed APIs.
  • Blobs are the preferred way to store files, whether these are images, text or media documents. Similar to tables, blobs can store up to 100 TB and support local or geographical redundancy, and the storage account key grants access via REST and managed APIs.
  • Queues are a great way to implement reliable, persistent messaging between apps and services. Each message can store up to 64KB. The number of messages is unlimited. As with tables and blobs, a unique storage account key grants access via REST and managed APIs.
  • SQL databases provide the capabilities of a fully fledged relational database-as-a-service. The rich transactional support helps when writing LOB services. Another great feature is SQL Data Sync, which enables hybrid scenarios through the synchronization of Windows Azure SQL databases and on-premises SQL Servers. The current size limit of SQL databases is 150 GB, and the cost per GB is between $10 (the first GB) and $1 (each GB above 50 GB). The database connection can be established using ADO.NET, ODBC, JDBC, Entity Framework and the PHP drivers for SQL Server.

But with all these options, how do I pick the one which suits me the best?
Since there is no simple answer to this question, I will cover this in a future post.

Dealing with identity in devices and services scenarios

Seamless device scenarios require services to store and share data across devices and Apps. This requires the solution to authenticate the user (who is it?) and to authorize the request (is this user allowed to perform this task or see this information?). There are different options to deal with user authentication:

[Figure: Authentication options]
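The difference between authenticating the user and authorizing the request can be sketched in a few lines. Everything here is hypothetical: the token table, the permission table and the function names only illustrate the two separate checks and do not reflect how any particular identity provider works.

```python
# Hypothetical lookup tables; a real service would validate a token issued
# by an identity provider instead of using a hard-coded dictionary.
TOKENS = {"token-123": "alice", "token-456": "bob"}
PERMISSIONS = {
    "alice": {"read_progress", "write_progress"},
    "bob": {"read_progress"},
}

def authenticate(token):
    """Who is it? Returns the user name, or None if the token is unknown."""
    return TOKENS.get(token)

def authorize(user, action):
    """Is this user allowed to perform this action?"""
    return action in PERMISSIONS.get(user, set())

def handle_request(token, action):
    """Run both checks and return an HTTP-style status code."""
    user = authenticate(token)
    if user is None:
        return 401  # not authenticated
    if not authorize(user, action):
        return 403  # authenticated, but not allowed
    return 200

print(handle_request("token-123", "write_progress"))
print(handle_request("token-456", "write_progress"))
print(handle_request("unknown", "read_progress"))
```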

While the most pragmatic way would be to introduce a username and password for our solution, this introduces two major problems:

  • First, we need to implement a proprietary credential management system which allows us to create, store and manage the user name and password.
  • Secondly, the user needs to remember the logon information for our solution. I have to say that I really hate creating a user name and password for all the different services and websites I use! Why can’t they just use one of the existing identity providers such as Microsoft Account or Facebook ID?

We actually can: It is quite simple to use existing identity providers and federate them with our solution using Windows Azure Access Control Service. This allows the users to use their identity provider of choice and work seamlessly across devices and solutions. The simplest way to get started is to use Windows Azure Mobile Services: The following tutorial shows how to configure a Mobile Service solution to give users the choice of a Microsoft Account, Facebook, Twitter or Google login. Sweet…


From applications towards Apps (2/2)

Different devices have different capabilities and many of our workflows and business processes involve multiple devices (sequential and/or simultaneous).

core device capabilities

core device capabilities

The device choice is usually based on its core characteristics:

  • Smartphones are personal devices which provide connectivity anywhere and anytime, but they are not really suitable for productivity tasks
  • Tablets are great consumption and entertainment devices but may have limited connectivity
  • PCs are THE productivity devices but their form factor may introduce some mobility constraints

However it seems that these distinctions blur more and more:

  • Some Smartphones are already close to the size of tablets and provide great consumption experiences
  • Newer Tablets have dramatically improved their connectivity capabilities (e.g. built-in 3G/4G)
  • With Surface, Microsoft launched a tablet with PC capabilities (or vice versa, depending on the way you look at it).

[Figure: from applications towards Apps]

To take advantage of the different device capabilities, the solution landscape is moving from one application that supports multiple scenarios towards multiple Apps that deliver an individual scenario across multiple devices (each App optimized for the specific device experience). Building such scenarios has an impact on how we deal with the following aspects (the list is by no means complete):

  • Identity – The underlying services need to recognize users across multiple Apps and devices. Being able to federate the user’s identity of choice enables the required single sign-on experience across services and devices.
  • Storage – The ability to access the same data from different devices independent of location requires the solution to store state and information not locally but on a shareable location.
  • Communication – Seamless, rich bi-directional communication across different users and devices. This requires collaboration and notification capabilities across Apps and devices.
  • Monetization – Different scenarios require different monetization strategies. There is a large spectrum, from freemium to paid Apps to service usage subscriptions. These business models depend on the user type: Consumers have different needs and paying behaviors than organizations.
  • Lifecycle – Apps will most likely change more frequently than the service they’re built on. New devices require new Apps, all connecting to the same service. Decoupling of service and App becomes a core asset in providing a seamless App experience across multiple devices. Taking a service-oriented approach and reducing the dependency between the service and the devices allows for independent development and release management. It’s also important to understand the App distribution constraints imposed by the various marketplaces.

I will cover these aspects in more detail in future posts.

From applications towards Apps (1/2)

Looking at today’s application landscape, one core distinction is whether a solution is available on the public internet or only within corporate boundaries, while connected to the intranet. Bring Your Own Device (BYOD) may introduce the ability to connect devices to the corporate network, but only a few devices are able to join a domain, and the existing applications are not designed with the respective device capabilities in mind. Considering that devices differ in capabilities and form factors, it is no surprise that existing browser and client applications don’t deliver the best user experience across devices. With a few exceptions (such as email), these applications aren’t available outside the corporate firewall.

[Figure: Application and App landscape today]

Today, Apps are predominantly available for consumers and not for business scenarios which require a connection to corporate data and workflows. Many device scenarios increase productivity the most not in the office but offsite with customers or while commuting to the office. Given the nature of devices, their connectivity capabilities and compliance requirements, VPN/DA may not be an option.

That’s where dedicated device Apps together with (Internet) services will play a major role. However, the trend towards Apps won’t replace browser and client applications for the more complex workflows.

[Figure: Application and App landscape tomorrow]

To deliver the most value and best usability, Apps need to be designed for a focused set of interactions. That’s why many existing and new solutions won’t be replaced by Apps but rather will require multiple Apps for dedicated tasks within longer workflows (e.g. taking a picture of a receipt and classifying the expense, while the expense report is completed using a desktop application). In addition to browser and client-based applications, Apps will become an integral part of how business processes and solutions are delivered. The services of these solutions will most likely take advantage of cloud capabilities such as elastic/scalable compute, cheap/reliable storage and federated identity.

Setting the scene

Back in the days of client-server applications, we designed systems end-to-end. This meant that most server functionality surfaced through exactly one application: the client. When web applications became the de facto standard for delivering solutions, most server functionality (if not all) was still consumed by just one client, the one web application for that function. This really started to change with SOA and composite applications: The monolithic client-server applications were replaced by reusable services and composite UI technology. These mash-ups provided the user with rich information coming from multiple services. Early examples of such mash-ups include solutions taking advantage of mapping services such as Google Maps or Bing Maps.

This change introduced a new dynamic into the application ecosystem: suddenly one could build an application on top of already existing services and focus on adding value through visualization, usability or unique composition. Or one could focus on delivering a building-block service to be used by other applications. We moved from an ecosystem of holistic application builders (ISVs) to one based on service providers and service consumers. The ability to connect to literally any web-based service allowed companies to focus on their core competency, whether this was building and operating services or providing a great application experience to their users.

There seems to be an agreement that some of the recent and most impactful trends emerging are mobility, social and big data. All three become even more powerful if combined with the fourth mega trend: The cloud.

  • Mobility gives access to information anytime and anywhere. However, due to limited storage and compute capabilities and the need to access information from multiple devices, the cloud becomes the key player in providing great mobile experiences. The cloud enables new ways to store and compute information and is accessible across all devices.
  • Social has undoubtedly established itself as a core marketing, content sharing and App distribution strategy. However, the power of social requires an underlying service platform that can handle the unpredictability and the viral nature of social Apps. Such Apps can cause serious headaches in forecasting the needed compute capacity. However, if such applications are written in cloud style, they can access a nearly unlimited set of resources that are easily scalable when needed.
  • Data and information are power. Big data technology enables a new level of data and business insight, but only if a significant amount of data is available. While historically storing large amounts of data was very expensive, the economies of scale of cloud computing now allow the storage of huge amounts of data at an affordable price. The rapid development of cloud-style big data technology makes data analysis broadly accessible.

There is no doubt that these trends have an impact on the way we build and use applications. For instance the popularity of mobile devices established an App ecosystem. While many of today’s Apps resemble mini applications (reflecting the whole solution), over time we will see Apps become building blocks for rich end-to-end solutions with various Apps supporting scenarios across devices. This move will not only imply technology changes but will require new marketing, distribution and monetization strategies.

While Apps were initially used predominantly in consumer scenarios, Apps for business usage will become more and more popular. Many employees expect their personal devices to work seamlessly across corporate applications and consumer Apps. This obviously introduces challenges for IT departments: How to expose services and make them accessible to devices without compromising security and compliance policies?