Security and identity in a mobile-first and cloud-first world

Perhaps the most radical change for IT is the fact that the perimeter is gone!

the perimeter is gone

Besides this, users have high expectations of IT: They want to bring their own devices and expect everything to work seamlessly across different apps. They want to access their work data whether they’re in the office or on the road. Providing these capabilities in a secure way is not a simple task.

To help navigate through this, I compiled a list of user expectations, IT requirements and the corresponding security threats – all along the dimensions of users, devices, apps and data:

expectations requirements and the threats in a mobile and cloud first world

So what are the core things that will help IT address this?

The visual below illustrates three initiatives – unify identity, manage and secure devices, and protect data – together with the top-level activities involved:

a way to think about security and identity in a mobile and cloud first world

Delivering on these core initiatives will be crucial for every IT organization in a mobile-first and cloud-first world!

Machine learning: predicting wine quality ratings based on physicochemical properties

While prepping for my YOW! talk, I was looking for a clean and manageable dataset to demo predictive analytics using MAML. That’s when I came across the P. Cortez wine quality dataset: It contains data samples of 4898 white and 1599 red wines, each consisting of 11 physicochemical properties (such as alcohol, pH and acidity) and its quality as rated by wine connoisseurs (0 – very bad to 10 – very excellent).
My goal was to create a model that could predict the quality of a wine based on its physicochemical properties.
The most straightforward approach was to just use a multiclass classification algorithm to predict the rating. Here is the experiment I used to train and evaluate the model:

wine quality prediction using training and validation data

I used a random 50% split between the training and the validation data and trained the model using the quality label. Looking at the performance of the model, it becomes clear that we do an OK job of predicting good wines (5, 6, 7) but a poor job of predicting bad and great wines:

model performance using training and validation data
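
If you prefer code over the visual designer, here is a rough scikit-learn equivalent of the experiment – a sketch only: it assumes the winequality CSV from the UCI repository, and the random forest merely stands in for the multiclass learner (the post itself uses MAML modules):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Load the white wine samples; the same works for the red ones.
    df = pd.read_csv("winequality-white.csv", sep=";")
    X, y = df.drop(columns="quality"), df["quality"]

    # Random 50% split between training and validation data, as in the post.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print(classification_report(y_val, model.predict(X_val), zero_division=0))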

The cause of the inaccurate predictions for bad and great wines most likely resides in the fact that we don’t have enough bad and great wines to train the model – which becomes obvious if we look at the histogram of wine quality:

wine quality histogram

As we can see, there are only a few wines with a rating <= 4 or >= 8, which makes it very hard to build a model that is well trained on bad and great wines – especially considering that the training algorithm only sees 50% of the data, as we use the other 50% to test the model.
One possible approach to training models when only limited data is available is cross-validation. This divides the data into n folds and trains the model n times, each time holding out one fold for validation and training on the remaining folds:

wine quality prediction using cross validation
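
In scikit-learn terms (reusing X and y from the earlier sketch), 10-fold cross-validation is a single call – again a sketch of the idea, not of the exact MAML setup:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Each fold takes a turn as the validation set while the model is
    # trained on the remaining folds.
    scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=10)
    print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")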

It looks like the performance of the model has increased without really overfitting it:

model performance using cross validation

While we increased the accuracy of predictions for wines in the 4–8 range, we still have a hard time predicting the 3s and 9s (there are no 1s, 2s or 10s) because we just don’t have enough data points to learn them.
This is why I moved away from the 1–10 rating and instead quantized the wines into three buckets:

  • great (8,9,10)
  • good (6,7)
  • bad (1,2,3,4,5)

To do so, I use the quantize module and define the bin edges as 5 and 7:

wine quality prediction using quantized values and cross validation
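
Outside of MAML, the same bucketing can be expressed with pandas (reusing df from the earlier sketch); the bin edges 5 and 7 turn into the intervals (0, 5], (5, 7] and (7, 10]:

    import pandas as pd

    # quality <= 5 -> bad, 6-7 -> good, 8+ -> great
    df["bucket"] = pd.cut(df["quality"], bins=[0, 5, 7, 10], labels=["bad", "good", "great"])
    print(df["bucket"].value_counts())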

This gives us the following distribution across the three quality buckets:

quantized wine quality histogram

The performance of the above model looked like the following:

model performance using binned quality values

While we still struggle to accurately predict good and great wines, we do a good job of predicting the bad ones. So we can make the following two predictions fairly accurately:

  • Predicted as 3: these wines have a very high probability of being great
  • Predicted as 1: these wines have a high probability of being bad

I’m fairly sure that we could dramatically improve the performance of the model if we had additional features such as grape types, wine brand or vineyard orientation. Unfortunately, these are unavailable due to privacy and logistics issues, but they would obviously be available to wineries that leverage such an approach to fine-tune their winemaking process.

Credits:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
Decision Support Systems, Elsevier, 47(4):547-553, 2009. ISSN: 0167-9236.

Available at: [Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016
[Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf
[bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

Deployment and release strategies for Microsoft Azure Web Sites

One of the easiest ways to implement continuous deployment with web sites is to use Git. Developers can write Git hooks that push the deployable repository to the web site repository. When we take this approach, it is important to fully script the creation and configuration of the web site. It is not a good practice to “manually” create and configure it. This might not be apparent, but it is crucial for supporting disaster recovery, creating parallel versions of different releases, or deploying releases to additional data centers. Further, the separation of configuration and settings from the deployable artifacts makes it easy to guard certificates and other secrets, such as connection strings.
The proposed approach is to create a web site (including a staging slot) for each releasable branch. This allows deployment of new release candidates by simply pushing the Git repository to the staging web site. After testing, the staging slot can be swapped into production.
As described above, it is recommended that we create two repositories, one for the creation and configuration of the web site and one for the deployable artifacts. This allows us to restrict access to sensitive data stored in the configuration repository. The configuration script must be idempotent, so it produces the same outcome regardless of whether it runs for the first or the hundredth time. Once the web site has been created and configured, the deployable artifacts can be deployed using Git push to the staging web site’s Git repository. This push should take place with every commit to the release repository.
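
To illustrate what idempotent means here, a minimal sketch – the management API is simulated with a plain dict, and all names are made up; in practice these calls go against your provider’s CLI or SDK:

    # Simulated state of the hosting platform; stands in for the real management API.
    SITES: dict[str, dict[str, str]] = {}

    def ensure_site(name: str) -> None:
        # create the site only if it does not exist yet
        SITES.setdefault(name, {})

    def ensure_setting(site: str, key: str, value: str) -> None:
        # writing a setting is naturally idempotent: same input, same outcome
        SITES[site][key] = value

    def configure() -> None:
        ensure_site("backend-release-1")
        ensure_setting("backend-release-1", "STORAGE_CONNECTION", "<kept out of the artifacts>")
        ensure_site("frontend-release-1")
        ensure_setting("frontend-release-1", "BACKEND_URL", "https://backend-release-1.example.net")

    configure()
    configure()   # the hundredth run produces the same outcome as the first
    assert SITES["frontend-release-1"]["BACKEND_URL"] == "https://backend-release-1.example.net"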
It is important that all web site dependencies, such as connection strings and URLs, are sourced from the web site’s application and connection string settings. (Do not make them part of the deployable artifacts!) This allows us to deploy the same artifacts across different web sites without interference. For this example, assume we have an application that consists of two sites, one serving as the frontend and the other as the backend. The backend site also uses storage services (Figure 1).

Figure 1: Application consisting of two sites
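
To make this concrete, a minimal sketch of how the artifacts pick up their dependencies at runtime: on Azure Web Sites, application settings surface as environment variables, and the setting names below are my own convention, not a fixed API:

    import os

    # Dependencies come from the web site's configuration, not from the artifacts,
    # so the same deployable package works in every web site it is pushed to.
    BACKEND_URL = os.environ["BACKEND_URL"]                # site 2's URL, set per web site
    STORAGE_CONNECTION = os.environ["STORAGE_CONNECTION"]  # secret, never checked into the repo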

The first step is to split the application into independently deployable components. Each component has its own source repository. Because the backend is the only component that accesses the storage service, we can group them together. The configuration script creates the web site for each component as well as the contained resources, such as storage accounts or databases. Further, it configures all dependencies; in the example below, the script for site 1 will configure the site 2 URL as an application setting (Figure 2).

Figure 2: Splitting an application into independent deployable components

There are different strategies to handle code branches when releasing new functionality. The following two are commonly used:

  • Keep the master always deployable and use short-lived branches for feature work.
  • Create long-lived branches for releases and integrate feature work directly into the master.

In this series of posts I will focus on the second approach – creating long-lived branches for every new release. The benefit of this approach is that there is a 1:1 relationship between a specific release and its corresponding web site creation and configuration script. This makes deploying previous versions extremely simple: we just run the respective script and then deploy the component. It also allows us to easily run multiple releases of the same component in parallel, which is great for A/B testing.

The next posts will cover how to manage long-lived branches for releases while working on features on master. So stay tuned…

Deployment and release strategies

deployment doesn’t equal release

As part of continuous deployment, we have to build an automated deployment pipeline that allows us to frequently deploy and test new functionality and bug fixes. We first test the deployment in a staging environment. Only if a deployment passes all tests do we release it to the production environment:

  1. run unit tests (triggered by check-in)
  2. run integration builds
  3. run integration tests
  4. deploy the artifacts to the staging environment
  5. perform more tests (e.g. smoke and acceptance tests)
  6. if all tests pass, roll out the new release across the production environment

Having such a deployment strategy in place comes in very handy when instant releases (e.g. bug fixes) are required. The goal is to fully automate this deployment pipeline to shorten the time (and pain) from check-in to release. While doing so, the solution needs to be able to respond to requests at all times, even while a new release is being deployed or tested. To achieve zero downtime, we commonly take advantage of two deployment strategies: “blue-green deployment” and “canary releasing”.
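
As a toy rendering of that pipeline (the stage bodies are stubs to be wired up to a real build server and test runner), every stage acts as a gate and the release only reaches production if all gates pass:

    def run(stage: str) -> bool:
        print(f"running {stage} ...")
        return True   # stub: return the real stage result here

    PIPELINE = [
        "unit tests (triggered by check-in)",
        "integration build",
        "integration tests",
        "deploy to staging",
        "smoke and acceptance tests",
    ]

    # all() short-circuits, so the first failing stage stops the pipeline
    if all(run(stage) for stage in PIPELINE):
        print("rolling out the release across the production environment")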

It is important to understand that risk grows exponentially with the number of check-ins that occur between releases. So launching new releases more frequently is actually less risky because the scope of the changes is better understood. This is counterintuitive to many people.

Blue-green deployment
Blue-green deployments are based on two identical deployment environments – one for production and one for staging. The key is to ensure that the two environments are truly identical, including the data they manage. Zero-downtime releases are achieved by deploying to the staging environment. After smoke testing the new deployment, traffic is routed to the staging environment, which now becomes the production environment. While blue-green deployments provide a simple and powerful way to test a deployment before it goes into production, they might require staging environments of similar size and capacity to perform capacity tests, which might not be economically feasible for large-scale services. Microsoft Azure Web Sites and its staging slots provide an out-of-the-box experience for blue-green deployments: they basically provide two deployment slots which can be swapped. In most scenarios, this will be the default deployment strategy.
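
Conceptually, the switch is nothing more than flipping a pointer between two identical environments – the sketch below uses illustrative URLs, and on Azure Web Sites the slots do this swap for you:

    ENVIRONMENTS = {
        "blue": "https://myapp-blue.example.net",
        "green": "https://myapp-green.example.net",
    }
    production = "blue"   # green currently acts as the staging environment

    def swap() -> str:
        # the release is just flipping the pointer; no bits are copied
        global production
        production = "green" if production == "blue" else "blue"
        return ENVIRONMENTS[production]

    print(swap())   # after smoke testing green, it becomes production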

Canary releasing
Canary releasing addresses the challenge of testing a new release on only a subset of the servers. This approach can also be used for A/B testing: a small percentage of the users are routed to the new service while the majority still works against the “old” version. This allows the team to get direct feedback without risking an impact on the majority of users. It is actually possible to have multiple versions running in parallel. The same approach can be used to perform capacity tests without routing actual users to the release under test (basically testing the new version in production without routing actual users to it). While blue-green deployments are simple, canary releases are more complicated, because all instances within a Web Site are based on the same deployment. As part of this series, I will discuss the use of a custom router which acts as a reverse proxy. This approach allows routing certain users to the canary deployment while the majority of users work against older releases.
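
To give a feel for the routing decision inside such a router, here is a hedged sketch (illustrative URLs, not the actual implementation I will present): a stable hash pins each user to the same version, and canary_share controls the fraction of users on the new release:

    import hashlib

    STABLE = "https://myapp-v1.example.net"
    CANARY = "https://myapp-v2.example.net"

    def route(user_id: str, canary_share: float = 0.05) -> str:
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        bucket = digest[0] / 255.0   # deterministic value in [0, 1] per user
        return CANARY if bucket < canary_share else STABLE

    print(route("alice"), route("bob"))   # the same user always gets the same answer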

Continuous deployment – automation is key

Over the last couple of months I have had many discussions about DevOps and especially continuous deployment. These discussions were not only about technology – they were also about a cultural change:

Delivering a service instead of shipping software requires a cultural change in how development teams and operations teams interact – there has to be joint accountability for delivering the SLA as one team. A central part of continuous delivery is automation: from the check-in of new code through build, test and deployment, automation is the key to continuously delivering a service. Gone are the days of the waterfall process where developers hand the code to the test department, which hands it to the ops guys. It is one team that is responsible for developing, testing and operating the service.

Over the next couple of weeks I plan to blog concrete guidance on how to build an automated deployment pipeline using Azure Web Sites. Here’s a rough outline of the topics to come:

Stay tuned …

Playing with numbers

Recently I stumbled over the following disk ad from the early 80’s:

disk advertisement from the 80’s

This was the trigger to play with some numbers:

What would it cost to provide 1GB geo-redundant high availability storage (similar to Windows Azure storage) using these ancient disks?

  • Windows Azure stores data 6 times across two geo-redundant locations
  • This means that storing 1GB of data requires 6GB of storage capacity
  • Using the disks from the early 80s, storing 6GB of data would have required 600 10MB disks
  • This would have cost $2M+!
  • Even if we had gotten a 50% discount on those drives, we would still have paid around $1M
  • And that would be just the cost of the disks…

Today, Windows Azure provides geo-redundant storage for $0.095 per GB/month. This means we can store 1GB of data for 5 years at a cost of less than $6.

That is 166’000 times cheaper than 30 years ago – not even considering that we’re not just getting the disks but a complete storage service.
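
For the curious, the back-of-the-envelope math:

    disks = (6 * 1000) // 10        # 6GB of raw capacity at 10MB per disk -> 600 disks
    five_years = 0.095 * 12 * 5     # $0.095 per GB/month over 60 months -> $5.70
    ratio = 1_000_000 / 6           # ~$1M then vs. <$6 now -> ~166'000x
    print(disks, round(five_years, 2), round(ratio))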

Communication between Apps and services

In this blog post I will discuss some communication options for device and services scenarios. Looking at it from a higher level, we can differentiate between device-initiated and service-initiated communication. On most devices, however, there is a fundamental difference between the two:

  • the service has the capability to listen for incoming requests, so implementing device-initiated communication is as straightforward as sending the request (REST, WS-*, …) to the service endpoint
  • most device platforms don’t provide the capability of exposing a service endpoint and discourage listening/polling for requests, which makes pushing data to a device quite a bit more challenging

Push Notifications
Actively pushing information to mobile devices is a common requirement. That’s why the different device platforms offer capabilities which take care of push notifications in a bandwidth- and battery-friendly way. This is achieved through a client component which receives the message and dispatches it to the App, and a service component which facilitates the interaction with the client component.

Let’s have a look at how this works for Windows Store Apps:

push notification overview

  • to receive push notifications, the Windows Store App simply requests a so-called channel URI from the Notification Client Platform (which represents the client component of Windows 8’s notification capabilities)
  • this URI identifies the device and App and needs to be shared with services that should send notifications to this App. To do so, the service provides a function which allows the App to register its channel URI (in other words, the service simply receives and stores the different channel URIs)
  • to actually send a notification to a Windows Store App, the service authenticates itself to the Windows Push Notification Service (a service run by Microsoft) and makes the request to send the notification message to a specific channel
  • the Windows Push Notification Service sends the message to the requested device (there is no delivery guarantee)
  • on the client side, the Notification Client Platform simply dispatches the message according to the channel URI

Since there is a strong coupling between the client and the service component, it shouldn’t come as a surprise that the different device platforms provide you with different notification services:

  • Windows 8 – Windows Push Notification Service (WNS)
  • Windows Phone – Microsoft Push Notification Service (MPNS)
  • iOS – Apple Push Notification Service (APNS)
  • Android – Google Cloud Messaging (GCM)

However, the really good news is that Windows Azure Mobile Services makes sending push notifications to the above-mentioned platforms very easy: It not only provides an easy way to configure the services on its portal, but it also provides objects for implementing the service and SDKs for the client. This makes the request for a Windows Store channel URI as simple as the following line of code:

// C#: the channel URI to register with the service is channel.Uri
var channel = await PushNotificationChannelManager.CreatePushNotificationChannelForApplicationAsync();

Once the mobile service knows about the channel URI, it can simply send a push notification using the server-side object model:

// Mobile Services server-side script: toastText04 takes its text as an object
push.wns.sendToastText04(channelURI, { text1: "this is my message" });

As already mentioned, the server-side scripting of Mobile Services doesn’t only provide a push object for Windows Store Apps (WNS) but also ones for APNS, GCM and MPNS.

I’m lovin’ it…

 

Dealing with state in modern Apps (2/2)

My previous blog post covered the need for handling state across devices/users and introduced the different Windows Azure storage options. In this post, I want to discuss the approach to data architecture in more detail.

Why not just use SQL databases?
While SQL databases provide much of the functionality known from an RDBMS, they come at a higher price point and with pretty hard size limitations (150GB as of March 2013). This makes them great for solutions with a predictable amount of data and scenarios which benefit from RDBMS capabilities such as Transact-SQL support. Another benefit might be the reuse of your client libraries, because tabular data stream (TDS) is the communication protocol for both SQL Server and SQL databases.

However, most services will need to store and query an increasing amount of data, which pushes a single database to its scale-up limits. Since cloud computing is based on scale-out, we’re soon confronted with the challenge of partitioning our data across multiple storage nodes or different storage technologies (such as Tables, Blobs, Hadoop on Azure, SQL databases, …).

data partitioning

While traditional reasons for partitioning were predominantly about horizontal partitioning (e.g. sharding), the cloud provides new reasons for data partitioning, such as cost optimization through the usage of different storage technologies or the ability to only temporarily store data (e.g. when running a Monte Carlo simulation on a Hadoop cluster on Windows Azure).

Horizontal partitioning
In horizontal partitioning, we spread all data across similar nodes to achieve massive scale-out of data and load. In such a scenario, all queries within a partition are fast and simple, while querying data across partitions becomes expensive. An example of horizontal partitioning is the distribution of an order table according to the customer who placed the order. In this example we partition the order table using the customer as the partition key. This makes it very efficient to retrieve orders that belong to a specific customer but very inefficient to retrieve information that involves cross-customer queries such as “What are the customers that ordered product xyz?”.

Vertical partitioning
In vertical partitioning, we spread data across dissimilar nodes to take advantage of different storage capabilities within a logical dataset. By doing so, we can leverage more expensive indexed storage for frequently queried data but store large data entities in cheaper storage (such as blobs and tables). For instance, we could store all order information in a SQL database except the order documents, which we store as PDFs in blob storage. The downside of this approach is that retrieving a whole row requires more than just one query.

Hybrid partitioning
In hybrid partitioning, we take advantage of both horizontal and vertical partitioning within the same logical dataset. For instance, we can leverage horizontal partitioning across multiple similar SQL databases (sharding) but use blob storage to store the order documents.

Conclusion
To take advantage of cheap cloud storage we must partition our data.

partitioning conclusions

In all partitioned scenarios it is cheap to query data within a partition but expensive to query it across multiple partitions or storage types. However, since storage is fairly cheap and available in virtually unlimited capacity, it is a very common approach to aggressively duplicate data to ensure that every query includes a partition key. By doing so, we optimize the service for data retrieval. For example, if we have an order table which is partitioned by customer, it is expensive to retrieve a list of customers who ordered product xyz, because we can’t provide the query with a partition key. One way to address this problem is to create a second table which duplicates the data but uses the product as the partition key. We basically optimize our service for data retrieval and not for data inserts – a fundamental change for many of us who are used to SQL databases.
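
As a toy illustration (plain dicts standing in for a partitioned store), duplicating the writes makes both questions cheap single-partition lookups:

    from collections import defaultdict

    orders_by_customer: dict[str, list] = defaultdict(list)  # partition key: customer
    orders_by_product: dict[str, list] = defaultdict(list)   # partition key: product

    def insert_order(order: dict) -> None:
        # writes go to both tables: we optimize for retrieval, not for inserts
        orders_by_customer[order["customer"]].append(order)
        orders_by_product[order["product"]].append(order)

    insert_order({"customer": "alice", "product": "xyz", "qty": 2})

    print(orders_by_customer["alice"])  # all orders of one customer
    print(orders_by_product["xyz"])     # all customers that ordered product xyz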

Dealing with identity in devices and services scenarios

Seamless device scenarios require services to store and share data across devices and Apps. This requires the solution to authenticate the user (who is it?) and to authorize the request (is this user allowed to perform this task or see this information?). There are different options to deal with user authentication:

Authentication options

While the most pragmatic way would be to introduce a username and password for our solution, this introduces two major problems:

  • First, we need to implement a proprietary credential management system which allows us to create, store and manage user names and passwords.
  • Second, the user needs to remember the logon information for our solution. I have to say that I really hate creating a user name and password for all the different services and websites I use! Why can’t they just use one of the existing identity providers such as Microsoft Account or Facebook ID?

We actually can: It is quite simple to use existing identity providers and federate them with our solution using Windows Azure Access Control Service. This allows the users to use their identity provider of choice and work seamlessly across devices and solutions. The simplest way to get started is to use Windows Azure Mobile Services: The following tutorial shows how to configure a Mobile Service solution to give users the choice of a Microsoft Account, Facebook, Twitter or Google login. Sweet…

 

From applications towards Apps (2/2)

Different devices have different capabilities, and many of our workflows and business processes involve multiple devices (sequentially and/or simultaneously).

core device capabilities

The choice of device is usually based on its core characteristics:

  • Smartphones are personal devices which provide connectivity anywhere and anytime, but they are not really suitable for productivity tasks
  • Tablets are great consumption and entertainment devices but may have limited connectivity
  • PCs are THE productivity devices but their form factor may introduce some mobility constraints

However, it seems that these distinctions are blurring more and more:

  • Some smartphones are already close to the size of tablets and provide great consumption experiences
  • Newer tablets have dramatically improved their connectivity capabilities (e.g. built-in 3G/4G)
  • With Surface, Microsoft launched a tablet with PC capabilities (or vice versa, depending on how you look at it)

from applications towards Apps

To take advantage of the different device capabilities, the solution landscape is moving from one application that supports multiple scenarios towards multiple Apps that each deliver an individual scenario across multiple devices (each App optimized for the specific device experience). Building such scenarios has an impact on how we deal with the following aspects (the list is by no means complete):

  • Identity – The underlying services need to recognize users across multiple Apps and devices. Being able to federate the user’s identity of choice enables the required single sign-on experience across services and devices.
  • Storage – The ability to access the same data from different devices independent of location requires the solution to store state and information not locally but on a shareable location.
  • Communication – Seamless, rich bi-directional communication across different users and devices. This requires collaboration and notification capabilities across Apps and devices.
  • Monetization – Different scenarios require different monetization strategies. There is a large spectrum from starting with freemium to pay for Apps to service usage subscriptions. These business models are dependent on the user type: Consumers have different needs and paying behaviors than organizations.
  • Lifecycle – Apps will most likely change more frequently than the service they’re built on. New devices require new Apps, all connecting to the same service. Decoupling the service from the App becomes a core asset in providing a seamless App experience across multiple devices. Taking a service-oriented approach and reducing the dependency between the service and the devices allows for independent development and release management. It’s also important to understand the App distribution constraints which might be imposed by the various marketplaces.

I will cover these aspects in more detail in future posts.