Saturday, 25 February 2012

Unit Testing Workflow Code Activities - Part 1

When I first started looking into Windows Workflow, one of the first things that I liked about it was how it separated responsibilities. The workflow is responsible for handling the procedural logic, with all its conditional statements and so on, whilst individual code activities can be written to handle the business logic processing as small, easily re-usable components. To try to realise that original perception, this series of blog posts will cover the unit testing of bespoke code activities, broken down into:

  • Part One: Unit testing a code activity with a (generic) typed return value (this post)
  • Part Two: Unit testing a code activity that assigns its (multiple) output to "OutArguments" (Not yet written)

So to make a start, consider the following really basic code activity; it expects an InArgument<string> of "Input" and returns a string containing the processed output - in this case a reversed copy of the value held in "Input".

namespace ExampleCode.Workflow
{
    using System.Activities;
    using System.Linq;

    public class StringReverse : CodeActivity<string>
    {
        public InArgument<string> Input { get; set; }

        protected override string Execute(CodeActivityContext context)
        {
            var input = this.Input.Get(context);
            return string.IsNullOrWhiteSpace(input)
                ? input
                : new string(Enumerable.Range(1, input.Length).Select(i => input[input.Length - i]).ToArray());
        }
    }
}

This code should be extremely easy to unit test, apart from two immediate problems.

  1. The protected method "Execute" is not exposed to the calling code, making it impossible to call directly.
  2. I have no idea what is required to set up a "CodeActivityContext" or how to go about doing it - as a concrete implementation it's not possible to mock.

I don't really want to create a public method that I can call directly without having to worry about the "context", as that would be creating code purely for testing's sake - something that is never a good idea. Just for completeness this could be implemented as follows, but I really wouldn't recommend it!

protected override string Execute(CodeActivityContext context)
{
    return this.Process(this.Input.Get(context));
}

public string Process(string input)
{
    return string.IsNullOrWhiteSpace(input)
        ? input
        : new string(Enumerable.Range(1, input.Length).Select(i => input[input.Length - i]).ToArray());
}

So if we don't want to create new code and expose protected functionality purely for testing, what can we do? The answer lies in the unit test itself. Checking the overloads of the static "Invoke" method on the WorkflowInvoker class highlights that it takes either an instance of Activity or an instance of Activity<TResult>; it's important to remember that even a complex XAML workflow is contained within a single Sequence or Flowchart activity, both of which inherit from Activity! Checking the return value of the "Invoke" method further highlights that we get back the return value of the code activity instance. This means our unit test can simply be:

namespace ExampleCode.Workflow.Tests
{
    using System.Activities;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class StringReverseTests
    {
        [TestMethod]
        public void TestMethod1()
        {
            var output = WorkflowInvoker.Invoke(new StringReverse { Input = new InArgument<string>("123") });
            Assert.AreEqual("321", output);
        }
    }
}

It could be argued that this isn't ideal because we haven't really isolated the code we want to test, but this solution doesn't require any test-only changes to the code activity and it's extremely easy to set up and call - given that, I think it's an acceptable risk. No matter what logic is contained within the code activity, the only additional complexity in this approach is the number of input arguments, as shown below. Things like external dependencies (Inversion of Control) and multiple output arguments will be covered in future posts.
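
For a code activity with more than one InArgument the same approach still works; the extra inputs can be supplied via the dictionary overload of "Invoke". The sketch below is purely illustrative - the "StringJoin" activity and its "Separator" argument are hypothetical, not part of this series:

// Hypothetical CodeActivity<string> with two in arguments, used only to show the dictionary overload.
var inputs = new Dictionary<string, object>
{
    { "Input", "123" },
    { "Separator", "-" }
};

// Invoke matches the dictionary entries to the in arguments by name and returns the activity's result.
var output = WorkflowInvoker.Invoke(new StringJoin(), inputs);
Assert.AreEqual("1-2-3", output);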

Tuesday, 21 February 2012

Windows Workflow: Re-introducing old anti-patterns?

As part of my day job I've been experimenting with Windows Workflow, both in modifying the existing TFS2010 build templates and as a way of controlling the process flow in our new suite of applications. For the most part I've been really impressed; when you sit in a process workshop watching the business users mapping the existing steps out on a whiteboard (or even a wall) it quickly becomes apparent that showing them a similar flow should hold significant benefits. Gherkin goes some way towards creating a syntax/language that works for both technical and non-technical people, but it is a test language for verifying that the application is working as intended - you don't write the process itself in Gherkin. We've also found (from experience) that Gherkin has a reasonable learning curve for both technical and non-technical users, whilst most people seem to find it easy to relate to the visual element of workflow with little or no training.

But, as I said at the start, I've only been impressed for the most part with what I've seen of workflow so far. Over the past 10 years or so there has been a significant push to improve the quality of the code we produce. We have unit tests and TDD, and guidelines such as SOLID and DRY, all designed to help the developer create code that should be easier to maintain and less bug-ridden. This has all helped, and it's not often that you come across methods inside a class that remind you of the days of classic ASP and/or VB6: massive blocks of procedural code containing large conditional sections, harbouring numerous responsibilities and reasons to change and, worse still, code that can't be tested in isolation (*).

Getting back to workflow re-introducing old anti-patterns: step forward the default build template for TFS2010. I'm not sure how many people have taken a look at this workflow, or worse still had to work with or modify it? That nice happy feeling of replicating what the business wants quickly vanishes and it's like taking a step back into code from the early '90s. Everything (and I mean everything) is in one single workflow. In the designer, even at 50% scale, you have to scroll down several pages to move through one conditional block. Want to find where a particular exception is thrown? It's probably quicker to open up the workflow as plain XAML and text search for the exception type and/or message text. Even if you figured out how to host the workflow to test it, if you want to test the "gated check-in" functionality you'll have to run the entire workflow from start to finish just to reach the code under test. Want to isolate a hard-to-test entry point into the gated check-in workflow? You'll have to figure out the exact scenario to replicate it, because you can't set it up or Moq the areas you're not interested in. Sound familiar? I'm sure everyone has worked on a code base that suffered the same problems, but in real code we're mostly past them now.

It doesn't have to be this way: workflow allows sequences and flowcharts to be declared in separate XAML files and nested inside a larger workflow. There's absolutely no reason why the gated check-in sequence could not have been its own XAML file, with its own in/out arguments. It quickly becomes SOLID and DRY - it only needs to change when the requirements or process for gated check-in change, and it can easily be tested in isolation. I might not be able to figure out how to host and run the entire build template, but even now I'd probably be able to throw together a workflow host application that loaded, set up and ran a "gated check-in" XAML file, along the lines of the sketch below.
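
A minimal sketch of that idea, assuming a hypothetical GatedCheckIn.xaml file that exposes a "BuildDetail" in argument and a "CheckInApproved" out argument (the file and argument names are invented for illustration, not taken from the real template):

// ActivityXamlServices lives in System.Activities.XamlIntegration.
// Load the stand-alone sequence from its own XAML file so it can be run in isolation.
var gatedCheckIn = ActivityXamlServices.Load("GatedCheckIn.xaml");

var inputs = new Dictionary<string, object>
{
    { "BuildDetail", "Trunk_20120220.1" }
};

// WorkflowInvoker returns the out arguments as a dictionary keyed by argument name.
IDictionary<string, object> outputs = WorkflowInvoker.Invoke(gatedCheckIn, inputs);
Console.WriteLine("CheckInApproved = {0}", outputs["CheckInApproved"]);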

So workflow doesn't have to re-introduce old anti-patterns, but all the time we have real-world examples that contain bad practices it will be harder for less experienced developers not to replicate them. It's probably worth remembering that there are many developers who have come into the workplace without ever having suffered the pain of the procedural code that generated many of the recognised anti-patterns. It would be a big step back for development (and workflow) if examples like the TFS template became commonplace! As a side project I'm trying to gain a full understanding of the default build template (something that also seems to be missing from the on-line Microsoft TFS documentation) and break the XAML into smaller, focused sequences/flows that are easier to understand. Workflow does look like it can successfully be the procedural glue that handles the transition between states and complex or long-running process flows, but it does need to adhere to the same testing and design principles as the code it contains!

(*) This sort of code might still exist but I'm happy to say that I've been lucky to work in and with teams that don't produce it.

Monday, 20 February 2012

TFS2010: Publishing solution projects to their own directories

When looking to automate a TFS2010 build, one of the first issues that most people seem to encounter is that the binaries of every project in a solution end up in the same "bin" directory. The forum post "TFS 2010 BUILD SERVER: Can not keep folder tree in the drop location?" details the solution, which involves changes to both the CSPROJ files and the workflow template that is called by your build. Note: each CSPROJ file in your solution needs to be updated, as the workflow loops through the solution finding all the referenced projects. The answer in the forum post has everything about what is needed but not why, which can be a bit confusing if you're just starting out with TFS/workflow.

The image to the left shows the section of the workflow that is changed. The workflow variable "outputDirectory" is defined within the scope of "Compile and Test for configuration" (highlighted in green). The value of "outputDirectory" is assigned (highlighted in red) and typically just includes the build name + the build version (so Trunk_20120220.1 would be the first time the trunk build has run on 20th Feb 2012). In the default template the value of "outputDirectory" is assigned to the input argument "OutDir" of the "Run MSBuild for project" step (highlighted in blue), and this is why all the binaries of all the projects end up in a single directory. The first documented change is to modify the properties of "Run MSBuild for project": the value of "outputDirectory" is no longer passed in the input argument "OutDir" but is incorporated into the input argument "CommandLineArguments" instead.

The code below duplicates the change to the "CommandLineArguments" input argument referred to in the forum post, but highlights the interaction with "outputDirectory". Its value is passed as an MSBuild property whose name is not important - as long as it matches the reference you add to your CSPROJ file.

String.Format(
    "/p:SkipInvalidConfigurations=true {0} /p:TfsProjectPublishDir=""{1}""",
    MSBuildArguments, outputDirectory)

The code below shows the corresponding change made to the CSPROJ file. Similar to the previous section, the important interactions with the workflow changes are highlighted. The property name you use can be anything you like, as long as it does not clash with existing workflow or environment variables. The change does need to be made to all CSPROJ files and for all configurations that you need to build. In this instance we can build debug and/or release and we will end up with separate binary directories for each. For each CSPROJ file, change the "projectname" value to a value unique to the project in question.

<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
  ...
  <OutputPath Condition=" '$(TfsProjectPublishDir)'=='' ">bin\debug\</OutputPath>
  <OutputPath Condition=" '$(TfsProjectPublishDir)'!='' ">$(TfsProjectPublishDir)\projectname\</OutputPath>
  ...
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
  ...
  <OutputPath Condition=" '$(TfsProjectPublishDir)'=='' ">bin\release\</OutputPath>
  <OutputPath Condition=" '$(TfsProjectPublishDir)'!='' ">$(TfsProjectPublishDir)\projectname\</OutputPath>
  ...
</PropertyGroup>

If you're not familiar with CSPROJ files, they are just XML-based files that MSBuild uses to build the referenced project. The "Condition" attribute determines whether that XML element (and any children) is passed to MSBuild. So the first "PropertyGroup" is only present if the configuration and platform equal Debug and AnyCPU (respectively), the second if they equal Release and AnyCPU. As should be apparent, this conditional logic means that only one of these PropertyGroups is ever presented to MSBuild, and each contains the configuration/platform specific values needed for that build. The same mechanism is how the MSBuild behaviour changes between the IDE and TFS: if TfsProjectPublishDir is defined then the second "OutputPath" node is included, otherwise the default, more familiar "OutputPath" node is presented to MSBuild.

Thursday, 16 February 2012

Improving “Boiler Plate” Data-Reader Code – Part 5

In this post we will extend the query functionality to handle stored procedures with parameters. To do this we need to create a new query type interface with an example implementation:

public interface IDefineAStoredProcedureQuery : IQuery
{
    string StoredProcName { get; }
    IList<SqlParameter> Parameters { get; }
}

public class GetCustomersByFirstName : IDefineAStoredProcedureQuery
{
    public GetCustomersByFirstName(string firstName)
    {
        this.Parameters = new List<SqlParameter> { new SqlParameter("FirstName", firstName) };
    }

    public string StoredProcName { get { return "GetCustomersByFirstName"; } }

    public IList<SqlParameter> Parameters { get; private set; }
}

Now that we have the ability to create stored procedure queries we need something to handle them. To do this we need a concrete implementation of the interface "IHandleAQuery":

public class StoredProcedureQueryHandler : IHandleAQuery
{
    public void Assign(SqlCommand command, IQuery query)
    {
        var castQuery = (IDefineAStoredProcedureQuery)query;
        command.CommandType = CommandType.StoredProcedure;
        command.CommandText = castQuery.StoredProcName;

        if (castQuery.Parameters != null)
        {
            // AddRange expects an array; an IList<SqlParameter> can't simply be cast to one,
            // so convert it with ToArray() (which needs System.Linq).
            command.Parameters.AddRange(castQuery.Parameters.ToArray());
        }
    }
}

Finally we update our factory to handle the new query interface and return the correct handler:

public static class QueryHandlerFactory
{
    public static IHandleAQuery Create(IQuery query)
    {
        if (query is IDefineCommandTextQuery)
        {
            return new HandleCommandTextQuery();
        }

        if (query is IDefineAStoredProcedureQuery)
        {
            return new StoredProcedureQueryHandler();
        }

        var ex = new NotSupportedException();
        ex.Data.Add("IQuery Type", query.GetType());
        throw ex;
    }
}

This can now be called using the following code, which should highlight the power of this repository pattern: whilst we have implemented quite a lot of code behind the scenes, the only change the consumer sees is the type of query object being passed in.

var customers = new SqlRepository(connectionString).Get(
    new GetCustomersByFirstName("Paul"),
    new CustomerDRConvertorPart2()).ToList();
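
A nice side effect of pushing the command set-up behind "IHandleAQuery" is that each handler can be unit tested in isolation without touching a database. A minimal MSTest sketch (the test name and values are mine, purely for illustration):

[TestMethod]
public void AssignSetsUpStoredProcedureCommand()
{
    // A SqlCommand can be created without an open connection for this kind of test.
    var command = new SqlCommand();
    var query = new GetCustomersByFirstName("Paul");

    new StoredProcedureQueryHandler().Assign(command, query);

    Assert.AreEqual(CommandType.StoredProcedure, command.CommandType);
    Assert.AreEqual("GetCustomersByFirstName", command.CommandText);
    Assert.AreEqual(1, command.Parameters.Count);
}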

Improving “Boiler Plate” Data-Reader Code – Part 4

In part 3 we created a SQL repository object that took a populated instance of IQuery to select and return an enumerable list of objects. A limitation of that repository was that the query had to be text based; it couldn't handle stored procedures or parameters. By incorporating an abstract factory pattern we can extend the functionality to handle different types of query.

The original code inside "SqlRepository.Get(...)" needs to be changed from:

using (var command = connection.CreateCommand())
{
    command.CommandText = query.Text;

    connection.Open();

To:

using (var command = connection.CreateCommand())
{
    var handler = QueryHandlerFactory.Create(query);
    handler.Assign(command, query);

    connection.Open();

The static factory class takes an instance of IQuery and determines which "query handler" to return, depending upon the additional interface that the passed-in query implements. This is implemented via the following code:

public static class QueryHandlerFactory
{
    public static IHandleAQuery Create(IQuery query)
    {
        if (query is IDefineCommandTextQuery)
        {
            return new HandleCommandTextQuery();
        }

        var ex = new NotSupportedException();
        ex.Data.Add("IQuery Type", query.GetType());
        throw ex;
    }
}

To finish off the implementation the following new interfaces and code are needed:

public interface IDefineCommandTextQuery : IQuery
{
    string Text { get; }
}

public interface IHandleAQuery
{
    void Assign(SqlCommand command, IQuery query);
}

public class HandleCommandTextQuery : IHandleAQuery
{
    public void Assign(SqlCommand command, IQuery query)
    {
        command.CommandType = CommandType.Text;
        command.CommandText = ((IDefineCommandTextQuery)query).Text;
    }
}

Finally we simplify the IQuery interface, as its only property has now been moved up into IDefineCommandTextQuery, and then update all concrete implementations of "IQuery" to implement "IDefineCommandTextQuery" instead. Without these final changes the factory class will not be able to determine the correct handler from the interface that is implemented.

public interface IQuery
{
}

public class GetAllCustomersQuery : IDefineCommandTextQuery
{
    public string Text
    {
        get
        {
            return "SELECT id, Firstname, Surname FROM Customer";
        }
    }
}

Now our code can safely be extended to handle different query types just by creating the new implementation of the query handler and modifying the factory to handle the identification and creation of the new type. Part 5 will show how this functionality can be extended to handle stored procedures with parameters.

Improving “Boiler Plate” Data-Reader Code – Part 3

In Part 1 of this series we started with a basic Data-Reader / SQL Connection/Command pattern and illustrated how it is possible to abstract the parsing of the Data Reader into a standalone object that can be fully unit tested in isolation of the calling code. In Part 2 of the series we made a very simple optimisation to the “DataReader” convertor and updated the tests to capture/verify the changes.

In part 3 of the series we put this all together into a repository pattern to create a reusable and testable data access layer. The first step is to create an interface for the repository.

namespace DataAccess.Example
{
    using System.Collections.Generic;
    using System.Data.BoilerPlate;

    public interface IRepository
    {
        IEnumerable<TEntity> Get<TEntity>(IQuery query, IConvertDataReader<TEntity> dataReaderConvertor);
    }
}

The intent implied by the interface is that "Get" will be responsible for returning an enumerable list of a generic type. To do this we pass in an implementation of IQuery and an implementation of IConvertDataReader for the type we want returned. We already have an implementation of IConvertDataReader that we can use from the previous post. In this example the implementation of IQuery just returns SQL text that can be executed, and is shown below.

namespace DataAccess.Example
{
    public interface IQuery
    {
        string Text { get; }
    }
}

namespace DataAccess.Example
{
    public class GetAllCustomersQuery : IQuery
    {
        public string Text
        {
            get
            {
                return "SELECT id, Firstname, Surname FROM Customer";
            }
        }
    }
}

The final piece of the jigsaw is the implementation of the repository interface, in this instance for SQL but it could be any data provider that returns an implementation of IDataReader. A basic implementation of the SQL repository is shown below:

namespace DataAccess.Example
{
    using System.Collections.Generic;
    using System.Data.BoilerPlate;
    using System.Data.SqlClient;

    public class SqlRepository : IRepository
    {
        private readonly SqlConnectionStringBuilder config;

        public SqlRepository(SqlConnectionStringBuilder config)
        {
            this.config = config;
        }

        public IEnumerable<TEntity> Get<TEntity>(IQuery query, IConvertDataReader<TEntity> dataReaderConvertor)
        {
            using (var connection = new SqlConnection(this.config.ConnectionString))
            {
                using (var command = connection.CreateCommand())
                {
                    command.CommandText = query.Text;
                    connection.Open();

                    using (var dataReader = command.ExecuteReader())
                    {
                        while (dataReader.Read())
                        {
                            yield return dataReaderConvertor.Parse(dataReader);
                        }
                    }
                }
            }
        }
    }
}

We now have a repository object that can be used against any SQL database to return a list of any type of object that can be populated using a standard SQL SELECT statement; the code below shows how to put this together. Note that because "Get" is implemented as an iterator (yield return), nothing is executed until the results are enumerated - hence the call to ToList() below.

var customers = new SqlRepository(connectionString).Get(
    new GetAllCustomersQuery(),
    new CustomerDRConvertorPart2()).ToList();

To understand the power of this pattern, consider the following code that is all that is needed to update the previous code to return a different list of customers; in this case all those with the first name of "Paul".

namespace DataAccess.Example
{
    public class GetAllCustomersCalledPaul : IQuery
    {
        public string Text
        {
            get
            {
                return "SELECT id, Firstname, Surname FROM Customer WHERE Firstname = 'Paul'";
            }
        }
    }
}

var customers = new SqlRepository(connectionString).Get(
    new GetAllCustomersCalledPaul(),
    new CustomerDRConvertorPart2()).ToList();

Part 4 shows how the code can be modified so it can be extended to handle different "query types".

Wednesday, 15 February 2012

Managing your dependencies in NuGet

When creating NuGet packages, how do you define your dependencies?

If you're using the default setting of version 'x' or newer, are you sure that all future versions of the dependency will work with the current version of your code? I'm not sure many people would be happy saying yes to that question, yet most NuGet packages are deployed with the default setting for their dependencies. Take a typical dependency, log4net: you might deploy a package today referencing the current build and everything's fine. But in a month or two's time an update to log4net may be deployed that contains breaking changes. From that point on, anyone that grabs your package from NuGet will find that it no longer works - instead of the version of log4net you developed against, they are now getting the latest version, which breaks your code.

The safer option is to use the version "range" option for managing dependencies, only including versions that have been tested and are known to work with your code base. It is more work, requiring you to retest and update your package each time a dependency is updated, but your users will thank you for it.

Update

Since writing this blog post a breaking change has been published to NuGet for the very widely used log4net package. Phil Haack's blog post covers the issue in great detail and I won't attempt to re-cover it here, but it does highlight the risks associated with being fully dependent on one or more third parties - your code (and therefore your consumers' code) may fail because a dependency has been incorrectly versioned or deployed. In his article Phil also links to an interesting set of posts on how NuGet handles package versioning.

Thursday, 2 February 2012

What's New In Windows Workflow 4.5

Just as I start getting up to speed with WF4.0, MSDN Magazine has published an article detailing what's new in WF4.5. It looks like there's a lot of good stuff coming, but the main thing I noticed was that v4.0 requires full trust to run. That shouldn't be a problem for the project we're intending to use workflow in, but if it can run in partial trust in the next release that will open up its usage to many other applications.

Wednesday, 1 February 2012

Windows Workflow 4.0 on Twitter

I've been trying to find workflow resources on Twitter but, unlike some other technologies, there doesn't seem to be much regular traffic. The hashtag #WF4 does seem to be used, and I've started a Twitter list of people I find regularly talking about WF4.0 on-line.

Starting Windows Workflow v4.0

Today I've started learning about Windows Workflow v4.0, which I'm hoping will help out with a current project. The original requirement was for mapping real-world business processes to what would normally be complex, long-running computer tasks. Whilst investigating what it can do, I'm beginning to think there could be real value in using it for any task that requires process flow logic, even extremely short-lived ones. This could be controlling page flow through a web application, or even defining the complex logic that can sometimes end up making MVC controller actions fatter than they really should be.

A really good starting point for getting me going has been the MSDN WF4.0 videos under the beginners' guide; if you have a Pluralsight subscription, Matt Milner presents some really useful stuff too.

As well as just learning the basics of Workflow, what really interests me is how it should be architected in a real-world LOB application, taking into account the usual "-ility" issues: scalability, maintainability and reliability.

  • The biggest challenge will probably be understanding how to manage multiple long-running workflows that have many external dependencies, inside a service that can be paused, stopped and restarted, on a server that can obviously be rebooted or even crash. How does the service restart and reload all the workflows it was managing? (There's a first sketch of the persistence pieces at the end of this post.)
  • Building on that, how would this all work in a load-balanced environment where you may have two or more of these services sharing the load - maybe a workflow that is started on one server will be persisted/unloaded when it becomes idle and then resumed on another server.
  • How can a workflow task be passed between the different layers of a system, so that it might be started as part of a web request from a user and then handed to a service to be completed as a long-running task?
As I start investigating each of these issues I'm hoping that this group of MSDN articles might be a good starting point.
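
For the first of these, the pieces I expect to experiment with are WorkflowApplication, SqlWorkflowInstanceStore and the PersistableIdle action. The sketch below is only my current understanding of how those parts fit together - the connection string and the MyLongRunningProcess activity are placeholders, and none of this has been tested yet:

// Persist idle workflow instances to SQL Server so they can survive a service restart.
// SqlWorkflowInstanceStore lives in System.Activities.DurableInstancing.
var store = new SqlWorkflowInstanceStore("Data Source=.;Initial Catalog=WorkflowInstances;Integrated Security=True");

// MyLongRunningProcess is a placeholder for whatever workflow definition is being hosted.
var application = new WorkflowApplication(new MyLongRunningProcess());
application.InstanceStore = store;

// Unload (and persist) the instance whenever it goes idle waiting on an external dependency.
application.PersistableIdle = e => PersistableIdleAction.Unload;
application.Run();

// After a restart - possibly on a different server - the same instance can be reloaded by its id.
var resumed = new WorkflowApplication(new MyLongRunningProcess());
resumed.InstanceStore = store;
resumed.Load(application.Id);
resumed.Run();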