Monday, February 27, 2006

System Integration for Dummies


Working on large-scale projects you learn that “putting things together” is one of the riskiest areas in the whole project lifecycle, yet system integration is somewhat underestimated, both as an activity and as a discipline. Anyway, after participating in a large project where barbarian hordes of developers had to integrate their own components a week before the official release date, nobody wants to repeat the same mistake twice. The current landscape in system integration consists of two leading approaches.

Prescriptive approach
The traditional approach focuses on an early definition of stable APIs at the boundaries between the different project components (especially those that need to be used by different teams). The idea is to allow development of the different components to proceed separately for as long as possible. Overall project planning proceeds accordingly: development and integration have well-defined, scheduled activities.
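As a minimal sketch (all names here are hypothetical, purely for illustration), such an agreed boundary might be nothing more than a shared Java interface plus the value objects that cross it:

    // CustomerLookup.java: the boundary contract agreed early on.
    // One team implements it, the other codes against it.
    public interface CustomerLookup {

        /** Returns the customer registered under the given id. */
        Customer findById(String customerId);
    }

    // Customer.java: a minimal value object shared across the boundary.
    public class Customer {

        private final String id;

        public Customer(String id) {
            this.id = id;
        }

        public String getId() {
            return id;
        }
    }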

This approach has some known drawbacks: early-defined APIs can’t be completely stable, due to evolving scenarios or requirements, better knowledge acquired at implementation time, and so on. Moreover, APIs are never completely clear (or I never had a chance to see such a beast), so you could end up having two perfectly legal implementations of an agreed API which are not compatible in practice. Discovering such differences at integration time might end in a big mess: deadlines are closer by then, and patching the APIs or the dependent implementations under schedule pressure normally harms code quality and overall system stability.
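To make that concrete, here is a hedged sketch of how two teams can both honour the hypothetical CustomerLookup interface above and still be incompatible: nothing in the signature says what happens when the customer does not exist.

    // Both implementations below compile perfectly against the agreed
    // interface, yet they are incompatible in practice: the contract
    // never said what happens when the customer does not exist.

    // RemoteCustomerLookup.java: one team throws on a missing customer...
    public class RemoteCustomerLookup implements CustomerLookup {

        public Customer findById(String customerId) {
            Customer customer = fetchFromRemoteService(customerId);
            if (customer == null) {
                throw new IllegalArgumentException("unknown customer: " + customerId);
            }
            return customer;
        }

        private Customer fetchFromRemoteService(String customerId) {
            return null; // placeholder for the real remote call
        }
    }

    // CachedCustomerLookup.java: ...while the other quietly returns null.
    public class CachedCustomerLookup implements CustomerLookup {

        private final java.util.Map<String, Customer> cache =
                new java.util.HashMap<String, Customer>();

        public Customer findById(String customerId) {
            return cache.get(customerId); // null when the customer is missing
        }
    }

A client coded and unit-tested against one implementation will happily pass all its tests, then blow up when integrated with the other.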

On the other hand, the attention and ceremony put into API definition (normally documentation and blueprints are agreed upon and signed off by the different parties involved) make APIs difficult to change before integration time. One team might discover an extra need, but the resulting change request then has to be agreed upon with several parties (which normally don’t love changing their ongoing design, or might find it a great chance to mask underlying schedule slips…), making the change process quite costly, even if it doesn’t affect code and system behaviour yet. I am tempted to call this approach “meeting room integration”, ’cause in large organizations most of the effort is spent around a table.

More generally, the prescriptive approach’s weak point is the same as the waterfall development process’s: if everything is done properly at the right time, integration might proceed smoothly; but if a change is required along the way, costs might spiral out of control.

Continuous Integration
The opposite approach was introduced by Martin Fowler’s article and became a building block of the XP and TDD approaches. Continuous Integration focuses on integrating systems as early as possible: the leading principle is that integration is a high-risk area and integration costs have to be paid anyway, so integrating early helps mitigate the associated risk. The different components are integrated while still in development, in an update-build-deploy-test cycle which can be tuned to pick up the latest version, the latest stable version, or the last officially released version from the SCM system. The whole system is built and deployed in a “neutral” test environment, and reports on the current state of the system are sent to all the team members involved.
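Stripped of any real CI tool (in practice something like CruiseControl would drive this), the cycle might look like the following naive Java sketch; the SCM command, Ant targets, and polling interval are illustrative assumptions, not a recipe:

    // IntegrationCycle.java: a deliberately naive sketch of the cycle.
    public class IntegrationCycle {

        public static void main(String[] args) throws Exception {
            while (true) {
                boolean stable =
                        run("svn update")              // fetch the latest sources from the SCM
                     && run("ant clean dist")          // build the whole system
                     && run("ant deploy-test-env")     // deploy to the neutral test environment
                     && run("ant integration-test");   // run the integration test suite
                notifyTeam(stable);                    // report the current state to the team
                Thread.sleep(60 * 60 * 1000);          // then do it all again, an hour later
            }
        }

        private static boolean run(String command) throws Exception {
            Process p = Runtime.getRuntime().exec(command);
            return p.waitFor() == 0;                   // exit code zero means the step succeeded
        }

        private static void notifyTeam(boolean stable) {
            // stand-in for the real report (e-mail, web page, whatever works)
            System.out.println(stable ? "BUILD STABLE" : "BUILD BROKEN");
        }
    }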

Tests (automated tests) play a great role in ensuring system stability: having a comprehensive integration test suite helps spot potential conflicts between components as early as possible, giving project management a better grip on the overall status of the system. Comparing this approach with the previous one, we might notice that a significant part of the integration tests can be developed earlier, without waiting for the overall system to be ready.
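For instance, staying with the hypothetical CustomerLookup contract sketched above: suppose the teams eventually agree that a missing customer yields null. A single JUnit test, run against each implementation on every integration build, would flag the throwing implementation immediately:

    import junit.framework.TestCase;

    // CustomerLookupContractTest.java: pins down the agreed behaviour
    // for missing customers; assumes the teams settled on "return null".
    public class CustomerLookupContractTest extends TestCase {

        public void testMissingCustomerYieldsNull() {
            CustomerLookup lookup = new CachedCustomerLookup(); // repeat for each implementation
            try {
                assertNull("a missing customer must yield null",
                           lookup.findById("no-such-customer"));
            } catch (IllegalArgumentException e) {
                fail("this implementation throws where the agreed contract returns null");
            }
        }
    }

Run on every build, this kind of contract test turns the null-versus-exception surprise into a red bar months before the deadline.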

Of course this approach has some drawbacks too: relaxing stability constraints doesn’t necessarily mean “don’t design the APIs”; the cost of change is paid differently, but it’s still a cost, even if the recent refactoring additions to common IDEs have brought a significant improvement in this area. Starting integration tests early also increases the need for physical resources: if hardware is a big cost on the overall project (or you have to rent it somehow), there might be constraints on test hardware availability. The same problem may arise from organizational constraints: if the hardware is provided by another department, you might have looser control over hardware planning.

Another subtle consequence of this approach is that it imposes iterative development: you need some working code to integrate as early as possible, so if some team is doing waterfall development inside its own boundaries you’ll discover problems at the very last moment anyway. This doesn’t sound like a drawback (and in fact it’s normally a positive side effect), but it requires project management to put a little control into the process definitions of the different teams too.

Back to my (sad) real world
I bet most of the things I wrote here sound pretty obvious to most readers: whether you favour one approach, the other, or a combination of the two, you know the underlying principles of both, or at least of one. If we have to agree on an integration strategy, these are the fundamentals; then we can agree on the trade-off that best suits the current project scenario.

To my surprise, I realized that this is quite an optimistic assumption. On a large-scale project involving several separate organizations, I had the pleasure of being taught that “Believe me, you do not want an integrated test environment: if something goes wrong you’ll never understand where the problem is…” and that “running a couple of tests here, and a couple of tests there” was the best approach. After this, options a), b) and c) (the prescriptive approach, Continuous Integration, or a combination of the two) were all against the law. Option d) was to file that sentence among the dumbest sentences ever heard, and it turned out to be the compelling reason for the whole post.
