The Microsoft Research Kinect for Windows SDK beta provides academic and enthusiast developers with a simple toolkit for accessing the Kinect's raw sensor streams and skeletal tracking functionality. The SDK is currently offered under a non-commercial license, with greater functionality and a commercial license expected later this year. If you've already used other Kinect SDKs, you'll notice the huge speed and performance improvements the Microsoft SDK delivers.
With the SDK installed it's extremely easy to access the Kinect's video, depth and audio streams for your own analysis, and it's easier still to use these streams with the provided skeletal tracking and advanced audio capabilities. The Kinect's four-element microphone array enables sophisticated acoustic noise suppression, while its beam-forming capabilities let the programmer identify the direction of the current sound source. It's these capabilities of the Kinect, enabling computers to 'see' and 'hear' better than anything else to date, that are exciting developers all over the world.

For me, the prospect of using the Kinect to deliver a 'single camera navigation and recognition system' for my Whitebox Robotics PC-Bot 914 is as exciting as it is now real. By combining image recognition technology with the Kinect's depth stream, 'A1-DW' (my PC-Bot 914) can not only recognise a location in my home but also easily determine how far it is from the objects in view. By combining several 'compass point' views, 'A1' can start to perceive its location. 'A1' can also use face recognition technology to identify people it sees. Couple this with the powerful speech recognition capabilities of the Windows platform and we have the start of some interesting computer/human interactions.
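To make the 'how far from the objects in view' idea concrete, here is a minimal sketch of ranging against a depth frame. It assumes the frame has already been unpacked from the SDK's depth stream into a NumPy array of per-pixel distances in millimetres (0 meaning no reading); the array, the function name and the sample data are all illustrative, not part of the SDK itself.

```python
import numpy as np

# Hypothetical 320x240 depth frame: per-pixel distance in millimetres,
# assumed already unpacked from the Kinect depth stream (0 = no reading).
depth_mm = np.zeros((240, 320), dtype=np.uint16)
depth_mm[100:140, 140:180] = 1500  # a pretend object 1.5 m straight ahead

def distance_to_object(depth, cy, cx, half=20):
    """Median distance (metres) over a window centred on (cy, cx),
    ignoring pixels with no depth reading."""
    window = depth[cy - half:cy + half, cx - half:cx + half]
    valid = window[window > 0]
    if valid.size == 0:
        return None  # nothing in range
    return float(np.median(valid)) / 1000.0

print(distance_to_object(depth_mm, 120, 160))  # → 1.5
```

Using the median rather than a single centre pixel makes the estimate robust to the occasional zero (no-reading) pixels the sensor produces.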
To take advantage of the audio capabilities for speech recognition you'll need to install the Microsoft Speech Platform Runtime, a suitable Language Pack and the Microsoft Speech Platform SDK. Then it's just a case of defining the grammars your application or game will listen for; this is easily done, as demonstrated by the ShapeGame sample included in the SDK.
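The essential idea of a command grammar can be sketched without the Speech Platform itself: enumerate the exact phrases the recogniser should listen for, grouped into rules. The toy matcher below is purely illustrative (the SDK samples build their grammars in C# against the Microsoft.Speech API, which also returns confidence scores rather than simple string matches).

```python
# A toy stand-in for a speech grammar: enumerate the words the
# recogniser should listen for, grouped into named rules.
# Everything here (rule names, vocabulary) is illustrative.
GRAMMAR = {
    "colours": ["red", "green", "blue"],
    "shapes": ["circle", "square", "star"],
}

def match_command(utterance):
    """Return (rule, word) for the first grammar word found in the
    utterance, else None. A real recogniser would attach a
    confidence score to each hypothesis."""
    words = utterance.lower().split()
    for rule, choices in GRAMMAR.items():
        for word in words:
            if word in choices:
                return (rule, word)
    return None

print(match_command("make it blue"))  # → ('colours', 'blue')
```

Constraining recognition to a small, explicit vocabulary like this is exactly why grammar-based recognition works so well in noisy living-room conditions.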
One area lacking direct support in the current release of the SDK is gesture recognition. We hope to see this supported in future SDK releases. For now, there are a number of ways in which gesture support can be provided.
First up, there are already some simple gesture recognition examples published on CodePlex. Many of these use alternative Kinect SDKs rather than the Microsoft one, but converting them to use the Microsoft SDK isn't very challenging. If you are looking for control interaction rather than broad gesture support, the team at IdentityMine has published KinectPaint on CodePlex. This not only demonstrates skeletal tracking but also incorporates some simple WPF control modifications that work well with the Kinect: buttons, listboxes, checkboxes and so on.
The most successful gesture recognition systems use a machine learning approach. There are several such implementations in development at the moment, and we hope to see them on CodePlex very soon. These let a developer record several attempts at a gesture to define the recognition parameters for that gesture type; a generic gesture recognition engine within the application then uses those parameters to recognise the defined range of gestures. This approach greatly reduces the Kinect developer's workload, so I expect this area of functionality will get a lot of early effort from the NUI guru community, letting the rest of us more ordinary developers get on with applying this technology in our everyday world.
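One common way to build the 'record a few attempts, then recognise' engine described above, though not necessarily the approach any particular CodePlex project takes, is template matching with dynamic time warping: recorded attempts become templates, and a new joint trajectory is assigned to the nearest one. The sketch below uses 2-D hand positions and made-up template data purely for illustration.

```python
# Minimal dynamic-time-warping (DTW) gesture matcher. Trajectories are
# lists of (x, y) hand positions; all names and data are illustrative,
# not taken from the SDK.

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW cost between two 2-D trajectories."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = ((a[i - 1][0] - b[j - 1][0]) ** 2 +
                 (a[i - 1][1] - b[j - 1][1]) ** 2) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # skip a point in a
                                 cost[i][j - 1],      # skip a point in b
                                 cost[i - 1][j - 1])  # match the pair
    return cost[n][m]

def recognise(trajectory, templates):
    """Return the name of the recorded template closest to the trajectory."""
    return min(templates,
               key=lambda name: dtw_distance(trajectory, templates[name]))

# Two toy recorded templates: a horizontal 'swipe' and a vertical 'raise'.
templates = {
    "swipe": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "raise": [(0, 0), (0, 1), (0, 2), (0, 3)],
}
print(recognise([(0, 0), (1, 0), (2, 0)], templates))  # → swipe
```

Because DTW warps the time axis, a gesture performed faster or slower than the recorded template still matches, which is precisely why this family of techniques suits recorded-example gesture definitions.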