Activity Recognition

A working implementation of my real-time activity recognition system: 

Times at which I perform the various activities: Squats - 0:20, Pushups - 0:40, Bicep curls - 1:04, Tricep extensions - 1:20, Walking - 1:41, Rest - 2:00.

How does it work?
The setup involves two iPhones: an iPhone 4s worn on my forearm to capture motion, and an iPhone 6 (right corner of the video) that displays the predicted activity.

The iPhone 4s is the brains of the operation. It captures my motion through the accelerometer and gyroscope sensors, transforms this raw data into meaningful input vectors, and feeds them into a multi-layer neural network, which outputs a prediction. The prediction is then sent over Bluetooth to the iPhone 6, which displays it. This whole process repeats every second, producing a local and real-time* activity recognition system.
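To make the capture step concrete, here's a simplified Swift sketch using Core Motion. The 50 Hz sampling rate and the buffering scheme are illustrative rather than the app's exact values:

```swift
import CoreMotion

// Simplified capture loop: sample the accelerometer and gyroscope
// together via device motion. (Illustrative; the 50 Hz rate is an
// assumption, not necessarily what the app uses.)
let motionManager = CMMotionManager()
var sampleBuffer: [(acceleration: CMAcceleration, rotation: CMRotationRate)] = []

motionManager.deviceMotionUpdateInterval = 1.0 / 50.0
motionManager.startDeviceMotionUpdates(to: .main) { motion, _ in
    guard let motion = motion else { return }
    // Buffer the raw readings; once a second a window of these is
    // turned into an input vector for the network (see below).
    sampleBuffer.append((motion.userAcceleration, motion.rotationRate))
}
```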

*The recognition is not perfectly real-time. There is an inherent lag of ~3 to 4 seconds, because the system uses a 4-second window (with a sliding length of 1 second) to compute the input vectors for the neural network. The system therefore predicts the correct label approximately 3 to 4 seconds after an activity has started.
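Here's a sketch of that windowing logic. The mean and standard deviation features are stand-ins; the actual feature set isn't described here:

```swift
// 4-second window sliding by 1 second: at an assumed 50 Hz this is
// 200 samples per window, advancing 50 samples at a time.
let samplesPerSecond = 50
let windowSize = 4 * samplesPerSecond
let slide = 1 * samplesPerSecond

func mean(_ xs: [Double]) -> Double {
    xs.reduce(0, +) / Double(xs.count)
}

func standardDeviation(_ xs: [Double]) -> Double {
    let m = mean(xs)
    let variance = xs.map { ($0 - m) * ($0 - m) }.reduce(0, +) / Double(xs.count)
    return variance.squareRoot()
}

// Two illustrative features for a single axis; the real input vector
// combines features from all accelerometer and gyroscope axes.
func features(for window: [Double]) -> [Double] {
    [mean(window), standardDeviation(window)]
}

// Chop one axis of buffered samples into overlapping 4 s windows.
func windows(of signal: [Double]) -> [[Double]] {
    stride(from: 0, through: signal.count - windowSize, by: slide).map {
        Array(signal[$0 ..< $0 + windowSize])
    }
}
```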

What’s with the random predictions at times?
These random predictions have a pattern. They occur during the transitional periods between exercises. The neural network does not know about transitions, so it tries to fit an activity to the observed motion. As the motion during these periods is sporadic, the predictions jump from activity to activity.

The video shows the raw output of the neural network. In my project, I have addressed this problem with a simple accumulator strategy: the raw prediction of the neural network is fed into an accumulator, which requires a threshold to be met (i.e. a streak of x consecutive identical predictions) before switching its output to a new activity.
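The accumulator itself is only a few lines. A sketch (the threshold value below is arbitrary, not the one I use):

```swift
// Streak-based accumulator: the reported activity only changes after
// `threshold` consecutive identical raw predictions.
struct PredictionAccumulator {
    let threshold: Int
    private var candidate: String?
    private var streak = 0
    private(set) var currentActivity: String?

    init(threshold: Int) {
        self.threshold = threshold
    }

    mutating func add(_ prediction: String) -> String? {
        if prediction == candidate {
            streak += 1
        } else {
            candidate = prediction
            streak = 1
        }
        // Promote the candidate once its streak reaches the threshold.
        if streak >= threshold {
            currentActivity = prediction
        }
        return currentActivity
    }
}

var accumulator = PredictionAccumulator(threshold: 3)  // arbitrary threshold
// Feed accumulator.add(rawPrediction) once per second; display the result.
```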

The app

The iOS application has three major functions:

  1. Viewer: This view looks for devices running the app in 'Tracker' mode so it can connect to them and display the predictions it receives.

  2. Tracker: Once the user hits 'Start Tracking', this view performs the activity-recognition work and sends its output to devices running in 'Viewer' mode (see the sketch after this list).

  3. Trainer: Allows the user to perform additional training on any activity. Once the user's motion has been captured, the neural network learns from this data to better adapt to the user's form.
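As an illustration of the Tracker-to-Viewer link, here is a simplified sketch of the send side using Multipeer Connectivity, one framework that can transport over Bluetooth (session setup and delegate plumbing omitted; the app's actual transport code may differ):

```swift
import MultipeerConnectivity

// The three app modes described above.
enum AppMode {
    case viewer, tracker, trainer
}

// Sketch: a Tracker pushes its latest prediction to connected Viewers.
// `session` is assumed to be an MCSession already connected to nearby
// peers running in Viewer mode.
func broadcast(prediction: String, over session: MCSession) {
    guard !session.connectedPeers.isEmpty,
          let data = prediction.data(using: .utf8) else { return }
    // Reliable delivery keeps the predictions in order on the Viewer.
    try? session.send(data, toPeers: session.connectedPeers, with: .reliable)
}
```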

Rationale for using a neural network
Neural networks provide a level of malleability that is very important for this project. The multi-layer network is implemented using online learning, which enables personalisation: the network ships with a base training set (my training data), but the user can build on it by performing additional training. With this extra training, the network adapts to fit the user's form.
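To illustrate what online learning buys here (a toy single-layer example, not my actual multi-layer network): each new training example triggers one small gradient step, so the model shifts toward the user's form without retraining on the full data set.

```swift
import Foundation

// Toy online learner: a single softmax layer updated with one
// stochastic-gradient step per example. The real network is
// multi-layer, but the idea is the same: new user data nudges the
// existing weights instead of forcing a rebuild.
struct OnlineSoftmax {
    var weights: [[Double]]   // [activity][feature]
    var biases: [Double]      // one bias per activity
    let learningRate = 0.01   // illustrative value

    func probabilities(for input: [Double]) -> [Double] {
        let logits = weights.indices.map { c in
            zip(weights[c], input).map(*).reduce(biases[c], +)
        }
        let maxLogit = logits.max() ?? 0
        let exps = logits.map { exp($0 - maxLogit) }  // numerically stable softmax
        let sum = exps.reduce(0, +)
        return exps.map { $0 / sum }
    }

    // One gradient step on a single (input, label) pair.
    mutating func learn(input: [Double], label: Int) {
        let probs = probabilities(for: input)
        for c in weights.indices {
            let grad = probs[c] - (c == label ? 1.0 : 0.0)
            for f in input.indices {
                weights[c][f] -= learningRate * grad * input[f]
            }
            biases[c] -= learningRate * grad
        }
    }
}
```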

This level of malleability is not available in decision trees. A decision tree would have to be rebuilt from scratch each time to accommodate additional training from the user, which becomes computationally expensive as the data set grows.

I also experimented with a naive Bayes classifier; although it is quicker to build and has no optimisation step, the neural network achieved higher accuracy in almost all of the test scenarios.

Test results
The neural network has high accuracy on my activity data: it achieves ~93% accuracy on a test circuit containing five exercises, with transitional periods removed. Such high accuracy is to be expected, as the network is trained on my data. To test it on another person, I recruited a friend to complete the same circuit. The system achieved 78% accuracy without additional training, and 92% with 30 seconds of additional training per activity. These results look promising, but I'm going to run more tests with additional participants over the next week to see whether they are reproducible.

Some random/highly specific questions you may have:

Can it count repetitions for gym exercises?
Not yet. I would love to add that functionality, but I haven't had time to tackle the problem.

Wouldn’t a better approach be to initially train the neural network on more than just one person?
That is a great point! In fact, Microsoft's research arm published a paper last year doing exactly that. Although they achieved great results, their initial training cohort consisted of 94 participants! As a one-man team, I can't possibly replicate that, which is why I created a system that can adapt, eliminating the need for Microsoft-level resources.