With the widespread popularity of smartphones and the substantial increase in their functionality, much effort has gone into using their capabilities to help predict human activities. Beyond basic telephony, the variety of sensors installed in smartphones can be a powerful tool for Activity Recognition. In this project, we employ smartphone sensor data for human activity recognition, with potential applications in the healthcare industry.
The aim of Activity Recognition is to identify the actions carried out by a person, given a set of observations of the person and the surrounding environment. Data retrieved from inertial sensors embedded in the smartphone, such as accelerometers, can be exploited for this task. By processing these data with different supervised machine learning algorithms, we are able to classify physical activities such as standing, walking and laying.
This report is structured as follows. Section 2 describes the methodologies we employed; in particular, we examine the theory and logic behind SVM, ELM and ELM's constrained variants. Section 3 presents two ensemble implementations of ELM, namely Random Forest and multiclass AdaBoost. Section 4 gives a brief introduction to a nonlinear dimensionality reduction technique called t-SNE. Section 5 begins with model selection by cross validation or out-of-bag errors and then compares the testing accuracies of the different models. Finally, we give our conclusions and describe our approach to the data mining competition in Section 6.
In this section, we first briefly introduce the theoretical background of SVM and ELM and their respective optimization problems. Then we extend the basic ELM model to several variants which impose constraints on the parameters of the first layer. Further comparisons among SVM, ELM and constrained ELM and their performance on the testing data set can be found in Section 5.
In 1995 Cortes and Vapnik [1] proposed the Support Vector Machine (SVM), which maps the data from the input space to a feature space through a nonlinear mapping function $\phi(\cdot)$. In this feature space, constrained optimization is used to find the "optimal" separating hyperplane that maximizes the margin between the two classes. Given a set of training data points $\{(\mathbf{x}_i, t_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^{d}$ and $t_i \in \{-1, +1\}$, each training point is mapped from the input space to the feature space via the nonlinear function $\phi(\mathbf{x}_i)$. The distance between the two bounding hyperplanes in the feature space is $2/\|\mathbf{w}\|$. We maximize this distance by solving
$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad t_i\left(\mathbf{w}\cdot\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,$$
where $\xi_i$ is a slack variable introduced so that data points lying within the margin still satisfy the first constraint, and $C$ is a non-negative user-specified penalty term which controls the trade-off between the training error and the margin width. By the Karush-Kuhn-Tucker (KKT) theorem, the above optimization is equivalent to the dual problem
$$\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j t_i t_j K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{s.t.}\quad \sum_{i=1}^{N}\alpha_i t_i = 0,\quad 0 \le \alpha_i \le C,$$
provided that the kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$ satisfies Mercer's condition. The decision function of the SVM is then
$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_{s=1}^{N_s}\alpha_s t_s K(\mathbf{x}, \mathbf{x}_s) + b\right),$$
where $N_s$ is the number of support vectors and the support vectors $\{\mathbf{x}_s\}$ form a sparse representation of the original data matrix. The SVM can be considered a type of single-hidden-layer feedforward network (SLFN), as illustrated in Figure 1↓.
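As a concrete illustration of the soft-margin kernel SVM described above (not the exact pipeline used in this report), the sketch below fits a multiclass RBF-kernel SVM with scikit-learn, which internally applies the OVO decomposition discussed in the next subsection; the feature matrix X and activity labels y are random placeholders for the smartphone sensor data.

```python
# Minimal sketch: soft-margin RBF-kernel SVM for multiclass activity labels.
# X (n_samples x n_features) and y are hypothetical stand-ins for the sensor data.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))            # placeholder features
y = rng.integers(0, 6, size=600)          # placeholder labels for 6 activities

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# C is the penalty term trading off margin width against training error;
# scikit-learn's SVC uses one-versus-one internally for multiclass problems.
clf = SVC(C=1.0, kernel="rbf", gamma="scale").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```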
Guang-Bin Huang and his colleagues introduced the extreme learning machine (ELM) as an SLFN with fast learning speed and good generalization ability (ELM can approximate any target function [3]). Unlike a traditional SLFN, the parameters of the hidden layer need not be tuned; they are fixed once randomly generated, e.g. from a uniform distribution. The input data $\mathbf{x}$ is mapped to the feature space by $\mathbf{h}(\mathbf{x}) = \left[G(\mathbf{a}_1, b_1, \mathbf{x}), \ldots, G(\mathbf{a}_L, b_L, \mathbf{x})\right]$, where $G(\cdot)$ is the activation function of the hidden nodes, as illustrated in Figure 1↑. SVM solves binary classification and is extended to $m$-class classification by One versus One (OVO) or One versus All (OVA) schemes: OVO requires $m(m-1)/2$ SVMs and OVA requires $m$. ELM, however, solves the problem in one model by having $m$ output nodes. For class $j$ the expected output vector is $\mathbf{t}_i = [0, \ldots, 0, 1, 0, \ldots, 0]^{T}$, where only the $j$th element is one while the others are zero. ELM minimizes the training error as well as the norm of the output weights $\boldsymbol{\beta}$:
$$\min_{\boldsymbol{\beta}}\ \frac{1}{2}\|\boldsymbol{\beta}\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\|\boldsymbol{\xi}_i\|^{2} \quad \text{s.t.}\quad \mathbf{h}(\mathbf{x}_i)\boldsymbol{\beta} = \mathbf{t}_i^{T} - \boldsymbol{\xi}_i^{T},\quad i = 1, \ldots, N,$$
where $\mathbf{H} = [\mathbf{h}(\mathbf{x}_1)^{T}, \ldots, \mathbf{h}(\mathbf{x}_N)^{T}]^{T}$ is the hidden-layer output matrix, $\boldsymbol{\beta}$ is the matrix of output weights between the hidden-layer nodes and the output nodes, and $\mathbf{T} = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^{T}$ is the matrix of expected outputs coded as above. Based on the KKT theorem, the dual form is
$$L = \frac{1}{2}\|\boldsymbol{\beta}\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\|\boldsymbol{\xi}_i\|^{2} - \sum_{i=1}^{N}\sum_{j=1}^{m}\alpha_{i,j}\left(\mathbf{h}(\mathbf{x}_i)\boldsymbol{\beta}_j - t_{i,j} + \xi_{i,j}\right).$$
Setting the derivatives of $L$ with respect to $\boldsymbol{\beta}$, $\boldsymbol{\xi}_i$ and $\boldsymbol{\alpha}_i$ to zero, we have
$$\boldsymbol{\beta} = \mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T}.$$
The predicted class label for a sample $\mathbf{x}$ is
$$\mathrm{label}(\mathbf{x}) = \arg\max_{j \in \{1, \ldots, m\}} f_j(\mathbf{x}), \qquad \mathbf{f}(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta}.$$
If the kernel function satisfies Mercer's condition, we can also define the kernel matrix $\boldsymbol{\Omega}_{\mathrm{ELM}}$ with elements $\Omega_{i,j} = K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{h}(\mathbf{x}_i)\cdot\mathbf{h}(\mathbf{x}_j)$, so that the output function can be reformulated as
$$\mathbf{f}(\mathbf{x}) = \left[K(\mathbf{x}, \mathbf{x}_1), \ldots, K(\mathbf{x}, \mathbf{x}_N)\right]\left(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{\mathrm{ELM}}\right)^{-1}\mathbf{T},$$
and the prediction rule for the kernel form is the same $\arg\max$ as above.
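The closed-form solution above translates almost directly into code. The following is a minimal sketch of a regularized multi-output ELM with a sigmoid hidden layer; the hidden-layer size L, the penalty C, the placeholder data and the helper names elm_fit/elm_predict are illustrative assumptions, not the competition settings. The code uses the algebraically equivalent L x L form of the solution, (I/C + HᵀH)⁻¹HᵀT, which is cheaper when the number of hidden nodes is smaller than the number of samples.

```python
# Minimal ELM sketch: random sigmoid hidden layer + regularized least squares.
import numpy as np

def elm_fit(X, y, L=200, C=1.0, rng=None):
    """Train an ELM; targets are one-hot encoded internally."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    classes = np.unique(y)
    T = (y[:, None] == classes[None, :]).astype(float)   # one-hot targets
    W = rng.uniform(-1, 1, size=(d, L))                   # random input weights
    b = rng.uniform(-1, 1, size=L)                        # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                # hidden-layer output matrix
    # beta = (I/C + H^T H)^{-1} H^T T, equivalent to the N x N form given above
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    return W, b, beta, classes

def elm_predict(X, W, b, beta, classes):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return classes[np.argmax(H @ beta, axis=1)]           # argmax over output nodes

# Toy usage with random placeholder data.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 6, size=500)
W, b, beta, classes = elm_fit(X, y, L=200, C=10.0, rng=1)
print("train accuracy:", np.mean(elm_predict(X, W, b, beta, classes) == y))
```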
In the previous section, we introduced ELM, which maps the original data into the feature space by some activation function (e.g. the sigmoid function) whose parameters are randomly generated, and then essentially solves a linear system in the second layer. It is interesting to investigate whether prediction accuracy on the assessment data set increases if we add constraints to these random parameters. Constraints on the random weights are an actively researched topic in the ELM literature [4, 6]. Since constrained ELM is claimed to be robust to outliers, and due to time and space limits, we only present several easily implemented methods here.
The constrained difference (CDELM) method utilizes the prior "distribution" of the two classes. The aim is to constrain the weights so that the two classes are mapped onto positive and negative hyperplanes. Observations $\mathbf{x}_{c_1}$ and $\mathbf{x}_{c_2}$ from two different classes are mapped to the $+1$ and $-1$ planes by
$$\mathbf{w}\cdot\mathbf{x}_{c_1} + b = 1, \qquad \mathbf{w}\cdot\mathbf{x}_{c_2} + b = -1.$$
Taking $\mathbf{w}$ along the difference direction and solving for $\mathbf{w}$ and $b$, we obtain
$$\mathbf{w} = \frac{2\left(\mathbf{x}_{c_1} - \mathbf{x}_{c_2}\right)}{\|\mathbf{x}_{c_1} - \mathbf{x}_{c_2}\|^{2}}, \qquad b = -\frac{\left(\mathbf{x}_{c_1} + \mathbf{x}_{c_2}\right)\cdot\left(\mathbf{x}_{c_1} - \mathbf{x}_{c_2}\right)}{\|\mathbf{x}_{c_1} - \mathbf{x}_{c_2}\|^{2}},$$
where $\mathbf{x}_{c_1}$ and $\mathbf{x}_{c_2}$ are observations sampled from the two different classes.
We can also generate a weight by randomly sampling a row from the data matrix and normalizing it (SELM), or by sampling two data points from different classes and normalizing their sum (CSELM). We can even relax the different-class constraint and directly normalize the sum of two random samples (RSELM). There is little statistical support for these constraints, but for the competition we hoped they would alleviate the effect of possible outliers. An extensive comparison of all the constrained models is given in Section 5. For a brief reference, the weights and biases are defined in Table 1↓.
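A sketch of how such constrained random weights could be drawn is given below. The scheme names follow the text and Table 1↓, but the helper sample_weight, the bias choices for the sampling-based schemes, and the placeholder data are illustrative assumptions rather than the exact competition code.

```python
# Minimal sketch of the constrained weight-generation schemes described above.
# X is the (n_samples x n_features) data matrix, y the class labels; both are assumed inputs.
import numpy as np

def sample_weight(X, y, scheme="CDELM", rng=None):
    rng = np.random.default_rng(rng)
    if scheme == "SELM":                       # one random sample, normalized
        w = X[rng.integers(len(X))]
        return w / np.linalg.norm(w), rng.uniform(-1, 1)   # bias choice is illustrative
    i = rng.integers(len(X))
    if scheme == "RSELM":                      # two random samples, class constraint relaxed
        j = rng.integers(len(X))
    else:                                      # CDELM / CSELM: enforce different classes
        j = rng.choice(np.flatnonzero(y != y[i]))
    if scheme == "CDELM":                      # difference mapped to the +1 / -1 planes
        d = X[i] - X[j]
        w = 2.0 * d / (d @ d)
        b = -((X[i] + X[j]) @ d) / (d @ d)
        return w, b
    s = X[i] + X[j]                            # CSELM / RSELM: normalized sum
    return s / np.linalg.norm(s), rng.uniform(-1, 1)

# Example: draw 5 CDELM weight/bias pairs from toy data.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10)); y = rng.integers(0, 2, size=100)
weights = [sample_weight(X, y, "CDELM", rng=k) for k in range(5)]
```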
Since ensembling is very useful for dealing with overfitting and can potentially produce a better model for the testing data set, we present two ensemble implementations of ELM here. Our Random Forest uses ELM as the weak learner. Following Freund et al. [2], we extend the original AdaBoost to the multiclass setting. The experimental results for this section are shown in Section 5.
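To illustrate the bagging side, the sketch below wraps a compact ELM trainer in a simple bootstrap ensemble with majority voting. It is a simplified stand-in for the Random Forest ELM used here (no feature subsampling or out-of-bag bookkeeping), the AdaBoost extension is omitted, and all names and data are placeholders.

```python
# Minimal sketch: bagging ELMs with majority voting (a simplified Random Forest ELM).
import numpy as np

def elm_train(X, T, L, C, rng):
    W = rng.uniform(-1, 1, size=(X.shape[1], L)); b = rng.uniform(-1, 1, size=L)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    return W, b, beta

def elm_scores(X, model):
    W, b, beta = model
    return 1.0 / (1.0 + np.exp(-(X @ W + b))) @ beta

def bagged_elm(X, y, n_estimators=25, L=100, C=1.0, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    T = (y[:, None] == classes[None, :]).astype(float)     # one-hot targets
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))          # bootstrap sample
        models.append(elm_train(X[idx], T[idx], L, C, rng))
    return classes, models

def bagged_predict(X, classes, models):
    votes = np.zeros((len(X), len(classes)))
    for m in models:                                        # each ELM casts one vote per sample
        votes[np.arange(len(X)), np.argmax(elm_scores(X, m), axis=1)] += 1
    return classes[np.argmax(votes, axis=1)]

# Toy usage with placeholder data.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 15)); y = rng.integers(0, 6, size=400)
classes, models = bagged_elm(X, y)
print("train accuracy:", np.mean(bagged_predict(X, classes, models) == y))
```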
t-SNE stands for t-distributed Stochastic Neighbor Embedding [5]. Its basic idea is to use conditional probabilities to define a measure of similarity between data points: for any pair of points $\mathbf{x}_i$ and $\mathbf{x}_j$, define the conditional probability that $\mathbf{x}_i$ would pick $\mathbf{x}_j$ as its neighbor,
$$p_{j|i} = \frac{\exp\left(-\|\mathbf{x}_i - \mathbf{x}_j\|^{2}/2\sigma_i^{2}\right)}{\sum_{k \ne i}\exp\left(-\|\mathbf{x}_i - \mathbf{x}_k\|^{2}/2\sigma_i^{2}\right)}.$$
For the mapped low-dimensional points $\mathbf{y}_i$ and $\mathbf{y}_j$, the same definition gives
$$q_{j|i} = \frac{\exp\left(-\|\mathbf{y}_i - \mathbf{y}_j\|^{2}\right)}{\sum_{k \ne i}\exp\left(-\|\mathbf{y}_i - \mathbf{y}_k\|^{2}\right)}.$$
Our aim is to find a mapping such that $q_{j|i}$ matches $p_{j|i}$ as closely as possible. Naturally, cross-entropy, or the Kullback-Leibler divergence, is used to model the loss between the two distributions:
$$C = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log\frac{p_{j|i}}{q_{j|i}}.$$
Since the Gaussian in $p_{j|i}$ induces a distribution whose entropy increases as $\sigma_i$ increases, a user-specified parameter, the perplexity, is introduced to constrain the information content:
$$\mathrm{Perp}(P_i) = 2^{H(P_i)},$$
where $H(P_i)$ is the Shannon entropy of $P_i$, defined as $H(P_i) = -\sum_j p_{j|i}\log_2 p_{j|i}$. Finally, gradient descent is used to optimize the cost function: we take the derivative of $C$ with respect to each mapped point,
$$\frac{\partial C}{\partial \mathbf{y}_i} = 2\sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)\left(\mathbf{y}_i - \mathbf{y}_j\right),$$
and then update each individual mapped point $\mathbf{y}_i$ by
$$\mathbf{y}_i^{(t)} = \mathbf{y}_i^{(t-1)} + \eta\,\frac{\partial C}{\partial \mathbf{y}_i} + \alpha(t)\left(\mathbf{y}_i^{(t-1)} - \mathbf{y}_i^{(t-2)}\right),$$
where the momentum term $\alpha(t)\left(\mathbf{y}_i^{(t-1)} - \mathbf{y}_i^{(t-2)}\right)$ is added to expedite the descent. The above defines Stochastic Neighbor Embedding (SNE), as proposed by Hinton and Roweis. Van der Maaten and Hinton then replaced the conditional distribution with the symmetric joint distribution $p_{ij} = (p_{j|i} + p_{i|j})/2n$ and used a heavy-tailed Student t distribution with one degree of freedom in the low-dimensional space:
$$q_{ij} = \frac{\left(1 + \|\mathbf{y}_i - \mathbf{y}_j\|^{2}\right)^{-1}}{\sum_{k \ne l}\left(1 + \|\mathbf{y}_k - \mathbf{y}_l\|^{2}\right)^{-1}}.$$
These changes not only save computational time (the exponential term in the Gaussian similarity is more expensive to evaluate than the inverse operator in the t distribution) but also alleviate the crowding problem faced by SNE (the area available in the low-dimensional map to accommodate moderately similar points is not large enough). For more technical detail about t-SNE, please refer to the original paper [5]. Here we present a comparison of the visualizations produced by PCA (or classical MDS) and t-SNE in Figure 2↓.
Figure 2 Visualization results from (a) PCA and (b) t-SNE.
We can see that t-SNE does a better job of visualization than PCA. Groups 4 and 5 overlap so heavily in the t-SNE map that we later confirmed some data points in these two groups are hard to separate with either ELM or SVM.
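A minimal way to reproduce this kind of side-by-side comparison, assuming the sensor features sit in a matrix X with activity labels y, is with scikit-learn's PCA and TSNE; the perplexity value and the random placeholder data below are illustrative.

```python
# Minimal sketch: side-by-side 2-D embeddings from PCA and t-SNE.
# X and y are placeholder stand-ins for the sensor features and activity labels.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 6, size=500)

Z_pca = PCA(n_components=2).fit_transform(X)
Z_tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, Z, title in zip(axes, [Z_pca, Z_tsne], ["PCA", "t-SNE"]):
    ax.scatter(Z[:, 0], Z[:, 1], c=y, s=8, cmap="tab10")
    ax.set_title(title)
plt.tight_layout()
plt.show()
```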
It is important to tune the parameters in order to find an optimal model with good generalization ability and performance on the testing data set. We used the minimum cross-validation error to select the optimal parameters of ELM and SVM, as shown in Figures 3↓ and 4↓. The basic process of cross validation is to divide the available data into a model-selection set (80%) and a model-assessment set (20%), further divide the selection set into $k$ folds, and then, in turn, build a model on $k-1$ folds and predict on the held-out fold.
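For example, the cost parameter of a kernel SVM could be selected by grid search over a 5-fold split of the 80% selection set, as in the sketch below; the grid, the fold count and the placeholder data are illustrative assumptions.

```python
# Minimal sketch: 80/20 selection-assessment split, then 5-fold CV over the cost C.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))            # placeholder features
y = rng.integers(0, 6, size=600)          # placeholder activity labels

X_sel, X_ass, y_sel, y_ass = train_test_split(X, y, test_size=0.2, random_state=0)

grid = GridSearchCV(SVC(kernel="rbf", gamma="scale"),
                    param_grid={"C": [0.1, 1, 10, 100]}, cv=5)
grid.fit(X_sel, y_sel)
print("best C:", grid.best_params_["C"],
      "assessment accuracy:", grid.best_estimator_.score(X_ass, y_ass))
```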
For Random Forest or bagging, the most convenient way to tune the number of features sampled is to use the out-of-bag error, i.e. the prediction error computed on the observations left out of each bootstrap sample. Based on Figures 3↓ to 5↓, the parameters in Table 2↓ are used for our model comparison and assessment in Section 5.2.
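For a standard tree-based random forest (rather than the ELM-based one used in this report), the analogous tuning of mtry can be read directly off the out-of-bag score in scikit-learn, where mtry corresponds to max_features; the candidate values and placeholder data below are illustrative.

```python
# Minimal sketch: choosing mtry (max_features) by out-of-bag error.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))            # placeholder features
y = rng.integers(0, 6, size=600)          # placeholder activity labels

for mtry in [2, 4, 8, 16]:
    rf = RandomForestClassifier(n_estimators=300, max_features=mtry,
                                oob_score=True, random_state=0).fit(X, y)
    print(f"mtry={mtry:2d}  OOB error={1 - rf.oob_score_:.3f}")
```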
Linear models (tuned parameter: Cost): SVM, ELM, CDELM, CSELM, MIXELM, RELM, RSELM
Kernel models (tuned parameter: Cost): SVM, ELM
Random Forest models (tuned parameter: mtry): SVM, ELM
Table 2 Parameters selected using cross validation and out-of-bag errors.
Since we have just received the testing data set labels from Dr. Allen, it is interesting to see how the different base learners perform. We present the testing accuracies in Table 3↓, where each model's parameters are those selected in Section 5.1 and shown in Table 2↑. However, we do not have results for the SVM-based Random Forest and AdaBoost due to time limits. 2000 hidden nodes are used in each linear model, and the bag number is set to 300 for every Random Forest and AdaBoost method.
In this report we compared SVM, ELM and ELM's constrained variants. ELM outperforms SVM in three ways. First, ELM is extremely fast because its time complexity is similar to that of least squares. Second, ELM achieves better model-assessment accuracy, as indicated by our competition results. Finally, ELM is very easy to implement since it provides an analytical prediction function for new observations. However, ELM's theoretical foundation is even weaker than SVM's. Although some researchers have argued that ELM is a deep neural network with infinitely many layers and that it is a biologically inspired method, deeper research is needed to provide a solid theoretical basis for the model's interpretability.
For our final entry, we first iteratively added the prediction results returned by kernel ELM. Then we divided our predictions into Group 1, 2, 3 and Group 4, 5, 6 and built two separate kernel ELMs. For the hard-to-separate classes 4 and 5, we tried Random Forest ELM and AdaBoost ELM, but we gave up on both methods because groups 4 and 5 overlap too heavily for either classifier to distinguish them reliably. Finally, after trial and error, we identified subject No. 13 as an abnormal walker in our training set and then used unsupervised methods to recover the labels of subject No. 13, as shown in Figure 6↓.
We would like to thank the glass and water servers at Rice University for providing fast and stable computing power throughout this data mining competition.
[2] Yoav Freund, Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory: 23-37, 1995.
[3] Guang-Bin Huang, Lei Chen, Chee-Kheong Siew. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Transactions on Neural Networks, 17(4): 879-892, 2006.