Sound Recognition with Qeexo AutoML

Zhongyu Ouyang and Dr. Geoffrey Newman 21 September 2020


Sound Recognition is a technology based on traditional pattern recognition theories and signal analysis methods which is widely used in speech recognition, music recognition and many other research areas such as acoustical oceanography [1]. Generally, microphones are regarded as sufficient sensing modalities as input to machine learning methods within these fields. Microphones are capable of capturing information necessary for the variety of classification tasks that can be performed on lightweight devices. With this type of sensor, Qeexo AutoML provides a diverse feature stack, taking advantage of the physical properties of microphone data to extract information relevant to such classification tasks. This blog will show you how to perform sound recognition with Qeexo AutoML and explain some of the basics concepts of our feature stack.

AutoML Tutorial

Qeexo AutoML offers a general use user-friendly interface for engineers who wants to perform sound recognition, or any other classification task on embedded devices. The processes discussed in this blog are not specific to sound recognition, but are specifically applicable to it. To get started, navigate to the training page and select (or upload) the labeled training data that you want to use to build models for your embedded device. In the Sensor Selection page, you can select the desired sensor types, (in our sound recognition example we utilize the microphone sensor), to choose the collected data, as shown in Figure 1.

Figure 1: Sensor Selection Page

You are also provided with the option of automatic sensor and feature group selection, if you want to use additional sensor modalities or experiment with feature subgroups. If this is selected, Qeexo AutoML will automatically choose the sensor and feature groups that make the classes most distinct. In the Inference Settings page, you can manually set up the instance length and the classification interval, or let Qeexo AutoML determines them by selecting Determine Automatically, as shown in Figure 2.

Figure 2: Inference Settings Page

In the Model Settings page, you can pick the algorithm(s), choose whether to generate learning curve and/or perform hyperparameter tuning and click Start Training button to start. After the training is finished, a binary file will be generated and can be flashed to the device by clicking the Push to Hardware button. Once the process is finished, you can perform live tests on the model that was built, as shown in Figure 3.

Figure 3: Model Details Page

While the process is by design very straightforward, the details of some of the choices may appear ambiguous.
Other blog posts go into some detail on different aspects of the pipeline, but we will focus on some of the feature
choices applicable to sound recognition.

Sound Recognition Highlighted Features

Fast Fourier Transform (FFT)

Signals in the time domain are difficult for humans and computers alike to distinguish among similar sound sources. One of the most popular ways to transform raw sound data is the Fast Fourier Transform (FFT). Due to the constraints of embedded devices, the FFT is an efficient frequency decomposition technique. The process is described in Figure 4.

Figure 4: FFT process

For different classes, the signals differ in their magnitudes for a given frequency bin. E.g., in Figure 5, sounds generated with different instruments have different distributions of the magnitudes among the frequencies 0-800 Hz; even with differences present up to 2000 Hz.

Figure 5: FFT Features for Different Classes

The Qeexo AutoML training methods will take advantage of the increased class separability in this range to train the model through model training. Qeexo AutoML doesn’t just use all of the FFT coefficients as input in training the model, but actually aggregate the coefficients to create sophisticated features. The specific groupings can be hand-picked during the model selection process to accommodate implementation constraints. To select the features groups, simply check the box(es) in the manual feature selection page as shown in Figure 6.

Figure 6: Manual Feature Selection Page

Mel Frequency Cepstral Coefficients (MFCC)

Mel Frequency Cepstral Coefficients (MFCC) is also an important technique for sound recognition. Humans react differently to distinct ranges of frequencies. As a species, we are more capable of telling the difference in frequencies between a 50Hz and a 100 Hz signal, than that between 10050Hz and 10100 Hz. In other words, we are really bad at distinguishing high pitched sounds. Therefore, in situations where you want to replicate a task performed by humans, such as voice separation, the difference when the frequency is low is the most important. The value of the signal properties decreases with increasing frequency. Mel scale comes into place here, by assigning more importance to the low frequency content and less to the high frequency content. The formula for converting from frequency to Mel score is:

    \begin{align*} M(f) = 1125 * ln(1+f/700)\\ \end{align*}

We build a filter bank containing many triangular filters and apply them to our FFT features to rescale the signals again and convert them to the corresponding Mel scales. In the Mel spectrograms shown in Figure 7, we can see that different classes’ Mel spectrograms appear to have many differences, making them ideal inputs for training a classifier.

Figure 7: Mel Spectrograms for Different Classes

Qeexo AutoML also provides features generated from the coefficients of MFCC. The feature groups can also be selected in the manual selection page shown in Figure 3. If desired, you can visualize the selected features through a UMAP plot by clicking the Visualize button shown in the Sensor Selection page and Feature Group Selection page.

Based on this discussion, it should be apparent that MFCC features will work well for tasks involving human speech. Depending on the task, it may be disadvantageous to include these MFCC features if it does not share similarities with human hearing. Qeexo AutoML performs automatic feature reduction, however, when automatic selection is enabled, so this does not need to be an active concern when training models. If the MFCC features are not highly separable for the task, assuming sufficient data is provided, they will be dropped from the final model during this process.


Qeexo AutoML not only provides model building functionality, but also present the details of the trained models. We provide evaluation metrics like confusion matrix, by-fold cross validation, ROC curve, and even support downloading the trained model to test it elsewhere. As mentioned earlier, we provide support for, but do not limit to microphone sensor usage for sound recognition. You are free to select any other provided sensors such as accelerometer and gyroscope. If these additional sensors don’t improve model performance, they won’t be included in the final device library, through the automated sensor selection process.


[1] Wikipedia: Sound Recognition,


Inference Settings: Instance Length and Classification Interval

Xun (Jared) Liu, Dr. Rajen Bhatt, and Dr. Geoffrey Newman 09 September 2020

Qeexo AutoML enables machine learning application developers to customize inference settings based on their use-case. These parameters are critical for achieving the best live performance of models on the embedded target. In this article, we will discuss the two parameters associated with the inference settings; instance length and classification interval.

Figure 1. Inference settings with microphone sensor (16000Hz) on Arduino

Instance Length

Instance length is the time period over which to make one prediction using raw sensor data. It is measured in milliseconds. According to the selected sensors and their ODRs, this time is then converted to the number of raw sensor data samples. These samples are used for computing features for training of ML models and also during on-device inference. If only one sensor is considered for the application, instance length is converted from milliseconds to number of samples using that sensor’s corresponding ODR. If there are multiple sensors with different ODRs, however, this conversion takes into consideration the sensor with the highest ODR. For other sensors, the number of samples is determined proportionally. Below are some examples for the Arduino sensor board with instance length of 500 milliseconds (0.5 seconds).

Setting 1: Microphone with ODR of 16000Hz.

Setting 2: Accelerometer and Gyroscope with 952Hz and microphone with 16000Hz.

For microphone,

For accelerometer and gyroscope,

How to Determine the Instance Length

Long instance length corresponds to a larger number of samples for featurization. According to Fourier Transform basic principles, more data points could yield finer frequency resolution, which captures an increased quantity of information from the signals. Therefore, it produces a greater number of features for the ML model training.

However, given the total time length, a long instance length would reduce the training dataset size. For example, if a signal of length L seconds is given and we divide that into segments of T seconds each, we get more segments if T is smaller and fewer if T is larger. For on-device live testing, larger T also implies more data needs to be collected at once to form a single prediction. Due to memory constraints of embedded devices, there will be limitations on the maximum instance length. Too small of an instance length can sometimes result in numerical instability of signal processing algorithms and may not capture sufficient discriminative information from the signals. For these reasons, AutoML restricts the minimum signal length to at least 64 samples.

Consider the following example for the microphone sensor (16000Hz) on Arduino. The instance length supported is at minimum 64 samples and at most 12000 samples. In milliseconds, this represents a range from 4 milliseconds to 750 milliseconds, as calculated here:

If multiple sensors (accelerometer & gyroscope; 952Hz ODR) are chosen, the range then becomes 4 to 1075 milliseconds.

Selecting the Best Instance Length

Qeexo AutoML supports automatically determining the instance length or setting it manually. The “Determine Automatically” option takes the minimum and maximum permissible values of instance length and finds the optimal value within this range. The optimization process tries to maximize the classification performance. It should be kept in mind for efficient model training that the optimization process takes longer to train models than manual selection.

Manual selection constrains the mininum and maximum permissible values. Any value within this range can be chosen for building the models. One way to estimate an instance length manually is visualizing the signal. As a general guideline, choose an instance length that is neither too short to miss part of the signal, nor too long that it could include unnecessary noise over multiple instances.

Instance length is a common parameter across all of the models, i.e., an instance length determined automatically or manually is applicable across all of the models.

Classification Interval (CI)

Classification interval refers to the time interval in milliseconds between any two classifications when live streaming sensor signals as illustrated in Fig. 3. It is a user defined parameter and accepts a value between 100 milliseconds (10 classifications in 1 second) and 3600 seconds (1 classification every 1 hour). Classification interval is not optimized even when selecting the “Determine Automatically” option.

Shorter intervals make predictions more frequent, but consume more power, while longer intervals save power, but can miss quick-burst live-streaming events when they occur between two consecutive classifications.

Figure 2. Instance length and classification interval

The detailed description of the Classification Interval is in this blog post.


Classification Interval for Qeexo AutoML Inference Settings

Sidharth Gulati and Dr. William Levine 12 August 2020

Inference settings contain two important parameters; Instance length and Classification interval. In this blog, we will explain the Classification Interval and in conjunction with raw sensor signals, ODR, Instance length, latency, and performance of the model on the embedded target.

Classification Interval is the step–size for each on-device classification, i.e., live testing. This interval determines “how often” we do on-device classification as shown in the plot below. For example, if Classification Interval is set to 200 milliseconds, Qeexo AutoML will produce a classifier that classifies incoming data at a rate of 5 Hz (5 times / second).

In the plot below, Instance Length (in milliseconds) determines how many milliseconds of sensor data are taken into account for each classification. Depending on the maximum sensor ODR selected for the use case, instance length in milliseconds gets converted into the number of raw sensor data samples. For example, 250 milliseconds of instance length is essentially 238 raw sensor data samples if sensor ODR is 952 Hz.

Please note that the true classification interval can never be less than the classification latency (the amount of time needed to calculate a single classification result). So, if the requested classification interval is less than the classifier latency, the true classification interval will necessarily be larger than the requested one; as the next classification will not begin until the current one is finished. 

There are 3 different relational cases between Classification Interval and Instance length which are described below.

Case 1: Classification Interval < Instance Length

Below case shows on-device classification with Classification Interval < Instance Length. This will result in overlapping of instances, i.e., some “overlap” of data between 2 classifications.

This choice of parameters may be appropriate for detecting short-lived transient events.

Case 2: Classification Length = Instance Length

Below case shows on-device classification with Classification Interval = Instance Length. This will result in no “overlap” of data between 2 classifications. Although, there will be no gap between 2 classifications.

This will reduce the rate of classification and depending on the application use-case, if it involves working with high ODR sensors, may result in missing some transitional data.

Case 3: Classification Interval > Instance Length

Below case shows on-device classification with Classification Interval > Instance Length. This will result in no “overlap” of data between 2 classifications and there will be a gap between 2 classifications.

This gap will manifest in even “slower” classifications (in comparison to Cases 1 and 2 described above) and might result in missing some transitions or classes completely.

This choice of parameters may be appropriate for monitoring the state of long-running machinery, where an anomalous state is expected to persist for some time. The larger classification interval has the advantage of reducing power consumption.


Anomaly Detection in Qeexo AutoML

Dr. Karanpreet Singh and Dr. Rajen Bhatt 15 July 2020

Qeexo AutoML supports three one-class classification algorithms widely used for anomaly/outlier detection; Isolation Forest, Local Outlier Factor, and One-class Support Vector Machine. These algorithms build models by learning from only one class of data. After learning, anomaly detection algorithms determine whether a test instance belongs to the normal class or if it is an anomaly. Qeexo has taken one-class approach for anomaly detection because it is easy to collect the data from normal class (e.g., normal operation of a machine) compared to doing multi-class data collection where each type of anomaly represents one class.

Isolation Forest (IF) [1]

Isolation Forest is an efficient algorithm for outlier detection, also very effective in high-dimensional datasets. It builds an ensemble of decision trees in which each tree is trained randomly; at each node in the trees, it picks a feature randomly, then it picks a random threshold value (between minimum to maximum value of the feature) for splitting the dataset. The trees are grown until all the instances are isolated from other instances. The anomalies generally tend to be far away from normal instances. The number of divisions required to isolate a sample from other instances is equivalent to the path length from the root node to the terminating node in the tree. The path length, averaged over all the trees, produces noticeable shorter paths for anomalies, and comparatively longer paths for normal data.

The average path length over the collection of isolation trees, referred as E(h(x)) in [1], is used to compute the anomaly score as:

Where  is the total number of instances in training data and . The  is the harmonic number.

Local Outlier Factor (LOF) [2]

LOF algorithm compares the density of instances around a given instance with the density around its neighboring instances. The distances of the given instance with respect to its k-nearest neighbors are used to estimate its local density. The LOF compares the local density of the given instance to the local densities of its neighbors. Instances that have substantially lower density than their neighboring instances are considered as outliers.

If we consider some data points in a space, the reachability distance of a data point p with respect to data point o is defined as:

where k is the number of neighbors considered in this calculation. The k-distance(o) is the distance of the data point o to its kth farthest data point from the dataset. The d(p, o) is the distance between data points p and o.

The reachability distance is used to calculate local reachability density (LRD). The (LRD) is inverse of the average reachability distance based on the k-neighbors of data point p. It can be written as:

Finally, the LOF of a data point p is average of the ratio of LRD of the p and those of its k-neighbors.

One-class SVM (OCSVM) [3]

OCSVM tries to separate instances in high-dimensional space from the origin. In original space, this corresponds to finding a small region which encompasses all the instances. If a given instance doesn’t lie in this small region, then it is considered an anomaly. The OCSVM makes use of quadratic programming to solve the optimizing problem for finding the coefficients corresponding to the support vectors. 

The objective function of the model for separating the data from the origin is written as:

The  variables are non-zero and are penalized in the objective function. Thus, the decision function for an instance becomes  which will be positive for most of the training data points while having the regularization term  to be small. The variable  controls the trade-offs between these two goals. 

Mapping Anomaly Scores to Range of 0 to 1

Qeexo AutoML internally squashes anomaly scores from different models in the range (0,1]. This is done to have consistent view of anomalies across all the algorithms which in turn assists in better calibration of the anomaly threshold. An instance is called an anomaly if the output of the squashing function is larger than a threshold value. The default value of the threshold in AutoML is 0.5. The user has the option to calibrate the threshold to make the predictions biased towards inliers or outliers.

Advantages of Qeexo AutoML Anomaly detection:

  • Only Normal­ class data is required. It is extremely difficult and sometimes even impossible to collect data for different kinds of anomalies. Qeexo AutoML need data only from one class.
  • Easy calibration of anomaly detection threshold with live streaming of scores and live classification
  • Support of multiple algorithms described in this blog with Quantization support for Isolation Forest
  • Can also be utilized for other one-class applications such as detecting unique air gesture using magic wand against all other gestures
  • Support for Automatic and Manual selection of features

Example Case

An application of anomaly detection for machine monitoring can be found here:


[1] Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008, December). Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining (pp. 413-422). IEEE.

[2] Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data (pp. 93-104).

[3] Schölkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., & Platt, J. C. (2000). Support vector method for novelty detection. In Advances in neural information processing systems (pp. 582-588).


Detecting Anomalies in Machine Data with Qeexo AutoML

Josh Stone 07 June 2020

Project Description

In industrial environments, it is often important to be able to recognize when a machine needs to be serviced before the machine experiences a critical failure. This type of problem is often called predictive maintenance. One approach to solving predictive maintenance problems is the use of a one-class classification model for anomaly detection, where the model can make a monitoring system aware that a machine is running in a manner that is different than its standard operating behavior.

This blog describes how to use Qeexo AutoML to build a one-class classification model for anomaly detection on machine vibration data. For this application, we will be using the ST, one of the many embedded hardware platforms that has been integrated into AutoML.

Problem Scenario

We will be using a fault simulator to simulate various normal or anomalous machine operating conditions. The fault simulator we are using consists of a flywheel driven by a rotational motor that can be configured to spin at various rates and can also be configured to have a number of different attachments.

Sensor Configuration

For this problem, we will select accelerometer and gyroscope sensors at an ODR of 6667 Hz, with FSRs of +/- 2g and +/- 125 dps, respectively. This should allow us to accurately capture the high frequency, high precision data typically required for machine vibration classification.

For more details about how to select an appropriate sensor configuration for any project type, check out our blog post on building Air Gesture models using Qeexo AutoML

Data Collection

For this problem, we want to determine whether the machine is running normally or not. In this case, normal machine behavior is set to be approximately 1500 RPM with no physical attachments.

Since we’re going to be building a one-class, anomaly detection model, we only need to collect data under these “normal” conditions, and we will use the resulting model to determine whether or not the machine is running under these conditions.

For this case, we will collect 200 seconds of continuous “1500 RPM” data. The first 10 seconds of this data is shown in the figure below.

Model Training

After configuring our sensors and collecting our data, we are ready to build an initial model. We will select the collected data from our Training page and press “Start New Training”.

Running a benchmark build

For this demo, we’ll be testing the difference between Manual and Automatic feature selection. To start, let’s check a build with the full Qeexo AutoML feature set enabled. We’ll build a model using these features on both the accelerometer and gyroscope data. To do this, we’ll select Manual Sensor Selection on the first training settings page, and then we’ll select Manual Feature Selection on the next page, so that all of the feature groups are selected.

Next, we’ll select the maximum instance length for 6.6kHz data and a similar classification interval, 307 ms and 250 ms respectively, and we’ll select LOF model type for the build. We will use these same values for instance length, classification interval, and model type for all of the builds in this demo.

After all of the configuration parameters have been set, we can launch the build by pressing the “Start Training” button.

After training has completed, the library will be flashed to the connected device and the model results will be available on the Models tab:

As shown here, our LOF model is already able to achieve very high CV accuracy with relatively low latency and size! This suggests that this problem is solvable with Qeexo AutoML.

Running a build with Automatic Sensor & Feature Selection

Next, we’ll try running a build with Automatic Sensor & Feature Selection enabled. We’ll use most of the same settings from before, except we will select the Automatic option in the Sensor Selection pane.

Enabling this option will apply Qeexo’s selection algorithms to find the optimal sensors and features for the given problem, at the expense of increased build time. In this case, the build took about 40% longer than the all-features build.

After the build has completed, the final model will appear at the top of the Models tab:

From the image above, we can see that with AutoML’s sensor and feature selection enabled, we are able to achieve even higher model accuracy than all-features model, while also having similar latency and substantially smaller model size than the all-features model!  

Finally, we will want to flash the compiled binary to our and check that the classifier is producing the expected output. As shown in the video version of this tutorial, the final model is able run inference on the embedded device and accurately recognize a variety of anomalous states, in real time. Check it out on our website at:


ODR and FSR of Sensors

Dr. Rajen Bhatt and Josh Stone 28 May 2020

Qeexo’s AutoML enables Machine Learning and AI applications development for a range of sensors. A comprehensive list of sensors includes Accelerometer, Gyroscope, Magnetometer, Temperature, Pressure, Humidity, Microphone, Doppler Radar, Geophone, Colorimeter, Ambient light, and Proximity. In this article, we will discuss two very important configurable parameters that apply to many of these sensors, Output Data Rate (ODR) and Full-Scale Range (FSR).

Output Data Rate (ODR):

ODR (also known as “sampling rate”) is the rate at which a sensor obtains new measurements, or samples. ODR is measured in number of samples per second (Hz). Higher ODR configurations result in more samples per second.  Different sensor packages often come with multiple available ODRs, and it is typically up to the application developer to determine which ODR to use based on the needs of the application.

For example, accurately distinguishing between knocking and swiping on tabletop may require a higher ODR, in the range of several kHz (see Figure 1). This means that thousands of new samples are available every second, enabling the machine learning model to find the differences in rapidly changing vibration data. However, other applications such as distinguishing between walking, sitting, and running activities will likely operate very well in the range of 10-50 Hz, or tens of samples per second. Other types of scenarios, such as distinguishing between varying air gestures, fall between the previous two examples and will generally work well with ODRs in the range of 400-800 Hz.

Figure 1: Accelerometer impact data at 6.6 kHz (top) vs. 104 Hz (bottom)

Often, higher ODRs can improve model accuracy, since higher ODRs make more information available to the machine learning model. However, there are two major drawbacks to using higher ODR signals for embedded applications: memory constraints and power consumption.

Memory constraints need to be considered for ML models in embedded applications. On an embedded hardware platform, it is only possible to hold a relatively small number of samples in-memory, in addition to handling all of the processing required to prepare and run the machine learning model. Since this upper bound of samples is fixed, higher sensor ODRs have a lower maximum window size in terms of real time. For example, if a given hardware platform can only hold 1000 samples in memory at any given time, this represents approximately 2.5 seconds of 400 Hz data, while it only represents 1/3 of a second of 3.3 kHz data.

Power consumption also needs to be considered for embedded applications and will be higher for higher sampling rates. Generally, running machine learning on embedded devices means striking a good balance between performance of the machine learning algorithms and meeting power consumption constraints for the embedded application. This is an especially important consideration for models which will be deployed to devices running only on battery power. It is recommended to try building models with a few different ODRs and check the performance of the models.

Qeexo AutoML can build ML applications for all the available ODRs of sensors included in the supported hardware platforms. Accelerometers and gyroscopes generally have many different ODR options. Some industrial grade accelerometers can have ODRs as high as 26KHz, e.g., ~=26000 samples in one second. These accelerometers are capable of operating in industrial environments and are a great fit for machine monitoring applications on Qeexo AutoML.

Number of samples per second can vary depending on the hardware and firmware properties of the sensor module. Qeexo AutoML performs data quality checks, where it tests that the effective ODR matches the configured ODR of the sensors, among other things. We also recommend using Qeexo AutoML’s visualization tool to visually check the signal before training the ML models.

Full Scale Range (FSR):

Full Scale Range is associated with the range of values that can be measured for a given sensor and allows the application developer to trade-off measurement precision for larger ranges of detection. Two sensors that often have variable FSR settings are accelerometers and gyroscopes. Accelerometers measure the acceleration (rate of change of velocity of an object) in X, Y, and Z directions in the units of g (relative to the force of gravity). Gyroscopes measure angular velocity in Degrees per Second (DPS) in X, Y, and Z rotational directions.

Full scale range for accelerometers is generally programmable as ±2/±4/±8/±16 g, depending on the hardware platform. The smaller the range, the more sensitive the accelerometer will be to lower amplitude signals. For example, to measure small vibrations on a tabletop, using a FSR of 2g would provide more detailed data as it will be very sensitive to any minor accelerations, whereas using a 16g range might be more suitable to measure vibrations of somebody walking.

The DPS range for gyroscopes is generally programmable to ±125/±250/±500/±1000/±2000 depending on the hardware platform. The smaller the DPS range, the more sensitive the gyroscope will be to smaller angular motions. For example, to measure small angular motions for hand gestures used in a gaming application, using a smaller range would provide more detailed angular velocity data than using a 2000 DPS range, which might be more suitable to measure the angular motion of a fan.

It is recommended to check for saturation of signals while working with FSR. If lower g and DPS are configured for accelerometer and gyroscope, but their actual measurements are higher than the configuration, their signals will saturate. Saturation happens because they cannot measure the desired physical quantities greater than their configuration, which would result in overflow. We recommend using Qeexo AutoML’s visualization tool to check for the saturation of the signals. Qeexo AutoML’s data quality check also checks for signal saturation and warns users when saturation is suspected.

Figure 2: Gyroscope motion gesture data at 125 DPS FSR (top) vs. 1000 DPS FSR (bottom)


Detecting Air Gestures with Qeexo AutoML

Josh Stone 26 May 2020

Project Description

We would like to build a machine learning model to distinguish between the following three classes:

– “X”
– “O”
– No gesture

This blog describes building the Air Gesture with Arduino Nano 33 BLE Sense. You can also build the same using any of the boards available on your Qeexo AutoML.

Sensor Configuration

For any sensor configuration, we need to consider three factors:

  • What type of data will capture the differences between our classes
  • What signal length will capture the differences between our classes
  • What range of sensor values will fully capture the range of our input

Based on these factors, we will select accelerometer and gyroscope sensors at 476 Hz for the air gesture problem. We will use +/- 8g and +/- 500 dps for the sensor FSRs.

These two sensors should be able to capture the type of data well, since they are motion sensors and our problem deals with differences in device motion.

Based on hardware memory constraints, we can only use 1024 samples per-channel on the Arduino Nano 33 BLE Sense, so an ODR of 476 Hz should allow us to have a signal length of multiple seconds to make each classification.

Finally, based on the fact that the device will be in motion, we will need a large range of possible sensor values. These larger values of FSR will prevent our sensors from saturation, even under scenarios of rapidly changing device position and speed.

Data Collection

For these three classes, we will need to use both types of AutoML data collection: event and continuous. To decide which type of data collection to use for each class, we need to consider the average time spent in a given class. If this time is 10 seconds or less, we should typically use event data collection. Otherwise, we’ll use continuous data collection.

Collecting continuous data

For the “no gesture” case, we will use continuous data collection, because we will often expect our final ML classifier output “no gesture” for long periods of time, sometimes minutes or even hours. We want our classifier to output “no gesture” for as long as the device is at rest. To collect continuous data, we will select Continuous, enter an appropriate class label, and enter an amount of time for an initial data collection. For now, we will collect 30 seconds to build an initial model – we can always collect more later if we find that performance isn’t as good as we’d like.

From there, we will press “Record” and go on to collect our “no gesture” data.

Collecting event data

Since the “X” and “O” letter gestures are discrete events, typically entering and exiting the class within a second or two, we will use event data collection.

To collect event data, we will select Event, enter an appropriate class label, and enter two additional values: a length per event, and a number of instances. For now, we will collect 10 instances to build an initial model. For length per event, we will select a number of seconds which will give you enough time to complete a full example of the given class. For example, since the “X” class takes 1-2 seconds typically, we will use a value of 3 or even 4 seconds to make sure we can complete the gesture in time.

From there, we will press “Record” and go on to collect our “X” and “O” letter gesture data.

Note: at the start and stop of each “event”, the device should be in an at-rest state. This will help AutoML to segment the incoming signal and determine where the actual event data occurred inside the collection window. This is why we should select a value for length per event which ensures we can start and stop the given event within the allotted time.

Here’s an example of a good event instance:

Note how the actual event is located fully within the collection window, and that AutoML is able to detect both the start and stop and highlight the event signal.

Here’s an example of a bad event instance:

When compared with the previous image, you can see how AutoML is not able to successfully find the full event range.

Model Training

After configuring our sensors and collecting our data, we are ready to build an initial model. We will select the data from our Training page and press “Start New Training”.

NOTE: The initial window that appears (step 1 of 4) is an optional “Group Labels” page. We can skip through this page for now, since we’ve only collected three classes of data, and we want to build a model which can distinguish between all three classes.

Sensor and Feature Selection

You will now be presented with Sensor and Feature Selection options (step 2 of 4). This section allows choosing a subset of recorded sensors and features to compute for each sensor, either automatically or manually. The automatic mode performs sensors and feature group selection fully automatically. Manual sensor selection can be combined with automatic feature selection or manual feature selection. For now, since this is our initial model, we will manually select both the recorded sensors, Accelerometer and Gyroscope and manually select all the feature groups available with both the sensors.

Configuring Inference Settings

The next step in building our initial model is configuring the inference settings (step 3 of 4). There is an option to have AutoML make these selections for you. If you want to use that option, please skip to the next section.

To manually configure the inference settings, we need to consider two things:

  • How long does the signal need to be for our model to make an informed decision?
  • How often does the class change for my problem?

In our case, our event signals last roughly 1-2 seconds. This timeframe should also be long enough to distinguish between either of our gestures and the “no gesture” class. Based on this reasoning, we will select 2000 ms as our instance length.

Since the current gesture class is user-controlled, and since we can move between classes quickly, we should select a fairly low value for the classification interval. A value of 500 ms should make classifications often enough to catch any changes between states.

Configuring Model Settings

The final step in building our initial model is configuring the model settings. There are a variety of options on this page, all of which control various aspects of the model-building process. You can select from among various models, chose to do hyperparameter optimization, or decide whether to generate learning curves.

For now, we will de-select all of the optional optimizations available at the top of the page and train a simple Logistic Regression model.

and train a simple Logistic Regression model. This model should be able to handle our small dataset well, as opposed to the deep learning models for which we might not have enough data and will hopefully find some simple patterns which can distinguish between these three classes.

Next you will see the real-time training progress of various steps of machine learning model training and results generation. Clicking on Training Results will show cross validation performance, library size, and latency of the model. Details will show many other results such as confusion matrix, ROC curves, and MCC matrix. You can flash the library using Live Test and after flashing is successful, your Arduino Nano 33 BLE Sense is ready to detect one of these three gestures.


Feature Selection Approaches: Part – I

Qifan He and Dr. Rajen Bhatt

In machine learning, the quality of feature selection strongly affects the quality of the trained model. Feature selections approaches differ depending on the type of machine learning problem, e.g., supervised learning or unsupervised learning. For supervised learning algorithms two most popular feature selection techniques are Wrapper-based and Model meta-transformer approach. For unsupervised learning algorithms, filter-based approaches are widely used.

In the part-I of this article, we will look into the wrapper-based feature selection approaches used for supervised learning problems.

Wrapper methods:

This approach keeps underlying ML algorithm in the loop with different subsets of features and sets the objective function quantifying the performance of the model, e.g., cross-validation score. Based on the strategy it adopts to iterate over different subsets of features, wrapper methods can further be categorized into exhaustive search, forward selection, backward selection, and k-best.

  1. Exhaustive search as its name suggests creates all the possible subsets of the actual feature sets and recommends the best subset according to the criterion. This approach is very time consuming as for p number of features it needs to evaluate 2p-1 combinations. For large p, this approach is almost impractical.
  2. Forward selection starts with zero features, evaluates each 1-feature model according to the defined criterion, and keeps the best performing feature at the root. Then makes 2-feature combination with the selected best and repeats the procedure until any further gain in the performance criterion cannot be achieved or all the features are exhausted. Fig. 1 explains the example run of forward selection algorithm with four features in the original set, X = {x1, x2, x3, x4}. Note the highlighted feature groups at each stage of the algorithm and algorithm termination with selected features {x2, x1, x4} because either the performance of {x2, x1, x4} is greater than or equal to {x2, x1, x4, x3}. For p number of features, this approach in the worst case evaluates p*(p+1)/2 combination of features instead of 2p-1 exhaustive combinations.
  3. Backward elimination first builds model with all the available features and recursively removes the least significant feature. The least significant feature can be identified by building models with all but one feature at a time and finding the drop in the performance. A feature which results in the performance drop of more than the threshold should be removed from the pool. The process continues until all the features with least significance are removed.
Fig. 1 Example Flow of Forward Selection Algorithm with Four Features

4. k-best approach is basically equivalent to running only the first stage of forward selection approach. It ranks each feature according to its performance on the defined criterion and can be cross-validation performance. Then it chooses the top k features, where k can be specified according the requirements on the memory, compute power, and latency.

Qeexo AutoML Feature Selection:

AutoML runs grouping-based forward selection wrappers method for feature selection to achieve the best model performance and smaller memory footprint libraries with low latency. To deal with continuously streaming data from multiple sensors at many different sampling rates, AutoML extracts hundreds of features from these streams. Some of these sensors are even multi-channel sensors, e.g., accelerometer, gyroscope, and magnetometer have three dimensional streams, colorimeter has four dimensional streams, etc. Multiple dimensions add to the complexity of feature space. To deal with such a large and complex feature space, AutoML categorizes feature space into several groups, for example, statistical-based, frequency-based, filter bank-based, etc. AutoML then runs forward feature selection algorithm on groups and recommends best feature groups.

AutoML expert mode also expose feature groups for each sensor to the user and then users can manually choose the feature groups. This approach allows users to rapidly iterate and trade off among various criteria library size, performance, and latency. In built visualizer projects the feature groups to 2-dimensional embedding space to visualize the classification properties of the selected group of features to guide the user selection.

For one-class problems, e.g., anomaly detection, Qeexo’s AutoML platform uses different methods to perform feature selection. These methods will be covered in Part-II.


Deep Learning in Qeexo AutoML Platform

Yanfei Chen and Dr. Rajen Bhatt

Deep learning (DL) has gradually become one of the most popular areas in artificial intelligence after the 1990s. Deep learning is a branch of machine learning and uses neural layers to build models. It combines low-level features and gradually forms abstract representation features to model the input data. Deep learning builds neural networks that simulates the human brain for analysis and learning. It mimics the mechanism of the human brain to interpret data, such as images, sounds, texts, and sensor readings.

According to the Universal Approximation Theorem [1], if a feedforward neural network has a linear output layer and at least one hidden layer with finite number of neurons and sigmoid activation function, it can approximate any continuous functions on a compact subset of ℝn with a sufficiently small error e > 0. This theorem is considered to be the theoretical basis of neural networks in deep learning.

The training process in deep learning is not particularly different from ordinary machine learning models. In deep learning, the objective is to find the weights of neurons or parameters of convolution filters in each layer, and this is done through predefined loss function, using gradient descent to continuously reduce the predefined loss, and strive to achieve the convergence. Backpropagation algorithm is the most common method for training neural networks. The algorithm will first calculate (and cache) the outputs of each layer according to the forward propagation, and then calculate the partial derivative of the loss function by applying chain rule with respect to each parameter in the way of traversing the graph backwards.

Deep neural networks, such as several common convolutional neural network models shown in the figure 1 [2], often have many parameters. Their parameters can often be in several thousands to millions depending on the application and size of the datasets presented for the learning. In the field of machine learning, the collection and labeling of training data is often a complicated and cumbersome process, which requires huge human and other resources. Thanks to Qeexo AutoML, the collection and labeling of the datasets is all integrated in the platform.

Figure. 1 Architecture of LeNet-5, a convolutional neural network.

Tuning of deep learning model parameters is another challenge. For example, selecting the depth of the neural network, size of the convolution filters for each layer, what pooling to use, number of neurons in each layer, the learning rate, the optimization function, the batch size, number of training epochs, etc., often requires machine learning engineers to debug based on their knowledge and experience. This tuning process can be frustrating and time consuming. Deep learning is often very sensitive to model parameters, and the selection of model parameters will also affect the accuracy, size, and latency of the resulting model.

According to different learning tasks, commonly used models in deep learning include feedforward neural networks, convolutional neural networks, and recurrent neural networks. Qeexo AutoML platform for sensor data makes use of all these deep learning architectures for machine learning on micro-controllers. Due to memory, computation, and power consumption limitations on micro-controllers, Qeexo performs very sophisticated optimizations and model compressions on these architectures. Qeexo AutoML deep learning architectures does not require runtime interpreters when deployed on the microcontrollers. Libraries with no runtime make these models better suited for microcontrollers, they are light weight and has low latency.

Qeexo AutoML performs hyper-parameter tuning for deep learning models, saving time and effort of users dealing with this cumbersome process. Users also have a choice of configuring these model parameters themselves and if these parameters tend to generate models which are bigger than available memory size on the embedded devices, it performs automatic model reduction to make sure it fits on the device without sacrificing the model performance significantly.

Qeexo is one of the world’s first companies to run Deep Learning-based models from mobile phones. These models are light weight with latencies as low as 10 mSec. on mobile phone’s application processors. Qeexo’s AI engines are running on more than 300 Million mobile devices and growing rapidly. Today, Qeexo AutoML builds and deploys deep learning models on embedded targets using various sensor signals. Qeexo uses its proprietary model conversion technique to build light weight deep learning models on embedded microcontrollers. Additionally, Qeexo AutoML supports quantization-aware training of deep learning models. This type of training is aware of the fact that models are going to be quantized post-training and strives to retain the performance of the model despite quantization. Quantization can further reduce the model size while striving to maintain accuracy. This approach makes it possible to build bigger models when required to achieve high accuracy for certain applications.


  1. Cybenko, G., “Approximations by superpositions of sigmoidal functions”Mathematics of Control, Signals, and Systems, 1989, 2(4), 303–314.
  2. LeCun, Yann, et. al. “Object recognition with gradient-based learning.” Shape, contour and grouping in computer vision. Springer, Berlin, Heidelberg, 1999. 319-345.