We can think of the data generated by an IoT device as passing through three phases. The first is data creation, which takes place on the device itself, from where the data is transmitted over a network. The second is the collection and organization of that data. The third is its actual use: the process of making the data valuable in a wealth of contexts.
In what follows, we trace the process of IoT data collection and storage across these stages.
Streaming IoT data
In the world of IoT, each event creates data. Sending that data onward involves standard protocols and transport technologies such as MQTT, WAMP, HTTP, CoAP, or Sigfox. Each comes with its own strengths and typical use cases. These protocols carry readings and other updates from the IoT device to a centralized location for actual processing.
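To make this concrete, here is a minimal sketch of how a single sensor event might be packaged before being published over a protocol such as MQTT. The topic layout and field names are illustrative assumptions, not a fixed standard; the commented `publish` call refers to the paho-mqtt client as one example.

```python
import json
import time

def make_reading(device_id, sensor, value):
    """Build an MQTT-style topic and JSON payload for one sensor event.

    The devices/<id>/<sensor> topic hierarchy is a common convention,
    not part of the MQTT standard -- adjust it to your broker's layout.
    """
    topic = f"devices/{device_id}/{sensor}"
    payload = json.dumps({
        "device_id": device_id,
        "sensor": sensor,
        "value": value,
        "ts": time.time(),  # timestamp set at the moment of creation
    })
    return topic, payload

topic, payload = make_reading("dev-42", "temperature", 21.5)
# The pair would then be handed to a client library, e.g. with paho-mqtt:
# client.publish(topic, payload, qos=1)
```

The timestamp is attached at creation time, on the device, which matters for the ordering questions discussed below.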
At this stage, one has to decide how the data is aggregated and stored for future use. How to proceed depends on how the IoT data is to be consumed. Here you select an approach and determine whether the data should be sent in real time or in batches. You also determine in what order the data points should be recorded for a maximally accurate analysis.
Storing IoT data
Using real-time data ensures maximum accuracy. This approach guarantees access to all data generated by each IoT device. However, it usually means massive amounts of incoming data. Assigning the right timestamps so that data coming from multiple IoT devices can be sorted becomes a challenge. One should consider systems able to accommodate the speed and volume of that incoming IoT data.
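One common way to handle the ordering problem is to have each device attach timestamps locally and then merge the per-device streams on the collection side. A minimal sketch, assuming each stream is already sorted by its own timestamps:

```python
import heapq

# Each device emits (timestamp, device_id, value) tuples, sorted locally.
stream_a = [(1.0, "dev-a", 20.1), (2.5, "dev-a", 20.3), (4.0, "dev-a", 20.2)]
stream_b = [(1.5, "dev-b", 55.0), (3.0, "dev-b", 54.8)]

# heapq.merge lazily interleaves the pre-sorted streams by timestamp,
# avoiding a full sort of the combined data volume.
merged = list(heapq.merge(stream_a, stream_b))
```

Because the merge is lazy, this scales to high-volume streams; the hard part in practice is keeping the device clocks themselves in sync.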
Opting to collect all available IoT data in real time and analyze all of it should be backed by a clear rationale. Otherwise, the sheer volume of the data may place an unnecessarily heavy load on the cloud systems, that is, on the network and computational resources required to keep up with the inflow of IoT data.
One should also consider the concrete IoT application. Each has unique requirements in terms of latency, energy consumption, and accuracy. Some applications may tolerate delays. Others, such as security applications, are time-critical and leave no room for delays.
Many use cases may not require high accuracy and allow for sending the data in batches. If the data is sent in batches or micro-batches, you still get a record of all data, only not in real time but at pre-established intervals. Which approach to choose depends on the requirements of the use case. In some scenarios, you will need accurate real-time data for your analyses. In others, historical data will do just as well.
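A micro-batch collector can be sketched as a small buffer that ships readings once a batch fills up. The `flush_size` knob and the `ship` callback are illustrative; real collectors typically also flush on a time interval (a "linger" setting), which is omitted here to keep the sketch short.

```python
class MicroBatcher:
    """Buffers readings and ships them upstream in fixed-size batches."""

    def __init__(self, flush_size, ship):
        self.flush_size = flush_size
        self.ship = ship          # callable that sends one batch upstream
        self.buffer = []

    def add(self, reading):
        self.buffer.append(reading)
        if len(self.buffer) >= self.flush_size:
            self.ship(list(self.buffer))
            self.buffer.clear()

shipped = []
batcher = MicroBatcher(flush_size=3, ship=shipped.append)
for reading in [21.0, 21.2, 21.1, 21.4]:
    batcher.add(reading)
# One full batch has been shipped; 21.4 waits in the buffer
# until the next flush.
```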
The platform approach
If you plan to analyze the incoming data directly, at this stage you need a platform capable of ingesting data from large numbers of IoT devices streaming in real time. Such a platform has to withstand temporary connectivity issues, including a lost connection, an outage, or a server failure. The platform helps you avoid data loss, so you do not risk impairing the accuracy of the results generated on the basis of that data. On the platform, data is stored in a form ready for data modeling and analytics.
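The usual way such a platform (or its edge agent) avoids data loss is a local backlog that is drained once connectivity returns. A minimal sketch, where `transport` stands in for any delivery function that raises `ConnectionError` during an outage:

```python
from collections import deque

class BufferedSender:
    """Queues readings locally while the upstream is unreachable and
    drains the backlog in order once connectivity returns."""

    def __init__(self, transport):
        self.transport = transport
        self.backlog = deque()

    def send(self, reading):
        self.backlog.append(reading)
        while self.backlog:
            try:
                self.transport(self.backlog[0])
            except ConnectionError:
                return              # keep the backlog; retry on next send
            self.backlog.popleft()

# Simulated outage: the first two delivery attempts fail, then recover.
delivered, failures = [], [True, True]

def flaky_transport(reading):
    if failures:
        failures.pop()
        raise ConnectionError
    delivered.append(reading)

sender = BufferedSender(flaky_transport)
for r in ["r1", "r2", "r3"]:
    sender.send(r)
```

Despite the outage, all three readings arrive, and in their original order, which is exactly the property that protects the accuracy of downstream analysis.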
Target-driven approaches to data collection
In broad strokes, again, the approach to collecting and storing IoT data heavily depends on the target requirements of a given use case. These involve, but are not limited to, data collection procedures motivated by needs as diverse as data accuracy, energy consumption, response time, and privacy protection. Proven approaches for reducing the volume of collected IoT data are aggregation, filtering, interpretation, and compression at the sensor or IoT edge level, as close to the data source as possible.
In what follows, we summarize some of the key data collection approaches based on the intended application as laid out in this study.
Strategies motivated by accuracy
This IoT data collection strategy takes into account the tradeoff between the frequency of measurement requests and the accuracy of the data. Here the frequency is adapted with an eye towards the target data accuracy. A given use case may require only a certain level of data accuracy; a higher level brings no additional value to the effort.
As part of this strategy, one reduces the frequency of the data measurements. This cuts the resources necessary for collecting the IoT data while retaining the accuracy target that has been calculated as optimal for that use case.
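A simple controller for this tradeoff adjusts the sampling interval whenever the observed error crosses the target. The halving and 1.5x step factors below are illustrative choices of ours, not values from the study:

```python
def adapt_interval(interval, observed_error, target_error,
                   min_interval=1.0, max_interval=3600.0):
    """Shrink the sampling interval (seconds) when the observed error
    exceeds the target; widen it when there is accuracy to spare."""
    if observed_error > target_error:
        interval *= 0.5     # sample more often to recover accuracy
    else:
        interval *= 1.5     # sample less often; the accuracy budget allows it
    return max(min_interval, min(interval, max_interval))
```

For example, a device sampling every 10 seconds with error well under target would back off to 15 seconds, saving bandwidth and energy without missing its accuracy goal.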
Strategies motivated by time criticality
In this approach to IoT data collection, a maximum delay value is established in advance: the time elapsed since the timestamp of the last measurement has to remain below that value. This elapsed time corresponds to the “freshness” of the data measurements.
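The freshness rule reduces to a single comparison; the helper names below are our own:

```python
def is_fresh(now, last_measurement_ts, max_delay):
    """True while the time elapsed since the last measurement's
    timestamp stays below the maximum delay fixed in advance."""
    return (now - last_measurement_ts) < max_delay

def needs_measurement(now, last_measurement_ts, max_delay):
    # A new measurement must be requested as soon as freshness expires.
    return not is_fresh(now, last_measurement_ts, max_delay)
```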
IoT data collection strategies motivated by energy demands
When opting for this approach, energy consumption is the key factor. The aim is to achieve the targeted accuracy while optimizing power consumption. An energy-driven strategy for IoT data collection aims at maximum efficiency: the benefit is measured as the utility achieved for a targeted data accuracy minus the power consumption of the measurements necessary to achieve that accuracy.
As in the accuracy-driven scenario, it is assumed that, depending on the application, one aims for a specific target accuracy, knowing that a higher accuracy will offer no additional benefit.
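The benefit described above can be sketched as utility minus power cost, maximized over candidate accuracy levels. The utility and power-cost shapes below are made up for illustration (utility saturates at the target accuracy, while power keeps growing with measurement effort); they are not from the study:

```python
def net_benefit(accuracy, utility_fn, power_fn):
    """Benefit = utility achieved at a target accuracy minus the power
    consumed by the measurements needed to reach that accuracy."""
    return utility_fn(accuracy) - power_fn(accuracy)

def utility(accuracy, target=0.9):
    # Utility stops improving once the target accuracy is reached.
    return min(accuracy, target)

def power_cost(accuracy):
    # Illustrative: diminishing returns make high accuracy disproportionately
    # expensive in power.
    return accuracy ** 2

best = max((a / 10 for a in range(1, 11)),
           key=lambda a: net_benefit(a, utility, power_cost))
```

Under these made-up curves the benefit peaks well below the maximum achievable accuracy, which is the core intuition of the energy-driven strategy.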
IoT data collection strategies motivated by privacy concerns
If the privacy of end users is of paramount importance, this approach can be adopted. According to the study, the effort here is to minimize the number of times a sensor is asked for data and to add noise to the results using privacy-preserving techniques.
The goal is to safeguard the privacy of end-users by modifying the accuracy of individual measurements while at the same time maintaining a certain level of “adequate accuracy” for the results as a whole.
Adding “noise” to the result is achieved through “differential privacy”. According to the study, this permits the extraction of data about a population of users without revealing any information about individuals. The “noise” is added during the data extraction process itself, without significantly altering the statistical result.
This approach recommends adding noise at the IoT data collection stage rather than at the data processing stage. The related experiments show that an intervention of this kind has no significant effect on the final statistics.
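A minimal sketch of noise-at-collection using the standard Laplace mechanism (not necessarily the exact mechanism used in the study): each reading is perturbed before it leaves the device, with the noise scale calibrated as sensitivity divided by the privacy parameter epsilon. The parameter values are illustrative.

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) noise via the inverse CDF."""
    u = rng.random() - 0.5            # uniform in [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)          # guard against log(0)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def collect_privately(reading, epsilon, sensitivity=1.0):
    """Perturb a single reading on the device, before transmission.

    scale = sensitivity / epsilon is the standard Laplace-mechanism
    calibration; smaller epsilon means stronger privacy and more noise.
    """
    return reading + laplace_noise(sensitivity / epsilon)

random.seed(7)
true_readings = [20.0] * 2000
noisy = [collect_privately(r, epsilon=1.0) for r in true_readings]
noisy_mean = sum(noisy) / len(noisy)
# Individual noisy readings vary widely, yet the population mean stays
# close to the true value of 20.0.
```

This is exactly the property the section describes: individual measurements lose accuracy, while the aggregate statistic retains an “adequate accuracy”.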
Customization with the Record Evolution Platform
This has been just a taste of current approaches and trends in IoT data collection. Within the Record Evolution platform, the data science studio allows you to connect to your IoT data source and stream IoT data directly into a compact, lightweight data warehouse. The platform lets you adjust exactly how the data is collected as well as monitor and control the data streams at all times. Get in touch for an in-depth discussion.
About Record Evolution
We are a Data Science & IoT team based in Frankfurt am Main, committed to helping companies of all sizes innovate at scale. That is why we have built an easy-to-use development platform enabling everyone to benefit from the power of IoT and AI.