Beyond the Solution Stack: Data Management for the Internet of Things

When Lux first introduced a framework and taxonomy for approaching the industrial internet of things (IIoT), the focus was on tools and on understanding the various parts of the solution stack. The Lux “IIoT Toolbox” provided clients and readers with a structured view of how to turn data from industrial assets, people, and environments into actionable insights, for example to improve operational efficiency, generate new services and revenue streams, and improve employee health and safety. This was an important and valuable baseline for understanding the IoT and how to properly construct a solution stack.

However, understanding each of the elements in the IoT toolbox was only the first challenge. As clients and readers have started to figure out how to properly construct an IoT solution, the focus now shifts towards refining, optimizing, and future-proofing such a solution. The next challenge is managing all of the data generated by the IoT effectively and efficiently. For this, we propose a data-centric framework and taxonomy to help clients and readers understand the next layer of capabilities, processes, and approaches for properly managing the glut of data being generated within their respective deployments.

In no particular order, the seven core components of this data-centric framework are the following:

  • Data creation (or data generation) references all of the activities related to capturing and contextualizing sensor data. It is more than just recording simple, discrete sensor readings. It is also determining how data is collected, choosing an appropriate sampling frequency, imposing boundaries and limits on the data to filter spurious events, and other activities like manual annotation to further contextualize the data sample.
  • Data security references all of the activities related to isolating data to a discrete audience of authorized individuals, machines, applications, or processes. It is not just the deployment of software and systems such as firewalls, intrusion detection systems (IDS), and antivirus software to keep data isolated and protected. It is also establishing role-based data access, user authentication schemes, data encryption, and the utilization of AI and machine learning to continuously monitor for network intrusion or unauthorized data access. It is also the ongoing remediation and patching of systems to ensure data and the surrounding environment remain protected from illicit activity. Many decisions must be made with regard to data security, and they will always be tied to the decisions made with regard to the other components of the data-centric framework.
  • Data transmission references the connectivity and conveyance of data across the entire end-to-end IoT solution stack. It is more than just a singular event of transmitting data from sensor to end device. Data is often transmitted across multiple devices, among multiple locations, using multiple formats or standards, and using varying physical means. Data may be transmitted using short-range technologies, such as Bluetooth or Wi-Fi, or long-range technologies, such as LPWAN or cellular. Data transmission may also take place at varying intervals, and may only involve subsets or summaries of entire data sets. Decisions regarding what to send and how to send it are always unique to the specific end-to-end solution stack.
  • Data cleanliness references the relative signal-to-noise ratio of the IoT device data being generated. It is a measure of how much useful information was collected, as a proportion of all data collected. Data often requires cleaning and validation prior to being stored and analyzed. Spurious sensor readings, instrumentation failure, transmission failure (leading to incomplete data sets), environmental interference, and other issues can “contaminate” data sets. “Cleaning” data to remove or repair these types of issues helps to ensure the general accuracy and completeness of IoT analytics and insights.
  • Data analytics references all of the activities related to computational processing and understanding of IoT device data. It covers the processes involved in examining data sets in order to draw conclusions about the information they contain. Analytics were traditionally administered by human data scientists, but artificial intelligence (AI) and machine learning have expanded the capabilities, speed, and breadth of most modern IoT analytics solutions.
  • Data storage references all of the activities related to storing data across the entire end-to-end IoT solution stack. As with data transmission, data may be stored in multiple locations, in multiple formats, and on varying storage media. It is more than simply storing data in a particular digital or physical format. Data may need to be reformatted into a particular format (such as JSON or XML), or stored as a file, as an object, or in a database. It may need to be aggregated, annotated, deduplicated, compressed, encrypted, or archived. Many decisions must be made with regard to data storage, and they often depend on other components of the data-centric framework, such as data analytics, transmission, and security.
  • Data sharing references the exchange of data across separate IoT deployments. It may involve an IoT solution ingesting outside, third-party data to provide better internal analytics capabilities. In this scenario, outside data can be used for training machine learning algorithms and improving the accuracy of analytics and insights. Data sharing may also involve an IoT solution sending internal data to an outside, third-party entity to enhance collective analytics capabilities. Some platforms even enable the monetization and sale of internal data, which opens the possibility of generating new revenue streams.
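
To make the data creation and data cleanliness components more concrete, the following is a minimal Python sketch, not a description of any particular product. It imposes boundaries on sensor readings to filter spurious events, drops readings lost in transmission, and reports the useful data as a proportion of all data collected. The `Reading` type, the `clean_readings` function, the sensor name, and the temperature limits are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reading:
    """One sensor sample (hypothetical structure, for illustration only)."""
    sensor_id: str
    timestamp: float          # seconds since the start of the capture
    value: Optional[float]    # None models a dropped or incomplete transmission


def clean_readings(
    readings: List[Reading], low: float, high: float
) -> Tuple[List[Reading], float]:
    """Keep readings whose value lies inside [low, high].

    Returns the kept readings and the proportion of useful readings
    among all readings collected (0.0 when nothing was collected).
    """
    kept = [r for r in readings if r.value is not None and low <= r.value <= high]
    ratio = len(kept) / len(readings) if readings else 0.0
    return kept, ratio


# Hypothetical temperature samples in degrees Celsius.
samples = [
    Reading("temp-01", 0.0, 21.4),
    Reading("temp-01", 1.0, 21.6),
    Reading("temp-01", 2.0, 999.0),  # spurious spike outside the assumed limits
    Reading("temp-01", 3.0, None),   # transmission failure, no value received
]

# The limits of -40 to 85 are an assumed operating range, not a standard.
cleaned, useful_ratio = clean_readings(samples, low=-40.0, high=85.0)
print(len(cleaned), useful_ratio)
```

In practice the limits, the handling of missing values (removal versus repair), and the point in the stack where cleaning happens would depend on the specific deployment and on the decisions made for the other components.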

We plan to explore each of these components in greater detail in upcoming journals. For each topic, we will discuss traditional approaches, key innovations, and some specific applications demonstrating the importance of each. Readers should continue to follow the discussion on effectively and efficiently managing all of the data generated by their own IoT deployments.