KTH BigDataBase and Enabling Data Sharing
Design and development of a big data database that collects data from other databases, which in turn handle sensor data from various activities. The database will also handle the Internet of Things and artificial intelligence techniques such as machine learning. The database makes data available that can enhance innovation in the construction and real estate sectors.
There is great interest in placing research and development in real environments, and both industry and academia have a strong need to make cities, buildings and activities available for research, testing and education. Cities and buildings with sensors and connected devices are often available; however, structures, routines and systems for efficient and secure handling of data, and of the information connected to that data, are lacking. Companies and organizations often also have large datasets stored, but relevant legal structures for how the data can and should be used are often missing. In order to use data and understand cities, buildings and businesses better, an open and transparent data management system is needed: an open database.
The development of an open database also raises questions about integrity and data security. KTH Live-In Lab (LIL) has already carried out a project on GDPR and smart buildings in collaboration with the Law and Information Technology department at Stockholm University (SU), and a project on ethics testing and smart buildings is ongoing. Furthermore, talks with SU are underway regarding a national research database.
Many research groups and teachers currently lack real data to extract information from, which means that research and education often have to rely on simulations based on fabricated datasets for testing and verification of models or prototypes. KTH BigDataBase (KTHBD) will provide both a large database and real-time data to research groups that have not previously had access to such data. What is more, KTHBD will provide expertise on how the data has been generated, which increases the validity of the research results produced.
This project aims to set requirements for and implement a database that in a scalable way can handle large amounts of data with an initial focus on property and user data as well as data from cities. It is about designing and developing a big data database that collects data from other existing databases which, in turn, handle sensor data from different activities. The database will also handle Internet of Things and artificial intelligence technologies, such as machine learning, to provide data to various stakeholder groups, researchers, students, and to small and medium-sized enterprises and property owners, for the purpose of enhancing innovation and, thereby, strengthening the competitiveness of Sweden.
The underlying hypothesis of this database project is that innovation in the construction and real estate sectors can be made visible in completely new ways and to new groups by showing the usefulness of real estate-related data, e.g., by increasing both managers' and residents' understanding of how buildings actually operate.
- Create a database design, and proposals for structures for information management, information dissemination and collaboration.
- The database will be able to handle data from KTH Live-In Lab and other buildings that are intended to become part of KTH Live-In Lab's infrastructure.
- Data will be made available to all different actors involved, from students and researchers to companies and authorities, in a secure and reliable way.
- Issues such as data quality, scalability and communication speed will be considered and examined carefully.
- In addition to a database prototype, the project will also result in guidelines, requirements specifications, methods and guides for how different actors who want to handle large amounts of data and information can work with storage, accessibility and information dissemination.
Aims and objectives
The project aims to develop structures for information management and data dissemination, primarily from operations within KTH. These structures are tested by building a full-scale database for KTH Live-In Lab and KTH-MIT Senseable Stockholm Lab. The structures must be generic and usable for the development of databases for other types of data. Data must be made available to various actors in a secure and reliable way. Issues such as data quality, scalability and communication speed will be considered.
The project objective is a scalable and open database prototype that supports the users' needs, and collects and provides data and information. A long-term objective is to be able to model and predict events, by using methods such as machine learning, and thereby automate processes in new areas by linking data from previously separate datasets / activities. Another long-term objective is improving the technology (sensor behaviors, products and services), the methods and the behavioral patterns.
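As a rough illustration of what "linking data from previously separate datasets" could look like in practice, the sketch below joins two small record sets on a shared hourly key and fits a straight line to the result. All field names (`hour`, `heating_kwh`, `outdoor_temp`), the helper functions and the values are invented for the example; they are assumptions and do not describe KTHBD's actual schema or methods.

```python
def link_by_hour(building, weather):
    """Inner-join two lists of records on their shared 'hour' key."""
    weather_by_hour = {w["hour"]: w for w in weather}
    linked = []
    for b in building:
        w = weather_by_hour.get(b["hour"])
        if w is not None:  # keep only hours present in both datasets
            linked.append({
                "hour": b["hour"],
                "heating_kwh": b["heating_kwh"],
                "outdoor_temp": w["outdoor_temp"],
            })
    return linked


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented sample data standing in for two previously separate sources.
building = [
    {"hour": "2021-03-01T00", "heating_kwh": 10.0},
    {"hour": "2021-03-01T01", "heating_kwh": 12.0},
    {"hour": "2021-03-01T02", "heating_kwh": 14.0},
    {"hour": "2021-03-01T03", "heating_kwh": 15.0},  # no matching weather record
]
weather = [
    {"hour": "2021-03-01T00", "outdoor_temp": 0.0},
    {"hour": "2021-03-01T01", "outdoor_temp": -2.0},
    {"hour": "2021-03-01T02", "outdoor_temp": -4.0},
]

linked = link_by_hour(building, weather)
slope, intercept = fit_line(
    [r["outdoor_temp"] for r in linked],
    [r["heating_kwh"] for r in linked],
)
# Predicted heating demand at an outdoor temperature not in the sample.
predicted = slope * -6.0 + intercept
```

A linear fit is only a stand-in here for the machine learning methods the project mentions; the point of the sketch is that a prediction becomes possible only once the two sources share a common key.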
After reviewing some of today’s larger and more comprehensive solutions that could meet the needs of both KTH Live-In Lab and KTH in general, we have concluded that a database solution based on the open-source platform Hadoop would be suitable for our purpose.
This solution is used and further developed by several major operators, such as Google and British Royal Mail.
Because a solution of this scale and versatility makes for a large project in terms of planning and character, it will be developed and evaluated in stages. This way we will be able to gradually expand its functionality as the Live-In Labs grow and new needs arise.
- “As a student, I would like to analyse data (or rather that something analyses the data for me!) in order to improve my studying condition.” This can be data that comes from different sources and lives unstructured in KTHBD.
- “As a researcher, I want to be able to access raw data (real-time and historical) in order to be able to develop new services.” This can be data that comes from different sources and lives unstructured in KTHBD.
- “As an operator, I want to be able to create simple diagrams and aggregated datasets, and be able to depersonalize data for web publishing.” This can be data that comes from different sources and lives unstructured in KTHBD.
- “As a teacher, I want to be able to retrieve depersonalized data to be able to create case studies / projects around buildings and systems.” This can be data that comes from different sources and lives unstructured in KTHBD.
- “As a researcher, I would like LIL's data to be made available so that I can analyse and explore value-creating innovative services.”
- “As a researcher, I would like LIL's systems and installations in the homes to be made available so that I can involve the tenants in testing new innovative products and services.”
- “As a financier, I want research projects conducted in LIL to be published so that I can take part of methods and results to gain better insight and knowledge.”
- “As a financier, I want to be able to initiate research and development projects in LIL so that I can act as an involved and driven partner.”
- “As a financier, I want research projects conducted in LIL to generate value-creating and commercially viable solutions so that I can act as a business partner.”
- “As a researcher, I would like to use ... information of the inhabitants (age, activity...) and information about the different sites (outside temperature, light condition...) in order to ponder my findings.”
- “As a startup in sensor development, I want to be able to verify raw data from self-developed sensors against existing control data from KTH LIL, and be able to train mathematical models to extrapolate new data.”
- “As a researcher, I want to be able to install new sensors and systems and get the values from these into the same system as existing sensors / systems.”
With the above needs as a starting point, we are convinced that a big data solution (KTHBD) will meet the existing needs at KTH today and in the foreseeable future.
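Several of the needs listed above involve depersonalized and aggregated data (the operator and teacher stories in particular). The following is a minimal sketch of one way this could be done, assuming readings arrive as simple records. The field names, the `SECRET_KEY` value and the helper functions are hypothetical and are not taken from KTHBD's actual design.

```python
import hashlib
import hmac
from collections import defaultdict
from statistics import mean

# Hypothetical secret; in practice it would be kept outside the dataset.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(tenant_id, key=SECRET_KEY):
    """Replace an identifier with a keyed hash so records can still be grouped."""
    digest = hmac.new(key, tenant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def depersonalize(record):
    """Drop direct identifiers (such as name) and coarsen the timestamp to the hour."""
    return {
        "tenant": pseudonymize(record["tenant_id"]),
        "hour": record["timestamp"][:13],  # "2021-03-01T08:15:00" -> "2021-03-01T08"
        "sensor": record["sensor"],
        "value": record["value"],
    }


def hourly_means(records):
    """Aggregate depersonalized readings into a mean value per (sensor, hour)."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["sensor"], r["hour"])].append(r["value"])
    return {key: mean(values) for key, values in groups.items()}


# Invented sample readings, for illustration only.
raw = [
    {"tenant_id": "apt-1", "name": "A. Resident",
     "timestamp": "2021-03-01T08:15:00", "sensor": "indoor_temp", "value": 21.0},
    {"tenant_id": "apt-1", "name": "A. Resident",
     "timestamp": "2021-03-01T08:45:00", "sensor": "indoor_temp", "value": 22.0},
    {"tenant_id": "apt-2", "name": "B. Resident",
     "timestamp": "2021-03-01T08:30:00", "sensor": "indoor_temp", "value": 20.0},
]

clean = [depersonalize(r) for r in raw]
summary = hourly_means(clean)
print(summary)
```

Note that a keyed hash of this kind is pseudonymization rather than full anonymization: whoever holds the key can re-link records to tenants, so publishing such data would still need the kind of legal and ethical review discussed earlier in this description.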