Health and wellbeing data can help solve many individual, societal and even global health challenges. However, using the data raises many country-specific legal and ethical issues, which is why a large share of the health data collected still goes unused.
One solution to the challenge is synthetic data, which is being used more and more in product development. In the "Synthetic Wellbeing Data Incubator" project of the University of Jyväskylä and the city of Jyväskylä, measurement data provided by the Finnish Institute of High Performance Sport KIHU is synthesized. In addition, the Faculty of Sport and Health Sciences of the University of Jyväskylä is measuring fitness enthusiasts over a period of almost a year, and the data obtained from these measurements is synthesized by the university's Faculty of Information Technology.
Like organic data, but secure
Synthetic data interests companies for many reasons. When the amount of available organic data is small, synthetic data can be used to supplement the data pool. Creating synthetic data is faster and often also more cost-effective than collecting real data.
Above all, synthetic data solves problems related to privacy protection: a synthetic copy of sensitive health data is detached from real individuals, so the risk of identifying a person from the data is reduced or even eliminated. This also enables various collaborative projects, as the data can be shared and processed more openly.
– Synthetic health and wellbeing data is artificial data with the same properties as real data. It looks the same and leads to the same conclusions as real data, but it does not involve problems related to personal data protection, says Joonas Tuomikoski, doctoral researcher at the University of Jyväskylä.
Today, synthetic data is largely created using machine learning methods.
– Synthetic data can also be produced by simulation, but in exercise and sports we are often collecting long-term data, in which case simulation is more difficult because of the many individual characteristics of people that affect their training over a long period of time, Tuomikoski points out.
Extreme values challenge thinking – what if they are lost?
One company familiar with synthesizing and using health data is Terveystalo. Oskar Niemenoja, Head of Data, Analytics and Business Intelligence, says that Terveystalo wants to be actively involved in developing new tools and solutions for healthcare. He believes that a lot of social benefits could be achieved by sharing health data more openly than at present.
– The more we can pool data collected throughout society, the better it fosters ideas and boosts innovation among all parties, he says.
Terveystalo has noticed that, despite its benefits, synthetic data comes with a challenge that stems from a particular feature of machine learning modeling: the removal of extreme values.
– In general, the most interesting findings are, for example, extreme values referring to rare diseases, which do not behave "nicely" in statistics. When dealing with data, we would have to make a choice and base our research on safe and nicely behaved data, thus losing the extreme values and their benefits at the scientific level. If these values were included, they would be more identifiable in theory, and we would lose the benefit of the anonymisation that synthesis brings, Niemenoja describes.
– The interesting question is whether it would be possible to create data that is both useful and absolutely anonymous and contains these atypical data points that challenge thinking and serve as a stimulus for scientific insights, he reflects.
– We also want to contribute to supporting basic research on the subject.
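The loss of extreme values that Niemenoja describes can be illustrated with a small toy sketch. It is not Terveystalo's or the project's method: the heart-rate values are invented, and the "synthesizer" is assumed to be nothing more than a single normal distribution fitted to the data. Real generative models are far more capable, but the same smoothing tendency applies.

```python
import random
import statistics

# Hypothetical resting heart rates (bpm): 98 typical values plus two rare extremes.
real = [55 + (i % 11) for i in range(98)] + [120, 125]

# Assumed toy "synthesizer": fit one normal distribution to the real data
# and draw the same number of synthetic values from it.
mu = statistics.mean(real)
sigma = statistics.pstdev(real)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
synthetic = [rng.gauss(mu, sigma) for _ in range(len(real))]


def count_extremes(values, threshold=110):
    """Count values above the (arbitrarily chosen) threshold."""
    return sum(1 for v in values if v > threshold)


print("extreme values in real data:     ", count_extremes(real))
print("extreme values in synthetic data:", count_extremes(synthetic))
```

The two real extremes lie more than five standard deviations above the fitted mean, so a model of this kind almost never reproduces them. Keeping them would require a model that follows individual records more closely, which is exactly where the identifiability concern arises.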
Solutions to individual and societal issues
Jukka Perkiö, Sprint AI's Lead AI Architect and Specialist at the University of Jyväskylä, sees great opportunities in the use of synthetic data and digital twins, both in the development of elite sports at the individual level and in solving societal challenges.
– For example, at the individual level, we can model a digital twin of both the athlete and reference athletes representing the target state. By describing the differences between these twins, we can use scientific methods to decide what needs to be changed in training so that the athlete’s development moves towards the target state, Perkiö says, pointing out that in practice the concept is of course more complicated than this simplified example.
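The comparison Perkiö outlines can be sketched in a heavily simplified form. In this sketch each "twin" is assumed to be just a handful of summary metrics, and the target state is taken as the average of the reference athletes. The metric names and numbers are hypothetical and do not come from the project.

```python
from statistics import mean

# Hypothetical summary metrics standing in for the athlete's digital twin.
athlete = {"weekly_training_hours": 8.0, "vo2max": 52.0, "sleep_hours": 6.5}

# Hypothetical reference athletes representing the target state.
reference_athletes = [
    {"weekly_training_hours": 10.0, "vo2max": 58.0, "sleep_hours": 8.0},
    {"weekly_training_hours": 12.0, "vo2max": 62.0, "sleep_hours": 7.5},
]


def gap_to_target(athlete, references):
    """Return, per metric, the reference average minus the athlete's value."""
    return {
        metric: mean(ref[metric] for ref in references) - value
        for metric, value in athlete.items()
    }


gaps = gap_to_target(athlete, reference_athletes)
for metric, gap in gaps.items():
    print(f"{metric}: {gap:+.2f}")
```

A positive gap means the athlete is below the reference average for that metric. The metrics are in different units, so the gaps are not directly comparable with one another, and a real digital twin would also model how these values develop over time.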
On a societal level, a digital twin could be used to find solutions to, for example, the immobility epidemic.
– We have a huge societal challenge in how we can reach and motivate the groups most in need of help, says Perkiö.
– By modeling subpopulations from a larger sample, it is possible to study how public health changes can be generalized to the level of the entire population. The idea is that people’s social relationships, demographic similarity and the digital twin created from the material help us plan and target the right measures to the right target group, he explains.
Inviting companies to join the Wellbeing Data Lab
Program Manager Nina Rautiainen from the Business Development Unit of the City of Jyväskylä points out that the data economy in general has huge international business potential – an estimated 500 billion euros in Europe alone.
– We have the conditions to make Finland a global center point for wellbeing expertise, she believes.
The Wellbeing Data Lab community, a joint project between the University of Jyväskylä and the City of Jyväskylä, is accelerating the large-scale use of synthetic data. It develops synthetic data from exercise and wellbeing data, which is used to support the product development of companies working in the wellbeing field and to create successful innovations. Companies like Terveystalo, Polar, Solita and Firstbeat are already involved in the project.
– In Jyväskylä alone, we have a huge arsenal of expertise in the field of wellbeing, which can collaboratively be processed into recommendations, decision-making support and innovation. However, no one can do this alone, which is why companies and researchers need to work together for change, she says.
Rautiainen encourages companies to be bold and get involved if they are at all interested in using synthetic data.
– All companies that own organic data and have the motivation to start modeling synthetic data from it can join.
Interested? Please contact Project Specialist Taija Lappeteläinen: email@example.com
Joonas Tuomikoski, Oskar Niemenoja, Jukka Perkiö and Nina Rautiainen spoke at the “Possibilities of Synthetic Wellbeing Data in The Product Development of Companies and Other Organizations” seminar held in Jyväskylä on November 3, 2023.
The synthetic wellbeing data incubator project, Wellbeing Data Lab, is a joint project between the University of Jyväskylä (the Faculty of Sport and Health Sciences and the Faculty of Information Technology) and the City of Jyväskylä. It develops synthetic data from data related to exercise and wellbeing to support multifaceted product development innovations in companies. The project is co-financed by the European Union.