Machine translations by Deepl

Connected cars generate seas of data

Privacy First follows the data trail of connected cars, starting at the beginning: the hundreds of sensors that generate endless streams of ones and zeros. Part of the many Gigabytes of data generated per hour by a modern car is passed on to the manufacturer. The advent of 5G opens up possibilities for them and others that were not there until recently.  


This piece in five points:

  • Modern cars are equipped with hundreds of sensors that constantly measure and monitor all kinds of things. These days, that includes cameras, radar, lidar and (ultrasonic) microphones that scan the vehicle's surroundings for the purpose of driver assistance systems.

  • Collectively, those sensors generate seas of Probe Vehicle Data - many Gigabytes per hour. This ranges from technical data that gives insight into the condition of the vehicle to biometric data of its occupants.

  • Floating vehicle data also exist: this is the type of data that, while 'coming from the car', is not generated by the car itself. Its source is external (navigation) boxes or mobile phones (car apps).

  • The data arising from the car itself is processed and analysed on board in real time using edge computing, and much of it is deleted immediately. A relatively small part of the data is sent (compressed) to the manufacturer.

  • To get data from the car to the cloud, car manufacturers are working closely with telecom companies, among others. Whereas the capabilities of the 4G network were still quite limited, with the arrival of 5G they have been stretched considerably. 5G should provide a big step in Europe in terms of Connected and Automated Mobility.

The status of the right rear door, that of the front left window, the position of the sunroof, the position of the gear lever, the angle of the seats, the pressure of the tyres, the oil level - the list is endless. These are just some of the 246 vehicle data points in the data catalogue from Caruso ('From Connected Cars to Connected Business'). This German company processes vehicle data from 16 major car manufacturers, including Audi, Fiat, Ford, Peugeot and Renault.

High Mobility ('Powering your business with car data'), also German, is in the same market and serves 15 car manufacturers. This company distinguishes 58 categories (!) of vehicle data and processes as many as 668 different types of data. Otonomo, a US competitor of Caruso and High Mobility, says it receives on average more than 3.4 billion sensor readings from 50 million cars a day. (More on such companies next time.)

400 Sensors

The numbers quoted are a good indication of the number of sensors in modern cars. After all, all the data that processors like Caruso receive is generated by sensors. How many there are varies from model to model. A new, slightly more expensive car may easily contain 400 sensors, and the number keeps growing.

Manufacturers can fit as many sensors as they like: the EU does not place any restrictions on this (in the European regulations for vehicle type-approval, the word 'sensor' appears only four times). As a result of driving assistance systems mandated by the EU (Advanced Driver Assistance Systems, ADAS), the number of sensors in cars has skyrocketed in recent years. New types have also been added to scan the vehicle's environment, such as cameras, radar, lidar and (ultrasonic) microphones.

Ultrasonic sensors are used to determine the distance to objects, which is why you hear those nervous beeps while parking. Cameras (3D) are good at detecting small, stationary objects; radar (radio waves) and lidar (laser pulses, 2D) are better at detecting moving objects, especially at night or in poor weather conditions. Radar and lidar will play a more prominent role especially in fully autonomous vehicles.

In today's driver assistance systems, cameras matter most. Cameras at the front of the car are most common, but they are also increasingly built in at the rear and at the sides (and in the interior). With eight cameras, a Tesla Model 3 sees everything around it, up to 250 metres away.

Electronic Control Units (ECUs)
In the late 1970s, General Motors introduced the Electronic Control Unit (ECU), a small computer that - based on what sensors transmit - controls one or more components: the drive train, the brakes, the airbags, and so on. Today, a high-end car easily has around 150 ECUs, all connected to the CAN bus, the vehicle's internal communication network. Also connected to it is one of the most important ECUs, the Telematics Control Unit (TCU), which makes the car 'connected'. Equipped with one or more SIM cards, the TCU takes care of (external) communication with the car manufacturer, with other cars (V2V) or with the infrastructure (V2X).

V2V and V2X are still in their infancy. For now, the car mainly talks to the manufacturer. But what kind of data are we talking about exactly?

Probe Vehicle Data versus Floating Vehicle Data

In general terms, cars involve three types of data:

  • data about vehicles (manufacturing and maintenance history)
  • data for vehicles (technical data for repair parts)
  • data from vehicles (broken down below)

This article is mainly about data from vehicles, also called Probe Vehicle Data, as opposed to Floating Vehicle Data. The latter is the type of data that, while 'coming from the car', is not generated by the car itself. Its source is external (navigation) boxes or mobile phones (car apps) that track locations, routes and driving behaviour, among other things. In the Netherlands alone, there are about a hundred service providers that work with boxes installed in the car 'afterwards'. Floating Vehicle Data are at least as useful and valuable to different parties as Probe Vehicle Data.

Probe Vehicle Data, i.e. data from vehicles, can be divided into four types:

  • technical data
  • user data
  • infotainment data
  • biometric data

Strictly speaking, this is a spectrum, with purely technical data at one end and pure personal data at the other. Legally, the difficulty is that these two types of data overlap considerably.

Technical data give insight into the overall condition and functioning of the car.

User data cover all the settings, such as the seats, mirrors, lights, wipers, and on-board systems (such as cruise control). But your driving behaviour is also recorded: how do you steer, shift, accelerate and brake? As are location data: where have you been, and when?

Infotainment data (for information and entertainment) relate to the use of radio and navigation, among others, and all kinds of services offered for free or as subscriptions through the dashboard screen.

Biometric data are also increasingly recorded these days: a camera in the rear-view mirror, for example, films the driver and any occupants. More and more cars are equipped with voice control and recognise your voice. The car also measures, for example, how much you weigh as soon as you sit down, whether you have consumed alcohol, whether you doze off at the wheel, and what your heart rate is. What happens to this data, and who gets hold of it, is often not clear.

Diagnostic error codes

All in all, given the number of sensors, there seems to be very little that car manufacturers do not want to know about. The data they receive is especially useful for early detection of defects and areas of concern for maintenance. Diagnostic fault codes (Diagnostic Trouble Codes, DTCs) make clear which parts are due for overhaul.
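To give an idea of what such a fault code looks like, here is a minimal sketch that turns two raw bytes into a readable code such as 'P0301'. It assumes the common two-byte OBD-II layout (as standardised in SAE J2012); whether and how manufacturers forward these codes to their cloud is not something this article can confirm, and the function name `decode_dtc` is purely illustrative.

```python
# Assumption: two-byte OBD-II encoding of a Diagnostic Trouble Code.
# The top two bits of the first byte select the system letter.
SYSTEM_LETTERS = "PCBU"  # Powertrain, Chassis, Body, network (U)

def decode_dtc(high_byte, low_byte):
    """Return the five-character code for a two-byte DTC."""
    letter = SYSTEM_LETTERS[(high_byte >> 6) & 0b11]  # bits 7-6: system
    first_digit = (high_byte >> 4) & 0b11             # bits 5-4: 0 to 3
    second_char = high_byte & 0x0F                    # bits 3-0: hex digit
    return f"{letter}{first_digit}{second_char:X}{low_byte:02X}"

print(decode_dtc(0x03, 0x01))  # P0301
print(decode_dtc(0xC1, 0x00))  # U0100
```

A single code of this kind takes only two bytes, which illustrates why diagnostic data is cheap to transmit compared with camera footage.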

In addition, the data may have commercial value. That value increases as that data can be combined with data already collected previously, or with data from other cars in the neighbourhood, or of the same make.

Yet several insiders tell Privacy First that car manufacturers are far from knowing exactly what they can, or should, do with all the data, and are also more careful with it than is often claimed.

Then again, Reuters' early April report on Tesla is not exactly reassuring. Employees of the electric carmaker appear to have shared all sorts of sensitive (video) footage from its cars between 2019 and 2022, including of a completely naked man. In 2020, the brand received a Big Brother Award because it allegedly monitors occupants and the vicinity of its vehicles systematically and flouts privacy laws.

The question is, however, to what extent US-based Tesla, which has been frequently discredited, is representative of the entire car market. The fact remains that whenever such data is collected, legitimately or not - and Tesla is certainly not alone in this - breaches or leaks are always lurking.

25 GB or a multiple?

Back to vehicle data. Read up on the subject and you come across one figure everywhere: 25 gigabytes. That is how much data a modern car is said to generate per hour. We too mentioned this value in a previous article, but it is worth revisiting. The original source of this otherwise unsubstantiated figure appears to be a 2015 'white paper' by the Japanese conglomerate Hitachi. We are now eight years on, and the latest cars can't help but spit out an even greater number of zeros and ones.

After all, in the white paper in question, Hitachi itself talks about "exponential data growth" in connection with connected cars, and also reports that test cars equipped with cameras and additional sensors (at the time) generated ten times that 25 GB per hour. What applied only to test cars just under a decade ago is now common practice for many vehicles.

According to the Dutch financial daily FD, we are now talking about 1,400 GB - i.e. almost one and a half terabytes - per hour. The volumes are so large mainly because of all the HD camera footage. A big increase, then, but still peanuts compared to what is to come.

Back in 2016, the then CEO of chip and computer components manufacturer Intel estimated that self-driving cars would generate as much as 160 times more data per hour than the then-standard 25 GB. That amounts to 4,000 GB - or 4 terabytes - per hour, and is equivalent to the combined data production of about 3,000 people simultaneously accessing the internet. In 2017, however, an employee of Lucid Motors, a US electric car manufacturer, put that amount many times higher still: up to 19 terabytes per hour.

A slide from a presentation on self-driving cars by Stephan Heinrich, former systems architect at Lucid Motors, 2017.
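To keep the various estimates apart, the arithmetic behind the figures quoted above can be laid out in a few lines. This is only a sketch that restates the numbers from the article; the variable names are our own, and it assumes decimal units (1 terabyte = 1,000 gigabytes).

```python
# Figures as quoted in the article, per hour of driving.
BASELINE_GB = 25        # Hitachi white paper, 2015
TEST_CAR_FACTOR = 10    # Hitachi test cars with cameras and extra sensors
FD_GB = 1400            # figure attributed to the FD
INTEL_FACTOR = 160      # Intel CEO's 2016 estimate for self-driving cars
LUCID_TB = 19           # upper estimate from Lucid Motors, 2017

GB_PER_TB = 1000        # assumption: decimal units

def gb_to_tb(gigabytes):
    """Convert gigabytes to terabytes (decimal)."""
    return gigabytes / GB_PER_TB

test_car_gb = BASELINE_GB * TEST_CAR_FACTOR         # 250 GB per hour
intel_gb = BASELINE_GB * INTEL_FACTOR               # 4000 GB per hour
lucid_vs_baseline = LUCID_TB * GB_PER_TB / BASELINE_GB

print(test_car_gb, gb_to_tb(FD_GB), gb_to_tb(intel_gb), lucid_vs_baseline)
# 250 1.4 4.0 760.0
```

In other words, the highest estimate is 760 times the 25 GB figure that is still quoted everywhere.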

By now, this is well beyond our imagination. For most people, these will be no more than fun facts. Moreover, self-driving cars are still a thing of the future. However, this does not alter the fact that more and more autonomous (safety) systems are being added to vehicles and that more and more data is needed to make them work properly. (Those driver-assistance systems, by the way, often still work far from flawlessly today.)

What data is relevant?

The question remains of how manufacturers manage this data explosion for themselves. An interplay of hardware (advanced flash memory technology) and high-speed data storage software enables the simultaneous processing of multiple large data streams.

Before a small portion of vehicle data - via an encrypted connection - reaches the manufacturer's cloud, that data is first analysed and processed in the vehicle itself, in real-time. This is done on the basis of edge computing. It is so called because this process takes place right next to the sensors and controllers in the car: 'at the edge' of the data source. A platform favoured by car manufacturers for this is the LinkedIn-developed Apache Kafka.

The car distinguishes between relevant data and irrelevant data:

  • Relevant data are used to complete a task, and/or sent compressed to the cloud. Compressing makes sense because many sensor values remain constant for some time and it is too costly (and adds nothing) to send the same code, say, 500 times in a row.
  • Irrelevant data are deleted, either immediately or within 24 hours. This concerns the bulk of the data. What is considered relevant (at what time) and what is not is determined by the manufacturer, who devises customisable algorithms for this. Those algorithms also determine the interval at which data is sent. This happens between one and six times a minute, depending on the brand and model.
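The two steps described above, discarding irrelevant data and compressing what remains, can be illustrated with a small sketch. It is not any manufacturer's actual algorithm: the signal names and the set `RELEVANT_SIGNALS` are hypothetical, and run-length encoding is used here simply as one straightforward way to avoid sending the same value 500 times in a row.

```python
from itertools import groupby

# Hypothetical selection of "relevant" signals; in practice the
# manufacturer's own algorithms decide this.
RELEVANT_SIGNALS = {"tyre_pressure_kpa", "oil_level_pct"}

def filter_relevant(readings):
    """Keep only readings from relevant signals; everything else is dropped."""
    return [(name, value) for name, value in readings if name in RELEVANT_SIGNALS]

def run_length_encode(values):
    """Collapse runs of identical consecutive values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

# 500 identical tyre-pressure readings, a few readings this sketch treats
# as irrelevant, then a slightly lower pressure.
readings = (
    [("tyre_pressure_kpa", 230)] * 500
    + [("seat_angle_deg", 12)] * 3
    + [("tyre_pressure_kpa", 228)] * 2
)

relevant = filter_relevant(readings)
pressure_values = [value for name, value in relevant if name == "tyre_pressure_kpa"]
encoded = run_length_encode(pressure_values)
print(encoded)  # [(230, 500), (228, 2)]
```

Instead of 502 separate values, only two pairs would have to leave the car in this example.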

By no means all data is sent

So by no means all data leaves the connected car. Nor could all the data from the hundreds of millions of vehicles connected to the internet reach the cloud (in time): it would be pointless, the bandwidth of the mobile network could not handle it, and it would cost a huge amount of money.

However, data transmission for devices (including vehicles) equipped with special Machine-2-Machine (M2M) SIM cards with 12-digit, 097 numbers is cheaper than for 'normal' SIM cards with 10-digit, 06 numbers. Manufacturers are also buying such large volumes that this will reduce purchase prices from telecom providers.

A representative of a large European-based processor of vehicle data coming directly from cars reveals - on condition of anonymity - that his company's servers receive an average of one to one and a half Gigabytes of data per car per day. How much exactly depends heavily on the make, model and type of vehicle, year of manufacture and also the driving assistance systems in place. (The car manufacturers themselves receive even more data, but do not make everything available for third-party processing).

Third-party boxes that are retrofitted - to return to these for completeness - involve a considerably smaller amount of data. With such boxes, the (data connection of the) car manufacturer plays no role whatsoever. As a customer, you basically pay for the box's data connection yourself, while car manufacturers pay for their own data transfer.

Erik Kamps - the CEO of Crossyn, a Dutch company that mainly provides services based on data from boxes - says that the number of messages a vehicle sends varies between 24 and 48 million per month. Each message consists of numbers or simple input/output (I/O) values of a few bytes. Per month, a car sends a maximum of about 300 MB to the cloud. In short, this involves a lot of 'records', but only little volume.
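A quick calculation shows how many messages and little volume go together. The message counts come from the paragraph above; the size of 6 bytes per message is our own assumption to stand in for 'a few bytes'.

```python
# Message counts per month as quoted above.
MESSAGES_LOW = 24_000_000
MESSAGES_HIGH = 48_000_000

# Assumption: "a few bytes" is taken to be 6 bytes per message.
BYTES_PER_MESSAGE = 6

def monthly_volume_mb(messages, bytes_per_message):
    """Monthly data volume in megabytes (decimal: 1 MB = 1,000,000 bytes)."""
    return messages * bytes_per_message / 1_000_000

low_mb = monthly_volume_mb(MESSAGES_LOW, BYTES_PER_MESSAGE)
high_mb = monthly_volume_mb(MESSAGES_HIGH, BYTES_PER_MESSAGE)
print(low_mb, high_mb)  # 144.0 288.0
```

Under that assumption, even 48 million messages stay just below the roughly 300 MB per month mentioned above.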

Retrofitted boxes do not have access to all car data: they have a relatively limited dataset. The dataset of built-in, 'ex-factory' boxes is much more extensive and thus provides more bytes.

A quantity of, say, one GB a day normally goes to the cloud over 4G without too much trouble, and with ease over 5G. Whereas the capabilities of the 4G network were still quite limited, with the advent of 5G they have been stretched considerably: this makes it easier to actually transfer larger amounts of data over the air, for example if all kinds of camera images also need to be transmitted following an accident. There will be more and more cars on the market that support 5G technology.
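What one gigabyte a day means for the mobile network becomes clearer when it is converted into an average data rate. The sketch below does that conversion; the function name and the `active_hours` parameter are our own, and the one-hour case is only an assumption to show what happens if all data is sent while driving.

```python
SECONDS_PER_HOUR = 60 * 60

def average_kbit_per_s(gigabytes_per_day, active_hours=24):
    """Average rate in kbit/s needed to send a daily volume in the given hours."""
    bits = gigabytes_per_day * 1_000_000_000 * 8  # decimal gigabytes to bits
    return bits / (active_hours * SECONDS_PER_HOUR) / 1000

print(round(average_kbit_per_s(1), 1))                  # 92.6
print(round(average_kbit_per_s(1.5), 1))                # 138.9
print(round(average_kbit_per_s(1, active_hours=1), 1))  # 2222.2
```

Spread over a whole day, one gigabyte comes to less than 100 kbit/s on average; squeezed into a single hour of driving, it is a little over 2 Mbit/s.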

5G paves the way for Connected and Automated Mobility (CAM)

In recent years, the EU has commissioned a lot of research into 5G and Connected and Automated Mobility (CAM), particularly within the projects 5G Mobix and 5G Blueprint. This has included an extensive look at the cyber security and privacy aspects of connected cars (see, for example, pp. 59-69 of this report).

Regarding the 3.5 Gigahertz (GHz) 5G frequency band used for nationwide mobile communications in the Netherlands, KPN declared several years ago, after tests with smart vehicles, that self-driving cars and intelligent transport systems benefit from the huge capacity, fast communication capabilities, high reliability and minimal network delay that this spectrum offers. "The network shows no hiccups at all, even if there is a whole bus of schoolchildren at the traffic lights, all watching Netflix on their smartphones."

With the arrival of 5G and - in the future - 6G and self-driving cars, data transfer to the cloud will increase significantly, though. To keep this transfer to a minimum for really large amounts of data (and to avoid the 'latency' feared by manufacturers), edge computing will become increasingly important and, if necessary, another processing layer can be inserted in between: this is called 'fog computing'. Edge (and fog) computing thus do the necessary (pre-)sorting work, while the server farms of car manufacturers and affiliated data processors that make up the cloud provide further storage, processing, analysis and visualisation.

Also for the purpose of providing services in the field of mobility and data exchange within the automotive industry, the EU is investing over two billion euros in programmes to further develop edge computing and next-generation internet and cloud services. To this end, the European Alliance for Industrial Data, Edge and Cloud was created. It is just one of many initiatives from Brussels in the field of the Internet of Things, of which connected cars are the most striking sign.