Understanding Europe’s housing market: How Bayesian networks reveal hidden dynamics
Europe is gripped by a housing crisis. Rapid urban migration and the much-criticized property speculation boom have led to chronic shortages in city centers. Planners and private developers are racing to fill these gaps, but to provide genuine solutions, they must build where the demand is. Given the increasing diversity of European cities and the new work-life patterns birthed by the pandemic, this isn’t as easy as it sounds.
Bayesian networks provide a way to cut through the confusion. While other mathematical models focus on the relationship between cause and effect, Bayesian networks identify each of the various causes at play and explore their relationships with one another, showing which factors are most important to the overall outcome. Not only that, they work on probabilities rather than certainties, an important caveat for the volatile real estate market.
Dr. Manuele Leonelli, Associate Professor of Statistics at IE University, and Álvaro García Murga, student of the Dual Degree in Business Administration + Data & Business Analytics, applied the Bayesian lens to analyze real estate data from Spain’s three biggest cities: Madrid, Barcelona and Valencia. The results clearly demonstrate that each city has its own set of house price drivers, meaning that a “must-have” in Madrid is barely even a “nice-to-have” in Valencia.
The results are relevant to any city that’s grappling with a housing shortage and wants to put the right solutions in place. By identifying what people are looking for in their property, we can build cities that truly understand the people who inhabit them.
Putting the data into context
The study came about partly by chance. Through LinkedIn, Dr. Leonelli came across a vast dataset compiled by Idealista, Spain’s biggest property website, including 180,000 geo-referenced housing listings.
Dr. Leonelli was immediately struck by the size of the dataset. It’s unusual for such a massive dataset on this kind of topic to be available free of charge. There was a personal motivation, too; as a tenant in Madrid, he is accustomed to the idiosyncrasies of the Spanish property market and relished the chance to get a better understanding of the situation.
Spain is one of the fastest-growing economies in Europe and the world’s second-most popular tourist destination, but its biggest cities are struggling to keep up. The Bank of Spain estimates the current housing deficit at around 700,000 properties, and the suffocating demand, exacerbated by the so-called “Airbnb effect,” is driving rents beyond what many ordinary people can pay.
“Having some understanding of the Madrid market and of Barcelona and Valencia meant we could really understand what the data was telling us,” explains Dr. Leonelli. “A model can tell you something, data can tell you something, but it's the researcher who puts it into context.”
How Bayesian networks provide transparency for housing data
At their simplest, Bayesian networks provide a way of managing uncertainty by looking at the traditional cause-and-effect relationship from both directions. They examine individual variables and assess their likelihood of causing a particular outcome, but they also analyze the outcome and predict which factors are most likely to have caused it. Each variable is described by a probability, and new evidence about one variable updates the probabilities of the variables connected to it. For example, if one possible cause of an observed outcome is ruled out, the remaining candidate causes become more likely.
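The two-way reasoning described above can be sketched with a toy network of two causes and one outcome. This is a minimal illustration, not the study's model: the variables (`central`, `pool`, a high asking price) and every probability below are hypothetical values chosen only to show the mechanics.

```python
from itertools import product

# Hypothetical priors for two independent causes of a high asking price.
P_CENTRAL = 0.3  # P(property is central)
P_POOL = 0.2     # P(property has a pool)

# Hypothetical conditional table: P(high price | central, pool).
P_HIGH = {
    (True, True): 0.95,
    (True, False): 0.70,
    (False, True): 0.60,
    (False, False): 0.05,
}


def joint(central, pool, high):
    """Joint probability of one full assignment of the three variables."""
    p_causes = (P_CENTRAL if central else 1 - P_CENTRAL) * (
        P_POOL if pool else 1 - P_POOL
    )
    p_high = P_HIGH[(central, pool)]
    return p_causes * (p_high if high else 1 - p_high)


def posterior_central(high, pool=None):
    """P(central | high [, pool]) by enumerating the joint distribution."""
    pools = [True, False] if pool is None else [pool]
    numerator = sum(joint(True, pl, high) for pl in pools)
    denominator = sum(
        joint(c, pl, high) for c, pl in product([True, False], pools)
    )
    return numerator / denominator


if __name__ == "__main__":
    print("prior P(central):", P_CENTRAL)
    print("P(central | high price):", posterior_central(high=True))
    print("P(central | high price, no pool):",
          posterior_central(high=True, pool=False))
```

With these made-up numbers, observing a high price raises the probability that the property is central from 0.3 to roughly 0.67, and learning that there is no pool raises it further to roughly 0.86: ruling out one explanation shifts belief toward the other.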
Bayesian models have long been used in environmental modelling, and they are commonly applied in medical research to study everything from arthritis to neuroscience. In real estate, however, researchers have typically used other statistical approaches, such as linear regression, which estimates how much each factor contributes to the price by fitting a best-fit line through the data. Dr. Leonelli believes that Bayesian networks are ideally suited to the property world, as they illustrate how the ecosystem actually works and may lead to more realistic predictions.
“Traditional econometric models are about measuring the effect of a specific variable that we’re interested in: in this case, prices. Those models are designed to find the effect that a particular input has on the output. Bayesian networks do something a little bit different. They model how the inputs interact with each other.”
Simplicity is another advantage. AI and machine learning may already be able to predict house prices with a certain degree of accuracy, but many models are “black box,” meaning that they don’t explain their calculations. Bayesian networks, by contrast, are scalable and transparent. At a time when governments around the world are being criticized for perceived secrecy, the Bayesian approach can help them justify decisions around taxation and investment.
“How could a government say, ‘We implemented this policy because an AI model told us to,’ and we have no understanding as to why?” Dr. Leonelli points out. “Housing is something that the public, the government and the private sector are all interested in. We want to create something that supports humans.”
Crunching the numbers
In building this robust, human-centric model, the Idealista dataset provided a perfect launchpad. Each listing came packed with information, including the property’s latitude, longitude and key structural attributes, such as the number of bedrooms, year of construction and distances to key reference points such as the city center. Using data from Google Maps and Geopy, which converts addresses and landmarks into geographic coordinates, Dr. Leonelli and his team were able to add additional factors like the distance to the closest park, supermarket and other facilities.
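To make the "distance to the closest park" idea concrete, here is a minimal sketch of how such a feature could be derived from latitude and longitude. It uses a plain haversine formula as a stand-in, not Geopy or Google Maps, and it is not the team's actual pipeline. The function names and the approximate coordinates below are illustrative assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_facility(listing, facilities):
    """Return (name, distance_km) of the facility closest to the listing."""
    lat, lon = listing
    distances = {
        name: haversine_km(lat, lon, f_lat, f_lon)
        for name, (f_lat, f_lon) in facilities.items()
    }
    name = min(distances, key=distances.get)
    return name, distances[name]


# Approximate, illustrative coordinates (assumptions, not study data).
listing = (40.4168, -3.7038)  # roughly Puerta del Sol, Madrid
parks = {
    "El Retiro": (40.4153, -3.6845),
    "Casa de Campo": (40.4195, -3.7510),
}

if __name__ == "__main__":
    name, dist = nearest_facility(listing, parks)
    print(f"closest park: {name} ({dist:.2f} km)")
```

Straight-line distance is the simplest choice; walking or driving distance would need a routing service and is outside the scope of this sketch.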
They then condensed their vast trove of data into three primary groups of variables: structural factors (such as the age of the building, the constructed area, and the number of rooms and bathrooms), spatial information (distance to essential points nearby) and amenities (such as garden, terrace or swimming pool). While the desirability or social status of an area is impossible to gauge with raw data, the spatial variables included the number of properties available nearby, which may indirectly indicate popularity.
Perhaps the most notable aspect of the research was bootstrapping, which essentially involves drawing random samples from the dataset, with replacement, over and over again and refitting the model on each one. This road-tests the results and shows how stable they are without requiring fresh inputs.
In all, the team created 200 models from their data and identified the features that were consistent. “With bootstrapping, your conclusions are much more likely, and you are more certain that what you are actually estimating is a signal of a true relationship in the data,” Dr. Leonelli explains.
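The resample-and-compare idea can be sketched as follows. This is a simplified stand-in under stated assumptions: where the study refits a Bayesian network on each resample, the sketch only checks whether a feature's correlation with price clears a hypothetical threshold, and the data are synthetic. The 200 resamples mirror the figure quoted above; everything else (`bootstrap_stability`, the threshold of 0.2, the feature names) is illustrative.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def bootstrap_stability(rows, features, target,
                        n_models=200, threshold=0.2, seed=0):
    """Fraction of bootstrap resamples in which each feature is 'linked'
    to the target, i.e. |correlation| exceeds the (hypothetical) threshold."""
    rng = random.Random(seed)
    hits = {f: 0 for f in features}
    for _ in range(n_models):
        # Resample rows with replacement, keeping the original sample size.
        sample = rng.choices(rows, k=len(rows))
        ys = [r[target] for r in sample]
        for f in features:
            if abs(pearson([r[f] for r in sample], ys)) > threshold:
                hits[f] += 1
    return {f: hits[f] / n_models for f in features}


# Synthetic data: price depends on area, while 'noise' is unrelated to price.
_rng = random.Random(42)
rows = []
for _ in range(300):
    area = _rng.gauss(0, 1)
    rows.append({
        "area": area,
        "noise": _rng.gauss(0, 1),
        "price": 2 * area + _rng.gauss(0, 1),
    })

stability = bootstrap_stability(rows, ["area", "noise"], "price")

if __name__ == "__main__":
    for feature, share in stability.items():
        print(f"{feature}: linked to price in {share:.0%} of resamples")
```

A relationship that shows up in nearly every resample is likely a real signal, while one that appears only occasionally is more likely an artifact of a particular sample.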
Matching the data with the reality
These 200 models revealed a number of clear patterns, with each of the three cities displaying its own very specific pricing drivers.
While prices in all three cities were influenced by access to the most affluent corridors, such as Las Ramblas in Barcelona or Valencia’s Blasco Ibáñez, important contrasts emerged beyond these alluring enclaves. Prices in Madrid were driven by amenities inside the building, such as terraces, swimming pools and lifts. Barcelona’s prices were most heavily influenced by the type and condition of the property, as well as access to local supermarkets. Valencia’s were dominated by structural fundamentals, notably the age of the building.
Dr. Leonelli explains, “The models really told us that prices are driven by very distinct dynamics in the three cities. In a way, you might expect that, but you would also expect that some things would be universal: amenities, for example. You would think that they’re always important, but what we found was that in different cities, they matter more or less.”
It is easy to speculate on the reasons. Madrid’s average income far outstrips any other city in Spain, and this affluence, combined with its interior location, may explain the pull of desirable amenities such as swimming pools and terraces. Barcelona, on the other hand, is a densely packed city with little space for cars, which may underpin the desire for nearby retail outlets.
The biggest takeaway right now, Dr. Leonelli says, is the evidence that large differences exist across locations. Analyzing this data allowed him to reaffirm what he himself has experienced as a resident in Spain. And as he notes, when the data matches reality, it’s a good sign that the model is correct.
Solving the housing puzzle
Dr. Leonelli is at pains to stress that his model cannot predict the evolution of the housing market. Its purpose is to illustrate the fundamental drivers of the housing market to facilitate smarter decisions in the future. In this, he believes his model has provided a good proof of concept.
Given the clarity of the results obtained from his study, Dr. Leonelli feels there is clear potential for the further application of Bayesian statistics to real estate trends. “The model has already proven that we could actually support public decisions [with data]. It would be fantastic to bring this work forward and get policymakers or the private sector involved, because they could also give us more information to make it even better.”
He hopes to take this model further and study other European cities, but he highlights the importance of local knowledge and context in correctly analyzing datasets. Just as Spain’s cities each have unique factors affecting property demand, so will each country in Europe have its own real estate riddle to solve: a puzzle that combines culture, economy and trends, condensed into four walls.