A knowledge discovery mechanism to user requirement identification in building design

The purpose of this paper is to investigate how knowledge of the real estate market can be used to support user requirement identification. A construction project that is well adjusted to user requirements has higher value and undergoes only minor changes during its life cycle. As a consequence, renewal, refurbishment, and demolition are less frequent, which reduces waste generation, rework, and material consumption. This is especially important in housing customization markets. However, one of the challenges frequently faced by designers is how to properly identify user requirements, wishes, and needs, which are the essence of the briefing phase. In this context, real estate data can be useful to designers, since it reflects users' evaluation of building attributes. The research strategy uses a knowledge discovery mechanism composed of five steps: (1) formulation of a general database; (2) selection of specific data using Case-Based Reasoning; (3) enrichment of the data sample; (4) development of hedonic price models using regression analysis; and (5) simulation of the value of design alternatives. Based on an application of a hedonic price model to data from the middle-class housing market of Porto Alegre, Brazil, the main results indicate that the adjusted price models have sufficient detail and statistical precision to support decisions in the initial stage of design.


Introduction
The segment that comprises the activities of architecture, engineering, and construction (the AEC industry) is of great importance to society in economic, social, and environmental terms. A construction project that is well adjusted to user requirements increases building value and will probably undergo only minor changes during its life cycle (Koskela, 2000). However, identifying user requirements is not a simple task for designers. This is especially important in housing customization markets, such as pre-sales segments (Juan et al., 2006; Shin et al., 2008). The challenge faced by designers is how to identify user requirements, wishes, and needs from the initial stage of building design, known as briefing (RIBA, 2007).
The purpose of this paper is to investigate how knowledge of the real estate market can be used to support user requirement identification. We present a knowledge discovery mechanism designed to identify user requirements and to support the decision-making process. The proposal is based on hedonic price models, which are generated from market transaction data. The paper briefly discusses the design process, existing systems that support briefing, and related aspects. It then presents the proposal and some results obtained from a case study on price modeling of middle-class apartments, using data from Porto Alegre, a southern Brazilian city.

Building design process
The role of user requirements and the initial steps of design
Building design can also be regarded as a complex task. By its nature, building design is a creative process in which problems and solutions emerge simultaneously, in rhetorical, persuasive, and exploratory ways. It requires the identification and weighting of different needs, requirements, and wishes, which need to be properly translated into construction language to be incorporated into the final product. Design is multidisciplinary and has a significant influence on other processes, as well as on the final product, in terms of quality and value (Koskela, 2000; Macmillan et al., 2001; Tzortzopoulos et al., 2001).
Several models have been developed to represent the building design process. In the traditional RIBA framework, the initial stage of designing a building project is called Preparation, which is composed of appraisal ("identification of client's needs and objectives") and design brief ("development of initial statement of requirements"). The second stage is Design, which is composed of conceptual design, design development, and technical design, followed by the Pre-construction, Construction, and Use stages (RIBA, 2007). More recently, demolition and design for deconstruction have been included in the design agenda (Morgan and Stevenson, 2005).
A significant part of the value, cost, and waste generated in the life cycle of a building is defined in the initial phases of design. In some cases, early design decisions define 70% to 80% of the final cost (Bouchlaghem et al., 2006; Rafiq et al., 2005) and have a considerable impact on building performance (Wang et al., 2005).
The design process generates a large quantity of knowledge (Baldwin et al., 1999; Langford and Retik, 1996; Yusuf and Alshawi, 1999). According to Meniru et al. (2003), the success of the final design solution depends on how well the design team can coordinate knowledge at the earliest possible time. However, time constraints and inadequate communication between client and architect, and within the design team, have a negative influence on design quality (Ballard and Koskela, 1998; Kamara et al., 2000; Luck and McDonnell, 2006; Yu et al., 2005). Batty (1995) notes that lack of time can have a number of consequences, including reluctance on the part of the design team to take risks with new materials or systems.
Properly identifying user requirements in the early stage of design reduces the effort of redesign in subsequent stages. A good time to make decisions is during briefing. Briefing, also known as architectural programming, is an initial step of the design process and often the most important one, since it has to establish project goals and set a basis for developing the conceptual design (Peña and Parshall, 2001). Briefing was not needed in earlier times, when buildings were simpler. Since the Industrial Revolution, buildings have become increasingly specialized, requiring elaborate and specific briefs (Donia, 1998). Barrett et al. (1996) define briefing as a systemic process by which the ideas of the client or user are made explicit and formalized. In general, briefing is also regarded as a planning activity in the building design process.
Early approaches considered the brief a static document, produced at a specific point in time. However, several authors argue that briefing must be dynamic and practiced throughout the design process (Aouad et al., 1998; Barrett et al., 1996; Tzortzopoulos et al., 2006). For instance, in the RIBA Plan of Work, the brief evolves from an "initial brief" to a "detailed project brief", and this evolution occurs in parallel with the development of the conceptual and scheme design (Kamara et al., 2001; RIBA, 2007). Kamara et al. (2001) suggest that briefing may involve two parts: a strategic program and the brief itself.
Another way to improve design is through Decision Support Systems (DSS). DSS are interactive, computer-based systems that help decision-makers solve structured or unstructured problems that have multiple attributes, objectives, or goals (Power, 2002). Two main types of DSS have been used in design: Expert Systems (ES), also known as Knowledge-Based Systems or Knowledge-Driven Decision Support Systems, and systems using Case-Based Reasoning or other data mining tools.
An Expert System has a knowledge base (KB) and an inference engine. A KB is a collection of organized knowledge, rules, and procedures. The active component is the inference engine, which contains rules elicited from a domain expert. It is built using explicit, structured knowledge (Power, 2002). Several ES have been proposed in design, with applications in building refurbishment (Kaklauskas et al., 2005; Zavadskas et al., 2006), semi-automated housing design (González-Uriel and Roanes-Lozano, 2004), and housing evaluation (Natividade-Jesus et al., 2007).
More attention has also been given to DSS using Case-Based Reasoning (CBR). CBR is a common tool for reasoning and learning, with successful applications in several fields (Watson, 1997). This technique builds solutions for new problems from the solutions adopted for previous problems (described as "cases"), which are retrieved from a database through a case selection mechanism based on the similarity between the problem case and each case available in the database (Aamodt and Plaza, 1994; Kolodner, 1993; Watson, 1997). CBR has some advantages over ES: it does not need explicit models to obtain a solution, and it is flexible enough to work with large amounts of data. A further advantage is that CBR can learn from new cases, which makes it easy to keep the application updated. A weakness of CBR is the difficulty of adapting selected cases to produce numerical results (Watson, 1997). CBR has been applied in design since the late 1980s (Maher, 1987; Pearce et al., 1992).
There are some applications of CBR in briefing. SEED-Pro searches for client needs, budget, and constraints, generating an architectural program (Akin et al., 1995; Donia, 1998). Marir et al. (2000) describe a CBR system designed to improve the specification of construction projects; it integrates information and supports concurrent engineering and decision making for the effective management and realization of all stages of a construction project's life cycle. Van Leeuwen et al. (2000) propose a system for housing refurbishment that uses knowledge about architectural design, cost, and building products. The system considers user requirements through a CBR system whose case base contains a number of typical housing layouts. Serpell and Rueda (2007) developed a CBR system for briefing that uses a three-step process to search for similar cases when defining a new project.

Although the literature presents several efforts to improve the building design process, most of the examples rely on designer knowledge (managing design or team knowledge, or using previous experience expressed in cases or in rules elicited from experts).
There has been relatively little effort to discover building requirements based on user knowledge. However, it is possible to identify the preferences of building customers from real estate market information, specifically the price paid for each property. Since customers' decisions are mainly economically rational, prices are proportional to the quality level or, in a broader sense, to the value perceived by users. In other words, price is a proxy variable for the product's value.
Properties may be considered multi-dimensional commodities, given the simultaneous influence of several characteristics that form the final price. Properties are thus heterogeneous goods, each with a unique bundle of attributes. They differ in terms of design, size, inner configuration, construction quality, and location, for example. Therefore, there is a great variety of products in the real estate market, and their heterogeneity makes direct comparison difficult. As a consequence, it is initially hard to understand the relative importance of each property characteristic with respect to its share of the final price. Only the total price is known (Harvey, 1996; Lavender, 1990; Robinson, 1979).

Hedonic price models (HPM) seek to establish the relationship between the price of a property and its characteristics.
In hedonic models, goods are described through a "bundle of attributes" that gathers the characteristics that are important. The partial prices related to each attribute cannot be directly isolated, because there is no specific market for each one, so they are obtained indirectly. The implicit prices of these attributes, also called hedonic prices or "shadow prices", are the prices associated with each attribute of the property. By regressing the observed price on the characteristics of the building, the contribution of each attribute to the total price can be extracted. The tool generally used to obtain the coefficients that measure these contributions is multiple regression analysis (MRA), a well-known statistical tool used in almost all hedonic studies. Following hedonic price theory, the coefficients represent the prices that purchasers are willing to pay, on average, for these characteristics (Rosen, 1974; Sheppard, 1999).
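As a minimal sketch of how implicit prices are recovered by regression, the following Python example fits a linear hedonic model by ordinary least squares with NumPy. The function name `fit_hedonic`, the three attributes, and the "true" coefficients are illustrative assumptions, and the data are synthetic; none of the numbers come from this study.

```python
import numpy as np

def fit_hedonic(X, price):
    """Estimate a linear hedonic price model by ordinary least squares.

    X     : (n, k) array of property attributes
    price : (n,) array of observed prices
    Returns (intercept, coefficients); each coefficient is the implicit
    (hedonic) price of the corresponding attribute.
    """
    X = np.asarray(X, dtype=float)
    price = np.asarray(price, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(design, price, rcond=None)
    return beta[0], beta[1:]

# Synthetic, noise-free illustration (hypothetical attributes and prices).
rng = np.random.default_rng(0)
area = rng.uniform(60, 140, size=50)      # floor area, m2
bedrooms = rng.integers(1, 4, size=50)    # number of bedrooms (1 to 3)
balcony = rng.integers(0, 2, size=50)     # dummy: 1 if a balcony is present
X = np.column_stack([area, bedrooms, balcony])
true_coef = np.array([1.0, 5.0, 8.0])     # assumed implicit prices
price = 20.0 + X @ true_coef              # assumed base price of 20

intercept, coef = fit_hedonic(X, price)
```

Because the synthetic prices contain no noise, the fitted coefficients reproduce the assumed implicit prices; with real transaction data they would be estimates with standard errors, which a statistics package would report alongside them.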
To give effective support to designers, these models must be well founded, meaning that they must be based on sound data. The analysis of large databases is a typical challenge addressed by knowledge discovery in databases (KDD). KDD is a relatively novel approach to data analysis. It consists of a particular organization of data and a set of techniques that reveal knowledge presumed to be hidden in the data. This area appeared in the late 1980s as an alternative for the analysis of very large databases (Fayyad et al., 1996).
Knowledge discovery proceeds through three basic stages: preprocessing, selection of relevant data, and data mining. The preprocessing stage includes data collection and preparation. It may use several statistical techniques, such as clustering, multiple regression analysis, factor analysis, sampling, and descriptive statistics. The output of this stage is a reliable database. The second stage is data selection, which looks for the data relevant to each problem. It uses sampling, CBR, or clustering, for instance. In the data mining stage, other techniques may be chosen to solve the knowledge discovery problem, such as neural networks, multiple regression analysis, clustering, case-based reasoning, genetic algorithms, and fuzzy rule-based systems (Berry and Linoff, 2000; Hair et al., 1998; Pyle, 1999). In the construction field, some studies have used this approach to investigate delays in construction projects (Kim et al., 2008; Soibelman and Kim, 2002).

Proposed system
This study applies the Knowledge Discovery in Databases (KDD) paradigm in a hybrid system that uses Case-Based Reasoning (CBR) and Multiple Regression Analysis (MRA) to construct hedonic price models. The investigation presented here was based on a simulation using actual real estate data.
KDD was included because it makes it possible to manage large databases. CBR is a convenient way to select properties, using a similarity index to find the relevant ones (those similar to the building under study). As CBR has difficulty managing and adapting numerical data, MRA is used to generate quality indexes based on the relationship between property price and property characteristics. This relationship is expressed as an equation (a hedonic price model).
However, studies based on the real estate market need two additional steps in the basic KDD/CBR scheme. Detailed samples are necessary to generate specific price models. As properties in general have a large set of important attributes, detailing all cases in a large database from the first data collection is not viable. The solution to this apparent paradox is to collect basic information for all cases (sufficient to permit case selection by CBR) and to complete the data collection only for the selected properties, after selection (sufficient to generate a useful HPM).
In the traditional CBR scheme, some similar cases are selected and then adapted to provide a solution. In this view, the final result is provided by CBR itself. The second difference in our proposal is that CBR is used only to select the sample to be modeled, and an additional tool (MRA) generates the result. The proposed system is therefore composed of five stages: (1) create the database, collecting and preprocessing data from the real estate market; (2) select a sample with relevant data through CBR; (3) enrich the data (detailing the sample); (4) generate the HPM using MRA; and (5) simulate the value of different alternatives in the design process.
Figure 1 shows the general configuration of the proposed DSS.
In what follows, the system is demonstrated through an empirical study consisting of the estimation and use of a price model based on data from middle-class apartments in Porto Alegre, a southern Brazilian city. The first four stages are presented in this section and the fifth in the next section.

Creating a database
This stage of the study is based on similar research on real estate modeling (Author, 2006). Initially, data about the local real estate market were collected, forming a database of more than 30,000 cases. The information about these units was obtained from Sales Tax files at the Porto Alegre Tax Department.
In this phase, basic data were collected, such as sales price (as declared by the taxpayer), private and total floor area, year of building completion, and construction quality level. Next, variables indicating location quality were collected, such as the distance from the central business district and from the main centers of commerce and leisure. A qualitative variable indicating the quality of the neighborhood of each building was also collected. After the data collection phase, preprocessing operations were conducted, resulting in a data set ready for general modeling.

Selecting data
A design process for apartments in the pre-sales market was simulated. The relevant data for the segment studied were selected through CBR, considering apartment units in the middle-class regions of the city. Selection was made using basic case attributes, such as sales price, floor area, and location. The similarity mechanism was based on the nearest neighbor algorithm (k-NN). In total, a dataset of 110 apartment units was formed.
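The selection step can be illustrated with a short nearest-neighbor sketch in Python. The text does not specify the similarity measure, so the range-normalized weighted Euclidean distance used here is an assumption, as are the function name `select_similar_cases`, the three attributes, and the small case base, which is invented for illustration.

```python
import numpy as np

def select_similar_cases(cases, target, k, weights=None):
    """Return the indices of the k cases most similar to the target.

    Each attribute is rescaled by its range in the case base so that no
    attribute dominates; similarity is measured by weighted Euclidean
    distance (smaller distance means more similar).
    """
    cases = np.asarray(cases, dtype=float)
    target = np.asarray(target, dtype=float)
    lo, hi = cases.min(axis=0), cases.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    if weights is None:
        w = np.ones(cases.shape[1])
    else:
        w = np.asarray(weights, dtype=float)
    diff = (cases - target) / span
    dist = np.sqrt((w * diff ** 2).sum(axis=1))
    return np.argsort(dist, kind="stable")[:k]

# Hypothetical case base: price (thousand EUR), floor area (m2),
# distance to the central business district (km).
case_base = np.array([
    [90.0, 70.0, 3.0],
    [150.0, 100.0, 4.0],
    [155.0, 105.0, 4.5],
    [400.0, 250.0, 1.0],
    [60.0, 45.0, 9.0],
])
target = [150.0, 100.0, 4.2]
selected = select_similar_cases(case_base, target, k=2)  # indices 1 and 2
```

In a setting like the one described, `k` or a distance threshold would be chosen so that the selected sample is large enough for regression; the optional `weights` argument lets some attributes count more than others in the similarity index.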

Enriching data
The sample was detailed using sources such as design plans and building photos. Important information for this market segment was identified, such as the number of bedrooms and bathrooms; the existence of a fireplace, barbecue equipment, balcony, and laundry; the number of private parking spaces; the characteristics of the building's leisure facilities (presence of swimming pool, sauna, sports spaces, and playground); and the quality of the construction.
The region considered is relatively uniform with respect to accessibility and neighborhood quality. The sample is composed of new properties, and prices were relatively stable during the period of data collection. Therefore, it was not necessary to include variables such as age, location, neighborhood, and time of sale.

Generating HPM
To generate the hedonic price model, the following quantitative variables were considered: the sale price (€), the floor area (m²), the number of simple bedrooms, the number of bedrooms with an exclusive bathroom, the number of leisure spaces (swimming pool, sauna, fitness center, and playground), the number of private parking spaces, and the assessment of building quality (from 1, very poor, to 10, very good).
Dummy variables were used for the presence of the following elements: bathtub, home office, balcony with barbecue equipment, simple balcony, fireplace, and laundry space. Each dummy variable equals one if the property has the attribute and zero otherwise.
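The dummy coding described above can be sketched in a few lines of Python. The identifiers in `DUMMY_ATTRIBUTES` and the helper `encode_dummies` are hypothetical names chosen for this illustration; only the list of six elements comes from the text.

```python
# Elements coded as dummy variables in the text (identifiers are illustrative).
DUMMY_ATTRIBUTES = [
    "bathtub",
    "home_office",
    "balcony_barbecue",
    "simple_balcony",
    "fireplace",
    "laundry",
]

def encode_dummies(present):
    """Map the set of elements present in a property to 0/1 indicators.

    The result follows the order of DUMMY_ATTRIBUTES: 1 if the property
    has the element, 0 otherwise.
    """
    unknown = set(present) - set(DUMMY_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [1 if name in present else 0 for name in DUMMY_ATTRIBUTES]

# A property with a home office and a laundry space, and nothing else.
row = encode_dummies({"home_office", "laundry"})  # [0, 1, 0, 0, 0, 1]
```

Rows built this way can be placed next to the quantitative variables to form the matrix of explanatory variables used in the regression.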
Equation 1 presents the basic model, where:
• BQ is the assessment of building quality;
• a_i are the coefficients to be estimated; and
• e is the stochastic term.
While the majority of these attributes are commonly used in hedonic studies (Ball, 1973; Boyle and Kiel, 2001; Chau and Wong, 2004; Chau et al., 2001; Din et al., 2001; Smith et al., 1988), local cultural differences should also be considered. For instance, the importance of balconies was demonstrated by Chau and Wong (2004).
In the present case, the attribute "balcony with barbecue equipment" (BBE) is an important characteristic in the real estate market of Porto Alegre. The local culture values barbecued meat at gatherings of family and friends. The inclusion of barbecue equipment in upper-middle market segment apartments in Porto Alegre began about 30 years ago, and it quickly spread to other kinds of properties and to surrounding areas. It has strong sales appeal, and in many cases the design of the private area takes this element as its main starting point.
Also, a home office in a separate room (HOr) is considered a very important space in this segment. On the other hand, building leisure was represented by a single variable that counts the different elements present (swimming pool, sauna, fitness center, and playground), based on results from initial studies indicating that the elements individually had no statistical significance.

Statistical analysis of HPM
The regression results for the selected data sample are shown in Table 2. It should be mentioned that some other models were developed, and the linear specification provided better results.

The equation of the HPM, with the variables and coefficients from Table 2, is presented in Equation 2:
As presented in Table 2, all variables proved important, and the coefficients are significant at the 5 percent level or better. The results indicate a high adjusted R² of 0.9694. The plots of collected prices versus estimated prices and of estimated prices versus standardized errors, although not shown here, did not indicate the presence of trends or other statistical problems, such as autocorrelation or non-constant error variance. Likewise, the correlation analysis did not point to strong correlations among the explanatory variables, since none exceeded 0.6. Therefore, it may be concluded that the model presents adequate statistical conditions and can be helpful for market analysis in terms of value estimation. The analysis of the set of coefficients allows comparison among variables and the identification of the alternatives with greater added value. For example, the results indicate that a simple bedroom (SBr) and a collective space for gymnastics, included in the leisure spaces (LE), add similar value for users to the total price of the property. In this example, the hedonic price of SBr is 4.5 thousand euros and the hedonic price of LE is 4.4 thousand euros. On the other hand, the difference between the price variation for a simple bedroom (SBr, 4.5 thousand euros) and for a bedroom equipped with a bathroom (BBr, 16.7 thousand euros) would certainly justify including the bathroom as the better alternative.
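Two of the checks mentioned above, the adjusted R² and the screening of correlations among explanatory variables against the 0.6 level, can be sketched in Python as follows. The helper names `adjusted_r2` and `max_abs_correlation` and the small arrays are assumptions made for illustration; they are not the study's data or the software actually used.

```python
import numpy as np

def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted coefficient of determination for a fitted model."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def max_abs_correlation(X):
    """Largest absolute pairwise correlation among the columns of X."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diagonal)))

# Hypothetical observed and fitted prices for a one-predictor model.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
adj = adjusted_r2(observed, fitted, n_predictors=1)

# Hypothetical explanatory variables (two columns).
X_check = [[1, 1], [2, -1], [3, 1], [4, -1]]
worst = max_abs_correlation(X_check)
collinearity_ok = worst <= 0.6   # 0.6 is the level mentioned in the text
```

The adjustment penalizes the plain R² for the number of predictors, which matters when a model with many attributes is fitted to a sample of limited size.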
However, in other cases, the importance of some attributes is less clear. For example, the hedonic coefficients of variables such as fireplace (FP), bathtub (BT), and balcony with barbecue equipment (BBE) represent not only the named equipment but actually a larger set of elements. In fact, they are emblematic elements and work as signs of value, indicating a superior category of property. They also have a strong influence on design. For instance, the inclusion of a balcony or a fireplace requires the enlargement and redistribution of inner spaces, and a bathtub requires a larger bathroom. Most of the time, it is very difficult to add those elements during the production phase. After production is concluded, it is more difficult still, or even impossible. Designers should consider those elements from the first drafts, which reinforces the importance of knowledge to support the early design stages.
In most of the properties in the sample, balconies with barbecue equipment (BBE) are relatively large spaces, from about 8 to 15 m², and they also serve as an extra living or dining room. Simple balconies (SBa) are located mostly in the bedrooms and have distinct functions: to promote contact with the exterior space and to improve natural ventilation. In general, they are smaller, around 6 m². From a comparison of both hedonic prices (BBE = 23.9 thousand euros; SBa = 6.5 thousand euros), it is possible to conclude that simple balconies are used less intensively and are of minor importance to users.

Simulation using information from HPM
When a designer is faced with a new project composed of several 100 m² apartment units, for example, different configurations are possible. In this case, the value of distinct elements can help guide the designer to best meet user preferences and maximize the final price of the project. In this context, two situations are demonstrated below, using the results of the HPM generated. Both examples consist of a 100 m² apartment with two parking spaces, in a building with leisure facilities composed of a swimming pool and a playground and a building quality level equal to seven.
Table 3 shows the description of the two situations, based on the variables considered in the HPM.
Equations 3 and 4 show the equations for Situations A and B, as well as the sale price resulting from each.

Equation 3: Situation A
Equation 4: Situation B
According to Equations 3 and 4, Situation B shows a higher estimated sale price than Situation A. Since both situations are likely to have approximately the same building costs, given the same area, building quality, and leisure facilities, Situation B shows a significant financial and economic advantage. The difference between the two sale prices is around 35 thousand euros.
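The kind of comparison made here can be sketched in Python for a linear model, where the price difference between two configurations is the sum of each hedonic price times the change in the corresponding attribute. The dictionary below holds only the hedonic prices quoted in the text; the function `price_difference` and the two configurations are hypothetical and are not Situations A and B of this study.

```python
# Hedonic prices quoted in the text, in thousand euros. The remaining
# coefficients of the model are not reproduced here.
HEDONIC_PRICES = {"SBr": 4.5, "BBr": 16.7, "LE": 4.4, "BBE": 23.9, "SBa": 6.5}

def price_difference(config_a, config_b, prices=HEDONIC_PRICES):
    """Estimated price of configuration B minus that of configuration A.

    In a linear hedonic model the intercept and every attribute shared
    by both configurations cancel, so only the attributes that differ
    contribute to the result.
    """
    names = set(config_a) | set(config_b)
    missing = names - set(prices)
    if missing:
        raise KeyError(f"no hedonic price for: {sorted(missing)}")
    return sum(
        prices[name] * (config_b.get(name, 0) - config_a.get(name, 0))
        for name in names
    )

# Hypothetical alternatives: replace one simple bedroom by a bedroom with
# a bathroom, and the simple balcony by a balcony with barbecue equipment.
option_a = {"SBr": 2, "BBr": 1, "SBa": 1, "BBE": 0}
option_b = {"SBr": 1, "BBr": 2, "SBa": 0, "BBE": 1}
delta = price_difference(option_a, option_b)  # about 29.6 thousand euros
```

Working with differences avoids the need for the full set of coefficients when two alternatives share most of their attributes, which is typically the case when a designer compares variants of the same unit.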
Also, since Situation B closely follows market preferences, as revealed by the real estate data used, less refurbishment, rebuilding, and demolition are expected during the life cycle of the building. For example, transforming the apartment of Situation A into the apartment of Situation B would require removing about 13 m², generating around 2 m³ of construction and demolition waste.
Table 3
Variable                                                     Situation A   Situation B
Home office (yes = 1; no = 0) (HOr)                          0             1
Balcony with barbecue equipment (yes = 1; no = 0) (BBE)      0             1
Simple balcony (yes = 1; no = 0) (SBa)                       1             0
Laundry space (yes = 1; no = 0) (LA)                         0             1
Leisure spaces (number of elements) (LE)                     2             2
Private parking spaces (units) (PPS)                         2             2
Building quality (from 1 to 10) (BQ)                         7             7
One may conclude that the hedonic model (Equation 2) has good potential for use in design. It can support decisions in the early design phase and is helpful in trade-off situations, making it possible to evaluate the impact of different configuration decisions on the final sale price. In this way, it can be used to estimate values, and designers can simulate the value of several alternatives and choose those that result in greater market value.

Conclusions
This study proposes a novel approach, since it suggests the use of real estate market data to discover user requirements. It is a hybrid KDD system that uses CBR for case selection and multiple regression analysis for the generation of price models. The principle behind the system is that sales data contain hidden knowledge about user requirements, which may be discovered through hedonic price models.
One application was presented, based on actual property data from a Brazilian city. The hedonic model estimated had good statistical performance.
Five main advantages can be highlighted. First, it is a numerical model that allows simulation, so the designer can study a mix of different options and explore the optimal or best solution in terms of the main user requirements. Second, the model can be obtained from actual real estate market data. Third, the information obtained can be used to support decisions in the very early stage of the design process, when most of the total cost and price of the project are defined. Fourth, the use of automated tools may improve the efficiency of the process, since the results are obtained easily and quickly if the database is organized. Finally, HPM identify economically viable choices, since the models are based on actual market data.
In summary, hedonic price models can be understood as a sound, indirect way to identify the requirements and preferences of building users. Thus, the conception phase of new products can become more objective and intelligent.