Inside IT
Smart Cities, Data Warehouses, Data Lakes and the Information Management Challenge


As cities become smart cities, close attention must be paid to the challenge of building the data infrastructure that supports the information management strategy of the digital ecosystem driving their transformation.


Over the last year, I had the opportunity to bring my research into the classroom on two occasions to look at Smart Cities. The first was an intensive summer course on the digital transformation of organizations. The students were an excellent group of undergraduates from Beihang University and Zhejiang University participating in the China Executive Leadership Programs at the University of Illinois. The second was a course with an outstanding group of graduate students at the Graduate School of Library and Information Science at the University of Illinois who are looking at digital government and Smart Cities.

We studied how leading industrial, commercial and government organizations are evolving their service strategies around “big social data”, an emerging term joining the domains of social media and big data[1]. Big social data looks at how best to glean knowledge and insight about the activities of people from social media data sources. These data sources are very large and disparate: they are collected in multiple formats, structured in some cases and unstructured or semi-structured in others, and either stored in static repositories or streamed in real time.


We then turned our attention to smart cities. Smart Cities is an emerging field that brings together digital information and communication technology with local communities to build a sociotechnical infrastructure that uses data and software to empower governments, businesses, civil society, and citizens to improve city services. Smart Cities draw on the Internet of Things (IoT) and cyber-physical systems, enmeshed with a collection of interconnected smart devices and systems in transportation, energy, manufacturing, environmental management, and healthcare.


There are many examples worldwide where cities are deploying technology to drive improvements in electricity production and consumption (smart grid), traffic management and parking (smart traffic), lighting, signage, buildings, waste, and so on. In the United States, President Obama’s Administration announced the New Smart Cities Initiative on September 14, 2015. With this announcement, the US Federal government will “invest over $160 million in federal research and leverage more than 25 new technology collaborations to help local communities tackle key challenges such as reducing traffic congestion, fighting crime, fostering economic growth, managing the effects of a changing climate, and improving the delivery of city services. The new initiative is part of this Administration’s overall commitment to target federal resources to meet local needs and support community-led solutions”[2].


Barack Obama, the 44th President of the United States, said “Every community is different, with different needs and different approaches. But communities that are making the most progress on these issues have some things in common. They don’t look for a single silver bullet; instead they bring together local government and nonprofits and businesses and teachers and parents around a shared goal.”

Last week, the National Institute of Standards and Technology of the US Department of Commerce and US Ignite held the Global City Teams Challenge to bring together cities from around the world, developers, and tech companies to continue collaborating on Internet of Things applications for smart cities and smart communities. Many of the current and proposed projects presented at the Global City Teams Challenge focus on using networks of data-gathering sensors, cloud computing and machine-to-machine communication. At the heart of each project is the challenge of not only gathering this information but also mashing it up with other data sources and analyzing the result in real time. The cloud is key because teams of developers are working on cloud-based applications to analyze and understand the data streaming from the network of sensors.


Underpinning each innovative approach is a vast data ecosystem necessary to capture, store, process, manage, transmit and share the data needed to build knowledge that addresses critical problems and improves city life. Information management is a critical component of Smart Cities. Information management is the organization and control of the collection and management of information from one or more sources and the processing, delivery and sharing of that information to one or more audiences. Information management transforms data from its raw form into information by establishing the practices and garnering the resources and capabilities to contextualize and add meaning to data. Effective information management ensures that a city and its citizens have high-quality information resources they can rely on to build knowledge, deepen understanding, and ultimately develop the wisdom the city needs to fulfill its mission.


Traditional data analysis methods fall short in many ways when working with today’s data. Data analysis and data management issues in the pre-Internet days (the late 1980s to the mid-1990s) were quite different. Traditional methods to forecast the growth of traffic on highways used every bit of data available and organized it into a data repository. Transportation planners used census data, traffic sensor data, tax and parcel data, business activity data, vehicle ownership data, and research data on driving behavior. For policing, cutting-edge research used advanced forecasting models to predict crime hotspots, fueled by a wide array of data on demographics, land and building use, historical crime, 911 calls, and seasonal and location factors. School districts used similar arrays of data to make decisions about enrollment policies. In each case, back in the day, the data analysis approaches used structured data that was cleaned, reformatted into a structured form, and stored in an early equivalent of a data warehouse. It was difficult to incorporate both structured and unstructured data into the analysis with computer-driven data management methods.

But imagine working on these same problems in a smart city environment. Organizations once relied on “lean” data captured in numbers and text, but now also need to manage “rich” data captured in multimedia images, sound, and semantic and relational formats. Would a traditional data warehouse with traditional analytic methods be sufficient?

In a Smart City context, imagine the data flows stemming from life in a big city: our social interactions with each other; the physical movement of people, vehicles and things; the production of energy, data communications, fresh water and sewage; and the impact on the built and natural environment. The data management system needs to handle data sets that are terabytes to petabytes in size. With big data, the data management system and analytical tools have to deal with the volume, velocity, variety, value, and veracity of the data. The sensor environment has very particular storage characteristics that need consideration. The size of the database that captures the sensor data will vary with the size and complexity of the sensor application. This depends on the number and frequency of the measurements being taken, the size of the coverage area, and the sampling rate[3]. Add to this the ever-increasing flow of individual or business-level data from social media, click streams, and transactions. The computing environment brings a greater expectation of system interoperability for information exchange, and with it a need to develop structures and approaches that support interoperability. To develop applications that interpret and analyze the patterns in the data, information management strategies are needed to support stewardship of the underlying data. For the Smart City environment, the critical issue will be to translate these streams of data into information and knowledge to support timely decision-making.
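To make the storage point concrete, here is a rough back-of-the-envelope sketch in Python. It simply multiplies the factors named above (number of sensors, sampling rate, and size of each measurement). The class name, the helper function, and every figure in the example deployment are illustrative assumptions, not numbers from any real city or from the cited source.

```python
from dataclasses import dataclass


@dataclass
class SensorDeployment:
    """Hypothetical description of a sensor network (all values are assumptions)."""
    sensors: int                 # number of sensors across the coverage area
    readings_per_second: float   # sampling rate of each sensor
    bytes_per_reading: int       # measurement plus timestamp and sensor ID

    def bytes_per_day(self) -> float:
        # sensors x rate x record size x seconds in a day
        return self.sensors * self.readings_per_second * self.bytes_per_reading * 86_400

    def bytes_per_year(self) -> float:
        return self.bytes_per_day() * 365


def human_size(n_bytes: float) -> str:
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    size = float(n_bytes)
    for unit in units:
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


# Assumed example: 10,000 sensors, one 64-byte reading per second each.
city = SensorDeployment(sensors=10_000, readings_per_second=1.0, bytes_per_reading=64)

if __name__ == "__main__":
    print("per day: ", human_size(city.bytes_per_day()))
    print("per year:", human_size(city.bytes_per_year()))
```

Under these assumed figures the raw readings alone come to roughly 55 GB per day and about 20 TB per year, before indexes, replication, or any social media and transaction data are added. Doubling either the sensor count or the sampling rate doubles the total, which is why both need to be settled early in the design.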

Moving ahead to today, the big data paradigm stretches the capabilities of traditional data methods. The industry has seen wide use of data warehouses to capture data for analysis, forecasting, modeling, reporting and visualization. Data warehouses tend to focus on a specific subject and integrate directly related data from internal and external sources to support managerial decision-making. As Bill Inmon, considered the “father of the data warehouse”, observes, the first generation of data warehouses focused on placing primarily transaction-oriented structured data on disk for storage. The real challenge is that the modern data warehouse contains both structured and unstructured data, and the meaning of both types of data has to be captured through metadata[4].


There is an interesting debate brewing about the best approach to storing and analyzing big data. The industry is now evolving data warehouses to work in an integrated fashion with emerging approaches built on Hadoop. The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs[5]. Hadoop developers have built a concept known as a “data lake” into its architecture. A data lake allows disparate records to be stored in their native formats for later parsing, rather than forcing them into a single format for integration. Because it preserves the native format and maintains data provenance and fidelity, different analyses can be performed using different contexts[6]. “In broad terms, data lakes are marketed as enterprise wide data management platforms for analyzing disparate sources of data in its native format,” said Nick Heudecker, research director at Gartner. “The idea is simple: instead of placing data in a purpose-built data store, you move it into a data lake in its original format. This eliminates the upfront costs of data ingestion, like transformation. Once data is placed into the lake, it’s available for analysis by everyone in the organization”[7].
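The data lake idea can be illustrated with a small Python sketch. It is not Hadoop code: a temporary local folder stands in for the lake, and the function names, folder layout, and sample records are all assumptions made for illustration. What it shows is the pattern described above: files are stored byte-for-byte in their native format, the source folder records provenance, and parsing happens only when someone reads the data.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def ingest(lake: Path, source: str, name: str, raw: bytes) -> Path:
    """Store the bytes unchanged under a per-source folder (no upfront transformation)."""
    target = lake / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(raw)
    return target


def read_records(path: Path) -> list:
    """Parse at read time, choosing the parser from the file's native format."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":   # one JSON object per line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if path.suffix == ".csv":     # header row followed by data rows
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"no parser registered for {path.suffix!r}")


def demo() -> list:
    """Ingest two made-up sources, then read everything back with provenance attached."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        ingest(lake, "traffic_sensors", "readings.csv",
               b"sensor_id,speed_kph\nS1,42\nS2,57\n")
        ingest(lake, "social_media", "posts.jsonl",
               b'{"user": "a", "text": "Road closed on Main St"}\n')
        records = []
        for path in sorted(p for p in lake.rglob("*") if p.is_file()):
            for rec in read_records(path):
                # The folder name is kept as a simple provenance tag.
                records.append({"source": path.parent.name, **rec})
        return records


if __name__ == "__main__":
    for record in demo():
        print(record)
```

Note that the CSV values come back as strings ("42", not 42). Deferring transformation means each analysis decides for itself how to type and clean the data, which is the flexibility a data lake offers and also the cost it shifts downstream.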

Many analysts suggest that it’s not a data warehouse versus data lake issue. Rather, “New School big data Kool-Aid drinkers think Hadoop is the ultimate data management technology, while the Old Guard points to the market dominance of legacy solutions and the data governance, stewardship, and security lessons learned from past decades. But Hadoop isn’t just about replacing data warehouse technologies. Hadoop brings value by extending and working alongside those traditional systems, bringing flexibility and cost savings, along with greater business visibility and insight”[8]. Hadoop is used by many organizations as a staging area for a data warehouse and analytics toolkits. Many organizations are using Hadoop to handle unstructured data and to manage historical data from their data warehouses.
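The staging-area pattern might look something like the following Python sketch. Again, this is an illustration rather than anyone's production pipeline: a list of raw strings stands in for files staged in Hadoop, an in-memory SQLite database stands in for the data warehouse, and the table name, column names, and sample lines are assumptions. The point is the division of labor, with raw data kept as it arrived and only records that fit the warehouse schema loaded into it.

```python
import json
import sqlite3

# Stand-in for raw lines held in a staging area (made-up sample data).
RAW_STAGED = [
    '{"sensor_id": "S1", "speed_kph": 42, "note": "ok"}',
    '{"sensor_id": "S2", "speed_kph": "n/a"}',   # value cannot be read as a number
    "not json at all",                           # malformed record
]


def load_warehouse(raw_lines):
    """Load records that fit a fixed schema; count the ones that do not."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE traffic (sensor_id TEXT NOT NULL, speed_kph REAL NOT NULL)"
    )
    rejected = 0
    for line in raw_lines:
        try:
            rec = json.loads(line)
            # Schema-on-write: enforce the column types before loading.
            row = (str(rec["sensor_id"]), float(rec["speed_kph"]))
        except (ValueError, KeyError, TypeError):
            rejected += 1   # the raw line still lives in the staging area
            continue
        conn.execute("INSERT INTO traffic VALUES (?, ?)", row)
    conn.commit()
    return conn, rejected


if __name__ == "__main__":
    conn, rejected = load_warehouse(RAW_STAGED)
    print(conn.execute("SELECT sensor_id, speed_kph FROM traffic").fetchall())
    print("rejected:", rejected)
```

Because rejected records are only skipped by the load and never deleted from staging, they remain available if the schema or the cleaning rules change later, which is one practical reason the two systems work well side by side.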

As work continues on Smart Cities, the developer community will keep looking for innovative and cost-effective ways to manage the large volumes of data collected, so that cities around the world can analyze that data better and formulate solutions to the very difficult challenges they face.

(1) Knowledge Management: Digitally Transforming Knowledge into Intelligence

Written by Jon Gant, LG CNS Blog’s Regular Contributor


[1] Bello-Orgaz, G., J.J. Jung, and D. Camacho. “Social Big Data: Recent Achievements and New Challenges.” Information Fusion 28 (2016): 45–59. doi:10.1016/j.inffus.2015.08.005
[2]
[3]
[4] Inmon, William, Derek Strauss, and Genia Neushloss. DW 2.0: The Architecture for the Next Generation of Data Warehousing. Burlington, MA: Elsevier, 2008.
[5]
[6]
[7]
[8]
