With 30 departments represented across more than 250 open datasets (and counting), the City of Seattle’s Open Data inventory has grown significantly since data.seattle.gov was launched in 2010. While this growth has contributed to the success and reputation of Seattle’s Open Data Program, it also comes with a responsibility to ensure that data consumers can not only find, but trust the accuracy, currency, and privacy of open datasets published on data.seattle.gov.
In Seattle, as in many other open data programs that share their origins with the Federal Government’s launch of data.gov (also in 2010), the first datasets were often spreadsheets uploaded from employees’ hard drives, or one-time “snapshots” of data from source systems. As programs matured, they increasingly relied on more sophisticated ETL and automation tools to extract data directly from transactional databases and OLTP systems. Most recently, open data pipelines and platforms are being integrated into enterprise data architectures so that open data becomes a natural byproduct of an organization’s overall data architecture strategy. This evolution has put open data squarely at the forefront of innovation and near-real-time smart city initiatives and brought with it a heightened need to focus on the “health” and curation of open data inventories.
As open data portals age and more datasets are added to their inventories with ever-increasing amounts of data, datasets can get stale and search results can become cluttered. This ultimately makes it more difficult for data consumers to find the data they’re looking for, because top search results might contain outdated or superseded data, or because datasets with minimal metadata aren’t properly indexed for searching. This “search noise” has the potential to diminish the overall trust that departments and residents place in open data portals and, to an extent, in their government. Needless to say, ignoring the need to keep open data inventories current and clean won’t make these challenges go away.
While Seattle’s Open Data Team is excited about the future of open data and the possibilities to help enable smart city initiatives and near-real-time data flows, they’re also keenly aware that keeping the inventory healthy is vital to maintaining the overall value of the City’s open data inventory.
To help improve and maintain the health of data.seattle.gov, Seattle’s Open Data Team is working with departments and developing analytical tools to improve metadata, remove unneeded datasets from the platform, and reduce the average time between dataset updates. By May 2019, the Open Data team will have met with representatives from over 15 City departments to implement open data action plans that identify opportunities for improving metadata or curating stale or low-value datasets. In just the first four months of 2019, this work has led to a roughly 30% rise in average metadata-completeness across Seattle’s Open Data inventory!
Cleaning up Seattle’s Open Data inventory has been an all-hands-on-deck effort for the Open Data team and its network of Open Data Champions who represent their departments’ open data needs. From developing a data warehouse that enables long-term analytics of Seattle’s Open Data inventory, to refactoring legacy automation code, to developing analytical self-service dashboards, every member of the Open Data team has played a vital role in not only cleaning up Seattle’s open data inventory but also building tools and processes to help ensure the inventory stays healthy for years to come.
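For readers curious about what measures like these can look like in practice, here is a minimal sketch in Python. It is an illustration only, not the Open Data Team's actual tooling: the metadata field names, the 365-day staleness threshold, and the sample records are all assumptions made up for this example.

```python
from datetime import date

# Hypothetical list of metadata fields to score; the fields the City
# actually evaluates are not described in this post.
REQUIRED_FIELDS = ("title", "description", "category", "tags", "update_frequency")


def metadata_completeness(dataset: dict) -> float:
    """Return the fraction of required metadata fields that are non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if dataset.get(field))
    return filled / len(REQUIRED_FIELDS)


def days_since_update(dataset: dict, today: date) -> int:
    """Return the number of days since the dataset was last updated."""
    return (today - dataset["last_updated"]).days


def average(values) -> float:
    """Return the arithmetic mean, or 0.0 for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Made-up sample records, not real entries from data.seattle.gov.
inventory = [
    {
        "title": "Sample dataset A",
        "description": "A well-documented example",
        "category": "Permitting",
        "tags": ["sample"],
        "update_frequency": "daily",
        "last_updated": date(2019, 4, 30),
    },
    {
        "title": "Sample dataset B",
        "description": "",
        "category": "",
        "tags": [],
        "update_frequency": "",
        "last_updated": date(2016, 5, 1),
    },
]

today = date(2019, 5, 1)
STALE_AFTER_DAYS = 365  # assumed threshold, chosen only for illustration

avg_completeness = average(metadata_completeness(d) for d in inventory)
avg_age_days = average(days_since_update(d, today) for d in inventory)
stale_titles = [
    d["title"] for d in inventory if days_since_update(d, today) > STALE_AFTER_DAYS
]

print(f"Average metadata completeness: {avg_completeness:.0%}")
print(f"Average days since last update: {avg_age_days:.0f}")
print(f"Stale datasets: {stale_titles}")
```

Tracking averages like these over time is one way an inventory-wide trend could be reported, and a per-dataset list of stale or sparsely documented entries gives departments a concrete starting point for an action plan.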
As you search for datasets on https://data.seattle.gov, rest assured that Seattle’s Open Data Team is hard at work behind the scenes to ensure you can not only find the dataset you’re looking for, but have confidence in the quality and currency of the underlying data. If you have any questions about Seattle’s Open Data Program, or its efforts to improve the health of the City’s open data inventory, please reach out to Paul Alley or send an email to open.data@seattle.gov.
Paul Alley is the City of Seattle’s Open Data Manager and has worked in public sector data management and software engineering since 2001. Prior to his current focus on open data architecture and governance, Paul led the development of custom data management and analytics solutions across a variety of domains including university accreditation, remote telemetry, ecosystem monitoring, supply chain logistics, tax calculation, and asset management. Paul enjoys bringing his passion for data stewardship to the City of Seattle and helping departments release the city’s most valuable data for public consumption.