<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:psc="http://podlove.org/simple-chapters" xmlns:podcast="https://podcastindex.org/namespace/1.0"><channel><title><![CDATA[Great Data Products]]></title><description><![CDATA[A podcast about the ergonomics and craft of data. Brought to you by Source Cooperative.]]></description><link>https://greatdataproducts.com</link><generator>Riverside.fm (https://riverside.com)</generator><lastBuildDate>Mon, 13 Apr 2026 05:21:01 GMT</lastBuildDate><atom:link href="https://api.riverside.fm/hosting/fvjHDnex.rss" rel="self" type="application/rss+xml"/><author><![CDATA[Source Cooperative]]></author><pubDate>Wed, 01 Oct 2025 21:25:43 GMT</pubDate><copyright><![CDATA[2025 Source Cooperative]]></copyright><language><![CDATA[en]]></language><ttl>60</ttl><category><![CDATA[Technology]]></category><category><![CDATA[Science]]></category><itunes:author>Source Cooperative</itunes:author><itunes:summary>A podcast about the ergonomics and craft of data. 
Brought to you by Source Cooperative.</itunes:summary><itunes:type>episodic</itunes:type><itunes:owner><itunes:name>Source Cooperative</itunes:name><itunes:email>ops@radiant.earth</itunes:email></itunes:owner><itunes:explicit>no</itunes:explicit><itunes:category text="Technology"/><itunes:category text="Science"/><itunes:image href="https://hosting-media.rs-prod.riverside.fm/media/podcasts/02a41c0e-5563-4b0a-9fc9-60b01317d14f/logos/b82bd04f-283a-4abb-aaad-79ae517c9bb9.png"/><item><title><![CDATA[The Storm Events Database Explorer]]></title><description><![CDATA[<p>Jed talks with Kwin Keuter and Brad Andrick, geospatial software engineers at Earth Genome, about the <a rel="noopener noreferrer nofollow" href="https://stormevents.internetofwater.app/" target="_blank">Storm Events Database Explorer</a>. This collaborative project between Earth Genome, The Commons, and the Internet of Water Coalition provides access to over 1.9 million U.S. severe weather events spanning 70+ years of NOAA’s National Centers for Environmental Information (NCEI) storm records, including tornadoes, floods, hail, and hurricanes.</p><h3>Links and Resources</h3><ul><li><a rel="noopener noreferrer nofollow" href="https://stormevents.internetofwater.app/" target="_blank">Storm Events Database Explorer</a> — Interactive map and search interface</li><li><a rel="noopener noreferrer nofollow" href="https://source.coop/repositories/earth-genome/noaa-storm-events/description" target="_blank">Storm Events Database on Source Cooperative</a> — Cloud-optimized Parquet files</li><li><a rel="noopener noreferrer nofollow" href="https://www.earthgenome.org/blog" target="_blank">Earth Genome blog post on the project</a> — Technical process and discovery work</li><li><a rel="noopener noreferrer nofollow" href="https://www.thecommons.earth/" target="_blank">The Commons case study</a> — Project background and case study</li><li><a rel="noopener noreferrer nofollow" href="https://www.ncdc.noaa.gov/stormevents/" 
target="_blank">NOAA Storm Events Database</a> — Original NOAA dataset and beta interface</li><li><a rel="noopener noreferrer nofollow" href="http://GeoParquet.io" target="_blank">GeoParquet.io</a> — Chris Holmes’s project for working with Parquet files<br /><br />More show notes and transcript at <a rel="noopener noreferrer nofollow" href="https://greatdataproducts.com/episodes/2026/02/keuter-andrick-storm-events/" target="_blank">https://greatdataproducts.com/episodes/2026/02/keuter-andrick-storm-events/</a></li></ul>]]></description><guid isPermaLink="false">e9a2184e-090e-46d6-802f-8ff3715b653b</guid><dc:creator><![CDATA[Source Cooperative]]></dc:creator><pubDate>Sat, 28 Feb 2026 16:47:20 GMT</pubDate><enclosure url="https://api.riverside.fm/hosting-analytics/media/3e7adcdaec2e4e9e43ec3f215a935f81f4aea1937fa608facd0f389811f25191/eyJlcGlzb2RlSWQiOiJlOWEyMTg0ZS0wOTBlLTQ2ZDYtODAyZi04ZmYzNzE1YjY1M2IiLCJwb2RjYXN0SWQiOiIwMmE0MWMwZS01NTYzLTRiMGEtOWZjOS02MGIwMTMxN2QxNGYiLCJhY2NvdW50SWQiOiI2NDg3NjJjM2NkNmZhZjVmMGRkYmY2OGMiLCJwYXRoIjoibWVkaWEvY2xpcHMvNjk4YjgyMDQwNWVjM2Y0MzNiMWJkZmIxL3RlY2hzLW9uLXRleHRzLWNvbXBvc2VyLTIwMjYtMi0xMF9fMjAtNy00OC5tcDMifQ==.mp3" length="91605307" type="audio/mpeg"/><itunes:summary>&lt;p&gt;Jed talks with Kwin Keuter and Brad Andrick, geospatial software engineers at Earth Genome, about the &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://stormevents.internetofwater.app/&quot; target=&quot;_blank&quot;&gt;Storm Events Database Explorer&lt;/a&gt;. This collaborative project between Earth Genome, The Commons, and the Internet of Water Coalition provides access to over 1.9 million U.S. 
severe weather events spanning 70+ years of NOAA’s National Centers for Environmental Information (NCEI) storm records, including tornadoes, floods, hail, and hurricanes.&lt;/p&gt;&lt;h3&gt;Links and Resources&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://stormevents.internetofwater.app/&quot; target=&quot;_blank&quot;&gt;Storm Events Database Explorer&lt;/a&gt; — Interactive map and search interface&lt;/li&gt;&lt;li&gt;&lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://source.coop/repositories/earth-genome/noaa-storm-events/description&quot; target=&quot;_blank&quot;&gt;Storm Events Database on Source Cooperative&lt;/a&gt; — Cloud-optimized Parquet files&lt;/li&gt;&lt;li&gt;&lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://www.earthgenome.org/blog&quot; target=&quot;_blank&quot;&gt;Earth Genome blog post on the project&lt;/a&gt; — Technical process and discovery work&lt;/li&gt;&lt;li&gt;&lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://www.thecommons.earth/&quot; target=&quot;_blank&quot;&gt;The Commons case study&lt;/a&gt; — Project background and case study&lt;/li&gt;&lt;li&gt;&lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://www.ncdc.noaa.gov/stormevents/&quot; target=&quot;_blank&quot;&gt;NOAA Storm Events Database&lt;/a&gt; — Original NOAA dataset and beta interface&lt;/li&gt;&lt;li&gt;&lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;http://GeoParquet.io&quot; target=&quot;_blank&quot;&gt;GeoParquet.io&lt;/a&gt; — Chris Holmes’s project for working with Parquet files&lt;br /&gt;&lt;br /&gt;More show notes and transcript at &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://greatdataproducts.com/episodes/2026/02/keuter-andrick-storm-events/&quot; 
target=&quot;_blank&quot;&gt;https://greatdataproducts.com/episodes/2026/02/keuter-andrick-storm-events/&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</itunes:summary><itunes:explicit>no</itunes:explicit><itunes:duration>01:03:37</itunes:duration><itunes:image href="https://hosting-media.rs-prod.riverside.fm/media/podcasts/02a41c0e-5563-4b0a-9fc9-60b01317d14f/logos/b82bd04f-283a-4abb-aaad-79ae517c9bb9.png"/><itunes:episode>6</itunes:episode><itunes:title>The Storm Events Database Explorer</itunes:title><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Turning Federal Data Into Action]]></title><description><![CDATA[<p>  Jed talks with Denice Ross, Senior Fellow at the Federation of American Scientists and former U.S. Chief Data Scientist, about federal data's role in American life and what happens when government data tools sunset. Denice led efforts to use disaggregated data to drive better outcomes for all Americans during her time as Deputy U.S. Chief Technology Officer, and now works on building a Federal Data Use Case Repository documenting how federal datasets affect everyday decisions.</p><p></p><p>  The conversation explores why open data initiatives have evolved over the years and how administrative priorities shape public data tool availability. Denice emphasizes that federal data underpins economic growth, public health decisions, and governance at every level. She describes how data users can engage with data stewards to create feedback loops that improve data quality, and why nonprofits and civil society organizations play an essential role in both data collection and advocacy.</p><p></p><p>  Throughout the discussion, Denice and Jed examine the balance between official government data products and innovative tools built by external organizations. 
They discuss creative solutions for filling data gaps, the importance of identifying tools as "powered by federal data" to preserve datasets, and strategies for protecting federal data accessibility for the long term.</p><p></p><p>LINKS AND RESOURCES</p><p>  - Denice Ross at the Federation of American Scientists: <a rel="noopener noreferrer nofollow" href="https://fas.org/expert/denice-ross/" target="_blank">https://fas.org/expert/denice-ross/</a></p><p>  - The federal data and tools that died this year (Marketplace): <a rel="noopener noreferrer nofollow" href="https://www.marketplace.org/episode/2025/11/25/the-federal-data-and-tools-that-died-this-year" target="_blank">https://www.marketplace.org/episode/2025/11/25/the-federal-data-and-tools-that-died-this-year</a></p><p></p><p>TAKEAWAYS</p><p>  1. Federal data underpins daily life — From public health decisions to economic planning, federal datasets inform choices that affect Americans whether they realize it or not.</p><p>  2. Data tools require active protection — When administrative priorities shift, public data tools can disappear. Building awareness of data dependencies helps preserve access.</p><p>  3. Feedback loops improve data quality — Data users should engage directly with data stewards. Public participation in the data lifecycle leads to better, more relevant datasets.</p><p>  4. Civil society fills critical gaps — Nonprofits and external organizations can collect data and advocate for data resources in ways government cannot.</p><p>  5. Disaggregated data drives equity — Breaking down aggregate statistics reveals disparities and enables targeted interventions that benefit underserved communities.</p><p>  6. External innovation complements government stability – A healthy ecosystem keeps federal data stable while enabling community-driven tools to evolve and serve specific needs.</p><p></p><p>  ---</p><p></p><p>Great Data Products is brought to you by Source Cooperative. 
Learn more at <a rel="noopener noreferrer nofollow" href="https://greatdataproducts.com" target="_blank">https://greatdataproducts.com</a></p>]]></description><guid isPermaLink="false">be0c759b-cfb1-4e30-88f5-6097de2c4332</guid><dc:creator><![CDATA[Source Cooperative]]></dc:creator><pubDate>Sat, 10 Jan 2026 00:19:17 GMT</pubDate><enclosure url="https://api.riverside.fm/hosting-analytics/media/3f2958b617dc5057a4b3aadf3ed2e6a4605dac9f7b55c4c20e36aecee1374d33/eyJlcGlzb2RlSWQiOiJiZTBjNzU5Yi1jZmIxLTRlMzAtODhmNS02MDk3ZGUyYzQzMzIiLCJwb2RjYXN0SWQiOiIwMmE0MWMwZS01NTYzLTRiMGEtOWZjOS02MGIwMTMxN2QxNGYiLCJhY2NvdW50SWQiOiI2NDg3NjJjM2NkNmZhZjVmMGRkYmY2OGMiLCJwYXRoIjoibWVkaWEvY2xpcHMvNjk1ZmVkZGY5NTc3NTAxZTczNzdhNmViL3RlY2hzLW9uLXRleHRzLWNvbXBvc2VyLTIwMjYtMS04X18xOC00OC0xNS5tcDMifQ==.mp3" length="52664107" type="audio/mpeg"/><itunes:summary>&lt;p&gt;  Jed talks with Denice Ross, Senior Fellow at the Federation of American Scientists and former U.S. Chief Data Scientist, about federal data&apos;s role in American life and what happens when government data tools sunset. Denice led efforts to use disaggregated data to drive better outcomes for all Americans during her time as Deputy U.S. Chief Technology Officer, and now works on building a Federal Data Use Case Repository documenting how federal datasets affect everyday decisions.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;  The conversation explores why open data initiatives have evolved over the years and how administrative priorities shape public data tool availability. Denice emphasizes that federal data underpins economic growth, public health decisions, and governance at every level. 
She describes how data users can engage with data stewards to create feedback loops that improve data quality, and why nonprofits and civil society organizations play an essential role in both data collection and advocacy.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;  Throughout the discussion, Denice and Jed examine the balance between official government data products and innovative tools built by external organizations. They discuss creative solutions for filling data gaps, the importance of identifying tools as &quot;powered by federal data&quot; to preserve datasets, and strategies for protecting federal data accessibility for the long term.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;LINKS AND RESOURCES&lt;/p&gt;&lt;p&gt;  - Denice Ross at the Federation of American Scientists: &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://fas.org/expert/denice-ross/&quot; target=&quot;_blank&quot;&gt;https://fas.org/expert/denice-ross/&lt;/a&gt;&lt;/p&gt;&lt;p&gt;  - The federal data and tools that died this year (Marketplace): &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://www.marketplace.org/episode/2025/11/25/the-federal-data-and-tools-that-died-this-year&quot; target=&quot;_blank&quot;&gt;https://www.marketplace.org/episode/2025/11/25/the-federal-data-and-tools-that-died-this-year&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;TAKEAWAYS&lt;/p&gt;&lt;p&gt;  1. Federal data underpins daily life — From public health decisions to economic planning, federal datasets inform choices that affect Americans whether they realize it or not.&lt;/p&gt;&lt;p&gt;  2. Data tools require active protection — When administrative priorities shift, public data tools can disappear. Building awareness of data dependencies helps preserve access.&lt;/p&gt;&lt;p&gt;  3. Feedback loops improve data quality — Data users should engage directly with data stewards. Public participation in the data lifecycle leads to better, more relevant datasets.&lt;/p&gt;&lt;p&gt;  4. 
Civil society fills critical gaps — Nonprofits and external organizations can collect data and advocate for data resources in ways government cannot.&lt;/p&gt;&lt;p&gt;  5. Disaggregated data drives equity — Breaking down aggregate statistics reveals disparities and enables targeted interventions that benefit underserved communities.&lt;/p&gt;&lt;p&gt;  6. External innovation complements government stability – A healthy ecosystem keeps federal data stable while enabling community-driven tools to evolve and serve specific needs.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;  ---&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;Great Data Products is brought to you by Source Cooperative. Learn more at &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://greatdataproducts.com&quot; target=&quot;_blank&quot;&gt;https://greatdataproducts.com&lt;/a&gt;&lt;/p&gt;</itunes:summary><itunes:explicit>no</itunes:explicit><itunes:duration>01:10:01</itunes:duration><itunes:image href="https://hosting-media.rs-prod.riverside.fm/media/podcasts/02a41c0e-5563-4b0a-9fc9-60b01317d14f/logos/b82bd04f-283a-4abb-aaad-79ae517c9bb9.png"/><itunes:episode>5</itunes:episode><itunes:title>Turning Federal Data Into Action</itunes:title><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[How Standards Emerge: Lessons from STAC]]></title><description><![CDATA[<p>[Jed's audio in this sounds terrible because of a hardware setting that Marshall Moutenot very kindly helped us identify. Will sound better in future episodes!]<br /><br />Jed talks with Matt Hanson from Element 84 about the SpatioTemporal Asset Catalog (STAC) specification and its role in making geospatial data findable and usable. Matt describes STAC as "a simple, developer-friendly way to describe geospatial data so that people can actually find it and use it." 
The conversation covers how STAC emerged from a 2017 sprint in Boulder with 20 people and grew into a specification now adopted by NASA, USGS, and commercial satellite companies worldwide. Matt discusses the concept of "guerrilla standards," why adoption is the only metric that matters, the limitations of remote sensing, and why credibility can't be skipped when launching standards efforts.<br /><br />Full show notes and transcript: <a rel="noopener noreferrer nofollow" href="https://greatdataproducts.com/episodes/2025/12/hanson-stac/" target="_blank">https://greatdataproducts.com/episodes/2025/12/hanson-stac/</a><br /><br />Links and Resources:</p><ul><li>STAC Specification: <a rel="noopener noreferrer nofollow" href="https://stacspec.org/" target="_blank">https://stacspec.org/</a></li><li>STAC: A Retrospective, Part 2: <a rel="noopener noreferrer nofollow" href="https://element84.com/software-engineering/stac-a-retrospective-part-2-why-stac-was-successful/" target="_blank">https://element84.com/software-engineering/stac-a-retrospective-part-2-why-stac-was-successful/</a></li><li>Emergent Standards white paper: <a rel="noopener noreferrer nofollow" href="https://tial.org/publications/white-paper-003-emergent-standards-enabling-collaborations-across-institutions/" target="_blank">https://tial.org/publications/white-paper-003-emergent-standards-enabling-collaborations-across-institutions/</a></li><li>STAC Auth Proxy: <a rel="noopener noreferrer nofollow" href="https://github.com/developmentseed/stac-auth-proxy" target="_blank">https://github.com/developmentseed/stac-auth-proxy</a> </li><li>FilmDrop UI: <a rel="noopener noreferrer nofollow" href="https://console.demo.filmdrop.element84.com/" target="_blank">https://console.demo.filmdrop.element84.com/</a></li><li>Planet Planetary Variables: <a rel="noopener noreferrer nofollow" href="https://www.planet.com/products/planetary-variables/" 
target="_blank">https://www.planet.com/products/planetary-variables/</a></li><li>CommonSpace: <a rel="noopener noreferrer nofollow" href="https://www.commonspace.world/" target="_blank">https://www.commonspace.world/</a></li><li>"You Just Haven't Earned It Yet Baby": <a rel="noopener noreferrer nofollow" href="https://www.youtube.com/watch?v=jc9F0bh5OXc" target="_blank">https://www.youtube.com/watch?v=jc9F0bh5OXc</a></li></ul><p></p><p>Great Data Products is brought to you by  Source Cooperative: <a rel="noopener noreferrer nofollow" href="https://source.coop" target="_blank">https://source.coop</a></p>]]></description><guid isPermaLink="false">3234727b-4fc8-47d6-a594-01520a619c08</guid><dc:creator><![CDATA[Source Cooperative]]></dc:creator><pubDate>Sat, 27 Dec 2025 21:48:53 GMT</pubDate><enclosure url="https://api.riverside.fm/hosting-analytics/media/86998c0d576d6d944951e5e6af66b179f687de2566e128f80ddf05f85a209ec8/eyJlcGlzb2RlSWQiOiIzMjM0NzI3Yi00ZmM4LTQ3ZDYtYTU5NC0wMTUyMGE2MTljMDgiLCJwb2RjYXN0SWQiOiIwMmE0MWMwZS01NTYzLTRiMGEtOWZjOS02MGIwMTMxN2QxNGYiLCJhY2NvdW50SWQiOiI2NDg3NjJjM2NkNmZhZjVmMGRkYmY2OGMiLCJwYXRoIjoibWVkaWEvY2xpcHMvNjkzYWYxYzFjZjEyN2YyNjc4YTM1ZjA1L3RlY2hzLW9uLXRleHRzLWNvbXBvc2VyLTIwMjUtMTItMTFfXzE3LTMwLTU3Lm1wMyJ9.mp3" length="73894543" type="audio/mpeg"/><itunes:summary>&lt;p&gt;[Jed&apos;s audio in this sounds terrible because of a hardware setting that Marshall Moutenot very kindly helped us identify. Will sound better in future episodes!]&lt;br /&gt;&lt;br /&gt;Jed talks with Matt Hanson from Element 84 about the SpatioTemporal Asset Catalog (STAC) specification and its role in making geospatial data findable and usable. Matt describes STAC as &quot;a simple, developer-friendly way to describe geospatial data so that people can actually find it and use it.&quot; The conversation covers how STAC emerged from a 2017 sprint in Boulder with 20 people and grew into a specification now adopted by NASA, USGS, and commercial satellite companies worldwide. 
Matt discusses the concept of &quot;guerrilla standards,&quot; why adoption is the only metric that matters, the limitations of remote sensing, and why credibility can&apos;t be skipped when launching standards efforts.&lt;br /&gt;&lt;br /&gt;Full show notes and transcript: &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://greatdataproducts.com/episodes/2025/12/hanson-stac/&quot; target=&quot;_blank&quot;&gt;https://greatdataproducts.com/episodes/2025/12/hanson-stac/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Links and Resources:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;STAC Specification: &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://stacspec.org/&quot; target=&quot;_blank&quot;&gt;https://stacspec.org/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;STAC: A Retrospective, Part 2: &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://element84.com/software-engineering/stac-a-retrospective-part-2-why-stac-was-successful/&quot; target=&quot;_blank&quot;&gt;https://element84.com/software-engineering/stac-a-retrospective-part-2-why-stac-was-successful/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Emergent Standards white paper: &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://tial.org/publications/white-paper-003-emergent-standards-enabling-collaborations-across-institutions/&quot; target=&quot;_blank&quot;&gt;https://tial.org/publications/white-paper-003-emergent-standards-enabling-collaborations-across-institutions/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;STAC Auth Proxy: &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://github.com/developmentseed/stac-auth-proxy&quot; target=&quot;_blank&quot;&gt;https://github.com/developmentseed/stac-auth-proxy&lt;/a&gt; &lt;/li&gt;&lt;li&gt;FilmDrop UI: &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://console.demo.filmdrop.element84.com/&quot; target=&quot;_blank&quot;&gt;https://console.demo.filmdrop.element84.com/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Planet Planetary Variables: 
&lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://www.planet.com/products/planetary-variables/&quot; target=&quot;_blank&quot;&gt;https://www.planet.com/products/planetary-variables/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;CommonSpace: &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://www.commonspace.world/&quot; target=&quot;_blank&quot;&gt;https://www.commonspace.world/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&quot;You Just Haven&apos;t Earned It Yet Baby&quot;: &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://www.youtube.com/watch?v=jc9F0bh5OXc&quot; target=&quot;_blank&quot;&gt;https://www.youtube.com/watch?v=jc9F0bh5OXc&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;Great Data Products is brought to you by  Source Cooperative: &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://source.coop&quot; target=&quot;_blank&quot;&gt;https://source.coop&lt;/a&gt;&lt;/p&gt;</itunes:summary><itunes:explicit>no</itunes:explicit><itunes:duration>01:28:45</itunes:duration><itunes:image href="https://hosting-media.rs-prod.riverside.fm/media/podcasts/02a41c0e-5563-4b0a-9fc9-60b01317d14f/logos/b82bd04f-283a-4abb-aaad-79ae517c9bb9.png"/><itunes:episode>4</itunes:episode><itunes:title>How Standards Emerge: Lessons from STAC</itunes:title><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Inside Harvard's data.gov Archive]]></title><description><![CDATA[<p>Jed talks with Jack Cushman from the Harvard Law School Library Innovation Lab about their project to archive and preserve more than 311,000 datasets from <a rel="noopener noreferrer nofollow" href="http://Data.gov" target="_blank">Data.gov</a>. 
We explore how they use BagIt for long-term preservation, built a serverless search interface that makes 17.9 TB of data discoverable in the browser, and what this means for the future of online archives.</p>]]></description><guid isPermaLink="false">a82441d8-c6ad-4304-8dcc-821aa56f9d25</guid><dc:creator><![CDATA[Source Cooperative]]></dc:creator><pubDate>Fri, 21 Nov 2025 17:19:28 GMT</pubDate><enclosure url="https://api.riverside.fm/hosting-analytics/media/01e71182cc1445afcd878b26fd080df48ddce69c329cb645091967e9b7b7880c/eyJlcGlzb2RlSWQiOiJhODI0NDFkOC1jNmFkLTQzMDQtOGRjYy04MjFhYTU2ZjlkMjUiLCJwb2RjYXN0SWQiOiIwMmE0MWMwZS01NTYzLTRiMGEtOWZjOS02MGIwMTMxN2QxNGYiLCJhY2NvdW50SWQiOiI2NDg3NjJjM2NkNmZhZjVmMGRkYmY2OGMiLCJwYXRoIjoibWVkaWEvY2xpcHMvNjkyMDllM2VlNzJlODUwY2FjOGJhNjkzL3RlY2hzLW9uLXRleHRzLWNvbXBvc2VyLTIwMjUtMTEtMjFfXzE4LTE1LTQyLm1wMyJ9.mp3" length="62870864" type="audio/mpeg"/><itunes:summary>&lt;p&gt;Jed talks with Jack Cushman from the Harvard Law School Library Innovation Lab about their project to archive and preserve more than 311,000 datasets from &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;http://Data.gov&quot; target=&quot;_blank&quot;&gt;Data.gov&lt;/a&gt;. We explore how they use BagIt for long-term preservation, built a serverless search interface that makes 17.9 TB of data discoverable in the browser, and what this means for the future of online archives.&lt;/p&gt;</itunes:summary><itunes:explicit>no</itunes:explicit><itunes:duration>01:19:21</itunes:duration><itunes:image href="https://hosting-media.rs-prod.riverside.fm/media/podcasts/02a41c0e-5563-4b0a-9fc9-60b01317d14f/logos/b82bd04f-283a-4abb-aaad-79ae517c9bb9.png"/><itunes:episode>3</itunes:episode><itunes:title>Inside Harvard&apos;s data.gov Archive</itunes:title><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Protomaps and PMTiles]]></title><description><![CDATA[<p>Jed talks with Brandon Liu about building maps for the web with Protomaps and PMTiles. 
We cover why new formats won't work without a compelling application, how a single-file base map functions as a reusable data product, designing simple specs for long-term usability, and how object storage-based approaches can replace server-based stacks while staying fast and easy to integrate. Many thanks to our listeners from Norway and Egypt who stayed up very late for the live stream!</p><p></p><p>Links and Resources</p><p>- <a rel="noopener noreferrer nofollow" href="https://protomaps.com" target="_blank">Protomaps</a> – a free, customizable base map you can self-host</p><p>- <a rel="noopener noreferrer nofollow" href="https://pmtiles.io" target="_blank">PMTiles Viewer</a> – drag-and-drop viewer for <code>.pmtiles</code> files</p><p>- Browse 2.7 billion building footprints in PMTiles in the <a rel="noopener noreferrer nofollow" href="https://source.coop/vida/google-microsoft-osm-open-buildings/pmtiles/goog_msft_osm_country_layers.pmtiles" target="_blank">Google-Microsoft-OSM Open Buildings - combined by VIDA</a> product on Source</p><p>- <a rel="noopener noreferrer nofollow" href="https://tial.org/publications/white-paper-003-emergent-standards-enabling-collaborations-across-institutions/" target="_blank">Emergent standards white paper</a> from the Institutional Architecture Lab</p><p></p><p>Key takeaways:</p><p>1. <b>Ship a killer app if you want a new format to gain traction</b> — The Protomaps base map is the product that makes the PMTiles format matter.</p><p>2. <b>Single-file, object storage first</b> — PMTiles runs from a bucket or an SD card, with a browser-based viewer for offline use.</p><p>3. <b>Design simple, future‑proof specifications</b> — Keep formats small and reimplementable with minimal dependencies; simplicity preserves longevity and portability.</p><p>4. <b>Prioritize the developer experience</b> — Single-binary installs, easy local preview, and eliminating incidental complexity drive adoption more than raw capability.</p><p>5. 
<b>Build the right pipeline for the job</b> — Separate visualization-optimized packaging from analysis-ready data; don’t force one format to do everything.</p><p></p>]]></description><guid isPermaLink="false">d5720eb4-ce0b-4156-954b-003968e8b1f8</guid><dc:creator><![CDATA[Source Cooperative]]></dc:creator><pubDate>Sat, 01 Nov 2025 00:10:14 GMT</pubDate><enclosure url="https://api.riverside.fm/hosting-analytics/media/3edb6d2bd0e0e144163e3a2fff1c18c77d433c6ed6d7d76445e417f539fe405e/eyJlcGlzb2RlSWQiOiJkNTcyMGViNC1jZTBiLTQxNTYtOTU0Yi0wMDM5NjhlOGIxZjgiLCJwb2RjYXN0SWQiOiIwMmE0MWMwZS01NTYzLTRiMGEtOWZjOS02MGIwMTMxN2QxNGYiLCJhY2NvdW50SWQiOiI2NDg3NjJjM2NkNmZhZjVmMGRkYmY2OGMiLCJwYXRoIjoibWVkaWEvY2xpcHMvNjkwNTRlYTRjYjQyNGFkMDdhN2I2MTdiL3RlY2hzLW9uLXRleHRzLWNvbXBvc2VyLTIwMjUtMTEtMV9fMS00LTUyLm1wMyJ9.mp3" length="62642639" type="audio/mpeg"/><itunes:summary>&lt;p&gt;Jed talks with Brandon Liu about building maps for the web with Protomaps and PMTiles. We cover why new formats won&apos;t work without a compelling application, how a single-file base map functions as a reusable data product, designing simple specs for long-term usability, and how object storage-based approaches can replace server-based stacks while staying fast and easy to integrate. 
Many thanks to our listeners from Norway and Egypt who stayed up very late for the live stream!&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;Links and Resources&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://protomaps.com&quot; target=&quot;_blank&quot;&gt;Protomaps&lt;/a&gt; – a free, customizable base map you can self-host&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://pmtiles.io&quot; target=&quot;_blank&quot;&gt;PMTiles Viewer&lt;/a&gt; – drag-and-drop viewer for &lt;code&gt;.pmtiles&lt;/code&gt; files&lt;/p&gt;&lt;p&gt;- Browse 2.7 billion building footprints in PMTiles in the &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://source.coop/vida/google-microsoft-osm-open-buildings/pmtiles/goog_msft_osm_country_layers.pmtiles&quot; target=&quot;_blank&quot;&gt;Google-Microsoft-OSM Open Buildings - combined by VIDA&lt;/a&gt; product on Source&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://tial.org/publications/white-paper-003-emergent-standards-enabling-collaborations-across-institutions/&quot; target=&quot;_blank&quot;&gt;Emergent standards white paper&lt;/a&gt; from the Institutional Architecture Lab&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;Key takeaways:&lt;/p&gt;&lt;p&gt;1. &lt;b&gt;Ship a killer app if you want a new format to gain traction&lt;/b&gt; — The Protomaps base map is the product that makes the PMTiles format matter.&lt;/p&gt;&lt;p&gt;2. &lt;b&gt;Single-file, object storage first&lt;/b&gt; — PMTiles runs from a bucket or an SD card, with a browser-based viewer for offline use.&lt;/p&gt;&lt;p&gt;3. &lt;b&gt;Design simple, future‑proof specifications&lt;/b&gt; — Keep formats small and reimplementable with minimal dependencies; simplicity preserves longevity and portability.&lt;/p&gt;&lt;p&gt;4. 
&lt;b&gt;Prioritize the developer experience&lt;/b&gt; — Single-binary installs, easy local preview, and eliminating incidental complexity drive adoption more than raw capability.&lt;/p&gt;&lt;p&gt;5. &lt;b&gt;Build the right pipeline for the job&lt;/b&gt; — Separate visualization-optimized packaging from analysis-ready data; don’t force one format to do everything.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;</itunes:summary><itunes:explicit>no</itunes:explicit><itunes:duration>01:17:14</itunes:duration><itunes:image href="https://hosting-media.rs-prod.riverside.fm/media/podcasts/02a41c0e-5563-4b0a-9fc9-60b01317d14f/logos/b82bd04f-283a-4abb-aaad-79ae517c9bb9.png"/><itunes:episode>2</itunes:episode><itunes:title>Protomaps and PMTiles</itunes:title><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Why LLM Progress is Getting Harder]]></title><description><![CDATA[<p>Jed Sundwall and Drew Breunig explore why LLM progress is getting harder by examining the foundational data products that powered AI breakthroughs. They discuss how we've consumed the "low-hanging fruit" of internet data and graphics innovations, and what this means for the future of AI development.</p><p></p><p>The conversation traces three datasets that shaped AI: MNIST (1994), the handwritten digits dataset that became machine learning's "Hello World"; ImageNet (2008), Fei-Fei Li's image dataset that launched deep learning through AlexNet's 2012 breakthrough; and Common Crawl (2007), Gil Elbaz's web crawling project that fueled 60% of GPT-3's training data. Drew argues that great data products create ecosystems around themselves, using the Enron email dataset as an example of how a single data release can generate thousands of research papers and enable countless startups. 
The episode concludes with a discussion of benchmarks as modern data products and the challenge of creating sustainable data infrastructure for the next generation of AI systems.</p><p></p><p>Links and Resources:</p><p>- <a rel="noopener noreferrer nofollow" href="https://hai.stanford.edu/events/common-crawl-foundation-preserving-humanitys-knowledge-and-making-it-accessible-addressing-challenges-of-public-web-data" target="_blank">Common Crawl Foundation Event</a> - October 22nd event at Stanford!</p><p></p><p>- <a rel="noopener noreferrer nofollow" href="https://cloudnativegeo.org/events/cng-conference-2026/" target="_blank">Cloud-Native Geospatial Forum Conference 2026</a> - 6-9 October 2026 at Snowbird in Utah!</p><p></p><p>- <a rel="noopener noreferrer nofollow" href="https://www.dbreunig.com/2024/12/05/why-llms-are-hitting-a-wall.html" target="_blank">Why LLM Advancements Have Slowed: The Low-Hanging Fruit Has Been Eaten</a> - Drew's blog post that inspired this conversation</p><p></p><p>- <a rel="noopener noreferrer nofollow" href="https://radiant.earth/blog/2024/01/unicorns-show-ponies-and-gazelles/" target="_blank">Unicorns, Show Ponies, and Gazelles</a> - Jed's vision for sustainable data organizations</p><p></p><p>- <a rel="noopener noreferrer nofollow" href="https://arcprize.org/arc-agi" target="_blank">ARC AGI Benchmark</a> - François Chollet's reasoning benchmark</p><p></p><p>- <a rel="noopener noreferrer nofollow" href="https://thinkingmachines.ai" target="_blank">Thinking Machines Lab</a> - Mira Murati's reproducibility research lab</p><p></p><p>- <a rel="noopener noreferrer nofollow" href="https://www.tbench.ai" target="_blank">Terminal Bench</a> - Stanford's coding agent evaluation benchmark</p><p></p><p>- <a rel="noopener noreferrer nofollow" href="https://ar5iv.labs.arxiv.org/html/2310.00865" target="_blank">Data Science at the Singularity</a> - David Donoho's masterful paper examining the power of frictionless reproducibility</p><p></p><p>- <a 
rel="noopener noreferrer nofollow" href="https://arxiv.org/pdf/2507.18971" target="_blank">Rethinking Dataset Discovery with DataScout</a> - New paper examining dataset discovery</p><p></p><p>- <a rel="noopener noreferrer nofollow" href="https://huggingface.co/datasets/ylecun/mnist" target="_blank">MNIST Dataset</a> - The foundational machine learning dataset on Hugging Face</p><p></p><p>Key Takeaways</p><p>1. Great data products create ecosystems - They don't just provide data, they enable entire communities and industries to flourish</p><p></p><p>2. Benchmarks are data products with intent - They encode values and shape the direction of AI development</p><p></p><p>3. We've consumed the easy wins - The internet and graphics innovations that powered early AI breakthroughs are largely exhausted</p><p></p><p>4. The future is specialized - Progress will come from domain-specific datasets, benchmarks, and applications rather than general models</p><p></p><p>5. Data markets need new models - Traditional approaches to data sharing may not work in the AI era</p>]]></description><guid isPermaLink="false">d58ca93a-7a63-48c3-bb3f-40b520641221</guid><dc:creator><![CDATA[Source Cooperative]]></dc:creator><pubDate>Thu, 02 Oct 2025 04:59:41 GMT</pubDate><enclosure url="https://api.riverside.fm/hosting-analytics/media/e94c897a132ee0b5925603c5b1c2c16322fb3ca10113c849d10833662290804e/eyJlcGlzb2RlSWQiOiJkNThjYTkzYS03YTYzLTQ4YzMtYmIzZi00MGI1MjA2NDEyMjEiLCJwb2RjYXN0SWQiOiIwMmE0MWMwZS01NTYzLTRiMGEtOWZjOS02MGIwMTMxN2QxNGYiLCJhY2NvdW50SWQiOiI2NDg3NjJjM2NkNmZhZjVmMGRkYmY2OGMiLCJwYXRoIjoibWVkaWEvY2xpcHMvNjhkZDljOTdjZGQyMDFjNGNmMzE1Y2M0L3RlY2hzLW9uLXRleHRzLWNvbXBvc2VyLTIwMjUtMTAtMV9fMjMtMjYtNDcubXAzIn0=.mp3" length="95831952" type="audio/mpeg"/><itunes:summary>&lt;p&gt;Jed Sundwall and Drew Breunig explore why LLM progress is getting harder by examining the foundational data products that powered AI breakthroughs. 
They discuss how we&apos;ve consumed the &quot;low-hanging fruit&quot; of internet data and graphics innovations, and what this means for the future of AI development.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;The conversation traces three datasets that shaped AI: MNIST (1994), the handwritten digits dataset that became machine learning&apos;s &quot;Hello World&quot;; ImageNet (2008), Fei-Fei Li&apos;s image dataset that launched deep learning through AlexNet&apos;s 2012 breakthrough; and Common Crawl (2007), Gil Elbaz&apos;s web crawling project that fueled 60% of GPT-3&apos;s training data. Drew argues that great data products create ecosystems around themselves, using the Enron email dataset as an example of how a single data release can generate thousands of research papers and enable countless startups. The episode concludes with a discussion of benchmarks as modern data products and the challenge of creating sustainable data infrastructure for the next generation of AI systems.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;Links and Resources:&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://hai.stanford.edu/events/common-crawl-foundation-preserving-humanitys-knowledge-and-making-it-accessible-addressing-challenges-of-public-web-data&quot; target=&quot;_blank&quot;&gt;Common Crawl Foundation Event&lt;/a&gt; - October 22nd event at Stanford!&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://cloudnativegeo.org/events/cng-conference-2026/&quot; target=&quot;_blank&quot;&gt;Cloud-Native Geospatial Forum Conference 2026&lt;/a&gt; - 6-9 October 2026 at Snowbird in Utah!&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://www.dbreunig.com/2024/12/05/why-llms-are-hitting-a-wall.html&quot; target=&quot;_blank&quot;&gt;Why LLM Advancements Have Slowed: The Low-Hanging Fruit Has Been Eaten&lt;/a&gt; - Drew&apos;s blog post that 
inspired this conversation&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://radiant.earth/blog/2024/01/unicorns-show-ponies-and-gazelles/&quot; target=&quot;_blank&quot;&gt;Unicorns, Show Ponies, and Gazelles&lt;/a&gt; - Jed&apos;s vision for sustainable data organizations&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://arcprize.org/arc-agi&quot; target=&quot;_blank&quot;&gt;ARC AGI Benchmark&lt;/a&gt; - François Chollet&apos;s reasoning benchmark&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://thinkingmachines.ai&quot; target=&quot;_blank&quot;&gt;Thinking Machines Lab&lt;/a&gt; - Mira Murati&apos;s reproducibility research lab&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://www.tbench.ai&quot; target=&quot;_blank&quot;&gt;Terminal Bench&lt;/a&gt; - Stanford&apos;s coding agent evaluation benchmark&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://ar5iv.labs.arxiv.org/html/2310.00865&quot; target=&quot;_blank&quot;&gt;Data Science at the Singularity&lt;/a&gt; - David Donoho&apos;s masterful paper examining the power of frictionless reproducibility&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://arxiv.org/pdf/2507.18971&quot; target=&quot;_blank&quot;&gt;Rethinking Dataset Discovery with DataScout&lt;/a&gt; - New paper examining dataset discovery&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;- &lt;a rel=&quot;noopener noreferrer nofollow&quot; href=&quot;https://huggingface.co/datasets/ylecun/mnist&quot; target=&quot;_blank&quot;&gt;MNIST Dataset&lt;/a&gt; - The foundational machine learning dataset on Hugging Face&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;Key Takeaways&lt;/p&gt;&lt;p&gt;1. 
Great data products create ecosystems - They don&apos;t just provide data, they enable entire communities and industries to flourish&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;2. Benchmarks are data products with intent - They encode values and shape the direction of AI development&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;3. We&apos;ve consumed the easy wins - The internet and graphics innovations that powered early AI breakthroughs are largely exhausted&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;4. The future is specialized - Progress will come from domain-specific datasets, benchmarks, and applications rather than general models&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;5. Data markets need new models - Traditional approaches to data sharing may not work in the AI era&lt;/p&gt;</itunes:summary><itunes:explicit>no</itunes:explicit><itunes:duration>01:51:38</itunes:duration><itunes:image href="https://hosting-media.rs-prod.riverside.fm/media/podcasts/02a41c0e-5563-4b0a-9fc9-60b01317d14f/logos/b82bd04f-283a-4abb-aaad-79ae517c9bb9.png"/><itunes:episode>1</itunes:episode><itunes:title>Why LLM Progress is Getting Harder</itunes:title><itunes:episodeType>full</itunes:episodeType></item></channel></rss>