The Arctic climate system is undergoing rapid change: air and sea surface temperatures are rising, while the Greenland Ice Sheet, Arctic glaciers, sea ice, permafrost and snow cover on land are all declining. Rising global air temperatures and ice-sheet mass loss are driving sea level rise around the globe. As the Arctic thaws, maritime and commercial activities in the region are expanding, presenting new opportunities as well as societal and cultural challenges. Given its importance and urgency, “Navigating the New Arctic” has become a national research priority (e.g., NSF’s 10 Big Ideas). According to the Global Climate Observing System (GCOS), satellite remote sensing systems play a key role in monitoring the essential climate variables that describe the cryosphere and the global oceans, because these regions are largely inaccessible to traditional observation techniques. Advances in remote sensing continue to demonstrate this potential. For example, NASA ICESat-2 observations have recently revealed unprecedented detail on sea-ice properties, including the first detection of individual melt ponds from a space-based altimeter. Rich satellite datasets are opening transformative opportunities across many geoscience domains, including oceanography, cryospheric science and ecology, and these opportunities are further expanded by detailed in-situ measurements.

The unprecedented volume and variety of geospatial big data (GeoBD) now far exceed the capacity of the computing platforms accessible to most geoscientists. No convenient solution exists, because current distributed GeoBD platforms are mainly designed for siloed tasks and lack the advanced analytical capabilities needed to facilitate geoscience discoveries. Both the volume of GeoBD and the variety of remote sensing technologies make it extremely tedious and time-consuming to identify coincident and complementary data that can provide critical insights across geoscience domains (e.g., sea-ice science, glaciology, oceanography). This asymmetry between data growth and data discovery capacity significantly limits the value of GeoBD.

ICESpark seeks to address this emerging challenge in GeoBD, with the particular aim of improving understanding of the changes underway in the Arctic, while building tools that will be applicable across all of the geosciences. It is a distributed platform that can combine local commodity computers into a powerful environment well suited to handling GeoBD. First, building on Apache Sedona, ICESpark develops data integration and filtering tools to harness a wide variety of GeoBD across geoscience domains, including oceanography, cryospheric science and ecology. Second, ICESpark provides a scalable data discovery layer that efficiently identifies coincident data across streams from heterogeneous sensing platforms under a variety of user-defined search conditions. Third, ICESpark offers advanced data analytics capabilities, including an AI-enabled geo-feature identification system and a geo-pattern mining package, equipping geoscientists with geophysical and statistical tools for examining the complex relationships and patterns embedded in GeoBD. To enhance research infrastructure, ICESpark will provide a variety of pre-packaged front-ends, including Jupyter notebooks, as well as interoperation with EarthCube’s existing QGreenland project, making the system accessible to broad disciplinary communities. ICESpark will be fully open source and will follow the EarthCube GeoCODES Dataset schema to support long-term sustainability. The multidisciplinary team looks forward to designing and developing ICESpark to harness GeoBD and address some of today’s most challenging geoscience problems.
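To make the idea of "coincident data" concrete, the following is a minimal, hypothetical sketch in plain Python. It does not use ICESpark or Apache Sedona; the `Observation` record, the `find_coincident` function, the source names and the 10 km / 6 hour thresholds are all assumptions chosen for illustration. It pairs observations from two sources (for example, altimeter footprints and in-situ buoy fixes) that fall within a user-defined great-circle distance, computed with the haversine formula, and a user-defined time window.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Observation:
    """Hypothetical observation record: a labelled point in space and time."""
    source: str
    lat: float  # degrees
    lon: float  # degrees
    time: datetime


def haversine_km(a: Observation, b: Observation) -> float:
    """Great-circle distance between two observations, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def find_coincident(left, right, max_km, max_dt):
    """Return (left, right) pairs within max_km and max_dt of each other.

    Brute-force nested loop for clarity: every left/right pair is tested
    against the time window first (cheap) and then the distance threshold.
    """
    pairs = []
    for a in left:
        for b in right:
            if abs(a.time - b.time) <= max_dt and haversine_km(a, b) <= max_km:
                pairs.append((a, b))
    return pairs


# Usage with made-up example points (not real data).
t0 = datetime(2020, 7, 1, 12, 0)
altimeter = [Observation("altimeter", 80.0, 0.0, t0)]
buoys = [
    Observation("buoy-A", 80.05, 0.0, t0 + timedelta(hours=2)),  # ~5.6 km, 2 h apart
    Observation("buoy-B", 81.0, 0.0, t0),                        # ~111 km away
    Observation("buoy-C", 80.0, 0.0, t0 + timedelta(days=3)),    # 3 days apart
]
matches = find_coincident(altimeter, buoys, max_km=10.0, max_dt=timedelta(hours=6))
print([b.source for _, b in matches])  # ['buoy-A']
```

The nested loop compares every pair, which is fine for a toy example but scales quadratically with the number of observations. A distributed spatial engine such as Apache Sedona instead relies on spatial partitioning and indexing so that only nearby candidates are compared, which is what makes this kind of search feasible at GeoBD scale.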

The project will be carried out by researchers in computer science, cryospheric science, ecology and oceanography from:


Principal Investigator
Co-Principal Investigators
Graduate Researchers
Project Sponsor
National Science Foundation (NSF)