Huan Ning, Zhenlong Li | 2025-04-13
In a recent vision paper, GIScience in the Era of Artificial Intelligence: A Research Agenda Towards Autonomous GIS, we proposed three scales of autonomous GIS agents: local, centralized, and infrastructure. Some GIS agents have been developed at the local and centralized scales, such as LLM-Geo and Google Geospatial Reasoning. What do infrastructure-scale agents look like? In the vision paper, we noted that “a key challenge [in developing infrastructure-scale GIS agents] lies in establishing secure standards, protocols and policies to let generative AI manage computational resources.” The release of Google’s A2A protocol in early April 2025, which supports communication among agents, shines a light on infrastructure-scale agents: it allows AI agents to collaborate with each other to accomplish complex tasks using resources across different types of infrastructure, such as data and computational resources.
Besides agent-to-agent protocols like A2A, Anthropic’s Model Context Protocol (MCP) focuses on AI-ready resources (e.g., data and tools). Released in late 2024, it aims to expose resources to large language models (LLMs). Our understanding is that MCP is a wrapper around existing resources (e.g., data, services) that provides information or materials to LLMs, so that they can give better answers to users rather than output plausible text based only on the LLM’s internal knowledge. In GIScience, geospatial data sharing efforts led by the Open Geospatial Consortium (OGC) began in 1994 and have benefited the geospatial community. Online geospatial resources that meet OGC standards can be shared across programs via APIs (application programming interfaces). In addition, the strong need for geospatial data sharing, integration, and discovery led to the development of the National Spatial Data Infrastructure (NSDI) and large NSF-funded initiatives such as EarthCube and I-GUIDE. In the era of artificial intelligence (AI), these geospatial resources may need to be wrapped for AI applications, especially for autonomous GIS agents. Our paper, Autonomous GIS: the next-generation AI-powered GIS, mentioned the need to prepare data for AI agents, and we implemented a data retrieval agent, LLM-Find, and its QGIS implementation, demonstrating a practical and plug-and-play way for AI applications to access geospatial data. MCP is probably a more general way to share data with AI, especially regarding authorization, but whether it can be adapted for geospatial data needs further investigation.
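The “wrapper” idea can be illustrated with a minimal sketch. This is not the official MCP SDK; the class, resource name, and parameters below are hypothetical placeholders showing how an existing geospatial resource might be exposed to an LLM through a machine-readable description plus a callable interface.

```python
# Illustrative sketch (NOT the official MCP SDK) of wrapping an existing
# geospatial resource so an LLM can discover and invoke it.
from dataclasses import dataclass


@dataclass
class ResourceWrapper:
    name: str
    description: str  # what the LLM reads when deciding whether to use it
    parameters: dict  # expected inputs, e.g., a state code or bounding box

    def call(self, **kwargs):
        # A real wrapper would invoke an OGC service or database here;
        # this stub just echoes the request for illustration.
        return {"resource": self.name, "request": kwargs}


# Hypothetical resource for illustration only.
census_tracts = ResourceWrapper(
    name="census_tracts",
    description="US Census tract boundaries; query by state FIPS code",
    parameters={"state_fips": "two-digit state code, e.g., '42'"},
)

print(census_tracts.call(state_fips="42"))
```

The point of the wrapper is that the `description` and `parameters` fields are what the model consumes, while the `call` method hides the resource’s native protocol (OGC API, SQL, REST) behind a uniform interface.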
Due to the nature of LLMs, many of these solutions and practices, such as A2A, MCP, LLM-Find, and GIS Copilot, use a straightforward method: feeding descriptions of all available resources (e.g., agents, tools, and data) into an LLM and expecting the LLM to choose the needed resources. This strategy faces a scalability problem: what if there are hundreds or thousands of resources? For example, there are about 30,000 Census variables and 200,000 OGC services (per PolarHub). Obviously, it is impractical to put the very long descriptions of all variables and services into the LLM. Our team is testing retrieval-augmented generation (RAG) in GIS Copilot to help it select appropriate tools to process geospatial data for users; fine-tuning multiple LLMs may be an alternative. Since humans have created countless public and private services for the geospatial data infrastructure, autonomous GIS agents at the infrastructure scale need to discover, test, document, and recommend these resources, but how? We think this is a fascinating and useful research topic.
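The RAG strategy above can be sketched in a few lines. This is illustrative only and is not the GIS Copilot implementation: the tool catalog is made up, and the bag-of-words “embedding” stands in for a real neural embedding model, but the retrieval logic (embed once, rank by similarity, pass only the top-k descriptions to the LLM) is the core idea.

```python
# Minimal sketch of RAG-style tool selection. Instead of putting all
# tool descriptions into the prompt, embed them once and retrieve only
# the top-k most relevant ones for a given user request.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'. A real system would use a neural
    embedding model (e.g., a sentence transformer)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical tool catalog; names and descriptions are for illustration.
tools = {
    "buffer": "create a buffer zone polygon around input features",
    "clip": "clip input features to a polygon boundary",
    "ndvi": "compute NDVI vegetation index from satellite imagery bands",
}


def select_tools(request: str, k: int = 2) -> list[str]:
    q = embed(request)
    ranked = sorted(tools, key=lambda n: cosine(q, embed(tools[n])),
                    reverse=True)
    return ranked[:k]  # only these k descriptions enter the LLM prompt


print(select_tools("compute vegetation index from imagery"))
```

With 200,000 services, the catalog embeddings would live in a vector index, but the prompt size stays bounded by k rather than by the catalog size, which is what makes the approach scale.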
Reposted from: https://sites.psu.edu/giscience/2025/04/13/autonomous-gis-as-infrastructure/
Huan Ning | 04/09/2025
Google Research released a framework named Geospatial Reasoning, which provides a user-friendly interface for spatial analysis and visualization. Receiving users’ requests about geospatial tasks in natural language, Geospatial Reasoning chooses appropriate data sources and foundation models, then creates and executes geoprocessing workflows to extract information and insights from data and answer those requests. One application of Geospatial Reasoning is assessing damaged buildings for emergency response. This is fantastic work, and many congratulations to the team!
Geospatial Reasoning can be viewed as an autonomous agent for GIS/RS (Geographic Information Systems and Remote Sensing). It reflects the vision of our 2023 paper “Autonomous GIS: the next-generation AI-powered GIS”. The core idea of that paper is converting a spatial analysis into a geoprocessing workflow using the reasoning and coding capabilities of generative AI, i.e., dividing the task into small pieces and solving them one by one (divide-and-conquer). A step can be a small function, a tool, a model, or a smaller geoprocessing workflow. Google’s Geospatial Reasoning adopts the strategy of using GIS tools and foundation models as the steps in the geoprocessing workflow.
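The divide-and-conquer idea can be sketched as a workflow of composable steps. This toy example is mine, not Google’s or LLM-Geo’s code; the data and step functions are invented for illustration, but they show how each step (a function, tool, or model) consumes the previous step’s output.

```python
# Sketch of a spatial analysis expressed as a geoprocessing workflow:
# each step is a callable that takes the previous step's output.
from typing import Any, Callable

Step = Callable[[Any], Any]


def run_workflow(steps: list[Step], data: Any) -> Any:
    for step in steps:  # chain the steps; a step could itself be a workflow
        data = step(data)
    return data


# Toy task: "mean elevation of points above 100 m" (made-up data).
points = [{"id": 1, "elev": 120}, {"id": 2, "elev": 310}, {"id": 3, "elev": 95}]

select_high = lambda pts: [p for p in pts if p["elev"] > 100]
mean_elev = lambda pts: sum(p["elev"] for p in pts) / len(pts)

print(run_workflow([select_high, mean_elev], points))  # → 215.0
```

In an autonomous GIS agent, the LLM’s job is to generate the `steps` list (and the code inside each step) from the user’s natural language request; the workflow runner itself stays simple.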
Recently, in collaboration with 14 other leading GIScience scholars, our team released another paper to further the discussion of autonomous GIS: “GIScience in the Era of Artificial Intelligence: A Research Agenda Towards Autonomous GIS”. Besides five autonomy levels (routine-, workflow-, data-, result-, and knowledge-aware), we also proposed three agent scales: local, centralized, and infrastructure. Geospatial Reasoning can be categorized as an early stage of Level 3 (data-aware) GIS, as it is supposed to be able to use appropriate data for a given task. It is designed as a centralized GIS agent because it serves multiple users and runs on computing clusters under centralized management. Agents at this scale can process geospatial big data for analysis.
Since early 2023, when we released LLM-Geo as one of the first attempts at an autonomous GIS agent, we have witnessed more and more GIS agents being researched and developed. We believe that autonomous GIS is emerging as a new sub-field of GIScience, and our team is collaborating with various domain experts to advance research on autonomous geospatial agents. The combination of large language models (or, more broadly, generative AI) and domain knowledge can significantly boost research productivity.
Reposted from: https://giscience.psu.edu/2025/04/10/geospatial-reasoning-by-google-a-leap-toward-autonomous-gis/
This is a piece adapted by AI from an abandoned part of a paper revision for an autonomous geographic information system (GIS) agent. I found it interesting, so I am posting it here.
1. What is a Large Spatial Model?
Large Language Models (LLMs) learn patterns in human language and generate text by predicting what comes next. They are trained on huge amounts of text and sometimes images or videos.
Can we do something similar in GIS (Geographic Information Science)? In geography, nearby places are more connected than distant ones—similar to how words in a sentence relate to each other. The idea is to break down complex geographic patterns into smaller, simpler parts (like tokens in language) and train a model to predict spatial and temporal patterns.
But geographic data is more complicated than text. It comes in many forms—maps, satellite images, networks, and more—and varies by location and scale. Data gaps are also common. While there are models built for specific tasks (e.g., satellite image analysis), a general model for all geographic data, like an LLM for language, hasn’t been fully developed yet.
Simply put, a Large Spatial Model (LSM) is like a multi-, hyper-, or ultra-modal LLM for geospatial science. It would handle many types of geographic data (images, maps, networks, text) and learn patterns across space and time to support a wide range of applications, from environmental monitoring to urban planning.
2. Geospatial Embedding
A key challenge is how to represent geographic data in a way the model can process—this is called geospatial embedding. In language models, the basic unit is a token. For geographic data, it’s more complex.
Some researchers use graphs to represent connections between places based on data like human movement or environmental factors. However, geography isn’t just static connections; many phenomena (like traffic or weather) change over time. Current models often miss this temporal aspect, so more work is needed to capture both space and time in these embeddings.
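As a concrete (and entirely made-up) illustration of why time matters in geospatial embeddings, consider representing places as graph nodes whose edge weights vary by hour. A static graph would collapse the daily flow pattern into one number; keeping the time axis preserves it.

```python
# Toy space-time representation: places as nodes, with hourly edge
# weights (e.g., trip counts), so the connection between two places is
# a time series rather than a static number. All values are invented.
import numpy as np

places = ["downtown", "airport", "suburb"]
n, hours = len(places), 24

# flows[i, j, t] = flow from place i to place j during hour t
flows = np.zeros((n, n, hours))
flows[0, 1, 8] = 120.0   # downtown -> airport, morning
flows[0, 1, 17] = 300.0  # downtown -> airport, evening peak
flows[2, 0, 8] = 500.0   # suburb -> downtown commute

# A naive per-place "embedding": its outgoing flow profile over the day,
# something a model could tokenize much as an LLM tokenizes text.
embedding = flows.sum(axis=1)  # shape (n_places, hours)
print(embedding[0, 17])        # downtown's 5 pm total outflow → 300.0
```

A model trained on such space-time tensors could learn to predict the next hour’s flows the way an LLM predicts the next token; the static-graph view would lose exactly the temporal signal this section argues is missing.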
3. Challenges to Building Large Spatial Models
Training large models requires enormous computing power. GIS researchers often have fewer resources than those working on LLMs, so they may need to focus on specific types of big geospatial data, like satellite images or mobility data.
There are also concerns about privacy and data security, especially when dealing with sensitive location data. Cross-disciplinary collaboration will be key to building responsible and secure LSMs.
In short, there is big potential in building Large Spatial Models, but also major technical and ethical challenges ahead.
Note: the picture is generated by Gemini 2.0 Flash.
I have developed a very useful Python package for operating Google Street View images, but maintaining it is painful because I keep releasing more packages and research code. The maintenance burden is like interest on a debt: it keeps compounding. So I envision a continual agent specialized in a relatively simple and specific task, such as maintaining a code repository or crawling data. It would have the ability to run and learn all the time, following a given "meta-instruction" forever. For example, it could keep adding functions to my street view image package, testing the functions, receiving pull requests, and creating documentation; that would be its only task. I may have a research idea to prototype such an agent! I think it is amazing.
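Such a continual agent might be prototyped as a simple loop that forever checks for pending work under its meta-instruction. This is only a sketch of the control structure; the function names, the task list, and the PR number are hypothetical stubs, not a real implementation.

```python
# Prototype sketch of a "continual agent" loop: an agent that forever
# follows one meta-instruction for a single maintenance task.
import time

META_INSTRUCTION = "maintain the street-view image package"


def pending_tasks() -> list[str]:
    """Check for work: open pull requests, failing tests, missing docs.
    Stubbed with hypothetical items; a real agent would query the
    repository host's API."""
    return ["review PR #12", "regenerate docs"]


def do_task(task: str) -> None:
    # A real agent would call an LLM / test runner / doc generator here.
    print(f"[{META_INSTRUCTION}] handling: {task}")


def run_agent(cycles: int = 1, interval: float = 0.0) -> None:
    for _ in range(cycles):  # a deployed agent would loop indefinitely
        for task in pending_tasks():
            do_task(task)
        time.sleep(interval)


run_agent()
```

The interesting research questions live inside the stubs: how the agent decides what counts as a pending task, how it verifies its own changes, and how the meta-instruction constrains it over an unbounded run.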
(The image is generated by Gemini; please ignore its typos. 😁)
I tried to write some blogs on this site, but it seems Google Sites is not well suited for blog writing, so I put the posts on WordPress.com. Here is the link. Sorry for that!
This post was written in February 2024. It was the first time I started to think about and learn complex systems. I would like to write a story about this post later.