In this article I will show how to use a local LLM with RAG and tool calling to extract structured data from a document. The first example shows how to use tool calling to expand content from HTTP links in the original prompt. The second example shows how to use tool calling to extract structured content that requires more detailed analysis of the source material.
Initial Setup
In my demo I will be using Ollama as the local hosting environment for the LLM, as well as a Python API for querying it. I am using Llama 3.1 8B and LlamaIndex, but any LLM will work as long as it’s available through Ollama and supports tool calling. Check out the readme in the project for ways to download additional models.
Instead of installing the dependencies directly on my machine, I have opted for a dockerized environment with docker-compose for running everything on my local dev box. I have included the docker-compose.yml file below.
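A minimal sketch of such a file, assuming the standard ollama/ollama image and a simple Python app container (the exact file lives in the repo):

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama   # persist downloaded models between runs

  app:
    build: .                        # the Python demo code
    depends_on:
      - ollama
    environment:
      - OLLAMA_HOST=http://ollama:11434

volumes:
  ollama_data:
```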
Tool Calling
LLM tool calling is essentially a feature that allows consumers of an LLM to register code functions that are made available to the LLM as a bridge to interact with external data. The core idea is to give the LLM a method signature with input parameters and metadata that guide the LLM in extracting specific data to pass as input to the tool function. The function can then be called with the LLM-provided input parameters to run user-defined code. Not only can you use this to enrich LLM-extracted data, but it also enables programmatic integration of the LLM into user applications, since the method call provides a predictable schema.
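To make this concrete, here is roughly the pairing the LLM works with: a plain function on the application side, and a name, description, and typed input schema on the model side. This is a generic illustration (the exact wire format depends on the model and framework):

```python
# A plain Python function the application wants the LLM to be able to call.
def get_population(country: str) -> int:
    """Returns the population of the given country."""
    ...

# What the LLM is actually shown: a name, a description, and a typed
# input schema it can fill in when it decides the tool is needed.
tool_schema = {
    "name": "get_population",
    "description": "Returns the population of the given country.",
    "parameters": {
        "type": "object",
        "properties": {
            "country": {"type": "string", "description": "Name of the country"}
        },
        "required": ["country"],
    },
}
```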
Let’s look at a few examples where tool calling can be helpful.
The first example is a simple case where I provide links in the prompt that I want the LLM to extract and feed to a custom tool method. Inside the tool method I make HTTP requests to pull in the content behind the links and merge the responses with the original prompt. You can think of this as real-time RAG, where external content is made available to the LLM to provide the context needed to answer the question in the prompt.
Take the sample prompt below:
Write a summary of the similarities and differences between the two articles found here: https://techcrunch.com/2025/02/28/microsoft-hangs-up-on-skype-service-to-shut-down-may-5-2025/ and here: https://www.cnn.com/2025/02/28/tech/skype-microsoft-shutdown/index.html
In the prompt above I have included two links to content that is required to perform the requested article comparison.
The code sample below shows how to configure the LLM to achieve this through the following sequence of events:
- Extract any links from the original prompt
- Make http requests to fetch the content behind the links
- Merge the loaded content with the original prompt for retrieval augmented generation (RAG)
The first code listing shows how to wire up the connection between the LLM and the method call.
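A sketch of what this wiring can look like with LlamaIndex and Ollama. This is a minimal version under my assumptions rather than the exact repo code; `download_pages` is the tool function implemented in the next listing:

```python
from llama_index.core.tools import FunctionTool
from llama_index.llms.ollama import Ollama

# Connect to the locally hosted model in Ollama.
llm = Ollama(model="llama3.1", request_timeout=120.0)

# Register the Python function as a tool the LLM can call.
# `download_pages` is implemented in the next listing.
download_tool = FunctionTool.from_defaults(fn=download_pages)

prompt = (
    "Write a summary of the similarities and differences between the two "
    "articles found here: https://techcrunch.com/2025/02/28/microsoft-"
    "hangs-up-on-skype-service-to-shut-down-may-5-2025/ and here: "
    "https://www.cnn.com/2025/02/28/tech/skype-microsoft-shutdown/index.html"
)

# The LLM extracts the links from the prompt and calls the tool with them;
# the tool returns the original prompt merged with the downloaded articles.
augmented = llm.predict_and_call([download_tool], user_msg=prompt)

# Feed the augmented prompt back to the LLM to produce the final summary.
answer = llm.complete(str(augmented))
print(answer)
```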
Example 1 - Extracting links from the original prompt
Next, I will show the actual implementation of the method call. Notice that this is just a regular Python function, but to aid the LLM in understanding the input parameters, I have added a method-level comment and Pydantic Field definitions.
In addition to fetching the content, I am also doing some massaging of the response to remove any HTML markup. Finally, I am merging the content with the original prompt to give the LLM the full context required to perform the analysis requested in the original prompt.
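A sketch of that implementation, assuming requests and BeautifulSoup for the fetching and HTML stripping (the repo may do this differently):

```python
from typing import List

import requests
from bs4 import BeautifulSoup
from pydantic import Field


def download_pages(
    urls: List[str] = Field(
        description="All http or https links found in the user's question"
    ),
) -> str:
    """Downloads the web pages behind the given links, strips the HTML
    markup, and merges the plain text with the original prompt so the LLM
    has the full context needed to answer the question."""
    articles = []
    for url in urls:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        # Massage the response: remove html markup, keep the readable text.
        text = BeautifulSoup(response.text, "html.parser").get_text(
            separator=" ", strip=True
        )
        articles.append(text)
    # Merge the loaded content with the original prompt (real-time RAG).
    # `prompt` is the user's question from the previous listing.
    return prompt + "\n\nArticle content:\n\n" + "\n\n---\n\n".join(articles)
```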
Example 2 – Extracting more detailed information
In the previous example it may have seemed like overkill to use method calling to extract HTTP links, since a simple regex would do the trick. However, in the next example I will show how to leverage method calling to extract input parameters that require deeper analysis of the RAG content. In this example I will pull in articles about specific countries from Wikipedia and ask the LLM to extract a list of all neighboring countries. To make it even more specific, I want the LLM to extract only countries that share a physical border with the country in the article. All of this information can be inferred from the RAG content from Wikipedia, but it would be much harder to extract with a regex. Instead, we can rely on the LLM’s excellent ability to process the semantic meaning of text, which is key here since each article will represent the information differently.
The first code listing is very similar to the first example, where I wire up the link between the LLM and the method.
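A sketch of that wiring, reusing the llm object from the first example. I am using Wikipedia’s REST summary endpoint here for brevity (an assumption on my part; the repo presumably loads fuller article text), and `record_neighboring_countries` is the tool method shown in the next listing:

```python
import requests
from llama_index.core.tools import FunctionTool

# Pull the RAG content: a plain-text extract of a Wikipedia article.
page = requests.get(
    "https://en.wikipedia.org/api/rest_v1/page/summary/Norway", timeout=30
).json()
article_text = page["extract"]

# `record_neighboring_countries` is implemented in the next listing.
countries_tool = FunctionTool.from_defaults(fn=record_neighboring_countries)

# The LLM analyzes the article and calls the tool with the structured
# list of bordering countries it inferred from the text.
result = llm.predict_and_call(
    [countries_tool],
    user_msg=(
        "Here is an article about a country:\n\n"
        f"{article_text}\n\n"
        "Extract all countries that share a physical border with it."
    ),
)
print(result)
```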
In the second code listing I have included the implementation of the tool method. I have intentionally kept the implementation simple, but the key is to provide as much detailed metadata as possible to describe the function and the input parameters. Think of this as prompting, where the metadata provides the LLM with important information that helps increase accuracy.
As before, I am using a method comment and Pydantic Field definitions for the input parameters. Notice how we can instruct the LLM to give us structured data, like an array of countries, instead of just a combined string.
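A sketch of the tool method (names and descriptions are my illustrative choices):

```python
from typing import List

from pydantic import Field


def record_neighboring_countries(
    country: str = Field(
        description="The name of the country the article is about"
    ),
    neighboring_countries: List[str] = Field(
        description=(
            "All countries that share a physical land border with the "
            "country the article is about. Exclude countries that are "
            "only maritime neighbors or trading partners."
        )
    ),
) -> dict:
    """Records the countries that share a physical border with the country
    described in the provided article text."""
    # By the time this runs, the LLM has already done the semantic analysis
    # and filled in the parameters, so the application receives structured
    # data with a predictable schema.
    return {"country": country, "neighbors": neighboring_countries}
```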
Source
I have provided a GitHub repo here in case you want to check out the code.