In this article I will show how to use a local LLM with RAG and tool calling to extract structured data from a document. The first example will show how to use tool calling to expand the prompt with content fetched from http links in the original prompt. The second example will show how to use tool calling to extract structured data that requires more detailed analysis of the content.

Initial Setup

In my demo I will be using Ollama as the local hosting environment for the LLM, as well as a Python API for querying the LLM. I am using Llama 3.1 8B and LlamaIndex, but any LLM will work as long as it’s available through Ollama and supports tool calling. Check out the readme in the project for ways to download additional models.
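As one option (a sketch of my own rather than the project readme's instructions), a model can also be pulled programmatically through the ollama Python client, pointing it at the port mapping from the docker-compose file shown below:

import ollama

# Point the client at the Ollama container; the port matches the
# 11435:11434 mapping in the docker-compose file further down.
client = ollama.Client(host="http://localhost:11435")

# Download the model used in this article (any Ollama model that
# supports tool calling will work).
client.pull("llama3.1")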

Instead of installing the dependencies directly on my machine, I have opted for a dockerized environment with docker-compose for running everything on my local dev box. I have included the docker-compose.yml file below.

services:
  api:
    build: .
    restart: always
    ports:
      - "8080:9000"
    networks:
      - ollama-docker
    volumes:
      - ./api:/usr/api

  ollama-docker:
    image: ollama/ollama:0.4.0
    ports:
      - 11435:11434
    volumes:
      - .:/code
      - ./ollama/ollama:/root/.ollama
    container_name: ollama-docker
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: "all"
              capabilities: [gpu]
    pull_policy: always
    tty: true
    restart: always
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_HOST=0.0.0.0
    networks:
      - ollama-docker

networks:
  ollama-docker:
    external: false

Tool Calling

LLM tool calling is essentially a feature that allows consumers of an LLM to register code functions that are made available to the LLM as a bridge for interacting with external data. The core idea is to give the LLM a method signature with input parameters and metadata that guide the LLM in extracting specific data that can be passed as input to the tool function. The function can then be called with the LLM-provided input parameters to run user-defined code. Not only can you use this to enrich LLM-extracted data, but it also enables programmatic integration of the LLM in user applications, since the method call provides a predictable schema.
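As a minimal sketch of the idea (a toy example of my own, assuming the LlamaIndex and Ollama setup shown later in the article), a Python function can be registered as a tool like this:

from llama_index.core.tools import FunctionTool

# Toy function made available to the LLM. The type hints and docstring are
# the metadata the LLM uses when deciding how to fill in the arguments.
def get_weather(city: str) -> str:
    """Useful for getting the current weather for a city"""
    return f"The weather in {city} is sunny"  # placeholder implementation

weather_tool = FunctionTool.from_defaults(fn=get_weather, name="get_weather")

# The tool is then passed to the LLM together with the prompt, e.g.:
# Settings.llm.predict_and_call([weather_tool], "What is the weather in Oslo?")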

Let’s look at a few examples where tool calling can be helpful.

The first example I will show is a simple case where I provide links in the prompt that I want the LLM to extract and feed to a custom tool method. Inside the tool method I will make http requests to pull in the content behind the links and merge the response with the original prompt. You can think of this as real-time RAG where external content is made available to the LLM to provide the necessary context to answer the question in the prompt.

Take the sample prompt below:

Write a summary of the similarities and differences between the two articles found here: https://techcrunch.com/2025/02/28/microsoft-hangs-up-on-skype-service-to-shut-down-may-5-2025/ and here: https://www.cnn.com/2025/02/28/tech/skype-microsoft-shutdown/index.html

In the prompt above I have included two links to content that is required to perform the requested article comparison.

The code sample below shows how to configure the LLM to achieve this with the following sequence of events:

  1. Extract any links from the original prompt
  2. Make http requests to fetch the content behind the links
  3. Merge the loaded content with the original prompt for retrieval augmented generation (RAG)

The first code listing shows how to wire up the connection between the LLM and the method call.

Example 1 - Extracting links from the original prompt
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings
from llama_index.core.tools import FunctionTool
from llama_index.core import PromptTemplate
from country_tool import *
from article_service import *
from link_tool import *

llm_model_name = "llama3.1"

def init_llm():
    Settings.llm = Ollama(model=llm_model_name,
                          request_timeout=1000.0,
                          base_url="http://ollama-docker:11434",
                          temperature=0)

def predict_with_external_link(prompt: str):
    tool = FunctionTool.from_defaults(fn=make_http_request, name="get_link_info")
    res = Settings.llm.predict_and_call([tool], prompt, verbose=True)
    return Settings.llm.predict(prompt=PromptTemplate(res.response))

Next, I will show the actual implementation of the method call. Notice that this is just a regular Python function, but to aid the LLM in understanding the input parameters I have added a method-level docstring and Pydantic Field definitions.

import aiohttp
import asyncio
from pydantic import Field
from bs4 import BeautifulSoup

async def make_http_request(urls: list[str] = Field(description="a list of links found in the content"),
                            original_prompt: str = Field(description="The original full prompt")):
    """Useful for extracting one or more links from a prompt"""
    async with aiohttp.ClientSession() as session:
        # Fetch all links concurrently and merge the results into one RAG context
        tasks = [fetch(url, session) for url in urls]
        results = await asyncio.gather(*tasks)
        rag_content = ' '.join(results)
        return f"{original_prompt} Rely only on the following content when generating a response: {rag_content}"

async def fetch(url, session):
    """Asynchronous function to fetch a URL"""
    async with session.get(url) as response:
        text = await response.text()
        # Strip the html markup and keep only the paragraph text
        soup = BeautifulSoup(text, "html.parser")
        paragraphs = soup.find_all("p")
        article_text = ' '.join([p.get_text() for p in paragraphs])
        return f"The content from {url} is {article_text}"

In addition to fetching the content, I am also doing some massaging of the response to remove any HTML markup. Finally, I am merging the content with the original prompt to give the LLM the full context required to perform the analysis requested in the original prompt.
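To tie the two listings together, a call could look like the sketch below (my own illustration, assuming init_llm and predict_with_external_link from the first listing are in scope):

# Hypothetical driver code using the sample prompt from earlier.
init_llm()

prompt = (
    "Write a summary of the similarities and differences between the two articles found here: "
    "https://techcrunch.com/2025/02/28/microsoft-hangs-up-on-skype-service-to-shut-down-may-5-2025/ "
    "and here: https://www.cnn.com/2025/02/28/tech/skype-microsoft-shutdown/index.html"
)

summary = predict_with_external_link(prompt)
print(summary)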

Example 2 – Extracting more detailed information

In the previous example it may have seemed like overkill to use tool calling to extract http links, since a simple regex would do the trick. However, in the next example I will show how to leverage tool calling to extract input parameters that require deeper analysis of the RAG content. In this example I will pull in articles about specific countries from Wikipedia and ask the LLM to extract a list of all neighboring countries. To make it even more specific, I want the LLM to extract only countries that share a physical border with the country from the article. All this information can be inferred from the RAG content from Wikipedia, but it would be much harder to extract with a regex. Instead, we can rely on the LLM’s excellent ability to process the semantic meaning of text, which is key here since each article will represent the information differently.

The first code listing is very similar to the first example, where I wire up the link between the LLM and the tool method.

def predict(title: str):
    article = get_article(title=title)
    tool = FunctionTool.from_defaults(fn=get_country_border_info, name="get_country_border_info")
    prompt = f"Get information about {title} based on {article}. Rely only on the provided document when generating the response"
    res = Settings.llm.predict_and_call([tool], prompt, verbose=True)
    return res.response
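The get_article helper comes from the article_service module and its implementation is not shown here. A minimal sketch of one possible implementation (my assumption, using the public MediaWiki API to fetch a plain-text extract, rather than the project's actual code) might look like this:

import requests

def get_article(title: str) -> str:
    """Fetch the plain-text content of a Wikipedia article by title."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "titles": title,
        "format": "json",
    }
    response = requests.get("https://en.wikipedia.org/w/api.php", params=params)
    pages = response.json()["query"]["pages"]
    # The API keys results by page id, so take the first (and only) entry.
    return next(iter(pages.values())).get("extract", "")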

In the next code listing I have included the implementation of the tool method. I have intentionally kept the implementation simple, but the key here is to provide as much detailed metadata as possible to describe the function and the input parameters. Think of this as prompting, where the metadata provides the LLM with important information that will help increase accuracy.

from pydantic import Field

def get_country_border_info(bordering_countries: list[str] = Field(description="bordering countries", default=[])):
    """Useful for getting a list of countries that share a physical border with the country"""
    bordering_countries.sort()
    return {"bordering_countries": bordering_countries}

As before, I am using a method docstring and Pydantic Field definitions for the input parameters. Notice how we can instruct the LLM to give us structured data, such as an array of countries, instead of just a combined string.
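A call to the predict function above could then look like this (my own usage sketch, assuming the LLM has been initialized with init_llm and the listings above are in scope):

# Hypothetical usage: extract the countries bordering Norway.
init_llm()

result = predict("Norway")
print(result)
# The response should contain the sorted list of bordering countries
# extracted from the Wikipedia article, e.g. Finland, Russia and Sweden.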

Source

I have provided a GitHub repo here in case you want to check out the code.