How to reliably use LLMs to get JSON outputs

When using an LLM, especially to get JSON output, many things can go wrong. I’ll explain some of them in this post.

Prompt

The first step is to ask the LLM to generate JSON as output. I use the following prompt:

* Output format
  - Output is a JSON-formatted string.
  - The JSON will have fields exactly in the following order. This means that fields that appear later cannot be generated before the fields above them. Below are the fields to fill in:
    .. <field1>: <field explanation>
    .. <field2>: <field explanation>
...
  - Never output additional text beyond the JSON itself, so that it can be parsed directly by json.loads():
    .. "Here is the JSON output" or similar should not be said.
    .. Output shouldn't be enclosed within ``` ... ```.
    .. Never write 'json'.
    .. In the instruction, [INPUT]...[/INPUT] or its variants are for organizing instructions. Your generated output must not contain them. Instead, print the JSON output only.
    .. Output JSON must be parsable by json.loads(). You must use proper quotation, field separation using commas, and escaping.
  - When it's impossible to give an answer, simply write: {"errors": ".... reason of no answer ...."}. For example, {"errors": "No cause and effect can be identified."}.

Every LLM produces a different mix of errors. Tweak the above as necessary; a sketch of how the prompt gets wired up follows.
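To make the wiring concrete, here is a minimal sketch of how I'd assemble the final prompt. OUTPUT_FORMAT_RULES and call_llm are hypothetical names: the former stands in for the "* Output format" block above, the latter for whatever chat-completion client you use.

# OUTPUT_FORMAT_RULES (hypothetical) holds the "* Output format" block above.
# call_llm (hypothetical) wraps your chat-completion client and returns text.
def build_prompt(task_instruction: str) -> str:
    # The format rules always go after the task-specific instruction.
    return f"{task_instruction}\n\n{OUTPUT_FORMAT_RULES}"

raw_output = call_llm(build_prompt(
    "Identify the cause and effect in [INPUT]...[/INPUT]."))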

Parsing

Even with strongly worded instructions, the LLM may still produce an invalid JSON string. One approach would be to ask the LLM to fix the JSON string before parsing, but that comes at the cost of an additional LLM call, i.e., $$$.

Instead, I perform manual fixes:

import json

def _json_parse(content: str) -> dict:
    """Parse JSON, working around the weirdness of scraping and LLM output.
    1. Replace newlines with ' ', as newlines inside values cause trouble.
       This assumes the output from the LLM doesn't need to contain
       paragraphs.
    2. Replace \\\' with '. It's caused by dicts printed with single
       quotes, e.g., '....\'...', which the LLM escapes as "... \\\' ...".
    """
    lines = []
    for current in content.strip().split("\n"):
        current = current.strip()
        if not current:
            continue
        # Undo the unnecessary escaping of single quotes.
        current = current.replace("\\'", "'")
        lines.append(current)
    return json.loads(" ".join(lines))
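A quick usage example; the sample malformed response here is my own illustration, with a stray newline inside the object and an unnecessarily escaped single quote.

raw = """{"title": "Bob\\'s post",
"score": "0.9"}"""
print(_json_parse(raw))  # {'title': "Bob's post", 'score': '0.9'}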

Passthrough, Drop

Even when an LLM is asked only to modify or add a field, it has every reason to modify irrelevant fields too. For example, when asked to add a "summary" field to {"url": ..., "contents": ...}, it may additionally change "url" or "contents".

Likewise, even when the LLM is asked to remove "contents" while adding "summary", it may still keep "contents".

We can define passthrough and drop sets and use them to patch the resulting JSON:

# inputs is the input JSON (dict) given to the LLM
# outputs is the json.loads()-ed LLM output (dict)
# passthrough is a set of keys to copy from inputs to outputs
# drop is a set of keys to remove from outputs
outputs = {
    **outputs,
    **{k: v for k, v in inputs.items() if k in passthrough},
}
outputs = {k: v for k, v in outputs.items() if k not in drop}
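For instance, with the "summary" example above (the sample values are my own illustration):

inputs = {"url": "https://example.com", "contents": "long text ..."}
outputs = {"url": "WRONG", "contents": "long text ...", "summary": "A post."}

passthrough = {"url"}   # force the original url back in
drop = {"contents"}     # contents was supposed to be removed

outputs = {
    **outputs,
    **{k: v for k, v in inputs.items() if k in passthrough},
}
outputs = {k: v for k, v in outputs.items() if k not in drop}
# outputs == {"url": "https://example.com", "summary": "A post."}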

Ensure generated

As an LLM can do all kinds of weird things, it can also refuse to add a field it was asked to add. For example, "summary" might not be added to the {"url": ..., "contents": ...} inputs. Thus, the last thing to check is whether the fields that were asked for were actually added.

# ensure_generated is a set of keys that must be present in the outputs.
for k in ensure_generated:
    if k not in outputs:
        raise ValueError(f"LLM did not generate the required field: {k}")
return outputs
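Putting the pieces together, a post-processing helper might look like the sketch below; the function name and signature are my own, not from any library.

def postprocess(inputs: dict, raw: str, passthrough: set,
                drop: set, ensure_generated: set) -> dict:
    """Parse the raw LLM output and enforce the contracts above."""
    outputs = _json_parse(raw)
    # Restore fields the LLM may have mutated.
    outputs = {
        **outputs,
        **{k: v for k, v in inputs.items() if k in passthrough},
    }
    # Remove fields the LLM was asked to drop but kept anyway.
    outputs = {k: v for k, v in outputs.items() if k not in drop}
    # Fail loudly if a requested field is missing.
    for k in ensure_generated:
        if k not in outputs:
            raise ValueError(f"LLM did not generate the required field: {k}")
    return outputs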

Manual code for reduction

I decided not to rely on the LLM for tasks that don’t require creativity, given all the mistakes an LLM can make. Let’s say there’s a reduce stage in a map-reduce pipeline. Such a reduce may concatenate map outputs, sort map outputs, etc. Those should be manually written code.

A manually written reducer should not trust its inputs. For example, when the mapper is asked to generate 'score' as a float, chances are the value is something invalid like "adsdf!".

Below is how I perform the reduce.

def sort_by_score(inputs_list: list[Inputs], logger) -> Outputs:
    filtered_list = []
    for i in inputs_list:
        try:
            # Validate that the LLM-generated score is actually a float.
            _ = float(i.inputs["score"])
        except ValueError:
            logger.warning(f"Skip {i.inputs['score']} as it's not a float")
            continue
        filtered_list.append(i)
    # Sort numerically; the raw value is a string, so compare as float.
    filtered_list.sort(key=lambda i: float(i.inputs["score"]), reverse=True)
    return Outputs(outputs={"summaries": [i.inputs for i in filtered_list]})
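Here Inputs and Outputs are the wrapper types used throughout this post; for a runnable example I assume they are simple dataclasses like these.

from dataclasses import dataclass
import logging

@dataclass
class Inputs:
    inputs: dict

@dataclass
class Outputs:
    outputs: dict

logger = logging.getLogger(__name__)
result = sort_by_score(
    [Inputs({"summary": "a", "score": "0.3"}),
     Inputs({"summary": "b", "score": "adsdf!"}),  # skipped with a warning
     Inputs({"summary": "c", "score": "0.9"})],
    logger,
)
# result.outputs["summaries"] == [{"summary": "c", "score": "0.9"},
#                                 {"summary": "a", "score": "0.3"}]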

Conclusion

An LLM can make all kinds of mistakes no matter how one constrains its outputs. Using an LLM to fix its own output may work, but it isn't guaranteed. What's more troublesome is that each LLM call costs money, and we can avoid it. This post explained how one can use an LLM reliably in this setting.