Map-reduce pattern with LLMs


One of the primary patterns for using LLMs is map-reduce: for example, processing multiple documents in mappers and then reducing them into a single result in a reducer.
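A minimal sketch of the pattern might look like this; `call_llm` is a hypothetical helper standing in for whatever LLM API is actually in use.

```python
# Minimal LLM map-reduce sketch. `call_llm` is a hypothetical wrapper
# around your LLM API of choice; replace with a real client call.
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wrap your LLM API here")

def map_doc(doc: str) -> str:
    """Mapper: summarize a single document."""
    return call_llm(f"Summarize the following document:\n\n{doc}")

def reduce_summaries(summaries: list[str]) -> str:
    """Reducer: merge per-document summaries into one result."""
    joined = "\n\n".join(summaries)
    return call_llm(f"Combine these summaries into a single report:\n\n{joined}")

def map_reduce(docs: list[str]) -> str:
    # Mappers are independent, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(map_doc, docs))
    return reduce_summaries(summaries)
```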

LLM map-reduce sounds intuitive and simple, but in reality it isn’t. One of the problems is hallucination: when it hallucinates, the LLM fails unexpectedly at an obvious task. For example, when asked to copy a title and URL to the output while adding a summary of the contents, the LLM may unexpectedly alter the title and URL.

One way to avoid hallucinations is to do the obvious work manually while relying on the LLM only for the creative work. For example, when the title and URL should be kept, ask the LLM to produce only the summary, and let the title and URL pass through to the output manually (that is, via human-written code). Likewise, if a field should be removed after the LLM step, remove it manually.
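A sketch of such a mapper, reusing the hypothetical `call_llm` helper from the earlier example:

```python
# The LLM only produces the summary; title and url are copied by code,
# so the model never has a chance to alter them.
def map_record(record: dict) -> dict:
    summary = call_llm(f"Summarize this content:\n\n{record['content']}")
    return {
        "title": record["title"],  # passed through by code, not the LLM
        "url": record["url"],      # passed through by code, not the LLM
        "summary": summary,
        # Fields that should not appear in the output (e.g. "content")
        # are simply omitted here instead of asking the LLM to drop them.
    }
```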

Another solution, albeit a bit tangential, is asking the LLM to keep quotes from the original data when producing an output. In the summary example, the quotes should be the ones directly supporting the generated summary.

It’s also a good idea to let the LLM think step by step, as in chain-of-thought (CoT) prompting.
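The two ideas combine naturally in one mapper: ask for step-by-step reasoning and verbatim quotes alongside the summary, then keep only what is needed downstream. This is a sketch under the same assumptions as before (hypothetical `call_llm`, JSON-formatted replies).

```python
import json

# Ask the LLM for reasoning and supporting quotes together with the summary.
def map_record_with_evidence(record: dict) -> dict:
    prompt = (
        "Think step by step, then summarize the content below.\n"
        'Respond with JSON: {"reasoning": "...", "quotes": ["..."], "summary": "..."}\n'
        "Quotes must be verbatim excerpts that support the summary.\n\n"
        + record["content"]
    )
    # A real pipeline would validate/retry here; LLM output may not be valid JSON.
    parsed = json.loads(call_llm(prompt))
    return {
        "title": record["title"],
        "url": record["url"],
        "summary": parsed["summary"],
        "quotes": parsed["quotes"],  # kept for spot-checking against the source
    }
```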

The reducer stage should account for hallucinations as well. Say we want the LLM to sort a list of inputs. While doing so, the LLM may silently change the data as it sorts. To avoid this, one can write a map stage that produces the values needed for the sort (e.g. a score), and then sort the list manually using those values.
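For instance, the following sketch has the mapper emit a relevance score per item and leaves the sorting to plain Python, again assuming the hypothetical `call_llm` helper:

```python
# Map stage: the LLM scores each item. Reduce stage: plain code sorts,
# so the items themselves can never be rewritten by the model.
def score_item(item: dict, topic: str) -> float:
    reply = call_llm(
        f"On a scale of 0 to 10, how relevant is this summary to '{topic}'? "
        f"Reply with a number only.\n\n{item['summary']}"
    )
    # A real pipeline would validate the reply; the LLM may add extra text.
    return float(reply.strip())

def sort_by_relevance(items: list[dict], topic: str) -> list[dict]:
    return sorted(items, key=lambda item: score_item(item, topic), reverse=True)
```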

In sum, this is how one should run an LLM map-reduce:

  • Pass obvious values through to the output with manually written code, not the LLM.
  • Remove fields that should be dropped after the LLM step manually, in code.
  • Have the LLM write both step-by-step reasoning and supporting quotes when generating outputs.
  • Convert the reducer into mappers that generate signals (e.g. scores) plus manual merge code that uses those signals.