Photo: time-lapse of vehicles passing near a building (https://www.pexels.com/photo/time-lapse-photography-of-vehicles-passing-near-building-635609/)
When building real-time chat applications, it is crucial to avoid using Large Language Models (LLMs) directly for data processing tasks. Instead, we can expose tools to the LLM and let it delegate the data processing to them.
Accuracy and Reliability
LLMs are not designed for precise data processing. They may misinterpret or inaccurately process data, leading to unreliable outcomes. Accurate data processing requires strict rules and logic, and unfortunately, LLMs are not good at following strict rules. For example, if we want to filter out messages that contain certain keywords, an LLM might miss some instances or incorrectly flag messages that do not contain those keywords.
The proper way to handle such tasks is to expose tools to the LLM so that it can call them to perform the data processing. These tools can be implemented in traditional programming languages such as Python, which are far better suited for precise data manipulation, and they can be tested and validated to ensure they work correctly.
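Here is a minimal sketch of what such a tool might look like in Python. The keyword list is illustrative, and the tool schema follows the common OpenAI-style function-calling shape; adapt both to your own stack.

```python
# A deterministic keyword filter implemented in plain Python.
# The LLM never filters messages itself; it only decides to call this tool.
BLOCKED_KEYWORDS = {"spam", "scam"}  # illustrative keyword list

def filter_messages(messages: list[str]) -> list[str]:
    """Return only the messages that contain none of the blocked keywords."""
    return [
        m for m in messages
        if not any(k in m.lower() for k in BLOCKED_KEYWORDS)
    ]

# One common shape for describing the tool to the model
# (OpenAI-style function-calling schema; adjust for your provider).
FILTER_TOOL_SPEC = {
    "type": "function",
    "function": {
        "name": "filter_messages",
        "description": "Remove chat messages that contain blocked keywords.",
        "parameters": {
            "type": "object",
            "properties": {
                "messages": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["messages"],
        },
    },
}
```

Calling `filter_messages(["hello", "win a free scam prize"])` returns `["hello"]`, deterministically, every single time.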
Context Size Limitations
LLMs have a limited context window size. Feeding large amounts of data into the model can quickly exhaust this limit, leading to incomplete or inaccurate responses. For real-time chat applications, where data can be voluminous and dynamic, relying on LLMs for data processing can result in loss of critical information.
Whenever the data to be processed exceeds the LLM's context window, the model cannot see the data in its entirety, which leads to incorrect results. For example, if we feed the LLM the response from a RESTful API that returns a large dataset, the model may only see part of that dataset due to the context size limitation. Important data points end up silently dropped or misinterpreted.
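One pragmatic guard is to estimate the payload size in code and aggregate the data before it ever reaches the model. The sketch below uses a rough 4-characters-per-token estimate and an 8,000-token budget; both are illustrative assumptions, not exact figures for any particular model.

```python
import json

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text (assumption).
    return len(text) // 4

CONTEXT_BUDGET = 8_000  # illustrative token budget, not a specific model's limit

def prepare_for_llm(api_response: list[dict]) -> str:
    """Aggregate a large dataset in code and hand the LLM only what fits."""
    raw = json.dumps(api_response)
    if estimate_tokens(raw) <= CONTEXT_BUDGET:
        return raw  # small enough to pass through untouched
    # Too big: compute the aggregates we actually need in code,
    # instead of letting the model see a truncated payload.
    summary = {
        "record_count": len(api_response),
        "fields": sorted({key for record in api_response for key in record}),
    }
    return json.dumps(summary)
```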
Latency
Using LLMs for data processing can introduce significant latency. LLMs are computationally intensive and take far longer to process data than traditional code does. In real-time chat applications, where speed is essential, this latency degrades the user experience.
Moreover, in enterprise LLM deployments, content filtering must often be performed on user inputs before they are sent to the LLM for processing. When the input is small, using an LLM for content filtering is acceptable. However, as the input grows, the latency introduced by LLM-based filtering becomes significant: once the input reaches several kilobytes, the latency can stretch to several seconds, which is unacceptable for a real-time chat application.
This latency is primarily due to the time it takes to send data to the LLM, process it, and receive the response. In contrast, traditional programming methods can handle data processing tasks much more quickly, ensuring that the chat application remains responsive.
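For comparison, here is what a code-based content check looks like. The blocked pattern is illustrative; the point is that a pre-compiled regex scans a multi-kilobyte payload in well under a millisecond, with no network round trip at all.

```python
import re
import time

# Pre-compiled pattern; the blocked terms are illustrative.
BLOCKED = re.compile(r"\b(password|ssn|credit\s*card)\b", re.IGNORECASE)

def violates_policy(text: str) -> bool:
    return BLOCKED.search(text) is not None

payload = "user message " * 1_000  # ~13 KB of chat input
start = time.perf_counter()
flagged = violates_policy(payload)
elapsed_ms = (time.perf_counter() - start) * 1_000
print(f"flagged={flagged}, took {elapsed_ms:.3f} ms")  # typically well under 1 ms
```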
Cost Considerations
The number of tokens processed by an LLM directly impacts the cost of using the model. Relying on LLMs for data processing can lead to higher costs, especially when processing large volumes of data. In real-time chat applications, where data is continuously generated, this can result in significant expenses.
To run a sustainable real-time chat solution, product owners should be aware of what the LLM usage actually costs. If the solution is too expensive to operate, it is not viable to keep running.
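A quick back-of-the-envelope calculation makes the point. All of the numbers below are illustrative assumptions, not any vendor's actual pricing, but the shape of the math holds regardless.

```python
# Back-of-the-envelope cost model. All numbers are illustrative assumptions,
# not any vendor's actual pricing.
MESSAGES_PER_DAY = 100_000
TOKENS_PER_MESSAGE = 500        # input + output tokens per processed message
PRICE_PER_1K_TOKENS = 0.002     # USD per 1,000 tokens (assumption)

daily_tokens = MESSAGES_PER_DAY * TOKENS_PER_MESSAGE
daily_cost = daily_tokens / 1_000 * PRICE_PER_1K_TOKENS
print(f"~${daily_cost:,.2f}/day, ~${daily_cost * 30:,.2f}/month")
# -> ~$100.00/day, ~$3,000.00/month for work a plain code filter does for free
```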
Rate Limits
LLM providers often impose rate limits on API usage. Relying on LLMs for data processing can lead to hitting these limits, resulting in service interruptions or degraded performance. In real-time chat applications, where timely responses are critical, this can be a significant drawback.
The volume of data that needs to be processed in real-time chat applications can be high. If the application relies on LLMs for data processing, it may quickly hit the rate limits imposed by the LLM provider. This can lead to delays in processing data or even service outages, which can negatively impact the user experience.
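If some processing must go through an LLM anyway, the client should at least degrade gracefully when limits are hit. Below is a generic retry-with-backoff sketch; `call_llm` and `RateLimitError` are placeholders for whatever client library you actually use.

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for whatever your LLM client raises on HTTP 429."""

def call_with_backoff(call_llm, prompt: str, max_retries: int = 5):
    """Retry an LLM call with exponential backoff plus jitter when the
    provider signals a rate limit."""
    for attempt in range(max_retries):
        try:
            return call_llm(prompt)
        except RateLimitError:
            time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... + jitter
    raise RuntimeError("rate limit: retries exhausted")
```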
Maintainability and Predictability
Code-based data processing is often easier to maintain and update than LLM-based solutions. Changes in business logic or data handling can be implemented directly in the code, without needing to retrain or adjust an LLM.
Changing the code that performs data processing is straightforward. For example, if we need to update the filtering criteria for messages, we can simply modify the code to reflect the new criteria. In contrast, updating an LLM to change its data processing behavior may require retraining the model or adjusting its parameters, which can be complex and time-consuming.
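As a small illustration, the criteria can live in plain configuration, so a business-rule change is a reviewable one-line edit. The config content here is inlined and illustrative.

```python
import json

# In production this would live in a config file; inlined here for brevity.
CONFIG = json.loads('{"blocked_keywords": ["spam", "scam", "phishing"]}')

def is_allowed(message: str) -> bool:
    return not any(k in message.lower() for k in CONFIG["blocked_keywords"])
```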
LLMs are inherently probabilistic, which means their outputs can vary even with identical inputs. This unpredictability is problematic for data processing tasks that require consistent, reliable results. Code-based solutions, by contrast, are deterministic: the same input always yields the same output.
LLMs are great at understanding natural language, reasoning, and selecting the right tools for a task. But when it comes to data processing that requires strict rules, code-based solutions are the better fit.
In conclusion, while LLMs are powerful tools for natural language understanding and generation, they are not ideal for data processing tasks in real-time chat applications. Traditional programming methods offer greater accuracy, reliability, and efficiency for these tasks, making them the preferred choice for developers working on real-time chat solutions.
