Suvansh Sanjeev • 2023-05-23
Data has stories to tell. We've created a tool that lets it speak.
Introducing our latest project - Open Data Interpreter (OpenDI), an open-source chatbot designed to provide users with an engaging, interactive interface to chat with and explore their own data.
Following our successful release of Blitz, a chatbot providing a user-friendly chat interface over football statistics, we wanted to extend this capability to a wider array of data sources. We were inspired by OpenAI’s Code Interpreter but saw an opportunity to create a tool that was open source and widely accessible. This led us to create Open Data Interpreter.
With OpenDI, users can interact with their CSV files using natural language queries. Just upload your CSV file, and start the conversation! You can generate statistics, tables, and graphs using simple conversational prompts. Want to know the average of a particular column or see a pie chart of categories? Just ask!
Video walkthrough of the Open Data Interpreter interface.
Some of the features offered by OpenDI:
Some limitations of its current implementation:
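We use LangChain and GPT-3.5/4 to condense the user's chat history into a standalone question, then map that question to code that answers it and generates tables or graphs as needed. We request a structured response containing three things: the code to run, the name of the variable that holds the textual answer, and the paths of any images the code saves.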
Below is output for the question "create a scatterplot of the men's weights in september vs april," asked about this CSV file.
CODE:
male_weights = df[df['Sex'] == 'M'][['Weight_Sep', 'Weight_Apr']]
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Weight_Sep', y='Weight_Apr', data=male_weights)
plt.title('Scatterplot of Male Weights in September and April')
plt.xlabel('Weight in September')
plt.ylabel('Weight in April')
img_path = 'male_weights_scatterplot.png'
plt.savefig(img_path)
out_variable = "Here is the scatterplot of weights of men in September and April."
OUT_VARIABLE: out_variable
IMG_PATHS: ['male_weights_scatterplot.png']
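To make the response format concrete, here is a minimal sketch of how a reply like the one above could be split into its three parts. This is not taken from the OpenDI codebase: the `ParsedResponse` class, the `parse_response` function, and the regular expression are all hypothetical, and the sketch assumes the model always emits the `CODE`, `OUT_VARIABLE`, and `IMG_PATHS` labels in that order.

```python
import ast
import re
from dataclasses import dataclass, field


@dataclass
class ParsedResponse:
    """Hypothetical container for the three parts of a model reply."""
    code: str
    out_variable: str
    img_paths: list = field(default_factory=list)


# Assumed layout: CODE: <code> OUT_VARIABLE: <name> IMG_PATHS: [<paths>]
_RESPONSE_PATTERN = re.compile(
    r"CODE:\s*(?P<code>.*?)\s*"
    r"OUT_VARIABLE:\s*(?P<out>\w+)\s*"
    r"IMG_PATHS:\s*(?P<imgs>\[.*?\])",
    re.DOTALL,
)


def parse_response(text: str) -> ParsedResponse:
    """Split a structured reply into code, output variable name, and image paths."""
    match = _RESPONSE_PATTERN.search(text)
    if match is None:
        raise ValueError("reply does not follow the CODE / OUT_VARIABLE / IMG_PATHS layout")
    # literal_eval only accepts Python literals, so the list is never executed as code.
    img_paths = ast.literal_eval(match.group("imgs"))
    return ParsedResponse(match.group("code"), match.group("out"), list(img_paths))


if __name__ == "__main__":
    reply = (
        "CODE:\n"
        "avg = 70.5\n"
        "out_variable = f'The average weight is {avg}.'\n"
        "OUT_VARIABLE: out_variable\n"
        "IMG_PATHS: []"
    )
    parsed = parse_response(reply)
    print(parsed.out_variable, parsed.img_paths)
```

Because the labels are matched case-sensitively, a lowercase `out_variable = ...` assignment inside the generated code is not mistaken for the `OUT_VARIABLE:` label.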
We then run the code in a constrained environment (allowing NumPy, Pandas, etc. but blocking the use of exec, etc.). The specified output variable is used to return a textual response to the user, and the given image files are uploaded to Freeimage.host, a free image-hosting service. This output is then displayed to the user.
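As a rough illustration of the "constrained environment" idea, the sketch below rejects generated code that imports modules or references a small blocklist of names, runs what remains in a namespace the caller controls, and reads back the named output variable. Everything here is an assumption rather than OpenDI's actual implementation: `BLOCKED_NAMES`, `check_code`, and `run_constrained` are hypothetical, and the blocklist is only an example. A filter like this is not a real security sandbox; untrusted code would still call for process-level isolation.

```python
import ast
import builtins

# Hypothetical blocklist; the names blocked in the real tool may differ.
BLOCKED_NAMES = {"exec", "eval", "compile", "open", "input", "__import__"}


def check_code(code: str) -> None:
    """Raise ValueError if the code imports modules or references blocked names."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed")
        if isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            raise ValueError(f"use of {node.id!r} is not allowed")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError("dunder attribute access is not allowed")


def run_constrained(code: str, out_variable: str, allowed: dict) -> str:
    """Run checked code with only `allowed` names available; return the output text."""
    check_code(code)
    # Expose the usual builtins minus the blocked ones.
    safe_builtins = {k: v for k, v in vars(builtins).items() if k not in BLOCKED_NAMES}
    namespace = {"__builtins__": safe_builtins, **allowed}
    exec(code, namespace)
    if out_variable not in namespace:
        raise KeyError(f"generated code never set {out_variable!r}")
    return str(namespace[out_variable])


if __name__ == "__main__":
    generated = "total = sum(values)\nout_variable = f'The total is {total}.'"
    print(run_constrained(generated, "out_variable", {"values": [1, 2, 3]}))
```

In a setup like the one described here, `allowed` would hold the preloaded libraries and the uploaded DataFrame (for example `{"pd": pd, "np": np, "df": df}`), so generated code never needs its own imports.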
A couple features we want to get around to adding:
in-browser branch of the GitHub repository.
Update 5/25/23: The SQL option is live, speedy, and works well with GPT-3.5.