kim ngan nguyen

[Pandas AI + Streamlit] Building a Data Analysis Assistant

Updated: Aug 16, 2023


If you want to extract information from a structured dataset, what language would you use to "talk" to it? Python, perhaps, or SQL? In today's article, we will look at how to use natural human language to "talk" to a structured dataset, as in the video below:

Feature Analysis

  1. Upload a CSV file: Build a web app that lets the user upload a CSV file and see a preview of the dataset. We will use Streamlit to build this web app.

  2. Ask questions: The user can ask questions about the data in the CSV file, and the web app will generate answers. We will use Pandas AI (powered by OpenAI) to build the agent behind the web app.

Step by Step Guidance

1. Install necessary packages

!pip install openai
!pip install streamlit
!pip install pandasai
!pip install langchain

2. Start Writing The Backend For Our WebApp

The first line of code is:

%%writefile app.py

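This notebook cell magic writes the rest of the cell to a separate Python file named app.py, creating the file or overwriting it if it already exists. It must be the first line of the cell. Keeping the Streamlit app in its own file lets you develop and organize it in a more structured way than leaving all the code directly in notebook cells, and it gives the streamlit command a file to run.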
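
If you are not working in a notebook, the same effect can be approximated in plain Python. The sketch below is only an illustration: the function name write_app and the two-line placeholder source are assumptions made for this example, not part of the original project.

```python
from pathlib import Path

# Hypothetical placeholder source. In this article, the real content is the
# Streamlit code developed in the following steps.
APP_SOURCE = """import streamlit as st

st.title('Data Analysis Made Easy')
"""


def write_app(path: str = "app.py", source: str = APP_SOURCE) -> Path:
    """Write the app source to disk, replacing any existing file.

    This mirrors the default behaviour of the %%writefile cell magic,
    which also overwrites the target file.
    """
    target = Path(path)
    target.write_text(source, encoding="utf-8")
    return target
```

Calling write_app() creates app.py in the current working directory, which is the file the streamlit command expects later on.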

Next, we will import all the libraries needed to build this web app:

import streamlit as st
from langchain.chat_models import AzureChatOpenAI
import pandas as pd
from pandasai import PandasAI

Then, we will initialize the LLM. In this project, I use Azure OpenAI gpt-35-turbo as the engine of our web app.

#initialize LLM object
llm = AzureChatOpenAI(
    deployment_name="your deployment name",
    model_name="gpt-35-turbo",
    openai_api_key='your azure open ai key',
    openai_api_version = '2023-06-01-preview',
    openai_api_base='your azure endpoint'
    )
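
Hardcoding the key in app.py is convenient for a demo, but it is easy to leak when the file is shared. One possible alternative, sketched here under assumptions, is to read the values from environment variables. The variable names (AZURE_OPENAI_DEPLOYMENT, AZURE_OPENAI_KEY, AZURE_OPENAI_ENDPOINT) and the helper load_azure_settings are hypothetical names chosen for this example.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
# Keys are the AzureChatOpenAI keyword arguments used in this article.
REQUIRED_VARS = {
    "deployment_name": "AZURE_OPENAI_DEPLOYMENT",
    "openai_api_key": "AZURE_OPENAI_KEY",
    "openai_api_base": "AZURE_OPENAI_ENDPOINT",
}


def load_azure_settings(env=None):
    """Collect the keyword arguments for the LLM from environment variables."""
    env = os.environ if env is None else env
    # Fail early with a readable message if anything is missing or empty.
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {kwarg: env[var] for kwarg, var in REQUIRED_VARS.items()}
    # The model name and API version are the values used in this article.
    settings["model_name"] = "gpt-35-turbo"
    settings["openai_api_version"] = "2023-06-01-preview"
    return settings
```

With this helper, the initialization above could be written as llm = AzureChatOpenAI(**load_azure_settings()), so that no secret appears in the source file.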

Next, we will create the Pandas AI agent, which will query the dataset and generate responses to the user's questions. Instead of using Python or SQL to query a structured dataset, we can use natural human language to "talk" to it.

pandas_ai = PandasAI(llm)

Finally, we will build the functions and content of our web app UI with Streamlit:

st.title('Data Analysis Made Easy')
uploaded_file = st.file_uploader('Upload a csv file for your analysis', type = ['csv'])

if uploaded_file is not None:
  df = pd.read_csv(uploaded_file)
  st.write(df.head(5))
  prompt = st.text_area('Enter your question: ')

  if st.button('Ask'):
    if prompt:
      with st.spinner('Generating response ...'):
        st.write(pandas_ai.run(df, prompt))
    else:
      st.warning('Please enter a prompt.')
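
The handler above passes the prompt straight to the agent, so any exception raised while the LLM is called or while its generated code runs would surface as a raw traceback in the app. A minimal sketch of one way to guard that call follows. The helper answer_question and its status strings are assumptions made for this example; the run argument stands in for pandas_ai.run as it is called in this article.

```python
from typing import Any, Callable, Tuple


def answer_question(run: Callable[[Any, str], Any], data: Any, prompt: str) -> Tuple[str, Any]:
    """Run a question against the data and return (status, payload).

    status is 'warning' for an empty prompt, 'error' if the agent raised,
    and 'ok' otherwise. run is any callable taking (data, prompt).
    """
    question = prompt.strip()
    if not question:
        return "warning", "Please enter a prompt."
    try:
        return "ok", run(data, question)
    except Exception as exc:  # failures from the LLM call or generated code
        return "error", f"Could not answer the question: {exc}"
```

Inside the button handler, the result could then be dispatched with st.write for 'ok', st.warning for 'warning', and st.error for 'error', after calling answer_question(pandas_ai.run, df, prompt). Keeping this logic in a plain function also makes it testable without starting Streamlit.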

And now we can launch our web app and test it!

Note: I built this web app in Google Colab, so the launch commands below might not apply to your environment.

!streamlit run app.py &>/content/logs.txt &
!npx localtunnel --port 8501

This will launch the web app and ask us for an IP address. To get this IP, run the code below:

import urllib.request
print("Password/Endpoint IP for localtunnel is:", urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip("\n"))
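
The one-liner above waits indefinitely if the lookup service does not respond. As a hedged variation, the sketch below adds a timeout and closes the connection when done. The function name fetch_public_ip, the 10-second default, and the opener parameter (included only so the function can be exercised without network access) are assumptions made for this example.

```python
import urllib.request

IP_SERVICE = "https://ipv4.icanhazip.com"  # same lookup service as above


def fetch_public_ip(url=IP_SERVICE, timeout=10.0, opener=urllib.request.urlopen):
    """Return the public IPv4 address reported by the lookup service."""
    # The context manager closes the HTTP response once it has been read.
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf8").strip()
```

In the notebook, print(fetch_public_ip()) would print the value that localtunnel asks for.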

Conclusion

By combining Streamlit for the web interface with Pandas AI for the language-driven queries, we have built a bridge between data and human inquiry that lets us converse with datasets in plain, intuitive language. Although PandasAI performs very well with clear and simple queries, it currently struggles with more complicated questions that require thinking or reasoning. You should therefore use it only as an assistant for extracting data more quickly, and not expect it to think or argue for you!


Reference

BugBytes. (2023, May 22). PandasAI, OpenAI and Streamlit - Analyzing File Uploads with User Prompts [Video]. YouTube. https://www.youtube.com/watch?v=oSC2U2iuMRg

Contributors, P. (n.d.). Pandas-ai. https://pandas-ai.readthedocs.io/en/latest/


