
[AzureOpenAI + Langchain] Building an Effective HR Assistant Chatbot: A Comprehensive Guide

Updated: Aug 16, 2023

In this article, we will explore the step-by-step process of building an HR assistant chatbot. The chatbot is designed to give users quick and accurate answers to questions about company policies and their personal employee data, and even to execute actions based on user requests. We will leverage technologies such as Langchain, Azure OpenAI, PandasAI, Pinecone, and Streamlit to create a powerful and user-friendly chatbot. A demo video of the chatbot in action is shown below.


Feature Analysis

Let's begin by analyzing the key features of our HR chatbot and the dataset it will utilize.


1. Answering Company Policy Queries: Users can ask questions about company policies, and the chatbot will answer based on the supplied policy document. We will use a text file containing the company policy for this purpose.

To implement this feature, we'll need to:

  • Create embeddings for the policy document using Azure OpenAI's text-embedding-ada-002 model.

  • Build a vectorstore database to store the generated embeddings, utilizing Pinecone for semantic search.

2. Handling Employee Personal Data Queries: Users can inquire about their personal data stored in the HR system or request actions to be performed on their behalf, for example checking their vacation leave balance or booking leave. To implement this feature, we will use PandasAI to extract relevant information from the employee dataframe and perform data alterations.

Here is my code to generate employee data for testing purposes:

import pandas as pd
import random
from faker import Faker 

# Initialize Faker for generating random data
fake = Faker()

# Lists to store employee data
employees = []
departments = ["Engineering", "Product Management", "Design", "Sales", "Marketing"]
positions = ["Software Engineer", "Product Manager", "UX Designer", "Sales Representative", "Marketing Specialist"]
statuses = ["Full-time"]
marital_statuses = ["Single", "Married", "Divorced", "Widowed"]

# Generate 5 employees' data
for _ in range(5):
    employee = {
        "Name": fake.name(),
        "DOB": fake.date_of_birth(minimum_age=22, maximum_age=60).strftime('%Y-%m-%d'),
        "Hired Date": fake.date_this_decade().strftime('%Y-%m-%d'),
        "Employment Status": random.choice(statuses),
        "Department": random.choice(departments),
        "Position": random.choice(positions),
        "Marital Status": random.choice(marital_statuses),
        "Gross Salary": round(random.uniform(50000, 150000), 2),
        "Sick Leave": random.randint(5, 15),
        "Vacation Leave": random.randint(10, 25),
        "Supervisor": fake.name()
    }
    employees.append(employee)

# Create a DataFrame from the employee data
df = pd.DataFrame(employees)

# Print the DataFrame
df

And this is what it generates:

Finally, the engine of our chatbot will be Azure OpenAI's gpt-4 model.


Step-by-Step Guide to Building the HR Chatbot Assistant

1. Install all necessary libraries and packages

Ensure you have the required libraries and packages installed using the provided code snippets.

!pip install openai
!pip install pinecone-client
!pip install langchain
!pip install google-search-results
!pip install tiktoken
!pip install pandas
!pip install Faker
!pip install pandasai
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
import pinecone
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
import pandas as pd
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import RetrievalQA
from langchain.utilities import SerpAPIWrapper
from datetime import datetime
from langchain.tools import BaseTool
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pandasai import PandasAI

2. Initialize the LLMs

Set up the foundational language models for the chatbot:

  • Azure OpenAI: gpt-4 for conversation handling.

  • Azure OpenAI: text-embedding-ada-002 for document embeddings.

Azure OpenAI: gpt-4

llm = AzureChatOpenAI(
    deployment_name="chat-agent",
    model_name="gpt-4",
    openai_api_key='your open ai key',
    openai_api_version = '2023-06-01-preview',
    openai_api_base='your azure open ai endpoint',
    openai_api_type='azure'
    )
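Before wiring this into anything else, it can be worth a quick sanity check that the deployment name and credentials are correct. This optional snippet (not part of the original walkthrough) assumes the llm object defined above:

# Optional sanity check: the Azure deployment should return a short reply
from langchain.schema import HumanMessage

reply = llm([HumanMessage(content="Reply with the single word OK.")])
print(reply.content)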

Azure OpenAI: text-embedding-ada-002

# initialize embeddings object; for use with user query/input
embed = OpenAIEmbeddings(
                deployment="embeddings",
                model="text-embedding-ada-002",
                openai_api_key='your open ai key',
                openai_api_version = '2023-06-01-preview',
                openai_api_base='your azure open ai endpoint',
                openai_api_type='azure'
            )
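You can also verify the embeddings deployment; text-embedding-ada-002 returns 1536-dimensional vectors, so a quick optional check (assuming the embed object above) looks like this:

# Optional check: ada-002 embeddings are 1536-dimensional vectors
sample_vector = embed.embed_query("How many vacation days do I get per year?")
print(len(sample_vector))  # expected: 1536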

3. Create a vector database for the HR policy document

Create a vector database using Pinecone to store embeddings for the company policy document.

# initialize pinecone client and connect to pinecone index
pinecone.init(
        api_key="your pinecone api key",
        environment="your pinecone environment")
index_name = 'tk-policy'
#Load the text file 
loader = TextLoader("/content/hr_policy.txt")
policy = loader.load()
#split the text into chunks
text_splitter=RecursiveCharacterTextSplitter(chunk_size=2500, chunk_overlap=100)
docs=text_splitter.split_documents(policy)
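Note that the code above assumes the Pinecone index 'tk-policy' already exists. If you are running this for the first time, you can create it explicitly; this is a sketch using the pinecone-client v2 API, with dimension 1536 to match text-embedding-ada-002:

# Create the index if it does not exist yet (dimension must match the embedding model)
if index_name not in pinecone.list_indexes():
    pinecone.create_index(name=index_name, dimension=1536, metric="cosine")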

To create and upload all embeddings to the Pinecone database, use the code below:

docsearch = Pinecone.from_texts([t.page_content for t in docs], embed, index_name=index_name)

Once these embedding vectors have been uploaded, on subsequent runs you only need to reconnect to the existing index:

docsearch = Pinecone.from_existing_index(index_name, embed)
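Either way, it is worth confirming the vectorstore returns relevant chunks before wiring it into an agent. A minimal optional check, assuming the docsearch object above:

# Optional check: fetch the policy chunks most similar to a sample question
matches = docsearch.similarity_search("How do leave days get incremented?", k=2)
for match in matches:
    print(match.page_content[:200])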

4. Create Langchain Agent

Next, we set up the Langchain tools that our agent will use to query information and generate responses. We will create tools for:

  • Retrieving information from the company policy.

  • Extracting and manipulating data from the employee dataframe.

  • Accessing real-time internet data using SerpAPIWrapper.

The first tool, a RetrievalQA chain, will help us find answers in the company policy text file:

timekeeping_policy = RetrievalQA.from_chain_type(llm=llm,chain_type="stuff",retriever=docsearch.as_retriever())

Let's test it:

timekeeping_policy.run('How do leave days get incremented?')

And here is the answer it generates, which is correct according to the HR Policy text file.

Leave days are typically incremented on a monthly basis. According to the information provided, employees earn 1.25 days of Vacation Leave and 1.25 days of Sick Leave per month of service, accruing to 15 days per year. For Service Incentive Leave, employees earn 5 days per year. Other types of leave, such as Paternity Leave, Maternity Leave, Volunteer Service Leave, Family Care Leave, Personal Leave, and Religious Observance Leave, have specific eligibility and accrual details mentioned in the policy but do not specify the monthly increment.

The second tool will help us answer questions about the employee dataframe and perform actions that alter its data:

pandas_ai = PandasAI(llm)

class pandas(BaseTool):
    name = "pandas"
    description = "use this tool when you need to extract employee information from the df dataframe"

    def _run(self, prompt):
        # Delegate the natural-language prompt to PandasAI, which queries the employee dataframe
        return pandas_ai(df, prompt=prompt)

    def _arun(self, prompt):
        raise NotImplementedError("This tool does not support async execution")

Let's test it:

pandas().run("What is Amanda Taylor's salary?")

And here is the answer it generates, which is correct according to our employee dataframe:

50056.45

Finally, to give the chatbot access to real-time data from the Internet, we add a Google search tool via SerpAPI.

search = SerpAPIWrapper(serpapi_api_key='your serpapi key')
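A quick, optional way to confirm the SerpAPI key works is to run a search directly; the wrapper returns a short text answer extracted from Google results:

# Optional check: the wrapper should return a concise text snippet
print(search.run("What is the date today?"))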

Now we can combine the three tools into a single agent and let that agent be the backend of our chatbot:

user = df.Name[1] # set user
instructions = """ Use the following format:
              Question: the input question you must answer
              Thought: you should always think about what to do
              Action: the action to take, should be one of the tools you have access to.
              Action Input: the input to the action
              Observation: the result of the action
              ... (this Thought/Action/Action Input/Observation can repeat 5 times)
              Thought: I have gathered detailed information to answer the question
              Final Answer: the final answer to the original input question and a short explanation of how you arrived at that answer."""


tools = [Tool.from_function(
              name = "Timekeeping Policies",
              func=timekeeping_policy.run,
              description= f"""
              Useful for when you need to answer questions about company general policies.
              {instructions}
              """),
         Tool.from_function(
              name = "Employee Data",
              func=pandas().run,
              description = f"""
              Useful for when you need to answer questions about employee {user} personal data stored in pandas dataframe 'df'.
              If the user asks you to book leave, deduct the requested number of days from the corresponding leave type (vacation or sick) directly in dataframe 'df' and report how many days of that leave type remain after booking.
              {instructions}
              """
              ),
         Tool.from_function(
              name = "Realtime Internet Data",
              func=search.run,
              description= f"""
              Useful for when you need to answer questions which require realtime internet data.
              {instructions}
              """)]
agent_kwargs = {'prefix': f'You are a friendly HR assistant. You are tasked with assisting the current user, {user}, with HR-related questions. First greet the user by name, then ask how you can help. Do not give false information. You have access to the following tools:'}

# initialize the LLM agent
agent = initialize_agent(tools,
                         llm,
                         agent = AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                         handle_parsing_errors=True,
                         verbose=True,
                         agent_kwargs=agent_kwargs
                         )
# define q and a function for frontend
def get_response(user_input):
    response = agent.run(user_input)
    return response

Now you have a chatbot agent that can answer HR-related questions!
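For example, you can try the agent directly from a notebook cell; the exact answers will depend on your generated data and the model's reasoning:

# Example queries against the combined agent
print(get_response("How many vacation leave days do I have left?"))
print(get_response("Please book 2 days of vacation leave for me."))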


5. Build the frontend for our chatbot

Now that we have finished building the backend of our chatbot, let's bring it to life with Streamlit. Below is the code to build the frontend.

!pip install import-ipynb
!pip install -q streamlit
!pip install streamlit_chat

Then, in a separate cell (the %%writefile magic must be the first line of its cell), write the frontend code to app.py:

%%writefile app.py

import streamlit as st
import random
from streamlit_chat import message
import import_ipynb
from hr_agent import get_response  # backend notebook defining get_response (assumed to be saved as hr_agent.ipynb)

st.write('Hello, how are you today? :sunglasses:')

st.header("I'm your HR Assistant!")
st.markdown("Ask your HR-related questions here.")
def process_input(user_input):
    response = get_response(user_input)
    return response

# Initialize session_state if it doesn't exist
if "past" not in st.session_state:
    st.session_state["past"] = []
if "generated" not in st.session_state:
    st.session_state["generated"] = []
if "input_message_key" not in st.session_state:
    st.session_state.input_message_key = str(random.random())

chat_container = st.container()
user_input = st.text_input("Type your message and press Enter to send.", key= st.session_state.input_message_key)

if st.button("Send"):
    response = process_input(user_input)
    st.session_state["past"].append(user_input)
    st.session_state["generated"].append(response)
    st.session_state["input_message_key"] = str(random.random())
    st.experimental_rerun()

if st.session_state["generated"]:
    with chat_container:
        for i in range(len(st.session_state["generated"])):
            message(st.session_state["past"][i], is_user=True, key=str(i) + "_user")
            message(st.session_state["generated"][i], key=str(i))

Now let's launch our web app!

!streamlit run app.py &>/content/logs.txt &
!npx localtunnel --port 8501

Conclusion

This article has provided a detailed guide on creating a highly functional HR assistant chatbot. By integrating Langchain's capabilities with Azure OpenAI, PandasAI, Pinecone, and Streamlit, we've developed a tool that can significantly enhance HR-related interactions and streamline information retrieval. With the knowledge gained from this guide, you're well-equipped to build your own chatbot tailored to your organization's HR needs.





