Some might disagree, but I think the world runs on Excel. Whether it is government agencies, corporation small or large, each of them use spreadsheet and most notably Microsoft product — Excel.
In particular, I want to show how to create editable charts. Previously, I showed how we can generate dynamic graphs and charts using Python, which in turn can be exported. However, they are non-editable. In this case, I want to highlight how we can automate the process of Excel file generation and create editable charts.
Let’s begin with discussion of more in-depth motivation of why using programming language like Python is useful to generate Excel file.
Table of Content
- Environment & Workspace Set Up
- Create and/or Read Excel File
- Two Ways to Add or Get Data
- Generate Editable Chart using Python
So why create Excel file in programming language like Python if we can manually create the file?
First, in this day and age we tend to work with large datasets. Furthermore, they are constantly growing and changing (and often in real time). As a result, we are in need of tools that will allow us connecting to databases and/or fetching data via from somewhere (e.g. API calls). In sum, we need to process large amounts of data that are constantly being updated and pulling data from Excel file directly could be both cumbersome and slow.
Second, we can automate the process of Excel file generation. In cases when one has large models, this might not be the case. But imagine a use case where we may want to automate Excel file generation with one click of a button. The process will allow us automation of file generation and will make it easy to make adjustments in the future.
2. Environment & Workspace Set Up
In the previous article, I have shown how to set up virtual environment in Python and install dependencies. Follow the link here and refer to Section 2. Set up Python Environment & Plotly Installation
Now, clone this GitHub repo into your local computer using your IDE (e.g. VSCode):
git clone https://github.com/AlmasM/Medium.git
Then navigate to directory:
You will see
requirements.txt file. Install dependencies into your virtual environment:
pip install -r requirements.txt
This will install all the necessary dependencies we use in this tutorial
At this point you have workstation set up and we are ready to go into next steps.
Create and/or Read Excel File
excel_file.py within cloned repo and you will that we are creating Python class
ExcelBook. Let’s break down process line by line:
Lines 1–8: important statements we use including chart creation,
openpyxl library that let’s us create and modify Excel file using Python.
Line 11–17: we are creating Python class
ExcelBook with constructor that contains variables that will be accessible within this class.
Lines 19–24: create auxiliary method that checks if file with given name exists and return boolean value.
If you want to invoke this class, you can create
main.py file and write the following:
from excel_file import ExcelBookcwd = os.getcwd() # get current working directory
fileName = "SampleSpreadsheet.xlsx" # give it a name
filePath = os.path.join(cwd, fileName) # create file pathexcelObject = ExcelBook(filePath) # instantiate object
isFileExists = excelObject.fileExists
The following code will output whether if file with given name exists. Note that we are just validating file exists, but not yet whether it is an Excel file
Now, let’s create two methods
createFile, which will take as an input and either fetch Excel file or create if one doesn’t exist. In this case, we will ensure the file is loadable (aka is valid Excel file)
createFile use aux method fileExists we created above. Depending on the outcome of the aux method, we will load the notebook (Excel file) or create one with
In case of
getFile, we take as a parameter name of the worksheet. Remember that Excel file can have more than one worksheet. In this method, we will first see if requested sheet exists. If sheet DNE, we will create one.
In the createFile, we verified file DNE, and then we will try to create file and worksheet as shown on lines 50–57.
Now, let’s use instantiated object
excelBook and create or get file.
workSheetName = "Sheet-List"
currentWorkSheet = excelObject.getFile(workSheetName)if not currentWorkSheet:
isFileCreated = excelObject.createFile(workSheetName)
Recall that we require sheet name provided for either getting or creating the file. The snippet will get file if one exists with given sheet name and if not, we will create file. Also note that we have already created
excelObject at this point and there is no need to recreate it again.
Two Ways to Add or Get Data
There are at least two ways to add data into Excel worksheet. If you have ever worked with Python and data, chances that you have seen
pandas library is pretty high. In this section we will add into Excel using random data structure and
pandas dataframe. We will also use both methods to fetch the data.
You may wonder, why even use
openpyxl library if pandas gives us methods like
It is true that you can use
pandas.to_excel() method to convert
pandas dataframe into Excel format. You can also use
pandas.read_excel() method to read Excel file as Pandas dataframe. In general,
pandas provides methods to fetch and manipulate Excel file to do some calculations. However, these methods are limited in functionality. In other words,
pandas doesn’t let you create charts or various formulas into the spreadsheet like
Let’s start with adding the data. Lines 62–65 show how we are takin a list of lists and adding each list as a new row into the spreadsheet.
Lines 67–76: library
openpyxl provides method
dataframe_to_rows() which automatically converts
pandas dataframe into Excel file along with header and index column. We can also specify styling as in lines 73–74.
Note in both cases we need to
.save() the workbook we are working on, or else data won’t be displayed.
You can invoke both methods above on our
excelObject as follows:
data = [ ['Technology Type', 'Market Share(%)'], ['Phone', 50], ['PC', 20], ['Tablets', 20],['Smart Watch', 10]]df = pd.DataFrame(data)
In the snippet above, we created sample dataset and dataframe using
Now, we can write either:
We may need to use one or the other depending on how our dataset is incoming and being preprocessed. Ultimately it comes down to your project requirements when going with one or the other
Fetching the data will be similar process. Use
excelObject such as this:
data_result = excelObject.getDataRowByRow()ORdf_result = excelObject.getDataAsDataFrame()
Note that we haven’t demonstrated methods
getDataAsDataFrame() yet. The point was to illustrate the benefit of creating a class and how we access it’s values when using a class.
In general, reading data using
openpyxl library is pretty straightforward.
Lines 78–85: we iterate row by row of the spreadsheet and create list of lists to store the data
Lines 87–91: we use built in method worksheet.values to fetch only the values of the worksheet where we are working. Note that it is
.values that gives us only values which in turn are converted into dataframe.
The last part of this tutorial is to create editable graph in Python and be exported in Excel. I believe all of the work was leading up to this feature.
Generate Editable Chart using Python
Again, we could have used
pandas helper methods to work with Excel file, however, this is the benefit of using
openpyxl library. We can create editable charts in the Excel. In fact, we can do much more and I recommend checking out their documentation to view their capabilities (e.g. pivot tables, formulas, styling, etc).
In the snippet below we are creating pie chart and passing in data to display. Lines 95–96 define labels and data that will be passed into the chart itself. We can also define where the chart can start on line 101 by setting start cell to be D1. Lastly, ensure save the changes or otherwise it won’t be reflected.
To invoke this file we can use sample data we created above and use
excelObject again. All we need to do is pass in data and write following invocation:
The final view will be editable pie chart along with inserted data as shown below:
You can find working source code in GitHub page here.
In article we created and modified Excel file using Python. Why do it outside of Excel file itself? Because datasets we work with are growing in size and/or there might be a need to automate the process of Excel file generation.
In general, if you need to create complex Excel files for budgeting with many formulas, this might not be the most useful path. However, imagine scenario where user may want to export their dashboard chart in an editable format with just one click. In this case, you won’t be able to generate Excel file manually for each user. As a result, using programming language like Python to automate the process might come in handy.