Coding in Python is fun but what makes it even more fun is the availablity of Packages suited for different purposes. For example, availability of scientific calculation and Machine Learning packages is what has made Python the most popular language in Data Science and Analytics. In this post, we will get an introduction to the world of Python packages and see how we can build our own packages.
In my my previous post about Python modules, Use Modules to Better Organize your Python Code, I discussed how we can better organize our codes using Python Modules. In this post we will see how we can better organize our Python modules using Package.
We will look at,
What and Why?¶
Python packages are collections of modules.
Python modules are the homes of functions and variables. Whereas, Python packages are the homes of modules.
❓ But if we can organize our codes using modules then why bother to use packages?
As we already know that code base tends to grow. With the growing code base, you will very likely group your functions into multiple modules based on the types of tasks they perform. As the number of modules grow, a natural progression is to find a way to organize the modules based on the types of problem they try to solve. The modules organized in a directory form can easily turned into Python Package in order to make the module usage and maintenance streamlined.
Packages are a way of structuring Python’s module namespace by using “dotted module names”. - Source: Python Tutorial
Once a package is installed, we can easily access to the modules stored in different directory levels using a dot notation.
An Example Scenario¶
Before getting into our discussion about packages, let's think about a scenario so that it'll be easier to put our learning into a context. Let's assume that we are working on a project where we get user names separated by a specific character stored in a long string. We need to break them down, assign them with unique numeric IDs so that they can be stored in a database in future. So our tasks are to:
- Break the input string into a set list of names,
- Then assign unique IDs to these names.
To perform these tasks we will build a couple of functions, store them in a couple of modules, and eventually will wrap them inside a Python package. It's definitely an overkill to build a package for such a trivial project but for our learning purpose let's assume it's a good idea for now 🤷♂️.
Two Modules¶
Let's create a couple of modules: stringProcess
, idfier
.
stringProcess
: A module that would help us process strings. Currently it contains just one function calledstringSplit()
that takes a string then split it based on the desired separator.idfier
: A module that will help us create unique IDs. Currently it contains only one function calledrandomIdfier()
that takes a list of name, assign them with randomly generated unique IDs, and save them as a dictionary.
Create two separate .py
files with the codes bellow and name the files by the module names. Also, make sure to save them in the same directory of where you are running your script or notebook.
%%script false --no-raise-error # to make sure the code block doesn't execute
# stringProcess module
print("stringProcess module is initialized")
def stringSplit(string, separator):
outputList = string.split(sep=separator)
return outputList
# --------------------------------------------------------------
# idfier module
print("idfier module is initialized")
import random as rn
def randomIdfier(ids):
outputDict = {}
for id in ids:
outputDict[id] = rn.randint(1000, 9999)
return outputDict
Couldn't find program: 'false'
Useful Module Properties for Package Building¶
We have discussed general module properties in our last post. Here we will discuss some additional properties that will come handy in our discussion about package development.
Module Initialization¶
Once a module is imported, Python implicitly executes the module and have it initialize some of it's aspects. One important aspect to notice is that,
A module initialization takes place only once in a project.
If a module is imported multiple times, which is not necessary but even if it happens i.e. inside a module used, for the subsequent imports, Python remembers the previous import and silently ignores the following initializations. You can see this feature of Python in action in the following example. Where we see that when we import idfier
and stringProcess
modules twice but they only prints out the messages once during the first initialization and doesn't produce any message during the subsequent imports.
import idfier
import idfier
import stringProcess
import stringProcess
idfier is used as a module stringProcess is used as a module
Private Properties in Modules¶
You may want to have variables inside your Modules that are only intended for internal use. By nature these are considered as private properties. You can declare such properties i.e. variable or functions, by one or two underscores (__
). But doing this is merely a convention and doesn't impose any protection per se.
⚠️ Unlike other languages like Java, Python doesn't impose any strict restriction on accessing such private properties. Adding underscores gives other developers a note that the properties are only intended for internal use.
__name__
Variable¶
Modules are essentially Python scripts. When Python scripts are imported as Modules, Python creates a variable called __name__
and stores the name of the module in it e.g. when we import our idfier
module, the __name__
variable contains the value - idfier
in it. On the contrary when a script is directly executed, the __name__
variable contains __main__
as the value.
For demonstration, let's add the following simple if-else
condition inside the stringProcess.py
file and checkout how we can use the __name__
property.
name = "stringProcess"
if __name__ == "__main__":
print(name, "is used as a script")
else:
print(name, "is used as a module")
In the code cell below we called stringProcess
module both as a script and a module. See how two methods produce two different messages.
🛑 Remember, to restart your Jupyter notebook or re-run your script before running the following code block otherwise you wouldn't see the outcomes properly. Remember the rule of Python initializing a module only once?
%run -i stringProcess.py
import stringProcess
stringProcess is used as a script stringProcess is used as a module
💡 So how would this __name__
variable be of any help?
__name__
ariable can be a very useful feature to run some primary tests on codes inside the module script. Using this variable's stored value, you can ask Python to run some tests during the development mode when you are actively working with the script and ignore while they are used as modules in a project. We will see a demo of this functionality a bit later.
Improving idfier
¶
Now based on our newly known properties of module, let's improve module - idfier
.
Improving
randomIdfier()
: Currently this function generates a random integer between 1000 to 9999 and assign it as a value. But since this number is generated randomly there's no guarantee that the numbers will be unique. But for our purpose we need it to create unique numbers to be used as IDs. To ensure uniqueness, let's add awhile
loop to check for duplicates and re-generate random number unless it finds a unique one.Using
__name__
for Automated Test: We will add a test code block and wrap it inside aif-else
condition so that it will run the test only when the value of__name__ == "__main__"
, or in other words the modules scripts are directly run. This simple test will check if the length of unique ID values equals to 2 when we runrandomIdfier()
with an input of string containing two names.Adding Short Descriptions: We have used
""" """
(docstring) to add short descriptions for the code blocks.
%%script false --no-raise-error
# idfier module
"""
Tests if randomIdfier() correctly generates unique IDs
"""
name = "idfier"
if __name__ == "__main__":
ls = randomIdfier(["name01", "name02"])
if len(set(ls.values())) == 2:
print("randomIdfier() is ok.")
print(ls)
else:
print("randomIdfier() is not ok.")
print(ls)
else:
print(name, "is used as a module")
"""
Takes a list of strings and assigns randomly generated numbers as unique IDs
"""
__ids = []
import random as rn
def randomIdfier(names):
global __ids
outputDict = {}
for name in names:
id = rn.randint(1000, 9999)
while id in __ids:
id = rn.randint(1000, 9999)
__ids.append(id)
outputDict[name] = id
return outputDict
Couldn't find program: 'false'
import idfier as idf
idf.randomIdfier(['name1', 'name2'])
idfier is used as a module
{'name1': 6571, 'name2': 7469}
🛑 Execute idfier.py
as a script. Can you guess the output?
🛑 Try changing the code inside so that the test fail. The execute it again.
Python's Search for Module¶
So far we have left our modules in the same directory with our project script. But in real projects, we would like to keep our modules and packages in a separate location. So to mimic that, let's copy our two modules to a different folder and let's call this folder Silly_Anonymizer
.
❓ So how do we add this location to our Python project?
Python maintains a list of locations or folders in
path
variable fromsys
module where Python searches for modules.
It searches the locations inside sys.path
in the order they are stored in the list starting with the location where the script's execution happens.
import sys
sys.path
['C:\\Users\\ahfah\\Desktop\\Curious-Joe\\content\\post\\2202-04-02-oop-python-package', 'C:\\Users\\ahfah\\AppData\\Local\\Programs\\Python\\Python39\\python39.zip', 'C:\\Users\\ahfah\\AppData\\Local\\Programs\\Python\\Python39\\DLLs', 'C:\\Users\\ahfah\\AppData\\Local\\Programs\\Python\\Python39\\lib', 'C:\\Users\\ahfah\\AppData\\Local\\Programs\\Python\\Python39', 'c:\\Python_Envs\\pcap02', '', 'c:\\Python_Envs\\pcap02\\lib\\site-packages', 'c:\\Python_Envs\\pcap02\\lib\\site-packages\\win32', 'c:\\Python_Envs\\pcap02\\lib\\site-packages\\win32\\lib', 'c:\\Python_Envs\\pcap02\\lib\\site-packages\\Pythonwin']
We can add our custom module location to this list and make sure that Python knows where to find the module. The Silly_Anonymizer
module directory is location here in my work station: C:\Users\ahfah\Desktop\Anonymizer\Silly_Anonymizer
. Let's add that and check that out.
sys.path.append('C:\\Users\\ahfah\\Desktop\\Anonymizer\\Silly_Anonymizer\\')
sys.path[len(sys.path)-1]
'C:\\Users\\ahfah\\Desktop\\Anonymizer\\Silly_Anonymizer\\'
🛑 Note the double backslashes. Backslashes are used to escape other characters so we need to use double backslashes to have Python understand that we are looking for a literal backslash.
Building a Package¶
Putting our modules inside Silly_Anonymizer is the first direct step towards making a Package. Let's create two sub-directores inside Silly_Anonymizer - NonStringOperation
and StringOperation
, so that in future if we have more modules we can store them based on their types of tasks - manipulating strings, or manipulating non-string operations. For now, let's move our two modules inside these two sub-directories: idfier.py
inside NonStringOperation
, and stringProcess.py
inside StringOperation
. So the folder structure should look like this:
- Silly_Anonymizer/
- NonStringOperation/
- idfier.py
- StringOperation/
- stringProcess.py
- NonStringOperation/
Looking at the Silly_Anonymizer directory as our package, shows us an important aspect of Python Packages that,
A Python package follows a directory structure!
A concrete view of the function, module, and package relationship for our Silly_Anonymizer package is as follows:
Initializing a Package¶
Like Modules, Python Packages also need initializer. To do that we need to include a file called __init__.py
in the root directory of Silly_Anonymizer
. But since a Package is not a file we can't do that as part of a function and hence this separate file is used for initialization. It can be left empty but it needs to be present at the root directory of a module directory to be considered as a Package.
So after adding __init__.py
the Anonymizer folder should look like this:
- Silly_Anonymizer/
- NonStringOperation/
- idfier.py
- StringOperation/
- stringProcess.py
- init.py
- NonStringOperation/
🛑 Note, that you can have __init__.py
in other sub-folders too depending on if you need any special initialization for them or just want to consider them as a sub-package. Which we will do later on in our package too.
Importing a Module from Package¶
Once we have __init__.py
file in our module's home directory, we are ready to use it as a Package. To import a module from a Python package we need to use a fully qualified path from the root of the package. In our case, for the module - stringProcess
the import would look like as follows:
import Anonymizer.StringOperation.stringProcess as sp
stringProcess is used as a module
sp.stringSplit(string="Arafath, Samuel, Tiara, Nathan, Moez", separator=",")
['Arafath', ' Samuel', ' Tiara', ' Nathan', ' Moez']
💡 Python can also read package from compressed location.
Python packages can be imported from zip folder too. If you notice the output from sys.path
you may already find some zip folders in the list. That's because Python treats the zip folders as regular folders.
🛑 Try zipping Silly_Anonymizer
into a zipped folder Silly_Anonymizer.zip
and try to import it.
Publishing a Package¶
We have built our package and it's ready to be used locally. Now let's talk very briefly about how we can make it available for others to use. For this purpose, we will use PyPi. Python Packaging Index or PyPi is the most commonly used repository to host Python packages.
⚠️ A Word of caution
The steps for package publishing explained here will be bare minimum that'll be required to publish a package. Use it as the stepping stone and then explore the official documentations to understand the nitty-gritty of package publication. I will add a couple of resources as references.
Preparation¶
To make our package ready for upload, let's add these following files to the directory where our module directory is located:
Add a Git Repository for Silly_Anonymizer
¶
Create a remote repository in GitHub, or any other distributed version control solutions, and add the Silly_Anonymizer
package to the repo. You can check mine here.
Add a readme.md
file¶
A README file give the users a description about the project. In our case, it will describe the users, what the Silly_Anonymizer
package is about, how to use it and so on. Check the GitHub repo for sample readme file.
Add a setup.py
file¶
This is the main file required to successfully prepare a package for PyPi upload. This file contains some basic instructions that ensures that this local directory is prepared properly for the upload to PyPi. Checkout the sample file in the GitHub repo.
The information in the setup file facilitate the package development, hosting and maintenance. The three absolute minimum required properties are: name, version, and packages. For detail about these and all the other parameters checkout the official documentation.
With these files included the file directory should look like this:
- Anonymizer/
- .git
- README.MD
- setup.py
- Silly_Anonymizer/
- NonStringOperation/
- idfier.py
- StringOperation/
- stringProcess.py
- __init__.py
- NonStringOperation/
Building the Distribution Package¶
PyPi distributes Python package source codes wrapped inside distribution packages. Two of the commonly used distribution packages are source archives and Python wheels. To create to source archive and wheel for our package we will use a package called twine
and run python setup.py sdist bdist_wheel
.
This should create two files in the newly created directory called dist
- a source archive (.tar.gz
file) and a wheel (.whl
file). Checkout the source archive file to make sure all the source codes are populated inside it.
Quick Check¶
We are ready to push our package to PyPi. Before pushing it to PyPi, let's run twine check dist/*
to quickly check if the package will properly render in PyPi. If everything goes properly, you should see PASSED
printed on the screen after running the check.
Upload¶
PyPi has a test version of it, which we can use for learning and testing purpose. We will use PyPi test to host our package. Before doing that, make sure you register in PyPi test.
Once you are registered, run twine upload --repository-url https://test.pypi.org/legacy/ dist/*
. Enter your username and password once prompted.
🛑 If you have followed the article so far, you will not likely be able to upload the package with the same name as I have already uploaded it. Checkout the package here. Give it a different name and re-build the package.
🛑 Use Test PyPi search to find out if your package name is available.
That's it! You should see the location of your newly created Python package on your console. Checkout mine here.
What's Next?¶
In this post, I tried to take someone from the functional understanding of writing Python functions to building his/her first Python package. Before ending reiterating what stated earlier, this article is by no means a detail tutorial on how to build a package for production. In a production code it's imperative that you add rigorous testing. Also it's unlikely that you'd build a package without any dependencies! I didn't cover either of these, so make sure you learn about them. As promised, here are couple of resources that you can use to supplement your understanding: