Package and Distribute PySpark with PyInstaller

One of my customers asked how to package a PySpark application into a single file with PyInstaller (see the PyInstaller official website for background). After some research, I found the answer and am sharing it here.

PyInstaller freezes (packages) Python applications into stand-alone executables under Windows, GNU/Linux, Mac OS X, FreeBSD, Solaris, and AIX.

Environment

  • OS: macOS Mojave 10.14.5
  • Python: Anaconda 2019.03 for macOS
  • Spark: spark-2.4.3-bin-hadoop2.7
  • PostgreSQL: 11.2
  • PostgreSQL JDBC driver: 42.2.5
  • UPX: installed via brew install upx

The code is from my earlier post, PySpark Read/Write PostgreSQL; a minimal sketch of it follows.
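
The full script from that post is not reproduced here, so this is only a sketch of what such a pyspark_pg script might look like: the database name, table names, and credentials are placeholders, and the driver path should point to your local copy of postgresql-42.2.5.jar.

    # pyspark_pg.py -- a minimal sketch; database, tables, and credentials
    # below are placeholders, not the values from the original post.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("pyspark_pg")
        # Point Spark at the PostgreSQL JDBC driver downloaded earlier.
        .config("spark.jars", "/path/to/postgresql-42.2.5.jar")
        .getOrCreate()
    )

    url = "jdbc:postgresql://localhost:5432/testdb"
    props = {
        "user": "postgres",
        "password": "postgres",
        "driver": "org.postgresql.Driver",
    }

    # Read a table over JDBC, show it, and write it back to a new table.
    df = spark.read.jdbc(url=url, table="users", properties=props)
    df.show()
    df.write.jdbc(url=url, table="users_copy", mode="overwrite", properties=props)

    spark.stop()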

After compiling with the build command sketched below, we get a 297 MB single-file app named pyspark_pg.
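
The exact command from the original post is not reproduced here; this is a hedged sketch, assuming the script is saved as pyspark_pg.py and PyInstaller is installed. The --add-data mapping bundles the installed pyspark package, including Spark's jars and scripts, into the executable; the flags actually used may have differed.

    pip install pyinstaller
    pyinstaller --onefile \
      --add-data "$(python -c 'import pyspark, os; print(os.path.dirname(pyspark.__file__))'):pyspark" \
      pyspark_pg.py

PyInstaller also runs UPX over the bundled binaries automatically when it finds upx on the PATH, which is why UPX appears in the environment list above.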

Let’s run it: ./pyspark_pg

By the way, this app does not bundle the JDK, but it does include Python 3.7 and spark-2.4.3-bin-hadoop2.7.
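
Since the JDK is not bundled, a Java runtime must already be installed and discoverable on whatever machine runs the binary. A quick sanity check before launching, for example (the export line uses the macOS java_home helper and is macOS-specific):

    java -version                                 # confirm a JDK is present
    export JAVA_HOME="$(/usr/libexec/java_home)"  # locate it on macOS
    ./pyspark_pg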

For the Chinese version of this post, please visit here.
