Recently I was asked to run a Python script as a batch job inside a virtualenv, and also to schedule it with crontab.
For example, pyspark_hello_world.py:
import sys
from operator import add
from pyspark import SparkContext

sc = SparkContext()
# split the first argument into words; list(sys.argv[1]) would split it
# into single characters, which is not what a word count wants
data = sc.parallelize(sys.argv[1].split())
counts = data.map(lambda x: (x, 1)) \
    .reduceByKey(add) \
    .sortBy(lambda x: x[1], ascending=False) \
    .collect()
for (word, count) in counts:
    print("{}: {}".format(word, count))
sc.stop()
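For a quick sanity check without a Spark cluster, the same counting logic can be sketched in plain Python; `collections.Counter` with `most_common()` plays the role of the `map` → `reduceByKey(add)` → `sortBy` pipeline above:

```python
from collections import Counter

def word_count(text):
    # split on whitespace, count occurrences, and sort by frequency
    # (descending), mirroring the Spark version
    return Counter(text.split()).most_common()

for word, count in word_count("hello world hello"):
    print("{}: {}".format(word, count))
```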
I wanted to run it from bash. After some research, I came up with a solution:
#!/bin/bash
source /Users/steven/.pyenv/versions/3.6.4/envs/ts/bin/activate
# virtualenv is now active, which means your PATH has been modified.
# Don't try to run python from /usr/bin/python, just run "python" and
# let the PATH figure out which version to run (based on what your
# virtualenv has configured).
python "$@"

# another way:
# echo 'source /Users/steven/.pyenv/versions/3.6.4/envs/ts/bin/activate; python /Users/steven/tmp/hello.py' | /bin/bash
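What `source .../activate` buys the wrapper is mostly PATH manipulation: the venv's bin directory goes first, so a bare `python` resolves to the venv interpreter. A minimal Python sketch of that effect (the venv path is the one from the wrapper above):

```python
import os

# the venv bin directory from the wrapper script
venv_bin = "/Users/steven/.pyenv/versions/3.6.4/envs/ts/bin"

# conceptually what "activate" does: prepend the venv bin dir to PATH,
# so PATH lookup finds the venv's python before the system one
env = dict(os.environ)
env["PATH"] = venv_bin + os.pathsep + env.get("PATH", "")

print(env["PATH"].split(os.pathsep)[0])
```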
Name it “runpy” and make it executable (chmod +x runpy); now I can easily run it from bash:
./runpy pyspark_hello_world.py "hello world"
or use it in crontab (note the script expects an argument, so pass one here too):
0 9 * * * /path/to/runpy /path/to/pyspark_hello_world.py "hello world"
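One cron caveat worth flagging: cron jobs run with a minimal environment and their output disappears unless you redirect it. A variant of the entry above that keeps a log (the log path is illustrative, pick your own):

```shell
# redirect stdout and stderr to a log file so cron failures are visible
0 9 * * * /path/to/runpy /path/to/pyspark_hello_world.py "hello world" >> /tmp/pyspark_hello.log 2>&1
```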
By the way, under Windows we can add the following code at the top of pyspark_hello_world.py (activate_this.py is generated by the virtualenv package; the stdlib venv module does not create it):
exec(open("D:\\venv\\Scripts\\activate_this.py").read(),
     {'__file__': "D:\\venv\\Scripts\\activate_this.py"})
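The `exec(open(path).read(), {'__file__': path})` idiom runs another file's code in the current process; activate_this.py relies on the injected `__file__` to locate its own venv. A self-contained sketch of the idiom using a hypothetical temp file instead of a real venv:

```python
import os
import tempfile

# a stand-in for activate_this.py: a script that inspects its own __file__
code = "result = __file__.upper()"
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(code)
    path = f.name

# exec the file's source with __file__ injected, as the Windows snippet does
ns = {"__file__": path}
exec(open(path).read(), ns)

print(ns["result"] == path.upper())
os.remove(path)
```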
and runpy.bat looks like this:
C:\ProgramData\Anaconda3\python.exe d:\pyspark_hello_world.py "hello world"