Recently I was asked to batch-run a Python script inside a virtualenv, and also to run it from crontab.
For example, pyspark_hello_world.py:
import sys
from pyspark import SparkContext
from operator import add

sc = SparkContext()
data = sc.parallelize(list(sys.argv[1]))
counts = data.map(lambda x: (x, 1)) \
    .reduceByKey(add) \
    .sortBy(lambda x: x[1], ascending=False) \
    .collect()
for (word, count) in counts:
    print("{}: {}".format(word, count))
sc.stop()
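Note that because list() splits a string into characters, the script actually counts characters, not words. If PySpark isn't handy, the same pipeline can be sketched in plain Python with collections.Counter (a hypothetical standalone equivalent, just for illustration):

```python
import sys
from collections import Counter

def char_counts(text):
    # Counter + most_common mirrors the map/reduceByKey/sortBy pipeline:
    # count each character, then sort by frequency, descending.
    return Counter(text).most_common()

if __name__ == "__main__" and len(sys.argv) > 1:
    for char, count in char_counts(sys.argv[1]):
        print("{}: {}".format(char, count))
```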
I wanted to run it from bash. After some research, I came up with a solution:
#!/bin/bash
source /Users/steven/.pyenv/versions/3.6.4/envs/ts/bin/activate
# virtualenv is now active, which means your PATH has been modified.
# Don't try to run python from /usr/bin/python, just run "python" and
# let the PATH figure out which version to run (based on what your
# virtualenv has configured).
python "$@"

# Another way:
# echo 'source /Users/steven/.pyenv/versions/3.6.4/envs/ts/bin/activate; python /Users/steven/tmp/hello.py' | /bin/bash
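A variant that skips activate entirely also works: calling the venv's interpreter by its full path has the same effect, since that python resolves imports against the venv's own site-packages. A minimal sketch, assuming the same pyenv path as above (override VENV_PYTHON for other machines):

```shell
#!/bin/bash
# Run a script with the venv's interpreter without sourcing activate.
# VENV_PYTHON defaults to the interpreter path used above; it can be
# overridden via the environment on other machines.
VENV_PYTHON="${VENV_PYTHON:-/Users/steven/.pyenv/versions/3.6.4/envs/ts/bin/python}"

run_in_venv() {
    "$VENV_PYTHON" "$@"
}

# Only run when arguments are given, e.g.: ./runpy script.py args...
if [ "$#" -gt 0 ]; then
    run_in_venv "$@"
fi
```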
Name it “runpy” and make it executable (chmod +x runpy); now I can easily run it from bash:
./runpy pyspark_hello_world.py "hello world"
or use it in crontab:
0 9 * * * /path/to/runpy /path/to/pyspark_hello_world.py
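Two cron gotchas are worth noting: cron runs jobs with a minimal environment, so absolute paths (as above) are the safe choice, and it silently discards output unless you redirect it. A variant with logging (the log path is just an example):

```
0 9 * * * /path/to/runpy /path/to/pyspark_hello_world.py "hello world" >> /tmp/pyspark_hello.log 2>&1
```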
By the way, on Windows we can add the following code at the top of pyspark_hello_world.py:
exec(open("D:\\venv\\Scripts\\activate_this.py").read(),
     {'__file__': "D:\\venv\\Scripts\\activate_this.py"})
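What this pattern does is exec another script's source inside the current interpreter, supplying __file__ so the script can locate its own directory (activate_this.py uses that to find the venv and patch sys.path). A minimal sketch of the mechanism, using a hypothetical throwaway script instead of a real activate_this.py:

```python
import os
import tempfile

# The exec'd script reads __file__ from the globals dict we pass in,
# just as activate_this.py does to locate its venv.
script_body = "import os\nscript_dir = os.path.dirname(__file__)\n"

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(script_body)
    script_path = f.name

namespace = {"__file__": script_path}
exec(open(script_path).read(), namespace)
os.remove(script_path)
```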
and runpy.bat looks like this:
C:\ProgramData\Anaconda3\python.exe d:\pyspark_hello_world.py "hello world"