This zip file contains a simple demo of how to use the CDE auto-packaging tool with MLcomp. CDE allows you to easily run your programs on MLcomp even when they use languages, extensions, or libraries that are not installed on the default Linux worker machines. Here are the basic steps involved:
1. Run your program under CDE supervision on your x86-Linux machine to create a self-contained package within the cde-package/ sub-directory. This package contains all of the dependencies (e.g., shared libraries, language run-times) required to re-execute your program on any contemporary x86-Linux machine.
2. Create a run wrapper script that copies input files into your CDE package, invokes your program, and then copies the output files out of the package. This wrapper is necessary because CDE-packaged programs can only access files within their own package.
3. Create a metadata file like this one.
4. Zip up all of your files and upload to MLcomp. Use 'zip -ry' to preserve symbolic links, since CDE packages might break if symlinks get converted into regular files.
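As a rough sketch of the zipping step, here is one way to drive 'zip -ry' from Python. The function names and the archive name upload.zip are made up for illustration, and the sketch assumes the Info-ZIP zip tool is on your PATH.

```python
import shutil
import subprocess


def zip_command(archive, paths):
    """Build the zip invocation described above.

    -r recurses into directories; -y stores symbolic links as links
    instead of following them.
    """
    return ["zip", "-ry", archive, *paths]


def make_upload_zip(archive, paths):
    """Create the archive, failing loudly if the zip tool is missing."""
    if shutil.which("zip") is None:
        raise RuntimeError("the 'zip' command-line tool was not found on PATH")
    subprocess.run(zip_command(archive, paths), check=True)
```

For example, make_upload_zip("upload.zip", ["run", "cde-package"]) would archive the run script and the package directory; add your metadata file to the list as well.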
For this demo, I've created a simple Python script located in: cde-package/cde-root/home/pgbovine/python-mlcomp-classifier/dumb_classifier.py
This script implements the dumbest possible classifier (always returning '1' regardless of input). However, the crucial line that we care about is: import networkx
This line tells Python to import the NetworkX graph library. When this script is run natively on the MLcomp worker machine, it will crash since NetworkX is not installed. However, since NetworkX is installed on my personal Linux machine, I can run dumb_classifier.py on my machine with CDE to package up all of its dependencies, most notably the version of Python with NetworkX installed for it.
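If you are not sure whether your own program depends on modules that a machine lacks, a quick probe like the following can help. This is only a sketch: missing_modules is a hypothetical helper, not part of CDE or MLcomp, and it only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this Python cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # On a machine without NetworkX, this reports ['networkx'].
    print(missing_modules(["networkx"]))
```

Any module reported as missing is a sign that running the script natively on that machine would crash at the import line.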
To execute the program within the CDE package, we must first change into this working directory: cde-package/cde-root/home/pgbovine/python-mlcomp-classifier/
and then execute ./python.cde dumb_classifier.py, which will execute the script using the version of Python that resides within the package. Note that running python dumb_classifier.py will not work on the MLcomp worker machine, since that will execute the native version of Python, which does not have NetworkX.
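The two steps above (change directory, then run ./python.cde) can be sketched in Python as follows. The demo_root parameter and both function names are assumptions made for illustration; only the working directory and the command come from this demo.

```python
import subprocess
from pathlib import Path

# Working directory inside the package, as used in this demo.
PACKAGED_WORKDIR = Path("cde-package/cde-root/home/pgbovine/python-mlcomp-classifier")


def packaged_invocation(demo_root, script_args=()):
    """Return (working directory, command) for running the packaged script."""
    workdir = Path(demo_root) / PACKAGED_WORKDIR
    command = ["./python.cde", "dumb_classifier.py", *script_args]
    return workdir, command


def run_packaged(demo_root, script_args=()):
    """Run the script with the packaged Python, starting from the package's working directory."""
    workdir, command = packaged_invocation(demo_root, script_args)
    # cwd makes the child process start inside the package, so the relative
    # ./python.cde path resolves there on Linux.
    return subprocess.run(command, cwd=workdir, check=True)
```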
The final required component is the top-level run script required by the MLcomp program interface. For this demo, I have created a Bash script that wraps around ./python.cde dumb_classifier.py. Here is the tricky part about writing a run script that works with CDE: CDE-packaged programs can only access files within the package. This means that the run script must copy the input files from their original locations into the package, execute your CDE-packaged program, and then copy the resulting output files back out of the package into the location expected by MLcomp. Please see the run script within this directory for an example of how this is done. Fortunately, it's pretty simple once you understand this concept.
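To make the copy-in, execute, copy-out pattern concrete, here is a minimal Python sketch. It is not the Bash run script shipped with this demo: the function name, its parameters, and the idea of passing explicit file lists are all assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def run_in_package(workdir, input_files, output_names, dest_dir, command,
                   runner=subprocess.run):
    """Copy inputs into the package, run the program there, copy outputs out.

    workdir      -- directory inside the CDE package where the program runs
    input_files  -- paths outside the package that the program needs
    output_names -- file names the program writes into workdir
    dest_dir     -- where the outputs are expected outside the package
    runner       -- injectable for testing; defaults to subprocess.run
    """
    workdir = Path(workdir)
    dest_dir = Path(dest_dir)

    # 1. Copy each input into the package, since the packaged program
    #    cannot see files outside it.
    for src in input_files:
        shutil.copy(src, workdir / Path(src).name)

    # 2. Execute the packaged program from inside the package.
    runner(command, cwd=workdir, check=True)

    # 3. Copy the results back out to where the caller expects them.
    copied = []
    for name in output_names:
        target = dest_dir / name
        shutil.copy(workdir / name, target)
        copied.append(target)
    return copied
```

The runner argument exists only so the pattern can be exercised without a real CDE package; in normal use you would leave it at its default and pass a command such as ["./python.cde", "dumb_classifier.py"].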
Note that although this demo is for Python, CDE is language-agnostic; it will work with any set of programs on your x86-Linux machine. Please email me at email@example.com if you need help getting CDE working with MLcomp.