<module 'networkx' from '/home/pgbovine/epd-6.2-2-rh5-x86/lib/python2.6/site-packages/networkx/__init__.py'>
===== MAIN: learn based on training data =====
=== START program1: ./run learn ../dataset2/train
BASH: learn ../dataset2/train
PYTHON: learn train
=== END program1: ./run learn ../dataset2/train --- OK [1s]
===== MAIN: predict/evaluate on train data =====
=== START program3: ./run stripLabels ../dataset2/train ../program0/evalTrain.in
=== END program3: ./run stripLabels ../dataset2/train ../program0/evalTrain.in --- OK [2s]
=== START program1: ./run predict ../program0/evalTrain.in ../program0/evalTrain.out
BASH: predict ../program0/evalTrain.in ../program0/evalTrain.out
PYTHON: predict evalTrain.in evalTrain.out
=== END program1: ./run predict ../program0/evalTrain.in ../program0/evalTrain.out --- OK [1s]
=== START program4: ./run evaluate ../dataset2/train ../program0/evalTrain.out
=== END program4: ./run evaluate ../dataset2/train ../program0/evalTrain.out --- OK [3s]
===== MAIN: predict/evaluate on test data =====
=== START program3: ./run stripLabels ../dataset2/test ../program0/evalTest.in
=== END program3: ./run stripLabels ../dataset2/test ../program0/evalTest.in --- OK [1s]
=== START program1: ./run predict ../program0/evalTest.in ../program0/evalTest.out
BASH: predict ../program0/evalTest.in ../program0/evalTest.out
PYTHON: predict evalTest.in evalTest.out
=== END program1: ./run predict ../program0/evalTest.in ../program0/evalTest.out --- OK [1s]
=== START program4: ./run evaluate ../dataset2/test ../program0/evalTest.out
=== END program4: ./run evaluate ../dataset2/test ../program0/evalTest.out --- OK [1s]
supervised-learning: Main entry for supervised learning for training and testing a program on a dataset.
(learner:Program) cde-mlcomp-demo: Demo of using the CDE auto-packaging tool with MLcomp
This zip file contains a simple demo of how to use the CDE auto-packaging tool with MLcomp. CDE allows you to easily run your programs on MLcomp even when they use languages, extensions, or libraries that are not installed on the default Linux worker machines. Here are the basic steps involved:
1. Run your program under CDE supervision on your x86-Linux machine to create a self-contained package within the cde-package/ sub-directory. This package contains all of the dependencies (e.g., shared libraries, language run-times) required to re-execute your program on any contemporary x86-Linux machine.
2. Create a run wrapper script that copies input files into your CDE package, invokes your program, and then copies the output files out of the package. This wrapper is necessary because CDE-packaged programs can only access files within their own package.
3. Create a metadata file like this one.
4. Zip up all of your files and upload to MLcomp. Use 'zip -ry' to preserve symbolic links, since CDE packages might break if symlinks get converted into regular files.
For this demo, I've created a simple Python script located in: cde-package/cde-root/home/pgbovine/python-mlcomp-classifier/dumb_classifier.py
This script implements the dumbest possible classifier (always returning '1' regardless of input). However, the crucial line that we care about is: import networkx
This line tells Python to import the NetworkX graph library. When this script is run natively on the MLcomp worker machine, it will crash since NetworkX is not installed. However, since NetworkX is installed on my personal Linux machine, I can run dumb_classifier.py on my machine with CDE to package up all of its dependencies, most notably the version of Python with NetworkX installed for it.
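For readers who want a picture of what such a script might look like, here is a minimal sketch, not the actual dumb_classifier.py from the package. It assumes the script accepts the two commands that appear in the run log (learn and predict) and that the input file holds one example per line; both the file format and the function names are assumptions made for illustration. The networkx import is left as a comment so the sketch runs on machines without NetworkX.

```python
import os
import tempfile

# The demo's dumb_classifier.py also runs `import networkx` here. It is left as
# a comment so this sketch runs on machines without NetworkX installed.
# import networkx


def learn(train_path):
    """Nothing to fit: the classifier ignores its training data."""
    return None


def predict(in_path, out_path):
    """Write the label '1' once per input line.

    Assumption: the stripped input holds one example per line.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        for _ in src:
            dst.write("1\n")


def main(argv):
    """Dispatch the two commands seen in the run log: learn and predict."""
    if len(argv) == 2 and argv[0] == "learn":
        learn(argv[1])
    elif len(argv) == 3 and argv[0] == "predict":
        predict(argv[1], argv[2])
    else:
        raise ValueError("usage: learn <train> | predict <in> <out>")


# Small self-check with made-up input lines.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "evalTest.in")
    out_path = os.path.join(tmp, "evalTest.out")
    with open(in_path, "w") as f:
        f.write("example a\nexample b\nexample c\n")
    main(["predict", in_path, out_path])
    with open(out_path) as f:
        labels = f.read().split()
print(labels)
```

In a real script, the self-check at the bottom would be replaced by a call such as main(sys.argv[1:]).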
To execute the program within the CDE package, we must first change into this working directory: cde-package/cde-root/home/pgbovine/python-mlcomp-classifier/
and then execute ./python.cde dumb_classifier.py, which will execute the script using the version of Python that resides within the package. Note that running python dumb_classifier.py will not work on the MLcomp worker machine, since that will execute the native version of Python, which does not have NetworkX.
The final required component is the top-level run script required by the MLcomp program interface. For this demo, we have created a BASH script that wraps around ./python.cde dumb_classifier.py. Here is the tricky part about writing a run script to work with CDE: CDE-packaged programs can only access files within the package. This means that run must properly copy the input files from their original locations into the package, execute your CDE-packaged program, and then copy the resulting output files back out of the package into the location expected by MLcomp. Please see the run script within this directory for an example of how this is done. Fortunately, it's pretty simple once you understand this concept.
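The copy-in, execute, copy-out pattern can be sketched as follows. The demo's real run script is written in BASH; this is a Python rendering of the same idea, and the helper name run_in_package and its parameters are hypothetical. It assumes every input and output is a regular file. So that the sketch runs anywhere, the self-check substitutes a stand-in command for ./python.cde dumb_classifier.py.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_package(workdir, command, inputs, outputs):
    """Copy inputs into workdir, run command there, copy outputs back out.

    inputs:  list of (external_path, name_inside_package) pairs
    outputs: list of (name_inside_package, external_path) pairs
    """
    workdir = Path(workdir)
    for external, inside in inputs:
        shutil.copy(external, workdir / inside)
    # check=True raises CalledProcessError if the packaged program fails.
    subprocess.run(command, cwd=workdir, check=True)
    for inside, external in outputs:
        shutil.copy(workdir / inside, external)


# Self-check: a stand-in command copies the input to the output, playing the
# role of ["./python.cde", "dumb_classifier.py", "predict", ...] in the demo.
with tempfile.TemporaryDirectory() as tmp:
    outside = Path(tmp) / "program0"
    package = Path(tmp) / "package"
    outside.mkdir()
    package.mkdir()
    (outside / "evalTest.in").write_text("example a\nexample b\n")
    stand_in = [sys.executable, "-c",
                "import shutil; shutil.copy('evalTest.in', 'evalTest.out')"]
    run_in_package(package, stand_in,
                   inputs=[(outside / "evalTest.in", "evalTest.in")],
                   outputs=[("evalTest.out", outside / "evalTest.out")])
    copied_back = (outside / "evalTest.out").read_text()
print(copied_back)
```

Passing only base names to the packaged program matches the run log, where the BASH layer receives ../program0/evalTest.in and the Python layer receives evalTest.in.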
Note that although this demo is for Python, CDE is language-agnostic; it will work with any set of programs on your x86-Linux machine. Please email me at email@example.com if you need help getting CDE working with MLcomp.
Go to the page for the run and check the log file for signs of the error responsible.
You can also download the run and run it locally on your machine (a README file should
be included in the download which provides more information).
We said that a run was simply a program/dataset pair, but that's not the full story.
A run actually includes other helper programs such as the evaluation program and
various programs for reductions (e.g., one-versus-all, hyperparameter tuning).
More formally, a run is given by a run specification,
which can be found on the page for any run.
A run specification is a tree where each internal node represents a program
and its children represent the arguments to be passed into its constructor.
For example, the one-versus-all program takes your binary classification program
as a constructor argument and behaves like a multiclass classification program.
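To make the tree structure concrete, here is a small sketch of how a run specification could be modeled. It is not MLcomp's actual representation: the SpecNode class, the render helper, and the names my-binary-classifier and my-dataset are all hypothetical, and the exact arguments taken by each program are an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SpecNode:
    """One program (or leaf argument) in a run specification tree."""
    name: str
    args: list = field(default_factory=list)  # constructor arguments


def render(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + node.name]
    for child in node.args:
        lines.extend(render(child, depth + 1))
    return lines


# The one-versus-all program wraps a binary classifier, and the result is
# passed to the main supervised-learning program along with a dataset.
spec = SpecNode("supervised-learning", [
    SpecNode("one-versus-all", [SpecNode("my-binary-classifier")]),
    SpecNode("my-dataset"),
])
print("\n".join(render(spec)))
```

Each level of indentation in the printed output corresponds to one level of constructor nesting.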