Mac software installation for Analytics/ML

Notes on installing relevant software on a recent Mac / OS X system.

Covers Python, TensorFlow, TFlearn, and the PyCharm IDE.

Baseline System

For reference, the base system used for this process in March 2018 is a MacBook Pro (late-2016, Touch Bar) running macOS 10.13.3 "High Sierra". The system has a 3.3GHz Intel Core i7 processor, 16GB LPDDR3 RAM, and a solid-state drive. The graphics subsystem is Intel Iris Graphics 550 (1536 MB). TensorFlow does not use the graphics processor on macOS: GPU support on macOS was dropped as of TensorFlow 1.2, so everything below runs on the CPU.

Some aspects may require or benefit from elements of Xcode being installed. I already had an Xcode install, but some extensions were installed along the way (details below).

((REQUEST: anyone else using Mac, can you list your system properties, as well as adding other components or notes below.))

Python and installation related tools

Python is a popular language for ML work and has good support for common libraries and wrappers like TensorFlow and TFlearn. It is widely used across the scientific community for analytics and AI/ML, so tutorials and documentation are available from a non- or novice-programmer's viewpoint (as well as more expert material). Python claims to be designed for readability, although, as with anything new, users may take some time to get used to it.

There are important incompatibilities between Python2 and Python3, and both are supported in production environments. Python2 and Python3 can be installed in parallel, and common practice includes the use of "virtual environments", which package a Python runtime and its associated libraries/modules with a Python script or project. This allows each project to pin its own versions and include only the modules it needs. There are also multiple package-management tools which automate download, install, update, etc. for add-on modules and libraries, including PIP (Python) and conda (Anaconda project), which can co-exist with each other to some extent.
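As a small illustration of the Python2/Python3 incompatibilities mentioned above, here is a minimal sketch written for Python3. The function name is made up for this example; it shows two commonly cited differences (integer division, and print being a function rather than a statement).

```python
import sys

def describe_division():
    """Return the results of dividing 7 by 2 with / and with //."""
    true_div = 7 / 2    # 3.5 under Python3; Python2 would give 3 here
    floor_div = 7 // 2  # 3 under both Python2 and Python3
    return true_div, floor_div

# print is a function in Python3; the Python2 statement form
# (print "text") is a syntax error under Python3.
print("Running Python %d.%d" % sys.version_info[:2])
print(describe_division())
```

Running the same file under the system Python2 and under Python3 is a quick way to see which interpreter a given shell is actually using.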

Homebrew and PIP

Reference link: https://www.macworld.co.uk/how-to/mac/coding-with-python-on-mac-3635912/

Homebrew is a Mac utility which makes it easier to install common UNIX tools not included in Apple's macOS. To install it, open Utilities → Terminal and execute:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

PIP is a package installer for Python, which is used to install and manage non-default libraries and modules. Make sure it is installed and up-to-date:

sudo easy_install pip

pip install --upgrade pip

((NOTE: I'm not sure Homebrew is strictly required for any steps below, but it's a good tool. PIP may have been pre-installed, but the upgrade is probably wise.))

Python3 / Anaconda

Macs come with Python pre-installed, but probably a Python2 version (mine had 2.7.10). To use Python3, an additional installation is required. Anaconda is a popular Python distribution for data science users, and includes the "conda" add-on management tool. There is an Enterprise/supported version, but the Community version is good for most non-commercial purposes. The public source/site for Anaconda is https://www.anaconda.com/distribution/.

Reference link: https://unidata.github.io/online-python-training/conda-osx.html

Download an appropriate installer for Anaconda Python3 (2018-03-24: Anaconda version 5.1.0, Python3 version 3.6) from the "distribution" link above. The installer .pkg is nearly 600MB, and the installation takes ~2GB. It will update your PATH to include /anaconda3/bin, making Anaconda's Python the default that runs from the CLI (re-open or refresh any open Terminal sessions or other shells). At the end of the install you can optionally install Microsoft's Visual Studio Code (VS Code) editor; I didn't do this.
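To confirm that the PATH change took effect, a short script along these lines can report which interpreter is running and where the common commands resolve. This is only a sketch: the list of command names and the function name are choices made for this example, not part of the Anaconda installer.

```python
import shutil
import sys

# Command names to look up; adjust the list to the tools you care about.
TOOLS = ["python", "python3", "pip", "conda", "brew"]

def locate_tools(names):
    """Map each command name to the path found on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}

print("Interpreter running this script:", sys.executable)
print("Python version: %d.%d.%d" % sys.version_info[:3])
for name, path in locate_tools(TOOLS).items():
    print("%-8s %s" % (name, path or "(not found on PATH)"))
```

If the Anaconda install worked as described, the python and conda entries would be expected to point under /anaconda3/bin when the script is run from a freshly opened Terminal.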

TensorFlow (and Virtual Environments for Python)

TensorFlow is a popular package developed by the Google Brain team to support their Deep Neural Network research, and released publicly as a graph-calculation package. It is powerful and commonly used for network modeling, but is somewhat complicated. It is therefore often wrapped by other libraries that simplify the creation of, and interaction with, TensorFlow models for specific purposes, e.g. TFlearn and Keras (the latter will also run on top of other platforms like CNTK or Theano).

((NOTE: likely I could have used something as simple as "conda install -c conda-forge tensorflow" at this point, but I found links with other methods first. Not sure of the current TensorFlow version available through conda.))

Reference link: https://www.tensorflow.org/install/install_mac

The above link describes installing TensorFlow within the context of building a "virtual environment" for a Python project. First we install the "virtualenv" tool to create Python virtual environments:

pip install --upgrade virtualenv

... then create a specific virtual environment for the project including TensorFlow (substitute for your_project_dir and virt_env_name):

cd ~/your_project_dir

virtualenv --system-site-packages -p python3 virt_env_name

Next we "activate" the virtual environment using the "activate" script, install TensorFlow, and exit:

cd virt_env_name

source ./bin/activate

easy_install -U pip

pip3 install --upgrade tensorflow

deactivate

Per the above link: "When the Virtualenv environment is active, you may run TensorFlow programs from this shell. When you are done using TensorFlow, you may deactivate the environment by issuing the [deactivate] command."
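It is easy to lose track of whether a shell has an environment activated. The sketch below (the function name is invented for this example) checks from inside Python whether the running interpreter belongs to a virtual environment, using attributes of the standard sys module.

```python
import sys

def in_virtual_environment():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix

print("Inside a virtual environment:", in_virtual_environment())
print("Environment prefix:", sys.prefix)
```

Run it once after sourcing the "activate" script and once after deactivate; the reported prefix should switch between virt_env_name and the base Python install.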

TFlearn

TFlearn is a wrapper package around TensorFlow. Per tflearn.org, "TFlearn is a modular and transparent deep learning library built on top of Tensorflow. It was designed to provide a higher-level API to TensorFlow in order to facilitate and speed-up experimentations, while remaining fully transparent and compatible with it."

Reference link: http://tflearn.org/installation/

TFlearn's installation guide lists TensorFlow as a prerequisite; since it was already installed above, I skipped that section. As of 2018-03-24 the current TFlearn version was v0.3.2, installed with a simple:

pip install tflearn

If you are using virtual environments, run this command in every environment that needs TFlearn: source that environment's "activate" script, then run the PIP install above.
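Because each environment has its own set of packages, it can help to check what the current interpreter can actually see. The following sketch uses the standard importlib machinery to look for the two packages without importing them; the helper name is made up for this example.

```python
import importlib.util

def available(module_name):
    """Return True if module_name can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None

for name in ("tensorflow", "tflearn"):
    print("%-12s %s" % (name, "installed" if available(name) else "missing"))
```

Running this inside and outside an activated environment shows quickly whether the PIP installs landed where you intended.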

A Checkpoint

At this point you should have a system that is capable of running Python scripts/applications, including TensorFlow network models directly or mediated by TFlearn. You can code in your favorite text editor or other environment, and execute on the command line (perhaps within a virtual environment isolating each application). Many examples and sample problems can be found by searching for "tensorflow sample" or "tflearn sample", and there are tutorials and challenges on sites like Kaggle.com.
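A quick way to test the checkpoint is a tiny script that asks TensorFlow to add two numbers. This is a minimal sketch, not taken from the TensorFlow install guide: the function name is invented here, and the script handles both the 1.x Session style (the version line installed in these notes) and the eager style of later releases, returning None if TensorFlow cannot be imported.

```python
def tensorflow_smoke_test():
    """Add 2 + 3 with TensorFlow; return the result, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    total = tf.constant(2) + tf.constant(3)
    # TensorFlow 2.x evaluates eagerly; 1.x releases (such as the one
    # installed in these notes) need a Session to run the graph.
    if hasattr(tf, "executing_eagerly") and tf.executing_eagerly():
        return int(total.numpy())
    with tf.Session() as sess:
        return int(sess.run(total))

result = tensorflow_smoke_test()
print("TensorFlow not importable" if result is None else "2 + 3 = %d" % result)
```

If the script reports that TensorFlow is not importable, the most likely cause is that the virtual environment holding the install is not activated in the current shell.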

PyCharm

Given the structured formatting and interpreted execution of Python and the use of mechanisms like virtual environments, an Integrated Development Environment (IDE) can be quite helpful. As many tutorials and sample apps are available online, IDEs can also be helpful if they mediate version control and downloads from repositories (e.g. GitHub). One such environment is PyCharm from JetBrains. The free Community edition (as opposed to the paid Professional edition) is fine for our purposes. Download the Community installer (pycharm-community-2017.3.4.dmg as of 2018-03-24) and install it. For access to GitHub, configure an API key on the web site ((TODO: need details)) and associate it in the VCS (version control) settings.

((TODO: add description of creating a sample TFlearn-based app.))

(((Original page/content contributed by Adam Smith.)))
