python notes

installing xgboost on python3

at the time of writing, xgboost doesn't install through pip for python3 by default (newer releases do publish wheels on PyPI, so pip3 install xgboost is worth trying first). i always seem to forget this. i have no interest in using graphlab create instead, and a lot of kaggle scripts are written explicitly with xgboost in mind. i also keep it isolated in virtualenvs, so i sometimes forget what the process is.
the best way to install it is to create a virtualenv and a project folder, then, with the virtualenv active, build from source:

  1. git clone --recursive https://github.com/dmlc/xgboost
  2. cd xgboost
  3. bash build.sh
  4. cd python-package
  5. pip3 install -e .
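a quick sanity check to run with the virtualenv's python after step 5. this is a sketch using only the standard library; the helper names in_virtualenv and xgboost_available are my own, not part of xgboost:

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when this interpreter is running inside a virtualenv/venv."""
    # the stdlib venv and virtualenv >= 20 make sys.prefix differ from sys.base_prefix;
    # legacy virtualenv releases (< 20) set sys.real_prefix instead
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def xgboost_available() -> bool:
    """True if `import xgboost` would find a package on the current sys.path."""
    return importlib.util.find_spec("xgboost") is not None


if __name__ == "__main__":
    print("in virtualenv:", in_virtualenv())
    print("xgboost importable:", xgboost_available())
    if xgboost_available():
        import xgboost
        print("xgboost version:", xgboost.__version__)
```

if the first line prints False, the pip3 install -e . most likely went into the system python rather than the virtualenv.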


using pandas to change object feature columns to numerical category columns

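this is useful for several reasons (e.g. it saves visual space when viewing the frame, and a lot of category data is indecipherable anyway), but MOSTLY it lets me use sklearn estimators without having to create dummy variables, which can get extremely computationally expensive.
it's also another reason the category dtype beats the object dtype in pandas: each distinct value is stored once and every row holds only a small integer code, so it uses much less memory.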

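import pandas as pd

df = pd.read_csv('data.csv')  # placeholder path; load the frame however you like
# replace every object (string) column with integer category codes;
# missing values become -1 in .cat.codes
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].astype('category').cat.codes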
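the loop above throws away which label each code stood for. a minimal sketch, assuming you'll want to decode predictions or inspect features later, that keeps a per-column mapping and also checks the memory claim (encode_object_columns, decode_codes, memory_bytes and the toy data are hypothetical names of mine, not pandas API):

```python
import pandas as pd


def encode_object_columns(df: pd.DataFrame):
    """Copy df, replacing each object column with integer category codes.

    Returns (encoded_df, mappings) where mappings[col][i] is the label for code i.
    """
    out = df.copy()
    mappings = {}
    for col in out.select_dtypes(include="object").columns:
        cat = out[col].astype("category")
        mappings[col] = list(cat.cat.categories)  # position in this list == code
        out[col] = cat.cat.codes  # missing values become -1
    return out, mappings


def decode_codes(codes: pd.Series, labels: list) -> pd.Series:
    """Turn integer codes back into labels; code -1 becomes NaN."""
    return pd.Series(
        pd.Categorical.from_codes(codes, categories=labels),
        index=codes.index,
        name=codes.name,
    )


def memory_bytes(s: pd.Series) -> int:
    """Memory used by a Series' values, including the Python string objects."""
    return int(s.memory_usage(deep=True, index=False))


if __name__ == "__main__":
    toy = pd.DataFrame({
        "color": pd.Series(["red", "blue", "red", None], dtype=object),
        "size": [1, 2, 3, 4],
    })
    encoded, maps = encode_object_columns(toy)
    print(encoded)
    print(maps)  # {'color': ['blue', 'red']}
    print(decode_codes(encoded["color"], maps["color"]).tolist())

    # repeated strings: category stores each label once plus a small int code per row
    s = pd.Series(["red", "blue"] * 10_000, dtype=object)
    print(memory_bytes(s), "bytes as object vs", memory_bytes(s.astype("category")), "as category")
```

keeping the categories list is what lets the codes round-trip. if you encode a test set separately, reuse the stored labels (e.g. pd.Categorical(values, categories=labels)) rather than re-running astype('category'), otherwise the same label can end up with a different code.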