set global size with real output size, also optimize max pooling index computation if necessary. Signed-off-by: Li Peng <peng.li@intel.com>