Pengfei Zuo

Install and Run ISPASS2009-benchmarks on GPGPU-Sim

Thu, 10 Jan 2019 00:00:00 +0000

ISPASS2009-Benchmarks are used in the ISPASS 2009 paper on GPGPU-Sim for evaluation. The benchmark suite includes 11 benchmarks, i.e., AES, BFS, CP, LPS, LIB, MUM, NN, NQU, RAY, STO, and WP. Please do the following steps to install and run ISPASS2009 Benchmarks.

1 Build the NVIDIA CUDA SDK benchmarks

1) Install NVIDIA driver if have no one:

apt-get install nvidia-340

There may be some errors during the installing. That is all right.

2) We have installed the NVIDIA CUDA SDK benchmarks (i.e., GPU Computing SDK code samples) when installing GPGPU-Sim. We now build it:

cd ~/NVIDIA_GPU_Computing_SDK
make

During the building, if the error /usr/bin/ld: cannot find -lOpenCL collect2: ld returned 1 exit status ../../common/common_opencl.mk:254: recipe for target '../../..//OpenCL//bin//linux/release/oclPostprocessGL' failed occurs, make the following modifications:

(a) Edit ./C/common/common.mk, lines like LIB += … ${OPENGLLIB} …. $(RENDERCHECKGLLIB) … should have $(RENDERCHECKGLLIB) moved before ${OPENGLLIB}. There are 3 lines like this.

LIB += $(RENDERCHECKGLLIB) ${OPENGLLIB} $(PARAMGLLIB) $(CUDPPLIB) ${LIB} -ldl -rdynamic
LIB += -lcuda   $(RENDERCHECKGLLIB) ${OPENGLLIB} $(PARAMGLLIB) $(CUDPPLIB) ${LIB}
LIB += $(RENDERCHECKGLLIB) ${OPENGLLIB} $(PARAMGLLIB) $(CUDPPLIB) ${LIB}

(b) Similarly, edit ./CUDALibraries/common/common.mk

(d) Edit Makefile. Comment all lines with CUDALibraries and OpenCL as we only want the application binaries. You comment by placing # in the front of the line.

# GPU Computing SDK Version 4.0.8
all:
    @$(MAKE) -C ./shared
    @$(MAKE) -C ./C
    #@$(MAKE) -C ./CUDALibraries
    #@$(MAKE) -C ./OpenCL

clean:
    @$(MAKE) -C ./shared clean
    @$(MAKE) -C ./C clean
    #@$(MAKE) -C ./CUDALibraries clean
    #@$(MAKE) -C ./OpenCL clean

clobber:
    @$(MAKE) -C ./shared clobber
    @$(MAKE) -C ./C clobber
    #@$(MAKE) -C ./CUDALibraries clobber
    #@$(MAKE) -C ./OpenCL clobber

(e) make

The NVIDIA CUDA SDK benchmarks have been installed by now. All executed files are listed in the fold ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/.

3) Test GPGPU-Sim using one of NVIDIA CUDA SDK benchmarks

cd /home/gpgpu-sim_distribution/test
~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/

2 Build the ISPASS2009-Benchmarks

1) Download ISPASS2009-Benchmarks

cd /home/gpgpu-sim_distribution
git clone https://github.com/gpgpu-sim/ispass2009-benchmarks.git
cd ispass2009-benchmarks/

2) Define the following environment variables at the top of Makefile.ispass-2009:

export CUDA_INSTALL_PATH=/usr/local/cuda
NVIDIA_COMPUTE_SDK_LOCATION=/root/NVIDIA_GPU_Computing_SDK

3) Comment some benchmarks that fail to be built, e.g., AES, DG, and WP, in Makefile.ispass-2009:

#$(SETENV) make noinline=$(noinline) -C AES
#$(SETENV) make noinline=$(noinline) -C DG/3rdParty/ParMetis-3.1
#$(SETENV) make noinline=$(noinline) -C DG
#$(SETENV) make noinline=$(noinline) -C WP

4) Build the benchmarks

"make -f Makefile.ispass-2009

The binaries generated are in the ./bin/release/ fold.

5) Source setup_environment and Place a link to the GPU configuration files:

cd /home/gpgpu-sim_distribution
source setup_environment 
cd ispass2009-benchmarks/
./setup_config.sh GTX480

You can also change the GPU type (e.g., to TeslaC2050) by the following instructions:

./setup_config.sh --cleanup
./setup_config.sh TeslaC2050

6) Run a benchmark such as NN:

cd NN/
sh README.GPGPU-Sim

Install and Run GPGPU-Sim

Wed, 09 Jan 2019 00:00:00 +0000

GPGPU-Sim is a cycle-level simulator for modeling contemporary GPUs running CUDA and OpenCL workloads. The current GPGPU-Sim supports the GPU simulation with four kinds of architectures, i.e., GTX480, QuadroFX5600, QuadroFX5800, and TeslaC2050 architectures. This blog introduces the detailed steps to install and run GPGPU-Sim.

1 Download and Install NVDIA CUDA 4.0

GPGPU-Sim has to be run with NVDIA CUDA and does not support the CUDA versions larger than 4.0. Hence, we should first install NVDIA CUDA 4.0. The linux OS in my computer is Ubuntu 18.04, and the gcc version is 7.3.0. To install NVDIA CUDA 4.0, please do the following setps.

1) Download the CUDA Toolkit for Ubuntu Linux 10.10 and GPU Computing SDK code samples from the NVDIA website.

2) Install CUDA Toolkit for Ubuntu Linux 10.10 first:

chmod +x cudatoolkit_4.0.17_linux_64_ubuntu10.10.run
sudo ./cudatoolkit_4.0.17_linux_64_ubuntu10.10.run

The CUDA Toolkit has been installed in the path of /usr/local/cuda in default.

3) Add the path of CUDA Toolkit into the ~/.bashrc file:

echo 'export PATH=$PATH:/usr/local/cuda/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib:/usr/local/cuda/lib64' >> ~/.bashrc
source ~/.bashrc

4) Install GPU Computing SDK code samples:

chmod +x gpucomputingsdk_4.0.17_linux.run
sudo ./gpucomputingsdk_4.0.17_linux.run

The GPU Computing SDK has been installed in the path of ~/NVIDIA_GPU_Computing_SDK in default.

5) Install gcc-4.4 and g++-4.4 (since CUDA 4.0 supports the gcc version until 4.4):

 apt-get install gcc-4.4 g++-4.4

If the error package gcc-4.4 is not available, but is referred to by another package occurs, do the following steps to address it:

vim /etc/apt/sources.list

Add the two-line codes into the opened file:

deb http://dk.archive.ubuntu.com/ubuntu/ trusty main universe    
deb http://dk.archive.ubuntu.com/ubuntu/ trusty-updates main universe 

Then, update the apt source:

apt-get update

By now, the gcc-4.4 and g++-4.4 have been installed.

6) Change the gcc/g++ in the system to gcc-4.4/g++4.4:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 150
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.4 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 150
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.4 100

Select the 4.4 version by using update-alternatives:

sudo update-alternatives --config gcc
sudo update-alternatives --config g++

2 Download and Install GPGPU-Sim

1) Download GPGPU-Sim from GitHub

git clone https://github.com/gpgpu-sim/gpgpu-sim_distribution.git

2) Install dependencies

sudo apt-get install build-essential xutils-dev bison zlib1g-dev flex libglu1-mesa-dev
sudo apt-get install doxygen graphviz
sudo apt-get install python-pmw python-ply python-numpy libpng12-dev python-matplotlib
sudo apt-get install libxi-dev libxmu-dev freeglut3-dev

3) Add the CUDA_INSTALL_PATH into the ~/.bashrc file:

echo 'export CUDA_INSTALL_PATH=/usr/local/cuda' >> ~/.bashrc
source ~/.bashrc

4) Build GPGPU-Sim:

make

During the building, if there is an error cuobjdump.l:110: error: unterminated comment cuobjdump.l:108: error: expected declaration or statement at end of input, remove the comments in cuobjdump.l:108-109.

5) Run GPGPU-Sim:

Copy the contents of a GPU config, e.g., configs/GTX480/*, to your application’s working directory, and then run a CUDA application.

mkdir test
cd test/
cp ../configs/GTX480/* ./

Using Quartz to Simulate Persistent Memory

Sat, 22 Jul 2017 00:00:00 +0000

前几篇博客介绍的NVMain是一个体系结构级的非易失内存模拟器，主要是提供给体系结构方面的研究者使用。关于NVM硬件级研究可以使用NVMain，如NVM内存的写策略、磨损均衡策略、内存控制器设计等。由于需要模拟NVM硬件级的特征，包括时序、能耗、写耐久性等，在NVMain模拟器的运行负载的速度远小于在真实DRAM系统上运行的速度。

然而系统软件方面的研究者主要关注系统软件在NVM系统中的性能（延时/吞吐量）并不需要对NVM的硬件机制做更改。NVMain这类体系结构级模拟器中大部分功能，系统软件方面的研究都用不上，而且由于运行速度问题也不能运行大规模的负载。因此，惠普（Hewlett Packard）公司为系统软件方面的研究者开发了一款轻量级的基于DRAM的NVM模拟器，Quartz。在Quartz模拟器上的运行负载的速度可实现接近于在真实DRAM系统上运行的速度。Quartz只支持三种CPU架构包括Sandy Bridge, Ivy Bridge, 和Haswell（注意其它架构类型的CPU使用不了Quartz）。下面介绍Quartz的用法：

1 下载和安装Quartz

Quartz的代码已开源在GitHub上，可以直接下载：

git clone https://github.com/HewlettPackard/quartz.git

Quartz的安装需要一些依赖库，可以运行其提供的install.sh文件自动安装所有的依赖库：

sudo scripts/install.sh

使用以下命令编译Quartz源代码：

mkdir build
cd build
cmake ..
make clean all

编译好后，可以看到在./build/lib/路径下生成了一个动态库文件libnvmemul.so。

2 运行Quartz

首先load模拟器核心模块，在Quartz根目录运行如下命令：

sudo scripts/setupdev.sh load

设置CPU运行在最大频率：

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

如果机器的Linux内核版本号大于或等于4.0需要运行如下命令：

echo 2 | sudo tee /sys/devices/cpu/rdpmc

运行自己的程序：

scripts/runenv.sh <your_app>

2 模拟NVM延迟

Quartz目前的版本不能同时模拟NVM的延迟和带宽。只能让带宽不变模拟不同的延迟，或让延迟不变模拟不同的带宽。模拟带宽我们一般用不到，这里主要介绍怎样模拟NVM延迟。

模拟读延迟：NVM的读延迟可以直接在根目录下的./nvmemul.ini文件中配置（里面的写延时配置好像并没有用）：

read = 200;

模拟写延迟：Quartz目前的版本不能支持对写延迟的模拟，所以我们需要自己实现写延迟模拟。由于NVM一般作为持久化内存（Persistent Memory），所以CPU对NVM的写都需要使用CLFLUSH指令（cache line flush）把CPU cache中的脏数据刷回NVM中，并使用MFENCE指令（memory fence）保证cache line flush的顺序性。为了模拟NVM写延迟，我们在每个CLFLUSH指令后面植入额外的延迟。

MFENCE和CLFLUSH指令后植入延迟的实现代码可参照./quartz-master/src/lib/路径下的pflush.c文件。

3 编写基于Persistent Memory的程序

基于Persistent Memory的程序中，内存的分配和回收一定要用Quartz中对于的函数pmalloc和pfree。所以程序中需要引用头文件./quartz-master/src/lib/pmalloc.h，并且编译时需要链接libnvmemul.so动态库。

对Persistent Memory的写需要使用CLFLUSH指令刷回NVM中，并且使用MFENCE指令保证多个CLFLUSH指令执行的顺序性。

对于大于原子写（一般是8 bytes）的数据需要进一步使用logging或copy-on-write(CoW)来保证一致性。

Configure Gem5 with NVMain to Simulate Non valotile Memory

Thu, 12 Jan 2017 00:00:00 +0000

NVMain是一个体系结构级的非易失内存模拟器，可以准确地模拟内存系统的时序和能耗。NVMain需要放在GEM5全系统模拟器中运行。

1 安装Mercurial

集成NVMain到GEM5中需要用到一个源代码控制管理工具：Mercurial,请自行安装并学习使用方法。

2 安装GEM5

使用hg clone命令下载GEM5（推荐使用最新版本的GEM5）：

hg clone http://repo.gem5.org/gem5

配置GEM5的运行环境，可参照该教程。

3 配置hgrc文件

3.1 打开hgrc文件：

vim ~/.hgrc

3.2 把以下内容加入到hgrc文件中，并将相关配置（如：username，style，from）修改成自己的信息：

[ui]
# Set the username you will commit code with
username=Your Name <your@email.address>
ssh = ssh -C
# Always use git diffs since they contain permission changes and rename info
[defaults]
qrefresh = --git
email = --git
diff = --git
[extensions]
# These are various extensions we find useful
# Mercurial Queues -- allows managing of changes as a series of patches
hgext.mq =
# PatchBomb -- send a series of changesets as e-mailed patches
hgext.patchbomb =
# External Diff tool (e.g. kdiff3, meld, vimdiff, etc)
hgext.extdiff =
# Fetch allows for a pull/update operation to be done with one command and automatically commits a merge changeset
hgext.fetch =
# Path to the style file for the M5 repository
# This file enforces our coding style requirements
style = /path/to/your/m5/util/style.py
[email]
method = smtp
from = Your Name <your@email.address>
[smtp]
host = your.smtp.server.here

4 下载NVMain

4.1 注册bitbucket账号；

4.2 按照NVMain网站上的说明获取NVMain的使用权；

4.3 进入GEM5根目录，使用hg clone命令下载NVMain；

5 安装NVMain补丁

5.1 进入GEM5根目录；

5.2 Initialize queues in gem5:

hg qinit

5.3 Import the NVMain patch:

hg qimport -f ./nvmain/patches/gem5/nvmain2-gem5-10688+

5.4 Apply the patch:

hg qpush

6 编译GEM5 with NVMain

scons EXTRAS=nvmain ./build/X86/gem5.opt

Compile and Debug SPEC CPU2006 in Linux

Sun, 12 Jun 2016 00:00:00 +0000

SPEC CPU 2006是一个比较老的benchmark，所以在较新的Linux系统上编译会出现不兼容的问题。在编译过程中，需要对SPEC CPU 2006的源代码做几处修改来兼容新的Linux系统。本文以CentOS 7系统为例，介绍在Linux系统中SPEC CPU 2006的编译过程。

Compile

首先，由于兼容性问题SPEC CPU 2006中自带的install.sh文件是运行不了的，我们需要重新编译源代码。进入/tool/src目录，运行buildtools文件：

./buildtools

Debug

运行过程中，会出现几个错误。下面列出了这几个错误和相应的解决方法。

error building specmd5sum

编译specmd5sum时，会出现如下错误：

 gcc -DHAVE_CONFIG_H    -I/home/gem5/cpu2006/tools/output/include   -I. -Ilib  -c -o md5sum.o md5sum.c
 In file included from md5sum.c:38:0:
 lib/getline.h:31:1: error: conflicting types for 'getline'
  getline PARAMS ((char **_lineptr, size_t *_n, FILE *_stream));
  ^
 In file included from md5sum.c:26:0:
 /usr/include/stdio.h:678:20: note: previous declaration of 'getline' was here
  extern _IO_ssize_t getline (char **__restrict __lineptr,
             ^
 In file included from md5sum.c:38:0:
 lib/getline.h:34:1: error: conflicting types for 'getdelim'
  getdelim PARAMS ((char **_lineptr, size_t *_n, int _delimiter, FILE *_stream));
  ^
 In file included from md5sum.c:26:0:
 /usr/include/stdio.h:668:20: note: previous declaration of 'getdelim' was here
   extern _IO_ssize_t getdelim (char **__restrict __lineptr,
                      ^
 make: *** [md5sum.o] Error 1
 + testordie 'error building specmd5sum'
 + test 2 -ne 0
 + echo '!!! error building specmd5sum'
 !!! error building specmd5sum
 + kill -TERM 1299
 + exit 1
 !!!!! buildtools killed

错误原因主要是：函数冲突，stdio.h库已经定义getline和getdelim函数，而SPEC CPU 2006中的getline.h中也定义了这两个函数。

解决方法：打开./tools/src/specmd5sum/md5sum.c文件，注释掉getline.h头文件（第38行）

 //#include "getline.h"

error building Perl

编译Perl时，会出现如下两个错误。

ERROR 1:

 collect2: error: ld returned 1 exit status
 make: *** [miniperl] Error 1
 + testordie 'error building Perl'
 + test 2 -ne 0
 + echo '!!! error building Perl'
 !!! error building Perl
 + kill -TERM 15173
 + exit 1
 !!!!! buildtools killed

ERROR 2:

 t/op/sprintf..............................FAILED--no leader found
 t/op/sprintf2.............................FAILED--expected 263 tests, saw 3

错误原因：

1) 高版本的Linux内核中删除了asm/page.h头文件;

2) 配置perl时，需要用到数学库;

解决方法：

1) 打开./tools/src/perl-5.8.8/ext/IPC/SysV/SysV.xs文件，注释asm/page.h头文件（第7行）

 //#   include <asm/page.h>

2) 打开./tools/src/buildtools文件，在编译perl的代码部分（第333行和334行）做如下修改。

修改前：

 LD_LIBRARY_PATH=`pwd`
 DYLD_LIBRARY_PATH=`pwd`
 export LD_LIBRARY_PATH DYLD_LIBRARY_PATH
 ./Configure -dOes -Ud_flock $PERLFLAGS -Ddosuid=undef -Dprefix=$INSTALLDIR -Dd_bincompat3=undef -A ldflags=-L${INSTALLDIR}/lib -A ccflags=-I${INSTALLDIR}/include -Ui_db -Ui_gdbm -Ui_ndbm -Ui_dbm -Uuse5005threads ; testordie "error configuring perl"

修改后：

 LD_LIBRARY_PATH=`pwd`
 DYLD_LIBRARY_PATH=`pwd`
 ./Configure -Dcc="gcc -lm" -Dlibpth='/usr/local/lib64 /lib64 /usr/lib64' -dOes -Ud_flock $PERLFLAGS -Ddosuid=undef -Dprefix=$INSTALLDIR -Dd_bincompat3=undef -A ldflags=-L${INSTALLDIR}/lib -A ccflags=-I${INSTALLDIR}/include -Ui_db -Ui_gdbm -Ui_ndbm -Ui_dbm -Uuse5005threads ; testordie "error configuring perl"	

Configure and Run PARSEC-2.1 Benchmark in Gem5

Mon, 06 Jun 2016 00:00:00 +0000

上一篇讲了怎样在linux系统里单独运行PARSEC Benchmark，本篇介绍如何在GEM5模拟器中配置和运行PARSEC Benchmark （以ARPHA架构方式为例）。PARSEC Benchmark需要在GEM5中的全系统（full system）模式下运行，其配置方法和上上篇中2.4节比较相似。相关教程可参考：http://www.m5sim.org/PARSEC_benchmarks .

首先新建一个文件夹用于存储PARSEC Benchmark的disk image
```
 mkdir full_system_images
 cd full_system_images
```

下载初始的系统文件，并解压，再重命名文件夹（重命名可选）

 wget http://www.m5sim.org/dist/current/m5_system_2.0b3.tar.bz2
 tar jxvf m5_system_2.0b3.tar.bz2
 mv m5_system_2.0b3 system

解压后，文件的目录结构如下：

 system/
     binaries/
          console
          ts_osfpal
          vmlinux
     disks/
          linux-bigswap2.img
          linux-latest.img

下载PARSEC Benchmark相关文件，并替换掉system文件夹中的相应文件

下载PARSEC对应的linux kernel文件，并替换掉 ‘system/binaries/vmlinux’

 cd ./system/binaries/
 wget http://www.cs.utexas.edu/~parsec_m5/vmlinux_2.6.27-gcc_4.3.4
 rm vmlinux
 mv vmlinux_2.6.27-gcc_4.3.4 vmlinux

下载PARSEC对应的PAL code文件，并替换掉 ‘system/binaries/ts_osfpal’

 wget http://www.cs.utexas.edu/~parsec_m5/tsb_osfpal
 rm ts_osfpal
 mv tsb_osfpal ts_osfpal

下载PARSEC-2.1 Disk Image并解压

 cd ../disks/
 wget http://www.cs.utexas.edu/~parsec_m5/linux-parsec-2-1-m5-with-test-inputs.img.bz2
 bzip2 -b linux-parsec-2-1-m5-with-test-inputs.img.bz2

进入gem5文件夹，修改两个文件（SysPaths.py 和 Benckmarks.py）配置parsec的路径和文件名

打开SysPaths.py配置parsec disk image的完整路径：

 vim ./configs/common/SysPaths.py

修改前：

 path = [ ’/dist/m5/system’, ’/n/poolfs/z/dist/m5/system’ ]

修改后：

 path = [ ’/dist/m5/system’, ’/home/full_system_images/system’ ]

打开Benchmarks.py，修改image文件名：

 vim ./configs/common/Benchmarks.py

修改前：

 elif buildEnv['TARGET_ISA'] == 'alpha':
     return env.get('LINUX_IMAGE', disk('linux-latest.img'))

修改后：

 elif buildEnv['TARGET_ISA'] == 'alpha':
     return env.get('LINUX_IMAGE', disk('linux-parsec-2-1-m5-with-test-inputs.img'))

生成benchmark的script文件，用于运行benchmark

下载PARSEC script生成包，并解压：

 wget http://www.cs.utexas.edu/~parsec_m5/TR-09-32-parsec-2.1-alpha-files.tar.gz
 tar zxvf TR-09-32-parsec-2.1-alpha-files.tar.gz

生成script命令：

 ./writescripts.pl <benchmark> <nthreads>

有以下13种benchmark：

 blackscholes
 bodytrack
 canneal
 dedup
 facesim
 ferret
 fluidanimate
 freqmine
 streamcluster
 swaptions
 vips
 x264
 rtview

根据生成的script文件运行gem5：

 ./build/ALPHA/gem5.opt ./configs/example/fs.py -n <number> --script=./path/to/runScript.rcS --caches --l2cache -F 5000000000

新开一个窗口，使用telnet与gem5模拟系统进行交互
```
 telnet localhost 3456  
```

Compile and Debug PARSEC-3.0 in Red Hat

Tue, 31 May 2016 00:00:00 +0000

Download

Download parsec-3.0 from its official website.

Compile

Decompress the parsec-3.0 compressed file, and compile it using the following instruct:

sudo ./bin/parsecmgmt -a build

Debug

ERROR 1: pod2man error

 POD document had syntax errors at /usr/bin/pod2man line 69.  
 make: *** [install_docs] Error 1   
 [PARSEC] Error: 'env PATH=/usr/bin:/home/gem5/parsec-3.0/bin:/sbin:/bin:/usr/sbin:/usr/bin /usr/bin/make install' failed.   

Solution: delete pod2man file

 sudo rm /usr/bin/pod2man

ERROR 2: conflicting types for __mbstate_t

 /usr/include/wchar.h:94:3: error: conflicting types for ?._mbstate_t?
 } __mbstate_t;

Solution: delete the 4 lines of code from 102 line to 105 line in the file ‘/parsec-3.0/pkgs/libs/uptcpip/src/include/sys/bsd__types.h’. The deleted content is shown below.

 typedef union {
    char            __mbstate8[128];
     __int64_t       _mbstateL;      /* for alignment */
 } __mbstate_t;   

ERROR 3: tbb failed

  Error: 'env version=tbb /usr/bin/make' failed.

Solution:

 yum -y install -y tbb

ERROR 4: No package ‘xt’ found

 No package 'xt' found

Solution:

 yum -y install libXt-devel
 yum -y install libXmu-devel

ERROR 5: No package ‘xi’ found

 No package 'xi' found

Solution:

 yum -y install libXi-devel

Install and Run GEM5 in Unbuntu 14.04

Sat, 30 Apr 2016 00:00:00 +0000

GEM5是一个非常强大的模拟平台，服务于计算机系统架构相关研究，包括系统级架构和处理器微架构。最近在做GEM5相关的研究工作，顺便在blog上记下学习笔记。本文主要描述怎么正确地在Linux系统上安装和运行GEM5。

安装一些依赖软件

运行GEM5需要一些依赖软件，包括：g++ （4.7版本及以上）、Python （2.5版本及以上）、 SCons （0.98.1版本及以上）、 SWIG （2.0.4版本及以上）、zlib、m4、 protobuf （2.1版本及以上）。

1. 安装g++

g++一般系统自带，可用 g++ -v 查看版本号。

如果系统没有的话，使用如下命令安装：

sudo apt-get install g++

2. 安装Python

Python一般系统自带，可用 python --version 查看版本号。

3. 安装Scons

使用以下命令安装SCons：

sudo apt-get install scons

安装后查看版本号：

scons -v

4. 安装SWIG

SWIG下载地址，解压后安装：

./configure   
make    
sudo make install   

安装后查看版本号：

swig -version

5. 安装zlib

zlib一般系统自带，使用 whereis zlib 查看安装位置，如果系统没有的话，使用如下步骤安装：

zlib下载地址，解压后安装：

./configure   
make    
sudo make install   

6. 安装m4

一般系统自带，使用 m4 --veriosn 查看版本，如果系统没有的话，使用如下步骤安装：

m4下载地址，解压后安装：

./configure   
make    
sudo make install  

7. 安装protobuf

protobuf下载地址，解压后安装：

./configure   
make    
sudo make install  

然后使用如下命令可以查看版本号，检查是否安装完成：

protoc --version

8. 安装 libprotobuf-dev 和 libgoogle-perftools-dev

sudo apt-get install libprotobuf-dev    
sudo apt-get install libgoogle-perftools-dev   

运行GEM5

1. 下载GEM5

GEM5稳定版下载地址，然后解压。

2. 编译GEM5

以编译一个RAM处理器为例：

scons build/ARM/gem5.opt

大约二十多分钟后，编译完成。可以使用多线程提高编译速度，如使用8线程：

scons build/ARM/gem5.opt -j8

3. SE测试

输入如下命令进行SE测试：

./build/ARM/gem5.opt ./configs/example/se.py -c ./tests/test-progs/hello/bin/arm/linux/hello

运行结果如下：

root@zuo:/home/zuo/GEM5/gem5-stable# ./build/ARM/gem5.opt ./configs/example/se.py -c ./tests/test-progs/hello/bin/arm/linux/hello
gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 compiled May  1 2016 00:37:35
gem5 started May  1 2016 00:51:06
gem5 executing on zuo
command line: ./build/ARM/gem5.opt ./configs/example/se.py -c ./tests/test-progs/hello/bin/arm/linux/hello

/home/zuo/GEM5/gem5-stable/configs/common/CacheConfig.py:48: SyntaxWarning: import * only allowed at module level
def config_cache(options, system):
Global frequency set at 1000000000000 ticks per second
warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
**** REAL SIMULATION ****
info: Entering event queue @ 0.  Starting simulation...
Hello world!
Exiting @ tick 2924500 because target called exit()
root@zuo:/home/zuo/GEM5/gem5-stable# 

可见输出来有Hello world!表示运行成功。

4. FS测试

全系统（full system）的模拟比较麻烦，需要下载和配置磁盘镜像。以下以X86系统为例。

首先新建一个文件夹用于存储disk image

 mkdir full_system_images
 cd full_system_images

下载X86的disk image, 并解压

 wget http://www.m5sim.org/dist/current/x86/x86-system.tar.bz2
 tar jxvf x86-system.tar.bz2

进入gem5文件夹，修改两个配置文件: SysPaths.py 和 Benckmarks.py

打开SysPaths.py配置disk image的完整路径（本文以/home/full_system_images为例）：

vim ./configs/common/SysPaths.py

修改前：

path = [ ’/dist/m5/system’, ’/n/poolfs/z/dist/m5/system’ ]

修改后：

path = [ ’/dist/m5/system’, ’/home/full_system_images’ ]

打开Benchmarks.py，修改image文件名：

vim ./configs/common/Benchmarks.py

修改前：

elif buildEnv['TARGET_ISA'] == 'x86':
    return env.get('LINUX_IMAGE', disk('x86root.img'))

修改后：

elif buildEnv['TARGET_ISA'] == 'x86':
    return env.get('LINUX_IMAGE', disk('linux-x86.img'))

运行，输入如下命令：

 ./build/X86/gem5.opt ./configs/example/fs.py

Evaluate the Performance of KSM in the Linux Kernel-based Virtual Machine (KVM)

Mon, 11 Apr 2016 00:00:00 +0000

安装KVM

首先查看主机CPU的虚拟化支持:

$ egrep -o '(vmx|svm)' /proc/cpuinfo

如果显示如下,则表示该主机CPU支持虚拟化
```
 zuo@zuo:~$ egrep -o '(vmx|svm)' /proc/cpuinfo   
 vmx   
 vmx   
 vmx   
 vmx   
 zuo@zuo:~$   
```
安装KVM软件包：virt-manager为GUI管理窗口，bridge-utils:用于网络桥接

$ apt-get install qemu-kvm libvirt-bin virt-manager bridge-utils

检查KVM是否安装成功

方法1：

$ lsmod | grep kvm

如果显示如下信息，则表示KVM安装成功

 zuo@zuo:~$ lsmod | grep kvm   
 kvm_intel             143630  0    
 kvm                   452096  1 kvm_intel   
 zuo@zuo:~$    

方法2：

$ virsh -c qemu:///system list

如果显示如下信息，则表示KVM安装成功

 zuo@zuo:~$ virsh -c qemu:///system list   
 Id    Name                           State   
 ----------------------------------------------------   
 zuo@zuo:~$    

比较容易被忽视的一点是：开机前，进入BIOS中，开启CPU虚拟化支持
```
 Intel(R) Virtualization Technology (Enabled)
```

运行KVM

终端中输入如下指令，打开virtual machine manager

$ sudo virt-manager

这时会弹出virtual machine manager的界面
创建虚拟机前，需要下载一个操作系统的镜像文件（.iso）,用于安装虚拟机的操作系统;
点击virtual machine manager界面上的“create a new virtual machine”按钮，就可以开始创建虚拟机了（如下图所示）:

运行KSM

KSM需要在root权限下运行，所以首先获取root权限：

# su root
```
 zuo@zuo:~$ su root   
 Password:   
 root@zuo:/home/zuo#    
```

KSM进程由’/sys/kernel/mm/ksm/’路径中的文件控制，我们可以查看KSM的运行状态：

# grep -H '' /sys/kernel/mm/ksm/*

结果如下：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:0
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:0
 /sys/kernel/mm/ksm/pages_sharing:0
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:0
 /sys/kernel/mm/ksm/pages_volatile:0
 /sys/kernel/mm/ksm/run:0
 /sys/kernel/mm/ksm/sleep_millisecs:200
 root@zuo:/home/zuo# 

其中每个参数的含义可参照 https://www.kernel.org/doc/Documentation/vm/ksm.txt

开启KSM进程：

# echo 1 > /sys/kernel/mm/ksm/run

虚拟机内存去重测试全过程

首先用KVM创建了3个相同的Linux虚拟机(系统：Ubuntu-12.04 32位, 内存：1GB，硬盘空间：8GB);

只开启一个虚拟机，查看内存去重情况

KSM开始合并相同内存页，3次full scan后（约10分钟），内存去重结果如下：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:4
 sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:3633
 /sys/kernel/mm/ksm/pages_sharing:26033
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:109842
 /sys/kernel/mm/ksm/pages_volatile:25486
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

第4次full scan后（约6分钟），内存相对稳定，内存去重结果如下：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:5
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:10446
 /sys/kernel/mm/ksm/pages_sharing:41652
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:112580
 /sys/kernel/mm/ksm/pages_volatile:5436
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

开启第二个虚拟机（此时两个虚拟机同时运行），查看内存去重情况

查看此时的内存去重结果：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:5
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:11512
 /sys/kernel/mm/ksm/pages_sharing:42692
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:112405
 /sys/kernel/mm/ksm/pages_volatile:3505
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

可见新开启的虚拟机的内存页还没计入KSM，在下一次full scan时，才会计入;

一次full scan后（约10分钟），内存去重结果：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:7
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:12135
 /sys/kernel/mm/ksm/pages_sharing:84749
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:122351
 /sys/kernel/mm/ksm/pages_volatile:136353
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

第二次full scan后（约10分钟），内存去重结果：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:8
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:57047
 /sys/kernel/mm/ksm/pages_sharing:133770
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:155843
 /sys/kernel/mm/ksm/pages_volatile:8928
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

第三次full scan后（约13分钟），内存相对稳定，内存去重结果：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:9
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:80047
 /sys/kernel/mm/ksm/pages_sharing:159071
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:113814
 /sys/kernel/mm/ksm/pages_volatile:3680
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

开启第三个虚拟机（此时三个虚拟机同时运行），查看内存去重情况

第一次full scan过程中，前6分钟pages_sharing和pages_volatile数目在增加，pages_shared和pages_unshared数目几乎没变

第六分钟：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:10
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:81335
 /sys/kernel/mm/ksm/pages_sharing:263226
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:111860
 /sys/kernel/mm/ksm/pages_volatile:72891
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

第一次full scan完成后（约17分钟），内存去重结果：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:11
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:81414
 /sys/kernel/mm/ksm/pages_sharing:269549
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:111442
 /sys/kernel/mm/ksm/pages_volatile:77609
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

第二次full scan过程中，刚开始page_volatitle增加了约4万，然后开始减少：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:11
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:81413
 /sys/kernel/mm/ksm/pages_sharing:278134
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:111442
 /sys/kernel/mm/ksm/pages_volatile:118125
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

第二次full scan完成后（约20分钟），内存相对稳定，内存去重结果：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:12
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:85499
 /sys/kernel/mm/ksm/pages_sharing:278616
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:171862
 /sys/kernel/mm/ksm/pages_volatile:53213
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

第三次full scan完成后（约20分钟），内存稳定，内存去重结果：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:13
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:100562
 /sys/kernel/mm/ksm/pages_sharing:293995
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:189267
 /sys/kernel/mm/ksm/pages_volatile:5366
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

关闭第三个虚拟机后：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:15
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:100155
 /sys/kernel/mm/ksm/pages_sharing:159629
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:79381
 /sys/kernel/mm/ksm/pages_volatile:17447
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

关闭第二个虚拟机后：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:16
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:77934
 /sys/kernel/mm/ksm/pages_sharing:43155
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:48300
 /sys/kernel/mm/ksm/pages_volatile:1237
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

关闭第一个虚拟机后：

 root@zuo:/home/zuo# grep -H '' /sys/kernel/mm/ksm/*
 /sys/kernel/mm/ksm/full_scans:17
 /sys/kernel/mm/ksm/merge_across_nodes:1
 /sys/kernel/mm/ksm/pages_shared:0
 /sys/kernel/mm/ksm/pages_sharing:0
 /sys/kernel/mm/ksm/pages_to_scan:100
 /sys/kernel/mm/ksm/pages_unshared:0
 /sys/kernel/mm/ksm/pages_volatile:0
 /sys/kernel/mm/ksm/run:1
 /sys/kernel/mm/ksm/sleep_millisecs:200

虚拟机内存去重结果分析

内存去重率

一个虚拟机单独运行的情况：

pages_shared	pages_sharing	pages_unshared	pages_volatile
10446	41652	112580	5436

总内存页数： 10446 + 41652 + 112580 + 5436 = 170114
去重率： 41652/170114 × 100% = 24.5%
（可见单个虚拟机的内存冗余也是蛮多的）

两个虚拟机同时运行的情况：

pages_shared	pages_sharing	pages_unshared	pages_volatile
80047	159071	113814	3680

总内存页数： 80047 + 159071 + 113814 + 3680 = 356612
总去重率： 159071/356612 × 100% = 44.6%
单个虚拟机内重复的页数约为： 41652 × 2 = 83304
两个虚拟机间重复的页数约为： 159071 - 83304 = 75747
两个虚拟机间的去重率约为： 75747/356612 × 100% = 21.2%

三个虚拟机同时运行的情况：

pages_shared	pages_sharing	pages_unshared	pages_volatile
100562	293995	189267	5366

总内存页数： 100562 + 293995 + 189267 + 5366 = 589190
总去重率： 293995/589190 × 100% = 49.9%
单个虚拟机内重复的页数约为： 41652 × 3 = 124956
三个虚拟机间重复的页数约为： 293995 - 124956 = 169039
三个虚拟机间的去重率约为： 169039/589190 × 100% = 28.7%

CPU占用

测试过程中，观测到KSM进程的CPU占用一直处于0.3%～0.7%间;
一些发现
- 前几次full scan中，每次full scan后pages_sharing数会大幅度增加，一个主要原因是：内存逐渐趋于稳定，pages_volatile的页慢慢变少，有些merge到了stable tree中;
- 新开启的虚拟机产生的内存页，在下一轮full scan时，才被KSMD处理;
- 关闭一个虚拟机后，stable tree中的内存页，有些会成为unshared，但不会从stable tree中删掉（上一章第6步）;
- 测试过程中，同一轮full scan内可以观测到，某段时间pages_sharing数一直在增加，某段时间pages_sharing数一直趋于稳定，可见重复内存页具有一定的局部性;

Memory Deduplication

Wed, 16 Mar 2016 00:00:00 +0000

1 Memory Deduplication

Memory deduplication aims to save memory space in RAM by identifying and merging the identical memory pages. Red Hat proposed the KSM (Kernel Shared Memory or Kernel Same-page Merging)¹ technique to implement memory deduplication in Linux kernel. KSM has merged into Linux kernel since the version 2.6.32 in 2009². VMware also proposed a technique called TPS (Transparent Page Sharing)³ in its product ESXi for memory deduplication. Some literatures claims that KSM may be more efficient than TPS. However, I have not found any work comparing them in real experiments.

1.1 KSM

KSM leverages two red-black trees⁴ to manage the memory pages, i.e., a stable tree and an unstable tree. There are three page states for each page in the host, i.e., frequently modified, sharing candidate yet not frequently modified, and shared. The pages frequently modified are not recorded in KSM until their modification frequency decreases. The unstable tree maintains pages that are sharing candidate and not frequently modified. The stable tree records the pages which have been shared and marked copy-on-write (CoW).

1.2 TPS

TPS uses the hash value of memory page’s content to identify the same page. TPS maintains a global hash table, in which each entry records the hash value of a page’s content and the reference number of a shared page. For each new page, its hash value is calculated and used as a key to search the global hash table. If the hash value of the new page matches that of an existing page, a full comparison of their page contents is performed to exclude a false match. If their contents are confirmed to be identical, the new page is reclaimed and pointed to the existing page. The copy-on-write (CoW) technique is used to handle writes to the shared pages.

2 Analysis of Memory Deduplication

2.1 Redundancy Quantity

Virtual Machine (VM) is a primary application scenario of memory deduplication. When the same/similar operating systems or applications are running in different VMs, lots of duplicated memory pages will be generated on the host/hypervisor system¹. Many works⁵,⁶,⁷,⁸ perform the empirical study on memory sharing of VMs. Chang et al.⁶ found that the amount of redundant pages can be as low as 11% but also as high as 86% depending on the operating system and workload. Gupta et al.⁷ measured the amount of duplicated memory across three VMs and found that about 50% of the allocated memory could be saved through memory deduplication. Miller et al.⁸ found 110 MB of redundant memory in typical desktop workloads (LibreOffice, Firefox), and measured 400 MB (39%) of redundant data in their benchmarks.

2.2 Time Overhead

To keep KSM enabled with parameters like sleep = 5000, pages_to_scan = 60, so that around 12000 virtual pages are scanned each second, allowing a maximum memory merging rate of 46.87MB/s in the original KSM¹. The memory scanner in the original KSM needs a considerable amount of time to detect new sharing opportunities (e.g., 5 mins) which hence misses the short-lived sharing opportunities (i.e., < 5 mins). However, the benchmarks⁸ show 80% of all sharing opportunities live between 30s and 5 mins. Thus, the original scanner is ineffective in the scenario. To address this problem, Miller et al.⁸ leverage the page hints in the host’s virtual file system (VFS) layer to speed up the memory scanner and observe more sharing opportunities.

3 My Comments

In order to ensure the access efficiency of memory, memory deduplication has to perform off-line deduplication due to the slow speed of the memory scanner. Specifically, the new pages are first stored in memory and then performed memory deduplication.
TPS needs to perform a full comparison of page contents when the hash matching occurs. I guess that TPS uses a general hash algorithm. If a secure hash algorithm, i.e., SHA-1 and MD5, is used to calculate the hash value of a page, the full comparison of page contents could be avoided. Which method is more efficient?
Docker container has become more and more popular in recent years which is considered as an alternative to VM. A host may maintain hundreds of containers at the same time in which the memory situation is more complex. Exploring the redundancy quantity and memory sharing in docker container may be an important problem.

To be continued ……

References:

Arcangeli A, Eidus I, Wright C. Increasing memory density by using KSM. In proceedings of the Linux ymposium. 2009. ↩ ↩² ↩³
Linux kernel 2.6.32, Section 1.3. Kernel Samepage Merging (memory deduplication). kernelnewbies.org. 2009. ↩
Guo F. Understanding memory resource management in vmware vsphere 5.0. VMware Inc, 2011. ↩
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Chapter 13: Red-Black Trees Introduction to Algorithms, Second Edition. The MIT Press, September, 2001. ↩
BARKER, S., ET AL. An empirical study of memory sharing in virtual machines. USENIX ATC 2012. ↩
CHANG, C.-R., ET AL. An empirical study on memory sharing of virtual machines for server consolidation. Proc. of ISPA 2011. ↩ ↩²
GUPTA, D., LEE, S., VRABLE, M., SAVAGE, S., ET AL. Difference engine: harnessing memory redundancy in virtual machines. Communications of the ACM, 2010, Volume 53. ↩ ↩²
Miller K, Franz F, Rittinghaus M, et al. XLH: More Effective Memory Deduplication Scanners Through Cross-layer Hints. USENIX ATC 2013. ↩ ↩² ↩³ ↩⁴