Parallel Computing


  1. tmux
  2. I found that, when running parallel jobs on AWS, initializing CUDA from multiple processes at once can crash the system and terminate the Python session.

    One possible workaround is tmux. It lets the user keep a computing session running in the background, so that after logging out of AWS, he or she can log back in and reattach to the same session.

    
    tmux new -s session-name          # create a new named session (-s, not -t)
    tmux attach -t session-name       # reattach to a running session
    tmux kill-session -t session-name # terminate a session
    tmux new -d -s session-name "source web/bin/activate ; python3 main.py"  # start detached and run a command
        
  3. multiprocessing
  4. 
    import multiprocessing as mp

    def f(x):                  # worker function, defined before Pool.map uses it
        return x * x

    def chunks(lst, size):     # split lst into consecutive pieces of at most `size` items
        return [lst[i:i + size] for i in range(0, len(lst), size)]

    print(mp.cpu_count())      # number of available CPU cores
    with mp.Pool(processes=4) as pool:
        print(pool.map(f, range(10)))
        
  5. os.environ["CUDA_VISIBLE_DEVICES"] = "1" restricts the process to GPU 1; it must be set before CUDA is first initialized.
  6. In PyTorch, models and tensors are moved between devices with .to(device).
  7. It is possible to load the data on the CPU, forward it through model 1 on GPU 1, and then through model 2 on GPU 2. The user can do this by specifying a device for each step: device = 'cpu', 'cuda:0', 'cuda:1', and so on.
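A minimal sketch of that pipeline (assuming PyTorch is installed; the two Linear layers and tensor shapes are placeholders), falling back to the CPU when fewer than two GPUs are present:

```python
import torch

# Pick a device per model; fall back to CPU when GPUs are unavailable.
n_gpus = torch.cuda.device_count()
dev1 = torch.device('cuda:0' if n_gpus >= 1 else 'cpu')
dev2 = torch.device('cuda:1' if n_gpus >= 2 else 'cpu')

model1 = torch.nn.Linear(8, 4).to(dev1)   # placeholder model on first device
model2 = torch.nn.Linear(4, 2).to(dev2)   # placeholder model on second device

x = torch.randn(16, 8)                    # data loaded on the CPU
h = model1(x.to(dev1))                    # forward pass on device 1
y = model2(h.to(dev2))                    # move activations, forward on device 2
print(y.shape)                            # torch.Size([16, 2])
```

Moving the intermediate tensor with .to(dev2) is what stitches the two devices together; without it, PyTorch raises a device-mismatch error.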
