Thursday, July 11, 2013

AGATE: GPU-accelerated 3-D Chebyshev collocation methods for elliptic equations


Purpose
  • The package "Agate" provides a GPU-accelerated Chebyshev collocation method for solving a single elliptic equation with Dirichlet, Neumann, or Robin boundary conditions on a 3-D rectangular domain.

Specifications
  • Name: Agate.
  • Author: Feng Chen.
  • Completion date: 05/30/2013.
  • Languages: CUDA C++, Fortran 90.
  • Required libraries: BLAS, LAPACK, CUBLAS.

Simple Example
  • Equation: \begin{equation} \begin{aligned} & \alpha u -(\beta_1 u_{xx} + \beta_2 u_{yy} + \beta_3 u_{zz}) = f, &&\quad \text{in } \Omega, \\ & au + b\frac{\partial u}{\partial \boldsymbol{n}} = c, && \quad \text{on } \partial \Omega. \end{aligned} \end{equation}
  • Parameters: \begin{equation} \Omega = (-1,1)^3 , \quad \alpha =2, \quad \beta_1=3, \quad \beta_2=4, \quad \beta_3 = 5, \quad a = 1, \quad b=1. \end{equation}
  • Exact solution and input functions: \begin{equation} u(x,y,z) = e^{x+y+z}, \end{equation} from which $f(x,y,z)$ and $c(x,y,z)$ are computed by substituting $u$ into the equation and the boundary condition.
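  • Worked out for these parameters, assuming $\boldsymbol{n}$ is the outward unit normal (the usual convention, though not stated explicitly here): since $u_{xx} = u_{yy} = u_{zz} = u$, \begin{equation} f = \alpha u - (\beta_1 + \beta_2 + \beta_3)\,u = (2 - 12)\,e^{x+y+z} = -10\,e^{x+y+z}. \end{equation} On the faces $x=1$, $y=1$, $z=1$ the normal derivative equals $+u$, and on the faces $x=-1$, $y=-1$, $z=-1$ it equals $-u$, so \begin{equation} c = \begin{cases} (a+b)\,u = 2\,e^{x+y+z}, & \text{on the faces } x=1,\ y=1,\ z=1, \\ (a-b)\,u = 0, & \text{on the faces } x=-1,\ y=-1,\ z=-1. \end{cases} \end{equation}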

Quick Start
  • Compiling and running:
    cd ./Agate
    make library
    nvcc Agate_Main.cu -llibrary -llapack -lblas
    ./a.out
    
  • Output:
    Nx = 2^5, Ny = 2^6, Nz = 2^7
    Device 0: Tesla M2050 of version 2.0
    intialization time: 0.0961988
    cpu time bulk: 0.110826
    cpu error bulk: 1.57341e-11
    cpu error upx: 1.53673e-11
    cpu error umx: 3.08875e-12
    cpu error upy: 1.48668e-11
    cpu error umy: 2.78527e-12
    cpu error upz: 1.2097e-11
    cpu error umz: 1.57017e-11
    cpu error du: 2.71758e-10
    gpu time HtD: 0.00597085
    gpu time bulk: 0.00152022
    gpu time BV: 0.000412896
    gpu time diff: 0.00107117
    gpu time DtH: 0.00265923
    gpu error bulk: 1.57332e-11
    gpu error upx: 1.53664e-11
    gpu error umx: 3.08864e-12
    gpu error upy: 1.48654e-11
    gpu error umy: 2.78522e-12
    gpu error upz: 1.2097e-11
    gpu error umz: 1.57021e-11
    gpu error du: 2.71349e-10
    ...
    
  • CPU: Intel(R) Xeon(R) CPU X5630 @2.53GHz.
  • GPU: Nvidia Tesla M2050.
  • OS: CentOS release 6.4 (Final).
  • Compiler: nvcc 4.2.9, gcc/gfortran 4.5.1.

Code Highlight
    
    //
    // Mars_D_Invert performs the inversion in frequency space
    // on the device.
    //
    // Input:
    //        dd: structure holding the static data
    //        alpha, betax, betay, betaz: equation coefficients
    //        d_f: right-hand side
    //        wk: work array passed to the kernel
    // Output:
    //        d_f: solution
    // Other:
    //        lambda: eigenvalues (one array per direction)
    //        s: stiffness matrix
    //        q: number of quadrature points (per direction)
    //
    inline void Mars_D_Invert (Mars& dd, double alpha,
        double betax, double betay, double betaz, double* d_f, double* wk) {
        // 8 x 8 x 8 = 512 threads per block.
        dim3 T(8, 8, 8);
        // Round up so the grid covers every point when q is not a multiple of 8.
        dim3 B((dd.qx+7)/8, (dd.qy+7)/8, (dd.qz+7)/8);
        Mars_Kernel <<<B,T>>> (alpha, betax, betay, betaz,
            dd.qx, dd.qy, dd.qz, dd.lambdax, dd.lambday, dd.lambdaz, wk, d_f);
    }
    
    
