Purpose
- The package "Agate" provides a GPU-accelerated Chebyshev collocation method for solving a single elliptic equation with Dirichlet, Neumann, or Robin boundary conditions on a 3-D rectangular domain.
Specifications
- Name: Agate.
- Author: Feng Chen.
- Completion date: 05/30/2013.
- Languages: CUDA C++, Fortran 90.
- Required libraries: BLAS, LAPACK, CUBLAS.
Simple Example
- Equation: \begin{equation} \begin{aligned} & \alpha u -(\beta_1 u_{xx} + \beta_2 u_{yy} + \beta_3 u_{zz}) = f, &&\quad \text{in } \Omega, \\ & au + b\frac{\partial u}{\partial \boldsymbol{n}} = c, && \quad \text{on } \partial \Omega. \end{aligned} \end{equation}
- Parameters: \begin{equation} \Omega = (-1,1)^3 , \quad \alpha =2, \quad \beta_1=3, \quad \beta_2=4, \quad \beta_3 = 5, \quad a = 1, \quad b=1. \end{equation}
- Exact solution and input functions: \begin{equation} u(x,y,z) = e^{x+y+z}, \end{equation} then $f(x,y,z)$ and $c(x,y,z)$ are calculated accordingly.
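- Worked input functions (a sketch derived from the equation and parameters above, assuming $\boldsymbol{n}$ is the outward unit normal, which the post does not state explicitly): since every second derivative of $u = e^{x+y+z}$ equals $u$ itself, \begin{equation} f = (\alpha - \beta_1 - \beta_2 - \beta_3)\, u = (2 - 3 - 4 - 5)\, e^{x+y+z} = -10\, e^{x+y+z}. \end{equation} On each face the normal derivative is $\partial u / \partial \boldsymbol{n} = \pm u$, so with $a = b = 1$, \begin{equation} c = \begin{cases} 2\, e^{x+y+z}, & \text{on the faces } x=1,\ y=1,\ z=1, \\ 0, & \text{on the faces } x=-1,\ y=-1,\ z=-1. \end{cases} \end{equation}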
Quick Start
- Compiling and running:
cd ./Agate
make library
nvcc Agate_Main.cu -llibrary -llapack -lblas
./a.out
- Output:
Nx = 2^5, Ny = 2^6, Nz = 2^7
Device 0: Tesla M2050 of version 2.0
intialization time: 0.0961988
cpu time bulk: 0.110826
cpu error bulk: 1.57341e-11
cpu error upx: 1.53673e-11
cpu error umx: 3.08875e-12
cpu error upy: 1.48668e-11
cpu error umy: 2.78527e-12
cpu error upz: 1.2097e-11
cpu error umz: 1.57017e-11
cpu error du: 2.71758e-10
gpu time HtD: 0.00597085
gpu time bulk: 0.00152022
gpu time BV: 0.000412896
gpu time diff: 0.00107117
gpu time DtH: 0.00265923
gpu error bulk: 1.57332e-11
gpu error upx: 1.53664e-11
gpu error umx: 3.08864e-12
gpu error upy: 1.48654e-11
gpu error umy: 2.78522e-12
gpu error upz: 1.2097e-11
gpu error umz: 1.57021e-11
gpu error du: 2.71349e-10
...
- CPU: Intel(R) Xeon(R) CPU X5630 @2.53GHz.
- GPU: Nvidia Tesla M2050.
- OS: CentOS release 6.4 (Final).
- Compiler: nvcc 4.2.9, gcc/gfortran 4.5.1.
References
- Feng Chen and Jie Shen. A GPU parallelized spectral method for elliptic equations in rectangular domains. Journal of Computational Physics, Volume 250, pp. 555-564 (2013).
Code Highlight
//
// Mars_D_Invert performs the inversion in the frequency space
// on the device.
//
// Input:
//   dd: structure for static data
//   alpha, beta: coefficient
//   d_f: right hand side
// Output:
//   d_f: solution
// Other:
//   lambda: eigenvalue
//   s: stiffness matrix
//   q: number of quadrature points
//
inline void Mars_D_Invert (Mars& dd, double alpha,
                           double betax, double betay, double betaz,
                           double* d_f, double* wk)
{
    dim3 T(8, 8, 8);
    dim3 B((dd.qx+7)/8, (dd.qy+7)/8, (dd.qz+7)/8);
    Mars_Kernel <<<B,T>>> (alpha, betax, betay, betaz,
                           dd.qx, dd.qy, dd.qz,
                           dd.lambdax, dd.lambday, dd.lambdaz,
                           wk, d_f);
}