Skip to main content
Skip to article control options
Open AccessRegular Article

High-Order Implicit Large-Eddy Simulations of Flow over a NACA0021 Aerofoil

Published Online:https://doi.org/10.2514/1.J055304

Abstract

In this study the graphical-processing-unit-accelerated solver PyFR is used to simulate flow over a NACA0021 aerofoil in deep stall at a Reynolds number of 270,000 using the high-order flux reconstruction approach. Wall-resolved implicit large-eddy simulations are undertaken on unstructured hexahedral meshes at fourth- and fifth-order accuracy in space. It was found that either modal filtering or antialiasing via an approximate L2 projection is required in order to stabilize simulations. Time-span-averaged pressure coefficient distributions on the aerofoil and associated lift and drag coefficients are seen to converge toward experimental data as the simulation setup is made more realistic by increasing the aerofoil span. Indeed, the lift and drag coefficients obtained by fifth-order implicit large-eddy simulation with antialiasing via an approximate L2 projection agree better with experimental data than a wide range of previous studies. Stabilization via modal filtering, however, is found to reduce solution accuracy. Finally, performance of various PyFR simulations is compared, and it is found that fifth-order simulations with antialiasing via an L2 projection are the most efficient. Results indicate that high-order flux reconstruction schemes with antialiasing via an L2 projection are a good candidate for underpinning accurate wall-resolved implicit large-eddy simulation of separated, turbulent flows over complex engineering geometries.

I. Introduction

Unsteady turbulent flow with separation is a challenging regime for computational fluid dynamics. Conventional Reynolds-averaged Navier–Stokes approaches are ill suited [1] since they temporally average large-scale unsteady coherent structures and scale-resolving strategies such as large-eddy simulation (LES) and direct numerical simulation (DNS) are often too expensive for industrial applications. High-order numerical methods for mixed unstructured grids offer the promise of increased simulation accuracy in the vicinity of complex engineering geometries and thus a route to efficient scale-resolving simulations of unsteady turbulent flows with separation in an industrial context.

Over the last decade, various high-order methods for mixed unstructured grids have been developed. One such method is the flux reconstruction (FR) approach [2], which provides a unifying framework for various high-order schemes. Using FR, it is possible to recover the nodal discontinuous Galerkin (DG) schemes of Hesthaven and Warburton [3] and, for linear flux functions, spectral difference schemes, originally proposed by Kopriva and Kolias [4] and later popularised by Wang et al. [5]. When combined with explicit time-marching schemes, FR methods exhibit a significant degree of temporal/spatial locality, which is favorable when considering recent hardware architectures, such as graphical processing units (GPUs).

PyFR [6] is an open-source Python-based framework for solving advection-diffusion-type problems using the FR approach. It is capable of solving the Euler and compressible Navier–Stokes equations on mixed unstructured grids. Furthermore, it is also designed to target a range of modern hardware platforms, including heterogeneous mixtures of CPUs and GPUs, via C/OpenMP, CUDA, and OpenCL backends [7].

When employing the FR approach, it is customary to use a collocation projection in order to project the nonlinear flux function into the polynomial space of the solution. However, in the case of underresolved solutions and nonlinear fluxes, as encountered when simulating turbulent flows, this type of projection can result in aliasing-driven instabilities. This issue is well known and has been theoretically analyzed by Jameson et al. [8] and Hesthaven and Warburton [3], among others. The effect of aliasing within the context of turbulent flow simulations has also been investigated by Gassner et al. [9]. A variety of strategies to mitigate aliasing exists, including filtering [3], adding spectral vanishing viscosity [10], switching to a skew-symmetric formulation [11,12], and antialiasing via an approximate L2 projection with a suitable quadrature rule [13]. Effective mitigation of aliasing-driven instabilities is critical to enabling the industrial adoption of high-order schemes for real-world unsteady flow problems.

In this study, we use PyFR to simulate unsteady turbulent flow over a NACA0021 aerofoil in deep stall, at a Reynolds number of 270,000, using a wall-resolved implicit large-eddy simulation (ILES) approach. In Sec. II, we provide a brief summary of the FR approach, including the extensions required to incorporate modal filtering and antialiasing via an approximate L2 projection. In Sec. III, we provide a detailed description of the NACA0021 test case. In Sec. IV, stability is assessed along with the affects of both modal filtering and antialiasing via an approximate L2 projection. Also, various flow metrics are computed and compared with experimental data, and the performance of various simulations is assessed. Finally, in Sec. V, conclusions are drawn.

II. Methodology

A. Flux Reconstruction

For a detailed overview of the multidimensional FR approach, the reader is referred to the work of Witherden et al. [6] and the references therein. In this section, we will outline salient numerical aspects of FR within the context of solving the three-dimensional Navier–Stokes equations on unstructured hexahedral grids. In conservative form, the Navier–Stokes equations can be expressed as

uαt+fα(u,u)=0(1)
where α is the field variable index, u=u(x,t)=(ρ,ρvx,ρvy,ρvz,E) is the solution, ρ is the mass density of the fluid, v=(vx,vy,vz) is the fluid velocity vector, and E is the total energy per unit volume. For a perfect gas, the pressure p and total energy are related through the ideal gas law
E=pγ1+12ρv2(2)
where γ=cp/cv is the ratio of specific heats. The flux can be written as fα=fα(u,u)=fαcfαv, where fc is the inviscid flux given by
fc=(ρvxρvyρvzρvx2+pρvxvyρuvzρvxvyρvy2+pρvyvzρvxwρvvzρvz2+pρvx(E+p)ρvy(E+p)ρvz(E+p))(3)
and fv is the viscous flux
fv=(000TxxTxyTxzTxyTyyTyzTxzTyzTzzθxθyθz)(4)
where
θi=vjj3TijςTxi(5)
Here, ς is the thermal conductivity coefficient, T is the absolute static temperature, and Tij is the viscous tensor, which can be determined by the Stokes hypothesis
Tij=μ(vixj+vjxi)23μvlxl(6)
where μ is the dynamic viscosity.

To solve the system using the FR approach, we start by decomposing the physical domain Ω into a set of NE conformal, nonoverlapping, hexahedral elements. Take Ωi to refer to the ith element in this set. It is convenient, for reasons of both mathematical simplicity and computational efficiency, to work in a transformed space. We accomplish this by introducing a standard element Ω˜, which exists in a transformed space x˜. Take Ji(x˜) to be the relevant Jacobian matrix for the ith element and Ji(x˜) its determinant. We proceed to associate this standard element with a set of Nu solution points denoted by {x˜i(u)}Ω˜ and a set of Nf flux points denoted by {x˜i(f)}Ω˜. At each flux point, we define a polynomial correction function gi(f)(x˜) with the property that ni(f)gj(f)(x˜i(f))=δij, where ni(f) is the unit normal vector associated with the ith flux point. The specific form of these correction functions determines various properties of the resulting FR scheme, including stability and accuracy [2,14,15]. In particular, by the appropriate choice of correction functions, one can recover the nodal DG scheme [2].

Inside of each element, we represent the solution as a degree k polynomial uiαPk(Ω˜). This polynomial can be defined using a nodal basis as

uiα(x˜)=jNuuijαj(x˜)(7)
where uijα is the value of the α field variable at the jth solution point inside of the ith element and {j(x˜)} are a set of nodal basis functions defined such that j(x˜i(u))=δij. These polynomials are permitted to be discontinuous between elements. The objective in FR is to obtain an approximation of fα in the same polynomial space as the solution.

This is accomplished as follows. First, the solution polynomial is evaluated at each of the flux points. As the flux points on the faces of adjacent elements pair up, this results in two distinct values of the solution at a single point in physical space. By employing a local discontinuous Galerkin (LDG)-type method, it is possible to correct these discontinuous solutions to obtain an approximation, which is C0 continuous at the flux points and the gradient of which sits in Pk3(Ω˜). The gradients of this approximate solution, along with the original discontinuous solution defined by Eq. (7), can be used to compute a discontinuous flux within each as

f˜iαD(x˜)=jNuf˜ijα(uij,uij)j(x˜)(8)
where the transformed flux is given by
f˜ijα(u,u)=Ji(x˜j(u))Ji1(x˜j(u))fα(u,u)(9)

Furthermore, using the solution gradient, it is also possible to compute an approximation of the inviscid and viscous fluxes at each flux point pair. These can then be combined to compute a normal transformed flux, which is common to both flux points. For the inviscid portion, this involves application of an approximate Riemann solver, whereas for the viscous portion, a LDG-type scheme can be employed. This common approximation, however, is generally different from that obtained by evaluating the transformed discontinuous flux of Eq. (8) at the flux point and dotting it with the associated normal vector.

The final step involves scaling the vector correction function associated with each flux point by the difference between the common and and discontinuous flux at each flux point. By summing the scaled corrections from all of the flux points and adding these onto f˜iαD, the result will be C0 continuous at each of the flux points. The divergence of this polynomial can then be taken to give the desired semidiscretized form of the governing system.

B. Aliasing-Driven Instabilities

Inside of a FR step, it is necessary, at several points, to perform polynomial projections: specifically, the projection of the transformed flux into the polynomial space of the solution within an element, projection of the transformed normal common interface flux into the polynomial space of the correction function divergence on the face of an element, and the projection of the divergence of the corrected flux into the space of the solution within an element. In the prescription outlined previously, these projections are performed by simply evaluating the relevant functional at the appropriate set of nodal basis points, a so-called collocation projection.

Although extremely efficient, collocation-type projections can result in aliasing, whereby energy from modes that are unresolved is erroneously transferred into those that are. This aliasing can impact not just the accuracy of the solution but also the stability. Aliasing can be mitigated through modal filtering [3], adding spectral vanishing viscosity [10], switching to a skew-symmetric formulation [11,12], and antialiasing via an approximate L2 projection with a suitable quadrature rule [13]. Two of these approaches are considered in this study: modal filtering and antialiasing via an approximate L2 projection. Modal filtering is perhaps the most straightforward approach to suppressing aliasing errors. It is relatively simple to implement and relatively cheap to apply at a given time step. However, it often requires careful parameter tuning in order to achieve a balance between stability and accuracy and even then can be overly dissipative. Antialiasing via an approximate L2 projection, on the other hand, can be relatively complex to implement, requiring significant changes to a codebase, and can be relatively expensive to apply at a given time step. However, if the L2 projection is sufficiently accurate, it can suppress aliasing without additional tuning parameters and without being overly dissipative.

C. Modal Filtering

It is known that the effects of aliasing are most pronounced in the highest-frequency modes of the expansion [1618]. Therefore, one means of stabilizing a simulation is to filter the higher modes of the expansion [3] every N time steps. The modal expansion of the solution can be written as

uiα(x˜)=jNuu^ijαLj(x˜)(10)
where u^ijα are the modal coefficients and Lj(x˜) are the orthonormal modal basis functions. A popular choice of filter is an exponential filter [3]. Such a filter can be expressed as a diagonal matrix that acts on the vector of modal coefficients as
Λii={1if  degLiηcexp(κ((degLiηc)/(ηmηc))s)otherwise(11)
where κlogε with ε being the machine precision, ηc<ηm is a cutoff parameter, s is the strength of the filter, deg Li is the degree of the ith modal basis function, and ηm=k+1. Damping is only applied to modes of which the degree is greater than the cutoff with higher modes receiving progressively more damping. The rate at which this ramps up is controlled by s.

D. Antialiasing via Approximate L2 Projection

The principle behind antialiasing is to compute the modal expansion coefficients of the desired polynomial exactly. Taking the projection of the nonlinear volume flux as an example, we note that in modal form the polynomial inside of the ith element can be expressed as

f˜iαD(x˜)=jNuf^ijαLj(x˜)(12)
with the modal coefficients being given by
f^ijα=Ω˜f˜iα(x˜)Lj(x˜)dx˜(13)
where the integration domain Ω˜ is that of the reference element. It can be readily verified that the resulting polynomial is optimal in that it is the one that minimizes the norm f˜iαD(x˜)f˜iα(x˜)Ω˜,2. The integrals can be approximated through Gaussian quadrature in which
Ω˜f˜iα(x˜)Lj(x˜)dx˜lNqωlf˜iα(x˜j)Lj(x˜l)(14)
where {x˜l} are the abscissa of an Nq-point quadrature rule and {ωl} are a set of associated weights. In the case of a nonpolynomial integrand, as is the case when solving the Navier–Stokes equations, the result is always approximate on account of f˜iα(x˜) being inherently nonpolynomial. However, the error can be reduced by choosing a quadrature rule of a sufficiently high degree. Generally, this results in NqNu.

When antialiasing via an approximate L2 projection, it is necessary to evaluate the relevant functional at the chosen set of quadrature points. This requires all arguments of the functional, for example, the solution and its gradients, to be interpolated to these points. Next, the functional itself must be evaluated at the abscissa and a summation performed to obtain the desired modal coefficients. As a consequence, antialiasing via an approximate L2 projection, especially with a large number of points, is more computationally expensive than a simple collocation projection.

III. Flow over NACA0021 Aerofoil in Deep Stall

A. Overview

In this study, we use PyFR to simulate unsteady separated turbulent flow over a NACA0021 aerofoil in deep stall. The aerofoil is set at a 60 deg angle of attack to the oncoming freestream flow. The Reynolds number Re (based on the aerofoil chord c, the freestream velocity u, the freestream density ρ, and the fluid viscosity μ) is set at 270,000. The Mach number Ma (based on the freestream velocity u, the freestream density ρ, and the freestream pressure p) is set at 0.1. The setup exhibits significant unsteady dynamics and is hence a good test case for scale-resolving approaches such as detached-eddy simulation (DES), LES, and DNS. Moreover, significant experimental [19] and computational (from the DESider project) [20,21] data exist for the test case, including time-span-averaged pressure coefficient distributions on the aerofoil, associated lift and drag coefficients, and lift and drag coefficient spectra.

B. Domains and Meshes

Four domains were considered. All had an extent of 30c in the stream- and crosswise directions. However, the spanwise extent Lz was varied. Specifically, values of Lz=c, 2c, 4c, 7c were considered. Previous experimental data were obtained with a span of 7.2c [19], and previous computational results suggest that simulations should employ a span of at least 4c [20]. The leading edge of the aerofoil was located at the center of the cross-streamwise plane. Riemann-invariant boundary conditions were applied in the far field, a no-slip adiabatic wall boundary condition was applied on the aerofoil surface, and a periodic condition was imposed in the spanwise direction.

For each domain, two quadratically curved hexahedral meshes were made using Gmsh [22], one for simulation with third-order solution polynomials (nominally fourth-order accurate in space), henceforth referred to as P3 meshes, and one for simulation with fourth-order solution polynomials (nominally fifth-order accurate in space), henceforth referred to as P4 meshes. All meshes were unstructured in the stream- and crosswise directions but uniformly extruded in the spanwise direction. Elements adjacent to the aerofoil were sized such that the first (Gauss–Legendre) solution point sat a distance equivalent to y+1 from the wall (where y+ is based on a flat plate boundary layer at Reynolds number 270,000). Elements of the P3 meshes had a characteristic size 0.05c in the stream- and crosswise directions near the aerofoil and a uniform extent 0.05c in the span-wise direction throughout the domain. Elements of the P4 meshes had a characteristic size 0.0625c in the stream- and crosswise directions near the aerofoil and a uniform extent 0.0625c in the spanwise direction throughout the domain. For both meshes, the majority of elements was concentrated near the aerofoil, as shown in Fig. 1.

Fig. 1
Fig. 1

View of P3 mesh with Lz=c.

In total, the P3 meshes had 80,840 elements per spanwise extent c, and the P4 meshes had 51,632 elements per spanwise extent c. However, since third-order solution polynomials require 64 solution points within each hexahedral element and fourth-order solution polynomials require 125, the total number of degrees of freedom per spanwise extent c was similar across P3 and P4 meshes.

C. Methodology

The compressible Navier–Stokes equations with constant viscosity were solved using PyFR version 0.8.0 [6]. A DG scheme was used for the spatial discretisation, solution points were located as a tensor product of Gauss–Legendre points within each hexahedra, flux points were located as a tensor product of Gauss–Legendre points on each face of each hexahedra, common inviscid fluxes were computed via a RoeM-type Riemann solver [23], and common viscous fluxes were computed using a LDG approach [24]. An explicit RK45[2R+] scheme [25] with adaptive PI time-step control [26] was used to advance the solution in time. A wall-resolved ILES approach was adopted, and hence no turbulence model was employed.

Simulations on P3 and P4 meshes with Lz=c, 2c, 4c, 7c were started from a freestream initial condition with Re=27,000. Each simulation was then run “natively” without any antialiasing for 100c/u time units using first-order solution polynomials to facilitate the passing of initial transients. After this warmup period, various simulations were restarted on each mesh with Re=270,000. Specifically, P3 meshes with Lz=c, 2c, 4c, 7c were restarted with third-order solution polynomials and 1) antialiasing via modal filtering, henceforth referred to as P3-F simulations; 2) antialiasing via an L2 projection, henceforth referred to as P3-AA simulations; and 3) natively without any antialiasing, henceforth referred to as P3-N simulations. P4 meshes with Lz=c, 2c, 4c, 7c were restarted with fourth-order solution polynomials and 1) antialiasing via an L2 projection, henceforth referred to as P4-AA simulations, and 2) natively without any antialiasing, henceforth referred to as P4-N simulations. All the aforementioned simulations, which are summarized in Table 1, were run for a further 400c/u time units, over which period data were extracted for analysis.

In the previous description, antialiasing via modal filtering refers to the application of Eq. (11) with κ=36, s=16, and ηc=1 every N=50 time steps. A visual representation of the resulting filter coefficients is shown in Fig. 2. Antialiasing via an L2 projection on the P3 and P4 meshes, with third-order and fourth-order solution polynomials, respectively, refers to use of 9th-order and 11th-order Gauss–Legendre quadrature rules, respectively, to perform an approximate L2 projection of the transformed flux into the polynomial space of the solution (within an element) and an approximate L2 projection of the transformed normal common interface flux into the polynomial space of the correction function divergence (on the face of an element).

Fig. 2
Fig. 2

Modal filter coefficients for the P3-F simulation obtained with κ=36, s=16, and ηc=1.

We note that the filter parameters were selected based on the results of a series of numerical experiments using a P3 mesh with Lz=c and third-order solution polynomials. First, experiments were performed varying filter strength s and cutoff ηc, with fixed κ=36 and N=50. Outcomes are presented in Table 2. Looking at the table, it is clear that only the strongest filter (s=16 and ηc=1) is stable up to t=400c/u. Next, for this strongest filter, with fixed κ=36, s=16, and ηc=1, experiments were performed varying application frequency N. Outcomes are presented in Table 3. Looking at the table, it is clear that only the highest application frequency (N=50) is stable up to t=400c/u.

IV. Numerical Results

A. Stability

It was found that P3-N and P4-N simulations quickly diverged for all Lz considered. However, P3-F, P3-AA, and P4-AA simulations remained stable for all Lz. Consequently, all subsequent results are from P3-F, P3-AA, and P4-AA simulations. This finding clearly demonstrates the importance of suppressing aliasing-driven instabilities when using FR schemes.

B. Comparison with Experimental Data

1. Pressure Distributions

Figure 3 plots time- and span-averaged distributions of pressure coefficient Cp on the pressure and suction surfaces of the aerofoil. All simulations achieved good agreement with the experimental data on the pressure surface. P3-AA and P4-AA simulations achieved increasingly good agreement with the experimental data on the suction surface as Lz was increased, visually converging to the experimental data when Lz4c. P3-F simulations exhibited the same trend. However, they failed to visually converge to the experimental data, even when Lz=7c.

Fig. 3
Fig. 3

Time- and span-averaged Cp distributions on the pressure and suction surfaces of the aerofoil.

2. Time-Averaged Force Coefficients

Figures 4, 5, and 6 compare the time-averaged lift coefficient Cl and time-averaged drag coefficient Cd with experimental data [19] and computational results from the DESider project [20,21], for P3-F, P3-AA, and P4-AA simulations, respectively Table 4. Figures 7 and 8 plot percentage errors in Cl and Cd, relative to experimental data [19], for PyFR simulations and computational results from the DESider project [20,21], respectively.

Fig. 4
Fig. 4

Cl and Cd obtained from P3-F simulations, along with the experimental data of [19] and previous numerical results [20] (gray markers; see Table 4).

Fig. 5
Fig. 5

Cl and Cd obtained from P3-AA simulations, along with the experimental data of [19] and previous numerical results [20] (gray markers; see Table 4).

Fig. 6
Fig. 6

Cl and Cd obtained from P4-AA simulations, along with the experimental data of [19] and previous numerical results [20] (gray markers; see Table 4).

Fig. 7
Fig. 7

Percentage errors, relative to experimental data [19], in Cl and Cd obtained from P3-F, P3-AA, and P4-AA simulations.

Fig. 8
Fig. 8

Percentage errors, relative to experimental data [19], in Cl and Cd obtained from various previous numerical studies [20] (gray markers; see Table 4).

As one would expect, the overall trends are similar to those observed for the Cp distributions of Fig. 3. When Lz=c, all of the PyFR simulations overpredict both Cl and Cd. This is especially evident for the P3-F simulation with Lz=c. However, as Lz is increased, the PyFR simulations converge toward the experimental data, and when Lz4c, the P3-AA and P4-AA simulations exhibit less than 2.5% deviation in Cl and Cd. Finally, we note that the P4-AA simulation with Lz=7c achieves the best agreement with experimental data and that the resulting errors in Cl and Cd are lower than any of those obtained from the DESider project [20,21].

3. Force Spectra

Figure 9 compares the power spectral density (PSD) of Cl with related experimental data of [19], where St is a Strouhal number based on the freestream velocity u and the chord c.

Fig. 9
Fig. 9

Plots of PSD of Cl from various PyFR simulations and similar experimental data [19]. It should be noted that, while the PyFR data were obtained from Cl (the time- and surface-averaged lift coefficient), the experimental data were obtained from a sectional lift coefficient at a fixed spanwise location. Peak 1 and peak 2 mark distinct peaks in the experimental PSD.

To obtain the PSD, Cl was sampled every 0.025c/u time units, and the PSD was computed using Welch’s averaged periodogram method with windows of length 4096 samples and a shift between windows of 10 samples. It should be noted that, while the PyFR data were obtained from Cl (the time- and surface-averaged lift coefficient), the experimental was obtained from a sectional lift coefficient at a fixed spanwise location. Consequently, it is only meaningful to compare frequencies for peak 1 and peak 2 (see Fig. 9), which are presented in Table 5. We see that the P3-AA and P4-AA simulations with Lz=4c and Lz=7c achieved very good agreement with the experimental results.

C. Discussion

1. Effect of Lz

Increasing Lz toward the experimental value of 7.2c was found to give consistently better agreement with experimental data, in terms of both time-averaged statistics and spectral characteristics. This appears to be an improvement compared to previous results from the DESider project [20,21], which, as illustrated in Fig. 8, exhibited a degree of scatter in time-averaged statistics as Lz was varied, including cases in which time-averaged statistics became worse with increasing Lz.

To better understand Lz dependence, Fig. 10 plots time-averaged nondimensional pressure contours, with streamlines, on the midspan plane obtained from the P3-AA simulations for the four different Lz. In all cases, there are two primary vortices behind the suction surface of the aerofoil, which gives rise to a low-pressure region. As Lz is increased, these vortices move farther away from the suction surface, and the extent of the pressure drop in the wake is also reduced. These observations suggest that as Lz is increased the two primary vortices become weaker.

Fig. 10
Fig. 10

Time-averaged nondimensional pressure contours, with streamlines, obtained from P3-AA simulations with various Lz. Streamlines were generated by integrating forward and backward from 50 equispaced seed points located along the line x=0.8.

Further insight is gained by plotting isosurfaces of Q criterion, which define vortical structures [27], obtained from P3-AA simulations with Lz=c and Lz=4c. Figure 11 plots such isosurfaces colored by nondimensional density and cut by the midspan plane. Small-scale vortical structures for the Lz=c case are elongated and predominately perpendicular to the spanwise direction, whereas those for the Lz=4c case appear more isotropic. Also, the large region of low density (indicating the presence of a large-scale vortical structure) behind the suction surface is more pronounced for the Lz=c case compared with the Lz=4c case. This is further illustrated by Fig. 12, which plots colour maps of nondimensional density on the midspan plane, and Fig. 13, which plots isosurfaces of Q criterion colored by nondimensional density and cut by a plane described in Fig. 12.

Fig. 11
Fig. 11

Instantaneous isosurfaces of the Q criterion colored by nondimensional density and cut by the midspan plane for P3-AA simulations with Lz=c and Lz=4c.

Fig. 12
Fig. 12

Instantaneous color maps of nondimensional density on the midspan plane. Solid lines denote location of cut planes used for visualization in Fig. 13.

Fig. 13
Fig. 13

Instantaneous isosurfaces of the Q criterion colored by nondimensional density and cut by a plane described in Fig. 12 for P3-AA simulations with Lz=c and Lz=4c.

2. Effect of Antialiasing via Modal Filtering and Approximate L2 Projection

Antialiasing via an L2 projection is found to give consistently more accurate results than antialiasing via modal filtering. Figure 14 plots time-averaged nondimensional pressure contours, with streamlines, on the midspan plane obtained from P3-F and P3-AA simulations with Lz=c. For the P3-F case, the two primary vortices are closer to the surface of the aerofoil than those obtained by the P3-AA simulation. The P3-F simulation also has a larger low-pressure region, which indicates that the primary vortices of the P3-F simulation are stronger than those of the P3-AA simulation.

Fig. 14
Fig. 14

Time-averaged nondimensional pressure contours, with streamlines, obtained from P3-F and P3-AA simulations with Lz=c. Streamlines were generated by integrating forward and backward from 50 equispaced seed points located along the line x=0.8.

Further insight is gained by plotting isosurfaces of the Q criterion, obtained from P3-F and P3-AA simulations with Lz=c. Figure 15 plots isosurfaces of the Q criterion colored by nondimensional density and cut by the midspan plane. It is clear that the P3-AA simulation resolves a wider range of small-scale vortical structures compared to the P3-F simulation. The inability of the filtered P3-F simulations to accurately resolve small-scale structures likely underlies their observed lack of convergence to the experimental data as Lz is increased.

Fig. 15
Fig. 15

Instantaneous isosurfaces of the Q criterion colored by nondimensional density and cut by the midspan plane for P3-F and P3-AA simulations with Lz=c.

D. Computational Cost

A total of 12 simulations are run on a variety of different GPU-accelerated machines: local clusters at Imperial College, London, Emerald at the Science and Technology Facilities Council Rutherford Appleton Laboratory, the Wilkes cluster at Cambridge University, and Piz Daint at the Swiss National Supercomputing Center. All these machines consist of NVIDIA GPUs and are targeted using the CUDA backend of PyFR.

To compare computational cost across all the simulations, we choose to conduct performance testing on single machine, Piz Daint, which is comprised of NVIDIA K20X GPUs. This is accomplished by running 0.1c/u extra time units after the final solution for each simulation. The number of NVIDIA K20X GPUs is selected to be 10Lz/c (effectively weak scaling). Hence, when Lz=c, a total of ten GPUs are employed, whereas at Lz=7c, a total of 70 GPUs are used. Depending on the order of accuracy and the type of antialiasing approach, GPU memory load varies from 5 to 10% of capacity. This setup mirrors how the actual simulations are performed in practice with a greater number of GPUs being used at larger spans and loading toward the limit of strong scaling.

Measurements of the wall clock time, excluding that required for file input/output, can be seen in Fig. 16. From the figure, it is clear that the P4-AA simulations are quickest for all Lz. This is attributed to the fact that the P4 meshes are significantly coarser than the P3 meshes. The increase in element size serves to offset the more severe time-step restriction that comes from increasing the solution polynomial order, so as to permit a larger overall time step to be taken as shown in Fig. 17. This in turn offsets the higher wall clock time per time step for the P4-AA simulations, shown in Fig. 18, leading to an overall reduction in time to solution.

Fig. 16
Fig. 16

Wall clock time for each simulation to run 0.1c/u time units, normalized by the wall clock time required for the P4-AA simulation with Lz=c to run 0.1c/u time units.

Fig. 17
Fig. 17

Average time step for each simulation as measured over 0.1c/u time units.

Fig. 18
Fig. 18

Average wall clock time per time step for each simulation as measured over 0.1c/u time units, normalised by the wall clock time required for the P4-AA simulation with Lz=c to run 0.1c/u time units.

Finally, we note that, while model filtering is generally considered a cheaper route to antialiasing than an approximate L2 projection, in this case it leads to a higher wall clock time. This was because the adaptive PI time-step controller [26] selects a smaller time step for P3-F simulations than P3-AA simulations, as can be seen in Fig. 17. This reduction in time-step size offsets the reduced wall clock time per time step for the P3-F simulations, shown in Fig. 18, leading to an overall increase in time to solution. Further studies revealed that this effect is proportional to the filter application frequency N, as shown in Fig. 19. Thus, when employing an adaptive PI time-step controller, it is important to apply the filter as infrequently as possible. The interaction between modal filtering and adaptive time stepping remains a topic for further research.

Fig. 19
Fig. 19

Average time step for the P3-F simulations with various filter application frequencies N as measured over 0.1c/u time units.

V. Conclusions

In this study, the open-source GPU-accelerated computational fluid dynamics solver PyFR was used to simulate flow over a NACA0021 aerofoil in deep stall at a Reynolds number of 270,000 using the high-order FR approach. Wall-resolved ILES were undertaken on unstructured hexahedral meshes at nominally fourth- and fifth-order accuracy in space. It was found that either modal filtering or antialiasing via an approximate L2 projection was required in order to stabilize simulations. Results for time-span-averaged pressure coefficient distributions on the aerofoil and associated lift and drag coefficients were seen to converge toward experimental data as the simulation setup was made more realistic by increasing the aerofoil span. Indeed, the lift and drag coefficients obtained by fifth-order accurate ILES with antialiasing via an approximate L2 projection agree with experimental data better than a wide range of previous studies. Stabilization via modal filtering, however, was found to reduce solution accuracy relative to experimental results. Finally, the performance of various PyFR simulations was compared, and it was found that nominally fifth-order simulations with antialiasing via an approximate L2 projection were the most efficient. The results indicate that high-order GPU-accelerated FR schemes with antialiasing via an approximate L2 projection are a good candidate for underpinning accurate wall-resolved ILES of unsteady, separated, turbulent flows in the vicinity of complex engineering geometries.

H. BlackburnAssociate Editor

Acknowledgments

The authors would like to thank the Engineering and Physical Sciences Research Council for their support via an Early Career Fellowship (EP/K027379/1), a Platform Grant (EP/L000407/1), the Hyper Flux project (EP/M50676X/1), and a Doctoral Training Grant. Finally, the authors would like to thank the Centre for Innovation for access to the Emerald graphical processing unit cluster, Cambridge University, for access to the Wilkes graphical processing unit cluster and the Swiss National Supercomputing Centre for access to the Piz Daint graphical processing unit cluster. Data related to this publication is available as electronic Supplementary Material.

References

  • [1] Slotnick J., Khodadoust A., Alonso J., Darmofal D., Gropp W., Lurie E. and Mavriplis D., “CFD Vision 2030 Study: A Path to Revolutionary Computational Aerosciences,” NASA/CR  2014-218178, 2014. Google Scholar

  • [2] Huynh H. T., “A Flux Reconstruction Approach to High-Order Schemes Including Discontinuous Galerkin Methods,” 18th AIAA Computational Fluid Dynamics Conference, AIAA Paper  2007-4079, 2007. LinkGoogle Scholar

  • [3] Hesthaven J. S. and Warburton T., Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications, Springer–Verlag, Berlin, 2008. CrossrefGoogle Scholar

  • [4] Kopriva D. A. and Kolias J. H., “A Conservative Staggered-Grid Chebyshev Multidomain Method for Compressible Flows,” Journal of Computational Physics, Vol. 125, No. 1, 1996, pp. 244–261. doi:https://doi.org/10.1006/jcph.1996.0091 JCTPAH 0021-9991 CrossrefGoogle Scholar

  • [5] Wang Z. J., Liu Y., May G. and Jameson A., “Spectral Difference Method for Unstructured Grids II: Extension to the Euler Equations,” Journal of Scientific Computing, Vol. 32, No. 1, 2007, pp. 45–71. doi:https://doi.org/10.1007/s10915-006-9113-9 JSCOEB 0885-7474 CrossrefGoogle Scholar

  • [6] Witherden F. D., Farrington A. M. and Vincent P. E., “PyFR: An Open Source Framework for Solving Advection–Diffusion Type Problems on Streaming Architectures Using the Flux Reconstruction Approach,” Computer Physics Communications, Vol. 185, No. 11, Nov. 2014, pp. 3028–3040. doi:https://doi.org/10.1016/j.cpc.2014.07.011 CPHCBZ 0010-4655 CrossrefGoogle Scholar

  • [7] Witherden F. D., Vermeire B. C. and Vincent P. E., “Heterogeneous Computing on Mixed Unstructured Grids with PyFR,” Computers and Fluids, Vol. 120, Oct. 2015, pp. 173–186. doi:https://doi.org/10.1016/j.compfluid.2015.07.016 CrossrefGoogle Scholar

  • [8] Jameson A., Vincent P. E. and Castonguay P., “On the Non-Linear Stability of Flux Reconstruction Schemes,” Journal of Scientific Computing, Vol. 50, No. 2, 2012, pp. 434–445. doi:https://doi.org/10.1007/s10915-011-9490-6 JSCOEB 0885-7474 CrossrefGoogle Scholar

  • [9] Gassner G. J. and Beck A. D., “On the Accuracy of High-Order Discretizations for Underresolved Turbulence Simulations,” Theoretical and Computational Fluid Dynamics, Vol. 27, Nos. 3–4, 2013, pp. 221–237. doi:https://doi.org/10.1007/s00162-011-0253-7 TCFDEP 0935-4964 CrossrefGoogle Scholar

  • [10] Kirby R. M. and Sherwin S. J., “Stabilisation of Spectral/hp Element Methods Through Spectral Vanishing Viscosity: Application to Fluid Mechanics Modelling,” Computer Methods in Applied Mechanics and Engineering, Vol. 195, Nos. 23–24, 2006, pp. 3128–3144. doi:https://doi.org/10.1016/j.cma.2004.09.019 CMMECC 0045-7825 CrossrefGoogle Scholar

  • [11] Mahesh K., Constantinescu G. and Moin P., “A Numerical Method for Large-Eddy Simulation in Complex Geometries,” Journal of Computational Physics, Vol. 197, No. 1, 2004, pp. 215–240. doi:https://doi.org/10.1016/j.jcp.2003.11.031 JCTPAH 0021-9991 CrossrefGoogle Scholar

  • [12] Gassner G. J., “A Skew-Symmetric Discontinuous Galerkin Spectral Element Discretization and its Relation to SBP-SAT Finite Difference Methods,” SIAM Journal on Scientific Computing, Vol. 35, No. 3, 2013, pp. A1233–A1253. doi:https://doi.org/10.1137/120890144 SJOCE3 1064-8275 CrossrefGoogle Scholar

  • [13] Mengaldo G., De Grazia D., Moxey D., Vincent P. E. and Sherwin S. J., “Dealiasing Techniques for High-Order Spectral Element Methods on Regular and Irregular Grids,” Journal of Computational Physics, Vol. 299, Oct. 2015, pp. 56–81. doi:https://doi.org/10.1016/j.jcp.2015.06.032 JCTPAH 0021-9991 CrossrefGoogle Scholar

  • [14] Vincent P. E., Castonguay P. and Jameson A., “A New Class of High-Order Energy Stable Flux Reconstruction Schemes,” Journal of Scientific Computing, Vol. 47, No. 1, April 2011, pp. 50–72. doi:https://doi.org/10.1007/s10915-010-9420-z JSCOEB 0885-7474 CrossrefGoogle Scholar

  • [15] Vincent P. E., Farrington A. M., Witherden F. D. and Jameson A., “An Extended Range of Stable-Symmetric-Conservative Flux Reconstruction Correction Functions,” Computer Methods in Applied Mechanics and Engineering, Vol. 296, Nov. 2015, pp. 248–272. doi:https://doi.org/10.1016/j.cma.2015.07.023 CMMECC 0045-7825 CrossrefGoogle Scholar

  • [16] Karniadakis G. and Sherwin S. J., Spectral/hp Element Methods for Computational Fluid Dynamics, Oxford Univ. Press, Oxford, England, U.K., 2005. CrossrefGoogle Scholar

  • [17] Kirby R. M. and Karniadakis G., “De-Aliasing on Non-Uniform Grids: Algorithms and Applications,” Journal of Computational Physics, Vol. 191, No. 1, 2003, pp. 249–264. doi:https://doi.org/10.1016/S0021-9991(03)00314-0 JCTPAH 0021-9991 CrossrefGoogle Scholar

  • [18] Kirby R. M. and Sherwin S. J., “Aliasing Errors Due to Quadratic Nonlinearities on Triangular Spectral/hp Element Discretisations,” Journal of Engineering Mathematics, Vol. 56, No. 3, 2006, pp. 273–288. doi:https://doi.org/10.1007/s10665-006-9079-5 JLEMAU 0022-0833 CrossrefGoogle Scholar

  • [19] Swalwell K. E., “The Effect of Turbulence on Stall of Horizontal Axis Wind Turbines,” Ph.D. Thesis, Monash Univ., Melbourne, Australia, 2005. Google Scholar

  • [20] Garbaruk A., Shur M. L., Strelets M. Kh. and Travin A. K., “3 NACA0021 at 60 o Incidence,” DESider, A European Effort on Hybrid RANS-LES Modelling: Results of the European-Union Funded Project, 2004-2007, edited by Haase W., Braza M. and Revell A., Springer–Verlag, Berlin, 2009, pp. 127–134, Chap. V Applications—Test Cases. Google Scholar

  • [21] Garbaruk A., Leicher S., Mockett C. R., Spalart P. R., Strelets M. Kh. and Thiele F. H., “Evaluation of Time Sample and Span Size Effects in DES of Nominally 2D Airfoils Beyond Stall,” Progress in Hybrid RANS-LES Modelling, Vol. 111, edited by Peng S. -H., Doerffer P. and Haase W., Springer-Verlag, Berlin, 2010, pp. 87–99. CrossrefGoogle Scholar

  • [22] Geuzaine C. and Remacle J.-F., “Gmsh: A 3-D Finite Element Mesh Generator with Built-in Pre- and Post-Processing Facilities,” International Journal for Numerical Methods in Engineering, Vol. 79, No. 11, Sept. 2009, pp. 1309–1331. doi:https://doi.org/10.1002/nme.v79:11 IJNMBH 0029-5981 CrossrefGoogle Scholar

  • [23] Kim S.-S., Kim C., Rho O.-H. and Hong S. K., “Cures for the Shock Instability: Development of a Shock-Stable Roe Scheme,” Journal of Computational Physics, Vol. 185, No. 2, March 2003, pp. 342–374. doi:https://doi.org/10.1016/S0021-9991(02)00037-2 JCTPAH 0021-9991 CrossrefGoogle Scholar

  • [24] Cockburn B. and Shu C.-W., “The Local Discontinuous Galerkin Method for Time-Dependent Convection-Diffusion Systems,” SIAM Journal on Numerical Analysis, Vol. 35, No. 6, 1998, pp. 2440–2463. doi:https://doi.org/10.1137/S0036142997316712 SJNAEQ 0036-1429 CrossrefGoogle Scholar

  • [25] Kennedy C. A., Carpenter M. H. and Lewis R. M., “Low-Storage, Explicit Runge–Kutta Schemes for the Compressible Navier–Stokes Equations,” Applied Numerical Mathematics, Vol. 35, No. 3, Nov. 2000, pp. 177–219. doi:https://doi.org/10.1016/S0168-9274(99)00141-5 ANMAEL 0168-9274 CrossrefGoogle Scholar

  • [26] Hairer E. and Wanner G., Solving Ordinary Differential Equations II. Stiff and Differential-Algebraic Problems, Springer–Verlag, Berlin, 1996. CrossrefGoogle Scholar

  • [27] Hunt J. C. R., Wray A. A. and Moin P., “Eddies, Streams, and Convergence Zones in Turbulent Flows,” Proceedings of the Summer Program, Center for Turbulence Research, Center for Turbulence Research, Stanford Univ., 1998, pp. 193–208. Google Scholar

Tables

Table 1 Summary of simulations

NameMesheskStabilization
P3-NP33None
P3-FP33Modal filtering
P3-AAP33L2 projection
P4-NP44None
P4-AAP44L2 projection

Table 2 Effect of filter strength s and cutoff ηc on the stability of a P3-F simulation with Lz=c up to t=400c/u with fixed κ=36 and N=50

sηcOutcome
161Stable
321Diverged (at t=95.87c/u)
162Diverged (at t=57.69c/u)
322Diverged (at t=30.17c/u)

Table 3 Effect of application frequency N on the stability of a P3-F simulation with Lz=c up to t=400c/u with fixed κ=36, s=16, and ηc=1

NOutcome
50Stable
75Diverged (at t=100.54c/u)
100Diverged (at t=1.92c/u)
125Diverged (at t=0.92c/u)

Table 4 Marker legend for Figs. 4, 5, 6, and 8

MarkerApproachLz
DLR (SA DDES)c
IMFT (k-ω OEM DES)c
NLR (X-LES)c
NUMECA (SA DES)c
TUB (SALSA DES)c
TUB (LLR DES)c
NTS (SST-SAS)2c
NTS (TRRANS)2c
TUB (CEASM DES)3.24c
ANSYS (SST-SAS)4c
NTS (SA DES)4c
EADS-M (SA DES)7.2c

Table 5 St for peak 1 and peak 2 of the PSD of Cl for all simulations (for reference, the experimental St for peak 1 was 0.1994 and for peak 2 was 0.3987 [19])

SpanP3-FP3-AAP4-AA
LzPeak 1Peak 2Peak 1Peak 2Peak 1Peak 2
c0.16600.33200.23480.44350.18550.3027
2c0.18550.34180.18550.36130.19530.3711
4c0.18550.38090.19530.39060.19530.3906
7c0.18550.38090.19530.39060.19530.3809