Predicting Aviation Hazards During Convective Events, ch. 4

4 Results

This section is divided into several parts. The first part, forecast products, accounts for the aviation-targeted forecast products published online by the Swedish aviation administration (Luftfartsverket) and produced by SMHI. It also includes some discussion of, and references to, other studies concerning the usefulness and validity of such products.

The second part describes the numerical prediction models in use in Sweden today, whose accuracy and handling of convection contribute strongly to the reliability of the aforementioned forecast products. It also includes a summary of the convection indices used. This is followed by part three, which is a summary of studies evaluating the models in the context of convection.

The fourth part consists of insights received from the correspondences with SMHI and METOCC. Lastly, in part five, the results from the previous sections are summarized and weighted together. Finally, an attempt is made to answer the question presented in the introduction: “What are the chances of a pilot planning a safe flight?”

4.1 Forecast products

In Sweden, a collection of weather observations and predictions designed especially for pilots is assembled at the briefing web page of the Swedish aviation administration. Most products are observations or analyses, which already suggests that the best information a pilot can get before a flight is the current weather directly before the flight. However, this study focuses on forecast products, and the Swedish aviation administration publishes a few such products produced by SMHI.

One of them is the Terminal Area Forecast (TAF), regularly issued for approximately 30 airports around Sweden. Four of these are 24 hour forecasts published every sixth hour, while the rest are 9 hour forecasts updated every third hour (SMHI, 2015c). If the progress of the weather changes distinctly from what was forecasted in the latest TAF, an amendment (TAF AMD) can be issued according to the judgement of the forecaster (National Weather Service, 2014). TAFs are issued in many countries across the world, but the ones in Sweden are governed and produced by SMHI and the Swedish military (SMHI, 2015b).

A TAF is a forecast written in semi-coded text giving information about the predicted wind direction and speed for the period as well as information about visibility, cloud coverage, cloud base and precipitation. It does not include any cloud types except for CB and TCU, which are written explicitly (SMHI, 2015b). If the wind direction is predicted to vary significantly (as it would in the presence of thunderstorms), the direction is omitted and replaced by the abbreviation VRB (variable). In the presence of gusts exceeding the mean wind speed by 10 knots or more (also to be expected from thunderstorms), the maximum gust speed is added after the wind speed (SMHI, 2015b). The precipitation type is always specified, as well as a rough estimate of the intensity. Sometimes a description is also included, for example indicating whether the precipitation will come as showers (SH) and/or hail (GR, GS), which are other indicators of convective activity. If thunder is expected, this is included with the abbreviation TS (thunderstorm) (SMHI, 2015b). In the case of especially stormy TS and/or CB there are additional abbreviations for squalls and funnel clouds (SMHI, 2015b), but these are seldom seen in Sweden.

If the weather is predicted to change over this period, and if this causes one or more of the variables in the TAF to change in a way that exceeds a specified limit, a change is made in the TAF. This can be in the form of a probable change (PROB), a definite change (BECMG) or a temporary change (TEMPO). The probable changes are specified for either 30% or 40% chance, and all of them are usually stated to occur within a specified time interval (SMHI, 2015b). This means that during certain hours there is more than one forecast state valid, and the pilot should prepare for the worst of them but cannot assume that this will be the actual outcome. An example of a TAF is included below (SMHI, 2015b).

TAF ESXX 120830Z 1209/1218 34010KT 9999 BKN020 TEMPO 1212/1218 VRB15G30KT 3000 +SHRA OVC010CB PROB40 1214/1217 TSRA=

All in all, a TAF is designed specifically for pilots and the information given is adjusted accordingly. Predictions of convective weather are not included explicitly, but indications of significant events can be found in the case that the TAF includes variable wind and gusts, showers of precipitation, hail, and of course whether the abbreviations TS, CB or TCU are included.
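The indicators listed above can be picked out of a TAF mechanically. The following is a minimal sketch in Python, not a complete TAF decoder: the function name `convective_indicators`, the regular expressions and the subset of weather codes in `WX_CODES` are illustrative assumptions and are not taken from SMHI.

```python
import re

# Subset of standard present-weather codes (an assumption for this sketch);
# it is only used to recognise which tokens are weather groups.
WX_CODES = {"SH", "TS", "FZ", "DZ", "RA", "SN", "SG", "PL", "GR", "GS",
            "BR", "FG", "HZ"}

WIND_RE = re.compile(r"(VRB|\d{3})\d{2,3}(G\d{2,3})?KT")   # e.g. VRB15G30KT
CLOUD_RE = re.compile(r"(?:FEW|SCT|BKN|OVC)\d{3}(?:CB|TCU)")  # e.g. OVC010CB
WX_RE = re.compile(r"[+-]?((?:[A-Z]{2})+)")                # e.g. +SHRA, TSRA


def convective_indicators(taf):
    """Return flags for the convective indicators found in a TAF string."""
    tokens = taf.strip().rstrip("=").split()
    # Skip the header (TAF, station, issue time) so that a station
    # identifier is never mistaken for a weather group.
    for index, token in enumerate(tokens):
        if re.fullmatch(r"\d{6}Z", token):
            tokens = tokens[index + 1:]
            break
    found = {"variable_wind": False, "gusts": False, "showers": False,
             "hail": False, "thunderstorm": False, "cb_or_tcu": False}
    for token in tokens:
        wind = WIND_RE.fullmatch(token)
        if wind:
            found["variable_wind"] |= wind.group(1) == "VRB"
            found["gusts"] |= wind.group(2) is not None
            continue
        if CLOUD_RE.fullmatch(token):
            found["cb_or_tcu"] = True
            continue
        weather = WX_RE.fullmatch(token)
        if weather:
            body = weather.group(1)
            codes = {body[i:i + 2] for i in range(0, len(body), 2)}
            if codes <= WX_CODES:  # every two-letter part is a known code
                found["showers"] |= "SH" in codes
                found["hail"] |= bool(codes & {"GR", "GS"})
                found["thunderstorm"] |= "TS" in codes
    return found


example = ("TAF ESXX 120830Z 1209/1218 34010KT 9999 BKN020 "
           "TEMPO 1212/1218 VRB15G30KT 3000 +SHRA OVC010CB "
           "PROB40 1214/1217 TSRA=")
print(convective_indicators(example))
```

For the example TAF, every flag except `hail` is set. The sketch ignores in which change group (TEMPO, PROB40) an indicator occurs, so it only answers whether an indicator appears anywhere in the forecast period.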

The biggest flaw of a TAF is that it is limited to the terminal area of an airport, meaning that most information in the TAF is only valid within 5 km of the runway. However, CB is included in the forecast as soon as it is predicted to be inside a radius of 15 km from the runway (Flygteoriskolan, n.d.). Even so, the few regular TAFs are far from covering the entire country, and hence the pilot needs to check for additional information if he or she is intending to fly en route.

Other drawbacks are the short range of the forecasts, the complicated abbreviations and the lack of visualization. Furthermore, the fact that no changes are made in the TAF unless they exceed a specified value poses a risk of distinct changes in the weather being “lost” in the forecast just because they were not big enough. As an example, this could be a situation where the wind direction changes by 50° or the wind speed increases by 9 knots, but the change is not included in the TAF since the limits require a change of 60° and 10 knots respectively. Fortunately, if a CB is predicted to occur, it would never be omitted (SMHI, 2015b).
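The threshold effect described above can be illustrated with a short Python sketch. It uses only the two limits quoted in the text (60° and 10 knots); the function names are hypothetical, and the complete set of criteria for change groups is more detailed than this.

```python
# Limits quoted in the text; the full change criteria are not reproduced.
DIRECTION_LIMIT_DEG = 60
SPEED_LIMIT_KT = 10


def direction_change(old_deg, new_deg):
    """Smallest angle between two wind directions, in degrees (0 to 180)."""
    diff = abs(new_deg - old_deg) % 360
    return min(diff, 360 - diff)


def warrants_change_group(old_dir, old_speed, new_dir, new_speed):
    """True if the wind change reaches either of the two limits."""
    return (direction_change(old_dir, new_dir) >= DIRECTION_LIMIT_DEG
            or abs(new_speed - old_speed) >= SPEED_LIMIT_KT)


# Wind veering from 340 deg to 030 deg (50 deg) and rising from 10 to 19 kt
# stays below both limits, so the change would not appear in the TAF.
print(warrants_change_group(340, 10, 30, 19))   # False
# One more knot, or ten more degrees, and the change is reported.
print(warrants_change_group(340, 10, 30, 20))   # True
print(warrants_change_group(340, 10, 40, 19))   # True
```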

No evaluations of the Swedish TAFs have been found for this study, but Mahringer (2008) has verified TAFs for Graz Airport in Austria over a period of 3 months (September to November, 2006). The overall results showed that the forecasts have a tendency to overestimate the worst weather values and underestimate the best values, indicating that the forecasters making a TAF would rather include possible scenarios that never actually occur than miss certain developments (Mahringer, 2008).

As a complement to TAF, most Swedish pilots also take a look at the Nordic Significant Weather Chart (NSWC) before a flight. It is an analytical chart of “significant weather” covering the Nordic countries, as seen in Figure 2. Today, the NSWC is issued 3 hours in advance, but only for every sixth hour (SMHI, 2015e). This means that it can be used as a forecast some hours of the day, but will show the past weather during others. It shows the location of fronts, high and low pressure areas, cloud cover, vertical cloud distribution, different kinds of precipitation, troughs and much more. It also shows certain aviation hazards such as icing and turbulence of different degrees.

Areas of active convective weather are easy to recognize on the NSWC. Green triangles show the general location of showers, and in the case of thunder, a red runic R is displayed. Moreover, the occurrence of CB is specified along with the information about cloud distribution. However, although this gives a good estimate of the weather situation and warns a pilot of the general area of convective activity along with its extent and approximate intensity, it does not give any exact locations of separate cells and specific hazards.

Fig 2 [Figure not shown]

No evaluations of the NSWC were found for this study, but the limitations of update frequency and lacking precision stated above are evident. From 2 June, 2015, the chart will look a bit different as SMHI and the Finnish National Weather Service will start producing a joint chart (Lundblad, 2015a). One improvement when it comes to convection is the introduction of severe squall lines. Furthermore, the issuance time will be changed to four hours in advance, so that the chart serves as a forecast for a longer time (Lundblad, 2015a).

Other than TAF and NSWC, on the same briefing site a pilot can also find a low-level forecast focusing on smaller airplanes flying lower than other air traffic as well as flying under visual flight rules (VFR). This product is useful for small-plane pilots planning to fly en route (away from the departure airport) where it is necessary to look at forecasts covering bigger areas. There is also a need of such alternative forecast products in places where the airfields are too small or insignificant to have a TAF produced.

The Low Level Forecasts (LLFs) are published four times a day (SMHI, n.d.) and cover a period of six to seven hours ahead. In the evening, a summarized approximate forecast is made for the next day as well (SMHI, n.d.). The LLFs are produced for almost the entire country, from ground level to flight level 125 (about 3.8 km), and show detailed information about wind (speed and direction) at different levels as well as visibility, cloud base, precipitation, icing and turbulence (SMHI, n.d.). The wind information includes an estimate of the maximum gusts predicted. In the case of TCU or CB these are included as significant weather, and this implies the occurrence of turbulence and icing without explicitly including warnings of such (SMHI, 2015d). The LLFs are presented both graphically and in text; an example is shown in Table 2 for the yellow area in Figure 3.

The reason information about turbulence and icing for TCU and CB is omitted in an LLF is easy to understand, since a large range of intensities of both of these parameters is found inside the cloud. As mentioned before, these clouds are very local and so are their accompanying effects of turbulence and icing. It would therefore be difficult, if not impossible, to present an accurate distribution in time and space of different intensities of these two variables.

Fig 3 [Figure not shown]

Table 2 [Table not shown]

Table 2 – SMHI’s Low Level Forecast (LLF) for an area in middle Sweden, 30 April, 2015, showing averages of a number of forecasted weather features between 13.00 and 19.00 the same day (SMHI, n.d.). Reprinted with permission.

Evaluations were not found for this product either, but because the weather is only given for predefined areas much bigger than the size of a convective event, it is clear that the LLF also suffers from limited precision, even if not to the same extent as the NSWC. According to Maria Lundblad (2015b) at SMHI, the LLF will soon be taken out of production since a similar product is produced in cooperation with the Danish Meteorological Institute, DMI.

Finally, the forecasters at SMHI publish short range warnings of severe weather capable of significantly affecting the safety of the air traffic. These warnings are called SIGMET (SIGnificant METeorological information) and are issued at most four hours before the event is due. The warnings cover thunderstorms that are obscured, embedded, frequent or in squall lines, but not isolated thunderstorms or CB without thunder. They also warn of extreme turbulence and severe icing, but as with the LLF such hazards are implied for any convective event and are not warned of explicitly in such a case. SIGMETs are seldom issued, but according to Fyrby (2015) at SMHI, when they are, summertime thunder is by far the most common cause. No evaluations of the Swedish SIGMETs were found.

Assessing the total amount of information a pilot can get from the mentioned products, it can be said that the pilot can be informed about the occurrence of CB and TS in the area a few hours ahead of time, but cannot get any specific information such as exact locations of the cells at specific times or the intensity and severity of turbulence and icing accompanying the cells.

4.2 Models and indices behind the products

The forecasts mentioned above are produced by forecasters at SMHI with assistance from the numerical weather prediction models HIRLAM (High Resolution Limited Area Model), AROME and ECMWF (European Centre for Medium-Range Weather Forecasts) (Fyrby, 2015). This section is a summary of the technical configurations of the models as well as of the convection indices in use by SMHI.

HIRLAM is used with two different resolutions, 5 km (hereafter HIRLAM05) and 11 km (HIRLAM11), while a resolution of 2.5 km is in use for AROME (AROME2.5) (Fyrby, 2015). Vertical resolution is defined in levels, HIRLAM using 65 levels (Olsson, 2015) and AROME 60 levels (Météo-France, n.d.). All three are used for forecasts of ranges up to 48 or 60 hours (Björck, 2015b). One of the largest differences between these models apart from the resolution is the fact that HIRLAM approximates hydrostatic equilibrium while AROME does not. Another big difference lies in AROME being able to explicitly resolve deep convection because of its higher resolution, while HIRLAM needs to parameterize this (Ivarsson, 2015).

For both models, the initialization data comes from past forecasts and current observations that are weighted together with the use of an analysis system. HIRLAM05 and AROME2.5 use 3DVAR variational methods for the analysis while 4DVAR is used for HIRLAM11 (Ivarsson, 2015). Boundaries are taken from the global model ECMWF for all models (Ivarsson, 2015).

The prognostic variables output on all model levels are the same in HIRLAM and AROME: horizontal wind components, temperature, specific humidity, pressure, geopotential height, vertical wind velocity and turbulent kinetic energy. Moreover, cloud cover, cloud liquid water, cloud ice and precipitation type and amount are calculated (HIRLAM consortium, n.d., Bouttier, 2009).

For HIRLAM, the Kain-Fritsch convective parameterization scheme (Kain & Fritsch, 1990) is in use for both shallow and deep convection while for AROME, parameterization through an eddy-diffusivity/mass-flux scheme (EDMF) is in use for shallow convection (Ivarsson, 2015).

For forecasts over 60 hours, only the global ECMWF model is used. The current configuration uses a resolution of 16 km (Urquhart, n.d.). Because this resolution is too coarse to resolve convective events and because the model is only used fully for forecasts of longer ranges, further configurations are not examined in this study.

In the process of making the aviation-targeted forecast products listed in section 4.1 (Forecast products), to know when to include warnings of CB or thunderstorms, the forecasters use the methods described in section 2.3 (Forecasting Methods) where a lot of emphasis is put on vertical profiles of the atmosphere (Fyrby, 2015). For this purpose, simulated soundings based on forecasts can be put together from the outputs of the models (Björck, 2015b).

The models give outputs of precipitation and, according to Fyrby (2015), to some extent also of convective clouds. Because of its parameterization schemes, HIRLAM can differentiate between convective and stratiform precipitation (Portal, 2013). AROME gives precipitation fields of rain, snow and graupel (small hail), but it cannot differentiate between convective and stratiform precipitation explicitly (Portal, 2013). To be able to predict convective weather more specifically, different indices are used. Among them, the forecasters at SMHI use scripts that give information about where convective clouds will develop, at what level the bases will occur and how high the tops will reach in terms of top temperature (Fyrby, 2015). From this they use rule-of-thumb values to estimate the risk of rain and thunder.

Since HIRLAM parameterizes convection with the Kain-Fritsch convective scheme, the model needs to calculate values of CAPE and CIN and thus outputs of these are directly available for the forecasters to use (HIRLAM consortium, n.d.). AROME provides indices of thunderstorm risk, one hour accumulated lightning strikes and integrated graupel amount (Fyrby, 2015). The forecasters at SMHI also use indices such as the KO-index, K-index and an internally developed index called the “thunder index” (Fyrby, 2015).
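As an illustration of how such an index is derived from a vertical profile, the K-index can be computed from temperature and dew point at three pressure levels. The Python sketch below uses the common textbook definition; it is not confirmed to be the exact formulation used at SMHI, and the profile values are invented for the example.

```python
def k_index(t850, td850, t700, td700, t500):
    """K-index from temperatures and dew points in degrees Celsius.

    Textbook definition (assumed here):
        K = (T850 - T500) + Td850 - (T700 - Td700)
    The first term measures the lapse rate, the second the low-level
    moisture, and the third the dryness of the 700 hPa layer.
    """
    return (t850 - t500) + td850 - (t700 - td700)


# Hypothetical profile: warm, moist low levels and a moist 700 hPa layer.
print(k_index(t850=15.0, td850=10.0, t700=5.0, td700=0.0, t500=-15.0))  # 35.0
```

A higher value indicates a steeper lapse rate and more moisture through the lower troposphere. The value at which thunderstorms become likely depends on the region and on how the threshold is optimized, as the studies in section 4.3 show.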

4.3 Evaluations of models and indices

Previously, in addition to the forecast products listed above, SMHI also produced the “thermal forecast” designed for glider pilots. From experience gained by working with this product, Olofsson (n.d.) (confirmed by Olsson, 2015) writes that HIRLAM has trouble differentiating between dry thermals and thermals accompanied by small or few cumulus. The problem is partly explained by the limited vertical resolution making it hard to resolve details of inversions, but also by the difficulties in describing the moisture fluctuations from the ground (Olofsson, n.d., Olsson, 2015).

Stensrud (2007) has discussed the grid spacing dependency of the Kain-Fritsch parameterization scheme used by HIRLAM. He referred to a study made in 1999 and stated that the precipitation produced by the scheme has been shown to be very sensitive to grid spacing. As grid-spacing decreases the scheme generally produces larger and more accurate maximum rainfall amounts and overall greater detail. Although this is to be expected, the location of the rainfall “is often displaced several hundreds of kilometres from the actual position” (Stensrud, 2007).

A study done by Björck (2010) for SMHI looked at the performance of five different models. Among those were HIRLAM11, HIRLAM05 and AROME2.5, which are in use by SMHI today. Differences from the configuration of the models in use today were area coverage (smaller for the study), AROME being initialized by HIRLAM instead of ECMWF, and data assimilation being done by 3DVAR for HIRLAM11 instead of 4DVAR (Björck, 2015a, Björck, 2010, Ivarsson, 2015). As verification reference, observational RADAR data was used, while analytical data derived by MESAN was used to examine the performance of the models (Björck, 2010).

Björck (2010) examined the effect of increased model resolution on convective precipitation, and even though precipitation is only one result of convective events, this can still give an indication of the overall performance of the forecasts. However, one should be aware that the amount of precipitation does not necessarily reflect the intensity of the convective event. A vigorous thunderstorm with forceful gusts, wind shears and even lightning might in some cases cause only light or even no precipitation (National Weather Service, 2010).

The verification was done by using a set of different neighbourhood verification sizes together with a set of different accumulation thresholds of rainfall intensity which gives positive feedback when the forecast has predicted “the correct intensity in the vicinity of the observed event” (Björck, 2010). The models were run over a limited area including southern Sweden for three hand-picked days with convective activity. The simulations started at 00 UTC and the verifications were made at 9, 12, 15 and 18 UTC the same day (Björck, 2010).

Björck (2010) presented results of the frequency bias for different threshold values for every model on each of the three days. The results were not consistent, but from the way AROME differed from the rest he concluded that the other models, which parameterize convection, acted to restrain forecasts of high intensity (Björck, 2010). In two of the cases examined, this favoured HIRLAM over AROME, as AROME had quite a high frequency bias from tending to overestimate the number of high-intensity precipitation events. However, the actual number of such events was small during these days, and AROME scored better in the third case, where events of high precipitation were more numerous (Björck, 2010). That is, while AROME was able to simulate local events of high precipitation much more effectively than HIRLAM, it tended to overestimate the number of such events.
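The frequency bias referred to here is the ratio between the number of forecast events and the number of observed events for a given threshold. A minimal Python sketch, using the standard contingency-table definition and invented counts (not values from Björck, 2010), shows why the score reacts strongly when observed events are few.

```python
def frequency_bias(hits, false_alarms, misses):
    """Forecast event count divided by observed event count.

    A value of 1 means events are forecast as often as they are observed,
    values above 1 indicate over-forecasting and below 1 under-forecasting.
    """
    observed = hits + misses
    if observed == 0:
        raise ValueError("frequency bias is undefined without observed events")
    return (hits + false_alarms) / observed


# Hypothetical day with few intense events: six extra forecast cells
# double the bias.
print(frequency_bias(hits=2, false_alarms=6, misses=2))     # 2.0
# Hypothetical day with many intense events: the same six extra cells
# barely change it.
print(frequency_bias(hits=40, false_alarms=6, misses=6))    # 1.0
```

Note that the bias says nothing about whether the forecast events are in the right place; a forecast can have a bias of 1 without a single hit.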

Other results showed the different models’ performances by calculating the FSS (fractions skill score) while varying the size of the verification neighbourhood as well as varying the threshold values. Conclusions drawn suggested that no model had a clear advantage over the others and that increasing model resolution did not seem like an obvious solution to getting better forecasts. On the other hand, three cases are far from enough to make justified conclusions (Björck, 2010).
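To make the role of the neighbourhood size concrete, the following Python sketch computes the FSS for a forecast and an observed precipitation field on the same grid. It follows the usual definition of the score (one minus the mean squared difference of neighbourhood fractions, normalized by its largest possible value). The square window that is truncated at the domain edge and the small test fields are assumptions of this sketch, not details taken from Björck (2010).

```python
def neighbourhood_fractions(field, threshold, half_width):
    """Fraction of points at or above threshold in a square window.

    The window spans 2 * half_width + 1 points in each direction and is
    truncated at the domain edges (an assumption of this sketch).
    """
    rows, cols = len(field), len(field[0])
    fractions = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            count = total = 0
            for ii in range(max(0, i - half_width), min(rows, i + half_width + 1)):
                for jj in range(max(0, j - half_width), min(cols, j + half_width + 1)):
                    total += 1
                    count += field[ii][jj] >= threshold
            fractions[i][j] = count / total
    return fractions


def fss(forecast, observed, threshold, half_width):
    """Fractions skill score: 1 is a perfect match, 0 is no overlap."""
    pf = neighbourhood_fractions(forecast, threshold, half_width)
    po = neighbourhood_fractions(observed, threshold, half_width)
    pairs = [(f, o) for row_f, row_o in zip(pf, po) for f, o in zip(row_f, row_o)]
    mse = sum((f - o) ** 2 for f, o in pairs)
    reference = sum(f ** 2 + o ** 2 for f, o in pairs)
    return 1.0 - mse / reference if reference > 0 else float("nan")


# One observed shower cell and a forecast cell displaced by one grid point.
observed = [[0.0] * 5 for _ in range(5)]
forecast = [[0.0] * 5 for _ in range(5)]
observed[2][2] = 5.0
forecast[2][3] = 5.0

for half_width in (0, 1, 2):
    print(half_width, fss(forecast, observed, threshold=1.0, half_width=half_width))
```

At grid scale (`half_width=0`) the displaced cell scores zero even though the forecast is only one grid point off, while the score rises as the neighbourhood grows. This is the sense in which the method rewards a forecast of the right intensity in the vicinity of the observed event.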

Björck (2010) also discussed the characteristics of the outputs from the different models by showing graphical examples including those in Figure 4. He emphasized that even though the models with higher resolution look more detailed they should not be assumed to necessarily be so. There is a big risk of them being overly trusted. Björck did not mention the differences of HIRLAM11 and HIRLAM05 more than that they showed no significant differences “neither in frequency bias nor spatial accuracy” (Björck, 2010) and that “the Kain-Fritsch parameterization scheme appears to be of equal functionality for both scales” (Björck, 2010).

Fig 4 [Figure not shown]

Figure 4 – Example of precipitation forecasts over southern Sweden from HIRLAM11 (a), HIRLAM05 (b), AROME2.5 (c) and radar observation (d). The fields show precipitation in mm/3h. Dark green and pink colours show areas of extra heavy precipitation (up to 30 mm/3h). The forecast range is 15 hours. Björck (2010), reprinted with permission.

Additionally, Björck (2010) made an attempt to examine the temporal variability of the forecasts from AROME in terms of the FSS. This was done to account for the possibility of the forecasts being more incorrect in terms of time than space. However, while the FSS clearly varied with time, the results showed no clear bias in temporal error. As a reflection on his own results, Björck (2010) recommended that “high-resolution models should be used on a larger scale and the realistic detail should not be trusted” (Björck, 2010).

In another study, Weusthoff et al. (2010) evaluated the benefits of finer-resolution models that can simulate deep convection explicitly. This was done by comparing three fine-resolution models with their coarser-resolution counterparts over a period of 6 months. AROME2.5 was compared to ALADIN with 10.0 km resolution (hereafter ALADIN10) over a limited area over alpine terrain in Switzerland, both initialized at 0000 UTC. The parameter examined was precipitation and, just as in the previous study, verifications were done using the neighbourhood methodology along with radar observational data (Weusthoff et al., 2010).

The study showed results of FSS and ETS (Equitable Threat Score) both as an average over the entire time period of six months as well as an average solely over days in which precipitation over 1 mm / 3 h was produced for at least 10 000 grid points according to radar data. The ETS uses the areal mean of the precipitation fields in the neighbourhood to compute categorical scores while FSS shows the frequency of the precipitation fields exceeding a critical threshold.
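For reference, the ETS is computed from a yes/no contingency table in which the hits expected by chance are subtracted. The Python sketch below uses the standard definition with invented counts. In the neighbourhood approach described above, the table would be built from precipitation fields averaged over the neighbourhood rather than from single grid points; that upscaling step is not shown here.

```python
def equitable_threat_score(hits, false_alarms, misses, correct_negatives):
    """ETS: threat score adjusted for hits expected from random forecasts.

    1 is a perfect score and 0 indicates no skill beyond chance.
    """
    total = hits + false_alarms + misses + correct_negatives
    # Hits a random forecast with the same event frequencies would get.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + false_alarms + misses - hits_random
    if denominator == 0:
        return float("nan")
    return (hits - hits_random) / denominator


# Hypothetical counts for one threshold and one spatial scale.
score = equitable_threat_score(hits=20, false_alarms=10, misses=10,
                               correct_negatives=60)
print(round(score, 3))  # 0.355
```

Because the ETS is categorical, a forecast that places precipitation of the right structure slightly beside the observation can improve the FSS without improving the ETS much.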

The results from the entire time period (Figure 5) showed a clear advantage of AROME2.5 over ALADIN10 when it came to FSS, while ETS considered the two models rather equal. According to Weusthoff et al. (2010), because the FSS and ETS indicated different results when comparing the two models, the slight improvements of the precipitation forecasts were probably the result of a better forecast of the precipitation structure, giving a higher FSS, rather than of a better quantitative precipitation forecast.

Fig 5 [Figure not shown]

Figure 5 – Results from Weusthoff et al. (2010) showing a comparison between AROME2.5 and ALADIN10. The values are averages over a full 6 month period of FSS (left) and ETS (right) for different threshold values and neighbourhood spatial scales. The numbers show the scores of ALADIN10 while the colours show the difference between the two models. ©American Meteorological Society. Used with permission.

Fig 6 [Figure not shown]

Figure 6 – Results from Weusthoff et al. (2010) showing a comparison between AROME2.5 and ALADIN10 as in Figure 5, but only during specifically rainy days as specified in the text. ©American Meteorological Society. Used with permission.

Looking instead at the FSS summary for precipitation days only (Figure 6), ALADIN10 appeared to forecast events of lighter precipitation better than AROME2.5, while AROME2.5 still had an advantage over ALADIN10 for heavier precipitation. The ETS still suggested that the two models were equal.

The scores were higher in this conditional verification which was only to be expected since the FSS is known to be correlated to the rain amount. The two scores behave differently because of the different properties of the scores. The indication that ALADIN might have been better than AROME during days with a lot of precipitation for low thresholds should be considered with caution since these cases were very few when only particularly rainy days were considered (Weusthoff et al., 2010).

Simply examining the absolute values of the scores, Weusthoff et al. (2010) concluded that both models had skill in predicting convective events, but that the scores were best for low precipitation thresholds and large spatial scales. Score values were around 0.8 for FSS and 0.43 for ETS at a threshold of 0.1 mm / 3 h and a spatial scale of 150 km (Weusthoff et al., 2010).

For all model pairs, the study showed that the high-resolution models mostly outperformed their lower-resolution counterparts. Weusthoff et al. (2010) reasoned that this is a result of the higher-resolution models’ ability to explicitly resolve deep convection.

In 2008 an evaluation of the preoperational AROME model with 2.5 km resolution was done as described by Seity et al. (2010). The area of the study was limited to France and it was compared to the ALADIN model already in use over the same area. The verification was done by using observational data from radiosonde measurements and ground-based observations of pressure, temperature, humidity, wind and precipitation.

The conclusions drawn concerning convection were, first of all, that the model seemed to overestimate heavy rainfall, but this was apparently corrected before the operational model was put in use (Seity et al., 2010). Secondly, a case study showed that the model was able to reproduce “the classic features of strong cells that have thunder” (Seity et al., 2010) in terms of wind speeds and directions (including vertical), the development of an anvil and heavy rain. On the other hand, it simulated the case event with a delay of 2 hours and “slightly overestimated horizontal size and intensity” (Seity et al., 2010).

Although not yet implemented by the institutes in Sweden, there is a new higher resolution version of AROME (1.3 km) available in 2015. Seity et al. (2014) have evaluated how the increased horizontal and vertical resolution of this new tool affects its performance for convective events and compared it to AROME2.5. The study was done over 48 convective days and the different model resolutions were applied to examine the number of simulated convective cells in terms of diurnal cycle, cell size and maximal intensity (dBZ).

Results from the study are shown in Figure 7 and from them Seity et al. (2014) concluded that the higher resolution model predicted the convective events more accurately. The results also indicated that AROME2.5 had a tendency to predict cells that are too large.

Fig 7 [Figure not shown]

Figure 7 – Results from Seity et al. (2014) showing the number of convective cells simulated by AROME1.3 (red) and AROME2.5 (green) as well as detected by RADAR (blue). (a) as a function of time, (b) as a function of cell size (km²) and (c) as a function of simulated or detected radar reflectivity intensity (dBZ). Reprinted with permission.

In the context of indices, in 2013 an investigation was done by SMHI concerning the performance of integrated graupel as an index for convective precipitation in the pre-operational AROME2.5 (Portal, 2013). The index was tested along with the field for lightning strikes for different kinds of weather situations with showers and/or thunderstorms and was compared to radar and lightning detection data as well as to synoptic ground observations (Portal, 2013). At the same time, convective precipitation fields and the CAPE chart from HIRLAM11 were tested to see whether AROME with its graupel index could outperform the already operational HIRLAM in events of convection.

The evaluations were done at 09, 12 and 15 of a forecast issued at 00 the same day (Fyrby, 2015). The two models were evaluated over various areas in northern Europe, focusing on southern Scandinavia, with a grade from 1 to 10, where a grade of 10 was a perfect score. The grade was not based on any pre-determined threshold values or skill scores, but rather the meteorologist’s personal perception of the models’ performances (Portal, 2013). Around 430 grades were given for each of the models.

Results showed that AROME scored better than HIRLAM with an average grade of 7.4 as opposed to 6.4. Looking at each model separately, when addressing the different performances in terms of geography, intensity and time, it was found that both models resulted in close to consistent grades between these criteria (Figure 8a). The performance of the two models along the day appeared to be only slightly decreasing (Figure 8b), indicating that both models are quite consistent in time when looking in the short range. It was concluded that graupel was indeed a good index for convective precipitation and was put in use along with AROME in 2014 (Portal, 2013).

Fig 8 [Figure not shown]

Figure 8 – Evaluations from SMHI (Portal, 2013) of AROME2.5 (dark) and HIRLAM11 (light) in the performance of forecasting convective events. Grades (1–10) are estimated by meteorologists at SMHI from their own judgment. In (a), the performance is divided between criteria of geography, intensity and time. In (b), the performance along the day is presented. The grades are averages of around 430 evaluations done of each model. Reprinted with permission.

Two other studies of indices have been included. One was done by Haklander and van Delden (2003) over the Netherlands, and the other was done by Kunz (2007) over south-west Germany. These two areas differ considerably in terrain, but otherwise the two studies are very similar. Both studies looked at the performance of different thunderstorm predictors and convective indices such as CAPE, KO and the K-index used by SMHI (Haklander & van Delden, 2003, Kunz, 2007). The indices were derived from the vertical profiles of radiosondes in both studies. Haklander & van Delden (2003) used six-hourly soundings between 1993 and 2000 while Kunz (2007) used daily soundings between 1986 and 2003. A range of verification scores based on yes/no thresholds were used in the studies, among those the Heidke Skill Score (HSS) and the True Skill Statistic (TSS), which were used by both. The methods of calculating the optimum thresholds differed somewhat between the studies.

Among CAPE, KO and the K-index, all three had an HSS of 0.46 for Kunz (2007), while for Haklander & van Delden (2003), CAPE had 0.35, KO had 0.31 and the K-index had 0.21. The scores were generally slightly higher for TSS, with a maximum of 0.52 for KO found by Haklander & van Delden (2003). A perfect value for both scores would be 1, making it apparent that all of these indices have flaws and cannot be fully trusted.
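Both scores are computed from a 2×2 contingency table obtained by applying a yes/no threshold to an index and comparing the result with observed thunderstorm occurrence. The Python sketch below uses the standard definitions; the counts are invented for illustration and are not taken from either study.

```python
def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """HSS: fraction of correct forecasts after removing chance agreement."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    denominator = (a + c) * (c + d) + (a + b) * (b + d)
    return 2.0 * (a * d - b * c) / denominator


def true_skill_statistic(hits, false_alarms, misses, correct_negatives):
    """TSS: probability of detection minus probability of false detection."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    return a / (a + c) - b / (b + d)


# Hypothetical table for one index and one threshold: thunderstorms are
# observed on 40 of 200 soundings and forecast on 50 of them.
table = dict(hits=30, false_alarms=20, misses=10, correct_negatives=140)
print(round(heidke_skill_score(**table), 3))    # 0.571
print(round(true_skill_statistic(**table), 3))  # 0.625
```

Both scores equal 1 for a perfect forecast and 0 for a forecast with no skill. Since thunderstorm days are much rarer than days without thunderstorms, the large number of correct negatives keeps the false detection rate low, which is one reason the TSS can come out higher than the HSS for the same table.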

4.4 Comments from Fyrby at SMHI and Jakobsson at METOCC

Through correspondence with both the SMHI and METOCC a perspective of the difficulties of convective forecasting from experienced forecasters has been gained. Those contacted also work in departments specifically targeting the air traffic and hence know firsthand of the routines for giving pilots information about the weather. The comments are summarized below.

When asked over what range forecasts including convection are reliable, Tomas Fyrby (2015), Aviation Meteorologist from the air traffic department at SMHI, said that while short-term forecasts of 0–12 hours are by far the best, one-day or sometimes even two-day forecasts can be acceptable as well. When briefing a pilot a few hours ahead, relatively good information can be given about which areas will be subjected to CB activity and how intense it will get. However, it is almost impossible to tell more than one hour ahead where this activity, such as showers, will be located. The best way is to brief a pilot directly before take-off by looking at observational data and trying to find passages between the cells (Fyrby, 2015).

Lieutenant Colonel P.O. Jakobsson (2015), Defence Meteorologist at METOCC, was asked the same question, and answered differently for organized convection and randomized convection. For organized convective clusters, a forecast for the same day can be made with quite good accuracy over the course of the entire day. Jakobsson (2015) said this is because once an area of organized convective cells has formed, it is generally possible to predict quite accurately how active it will be and where it is going to move. For randomized thermal convection, however, detailed forecasts are only somewhat reliable in a range of 1–2 hours and have an approximate accuracy in the range of one hour. Forecasts for days ahead can only be as detailed as suggesting that “in this or that area there will possibly/probably/maybe be one or more showers” (Jakobsson, 2015).

Jakobsson (2015) also discussed the use of observational data such as radar and satellite images to get information about the location and intensity of active areas. Even though these are just observations, they are the most useful tools for nowcasting, since a forecaster can on his or her own predict the general progress and development of convective cells a few hours ahead of time.

Concerning the capabilities of the models AROME and HIRLAM in use by SMHI, Fyrby (2015) suggested that while AROME has the highest resolution, the precision of its forecasts is not to be trusted. AROME presents convective cells in great detail, and compared to HIRLAM its fine resolution makes it able to separate individual convective cells. It is also better at estimating the maximum rainfall. Nevertheless, since the precision is insufficient, the model often predicts the showers in the wrong location.

In contrast, HIRLAM gives poorer detail and predicts a lesser amount of rainfall over a broader region, but can be preferable, as the double penalty effect makes HIRLAM generally score better than AROME (Fyrby, 2015). In reality, though, a forecaster at SMHI uses both models to get a better overview and weighs up the different results to produce as good a forecast as possible (Fyrby, 2015). The reason for the poor precision in AROME is, according to Fyrby (2015), that the input data for the initialization includes too many approximations in its assimilation as a result of insufficient observational coverage.
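The double penalty mentioned above can be illustrated with a small sketch. A sharp forecast that places a shower in the wrong grid point is penalized twice by a point-wise score: once for missing the observed shower and once for the false alarm where it placed it. A smooth forecast avoids most of that penalty. The rainfall values, the ten-point transect and the choice of root-mean-square error are assumptions made purely for illustration; the source does not state which score or data SMHI uses.

```python
import math


def rmse(forecast, observed):
    """Point-wise root-mean-square error between two equal-length series."""
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical rainfall (mm) along a transect of 10 grid points.
observed        = [0, 0, 0, 10, 0, 0, 0, 0, 0, 0]  # one intense shower
sharp_displaced = [0, 0, 0, 0, 0, 10, 0, 0, 0, 0]  # right intensity, wrong place
smooth_broad    = [0, 0, 2, 2, 2, 2, 2, 0, 0, 0]   # weak rain over a wide area
no_rain         = [0] * 10                         # forecasts nothing at all

for name, fc in [("sharp, displaced", sharp_displaced),
                 ("smooth, broad", smooth_broad),
                 ("no rain", no_rain)]:
    print(f"{name:18s} RMSE = {rmse(fc, observed):.2f}")
```

In this toy case the displaced sharp forecast gets the worst score (about 4.47), worse even than forecasting no rain at all (about 3.16), while the smooth forecast scores best (about 2.83), although the sharp forecast is the only one that captures the character and intensity of the shower.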

Concerning the application of the forecasts to air traffic, Jakobsson (2015) explained that even though they have direct contact with the pilots out flying, the intensity of convective cells such as CB clouds is seldom discussed except in terms of visibility inside showers or the presence of lightning. Since all weather-related education given to pilots urges them not to fly into or in the immediate vicinity of a CB, the pilots are well aware of the great hazards of turbulence and icing inside the clouds, and information about these is hence superfluous (Jakobsson, 2015).

4.5 What are the chances of a pilot planning a safe flight?

To answer the question posed in the introduction, a summary of the above results was made. Comparing the insights gained from Fyrby (2015) and Jakobsson (2015) with the results from the model evaluations, it was concluded that models today are not very accurate in predicting convective events. Björck (2010) and Olofsson (n.d.) seem to agree with Fyrby (2015) that HIRLAM has drawbacks in the form of too few details and overly expanded areal distributions. The details are too few to be able to tell a pilot how many CB or thunderstorm cells will be active and where each cell will be located.

In contrast, there is the higher-resolution non-hydrostatic model AROME2.5, which in theory should be able to capture single convective cells in a way that far exceeds the hydrostatic HIRLAM. However, based on the many investigations consulted of whether or not higher-resolution models score better (Björck, 2010; Weusthoff et al., 2010; Seity et al., 2010; Seity et al., 2014), it is clear that being able to explicitly resolve deep convection and leave out hydrostatic assumptions is not enough to produce considerably better forecasts. Even though the structure of convective events is captured much more realistically, the forecasts are far from perfect in space and time, as confirmed by Fyrby (2015).

As a consequence of the models’ imperfect handling of convection, a great amount of experience is required from the forecaster. Stensrud (2007) said “It is clear that there remains a need for human forecasters to interpret these types of numerical forecasts wisely.” However, this might not be that easy. From the studies done by Haklander & van Delden (2003) and Kunz (2007), it was apparent that some of the indices frequently consulted by forecasters are far from perfect. Moreover, if the forecast soundings behind the indices contain errors from the start, the indices will be even more unreliable.

The products examined seem to be good at warning of CB and thunderstorms but appear very imprecise in stating the location and time of these occurrences. They suggest that several different outcomes are possible at the same time. In the context of CB-induced aviation hazards, the only hazards explicitly included are the presence of lightning, the maximum gust speed and the type and strength of precipitation in showers. As mentioned in section 4.1 (Forecast products), for all of the products, icing and turbulence from CB are not included in the forecast but are implied if CB or TS is printed. Microbursts and local wind shear are not covered by the warnings.

Referring to the usage of observational data suggested by Fyrby (2015) and Jakobsson (2015), in the absence of detailed forecasts, consulting observations and analyses just before the flight usually gives a good estimate of what the weather situation will be like. The progress of already developed convective cells is relatively easy to interpret, but as seen in the example of the balloon flight in Stockholm in 2007, the pilot should always be aware that the cells can take on a direction or course of development not anticipated.

Finally, to answer the question of what the chances are of a pilot planning a safe flight for a day of convective activity, the answer is that they are very small. Because of all the shortcomings in convective forecasting, it is concluded that if a pilot plans to go flying in active convective weather, he or she cannot be expected to plan an altogether safe flight beforehand. Even if the weather might allow a flight clear of the convective activity by zigzagging between the cells, the pilot will not be able to know beforehand how to plan his or her route to avoid them. However, this does not mean that the pilot cannot trust the forecast products. In fact, they are probably correct quite often, precisely because they are so vague and “safe” in the sense that all details are omitted and several possible outcomes are presented at the same time. The fact that they span such a short time range and cover extensive general areas is also in their favour.

Pilots are recommended always to use forecasts of as short a range as possible. Longer-range forecasts should be complemented with nowcasts and observations just prior to a flight. In critical circumstances, with a lot of convective activity, the pilot should receive a personal, close to real-time analysis from a meteorologist immediately before the flight and preferably even during the flight. In that way, the meteorologist can be more specific about the location and strength of active CB and TS in the direct vicinity of the intended route.

The products examined in this study can only provide information about the general areas in which to expect CB and thunderstorms. The pilot cannot expect to get details of the specific aviation hazards accompanying such weather. Therefore, pilots flying aircraft sensitive to icing and strong turbulence are recommended to always stay clear of CB and thunderstorms.