4.3 Descriptive statistics
Table 2 reports the main descriptive statistics for the dependent and explanatory variables used in
the analysis, both for the entire sample and for firms at each stage of their life cycle, according to
the groups identified by the cluster analysis approach. For the entire sample, a comparison of means
and medians shows that several of the variables were asymmetrically distributed. However, since small
and medium-sized firms typically comprise a heterogeneous group, this result was not unexpected. Only
the variable SIZE was not asymmetrically distributed. Furthermore, in the entire sample, the ratio of
financial debt to capital for the average firm was about 45%. A comparison of the mean with the
median, together with the standard deviation (about 31%), showed that the use of financial debt as a
source of finance varied considerably across firms.
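As a rough illustration of the dispersion described above, the reported mean (about 45%) and standard deviation (about 31%) imply a coefficient of variation of roughly 0.69. The sketch below computes that figure and also shows, on a small made-up sample, how a gap between mean and median signals asymmetry. The function names and the illustrative debt ratios are hypothetical and are not taken from Table 2.

```python
from statistics import mean, median, pstdev

def coefficient_of_variation(mean_value, std_dev):
    """Standard deviation expressed as a fraction of the mean."""
    return std_dev / mean_value

def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd.

    Uses the population standard deviation; a positive value indicates
    a longer right tail (mean above median).
    """
    return 3 * (mean(values) - median(values)) / pstdev(values)

# Figures reported in the text: mean debt ratio ~45%, standard deviation ~31%.
cv_debt = coefficient_of_variation(0.45, 0.31)

# Hypothetical debt ratios, for illustration only (not the paper's data).
illustrative_ratios = [0.05, 0.10, 0.20, 0.30, 0.40, 0.85, 0.95]
skew = pearson_median_skewness(illustrative_ratios)

print(f"coefficient of variation: {cv_debt:.2f}")
print(f"median-based skewness (illustrative sample): {skew:.2f}")
```

A coefficient of variation this large means the standard deviation is more than two thirds of the mean, which is consistent with the statement that reliance on financial debt differs widely across firms.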
6. Capital Structure Determinants at Different Stages of Their Life Cycle: Cluster Analysis
Results
To verify whether capital-structure determinants differ for firms at different stages of their life
cycle, the sample in this section was sorted using a cluster analysis approach. Instead of using a
deterministic approach, for example by alternatively defining young firms as those less than 5, 10,
or 15 years old, we applied an inductive criterion. The cluster analysis approach revealed whether
structural differences arose within the sample and allowed us to sort it independently of any
arbitrary sorting criterion. The number of clusters leading to the greatest separation (distance) was
not known a priori but was computed from the data. The goal was to minimize variability within the
clusters and maximize variability between clusters. The two-step cluster analysis employed here is an
exploratory tool designed to reveal natural groupings (or clusters) within a dataset that would
otherwise not be apparent (He et al. 2005, Chiu et al. 2001).
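The objective of low within-cluster and high between-cluster variability can be illustrated with a simple sum-of-squares decomposition. The sketch below is a generic illustration on hypothetical firm ages, not the two-step procedure used in the paper; the function name and the data are assumptions made for the example.

```python
from statistics import mean

def within_between_ss(groups):
    """Return (within, between) sums of squares for one-dimensional groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Within: squared deviations of each value from its own group mean.
    within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    # Between: squared deviations of group means from the grand mean,
    # weighted by group size.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return within, between

# Hypothetical firm ages in years (illustrative only, not the paper's data).
ages = [8, 11, 14, 25, 28, 31, 50, 58, 66]

by_age = [ages[0:3], ages[3:6], ages[6:9]]         # grouping that follows age
mixed = [[8, 25, 50], [11, 28, 58], [14, 31, 66]]  # grouping that ignores age

w_age, b_age = within_between_ss(by_age)
w_mix, b_mix = within_between_ss(mixed)

print(f"grouped by age: within={w_age:.1f}, between={b_age:.1f}")
print(f"mixed grouping: within={w_mix:.1f}, between={b_mix:.1f}")
```

Because the total sum of squares is fixed for a given dataset, any grouping that lowers the within-cluster term raises the between-cluster term by the same amount, so the two goals stated above are two views of one criterion.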
The algorithm had several desirable features that differentiated it from traditional clustering
techniques. In particular, it allowed for the handling of continuous variables (by assuming variables
to be independent, a joint multinomial-normal distribution was applied to continuous variables) and
for the automatic selection of the number of clusters (by comparing the values of a model-choice
criterion across different clustering solutions, the procedure automatically determined the optimal
number of clusters). Four clusters representing different features were automatically identified.
Cluster 1 was not representative and was dropped, as it consisted of less than 1% of the firms in the
entire sample. Cluster 2 represented about 14.5% of the entire sample and consisted of old firms with
an average age of 58 years and a standard deviation of 13.2 (8.9% sales growth on average).
Cluster 3 (about 39.7% of the whole sample) comprised mainly middle-aged firms with an average age of
28 years and a standard deviation of 6.3 (10.2% sales growth on average). Cluster 4 (about 45.0% of
the entire sample) represented young firms with an average age of 11 years and a standard deviation
of 5 (17.7% sales growth on average). Given the characteristics of the clusters obtained, shown in
Table 4, clusters 4, 3 and 2, i.e., young, middle-aged (growing), and old firms, were analyzed.
Table 5 shows the main descriptive statistics for the three clusters.
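As a quick consistency check on the reported cluster shares, the three retained clusters should account for slightly more than 99% of the sample, leaving under 1% for the dropped cluster 1. The sketch below performs that arithmetic using only the approximate percentages quoted in the text; the variable names are illustrative, and because the shares are rounded the residual is itself approximate.

```python
# Cluster shares (% of the entire sample) as reported in the text.
cluster_shares = {
    "cluster 2 (old)": 14.5,
    "cluster 3 (middle-aged)": 39.7,
    "cluster 4 (young)": 45.0,
}

# Combined share of the three clusters kept for the analysis.
retained_share = sum(cluster_shares.values())

# Implied share of cluster 1, which was excluded as not representative.
residual_share = 100.0 - retained_share

print(f"retained clusters: {retained_share:.1f}% of the sample")
print(f"implied share of cluster 1: {residual_share:.1f}%")
```

The implied residual of about 0.8% agrees with the statement that cluster 1 contained less than 1% of the firms.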