A Verification of Class Structure Evolution Model and its Parameters

Mikio OHKI
Nippon Institute of Technology
4-1 Gakusendai Miyashiro, Saitama, Japan
+81-480-33-7466
E-mail: ohki@nit.ac.jp

Shinjiro AKIYAMA
JIPEngineering Service Co., Ltd
8-3 Koamicho Nihonbashi Chuoku, Tokyo, Japan
+81-3-3808-1361

Yasushi KAMBAYASHI
Nippon Institute of Technology
4-1 Gakusendai Miyashiro, Saitama, Japan
+81-480-33-7466
E-mail: yasushi@nit.ac.jp

ABSTRACT
It is widely accepted that the software "architect", who provides frameworks to program developers, plays an important role in object-oriented software development processes. When developers extend the base classes given by the architect, they may want guidelines that tell them how many subclasses, and how many methods per subclass, are reasonable. So far we are not aware of such guidelines. Through measurements of Java and Delphi class libraries, we have distilled formulae that forecast the number of methods and the number of subclasses when constructing class trees from the base classes. We propose that designers should focus on extracting methods and attributes rather than on class structure. The formulae we have derived support this proposition.
Categories and Subject Descriptors
[Metrics]: Object Oriented Program, Software Evolution.
General Terms
Measurement, Experimentation, Verification.
Keywords
Evolution model, Architect, Measurement, Verification.
It is well known that using pre-organized class libraries is effective in object-oriented software development processes. Jacobson et al. have proclaimed that the key person in such software development is the "architect" who defines the framework for the target application [1]. The major task of the architect is to prepare the base classes for the application domain so that fellow developers can use these classes as the framework for the application. Although architects usually provide information about the class structure as well as the function and usage of each class, they do not explain how to extend those base classes through inheritance and composition. What developers would like to have are guidelines on which of the following options to employ, and when, to construct the subclasses. The options they can choose are:
(1) Construct one subclass and pack all the methods into it.
(2) Construct several subclasses at the same level, immediately under the super class, and give each subclass a group of methods.
(3) Construct a further structured set of subclasses that has a hierarchical structure.
So far we are not aware of rules that can be used as guidelines to construct the complete class structure based on a given application framework. Of course, we have some empirical rules such as "Class hierarchy should be based on 'cases'," but we would like to have quantitative guidelines such as how many subclasses should be constructed under one super class and how many methods each subclass should have.
We have found that well-organized class libraries share some common structural patterns in their class hierarchies, and that such patterns are preserved through class evolution. Therefore we have analyzed the statistical characteristics of well-organized class libraries and distilled such patterns. The patterns suggest a good way to construct subclasses of an application framework.
In this paper, we describe: 1) our hypothesis that the structure of a class is the history of the class evolution, 2) the idea behind the model formulae for constructing subclasses, and 3) parameters for the formulae, statistically computed from the class libraries of Java and Delphi. We close our discussion with a new hypothesis that is suggested by those formulae and their verification. We ignore "interfaces" in Java to simplify the discussion.
Several studies have tried to quantify to what extent the numbers of methods and attributes are correlated with class structure [2]. Nakatani et al. have suggested, "Inheritance is a means to adapt to a new circumstance caused by requests for changes that the super class cannot handle" [3]. This paper is based on the hypothesis that the class structure shows the history of the given application. Reading the class structure, we can trace how the application has adapted to new requirements and how each class has survived in the course of design selections. Our discussion is based on the idea that such effort for adaptation is the driving force of inheritance.
The first step toward a guideline for subclass construction is to find the relationship between a super class and its immediate subclasses. Through such a relationship, we can statistically forecast the number of subclasses, attributes and methods. To analyze class libraries, we use the following two viewpoints:
1) The relationship between the characteristics of a super class at level i, i.e. the number of attributes $I_i$ and the number of methods $M_i$, and those of the subclasses at level i+1, i.e. the number of attributes $I_{i+1}$ and the number of methods $M_{i+1}$.
2) The relationship between the characteristics of a super class at level i and the number of its subclasses $n_{i+1}$.
Observation of the class libraries of Java and Delphi suggests that the number of methods and the number of attributes in all the subclasses at level i+1 are related to the number of methods and the number of attributes, respectively, in their common super class at level i. These relationships can be expressed by the following formulae, where $\sum$ denotes the total over all the subclasses of one super class:
$\sum I_{i+1} = f(I_i)$ (1)
$\sum M_{i+1} = g(M_i)$ (2)
There are two types of methods in the subclasses. One is a set of new methods with new names that simply add new functions to the subclasses. The other is a set of methods that finalize the inheritance chain so that the subclasses of the subclass cannot inherit those methods (by using the keywords "private" and "final"). Therefore formula (2) can be refined as formula (3). In this formula, $\alpha M_i$ expresses the increasing factor for the number of methods of the first group, proportional to the number of methods in the super class $M_i$, and $(1 - \beta M_i)$ expresses the decreasing factor for the number of methods of the second group, which also depends on the number of methods in the super class. The increasing factor $\alpha$ stands for the growth rate of the number of newly added methods in subclasses. The decreasing factor $\beta$ stands for the ratio of the methods that finalize the inheritance.
$\sum M_{i+1} = \alpha M_i (1 - \beta M_i)$ (3)
Formula (3) states that the number of methods in all the subclasses, $\sum M_{i+1}$, is determined by the cross-correlation between the increasing factor and the decreasing factor. The number of methods in all the subclasses is thus a quadratic function of $M_i$. In other words, formula (3) is a logistic-like mapping function that carries the number of methods in the super class at level i over to the number of methods in all the subclasses at level i+1 while keeping their autocorrelation. This situation is depicted in Figure 1.
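The behavior of formula (3) can be explored with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the parameter values are the ones reported for ComponentUI group A in Table 2. The peak position 1/(2β) follows from the quadratic form and is not a figure stated in the measurements.

```python
# Sketch of formula (3): sum(M_{i+1}) = alpha * M_i * (1 - beta * M_i).
# Function names are hypothetical; ALPHA and BETA are taken from Table 2
# (ComponentUI, group A).

ALPHA = 1.534
BETA = 0.020


def forecast_subclass_methods(m_super, alpha=ALPHA, beta=BETA):
    """Forecast the total number of methods in all immediate subclasses."""
    return alpha * m_super * (1.0 - beta * m_super)


def peak_super_methods(beta=BETA):
    """Super-class method count at which the quadratic in formula (3) peaks."""
    return 1.0 / (2.0 * beta)


if __name__ == "__main__":
    for m in (5, 10, 20, 25):
        print(m, round(forecast_subclass_methods(m), 3))
    print("peak at M_i =", peak_super_methods())
```

With these values the forecast rises up to a super class of about 25 methods and falls beyond it, which is the logistic-like shape the text describes.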
Unlike the number of methods, the number of attributes in all the subclasses increases monotonically. Therefore, the relationship between the number of attributes in a super class and the number of attributes in all the subclasses can be conjectured as follows:
$\sum I_{i+1} = A I_i + B$ (4)
The number of subclasses of a super class at level i can be conjectured as follows:
$n_{i+1} = \Gamma M_i^{-Z}$ (5)
Formula (5) states that a super class with many methods has a small number of subclasses, and a super class with few methods has many subclasses. Since inheritance is based on "cases," it is reasonable that a super class with much functionality has fewer subclasses than one with little functionality.
In order to verify the hypotheses, we counted the methods of the classes in ComponentUI in the Java class library. The relationship between all the classes and their subclasses is shown in Figure 2. Figure 2 displays two distinct groups of relations. We conjectured that the relationship between a super class near the root of a class hierarchy and its subclasses may differ from that between a super class near the leaves and its subclasses. Therefore we divided the classes into two groups A and B based on the number of methods, and observed where the classes of each group are found in the class hierarchy. Group A consists of classes with fewer than thirty methods, and group B consists of classes with thirty or more methods. The threshold of thirty was chosen heuristically. We compared the average distance of each class from the leaves of the class structure. The results are shown in Table 1. We employed the t-test and found statistical significance (the null hypothesis was rejected at the 5% critical value). Classes of group A reside closer to the root of the class hierarchy than classes of group B. Classes of group B reside relatively close to the leaf classes.
Table 1. Comparison of the distances from the leaves between group A and group B
| | Group A | Group B |
| Number of Classes | 25 | 7 |
| Average Distance from Leaf (Number of levels) | 1.292 | 2.000 |
| Variance of Average Distance | 0.373 | 0.285 |
| Standard Deviation of Average Distance | 0.624 | 0.577 |
| Computed t-value | -2.61 | |
Classes of group A behave according to formula (3), but classes of group B behave differently. It seems that, for classes of group B, the relation between a super class and its subclasses is linear. Therefore, the relation between the number of methods in all the subclasses and the number of methods in their super class for group B can be expressed by a linear formula as follows:
$\sum M_{i+1} = a M_i + b$ (6)
Based on these observations, we set out to find the parameters $\alpha$ and $\beta$ for group A, and the parameters a and b for group B.
3.2 Estimating Parameters for Formulae
(a) Java Class Library
Since "ComponentUI" and "Component" in the Java class library provide enough classes for measurement, we employed the least squares method to obtain the parameters $\alpha$ and $\beta$ for formula (3), and a and b for formula (6). The results are shown in Table 2. The correlations among group A are shown in Figures 3 and 4. Classes of group B show linear correlations.
Table 2. Parameters for forecasting formulae computed from Java class library
| NOCT | Group | NOC | Forecasting Model Formula | Parameters | PCP | LOS |
| ComponentUI | Group_A | 24 | Quadratic | $\alpha$ = 1.534, $\beta$ = 0.020 | 0.54 | 5% |
| ComponentUI | Group_B | 7 | Linear | a = 1.059, b = 8.367 | 0.76 | 5% |
| Component | Group_A | 14 | Quadratic | $\alpha$ = 1.874, $\beta$ = 0.028 | 0.47 | 5% |
| Component | Group_B | 10 | Linear | a = 1.305, b = 15.683 | 0.76 | 5% |
NOCT: Names of class tree
NOC: Number of sample classes
PCP: Pearson correlation parameters
LOS: Level of significance
The level of significance represents the rejection level for the null hypothesis of the Pearson correlation parameters.
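The text does not spell out how the least squares fits were set up, so the following is only one plausible sketch. It assumes formula (3) is fitted as a quadratic through the origin, y = c1·x + c2·x², with α = c1 and β = −c2/c1, and that formula (6) is fitted by ordinary linear regression. The function names and the sample data are hypothetical; the data are generated from known parameters, not measured.

```python
# Hypothetical least squares sketch for formulae (3) and (6).
# x = methods in the super class, y = methods in all its subclasses.


def fit_quadratic(xs, ys):
    """Fit y = c1*x + c2*x^2 (no intercept) and return (alpha, beta)."""
    s11 = sum(x ** 2 for x in xs)
    s12 = sum(x ** 3 for x in xs)
    s22 = sum(x ** 4 for x in xs)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    det = s11 * s22 - s12 * s12          # determinant of the normal equations
    c1 = (t1 * s22 - t2 * s12) / det
    c2 = (s11 * t2 - s12 * t1) / det
    return c1, -c2 / c1                  # alpha = c1, beta = -c2 / alpha


def fit_linear(xs, ys):
    """Fit y = a*x + b by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


if __name__ == "__main__":
    # Synthetic, noise-free samples (an assumption for illustration only).
    group_a_x = [2, 5, 10, 15, 20, 28]
    group_a_y = [1.5 * x * (1 - 0.02 * x) for x in group_a_x]
    print(fit_quadratic(group_a_x, group_a_y))

    group_b_x = [30, 35, 42, 50, 61]
    group_b_y = [1.1 * x + 8.0 for x in group_b_x]
    print(fit_linear(group_b_x, group_b_y))
```

Fitting the quadratic without an intercept keeps the model in the exact form of formula (3), where a super class with no methods forecasts no subclass methods.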
(b) Delphi Class Library
We performed a similar analysis on the VCL (Visual Component Library) of Delphi. We chose four class trees, TObject, TPersistent, and TWinControl, because of their rich class hierarchies. Since most of the classes have more than thirty methods, i.e. group A methods, we performed regression analysis to obtain $\alpha$ and $\beta$ for formula (3). The results are shown in Table 3.
3.3 Forecasting Formulae for the Number of Subclasses
Next, we find the parameters of formula (5), i.e. $\Gamma$ and $Z$. The relationship between the number of methods in a class and the number of its subclasses is depicted in Figure 5. Applying regression analysis to these data, we obtained the forecasting model formula (7) (the number of samples is fifty and the correlation parameter is 0.60). This formula forecasts the maximum number of subclasses of a super class. We will scrutinize this formula in Section 4.
$n_{i+1} = 18.15\, M_i^{-0.584}$ (7)
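Formula (7) is easy to evaluate, and a power law of this shape can be estimated by linear regression on logarithms. The sketch below assumes that approach; the paper only says "regression analysis", and since the formula is said to forecast a maximum, the actual fit may have been made differently. Function names are hypothetical, and the demo data are generated from formula (7) itself.

```python
import math

# Fitted values reported in formula (7).
GAMMA = 18.15
Z = 0.584


def forecast_max_subclasses(m_super, gamma=GAMMA, z=Z):
    """Formula (5)/(7): n_{i+1} = gamma * M_i ** (-z)."""
    if m_super <= 0:
        raise ValueError("the super class needs at least one method")
    return gamma * m_super ** (-z)


def fit_power_law(ms, ns):
    """Estimate (gamma, z) by least squares on log n = log gamma - z * log M."""
    xs = [math.log(m) for m in ms]
    ys = [math.log(n) for n in ns]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


if __name__ == "__main__":
    for m in (1, 5, 10, 30, 60):
        print(m, round(forecast_max_subclasses(m), 2))
```

Because the exponent is negative, the forecast falls steadily as the super class gains methods, in line with the reading of formula (5).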
4. OBSERVATIONS AND DISCUSSION
4.1 Rationale of the Formula for the Number of Methods
In the previous section, we verified formula (3), which states that the number of methods in all the subclasses is determined by the cross-correlation between the increasing factor and the decreasing factor, using the Java class libraries, i.e. ComponentUI and Component, and the VCL of Delphi. The observed data show statistical significance. From this observation, we conclude that the number of methods in all the subclasses is determined by the cross-correlation between the increasing factor and the decreasing factor. We believe that software evolution appears as an increase in the number of methods in the target software. We have demonstrated that such an increase can be expressed by a logistic-like mapping function. We can say that the class libraries we used for verification have evolved along the model formulae that we developed.
4.2 Rationale of the Formula for the Number of Subclasses
One way to explain the fact that the number of subclasses is inversely proportional to a power of the number of methods in the immediate super class is to introduce a new concept, namely the connecting force of methods. Such a conceptual force among methods can be formulated as follows:
$F = C m^{Z}$ (8)
In this formula, C and m stand for a constant and the number of methods in a class, respectively. For example, when every method in a class interacts with all methods in the class, including itself, the number of interactions is $m^2$. If we assume such a connecting force, constructing a subclass requires another imaginary force to extract methods from the super class against this connecting force. Therefore, even though requirements for subclasses arise with a constant probability, the frequency with which a requirement is satisfied with a given effort is inversely proportional to the connecting force. Applying this hypothesis to the number of subclasses and the number of methods in the super class, it is easy to understand why the maximum number of subclasses of a super class is inversely proportional to a power of the number of methods in the super class. This hypothesis explains formula (7).
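The step from the connecting force to the power law for the number of subclasses can be written out as a short derivation. The symbols $p$ (the constant rate at which requirements for subclasses arise) and $k$ (a proportionality constant for the available effort) are labels introduced here for illustration only; they do not appear in the measurements.

```latex
\begin{align*}
F &= C\, m^{Z}
  && \text{connecting force, formula (8)} \\
n &\propto \frac{p}{F} = \frac{p}{C}\, m^{-Z}
  && \text{satisfied requirements are inversely proportional to } F \\
n_{i+1} &= \Gamma\, M_i^{-Z}, \qquad \Gamma = \frac{k\,p}{C}
  && \text{formula (5), with } m = M_i
\end{align*}
```

Under this reading, the exponent 0.584 fitted in formula (7) is well below the value 2 that would result if every method interacted with every other method.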
4.3 Class Modeling Based on Attributes
We have demonstrated that the number of methods in all the subclasses is expressed by a logistic-like mapping function. The logistic mapping function is known as a means of forecasting the variation of a population. This fact suggests that methods may determine the characteristics of a class. In other words, one should construct a class from methods in a bottom-up way. We would like to put forward the following propositions for discussion.
(1) The task of a method is to modify some attributes. Through this modification, a method affects the behavior of other methods. We should pay more attention to methods and attributes than to classes. Classes can be seen as mere containers of methods and attributes. When we design software, we should extract methods and attributes before constructing class structures. The methodology that CRC cards employ suggests the same approach [5][6].
(2) There should be some rules that we can use for constructing classes from methods (and attributes). Another study of ours, on the timing of data generation and method implementation, suggests such construction rules [7]. Formulating these rules is a direction for further research.
5. ACKNOWLEDGMENTS
A part of this research was performed as one of the HITOCC projects and partially sponsored by Japan Information Technology Services Industry Association.
[1] I. Jacobson, G. Booch, J. Rumbaugh, "The Unified Software Development Process," Addison Wesley (1999).
[2] Chidamber, Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Trans. SE Vol.20, No.6, pp.476-493 (1994).
[3] Takako Nakatani, Tetuo Tamai, "A Study on Statistic Characteristics of Inheritance Tree Evolution," Proceedings of Object-Oriented Symposium, IPSJ, pp.137-144 (1999). In Japanese.
[4] Mikio Ohki, Shinjiro Akiyama, "A Class Structure Evolutional Model and Analysis of its Parameters," IPSJ Vol.2001 No.92 SE-133-3, pp.15-22 (2001). In Japanese.
[5] R. Wirfs-Brock, B. Wilkerson, "Object-Oriented Design: A Responsibility-Driven Approach," Proc. of OOPSLA'89, ACM, pp.71-75, 1989.
[6] R. Wirfs-Brock, "Designing Objects and Their Interactions: A Brief Look at Responsibility-Driven Design," in Carroll, J. M. ed., Scenario-Based Design, John Wiley & Sons, 1995.
[7] Mikio Ohki, Kohei Akiyama, "A Proposal of Conceptual Modeling Criteria and their Validity Evaluation," IEICE Vol.J84-D-1 No.6, pp.723-735 (2001). In Japanese.