7.3 A methodology and honeypot conceptualization 151
7.3.2 Data(base) models for storing e-mails
E-mails can be stored in flat files, as is common on e-mail servers, or, in a
more structured way, in databases; the latter option facilitates data analysis.
Because data analysis is the primary goal of the empirical analysis of the abuse
of e-mail addresses, a database model is proposed in this subsection. According
to database theory, a semantic data model ought to be designed before the
database model is created. The rest of this subsection follows this procedure by
presenting an object-oriented data model, an equivalent relational data model,
and a relational database model. The two equivalent data models are developed
to support databases that follow either of the two currently most important
modeling paradigms: structural modeling and object-oriented modeling. The
relational database model is presented because such a model was chosen for
storing e-mails in the prototypical implementation of an empirical study.
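To illustrate why a database facilitates analysis compared with flat files, the
following minimal sketch stores a few e-mails in an in-memory SQLite database
and counts messages per sender domain with a single query. The schema, the table
and column names, and the sample data are assumptions made for this illustration
only; they are not the data model proposed in this subsection.

```python
import sqlite3

# Hypothetical, deliberately simplified schema (an assumption for illustration,
# not the model proposed in the text): one row per e-mail, one row per header field.
SCHEMA = """
CREATE TABLE email (
    id   INTEGER PRIMARY KEY,
    body TEXT
);
CREATE TABLE header_field (
    email_id INTEGER NOT NULL REFERENCES email(id),
    position INTEGER NOT NULL,
    name     TEXT NOT NULL,
    value    TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up sample e-mails."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    emails = [
        (1, "Buy now", [("From", "a@spam.example"), ("Subject", "Offer")]),
        (2, "Buy again", [("From", "b@spam.example"), ("Subject", "Offer")]),
        (3, "Hello", [("From", "c@mail.example"), ("Subject", "Greetings")]),
    ]
    for email_id, body, fields in emails:
        conn.execute("INSERT INTO email (id, body) VALUES (?, ?)", (email_id, body))
        for position, (name, value) in enumerate(fields):
            conn.execute(
                "INSERT INTO header_field (email_id, position, name, value) "
                "VALUES (?, ?, ?, ?)",
                (email_id, position, name, value),
            )
    conn.commit()
    return conn


def messages_per_sender_domain(conn: sqlite3.Connection) -> list:
    """Count e-mails per domain of the From header field, most frequent first."""
    return conn.execute(
        """
        SELECT substr(value, instr(value, '@') + 1) AS domain, COUNT(*) AS n
        FROM header_field
        WHERE name = 'From'
        GROUP BY domain
        ORDER BY n DESC, domain
        """
    ).fetchall()


if __name__ == "__main__":
    for domain, count in messages_per_sender_domain(build_demo_db()):
        print(domain, count)
```

With flat files, the same question would require parsing every stored message
again for each analysis; here the aggregation is expressed declaratively.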
The object-oriented data model
Modeling the structure and the content of Internet e-mails is not
straightforward, as several partially conflicting modeling goals have to be
addressed: simplicity, completeness, correctness, and practice orientation. As
the class model below is intended to address spam issues, all compromises as
well as the level of abstraction were chosen in favor of adequately covering
spam issues. The object-oriented data model is represented in UML 2.0.
The basic structure and content of an Internet message (e-mail) are specified
in the Internet standard Request for Comments (RFC) 2822 [142]. An e-mail
consists of header fields (collectively called “the header of the message”)
followed, optionally, by a body. Many other Internet standards have emerged
that extend RFC 2822 and, with one exception (see below), are not obviously
relevant to spam. These are not regarded in detail, or not at all, in this
model. If necessary, they have to be integrated into this model later. RFC
2076 [130] and its updated, but not yet standardized, version [131] compile
information from other e-mail-related RFCs and also integrate a few commonly
occurring e-mail parts which are not defined in RFCs.
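The structure described above (an ordered sequence of header fields followed,
optionally, by a body) can be sketched with Python's standard `email` package.
The classes `HeaderField` and `Email`, the function `parse_email`, and the
sample message are hypothetical names and data introduced only for this
illustration; they are not the class model developed in this subsection.

```python
from dataclasses import dataclass, field
from email import message_from_string
from typing import List, Optional

# Made-up sample message; addresses and host names are placeholders.
RAW_MESSAGE = (
    "From: sender@example.org\r\n"
    "To: recipient@example.com\r\n"
    "Subject: Test message\r\n"
    "Received: from a.example.net by b.example.com\r\n"
    "Received: from c.example.net by a.example.net\r\n"
    "\r\n"
    "This is the body.\r\n"
)


@dataclass
class HeaderField:
    """A single header field: field name and field body."""
    name: str
    body: str


@dataclass
class Email:
    """An e-mail: an ordered list of header fields and an optional body."""
    header: List[HeaderField] = field(default_factory=list)
    body: Optional[str] = None


def parse_email(raw: str) -> Email:
    """Split a raw message into header fields and an optional body."""
    msg = message_from_string(raw)
    # items() keeps the original order and repeated fields such as Received.
    fields = [HeaderField(name, str(value)) for name, value in msg.items()]
    payload = msg.get_payload()
    # Treat an empty or non-text payload as "no body" in this simplified sketch.
    body = payload if isinstance(payload, str) and payload != "" else None
    return Email(header=fields, body=body)


if __name__ == "__main__":
    parsed = parse_email(RAW_MESSAGE)
    for header_field in parsed.header:
        print(header_field.name, "->", header_field.body)
    print("has body:", parsed.body is not None)
```

Keeping the header as an ordered list rather than a dictionary preserves
repeated fields and their order, which matters for fields such as Received.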
In particular, no security aspects are regarded in the basic model, as no
spam e-mail observed by the author featured any security item. Not included
are, for example, Secure MIME (S/MIME) [139], Open Pretty Good Privacy
(OpenPGP) [19], and Privacy Enhancement for Internet Electronic Mail (PEM)
[100, 91, 11, 89].
As RFC 2822 was designed only for plain-text e-mails with ASCII characters,
and thus does not allow binary documents (e.g. executable files, pictures,
videos, and compressed files) to be attached, the Internet community has
accepted the Multipurpose Internet Mail Extensions (MIME) standard, specified
in RFCs 2045-2049 [61, 62, 109, 63, 60], as an extension; these RFCs