- :mod:`email` Package Architecture
- =================================
- Overview
- --------
- The email package consists of three major components:
- Model
- An object structure that represents an email message, and provides an
- API for creating, querying, and modifying a message.
- Parser
- Takes a sequence of characters or bytes and produces a model of the
- email message represented by those characters or bytes.
- Generator
- Takes a model and turns it into a sequence of characters or bytes. The
- sequence can either be intended for human consumption (a printable
- unicode string) or bytes suitable for transmission over the wire. In
- the latter case all data is properly encoded using the content transfer
- encodings specified by the relevant RFCs.
- Conceptually the package is organized around the model. The model provides both
- "external" APIs intended for use by application programs using the library,
- and "internal" APIs intended for use by the Parser and Generator components.
- This division is intentionally a bit fuzzy; every API described in this
- documentation is public and stable. This allows an application
- with special needs to implement its own parser and/or generator.
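The three components can be seen working together in a short round trip. The following is a minimal sketch, assuming the standard library's `email.parser.Parser` and `email.generator.Generator` classes and the `email.policy.default` and `email.policy.SMTP` policy instances; the addresses and message text are made up for illustration.

```python
from io import StringIO

from email import policy
from email.generator import Generator
from email.parser import Parser

# Illustrative input only; the addresses are placeholders.
source = (
    "From: alice@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)

# Parser: characters in, model out.
msg = Parser(policy=policy.default).parsestr(source)

# Model: query and modify through the "external" API.
original_subject = str(msg["Subject"])
msg.replace_header("Subject", "Hello again")

# Generator: model in, printable text out.
out = StringIO()
Generator(out, policy=policy.default).flatten(msg)
text = out.getvalue()

# The same model serialized as bytes with CRLF line endings for the wire.
wire = msg.as_bytes(policy=policy.SMTP)
```

The model sits in the middle: the parser and generator never talk to each other directly, which is what makes it possible to swap either one out.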
- In addition to the three major functional components, there is a fourth key
- component to the architecture:
- Policy
- An object that specifies various behavioral settings and carries
- implementations of various behavior-controlling methods.
- The Policy framework provides a simple and convenient way to control the
- behavior of the library, making it possible for the library to be used in a
- very flexible fashion while leveraging the common code required to parse,
- represent, and generate message-like objects. For example, in addition to the
- default :rfc:`5322` email message policy, we also have a policy that manages
- HTTP headers in a fashion compliant with :rfc:`2616`. Policy controls, such
- as the maximum line length produced by the generator, can also be set
- individually to meet specialized application requirements.
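As an illustration of adjusting a single policy control, the sketch below derives a policy with a shorter maximum line length. It assumes the `clone` method and the `max_line_length` and `linesep` attributes of the standard library policy objects, along with the predefined `email.policy.HTTP` instance; the value 40 and the header text are arbitrary choices for the example.

```python
from email import policy
from email.message import EmailMessage

# Policies are immutable; clone() returns a copy with the given controls changed.
narrow = policy.default.clone(max_line_length=40)

msg = EmailMessage(policy=narrow)
msg["Subject"] = " ".join(["word"] * 20)  # 99 characters, too long for one line

# The generator folds the header so that each output line fits the limit.
header_lines = msg.as_string().splitlines()

# The predefined HTTP policy uses CRLF line endings and no length limit.
http_linesep = policy.HTTP.linesep
http_max_line_length = policy.HTTP.max_line_length
```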
- The Model
- ---------
- The message model is implemented by the :class:`~email.message.Message` class.
- The model divides a message into the two fundamental parts discussed by the
- RFC: the header section and the body. The `Message` object acts as a
- pseudo-dictionary of named headers. Its dictionary interface provides
- convenient access to individual headers by name. However, all headers are kept
- internally in an ordered list, so that the information about the order of the
- headers in the original message is preserved.
- The `Message` object also has a `payload` that holds the body. A `payload` can
- be one of two things: data, or a list of `Message` objects. The latter is used
- to represent a multipart MIME message. Lists can be nested arbitrarily deeply
- in order to represent the message, with all terminal leaves having non-list
- data payloads.
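The header and payload structure described above can be explored with a small example. This is a sketch that uses `email.message.EmailMessage`, a subclass of `Message` in the standard library, rather than `Message` itself; the header values and attachment bytes are invented for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"
msg["To"] = "bob@example.com"

# Dictionary-style access is case-insensitive, while keys() reflects the
# order in which the headers were stored.
subject = msg["subject"]
first_two_headers = msg.keys()[:2]

# A single text body is a data payload.
msg.set_content("See attachment.")
was_multipart = msg.is_multipart()

# Adding an attachment turns the payload into a list of sub-messages.
msg.add_attachment(
    b"\x00\x01",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)
parts = msg.get_payload()

# walk() visits the whole tree; the leaves are the parts with data payloads.
leaf_types = [
    part.get_content_type() for part in msg.walk() if not part.is_multipart()
]
```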
- Message Lifecycle
- -----------------
- The general lifecycle of a message is:
- Creation
- A `Message` object can be created by a Parser, or it can be
- instantiated as an empty message by an application.
- Manipulation
- The application may examine one or more headers, and/or the
- payload, and it may modify one or more headers and/or
- the payload. This may be done on the top-level `Message`
- object, or on any sub-object.
- Finalization
- The Model is converted into a unicode or binary stream,
- or the model is discarded.
- Header Policy Control During Lifecycle
- --------------------------------------
- One of the major controls exerted by the Policy is the management of headers
- during the `Message` lifecycle. Most applications don't need to be aware of
- this.
- A header enters the model in one of two ways: via a Parser, or by being set to
- a specific value by an application program after the Model already exists.
- Similarly, a header exits the model in one of two ways: by being serialized by
- a Generator, or by being retrieved from a Model by an application program. The
- Policy object provides hooks for all four of these pathways.
- The model storage for headers is a list of (name, value) tuples.
- The Parser identifies headers during parsing, and passes them to the
- :meth:`~email.policy.Policy.header_source_parse` method of the Policy. The
- result of that method is the (name, value) tuple to be stored in the model.
- When an application program supplies a header value (for example, through the
- `Message` object `__setitem__` interface), the name and the value are passed to
- the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
- returns the (name, value) tuple to be stored in the model.
- When an application program retrieves a header (through any of the dict or list
- interfaces of `Message`), the name and value are passed to the
- :meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
- obtain the value returned to the application.
- When a Generator requests a header during serialization, the name and value are
- passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
- returns a string containing line breaks in the appropriate places. The
- :attr:`~email.policy.Policy.cte_type` Policy control determines whether or
- not Content Transfer Encoding is performed on the data in the header. There is
- also a :meth:`~email.policy.Policy.binary_fold` method for use by generators
- that produce binary output, which returns the folded header as binary data,
- possibly folded at different places than the corresponding string would be.
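One way to observe the four pathways is to subclass a policy and record each hook as it is called. The sketch below assumes the standard library's `email.policy.EmailPolicy`; `TracingPolicy` and the module-level `calls` list are hypothetical names introduced only for this illustration. The list lives outside the class because policy instances do not accept new attributes.

```python
from email import message_from_string
from email.policy import EmailPolicy

calls = []  # hypothetical log of hook invocations, in call order


class TracingPolicy(EmailPolicy):
    """Illustrative policy that records which hook handled a header."""

    def header_source_parse(self, sourcelines):
        calls.append("header_source_parse")
        return super().header_source_parse(sourcelines)

    def header_store_parse(self, name, value):
        calls.append("header_store_parse")
        return super().header_store_parse(name, value)

    def header_fetch_parse(self, name, value):
        calls.append("header_fetch_parse")
        return super().header_fetch_parse(name, value)

    def fold(self, name, value):
        calls.append("fold")
        return super().fold(name, value)


tracing = TracingPolicy()

# Entry via the Parser.
msg = message_from_string("Subject: hi\n\nbody\n", policy=tracing)
# Entry via the application.
msg["X-Tag"] = "demo"
# Exit via the application.
fetched = msg["Subject"]
# Exit via the Generator.
serialized = msg.as_string()
```

Inspecting `calls` afterwards shows each pathway mapped to its hook, which can be a convenient starting point when debugging a custom policy.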
- Handling Binary Data
- --------------------
- In an ideal world all message data would conform to the RFCs, meaning that the
- parser could decode the message into the idealized unicode message that the
- sender originally wrote. In the real world, the email package must also be
- able to deal with badly formatted messages, including messages containing
- non-ASCII characters that either have no indicated character set or are not
- valid characters in the indicated character set.
- Since email messages are *primarily* text data, and operations on message data
- are primarily text operations (except for binary payloads of course), the model
- stores all text data as unicode strings. Undecodable binary data inside text
- is handled with the `surrogateescape` error handler of the ASCII codec. Just
- as it does for the binary filenames it was originally introduced to handle,
- the error handler lets the email package "carry" the binary data received
- during parsing along until the output stage, at which time it is regenerated
- in its original form.
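The carrying mechanism can be sketched with the codec alone, followed by a round trip through the package. The example assumes the standard library's `email.message_from_bytes` function and the default (backward compatible) policy; the byte `0xE9` stands in for a non-ASCII character sent without any declared character set.

```python
import email

raw = b"Subject: caf\xe9\n\nbody\n"

# The ASCII codec cannot decode 0xE9, so surrogateescape maps it to the
# lone surrogate U+DCE9 instead of raising an error.
text = raw.decode("ascii", "surrogateescape")
carried = text[text.index("caf") + 3]

# Encoding with the same handler restores the original byte exactly.
restored = text.encode("ascii", "surrogateescape")

# The package applies the same idea: parse bytes, keep the escaped data in
# the model, and regenerate the original bytes on output.
msg = email.message_from_bytes(raw)
regenerated = msg.as_bytes()
```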
- This carried binary data is almost entirely an implementation detail. The one
- place where it is visible in the API is in the "internal" API. A Parser must
- do the `surrogateescape` encoding of binary input data, and pass that data to
- the appropriate Policy method. The "internal" interface used by the Generator
- to access header values preserves the `surrogateescaped` bytes. All other
- interfaces convert the binary data either back into bytes or into a safe form
- (losing information in some cases).
- Backward Compatibility
- ----------------------
- The :class:`~email.policy.Compat32` Policy provides backward
- compatibility with version 5.1 of the email package. It does this via the
- following implementations of the five Policy methods described above (the
- four header hooks plus ``binary_fold``):
- header_source_parse
- Splits the first line on the colon to obtain the name, discards any spaces
- after the colon, and joins the remainder of the line with all of the
- remaining lines, preserving the linesep characters to obtain the value.
- Trailing carriage return and/or linefeed characters are stripped from the
- resulting value string.
- header_store_parse
- Returns the name and value exactly as received from the application.
- header_fetch_parse
- If the value contains any `surrogateescaped` binary data, returns the value
- as a :class:`~email.header.Header` object, using the character set
- `unknown-8bit`. Otherwise just returns the value.
- fold
- Uses :class:`~email.header.Header`'s folding to fold headers in the
- same way the email 5.1 generator did.
- binary_fold
- Same as fold, but encodes to 'ascii'.
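The behaviors listed above can be checked by calling the methods directly on the `email.policy.compat32` instance. This is a sketch under one stated assumption: in the released standard library the method this document calls `binary_fold` is named `fold_binary`, and that is the name used below. The header values are illustrative.

```python
from email.header import Header
from email.policy import compat32

# header_source_parse: split on the first colon, drop the spaces after it,
# join the continuation lines, and strip the trailing line ending.
name, value = compat32.header_source_parse(
    ["Subject: a long\r\n", " subject line\r\n"]
)

# header_store_parse: name and value pass through unchanged.
stored = compat32.header_store_parse("Subject", "hello")

# header_fetch_parse: plain values come back as-is, while values carrying
# surrogateescaped bytes are wrapped in a Header object.
plain = compat32.header_fetch_parse("Subject", "hello")
escaped = b"caf\xe9".decode("ascii", "surrogateescape")
wrapped = compat32.header_fetch_parse("Subject", escaped)

# fold returns a string; fold_binary returns the ASCII-encoded equivalent.
folded = compat32.fold("Subject", "hello")
folded_bytes = compat32.fold_binary("Subject", "hello")
```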
- New Algorithm
- -------------
- header_source_parse
- Same as legacy behavior.
- header_store_parse
- Same as legacy behavior.
- header_fetch_parse
- If the value is already a header object, returns it. Otherwise, parses the
- value using the new parser, and returns the resulting object as the value.
- `surrogateescaped` bytes get turned into unicode unknown character code
- points.
- fold
- Uses the new header folding algorithm, respecting the policy settings.
- `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
- ``cte_type=7bit`` or ``8bit``. Returns a string.
- At some point there will also be a ``cte_type=unicode``, and for that
- policy fold will serialize the idealized unicode message with RFC-like
- folding, converting any `surrogateescaped` bytes into the unicode
- unknown character glyph.
- binary_fold
- Uses the new header folding algorithm, respecting the policy settings.
- `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
- ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
- Returns bytes.
- At some point there will also be a ``cte_type=unicode``, and for that
- policy binary_fold will serialize the message according to :rfc:`5335`.
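For comparison with the legacy behavior, the sketch below exercises the same methods on the standard library's `email.policy.default` instance, which is assumed here to correspond to the new algorithm. As in the released library, `fold_binary` is used as the name of the method this document calls `binary_fold`. The header value is illustrative.

```python
from email.policy import default

escaped = b"caf\xe9".decode("ascii", "surrogateescape")

# header_fetch_parse: the value is parsed into a header object, and the
# undecodable byte becomes the unicode replacement character U+FFFD.
fetched = default.header_fetch_parse("Subject", escaped)
fetched_text = str(fetched)

# A value that is already a header object is returned unchanged.
same_object = default.header_fetch_parse("Subject", fetched) is fetched

# fold_binary with the default cte_type of 8bit turns the escaped data back
# into the original byte.
eight_bit = default.fold_binary("Subject", escaped)

# With cte_type=7bit the byte is instead encoded using the unknown-8bit charset.
seven_bit = default.clone(cte_type="7bit").fold_binary("Subject", escaped)
```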