The Archive Base

Editorial Team
Asked: June 14, 2026 (2026-06-14T20:06:52+00:00)

Introduction

Background

I’m writing a script to upload stuff including files using the multipart/form-data content type defined in RFC 2388. In the long run, I’m trying to provide a simple Python script to do uploads of binary packages for github, which involves sending form-like data to Amazon S3.

Related

This question already asks how to do this, but it has no accepted answer so far, and the more useful of its two current answers points to these recipes, which in turn build the whole message manually. I am somewhat concerned about this approach, particularly with regard to charsets and binary content.

There is also this question, whose currently highest-scoring answer suggests the MultipartPostHandler module. But that is not much different from the recipes I mentioned, and therefore my concerns apply to that as well.

Concerns

Binary content

RFC 2388 Section 4.3 explicitly states that content is expected to be 7-bit unless declared otherwise, and therefore a Content-Transfer-Encoding header might be required. Does that mean I’d have to Base64-encode binary file content? Or would Content-Transfer-Encoding: 8bit be sufficient for arbitrary files? Or should that read Content-Transfer-Encoding: binary?

Charset for header fields

Header fields in general, and the filename header field in particular, are ASCII only by default. I’d like my method to be able to pass non-ASCII file names as well. I know that for my current application of uploading stuff for github, I probably won’t need that as the file name is given in a separate field. But I’d like my code to be reusable, so I’d rather encode the file name parameter in a conforming way. RFC 2388 Section 4.4 advises the format introduced in RFC 2231, e.g. filename*=utf-8''t%C3%A4st.txt.

My approach

Using python libraries

As multipart/form-data is essentially a MIME type, I thought that it should be possible to use the email package from the Python standard library to compose my post. In particular, I’d like to delegate the rather complicated handling of non-ASCII header fields.

Work so far

So I wrote the following code:

#!/usr/bin/python3.2

import email.generator
import email.mime.application
import email.mime.multipart
import email.mime.text
import io
import sys

class FormData(email.mime.multipart.MIMEMultipart):
    """A multipart/form-data message assembled with the email package."""

    def __init__(self):
        email.mime.multipart.MIMEMultipart.__init__(self, 'form-data')

    def setText(self, name, value):
        # Plain text field; the utf-8 charset makes the part base64-encoded.
        part = email.mime.text.MIMEText(value, _charset='utf-8')
        part.add_header('Content-Disposition', 'form-data', name=name)
        self.attach(part)
        return part

    def setFile(self, name, value, filename, mimetype=None):
        # File field; value is the raw content as bytes.
        part = email.mime.application.MIMEApplication(value)
        part.add_header('Content-Disposition', 'form-data',
                        name=name, filename=filename)
        if mimetype is not None:
            part.set_type(mimetype)
        self.attach(part)
        return part

    def http_body(self):
        # Serialize with CRLF line endings and without header wrapping.
        b = io.BytesIO()
        gen = email.generator.BytesGenerator(b, mangle_from_=False,
                                             maxheaderlen=0)
        gen.flatten(self, unixfrom=False, linesep='\r\n')
        b.write(b'\r\n')
        data = b.getvalue()
        # Drop the top-level message headers: everything up to and
        # including the first blank line is not part of the HTTP body.
        pos = data.find(b'\r\n\r\n')
        assert pos >= 0
        return data[pos + 4:]

fd = FormData()
fd.setText('foo', 'bar')
fd.setText('täst', 'Täst')
fd.setFile('file', b'abcdef'*50, 'Täst.txt')
sys.stdout.buffer.write(fd.http_body())

The result looks like this:

--===============6469538197104697019==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: form-data; name="foo"

YmFy

--===============6469538197104697019==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: form-data; name*=utf-8''t%C3%A4st

VMOkc3Q=

--===============6469538197104697019==
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: form-data; name="file"; filename*=utf-8''T%C3%A4st.txt

YWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJj
ZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVm
YWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJj
ZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVm
YWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJj
ZGVmYWJjZGVmYWJjZGVm

--===============6469538197104697019==--

It does seem to handle headers reasonably well. Binary file content gets Base64-encoded, which might be avoidable but should work well enough. What worries me are the text fields in between: they are Base64-encoded as well. I think that according to the standard this should work, but I’d rather have plain text in there, just in case some dumb framework has to deal with the data at an intermediate level and does not know about Base64-encoded data.
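As a quick sanity check on the output above, the following sketch decodes the two text field bodies and measures what Base64 costs for the 300-byte file payload. It only uses the standard base64 module and the values already shown in the example output.

```python
import base64

# The two text field bodies from the generated output.
foo_value = base64.b64decode("YmFy")
taest_value = base64.b64decode("VMOkc3Q=").decode("utf-8")

# The file payload from the example script and its Base64 form.
payload = b"abcdef" * 50
encoded = base64.b64encode(payload)

print(foo_value, taest_value)
print(len(payload), len(encoded))  # Base64 emits 4 characters per 3 input bytes
```

The encoded form is a third larger than the raw bytes, before counting the line breaks the email package inserts every 76 characters.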

Questions

  • Can I use 8 bit data for my text fields and still conform to the specification?
  • Can I get the email package to serialize my text fields as 8 bit data without extra encoding?
  • If I have to stick to some 7 bit encoding, can I get the implementation to use quoted printable for those text parts where that encoding is shorter than base64?
  • Can I avoid base64 encoding for binary file content as well?
  • If I can avoid it, should I write the Content-Transfer-Encoding as 8bit or as binary?
  • If I had to serialize the body myself, how could I use the email.header package on its own to just format header values? (email.utils.encode_rfc2231 does this.)
  • Is there some implementation that already did all I’m trying to do?

These questions are very closely related, and could be summarized as “how would you implement this”. In many cases, answering one question either answers or obsoletes another one. So I hope you agree that a single post for all of them is appropriate.

1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 8:06 pm (2026-06-14T20:06:53+00:00)

    This is a placeholder answer, describing what I did while waiting for some authoritative input to some of my questions. I’ll be happy to accept a different answer if it demonstrates that this approach is wrong or unsuitable in at least one of the design decisions.

    Here is the code I used to make this work according to my taste for now.
    I made the following decisions:

    Can I use 8 bit data for my text fields and still conform to the specification?

    I decided to do so. At least for this application, it does work.

    Can I get the email package to serialize my text fields as 8 bit data without extra encoding?

    I found no way, so I’m doing my own serialization, just as all the other recipes I saw on this.

    Can I avoid base64 encoding for binary file content as well?

    Simply sending the file content in binary seems to work well enough, at least in my single application.

    If I can avoid it, should I write the Content-Transfer-Encoding as 8bit or as binary?

    Since RFC 2045 Section 2.8 states that 8bit data is subject to a line length limitation of 998 octets between CRLF pairs, I decided that binary is the more general and thus the more appropriate description here.
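The decisions so far (own serialization, raw bytes in the body, Content-Transfer-Encoding: binary) can be sketched roughly as follows. This is not the code linked above, only a minimal illustration: the function name encode_multipart, its argument layout, and the random uuid-based boundary are assumptions, and names and file names are assumed to be plain ASCII without quotes.

```python
import uuid

def encode_multipart(fields, files):
    """Hypothetical helper: build a multipart/form-data body by hand.

    fields: iterable of (name, text) pairs
    files:  iterable of (name, filename, data_bytes, mimetype) tuples
    Returns (content_type_header_value, body_bytes).
    """
    # Assumption: a random boundary will not occur inside the payload.
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, text in fields:
        disposition = 'Content-Disposition: form-data; name="{}"'.format(name)
        lines += [
            delimiter,
            disposition.encode("ascii"),
            b"Content-Type: text/plain; charset=utf-8",
            b"Content-Transfer-Encoding: binary",
            b"",
            text.encode("utf-8"),  # sent as 8-bit data, no Base64
        ]
    for name, filename, data, mimetype in files:
        disposition = 'Content-Disposition: form-data; name="{}"; filename="{}"'.format(
            name, filename)
        lines += [
            delimiter,
            disposition.encode("ascii"),
            "Content-Type: {}".format(mimetype).encode("ascii"),
            b"Content-Transfer-Encoding: binary",
            b"",
            data,  # raw bytes, unmodified
        ]
    lines += [delimiter + b"--", b""]
    body = b"\r\n".join(lines)
    return "multipart/form-data; boundary=" + boundary, body

content_type, body = encode_multipart(
    [("foo", "bar"), ("note", "T\u00e4st")],
    [("file", "data.bin", bytes(range(256)), "application/octet-stream")],
)
```

The returned content_type goes into the Content-Type header of the HTTP request, and body is sent as the request body as is.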

    If I had to serialize the body myself, how could I use the email.header package on its own to just format header values?

    As already edited into my question, email.utils.encode_rfc2231 is very useful for this. I try to encode using ASCII first, but fall back to that function for non-ASCII data or for ASCII characters which are forbidden inside a double-quoted string.
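A minimal sketch of that fallback, assuming a hypothetical helper named format_param. Which ASCII characters count as forbidden inside a double-quoted string is also an assumption here: the sketch rejects the double quote, the backslash, and control characters.

```python
import email.utils

def format_param(name, value):
    """Hypothetical helper: format one Content-Disposition parameter."""
    # Assumption: printable ASCII other than '"' and '\' is safe in quotes.
    plain = all(32 <= ord(c) < 127 and c not in '"\\' for c in value)
    if plain:
        return '{}="{}"'.format(name, value)
    # Otherwise use the RFC 2231 extended form: name*=charset'language'value
    return "{}*={}".format(name, email.utils.encode_rfc2231(value, "utf-8"))

print(format_param("name", "file"))
print(format_param("filename", "T\u00e4st.txt"))
```

For the file name from the question, this yields the same filename*=utf-8''T%C3%A4st.txt parameter that the email package generated.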

    Is there some implementation that already did all I’m trying to do?

    Not that I’m aware of. Other implementations are invited to adopt ideas from my code, though.


    Edit:

    Thanks to this comment I’m now aware that the use of RFC 2231 for headers is not universally accepted: the current draft of HTML 5 forbids its use. It has also been seen to cause problems in the wild. But since POST headers do not always correspond to a specific HTML document (think of web APIs, for example), I’m not sure I’d trust that draft in that regard either. Perhaps the right way to go is to give both an encoded and an unencoded name, the way RFC 5987 Section 4.2 suggests. But that RFC is for HTTP headers, while a multipart/form-data header is technically part of the HTTP body. That RFC therefore doesn’t apply, and I do not know of any RFC which would explicitly allow (or even encourage) the use of both forms simultaneously for multipart/form-data.
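For illustration only, this is roughly what giving both forms would look like if the RFC 5987 Section 4.2 pattern were borrowed for a multipart/form-data part. As said above, no RFC I know of sanctions this for multipart/form-data. The helper name filename_params and the way the ASCII fallback is derived (stripping accents, replacing everything else unsafe with an underscore) are assumptions.

```python
import email.utils
import unicodedata

def filename_params(filename):
    """Hypothetical helper: emit both filename= and filename*= parameters."""
    # Assumed fallback rule: decompose accented letters, drop the combining
    # marks, and replace any remaining unsafe character with "_".
    decomposed = unicodedata.normalize("NFKD", filename)
    fallback = "".join(
        c if 32 <= ord(c) < 127 and c not in '"\\' else "_"
        for c in decomposed
        if not unicodedata.combining(c)
    )
    extended = email.utils.encode_rfc2231(filename, "utf-8")
    return 'filename="{}"; filename*={}'.format(fallback, extended)

print(filename_params("T\u00e4st.txt"))
```

A receiver that understands the extended form can prefer filename*, while one that does not still sees a usable ASCII name.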

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base