Well, Maybe 5M is not that much, but it needs to receive a XML based on the following schema
http://www.sat.gob.mx/sitio_internet/cfd/3/cfdv3.xsd
Therefore I need to save almost all the information per row. Now by law we are required to save the information for a very long time and eventually this database will be very very veeeeery big.
Maybe create a table every day? something like _invoices_16_07_2012.
Well, I’m lost..I have no idea how to do this, but I know is possible.
On top of that, I need to create a PDF and 2 more files based on each XML and keep them on HD.
And you should be able to retrieve your files quickly using a web site.
Thats a lot of data to put into one field in a single row (not sure if that was something you were thinking about doing).
Write a script to parse the xml object and save each value from the xml in a separate field or in a way that makes sense for you (so you’ll have to create a table with all the appropriate fields). You should be able to input your data as one row per xml sheet.
You’ll also want to shard your database and spread it across a cluster of servers on many tables. MySQL does support this but I’ve only boostrapped my own sharding mechanism before.
Do not create a table per XML sheet as that is overkill.
Now, why do you need mysql for this? Are you querying the data in the XML? If you’re storing this data simply for archival purposes, you don’t need mysql, but can instead compress the files into, say, a tarball and store them directly on disk. Your website can easily fetch the file in this way.
If you do need a big data store that can handle 5M transactions with as much data as you’re saying, you might also want to look into something like Hadoop and store the data in a Distributed File System. If you want to more easily query your data, look into HBase which can run on top of Hadoop.
Hope this helps.