I have a scenario where multiple users will be doing import processes, but all of them will be working for different clients.
I have one core table which gets the most hits whenever import processes run. I have 2 options now
- To have one core table and do the sequential imports by making queue for the import processes.
- To have 300 core table, one for each client, it will allow the users to work on the import processes simultaneously without waiting for one another.
Can anyone suggest which one is better and why?
I am giving my requirements in more detailed this time. Can you all once again have a look at and provide your comments after going through the requirements.
The query is regarding data modeling for core functionality of my application.
I have a scenario where multiple users will be doing import processes, but all of them will be working for different clients. Also, at the same time client’s data could be shown to the user and can be modified/inserted too, while the import process for the same or different client is in process.
I have 2 core tables which get the most hits whenever import processes run.
I have 2 options now
1. To have 2 core tables and do the sequential imports by making queue for the import processes.
Table 1
ID
ClientID
SourceID
Count
AnotherCol1
AnotherCol2
AnotherCol3
Table 2
ID
ClientID
OrderID
Count
AnotherCol4
AnotherCol5
AnotherCol6
- To have 1000 core table, 2 for each client (I may have maximum 500 clients), it will allow the users to work on the import processes simultaneously without waiting for one another.
More information about the import process:
1. These table is not going to be used in any Reporting.
2. Each import process will insert 20k-30k records (7 columns) in these each table. And there will be around 40-50 such imports in a day.
3. While the import process is going on, data could be retrieved from these tables by some other user and INSERT OR UPDATED too.
4. These are going to be one of the most usable tables in the application.
5. BULK INSERT will be used for insertion.
6. Clustered index is on the Primary Key which is an Identity column.
7. We are considering the table partitioning too.
Can you please suggest which option is better and why?
Also, if you suggest to go with option 2, then would it not be a performance hit to create so many tables in the database? Should we create a separate database for these 1000 tables in this case?
I am giving my requirements in more detailed this time. Can you all once again have a look at and provide your comments after going through the requirements.
The query is regarding data modeling for core functionality of my application.
I have a scenario where multiple users will be doing import processes, but all of them will be working for different clients. Also, at the same time client’s data could be shown to the user and can be modified/inserted too, while the import process for the same or different client is in process.
I have 2 core tables which get the most hits whenever import processes run.
I have 2 options now
1. To have 2 core tables and do the sequential imports by making queue for the import processes.
Table 1
ID
ClientID
SourceID
Count
AnotherCol1
AnotherCol2
AnotherCol3
Table 2
ID
ClientID
OrderID
Count
AnotherCol4
AnotherCol5
AnotherCol6
- To have 1000 core table, 2 for each client (I may have maximum 500 clients), it will allow the users to work on the import processes simultaneously without waiting for one another.
More information about the import process:
1. These table is not going to be used in any Reporting.
2. Each import process will insert 20k-30k records (7 columns) in these each table. And there will be around 40-50 such imports in a day.
3. While the import process is going on, data could be retrieved from these tables by some other user and INSERT OR UPDATED too.
4. These are going to be one of the most usable tables in the application.
5. BULK INSERT will be used for insertion.
6. Clustered index is on the Primary Key which is an Identity column.
7. We are considering the table partitioning too.
Can you please suggest which option is better and why?
Also, if you suggest to go with option 2, then would it not be a performance hit to create so many tables in the database? Should we create a separate database for these 1000 tables in this case?
Another option is a third scenario where you have a single table, but you still can do the imports in parallel by having a batch identifier in the table which stops people stepping on each other.
The main problem with having multiple people in the same table is that you cannot do things like TRUNCATE.
For me the decision would be related to about where the data eventually goes. Is this just a staging table for convenience because there is going to be some SQL for transform or lookup run against it after load? Would it be possible to make such tables in a separate database or schema and with unique names which would make it easy for them to be cleaned up without interfering with or bloating the transaction log in your primary database> Do you need to insert in bulk, then apply indexes and eventually drop the table? Is such a table even necessary if you are using SSIS to load the data, you can often do a lot of work in the pipeline without needing a staging table?
All these things would play into my decision making process on the architecture.