Yet another How-to-free-memory question:
I’m copying data between two databases which are currently identical but will soon be getting out of sync. I have put together a skeleton app in C# using Reflection and ADO.Net Entities that does this:
For each table in the source database:
- Clear the corresponding table in the destination database
- For each object in the source table
  - For each property in the source object
    - If an identically-named property exists in the destination object, use Reflection to copy the source property to the destination property
This works great until I get to the big 900MB table that has user-uploaded files in it.
The process of copying the blobs (up to 7 MB each) to my machine and back to the destination database uses up local memory. However, that memory isn’t getting freed, and the process dies once it’s copied about 750 MB worth of data. My program has 1500 MB of allocated space when the OutOfMemoryException is thrown, presumably two copies of everything that it’s copied so far.
I tried a naive approach first, doing a simple copy. It worked on every table until I got to the big one. I have tried forcing a GC.Collect() with no obvious change to the results. I’ve also tried putting the actual copy into a separate function in hopes that the reference going out of scope would help it get GCed. I even put a Thread.Sleep in to try to give background processes more time to run. All of these have had no effect.
Here’s the relevant code as it exists right now:
    public static void CopyFrom<TSource, TDest>(this ObjectSet<TDest> Dest, ObjectSet<TSource> Source, bool SaveChanges, ObjectContext context)
        where TSource : class
        where TDest : class {
        int total = Source.Count();
        int count = 0;
        foreach (var src in Source) {
            count++;
            CopyObject(src, Dest);
            if (SaveChanges && context != null) {
                context.SaveChanges();
                GC.Collect();
                if (count % 100 == 0) {
                    Thread.Sleep(2000);
                }
            }
        }
    }
I didn’t include the CopyObject() function. It just uses reflection to read the properties of src and copy them into identically-named properties on a new object that is appended to Dest.
SaveChanges is a Boolean parameter saying that this extra processing should be done. It’s only true for the big table and false otherwise.
So, my question: How can I modify this code to not run me out of memory?
The problem is that your database context does a lot of caching internally, and it’s holding onto a lot of your information and preventing the garbage collector from freeing it (whether you call GC.Collect() or not). This means that your context is defined at too high a scope. (It appears, based on your edit, that you’re using it across tables. That’s…not good.) You haven’t shown where it is defined, but wherever it is, it should probably be at a lower level. Keep in mind that, because of connection pooling, creating new contexts is not expensive, and based on your use case you shouldn’t need to rely on much of the cached info (because you’re not touching items more than once), so frequently creating new contexts shouldn’t add performance costs, even though it substantially decreases your memory footprint.
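As a rough illustration of "a lower level", here is a minimal sketch that creates a fresh context for every small batch of rows and disposes it as soon as the batch is saved. It is deliberately generic and not tied to your model: `BatchCopy`, `CopyInBatches`, `createContext`, `addToContext` and `saveChanges` are hypothetical names I made up for this example, and the only thing assumed about the context type is that it is disposable. Treat it as a starting point, not a drop-in replacement for your `CopyFrom`.

```csharp
using System;
using System.Collections.Generic;

public static class BatchCopy
{
    // Copies items in batches. Each batch gets its own short-lived context,
    // which is saved and disposed before the next batch starts, so whatever
    // that context cached is no longer referenced and can be collected.
    // Returns the number of contexts that were created.
    public static int CopyInBatches<TContext, TItem>(
        IEnumerable<TItem> source,
        Func<TContext> createContext,           // hypothetical: builds a new context
        Action<TContext, TItem> addToContext,   // hypothetical: copies one item into it
        Action<TContext> saveChanges,           // hypothetical: flushes the batch
        int batchSize)
        where TContext : class, IDisposable
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        int contextsCreated = 0;
        int inBatch = 0;
        TContext context = null;
        try
        {
            foreach (var item in source)
            {
                // Lazily open a context only when there is something to copy.
                if (context == null)
                {
                    context = createContext();
                    contextsCreated++;
                }

                addToContext(context, item);
                inBatch++;

                if (inBatch == batchSize)
                {
                    // Flush the full batch, then drop the context and its cache.
                    saveChanges(context);
                    context.Dispose();
                    context = null;
                    inBatch = 0;
                }
            }

            // Flush the final, partially filled batch if there is one.
            if (context != null)
                saveChanges(context);
        }
        finally
        {
            // Dispose the last context even if copying or saving threw.
            if (context != null)
                context.Dispose();
        }
        return contextsCreated;
    }
}
```

In your case, `createContext` would presumably construct your destination context, `addToContext` would call your `CopyObject()` against that context's object set, and `saveChanges` would call `SaveChanges()` on it. With blobs of up to 7 MB each, a small batch size (possibly even 1) keeps the amount held per context low.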