Introduction
I’m building an API wrapper for the SE API 2.0.
Currently, I’m implementing a cache feature. This wasn’t an issue until now, when I started taking concurrency into account. This is my test method:
Code
public static void TestConcurrency()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    IList<Task> tasks = new List<Task>();
    for (int i = 0; i < 1000; i++)
    {
        tasks.Add(Task.Factory.StartNew(p => client.GetAnswers(), null));
    }
    Task.WaitAll(tasks.ToArray());
    sw.Stop();
    Console.WriteLine("elapsed: {0}", sw.Elapsed.ToString());
    Console.ReadKey();
}
Description
Internally, the client has a RequestHandler class, which attempts to fetch a value from the cache, and if it fails to do so, it performs the actual request.
Code
/// <summary>
/// Checks the cache and then performs the actual request, if required.
/// </summary>
/// <typeparam name="T">The strong type of the expected API result against which to deserialize JSON.</typeparam>
/// <param name="endpoint">The API endpoint to query.</param>
/// <returns>The API response object.</returns>
private IApiResponse<T> InternalProcessing<T>(string endpoint) where T : class
{
    IApiResponse<T> result = FetchFromCache<T>(endpoint);
    return result ?? PerformRequest<T>(endpoint);
}
Description
The code that actually performs the request is irrelevant to this issue. The code that attempts to access the cache does the following:
Code
/// <summary>
/// Attempts to fetch the response object from the cache instead of directly from the API.
/// </summary>
/// <typeparam name="T">The strong type of the expected API result against which to deserialize JSON.</typeparam>
/// <param name="endpoint">The API endpoint to query.</param>
/// <returns>The API response object.</returns>
private IApiResponse<T> FetchFromCache<T>(string endpoint) where T : class
{
    IApiResponseCacheItem<T> cacheItem = Store.Get<T>(endpoint);
    if (cacheItem != null)
    {
        IApiResponse<T> result = cacheItem.Response;
        result.Source = ResultSourceEnum.Cache;
        return result;
    }
    return null;
}
Description
The actual implementation of the cache store works on a ConcurrentDictionary. When the Get<T>() method is invoked, I:
- Check whether the dictionary has an entry for endpoint.
- If it does, I verify whether or not it holds a response object.
- If it doesn’t already have a response object, the cache item’s state will be Processing, and the thread will be put to sleep for a small amount of time, waiting on the actual request being completed.
- Once or if the response is “committed” to the cache store (this happens once the request is completed), the cache item is returned.
- If the cache item was too old or the request processing timed out, the entry is removed from the store.
- If the cache didn’t have an entry for endpoint, null is pushed into the cache as the response for endpoint, signaling a request on that endpoint is being processed and there is no need to issue more requests on the same endpoint. Then null is returned, signaling an actual request should be made.
Code
/// <summary>
/// Attempts to access the internal cache and retrieve a response cache item without querying the API.
/// <para>If the endpoint is not present in the cache yet, null is returned, but the endpoint is added to the cache.</para>
/// <para>If the endpoint is present, it means the request is being processed. In this case we will wait on the processing to end before returning a result.</para>
/// </summary>
/// <typeparam name="T">The strong type of the expected API result.</typeparam>
/// <param name="endpoint">The API endpoint</param>
/// <returns>Returns an API response cache item if successful, null otherwise.</returns>
public IApiResponseCacheItem<T> Get<T>(string endpoint) where T : class
{
    IApiResponseCacheItem cacheItem;
    if (Cache.TryGetValue(endpoint, out cacheItem))
    {
        while (cacheItem.IsFresh && cacheItem.State == CacheItemStateEnum.Processing)
        {
            Thread.Sleep(10);
        }
        if (cacheItem.IsFresh && cacheItem.State == CacheItemStateEnum.Cached)
        {
            return (IApiResponseCacheItem<T>)cacheItem;
        }
        IApiResponseCacheItem value;
        Cache.TryRemove(endpoint, out value);
    }
    Push<T>(endpoint, null);
    return null;
}
The issue is that, nondeterministically, two requests sometimes make it through instead of just one, which is what the design intends.
I suspect that something that isn’t thread-safe is being accessed somewhere along the way, but I can’t identify what that might be. What could it be, or how should I debug this properly?
Update
The issue was that I wasn’t always being thread-safe on the ConcurrentDictionary.
The Push<T>() method wasn’t returning a boolean indicating whether the cache was successfully updated, so if it failed, null would have been returned twice by Get<T>().
Code
/// <summary>
/// Attempts to push API responses into the cache store.
/// </summary>
/// <typeparam name="T">The strong type of the expected API result.</typeparam>
/// <param name="endpoint">The queried API endpoint.</param>
/// <param name="response">The API response.</param>
/// <returns>True if the operation was successful, false otherwise.</returns>
public bool Push<T>(string endpoint, IApiResponse<T> response) where T : class
{
    if (endpoint.NullOrEmpty())
    {
        return false;
    }
    IApiResponseCacheItem item;
    if (Cache.TryGetValue(endpoint, out item))
    {
        ((IApiResponseCacheItem<T>)item).UpdateResponse(response);
        return true;
    }
    else
    {
        item = new ApiResponseCacheItem<T>(response);
        return Cache.TryAdd(endpoint, item);
    }
}
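To see why the return value of TryAdd matters here, the following is a minimal, self-contained sketch (not the wrapper's actual code; TryAddDemo, CountWinners and the "answers" key are made up for illustration). Many tasks race to claim the same key, and only the one for which TryAdd returns true should go on to issue the real request:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the wrapper.
public static class TryAddDemo
{
    // Returns how many of the competing tasks managed to claim the key.
    public static int CountWinners(int contenders)
    {
        var cache = new ConcurrentDictionary<string, object>();
        int winners = 0;

        Task[] tasks = Enumerable.Range(0, contenders)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                // TryAdd is atomic: for a given key it returns true for
                // exactly one caller and false for everyone else.
                if (cache.TryAdd("answers", new object()))
                {
                    Interlocked.Increment(ref winners);
                }
            }))
            .ToArray();

        Task.WaitAll(tasks);
        return winners;
    }

    public static void Main()
    {
        Console.WriteLine("winners: {0}", CountWinners(100)); // prints "winners: 1"
    }
}
```

A caller that ignores the boolean cannot tell whether it won or lost the race, which is how two threads can both conclude that they should perform the request.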
Description
The solution was to implement the return value and to change Get<T>() by adding this:
Code
if (Push<T>(endpoint, null) || retries > 1) // max retries for sanity.
{
    return null;
}
else
{
    return Get<T>(endpoint, ++retries); // retry push.
}
A ConcurrentDictionary is thread-safe, but that doesn’t automatically make your code thread-safe. The above snippet is the core of the problem. Two threads could call the Get() method at the same time and both get a null. They’ll both continue on and call PerformRequest() concurrently. You’ll need to merge InternalProcessing() and FetchFromCache() and ensure that only one thread can call PerformRequest() by using a lock. That might produce poor concurrency; perhaps you could just drop a duplicate response. In all likelihood, the requests get serialized by the SE server anyway, so it probably doesn’t matter.
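As a rough sketch of the lock-based approach, assuming the cache check and the request are merged into one method: all names here (LockedRequestHandler, Process, LockDemo) are hypothetical, plain strings stand in for IApiResponse<T>, and a delegate stands in for PerformRequest().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the merged InternalProcessing()/FetchFromCache() logic.
public class LockedRequestHandler
{
    private readonly object gate = new object();
    private readonly Dictionary<string, string> cache = new Dictionary<string, string>();
    private readonly Func<string, string> performRequest;

    public LockedRequestHandler(Func<string, string> performRequest)
    {
        this.performRequest = performRequest;
    }

    public string Process(string endpoint)
    {
        // The cache check and the request happen under one lock, so a second
        // thread cannot observe "not cached" while the first is still fetching.
        lock (gate)
        {
            string cached;
            if (cache.TryGetValue(endpoint, out cached))
            {
                return cached;
            }
            string response = performRequest(endpoint);
            cache[endpoint] = response;
            return response;
        }
    }
}

public static class LockDemo
{
    // Fires the given number of concurrent calls and returns how many real requests ran.
    public static int CountRequests(int callers)
    {
        int requests = 0;
        var handler = new LockedRequestHandler(endpoint =>
        {
            Interlocked.Increment(ref requests);
            Thread.Sleep(20); // simulate network latency
            return "response for " + endpoint;
        });

        Task[] tasks = Enumerable.Range(0, callers)
            .Select(_ => Task.Factory.StartNew(() => { handler.Process("answers"); }))
            .ToArray();
        Task.WaitAll(tasks);
        return requests;
    }

    public static void Main()
    {
        Console.WriteLine("requests: {0}", CountRequests(1000)); // prints "requests: 1"
    }
}
```

Note that the single lock in this sketch serializes requests for every endpoint, not just duplicates of the same one, which is the poor concurrency mentioned above. A lock per endpoint would reduce that cost, at the price of more bookkeeping.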