I’m engaged in developing a turn-based casual MMORPG game server.
The low level engine(NOT written by us) which handle networking,
multi-threading, timer, inter-server communication, main game loop etc, was
written by C++. The high level game logic was written by Python.
My question is about the data model design in our game.
At first we simply try to load all data of a player into RAM and a shared data
cache server when client login and schedule a timer periodically flush data into
data cache server and data cache server will persist data into database.
But we found this approach has some problems
1) Some data needs to be saved or checked instantly, such as quest progress,
level up, item & money gain etc.
2) According to game logic, sometimes we need to query some offline player’s
data.
3) Some global game world data needs to be shared between different game
instances which may be running on a different host or a different process on the
same host. This is the main reason we need a data cache server sits between game
logic server and database.
4) Player needs freely switch between game instances.
Below is the difficulty we encountered in the past:
1) All data access operation should be asynchronized to avoid network I/O
blocking the main game logic thread. We have to send message to database or
cache server and then handle data reply message in callback function and
continue proceed game logic. It quickly become painful to write some moderate
complex game logic that needs to talk several times with db and the game logic
is scattered in many callback functions makes it hard to understand and
maintain.
2) The ad-hoc data cache server makes things more complex, we hard to maintain
data consistence and effectively update/load/refresh data.
3) In-game data query is inefficient and cumbersome, game logic need to query
many information such as inventory, item info, avatar state etc. Some
transaction machanism is also needed, for example, if one step failed the entire
operation should be rollback. We try to design a good data model system in RAM,
building a lot of complex indexs to ease numerous information query, adding
transaction support etc. Quickly I realized what we are building is a in-memory
database system, we are reinventing the wheel…
Finally I turn to the stackless python, we removed the cache server. All data
are saved in database. Game logic server directly query database. With stackless
python’s micro tasklet and channel, we can write game logic in a synchronized
way. It is far more easy to write and understand and productivity greatly
improved.
In fact, the underlying DB access is also asynchronized: One client tasklet
issue request to another dedicate DB I/O worker thread and the tasklet is
blocked on a channel, but the entire main game logic is not blocked, other
client’s tasklet will be scheduled and run freely. When DB data reply the
blocked tasklet will be waken up and continue to run on the ‘break
point'(continuation?).
With above design, I have some questions:
1) The DB access will be more frequently than previous cached solution, does the
DB can support high frequent query/update operation? Does some mature cache
solution such as redis, memcached is needed in near future?
2) Are there any serious pitfalls in my design? Can you guys give me some better
suggestions, especially on in-game data management pattern.
Any suggestion would be appreciated, thanks.
I’ve worked with one MMO engine that operated in a somewhat similar fashion. It was written in Java, however, not Python.
With regards to your first set of points:
1) async db access We actually went the other route, and avoided having a “main game logic thread.” All game logic tasks were spawned as new threads. The overhead of thread creation and destruction was completely lost in the noise floor compared to I/O. This also preserved the semantics of having each “task” as a reasonably straightforward method, instead of the maddening chain of callbacks that one otherwise ends up with (although there were still cases of this.) It also meant that all game code had to be concurrent, and we grew increasingly reliant upon immutable data objects with timestamps.
2) ad-hoc cache We employed a lot of WeakReference objects (I believe Python has a similar concept?), and also made use of a split between the data objects, e.g. “Player”, and the “loader” (actually database access methods) e.g. “PlayerSQLLoader;” the instances kept a pointer to their Loader, and the Loaders were called by a global “factory” class that would handle cache lookups versus network or SQL loads. Every “Setter” method in a data class would call the method
changed, which was an inherited boilerplate formyLoader.changed (this);In order to handle loading objects from other active servers, we employed “proxy” objects that used the same data class (again, say, “Player,”) but the Loader class we associated was a network proxy that would (synchronously, but over gigabit local network) update the “master” copy of that object on another server; in turn, the “master” copy would call
changeditself.Our SQL
UPDATElogic had a timer. If the backend database had received anUPDATEof the object within the last ($n) seconds (we typically kept this around 5), it would instead add the object to a “dirty list.” A background timer task would periodically wake and attempt to flush any objects still on the “dirty list” to the database backend asynchronously.Since the global factory maintained WeakReferences to all in-core objects, and would look for a single instantiated copy of a given game object on any live server, we would never attempt to instantiate a second copy of one game object backed by a single DB record, so the fact that the in-RAM state of the game might differ from the SQL image of it for up to 5 or 10 seconds at a time was inconsequential.
Our entire SQL system ran in RAM (yes, a lot of RAM) as a mirror to another server who tried valiantly to write to disc. (That poor machine burned out RAID drives on average of once every 3-4 months due to “old age.” RAID is good.)
Notably, the objects had to be flushed to database when being removed from cache, e.g. due to exceeding the cache RAM allowance.
3) in-memory database … I hadn’t run across this precise situation. We did have “transaction-like” logic, but it all occurred on the level of Java getters/setters.
And, in regards to your latter points:
1) Yes, PostgreSQL and MySQL in particular deal well with this, particularly when you use a RAMdisk mirror of the database to attempt to minimize actual HDD wear and tear. In my experience, MMO’s do tend to hammer the database more than is strictly necessary, however. Our “5 second rule”* was built specifically to avoid having to solve the problem “correctly.” Each of our setters would call
changed. In our usage pattern, we found that an object typically had either 1 field changed, and then no activity for some time, or else had a “storm” of updates happen, where many fields changed in a row. Building proper transactions or so (e.g. informing the object that it was about to accept many writes, and should wait for a moment before saving itself to the DB) would have involved more planning, logic, and major rewrites of the system; so, instead, we bypassed the situation.2) Well, there’s my design above 🙂
In point of fact, the MMO engine I’m presently working on uses even more reliance upon in-RAM SQL databases, and (I hope) will be doing so a bit better. However, that system is being built using an Entity-Component-System model, rather than the OOP model that I described above.
If you already are based on an OOP model, shifting to ECS is a pretty paradigm shift and, if you can make OOP work for your purposes, it’s probably better to stick with what your team already knows.
*- “the 5 second rule” is a colloquial US “folk belief” that after dropping food on the floor, it’s still OK to eat it if you pick it up within 5 seconds.