Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6349993
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 24, 20262026-05-24T21:43:35+00:00 2026-05-24T21:43:35+00:00

I’m trying to move a RDBMS model over to Cassandra, and having a hard

  • 0

I’m trying to move a RDBMS model over to Cassandra, and having a hard time creating the schema. Here is my data model:

CREATE TABLE Domain (
    ID INT NOT NULL PRIMARY KEY,
    DomainName NVARCHAR(74) NOT NULL,
    HasBadWords BIT,
    ...
);
INSERT INTO Domain (DomainName, HasBadWords) VALUES ('domain1.com', 0);
INSERT INTO Domain (DomainName, HasBadWords) VALUES ('domain2.com', 0);

CREATE TABLE ZoneFile (
    ID INT NOT NULL PRIMARY KEY,
    DomainID INT NOT NULL,
    Available BIT NOT NULL,
    Nameservers NVARCHAR(MAX),
    Timestamp DATETIME NOT NULL
);
INSERT INTO ZoneFile (DomainID, Available, Nameservers, Timestamp) VALUES (1, 0, "ns1", '2010-01-01');
INSERT INTO ZoneFile (DomainID, Available, Nameservers, Timestamp) VALUES (2, 0, "ns1", '2010-01-01');
INSERT INTO ZoneFile (DomainID, Available, Nameservers, Timestamp) VALUES (1, 1, "ns2", '2011-01-01');
INSERT INTO ZoneFile (DomainID, Available, Nameservers, Timestamp) VALUES (2, 1, "ns2", '2011-01-01');

CREATE TABLE Backlinks (
    ID INT NOT NULL PRIMARY KEY,
    DomainID INT NOT NULL,
    Backlinks INT NOT NULL,
    Indexed INT NOT NULL,
    Timestamp DATETIME NOT NULL
);
INSERT INTO Backlinks (DomainID, Backlinks, Indexed, Timestamp) VALUES (1, 100, 200, '2010-01-01');
INSERT INTO Backlinks (DomainID, Backlinks, Indexed, Timestamp) VALUES (2, 300, 600, '2010-01-01');
INSERT INTO Backlinks (DomainID, Backlinks, Indexed, Timestamp) VALUES (1, 500, 1000, '2010-01-01');
INSERT INTO Backlinks (DomainID, Backlinks, Indexed, Timestamp) VALUES (2, 600, 1200, '2010-01-01');

From this, I’ve deduced that I can probably have one Keyspace: DomainData. In this keyspace, I can have a columnfamily called “Domain” which is like my Domain table in sql:

"Domain" : { //ColumnFamily
    "domain1.com" : { "HasBadWords" : 0 }, //SuperColumn
    "domain2.com" : { "HasBadWords" : 0 }  //SuperColumn
}

The next tables are where I start getting confused. ZoneFile and Backlinks are essentially supposed to store a history of results from looking up these values for each domain. So, one Domain to Many ZoneFile records. For querying purposes, I want to be able to easily get the ‘newest’ ZoneFile record, or a given Domain. I will need to do the same for Backlinks.

I was considering something like this, and doing a range lookup on the key for the domain, and then getting the ‘last’ record which should be the newest timestamp…

"ZoneFiles" : { //ColumnFamily
    "domain1.com:2010-01-01 12:00:00.000" : { "Available" : 0, "Nameservers" : "ns1" }, //SuperColumn
    "domain1.com:2011-01-01 12:00:00.000" : { "Available" : 1, "Nameservers" : "ns2" }, //SuperColumn
    "domain2.com:2010-01-01 12:00:00.000" : { "Available" : 0, "Nameservers" : "ns1" }, //SuperColumn
    "domain2.com:2011-01-01 12:00:00.000" : { "Available" : 1, "Nameservers" : "ns2" }  //SuperColumn
}

I’m not convinced this is the right answer, the combination of a string domain and string datetime in a key feels wrong. Could someone point me in the right direction?

EDIT:

Assuming I use:

"ZoneFiles" : {
  "domain1.com" : {
    timestamp1 : "{\"available\":1,\"nameservers\":\"ns1\"}",
    timestamp2 : "{\"available\":1,\"nameservers\":\"ns1\"}",
  }
}

How would I query a list of domain rows where the newest timestamp is older than a given date?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-24T21:43:37+00:00Added an answer on May 24, 2026 at 9:43 pm

    If I understand your question correctly, the only query you want to do on this model is “please get me the latest zonefile or backlinks for a given domain” ?

    If thats the case, I would store the latest values for these in the “Domain” column family, under the domains row key, in separate columns. I would also store when this latest value was updated (the timestamp). Every time you get new values for the info in zonefile and backlinks, I would just overwrite the value in the “Domain” column family and update the timestamp.

    I assume you are also keeping this historical data so you can query it, and I assume the kind of query will be “show me all the updates for a given domain between two times” (is this correct?). If so, I wouldn’t manually construct a composite row key like that, since it will require you to use the Order Preserving Partitioner to get the correct results from get_range_slices. And as you probably know, load balancing with the OPP can be a difficult task.

    Instead, I would have the row key be domain id, and the column key be the timestamp of the update. Then you can either pack you updates into a single value (eg using json), use super columns or use the new composite keys in 0.8. If done like this, you can use a get_slice to satisfy your query, and it will behave correctly with the Random Partitioner, making load balancing much easier.

    Tom Wilkie | Acunu | http://www.acunu.com | @tom_wilkie

    Reply to comment: “how would I query a list of domains that’s most recent zonefile timestamp column is older than a given timestamp?”

    You could do that by inserting into another column family:

    row key: day (or hour, or some other reasonable 'bucketing') 
    column key: timestamp of update 
    value: domain
    

    …every time you update the zonefile. Then, to get the most recently updated domains since t, do:

    result = []
    for i in day(t) ... day(now):
        result.extend(get_slice(i, range(t, '')))
    

    This would require you to remove repeat entries from result, so would only work best when t is pretty recent. You also have to consider the load balancing for the writes, which would focus all the load on a single server (since, at any one time, you are inserting into only one row)

    If these trade offs aren’t appropriate, then you could look at the hadoop integrations and use that to perform this query. Or you could make other tradeoff (use the OPP, or do a read before a write to remove the duplicates, which would be v. slow)

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Basically, what I'm trying to create is a page of div tags, each has
link Im having trouble converting the html entites into html characters, (&# 8217;) i
I'm trying to decode HTML entries from here NYTimes.com and I cannot figure out
I'm trying to create an if statement in PHP that prevents a single post
I am trying to understand how to use SyndicationItem to display feed which is
I have a string like this: La Torre Eiffel paragonata all’Everest What PHP function
I am trying to render a haml file in a javascript response like so:
I'm parsing an RSS feed that has an ’ in it. SimpleXML turns this
We're building an app, our first using Rails 3, and we're having to build
I want to construct a data frame in an Rcpp function, but when I

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.