Editorial Team
Asked: June 7, 2026

I have started using binary trees in C++, and I must say I really like the idea. Things are clear to me until I think about storing the data on disk in a layout that lets me later read a chunk of it instantly.
So far I have kept everything (the nodes) in RAM, but that is only a toy, not a real-life app. I am not interested in dumping the whole binary tree to disk, since that would be useless: you would have to read it all back into memory again. What I am after is a method like the one MySQL uses, for example.
I haven't found any good article on this, so I would appreciate it if someone could include some URLs or books.

1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 1:10 am

    The main difference between a B-tree and a B+tree:
    – In a B+tree all the record pointers live in the leaf nodes, and the leaf nodes are linked, for a fast lookup followed by sequential reads. The links can point ascending, descending, or both (as I saw in one IBM DB).

    • You should write it to disk; if the table or file grows, you will have memory problems.
      (SEEK operations on files ARE REALLY FAST. You can also create a 1 GB file on disk in less than 1 second, e.g. with the C# FileStream method SetLength.)

    • If you have multiple readers/writers, you need concurrency control over the index and the table (or file)... Are you going to do that in memory? If a power failure occurs, how do you roll back? Yeah, you don't.

    E.g.: field f1 is indexed.

    WHERE 1=1 (no need to access the b+tree: give me everything, and the order is irrelevant)

    WHERE 1=1 ORDER BY f1 ASC/DESC (needs the b+tree: give me everything in ascending/descending order)

    WHERE f1>=100 (needs the b+tree: look up the first leaf entry with f1>=100 and return all the leaf items by following the right pointers. If this read is multithreaded, the rows will probably come back in a strange order, but no problem: there is no ORDER BY in the clause.)

    WHERE f1>=100 ORDER BY f1 ASC (needs the b+tree: look up the first leaf entry with f1>=100 and return all the leaf items by following the right pointers. This scan shouldn't be multithreaded; following the b+tree, the rows come out naturally in order.)
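A minimal sketch of that "find the starting point, then follow the right pointers" scan. The `Leaf` struct and `range_scan` function are made-up names for illustration, and the leaves live in memory here; a real B+tree would descend from the root to find the starting leaf and would read each leaf page from disk.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical in-memory stand-in for one B+tree leaf page.
struct Leaf {
    std::vector<int> keys;  // sorted keys stored in this leaf
    const Leaf* next;       // right-sibling pointer (nullptr at the last leaf)
};

// Collect every key >= low by walking the leaf chain from `start`.
// Because the leaves are linked in key order, the result is already sorted.
std::vector<int> range_scan(const Leaf* start, int low) {
    std::vector<int> out;
    for (const Leaf* leaf = start; leaf != nullptr; leaf = leaf->next) {
        assert(std::is_sorted(leaf->keys.begin(), leaf->keys.end()));
        // Skip the keys below `low`; after the first matching leaf this is a no-op.
        auto first = std::lower_bound(leaf->keys.begin(), leaf->keys.end(), low);
        out.insert(out.end(), first, leaf->keys.end());
    }
    return out;
}
```

With three leaves holding {10, 50}, {90, 100, 120} and {150, 200}, `range_scan(&first_leaf, 100)` yields 100, 120, 150, 200 with no sorting step, which is why the ORDER BY version comes for free.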

    Field f2 (called name below) is a string indexed with a b+tree.

    WHERE name LIKE '%ODD' (Internally, this needs an index over the reversed strings. The compared value is reversed and the wildcard moves to the end, so the search becomes: starts with 'DDO' and ends with anything. 'DDOT' is in that group, so 'TODD' must belong to the result!!!! Tricky, tricky logic ;P)
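The reversed-string trick can be sketched like this. It is only an illustration: a `std::multimap` stands in for a B+tree built over the reversed values, and `ReversedIndex`, `index_insert` and `ends_with` are hypothetical names, not anything from a real engine.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Sorted map from reversed value to original value (stand-in for a B+tree).
using ReversedIndex = std::multimap<std::string, std::string>;

std::string reversed(std::string s) {
    std::reverse(s.begin(), s.end());
    return s;
}

void index_insert(ReversedIndex& idx, const std::string& value) {
    idx.emplace(reversed(value), value);
}

// Answers LIKE '%suffix' as a prefix scan over the reversed keys.
std::vector<std::string> ends_with(const ReversedIndex& idx, const std::string& suffix) {
    assert(!suffix.empty());
    const std::string prefix = reversed(suffix);
    std::vector<std::string> out;
    // Jump to the first key >= prefix, then walk while the key still starts with it.
    for (auto it = idx.lower_bound(prefix);
         it != idx.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        out.push_back(it->second);
    }
    return out;
}
```

After inserting TODD, ODD, TEDDY and ADD, `ends_with(idx, "ODD")` visits only the keys DDO and DDOT and returns ODD and TODD.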

    With this statement,
    WHERE name LIKE '%OD%' (contains 'OD' anywhere), things get hot :))))
    A prefix lookup no longer works, neither on the normal index ('OD%') nor on the reversed one ('DO%'): the UNION of those two sub-results only finds names that start or end with 'OD', and misses 'ABODCD'. The engine has to scan all the leaf entries and test each one, or use another structure built for substring search. Note that % also matches zero characters, so 'ODABCD', 'ABCDOD' and 'ODODODODOD' are all valid results.

    Consider what I said, and check some more things if you are going to go deep:
    – Fast I/O on files: unbuffered I/O with the write-through flag on (FileOptions.WriteThrough on a C# FileStream).
    – Memory-mapped files/memory views: yes, we can manipulate a huge file in small portions, as we need them.
    – Indexes: bitmap index, hash index (hash function; perfect hash function; hash collisions), sparse index, dense index, b+tree, r-tree, reversed index.
    – Multithreading: locks, mutexes, semaphores.
    – Transactional considerations (log file; 2-phase commit; 3-phase commit).
    – Locks (database, table, page, record).
    – Deadlocks: 3 ways to pick the victim to kill (the longest-running conflicting process; the youngest conflicting process; the process that locks the most objects). Modern RDBMSs use a mix of the 3...
    – SQL parsing (AST).
    – Caching recurrent queries.
    – Triggers, procedures, views, etc.
    – Passing parameters to procedures (can use the object type ;P)

    • DON'T LOAD EVERYTHING INTO MEMORY. INTELLIGENT SOLUTIONS LOAD PARTS AS THEY NEED THEM AND RELEASE THEM WHEN THEY ARE NO LONGER NEEDED. Why? Your DB engine (and PC) becomes more responsive using less memory. A b+tree lookup down to the leaf node needs just a couple of disk I/Os (roughly one per tree level, and the top levels usually stay cached). Knowing the lookup value, you get the record's long pointer (its offset in the file). SEEK to that position in the main file and read the content. This is very fast. Memory is faster... Yes it is, but can you put 10 GB of b+tree in memory? If so, how does your DB engine program start to behave? Slowly?
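Since the question was about C++, here is a rough sketch of the "keep a record pointer, SEEK, read only that record" idea using plain `std::fstream`. The `Record` layout and the function names are assumptions made up for this example; a real engine would read whole fixed-size pages and keep a cache of the hot ones.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical fixed-size record (64 bytes on typical platforms).
struct Record {
    long long key;
    char payload[56];
};

// Append a record to the data file and return its byte offset.
// That offset plays the role of the "record long pointer" kept in the index.
std::streamoff append_record(const std::string& path, const Record& r) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    assert(out.is_open());
    out.seekp(0, std::ios::end);
    const std::streamoff offset = out.tellp();
    out.write(reinterpret_cast<const char*>(&r), sizeof r);
    return offset;
}

// SEEK to the offset and read exactly one record; nothing else is loaded.
bool read_record(const std::string& path, std::streamoff offset, Record& r) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.seekg(offset);
    in.read(reinterpret_cast<char*>(&r), sizeof r);
    return in.gcount() == static_cast<std::streamsize>(sizeof r);
}
```

The index then only has to map a key to the offset returned by `append_record`; the data file itself can be far larger than RAM. Writing raw structs like this is not portable across machines with different layouts, so treat it as a demonstration only.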

    Forget binary trees and conventional B-trees: they are academic tutorials. In real life they are replaced by hash tables or b+trees (B PLUS TREE, showing storage and ascending order: http://en.wikipedia.org/wiki/B%2B_tree)

    • Consider using dataspaces for the DB data on multiple disks, so you can parallelize the disk I/O. Don't forget to mirror them... Each dataspace should hold a fragment of the table with a fragment of the index, plus a partial log file. You should develop a coordinator that wisely dispatches the queries to the sub-units.

    E.g.: 3 dataspaces...
    INSERT INTO etc...... should only happen in 1 dataspace.

    but
    select * from TB_XPTO should be sent to all dataspaces.

    select * from TB_XPTO order by an indexed field should be sent to all dataspaces too. Each dataspace executes the instruction, so now we have 3 subsets, each in its own order.

    The results go back to the coordinator, which reorders them again.
    Confusing, BUT FAST!!!!!!
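One way the coordinator could do that final reordering is a k-way merge instead of a full re-sort, since every subset already arrives sorted. This is my own suggestion rather than something the scheme above prescribes, and `merge_sorted` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merge the already-sorted result sets returned by each dataspace.
std::vector<int> merge_sorted(const std::vector<std::vector<int>>& parts) {
    using Item = std::pair<int, std::size_t>;  // (value, index of the source part)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    std::vector<std::size_t> pos(parts.size(), 0);  // next unread position per part

    for (std::size_t i = 0; i < parts.size(); ++i) {
        assert(std::is_sorted(parts[i].begin(), parts[i].end()));
        if (!parts[i].empty()) heap.emplace(parts[i][0], i);
    }

    std::vector<int> out;
    while (!heap.empty()) {
        const Item top = heap.top();  // smallest head among all the parts
        heap.pop();
        out.push_back(top.first);
        const std::size_t i = top.second;
        if (++pos[i] < parts[i].size()) heap.emplace(parts[i][pos[i]], i);
    }
    return out;
}
```

The heap never holds more than one item per dataspace, so the coordinator can start streaming rows to the client before any dataspace has finished sending.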

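    The coordinator should control the master transaction.

    If dataspace A committed,
    dataspace B committed,
    and dataspace C is in an uncommitted state,
    the coordinator will roll back C, B and A.
    (For A and B to be rolled back afterwards, their local commit must still be undoable from their log; in 2-phase commit terms they are only "prepared" until the master commits.)

    If dataspace A committed,
    dataspace B committed,
    and dataspace C committed,
    the coordinator will commit the overall transaction.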

    COORDINATOR LOG:
    CREATE MASTER TRANSACTION UID 121212, CHILD TRANSACTIONS(1111,2222,3333)

    DATA SPACE A LOG
    1111 INSERT len byte array
    1111 INSERT len byte array
    COMMIT 1111

    DATA SPACE B LOG
    2222 INSERT len byte array
    2222 INSERT len byte array
    COMMIT 2222

    DATA SPACE C LOG
    3333 INSERT len byte array
    3333 --> nothing more... power failure here!!!!!!!

    On startup the coordinator checks whether the DB was properly closed; if not, it checks its log file. Well, a master commit line like COMMIT 121212 is missing. So it queries the dataspaces about their log consistency.
    A and B reply COMMITTED, but C, after checking its log file, detects a failure and replies UNCOMMITTED.
    The master coordinator FORCES DATASPACES A, B, C TO ROLL BACK 1111, 2222, 3333.
    After that, it rolls back its own master transaction and sets DB state=OK.
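The recovery rule itself is tiny once the replies are in. A sketch, with `ChildState`, `Decision` and `resolve_master` as made-up names; it covers only the all-or-nothing decision, not the log parsing or the messaging between the coordinator and the dataspaces.

```cpp
#include <cassert>
#include <vector>

enum class ChildState { Committed, Uncommitted };
enum class Decision { Commit, Rollback };

// The master transaction commits only if every child transaction reports
// COMMITTED; a single UNCOMMITTED reply forces a rollback of all of them.
Decision resolve_master(const std::vector<ChildState>& children) {
    assert(!children.empty());
    for (ChildState s : children) {
        if (s != ChildState::Committed) return Decision::Rollback;
    }
    return Decision::Commit;
}
```

For the log above, the replies for 1111, 2222 and 3333 are Committed, Committed and Uncommitted, so the decision for master transaction 121212 is Rollback.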

    The main point here is speed on inserts, selects, updates and deletes.

    • Keep the index well balanced. Many deletes will unbalance it, and an unbalanced index loses performance... Add a small header at the head of the index file to keep track of this. Some math here would help: if the deletes exceed 5% of the records, rebalance and reset the counter. An update on an indexed field should count too.
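The counter in the index header could look roughly like this. `IndexHeader` and its fields are hypothetical, and the 5% figure is simply the rule of thumb from the paragraph above, passed in as a default parameter so it can be tuned.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping block stored at the head of the index file.
struct IndexHeader {
    std::uint64_t record_count = 0;
    std::uint64_t changes_since_rebalance = 0;  // deletes + updates of the indexed field
};

// Call on every delete, and on every update that touches the indexed field.
void note_change(IndexHeader& h) { ++h.changes_since_rebalance; }

// True once the changes exceed the given fraction of the records.
bool needs_rebalance(const IndexHeader& h, double threshold = 0.05) {
    assert(threshold > 0.0 && threshold <= 1.0);
    if (h.record_count == 0) return false;
    return static_cast<double>(h.changes_since_rebalance) >
           threshold * static_cast<double>(h.record_count);
}

// Call after the index has been rebuilt or rebalanced.
void mark_rebalanced(IndexHeader& h) { h.changes_since_rebalance = 0; }
```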

    • Be smart when choosing the index for a field. If the column is Gender, there are only 2 options (I hope, lol... oops, it can be nullable too...), so a bitmap index fits well. If the distinctness (cardinality) of a field is 100% (all values different), like a sequence applied to a field as Oracle does, or an identity field as SQL Server does, a b+tree fits well. If a field is a geometric type, like in Oracle, the R-tree is the best. For strings, a reversed index fits well, or a b+tree if the values are heterogeneous.
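To show why a bitmap index suits a column with a handful of distinct values, here is a toy version: one bit vector per distinct value, one bit per row. `BitmapIndex` is a made-up class for illustration, and using the empty string for NULL is just a convention chosen for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class BitmapIndex {
public:
    // Record that row `row` holds `value` (here "" stands for NULL).
    void set(std::size_t row, const std::string& value) {
        if (row >= rows_) rows_ = row + 1;
        bitmaps_[value];  // make sure a bitmap exists for this value
        for (auto& kv : bitmaps_) {
            kv.second.resize(rows_, false);
            kv.second[row] = (kv.first == value);  // exactly one bitmap has the bit on
        }
    }

    // Row numbers whose column equals `value`, in ascending order.
    std::vector<std::size_t> rows_equal(const std::string& value) const {
        std::vector<std::size_t> out;
        const auto it = bitmaps_.find(value);
        if (it == bitmaps_.end()) return out;
        assert(it->second.size() == rows_);
        for (std::size_t r = 0; r < it->second.size(); ++r) {
            if (it->second[r]) out.push_back(r);
        }
        return out;
    }

private:
    std::size_t rows_ = 0;
    std::map<std::string, std::vector<bool>> bitmaps_;
};
```

With three distinct values (F, M, NULL) the whole index costs three bits per row, and the NULL rows are found exactly like any other value. With millions of distinct values the same layout would be hopeless, which is where the b+tree takes over.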

    • Houston, we have a problem...
      NULL field values should be considered in the index too. NULL is a value too!!!!
      E.g.: WHERE f1 IS NULL

    • Add some socket functionality: an async TCP/IP server.

    • If you delete a record, don't resize the file right away. Mark it as deleted. You should keep some metrics here too: if unused space > x and open transactions = 0, take a database lock, re-allocate the pointers, then resize the database. Since holes appear throughout the DB file, you can try page locks instead of a database lock, so things keep going and no one gets hurt... Take the last unlocked page of the DB and lock it. Look for a deleted page that you can fill with your page. Not found: release the lock. Found: move the page to the new position, fix the pointers, mark the old page as deleted, resize the file, release the lock. Why so many operations? To keep the log well formed!!!! You can split the page into smaller pages, but then you get fragmentation (argh... we lost speed, commander?)... 2 algorithms come in here: Best-Fit and Worst-Fit... Google it. The best is... using both 😛
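For reference, the two placement strategies named above come down to a few lines each. `FreeSlot`, `kNoSlot`, `best_fit` and `worst_fit` are names invented for this sketch, and the free list is a plain in-memory vector rather than anything persisted in the file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A hole left in the file by a deleted record or page.
struct FreeSlot {
    std::size_t offset;
    std::size_t size;
};

const std::size_t kNoSlot = static_cast<std::size_t>(-1);

// Best-Fit: the smallest hole that is still big enough (least wasted space).
std::size_t best_fit(const std::vector<FreeSlot>& holes, std::size_t need) {
    assert(need > 0);
    std::size_t chosen = kNoSlot;
    for (std::size_t i = 0; i < holes.size(); ++i) {
        if (holes[i].size < need) continue;
        if (chosen == kNoSlot || holes[i].size < holes[chosen].size) chosen = i;
    }
    return chosen;
}

// Worst-Fit: the largest hole (the leftover stays big enough to reuse).
std::size_t worst_fit(const std::vector<FreeSlot>& holes, std::size_t need) {
    assert(need > 0);
    std::size_t chosen = kNoSlot;
    for (std::size_t i = 0; i < holes.size(); ++i) {
        if (holes[i].size < need) continue;
        if (chosen == kNoSlot || holes[i].size > holes[chosen].size) chosen = i;
    }
    return chosen;
}
```

Both return the index of the chosen hole, or `kNoSlot` when nothing fits and the file has to grow instead.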

    And if you solve all of this stuff, you can shout out loud "DAMN, I DID A DATABASE... I'M GONNA NAME IT ORACLE!!!!" ;P
