I have a function called get_chapter which takes a page number as an argument and returns a unique string representing the chapter the page belongs to, for example “The Story Continues”. If I enter a page number outside the book, I am returned an empty string.
The first page is page 0. Chapters are a set of consecutive pages and a given page belongs to only one chapter.
What algorithm would you recommend which can identify the page ranges for each chapter? Any estimate as to how many times I would need to call get_chapter?
I need to limit calls to get_chapter as much as possible. Chapters average 50000 pages. And there are approximately 30000000 pages in the book! Not sure how many chapters exist.
Prime a list of chapter boundaries with the first page.
Set low to the first page and high to the last.
If get_chapter(low) == get_chapter(high), then you know everything in that range is in the same chapter, and you don’t need to divide it further.
If get_chapter(low) != get_chapter(high) and low + 1 == high, then you have adjacent pages in different chapters. That means a new chapter begins at high.
If get_chapter(low) != get_chapter(high) and low + 1 < high, then there is at least one chapter boundary in the range. Split the range by choosing a page in the middle and recursively descend both of the new ranges (low:middle and middle:high).
If you added the boundaries to a list as you found them, and you always recursed the lower subrange first, then you’re done. Otherwise, sort the boundary list.
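Here is a minimal Python sketch of the steps above, not a definitive implementation. It assumes you already know the last valid page number (the question only gives an approximate count, so you would have to find it first). The names find_chapter_starts, chapter_ranges and fake_get_chapter are made up for illustration. I also cache results so no page is looked up twice, and use an explicit stack instead of recursion; neither is required by the description above.

```python
def find_chapter_starts(get_chapter, first_page, last_page):
    """Return the first page of every chapter, in ascending order."""
    cache = {}

    def chapter(page):
        # Remember answers so shared range endpoints cost only one call.
        if page not in cache:
            cache[page] = get_chapter(page)
        return cache[page]

    starts = [first_page]              # prime the list with the first page
    stack = [(first_page, last_page)]  # ranges still to be examined
    while stack:
        low, high = stack.pop()
        if chapter(low) == chapter(high):
            continue                   # whole range is one chapter
        if low + 1 == high:
            starts.append(high)        # adjacent pages differ: chapter starts at high
            continue
        middle = (low + high) // 2
        # Push the upper half first so the lower half is popped first;
        # that way boundaries are found in ascending order.
        stack.append((middle, high))
        stack.append((low, middle))
    return starts


def chapter_ranges(starts, last_page):
    """Turn chapter start pages into (first, last) page pairs."""
    ends = [s - 1 for s in starts[1:]] + [last_page]
    return list(zip(starts, ends))


# Hypothetical stand-in for the real get_chapter: four chapters of 3, 1, 5 and 2 pages.
lengths = [3, 1, 5, 2]
pages = [f"Chapter {i}" for i, n in enumerate(lengths) for _ in range(n)]


def fake_get_chapter(page):
    return pages[page] if 0 <= page < len(pages) else ""


starts = find_chapter_starts(fake_get_chapter, 0, len(pages) - 1)
ranges = chapter_ranges(starts, len(pages) - 1)
print(ranges)  # [(0, 2), (3, 3), (4, 8), (9, 10)]
```

Because the lower half is always handled first, the list comes out sorted and no final sort is needed.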
I believe the number of calls to get_chapter is approximately O(number_of_chapters * log_2(average_chapter_size)), but that’s a gut-check and not a thorough analysis.
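To put rough numbers on that estimate, here is the arithmetic using the figures from the question (about 30000000 pages, chapters averaging 50000 pages). This is only a back-of-the-envelope sketch: the big-O hides a constant factor, so the real call count may be somewhat higher, and the variable names are mine.

```python
import math

total_pages = 30_000_000       # approximate figure from the question
average_chapter_size = 50_000  # average figure from the question

# Implied chapter count, assuming the average holds across the book.
number_of_chapters = total_pages // average_chapter_size

# number_of_chapters * log_2(average_chapter_size), ignoring constant factors.
estimated_calls = number_of_chapters * math.log2(average_chapter_size)

print(number_of_chapters)      # 600
print(round(estimated_calls))  # somewhere between 9000 and 10000
```

So on the order of ten thousand calls, compared with 30000000 if you checked every page.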