I have a boatload of text files (see below) that I need to parse. They contain chapter information that I’d like to capture and use to create associated records. Report has_many :chapters
Basically, I need to read each line and, for each BookmarkTitle, capture the chapter name (ignoring the CR) and then capture the BookmarkPageNumber. Then I want to bundle the pair up and create a new record with it: report.page.create(title: bookmark_title, page_number: bookmark_page_number)
I’ve played a bit with IO's readline, but I'm not sure how to capture the contents… a regex, perhaps? Or is there a more Rails-y way?
Sample txt file:
InfoKey: Creator
InfoValue: Adobe Acrobat 9.3.4
InfoKey: Producer
InfoValue: Adobe Acrobat 9.34 Paper Capture Plug-in
InfoKey: ModDate
InfoValue: D:20110315193536-04'00'
InfoKey: CreationDate
InfoValue: D:20110208171413-05'00'
PdfID0: 2dab1ce43882a53cbc24dbb839f921f8
PdfID1: 43b19192e920f38f65de0bf0a2be
NumberOfPages: 258
BookmarkTitle: 1980 Field Service Annual Report
BookmarkLevel: 1
BookmarkPageNumber: 3
BookmarkTitle: TABLE OF CONTENTS
BookmarkLevel: 1
BookmarkPageNumber: 4
BookmarkTitle: LIST OF EXHIBITS
BookmarkLevel: 1
BookmarkPageNumber: 7
BookmarkTitle: I - INTRODUCTION
BookmarkLevel: 1
BookmarkPageNumber: 11
BookmarkTitle: II - EXECUTIVE SUMMARY
BookmarkLevel: 1
BookmarkPageNumber: 16
BookmarkTitle: III - RESULTS AND ANALYSIS OF THE MAINTENANCE USER SURVEY
BookmarkLevel: 1
BookmarkPageNumber: 45
BookmarkTitle: IV - COMPARATIVE ANALYSIS OF BIGCO AND OTHER MAINTENANCE VENDORS
BookmarkLevel: 1
BookmarkPageNumber: 102
BookmarkTitle: V - RESULTS OF VENDOR SURVEY
BookmarkLevel: 1
BookmarkPageNumber: 127
BookmarkTitle: VI - SIGNIFICANT VENDOR ACTIVITIES, 1979-1980
BookmarkLevel: 1
BookmarkPageNumber: 190
BookmarkTitle: APPENDIX A: DEFINITIONS
BookmarkLevel: 1
BookmarkPageNumber: 199
BookmarkTitle: APPENDIX B: RESEARCH METHODOLOGY
BookmarkLevel: 1
BookmarkPageNumber: 204
BookmarkTitle: APPENDIX C: SUPPORTING CHARTS
BookmarkLevel: 1
BookmarkPageNumber: 211
BookmarkTitle: APPENDIX D: USER QUESTIONNAIRE
BookmarkLevel: 1
BookmarkPageNumber: 222
BookmarkTitle: APPENDIX E: VENDOR QUESTIONNAIRE
BookmarkLevel: 1
BookmarkPageNumber: 237
I’m sorry, I’m not a Ruby on Rails developer, but that regular expression will match each bookmark, and return:
It does assume that the level and page number are numeric, without spaces, commas, or decimals, but that could easily be changed.
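For anyone who wants to try the regex approach in Ruby, here is a minimal sketch. It is not the expression referred to above. The pattern, the `BOOKMARK` constant, and the `parse_bookmarks` helper are all hypothetical names made up for illustration. It assumes every bookmark is exactly three consecutive lines (title, level, page number) as in the sample file, and that level and page number are plain digits.

```ruby
# Hypothetical pattern (an assumption, not the original answer's regex):
# matches one BookmarkTitle / BookmarkLevel / BookmarkPageNumber triple.
# "\r?\n" tolerates both Unix and Windows line endings, and the title
# capture stops before any CR so the CR never ends up in the title.
BOOKMARK = /^BookmarkTitle: ([^\r\n]*)\r?\nBookmarkLevel: (\d+)\r?\nBookmarkPageNumber: (\d+)/

# Hypothetical helper: returns one hash per bookmark found in the text.
def parse_bookmarks(text)
  text.scan(BOOKMARK).map do |title, level, page|
    { title: title.strip, level: level.to_i, page_number: page.to_i }
  end
end

sample = <<~TEXT
  NumberOfPages: 258
  BookmarkTitle: 1980 Field Service Annual Report
  BookmarkLevel: 1
  BookmarkPageNumber: 3
  BookmarkTitle: TABLE OF CONTENTS
  BookmarkLevel: 1
  BookmarkPageNumber: 4
TEXT

chapters = parse_bookmarks(sample)
# chapters.first[:title] is "1980 Field Service Annual Report"

# Inside the Rails app, the pairs could then be saved with the call
# from the question (untested here, since it needs the models):
#   chapters.each do |c|
#     report.page.create(title: c[:title], page_number: c[:page_number])
#   end
```

For a real file, `parse_bookmarks(File.read(path))` should be enough, since these dumps are small. If the numbers can contain spaces or separators, the `\d+` groups would need to be loosened accordingly.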