The Archive Base Latest Questions

Editorial Team
Asked: June 3, 2026

I have two files with the same number of columns, but a different number of rows

I have two files with the same number of columns, but a different number of rows. One file is a list of timestamps and words; the second file is a list of timestamps and the sounds in each of those words, i.e.:

9640 12783 she
12783 17103 had
...

and:

9640 11240 sh
11240 12783 iy
12783 14078 hv
14078 16157 ae
16157 16880 dcl
16880 17103 d
...

I want to merge these two files and create a list of entries with the word as one value, and the phonetic transcription as the other, i.e.,:

[['she', 'sh iy'],
 ['had', 'hv ae dcl d'],
 ...]

I’m a complete Python (and programming) noob, but my original idea was to do this by searching the second file for the second field in the first file, and then appending them into a list. I tried doing it this way:

word = open('SA1.WRD','r')
phone = open('SA1.PHN','r')
word_phone = []

for line in word.readlines():
    words = line.split()
    word = words[2]
    word_phone.append(word)

for line in phone.readlines():
    phones = line.split()
    phone = phones[2]
    if int(phones[1]) <= int(words[1]):
        word_phone.append(phone)

print word_phone

This is the output:

['she', 'had', 'your', 'dark', 'suit', 'in', 'greasy', 'wash', 'water', 'all', 'year', 'sh', 'iy', 'hv', 'ae', 'dcl', 'd', 'y', 'er', 'dcl', 'd', 'aa', 'r', 'kcl', 'k', 's', 'uw', 'dx', 'ih', 'ng', 'gcl', 'g', 'r', 'iy', 's', 'iy', 'w', 'aa', 'sh', 'epi', 'w', 'aa', 'dx', 'er', 'q', 'ao', 'l', 'y', 'iy', 'axr']

As I said, I’m a total noob, and some suggestions would be very helpful.

Update:
I’d like to revisit this question if possible. I’ve modified Lattyware’s code to operate on a directory:

phns = []
wrds = []
for root, dir, files in os.walk(sys.argv[1]):
    wrds = wrds + [ os.path.join( root, f ) for f in files if f.endswith( '.WRD' ) ]
    phns = phns + [ os.path.join( root, f ) for f in files if f.endswith( '.PHN' ) ]
phns.sort()
wrds.sort()
files = (zip(wrds,phns))

#OPEN THE WORD AND PHONE FILES, COMPARE THEM
output = []
for file in files:
    with open( file[0] ) as unsplit_words, open( file[1] ) as unsplit_sounds:
        sounds = (line.split() for line in unsplit_sounds)
        words = (line.split() for line in unsplit_words)
        output = output +  [
          (word, " ".join(sound for _, _, sound in
                    takeuntil(sounds, stop)))
                for start, stop, word in words
            ]

There is some information I would like to retain in the filepaths of these files. I was wondering how I might go about appending the split file path to the tuples in the list this code returns, e.g.,

[('she', 'sh iy', 'directory', 'subdirectory'), ('had', 'hv ae dcl d', 'directory', 'subdirectory')]

I figured I could split the paths and then zip the lists together, but the code above outputs 53,000 items in total while only 6300 file pairs are being processed, so the two lists would not line up.

1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 8:21 pm

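    This is a task where the main issue is matching the sounds with the words. Fortunately, this is easy to do, as we can simply take sounds until one ends at the word’s end time.

    To do this, we need a small takeuntil() helper. itertools.takewhile() (my original solution) unfortunately consumes one extra value: it has to read the first item that fails the test in order to know when to stop, and that item is then lost. Here the lost item would be the first sound of the next word, so a custom generator that stops right after the matching item is a better fit.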

    def takeuntil(iterable, stop):
        for x in iterable:
            yield x
            if x[1] == stop:
                break
    
    with open("SA1.WRD") as unsplit_words, open("SA1.PHN") as unsplit_sounds:
        sounds = (line.split() for line in unsplit_sounds)
        words = (line.split() for line in unsplit_words)
        output = [
            (word, " ".join(sound for _, _, sound in takeuntil(sounds, stop)))
            for start, stop, word in words
        ]
    
    print(output)
    

    Gives us:

    [('she', 'sh iy'), ('had', 'hv ae dcl d')]
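
To make the difference between itertools.takewhile() and takeuntil() concrete, here is a small sketch that runs both over the first few rows from the question's .PHN excerpt. The in-memory `rows` list and the lambda predicate are illustrative assumptions, not part of the original solution.

```python
from itertools import takewhile

def takeuntil(iterable, stop):
    """Yield rows up to and including the one whose end time equals stop."""
    for x in iterable:
        yield x
        if x[1] == stop:
            break

# Rows copied from the question's .PHN excerpt, already split into fields.
rows = [
    ["9640", "11240", "sh"],
    ["11240", "12783", "iy"],
    ["12783", "14078", "hv"],
    ["14078", "16157", "ae"],
]

# takewhile() has to read the "hv" row to see that it fails the test,
# and that row is then gone from the underlying iterator.
it = iter(rows)
first_word = [s for _, _, s in takewhile(lambda r: int(r[1]) <= 12783, it)]
leftover_takewhile = [s for _, _, s in it]

# takeuntil() stops right after "iy", so "hv" is still waiting.
it = iter(rows)
first_word_until = [s for _, _, s in takeuntil(it, "12783")]
leftover_takeuntil = [s for _, _, s in it]

print(first_word, leftover_takewhile)        # ['sh', 'iy'] ['ae']
print(first_word_until, leftover_takeuntil)  # ['sh', 'iy'] ['hv', 'ae']
```

With takewhile(), the second word would start at 'ae' and silently lose 'hv'.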
    

    This code uses the with statement for readability and closing the files (even on exceptions). It also makes a lot of use of list comprehensions and generator expressions.

    There are some bad patterns in your code. Using open() without the with statement means the files are never explicitly closed, and readlines() isn’t needed: loop directly over the file instead. That reads lines lazily, so it uses less memory on large files, and it is also nicer to read and less to type.
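
As a rough illustration of both points, the sketch below iterates over a file object directly and checks that the with block closes it even when an error occurs. The temporary file and the simulated RuntimeError are assumptions made only so the example runs on its own.

```python
import os
import tempfile

# Write a tiny stand-in for SA1.WRD to a temporary file (illustrative data only).
with tempfile.NamedTemporaryFile("w", suffix=".WRD", delete=False) as tmp:
    tmp.write("9640 12783 she\n12783 17103 had\n")
    path = tmp.name

try:
    with open(path) as f:
        # Looping over the file object reads one line at a time;
        # f.readlines() would build the full list of lines first.
        words = [line.split()[2] for line in f]
        raise RuntimeError("simulated failure while processing")
except RuntimeError:
    pass

closed_after_error = f.closed  # the with block closed the file despite the error
os.remove(path)

print(words, closed_after_error)  # ['she', 'had'] True
```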

    So how does this work? Let’s run through it:

    First we open both our files to read from, and throw in quick generator expressions to split the lines in the files.

    Next comes a bit of a monster list comprehension. What we do in this is take sounds from our sounds iterable until we reach the last sound belonging to the word we are on, then move onto the next word, returning the word and the list of associated sounds. We then use str.join() to join the sounds into a single string.
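
The comprehension relies on `sounds` being a single iterator that every takeuntil() call shares, so each word picks up where the previous one stopped. The following sketch contrasts that with passing a plain list, which restarts from the first row for every word. The `merge` helper and the in-memory data are hypothetical names used only for this illustration.

```python
def takeuntil(iterable, stop):
    """Yield rows up to and including the one whose end time equals stop."""
    for x in iterable:
        yield x
        if x[1] == stop:
            break

# Rows copied from the question's excerpts, already split into fields.
words = [["9640", "12783", "she"], ["12783", "17103", "had"]]
rows = [
    ["9640", "11240", "sh"], ["11240", "12783", "iy"],
    ["12783", "14078", "hv"], ["14078", "16157", "ae"],
    ["16157", "16880", "dcl"], ["16880", "17103", "d"],
]

def merge(words, sounds):
    # Same comprehension as in the answer, wrapped in a function for reuse.
    return [
        (word, " ".join(s for _, _, s in takeuntil(sounds, stop)))
        for _, stop, word in words
    ]

from_iterator = merge(words, iter(rows))  # shared iterator: resumes each time
from_list = merge(words, rows)            # list: restarts at row 0 each time

print(from_iterator)  # [('she', 'sh iy'), ('had', 'hv ae dcl d')]
print(from_list)      # [('she', 'sh iy'), ('had', 'sh iy hv ae dcl d')]
```

The generator expression over the open file behaves like the iterator case, which is why the answer's code groups the sounds correctly.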

    If you have trouble following the thought process, here is an expanded version that works the same way using explicit loops. It may be somewhat slower than the comprehension version, though for files of this size the difference is unlikely to matter:

    with open("SA1.WRD") as words, open("SA1.PHN") as sounds:
        output = []
        current = []
        for line in words:
            start, stop, word = line.split()
            for sound_line in sounds:
                sound_start, sound_stop, sound = sound_line.split()
                current.append(sound)
                if sound_stop == stop:
                    break
            output.append((word, " ".join(current)))
            current = []
    
    print(output)
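
For the update in the question, one possible approach is to attach the path parts while each file pair is being processed, rather than zipping two lists afterwards; that sidesteps the mismatch between the number of entries and the number of file pairs. This is only a sketch: `merge_pair` is a hypothetical helper, and it assumes the two directory names directly above each file are the parts worth keeping. The demo builds a throwaway directory tree so it can run without the real data.

```python
import os
import tempfile

def takeuntil(iterable, stop):
    """Yield rows up to and including the one whose end time equals stop."""
    for x in iterable:
        yield x
        if x[1] == stop:
            break

def merge_pair(wrd_path, phn_path):
    """Return (word, sounds, directory, subdirectory) tuples for one file pair."""
    parent = os.path.dirname(wrd_path)
    # Assumption: keep the last two directory names of the .WRD path.
    path_parts = (os.path.basename(os.path.dirname(parent)),
                  os.path.basename(parent))
    with open(wrd_path) as unsplit_words, open(phn_path) as unsplit_sounds:
        sounds = (line.split() for line in unsplit_sounds)
        words = (line.split() for line in unsplit_words)
        # Every tuple built from this pair gets the same path parts appended.
        return [
            (word, " ".join(s for _, _, s in takeuntil(sounds, stop))) + path_parts
            for _, stop, word in words
        ]

# Demo on a temporary tree: <tmp>/directory/subdirectory/SA1.WRD and SA1.PHN
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "directory", "subdirectory")
    os.makedirs(folder)
    wrd = os.path.join(folder, "SA1.WRD")
    phn = os.path.join(folder, "SA1.PHN")
    with open(wrd, "w") as f:
        f.write("9640 12783 she\n12783 17103 had\n")
    with open(phn, "w") as f:
        f.write("9640 11240 sh\n11240 12783 iy\n12783 14078 hv\n"
                "14078 16157 ae\n16157 16880 dcl\n16880 17103 d\n")
    output = merge_pair(wrd, phn)

print(output)
```

In the directory-walking loop from the question, this would be called once per pair, for example with `output.extend(merge_pair(w, p))` for each `(w, p)` in the zipped file lists.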
    