The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: June 12, 2026 (2026-06-12T14:12:12+00:00)

I’m trying to process a log file, each line of which looks something like this:

flow_stats: 0.30062869162666672 gid 0 fid 1 pkts 5.0 fldur 0.30001386666666674 avgfldur 0.30001386666666674 actfl 3142 avgpps 16.665896331902879 finfl 1

I’m interested in the pkts field and the fldur field. I’ve got a Python script that can read in a million-line log file, create a list for each number of packets of all the different durations, sort those lists and figure out the median in about 3 seconds.

I’m playing around with the Go programming language and thought I’d rewrite this, in the hope that it would run faster.

So far, I’ve been disappointed. Just reading the file into the data structure takes about 5.5 seconds. So I’m wondering if some of you wonderful people can help me make this go (hehe) faster.

Here’s my loop:

data := make(map[int][]float32)
infile, err := os.Open("tmp/flow.tr")
if err != nil {
  panic(err)
}
defer infile.Close()
reader := bufio.NewReader(infile)

line, err := reader.ReadString('\n')
for {
  if len(line) == 0 {
    break
  }
  if err != nil && err != io.EOF {
    panic(err)
  }
  split_line := strings.Fields(line)
  num_packets, err := strconv.ParseFloat(split_line[7], 32)
  duration, err := strconv.ParseFloat(split_line[9], 32)
  data[int(num_packets)] = append(data[int(num_packets)], float32(duration))

  line, err = reader.ReadString('\n')
}

Note that I do actually check the errs in the loop — I’ve omitted that for brevity. google-pprof indicates that a majority of the time is being spent in strings.Fields by strings.FieldsFunc, unicode.IsSpace, and runtime.stringiter2.

How can I make this run faster?

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 2:12 pm

    Replacing

    split_line := strings.Fields(line)

    with

    split_line := strings.SplitN(line, " ", 11)

    yielded a roughly 4x speed improvement on a randomly generated 1M-line file that mimicked the format you provided above:

    strings.Fields version: Completed in 4.232525975s

    strings.SplitN version: Completed in 1.111450755s

    Some of the efficiency comes from not having to parse and split the rest of the input line once the duration has been split off, but most of it comes from the simpler splitting logic in SplitN. Even splitting all of the strings doesn’t take much longer than stopping after the duration. Using:

    split_line := strings.SplitN(line, " ", -1)

    Completed in 1.554971313s

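    SplitN and Fields are not the same. Fields treats any run of one or more whitespace characters as a single delimiter, whereas SplitN splits on every occurrence of the separator string. If your input had multiple spaces between tokens, split_line would contain an empty token for each extra space, and the field indexes would shift. SplitN with a " " separator also does not split on tabs, and it leaves the trailing newline attached to the last token.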
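
The difference is easiest to see on a deliberately messy input. The sketch below uses a made-up line (not taken from the real log) containing a double space, a tab, and a trailing newline; compare is a hypothetical helper for this illustration.

```go
package main

import (
    "fmt"
    "strings"
)

// compare (hypothetical helper) splits the same line with both functions
// so the results can be inspected side by side.
func compare(line string) (fields, split []string) {
    return strings.Fields(line), strings.SplitN(line, " ", -1)
}

func main() {
    // Made-up input: two spaces after "pkts", a tab before "fldur",
    // and a trailing newline.
    fields, split := compare("pkts  5.0\tfldur 0.3\n")

    fmt.Printf("%q\n", fields) // prints: ["pkts" "5.0" "fldur" "0.3"]
    fmt.Printf("%q\n", split)  // prints: ["pkts" "" "5.0\tfldur" "0.3\n"]
}
```

If the log format is not guaranteed to use exactly one space between tokens, the SplitN version would need an extra check (or a trim) before indexing into the result.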

    Sorting and calculating the median does not add much time. I changed the code to use a float64 rather than a float32 as a matter of convenience when sorting. Here’s the complete program:

    package main
    
    import (
        "bufio"
        "fmt"
        "io"
        "os"
        "sort"
        "strconv"
        "strings"
        "time"
    )
    
    // sortKeys returns the keys of items in ascending order.
    func sortKeys(items map[int][]float64) []int {
        keys := make([]int, 0, len(items))
        for k := range items {
            keys = append(keys, k)
        }
        sort.Ints(keys)
        return keys
    }
    
    // median sorts d in place and returns its median value.
    // d must not be empty.
    func median(d []float64) (m float64) {
        sort.Float64s(d)
        length := len(d)
        if length%2 == 1 {
            m = d[length/2]
        } else {
            m = (d[length/2] + d[length/2-1]) / 2
        }
        return m
    }
    
    func main() {
        data := make(map[int][]float64)
        infile, err := os.Open("sample.log")
        if err != nil {
            panic(err)
        }
        // Defer Close only once Open is known to have succeeded.
        defer infile.Close()
        reader := bufio.NewReaderSize(infile, 256*1024)
    
        s := time.Now()
        for {
            line, err := reader.ReadString('\n')
            // io.EOF is expected at the end of the file; a final line
            // without a trailing newline arrives together with io.EOF.
            if err != nil && err != io.EOF {
                panic(err)
            }
            if len(line) == 0 {
                break
            }
            // Stop splitting after the fldur value; the rest of the
            // line stays unsplit in the last element.
            split_line := strings.SplitN(line, " ", 11)
            if len(split_line) < 10 {
                panic(fmt.Sprintf("malformed line: %q", line))
            }
            // bitSize 32 rounds the parsed value to float32 precision,
            // although ParseFloat still returns a float64.
            num_packets, err := strconv.ParseFloat(split_line[7], 32)
            if err != nil {
                panic(err)
            }
            duration, err := strconv.ParseFloat(split_line[9], 32)
            if err != nil {
                panic(err)
            }
            pkts := int(num_packets)
            data[pkts] = append(data[pkts], duration)
        }
    
        for _, k := range sortKeys(data) {
            fmt.Printf("pkts: %d, median: %f\n", k, median(data[k]))
        }
        fmt.Println("\nCompleted in ", time.Since(s))
    }
    

    And the output:

    pkts: 0, median: 0.498146
    pkts: 1, median: 0.511023
    pkts: 2, median: 0.501408
    ...
    pkts: 99, median: 0.501517
    pkts: 100, median: 0.491499
    
    Completed in  1.497052072s
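
As a quick sanity check of the median helper, here is a small standalone sketch that runs it on an odd-length and an even-length slice. The input values are made up for illustration; the function body is copied from the program above.

```go
package main

import (
    "fmt"
    "sort"
)

// median sorts d in place and returns its median value.
// d must not be empty.
func median(d []float64) float64 {
    sort.Float64s(d)
    n := len(d)
    if n%2 == 1 {
        return d[n/2]
    }
    // Even length: average the two middle values.
    return (d[n/2] + d[n/2-1]) / 2
}

func main() {
    fmt.Println(median([]float64{3, 1, 2}))    // prints: 2
    fmt.Println(median([]float64{4, 1, 3, 2})) // prints: 2.5
}
```

Note that the helper reorders the slice it is given, which is harmless here because each slice in the map is only used once.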
    

© 2021 The Archive Base. All Rights Reserved