The Archive Base

Editorial Team
Asked: June 2, 2026 (2026-06-02T11:03:28+00:00)


Short version:
This prints 3, which makes sense because in Go a string is essentially a slice of bytes, and it takes three bytes to represent this character. How can I get len and the regexp functions to work in terms of characters rather than bytes?

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    fmt.Println(len("ウ"))                    // prints 3: length in bytes
    fmt.Println(utf8.RuneCountInString("ウ")) // prints 1: length in characters (runes)
}
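To make the three bytes visible, here is a minimal sketch that reports the byte length, the rune count, and the raw UTF-8 bytes of a string. The helper name describe is hypothetical and only serves the illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe (hypothetical helper) returns the byte length, the rune count,
// and the hex dump of the UTF-8 bytes that make up s.
func describe(s string) (byteLen, runeCount int, hexBytes string) {
	return len(s), utf8.RuneCountInString(s), fmt.Sprintf("% x", s)
}

func main() {
	b, r, h := describe("ウ")
	fmt.Println(b, r, h) // 3 1 e3 82 a6
}
```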

Background:

I’m saving text into the GAE datastore using JDO (Java).

Then I’m processing the text using Go; specifically, I’m using regexp.FindStringIndex and saving the index to the datastore.

Then, back in Java land, I send the unmodified text and the index to the GWT client via JSON.

Somewhere along the way the indexes are ‘shifting’, so by the time the text is on the client, they are off.

The issue seems to be character encoding: I assume Java and Go interpret the indexes differently (UTF-8 characters vs. bytes?). I see references to runes in the regexp package.

I think I can either make regexp.FindStringIndex return character indexes in Go, or make the GWT client understand the UTF-8 byte indexes.

Any suggestions? I should be using UTF-8 in case I need to internationalize the app in the future, right?

Thanks

EDIT:

Also, when I was finding the index using Java on the server, things just worked.

On the client (GWT) I’m using text.substring(start, end).

TEST:

package main

import "regexp"
import "fmt"

func main() {
    fmt.Print(regexp.MustCompile(`a`).FindStringIndex("ウィキa")[1])
}

The code outputs 10 (the byte offset just past the match), not 4 (the character offset).

The plan is to get FindStringIndex to return 4. Any ideas?
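One way to get 4 here, sketched under the assumption that the input is valid UTF-8, is to keep FindStringIndex as it is and convert each byte offset by counting the runes that precede it. The helper name runeIndex is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// runeIndex (hypothetical helper) converts a byte offset into s to a
// character (rune) offset by counting the runes before that offset.
// byteOffset is assumed to fall on a rune boundary, as match offsets do.
func runeIndex(s string, byteOffset int) int {
	return utf8.RuneCountInString(s[:byteOffset])
}

func main() {
	s := "ウィキa"
	loc := regexp.MustCompile(`a`).FindStringIndex(s)       // byte offsets: [9 10]
	fmt.Println(runeIndex(s, loc[0]), runeIndex(s, loc[1])) // 3 4
}
```

Each call rescans the prefix, so this is fine for a handful of matches but costs more when there are many matches in a long string.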

Update 2: Position Conversion

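package main

import (
    "fmt"
    "regexp"
    "unicode/utf8"
)

func main() {
    s := "ab日aba本語ba"
    byteIndex := regexp.MustCompile(`a`).FindAllStringIndex(s, -1)
    fmt.Println(byteIndex) // [[0 1] [5 6] [7 8] [15 16]]

    offset := 0
    // posMap maps the starting byte position of each character to the
    // difference between its byte position and its char position.
    posMap := make([]int, len(s))
    for pos, char := range s {
        fmt.Printf("character %c starts at byte position %d, has an offset of %d, and a char position of %d.\n", char, pos, offset, pos-offset)
        posMap[pos] = offset
        offset += utf8.RuneLen(char) - 1 // every extra byte of a multi-byte character widens the gap
    }
    fmt.Println("posMap =", posMap)
    for pos, value := range byteIndex {
        fmt.Printf("pos:%d value:%d subtract %d\n", pos, value, posMap[value[0]])
        // Reusing the start's offset for the end is only correct when the
        // match itself contains no multi-byte characters.
        value[1] -= posMap[value[0]]
        value[0] -= posMap[value[0]]
    }
    fmt.Println(byteIndex) // [[0 1] [3 4] [5 6] [9 10]]
}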

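* Update 2 *

    // Alternative to the first loop above: derive the offset from the gap
    // between the starting byte positions of successive characters.
    lastPos := -1
    for pos, char := range s {
        offset += pos - lastPos - 1
        fmt.Printf("character %c starts at byte position %d, has an offset of %d, and a char position of %d.\n", char, pos, offset, pos-offset)
        posMap[pos] = offset
        lastPos = pos
    }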
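The position conversion can also be written as a single pass that maps every rune's starting byte directly to its character index, which handles matches containing multi-byte characters and matches that end at the end of the string. This is only a sketch: it assumes valid UTF-8 input, so that every match boundary falls on a rune boundary, and the names toRuneIndexes and runeAt are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// toRuneIndexes (hypothetical helper) rewrites byte-offset pairs, as
// returned by FindAllStringIndex, into character (rune) offset pairs.
func toRuneIndexes(s string, locs [][]int) [][]int {
	// runeAt[b] is the rune index of the rune that starts at byte b.
	// The extra slot at len(s) covers matches ending at the end of s.
	runeAt := make([]int, len(s)+1)
	n := 0
	for b := range s {
		runeAt[b] = n
		n++
	}
	runeAt[len(s)] = n

	out := make([][]int, len(locs))
	for i, loc := range locs {
		out[i] = []int{runeAt[loc[0]], runeAt[loc[1]]}
	}
	return out
}

func main() {
	s := "ab日aba本語ba"
	locs := regexp.MustCompile(`a`).FindAllStringIndex(s, -1)
	fmt.Println(toRuneIndexes(s, locs)) // [[0 1] [3 4] [5 6] [9 10]]
}
```

The lookup table costs one int per byte of input, in exchange for converting any number of matches without rescanning the string.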
1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 11:03 am (2026-06-02T11:03:30+00:00)

    As you’ve probably gathered, Go and Java treat strings differently. In Java, a string is a sequence of UTF-16 code units (char values), which line up with characters for everything in the Basic Multilingual Plane; in Go, a string is a sequence of bytes, conventionally holding UTF-8. Text manipulation functions in Go understand UTF-8 code points when necessary, but because the string is represented as bytes, the indexes they return and work with are byte indexes, not character indexes.
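    Because Java's String.substring (and the JavaScript substring that GWT compiles to) counts UTF-16 code units, the offsets sent to the client may need to be in that unit rather than in runes; the two differ only for characters outside the Basic Multilingual Plane. A minimal sketch of that conversion on the Go side, assuming valid UTF-8 input and using the hypothetical helper name utf16Index:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16Index (hypothetical helper) converts a byte offset into the UTF-8
// string s to an offset measured in UTF-16 code units.
// byteOffset is assumed to fall on a rune boundary.
func utf16Index(s string, byteOffset int) int {
	return len(utf16.Encode([]rune(s[:byteOffset])))
}

func main() {
	// ウ is 3 bytes and 1 UTF-16 unit; U+1F600 is 4 bytes and 2 UTF-16 units.
	s := "ウ\U0001F600a"
	fmt.Println(utf16Index(s, 7)) // 3
}
```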

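    As you observe in the comments, FindReaderIndex matches against a RuneReader instead of a string, and strings.NewReader wraps a string in one (strings.Reader implements RuneReader). Test this on multi-byte input before relying on it, though: strings.Reader reports each rune's size in bytes, so the indexes that come back may still be byte offsets rather than character offsets.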

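    Another option is to take the substring you want the length of in characters and pass it to utf8.RuneCountInString, which returns the number of characters (runes) in a UTF-8 string. (utf8.RuneLen, by contrast, returns the number of bytes needed to encode a single rune.) For a match starting at byte index i, utf8.RuneCountInString(s[:i]) is its character index. Using a RuneReader is probably more efficient, however.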

