The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: June 11, 2026


I have a column of data that contains strings, and I want to create a new column that takes only the first two characters from the corresponding data string.

It seems logical to use the apply function for this, but it doesn’t work as expected. It doesn’t even seem to be consistent with other uses of apply. See below.

In [205]: dfrm_test = pandas.DataFrame({"A":np.repeat("the", 10)})

In [206]: dfrm_test
Out[206]:
     A
0  the
1  the
2  the
3  the
4  the
5  the
6  the
7  the
8  the
9  the

In [207]: dfrm_test["A"].apply(lambda x: x+" cat")
Out[207]:
0    the cat
1    the cat
2    the cat
3    the cat
4    the cat
5    the cat
6    the cat
7    the cat
8    the cat
9    the cat
Name: A

In [208]: dfrm_test["A"].apply(lambda x: x[0:2])
Out[208]:
0    the
1    the
Name: A

Based on this, it appears that apply does nothing but perform the NumPy equivalent of whatever is called inside. That is, apply seems to execute the same thing as arr + " cat" in the first example. And if NumPy happens to broadcast that, then it will work. If not, then it won’t.
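The difference can be reproduced with NumPy alone. The sketch below assumes the column is backed by an object-dtype array, and the names arr and take_two are illustrative only. It calls the same function once on the whole array and once per element:

```python
import numpy as np

# Object-dtype array; assumed here to mirror how the string column is stored.
arr = np.array(["the"] * 10, dtype=object)

take_two = lambda x: x[0:2]

# Called once on the whole array, "+" is applied elementwise,
# so every entry gets the suffix.
whole_add = arr + " cat"

# Called once on the whole array, slicing selects the first two
# *elements* of the array, not the first two characters of each string.
whole_slice = take_two(arr)

# Called once per element, slicing acts on each string.
per_element = [take_two(s) for s in arr]

print(whole_add[:2])
print(whole_slice)
print(per_element[:2])
```

The whole-array slice has length two and still holds the full strings, which matches the shape of Out[208] above.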

But this seems to break from what apply promises in the docs. Below is the quotation for what pandas.Series.apply should expect:

Invoke function on values of Series. Can be ufunc or Python function expecting only single values

It says explicitly that it can accept Python functions expecting only single values. And the function that’s not working (lambda x: x[0:2]) definitely satisfies that. It doesn’t say that the single argument must be an array. And given that things like numpy.sqrt are commonly used for single inputs (so not exclusively arrays), it seems natural to expect Pandas to work with any such function.

Is there some way of using apply that I am missing here?

Note: I did write my own extra function below:

def ix2(arr):
    return np.asarray([x[0:2] for x in arr])

and I verified that this version does work with Pandas apply. But this is beside the point. It would be easier to write something that operated externally on top of a Series object than to have to constantly write wrappers that use list comprehensions to effectively loop over the contents of the Series. Isn’t this specifically what apply is supposed to abstract away from the user?

I am using Pandas version 0.7.3 on a shared workplace network, so there’s no way to upgrade to a more recent release.

Added:

I was able to confirm that this behavior changes from version 0.7.3 to version 0.8.1. In 0.8.1 it works as expected with no NumPy ufunc wrapper.
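For readers who are not pinned to 0.7.3, here is a minimal sketch of the original goal on a current pandas release. The column names prefix_apply and prefix_str are made up for illustration, and the .str accessor is an assumption about newer releases rather than something available in the versions discussed here:

```python
import pandas as pd

dfrm_test = pd.DataFrame({"A": ["the"] * 10})

# apply calls the lambda once per element on releases where it behaves
# as the question expects.
dfrm_test["prefix_apply"] = dfrm_test["A"].apply(lambda x: x[0:2])

# The vectorized string accessor slices each string without a lambda.
dfrm_test["prefix_str"] = dfrm_test["A"].str[:2]

print(dfrm_test.head(3))
```

Both new columns should hold the first two characters of each entry in A.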

My guess is that in the code, someone was trying to use numpy.vectorize or numpy.frompyfunc within a try-except statement. Perhaps it did not work correctly with the particular lambda function I am using, and so in the except part of the code, it defaulted to just relying on generic NumPy broadcasting.

It would be great to get some confirmation on this from a Pandas developer, if possible. But in the meantime, the ufunc workaround should suffice.


1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 6:46 am

    One workaround I can think of would be converting the Python function to a numpy.ufunc with numpy.frompyfunc:

    numpy.frompyfunc((lambda x: x[0:2]), 1, 1)
    

    and use this in apply:

    In [50]: dfrm_test
    Out[50]:
         A
    0  the
    1  the
    2  the
    3  the
    4  the
    5  the
    6  the
    7  the
    8  the
    9  the
    
    In [51]: dfrm_test["A"].apply(np.frompyfunc((lambda x: x[0:2]), 1, 1))
    Out[51]:
    0    th
    1    th
    2    th
    3    th
    4    th
    5    th
    6    th
    7    th
    8    th
    9    th
    Name: A
    
    In [52]: pandas.version.version
    Out[52]: '0.7.3'
    
    In [53]: dfrm_test["A"].apply(lambda x: x[0:2])
    Out[53]:
    0    the
    1    the
    Name: A
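To see what the wrapper does on its own, outside pandas, the sketch below applies it to a plain object-dtype array. The names first_two and values are illustrative. The arguments 1, 1 tell numpy.frompyfunc that the function takes one input and returns one output:

```python
import numpy as np

# Wrap the single-value function so NumPy calls it once per element.
first_two = np.frompyfunc(lambda s: s[0:2], 1, 1)

values = np.array(["the"] * 10, dtype=object)
result = first_two(values)

# frompyfunc always returns an object-dtype array.
print(result.dtype, result.shape)
print(result[:3])
```

Because the wrapped function is a ufunc, it produces the same per-element result whether it is handed the whole array or a single value.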
    


© 2021 The Archive Base. All Rights Reserved