Editorial Team
Asked: May 15, 2026 (2026-05-15T19:29:45+00:00)

I’m using an ARM Cortex-A8 based processor called as i.MX515. There is linux Ubuntu


I’m using an ARM Cortex-A8 based processor, the i.MX515, running the Ubuntu 9.10 Linux distribution. I’m running a very large application written in C, and I’m using gettimeofday() to measure how long my application takes.

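#include <sys/time.h>

int main(void)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    /* .... application code being timed .... */
    gettimeofday(&end, NULL);

    return 0;
}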

This method was sufficient to see which blocks of my application took how much time. But now that I’m trying to optimize my code very thoroughly, I see a lot of fluctuation between successive runs (before and after my optimizations) when timing with gettimeofday(). As a result I can’t determine the actual execution times, and hence the impact of my improvements.

Can anyone suggest what I should do?

If the answer is to access the cycle counter (an idea suggested on the ARM website for the Cortex-M3), can anyone point me to some code that shows the steps I have to follow to access the timer registers on the Cortex-A8?

If this method is not very accurate, then please suggest some alternatives.

Thanks


Follow ups

Follow up 1: I wrote the following program using the CodeSourcery toolchain. The executable was generated, but when I tried running it on the board I got an "Illegal instruction" message.

#include <stdio.h>
#include <stdint.h>

static inline unsigned int get_cyclecount (void)
{
    unsigned int value;
    // Read CCNT Register
    asm volatile ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(value));
    return value;
}

static inline void init_perfcounters (int32_t do_reset, int32_t enable_divider)
{
    // in general enable all counters (including cycle counter)
    int32_t value = 1;

    // perform reset:
    if (do_reset)
    {
        value |= 2;     // reset all counters to zero.
        value |= 4;     // reset cycle counter to zero.
    }

    if (enable_divider)
        value |= 8;     // enable "by 64" divider for CCNT.

    value |= 16;        // enable export of events (PMCR bit 4, X).

    // program the performance-counter control-register:
    asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));

    // enable all counters:
    asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));

    // clear overflows:
    asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
}



int main()
{

    /* enable user-mode access to the performance counter*/
    asm ("MCR p15, 0, %0, C9, C14, 0\n\t" :: "r"(1));

    /* disable counter overflow interrupts (just in case)*/
    asm ("MCR p15, 0, %0, C9, C14, 2\n\t" :: "r"(0x8000000f));

    init_perfcounters (1, 0);

    // measure the counting overhead:
    unsigned int overhead = get_cyclecount();
    overhead = get_cyclecount() - overhead;

    unsigned int t = get_cyclecount();

    // do some stuff here..
    printf("\nHello World!!");

    t = get_cyclecount() - t;

    printf ("function took exactly %u cycles (including function call) ", t - overhead);

    get_cyclecount();

    return 0;
}

Follow up 2: I wrote to Freescale for support, and they sent back the following reply and a program (I did not understand much of it):

Here is what we can help you with right now:
I am attaching an example of code that sends a stream using the UART. From your code, it seems that you are not initializing the MPU correctly.

#include <stdio.h>
#include <stdlib.h>

#define BIT13 0x02000

#define R32   volatile unsigned long *
#define R16   volatile unsigned short *
#define R8   volatile unsigned char *

#define reg32_UART1_USR1     (*(R32)(0x73FBC094))
#define reg32_UART1_UTXD     (*(R32)(0x73FBC040))

#define reg16_WMCR         (*(R16)(0x73F98008))
#define reg16_WSR              (*(R16)(0x73F98002))

#define AIPS_TZ1_BASE_ADDR             0x70000000
#define IOMUXC_BASE_ADDR               AIPS_TZ1_BASE_ADDR+0x03FA8000

typedef unsigned long  U32;
typedef unsigned short U16;
typedef unsigned char  U8;


void serv_WDOG()
{
    reg16_WSR = 0x5555;
    reg16_WSR = 0xAAAA;
}


void outbyte(char ch)
{
    while( !(reg32_UART1_USR1 & BIT13)  );

    reg32_UART1_UTXD = ch ;
}


void _init()
{

}



void pause(int time) 
{
    int i;

    for ( i=0 ; i < time ;  i++);

} 


void led()
{

//Write to Data register [DR]

    *(R32)(0x73F88000) = 0x00000040;  // 1 --> GPIO 2_6 
    pause(500000);

    *(R32)(0x73F88000) = 0x00000000;  // 0 --> GPIO 2_6 
    pause(500000);


}

void init_port_for_led()
{


//GPIO 2_6   [73F8_8000] EIM_D22  (AC11)    DIAG_LED_GPIO
//ALT1 mode
//IOMUXC_SW_MUX_CTL_PAD_EIM_D22  [+0x0074]
//MUX_MODE [2:0]  = 001: Select mux mode: ALT1 mux port: GPIO[6] of instance: gpio2.

 // IOMUXC control for GPIO2_6

*(R32)(IOMUXC_BASE_ADDR + 0x74) = 0x00000001; 

//Write to DIR register [DIR]

*(R32)(0x73F88004) = 0x00000040;  // 1 : GPIO 2_6  - output

*(R32)(0x83FDA090) = 0x00003001;
*(R32)(0x83FDA090) = 0x00000007;


}

int main ()
{
  int k = 0x12345678 ;

    reg16_WMCR = 0 ;                        // disable watchdog
    init_port_for_led() ;

    while(1)
    {
        printf("Hello word %x\n\r", k ) ;
        serv_WDOG() ;
        led() ;

    }

    return(1) ;
}
1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 7:29 pm (2026-05-15T19:29:46+00:00)

    Accessing the performance counters isn’t difficult, but you have to enable them from kernel-mode. By default the counters are disabled.

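    In a nutshell, you have to execute the following two lines inside the kernel, either from a loadable module or by adding them somewhere in the board-init code. Executed from user mode, as in your follow-up program, these writes are not permitted and the process is killed with "Illegal instruction":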

      /* enable user-mode access to the performance counter*/
      asm ("MCR p15, 0, %0, C9, C14, 0\n\t" :: "r"(1));
    
      /* disable counter overflow interrupts (just in case)*/
      asm ("MCR p15, 0, %0, C9, C14, 2\n\t" :: "r"(0x8000000f));
    

    Once this is done, user-mode code may access the performance counters, and the cycle counter increments on every cycle once it has been enabled (see init_perfcounters() below). Overflows of the register go unnoticed and cause no problems, except that they can mess up your measurements.
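As a minimal sketch of why a single overflow is harmless when you only look at differences: the counter is 32 bits wide, and unsigned subtraction in C is performed modulo 2^32. The helper name elapsed_cycles is hypothetical and is not part of the code in this answer.

```c
#include <stdint.h>

/* Elapsed counts between two 32-bit counter readings.
 * Unsigned arithmetic wraps modulo 2^32, so the result is still
 * correct if the counter overflowed once between the two readings.
 * Two or more overflows in between cannot be detected this way. */
static inline uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}
```

For example, a start reading of 0xFFFFFFF0 and an end reading of 0x00000010 give 0x20 elapsed counts.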

    Now you want to access the cycle-counter from the user-mode:

    We start with a function that reads the register:

    static inline unsigned int get_cyclecount (void)
    {
      unsigned int value;
      // Read CCNT Register
      asm volatile ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(value));  
      return value;
    }
    

    And you most likely want to reset and set the divider as well:

    static inline void init_perfcounters (int32_t do_reset, int32_t enable_divider)
    {
      // in general enable all counters (including cycle counter)
      int32_t value = 1;
    
      // perform reset:
      if (do_reset)
      {
        value |= 2;     // reset all counters to zero.
        value |= 4;     // reset cycle counter to zero.
      } 
    
      if (enable_divider)
        value |= 8;     // enable "by 64" divider for CCNT.
    
      value |= 16;      // enable export of events (PMCR bit 4, X).
    
      // program the performance-counter control-register:
      asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));  
    
      // enable all counters:  
      asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));  
    
      // clear overflows:
      asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
    }
    

    do_reset will set the cycle-counter to zero. Easy as that.

    enable_divider will enable the 1/64 cycle divider. Without this flag set, you measure every cycle. With it enabled, the counter is incremented once every 64 cycles. This is useful if you want to measure long intervals that would otherwise cause the counter to overflow.
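To get a feel for when the divider is worth enabling, the following sketch estimates how long the 32-bit counter runs before it wraps. The function names and the cpu_hz parameter are assumptions for illustration; use the actual core clock of your board.

```c
#include <stdint.h>

/* Seconds until the 32-bit cycle counter wraps around.
 * cpu_hz: core clock in Hz (hypothetical parameter, board specific).
 * divider_enabled: non-zero if the "by 64" divider is switched on. */
static double seconds_until_overflow(double cpu_hz, int divider_enabled)
{
    const double counts = 4294967296.0;   /* 2^32 counter values */
    const double cycles_per_count = divider_enabled ? 64.0 : 1.0;
    return counts * cycles_per_count / cpu_hz;
}

/* Convert a counter difference back into CPU cycles. */
static uint64_t counts_to_cycles(uint32_t counts, int divider_enabled)
{
    return divider_enabled ? (uint64_t)counts * 64u : (uint64_t)counts;
}
```

Assuming an 800 MHz core clock purely as an example, the counter wraps after roughly 5.4 seconds without the divider and after roughly 344 seconds with it. The price is resolution: with the divider on, each count stands for 64 cycles.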

    How to use it:

      // init counters:
      init_perfcounters (1, 0); 
    
      // measure the counting overhead:
      unsigned int overhead = get_cyclecount();
      overhead = get_cyclecount() - overhead;    
    
      unsigned int t = get_cyclecount();
    
      // do some stuff here..
      call_my_function();
    
      t = get_cyclecount() - t;
    
      printf ("function took exactly %u cycles (including function call) ", t - overhead);
    

    This should work on all Cortex-A8 CPUs.

    Oh – and some notes:

    Using these counters you’ll measure the exact time between the two calls to get_cyclecount() including everything spent in other processes or in the kernel. There is no way to restrict the measurement to your process or a single thread.
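If time spent in other processes is the main source of fluctuation, one alternative that does not use the cycle counter at all is the POSIX per-process CPU-time clock. The sketch below is an assumption about what fits your setup, not part of the counter code above: it needs a kernel and C library that support CLOCK_PROCESS_CPUTIME_ID, its resolution is kernel dependent, and older glibc versions require linking with -lrt. The function name process_cpu_ns is hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* CPU time consumed so far by the calling process, in nanoseconds.
 * Time spent in other processes is not included.
 * Returns 0 if the clock is not available. */
static uint64_t process_cpu_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return 0;

    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

Call it before and after the block of interest and subtract the two values, in the same way as with get_cyclecount().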

    Also calling get_cyclecount() isn’t free. It will compile to a single asm-instruction, but moves from the co-processor will stall the entire ARM pipeline. The overhead is quite high and can skew your measurement. Fortunately the overhead is also fixed, so you can measure it and subtract it from your timings.

    In my example I measured the overhead for every measurement. Don’t do this in practice. Sooner or later an interrupt will occur between the two calls and skew your measurements even further. I suggest that you measure the overhead a couple of times on an idle system, ignore all outliers, and use a fixed constant instead.
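One possible way to do that calibration is sketched below: take an odd number of back-to-back readings and use the median, which discards the occasional sample inflated by an interrupt. All names here (counter_fn, median_u32, calibrate_overhead, OVERHEAD_SAMPLES) are hypothetical, and the counter is passed in as a function pointer so that a small wrapper around get_cyclecount() can be supplied on the target.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: any function returning the current counter value,
 * for example a wrapper around get_cyclecount(). */
typedef uint32_t (*counter_fn)(void);

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (n > 0). Sorts the array in place. */
static uint32_t median_u32(uint32_t *samples, size_t n)
{
    qsort(samples, n, sizeof samples[0], cmp_u32);
    return samples[n / 2];
}

#define OVERHEAD_SAMPLES 101   /* odd, so the median is a real sample */

/* Time two back-to-back counter reads OVERHEAD_SAMPLES times and
 * return the median difference as the fixed overhead constant. */
static uint32_t calibrate_overhead(counter_fn read_counter)
{
    uint32_t samples[OVERHEAD_SAMPLES];

    for (size_t i = 0; i < OVERHEAD_SAMPLES; i++) {
        uint32_t t0 = read_counter();
        uint32_t t1 = read_counter();
        samples[i] = t1 - t0;
    }
    return median_u32(samples, OVERHEAD_SAMPLES);
}
```

Run the calibration once on an idle system, note the result, and subtract that constant from later measurements instead of re-measuring the overhead each time.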
