What is a middle-level semantic concept? I came across it in a paper that explains how videos on the Web can be annotated more clearly, and I didn’t fully understand the term.
Thanks.
The term comes from a sophisticated approach to automatic image classification known as “Multi-level Semantics.” The ultimate goal of this field is to develop algorithms that can look at a picture and automatically assign meaningful labels to it. Very loosely, the idea is to categorize a picture hierarchically: the picture as a whole is first assigned a “high-level” semantic meaning (e.g., “crowd” or “outdoor landscape”), and is then progressively assigned more specific “middle-level” semantics based on features recognized within smaller, homogeneous regions of the image. A similar approach has been used for years in document classification (for text categorization and summarization), and it is being extended to video as well.
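To make the coarse-to-fine idea a bit more concrete, here is a toy sketch in Python. It is not the method from any particular paper. The two-level ontology, the concept names, and the score dictionaries are all hypothetical stand-ins for what real image classifiers and region detectors would produce. The point is only the order of operations: pick a high-level label for the whole image first, then label each region using only the middle-level concepts that belong under that high-level label.

```python
# Hypothetical two-level ontology: high-level label -> its middle-level concepts.
ONTOLOGY = {
    "outdoor landscape": ["sky", "water", "grass", "mountain"],
    "crowd": ["face", "person", "sign"],
}


def classify_image(global_scores, regions):
    """Toy coarse-to-fine labeling (illustrative only).

    global_scores: dict of high-level label -> whole-image score (hypothetical
                   output of a scene classifier).
    regions:       list of dicts, one per homogeneous region, mapping
                   middle-level concept -> detector score (also hypothetical).
    """
    # Step 1: assign the high-level meaning to the picture as a whole.
    high = max(ONTOLOGY, key=lambda label: global_scores.get(label, 0.0))

    # Step 2: label each region, restricted to concepts under that high-level label.
    candidates = ONTOLOGY[high]
    middle = [max(candidates, key=lambda c: region.get(c, 0.0)) for region in regions]
    return high, middle


if __name__ == "__main__":
    global_scores = {"outdoor landscape": 0.8, "crowd": 0.3}
    regions = [
        {"sky": 0.9, "water": 0.2, "face": 0.4},
        {"grass": 0.7, "person": 0.8},
    ]
    print(classify_image(global_scores, regions))
```

With these made-up scores the second region is labeled “grass” rather than “person,” even though “person” scored higher, because the high-level decision (“outdoor landscape”) constrains which middle-level concepts are considered.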
The whole idea is built around the concept of “ontologies,” which can basically be thought of as hierarchies of classification categories (like the directories of the early category-based search engines, where you could drill down into increasingly specific categories). For a more precise and technical description of the whole topic, you might skim this paper: “Ontology-based large-scale image classification, indexing and exploration” (Y. Gao, pp. 9ff).
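One practical payoff of such a hierarchy is for indexing and search. The sketch below is my own illustration, not taken from the Gao paper. It stores a small, hypothetical ontology fragment as child-to-parent links (the root “scene” and all concept names are made up). An item annotated with a specific middle-level concept is also indexed under every broader ancestor, so searching for a general category still finds it.

```python
# Hypothetical ontology fragment, stored as child -> parent links.
PARENT = {
    "outdoor landscape": "scene",
    "crowd": "scene",
    "sky": "outdoor landscape",
    "water": "outdoor landscape",
    "face": "crowd",
}


def path_to_root(concept):
    """Return the chain from a concept up to the root, most specific first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def build_index(annotations):
    """annotations: item id -> list of concept labels.

    Each item is indexed under its own labels and all of their ancestors,
    so a query for a broad category also returns items tagged more narrowly.
    """
    index = {}
    for item, labels in annotations.items():
        for label in labels:
            for concept in path_to_root(label):
                index.setdefault(concept, set()).add(item)
    return index


if __name__ == "__main__":
    index = build_index({"clip1": ["sky"], "clip2": ["face"]})
    print(path_to_root("sky"))
    print(sorted(index["scene"]))
```

Here a video clip tagged only with “sky” is retrievable under “outdoor landscape” and “scene” as well, which is the “drill down” structure described above read in reverse.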
Note that I am by no means an expert in this field. You could no doubt get a more precise and informative answer by posting this question on the Signal Processing Stack Exchange site.