
Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

4499

Yun Q. Shi (Ed.)

Transactions on Data Hiding and Multimedia Security II


Volume Editor

Yun Q. Shi
New Jersey Institute of Technology
Department of Electrical and Computer Engineering
323, M.L. King Blvd., Newark, NJ 07102, USA
E-mail: [email protected]

Library of Congress Control Number: 2007928444
CR Subject Classification (1998): K.4.1, K.6.5, H.5.1, D.4.6, E.3, E.4, F.2.2, H.3, I.4
LNCS Sublibrary: SL 4 – Security and Cryptology
ISSN 0302-9743 (Lecture Notes in Computer Science)
ISSN 1864-3043 (Transactions on Data Hiding and Multimedia Security)
ISBN-10 3-540-73091-5 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-73091-0 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2007 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12077731 06/3180 543210

Preface

In this volume we present the second issue of the LNCS Transactions on Data Hiding and Multimedia Security.

In the first paper, Adelsbach et al. introduce fingercasting, a combination of broadcast encryption and fingerprinting for secure content distribution. They also provide for the first time a security proof for a lookup table-based encryption scheme. In the second paper, He and Kirovski propose an estimation attack on content-based video fingerprinting schemes. Although the authors tailor the attack towards a specific video fingerprint, the generic form of the attack is expected to be applicable to a wide range of video watermarking schemes. In the third paper, Ye et al. present a new feature distance measure for error-resilient image authentication, which allows one to differentiate malicious image manipulations from changes that do not interfere with the semantics of an image. In the fourth paper, Luo et al. present a steganalytic technique against steganographic embedding methods utilizing the two least significant bit planes. Experimental results demonstrate that this steganalysis method can reliably detect embedded messages and estimate their length with high precision. Finally, Alface and Macq present a comprehensive survey on blind and robust 3-D shape watermarking.

We hope that this issue is of great interest to the research community and will trigger new research in the field of data hiding and multimedia security. Finally, we want to thank all the authors, reviewers and editors who devoted their valuable time to the success of this second issue. Special thanks go to Springer and Alfred Hofmann for their continuous support.

March 2007

Yun Q. Shi (Editor-in-Chief) Hyoung-Joong Kim (Vice Editor-in-Chief) Stefan Katzenbeisser (Vice Editor-in-Chief)

LNCS Transactions on Data Hiding and Multimedia Security

Editorial Board

Editor-in-Chief

Yun Q. Shi, New Jersey Institute of Technology, Newark, NJ, USA

Vice Editors-in-Chief

Hyoung-Joong Kim, Korea University, Seoul, Korea
Stefan Katzenbeisser, Philips Research Europe, Eindhoven, Netherlands

Associate Editors

Mauro Barni, University of Siena, Siena, Italy
Jeffrey Bloom, Thomson, Princeton, NJ, USA
Jana Dittmann, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
Jiwu Huang, Sun Yat-sen University, Guangzhou, China
Mohan Kankanhalli, National University of Singapore, Singapore
Darko Kirovski, Microsoft, Redmond, WA, USA
C. C. Jay Kuo, University of Southern California, Los Angeles, USA
Heung-Kyu Lee, Korea Advanced Institute of Science and Technology, Daejeon, Korea
Benoit Macq, Catholic University of Louvain, Belgium
Nasir Memon, Polytechnic University, Brooklyn, NY, USA
Kivanc Mihcak, Bogazici University, Istanbul, Turkey


Hideki Noda, Kyushu Institute of Technology, Iizuka, Japan
Jeng-Shyang Pan, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
Fernando Perez-Gonzalez, University of Vigo, Vigo, Spain
Andreas Pfitzmann, Dresden University of Technology, Germany
Alessandro Piva, University of Florence, Florence, Italy
Yong-Man Ro, Information and Communications University, Daejeon, Korea
Ahmad-Reza Sadeghi, Ruhr-University, Bochum, Germany
Kouichi Sakurai, Kyushu University, Fukuoka, Japan
Qibin Sun, Institute of Infocomm Research, Singapore
Edward Wong, Polytechnic University, Brooklyn, NY, USA

Advisory Board

Pil Joong Lee, Pohang University of Science and Technology, Pohang, Korea
Bede Liu, Princeton University, Princeton, NJ, USA

Table of Contents

Fingercasting–Joint Fingerprinting and Decryption of Broadcast Messages . . . 1
André Adelsbach, Ulrich Huber, and Ahmad-Reza Sadeghi

An Estimation Attack on Content-Based Video Fingerprinting . . . 35
Shan He and Darko Kirovski

Statistics- and Spatiality-Based Feature Distance Measure for Error Resilient Image Authentication . . . 48
Shuiming Ye, Qibin Sun, and Ee-Chien Chang

LTSB Steganalysis Based on Quartic Equation . . . 68
Xiangyang Luo, Chunfang Yang, Daoshun Wang, and Fenlin Liu

From 3D Mesh Data Hiding to 3D Shape Blind and Robust Watermarking: A Survey . . . 91
Patrice Rondao Alface and Benoit Macq

Author Index . . . 117

Fingercasting–Joint Fingerprinting and Decryption of Broadcast Messages

André Adelsbach, Ulrich Huber, and Ahmad-Reza Sadeghi

Horst Görtz Institute for IT Security, Ruhr-Universität Bochum, Universitätsstraße 150, D-44780 Bochum, Germany
[email protected], {huber,sadeghi}@crypto.rub.de

Abstract. We propose a stream cipher that provides confidentiality, traceability and renewability in the context of broadcast encryption, assuming that collusion-resistant watermarks exist. We prove it to be as secure as the generic pseudo-random sequence on which it operates. This encryption approach, termed fingercasting, achieves joint decryption and fingerprinting of broadcast messages in such a way that an adversary cannot separate both operations or prevent them from happening simultaneously. The scheme is a combination of a known broadcast encryption scheme, a well-known class of fingerprinting schemes and an encryption scheme inspired by the Chameleon cipher. It is the first to provide a formal security proof and a non-constant lower bound for resistance against collusion of malicious users, i.e., a minimum number of content copies needed to remove all fingerprints. To achieve traceability, the scheme fingerprints the receivers' key tables such that they embed a fingerprint into the content during decryption. The scheme is efficient and includes parameters that allow one, for example, to trade off storage size for computation cost at the receiving end.

Keywords: Chameleon encryption, stream cipher, spread-spectrum watermarking, fingerprinting, collusion resistance, frame-proofness, broadcast encryption.

1 Introduction

Experience shows that adversaries attack Broadcast Encryption (BE) systems in a variety of diﬀerent ways. Their attacks may be on the hardware that stores cryptographic keys, e.g., when they extract keys from a compliant device to develop a pirate device such as the DeCSS software that circumvents the Content Scrambling System [2]. Alternatively, their attacks may be on the decrypted content, e.g., when a legitimate user shares decrypted content with illegitimate users on a ﬁle sharing system such as Napster, Kazaa, and BitTorrent.

An extended abstract of this paper appeared in the Proceedings of the Tenth Australasian Conference on Information Security and Privacy (ACISP 2006) [1].

Y.Q. Shi (Eds.): Transactions on DHMS II, LNCS 4499, pp. 1–34, 2007. © Springer-Verlag Berlin Heidelberg 2007

The broadcasting sender thus has three security requirements: confidentiality, traceability of content and keys, and renewability of the encryption scheme. The requirements cover two aspects. Confidentiality tries to prevent illegal copies in the first place, whereas traceability is a second line of defense aimed at finding the origin of an illegal copy (content or key). The need for traceability originates from the fact that confidentiality may be compromised in rare cases, e.g., when a few users illegally distribute their secret keys. Renewability ensures that after such rare events, the encryption system can recover from the security breach.

In broadcasting systems deployed today, e.g., Content Protection for Pre-Recorded Media [3] or the Advanced Access Content System [4], confidentiality and renewability often rely on BE because it provides short ciphertexts while at the same time having realistic storage requirements in devices and acceptable computational overhead. Traitor tracing enables traceability of keys, whereas fingerprinting provides traceability of content. Finally, renewability may be achieved using revocation of the leaked keys.

However, none of the mentioned cryptographic schemes covers all three security requirements. Some existing BE schemes lack traceability of keys, whereas no practically relevant scheme provides traceability of content [5,6,7,8]. Traitor tracing only provides traceability of keys, but not of content [9,10]. Fingerprinting schemes alone do not provide confidentiality [11]. The original Chameleon cipher provides confidentiality, traceability and a hint on renewability, but with a small constant bound for collusion resistance and, most importantly, without formal proof of security [12].
Asymmetric schemes, which provide each compliant device with a certificate and accompany content with Certificate Revocation Lists (CRLs), lack traceability of content and may reach the limits of renewability when CRLs become too large to be processed by real-world devices. Finally, a trivial combination of fingerprinting and encryption leads to an unacceptable transmission overhead because the broadcasting sender needs to sequentially transmit each fingerprinted copy.

Our Contribution. We present, to the best of our knowledge, the first rigorous security proof of Chameleon ciphers, thus providing a sound foundation for the recent applications of these ciphers, e.g., [13]. Furthermore, we give an explicit criterion to judge the security of the Chameleon cipher's key table. Our fingercasting approach fulfills all three security requirements at the same time. It is a combination of (i) a new Chameleon cipher based on the fingerprinting capabilities of a well-known class of watermarking schemes and (ii) an arbitrary broadcast encryption scheme, which explains the name of the approach. The basic idea is to use the Chameleon cipher for combining decryption and fingerprinting. To achieve renewability, we use a BE scheme to provide fresh session keys as input to the Chameleon scheme. To achieve traceability, we fingerprint the receivers' key tables such that they embed a fingerprint into the content during decryption. To enable higher collusion resistance than the original Chameleon scheme, we tailor our scheme to emulate any watermarking scheme whose coefficients follow a probability distribution that can be disaggregated into additive components.1 As proof of concept, we instantiate the watermarking scheme with Spread Spectrum Watermarking (SSW), which has proven collusion resistance [14,15]. However, we might as well instantiate it with any other such scheme.

Joint decryption and fingerprinting has significant advantages compared to existing methods such as transmitter-side or receiver-side Fingerprint Embedding (FE) [11]. Transmitter-side FE is the trivial combination of fingerprinting and encryption by the sender. As discussed above, the transmission overhead is in the order of the number of copies to be distributed, which is prohibitive in practical applications. Receiver-side FE happens in the user's receiver; after distribution of a single encrypted copy of the content, a secure receiver based on tamper-resistant hardware is trusted to embed the fingerprint after decryption. This saves bandwidth on the broadcast channel. However, perfect tamper-resistance cannot be achieved under realistic assumptions [16]. An adversary may succeed in extracting the keys of a receiver and subsequently decrypt without embedding a fingerprint. Our fingercasting approach combines the advantages of both methods. It saves bandwidth by broadcasting a single encrypted copy of the content. In addition, it ensures embedding of a fingerprint even if a malicious user succeeds in extracting the decryption keys of a receiver. Furthermore, as long as the number of colluding users remains below a threshold, the colluders can only create decryption keys and content copies that incriminate at least one of them.

This paper enhances our extended abstract [1] in the following aspects. First, the extended abstract does not contain the security proof, which is the major contribution. Second, we show here that our instantiation of SSW is exact, whereas the extended abstract only claims this result.
Last, we discuss here the trade-oﬀ between storage size and computation cost at the receiving end.

2

Related Work

The original Chameleon cipher of Anderson and Manifavas is 3-collusion-resistant [12]: a collusion of up to 3 malicious users has a negligible chance of creating a good copy that does not incriminate them. Each legitimate user knows the seed of a Pseudo-Random Sequence (PRS) and a long table filled with random keywords. Based on the sender's master table, each receiver obtains a slightly different table copy, where individual bits in the keywords are modified in a characteristic way. Interpreting the PRS as a sequence of addresses in the table, the sender adds the corresponding keywords in the master table bitwise modulo 2 in order to mask the plaintext word. The receiver applies the same operation to the ciphertext using its table copy, thus embedding the fingerprint.

The original cipher, however, has some inconveniences. Most importantly, it has no formal security analysis and bounds the collusion resistance by the constant number 3, whereas our scheme allows one to choose this bound depending on the number of available watermark coefficients. In addition, the original scheme limits the content space (and keywords) to strings with characteristic bit positions that may be modified without visibly altering the content. In contrast, our scheme uses algebraic operations in a group of large order, which enables modification of any bit in the keyword and processing of arbitrary documents.

1 Our scheme does not yet support fingerprints based on coding theory.

Chameleon was inspired by work from Maurer [17,18]. His cipher achieves information-theoretical security in the bounded storage model with high probability. In contrast, Chameleon and our proposed scheme only achieve computational security. The reason is that the master table length in Maurer's cipher is super-polynomial. As any adversary would need to store most of the table to validate guesses, the bounded storage capacity defeats all attacks with high probability. However, Maurer's cipher was never intended to provide traceability of content or renewability, but only confidentiality.

Ferguson et al. discovered security weaknesses in a randomized stream cipher similar to Chameleon [19]. However, their attack only works for linear sequences of keywords in the master table, not for the PRSs of our proposed solution.

Ergun, Kilian, and Kumar prove that an averaging attack with additional Gaussian noise defeats any watermarking scheme [20]. Their bound on the minimum number of different content copies needed for the attack asymptotically coincides with the bound on the maximum number of different content copies to which the watermarking scheme of Kilian et al. is collusion-resistant [15]. As we can emulate [15] with our fingercasting approach, its collusion resistance is—at least asymptotically—the best we can hope for.

Recently there was a great deal of interest in joint fingerprinting and decryption [13,21,22,11,23]. Basically, we can distinguish three strands of work. The first strand of work applies Chameleon in different application settings. Briscoe et al.
introduce Nark, which is an application of the original Chameleon scheme in the context of Internet multicast [13]. However, in contrast to our new Chameleon cipher they neither enhance Chameleon nor analyze its security. The second strand of work tries to achieve joint ﬁngerprinting and decryption by either trusting network nodes to embed ﬁngerprints (Watercasting in [21]) or doubling the size of the ciphertext by sending diﬀerently ﬁngerprinted packets of content [22]. Our proposed solution neither relies on trusted network nodes nor increases the ciphertext size. The third strand of work proposes new joint ﬁngerprinting and decryption processes, but at the price of replacing encryption with scrambling, which does not achieve indistinguishability of ciphertext and has security concerns [11,23]. In contrast, our new Chameleon cipher achieves indistinguishability of ciphertext.

3 Preliminaries

3.1 Notation

We recall some standard notations that will be used throughout the paper. First, we denote scalar objects with lower-case variables, e.g., o1, and object tuples as well as roles with upper-case variables, e.g., X1. When we summarize objects or roles in set notation, we use an upper-case calligraphic variable, e.g., O := {o1, o2, . . .} or X := {X1, X2, . . .}. Second, let A be an algorithm. By y ← A(x) we denote that y was obtained by running A on input x. If A is deterministic, then y is a variable with a unique value. Conversely, if A is probabilistic, then y is a random variable. For example, by y ← N(μ, σ) we denote that y was obtained by selecting it at random with normal distribution, where μ is the mean and σ the standard deviation. Third, o1 ←R O and o2 ←R [0, z] denote the selection of a random element of the set O and of the interval [0, z] with uniform distribution. Finally, V · W denotes the dot product of two vectors V := (v1, . . . , vn) and W := (w1, . . . , wn), which is defined as V · W := Σ_{j=1}^{n} v_j w_j, while ||V|| denotes the Euclidean norm ||V|| := √(V · V).
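As a quick sanity check of this notation, the dot product and Euclidean norm can be computed as follows (a minimal Python sketch; the example vectors are made up):

```python
import math

def dot(V, W):
    # V · W := sum_{j=1}^{n} v_j * w_j
    return sum(v * w for v, w in zip(V, W))

def norm(V):
    # ||V|| := sqrt(V · V)
    return math.sqrt(dot(V, V))

V = [1.0, 2.0, 2.0]
W = [2.0, 0.0, 1.0]
print(dot(V, W))   # 4.0
print(norm(V))     # 3.0
```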

3.2 Roles and Objects in Our System Model

The (broadcast) center manages the broadcast channel, distributes decryption keys and is fully trusted. The users obtain the content via devices that we refer to as receivers. For example, a receiver may be a set-top box in the context of pay-TV or a DVD player in movie distribution. We denote the number of receivers with N; the set of receivers is U := {ui | 1 ≤ i ≤ N}. When a receiver violates the terms and conditions of the application, e.g., leaks its keys or shares content, the center revokes the receiver's keys and thus makes them useless for decryption purposes. We denote the set of revoked receivers with R := {r1, r2, . . .} ⊂ U. We represent broadcast content as a sequence M := (m1, . . . , mn) of real numbers in [0, z], where M is an element of the content space M.2 For example, these numbers may be the n most significant coefficients of the Discrete Cosine Transform (DCT) as described in [14]. However, they should not be thought of as a literal description of the underlying content, but as a representation of the values that are to be changed by the watermarking process [20]. We refer to these values as significant and to the remainder as insignificant. In the remainder of this paper, we only refer to the significant part of the content, but briefly comment on the insignificant part in Section 5.

3.3 Cryptographic Building Blocks

Negligible Function. A negligible function f : N → R is a function where the inverse of any polynomial is asymptotically an upper bound:

∀k > 0 ∃λ0 ∀λ > λ0 : f(λ) < 1/λ^k

Probabilistic Polynomial Time. A probabilistic polynomial-time algorithm is an algorithm for which there exists a polynomial poly such that for every input x ∈ {0, 1}* the algorithm always halts after poly(|x|) steps, independently of the outcome of its internal coin tosses.

2 Although this representation mainly applies to images, we discuss an extension to movies and songs in Section 5.

Pseudo-Random Sequence (PRS). We first define the notion of pseudo-randomness and then proceed to define a Pseudo-Random Sequence Generator (PRSG). For further details we refer to [24, Section 3.3.1]:

Definition 1 (Pseudo-randomness). Let len : N → N be a polynomial such that len(λ) > λ for all λ ∈ N and let U_len(λ) be a random variable uniformly distributed over the strings {0, 1}^len(λ) of length len(λ). Then the random variable X with |X| = len(λ) is called pseudo-random if for every probabilistic polynomial-time distinguisher D, the advantage Adv(λ) is a negligible function:

Adv(λ) := |Pr[D(X) = 1] − Pr[D(U_len(λ)) = 1]|

Definition 2 (Pseudo-Random Sequence Generator). A PRSG is a deterministic polynomial-time algorithm G that satisfies two requirements:

1. Expansion: There exists a polynomial len : N → N such that len(λ) > λ for all λ ∈ N and |G(str)| = len(|str|) for all str ∈ {0, 1}*.
2. Pseudo-randomness: The random variable G(U_λ) is pseudo-random.

A PRS is a sequence G(str) derived from a uniformly distributed random seed str using a PRSG.

Chameleon Encryption. To set up a Chameleon scheme CE := (KeyGenCE, KeyExtrCE, EncCE, DecCE, DetectCE), the center generates the secret master table MT, the secret table fingerprints TF := (TF(1), . . . , TF(N)), and selects a threshold t using the key generation algorithm (MT, TF, t) ← KeyGenCE(N, 1^λ, par_CE), where N is the number of receivers, λ a security parameter, and par_CE a set of performance parameters. To add receiver ui to the system, the center uses the key extraction algorithm RT(i) ← KeyExtrCE(MT, TF, i) to deliver the secret receiver table RT(i) to ui. To encrypt content M exclusively for the receivers in possession of a receiver table RT(i) and a fresh session key k_sess, the center uses the encryption algorithm C ← EncCE(MT, k_sess, M), where the output is the ciphertext C.
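A toy end-to-end sketch of this interface may help fix the idea. Everything concrete here is an assumption of the sketch, not the construction proven secure in the paper: the tiny table sizes, the SHA-256 counter-mode stand-in for the generic PRSG, and additive masking of real-valued coefficients modulo z.

```python
import hashlib
import random

random.seed(7)  # for a reproducible toy run

L = 16     # master-table length (illustrative; real tables are far larger)
n = 8      # number of content coefficients
z = 256.0  # coefficients live in [0, z)

# KeyGenCE: secret master table MT and a table fingerprint TF(i) per receiver
MT = [random.uniform(0.0, z) for _ in range(L)]
TF_i = [random.gauss(0.0, 0.5) for _ in range(L)]

# KeyExtrCE: receiver table RT(i) = master table perturbed by TF(i)
RT_i = [mt - tf for mt, tf in zip(MT, TF_i)]

def addresses(k_sess, count):
    # PRS of table addresses derived from the session key
    # (SHA-256 in counter mode stands in for the generic PRSG)
    addrs, ctr = [], 0
    while len(addrs) < count:
        digest = hashlib.sha256(k_sess + ctr.to_bytes(4, "big")).digest()
        addrs.extend(b % L for b in digest)
        ctr += 1
    return addrs[:count]

def enc_ce(MT, k_sess, M):
    # EncCE: mask each coefficient with a PRS-addressed master-table entry
    return [(m + MT[a]) % z for m, a in zip(M, addresses(k_sess, len(M)))]

def dec_ce(RT, k_sess, C):
    # DecCE: unmasking with the slightly different receiver table leaves
    # the table fingerprint behind in the decrypted copy
    return [(c - RT[a]) % z for c, a in zip(C, addresses(k_sess, len(C)))]

k_sess = b"fresh session key from the BE scheme"
M = [random.uniform(10.0, z - 10.0) for _ in range(n)]
C = enc_ce(MT, k_sess, M)
M_i = dec_ce(RT_i, k_sess, C)

# M_i deviates from M exactly by the table-fingerprint entries that were used
for m_i, m, a in zip(M_i, M, addresses(k_sess, n)):
    assert abs((m_i - m) - TF_i[a]) < 1e-9
```

The point of the sketch is the last assertion: decryption and fingerprint embedding are one and the same table lookup, so a receiver cannot perform one without the other.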
Only a receiver ui in possession of RT (i ) and k sess is capable of decrypting C and obtaining a ﬁngerprinted copy M (i ) of content M using the decryption algorithm M (i ) ← DecCE(RT (i ) , k sess , C ). When the center discovers an illegal copy M ∗ of content M , it executes DetectCE, which uses the ﬁngerprint detection algorithm DetectFP of the underlying ﬁngerprinting scheme to detect whether RT (i ) left traces in M ∗ . For further details on our notation of a Chameleon scheme, we refer to Appendix C. Fingerprinting. To set up a ﬁngerprinting scheme, the center generates the secret content ﬁngerprints CF := (CF (1) , . . . , CF (N ) ) and the secret similarity threshold t using the setup algorithm (CF , t ) ← SetupFP(N , n , par FP ), where N is the number of receivers, n the number of content coeﬃcients, and par FP a set of performance parameters. To embed the content ﬁngerprint CF (i ) := (cf (1i ) , . . . , cf (ni) ) of receiver ui into the original content M , the center uses the embedding algorithm M (i ) ← EmbedFP(M , CF (i ) ). To verify whether an illegal copy M ∗ of content M contains traces of the content ﬁngerprint CF (i ) of receiver

Fingercasting–Joint Fingerprinting and Decryption of Broadcast Messages

7

ui , the center uses the detection algorithm dec ← DetectFP(M , M ∗ , CF (i ) , t ). It calculates the similarity between the detected ﬁngerprint CF ∗ := M ∗ − M and CF (i ) using a similarity measure. If the similarity is above the threshold t , then the center declares ui guilty (dec = true), otherwise innocent (dec = false). This type of detection algorithm is called non-blind because it needs the original content M as input; the opposite is a blind detection algorithm. We call a ﬁngerprinting scheme additive if the probability distribution ProDis of its coeﬃcients has the following property: Adding two independent random variables that follow ProDis results in a random variable that also follows ProDis. For example, the normal distribution has this property, where the means and variances add up during addition. Spread Spectrum Watermarking (SSW) is an instance of an additive ﬁngerprinting scheme. We describe the SSW scheme of [15], which we later use to achieve collusion resistance. The content ﬁngerprint CF (i ) consists of independent random variables cf (ji ) with normal distribution ProDis = N(0, σ ), where σ is a function fσ (N , n , par FP ). The similarity threshold t is a function ft (σ , N , par FP ). Both functions fσ and ft are speciﬁed in [15]. During EmbedFP, the center adds the ﬁngerprint coeﬃcients to the content coeﬃcients: mj(i ) ← mj + cf (ji ) . The similarity test is Sim(CF ∗ , CF (i ) ) ≥ t with Sim(CF ∗ , CF (i ) ) := (CF ∗ · CF (i ) )/||CF ∗ ||. Finally, the scheme’s security is given by: Theorem 1. [15, Section 3.4] In the SSW scheme with the above parameters, an adversarial coalition needs Ω( n / ln N ) diﬀerently ﬁngerprinted copies of content M to have a non-negligible chance of creating a good copy M ∗ without any coalition member’s ﬁngerprint. For further details on our notation of a ﬁngerprinting scheme and the SSW scheme of [15], we refer to Appendix D. Broadcast Encryption. 
To set up the scheme, the center generates the secret master key MK using the key generation algorithm MK ← KeyGenBE(N , 1λ ), where N is the number of receivers and λ the security parameter. To add receiver ui to the system, the center uses the key extraction algorithm SK (i ) ← KeyExtrBE(MK , i) to extract the secret key SK (i ) of ui . To encrypt session key k sess exclusively for the non-revoked receivers U \ R, the center uses the encryption algorithm C ← EncBE(MK , R, k sess ), where the output is the ciphertext C . Only a non-revoked receiver ui has a matching private key SK (i ) that allows to decrypt C and obtain k sess using the decryption algorithm k sess ← DecBE(i, SK (i ) , C ). For further details on our notation of a BE scheme, we refer to Appendix E. 3.4
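The SSW embedding and similarity test described above can be simulated numerically. This is a minimal sketch with made-up parameter values for n, σ and t; the real scheme derives σ and t via the functions fσ and ft of [15].

```python
import math
import random

random.seed(1)  # reproducible toy run

n, sigma, t = 1000, 1.0, 10.0  # illustrative parameters, not fσ/ft of [15]

def dot(V, W):
    return sum(v * w for v, w in zip(V, W))

# Original content coefficients m_1, ..., m_n
M = [random.uniform(0.0, 255.0) for _ in range(n)]

# EmbedFP: m_j(i) <- m_j + cf_j(i), with cf_j(i) ~ N(0, sigma)
CF_i = [random.gauss(0.0, sigma) for _ in range(n)]
M_i = [m + cf for m, cf in zip(M, CF_i)]

# Non-blind DetectFP: extract CF* := M* - M, then test Sim(CF*, CF(i)) >= t
def detect_fp(M_orig, M_star, CF, t):
    CF_star = [a - b for a, b in zip(M_star, M_orig)]
    sim = dot(CF_star, CF) / math.sqrt(dot(CF_star, CF_star))
    return sim >= t

CF_other = [random.gauss(0.0, sigma) for _ in range(n)]
print(detect_fp(M, M_i, CF_i, t))      # the embedded fingerprint is detected
print(detect_fp(M, M_i, CF_other, t))  # an unrelated fingerprint is not
```

Intuitively, for the fingerprinted copy the similarity concentrates around ||CF(i)|| ≈ σ√n ≈ 31.6, well above t, while for an unrelated fingerprint it concentrates around 0.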

3.4 Requirements of a Fingercasting Scheme

Before we enter into the details of our fingercasting approach, we summarize its requirements: correctness, security, collusion resistance, and frame-proofness. To put it simply, the aim of our fingercasting approach is to generically combine an instance of a BE scheme, a Chameleon scheme, and a fingerprinting scheme such that the combination inherits the security of BE and Chameleon as well as the collusion resistance of fingerprinting.

To define correctness we first need to clarify how intrusive a fingerprint may be. For a copy to be good, the fingerprint may not perceptibly deteriorate its quality:

Definition 3 (Goodness). Goodness is a predicate Good : M² → {true, false} over two messages M1, M2 ∈ M that evaluates their perceptual difference. A fingerprinted copy M(i) is called good if its perceptual difference to the original content M is below a perceptibility threshold. We denote this with Good(M(i), M) = true. Otherwise, the copy is called bad.

Definition 4 (Correctness). Let p_bad ≪ 1 be the maximum allowed probability of a bad copy. A fingercasting scheme is correct if the probability for a non-revoked receiver to obtain a bad copy M(i) of the content M is at most p_bad, where the probability is taken over all coin tosses of the setup and encryption algorithm:

∀M ∈ M, ∀ui ∈ U \ R : Pr[Good(M, M(i)) = false] ≤ p_bad

The SSW scheme of [15] uses the goodness predicate ||M(i) − M|| ≤ √n δ, where n is the number of content coefficients and δ a goodness criterion.

All relevant BE schemes provide IND-CCA1 security [6,7,8], which is a stronger notion than IND-CPA security. As we aim to achieve at least IND-CPA security, the remaining requirements only relate to the Chameleon scheme CE. We define IND-CPA security of CE by a game between an IND-CPA adversary A and a challenger C: The challenger runs (MT, TF, t) ← KeyGenCE(N, 1^λ, par_CE), generates a secret random session key k_sess and sends (MT, TF, t) to A. A outputs two content items M0, M1 ∈ M on which it wishes to be challenged. C picks a random bit b ←R {0, 1} and sends the challenge ciphertext Cb ← EncCE(MT, k_sess, Mb) to A. Finally, A outputs a guess b′ and wins if b′ = b. We define the advantage of A against CE as Adv^{ind-cpa}_{A,CE}(λ) := |Pr[b′ = 0 | b = 0] − Pr[b′ = 0 | b = 1]|. For further details on security notions we refer to [25].

Definition 5 (IND-CPA security). A Chameleon scheme CE is IND-CPA secure if for every probabilistic polynomial-time IND-CPA adversary A we have that Adv^{ind-cpa}_{A,CE}(λ) is a negligible function.

We note that in Definition 5, the adversary is not an outsider or third party, but an insider in possession of the master table (not only a receiver table). Nevertheless, the adversary should have a negligible advantage in distinguishing the ciphertexts of two messages of his choice as long as the session key remains secret.

Collusion resistance is defined by the following game between an adversarial coalition A ⊆ U \ R and a challenger C: The challenger runs KeyGenCE on parameters (N, 1^λ, par_CE), generates a ciphertext C ← EncCE(MT, k_sess, M), and gives A the receiver tables RT(i) of all coalition members as well as the session key k_sess. Then A outputs a document copy M∗ and wins if for all coalition members the detection algorithm fails (false negative):

Fingercasting – Joint Fingerprinting and Decryption of Broadcast Messages

A. Adelsbach, U. Huber, and A.-R. Sadeghi

Definition 6 (Collusion resistance). Let DetectFP be the fingerprint detection algorithm of the fingerprinting scheme that a Chameleon scheme CE instantiates. Then CE is (q, p_neg)-collusion-resistant if for every probabilistic polynomial-time adversarial coalition A of at most q := |A| colluders we have that

Pr[Good(M*, M) = true, ∀u_i ∈ A : DetectFP(M, M*, CF^(i), t) = false] ≤ p_neg,

where the false negative probability is taken over the coin tosses of the setup algorithm, of the adversarial coalition A, and of the session key k_sess. Note that 1-collusion resistance is also called robustness.

Frame-proofness is similar to collusion resistance, but A wins the game if the detection algorithm accuses an innocent user (false positive).

Definition 7 (Frame-proofness). Let DetectFP be the fingerprint detection algorithm of the fingerprinting scheme that a Chameleon scheme CE instantiates. Then CE is (q, p_pos)-frame-proof if for every probabilistic polynomial-time adversarial coalition A of at most q := |A| colluders we have that

Pr[Good(M*, M) = true, ∃u_i ∉ A : DetectFP(M, M*, CF^(i), t) = true] ≤ p_pos,

where the false positive probability is taken over the coin tosses of the setup algorithm, of the adversarial coalition A, and of the session key k_sess.

In Definitions 6 and 7, the adversarial coalition again consists of insiders in possession of their receiver tables and the session key. Nevertheless, the coalition should have a well-defined and small chance of creating a plaintext copy that incriminates none of the coalition members (collusion resistance) or an innocent user outside the coalition (frame-proofness).

4 Proposed Solution

4.1 High-Level Overview of the Proposed Fingercasting Scheme

To fingercast content, the center uses the BE scheme to send a fresh session key to each non-revoked receiver. This session key initializes a pseudo-random sequence generator. The resulting pseudo-random sequence represents a sequence of addresses in the master table of our new Chameleon scheme. The center encrypts the content with the master table entries to which the addresses refer. Each receiver has a unique receiver table that differs only slightly from the master table. During decryption, these slight differences in the receiver table lead to slight, but characteristic, differences in the content copy.

Interaction Details. We divide this approach into the same five steps that we have seen for Chameleon schemes in Section 3.3. First, the key generation algorithm of the fingercasting scheme consists of the key generation algorithms KeyGenBE and KeyGenCE of the two underlying schemes. The center's master key thus consists of MK, MT and TF. Second, the same observation holds for the key extraction algorithm of the fingercasting scheme: it consists of the respective algorithms KeyExtrBE and KeyExtrCE of the two underlying schemes. The secret key of receiver u_i therefore has two elements: SK^(i) and RT^(i). Third, the encryption algorithm defines how we interlock the two underlying schemes. To encrypt, the center generates a fresh, random session key k_sess ←R {0, 1}^λ. This session key is broadcast to the non-revoked receivers using the BE scheme: C_BE ← EncBE(MK, R, k_sess). Subsequently, the center uses k_sess to determine addresses in the master table MT of the Chameleon scheme and encrypts with the corresponding entries: C_CE ← EncCE(MT, k_sess, M). The ciphertext of the fingercasting scheme thus has two elements, C_BE and C_CE. Fourth, the decryption algorithm inverts the encryption algorithm with unnoticeable, but characteristic, errors. First of all, each non-revoked receiver u_i recovers the correct session key: k_sess ← DecBE(i, SK^(i), C_BE). Therefore, u_i can recalculate the PRS and the correct addresses in receiver table RT^(i). However, this receiver table is slightly different from the master table. Therefore, u_i obtains a fingerprinted copy M^(i) that is slightly different from the original content: M^(i) ← DecCE(RT^(i), k_sess, C_CE). Last, the fingerprint detection algorithm of the fingercasting scheme is identical to that of the underlying fingerprinting scheme.
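The interaction pattern above can be sketched in a few lines. All functions below are hypothetical stubs: the BE layer is reduced to "non-revoked receivers learn k_sess", and the Chameleon layer is a toy masking function, not the scheme of Section 4.2.

```python
# Sketch of the fingercasting interlock of the BE and Chameleon layers.
# enc_be/dec_be and enc_ce/dec_ce are invented stand-ins for EncBE/DecBE
# and EncCE/DecCE; only the interaction pattern follows the text.

def enc_be(mk, revoked, k_sess):
    """Stub BE: in a real scheme only non-revoked receivers recover k_sess."""
    return {"revoked": frozenset(revoked), "key": k_sess}

def dec_be(i, sk_i, c_be):
    return None if i in c_be["revoked"] else c_be["key"]

def enc_ce(mt, k_sess, m):
    """Stub Chameleon encryption: mask with session key and table entries."""
    return [(x + k_sess + mt[j % len(mt)]) % 257 for j, x in enumerate(m)]

def dec_ce(rt_i, k_sess, c):
    """Stub Chameleon decryption with the receiver's own table rt_i."""
    return [(x - k_sess - rt_i[j % len(rt_i)]) % 257 for j, x in enumerate(c)]

def fingercast(mk, mt, revoked, k_sess, m):
    """Center: broadcast the ciphertext pair (C_BE, C_CE)."""
    return enc_be(mk, revoked, k_sess), enc_ce(mt, k_sess, m)

c_be, c_ce = fingercast(mk="MK", mt=[0] * 8, revoked={2}, k_sess=42, m=[7, 9])
k = dec_be(1, sk_i="SK_1", c_be=c_be)    # non-revoked receiver u_1
assert k == 42 and dec_ce([0] * 8, k, c_ce) == [7, 9]
assert dec_be(2, "SK_2", c_be) is None   # revoked receiver u_2 learns nothing
```

A receiver whose table rt_i differs slightly from mt would obtain a slightly and characteristically different copy, which is the fingerprinting effect developed in Section 4.2.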

4.2 A New Chameleon Scheme

Up to now, we have focused on the straightforward aspects of our approach; we have neglected the intrinsic difficulties and the impact of the requirements on the Chameleon scheme. In the sequel, we will show a specific Chameleon scheme that fulfills all of them. We design it in such a way that its content fingerprints can emulate any additive fingerprinting scheme, which we later instantiate with the SSW scheme as proof of concept.

Key Generation. To define this algorithm, we need to determine how the center generates the master table MT and the table fingerprints TF. To generate MT, the center chooses L table entries at random from the interval [0, z] with independent uniform distribution: mt_α ←R [0, z] for all α ∈ {1, ..., L}. As the table entries will be addressed with bit words, we select L = 2^l such that l indicates the number of bits needed to define the binary address of an entry in the table. The center thus obtains the master table MT := (mt_1, mt_2, ..., mt_L). To generate the table fingerprints TF := (TF^(1), ..., TF^(N)), the center selects for each receiver u_i and each master table entry mt_α a fingerprint coefficient in order to disturb the original entry. Specifically, each fingerprint coefficient tf_α^(i) of table fingerprint TF^(i) is independently distributed according to the probability distribution ProDis of the additive fingerprinting scheme, but scaled down with an attenuation factor f ∈ R, f ≥ 1:

tf_α^(i) ← 1/f · ProDis(par_FP)    (1)

Key Extraction. After the probabilistic key generation algorithm we now describe the deterministic key extraction algorithm. The center processes table


Fig. 1. Receiver table derivation and ciphertext calculation. (a) To derive RT^(i) from MT, the center subtracts the L fingerprint coefficients tf_α^(i) at address α for all α ∈ {1, ..., L}. (b) To derive ciphertext C from plaintext M, the center uses the session key to generate a PRS. It then adds the addressed master table entries to the plaintext.

fingerprint TF^(i) := (tf_1^(i), ..., tf_L^(i)) of receiver u_i as follows: The center subtracts each fingerprint coefficient in TF^(i) from the corresponding master table entry to obtain the receiver table entry, which we illustrate in Fig. 1(a):

∀α ∈ {1, ..., L} :    rt_α^(i) ← mt_α − tf_α^(i) mod p    (2)
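A minimal sketch of these two algorithms follows. All parameters (RHO, SIGMA, the attenuation factor, the table size) are invented toy values, ProDis is taken to be the zero-mean Gaussian of the SSW instantiation, and the scaling to the integer domain anticipates Remark 1 below.

```python
import random

# Hypothetical sketch of key generation (eq. (1)) and key extraction (eq. (2)).
RHO = 1000            # scaling factor rho (see Remark 1)
Z_MAX = 255           # content space [0, z]
P = RHO * Z_MAX + 1   # group order p := rho * z + 1
L = 2 ** 10           # master table size, addressed by l = 10 bits
F_ATT = 4.0           # attenuation factor f >= 1
SIGMA = 2.0           # std. dev. of ProDis (hypothetical SSW parameter)

def keygen(num_receivers, rng):
    """Master table MT and table fingerprints TF per eq. (1)."""
    mt = [rng.randrange(P) for _ in range(L)]             # mt_a <-R [0, z], scaled
    # tf_a^(i) <- 1/f * ProDis; negative values become residues mod p.
    tf = [[round(RHO * rng.gauss(0, SIGMA) / F_ATT) % P
           for _ in range(L)] for _ in range(num_receivers)]
    return mt, tf

def extract(mt, tf_i):
    """Receiver table RT^(i): rt_a^(i) = mt_a - tf_a^(i) mod p (eq. (2))."""
    return [(a - b) % P for a, b in zip(mt, tf_i)]

rng = random.Random(42)
mt, tf = keygen(num_receivers=3, rng=rng)
rt0 = extract(mt, tf[0])
# Each receiver table differs from MT exactly by that receiver's fingerprint.
assert all((mt[a] - rt0[a]) % P == tf[0][a] for a in range(L))
```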

Remark 1. The modulo operator allows only integer values to be added. However, the master table, the table fingerprints and the content coefficients are based on real numbers with finite precision. We resolve this ostensible contradiction by scaling the real values to the integer domain with an appropriate scaling factor ρ, possibly ignoring further decimal digits. ρ must be chosen large enough to allow a computation in the integer domain with a sufficiently high precision. We implicitly assume this scaling to the integer domain whenever real values are used. For example, with real-valued variables rt^(i), mt, and tf^(i) the operation rt^(i) ← (mt − tf^(i)) mod p actually stands for ρ·rt^(i) ← (ρ·mt − ρ·tf^(i)) mod p. The group order p := ρ·z + 1 is defined by the content space [0, z] (see Section 3.2) and the scaling factor ρ.

Encryption. Fig. 1(b) gives an overview of the encryption algorithm. The session key k_sess is used as the seed of a PRSG with expansion function len(|k_sess|) ≥ n·s·l, where parameter s will be specified below. To give a practical example for a PRSG, k_sess may serve as the key for a conventional block cipher, e.g., AES or


triple DES³, in output feedback mode. Each block of l bits of the pseudo-random sequence is interpreted as an address β in the master table MT. For each coefficient of the plaintext, the center uses s addresses that define s entries of the master table. In total, the center obtains n·s addresses that we denote with β_{j,k}, where j is the coefficient index, k the address index, and Extract_i extracts the i-th block of length l from its input string:

∀j ∈ {1, ..., n}, ∀k ∈ {1, ..., s} :    β_{j,k} ← Extract_{(j−1)·s+k}(G(k_sess))    (3)

For each content coefficient, the center adds the s master table entries modulo the group order. In Fig. 1(b), we illustrate the case s = 4, which is the design choice in the original Chameleon cipher. The j-th coefficient c_j of the ciphertext C is calculated as

∀j ∈ {1, ..., n} :    c_j ← m_j + Σ_{k=1}^{s} mt_{β_{j,k}} mod p,    (4)

where mt_{β_{j,k}} denotes the master table entry referenced by address β_{j,k} from (3).

Decryption. The decryption algorithm proceeds in the same way as the encryption algorithm with two exceptions. First, the receiver has to use its receiver table RT^(i) instead of MT. Second, the addition is replaced by subtraction. The j-th coefficient m_j^(i) of the plaintext copy M^(i) of receiver u_i is thus calculated as

m_j^(i) ← c_j − Σ_{k=1}^{s} rt_{β_{j,k}}^(i) mod p,    (5)

where rt_{β_{j,k}}^(i) denotes the receiver table entry of receiver u_i referenced by address β_{j,k} generated in (3). As the receiver table RT^(i) slightly differs from the master table MT, the plaintext copy M^(i) obtained by receiver u_i slightly differs from the original plaintext M. By appropriately choosing the attenuation factor f in (1), the distortion of M^(i) with respect to M is the same as that of the instantiated fingerprinting scheme and goodness is preserved (see Section 4.3).

Fingerprint Detection. When the center detects an illegal copy M* = (m_1*, ..., m_n*) of content M, it tries to identify the receivers that participated in the generation of M*. To do so, the center verifies whether the fingerprint of a suspect receiver u_i is present in M*. Obviously, the fingerprint is unlikely to appear in its original form; an adversary may have modified it by applying common attacks such as resampling, requantization, compression, cropping, and rotation. Furthermore, the adversary may have applied an arbitrary combination of these known attacks and other yet unknown attacks. Finally, an adversarial coalition may have colluded and created M* using several different copies of M. The fingerprint detection algorithm is identical to that of the underlying fingerprinting scheme: dec ← DetectFP(M, M*, CF^(i), t). In order to properly scale

³ Advanced Encryption Standard [26] and Data Encryption Standard [27].


the content fingerprint, we need to select the attenuation factor f in (1). We choose it such that the addition of s attenuated fingerprint coefficients generates a random variable that follows ProDis without attenuation (for an example see Section 4.3). In order to verify whether the table fingerprint TF^(i) of receiver u_i left traces in M*, DetectFP calculates the similarity between the detected content fingerprint CF* with coefficients cf_j* := m_j* − m_j and the content fingerprint CF^(i) in u_i's copy M^(i) with

cf_j^(i) := m_j^(i) − m_j = Σ_{k=1}^{s} ( mt_{β_{j,k}} − rt_{β_{j,k}}^(i) ) (by (4), (5)) = Σ_{k=1}^{s} tf_{β_{j,k}}^(i) (by (2)),    (6)

where tf_{β_{j,k}}^(i) is the fingerprint coefficient that fingerprinted receiver table RT^(i) at address α = β_{j,k} in (2). If the similarity is above threshold t, the center declares u_i guilty. Note that the calculation of CF* necessitates the original content M, whereas the calculation of CF^(i) relies on the session key k_sess and the table fingerprint TF^(i); the scheme is thus non-blind in its current version. However, we assume it is possible to design an extended scheme with a blind detection algorithm. If instantiated with Spread Spectrum Watermarking, the watermark is often robust enough to be detected even in the absence of the original content.

The same algorithm applies to the detection of fingerprints in illegal copies of receiver tables. Their fingerprints have the same construction and statistical properties, where the attenuated amplitude of the fingerprint coefficients in (1) is compensated by a higher number of coefficients, as the relation L/f ≈ n holds for practical parameter choices (see Section 5.1).

When the center detects the fingerprint of a certain user in an illegal content copy or an illegal receiver table, there are two potential countermeasures with different security and performance trade-offs. One is to simply revoke the user in the BE scheme such that the user's BE decryption key becomes useless and no longer grants access to the session key. However, the user's receiver table still allows decryption of content if yet another user illegally shares the session key. In the Internet age, this is a valid threat, as two illegal users may collude such that one user publishes the receiver table (and gets caught) while the other user anonymously publishes the session keys (and does not get caught).
Nevertheless, we stress that this weakness, namely the non-traceability of session keys, is common to all revocation BE schemes because the session key is identical for all users and therefore does not allow tracing.⁴

In order to avoid this weakness, the other potential countermeasure is to not only revoke the user whose receiver table was illegally shared, but also renew the master table and redistribute the new receiver tables. If the broadcast channel has enough spare bandwidth, the center can broadcast the receiver tables individually to all receivers in off-peak periods, i.e., when the channel's bandwidth

⁴ The common assumption for revocation BE schemes is that it is difficult to share the session key anonymously on a large scale without being caught. Even if key sharing may be possible on a small scale, e.g., among family and friends, the main goal is to allow revocation of a user who shared the decryption key or session keys and got caught, no matter by which means of technical or legal tracing.


is not fully used for regular transmission. The relevant BE schemes [6,7,8] allow the center to encrypt each receiver table individually for the corresponding receiver such that only this receiver can decrypt and obtain access.⁵ If the broadcast channel's bandwidth is too low, then the receiver tables need to be redistributed as in the initial setup phase, e.g., via smartcards.

Parameter Selection. The new Chameleon scheme has two major parameters L and s that allow a trade-off between the size of RT^(i), which u_i has to store, and the computation cost, which grows linearly with the number s of addresses per content coefficient in (4). By increasing L, we can decrease s and thus reduce computation cost at the expense of storage size. Further details follow in Section 5.1.
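The complete Chameleon path of equations (2)–(5) can be summarized in a short sketch. Here random.Random seeded with the session key stands in for the PRSG G, and all parameters are illustrative toy values, not the paper's concrete choices.

```python
import random

P = 255_001   # group order p (scaled content space, see Remark 1)
L = 2 ** 10   # master table size (l = 10 address bits)
S = 4         # addresses per content coefficient (s = 4 as in Chameleon)

def addresses(k_sess, n):
    """n*s pseudo-random master table addresses beta_{j,k} (eq. (3))."""
    prsg = random.Random(k_sess)  # stand-in for a block cipher in OFB mode
    return [[prsg.randrange(L) for _ in range(S)] for _ in range(n)]

def encrypt(mt, k_sess, m):
    """c_j = m_j + sum_k mt[beta_{j,k}] mod p (eq. (4))."""
    beta = addresses(k_sess, len(m))
    return [(mj + sum(mt[b] for b in beta[j])) % P for j, mj in enumerate(m)]

def decrypt(rt_i, k_sess, c):
    """m_j^(i) = c_j - sum_k rt_i[beta_{j,k}] mod p (eq. (5))."""
    beta = addresses(k_sess, len(c))
    return [(cj - sum(rt_i[b] for b in beta[j])) % P for j, cj in enumerate(c)]

rng = random.Random(1)
mt = [rng.randrange(P) for _ in range(L)]          # master table
tf = [rng.randrange(-3, 4) % P for _ in range(L)]  # toy table fingerprint
rt = [(a - b) % P for a, b in zip(mt, tf)]         # receiver table (eq. (2))

m = [rng.randrange(200) for _ in range(8)]         # toy content coefficients
c = encrypt(mt, 99, m)
copy = decrypt(rt, 99, c)                          # fingerprinted copy M^(i)
# Decrypting with MT itself recovers M exactly; decrypting with RT^(i)
# leaves the additive fingerprint sum_k tf[beta_{j,k}] of eq. (6) in the copy.
assert decrypt(mt, 99, c) == m
```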

4.3 Instantiation with Spread Spectrum Watermarking

In this section, we instantiate the fingerprinting scheme with the SSW scheme of [15] and thereby inherit its collusion resistance and frame-proofness. Let the center choose the SSW scheme's parameters par_FP = (δ, p_bad, p_pos), which allow it to calculate a standard deviation σ′ and a threshold t via the two functions f_σ(N, n′, δ, p_bad) and f_t(σ′, N, p_pos) defined in [15]. The probability distribution of the SSW scheme is then ProDis = N(0, σ′). We set f = √s because then 1/f · N(0, σ′) in (1) is still a normal distribution with mean 0 and standard deviation 1/√s · σ′, and adding s of those variables in (4) and (5) leads to the required random variable with standard deviation σ′. It remains to define the similarity measure for the detection algorithm dec ← DetectFP(M, M*, CF^(i), t), which [15] defines as:

dec = true  if  (CF* · CF^(i)) / ||CF*|| > t
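A sketch of this similarity test follows. The numbers (n, the unit deviation, the threshold) are invented for illustration; in the real scheme the threshold comes from the function f_t of [15].

```python
import math
import random

# Sketch of the SSW-style similarity test dec <- DetectFP(M, M*, CF^(i), t):
# the detected fingerprint CF* = M* - M is correlated with receiver u_i's
# content fingerprint CF^(i). All parameters are hypothetical.

def detect_fp(m_orig, m_suspect, cf_i, t):
    """True iff (CF* . CF^(i)) / ||CF*|| exceeds threshold t."""
    cf_star = [ms - m for ms, m in zip(m_suspect, m_orig)]
    dot = sum(a * b for a, b in zip(cf_star, cf_i))
    norm = math.sqrt(sum(a * a for a in cf_star))
    return norm > 0 and dot / norm > t

rng = random.Random(7)
n, sigma, t = 1000, 1.0, 6.0
m = [rng.uniform(0, 255) for _ in range(n)]
cf_i = [rng.gauss(0, sigma) for _ in range(n)]          # u_i's fingerprint
guilty_copy = [mj + c for mj, c in zip(m, cf_i)]        # copy traced to u_i
innocent_copy = [mj + rng.gauss(0, sigma) for mj in m]  # unrelated fingerprint
assert detect_fp(m, guilty_copy, cf_i, t)
assert not detect_fp(m, innocent_copy, cf_i, t)
```

For the guilty copy the similarity concentrates around ||CF^(i)|| ≈ √n · σ, far above the threshold, while for an independent fingerprint it behaves like a zero-mean Gaussian with deviation σ.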

We call an instantiation exact if it achieves the same statistical properties as the fingerprinting scheme that it instantiates. Theorem 2 below states that the above choice is an exact instantiation of the SSW scheme.

Theorem 2. Let σ′ and σ be the standard deviations of the SSW scheme and of the Chameleon scheme instantiated with SSW, respectively, and n′ and n their numbers of content coefficients. Then the following mapping between both schemes is an exact instantiation:

σ′ = √s · σ  (⇔ f = √s)  and  n = n′

Towards the proof of Theorem 2. We prove an even stronger result than Theorem 2. In addition to the exactness of the instantiation, we also prove that it is optimal to fingerprint every entry of the receiver tables. To do so, we first formulate Lemmata 1–4 and then describe why they imply Theorem 2. For

⁵ In all of these schemes, the center shares with each user an individual secret, which they can use for regular symmetric encryption.


the Lemmata, we introduce a parameter F ∈ {1, 2, ..., L} that describes the number of receiver table entries that obtain a fingerprint coefficient tf_α^(i) in (2). The positions of the F fingerprinted entries in the receiver table are selected with uniform distribution. We show that the choice F = L is optimal in the sense that the resulting instantiation is exact. The difficulty in analyzing the SSW instantiation is that each content coefficient is not fingerprinted with a single fingerprint coefficient as in SSW, but with up to s such variables, as can be seen from (6). Note that for F < L some receiver table entries do not receive a fingerprint coefficient and are therefore identical to the master table entry. In order to analyze the statistical properties of the resulting fingerprint, we will need to calculate the expectation and variance of two parameters that link the instantiation to the original SSW scheme.

The first parameter is the number N^fp of fingerprint coefficients tf^(i) that are added to a content coefficient m_j by using the receiver table RT^(i) in (5) instead of the master table MT in (4). In SSW, N^fp has the constant value 1, i.e., a content fingerprint consists of one fingerprint coefficient per content coefficient, whereas in our scheme N^fp varies between 0 and s as shown in (6). If only F of the L receiver table entries have been fingerprinted, then tf^(i) = 0 for the remaining L − F entries.

The second parameter is the number of content coefficients that carry a detectable content fingerprint. In SSW, this number has the constant value n′, i.e., every coefficient carries a fingerprint with fixed standard deviation, whereas in our scheme some of the n coefficients may happen to receive no or only few fingerprint coefficients tf^(i). Specifically, this happens when the receiver table entry rt_{β_{j,k}}^(i) of (5) did not receive a fingerprint coefficient in (2) for F < L.
The next lemma gives the number of normally distributed table fingerprint coefficients that our scheme adds to a content coefficient. This number is a random variable characterized by its expectation and variance. We prove the lemmata under the uniform sequence assumption, i.e., the assumption that the sequence used to select the addresses from the master table has independent uniform distribution. We stress that we only use it to find the optimal mapping with SSW; security and collusion resistance of the proposed scheme do not rely on this assumption for the final choice of parameters (see the end of this section).⁶

Lemma 1. Let N^fp be the random variable counting the number of fingerprinted receiver table entries with which a coefficient m_j^(i) of copy M^(i) is fingerprinted. Then the probability of obtaining k ∈ {0, ..., s} fingerprinted entries is

Pr[N^fp = k] = (s choose k) · (F/L)^k · (1 − F/L)^(s−k)

The expectation and the variance of N^fp are

E(N^fp) = s · F/L  and  σ²_{N^fp} := E([N^fp − E(N^fp)]²) = s · (F/L) · (1 − F/L).
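The binomial shape of N^fp can be checked numerically; the sketch below uses invented values of s, F, and L and verifies the stated expectation and variance directly from the probability mass function.

```python
import math

# Numerical check of Lemma 1: N^fp is Binomial(s, F/L), so E(N^fp) = s*F/L
# and Var(N^fp) = s*(F/L)*(1 - F/L). Parameters are hypothetical.

def nfp_pmf(s, F, L):
    """Pr[N^fp = k] = (s choose k) (F/L)^k (1 - F/L)^(s-k) for k = 0..s."""
    r = F / L
    return [math.comb(s, k) * r**k * (1 - r)**(s - k) for k in range(s + 1)]

s, F, L = 4, 768, 1024
pmf = nfp_pmf(s, F, L)
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean)**2 * p for k, p in enumerate(pmf))
assert math.isclose(sum(pmf), 1.0)
assert math.isclose(mean, s * F / L)             # E(N^fp) = s*F/L
assert math.isclose(var, s * (F/L) * (1 - F/L))  # binomial variance
# With F = L every draw hits a fingerprinted entry: Pr[N^fp = s] = 1.
assert nfp_pmf(s, L, L)[-1] == 1.0
```

The last assertion previews the optimality argument below: for F = L the variance vanishes and every content coefficient receives exactly s fingerprint coefficients.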

⁶ Note that even if this were not the case, we can show that the adversary's advantage is still negligible by a simple reduction argument.


Proof. During decryption, the receiver subtracts s receiver table entries rt_α^(i) from the ciphertext coefficient using (5). Each entry rt_α^(i) is either fingerprinted or not. Under the uniform sequence assumption, the addresses of the subtracted entries rt_α^(i) have independent uniform distribution. In addition, the F fingerprinted entries are distributed over RT^(i) with independent uniform distribution. Therefore, the probability that a single address α = β_{j,k} in (5) points to a fingerprinted receiver table entry rt_α^(i) is F/L, which is the number of fingerprinted receiver table entries divided by the total number of entries. As the underlying experiment is a sequence of s consecutive yes-no experiments with success probability F/L, it follows that N^fp has binomial distribution. This implies the probability, the expectation, and the variance.

Lemma 1 allows us to determine how many fingerprint coefficients we can expect in each content coefficient and how the number of such fingerprint coefficients varies. The next question is what kind of random variable results from adding N^fp fingerprint coefficients.

Lemma 2. By adding a number N^fp of independent N(0, σ)-distributed fingerprint coefficients, the resulting random variable has normal distribution with mean 0 and standard deviation √(N^fp) · σ.

Proof. Each fingerprint coefficient is independently distributed according to the normal distribution N(0, σ). When two independent and normally distributed random variables are added, the resulting random variable is also normally distributed, while the means and the variances add up. Due to linearity, the resulting standard deviation for N^fp random variables is √(N^fp · σ²) = √(N^fp) · σ.

In order to fingerprint the content coefficients with the same standard deviation σ′ as in the SSW scheme, the natural choice is to choose σ such that √(E(N^fp)) · σ = σ′.
The remaining question is how many content coefficients are actually fingerprinted; note that due to the randomness of N^fp, some content coefficients may receive more fingerprint coefficients than others. We determine the expected number of fingerprinted content coefficients in the next two lemmata, while we leave open how many fingerprint coefficients are needed for detection:

Lemma 3. Let N^fp_min ∈ {1, ..., s} be the minimum number of table fingerprint coefficients needed to obtain a detectable fingerprint in content coefficient m_j^(i). Then the probability p^fing that coefficient m_j^(i) of copy M^(i) obtains at least N^fp_min fingerprint coefficients is

p^fing = Σ_{k=N^fp_min}^{s} (s choose k) · (F/L)^k · (1 − F/L)^(s−k)

Proof. The lemma is a corollary of Lemma 1, obtained by adding the probabilities of all events whose value of N^fp is greater than or equal to N^fp_min.


Lemma 4. Let N^fing ∈ {0, ..., n} be the random variable counting the number of fingerprinted content coefficients. Then the expectation of N^fing is

E(N^fing) = Σ_{j=0}^{n} j · (n choose j) · (p^fing)^j · (1 − p^fing)^(n−j) = n · p^fing

Proof. The lemma follows from the fact that N^fing has binomial distribution with success probability p^fing and n experiments.

Given Lemmata 1–4, we can derive some of the parameters of our scheme from SSW. Suppose that the center has already selected the parameters of the SSW scheme such that the requirements on the number of receivers and collusion resistance are met. This includes the choice of N, n′, and par_FP = par_CE := (δ, p_bad, p_pos); it allows the center to derive σ′ and t of SSW based on the functions f_σ(N, n′, δ, p_bad) and f_t(σ′, N, p_pos), which are defined in [15].

Based on the center's selection, we can derive the parameters n, F/L, and σ of our Chameleon scheme as follows. Our first aim is to achieve the same expected standard deviation in the content coefficients of our scheme as in SSW, i.e., σ′ = √(E(N^fp)) · σ, which by Lemma 1 leads to σ′ = √(s · F/L) · σ. Our second aim is to minimize the variance of N^fp in order to have N^fp = E(N^fp) not only on average, but for as many content coefficients as possible, where N^fp = E(N^fp) implies that the content coefficient in our scheme obtains a fingerprint with the same statistical properties as in SSW. The two minima of σ²_{N^fp} = s · (F/L) · (1 − F/L) are F/L = 0 and F/L = 1, of which only the second is meaningful. F/L = 1, or F = L, is the case where all entries of the master table are fingerprinted. As this optimum case leads to a variance of σ²_{N^fp} = 0 and N^fp = s, the content coefficients of our scheme and of SSW have the same statistical properties. This proves Theorem 2 and the claim that all table entries should be fingerprinted. With F/L = 1 and σ′ = √s · σ, we obtain Pr[N^fp = s] = 1 by Lemma 1 and p^fing = 1 by Lemma 3. Finally, we conclude that E(N^fing) = n · p^fing = n by Lemma 4 and set n = n′.
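The resulting parameter mapping can be stated as a tiny helper. The numbers are illustrative, and sigma_ssw plays the role of σ′ above.

```python
import math

# Sketch of the Theorem 2 mapping under the F = L choice: f = sqrt(s),
# table-coefficient deviation sigma = sigma'/sqrt(s), and n = n'.
# All concrete values are invented for illustration.

def chameleon_params(sigma_ssw, n_ssw, s):
    """Map SSW parameters (sigma', n') to the Chameleon instantiation."""
    f = math.sqrt(s)
    return {"f": f, "sigma": sigma_ssw / f, "n": n_ssw}

p = chameleon_params(sigma_ssw=2.0, n_ssw=10_000, s=4)
# Adding s independent N(0, sigma) coefficients yields variance s*sigma^2,
# i.e., exactly the SSW deviation sigma' again.
assert math.isclose(math.sqrt(4 * p["sigma"] ** 2), 2.0)
assert p["f"] == 2.0 and p["n"] == 10_000
```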
We stress that the equalities hold even if we replace the uniform sequence with a pseudo-random sequence; for F = L the equations N^fp = s and N^fing = n are obviously independent of the uniform distribution of the sequence of addresses in the master table. We note that the number s of addresses per content coefficient, introduced in (4), is still undetermined and may be chosen according to the security requirements (see Section 4.4).

4.4 Analysis

Correctness, Collusion Resistance and Frame-Proofness. Correctness follows from the correctness of the two underlying schemes, i.e., the BE scheme and the Chameleon scheme. Correctness of the Chameleon scheme follows from the correctness of the underlying ﬁngerprinting scheme, which we can instantiate exactly by properly choosing the scaling factor in (1) and thus making the content ﬁngerprint of (6) identical to a ﬁngerprint of the instantiated ﬁngerprinting


scheme. Collusion resistance and frame-proofness of content and receiver tables follow from the collusion resistance and frame-proofness of the instantiated fingerprinting scheme. The mapping in Section 4.3 is an exact instantiation of the SSW scheme and therefore inherits its collusion resistance and frame-proofness (see Theorem 1). We note that the proof of Theorem 1, which appears in [15], covers both collusion resistance and frame-proofness, although the original text of the theorem only seems to cover collusion resistance. Collusion resistance, related to false negatives, is shown in [15, Section 3.4], whereas frame-proofness, related to false positives, is shown in [15, Section 3.2].

IND-CPA Security. We reduce the security of our Chameleon scheme to that of the PRSG with which it is instantiated. In order to prove IND-CPA security, we prove that the key stream produced by the Chameleon scheme is pseudo-random (see Definition 1). IND-CPA security of the proposed scheme then follows by a simple reduction argument (see [28, Section 5.3.1]). To further strengthen the proof, we assume that the adversary is in possession of the master table and all receiver tables, although in practice the adversary only has one or several receiver tables.

By scaling the real values of the content coefficients to the integer domain (see Remark 1), we obtain a plaintext symbol space P with a cardinality Z defined by the content and the scaling factor ρ. In the remainder of this section we assume that the plaintext symbol space P and the key symbol space K are equal to {0, 1, ..., Z − 1}. We make this assumption to simplify our notation, but stress that this is no restriction, as there is a one-to-one mapping between the actual plaintext symbol space [0, z] and the scaled space {0, 1, ..., Z − 1}, which enumerates the elements of [0, z] starting from 0.⁷ In the sequel, by key symbols we mean the elements of K.
We also note that the obvious choice for the group order p is the size of the symbol space: p = |K| = Z. This ensures identical sizes of plaintext and ciphertext space.

The proof is divided into four major steps. First, we show the properties of the random variable that results from a single draw from the master table (Lemma 5). Second, we define these properties as the starting point of an iteration on the number s of draws from the master table (Definition 8). Third, we prove that the random variable that results from adding randomly drawn master table entries improves with every draw, where improving means being statistically closer to a truly random variable (Lemma 6). Last, we prove the pseudo-randomness of the Chameleon scheme's key stream (Theorem 3).

Lemma 5. Let Pr[X^(1) = x] denote the probability of drawing the key symbol x ∈ K in a single draw from master table MT. Let η_k ∈ {0, 1, ..., L} denote the number of times that key symbol x_k ∈ K appears in MT. When we select a master table entry at a random address with uniform distribution, then the probability of obtaining key symbol x_k ∈ K is p_k := Pr[X^(1) = x_k] = η_k / L.

⁷ Note that [0, z] consists of real numbers with finite precision. As pointed out in Remark 1, these real numbers are mapped to integers by applying a scaling factor ρ.


Proof. There are L entries in the master table. Due to the uniform distribution of the selected address, each master table entry has the same probability of being selected. Therefore, the probability of a specific key symbol x_k ∈ K being selected is the number η_k of occurrences of x_k in the master table divided by the total number L of master table entries.

For a single draw from the master table, the resulting random variable thus only depends on the number of occurrences of the key symbols within the master table. As the master table entries are generated with uniform distribution, the frequencies are unlikely to be identical for each key symbol, leading to a non-uniform and therefore insecure distribution Pr[X^(1)].

Definition 8 (Strong convergence). Let U be a random variable uniformly distributed over the key symbol space. Let the statistical quality SQ^(1) of MT be the statistical difference between X^(1) and U:

SQ^(1) := 1/2 · Σ_{k=0}^{Z−1} | p_k^(1) − 1/Z |

We call the master table strongly converging if 2·SQ^(1) ≤ d for some d ∈ R such that d < 1. The statistical quality SQ^(1) is thus a measure for the initial suitability of the master table for generating a uniform distribution.

The next lemma is the main result of the security analysis; it proves that the statistical quality SQ^(s) improves with every one of the s draws.

Lemma 6. Let U be a random variable uniformly distributed over the key symbol space. Let MT be a strongly converging master table. Let X_k denote the k-th draw from MT and X^(s) the random variable resulting from s independent uniformly distributed draws added modulo Z: X^(s) := Σ_{k=1}^{s} X_k mod Z. Then the statistical difference SQ^(s) between X^(s) and U is a negligible function with an upper bound of 1/2 · d^s.

Proof. The proof is by induction. For all k ∈ K, let p_k^(i) := Pr[X^(i) = k] denote the probability of the event that in the i-th iteration the random variable X^(i) takes the value of key symbol k. Represent this probability with an additive error e_k^(i) such that p_k^(i) = 1/Z · (1 + e_k^(i)). Due to Σ_{k=0}^{Z−1} p_k^(i) = 1, we obtain Σ_{k=0}^{Z−1} e_k^(i) = 0. The induction start is trivially fulfilled by every strongly converging master table: SQ^(1) ≤ 1/2 · d. As the induction hypothesis, we have SQ^(i) ≤ 1/2 · d^i, where SQ^(i) := 1/2 · Σ_{k=0}^{Z−1} |p_k^(i) − 1/Z| = 1/(2Z) · Σ_{k=0}^{Z−1} |e_k^(i)|. The induction claim is SQ^(i+1) ≤ 1/2 · d^(i+1). The induction step follows: Iteration i + 1 is defined as X^(i+1) := Σ_{k=1}^{i+1} X_k mod Z, which is equal to X^(i+1) = X^(i) + X_{i+1} mod Z, where X_{i+1} is a single draw with the probabilities p_k from Lemma 5 and error representation p_k = 1/Z · (1 + e_k) such that Σ_{k=0}^{Z−1} e_k = 0. Therefore, we obtain for all k ∈ K that

Pr[X^(i+1) = k] = Σ_{j=0}^{Z−1} Pr[X^(i) = j] · Pr[X_{i+1} = (k − j) mod Z]


= Σ_{j=0}^{Z−1} p_j^(i) · p_{(k−j) mod Z}

= 1/Z² · Σ_{j=0}^{Z−1} (1 + e_j^(i)) · (1 + e_{(k−j) mod Z})

= 1/Z² · ( Z + Σ_{j=0}^{Z−1} e_j^(i) + Σ_{j=0}^{Z−1} e_{(k−j) mod Z} + Σ_{j=0}^{Z−1} e_j^(i) · e_{(k−j) mod Z} )

= 1/Z + 1/Z² · Σ_{j=0}^{Z−1} e_j^(i) · e_{(k−j) mod Z},

where the second and third sums in the parentheses equal 0 because the errors sum to zero. The upper bound for the statistical difference in iteration i + 1 is

SQ^(i+1) := 1/2 · Σ_{k=0}^{Z−1} | Pr[X^(i+1) = k] − 1/Z | = 1/2 · Σ_{k=0}^{Z−1} | 1/Z² · Σ_{j=0}^{Z−1} e_j^(i) · e_{(k−j) mod Z} |

≤ 1/2 · ( 1/Z · Σ_{k=0}^{Z−1} |e_k^(i)| ) · ( 1/Z · Σ_{k=0}^{Z−1} |e_k| ) = 2 · SQ^(i) · SQ^(1) ≤ 1/2 · d^(i+1),

where the first inequality follows from the fact that the two sums on the left-hand side run over every combination e_j^(i) · e_{(k−j) mod Z}, which may have opposite signs, whereas the right-hand side adds the absolute values of all combinations, avoiding any mutual cancellation of combinations with opposite signs.

Note that the proof relies on the uniform sequence assumption, i.e., the addresses used to point into the master table have independent uniform distribution. Clearly, this assumption has to be slightly weakened in practice by replacing true randomness with pseudo-randomness. In Theorem 3 we therefore show that we can use pseudo-randomness without compromising security. The idea is to reduce an attack on the Chameleon key stream to an attack on the PRSG itself:

Theorem 3. Let U be a random variable uniformly distributed over the key symbol space. Let MT be a strongly converging master table. Let the number s(λ) of draws from MT be a polynomial function of the security parameter λ of CE such that the statistical difference SQ^(s)(λ) between X^(s) and U is a negligible function under the uniform sequence assumption. Then even after replacement of the uniform sequence of addresses with a PRS, no probabilistic polynomial-time adversary can distinguish the pseudo-random key stream consisting of variables X^(s) from a truly random key stream with variables U.

Before we enter into the details of the proof, we clarify the attack goal, the adversary's capabilities, and the criteria for a successful break of (i) a PRSG and (ii) the pseudo-randomness of our Chameleon scheme's key stream: (i) The goal of an adversary A attacking a PRSG is to distinguish the output of G on a random seed from a random string of identical length (see Definition 2).

Fingercasting–Joint Fingerprinting and Decryption of Broadcast Messages


A's capabilities are limited to a probabilistic Turing machine whose running time is polynomially bounded in the length of its input (and thus also in the security parameter λ, which defines the input length). A successful break is defined as follows: The challenger C generates a random seed str ←R {0,1}^λ and a random string str_1 ←R {0,1}^len(λ) with uniform distribution. C then applies the PRSG to str and obtains str_0 ← G(str). Finally, C tosses a coin b ←R {0,1} with uniform distribution and sends str_b to A. The challenge for A is to distinguish the two cases, i.e., to guess whether str_b was generated with the PRSG (b = 0) or with the uniform distribution (b = 1). A wins if its guess b′ is equal to b. The advantage of A is defined as

  Adv_A(λ) := |Pr[b′ = 0 | b = 0] − Pr[b′ = 0 | b = 1]| ,    (7)

where the randomness is taken over all coin tosses of C and A.

(ii) The goal of adversary A attacking the pseudo-randomness of the Chameleon scheme's key stream is to distinguish n instances of X^(s) from a truly random key stream. A is limited to a probabilistic Turing machine whose running time is polynomially bounded in the length of its input (and thus also in the security parameter λ, as this input is given in unary representation). A successful break is defined as follows: The challenger C generates a stream of n random keys: K_1 := (k_{1,1}, ..., k_{1,n}) such that k_{1,j} ←R K for all j ∈ {1, ..., n}. Next, C generates a random seed str ←R {0,1}^λ and a strongly converging master table MT. Then C applies the PRSG to str in order to obtain a pseudo-random sequence of length len(λ) ≥ n · s · l, which is interpreted as a sequence of n · s addresses in the master table. Subsequently, C adds for each content coefficient m_j the corresponding s master table entries modulo Z to obtain the other key stream candidate: K_0 := (k_{0,1}, ..., k_{0,n}) such that k_{0,j} ← Σ_{k=1}^{s} mt_{β_{j,k}} mod Z. Finally, C tosses a coin b ←R {0,1} with uniform distribution and sends key stream candidate K_b to A. The challenge for A is to distinguish the two cases, i.e., to guess whether K_b was generated with the Chameleon scheme (b = 0) or with the uniform distribution (b = 1). A wins if its guess b′ is equal to b. The advantage is defined analogously to (7).

After definition of the attack games, we give the full proof of Theorem 3:

Proof. The proof is by contradiction. Assuming that the advantage of an adversary A against the pseudo-randomness of the Chameleon scheme's key stream is not negligible, we construct a distinguisher A′ for the PRSG itself, contradicting the assumptions on the PRSG from Definition 2. We show the individual steps of constructing A′ in Fig. 2.

1. The challenger C generates a random seed str ←R {0,1}^λ and a random string str_1 ←R {0,1}^len(λ) with uniform distribution. C then applies the PRSG to str: str_0 ← G(str). Finally, C tosses a coin b ←R {0,1} with uniform distribution.
2. C sends str_b to A′. A′ needs to guess b.



[Fig. 2. Construction of adversary A′ based on adversary A: the message flow between the challenger C, adversary A′, and adversary A in steps 1 to 8.]

3. A′ generates a strongly converging master table MT. Then A′ takes the string str_b of length len(λ) ≥ n · s · l and interprets it as a sequence of n · s addresses in the master table according to (3). Subsequently, A′ adds for each content coefficient m_j the corresponding s master table entries modulo Z to obtain a key stream K_b := (k_{b,1}, ..., k_{b,n}) such that k_{b,j} ← Σ_{k=1}^{s} mt_{β_{j,k}} mod Z.
4. A′ sends the key stream K_b to A as a challenge.
5. A calculates the guess b′, where b′ = 0 represents the pseudo-random case, i.e., A guesses that K_b was generated with the Chameleon scheme, and b′ = 1 represents the random case, i.e., A guesses that K_b is a truly random key stream.
6. A sends the guess b′ to A′.
7. A′ copies A's guess.
8. A′ sends b′ to C as a guess for b.

To finish the proof, we need to show that if the advantage of A against the pseudo-randomness of the Chameleon key stream is not negligible, then the advantage of A′ against the PRSG is not negligible. We prove this by bounding the probability differences between the real attack scenario, where A is given input by a correct challenger, and the simulated attack, where A is given slightly incorrect input by A′. The contradictive assumption is that A's advantage against the Chameleon encryption scheme is not negligible in the real attack:

  Pr_real[b′ = 0 | b = 0] − Pr_real[b′ = 0 | b = 1] ≥ ε_CE(λ) ,

where Pr_real[·] denotes probabilities in the real attack between a Chameleon challenger and a Chameleon adversary A, and ε_CE(λ) is A's advantage, which is not negligible. The randomness is taken over all coin tosses of C and A.
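Step 3 is the computational core of the reduction: a bit string (either PRSG output or truly random) is parsed into n · s addresses of l bits each, and each key stream symbol is the modular sum of the s addressed master table entries. The following minimal sketch illustrates this; all sizes are toy values and the helper name is ours, not the paper's:

```python
import random

Z = 2 ** 16        # key symbol space size (as in the paper's example)
l = 4              # toy address length; master table has L = 2**l entries
s = 3              # draws per key stream symbol
n = 5              # number of key stream symbols

rng = random.Random(0)
master_table = [rng.randrange(Z) for _ in range(2 ** l)]

def keystream_from_bits(bits, n, s, l, table, Z):
    """Parse `bits` into n*s addresses of l bits each (our stand-in for (3))
    and sum the s addressed table entries modulo Z, one sum per symbol."""
    assert len(bits) >= n * s * l
    stream = []
    for j in range(n):
        total = 0
        for k in range(s):
            start = (j * s + k) * l
            addr = int(bits[start:start + l], 2)   # address beta_{j,k}
            total = (total + table[addr]) % Z
        stream.append(total)
    return stream

# In the reduction, `bits` is PRSG output (b = 0) or truly random (b = 1);
# plain coin flips stand in for both here.
bits = "".join(rng.choice("01") for _ in range(n * s * l))
print(keystream_from_bits(bits, n, s, l, master_table, Z))
```

The point of the reduction is that A′ never needs to know which case it is in: the same parsing and summation is applied to whichever string the challenger supplies.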



Next, we summarize the input to A in the real attack and the simulated attack. In the real attack, A obtains either the key stream output K_0 of the Chameleon scheme on a truly random seed str (b = 0), or a truly random key stream K_1 (b = 1). Specifically, the key stream element k_{0,j} of K_0 is equal to k_{0,j} = Σ_{k=1}^{s} mt_{β_{j,k}} mod Z, where the truly random seed str determines the addresses β_{j,k} of the master table entries via the PRSG according to (3). In the simulated attack, A′ does not apply the PRSG and instead uses the challenge str_b as a shortcut. A obtains either the key stream output K_0 of the Chameleon scheme executed on a pseudo-random string str_0, derived from a truly random seed str (b = 0), or the key stream output K_1 of the Chameleon scheme executed on a truly random string str_1 (b = 1). The key stream outputs K_0 and K_1 in the simulated attack thus only differ by the fact that K_0 comes from a pseudo-random string and K_1 from a truly random string.

There is no difference between the real and the simulated attack for b = 0. The key stream outputs K_0^real and K_0^sim both come from a PRSG executed on a truly random seed str, leading to the following relation:

  Pr_real[b′ = 0 | b = 0] − Pr_sim[b′ = 0 | b = 0] = 0 ,

where the randomness is taken over all coin tosses of C and A in the real attack and those of C, A, and A′ in the simulated attack.

For b = 1 in the real attack, A obtains a truly random key stream K_1^real. In the simulated attack, A operates on a truly random string str_1 that determines n · s addresses according to (3). As str_1 is truly random, the n · s addresses are also truly random with independent uniform distribution. Combined with the assumptions of the theorem, this implies that each pair of key stream elements in the real and the simulated attack has a negligible statistical difference. Negligible statistical difference implies polynomial-time indistinguishability [24, Section 3.2.2].
Let ε_diff(λ) be the corresponding negligible bound on the advantage of a distinguisher, which applies to one key stream element. Then the difference between both attacks over all n key stream elements has the negligible upper bound n · ε_diff(λ):

  |Pr_real[b′ = 0 | b = 1] − Pr_sim[b′ = 0 | b = 1]| ≤ n · ε_diff(λ) ,

where the randomness is taken over all coin tosses of C and A in the real attack and those of C, A, and A′ in the simulated attack. The last three inequalities lead to a lower bound for the success probability of A in the simulated attack, which is also the success probability of A′ in the attack against the PRSG:

  Pr_sim[b′ = 0 | b = 0] − Pr_sim[b′ = 0 | b = 1] ≥ ε_CE(λ) − n · ε_diff(λ) .

As ε_CE(λ) is not negligible by the contradictive assumption, ε_diff(λ) is negligible by the negligible statistical difference, and n is a constant, we conclude that the



success probability of A′ against the PRSG is not negligible, completing the contradiction and the proof.

5 Implementation

The master table MT obviously becomes strongly converging for sufficiently large L. Our simulation shows that L = 4Z gives high assurance of strong convergence. Lower values still lead to weak convergence in the sense that convergence is not guaranteed by our upper bound, but can easily be verified numerically. As discussed in Section 4.2, we need to choose the number s of draws from MT in accordance with L. The upper bound of Lemma 6 is too conservative to choose s in practice. Our simulation shows that the statistical difference SQ^(s) decreases not only by the factor d ≈ 2·SQ^(1) < 1, but by an even smaller factor. This is due to the fact that some of the combinations e_j^(i) · e_{(k−j) mod Z} on the left-hand side of the inequality in the proof of Lemma 6 cancel out. In Appendix F we therefore give an explicit formula for the exact statistical difference after s draws from MT. The center can thus generate MT with arbitrary length L, numerically verify convergence, and determine the minimum number of draws s_min that provides the desired statistical difference.

The content representation can be extended to cover movies and songs by interpreting them as a sequence of content items. A straightforward approach is to regularly refresh the session key. Further refinements that prevent sequence-specific attacks, such as averaging across movie frames, are possible, but they are beyond the scope of this document.

However, it remains to define how the insignificant part of the content should be processed (see Section 3.2). There are three obvious options: sending it in the clear, passing it through our scheme, or encrypting it separately. Note that by its very definition, this part does not give significant information about the content and was not watermarked because its coefficients do not have a perceptible influence on the reassembled content.
The easiest option is thus to pass these coefficients through the proposed scheme, which does not affect goodness and maintains confidentiality of the content.

At first sight our proposed scheme trivially fulfills the correctness requirement (see Definition 4) due to the correctness of the SSW scheme. However, both schemes face difficulties in the rare event that a content coefficient lies at the lower or upper end of the interval [0, z], which corresponds to plaintext symbols close to 0 or Z − 1. If the additive fingerprint coefficient causes the lower or upper bound to be exceeded, the SSW scheme needs to decrease the coefficient's amplitude and round to the corresponding bound. Similarly, our scheme must avoid a wrap-around in the additive group, e.g., when plaintext symbol Z − 2 obtains a fingerprint coefficient of +3 and ends up at 1 after decryption. There are many options with different security trade-offs, such as sending a flag or even sending the coefficient in cleartext; the appropriate choice depends on further requirements of the implementation. Note that the center can trivially anticipate the occurrence of a wrap-around by inspecting the content coefficients.
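The numerical convergence check described above can be sketched by convolving the single-draw distribution of Lemma 5 with itself and tracking the statistical difference after each draw; the bound of Lemma 6, SQ^(s) ≤ (1/2)·d^s, must hold at every step. The sketch below uses exact rational arithmetic and a toy group size; the parameter choices are ours, not the paper's:

```python
from collections import Counter
from fractions import Fraction
import random

Z = 16                      # toy key symbol space (the paper uses Z = 2**16)
L = 4 * Z                   # master table length; L = 4Z gave strong convergence

rng = random.Random(1)
table = [rng.randrange(Z) for _ in range(L)]

# Single-draw distribution of Lemma 5: p_k^(1) = eta_k / L
counts = Counter(table)
p1 = [Fraction(counts.get(k, 0), L) for k in range(Z)]

def stat_diff(p):
    """Statistical difference between distribution p and uniform over Z symbols."""
    return sum(abs(pk - Fraction(1, Z)) for pk in p) / 2

def convolve_mod(p, q):
    """Exact distribution of (X + Y) mod Z for independent X ~ p, Y ~ q."""
    r = [Fraction(0)] * Z
    for j in range(Z):
        for k in range(Z):
            r[(j + k) % Z] += p[j] * q[k]
    return r

d = 2 * stat_diff(p1)       # Definition 8: strongly converging iff d < 1
p = p1
for s_draws in range(2, 6):            # add draws X_2, ..., X_5
    p = convolve_mod(p, p1)
    assert stat_diff(p) <= d ** s_draws / 2   # Lemma 6 bound, checked exactly

print("d =", float(d), " SQ(5) =", float(stat_diff(p)))
```

The same loop, run with the intended L and Z, is what lets the center pick the minimum s for a target statistical difference; the exact values it prints are typically well below the conservative bound, matching the observation in the text.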


5.1 Efficiency

Three performance parameters determine whether the proposed scheme is efficient and implementable: transmission overhead, storage size of a receiver, and computation cost. We stress that our scheme enables a tradeoff between storage size and computation cost. Increasing the size L of the master table (and thus the storage size) decreases the necessary number s of draws (and thus the computation cost), as can be seen from Lemma 6 and Definition 8, where SQ^(1) and thus d decreases with L. This feature allows us to adapt the scheme to the particular constraints of the receiver, in particular to decrease s.

The transmission overhead of the Chameleon scheme is 0 if the master table and receiver tables are not renewed on a regular basis. In this scenario, ciphertext and cleartext use the same symbol space and thus have the same length; the transmission overhead of fingercasting is therefore determined by that of the broadcast encryption scheme, which is moderate [5,6,7,8].⁸

For the storage size, we highlight the parameters of a computation-intensive implementation. Let the content be an image with n = 10,000 significant coefficients of 16 bit length, such that Z = 2^16. By testing several lengths L of the master table MT, we found a statistical quality of SQ^(1) = d/2 < 1/8 for L = 8 · Z = 8 · 2^16 = 2^19 = 2^l. A receiver table thus has 2^19 · 16 = 2^23 bit or 2^20 Byte = 2^10 kByte = 1 MByte, which seems acceptable in practice.

The computation cost depends mostly on the number s of draws from the master table. To achieve a small statistical difference SQ^(s), e.g., below 2^−128, we choose s = 64 and therefore SQ^(s) < (1/2) · d^s = 2^−1 · 2^(−2·64) = 2^−129 by the conservative upper bound of Lemma 6. Compared to a conventional stream cipher that encrypts n · log₂ Z bits, a receiver has to generate n · s · l pseudo-random bits, which is an overhead of (s · l)/log₂ Z = 76.
To generate the pseudo-random key stream, the receiver has to perform n · s table lookups and n · (s + 1) modular operations in a group of size 2^16. In further tests, we also found a more storage-intensive implementation with L = 2^25 and s = 25, which leads to 64 MByte of storage and an overhead of (s · l)/log₂ Z ≈ 39. By calculating the exact statistical difference of Appendix F instead of the conservative upper bound of Lemma 6, s decreases further, but we are currently unaware of any direct formula to calculate s based on a master table length L and a desired statistical difference SQ^(s) (or vice versa).

If the security requirements of an implementation require a regular renewal of the master table and the subsequent redistribution of the receiver tables, then the transmission overhead obviously increases. For each redistribution, the total key material to be transmitted has the size of the master table times the number of receivers. As mentioned before, a redistribution channel then becomes necessary if the broadcast channel does not have enough spare bandwidth.
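The arithmetic behind these figures is easy to re-check; a throwaway calculation, using Python purely as a calculator with the values from the text:

```python
import math

# Computation-intensive variant from the text: Z = 2^16, L = 8Z = 2^19, s = 64
Z = 2 ** 16
l = 19                                   # address length, L = 2**l
entry_bits = int(math.log2(Z))           # 16-bit table entries
table_bytes = (2 ** l) * entry_bits // 8
assert table_bytes == 2 ** 20            # 1 MByte receiver table

s = 64
overhead = (s * l) / math.log2(Z)
assert overhead == 76.0                  # pseudo-random bits vs. a stream cipher

# Storage-intensive variant: L = 2^25, s = 25
l2, s2 = 25, 25
assert (2 ** l2) * entry_bits // 8 == 64 * 2 ** 20    # 64 MByte of storage
print(round((s2 * l2) / math.log2(Z)))                # -> 39
```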

⁸ For example, this overhead is far smaller than that of the trivial solution, which consists of sequentially sending an individually fingerprinted and individually encrypted copy of the content over the broadcast channel.

6 Conclusion and Open Problems

In this document we gave a formal proof of the security of a new Chameleon cipher. Applied to a generic fingercasting approach, it provides confidentiality of ciphertext, traceability of content and keys, as well as renewability. We achieved confidentiality through a combination of a generic broadcast encryption (BE) scheme and the new Chameleon cipher. The BE scheme provides a fresh session key, which the Chameleon scheme uses to generate a pseudo-random key stream. The pseudo-random key stream arises from adding key symbols at pseudo-random addresses in a long master table, initially filled with random key symbols. We have reduced the security of the pseudo-random key stream to that of a pseudo-random sequence generator.

In addition, we achieved traceability of keys and content through the embedding of a receiver-specific fingerprint into the master table copies that are given to the receivers. During decryption, these fingerprints are inevitably embedded into the content, enabling the tracing of malicious users. We achieve the same collusion resistance as an exemplary watermarking scheme with a proven security bound. It may be replaced with any fingerprinting scheme whose watermarks can be decomposed into additive components. Finally, we achieved renewability through revocation, which is performed in the BE scheme.

Two open problems are the most promising for future work. First, the detection algorithm should be extended to allow blind detection of a watermark, i.e., detection in the absence of the original content. Second, Chameleon encryption could be combined with a code-based fingerprinting scheme in the sense of Boneh and Shaw [29]. The master table in Chameleon would need to embed components of codewords in such a way that a codeword is embedded into the content.

References

1. Adelsbach, A., Huber, U., Sadeghi, A.R.: Fingercasting—joint fingerprinting and decryption of broadcast messages. In: Tenth Australasian Conference on Information Security and Privacy—ACISP 2006, Melbourne, Australia, July 3–5, 2006. Volume 4058 of Lecture Notes in Computer Science, Springer (2006)
2. Touretzky, D.S.: Gallery of CSS descramblers. Webpage, Computer Science Department of Carnegie Mellon University (2000). URL http://www.cs.cmu.edu/~dst/DeCSS/Gallery (November 17, 2005)
3. 4C Entity, LLC: CPPM specification—introduction and common cryptographic elements. Specification Revision 1.0 (2003). URL http://www.4centity.com/data/tech/spec/cppm-base100.pdf
4. AACS Licensing Administrator: Advanced access content system (AACS): Introduction and common cryptographic elements. Specification Revision 0.90 (2005). URL http://www.aacsla.com/specifications/AACS_Spec-Common_0.90.pdf
5. Fiat, A., Naor, M.: Broadcast encryption. In Stinson, D.R., ed.: CRYPTO 1993. Volume 773 of Lecture Notes in Computer Science, Springer (1994) 480–491
6. Naor, D., Naor, M., Lotspiech, J.: Revocation and tracing schemes for stateless receivers. In Kilian, J., ed.: CRYPTO 2001. Volume 2139 of Lecture Notes in Computer Science, Springer (2001) 41–62



7. Halevy, D., Shamir, A.: The LSD broadcast encryption scheme. In Yung, M., ed.: CRYPTO 2002. Volume 2442 of Lecture Notes in Computer Science, Springer (2002) 47–60
8. Jho, N.S., Hwang, J.Y., Cheon, J.H., Kim, M.H., Lee, D.H., Yoo, E.S.: One-way chain based broadcast encryption schemes. In Cramer, R., ed.: EUROCRYPT 2005. Volume 3494 of Lecture Notes in Computer Science, Springer (2005) 559–574
9. Chor, B., Fiat, A., Naor, M.: Tracing traitors. In Desmedt, Y., ed.: CRYPTO 1994. Volume 839 of Lecture Notes in Computer Science, Springer (1994) 257–270
10. Naor, M., Pinkas, B.: Threshold traitor tracing. In Krawczyk, H., ed.: CRYPTO 1998. Volume 1462 of Lecture Notes in Computer Science, Springer (1998) 502–517
11. Kundur, D., Karthik, K.: Video fingerprinting and encryption principles for digital rights management. Proceedings of the IEEE 92(6) (2004) 918–932
12. Anderson, R.J., Manifavas, C.: Chameleon—a new kind of stream cipher. In Biham, E., ed.: FSE 1997. Volume 1267 of Lecture Notes in Computer Science, Springer (1997) 107–113
13. Briscoe, B., Fairman, I.: Nark: Receiver-based multicast non-repudiation and key management. In: ACM EC 1999, ACM Press (1999) 22–30
14. Cox, I.J., Kilian, J., Leighton, T., Shamoon, T.: Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing 6(12) (1997) 1673–1687
15. Kilian, J., Leighton, F.T., Matheson, L.R., Shamoon, T.G., Tarjan, R.E., Zane, F.: Resistance of digital watermarks to collusive attacks. Technical Report TR-585-98, Princeton University, Department of Computer Science (1998). URL ftp://ftp.cs.princeton.edu/techreports/1998/585.ps.gz
16. Anderson, R.J., Kuhn, M.: Tamper resistance—a cautionary note. In Tygar, D., ed.: USENIX Electronic Commerce 1996, USENIX (1996) 1–11
17. Maurer, U.M.: A provably-secure strongly-randomized cipher. In Damgård, I., ed.: EUROCRYPT 1990. Volume 473 of Lecture Notes in Computer Science, Springer (1990) 361–373
18. Maurer, U.: Conditionally-perfect secrecy and a provably-secure randomized cipher. Journal of Cryptology 5(1) (1992) 53–66
19. Ferguson, N., Schneier, B., Wagner, D.: Security weaknesses in a randomized stream cipher. In Dawson, E., Clark, A., Boyd, C., eds.: ACISP 2000. Volume 1841 of Lecture Notes in Computer Science, Springer (2000) 234–241
20. Ergün, F., Kilian, J., Kumar, R.: A note on the limits of collusion-resistant watermarks. In Stern, J., ed.: EUROCRYPT 1999. Volume 1592 of Lecture Notes in Computer Science, Springer (1999) 140–149
21. Brown, I., Perkins, C., Crowcroft, J.: Watercasting: Distributed watermarking of multicast media. In Rizzo, L., Fdida, S., eds.: Networked Group Communication 1999. Volume 1736 of Lecture Notes in Computer Science, Springer (1999) 286–300
22. Parviainen, R., Parnes, P.: Large scale distributed watermarking of multicast media through encryption. In Steinmetz, R., Dittmann, J., Steinebach, M., eds.: Communications and Multimedia Security (CMS 2001). Volume 192 of IFIP Conference Proceedings, Kluwer (2001) 149–158
23. Luh, W., Kundur, D.: New paradigms for effective multicasting and fingerprinting of entertainment media. IEEE Communications Magazine 43(5) (2005) 77–84
24. Goldreich, O.: Basic Tools. First edn. Volume 1 of Foundations of Cryptography. Cambridge University Press, Cambridge, UK (2001)



25. Bellare, M., Namprempre, C.: Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. In Okamoto, T., ed.: ASIACRYPT 2000. Volume 1976 of Lecture Notes in Computer Science, Springer (2000) 531–545
26. National Institute of Standards and Technology: Announcing the Advanced Encryption Standard (AES). Federal Information Processing Standards Publication FIPS PUB 197, November 26, 2001. URL http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
27. National Institute of Standards and Technology: Data Encryption Standard (DES). Federal Information Processing Standards Publication FIPS PUB 46-3, October 25, 1999. URL http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf
28. Goldreich, O.: Basic Applications. First edn. Volume 2 of Foundations of Cryptography. Cambridge University Press, Cambridge, UK (2004)
29. Boneh, D., Shaw, J.: Collusion-secure fingerprinting for digital data (extended abstract). In Coppersmith, D., ed.: CRYPTO 1995. Volume 963 of Lecture Notes in Computer Science, Springer (1995) 452–465

A Abbreviations

Table 1 summarizes all abbreviations used in this document.

Table 1. Abbreviations used in this document

  AACS    Advanced Access Content System
  AES     Advanced Encryption Standard
  BE      Broadcast Encryption
  CPPM    Content Protection for Pre-Recorded Media
  CRL     Certificate Revocation List
  CSS     Content Scrambling System
  DCT     Discrete Cosine Transform
  DES     Data Encryption Standard
  DVD     Digital Versatile Disc
  FE      Fingerprint Embedding
  PRS     Pseudo-Random Sequence
  PRSG    Pseudo-Random Sequence Generator
  SSW     Spread Spectrum Watermarking
  TV      Television

B Summary of Relevant Parameters

Table 2 summarizes all parameters of our ﬁngercasting approach and the underlying ﬁngerprinting scheme, which we instantiate with the SSW scheme of [15].


Table 2. Parameters of the proposed fingercasting scheme and the SSW scheme

  Parameter   Description
  N           Number of receivers
  u_i         i-th receiver
  q           Maximum tolerable number of colluding receivers
  M           Representation of the original content
  m_j         j-th coefficient of content M
  n           Number of coefficients (Chameleon scheme)
  n′          Number of coefficients (fingerprinting scheme)
  CF^(i)      Content fingerprint of receiver u_i
  cf_j^(i)    Coefficient j of u_i's content fingerprint CF^(i)
  M*          Illegal copy of the original content
  CF*         Fingerprint found in an illegal copy M*
  C           Ciphertext of the original content M
  c_j         j-th coefficient of ciphertext C
  k_sess      Session key used as a seed for the PRSG
  MT          Master table of the Chameleon scheme
  α           Address of a table entry
  mt_α        α-th entry of the master table MT
  TF^(i)      Table fingerprint for receiver table of receiver u_i
  tf_α^(i)    α-th coefficient of u_i's table fingerprint TF^(i)
  RT^(i)      Receiver table of receiver u_i
  rt_α^(i)    α-th entry of the receiver table RT^(i)
  l           Number of bits needed for the binary address of a table entry
  L           Number of entries of the tables, L = 2^l
  F           Number of fingerprinted entries of a receiver table
  s           Number of master table entries per ciphertext coefficient
  par_CE      Input parameters (Chameleon scheme)
  par_FP      Input parameters (fingerprinting scheme)
  σ           Standard deviation for receiver table
  σ′          Standard deviation for SSW scheme
  p_bad       Maximum probability of a bad copy
  p_pos       Maximum probability of a false positive
  p_neg       Maximum probability of a false negative
  δ           Goodness criterion (SSW scheme)
  t           Threshold of similarity measure (SSW scheme)
  dec         Decision output of detection algorithm
  z           Upper bound of interval [0, z] (content coefficients)
  Z           Key space size and cardinality of discrete interval [0, z]
  ρ           Scaling factor from real numbers to group elements
  p           Order of the additive group

C Chameleon Encryption

Definition 9. A Chameleon encryption scheme is a tuple of five polynomial-time algorithms CE := (KeyGenCE, KeyExtrCE, EncCE, DecCE, DetectCE), where:

– KeyGenCE is the probabilistic key generation algorithm used by the center to set up all parameters of the scheme. KeyGenCE takes the number N of receivers, a security parameter λ, and a set of performance parameters par_CE as input in order to generate a secret master table MT, a tuple TF := (TF^(1), ..., TF^(N)) of secret table fingerprints containing one fingerprint per receiver, and a threshold t. The values N and λ are public:

    (MT, TF, t) ← KeyGenCE(N, 1^λ, par_CE)

– KeyExtrCE is the deterministic key extraction algorithm used by the center to extract the secret receiver table RT^(i) to be delivered to receiver u_i in the setup phase. KeyExtrCE takes the master table MT, the table fingerprints TF, and the index i of receiver u_i as input in order to return RT^(i):

    RT^(i) ← KeyExtrCE(MT, TF, i)

– EncCE is the deterministic encryption algorithm used by the center to encrypt content M such that only receivers in possession of a receiver table and the session key can recover it. EncCE takes the master table MT, a session key k_sess, and content M as input in order to return the ciphertext C:

    C ← EncCE(MT, k_sess, M)

– DecCE is the deterministic decryption algorithm used by a receiver u_i to decrypt a ciphertext C. DecCE takes the receiver table RT^(i) of receiver u_i, a session key k_sess, and a ciphertext C as input. It returns a good copy M^(i) of the underlying content M if C is a valid encryption of M using k_sess:

    M^(i) ← DecCE(RT^(i), k_sess, C)

– DetectCE is the deterministic fingerprint detection algorithm used by the center to detect whether the table fingerprint TF^(i) of receiver u_i left traces in an illegal copy M*. DetectCE takes the original content M, the illegal copy M*, the session key k_sess, the table fingerprint TF^(i) of u_i, and the threshold t as input in order to return dec = true if the similarity measure of the underlying fingerprinting scheme indicates that the similarity between M* and M^(i) is above the threshold t. Otherwise it returns dec = false:

    dec ← DetectCE(M, M*, k_sess, TF^(i), t)

Correctness of CE requires that for all u_i ∈ U:

    DecCE(RT^(i), k_sess, EncCE(MT, k_sess, M)) = M^(i)  such that  Good(M^(i), M) = true

(see Definition 3) with high probability.
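Definition 9 keeps the algorithms abstract. Under the instantiation used in the paper's analysis (each key stream symbol is a modular sum of s master table entries, and a receiver table is the master table with a small additive fingerprint), a toy round trip can be sketched as follows. The symbol-wise ciphertext rule c_j = (m_j + key_j) mod Z, the ±1 per-entry fingerprint model, and all names are illustrative assumptions, not the paper's exact algorithms:

```python
import random

Z = 2 ** 16        # key symbol space
L = 64             # toy master table length
s = 4              # draws per content coefficient

rng = random.Random(2)
master = [rng.randrange(Z) for _ in range(L)]
# Toy receiver table: the master table plus a small per-entry fingerprint
fingerprint = [rng.choice([-1, 0, 1]) for _ in range(L)]
receiver = [(m + f) % Z for m, f in zip(master, fingerprint)]

def addresses(seed, count):
    """Toy stand-in for the PRSG-derived address sequence."""
    r = random.Random(seed)
    return [r.randrange(L) for _ in range(count)]

def enc(table, seed, msg):
    addrs = addresses(seed, len(msg) * s)
    return [(m + sum(table[a] for a in addrs[j * s:(j + 1) * s])) % Z
            for j, m in enumerate(msg)]

def dec(table, seed, ct):
    addrs = addresses(seed, len(ct) * s)
    return [(c - sum(table[a] for a in addrs[j * s:(j + 1) * s])) % Z
            for j, c in enumerate(ct)]

msg = [100, 200, 300]
ct = enc(master, 7, msg)
assert dec(master, 7, ct) == msg          # the master table decrypts exactly
copy = dec(receiver, 7, ct)               # the receiver table leaves a trace
offsets = [(c - m) % Z for c, m in zip(copy, msg)]
assert all(min(o, Z - o) <= s for o in offsets)  # offset bounded by s draws
print(offsets)
```

This also illustrates why decryption inevitably embeds the fingerprint: the receiver's per-coefficient offset is the sum of the fingerprint components at the s drawn addresses.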

D Fingerprinting and Spread Spectrum Watermarking

In this section, we detail our notation of a fingerprinting scheme by describing the respective algorithms of Spread Spectrum Watermarking [14,15]. This scheme is a tuple of three polynomial-time algorithms (SetupFP, EmbedFP, DetectFP). We detail each of the three algorithms in Sections D.1–D.3.

D.1 Setup Algorithm

SetupFP is the probabilistic setup algorithm used by the center to set up all parameters of the scheme. SetupFP takes the number N of receivers, the number n of content coefficients, a goodness criterion δ, a maximum probability p_bad of bad copies, and a maximum probability p_pos of false positives as input in order to return a tuple of secret content fingerprints CF, containing one fingerprint per receiver, as well as a similarity threshold t. The values N and n are public:

    (CF, t) ← SetupFP(N, n, δ, p_bad, p_pos)

The algorithm of [14,15] proceeds as follows. The set of content fingerprints CF is defined as CF := (CF^(1), ..., CF^(N)). The content fingerprint CF^(i) of receiver u_i is a vector CF^(i) := (cf_1^(i), ..., cf_n^(i)) of n fingerprint coefficients. For each receiver index i ∈ {1, ..., N} and for each coefficient index j ∈ {1, ..., n}, the fingerprint coefficient follows an independent normal distribution whose standard deviation depends on the values N, n, δ, and p_bad:

    ∀ 1 ≤ i ≤ N, ∀ 1 ≤ j ≤ n:  cf_j^(i) ← N(0, σ)  with  σ = f_σ(N, n, δ, p_bad)

The similarity threshold t is a function t = f_t(σ, N, p_pos) of σ, N, and p_pos. The details of f_σ and f_t can be found in [15].

D.2 Watermark Embedding Algorithm

EmbedFP is the deterministic watermark embedding algorithm used by the center to embed the content fingerprint CF^(i) of receiver u_i into the original content M. EmbedFP takes the original content M and the secret content fingerprint CF^(i) of receiver u_i as input in order to return the fingerprinted copy M^(i) of u_i:

    M^(i) ← EmbedFP(M, CF^(i))

The algorithm of [14,15] adds each fingerprint coefficient to the corresponding original content coefficient to obtain the fingerprinted content coefficient:

    ∀ j ∈ {1, ..., n}:  m_j^(i) ← m_j + cf_j^(i)

D.3 Watermark Detection Algorithm

DetectFP is the deterministic watermark detection algorithm used by the center to verify whether an illegal content copy M ∗ contains traces of the content


fingerprint CF^(i) that was embedded into the content copy M^(i) of receiver u_i. DetectFP takes the original content M, the illegal copy M*, the content fingerprint CF^(i), and the similarity threshold t as input and returns the decision dec ∈ {true, false}:

    dec ← DetectFP(M, M*, CF^(i), t)

The algorithm of [14,15] calculates the similarity measure between the fingerprint in the illegal copy and the fingerprint of the suspect receiver. The similarity measure is defined as the dot product of the two fingerprints, divided by the Euclidean norm of the fingerprint in the illegal copy:

    CF* ← M* − M
    Sim(CF*, CF^(i)) ← (CF* · CF^(i)) / ||CF*||
    If Sim(CF*, CF^(i)) > t Then Return dec = true Else Return dec = false
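The detection rule can be exercised end-to-end on synthetic data. The sketch below (toy sizes and our own variable names) embeds one receiver's Gaussian fingerprint and evaluates the similarity measure against both the guilty and an innocent receiver's fingerprint:

```python
import math
import random

n = 1000            # toy number of content coefficients
sigma = 1.0         # fingerprint standard deviation (illustrative)

rng = random.Random(3)
content = [rng.uniform(0.0, 255.0) for _ in range(n)]
fp_a = [rng.gauss(0.0, sigma) for _ in range(n)]   # receiver a's fingerprint
fp_b = [rng.gauss(0.0, sigma) for _ in range(n)]   # receiver b's fingerprint

def embed(m, cf):
    """EmbedFP: m_j^(i) <- m_j + cf_j^(i)."""
    return [mj + cfj for mj, cfj in zip(m, cf)]

def similarity(m, m_star, cf):
    """Sim(CF*, CF^(i)) = (CF* . CF^(i)) / ||CF*|| with CF* = M* - M."""
    cf_star = [x - y for x, y in zip(m_star, m)]
    dot = sum(x * y for x, y in zip(cf_star, cf))
    return dot / math.sqrt(sum(x * x for x in cf_star))

copy_a = embed(content, fp_a)       # illegal copy traced back to receiver a
print(similarity(content, copy_a, fp_a), similarity(content, copy_a, fp_b))
```

For the guilty receiver the measure concentrates around ||CF^(i)|| (roughly σ·√n), while for an unrelated fingerprint it stays near 0, which is what makes a threshold t with low false-positive probability possible.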

E Broadcast Encryption

In this section we describe a general BE scheme that allows revocation of an arbitrary subset of the set of receivers. Examples of such BE schemes are [6,7,8]. As these schemes all belong to the family of subset cover schemes defined in [6], we use this name to refer to them:

Definition 10. A Subset Cover BE (SCBE) scheme is a tuple of four polynomial-time algorithms (KeyGenBE, KeyExtrBE, EncBE, DecBE), where:

– KeyGenBE is the probabilistic key generation algorithm used by the center to set up all parameters of the scheme. KeyGenBE takes the number N of receivers and a security parameter λ as input in order to generate the secret master key MK. The values N and λ are public:

MK ← KeyGenBE(N, 1^λ)

– KeyExtrBE is the deterministic key extraction algorithm used by the center to extract the secret key SK^(i) to be delivered to a receiver u_i in the setup phase. KeyExtrBE takes the master key MK and the receiver index i as input in order to return the secret key SK^(i) of u_i:

SK^(i) ← KeyExtrBE(MK, i)

– EncBE is the deterministic encryption algorithm used to encrypt a session key k_sess in such a way that only the non-revoked receivers can recover it. EncBE takes the master key MK, the set R of revoked receivers, and the session key k_sess as input in order to return the ciphertext C_BE:

C_BE ← EncBE(MK, R, k_sess)

Fingercasting–Joint Fingerprinting and Decryption of Broadcast Messages

33

– DecBE is the deterministic decryption algorithm used by a receiver u_i to decrypt a ciphertext C_BE. DecBE takes the index i of u_i, its private key SK^(i), and a ciphertext C_BE as input in order to return the session key k_sess if C_BE is a valid encryption of k_sess and u_i is non-revoked, i.e., u_i ∉ R. Otherwise, it returns the failure symbol ⊥:

k_sess ← DecBE(i, SK^(i), C_BE)   if u_i ∉ R

Correctness of an SCBE scheme requires that

∀ u_i ∈ U \ R:   DecBE(i, SK^(i), EncBE(MK, R, k_sess)) = k_sess.
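To make the four-algorithm interface concrete, here is a toy Python instantiation. It gives each receiver an independent key and one-time-pad-encrypts k_sess under every non-revoked key; this is *not* an efficient subset-cover scheme such as [6,7,8], but it satisfies the SCBE interface and the correctness condition above. All names and parameters are illustrative:

```python
import random

def keygen_be(N, lam, seed=None):
    """MK: one independent lam-bit key per receiver (toy master key)."""
    rng = random.Random(seed)
    return [rng.getrandbits(lam) for _ in range(N)]

def keyextr_be(MK, i):
    """SK^(i): the secret key delivered to receiver u_i."""
    return MK[i]

def enc_be(MK, R, k_sess):
    """One XOR 'ciphertext' per non-revoked receiver (toy one-time pad)."""
    return {i: MK[i] ^ k_sess for i in range(len(MK)) if i not in R}

def dec_be(i, SK_i, C_BE):
    """Recover k_sess if u_i is non-revoked; None models the symbol ⊥."""
    if i not in C_BE:
        return None
    return C_BE[i] ^ SK_i
```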

F Selection of the Minimum Number of Draws

The center can calculate the statistical difference after s draws if it knows the corresponding probability distribution. The next lemma gives an explicit formula for this probability distribution. To determine the minimum number of draws that achieves a maximum statistical difference of, e.g., 2^(−128), the center increases s until the statistical difference is below the desired maximum. Note that this only needs to be done once, at setup time of the system, when s is chosen.

Lemma 7. If the draws use addresses with independent uniform distribution and the master table MT is given in the representation of Lemma 5, then the drawing and adding of s master table entries leads to the random variable X^(s) := (Σ_{j=1}^{s} X_j) mod Z with

Pr[X^(s) = x] = Σ_{condition} (s choose s_0, ..., s_{Z−1}) Π_{k=0}^{Z−1} p_k^{s_k}

where condition ⇔ (8) ∧ (9) ∧ (10):

s_k ≥ 0   ∀ k ∈ {0, 1, ..., Z − 1}    (8)

Σ_{k=0}^{Z−1} s_k = s    (9)

(Σ_{k=0}^{Z−1} s_k · x_k) mod Z = x,    (10)

where s_k denotes the number of times that key space element x_k was chosen in the s selections and (s choose s_0, ..., s_{Z−1}) := s! / (s_0! · ... · s_{Z−1}!) denotes the multinomial coefficient.

Proof. Each of the s selections is a random variable X_j with Pr[X_j = x_k] = p_k. The independence of the random addresses transfers to the independence of the X_j. The probability of a complete set of s selections is thus a product of s probabilities of the form p_k with appropriate indices. The counter s_k stores


the number of times that probability p_k appears in this term. This counter is non-negative, implying (8). In total, there are s selections, implying (9). To fulfill the condition X^(s) = x, the addition modulo Z of the s random variables must have the result x. Given the counters s_k, the result of the addition is (Σ_{k=0}^{Z−1} s_k · x_k) mod Z. The combination of both statements implies (10). There is more than one possibility for selecting s_k times the key symbol x_k during the s selections. Considering all such key symbols, the total number of possibilities is the number of ways in which we can choose s_0 times the key symbol x_0, then s_1 times the key symbol x_1, and so forth until we reach a total of s selections. This number is the multinomial coefficient (s choose s_0, ..., s_{Z−1}).

Note that we can trivially verify that the probabilities of all key space elements x in Lemma 7 add to 1. Among the three conditions (8), (9), and (10), the first two appear in the well-known multinomial theorem:

(Σ_{k=0}^{Z−1} p_k)^s = Σ_{s_0,...,s_{Z−1} ≥ 0, s_0+...+s_{Z−1}=s} (s choose s_0, ..., s_{Z−1}) Π_{k=0}^{Z−1} p_k^{s_k}

By adding the probabilities over all elements, we obviously add over all addends on the right-hand side of the multinomial theorem. As the left-hand side trivially adds to 1, so do the probabilities over all key space elements.
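The formula of Lemma 7 is easy to check numerically. The sketch below (plain Python, illustrative names; the key space elements x_k are assumed to be residues 0, ..., Z−1) enumerates the counters (s_0, ..., s_{Z−1}) subject to (8)–(10) and compares the resulting distribution against brute-force enumeration of all Z^s draw sequences:

```python
import itertools
import math

def lemma7_distribution(p, xs, s):
    """Pr[X^(s) = x] for all x, via the multinomial formula of Lemma 7."""
    Z = len(p)
    dist = [0.0] * Z
    # enumerate all counters with s_k >= 0 (8) and sum s_k = s (9)
    for counts in itertools.product(range(s + 1), repeat=Z):
        if sum(counts) != s:
            continue
        coeff = math.factorial(s)          # multinomial coefficient
        for sk in counts:
            coeff //= math.factorial(sk)
        prob = coeff * math.prod(pk ** sk for pk, sk in zip(p, counts))
        x = sum(sk * xk for sk, xk in zip(counts, xs)) % Z   # condition (10)
        dist[x] += prob
    return dist

def brute_force(p, xs, s):
    """Reference: enumerate all Z^s possible draw sequences."""
    Z = len(p)
    dist = [0.0] * Z
    for draws in itertools.product(range(Z), repeat=s):
        prob = math.prod(p[k] for k in draws)
        dist[sum(xs[k] for k in draws) % Z] += prob
    return dist
```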

An Estimation Attack on Content-Based Video Fingerprinting

Shan He¹ and Darko Kirovski²

¹ Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, U.S.A.
² Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
[email protected], [email protected]

Abstract. In this paper we propose a simple signal processing procedure that aims at removing low-frequency fingerprints embedded in video signals. Although we construct an instance of the attack and show its efficacy against a specific video fingerprinting algorithm, the generic form of the attack can be applied to an arbitrary video marking scheme. The proposed attack uses two estimates, one of the embedded fingerprint and another of the original content, to create the attack vector. This vector is amplified and subtracted from the fingerprinted video sequence to create the attacked copy. The amplification factor is maximized under the constraint of achieving a desired level of visual fidelity. In the conducted experiments, the attack procedure on average halved the expected detector correlation compared to additive white Gaussian noise. It also substantially increased the probability of a false positive under attack for the addressed fingerprinting algorithm.

Keywords: Video watermarking, fingerprinting, signal estimation.

1 Introduction

Content watermarking is a signal processing primitive in which a secret noise signal w is added to the original multimedia sequence x so that: (i) perceptually, the watermarked content y = x + w is indistinguishable from the original, and (ii) watermark detection produces low error rates both in terms of false positives and false negatives. An additional requirement is that the watermark should be detected reliably in marked content even after an arbitrary signal processing primitive f() is applied to y such that f(y) is a perceptually acceptable copy of x. The function f() is constructed without knowledge of w.

Content fingerprinting is a specific application of content watermarking whose objective is to produce many unique content copies. Each copy is associated with a particular system user, so a discovered content copy that is illegally used can be traced to its associated user. Here, a distinct watermark w_i (i.e., a fingerprint) is applied to x to create a unique content copy y_i. We denote the set of all fingerprints as W = {w_1, ..., w_M}, published in Y = {y_1, ..., y_M}. The fingerprint detector d(x, f(y_i), W) should return the index of the user i associated with the content under test y_i. Typically, this decision is associated with a confidence level, which must be high. In particular, one demands a low probability of false positives:

Pr[d(x, f(y_i), W) = j, j ≠ i] < ε_FP,    (1)

where ε_FP is typically smaller than 10^(−9). In case a content copy ŷ which is not marked with any of the fingerprints in W is fed to the detector, it should report that no fingerprint is identified in ŷ: d(x, ŷ, W) = 0 with high confidence. Finally, the detector uses the knowledge of the original x while making its decision. This feature substantially improves the accuracy of the forensic detector compared to "blind" detectors [1], which are prone to de-synchronization attacks [2].

Attacks against fingerprinting technologies can be divided into two classes: collusion and fingerprint removal. A collusion attack considers an adversarial clique Q ⊂ Y of a certain size K. The participating colluders compare their fingerprinted copies to produce a new attack copy which does not include statistically important traces of any of their fingerprints [3,4]. Another objective of a collusion clique may be to frame an innocent user. Collusion attacks have attracted a great deal of attention from the research community, which has mainly focused on producing codes with improved collusion resistance [5,6,7,8,9].

Y.Q. Shi (Ed.): Transactions on DHMS II, LNCS 4499, pp. 35–47, 2007. © Springer-Verlag Berlin Heidelberg 2007

1.1 Fingerprint Estimation

In this paper we address the other class of attacks on multimedia forensic schemes: fingerprint removal via estimation. Here, the adversary's objective is to estimate the value of a given fingerprint w_i based upon y_i only and without access to d(). In essence, this attack aims at denoising y_i from its fingerprint. In order to make denoising attacks harder, one may design fingerprints dependent upon x so that it is more difficult to estimate them accurately. The effects of this class of attacks are orthogonal to collusion. An adversarial clique may deploy both types of attacks to achieve its goal: estimation, to reduce the presence of individual fingerprints in their respective copies, and collusion, to remove the remaining fingerprint traces by creating a final attack copy.
For example, a forensic application that uses spread-spectrum fingerprints w_i ∈ {±1}^N, where N is the sequence length, detects them using a correlation-based detector c(x, a, w_i) = N^(−1) (a − x) · w_i, where a is the content under test and the operator '·' denotes the inner product of two vectors [1]. Content a is the result of forensic multimedia registration, exemplified in [4]. In case a is marked with w_i, we model a = x + w_i + n, where n is low-magnitude Gaussian noise. Under the assumption that E[n · w_i] = 0, we have E[c(x, a, w_i)] = 1 when a is marked with w_i and E[c(x, a, w_i)] = 0 when it is not. Fingerprint detection is performed using a Neyman–Pearson test c(x, a, w_i) ≶ T, where the detection threshold T establishes the error probabilities for false positives and negatives. As an example, the adversarial clique Q may use estimation and collusion via averaging to produce a "clean" copy of the content. Content averaging by a collusion of K users produces a copy z = K^(−1) Σ_{i=1}^{K} y_i such that E[c(x, z, w_i ∈ Q)] = K^(−1). If we denote the efficacy of fingerprint estimation via the residual correlation E[c(x, y_i − e_i, w_i)] = α^(−1), where e_i is the attack vector computed via estimation from y_i, then E[c(x, K^(−1) Σ_{i=1}^{K} (y_i − e_i), w_i)] = (αK)^(−1). Thus, in the asymptotic case, the estimation attack improves the overall effort of the colluders by a scaling factor α. Knowing that the collusion resistance of the best fingerprinting codes for 2-hour video sequences is on the order of K ∼ 10², we conclude that estimation is an important component of the overall attack [7,10]. Finally, it may appear that estimating fingerprints is no different from estimating arbitrary watermarks. However, there is a strong difference in how watermarks for content screening [11] and for fingerprinting [4] are designed. The replication that is necessary for watermarks tailored to content screening¹ makes their estimation substantially easier [11]. Fingerprints, on the other hand, can be designed with almost no redundancy, which makes their estimation substantially more difficult. Lastly, during fingerprint detection, the forensic tool has access to the original content, which greatly improves the detection rates.
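The correlation detector and the averaging collusion can be sketched numerically; the toy Python experiment below (illustrative, not the authors' implementation) shows that with K = 4 colluders the expected correlation drops from 1 to roughly 1/K:

```python
import random

def correlate(x, a, w):
    """c(x, a, w) = N^{-1} (a - x) . w"""
    n = len(w)
    return sum((ai - xi) * wi for xi, ai, wi in zip(x, a, w)) / n

rng = random.Random(1)
n, K = 4096, 4
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]                   # original content
W = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(K)]  # fingerprints
Y = [[xi + wi for xi, wi in zip(x, w)] for w in W]               # marked copies
z = [sum(col) / K for col in zip(*Y)]                            # collusion by averaging
```

Here `correlate(x, Y[0], W[0])` is essentially 1, while `correlate(x, z, W[0])` concentrates around 1/K = 0.25 for n this large.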

2 Related Work

The idea of watermark removal via estimation is not new. To the best of our knowledge, all developed schemes for the estimation attack have targeted "blindly" detected watermarks. For example, Langelaar et al. used a 3 × 3 median filter and a 3 × 3 high-pass filter to successfully launch an estimation attack on a spread-spectrum image watermarking scheme [12]. Su and Girod used a Wiener filter to estimate arbitrary watermarks; they constructively expanded their attack to provide a power-spectrum condition required for a watermark to resist minimum mean-squared error estimation [13]. Next, Voloshynovskiy et al. achieved partial watermark removal using a filter based on the Maximum a Posteriori (MAP) principle [14]. Finally, Kirovski et al. investigated the security of a direct-sequence spread-spectrum watermarking scheme for audio by statistically analyzing the effect of the estimation attack on their redundant watermark codes [11]. They used an estimation attack of the form:

e = sign( Σ_{j∈J} (x_j + w) ),    (2)

where J is a region in the source signal x marked with the same watermark chip w. This attack can be optimal under a set of assumptions about the watermark and the source signal [11].

In this paper, we propose a simple but novel joint source-fingerprint estimator which performs particularly well on low-frequency watermarks. We also show an interesting anomaly specific to watermarking schemes that construct watermarks dependent upon the source: by applying an attack vector that is also dependent upon the source, such as the vectors produced by our estimation attack, the probability of false positives may substantially increase compared to additive white Gaussian noise of similar magnitude. If discovered and unresolved, this issue renders a forensic technology inapplicable.

¹ To resist de-synchronization attacks.


3 A Video Fingerprinting Scheme

In order to present our estimation attack, we use an existing well-engineered video fingerprinting scheme. The scheme is based upon the image watermarking approach presented in [15], adapted and extended to video fingerprinting by Harmanci et al. [16,17,18]. Their video fingerprinting scheme marks the content by designing a complexity-adaptive watermark signal via solving an optimization problem. The marking process is performed in several steps. First, each frame of the video sequence is transformed into the DWT (Discrete Wavelet Transform) domain. Since watermarks are applied only to the DC sub-bands (the lowest-frequency sub-bands), the algorithm packs these coefficients into a 3D prism x(a, b, t), where the third dimension t represents the frame index (i.e., time). Based upon a unique user key, the fingerprint embedding algorithm pseudo-randomly selects, in terms of positions and sizes, a collection of sub-prisms P = {p_1, ..., p_n} ⊂ x that may overlap. The prisms' dimensions are upper- and lower-bounded (e.g., from 12 × 16 × 20 to 36 × 48 × 60). Then, the coefficients in each prism p_j ∈ P are weighted using a smooth weighting prism u_j. The weighting prisms are generated pseudo-randomly using a user-specific secret key. Finally, the algorithm computes a first-order statistic g(p_j · u_j) for each prism (e.g., g() computes the mean of its argument) and quantizes it using a private quantizer q(g(p_j · u_j), bit), where bit represents the embedded user-specific data. The desired watermark strength is achieved by adjusting the quantization step size during the embedding. The content update Δ_j = q(g(p_j · u_j), bit) − g(p_j · u_j) is spread among the pixels of the containing prism using an optimization primitive.

To get better visual quality, Harmanci et al. generate a "complexity map" c using the spatial and temporal information of each component, which is then employed in solving the underlying optimization problem to regularize the watermark. Specifically, the spatial complexity c_s(a, b, t) for a given component in the DWT-DC sub-band is determined by estimating the variance of the coefficients in a v = M × M 2D window centered at (a, b, t). Typically, M = 5. The decision relies on the i.i.d. assumption for the coefficients. Using the Gaussian entropy formula c_s = (1/2) log(2πe σ²(v)), where σ²() denotes the variance of its argument, the algorithm estimates the spatial entropy rate of that component and uses it as a measure of spatial complexity. To determine the temporal complexity c_t, the scheme performs first-order auto-regression (AR) analysis with window length L among the corresponding components along the optical flow [19]. The temporal complexity is obtained by applying the Gaussian entropy formula to the distribution of the innovation process of the AR(1) model. Then, c_t and c_s are linearly combined to compute c. By employing the complexity map, the resulting watermark is locally adapted to the statistical complexity of the signal. While aimed at improving the perceptual quality of the resulting sequence, the complexity map significantly reduces the exploration space for watermark estimation. Based upon the complexity map, the watermark embedding procedure computes the optimal update values for each DWT-DC coefficient that realize the desired Δ_j for each selected prism p_j. Finally, the scheme applies a low-pass filter both spatially and temporally to the watermark signal to further improve the watermark's imperceptibility. Figure 1 shows an example watermark extracted from a single frame of our test video sequence as well as the frequency spectrum of the watermark. One can notice


Fig. 1. Fingerprint example: (a) original frame from the benchmark video clip, (b) resulting fingerprint constructed as a marking of this frame – the fingerprint is in the pixel domain, scaled by a factor of 10 and shifted to a mean of 128, and (c) watermark amplitude in the DFT domain

Fig. 2. Demonstration of perceptual quality: the first frame of the (a) attacked video with α = 1.5, (b) attacked video with α = 1, and (c) original video

that the effective watermark is highly smoothed and that most of the watermark energy is located in the low-frequency band. This conclusion is important for the application of the estimation attack.


Given a received video signal z, the detector first employs the information of the original video signal to undo operations such as histogram equalization, rotation, de-synchronization, etc. Next, using a suspect user key, the detector extracts the feature vector in the same way as the embedding process. It employs correlation-based detection to identify the existence of a watermark as follows:

γ = ((g_z − g) · (ĝ − g)) / ||ĝ − g||²   ≶ T,    (3)

where g_z = {g(p̄_j · u_j), j = 1...n}, ĝ = {q(g(p_j · u_j)), j = 1...n}, and g = {g(p_j · u_j), j = 1...n}, and p̄_j represents a prism extracted from z at a position that corresponds to the position of p_j within x. If γ is greater than a certain threshold T, the detector concludes that z is marked with the fingerprint generated using the suspect user key; otherwise, no fingerprint is detected.
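The decision statistic in (3) is a normalized projection of the received statistics onto the quantization direction. A minimal sketch (illustrative Python; g_z, g_hat, and g as plain lists of first-order statistics):

```python
def gamma(g_z, g_hat, g):
    """gamma = ((g_z - g) . (g_hat - g)) / ||g_hat - g||^2"""
    num = sum((a - c) * (b - c) for a, b, c in zip(g_z, g_hat, g))
    den = sum((b - c) ** 2 for b, c in zip(g_hat, g))
    return num / den
```

When g_z equals the quantized statistics ĝ (an undistorted marked copy), γ = 1; when g_z equals g (unmarked content), γ = 0, which matches the "no attack" and "wrong key" baselines reported later.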

4 Joint Source-Fingerprint Estimation

In this paper, we propose a simple attack whose objective is joint source-fingerprint estimation. Based upon the observation that the targeted fingerprints are mainly located in the low-frequency band, we propose a dual-filter attack that is computationally inexpensive and efficient. The estimation attack is performed in the DWT-DC domain where the fingerprints are embedded. For each coefficient x(a, b, t) in this domain, we choose three prisms k_1, k_2, and k_3, all centered at x(a, b, t). The outer and largest prism, k_1, encompasses the next smaller one, k_2 ⊂ k_1. Prism k_3 is smaller than k_1. We average the coefficients inside two 3D regions: inside k_3 and inside k_1 − k_2. Since both the smoothing and weighting functions are built to maintain, in most cases, the same sign for the fingerprint over a certain small region in x, we use:

e_3 = (1/|k_3|) Σ_{p ∈ k_3} [x(p) + w(p)]    (4)

as the estimate of x̄ + w(a, b, t), where x̄ denotes the mean of the underlying source. As the targeted fingerprint is a low-frequency signal, we assume that sign(w(p)) is mostly univocal for p ∈ k_3; thus, sign(|k_3|^(−1) Σ_{p ∈ k_3} w(p)) represents a good estimate of sign(w(a, b, t)). Next, we use:

e_12 = (1/|k_1 − k_2|) Σ_{p ∈ k_1 − k_2} [x(p) + w(p)]    (5)

to obtain an alternate estimate of x̄ only. The reasoning is that the fingerprint spread over the region k_1 − k_2 has a variable sign, so it averages itself out in e_12. To achieve this goal, the size of k_1 − k_2 should be large enough. Also, the size of k_3 is chosen to be relatively small, both to capture the sign of w(a, b, t) and to obtain a stable x̄ inside k_3. Usually, we choose the size of k_3 to be between (6, 8, 10) and (10, 12, 14); k_1 is about 4 times as large as k_3; and k_2 is comparable to k_3 or even smaller. Finally, we construct the attack as:

z = x + w_i − α c · (e_3 − e_12),    (6)

where α is an amplification factor that can be tuned up as long as z remains acceptably perceptually similar to x + w_i. In addition, we use a complexity map c, derived prior to the attack, to improve the perceptual effect of the attack and thus maximize α. The procedure for computing the complexity map is described in Section 3. Since most of the watermark is concentrated in the low-frequency band, we apply a low-pass filter to the watermarked video signal before and after the estimation attack described in Eqn. (6). The diagram of the final attack process is illustrated in Figure 3.

Fig. 3. Diagram of the estimation attack
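A one-dimensional sketch of the dual-filter estimator follows (illustrative Python; the real prisms are 3D, the window sizes here are arbitrary choices, and the complexity map and low-pass filters are omitted). A small window around each sample estimates x̄ + w, a larger surrounding ring estimates x̄ alone, and subtracting the amplified difference strips most of a low-frequency watermark:

```python
import math
import random

def ring_mean(y, j, inner, outer):
    """Mean of y over offsets d with inner <= |d| <= outer (wrap-around)."""
    n = len(y)
    vals = [y[(j + d) % n] for d in range(-outer, outer + 1) if abs(d) >= inner]
    return sum(vals) / len(vals)

def corr(z, x, w):
    """Fraction of the fingerprint left in z: ((z - x) . w) / (w . w)."""
    num = sum((zi - xi) * wi for zi, xi, wi in zip(z, x, w))
    return num / sum(wi * wi for wi in w)

rng = random.Random(3)
n = 1024
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]                  # source signal
w = [0.5 * math.sin(2 * math.pi * j / 128) for j in range(n)]   # low-frequency mark
y = [xi + wi for xi, wi in zip(x, w)]                           # marked copy

alpha = 1.0
z = [y[j] - alpha * (ring_mean(y, j, 0, 4) - ring_mean(y, j, 16, 64))
     for j in range(n)]   # e3 = small window, e12 = surrounding ring

before = corr(y, x, w)   # full fingerprint present
after = corr(z, x, w)    # most of the fingerprint correlation removed
```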

5 Experimental Results

In this section, we demonstrate the effectiveness of the proposed estimation attack. In the experiments we choose the "Rodeo Show" video sequence with frame size 640 × 480 as the host video sequence, and apply the video fingerprinting scheme of [17]. The embedding parameters are chosen to obtain a solid trade-off between perceptual quality and robustness. The deployed fingerprinting scheme is particularly efficient for video sequences with significant "random" motion; thus, the used video sequence is selected to exhibit the best in the marking scheme. We apply the estimation attack using a prism k_3 of size 7 × 9 × 11, a large prism k_1 of size 25 × 33 × 81, and k_2 = ∅. We chose α ∈ {0.5, 1, 1.5} to adjust the attack strength for high, medium, and low perceptual fidelity of the resulting video sequence, respectively. Figure 2 illustrates the resulting perceptual quality of the attacked signal for α ∈ {1, 1.5} using the first frame of the benchmark video sequence.

We show the results of the estimation attack in Figures 4 and 5. First, we use 50 different keys to create and embed distinct fingerprints into the test video sequence, resulting in 50 unique copies. Then, in each of these copies we perform fingerprint detection using the corresponding key used during fingerprint embedding. Figures 4(a), (b), and (c) show the histogram of the detection statistic γ for α = 1.5, 1, and 0.5, respectively. The (mean, variance) pairs for these three histograms are (a): (0.407, 0.214);


Fig. 4. Histogram of γ under the estimation attack for different α in the case of detecting using the same key as embedding: (a) α = 1.5, (b) α = 1, (c) α = 0.5

Fig. 5. Histogram of γ under the estimation attack for different α in the case of detecting using a different key as embedding: (a) α = 1.5, (b) α = 1, (c) α = 0.5

(b): (0.661, 0.095); (c): (0.866, 0.023). From the results, we can observe that due to the estimation attack, the mean of γ significantly deviates from γ = 1 (the expected value when there is no attack) and the deviation increases with α. On average, approximately 60% and 35% of the fingerprint correlation is removed after applying the estimation attack with α = 1.5 and α = 1, respectively. More importantly, the variance of the


γ statistic becomes relatively large; e.g., the range of γ for α = 1.5 covers [−0.5, 1.3]. Compared to additive white Gaussian noise (AWGN) of the same magnitude as our estimation attack, which will be shown later, the fingerprint detector experiences a nearly 12-fold (from σ²(γ) = 0.0175 to 0.2136) and 19-fold (from σ²(γ) = 0.0050 to 0.0951) increase in the variance of the detection statistic γ for α = 1.5 and α = 1, respectively. We use this observation to point to a significant anomaly of the particular fingerprinting scheme [17]. According to [16,17,18], the examined fingerprinting scheme has been tested under various attacks. It was reported that after the MCTF (Motion Compensated Temporal Filtering) attack with various filter lengths, the detection statistic γ ranges from 0.85 to 1; thus the fingerprint can be detected with high probability. Other attacks, such as rotation by 2 degrees, cropping by 10%, and MPEG-2 compression at a bit rate of 500 kbps, result in a detection statistic γ around 1, ranging within [0.6, 1.4] [16,17]. A general estimation attack based on Wiener filtering, similar to the one in [13], was proposed and examined in [15], where the watermark could still be detected without error. Compared with these non-content-dependent attacks, the proposed attack is more effective in removing the watermark.

In the second set of experiments, we examine the scenario in which a fingerprint is created and embedded using a key i and detected with a different key j. This test aims at estimating the probability of a false positive under attack, a feature of crucial importance for fingerprinting systems. A solid fingerprinting scheme must exhibit a low probability of false positives in both cases: when detection is done on x + w_i as well as on f(x + w_i). Function f() represents an arbitrary attack procedure that does not have knowledge of the user keys. According to [16], the detection statistic γ with an incorrect detection key ranges within [−0.02, 0.02]. However, from Figure 5, one can observe that the proposed estimation attack increases the variance of γ so much that a non-trivial portion of the keys results in γ as high as 0.8 or even 1. Compared to additive white Gaussian noise (AWGN) of the same magnitude as our estimation attack, the fingerprint detector experiences a nearly 14-fold (from σ²(γ) = 0.0175 to 0.2418) and 20-fold (from σ²(γ) = 0.0050 to 0.1008) increase in the variance of the detection statistic γ for α = 1.5 and α = 1, respectively. Since the tail of the Gaussian error function is proportional to √N / σ(γ), in order to maintain the same level of false positives as in the case of detecting a fingerprint on the attacked x + w_i, the detector must consume 10 to 20 times more samples to produce equivalent error rates. We were not able to explain analytically the unexpected increase in false positives under the estimation attack; however, we speculate that the dependency of the watermarks on the source (content-dependent watermarking) has made them prone to attack vectors which are also content-dependent.

To further demonstrate the effectiveness of the proposed estimation attack, we apply an AWGN attack with the same energy as introduced by the estimation attacks. We choose α = 1.5 as an example. In Figures 6(a) and (b), we show the histogram of γ for the cases of "same-key" and "different-key" detection, respectively. The increase of the variance is far less significant than that incurred by the estimation attack. Figure 6(c) shows the visual quality of the AWGN-attacked frame, from which we can see that the distortion introduced by AWGN is more noticeable than that introduced


Fig. 6. Histogram of γ under the AWGN attack with equivalent energy as the estimation attack for α = 1.5: (a) detecting with the same key as embedding; (b) detecting with a different key as embedding; (c) frame after the AWGN attack

Fig. 7. Detection statistic γ with respect to various keys for the attacked signal ẑ and the attacked original signal x̂: (a) α = 0.5; (b) α = 1; (c) α = 1.5

by the estimation attack. Comparison of the probability of error and the visual quality between the estimation and AWGN attacks demonstrates that the proposed attack successfully captures the content-based watermark and is a far stronger attack than the "blind" AWGN attack.

An Estimation Attack on Content-Based Video Fingerprinting

45

6 Discussion and Countermeasure

As can be seen from the experimental results, the power of the proposed attack lies in the high probability of false positive Pfp that it introduces. To better understand this effect, we also examine the detection performance after applying the estimation attack directly to the original signal x and detecting it with various keys. The results are shown in Figure 7 along with the detection of the attacked signal ẑ = f(x + wi) using the corresponding key i. The estimation strengths α for Figure 7(a), (b), and (c) are chosen to be 1.5, 1, and 0.5, respectively. The results clearly show that the high false positive probability in detection comes from the fact that the attacked original signal x̂ is highly correlated with the fingerprints generated from many keys. The underlying reason is that the estimation process on the original signal estimates the low-frequency information of x. On the other hand, each fingerprint is built to be content related and has gone through an intensive low-pass filtering process in the addressed video fingerprinting scheme [17]. As a result, the fingerprint mainly contains the low-frequency information of x and is thus highly correlated with x̂, which leads to a large value of the false positive probability Pfp.
Now, considering the embedder's perspective, we try to find ways to combat this estimation attack. From Figure 7 we see that the detection statistic γ is key-dependent, i.e., for some keys the γ for the attacked original signal x̂ is high, while for others it is low. Since the embedder has the freedom to choose secret keys for embedding, he can leverage this freedom to deploy a countermeasure by using only the key set that results in low Pfp. Specifically, the embedder can first examine a large set of keys and choose those keys that have high γ on ẑ while having low γ on x̂. The embedder can then define two thresholds h1 and h2, according to the desired Pfn and Pfp respectively, to guide the sifting process as shown in Figure 7(b). The keys whose γ on ẑ is higher than h1 and whose γ on x̂ is lower than h2 are eligible for embedding. Other keys may result in high Pfp or Pfn and are discarded. In the example shown in Figure 7(b), only the 36th, 47th, and 50th keys are eligible for embedding given h1 = 0.8 and h2 = 0.3. This countermeasure is quite straightforward but requires a significant amount of computation to select the key set. Moreover, the number of eligible keys is quite limited, e.g., only 3 out of 50 keys in Figure 7(b) satisfy the conditions on h1 and h2. Thus, to obtain a certain number of eligible keys, the embedder has to examine a large pool of keys. This may not be feasible for real applications such as fingerprinting a 2-hour movie signal. The results suggest that using a low-frequency content-based signal as a fingerprint is vulnerable to this estimation type of attack, which should be taken into consideration in fingerprint design.
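The key-sifting countermeasure above can be sketched as follows. This is an illustrative sketch, not the authors' code; it assumes the detection statistics γ for each candidate key have already been computed on both the attacked fingerprinted signal ẑ and the attacked original x̂, and the function name is ours.

```python
# Hypothetical key-sifting sketch: keep only keys that remain detectable on
# the attacked fingerprinted copy (gamma_z > h1) yet do not falsely fire on
# the attacked original (gamma_x < h2). Threshold values follow the example
# in the text (h1 = 0.8, h2 = 0.3).

def sift_keys(gamma_z, gamma_x, h1=0.8, h2=0.3):
    """Return indices of keys eligible for embedding."""
    return [k for k, (gz, gx) in enumerate(zip(gamma_z, gamma_x))
            if gz > h1 and gx < h2]

# Toy example: key 0 and 1 fail the false-positive test, key 3 fails the
# detectability test; only key 2 survives both.
gz = [0.85, 0.90, 0.95, 0.40]
gx = [0.50, 0.35, 0.10, 0.05]
print(sift_keys(gz, gx))  # -> [2]
```

As the text notes, in practice many keys fail one of the two tests, so a large pool must be examined to obtain even a few eligible keys.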

7 Conclusions

We proposed a simple dual-filter estimator that aims at removing low-frequency fingerprints embedded in video signals. Although we construct an instance of the attack and show its efficacy using a specific video fingerprinting algorithm, the generic form


S. He and D. Kirovski

of the attack can be applied to an arbitrary video marking scheme. In the conducted experiments, the attack procedure on average removed a substantial portion of the embedded fingerprints compared to additive white Gaussian noise. To the best of our knowledge, the attack is the first in the published literature to induce a substantial increase of false positives in a particular fingerprinting scheme as opposed to a “blind” attack.

Acknowledgment

We thank Dr. M.K. Mihcak and Dr. Y. Yacobi for the valuable discussions.

References

1. I. Cox, J. Kilian, F. Leighton, and T. Shamoon, “Secure Spread Spectrum Watermarking for Multimedia”, IEEE Trans. on Image Processing, 6(12), pp. 1673–1687, 1997.
2. F.A.P. Petitcolas, R.J. Anderson, and M.G. Kuhn, “Attacks on Copyright Marking Systems”, Information Hiding Workshop, pp. 218–238, 1998.
3. F. Ergun, J. Kilian, and R. Kumar, “A Note on the Limits of Collusion-Resistant Watermarks”, Eurocrypt ’99, 1999.
4. D. Schonberg and D. Kirovski, “Fingerprinting and Forensic Analysis of Multimedia”, ACM Multimedia, pp. 788–795, 2004.
5. D. Boneh and J. Shaw, “Collusion-Secure Fingerprinting for Digital Data”, IEEE Trans. on Information Theory, 44(5), pp. 1897–1905, 1998.
6. Y. Yacobi, “Improved Boneh-Shaw Content Fingerprinting”, CT-RSA 2001, LNCS 2020, pp. 378–391, 2001.
7. W. Trappe, M. Wu, Z.J. Wang, and K.J.R. Liu, “Anti-collusion Fingerprinting for Multimedia”, IEEE Trans. on Sig. Proc., 51(4), pp. 1069–1087, 2003.
8. Z.J. Wang, M. Wu, H. Zhao, W. Trappe, and K.J.R. Liu, “Anti-Collusion Forensics of Multimedia Fingerprinting Using Orthogonal Modulation”, IEEE Trans. on Image Proc., pp. 804–821, June 2005.
9. S. He and M. Wu, “Joint Coding and Embedding Techniques for Multimedia Fingerprinting”, IEEE Trans. on Info. Forensics and Security, 1(2), pp. 231–247, June 2006.
10. D. Kirovski, “Collusion of Fingerprints via the Gradient Attack”, IEEE International Symposium on Information Theory, 2005.
11. D. Kirovski and H.S. Malvar, “Spread Spectrum Watermarking of Audio Signals”, IEEE Trans. on Signal Processing, 51(4), pp. 1020–1033, 2003.
12. G. Langelaar, R. Lagendijk, and J. Biemond, “Removing Spatial Spread Spectrum Watermarks by Non-linear Filtering”, Proc. of European Signal Processing Conference (EUSIPCO 1998), Vol. 4, pp. 2281–2284, 1998.
13. J. Su and B. Girod, “Power Spectrum Condition for L2-efficient Watermarking”, IEEE Proc. of International Conference on Image Processing (ICIP 1999), 1999.
14. S. Voloshynovskiy, S. Pereira, A. Herrigel, N. Baumgärtner, and T. Pun, “Generalized Watermarking Attack Based on Watermark Estimation and Perceptual Remodulation”, SPIE Conference on Security and Watermarking of Multimedia Content II, 2000.
15. M.K. Mihcak, R. Venkatesan, and M. Kesal, “Watermarking via Optimization Algorithms for Quantizing Randomized Statistics of Image Regions”, Allerton Conference on Communications, Computing and Control, 2002.
16. M. Kucukgoz, O. Harmanci, M.K. Mihcak, and R. Venkatesan, “Robust Video Watermarking via Optimization Algorithm for Quantization of Pseudo-Random Semi-Global Statistics”, SPIE Conference on Security, Steganography and Watermarking, San Jose, CA, 2005.


17. O. Harmanci and M.K. Mihcak, “Complexity-Regularized Video Watermarking via Quantization of Pseudo-Random Semi-Global Linear Statistics”, Proc. of European Signal Processing Conference (EUSIPCO), 2005.
18. O. Harmanci and M.K. Mihcak, “Motion Picture Watermarking via Quantization of Pseudo-Random Linear Statistics”, Visual Communications and Image Processing Conference, 2005.
19. S.B. Kang, M. Uyttendaele, S.A.J. Winder, and R. Szeliski, “High Dynamic Range Video”, ACM Trans. on Graphics, 22(3), pp. 319–325, 2003.

Statistics- and Spatiality-Based Feature Distance Measure for Error Resilient Image Authentication

Shuiming Ye 1,2, Qibin Sun 1, and Ee-Chien Chang 2

1 Institute for Infocomm Research, A*STAR, Singapore, 119613
2 School of Computing, National University of Singapore, Singapore, 117543
{Shuiming, Qibin}@i2r.a-star.edu.sg, [email protected]

Abstract. Content-based image authentication typically assesses authenticity based on a distance measure between the image to be tested and its original. Commonly employed distance measures such as the Minkowski measures (including Hamming and Euclidean distances) may not be adequate for content-based image authentication since they do not exploit statistical and spatial properties in features. This paper proposes a feature distance measure for content-based image authentication based on statistical and spatial properties of the feature differences. The proposed statistics- and spatiality-based measure (SSM) is motivated by an observation that most malicious manipulations are localized whereas acceptable manipulations result in global distortions. A statistical measure, kurtosis, is used to assess the shape of the feature difference distribution; a spatial measure, the maximum connected component size, is used to assess the degree of object concentration of the feature differences. The experimental results have confirmed that our proposed measure is better than previous measures in distinguishing malicious manipulations from acceptable ones.

Keywords: Feature Distance Measure, Image Authentication, Image Transmission, Error Concealment, Digital Watermarking, Digital Signature.

1 Introduction

With the wide availability of digital cameras and image processing software, the generation and manipulation of digital images are easy now. To protect the trustworthiness of digital images, image authentication techniques are required in many scenarios, for example, applications in health care. Image authentication, in general, differs from data authentication in cryptography. Data authentication is designed to detect a single-bit change whereas image authentication aims to authenticate the content but not the specific data representation of an image [1], [2]. Therefore, image manipulations which do not change semantic meaning are often acceptable, such as contrast adjustment, histogram equalization, and compression [3], [4]. Lossy transmission is also considered acceptable since errors under a certain level in images would be tolerable

Y.Q. Shi (Ed.): Transactions on DHMS II, LNCS 4499, pp. 48–67, 2007.
© Springer-Verlag Berlin Heidelberg 2007


Fig. 1. Discernable patterns of edge feature differences caused by acceptable image manipulation and malicious modification: (a) original image; (b) tampered image; (c) feature difference of (b); (d) blurred image (by Gaussian 3×3 filter); (e) feature difference of (d)


and acceptable [5]. Other manipulations that modify image content are classified as malicious manipulations, such as object removal or insertion. Image authentication should be robust to acceptable manipulations and sensitive to malicious ones. In order to be robust to acceptable manipulations, several content-based image authentication schemes have been proposed [6], [7], [8]. These schemes may be robust to one or several specific manipulations; however, they would classify an image damaged by transmission errors as unauthentic [9]. Furthermore, content-based image authentication typically measures authenticity in terms of the distance between a feature vector from the received image and its corresponding vector from the original image, and compares the distance with a preset threshold to make a decision [10], [11]. Commonly employed distance measures, such as the Minkowski metrics [12] (including Hamming and Euclidean distances), may not be suitable for robust image authentication. The reason is that even if these measures are the same (i.e., we cannot tell whether the image in question is authentic or not), the feature difference patterns under typical acceptable modifications and malicious ones may still be distinguishable (feature differences are differences between the feature extracted from the original image and the feature extracted from the testing image). That is to say, these measures do not properly exploit statistical or spatial properties of image features. For example, the Hamming distance measures of Fig. 1(b) and Fig. 1(d) are almost the same, yet one could argue that Fig. 1(b) is probably distorted by malicious tampering since the feature differences concentrate on the eyes. The objective of this paper is to propose a distance measure based on statistical and spatial properties of the feature differences for content-based image authentication.
The proposed measure is derived by exploiting the discernable patterns of feature differences between the original image and the distorted image to distinguish acceptable manipulations from malicious ones. Two properties, the kurtosis of the feature difference distribution and the maximum connected component size in the feature differences, are combined to evaluate the discernable patterns. We call the proposed measure the statistics- and spatiality-based measure (SSM) since it considers both global statistical properties and spatial properties. Many acceptable manipulations, which were detected as malicious modifications by previous schemes based on Minkowski metrics, were correctly verified by the proposed scheme based on SSM. To illustrate how the proposed SSM can improve the performance of an image authentication scheme, we applied it in a semi-fragile image authentication scheme [13] to authenticate images damaged by transmission errors. The proposed error resilient scheme obtained better robustness against transmission errors in JPEG or JPEG2000 images and other acceptable manipulations than the scheme proposed in [13].

2 Proposed Statistics- and Spatiality-Based Measure (SSM) for Image Authentication

Content-based or feature-based image authentication generally verifies authenticity by comparing the distance between the feature vector extracted from the


testing image and the original with some preset thresholds [14]. The distance metric commonly used is the Minkowski metric d(X, Y) [12]:

d(X, Y) = \left( \sum_{i=1}^{N} |x_i - y_i|^r \right)^{1/r}    (1)

where X, Y are two N-dimensional feature vectors, and r is the Minkowski factor. Note that when r is set to 2, it is the Euclidean distance; when r is 1, the Manhattan distance (or Hamming distance for binary vectors). However, the Minkowski metric does not exploit statistical or spatial properties of image features. Therefore, an image authentication scheme based on the Minkowski metric may not be able to distinguish tampered images (e.g., small local objects removed or modified) from images that underwent acceptable manipulations such as lossy compression. On the other hand, we found that even if the Minkowski metric distances are the same, the feature differences under typical acceptable manipulations and malicious ones are still distinguishable, especially when the feature contains spatial information such as edges or block DCT coefficients. Therefore, the Minkowski metric is not a proper measure for content-based image authentication.
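As a minimal illustration of Eq. (1), the sketch below computes the Minkowski distance and its two special cases mentioned above (the function name is ours):

```python
# Minkowski distance of Eq. (1). With r = 2 it is the Euclidean distance;
# with r = 1 on binary vectors it reduces to the Hamming distance.

def minkowski(x, y, r):
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

print(minkowski([1, 0, 1, 1], [1, 1, 1, 0], 1))  # Hamming distance: 2.0
print(minkowski([0, 0], [3, 4], 2))              # Euclidean distance: 5.0
```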

2.1 Main Observations of Feature Differences

Many features used in content-based image authentication are composed of localized information about the image such as edges [3], [6], block DCT coefficients [1], [10], [13], a highly compressed version of the original image [7], or block intensity histograms [11]. To facilitate discussion, we let x_i be the feature value at spatial location i, and X be an N-dimensional feature vector; for example, N = W · H when using the edge feature (W and H are the width and height of the image). We define the feature difference vector δ as the difference between the feature vector X of the testing image and the feature vector Y of the original image:

\delta_i = |x_i - y_i|    (2)

where δ_i is the difference of the features at spatial location i. After examining many discernable feature difference patterns from various image manipulations, we can draw three observations on feature differences:

1. The feature differences caused by most acceptable operations are evenly distributed spatially, whereas the differences caused by malicious operations are locally concentrated.
2. The maximum connected component size of the feature differences caused by acceptable manipulations is usually small, whereas the one caused by a malicious operation is large.
3. Even if the maximum connected component size is fairly small, the image could still have been tampered with if those small components are spatially concentrated.


These observations are supported by our intensive experiments and the literature mentioned previously [6], [9]. Image contents are typically represented by objects, and each object is usually represented by spatially clustered image pixels. Therefore, the feature representing the content of the image inherits some spatial relations. A malicious manipulation of an image usually concentrates on modifying objects in the image, changing the image into a new one which carries a different visual meaning to observers. If the contents of an image are modified, the features around the objects may also be changed, and the affected feature points tend to be connected with each other. Therefore, the feature differences introduced by a meaningful tampering typically are spatially concentrated. On the contrary, acceptable image manipulations such as image compression, contrast adjustment, and histogram equalization introduce distortions globally into the image. The feature differences are likely to cluster around all objects in the image, and therefore are not as locally concentrated as those caused by malicious manipulations. In addition, many objects may spread out spatially in the image, so the feature differences are likely to be evenly distributed with little connectedness. The distortion introduced by transmission errors is also evenly distributed since transmission errors are randomly introduced into the image [18]. The above observations not only demonstrate the unsuitability of the Minkowski metric for image authentication, but also provide some hints on how a good distance function should work: it should exploit the statistical and spatial properties of feature differences. These observations further lead us to design a new feature distance measure for content-based image authentication.

2.2 Proposed Feature Distance Measure for Image Authentication

Based on the observations discussed so far, a feature distance measure is proposed in this section for image authentication. The distance measure is based on the differences of the two feature vectors from the testing image and from the original image. Two measures are used to exploit statistical and spatial properties of feature differences: the kurtosis (kurt) of the feature difference distribution and the maximum connected component size (mccs) in the feature difference map. Observation (1) motivates the use of the kurtosis measure, and observation (2) motivates the use of the mccs measure. They are combined because either one alone is insufficient, as stated in observation (3). The proposed statistics- and spatiality-based measure (SSM) is calculated by a sigmoid membership function based on both mccs and kurt. Given two feature vectors X and Y, the proposed feature distance measure SSM(X, Y) is defined as follows:

SSM(X, Y) = \frac{1}{1 + e^{-\alpha (mccs \cdot kurt \cdot \theta^{-2} - \beta)}}    (3)


The measure SSM(X, Y) is derived from the feature difference vector δ defined in Eq. (2). The mccs and kurt are obtained from δ, and their details are given in the next few paragraphs. θ is a normalizing factor. The parameter α controls the changing speed, especially at the point mccs · kurt · θ^{-2} = β. β is the average mccs · kurt · θ^{-2} value obtained by calculating over a set of maliciously attacked images and acceptably manipulated images. In this paper, the acceptable manipulations are defined as contrast adjustment, noise addition, blurring, sharpening, compression and lossy transmission (with error concealment); the malicious tampering operations are object replacement, addition or removal. During authentication, if the measure SSM(X, Y) of an image is smaller than 0.5 (that is, mccs · kurt · θ^{-2} < β), the image is identified as authentic; otherwise it is unauthentic.

Kurtosis. Kurtosis describes the shape of a random variable's probability distribution based on the size of the distribution's tails. It is a statistical measure used to describe the concentration of data around the mean. A high kurtosis portrays a distribution with fat tails, whereas a low kurtosis portrays a distribution with skinny tails, concentrated towards the mean. Therefore, it can be used to distinguish the feature difference distribution of malicious manipulations from that of acceptable manipulations.

Let us partition the spatial locations of the image into neighborhoods, and let N_i be the i-th neighborhood. That is, N_i is a set of locations that are in the same neighborhood. For example, by dividing the image into blocks of 8×8, we have a total of W · H/64 neighborhoods, and each neighborhood contains 64 locations. Let D_i be the total feature distortion in the i-th neighborhood N_i:

D_i = \sum_{j \in N_i} \delta_j    (4)

We can view D_i as a sample of a distribution D. The kurt in Eq. (3) is the kurtosis of the distribution D. It can be estimated by:

kurt(D) = \frac{\sum_{i=1}^{Num} (D_i - \mu)^4}{Num \cdot \sigma^4} - 3    (5)

where Num is the total number of samples used for estimation, and μ and σ are the estimated mean and standard deviation of D, respectively.

Maximum Connected Component Size. A connected component is a set of points in which every point is connected to all others. Its size is defined as the total number of points in this set. The maximum connected component size (mccs) is usually calculated by morphological operators. The isolated points in the feature difference map are first removed, and then broken segments are joined by morphological dilation. The maximum connected component size (mccs) is then calculated by connected components labeling on the feature map based on the 8-connected neighborhood. Details can be found in [15].
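A simplified sketch of the mccs computation follows; the morphological cleanup described above (isolated-point removal and dilation) is omitted, and the labeling uses the 8-connected neighborhood. The function name and the toy map are ours.

```python
# Largest 8-connected component in a binary difference map, via flood fill.
# The paper's morphological pre-processing (removing isolated points,
# joining broken segments by dilation) is omitted in this sketch.

def mccs(diff_map):
    h, w = len(diff_map), len(diff_map[0])
    seen = [[False] * w for _ in range(h)]
    best = 0
    for i in range(h):
        for j in range(w):
            if diff_map[i][j] and not seen[i][j]:
                size, stack = 0, [(i, j)]
                seen[i][j] = True
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for dy in (-1, 0, 1):          # 8-connected neighborhood
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and diff_map[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                stack.append((ny, nx))
                best = max(best, size)
    return best

m = [[1, 1, 0, 0],
     [0, 1, 0, 1],
     [0, 0, 0, 0],
     [1, 0, 0, 0]]
print(mccs(m))  # -> 3 (the connected cluster in the top-left corner)
```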


Normalizing Factor. Since images may have different numbers of objects, details, and dimensions, normalization is needed. Instead of using traditional normalization (i.e., the ratio of the number of extracted feature points to the image dimension), we employ a new normalizing factor θ:

\theta = \frac{\mu}{W \cdot H}    (6)

where W and H are the width and height of the image, respectively, and μ is the estimated mean of D, the same as in Eq. (5). The normalizing factor θ makes the proposed measure more suitable for natural scene images.
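Putting Eqs. (3) through (6) together, a hypothetical end-to-end computation might look as follows. The function names, α, β, and all numeric values are purely illustrative; the minus sign in the exponent follows the decision rule stated above (mccs · kurt · θ^{-2} < β implies SSM < 0.5, i.e., authentic).

```python
# Sketch of the SSM computation from a list of neighborhood sums D_i,
# a precomputed mccs value, and the image dimensions. alpha and beta are
# scheme-tuned parameters; the toy values below are not from the paper.
import math

def kurtosis(D):
    # Eq. (5); assumes D is non-degenerate (sigma > 0).
    n = len(D)
    mu = sum(D) / n
    var = sum((d - mu) ** 2 for d in D) / n
    return sum((d - mu) ** 4 for d in D) / (n * var ** 2) - 3

def ssm(D, mccs, width, height, alpha=10.0, beta=1.0):
    mu = sum(D) / len(D)
    theta = mu / (width * height)                 # Eq. (6)
    m = mccs * kurtosis(D) / theta ** 2           # argument of Eq. (3)
    return 1.0 / (1.0 + math.exp(-alpha * (m - beta)))

D = [0, 0, 0, 0, 10]          # toy block-sum distribution with kurt > 0
print(ssm(D, 1, 1, 1) < 0.5, ssm(D, 100, 1, 1) > 0.5)  # -> True True
```

With a small mccs the measure stays below 0.5 (authentic); inflating mccs pushes it above 0.5 (unauthentic), matching the decision rule.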

Fig. 2. Cases that require both mccs and kurt to work together to successfully detect malicious modifications: (a) small object tampered (kurt: large; mccs: small); (b) feature differences of (a); (c) large object tampered with global distortions (kurt: small; mccs: large); (d) feature differences of (c)


It is worth noting that the two measures mccs and kurt should be combined to handle different types of malicious tampering. Usually tampering results in three cases in terms of the values of mccs and kurt: (1) in the most general case, the tampered areas have a large maximum connected size and are distributed locally (Fig. 1(b)); in this case, both kurt and mccs are large; (2) a small local object is modified, such as a small spot added to a face (Fig. 2(a)); in this case, mccs is usually very small, but kurt is large; (3) the tampered areas have a large maximum connected size but are evenly distributed over the whole image (Fig. 2(c)); in this case, mccs is usually large, but kurt is small. Therefore, it is necessary for SSM to combine these two measures so that it can detect all these cases of malicious modifications.

3 Application of SSM to Error Resilient Image Authentication

Image transmission is often affected by errors due to channel noise, fading, multi-path transmission and Doppler frequency shift [16] in wireless channels, or packet loss due to congestion in the Internet [17]. Therefore, error resilient image authentication which is robust to both acceptable manipulations and transmission errors is desirable. Based on the proposed feature distance measure, an error resilient image authentication scheme is proposed in this section. The proposed scheme exploits the proposed measure in a generic semi-fragile image authentication framework [8] to distinguish images distorted by transmission errors from maliciously modified ones. The experimental results confirm that the proposed feature distance measure can improve the performance of the previous scheme in terms of robustness and sensitivity.

3.1 Feature Extraction for Error Resilient Image Authentication

One basic requirement in selecting a feature for content-based image authentication is that the feature should be sensitive to malicious attacks on the image content. Edge-based features are a good choice because malicious tampering will usually incur changes on edges, and edges are also robust to some distortions. For instance, the results in [18] show that high edge preserving ratios can be achieved even if there are uncorrectable transmission errors. Therefore, the remaining issue is to make the edge feature more robust to the defined acceptable manipulations. Note that this is the main reason why we employ the normalization in Eq. (6) to suppress those “acceptable” distortions around edges. In [19], a method based on fuzzy reasoning is proposed to classify each pixel of a gray-value image into a shaped, textured, or smooth feature point. In this paper we adopt their fuzzy reasoning based detector because of its good robustness.

3.2 Image Signing

The image signing procedure is outlined in Fig. 3. The binary edge map of the original image is extracted using the fuzzy reasoning based edge detection method [19].


Fig. 3. Signing process of the proposed error resilient image authentication scheme

Then the edge feature is divided into 8×8 blocks, and the edge point number in each block is encoded by an error correcting code (ECC) [8]. BCH(7,4,1) is used to generate one parity check bit (PCB) for the ECC codeword (edge point number) of every 8×8 block. The signature is generated by hashing and encrypting the concatenated ECC codewords using a private key. Finally, the PCB bits are embedded into the DCT coefficients of the image. In our implementation, the PCB bits are embedded into the middle-low frequency DCT coefficients using the same quantization based watermarking as in [13]. Let the total selected DCT coefficients form a set P. Each coefficient c in P is replaced with c_w, which is calculated by:

c_w = \begin{cases} Q \cdot round(c/Q), & \text{if } LSB(round(c/Q)) = w \\ Q \cdot (round(c/Q) + sgn(c - Q \cdot round(c/Q))), & \text{else} \end{cases}    (7)

where w (0 or 1) is the bit to be embedded. The function round(x) returns the nearest integer to x, sgn(x) returns the sign of x, and LSB(x) returns the least significant bit of x. Eq. (7) makes sure that the LSB of the quantized coefficient is the same as the watermark bit.

Note that the embedding procedure should not affect the extracted feature, since watermarking introduces some distortion. In order to exclude the effect of watermarking from feature extraction, a compensation operator C_w is adopted before feature extraction and watermarking:

I_c = C_w(I), \quad I_w = f_w(I_c)    (8)

C_w(I) = IDCT\{IntQuan(d_i, 2Q, P)\}    (9)

where d_i is the i-th DCT coefficient of I, and IDCT is the inverse DCT transform. f_w(·) is the watermarking function, and I_w is the final watermarked image. The IntQuan(c, Q, P) function is defined as:

IntQuan(c, Q, P) = \begin{cases} c, & \text{if } c \notin P \\ Q \cdot round(c/Q), & \text{else} \end{cases}    (10)

C_w is designed according to the watermarking algorithm, which uses 2Q to pre-quantize the DCT coefficients before feature extraction and watermarking. That


is, from Eqs. (7), (9) and (10), we can get C_w(I_w) = C_w(I), and thus f_e(I_w) = f_e(I), where f_e(·) denotes feature extraction; i.e., the feature extracted from the original image I is the same as the one extracted from the watermarked image I_w. This compensation operator ensures that watermarking does not affect the extracted feature.
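The embedding rule of Eq. (7) and the compensation idea can be sketched on single coefficients; the helper names and numeric values below are ours. One caveat is made explicit in the code: when c lies exactly on Q·round(c/Q) with the wrong LSB, sgn returns 0 and a tie-breaking convention is needed; the demo also only checks that perturbations strictly smaller than Q are erased by re-applying the 2Q pre-quantization.

```python
# Illustrative sketch of Eq. (7) embedding and of Eq. (10) pre-quantization.

def sgn(v):
    return (v > 0) - (v < 0)

def embed(c, w, Q):
    # Eq. (7): force LSB(round(c/Q)) to equal the watermark bit w.
    # Caveat: if c == Q*round(c/Q) with the wrong LSB, sgn is 0 and the
    # bit cannot be flipped; a practical implementation must break the tie.
    q = round(c / Q)
    return Q * q if q % 2 == w else Q * (q + sgn(c - Q * q))

def int_quan(c, Q, in_P=True):
    # Eq. (10) for one coefficient; membership in P is passed as a flag.
    return Q * round(c / Q) if in_P else c

print(embed(13.7, 0, 2), embed(13.7, 1, 2))   # -> 12 14

# Pre-quantize with 2Q (Q = 4): any perturbation with |delta| < Q is erased
# by re-applying the same pre-quantization, so the feature computed after
# C_w is unchanged.
dc = int_quan(37.3, 8)
assert all(int_quan(dc + d, 8) == dc for d in (-3.9, 0.0, 3.9))
print(dc)                                     # -> 40
```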

3.3 Image Authenticity Verification

The image verification procedure can be viewed as the inverse of the image signing procedure, as shown in Fig. 4. First, error concealment is carried out if transmission errors are detected. The feature of the image is extracted using the same method as in the image signing procedure. The watermarks are then extracted. If there are no uncorrectable errors in the ECC codewords, authentication is based on a bit-wise comparison between the decrypted hashed feature and the hashed feature extracted from the image [8]. Otherwise, image authenticity is calculated by the SSM based on the differences between the PCB bits of the re-extracted feature and the extracted watermark. Finally, if the image is identified as unauthentic, the attacked areas are detected.

Fig. 4. Image authentication process of the proposed error resilient image authentication scheme

Error Concealment. Given an image to be verified, the first step is to conceal the errors if transmission errors are detected. For wavelet-based images, the edge-directed filter-based error concealment algorithm proposed in [18] is adopted. For DCT-based JPEG images, the content-based error concealment proposed in [20] is used. It is efficient and advisable to apply error concealment before image authentication since the edge feature of the error-concealed image is much closer to the original than that of the damaged image [18], [20]. As a result, the content authenticity of the error-concealed image is higher than that of the damaged image, which is validated in our experiments on error resilient image authentication.


Image Content Authenticity. Given an image to be verified, we repeat the feature extraction described in the image signing procedure. The corresponding PCB bits (PCB_W) of all 8×8 blocks (one bit/block) of the image are extracted from the embedded watermarks. Then the feature set extracted from the image is combined with the corresponding PCBs to form ECC codewords. If all codewords are correctable, we concatenate all codewords and cryptographically hash the resulting sequence. The final authentication result is then concluded by bit-by-bit comparison between these two hashed sets. If there are uncorrectable errors in the ECC codewords, image authenticity is calculated based on the proposed distance measure. The two feature vectors in the proposed measure are PCB_W from the watermarks and the recalculated PCB bits (PCB_F) from ECC coding of the re-extracted image feature set. If the distance measure between PCB_W and PCB_F is smaller than 0.5 (SSM(PCB_W, PCB_F)

106 ). The context of application obviously determines the requirements of the watermarking scheme. Blind detection or retrieval should be preferred to informed detection whenever the availability of the original model implies a risk of misuse or theft [17]. Copyright protection thus demands blind detection (some side-information can however be tolerated). Integrity and authentication also require blind detection when the integrity of the original itself cannot be trusted. Moreover, using informed detection or retrieval necessitates the development of efficient database 3D shape retrieval algorithms to compare the original with the suspect mesh [8]. Blind detection (or retrieval), however, involves many more challenges than informed detection and still leads to poor robustness results in practice. Robustness requirements are the most difficult to determine.

Integrity and authentication (as well as augmented content) watermarking schemes should resist RST transforms, lossless format conversion and vertex re-ordering, and be fragile against all other attacks. Cayre et al. [13], however, also propose cropping as an attack to which these schemes should be robust. For copyright protection applications, robustness is required against all attacks preserving the visual perception of the shape. In practice, most papers proposing


P.R. Alface and B. Macq

copyright protection 3D watermarking schemes only test RST transforms, vertex re-ordering, noise addition, compression, simplification, smoothing, cropping and subdivision. It is considered that the visual shape is the content to protect. Other kinds of properties of the mesh shape can also be important to protect, such as touch perception (roughness and haptic texture properties) and functional imperceptibility. The latter concerns, for example, industrial CAD models which are virtually designed and then manufactured to be part of a complex system. Attacks and watermarks should not modify the design properties of such models. In conclusion, each proposed watermarking scheme should carefully describe the target application and the subsequent requirements.

4 3D Watermarking Schemes

In this survey, we describe the most well-known and recent contributions to 3D watermarking. We classify them by the domain of embedding: spatial, transform, compression and attribute domains. This classification is further subdivided according to the targeted application.

4.1 Spatial Domain

The 3D watermarking schemes which embed data in the spatial domain may be classified into two main categories: connectivity-driven watermarking schemes and geometry-driven watermarking schemes.

4.1.1 Connectivity-Driven Watermarking Schemes

We refer to as connectivity-driven watermarking algorithms those which make explicit use of the mesh connectivity (some authors also refer to topological features, where topology must be understood as connectivity) to embed data in the spatial domain. These schemes are typically based on a public or secret traversal of all (or a subset of) the mesh triangles. The original model is usually not needed at the detection or decoding stage; they are therefore blind schemes. For each triangle satisfying an admissibility function, slight modifications are introduced in local invariants by changing the adjacent point positions. As a consequence, these schemes are sensitive to noise addition. However, well-designed embeddings may interestingly resist some local connectivity modifications. Three main strategies (a.k.a. arrangements [35], see Fig. 3) make it possible to re-synchronize the embedded data even after re-triangulation or cropping:

– Global arrangement: canonical traversal of the whole connectivity graph.
– Local arrangement: canonical traversal of subsets of the connectivity graph.
– Subscript arrangement: explicit embedding of the localization of the information. This implies hiding both the data bit and its subscript.

Although subscript arrangements need to embed more information than the local or global arrangements, they are usually more robust [11].

From 3D Mesh Data Hiding to 3D Shape Blind and Robust Watermarking


Fig. 3. Embedding strategies of connectivity-driven schemes: (a) global arrangement, (b) local arrangement, (c) indexed arrangement (data courtesy of Ohbuchi et al. [35])

Among this class of watermarking schemes, Ohbuchi et al. [35] proposed four different watermarking algorithms in the first published work on 3D watermarking. These schemes are respectively named Triangle Similarity Quadruple (TSQ), Tetrahedral Volume Ratio (TVR), Triangle Strip Peeling Sequence (TSPS) and Macro Density Pattern (MDP), and they have inspired most connectivity-driven schemes developed so far. We classify these schemes by the application they target.

4.1.1.1 Data Hiding. Based on the fact that similar triangles may be defined by two quantities which are invariant to rotation, uniform scaling and translation (RST transforms), TSQ modifies ratios between triangle edge lengths or between triangle height and base lengths. A simple traversal of the mesh triangles is proposed to compute Macro-Embedding-Primitives (MEP). A MEP is defined by a marker M, a subscript S and two data values D1 and D2 (see Fig. 4). Decoding is simply achieved by traversing each triangle of the mesh and identifying the MEPs thanks to the marker triangle; the subscript then enables re-arranging the encoded data D1 and D2. This scheme is invariant to RST transforms and, thanks to the subscript arrangement, robust to cropping. As security is not dealt with, this scheme can only be used for data hiding applications. The invariant used by TVR is the ratio between an initial tetrahedron volume and the volume of the tetrahedron given by an edge and its two incident triangles. These ratios are slightly modified to embed the watermark and are invariant to affine transforms. Based on a local or global arrangement, TVR is a blind readable watermarking scheme. It can only be applied to 2-manifold meshes (each edge has at most two incident faces). Benedens has extended this scheme to more general meshes without topological constraints (Affine Independent Embedding, AIE [8]). Unlike TSQ, however, these schemes are not robust against cropping. They can nevertheless hide f bits in a triangle mesh of f triangles, which is much more than TSQ. The third scheme, TSPS, encodes data in triangle strips given the orientation of the triangles. Based on a local arrangement, it presents the same robustness


P.R. Alface and B. Macq

Fig. 4. On the left, a Macro Embedding Primitive. For each MEP, the marker M is encoded by modifying the point coordinates of the triangle v1, v2, v3 so that the dimensionless ratios l14/l24, h0/l12 (lij stands for the length between vertices vi and vj) are set to specified values which enable retrieving marker triangles at the decoding stage. Then, the subscript is similarly encoded by modifying v0 and subsequently l02/l01, h0/l12. Finally, data symbols D1 and D2 are encoded in l13/l34, h3/l14 and l45/l34, h5/l24 respectively. On the right, a Macro Density Pattern example (data courtesy of Ohbuchi et al. [35]).

properties than the TSQ scheme. The capacity is difficult to estimate as the triangle strips generally do not traverse all the faces of the mesh. Although it is not competitive with TSQ or TVR, this scheme is the basis of the best steganographic schemes presented in the sequel. Finally, Ohbuchi's MDP is a visual watermarking method which embeds a meshed logo in the host model by changing the local density of points (see Fig. 4). The logo is invisible with most common shading algorithms [21] but becomes visible when the edges of the mesh are rendered. However, visible watermarking of 3D meshes has few applications so far. Focusing on improving the simplicity and speed of the mesh traversal, O. Benedens has proposed another connectivity-driven scheme: the Triangle Flood Algorithm (TFA) [4]. This scheme uses connectivity and geometric information to generate a unique traversal of all the mesh triangles. Point positions are modified to embed the watermark by altering the height of the triangles and also to enable the regeneration of the traversal. This scheme embeds exactly f − 1 bits, where f stands for the number of triangles.

4.1.1.2 Steganography. Cayre and Macq [11] have proposed a blind substitutive scheme which encodes a bit in a triangle strip starting from the triangle presenting the maximal area. This triangle strip is determined by the bits of a secret key, and determines the location of the encoded data in the 3D mesh. This scheme can be seen as an extension of TSPS with security properties which make it suitable for steganography purposes. It is indeed not possible to locate the embedded data without knowledge of the secret key. For these reasons, this spatial substitutive scheme can also be considered an extension of Quantized Index Modulation (QIM) schemes to 3D models.
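TSQ's embedding primitive rests on the claim that an edge-length ratio and a height-over-base ratio are unchanged by any RST transform. A minimal sketch that checks this invariance numerically (2D triangles for brevity; the coordinates and parameters are illustrative, not taken from [35]):

```python
import math

def tsq_invariants(a, b, c):
    """The two RST-invariant quantities TSQ modulates: the ratio of two
    edge lengths, and the ratio of the height of c over the base (a, b)."""
    base = math.dist(a, b)
    e1, e2 = math.dist(a, c), math.dist(b, c)
    # Twice the triangle area via the 2D cross product; height = area2 / base.
    area2 = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
    return e1 / e2, (area2 / base) / base

def rst(p, angle, scale, t):
    """Rotation + uniform scaling + translation of a 2D point."""
    x = scale * (p[0] * math.cos(angle) - p[1] * math.sin(angle)) + t[0]
    y = scale * (p[0] * math.sin(angle) + p[1] * math.cos(angle)) + t[1]
    return (x, y)

a, b, c = (0.0, 0.0), (2.0, 0.0), (0.5, 1.0)
before = tsq_invariants(a, b, c)
after = tsq_invariants(*(rst(p, 0.7, 3.0, (5.0, -2.0)) for p in (a, b, c)))
assert all(abs(x - y) < 1e-9 for x, y in zip(before, after))
```

Embedding then amounts to nudging vertex positions until such a ratio falls into a quantization cell that encodes the desired symbol.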


Still considering steganography, Wang and Cheng [54] have improved the capacity of the preceding approach. First, they determine the initial triangle for embedding by Principal Component Analysis (PCA). Next, an efficient triangular mesh traversal method is used to generate a sequence list of triangles which will contain the hidden message. Finally, they embed three fixed bits per vertex for all vertices, relying on three independent degrees of freedom, and thereby exploit a larger capacity in 3D space. However, this capacity gain comes at some expense of the proved security features of the scheme of Cayre and Macq [11].

4.1.1.3 Authentication. Recently, Cayre et al. [13] have extended their previous scheme by using a globally optimal traversal of the mesh and an indexed embedding. The authors specify the requirements of a 3D watermarking scheme in the authentication application context and show that their scheme withstands the attacks they consider in such a context: RST transforms and cropping. For the cropping attack, the minimum watermark segment (MWS) is computed. They also propose a careful study of the capacity and security (in bits) of the embedding, the class of robustness and the probability of false alarm. Such an analysis is difficult to perform for the other connectivity-driven watermarking schemes. In conclusion, connectivity-driven algorithms are characterized by their relative fragility and their blind decoding capabilities. The embedded watermark generally does not resist noise addition or globally imperceptible re-triangulations (with the exception of MDP). These schemes are suitable for annotation and related applications only, with the exception of more recent works which deal with the security issue [11,54] and [13]; these have been successfully designed for steganographic and authentication purposes, respectively.
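The security property discussed here, namely that the hidden data cannot even be located without the key, can be illustrated with a key-dithered quantization-index-modulation step on a scalar value. The step size DELTA and the SHA-256-based dither derivation below are illustrative assumptions, not the actual constructions of [11] or [13]:

```python
import hashlib
import struct

DELTA = 0.01  # quantization step (assumed)

def _dither(key: bytes, index: int) -> float:
    """Key-dependent dither in [0, DELTA); unpredictable without the key."""
    h = hashlib.sha256(key + index.to_bytes(4, "big")).digest()
    return struct.unpack(">I", h[:4])[0] / 2**32 * DELTA

def qim_embed(value: float, bit: int, key: bytes, index: int) -> float:
    d = _dither(key, index)
    cell = round((value - d) / DELTA)
    if cell % 2 != bit:          # even cells encode 0, odd cells encode 1
        cell += 1
    return cell * DELTA + d

def qim_decode(value: float, key: bytes, index: int) -> int:
    d = _dither(key, index)
    return round((value - d) / DELTA) % 2

key = b"secret key"
marked = qim_embed(0.731, 1, key, 0)
assert qim_decode(marked, key, 0) == 1
assert abs(marked - 0.731) <= 1.5 * DELTA  # bounded embedding distortion
```

Without the key, an attacker does not know the dithered quantization grid and therefore cannot tell watermarked values from unmarked ones, which is the essence of the security argument above.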
Copyright or copy protection cannot be provided by this class of schemes as they do not resist re-sampling.

4.1.2 Geometry-Driven Watermarking Schemes
This section presents the 3D watermarking schemes which embed data in the geometry. These schemes modify the point positions and/or the point (or face) normals. Point normals are estimations of the local continuous surface normal and are tied to the local shape of the mesh. On the one hand, while surface sampling determines point positions, its influence on point normals is negligible if the point density is sufficient to accurately represent the surface. On the other hand, noise addition affects point normals and curvature estimations much more than point positions. Note that some schemes need the orientation of face normals to be consistent and cannot be applied to non-orientable surfaces such as a Möbius strip. Point normals are usually estimated by a weighted sum of the adjacent face normals or adjacent point positions. This means that a modification of the connectivity may affect the neighborhood of a point and have an impact on the point normal measure. However, attacks which modify point normals generally have a visual impact on the rendering of the mesh [21] and should therefore not be dealt with by a watermarking scheme.
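For illustration, the usual estimate mentioned above can be sketched as a sum of adjacent face normals; summing unnormalized cross products weights each face by its area (one common convention, assumed here rather than taken from any specific surveyed scheme):

```python
import math

def face_normal(a, b, c):
    """Unnormalized normal of triangle (a, b, c); its length is twice
    the triangle area, so summing these weights faces by area."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def vertex_normal(adjacent_face_normals):
    """Estimate the point normal as the normalized sum of the
    (area-weighted) normals of the faces around the vertex."""
    s = [sum(n[c] for n in adjacent_face_normals) for c in range(3)]
    length = math.sqrt(sum(x * x for x in s))
    return [x / length for x in s]

# Two coplanar faces around a vertex: the estimate is the plane normal.
n1 = face_normal((0, 0, 0), (1, 0, 0), (0, 1, 0))
n2 = face_normal((0, 0, 0), (0, 1, 0), (-1, 0, 0))
assert vertex_normal([n1, n2]) == [0.0, 0.0, 1.0]
```

A connectivity change that alters the set of faces around a vertex changes this estimate, which is exactly why normal-based embeddings are sensitive to the neighborhood modifications discussed above.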


4.1.2.1 Data Hiding. The Vertex Flood Algorithm (VFA) [5] embeds information in point positions. Designed for public watermarking, its high capacity is its main feature. Given a point p in the mesh, all points are clustered in subsets (Sk) according to their distance to p. This point is the barycenter of a reference triangle R whose edges are the closest to a predefined edge length ratio:

Sk = { pi ∈ V | k ≤ ‖pi − p‖ / W < k + 1 },  0 ≤ k ≤ dMAX / W,   (1)

where dMAX is the maximal distance allowed from p, and W is the width of each set. Each non-empty subset is subdivided into m + 2 intervals in order to encode m bits. The distance of each point in a subset is modified so that it is placed in the middle of one of the m + 2 intervals. The first and last intervals are not used for encoding, in order to prevent modifications of point distances from affecting the other subsets. Decoding does not need the original mesh and is simply achieved by reading in which subinterval of each subset Sk the point positions lie. Like the scheme of Harte et al., VFA only resists RST transforms. Compared to connectivity-driven watermarking schemes, this scheme can achieve higher capacity, limited only by the point sampling rate and the point position quantization precision.

4.1.2.2 Authentication. Yeo et al. [51] have developed an authentication algorithm which modifies point positions so that each mesh point verifies the following equation:

K(I(p)) = W(L(p)),   (2)

where K(.) is the verification key, I(p) is an index value depending on point coordinates, W(.) is the watermark represented by a binary matrix and L(p) gives the location in the watermark matrix. I(p) has been designed to depend on the neighborhood of point p. This interesting feature allows the detection of cropping attacks. Compared to connectivity-driven schemes also targeting authentication, the computational cost of this method is higher and the security features have not been deeply analyzed.

4.1.2.3 Informed Copyright Protection. Informed or blind schemes dedicated to copyright protection are often referred to as robust watermarking schemes. Such schemes should resist all known manipulations and attacks which do not produce a visible distortion of the 3D mesh. Schemes that resist remeshing and re-sampling are often referred to as 3D shape watermarking schemes. The Normal Bin Encoding (NBE) scheme [4] embeds data in point normals.
Thanks to the curvature sampling properties pointed out before, this scheme resists simplifications of the mesh. Point normals are subdivided into bins. Each bin is defined by a normal nB, named the center normal of the bin, and an angle φR, called the bin radius. If the angle between a point normal ni and the center normal nB is less than φR, then ni belongs to that bin. Each bin encodes one bit of information by using different features such as the mean of the bin normals, the bin mean angle difference and the ratio of normals inside a threshold region determined by an


angle φK (with φK < φR). Point positions are modified so that the target value is assigned to the chosen bin feature. Decoding is simply achieved by computing the bins and their features, but needs the original model for preprocessing purposes. This scheme has been improved later [8] and provides interesting imperceptibility and robustness features. Its main drawback is scalability: the technique cannot efficiently handle meshes with more than 10^5 points. Yu et al. [53] have proposed an informed robust scheme based on the histogram of the distances from the points of the surface to its center of gravity. This distance histogram is subdivided into bins and the points are iteratively displaced so that the mean or the variance of a histogram bin lies on the left or on the right of the bin middle, encoding a 0 or a 1 respectively. A scrambling of the vertices defined by a secret key is also proposed to secure the embedding of the watermark. The informed detection of the watermark requires the registration and re-sampling of the original and watermarked versions of the 3D model. The robustness features of this scheme cover noising and denoising attacks, cropping and re-sampling. Unlike NBE, this scheme has good scalability properties. Focusing on imperceptibility criteria such as symmetry and continuity preservation, Benedens [7] has proposed a copyright protection watermarking scheme based on a sculpting approach. It uses Free Form Deformations (FFD) at distinct locations of the mesh (the so-called feature points) to embed a watermark. The basic steps performed by the embedding part of the algorithm consist of a first selection procedure of feature points and the displacement of these points along the surface normal (inwards or outwards depending on the watermark value) by an FFD. These two operations are ruled by secret keys.
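A deliberately simplified sketch of the histogram idea of Yu et al. [53]: one bin of radial distances, with the bin mean pushed to one side of the bin middle. A single uniform shift is used here instead of the iterative, secretly scrambled per-point displacement of the actual scheme; the bin bounds and strength are illustrative:

```python
def embed_bit(dists, lo, hi, bit, strength=0.3):
    """Shift a bin of point-to-centre-of-gravity distances so that the
    bin mean lies right (bit 1) or left (bit 0) of the bin middle."""
    mid = (lo + hi) / 2.0
    half = (hi - lo) / 2.0
    target = mid + (strength if bit else -strength) * half
    mean = sum(dists) / len(dists)
    return [d + (target - mean) for d in dists]

def decode_bit(dists, lo, hi):
    """Blind read-out: compare the bin mean to the bin middle."""
    return int(sum(dists) / len(dists) > (lo + hi) / 2.0)

radial = [1.10, 1.40, 1.45, 1.20]   # distances falling in bin [1.0, 1.5)
assert decode_bit(embed_bit(radial, 1.0, 1.5, 1), 1.0, 1.5) == 1
assert decode_bit(embed_bit(radial, 1.0, 1.5, 0), 1.0, 1.5) == 0
```

Because both embedding and read-out are expressed relative to the center of gravity, any attack that shifts that point desynchronizes the decoder.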
Benedens' detector is based on the assumption that in random copies of the original model the features are independently randomly distributed (i.e., independently pointing inwards or outwards of the surface following the same distribution). This algorithm presents very good imperceptibility and robustness results against noise addition, smoothing, cropping and affine transforms, and relatively good robustness against re-sampling. The latter strongly depends on the detector properties and registration optimality. A comparison between the scheme of Yu et al. [53] and this scheme should be done using the same fine registration and re-sampling process; it appears that the sculpting approach provides better imperceptibility results and comparable robustness features.

4.1.2.4 Blind Copyright Protection. Unlike informed robust watermarking schemes, these schemes cannot yet survive combined remeshing and cropping attacks. They also generally provide less robustness to geometric attacks. However, blind detection is a desirable property that is usually required in a copyright protection application scenario. M. Wagner [50] has proposed a scheme which embeds data in the point normals of the mesh. These normal vectors are estimated by the Laplacian operator (a.k.a. umbrella operator) applied on the point neighborhood:

ni = (1/dpi) Σ_{pj ∈ N(pi)} (pj − pi),   (3)


where dpi is the number of point neighbors of pi and N(pi) is the neighborhood of pi. The watermark is a continuous function f(p) defined on the unit sphere. The normal vectors ni and the watermark function are converted into integers ki and wi respectively:

ki = ⌊(c/d) ‖ni‖⌋,   (4)
wi = ⌊2^b f(ni/‖ni‖)⌋,   (5)

where d is the mean length of the normal vectors, c is a parameter given by a secret key, and b is the number of bits needed to encode each wi. The embedding proceeds by replacing b bits of ki by those of wi, resulting in k'i. The modified normals n'i are then re-computed as n'i = (k'i d / c) · (ni/‖ni‖). The watermarked coordinates of each point pi are obtained by solving the following system of L + 1 linear equations:

n'i = (1/dpi) Σ_{pj ∈ N(pi)} (p'j − p'i).   (6)
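The estimation and quantization steps, Eqs. (3) and (4), together with the low-bit substitution that produces k'i, can be sketched as follows (the toy neighborhood and the constants c and d are illustrative assumptions):

```python
import math

def umbrella_normal(p, neighbors):
    """Eq. (3): Laplacian (umbrella) estimate n_i at point p."""
    d_p = len(neighbors)
    return [sum(q[c] - p[c] for q in neighbors) / d_p for c in range(3)]

def quantize_normal(n, c, d_mean):
    """Eq. (4): k_i = floor((c / d) * ||n_i||)."""
    return int(c / d_mean * math.sqrt(sum(x * x for x in n)))

def substitute_low_bits(k, w, b):
    """Replace the b least significant bits of k_i with the watermark
    symbol w_i, yielding k'_i."""
    return (k >> b << b) | w

p = (0.0, 0.0, 0.0)
ring = [(1, 0, 0.5), (-1, 0, 0.5), (0, 1, 0.5), (0, -1, 0.5)]
n = umbrella_normal(p, ring)
assert n == [0.0, 0.0, 0.5]
k = quantize_normal(n, c=1000.0, d_mean=0.5)
assert substitute_low_bits(k, 0b0101, 4) == 997
```

Inverting Eq. (4) with the modified k'i gives the modified normal n'i; the watermarked positions then follow from the linear system (6).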

However, it is not possible to build a surface from the point normal information alone, and this linear equation system is indeed singular. In order to solve this issue, 20% of the points are not watermarked. Decoding the watermark requires adjusting the parameter c to c' = c · (d'/d) because the mean normal length d is modified. In order to be robust to affine transforms, a non-Euclidean affine-invariant norm [34] is used. The watermark can be either a visual logo on the unit sphere or Gaussian white noise. Scalability and computational cost of this scheme are a concern. Harte et al. [25] have proposed another blind watermarking scheme which embeds a watermark in the point positions. One bit is assigned to each point: 1 if the point is outside a bounding volume defined by its point neighborhood and 0 otherwise. This bounding volume may be defined either by a set of bounding planes or by a bounding ellipsoid. During embedding and decoding, points are ranked with respect to their distance to their neighborhood center. This algorithm is robust against RST transforms, noise addition and smoothing. Like the scheme of Wagner [50], this scheme cannot withstand connectivity attacks such as remeshing or re-triangulation. However, it has a far lower computational cost, since embedding only needs one traversal of the vertices and limits the computations for each point to its one-connected neighbors. Cho et al. [18] have proposed a blind and robust extension of the scheme of Yu et al. [53]. This scheme presents the same robustness features, with the exception of cropping and any re-sampling attack that modifies the position of the center of gravity (e.g., unbalanced point density). The authors propose to send the position of this point to the detection side, which is not realistic: combined cropping and rotation or translation attacks can shift the relative positions of the model and the center of gravity conveyed as side information. This scheme is limited


to star-shaped models¹ but, considering robustness, it outperforms the schemes of Harte et al. [25] and Wagner [50]. It is, however, fragile against cropping. Similarly, Zafeiriou et al. [57] have proposed to express the point coordinates in spherical coordinates with the center of gravity as origin. A Principal Component Analysis (PCA) is used to first align the mesh along its principal axes. Then two different embedding functions are used to modify geometric invariants. For each angle θ and radius r, a continuous neighborhood patch is computed with NURBS. A 0 is encoded if the center point radius is less than the mean radius of the neighborhood, and a 1 is encoded otherwise. This scheme shows approximately the same advantages and limitations as the scheme of Cho et al. [18]: it is fragile against cropping and unbalanced re-sampling. Shifts of the center of gravity and perturbations of the PCA alignment [8] caused by modifications of the sampling density are also a weakness which deserves further research. More flexible than connectivity-driven algorithms, geometry-driven algorithms enable very different capacity-robustness trade-offs. While steganography and authentication seem better handled by the former, copyright protection could be provided by geometry-driven schemes. However, there is still no blind and robust watermarking scheme able to resist cropping and irregular point-density re-samplings.

4.2 Transform Domain

This section is dedicated to watermarking schemes which embed information in a mesh transform domain. These transforms extend classical signal-processing transforms to 3D meshes: the mesh spectral decomposition, the wavelet transform and the spherical wavelet transform.

4.2.1 Spectral Decomposition
Spectral decomposition (a.k.a. pseudo-frequency decomposition or analysis) of 3D meshes corresponds to the extension of the well-known Discrete Fourier Transform (DFT) or Discrete Cosine Transform (DCT). This extension links the spectral analysis of matrices and the spectral decomposition of signals defined on graphs [45,28]. The pseudo-frequency analysis of a 3D mesh is given by the projection of the geometry on the eigenvectors of the Laplacian operator defined on the mesh. The Laplacian is usually approximated by the umbrella operator L = D − A, where A is the adjacency matrix and D is a diagonal matrix with Dii = valence(pi). Projecting the canonical geometry coordinates (X, Y, Z) leads to three real-valued spectra, often noted (P, Q, R) [12]. Other Laplacian operator approximations have been successfully explored to design transforms which allow an optimal energy compaction in pseudo-low frequencies [58,3,56]. Since this transform is based on the eigen-decomposition of an n-by-n matrix, mesh connectivity partitioning must be used for meshes of more than 10^4 points to speed up the computation and to avoid numerical instabilities

¹ For each point of the surface, the segment linking this point to the center of gravity does not intersect the surface in any other point.


such as eigenvector order flipping [28,58]. Observing that partitioning induces artifacts on submesh boundaries, Wu et al. [56] have recently proposed radial basis functions (RBF) to compute the spectrum of 3D meshes with up to 10^6 points without using a partition algorithm. A better choice of coordinates than the canonical (X, Y, Z) to project on the spectral basis functions is still an open issue.

4.2.1.1 Informed Copyright Protection. The first scheme based on spectral decomposition was proposed by Ohbuchi et al. in 2002 [37]. Their approach consists in extending spread-spectrum techniques to this transform. Well-balanced point seeds are interactively selected and initialize a connectivity-based front propagation which builds the partition. An additive watermark is embedded in the low pseudo-frequency coefficients of (P, Q, R) (the three spectra are embedded in the same way). The informed decoding retrieves the partition and the correspondence between the original connectivity and the watermarked geometry through registration, re-sampling and remeshing. This scheme presents robustness against RST transforms, noise addition, smoothing and cropping. Benedens et al. [9] have improved the preceding scheme by embedding the watermark only in the transformed local normal component of the point coordinates instead of embedding it in (P, Q, R). They show that this results in a better trade-off between imperceptibility and capacity, and that it improves the behavior of the decoder as well. Cotting et al. [16] have extended the work of Ohbuchi et al. [37] to point-sampled surfaces. A neighborhood is still needed to compute the Laplacian eigenvectors and is provided by a k-nearest-neighbors algorithm. A hierarchical clustering strategy is used to partition the surface. They also show that other point attributes, such as color values, can be projected on the spectral basis functions and watermarked as well.
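The common core of these spectral schemes can be sketched in a few lines: build the umbrella Laplacian L = D − A, project the coordinates on its eigenvectors, and additively perturb low pseudo-frequency coefficients. The toy connectivity, watermark and strength alpha below are illustrative assumptions:

```python
import numpy as np

def umbrella_laplacian(n, edges):
    """L = D - A for an n-point mesh connectivity graph."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] = L[j, i] = -1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    return L

def spectral_embed(coords, edges, wm, alpha=0.01):
    """Additive spread-spectrum embedding on the lowest non-constant
    pseudo-frequencies of the (P, Q, R) spectra."""
    n = coords.shape[0]
    _, basis = np.linalg.eigh(umbrella_laplacian(n, edges))
    spectrum = basis.T @ coords                     # columns: P, Q, R
    spectrum[1:1 + len(wm)] += alpha * wm[:, None]  # skip the DC row
    return basis @ spectrum

# Toy "mesh": a 4-cycle, watermarked with a +/-1 sequence of length 2.
coords = np.array([[0.0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]])
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
marked = spectral_embed(coords, edges, np.array([1.0, -1.0]))
assert 0.0 < np.max(np.abs(marked - coords)) < 0.05  # small geometric change
```

Informed detection, as in [37], re-projects a registered and re-sampled copy on the same basis and compares the low-frequency coefficients with the watermark sequence.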
The watermark is extracted through registration with the original and re-sampling. The re-sampling is based on the projection of new points on a polynomial approximation of the surface. Their algorithm presents robustness features very close to [37]. Furthermore, they show the watermark withstands repetitive embeddings of diﬀerent watermarks. Recently, Wu and Kobbelt [56] have proposed an approximation of the Laplacian eigenfunctions by RBF functions. These functions are centered on k (with k


This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2007 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12077731 06/3180 543210

Preface

In this volume we present the second issue of the LNCS Transactions on Data Hiding and Multimedia Security. In the first paper, Adelsbach et al. introduce fingercasting, a combination of broadcast encryption and fingerprinting for secure content distribution. They also provide, for the first time, a security proof for a lookup-table-based encryption scheme. In the second paper, He and Kirovski propose an estimation attack on content-based video fingerprinting schemes. Although the authors tailor the attack towards a specific video fingerprint, the generic form of the attack is expected to be applicable to a wide range of video watermarking schemes. In the third paper, Ye et al. present a new feature distance measure for error-resilient image authentication, which allows one to differentiate malicious image manipulations from changes that do not interfere with the semantics of an image. In the fourth paper, Luo et al. present a steganalytic technique against steganographic embedding methods utilizing the two least significant bit planes. Experimental results demonstrate that this steganalysis method can reliably detect embedded messages and estimate their length with high precision. Finally, Alface and Macq present a comprehensive survey on blind and robust 3-D shape watermarking. We hope that this issue is of great interest to the research community and will trigger new research in the field of data hiding and multimedia security. Finally, we want to thank all the authors, reviewers and editors who devoted their valuable time to the success of this second issue. Special thanks go to Springer and Alfred Hofmann for their continuous support.

March 2007

Yun Q. Shi (Editor-in-Chief) Hyoung-Joong Kim (Vice Editor-in-Chief) Stefan Katzenbeisser (Vice Editor-in-Chief)

LNCS Transactions on Data Hiding and Multimedia Security Editorial Board

Editor-in-Chief
Yun Q. Shi, New Jersey Institute of Technology, Newark, NJ, USA ([email protected])

Vice Editors-in-Chief
Hyoung-Joong Kim, Korea University, Seoul, Korea ([email protected])
Stefan Katzenbeisser, Philips Research Europe, Eindhoven, Netherlands ([email protected])

Associate Editors
Mauro Barni, University of Siena, Siena, Italy ([email protected])
Jeffrey Bloom, Thomson, Princeton, NJ, USA ([email protected])
Jana Dittmann, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany ([email protected])
Jiwu Huang, Sun Yat-sen University, Guangzhou, China ([email protected])
Mohan Kankanhalli, National University of Singapore, Singapore ([email protected])
Darko Kirovski, Microsoft, Redmond, WA, USA ([email protected])
C. C. Jay Kuo, University of Southern California, Los Angeles, USA ([email protected])
Heung-Kyu Lee, Korea Advanced Institute of Science and Technology, Daejeon, Korea ([email protected])
Benoit Macq, Catholic University of Louvain, Belgium ([email protected])
Nasir Memon, Polytechnic University, Brooklyn, NY, USA ([email protected])
Kivanc Mihcak, Bogazici University, Istanbul, Turkey ([email protected])


Hideki Noda, Kyushu Institute of Technology, Iizuka, Japan ([email protected])
Jeng-Shyang Pan, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan ([email protected])
Fernando Perez-Gonzalez, University of Vigo, Vigo, Spain ([email protected])
Andreas Pfitzmann, Dresden University of Technology, Germany ([email protected])
Alessandro Piva, University of Florence, Florence, Italy ([email protected])
Yong-Man Ro, Information and Communications University, Daejeon, Korea ([email protected])
Ahmad-Reza Sadeghi, Ruhr-University, Bochum, Germany ([email protected])
Kouichi Sakurai, Kyushu University, Fukuoka, Japan ([email protected])
Qibin Sun, Institute of Infocomm Research, Singapore ([email protected])
Edward Wong, Polytechnic University, Brooklyn, NY, USA ([email protected])

Advisory Board
Pil Joong Lee, Pohang University of Science and Technology, Pohang, Korea ([email protected])
Bede Liu, Princeton University, Princeton, NJ, USA ([email protected])

Table of Contents

Fingercasting–Joint Fingerprinting and Decryption of Broadcast Messages
   André Adelsbach, Ulrich Huber, and Ahmad-Reza Sadeghi . . . . . . . . . . 1

An Estimation Attack on Content-Based Video Fingerprinting
   Shan He and Darko Kirovski . . . . . . . . . . 35

Statistics- and Spatiality-Based Feature Distance Measure for Error Resilient Image Authentication
   Shuiming Ye, Qibin Sun, and Ee-Chien Chang . . . . . . . . . . 48

LTSB Steganalysis Based on Quartic Equation
   Xiangyang Luo, Chunfang Yang, Daoshun Wang, and Fenlin Liu . . . . . . . . . . 68

From 3D Mesh Data Hiding to 3D Shape Blind and Robust Watermarking: A Survey
   Patrice Rondao Alface and Benoit Macq . . . . . . . . . . 91

Author Index . . . . . . . . . . 117

Fingercasting–Joint Fingerprinting and Decryption of Broadcast Messages

André Adelsbach, Ulrich Huber, and Ahmad-Reza Sadeghi

Horst Görtz Institute for IT Security, Ruhr-Universität Bochum, Universitätsstraße 150, D-44780 Bochum, Germany
[email protected], {huber,sadeghi}@crypto.rub.de

Abstract. We propose a stream cipher that provides confidentiality, traceability and renewability in the context of broadcast encryption, assuming that collusion-resistant watermarks exist. We prove it to be as secure as the generic pseudo-random sequence on which it operates. This encryption approach, termed fingercasting, achieves joint decryption and fingerprinting of broadcast messages in such a way that an adversary cannot separate both operations or prevent them from happening simultaneously. The scheme is a combination of a known broadcast encryption scheme, a well-known class of fingerprinting schemes and an encryption scheme inspired by the Chameleon cipher. It is the first to provide a formal security proof and a non-constant lower bound for resistance against collusion of malicious users, i.e., a minimum number of content copies needed to remove all fingerprints. To achieve traceability, the scheme fingerprints the receivers' key tables such that they embed a fingerprint into the content during decryption. The scheme is efficient and includes parameters that allow one, for example, to trade off storage size for computation cost at the receiving end.

Keywords: Chameleon encryption, stream cipher, spread-spectrum watermarking, fingerprinting, collusion resistance, frame-proofness, broadcast encryption.

1

Introduction

Experience shows that adversaries attack Broadcast Encryption (BE) systems in a variety of diﬀerent ways. Their attacks may be on the hardware that stores cryptographic keys, e.g., when they extract keys from a compliant device to develop a pirate device such as the DeCSS software that circumvents the Content Scrambling System [2]. Alternatively, their attacks may be on the decrypted content, e.g., when a legitimate user shares decrypted content with illegitimate users on a ﬁle sharing system such as Napster, Kazaa, and BitTorrent.

An extended abstract of this paper appeared in the Proceedings of the Eleventh Australasian Conference on Information Security and Privacy (ACISP 2006) [1].

Y.Q. Shi (Ed.): Transactions on DHMS II, LNCS 4499, pp. 1–34, 2007.
© Springer-Verlag Berlin Heidelberg 2007


The broadcasting sender thus has three security requirements: confidentiality, traceability of content and keys, and renewability of the encryption scheme. The requirements cover two aspects. Confidentiality tries to prevent illegal copies in the first place, whereas traceability is a second line of defense aimed at finding the origin of an illegal copy (content or key). The need for traceability originates from the fact that confidentiality may be compromised in rare cases, e.g., when a few users illegally distribute their secret keys. Renewability ensures that after such rare events, the encryption system can recover from the security breach.

In broadcasting systems deployed today, e.g., Content Protection for Pre-Recorded Media [3] or the Advanced Access Content System [4], confidentiality and renewability often rely on BE because it provides short ciphertexts while at the same time having realistic storage requirements in devices and acceptable computational overhead. Traitor tracing enables traceability of keys, whereas fingerprinting provides traceability of content. Finally, renewability may be achieved using revocation of the leaked keys.

However, none of the mentioned cryptographic schemes covers all three security requirements. Some existing BE schemes lack traceability of keys, whereas no practically relevant scheme provides traceability of content [5,6,7,8]. Traitor tracing only provides traceability of keys, but not of content [9,10]. Fingerprinting schemes alone do not provide confidentiality [11]. The original Chameleon cipher provides confidentiality, traceability and a hint of renewability, but with a small constant bound for collusion resistance and, most importantly, without a formal proof of security [12].
Asymmetric schemes, which provide each compliant device with a certificate and accompany content with Certificate Revocation Lists (CRLs), lack traceability of content and may reach the limits of renewability when CRLs become too large to be processed by real-world devices. Finally, a trivial combination of fingerprinting and encryption leads to an unacceptable transmission overhead because the broadcasting sender needs to sequentially transmit each fingerprinted copy.

Our Contribution. We present, to the best of our knowledge, the first rigorous security proof of Chameleon ciphers, thus providing a sound foundation for the recent applications of these ciphers, e.g., [13]. Furthermore, we give an explicit criterion to judge the security of the Chameleon cipher's key table. Our fingercasting approach fulfills all three security requirements at the same time. It is a combination of (i) a new Chameleon cipher based on the fingerprinting capabilities of a well-known class of watermarking schemes and (ii) an arbitrary broadcast encryption scheme, which explains the name of the approach. The basic idea is to use the Chameleon cipher for combining decryption and fingerprinting. To achieve renewability, we use a BE scheme to provide fresh session keys as input to the Chameleon scheme. To achieve traceability, we fingerprint the receivers' key tables such that they embed a fingerprint into the content during decryption. To enable higher collusion resistance than the original Chameleon scheme, we tailor our scheme to emulate any watermarking scheme whose coefficients follow a


probability distribution that can be disaggregated into additive components.¹ As proof of concept, we instantiate the watermarking scheme with Spread Spectrum Watermarking (SSW), which has proven collusion resistance [14,15]. However, we might as well instantiate it with any other such scheme.

Joint decryption and fingerprinting has significant advantages compared to existing methods such as transmitter-side or receiver-side Fingerprint Embedding (FE) [11]. Transmitter-side FE is the trivial combination of fingerprinting and encryption by the sender. As discussed above, the transmission overhead is in the order of the number of copies to be distributed, which is prohibitive in practical applications. Receiver-side FE happens in the user's receiver; after distribution of a single encrypted copy of the content, a secure receiver based on tamper-resistant hardware is trusted to embed the fingerprint after decryption. This saves bandwidth on the broadcast channel. However, perfect tamper-resistance cannot be achieved under realistic assumptions [16]. An adversary may succeed in extracting the keys of a receiver and subsequently decrypt without embedding a fingerprint.

Our fingercasting approach combines the advantages of both methods. It saves bandwidth by broadcasting a single encrypted copy of the content. In addition, it ensures embedding of a fingerprint even if a malicious user succeeds in extracting the decryption keys of a receiver. Furthermore, as long as the number of colluding users remains below a threshold, the colluders can only create decryption keys and content copies that incriminate at least one of them.

This paper enhances our extended abstract [1] in the following aspects. First, the extended abstract does not contain the security proof, which is the major contribution. Second, we show here that our instantiation of SSW is exact, whereas the extended abstract only claims this result. Last, we discuss here the trade-off between storage size and computation cost at the receiving end.

2 Related Work

The original Chameleon cipher of Anderson and Manifavas is 3-collusion-resistant [12]: a collusion of up to 3 malicious users has a negligible chance of creating a good copy that does not incriminate them. Each legitimate user knows the seed of a Pseudo-Random Sequence (PRS) and a long table filled with random keywords. Based on the sender's master table, each receiver obtains a slightly different table copy, where individual bits in the keywords are modified in a characteristic way. Interpreting the PRS as a sequence of addresses in the table, the sender adds the corresponding keywords in the master table bitwise modulo 2 in order to mask the plaintext word. The receiver applies the same operation to the ciphertext using its table copy, thus embedding the fingerprint.

The original cipher, however, has some inconveniences. Most importantly, it has no formal security analysis, and it bounds the collusion resistance by the constant number 3, whereas our scheme allows this bound to be chosen depending on the number of available watermark coefficients. In addition, the original scheme

¹ Our scheme does not yet support fingerprints based on coding theory.


limits the content space (and keywords) to strings with characteristic bit positions that may be modified without visibly altering the content. In contrast, our scheme uses algebraic operations in a group of large order, which enables modification of any bit in the keyword and processing of arbitrary documents.

Chameleon was inspired by work of Maurer [17,18]. His cipher achieves information-theoretic security in the bounded storage model with high probability. In contrast, Chameleon and our proposed scheme only achieve computational security. The reason is that the master table length in Maurer's cipher is super-polynomial. As any adversary would need to store most of the table to validate guesses, the bounded storage capacity defeats all attacks with high probability. However, Maurer's cipher was never intended to provide traceability of content or renewability, but only confidentiality.

Ferguson et al. discovered security weaknesses in a randomized stream cipher similar to Chameleon [19]. However, their attack only works for linear sequences of keywords in the master table, not for the PRSs of our proposed solution.

Ergun, Kilian, and Kumar prove that an averaging attack with additional Gaussian noise defeats any watermarking scheme [20]. Their bound on the minimum number of different content copies needed for the attack asymptotically coincides with the bound on the maximum number of different content copies to which the watermarking scheme of Kilian et al. is collusion-resistant [15]. As we can emulate [15] with our fingercasting approach, its collusion resistance is, at least asymptotically, the best we can hope for.

Recently there has been a great deal of interest in joint fingerprinting and decryption [13,21,22,11,23]. Basically, we can distinguish three strands of work. The first strand applies Chameleon in different application settings. Briscoe et al. introduce Nark, which is an application of the original Chameleon scheme in the context of Internet multicast [13]. However, in contrast to our new Chameleon cipher, they neither enhance Chameleon nor analyze its security. The second strand tries to achieve joint fingerprinting and decryption by either trusting network nodes to embed fingerprints (Watercasting in [21]) or doubling the size of the ciphertext by sending differently fingerprinted packets of content [22]. Our proposed solution neither relies on trusted network nodes nor increases the ciphertext size. The third strand proposes new joint fingerprinting and decryption processes, but at the price of replacing encryption with scrambling, which does not achieve indistinguishability of ciphertexts and raises security concerns [11,23]. In contrast, our new Chameleon cipher achieves indistinguishability of ciphertexts.
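For intuition, the bitwise masking idea of the original Chameleon cipher described above can be sketched in a few lines. This is an illustration only: the toy table, the fixed address lists and the function name are ours, and the original cipher's key management and bit-tweaking policy are omitted.

```python
def chameleon_mask(words, table, addresses):
    """XOR each plaintext word with the table keywords selected by the
    PRS addresses (bitwise addition modulo 2). The same function also
    unmasks, since XOR is an involution."""
    out = []
    for word, addr in zip(words, addresses):
        mask = 0
        for a in addr:
            mask ^= table[a]  # combine the addressed keywords
        out.append(word ^ mask)
    return out
```

A receiver whose table copy differs from the master table in a few keyword bits reproduces the plaintext with exactly those bits flipped, which is the embedded fingerprint.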

3 Preliminaries

3.1 Notation

We recall some standard notations that will be used throughout the paper. First, we denote scalar objects with lower-case variables, e.g., o_1, and object tuples as well as roles with upper-case variables, e.g., X_1. When we summarize objects or roles in set notation, we use an upper-case calligraphic variable, e.g., O := {o_1, o_2, . . .} or X := {X_1, X_2, . . .}. Second, let A be an algorithm. By y ← A(x) we denote that y was obtained by running A on input x. If A is deterministic, then y is a variable with a unique value. Conversely, if A is probabilistic, then y is a random variable. For example, by y ← N(μ, σ) we denote that y was obtained by selecting it at random with normal distribution, where μ is the mean and σ the standard deviation. Third, o_1 ←_R O and o_2 ←_R [0, z] denote the selection of a random element of the set O and of the interval [0, z] with uniform distribution. Finally, V · W denotes the dot product of two vectors V := (v_1, . . . , v_n) and W := (w_1, . . . , w_n), which is defined as V · W := Σ_{k=1}^{n} v_k w_k, while ||V|| denotes the Euclidean norm ||V|| := √(V · V).

3.2 Roles and Objects in Our System Model

The (broadcast) center manages the broadcast channel, distributes decryption keys and is fully trusted. The users obtain the content via devices that we refer to as receivers. For example, a receiver may be a set-top box in the context of pay-TV or a DVD player in movie distribution. We denote the number of receivers with N; the set of receivers is U := {u_i | 1 ≤ i ≤ N}. When a receiver violates the terms and conditions of the application, e.g., leaks its keys or shares content, the center revokes the receiver's keys and thus makes them useless for decryption purposes. We denote the set of revoked receivers with R := {r_1, r_2, . . .} ⊂ U.

We represent broadcast content as a sequence M := (m_1, . . . , m_n) of real numbers in [0, z], where M is an element of the content space M.² For example, these numbers may be the n most significant coefficients of the Discrete Cosine Transform (DCT) as described in [14]. However, they should not be thought of as a literal description of the underlying content, but as a representation of the values that are to be changed by the watermarking process [20]. We refer to these values as significant and to the remainder as insignificant. In the remainder of this paper, we only refer to the significant part of the content, but briefly comment on the insignificant part in Section 5.

3.3 Cryptographic Building Blocks

Negligible Function. A negligible function f : N → R is a function where the inverse of any polynomial is asymptotically an upper bound:

∀k > 0 ∃λ_0 ∀λ > λ_0 : f(λ) < 1/λ^k

Probabilistic Polynomial Time. A probabilistic polynomial-time algorithm is an algorithm for which there exists a polynomial poly such that for every input x ∈ {0, 1}* the algorithm always halts after poly(|x|) steps, independently of the outcome of its internal coin tosses.

² Although this representation mainly applies to images, we discuss an extension to movies and songs in Section 5.
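The definition of a negligible function can be checked numerically for concrete examples. The following sketch (our own, purely for intuition) searches for a λ_0 beyond which f(λ) < 1/λ^k holds within a finite test range:

```python
def is_below_poly_inverse(f, k, lam_max=64):
    """Return the smallest lambda_0 < lam_max such that f(lam) < 1/lam**k
    for every tested lam in (lambda_0, lam_max], or None if none exists."""
    for lam0 in range(1, lam_max):
        if all(f(lam) < 1.0 / lam ** k for lam in range(lam0 + 1, lam_max + 1)):
            return lam0
    return None

f = lambda lam: 2.0 ** -lam  # a standard example of a negligible function
```

The exponentially small f eventually drops below every 1/λ^k, whereas a function like 1/λ never drops below 1/λ², matching the definition.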


Pseudo-Random Sequence (PRS). We first define the notion of pseudo-randomness and then proceed to define a Pseudo-Random Sequence Generator (PRSG). For further details we refer to [24, Section 3.3.1]:

Definition 1 (Pseudo-randomness). Let len : N → N be a polynomial such that len(λ) > λ for all λ ∈ N and let U_len(λ) be a random variable uniformly distributed over the strings {0, 1}^len(λ) of length len(λ). Then the random variable X with |X| = len(λ) is called pseudo-random if for every probabilistic polynomial-time distinguisher D, the advantage Adv(λ) is a negligible function:

Adv(λ) := |Pr[D(X) = 1] − Pr[D(U_len(λ)) = 1]|

Definition 2 (Pseudo-Random Sequence Generator). A PRSG is a deterministic polynomial-time algorithm G that satisfies two requirements:

1. Expansion: There exists a polynomial len : N → N such that len(λ) > λ for all λ ∈ N and |G(str)| = len(|str|) for all str ∈ {0, 1}*.
2. Pseudo-randomness: The random variable G(U_λ) is pseudo-random.

A PRS is a sequence G(str) derived from a uniformly distributed random seed str using a PRSG.

Chameleon Encryption. To set up a Chameleon scheme CE := (KeyGenCE, KeyExtrCE, EncCE, DecCE, DetectCE), the center generates the secret master table MT, the secret table fingerprints TF := (TF^(1), . . . , TF^(N)), and selects a threshold t using the key generation algorithm (MT, TF, t) ← KeyGenCE(N, 1^λ, par_CE), where N is the number of receivers, λ a security parameter, and par_CE a set of performance parameters. To add receiver u_i to the system, the center uses the key extraction algorithm RT^(i) ← KeyExtrCE(MT, TF, i) to deliver the secret receiver table RT^(i) to u_i. To encrypt content M exclusively for the receivers in possession of a receiver table RT^(i) and a fresh session key k_sess, the center uses the encryption algorithm C ← EncCE(MT, k_sess, M), where the output is the ciphertext C.
Only a receiver u_i in possession of RT^(i) and k_sess is capable of decrypting C and obtaining a fingerprinted copy M^(i) of content M using the decryption algorithm M^(i) ← DecCE(RT^(i), k_sess, C). When the center discovers an illegal copy M* of content M, it executes DetectCE, which uses the fingerprint detection algorithm DetectFP of the underlying fingerprinting scheme to detect whether RT^(i) left traces in M*. For further details on our notation of a Chameleon scheme, we refer to Appendix C.

Fingerprinting. To set up a fingerprinting scheme, the center generates the secret content fingerprints CF := (CF^(1), . . . , CF^(N)) and the secret similarity threshold t using the setup algorithm (CF, t) ← SetupFP(N, n, par_FP), where N is the number of receivers, n the number of content coefficients, and par_FP a set of performance parameters. To embed the content fingerprint CF^(i) := (cf_1^(i), . . . , cf_n^(i)) of receiver u_i into the original content M, the center uses the embedding algorithm M^(i) ← EmbedFP(M, CF^(i)). To verify whether an illegal copy M* of content M contains traces of the content fingerprint CF^(i) of receiver


u_i, the center uses the detection algorithm dec ← DetectFP(M, M*, CF^(i), t). It calculates the similarity between the detected fingerprint CF* := M* − M and CF^(i) using a similarity measure. If the similarity is above the threshold t, then the center declares u_i guilty (dec = true), otherwise innocent (dec = false). This type of detection algorithm is called non-blind because it needs the original content M as input; the opposite is a blind detection algorithm.

We call a fingerprinting scheme additive if the probability distribution ProDis of its coefficients has the following property: adding two independent random variables that follow ProDis results in a random variable that also follows ProDis. For example, the normal distribution has this property, where the means and variances add up during addition. Spread Spectrum Watermarking (SSW) is an instance of an additive fingerprinting scheme.

We describe the SSW scheme of [15], which we later use to achieve collusion resistance. The content fingerprint CF^(i) consists of independent random variables cf_j^(i) with normal distribution ProDis = N(0, σ), where σ is a function f_σ(N, n, par_FP). The similarity threshold t is a function f_t(σ, N, par_FP). Both functions f_σ and f_t are specified in [15]. During EmbedFP, the center adds the fingerprint coefficients to the content coefficients: m_j^(i) ← m_j + cf_j^(i). The similarity test is Sim(CF*, CF^(i)) ≥ t with Sim(CF*, CF^(i)) := (CF* · CF^(i))/||CF*||. Finally, the scheme's security is given by:

Theorem 1. [15, Section 3.4] In the SSW scheme with the above parameters, an adversarial coalition needs Ω(√(n / ln N)) differently fingerprinted copies of content M to have a non-negligible chance of creating a good copy M* without any coalition member's fingerprint.

For further details on our notation of a fingerprinting scheme and the SSW scheme of [15], we refer to Appendix D.

Broadcast Encryption.
To set up the scheme, the center generates the secret master key MK using the key generation algorithm MK ← KeyGenBE(N, 1^λ), where N is the number of receivers and λ the security parameter. To add receiver u_i to the system, the center uses the key extraction algorithm SK^(i) ← KeyExtrBE(MK, i) to extract the secret key SK^(i) of u_i. To encrypt session key k_sess exclusively for the non-revoked receivers U \ R, the center uses the encryption algorithm C ← EncBE(MK, R, k_sess), where the output is the ciphertext C. Only a non-revoked receiver u_i has a matching private key SK^(i) that allows it to decrypt C and obtain k_sess using the decryption algorithm k_sess ← DecBE(i, SK^(i), C). For further details on our notation of a BE scheme, we refer to Appendix E.

3.4 Requirements of a Fingercasting Scheme

Before we enter into the details of our ﬁngercasting approach, we summarize its requirements: correctness, security, collusion resistance, and frame-proofness. To put it simply, the aim of our ﬁngercasting approach is to generically combine an instance of a BE scheme, a Chameleon scheme, and a ﬁngerprinting scheme


such that the combination inherits the security of BE and Chameleon as well as the collusion resistance of fingerprinting.

To define correctness we first need to clarify how intrusive a fingerprint may be. For a copy to be good, the fingerprint may not perceptibly deteriorate its quality:

Definition 3 (Goodness). Goodness is a predicate Good : M² → {true, false} over two messages M_1, M_2 ∈ M that evaluates their perceptual difference. A fingerprinted copy M^(i) is called good if its perceptual difference to the original content M is below a perceptibility threshold. We denote this with Good(M^(i), M) = true. Otherwise, the copy is called bad.

Definition 4 (Correctness). Let p_bad ≪ 1 be the maximum allowed probability of a bad copy. A fingercasting scheme is correct if the probability for a non-revoked receiver to obtain a bad copy M^(i) of the content M is at most p_bad, where the probability is taken over all coin tosses of the setup and encryption algorithms:

∀M ∈ M, ∀u_i ∈ U \ R : Pr[Good(M, M^(i)) = false] ≤ p_bad

The SSW scheme of [15] uses the goodness predicate ||M^(i) − M|| ≤ √n · δ, where n is the number of content coefficients and δ a goodness criterion.

All relevant BE schemes provide IND-CCA1 security [6,7,8], which is a stronger notion than IND-CPA security. As we aim to achieve at least IND-CPA security, the remaining requirements only relate to the Chameleon scheme CE. We define IND-CPA security of CE by a game between an IND-CPA adversary A and a challenger C: The challenger runs (MT, TF, t) ← KeyGenCE(N, 1^λ, par_CE), generates a secret random session key k_sess and sends (MT, TF, t) to A. A outputs two content items M_0, M_1 ∈ M on which it wishes to be challenged. C picks a random bit b ←_R {0, 1} and sends the challenge ciphertext C_b ← EncCE(MT, k_sess, M_b) to A. Finally, A outputs a guess b′ and wins if b′ = b. We define the advantage of A against CE as

Adv^ind-cpa_{A,CE}(λ) := |Pr[b′ = 0 | b = 0] − Pr[b′ = 0 | b = 1]|.

For further details on security notions we refer to [25].

Definition 5 (IND-CPA security). A Chameleon scheme CE is IND-CPA secure if for every probabilistic polynomial-time IND-CPA adversary A we have that Adv^ind-cpa_{A,CE}(λ) is a negligible function.

We note that in Definition 5, the adversary is not an outsider or third party, but an insider in possession of the master table (not only a receiver table). Nevertheless, the adversary should have a negligible advantage in distinguishing the ciphertexts of two messages of his choice as long as the session key remains secret.

Collusion resistance is defined by the following game between an adversarial coalition A ⊆ U \ R and a challenger C: The challenger runs KeyGenCE on parameters (N, 1^λ, par_CE), generates a ciphertext C ← EncCE(MT, k_sess, M), and gives A the receiver tables RT^(i) of all coalition members as well as the session key k_sess. Then A outputs a document copy M* and wins if for all coalition members the detection algorithm fails (false negative):


Definition 6 (Collusion resistance). Let DetectFP be the fingerprint detection algorithm of the fingerprinting scheme that a Chameleon scheme CE instantiates. Then CE is (q, p_neg)-collusion-resistant if for every probabilistic polynomial-time adversarial coalition A of at most q := |A| colluders we have that

Pr[Good(M*, M) = true, ∀u_i ∈ A : DetectFP(M, M*, CF^(i), t) = false] ≤ p_neg,

where the false negative probability is taken over the coin tosses of the setup algorithm, of the adversarial coalition A, and of the session key k_sess. Note that 1-collusion resistance is also called robustness.

Frame-proofness is similar to collusion resistance, but A wins the game if the detection algorithm accuses an innocent user (false positive).

Definition 7 (Frame-proofness). Let DetectFP be the fingerprint detection algorithm of the fingerprinting scheme that a Chameleon scheme CE instantiates. Then CE is (q, p_pos)-frame-proof if for every probabilistic polynomial-time adversarial coalition A of at most q := |A| colluders we have that

Pr[Good(M*, M) = true, ∃u_i ∉ A : DetectFP(M, M*, CF^(i), t) = true] ≤ p_pos,

where the false positive probability is taken over the coin tosses of the setup algorithm, of the adversarial coalition A, and of the session key k_sess.

In Definitions 6 and 7, the adversarial coalition again consists of insiders in possession of their receiver tables and the session key. Nevertheless, the coalition should have a well-defined and small chance of creating a plaintext copy that incriminates none of the coalition members (collusion resistance) or an innocent user outside the coalition (frame-proofness).

4 Proposed Solution

4.1 High-Level Overview of the Proposed Fingercasting Scheme

To fingercast content, the center uses the BE scheme to send a fresh session key to each non-revoked receiver. This session key initializes a pseudo-random sequence generator. The resulting pseudo-random sequence represents a sequence of addresses in the master table of our new Chameleon scheme. The center encrypts the content with the master table entries to which the addresses refer. Each receiver has a unique receiver table that differs only slightly from the master table. During decryption, these slight differences in the receiver table lead to slight, but characteristic differences in the content copy.

Interaction Details. We divide this approach into the same five steps that we have seen for Chameleon schemes in Section 3.3. First, the key generation algorithm of the fingercasting scheme consists of the key generation algorithms of the two underlying schemes, KeyGenBE and KeyGenCE. The center's master key thus consists of MK, MT and TF. Second, the same observation holds


for the key extraction algorithm of the fingercasting scheme. It consists of the respective algorithms in the two underlying schemes, KeyExtrBE and KeyExtrCE. The secret key of receiver u_i therefore has two elements: SK^(i) and RT^(i).

Third, the encryption algorithm defines how we interlock the two underlying schemes. To encrypt, the center generates a fresh and random session key k_sess ←_R {0, 1}^λ. This session key is broadcast to the non-revoked receivers using the BE scheme: C_BE ← EncBE(MK, R, k_sess). Subsequently, the center uses k_sess to determine addresses in the master table MT of the Chameleon scheme and encrypts with the corresponding entries: C_CE ← EncCE(MT, k_sess, M). The ciphertext of the fingercasting scheme thus has two elements, C_BE and C_CE.

Fourth, the decryption algorithm inverts the encryption algorithm with unnoticeable, but characteristic errors. First of all, each non-revoked receiver u_i recovers the correct session key: k_sess ← DecBE(i, SK^(i), C_BE). Therefore, u_i can recalculate the PRS and the correct addresses in receiver table RT^(i). However, this receiver table is slightly different from the master table. Therefore, u_i obtains a fingerprinted copy M^(i) that is slightly different from the original content: M^(i) ← DecCE(RT^(i), k_sess, C_CE).

Last, the fingerprint detection algorithm of the fingercasting scheme is identical to that of the underlying fingerprinting scheme.

4.2 A New Chameleon Scheme

Up to now, we have focused on the straightforward aspects of our approach; we have neglected the intrinsic difficulties and the impact of the requirements on the Chameleon scheme. In the sequel, we will show a specific Chameleon scheme that fulfills all of them. We design it in such a way that its content fingerprints can emulate any additive fingerprinting scheme, which we later instantiate with the SSW scheme as proof of concept.

Key Generation. To define this algorithm, we need to determine how the center generates the master table MT and the table fingerprints TF. To generate MT, the center chooses L table entries at random from the interval [0, z] with independent uniform distribution: mt_α ←_R [0, z] for all α ∈ {1, . . . , L}. As the table entries will be addressed with bit words, we select L = 2^l such that l indicates the number of bits needed to define the binary address of an entry in the table. The center thus obtains the master table MT := (mt_1, mt_2, . . . , mt_L).

To generate the table fingerprints TF := (TF^(1), . . . , TF^(N)), the center selects for each receiver u_i and each master table entry mt_α a fingerprint coefficient in order to disturb the original entry. Specifically, each fingerprint coefficient tf_α^(i) of table fingerprint TF^(i) is independently distributed according to the probability distribution ProDis of the additive fingerprinting scheme, but scaled down with an attenuation factor f ∈ R, f ≥ 1:

tf_α^(i) ← 1/f · ProDis(par_FP)    (1)
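A minimal sketch of this key generation step, with the normal distribution N(0, σ) standing in for ProDis as in the SSW instantiation. The function and parameter names are ours; realistic parameter choices are discussed in Section 5.

```python
import random

def keygen_ce(N, L, z, f, sigma):
    """Generate the master table MT and the table fingerprints TF.

    MT has L entries drawn uniformly from [0, z]; each receiver's
    fingerprint coefficients follow N(0, sigma), attenuated by 1/f (Eq. (1)).
    """
    master_table = [random.uniform(0, z) for _ in range(L)]
    table_fps = [[random.gauss(0, sigma) / f for _ in range(L)]
                 for _ in range(N)]
    return master_table, table_fps
```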

Key Extraction. After the probabilistic key generation algorithm we now describe the deterministic key extraction algorithm. The center processes table


Fig. 1. Receiver table derivation and ciphertext calculation. (a) To derive RT^(i) from MT, the center subtracts the L fingerprint coefficients tf_α^(i) at address α for all α ∈ {1, . . . , L}. (b) To derive ciphertext C from plaintext M, the center uses the session key to generate a PRS. It then adds the addressed master table entries to the plaintext.

fingerprint TF^(i) := (tf_1^(i), . . . , tf_L^(i)) of receiver u_i as follows: The center subtracts each fingerprint coefficient in TF^(i) from the corresponding master table entry to obtain the receiver table entry, which we illustrate in Fig. 1(a):

∀α ∈ {1, . . . , L} : rt_α^(i) ← (mt_α − tf_α^(i)) mod p    (2)
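Equation (2), together with the integer scaling described in Remark 1 below, can be sketched as follows. The scaling factor ρ and the rounding policy are illustrative choices of ours:

```python
def key_extr_ce(master_table, table_fp, rho, p):
    """Derive a receiver table per Eq. (2): scale the real-valued entries
    to the integer domain by rho, then subtract the fingerprint mod p."""
    return [(round(rho * mt) - round(rho * tf)) % p
            for mt, tf in zip(master_table, table_fp)]
```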

Remark 1. The modulo operator allows only integer values to be added. However, the master table, the table fingerprints and the content coefficients are based on real numbers with finite precision. We solve this ostensible contradiction by scaling the real values to the integer domain by an appropriate scaling factor ρ, possibly ignoring further decimal digits. ρ must be chosen large enough to allow a computation in the integer domain with a sufficiently high precision. We implicitly assume this scaling to the integer domain whenever real values are used. For example, with real-valued variables rt^(i), mt, and tf^(i) the operation rt^(i) ← (mt − tf^(i)) mod p actually stands for ρ · rt^(i) ← (ρ · mt − ρ · tf^(i)) mod p. The group order p := ρ · z + 1 is defined by the content space [0, z] (see Section 3.2) and the scaling factor ρ.

Encryption. Fig. 1(b) gives an overview of the encryption algorithm. The session key k_sess is used as the seed of a PRSG with expansion function len(|k_sess|) ≥ n · s · l, where parameter s will be specified below. To give a practical example for a PRSG, k_sess may serve as the key for a conventional block cipher, e.g., AES or


triple DES,³ in output feedback mode. Each block of l bits of the pseudo-random sequence is interpreted as an address β in the master table MT. For each coefficient of the plaintext, the center uses s addresses that define s entries of the master table. In total, the center obtains n · s addresses that we denote with β_{j,k}, where j is the coefficient index, k the address index, and Extract_i extracts the i-th block of length l from its input string:

∀j ∈ {1, . . . , n}, ∀k ∈ {1, . . . , s} : β_{j,k} ← Extract_{(j−1)·s+k}(G(k_sess))    (3)
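The address derivation of equation (3) can be sketched as follows. For illustration only, we use SHA-256 in counter mode as the PRSG G instead of a block cipher in output feedback mode; the function and parameter names are ours.

```python
import hashlib

def prs_addresses(k_sess: bytes, n: int, s: int, l: int):
    """Expand the session key into n*s table addresses of l bits each:
    beta[j][k] = Extract_{(j-1)*s+k}(G(k_sess)) as an integer in [0, 2^l)."""
    bits = ""
    counter = 0
    while len(bits) < n * s * l:
        block = hashlib.sha256(k_sess + counter.to_bytes(8, "big")).digest()
        bits += "".join(f"{byte:08b}" for byte in block)
        counter += 1
    return [[int(bits[(j * s + k) * l:(j * s + k + 1) * l], 2)
             for k in range(s)] for j in range(n)]
```

Since the expansion is deterministic, every receiver that recovers k_sess recomputes exactly the same address sequence.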

For each content coefficient, the center adds the s master table entries modulo the group order. In Fig. 1(b), we illustrate the case s = 4, which is the design choice in the original Chameleon cipher. The j-th coefficient c_j of the ciphertext C is calculated as

∀j ∈ {1, . . . , n} : c_j ← (m_j + Σ_{k=1}^{s} mt_{β_{j,k}}) mod p,    (4)

where mt_{β_{j,k}} denotes the master table entry referenced by address β_{j,k} from (3).

Decryption. The decryption algorithm proceeds in the same way as the encryption algorithm with two exceptions. First, the receiver has to use its receiver table RT^(i) instead of MT. Second, the addition is replaced by subtraction. The j-th coefficient m_j^(i) of the plaintext copy M^(i) of receiver u_i is thus calculated as

m_j^(i) ← (c_j − Σ_{k=1}^{s} rt^(i)_{β_{j,k}}) mod p,    (5)

where rt^(i)_{β_{j,k}} denotes the receiver table entry of receiver u_i referenced by address β_{j,k} generated in (3). As the receiver table RT^(i) slightly differs from the master table MT, the plaintext copy M^(i) obtained by receiver u_i slightly differs from the original plaintext M. By appropriately choosing the attenuation factor f in (1), the distortion of M^(i) with respect to M is the same as that of the instantiated fingerprinting scheme and goodness is preserved (see Section 4.3).

Fingerprint Detection. When the center detects an illegal copy M* = (m_1*, . . . , m_n*) of content M, it tries to identify the receivers that participated in the generation of M*. To do so, the center verifies whether the fingerprint of a suspect receiver u_i is present in M*. Obviously, the fingerprint is unlikely to appear in its original form; an adversary may have modified it by applying common attacks such as resampling, requantization, compression, cropping, and rotation. Furthermore, the adversary may have applied an arbitrary combination of these known attacks and other yet unknown attacks. Finally, an adversarial coalition may have colluded and created M* using several different copies of M.

The fingerprint detection algorithm is identical to that of the underlying fingerprinting scheme: dec ← DetectFP(M, M*, CF^(i), t). In order to properly scale

³ Advanced Encryption Standard [26] and Data Encryption Standard [27].

Fingercasting–Joint Fingerprinting and Decryption of Broadcast Messages

13

the content fingerprint, we need to select the attenuation factor $f$ in (1). We choose it such that the addition of $s$ attenuated fingerprint coefficients generates a random variable that follows $ProDis$ without attenuation (for an example see Section 4.3). In order to verify whether the table fingerprint $TF^{(i)}$ of receiver $u_i$ left traces in $M^*$, DetectFP calculates the similarity between the detected content fingerprint $CF^*$ with coefficients $cf_j^* := m_j^* - m_j$ and the content fingerprint $CF^{(i)}$ in $u_i$'s copy $M^{(i)}$ with

$$cf_j^{(i)} := m_j^{(i)} - m_j \overset{(4),(5)}{=} \sum_{k=1}^{s} \left( mt_{\beta_{j,k}} - rt^{(i)}_{\beta_{j,k}} \right) \overset{(2)}{=} \sum_{k=1}^{s} tf^{(i)}_{\beta_{j,k}}, \tag{6}$$

where $tf^{(i)}_{\beta_{j,k}}$ is the fingerprint coefficient that fingerprinted receiver table $RT^{(i)}$ at address $\alpha = \beta_{j,k}$ in (2). If the similarity is above threshold $t$, the center declares $u_i$ guilty. Note that the calculation of $CF^*$ necessitates the original content $M$, whereas the calculation of $CF^{(i)}$ relies on the session key $k^{sess}$ and the table fingerprint $TF^{(i)}$; the scheme is thus non-blind in its current version. However, we assume it is possible to design an extended scheme with a blind detection algorithm: if instantiated with Spread Spectrum Watermarking, the watermark is often robust enough to be detected even in the absence of the original content. The same algorithm applies to the detection of fingerprints in illegal copies of receiver tables. Their fingerprints have the same construction and statistical properties; the attenuated amplitude of the fingerprint coefficients in (1) is compensated by a higher number of coefficients, as the relation $L/f \approx n$ holds for practical parameter choices (see Section 5.1). When the center detects the fingerprint of a certain user in an illegal content copy or an illegal receiver table, there are two potential countermeasures with different security and performance trade-offs. The first is to simply revoke the user in the BE scheme such that the user's BE decryption key becomes useless and no longer grants access to the session key. However, the user's receiver table still allows decryption of content if yet another user illegally shares the session key. In the Internet age this is a valid threat: two illegal users may collude such that one publishes the receiver table (and gets caught) while the other anonymously publishes the session keys (and does not get caught).
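The interplay of (4), (5), and (6) can be sketched in a few lines of Python. This is a toy model with made-up parameters (small $Z$, $L$, $s$, $n$) and a $\pm 1$ table fingerprint standing in for the attenuated coefficients of (1)/(2); all names are illustrative, not the paper's notation:

```python
import random

# Toy parameters (illustrative only; the paper uses e.g. Z = 2**16, L = 8*Z, s = 64).
Z = 251          # group order p = |K| = Z
L = 64           # master table length
s = 4            # draws per content coefficient, as in the original Chameleon cipher
n = 10           # number of content coefficients

rng = random.Random(1)

# Master table MT: L entries drawn uniformly from {0, ..., Z-1}.
MT = [rng.randrange(Z) for _ in range(L)]

# Table fingerprint TF: a small integer perturbation stands in for the
# attenuated fingerprint coefficients; a real instantiation would draw
# them from the scaled distribution of (1).
TF = [rng.choice([-1, 0, 1]) for _ in range(L)]

# Receiver table: each entry is the master table entry minus the
# fingerprint coefficient (mod Z), in the spirit of (2).
RT = [(MT[a] - TF[a]) % Z for a in range(L)]

# Session-key-derived addresses beta[j][k], modelled here as uniform draws (3).
beta = [[rng.randrange(L) for _ in range(s)] for _ in range(n)]

def encrypt(M):
    """Equation (4): c_j = m_j + sum_k MT[beta_{j,k}] mod Z."""
    return [(M[j] + sum(MT[beta[j][k]] for k in range(s))) % Z for j in range(n)]

def decrypt(C):
    """Equation (5): m_j^(i) = c_j - sum_k RT[beta_{j,k}] mod Z."""
    return [(C[j] - sum(RT[beta[j][k]] for k in range(s))) % Z for j in range(n)]

M = [rng.randrange(Z) for _ in range(n)]
C = encrypt(M)
M_i = decrypt(C)

# Equation (6): the receiver's copy differs from M exactly by the sum of the
# table fingerprint coefficients addressed during decryption.
for j in range(n):
    assert (M_i[j] - M[j]) % Z == sum(TF[beta[j][k]] for k in range(s)) % Z
```

The assertions confirm that decryption with the fingerprinted receiver table embeds the content fingerprint of (6) as a side effect of decryption, which is the core idea of fingercasting.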
Nevertheless, we stress that this weakness, namely the non-traceability of session keys, is common to all revocation BE schemes because the session key is identical for all users and therefore does not allow tracing.⁴ In order to avoid this weakness, the other potential countermeasure is to not only revoke the user whose receiver table was illegally shared, but also renew the master table and redistribute the new receiver tables. If the broadcast channel has enough spare bandwidth, the center can broadcast the receiver tables individually to all receivers in off-peak periods, i.e., when the channel's bandwidth

⁴ The common assumption for revocation BE schemes is that it is difficult to share the session key anonymously on a large scale without being caught. Even if key sharing may be possible on a small scale, e.g., among family and friends, the main goal is to allow revocation of a user that shared the decryption key or session keys and got caught, no matter by which means of technical or legal tracing.

A. Adelsbach, U. Huber, and A.-R. Sadeghi
is not fully used for regular transmission. The relevant BE schemes [6,7,8] allow the center to encrypt each receiver table individually for the corresponding receiver such that only this receiver can decrypt and obtain access.⁵ If the broadcast channel's bandwidth is too low, then the receiver tables need to be redistributed as in the initial setup phase, e.g., via smartcards.

Parameter Selection. The new Chameleon scheme has two major parameters $L$ and $s$ that allow a trade-off between the size of $RT^{(i)}$, which $u_i$ has to store, and the computation cost, which grows linearly with the number $s$ of addresses per content coefficient in (4). By increasing $L$, we can decrease $s$ in order to trade computation cost for storage size. Further details follow in Section 5.1.

4.3 Instantiation with Spread Spectrum Watermarking

In this section, we instantiate the fingerprinting scheme with the SSW scheme of [15] and thereby inherit its collusion resistance and frame-proofness. Let the center choose the SSW scheme's parameters $par'_{FP} = (\delta', p'_{bad}, p'_{pos})$, which allow the calculation of a standard deviation $\sigma'$ and a threshold $t'$ via two functions $f_{\sigma'}(N', n', \delta', p'_{bad})$ and $f_{t'}(\sigma', N', p'_{pos})$ defined in [15]. The probability distribution of the SSW scheme is then $ProDis = N(0, \sigma')$. We set $f = s$ because then $\frac{1}{f} \cdot N(0, \sigma')$ in (1) is still a normal distribution with mean 0 and standard deviation $\frac{1}{\sqrt{s}} \cdot \sigma'$, and adding $s$ of those variables in (4) and (5) leads to the required random variable with standard deviation $\sigma'$. It remains to define the similarity measure for the detection algorithm $dec \leftarrow \mathrm{DetectFP}(M, M^*, CF^{(i)}, t')$, which [15] defines as: $dec = \mathrm{true}$ if

$$\frac{CF^* \cdot CF^{(i)}}{\|CF^*\|} > t'$$

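The normalized-correlation test of the SSW detector can be illustrated as follows. This is a hypothetical sketch: the helper name `detect_fp`, the threshold value, and the noise model are our illustrative choices, not taken from [15]:

```python
import math
import random

def detect_fp(cf_star, cf_i, t):
    """SSW similarity test: declare a match if <CF*, CF^(i)> / ||CF*|| > t."""
    dot = sum(a * b for a, b in zip(cf_star, cf_i))
    norm = math.sqrt(sum(a * a for a in cf_star))
    return dot / norm > t

rng = random.Random(7)
n = 10_000
sigma = 1.0

# Receiver u_i's content fingerprint: n i.i.d. N(0, sigma) coefficients.
cf_i = [rng.gauss(0.0, sigma) for _ in range(n)]

# An illegal copy that contains u_i's fingerprint plus heavy noise,
# and an unrelated copy that does not contain it.
noisy_copy = [c + rng.gauss(0.0, 2.0) for c in cf_i]
unrelated = [rng.gauss(0.0, sigma) for _ in range(n)]

t = 10.0  # illustrative threshold; [15] derives t' from (sigma', N', p'_pos)
assert detect_fp(noisy_copy, cf_i, t)       # fingerprint detected despite noise
assert not detect_fp(unrelated, cf_i, t)    # no false accusation
```

Even with noise of twice the fingerprint amplitude, the correlation with the correct fingerprint is far above the threshold, while an unrelated copy stays near zero; this robustness is what the collusion-resistance analysis of [15] quantifies.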
We call an instantiation exact if it achieves the same statistical properties as the fingerprinting scheme that it instantiates. Theorem 2 below states that the above choice is an exact instantiation of the SSW scheme.

Theorem 2. Let $\sigma'$ and $\sigma$ be the standard deviations of the SSW scheme and the Chameleon scheme instantiated with SSW, respectively, and $n'$ and $n$ be their numbers of content coefficients. Then the following mapping between both schemes is an exact instantiation:

$$\sigma' = \sqrt{s} \cdot \sigma \;\;(\Leftrightarrow f = s) \quad \text{and} \quad n = n'$$

Towards the proof of Theorem 2. We prove an even stronger result than Theorem 2. In addition to the exactness of the instantiation, we also prove that it is optimal to fingerprint every entry of the receiver tables. To do so, we first formulate Lemmata 1–4 and then describe why they imply Theorem 2. For

⁵ In all of these schemes, the center shares with each user an individual secret, which they can use for regular symmetric encryption.
the Lemmata, we introduce a parameter $F \in \{1, 2, \ldots, L\}$ that describes the number of receiver table entries that obtain a fingerprint coefficient $tf^{(i)}_{\alpha}$ in (2). The positions of the $F$ fingerprinted entries in the receiver table are selected with uniform distribution. We show that the choice $F = L$ is optimal in the sense that the resulting instantiation is exact. The difficulty in analyzing the SSW instantiation is that each content coefficient is fingerprinted not only with a single fingerprint coefficient as in SSW, but with up to $s$ such variables, as can be seen from (6). Note that for $F < L$ some receiver table entries do not receive a fingerprint coefficient and are therefore identical to the corresponding master table entry. In order to analyze the statistical properties of the resulting fingerprint, we will need to calculate the expectation and variance of two parameters that link the instantiation to the original SSW scheme. The first parameter is the number $N^{fp}$ of fingerprint coefficients $tf^{(i)}$ that are added to a content coefficient $m_j$ by using the receiver table $RT^{(i)}$ in (5) instead of the master table $MT$ in (4). In SSW, $N^{fp}$ has the constant value 1, i.e., a content fingerprint consists of one fingerprint coefficient per content coefficient, whereas in our scheme $N^{fp}$ varies between 0 and $s$, as shown in (6). If only $F$ of the $L$ receiver table entries have been fingerprinted, then $tf^{(i)} = 0$ for the remaining $L - F$ entries. The second parameter is the number of content coefficients that carry a detectable content fingerprint. In SSW, this number has the constant value $n'$, i.e., every coefficient carries a fingerprint with fixed standard deviation, whereas in our scheme some of the $n$ coefficients may happen to receive no or only few fingerprint coefficients $tf^{(i)}$. Specifically, this happens when the receiver table entry $rt^{(i)}_{\beta_{j,k}}$ of (5) did not receive a fingerprint coefficient in (2) for $F < L$.
The next lemma gives the number of normally distributed table fingerprint coefficients that our scheme adds to a content coefficient. This number is a random variable characterized by its expectation and variance. We prove the lemmata under the uniform sequence assumption, i.e., the assumption that the sequence used to select the addresses from the master table has independent uniform distribution. We stress that we only use it to find the optimal mapping with SSW; security and collusion resistance of the proposed scheme do not rely on this assumption for the final choice of parameters (see the end of this section).⁶

Lemma 1. Let $N^{fp}$ be the random variable counting the number of fingerprinted receiver table entries with which a coefficient $m_j^{(i)}$ of copy $M^{(i)}$ is fingerprinted. Then the probability of obtaining $k \in \{0, \ldots, s\}$ fingerprinted entries is

$$\Pr\left[N^{fp} = k\right] = \binom{s}{k} \left(\frac{F}{L}\right)^{k} \left(1 - \frac{F}{L}\right)^{s-k}$$

The expectation and the variance of $N^{fp}$ are

$$E(N^{fp}) = s \cdot \frac{F}{L} \quad \text{and} \quad \sigma^2_{N^{fp}} := E\left(\left[N^{fp} - E(N^{fp})\right]^2\right) = s \cdot \frac{F}{L}\left(1 - \frac{F}{L}\right).$$

⁶ Note that even if this was not the case, we can show that the adversary's advantage is still negligible by a simple reduction argument.

Proof. During decryption, the receiver subtracts $s$ receiver table entries $rt^{(i)}_{\alpha}$ from the ciphertext coefficient using (5). Each entry $rt^{(i)}_{\alpha}$ is either fingerprinted or not. Under the uniform sequence assumption, the addresses of the subtracted entries $rt^{(i)}_{\alpha}$ have independent uniform distribution. In addition, the $F$ fingerprinted entries are distributed over $RT^{(i)}$ with independent uniform distribution. Therefore, the probability that a single address $\alpha = \beta_{j,k}$ in (5) points to a fingerprinted receiver table entry $rt^{(i)}_{\alpha}$ is $F/L$, which is the number of fingerprinted receiver table entries divided by the total number of entries. As the underlying experiment is a sequence of $s$ consecutive yes-no experiments with success probability $F/L$, it follows that $N^{fp}$ has binomial distribution. This implies the probability, the expectation, and the variance.

Lemma 1 allows us to determine how many fingerprint coefficients we can expect in each content coefficient and how the number of such fingerprint coefficients varies. The next question is what kind of random variable results from adding $N^{fp}$ fingerprint coefficients.

Lemma 2. By adding a number $N^{fp}$ of independent $N(0, \sigma)$-distributed fingerprint coefficients, the resulting random variable has normal distribution with mean 0 and standard deviation $\sqrt{N^{fp}} \cdot \sigma$.

Proof. Each fingerprint coefficient is independently distributed according to the normal distribution $N(0, \sigma)$. When two independent and normally distributed random variables are added, the resulting random variable is also normally distributed, while the means and the variances add up. Due to linearity, the resulting standard deviation for $N^{fp}$ random variables is $\sqrt{N^{fp} \cdot \sigma^2} = \sqrt{N^{fp}} \cdot \sigma$.

In order to fingerprint the content coefficients with the same standard deviation $\sigma'$ as in the SSW scheme, the natural choice is to choose $\sigma$ such that $\sqrt{E(N^{fp})} \cdot \sigma = \sigma'$.
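Lemma 1 and the resulting choice of $\sigma$ are easy to check numerically. The parameter values below are arbitrary illustrative choices:

```python
from math import comb, sqrt, isclose

s, F, L = 64, 3, 8          # illustrative: s draws, F of L entries fingerprinted
p = F / L

# Lemma 1: N^fp is binomially distributed with parameters s and F/L.
pmf = [comb(s, k) * p**k * (1 - p)**(s - k) for k in range(s + 1)]
E = sum(k * pmf[k] for k in range(s + 1))
Var = sum((k - E) ** 2 * pmf[k] for k in range(s + 1))

assert isclose(sum(pmf), 1.0)
assert isclose(E, s * F / L)                      # E(N^fp) = s * F/L
assert isclose(Var, s * (F / L) * (1 - F / L))    # variance = s * (F/L)(1 - F/L)

# Choice of sigma after Lemma 2: to obtain the SSW deviation sigma' on
# average, choose sigma with sqrt(E(N^fp)) * sigma = sigma'.
sigma_prime = 1.5                                 # arbitrary SSW deviation
sigma = sigma_prime / sqrt(E)
assert isclose(sqrt(E) * sigma, sigma_prime)
```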
The remaining question is how many content coefficients are actually fingerprinted; note that due to the randomness of $N^{fp}$, some content coefficients may receive more fingerprint coefficients than others. We determine the expected number of fingerprinted content coefficients in the next two lemmata, while we leave open how many fingerprint coefficients are needed for detection:

Lemma 3. Let $N^{fp}_{min} \in \{1, \ldots, s\}$ be the minimum number of table fingerprint coefficients needed to obtain a detectable fingerprint in content coefficient $m_j^{(i)}$. Then the probability $p^{fing}$ that coefficient $m_j^{(i)}$ of copy $M^{(i)}$ obtains at least $N^{fp}_{min}$ fingerprint coefficients is

$$p^{fing} = \sum_{k=N^{fp}_{min}}^{s} \binom{s}{k} \left(\frac{F}{L}\right)^{k} \left(1 - \frac{F}{L}\right)^{s-k}$$

Proof. The lemma is a corollary of Lemma 1, obtained by adding the probabilities of all events whose value of $N^{fp}$ is greater than or equal to $N^{fp}_{min}$.


Lemma 4. Let $N^{fing} \in \{0, \ldots, n\}$ be the random variable counting the number of fingerprinted coefficients. Then the expectation of $N^{fing}$ is

$$E(N^{fing}) = \sum_{j=0}^{n} j \binom{n}{j} \left(p^{fing}\right)^{j} \left(1 - p^{fing}\right)^{n-j} = n \cdot p^{fing}$$

Proof. The lemma follows from the fact that $N^{fing}$ has binomial distribution with success probability $p^{fing}$ and $n$ experiments.

Given Lemmata 1–4, we can derive some of the parameters of our scheme from SSW. Suppose that the center has already selected the parameters of the SSW scheme such that the requirements on the number of receivers and on collusion resistance are met. This includes the choice of $N'$, $n'$, and $par'_{FP} = par_{CE} := (\delta', p'_{bad}, p'_{pos})$; it allows the center to derive $\sigma'$ and $t'$ of SSW based on the functions $f_{\sigma'}(N', n', \delta', p'_{bad})$ and $f_{t'}(\sigma', N', p'_{pos})$, which are defined in [15]. Based on the center's selection, we can derive the parameters $n$, $F/L$, and $\sigma$ in our Chameleon scheme as follows. Our first aim is to achieve the same expected standard deviation in the content coefficients of our scheme as in SSW, i.e., $\sigma' = \sqrt{E(N^{fp})} \cdot \sigma$, which by Lemma 1 leads to $\sigma' = \sqrt{s \cdot F/L} \cdot \sigma$. Our second aim is to minimize the variance of $N^{fp}$ in order to have $N^{fp} = E(N^{fp})$ not only on average, but for as many content coefficients as possible, where $N^{fp} = E(N^{fp})$ implies that the content coefficient in our scheme obtains a fingerprint with the same statistical properties as in SSW. The two minima of $\sigma^2_{N^{fp}} = s \cdot F/L \cdot (1 - F/L)$ are $F/L = 0$ and $F/L = 1$, of which only the second is meaningful. $F/L = 1$, or $F = L$, is the case where all entries of the master table are fingerprinted. As this optimum case leads to a variance of $\sigma^2_{N^{fp}} = 0$ and $N^{fp} = s$, the content coefficients of our scheme and SSW have the same statistical properties. This proves Theorem 2 and the claim that all table entries should be fingerprinted. With $F/L = 1$ and $\sigma' = \sqrt{s} \cdot \sigma$, we obtain $\Pr[N^{fp} = s] = 1$ by Lemma 1 and $p^{fing} = 1$ by Lemma 3. Finally, we conclude that $E(N^{fing}) = n \cdot p^{fing} = n$ by Lemma 4 and set $n = n'$.
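The chain from Lemma 3 to Lemma 4 and the optimality of $F = L$ can be checked numerically. The helper name `p_fing` and all parameter values are illustrative:

```python
from math import comb, isclose

def p_fing(s, F, L, n_fp_min):
    """Lemma 3: probability that a content coefficient receives at least
    n_fp_min fingerprint coefficients (binomial tail with success prob F/L)."""
    p = F / L
    return sum(comb(s, k) * p**k * (1 - p) ** (s - k)
               for k in range(n_fp_min, s + 1))

s, L, n = 64, 8, 10_000

# Optimal choice F = L: every draw hits a fingerprinted entry, so even the
# strictest detectability requirement (N^fp_min = s) is met with certainty ...
assert isclose(p_fing(s, L, L, n_fp_min=s), 1.0)

# ... and by Lemma 4, E(N^fing) = n * p^fing = n: every coefficient is fingerprinted.
assert isclose(n * p_fing(s, L, L, n_fp_min=1), n)

# For F < L the expected number of sufficiently fingerprinted coefficients drops.
assert n * p_fing(s, L // 2, L, n_fp_min=s) < n
```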
We stress that the equalities hold even if we replace the uniform sequence with a pseudo-random sequence; for $F = L$ the equations $N^{fp} = s$ and $N^{fing} = n$ are obviously independent of the uniform distribution of the sequence of addresses into the master table. We note that the number $s$ of addresses per content coefficient, introduced in (4), is still undetermined and may be chosen according to the security requirements (see Section 4.4).

4.4 Analysis

Correctness, Collusion Resistance and Frame-Proofness. Correctness follows from the correctness of the two underlying schemes, i.e., the BE scheme and the Chameleon scheme. Correctness of the Chameleon scheme follows from the correctness of the underlying ﬁngerprinting scheme, which we can instantiate exactly by properly choosing the scaling factor in (1) and thus making the content ﬁngerprint of (6) identical to a ﬁngerprint of the instantiated ﬁngerprinting
scheme. Collusion resistance and frame-proofness of content and receiver tables follow from the collusion resistance and frame-proofness of the instantiated fingerprinting scheme. The mapping in Section 4.3 is an exact instantiation of the SSW scheme and therefore inherits its collusion resistance and frame-proofness (see Theorem 1). We note that the proof of Theorem 1, which appears in [15], covers both collusion resistance and frame-proofness, although the original text of the theorem only seems to cover collusion resistance. Collusion resistance, related to false negatives, is shown in [15, Section 3.4], whereas frame-proofness, related to false positives, is shown in [15, Section 3.2].

IND-CPA Security. We reduce the security of our Chameleon scheme to that of the PRSG with which it is instantiated. In order to prove IND-CPA security, we prove that the key stream produced by the Chameleon scheme is pseudo-random (see Definition 1). IND-CPA security of the proposed scheme then follows by a simple reduction argument (see [28, Section 5.3.1]). To further strengthen the proof, we assume that the adversary is in possession of the master table and all receiver tables, although in practice the adversary only has one or several receiver tables. By scaling the real values of the content coefficients to the integer domain (see Remark 1), we obtain a plaintext symbol space $P$ with a cardinality $Z$ defined by the content and the scaling factor $\rho$. In the remainder of this section we assume that the plaintext symbol space $P$ and the key symbol space $K$ are equal to $\{0, 1, \ldots, Z-1\}$. We make this assumption to simplify our notation, but stress that this is no restriction, as there is a one-to-one mapping between the actual plaintext symbol space $[0, z]$ and the scaled space $\{0, 1, \ldots, Z-1\}$, which enumerates the elements of $[0, z]$ starting from 0.⁷ In the sequel, by key symbols we mean the elements of $K$.
We also note that the obvious choice for the group order $p$ is the size of the symbol space: $p = |K| = Z$. This ensures identical sizes of plaintext and ciphertext space. The proof is divided into four major steps. First, we show the properties of the random variable that results from a single draw from the master table (Lemma 5). Second, we define these properties as the starting point of an iteration on the number $s$ of draws from the master table (Definition 8). Third, we prove that the random variable that results from adding randomly drawn master table entries improves with every draw, where improving means being statistically closer to a truly random variable (Lemma 6). Last, we prove the pseudo-randomness of the Chameleon scheme's key stream (Theorem 3).

Lemma 5. Let $\Pr[X^{(1)} = x]$ denote the probability of drawing the key symbol $x \in K$ in a single draw from master table $MT$. Let $\eta_k \in \{0, 1, \ldots, L\}$ denote the number of times that key symbol $x_k \in K$ appears in $MT$. When we select a master table entry at a random address with uniform distribution, then the probability of obtaining key symbol $x_k \in K$ is $p_k := \Pr[X^{(1)} = x_k] = \frac{\eta_k}{L}$.

⁷ Note that $[0, z]$ consists of real numbers with finite precision. As pointed out in Remark 1, these real numbers are mapped to integers by applying a scaling factor $\rho$.


Proof. There are $L$ entries in the master table. Due to the uniform distribution of the selected address, each master table entry has the same probability of being selected. Therefore, the probability of a specific key symbol $x_k \in K$ being selected is the number $\eta_k$ of occurrences of $x_k$ in the master table divided by the total number $L$ of master table entries.

For a single draw from the master table, the resulting random variable thus only depends on the numbers of occurrences of the key symbols within the master table. As the master table entries are generated with uniform distribution, the frequencies are unlikely to be identical for each key symbol, leading to a non-uniform and therefore insecure distribution $\Pr[X^{(1)}]$.

Definition 8 (Strong convergence). Let $U$ be a random variable uniformly distributed over the key symbol space. Let the statistical quality $SQ^{(1)}$ of $MT$ be the statistical difference between $X^{(1)}$ and $U$:

$$SQ^{(1)} := \frac{1}{2} \sum_{k=0}^{Z-1} \left| p_k^{(1)} - \frac{1}{Z} \right|.$$

We call the master table strongly converging if $2 \cdot SQ^{(1)} \leq d$ for some $d \in \mathbb{R}$ such that $d < 1$.

The statistical quality $SQ^{(1)}$ is thus a measure for the initial suitability of the master table for generating a uniform distribution. The next lemma is the main result of the security analysis; it proves that the statistical quality $SQ^{(s)}$ gets better with every one of the $s$ draws.

Lemma 6. Let $U$ be a random variable uniformly distributed over the key symbol space. Let $MT$ be a strongly converging master table. Let $X_k$ denote the $k$-th draw from $MT$ and $X^{(s)}$ the random variable resulting from $s$ independent uniformly distributed draws added modulo $Z$: $X^{(s)} := \sum_{k=1}^{s} X_k \bmod Z$. Then the statistical difference $SQ^{(s)}$ between $X^{(s)}$ and $U$ is a negligible function with an upper bound of $\frac{1}{2} d^s$.

Proof. The proof is by induction. For all $k \in K$, let $p_k^{(i)} := \Pr[X^{(i)} = k]$ denote the probability of the event that in the $i$-th iteration the random variable $X^{(i)}$ takes the value of key symbol $k$. Represent this probability with an additive error $e_k^{(i)}$ such that $p_k^{(i)} = \frac{1}{Z}(1 + e_k^{(i)})$. Due to $\sum_{k=0}^{Z-1} p_k^{(i)} = 1$, we obtain $\sum_{k=0}^{Z-1} e_k^{(i)} = 0$. The induction start is trivially fulfilled by every strongly converging master table: $SQ^{(1)} \leq \frac{1}{2} d$. As the induction hypothesis, we have $SQ^{(i)} \leq \frac{1}{2} d^i$, where $SQ^{(i)} := \frac{1}{2} \sum_{k=0}^{Z-1} \left| p_k^{(i)} - \frac{1}{Z} \right| = \frac{1}{2Z} \sum_{k=0}^{Z-1} |e_k^{(i)}|$. The induction claim is $SQ^{(i+1)} \leq \frac{1}{2} d^{i+1}$. The induction step follows: Iteration $i+1$ is defined as $X^{(i+1)} := \sum_{k=1}^{i+1} X_k \bmod Z$, which is equal to $X^{(i+1)} = X^{(i)} + X_{i+1} \bmod Z$, where $X_{i+1}$ is just a single draw with the probabilities $p_k$ from Lemma 5 and error representation $p_k = \frac{1}{Z}(1 + e_k)$ such that $\sum_{k=0}^{Z-1} e_k = 0$. Therefore, we obtain for all $k \in K$ that

$$\Pr\left[X^{(i+1)} = k\right] = \sum_{j=0}^{Z-1} \Pr\left[X^{(i)} = j\right] \cdot \Pr\left[X_{i+1} = (k-j) \bmod Z\right]$$

$$= \sum_{j=0}^{Z-1} p_j^{(i)} \, p_{(k-j) \bmod Z} = \frac{1}{Z^2} \sum_{j=0}^{Z-1} \left(1 + e_j^{(i)}\right)\left(1 + e_{(k-j) \bmod Z}\right)$$

$$= \frac{1}{Z^2} \Bigg( Z + \underbrace{\sum_{j=0}^{Z-1} e_j^{(i)}}_{=0} + \underbrace{\sum_{j=0}^{Z-1} e_{(k-j) \bmod Z}}_{=0} + \sum_{j=0}^{Z-1} e_j^{(i)} e_{(k-j) \bmod Z} \Bigg) = \frac{1}{Z} + \frac{1}{Z^2} \sum_{j=0}^{Z-1} e_j^{(i)} e_{(k-j) \bmod Z}$$

The upper bound for the statistical difference in iteration $i+1$ is

$$SQ^{(i+1)} := \frac{1}{2} \sum_{k=0}^{Z-1} \left| \Pr\left[X^{(i+1)} = k\right] - \frac{1}{Z} \right| = \frac{1}{2Z^2} \sum_{k=0}^{Z-1} \left| \sum_{j=0}^{Z-1} e_j^{(i)} e_{(k-j) \bmod Z} \right|$$

$$\leq \frac{1}{2Z^2} \left( \sum_{k=0}^{Z-1} |e_k^{(i)}| \right) \left( \sum_{k=0}^{Z-1} |e_k| \right) = 2 \cdot SQ^{(i)} \cdot SQ^{(1)} \leq \frac{1}{2} d^{i+1},$$

where the first inequality follows from the fact that the two sums on the left-hand side run over every combination $e_j^{(i)} e_{(k-j) \bmod Z}$, which may have opposite signs, whereas the right-hand side adds the absolute values of all combinations, avoiding any mutual cancellation of combinations with opposite signs.

Note that the proof relies on the uniform sequence assumption, i.e., the addresses used to point into the master table have independent uniform distribution. Clearly, this assumption has to be slightly weakened in practice by replacing true randomness with pseudo-randomness. In Theorem 3 we therefore show that we can use pseudo-randomness without compromising security. The idea is to reduce an attack on the Chameleon key stream to an attack on the PRSG itself:

Theorem 3. Let $U$ be a random variable uniformly distributed over the key symbol space. Let $MT$ be a strongly converging master table. Let the number $s(\lambda')$ of draws from $MT$ be a polynomial function of the security parameter $\lambda'$ of CE such that the statistical difference $SQ^{(s)}(\lambda')$ between $X^{(s)}$ and $U$ is a negligible function under the uniform sequence assumption. Then even after replacement of the uniform sequence of addresses with a PRS, no probabilistic polynomial-time adversary can distinguish the pseudo-random key stream consisting of variables $X^{(s)}$ from a truly random key stream with variables $U$.

Before we enter into the details of the proof, we clarify the attack goal, the adversary's capabilities, and the criteria for a successful break of (i) a PRSG and (ii) the pseudo-randomness of our Chameleon scheme's key stream:

(i) The goal of an adversary $A$ attacking a PRSG is to distinguish the output of $G$ on a random seed from a random string of identical length (see Definition 2).


$A$'s capabilities are limited to a probabilistic Turing machine whose running time is polynomially bounded in the length of its input (and thus also in the security parameter $\lambda$, which defines the input length). A successful break is defined as follows: The challenger $C$ generates a random seed $str \stackrel{R}{\leftarrow} \{0,1\}^{\lambda}$ and a random string $str_1 \stackrel{R}{\leftarrow} \{0,1\}^{len(\lambda)}$ with uniform distribution. $C$ then applies the PRSG to $str$ and obtains $str_0 \leftarrow G(str)$. Finally, $C$ tosses a coin $b \stackrel{R}{\leftarrow} \{0,1\}$ with uniform distribution and sends $str_b$ to $A$. The challenge for $A$ is to distinguish the two cases, i.e., to guess whether $str_b$ was generated with the PRSG ($b = 0$) or with the uniform distribution ($b = 1$). $A$ wins if the guess $b'$ is equal to $b$. The advantage of $A$ is defined as

$$Adv(\lambda) := \left| \Pr[b' = 0 \mid b = 0] - \Pr[b' = 0 \mid b = 1] \right|, \tag{7}$$

where the randomness is taken over all coin tosses of $C$ and $A$.

(ii) The goal of adversary $A$ attacking the pseudo-randomness of the Chameleon scheme's key stream is to distinguish $n$ instances of $X^{(s)}$ from a truly random key stream. $A$ is limited to a probabilistic Turing machine whose running time is polynomially bounded in the length of its input (and thus also in the security parameter $\lambda'$, as this input is given in unary representation). A successful break is defined as follows: The challenger $C$ generates a stream of $n$ random keys $K_1 := (k_{1,1}, \ldots, k_{1,n})$ such that $k_{1,j} \stackrel{R}{\leftarrow} K$ for all $j \in \{1, \ldots, n\}$. Next, $C$ generates a random seed $str \stackrel{R}{\leftarrow} \{0,1\}^{\lambda}$ and a strongly converging master table $MT$. Then $C$ applies the PRSG to $str$ in order to obtain a pseudo-random sequence of length $len(\lambda) \geq n \cdot s \cdot l$, which is interpreted as a sequence of $n \cdot s$ addresses into the master table. Subsequently, $C$ adds for each content coefficient $m_j$ the corresponding $s$ master table entries modulo $Z$ to obtain the other key stream candidate: $K_0 := (k_{0,1}, \ldots, k_{0,n})$ such that $k_{0,j} \leftarrow \sum_{k=1}^{s} mt_{\beta_{j,k}} \bmod Z$. Finally, $C$ tosses a coin $b \stackrel{R}{\leftarrow} \{0,1\}$ with uniform distribution and sends key stream candidate $K_b$ to $A$. The challenge for $A$ is to distinguish the two cases, i.e., to guess whether $K_b$ was generated with the Chameleon scheme ($b = 0$) or with the uniform distribution ($b = 1$). $A$ wins if the guess $b'$ is equal to $b$. The advantage is analogous to (7).

After the definition of the attack games, we give the full proof of Theorem 3:

Proof. The proof is by contradiction. Assuming that the advantage of an adversary $A$ against the pseudo-randomness of the Chameleon scheme's key stream is not negligible, we construct a distinguisher $A'$ for the PRSG itself, contradicting the assumptions on the PRSG from Definition 2. We show the individual steps of constructing $A'$ in Fig. 2.

1. The challenger $C$ generates a random seed $str \stackrel{R}{\leftarrow} \{0,1\}^{\lambda}$ and a random string $str_1 \stackrel{R}{\leftarrow} \{0,1\}^{len(\lambda)}$ with uniform distribution. $C$ then applies the PRSG to $str$: $str_0 \leftarrow G(str)$. Finally, $C$ tosses a coin $b \stackrel{R}{\leftarrow} \{0,1\}$ with uniform distribution.
2. $C$ sends $str_b$ to $A'$. $A'$ needs to guess $b$.


Fig. 2. Construction of adversary $A'$ based on adversary $A$ (message flow of steps 1–8 between challenger $C$, adversary $A'$, and adversary $A$)

3. $A'$ generates a strongly converging master table $MT$. Then $A'$ takes the string $str_b$ of length $len(\lambda) \geq n \cdot s \cdot l$ and interprets it as a sequence of $n \cdot s$ addresses into the master table according to (3). Subsequently, $A'$ adds for each content coefficient $m_j$ the corresponding $s$ master table entries modulo $Z$ to obtain a key stream $K_b := (k_{b,1}, \ldots, k_{b,n})$ such that $k_{b,j} \leftarrow \sum_{k=1}^{s} mt_{\beta_{j,k}} \bmod Z$.
4. $A'$ sends the key stream $K_b$ to $A$ as a challenge.
5. $A$ calculates the guess $b'$, where $b' = 0$ represents the pseudo-random case, i.e., $A$ guesses that $K_b$ was generated with the Chameleon scheme, and $b' = 1$ represents the random case, i.e., $A$ guesses that $K_b$ is a truly random key stream.
6. $A$ sends the guess $b'$ to $A'$.
7. $A'$ copies $A$'s guess.
8. $A'$ sends $b'$ to $C$ as a guess for $b$.

To finish the proof, we need to show that if the advantage of $A$ against the pseudo-randomness of the Chameleon key stream is not negligible, then the advantage of $A'$ against the PRSG is not negligible. We prove this by bounding the probability differences between the real attack scenario, where $A$ is given input by a correct challenger, and the simulated attack, where $A$ is given slightly incorrect input by $A'$. The contradictive assumption is that $A$'s advantage against the Chameleon encryption scheme is not negligible in the real attack:

$$\left| \Pr\nolimits_{real}[b' = 0 \mid b = 0] - \Pr\nolimits_{real}[b' = 0 \mid b = 1] \right| \geq \epsilon_{CE}(\lambda'),$$

where $\Pr_{real}[\cdot]$ denotes probabilities in the real attack between a Chameleon challenger and a Chameleon adversary $A$, and $\epsilon_{CE}(\lambda')$ is $A$'s advantage, which is not negligible. The randomness is taken over all coin tosses of $C$ and $A$.
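Step 3 of the reduction, i.e., turning the challenge string into a key stream candidate, might look as follows in Python. This is an illustrative sketch: the SHA-256-based stand-in PRSG and all parameter values are our assumptions, not the paper's construction:

```python
import hashlib
import random

# Toy parameters; the bit length l of one address satisfies L = 2**l.
Z, l, s, n = 251, 6, 4, 5
L = 2 ** l

rng = random.Random(3)
MT = [rng.randrange(Z) for _ in range(L)]  # a strongly converging MT is assumed

def prsg(seed: bytes, nbits: int) -> str:
    """Stand-in PRSG: SHA-256 in counter mode, returning a bit string."""
    out, ctr = "", 0
    while len(out) < nbits:
        digest = hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        out += "".join(f"{byte:08b}" for byte in digest)
        ctr += 1
    return out[:nbits]

def keystream_from_string(bits: str):
    """Interpret the challenge string as n*s addresses of l bits each and
    add s master table entries per key symbol, as in step 3."""
    addrs = [int(bits[i * l:(i + 1) * l], 2) for i in range(n * s)]
    return [sum(MT[addrs[j * s + k]] for k in range(s)) % Z for j in range(n)]

# b = 0: pseudo-random string from the PRSG;  b = 1: truly random string.
str0 = prsg(b"seed", n * s * l)
str1 = "".join(rng.choice("01") for _ in range(n * s * l))
K0, K1 = keystream_from_string(str0), keystream_from_string(str1)
assert len(K0) == len(K1) == n and all(0 <= k < Z for k in K0 + K1)
```

The point of the reduction is that `keystream_from_string` treats both strings identically, so any distinguisher for the resulting key streams translates directly into a distinguisher for the PRSG.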


Next, we summarize the input to $A$ in the real attack and in the simulated attack. In the real attack, $A$ obtains either the key stream output $K_0$ of the Chameleon scheme on a truly random seed $str$ ($b = 0$), or a truly random key stream $K_1$ ($b = 1$). Specifically, the key stream element $k_{0,j}$ of $K_0$ is equal to $k_{0,j} = \sum_{k=1}^{s} mt_{\beta_{j,k}} \bmod Z$, where the truly random seed $str$ determines the addresses of the master table entries $mt_{\beta_{j,k}}$ via the PRSG according to (3). In the simulated attack, $A'$ does not apply the PRSG and instead uses the challenge $str_b$ as a shortcut. $A$ obtains either the key stream output $K_0$ of the Chameleon scheme executed on a pseudo-random string $str_0$, derived from a truly random seed $str$ ($b = 0$), or the key stream output $K_1$ of the Chameleon scheme executed on a truly random string $str_1$ ($b = 1$). The key stream outputs $K_0$ and $K_1$ in the simulated attack thus only differ by the fact that $K_0$ comes from a pseudo-random string and $K_1$ from a truly random string. There is no difference between the real and the simulated attack for $b = 0$: the key stream outputs $K_0^{real}$ and $K_0^{sim}$ both come from a PRSG executed on a truly random seed $str$, leading to the following relation:

$$\left| \Pr\nolimits_{real}[b' = 0 \mid b = 0] - \Pr\nolimits_{sim}[b' = 0 \mid b = 0] \right| = 0,$$

where the randomness is taken over all coin tosses of $C$ and $A$ in the real attack and those of $C$, $A$, and $A'$ in the simulated attack. For $b = 1$ in the real attack, $A$ obtains a truly random key stream $K_1^{real}$. In the simulated attack, $A$ operates on a truly random string $str_1$ that determines $n \cdot s$ addresses according to (3). As $str_1$ is truly random, the $n \cdot s$ addresses are also truly random with independent uniform distribution. Combined with the assumptions of the theorem, this implies that each pair of key stream elements in the real and the simulated attack has a negligible statistical difference. Negligible statistical difference implies polynomial-time indistinguishability [24, Section 3.2.2].
Let $\epsilon_{diff}(\lambda')$ be the corresponding negligible bound on the advantage of a distinguisher, which applies to one key stream element. Then the difference between both attacks for all $n$ key stream elements has a negligible upper bound $n \cdot \epsilon_{diff}(\lambda')$:

$$\left| \Pr\nolimits_{real}[b' = 0 \mid b = 1] - \Pr\nolimits_{sim}[b' = 0 \mid b = 1] \right| \leq n \cdot \epsilon_{diff}(\lambda'),$$

where the randomness is taken over all coin tosses of $C$ and $A$ in the real attack and those of $C$, $A$, and $A'$ in the simulated attack. The last three (in)equalities lead to a lower bound for the success probability of $A$ in the simulated attack, which is also the success probability of $A'$ in the attack against the PRSG:

$$\left| \Pr\nolimits_{sim}[b' = 0 \mid b = 0] - \Pr\nolimits_{sim}[b' = 0 \mid b = 1] \right| \geq \epsilon_{CE}(\lambda') - n \cdot \epsilon_{diff}(\lambda')$$

As $\epsilon_{CE}(\lambda')$ is not negligible by the contradictive assumption, $\epsilon_{diff}(\lambda')$ is negligible by the negligible statistical difference, and $n$ is a constant, we conclude that the


success probability of $A'$ against the PRSG is not negligible, completing the contradiction and the proof.

5 Implementation

The master table $MT$ obviously becomes strongly converging for sufficiently large $L$. Our simulation shows that $L = 4Z$ gives high assurance of strong convergence. However, lower values still lead to weak convergence in the sense that convergence is not proven by our upper bound, but can easily be verified numerically. As discussed in Section 4.2, we need to choose the number $s$ of draws from $MT$ in accordance with $L$. The upper bound of Lemma 6 is too conservative to choose $s$ in practice. Our simulation shows that the statistical difference $SQ^{(s)}$ not only decreases with factor $d \approx 2 \cdot SQ^{(1)} < 1$, but with an even smaller factor. This is due to the fact that some of the combinations $e_j^{(i)} e_{(k-j) \bmod Z}$ on the left-hand side of the inequality in the proof of Lemma 6 cancel out. In Appendix F we therefore give an explicit formula for the calculation of the exact statistical difference after $s$ draws from $MT$. The center can thus generate $MT$ with arbitrary length $L$, numerically verify convergence, and determine the minimum number of draws $s_{min}$ that provides the desired statistical difference.

The content representation can be extended to cover movies and songs by interpreting them as a sequence of content items. A straightforward approach is to regularly refresh the session key. Further refinements, e.g., to prevent sequence-specific attacks such as averaging across movie frames, are possible, but beyond the scope of this document. However, it remains to define how the insignificant part of the content should be processed (see Section 3.2). There are three obvious options: sending it in the clear, passing it through our scheme, or encrypting it separately. Note that by its very definition this part does not give significant information about the content, and it was not watermarked because its coefficients have no perceptible influence on the reassembled content.
The easiest option is thus to pass it through the proposed scheme, which does not influence goodness and maintains confidentiality of the content.

At first sight, our proposed scheme trivially fulfills the correctness requirement (see Definition 4) due to the correctness of the SSW scheme. However, both schemes face difficulties in the rare event that a content coefficient is at the lower or upper end of the interval [0, z], which corresponds to plaintext symbols close to 0 or Z − 1. If the additive fingerprint coefficient would push the coefficient beyond the lower or upper bound, the SSW scheme needs to decrease the coefficient's amplitude and round it to the corresponding bound. Similarly, our scheme must avoid a wrap-around in the additive group, e.g., when plaintext symbol Z − 2 obtains a coefficient of +3 and ends up at 1 after decryption. There are many options with different security trade-offs, such as sending a flag or even sending the coefficient in cleartext; the appropriate choice depends on further requirements of the implementation. Note that the center trivially anticipates the occurrence of a wrap-around by inspecting the content coefficients.
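The wrap-around case just described is easy to check mechanically. The following sketch (an illustrative helper, not part of the paper's scheme) shows how a center could flag coefficients whose fingerprinted value would leave the symbol space {0, ..., Z − 1}:

```python
# Illustrative sketch: detecting the wrap-around described above for
# content coefficients near the ends of the discrete interval [0, Z-1].
Z = 2**16  # symbol space size, as in the efficiency discussion below

def would_wrap(plaintext_symbol, fingerprint_coeff):
    """Return True if adding the fingerprint coefficient would wrap
    around the additive group mod Z, e.g. (Z-2) + 3 -> 1."""
    s = plaintext_symbol + fingerprint_coeff
    return s >= Z or s < 0

# The center can inspect each coefficient in advance:
assert would_wrap(Z - 2, +3)      # wraps: ends up at 1 after reduction mod Z
assert not would_wrap(100, +3)    # safely inside the interval
assert would_wrap(1, -5)          # wraps below the lower bound
```

When `would_wrap` fires, the center would apply one of the countermeasures mentioned above (flagging, or clamping as in the SSW scheme).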

Fingercasting–Joint Fingerprinting and Decryption of Broadcast Messages

5.1 Efficiency

Three performance parameters determine whether the proposed scheme is efficient and implementable: transmission overhead, storage size of a receiver, and computation cost. We stress that our scheme enables a trade-off between storage size and computation cost: increasing the size L of the master table (and thus the storage size) decreases the necessary number s of draws (and thus the computation cost), as can be seen from Lemma 6 and Definition 8, where SQ(1) and thus d decreases with L. This feature allows us to adapt the scheme to the particular constraints of the receiver, in particular to decrease s.

If the master table and receiver tables are not renewed on a regular basis, the Chameleon scheme adds no transmission overhead: ciphertext and cleartext use the same symbol space and thus have the same length. The transmission overhead of fingercasting is then determined by that of the broadcast encryption scheme, which is moderate [5,6,7,8].⁸

For the storage size, we highlight the parameters of a computation-intensive implementation. Let the content be an image with n = 10,000 significant coefficients of 16 bit length, such that Z = 2^16. By testing several lengths L of the master table MT, we found a statistical quality of SQ(1) = d/2 < 1/8 for L = 8 · Z = 8 · 2^16 = 2^19 = 2^l. A receiver table thus has 2^19 · 16 = 2^23 bit or 2^20 Byte = 2^10 kByte = 1 MByte, which seems acceptable in practice.

The computation cost depends mostly on the number s of draws from the master table. To achieve a small statistical difference SQ(s), e.g., below 2^-128, we choose s = 64 and obtain SQ(s) < 1/2 · d^s < 2^-1 · 2^(-2·64) = 2^-129 by the conservative upper bound of Lemma 6. Compared to a conventional stream cipher that encrypts n · log2 Z bits, a receiver has to generate n · s · l pseudo-random bits, which is an overhead factor of (s · l)/log2 Z = 76.
To generate the pseudo-random key stream, the receiver has to perform n · s table lookups and n · (s + 1) modular operations in a group of size 2^16. In further tests, we also found a more storage-intensive implementation with L = 2^25 and s = 25, which leads to 64 MByte of storage and an overhead factor of (s · l)/log2 Z ≈ 39. By calculating the exact statistical difference of Appendix F instead of the conservative upper bound of Lemma 6, s decreases further, but we are currently unaware of any direct formula that calculates s from a master table length L and a desired statistical difference SQ(s) (or vice versa). If the security requirements of an implementation require a regular renewal of the master table and the subsequent redistribution of the receiver tables, then the transmission overhead obviously increases: for each redistribution, the total key material to be transmitted has the size of the master table times the number of receivers. As mentioned before, a redistribution channel then becomes necessary if the broadcast channel does not have enough spare bandwidth.

⁸ For example, this overhead is far smaller than that of the trivial solution, which consists of sequentially sending an individually fingerprinted and individually encrypted copy of the content over the broadcast channel.
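The storage/computation trade-off above reduces to a few lines of arithmetic. The sketch below (an illustrative helper, not from the paper) reproduces the two example configurations under the stated conservative bound:

```python
def chameleon_tradeoff(l, s, Z_bits=16):
    """Storage and keystream overhead for a master table with 2**l entries
    of Z_bits bits each, using s draws per ciphertext symbol.
    Illustrative helper reflecting the numbers in Section 5.1."""
    storage_bytes = (2**l * Z_bits) // 8       # receiver table size
    overhead = s * l / Z_bits                  # PRSG bits vs. a stream cipher
    return storage_bytes, overhead

# Computation-intensive variant: L = 2^19, s = 64
assert chameleon_tradeoff(19, 64) == (2**20, 76.0)   # 1 MByte, factor 76
# Storage-intensive variant: L = 2^25, s = 25
storage, overhead = chameleon_tradeoff(25, 25)
assert storage == 64 * 2**20 and round(overhead) == 39   # 64 MByte, ~39
```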

6 Conclusion and Open Problems

In this document we gave a formal proof of the security of a new Chameleon cipher. Applied to a generic fingercasting approach, it provides confidentiality of ciphertext, traceability of content and keys, as well as renewability. We achieved confidentiality through a combination of a generic broadcast encryption (BE) scheme and the new Chameleon cipher. The BE scheme provides a fresh session key, which the Chameleon scheme uses to generate a pseudo-random key stream. The pseudo-random key stream arises from adding key symbols at pseudo-random addresses in a long master table, initially filled with random key symbols. We have reduced the security of the pseudo-random key stream to that of a pseudo-random sequence generator. In addition, we achieved traceability of keys and content through embedding of a receiver-specific fingerprint into the master table copies, which are given to the receivers. During decryption, these fingerprints are inevitably embedded into the content, enabling the tracing of malicious users. We achieve the same collusion resistance as an exemplary watermarking scheme with proven security bound. It may be replaced with any fingerprinting scheme whose watermarks can be decomposed into additive components. Finally, we achieved renewability through revocation, which is performed in the BE scheme.

Two open problems are the most promising for future work. First, the detection algorithm should be extended to allow blind detection of a watermark even in the absence of the original content. Another open problem is to combine Chameleon encryption with a code-based fingerprinting scheme in the sense of Boneh and Shaw [29]. The master table in Chameleon would need to embed components of codewords in such a way that a codeword is embedded into the content.

References

1. Adelsbach, A., Huber, U., Sadeghi, A.-R.: Fingercasting—joint fingerprinting and decryption of broadcast messages. In: Tenth Australasian Conference on Information Security and Privacy—ACISP 2006, Melbourne, Australia, July 3-5, 2006. Volume 4058 of Lecture Notes in Computer Science, Springer (2006)
2. Touretzky, D.S.: Gallery of CSS descramblers. Webpage, Computer Science Department of Carnegie Mellon University (2000). URL http://www.cs.cmu.edu/~dst/DeCSS/Gallery (November 17, 2005)
3. 4C Entity, LLC: CPPM specification—introduction and common cryptographic elements. Specification Revision 1.0 (2003). URL http://www.4centity.com/data/tech/spec/cppm-base100.pdf
4. AACS Licensing Administrator: Advanced access content system (AACS): Introduction and common cryptographic elements. Specification Revision 0.90 (2005). URL http://www.aacsla.com/specifications/AACS_Spec-Common_0.90.pdf
5. Fiat, A., Naor, M.: Broadcast encryption. In Stinson, D.R., ed.: CRYPTO 1993. Volume 773 of Lecture Notes in Computer Science, Springer (1994) 480–491
6. Naor, D., Naor, M., Lotspiech, J.: Revocation and tracing schemes for stateless receivers. In Kilian, J., ed.: CRYPTO 2001. Volume 2139 of Lecture Notes in Computer Science, Springer (2001) 41–62

7. Halevy, D., Shamir, A.: The LSD broadcast encryption scheme. In Yung, M., ed.: CRYPTO 2002. Volume 2442 of Lecture Notes in Computer Science, Springer (2002) 47–60
8. Jho, N.S., Hwang, J.Y., Cheon, J.H., Kim, M.H., Lee, D.H., Yoo, E.S.: One-way chain based broadcast encryption schemes. In Cramer, R., ed.: EUROCRYPT 2005. Volume 3494 of Lecture Notes in Computer Science, Springer (2005) 559–574
9. Chor, B., Fiat, A., Naor, M.: Tracing traitors. In Desmedt, Y., ed.: CRYPTO 1994. Volume 839 of Lecture Notes in Computer Science, Springer (1994) 257–270
10. Naor, M., Pinkas, B.: Threshold traitor tracing. In Krawczyk, H., ed.: CRYPTO 1998. Volume 1462 of Lecture Notes in Computer Science, Springer (1998) 502–517
11. Kundur, D., Karthik, K.: Video fingerprinting and encryption principles for digital rights management. Proceedings of the IEEE 92(6) (2004) 918–932
12. Anderson, R.J., Manifavas, C.: Chameleon—a new kind of stream cipher. In Biham, E., ed.: FSE 1997. Volume 1267 of Lecture Notes in Computer Science, Springer (1997) 107–113
13. Briscoe, B., Fairman, I.: Nark: Receiver-based multicast non-repudiation and key management. In: ACM EC 1999, ACM Press (1999) 22–30
14. Cox, I.J., Kilian, J., Leighton, T., Shamoon, T.: Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing 6(12) (1997) 1673–1687
15. Kilian, J., Leighton, F.T., Matheson, L.R., Shamoon, T.G., Tarjan, R.E., Zane, F.: Resistance of digital watermarks to collusive attacks. Technical Report TR-585-98, Princeton University, Department of Computer Science (1998). URL ftp://ftp.cs.princeton.edu/techreports/1998/585.ps.gz
16. Anderson, R.J., Kuhn, M.: Tamper resistance—a cautionary note. In Tygar, D., ed.: USENIX Electronic Commerce 1996, USENIX (1996) 1–11
17. Maurer, U.M.: A provably-secure strongly-randomized cipher. In Damgård, I., ed.: EUROCRYPT 1990. Volume 473 of Lecture Notes in Computer Science, Springer (1990) 361–373
18. Maurer, U.: Conditionally-perfect secrecy and a provably-secure randomized cipher. Journal of Cryptology 5(1) (1992) 53–66
19. Ferguson, N., Schneier, B., Wagner, D.: Security weaknesses in a randomized stream cipher. In Dawson, E., Clark, A., Boyd, C., eds.: ACISP 2000. Volume 1841 of Lecture Notes in Computer Science, Springer (2000) 234–241
20. Ergün, F., Kilian, J., Kumar, R.: A note on the limits of collusion-resistant watermarks. In Stern, J., ed.: EUROCRYPT 1999. Volume 1592 of Lecture Notes in Computer Science, Springer (1999) 140–149
21. Brown, I., Perkins, C., Crowcroft, J.: Watercasting: Distributed watermarking of multicast media. In Rizzo, L., Fdida, S., eds.: Networked Group Communication 1999. Volume 1736 of Lecture Notes in Computer Science, Springer (1999) 286–300
22. Parviainen, R., Parnes, P.: Large scale distributed watermarking of multicast media through encryption. In Steinmetz, R., Dittmann, J., Steinebach, M., eds.: Communications and Multimedia Security (CMS 2001). Volume 192 of IFIP Conference Proceedings, Kluwer (2001) 149–158
23. Luh, W., Kundur, D.: New paradigms for effective multicasting and fingerprinting of entertainment media. IEEE Communications Magazine 43(5) (2005) 77–84
24. Goldreich, O.: Basic Tools. Volume 1 of Foundations of Cryptography. First edn. Cambridge University Press, Cambridge, UK (2001)

25. Bellare, M., Namprempre, C.: Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. In Okamoto, T., ed.: ASIACRYPT 2000. Volume 1976 of Lecture Notes in Computer Science, Springer (2000) 531–545
26. National Institute of Standards and Technology: Announcing the Advanced Encryption Standard (AES). Federal Information Processing Standards Publication FIPS PUB 197 (November 26, 2001). URL http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
27. National Institute of Standards and Technology: Data Encryption Standard (DES). Federal Information Processing Standards Publication FIPS PUB 46-3 (October 25, 1999). URL http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf
28. Goldreich, O.: Basic Applications. Volume 2 of Foundations of Cryptography. First edn. Cambridge University Press, Cambridge, UK (2004)
29. Boneh, D., Shaw, J.: Collusion-secure fingerprinting for digital data (extended abstract). In Coppersmith, D., ed.: CRYPTO 1995. Volume 963 of Lecture Notes in Computer Science, Springer (1995) 452–465

A Abbreviations

Table 1 summarizes all abbreviations used in this document.

Table 1. Abbreviations used in this document

Abbreviation  Abbreviated Technical Term
AACS          Advanced Access Content System
AES           Advanced Encryption Standard
BE            Broadcast Encryption
CPPM          Content Protection for Pre-Recorded Media
CRL           Certificate Revocation List
CSS           Content Scrambling System
DCT           Discrete Cosine Transform
DES           Data Encryption Standard
DVD           Digital Versatile Disc
FE            Fingerprint Embedding
PRS           Pseudo-Random Sequence
PRSG          Pseudo-Random Sequence Generator
SSW           Spread Spectrum Watermarking
TV            Television

B Summary of Relevant Parameters

Table 2 summarizes all parameters of our ﬁngercasting approach and the underlying ﬁngerprinting scheme, which we instantiate with the SSW scheme of [15].

Table 2. Parameters of the proposed fingercasting scheme and the SSW scheme

Parameter   Description
N           Number of receivers
u_i         i-th receiver
q           Maximum tolerable number of colluding receivers
M           Representation of the original content
m_j         j-th coefficient of content M
n           Number of coefficients (Chameleon scheme)
n̄           Number of coefficients (fingerprinting scheme)
CF(i)       Content fingerprint of receiver u_i
cf_j(i)     Coefficient j of u_i's content fingerprint CF(i)
M*          Illegal copy of the original content
CF*         Fingerprint found in an illegal copy M*
C           Ciphertext of the original content M
c_j         j-th coefficient of ciphertext C
k_sess      Session key used as a seed for the PRSG
MT          Master table of the Chameleon scheme
α           Address of a table entry
mt_α        α-th entry of the master table MT
TF(i)       Table fingerprint for receiver table of receiver u_i
tf_α(i)     α-th coefficient of u_i's table fingerprint TF(i)
RT(i)       Receiver table of receiver u_i
rt_α(i)     α-th entry of the receiver table RT(i)
l           Number of bits needed for the binary address of a table entry
L           Number of entries of the tables, L = 2^l
F           Number of fingerprinted entries of a receiver table
s           Number of master table entries per ciphertext coefficient
par_CE      Input parameters (Chameleon scheme)
par_FP      Input parameters (fingerprinting scheme)
σ           Standard deviation for receiver table
σ̄           Standard deviation for SSW scheme
p_bad       Maximum probability of a bad copy
p_pos       Maximum probability of a false positive
p_neg       Maximum probability of a false negative
δ           Goodness criterion (SSW scheme)
t           Threshold of similarity measure (SSW scheme)
dec         Decision output of detection algorithm
z           Upper bound of interval [0, z] (content coefficients)
Z           Key space size and cardinality of discrete interval [0, z]
ρ           Scaling factor from real numbers to group elements
p           Order of the additive group

C Chameleon Encryption

Definition 9. A Chameleon encryption scheme is a tuple of five polynomial-time algorithms CE := (KeyGenCE, KeyExtrCE, EncCE, DecCE, DetectCE), where:

– KeyGenCE is the probabilistic key generation algorithm used by the center to set up all parameters of the scheme. KeyGenCE takes the number N of receivers, a security parameter λ, and a set of performance parameters par_CE as input in order to generate a secret master table MT, a tuple TF := (TF(1), ..., TF(N)) of secret table fingerprints containing one fingerprint per receiver, and a threshold t. The values N and λ are public:

(MT, TF, t) ← KeyGenCE(N, 1^λ, par_CE)

– KeyExtrCE is the deterministic key extraction algorithm used by the center to extract the secret receiver table RT(i) to be delivered to receiver u_i in the setup phase. KeyExtrCE takes the master table MT, the table fingerprints TF, and the index i of receiver u_i as input in order to return RT(i):

RT(i) ← KeyExtrCE(MT, TF, i)

– EncCE is the deterministic encryption algorithm used by the center to encrypt content M such that only receivers in possession of a receiver table and the session key can recover it. EncCE takes the master table MT, a session key k_sess, and content M as input in order to return the ciphertext C:

C ← EncCE(MT, k_sess, M)

– DecCE is the deterministic decryption algorithm used by a receiver u_i to decrypt a ciphertext C. DecCE takes the receiver table RT(i) of receiver u_i, a session key k_sess, and a ciphertext C as input. It returns a good copy M(i) of the underlying content M if C is a valid encryption of M using k_sess:

M(i) ← DecCE(RT(i), k_sess, C)

– DetectCE is the deterministic fingerprint detection algorithm used by the center to detect whether the table fingerprint TF(i) of receiver u_i left traces in an illegal copy M*. DetectCE takes the original content M, the illegal copy M*, the session key k_sess, the table fingerprint TF(i) of u_i, and the threshold t as input in order to return dec = true if the similarity measure of the underlying fingerprinting scheme indicates that the similarity between M* and M(i) is above the threshold t. Otherwise it returns dec = false:

dec ← DetectCE(M, M*, k_sess, TF(i), t)

Correctness of CE requires that

∀ u_i ∈ U : DecCE(RT(i), k_sess, EncCE(MT, k_sess, M)) = M(i)  such that  Good(M(i), M) = true

(see Definition 3) with high probability.
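The interface of Definition 9 can be illustrated with a toy mod-Z Chameleon cipher. The sketch below is only a simplified illustration of the mechanism described in this paper (s pseudo-random draws from a master table, added modulo Z; a receiver table equal to the master table plus a small additive fingerprint). The function names mirror Definition 9, but the PRSG is replaced by Python's seeded `random` module and all security-relevant details are omitted:

```python
import random

Z = 2**16          # symbol space
L = 2**10          # master table length (toy size)
S = 4              # draws per content symbol

def keygen(seed=0):
    """Toy KeyGenCE/KeyExtrCE: master table plus a fingerprinted receiver table."""
    rng = random.Random(seed)
    mt = [rng.randrange(Z) for _ in range(L)]            # random master table
    tf = [rng.choice([-1, 0, 1]) for _ in range(L)]      # toy table fingerprint
    rt = [(mt[a] + tf[a]) % Z for a in range(L)]         # fingerprinted copy
    return mt, rt

def keystream(table, k_sess, n):
    rng = random.Random(k_sess)                          # stand-in for the PRSG
    return [sum(table[rng.randrange(L)] for _ in range(S)) % Z
            for _ in range(n)]

def enc(mt, k_sess, m):
    return [(mj + kj) % Z for mj, kj in zip(m, keystream(mt, k_sess, len(m)))]

def dec(rt, k_sess, c):
    return [(cj - kj) % Z for cj, kj in zip(c, keystream(rt, k_sess, len(c)))]

mt, rt = keygen()
m = [100, 200, 300]
m_i = dec(rt, 42, enc(mt, 42, m))
# Decrypting with the fingerprinted receiver table yields m plus a small
# receiver-specific perturbation (the embedded fingerprint, at most S per symbol):
assert all(abs(((mi - mj + Z // 2) % Z) - Z // 2) <= S for mi, mj in zip(m_i, m))
```

Decrypting with the unmodified master table recovers the content exactly, which mirrors the correctness requirement above.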

D Fingerprinting and Spread Spectrum Watermarking

In this section, we detail our notation of a fingerprinting scheme by describing the respective algorithms of Spread Spectrum Watermarking [14,15]. This scheme is a tuple of three polynomial-time algorithms (SetupFP, EmbedFP, DetectFP). We detail each of the three algorithms in Sections D.1–D.3.

D.1 Setup Algorithm

SetupFP is the probabilistic setup algorithm used by the center to set up all parameters of the scheme. SetupFP takes the number N of receivers, the number n of content coefficients, a goodness criterion δ, a maximum probability p_bad of bad copies, and a maximum probability p_pos of false positives as input in order to return a tuple of secret content fingerprints CF, containing one fingerprint per receiver, as well as a similarity threshold t. The values N and n are public:

(CF, t) ← SetupFP(N, n, δ, p_bad, p_pos)

The algorithm of [14,15] proceeds as follows. The set of content fingerprints CF is defined as CF := (CF(1), ..., CF(N)). The content fingerprint CF(i) of receiver u_i is a vector CF(i) := (cf_1(i), ..., cf_n(i)) of n fingerprint coefficients. For each receiver index i ∈ {1, ..., N} and for each coefficient index j ∈ {1, ..., n}, the fingerprint coefficient follows an independent normal distribution whose standard deviation depends on the values N, n, δ, and p_bad:

∀ 1 ≤ i ≤ N, ∀ 1 ≤ j ≤ n : cf_j(i) ← N(0, σ)  with  σ = f_σ(N, n, δ, p_bad)

The similarity threshold t is a function t = f_t(σ, N, p_pos) of σ, N, and p_pos. The details of f_σ and f_t can be found in [15].

D.2 Watermark Embedding Algorithm

EmbedFP is the deterministic watermark embedding algorithm used by the center to embed the content fingerprint CF(i) of receiver u_i into the original content M. EmbedFP takes the original content M and the secret content fingerprint CF(i) of receiver u_i as input in order to return the fingerprinted copy M(i) of u_i:

M(i) ← EmbedFP(M, CF(i))

The algorithm of [14,15] adds each fingerprint coefficient to the corresponding original content coefficient to obtain the fingerprinted content coefficient:

∀ j ∈ {1, ..., n} : m_j(i) ← m_j + cf_j(i)

D.3 Watermark Detection Algorithm

DetectFP is the deterministic watermark detection algorithm used by the center to verify whether an illegal content copy M* contains traces of the content fingerprint CF(i) that was embedded into the content copy M(i) of receiver u_i. DetectFP takes the original content M, the illegal copy M*, the content fingerprint CF(i), and the similarity threshold t as input and returns the decision dec ∈ {true, false}:

dec ← DetectFP(M, M*, CF(i), t)

The algorithm of [14,15] calculates the similarity measure between the fingerprint in the illegal copy and the fingerprint of the suspect receiver. The similarity measure is defined as the dot product of the two fingerprints, divided by the Euclidean norm of the fingerprint in the illegal copy:

CF* ← M* − M
Sim(CF*, CF(i)) ← (CF* · CF(i)) / ||CF*||
If Sim(CF*, CF(i)) > t Then Return dec = true Else Return dec = false
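The three SSW algorithms above can be sketched numerically. The snippet below is illustrative only: f_σ and f_t are replaced by fixed toy values rather than the formulas of [15], and the content is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
N_USERS, N_COEFF = 20, 5000
SIGMA, T = 1.0, 10.0   # toy stand-ins for f_sigma and f_t of [15]

# SetupFP: one Gaussian fingerprint per receiver, cf_j^(i) ~ N(0, sigma)
CF = rng.normal(0.0, SIGMA, size=(N_USERS, N_COEFF))

def embed_fp(M, cf_i):
    """EmbedFP: additive embedding m_j^(i) = m_j + cf_j^(i)."""
    return M + cf_i

def detect_fp(M, M_star, cf_i, t=T):
    """DetectFP: Sim = (CF* . CF^(i)) / ||CF*||, compared against t."""
    cf_star = M_star - M
    sim = cf_star @ cf_i / np.linalg.norm(cf_star)
    return sim > t

M = rng.uniform(0, 255, N_COEFF)                            # stand-in content
M_star = embed_fp(M, CF[3]) + rng.normal(0, 0.5, N_COEFF)   # noisy copy of user 3
assert detect_fp(M, M_star, CF[3])      # the true user is detected
assert not detect_fp(M, M_star, CF[7])  # an innocent user is not accused
```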

E Broadcast Encryption

In this section we describe a general BE scheme that allows revocation of an arbitrary subset of the set of receivers. Examples of such BE schemes are [6,7,8]. As these schemes all belong to the family of subset cover schemes defined in [6], we use this name to refer to them:

Definition 10. A Subset Cover BE (SCBE) scheme is a tuple of four polynomial-time algorithms (KeyGenBE, KeyExtrBE, EncBE, DecBE), where:

– KeyGenBE is the probabilistic key generation algorithm used by the center to set up all parameters of the scheme. KeyGenBE takes the number N of receivers and a security parameter λ as input in order to generate the secret master key MK. The values N and λ are public:

MK ← KeyGenBE(N, 1^λ)

– KeyExtrBE is the deterministic key extraction algorithm used by the center to extract the secret key SK(i) to be delivered to a receiver u_i in the setup phase. KeyExtrBE takes the master key MK and the receiver index i as input in order to return the secret key SK(i) of u_i:

SK(i) ← KeyExtrBE(MK, i)

– EncBE is the deterministic encryption algorithm used to encrypt session key k_sess in such a way that only the non-revoked receivers can recover it. EncBE takes the master key MK, the set R of revoked receivers, and session key k_sess as input in order to return the ciphertext C_BE:

C_BE ← EncBE(MK, R, k_sess)

– DecBE is the deterministic decryption algorithm used by a receiver u_i to decrypt a ciphertext C_BE. DecBE takes the index i of u_i, its private key SK(i), and a ciphertext C_BE as input in order to return the session key k_sess if C_BE is a valid encryption of k_sess and u_i is non-revoked, i.e., u_i ∉ R. Otherwise, it returns the failure symbol ⊥:

k_sess ← DecBE(i, SK(i), C_BE)  if  u_i ∉ R

Correctness of a SCBE scheme requires that

∀ u_i ∈ U \ R : DecBE(i, SK(i), EncBE(MK, R, k_sess)) = k_sess.

F Selection of the Minimum Number of Draws

The center can calculate the statistical difference after s draws if it knows the corresponding probability distribution. The next lemma gives an explicit formula for this probability distribution. To determine the minimum number of draws that achieves a maximum statistical difference, e.g., 2^-128, the center increases s until the statistical difference is below the desired maximum. Note that this only needs to be done once, at setup time of the system, when s is chosen.

Lemma 7. If the draws use addresses with independent uniform distribution and the master table MT is given in the representation of Lemma 5, then the drawing and adding of s master table entries leads to the random variable

X(s) := (Σ_{j=1}^{s} X_j) mod Z

with

Pr[X(s) = x] = Σ_{condition} (s choose s_0, ..., s_{Z−1}) · Π_{k=0}^{Z−1} p_k^{s_k}

where condition ⇔ (8) ∧ (9) ∧ (10):

s_k ≥ 0  ∀ k ∈ {0, 1, ..., Z−1}          (8)
Σ_{k=0}^{Z−1} s_k = s                     (9)
(Σ_{k=0}^{Z−1} s_k · x_k) mod Z = x       (10)

where s_k denotes the number of times that key space element x_k was chosen in the s selections and (s choose s_0, ..., s_{Z−1}) := s! / (s_0! · ... · s_{Z−1}!) denotes the multinomial coefficient.

Proof. Each of the s selections is a random variable X_j with Pr[X_j = x_k] = p_k. The independence of the random addresses transfers to the independence of the X_j. The probability of a complete set of s selections is thus a product of s probabilities of the form p_k with appropriate indices. The counter s_k stores

the number of times that probability p_k appears in this term. This counter is non-negative, implying (8). In total, there are s selections, implying (9). To fulfill the condition X(s) = x, the addition modulo Z of the s random variables must have the result x. Given the counters s_k, the result of the addition is (Σ_{k=0}^{Z−1} s_k · x_k) mod Z. The combination of both statements implies (10). There is more than one possibility for selecting s_k times the key symbol x_k during the s selections. Considering all such key symbols in s selections, the total number of possibilities is the number of ways in which we can choose s_0 times the key symbol x_0, then s_1 times the key symbol x_1, and so forth until we reach a total of s selections. This number is the multinomial coefficient (s choose s_0, ..., s_{Z−1}).

Note that we can trivially verify that the probabilities of all key space elements x in Lemma 7 add to 1. Among the three conditions (8), (9), and (10), the first two conditions appear in the well-known multinomial theorem

(Σ_{k=0}^{Z−1} p_k)^s = Σ_{s_0,...,s_{Z−1} ≥ 0, s_0+...+s_{Z−1} = s} (s choose s_0, ..., s_{Z−1}) · Π_{k=0}^{Z−1} p_k^{s_k}

By adding the probabilities over all elements, we obviously add over all addends on the right-hand side of the multinomial theorem. As the left-hand side trivially adds to 1, so do the probabilities over all key space elements.
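The exact distribution of Lemma 7 need not be evaluated by enumerating multinomial terms: since X(s) is a sum of independent draws reduced modulo Z, its distribution is the s-fold cyclic convolution of the single-draw distribution (p_0, ..., p_{Z−1}). The sketch below (an illustrative helper, not from the paper) uses this equivalence to compute the statistical difference after s draws, so the center can increase s until the desired bound is met:

```python
def cyclic_convolve(p, q):
    """Distribution of (A + B) mod Z for independent A ~ p, B ~ q."""
    Z = len(p)
    r = [0.0] * Z
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            r[(a + b) % Z] += pa * qb
    return r

def statistical_difference(p, s):
    """SQ(s): half the L1 distance between the distribution of the
    s-fold sum mod Z and the uniform distribution on {0, ..., Z-1}."""
    Z = len(p)
    dist = p
    for _ in range(s - 1):
        dist = cyclic_convolve(dist, p)
    return 0.5 * sum(abs(x - 1.0 / Z) for x in dist)

# Toy example with Z = 4 and a slightly non-uniform single-draw distribution:
p = [0.30, 0.25, 0.25, 0.20]
sq = [statistical_difference(p, s) for s in (1, 2, 4)]
assert sq[0] > sq[1] > sq[2]   # SQ(s) shrinks with every extra draw
```

For realistic Z the inner double loop would be replaced by an FFT-based convolution, but the principle is the same.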

An Estimation Attack on Content-Based Video Fingerprinting

Shan He¹ and Darko Kirovski²

¹ Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, U.S.A.
² Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
[email protected], [email protected]

Abstract. In this paper we propose a simple signal processing procedure that aims at removing low-frequency fingerprints embedded in video signals. Although we construct an instance of the attack and show its efficacy using a specific video fingerprinting algorithm, the generic form of the attack can be applied to an arbitrary video marking scheme. The proposed attack uses two estimates, one of the embedded fingerprint and another of the original content, to create the attack vector. This vector is amplified and subtracted from the fingerprinted video sequence to create the attacked copy. The amplification factor is maximized under the constraint of achieving a desired level of visual fidelity. In the conducted experiments, the attack procedure on average halved the expected detector correlation compared to additive white Gaussian noise. It also substantially increased the probability of a false positive under attack for the addressed fingerprinting algorithm. Keywords: Video watermarking, fingerprinting, signal estimation.

1 Introduction

Content watermarking is a signal processing primitive where a secret noise signal w is added to the original multimedia sequence x so that: (i) perceptually, the watermarked content y = x + w is indistinguishable from the original and (ii) watermark detection produces low error rates both in terms of false positives and negatives. An additional requirement is that the watermark should be detected reliably in marked content even after an arbitrary signal processing primitive f() is applied to y, such that f(y) is a perceptually acceptable copy of x. Function f() is constructed without the knowledge of w.

Content fingerprinting is a specific application of content watermarking with the objective of producing many unique content copies. Each copy is associated with a particular system user; thus, a discovered content copy that is used illegally can be traced to its associated user. Here, a distinct watermark wi (i.e., a fingerprint) is applied to x to create a unique content copy yi. We denote the set of all fingerprints as W = {w1, ..., wM} and the corresponding set of fingerprinted copies as Y = {y1, ..., yM}. The fingerprint detector d(x, f(yi), W) should return the index of the user i associated with the content under

Y.Q. Shi (Ed.): Transactions on DHMS II, LNCS 4499, pp. 35–47, 2007.
© Springer-Verlag Berlin Heidelberg 2007

S. He and D. Kirovski

test yi. Typically, this decision is associated with a confidence level, which must be high. In particular, one demands a low probability of false positives:

Pr[d(x, f(yi), W) = j, j ≠ i] < ε_FP,   (1)

where ε_FP is typically smaller than 10^-9. In case a content copy ŷ which is not marked with any of the fingerprints in W is fed to the detector, it should report that no fingerprint is identified in ŷ: d(x, ŷ, W) = 0 with high confidence. Finally, the detector uses the knowledge of the original x while making its decision. This feature substantially improves the accuracy of the forensic detector compared to "blind" detectors [1], which are prone to de-synchronization attacks [2].

Attacks against fingerprinting technologies can be divided into two classes: collusion and fingerprint removal. A collusion attack considers an adversarial clique Q ⊂ Y of a certain size K. The participating colluders compare their fingerprinted copies to produce a new attack copy which does not include statistically important traces of any of their fingerprints [3,4]. Another objective that a collusion clique may have is to frame an innocent user. Collusion attacks have attracted a great deal of attention from the research community, which has mainly focused on producing codes that result in improved collusion resistance [5,6,7,8,9].

1.1 Fingerprint Estimation

In this paper we address the other class of attacks on multimedia forensic schemes: fingerprint removal via estimation. Here, the adversary's objective is to estimate the value of a given fingerprint wi based upon yi only and without the presence of d(). In essence, this attack aims at denoising yi, removing its fingerprint. In order to make denoising attacks harder, one may design fingerprints dependent upon x so that it is more difficult to estimate them accurately. The effects of this class of attacks are orthogonal to collusion. An adversarial clique may deploy both types of attacks to achieve its goal: estimation, to reduce the presence of individual fingerprints in their respective copies, and collusion, to remove the remaining fingerprint traces by creating a final attack copy.
For example, a forensic application that uses spread-spectrum fingerprints wi ∈ {±1}^N, where N is the sequence length, detects them using a correlation-based detector c(x, a, wi) = N^-1 (a − x) · wi, where a is the content under test and the operator '·' denotes the inner product of two vectors [1]. Content a is the result of forensic multimedia registration, exemplified in [4]. In case a is marked with wi, we model a = x + wi + n, where n is low-magnitude Gaussian noise. Under the assumption that E[n · wi] = 0, we have E[c(x, a, wi)] = 1 when a is marked with wi and E[c(x, a, wi)] = 0 when it is not. Fingerprint detection is performed using a Neyman-Pearson test c(x, a, wi) ≶ T, where the detection threshold T establishes the error probabilities for false positives and negatives. As an example, the adversarial clique Q may use estimation and collusion via averaging to produce a "clean" copy of the content. Content averaging by a collusion of K users produces a copy z = K^-1 Σ_{i=1}^{K} yi such that E[c(x, z, wi ∈ Q)] = K^-1. If we denote the efficacy of fingerprint estimation using

E[c(x, yi − ei, wi)] = α^-1, where ei is the attack vector computed via estimation from yi, then E[c(x, K^-1 Σ_{i=1}^{K} (yi − ei), wi)] = (αK)^-1. Thus, in the asymptotic case, the estimation attack improves the overall effort of the colluders by a scaling factor α. Knowing that the collusion resistance of the best fingerprinting codes for 2-hour video sequences is on the order of K ~ 10^2 [7,10], we conclude that estimation is an important component of the overall attack.

Finally, it appears that estimating fingerprints is no different from estimating arbitrary watermarks. However, there exists a strong difference in the way watermarks for content screening [11] and fingerprinting [4] are designed. The replication that is necessary for watermarks tailored to content screening¹ makes their estimation substantially easier [11]. On the other hand, fingerprints can be designed with almost no redundancy, which makes their estimation substantially more difficult. At last, during fingerprint detection, the forensic tool has access to the original, which greatly improves the detection rates.
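The collusion arithmetic above is easy to check numerically. The sketch below (illustrative, with synthetic Gaussian content and ±1 spread-spectrum fingerprints) verifies that averaging K copies scales the detector correlation to roughly 1/K:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200_000, 8                          # sequence length, collusion size

x = rng.normal(0, 10, N)                   # stand-in original content
W = rng.choice([-1.0, 1.0], size=(K, N))   # spread-spectrum fingerprints
Y = x + W                                  # K fingerprinted copies

def c(x, a, w):
    """Correlation detector c(x, a, w) = N^-1 (a - x) . w."""
    return (a - x) @ w / len(w)

z = Y.mean(axis=0)                         # collusion via averaging
marked = c(x, Y[0], W[0])                  # ~1 for a marked copy
colluded = c(x, z, W[0])                   # ~1/K after averaging
assert abs(marked - 1.0) < 0.02
assert abs(colluded - 1.0 / K) < 0.02
```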

2 Related Work

The idea of watermark removal via estimation is not new. To the best of our knowledge, all developed schemes for the estimation attack have targeted "blindly" detected watermarks. For example, Langelaar et al. used 3 × 3 median and 3 × 3 high-pass filters to successfully launch an estimation attack on a spread-spectrum image watermarking scheme [12]. Su and Girod used a Wiener filter to estimate arbitrary watermarks; they constructively expanded their attack to provide a power-spectrum condition required for a watermark to resist minimum mean-squared error estimation [13]. Next, Voloshynovskiy et al. achieved partial watermark removal using a filter based on the Maximum a Posteriori (MAP) principle [14]. Finally, Kirovski et al. investigated the security of a direct-sequence spread-spectrum watermarking scheme for audio by statistically analyzing the effect of the estimation attack on their redundant watermark codes [11]. They used an estimation attack of the form:

e = sign( Σ_{j∈J} (x_j + w) ),   (2)

where J is a region in the source signal x marked with the same watermark chip w. This attack can be optimal under a set of assumptions about the watermark and the source signal [11]. In this paper, we propose a simple but novel joint source-fingerprint estimator which performs particularly well on low-frequency watermarks. We also show an interesting anomaly specific to watermarking schemes that construct watermarks dependent upon the source: by applying an attack vector that is also dependent upon the source, such as the vectors produced by our estimation attack, the probability of false positives in the system may increase substantially compared to additive white Gaussian noise of similar magnitude. If discovered by attackers and left unresolved, this issue renders a forensic technology inapplicable.
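For intuition, Eqn. (2) can be exercised on synthetic data. This is a hedged sketch with a made-up region size, chip count, and host model; it only illustrates why repeating a chip w over a region J makes its sign easy to estimate when the host is roughly zero-mean.

```python
import numpy as np

rng = np.random.default_rng(1)
region = 64                               # |J|: samples carrying the same chip w
chips = rng.choice([-1.0, 1.0], 200)      # true watermark chips, one per region
x = rng.normal(0.0, 4.0, (200, region))   # roughly zero-mean host samples
marked = x + chips[:, None]               # x_j + w for every j in J

# Eqn. (2): e = sign( sum_{j in J} (x_j + w) )
e = np.sign(marked.sum(axis=1))
accuracy = (e == chips).mean()
print(accuracy)                           # most chips are recovered correctly
```

The repeated chip contributes |J|·w to the sum while the host contributions largely cancel, which is exactly the redundancy that makes content-screening watermarks easier to estimate than fingerprints.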


S. He and D. Kirovski

3 A Video Fingerprinting Scheme

In order to present our estimation attack, we use an existing, well-engineered video fingerprinting scheme. The scheme is based upon the image watermarking approach presented in [15], adjusted and improved for video fingerprinting by Harmanci et al. [16, 17, 18]. Their video fingerprinting scheme marks the content by designing a complexity-adaptive watermark signal via solving an optimization problem. The marking process is performed in several steps. First, each frame of the video sequence is transformed into the DWT (Discrete Wavelet Transform) domain. Since watermarks are applied only to the DC sub-bands (the lowest-frequency sub-bands), the algorithm packs these coefficients into a 3D prism x(a, b, t), where the third dimension t represents the frame index (i.e., time). Based upon a unique user key, the fingerprint embedding algorithm selects pseudo-randomly, in terms of positions and sizes, a collection of sub-prisms P = {p_1, ..., p_n} ⊂ x that may overlap. The prisms' dimensions are upper and lower bounded (e.g., from 12 × 16 × 20 to 36 × 48 × 60). Then, the coefficients in each prism p_j ∈ P are weighted using a smooth weighting prism u_j. The weighting prisms are generated pseudo-randomly using a user-specific secret key. Finally, the algorithm computes a first-order statistic for each g(p_j · u_j) (e.g., g() computes the mean of its argument) and quantizes it using a private quantizer q(g(p_j · u_j), bit), where bit represents the embedded user-specific data. The desired watermark strength is achieved by adjusting the quantization step size during the embedding. The content update Δ_j = q(g(p_j · u_j), bit) − g(p_j · u_j) is spread among the pixels of the containing prism using an optimization primitive. To achieve better visual quality, Harmanci et al.
generate a "complexity map" c using the spatial and temporal information of each component, which is then employed in solving the underlying optimization problem to regularize the watermark. Specifically, the spatial complexity c_s(a, b, t) for a given component in the DWT-DC sub-band is determined by estimating the variance of the coefficients in a v = M × M 2D window centered at (a, b, t). Typically, M = 5. The decision relies on the i.i.d. assumption for the coefficients. Using the Gaussian entropy formula c_s = (1/2) log 2πeσ²(v), where σ²() denotes the variance of its argument, the algorithm estimates the spatial entropy rate of that component and uses it as a measure of spatial complexity. To determine the temporal complexity c_t, the scheme performs first-order auto-regression (AR) analysis with window length L among the corresponding components along the optical flow [19]. The temporal complexity is obtained by applying the Gaussian entropy formula to the distribution of the innovation process of the AR(1) model. Then, c_t and c_s are linearly combined to compute c. By employing the "complexity map," the resulting watermark is locally adapted to the statistical complexity of the signal. While aimed at improving the perceptual quality of the resulting sequence, the complexity map significantly reduces the exploration space for watermark estimation. Based upon the complexity map, the watermark embedding procedure computes the optimal update value for each DWT-DC coefficient that realizes the desired Δ_j for each selected prism p_j. Finally, the scheme applies a low-pass filter both spatially and temporally on the watermark signal to further improve the watermark's imperceptibility. Figure 1 shows an example watermark extracted from a single frame of our test video sequence, as well as the frequency spectrum analysis of the watermark.


Fig. 1. Fingerprint example: (a) original frame from the benchmark video clip, (b) resulting fingerprint constructed as a marking of this frame – the fingerprint is in the pixel domain, scaled by a factor of 10 and shifted to a mean of 128, and (c) watermark amplitude in the DFT domain

Fig. 2. Demonstration of perceptual quality: the first frame of the (a) attacked video with α = 1.5, (b) attacked video with α = 1, and (c) original video

One can notice that the effective watermark is highly smoothed and that most of the watermark energy is located in the low-frequency band. This conclusion is important for the application of the estimation attack.
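The statistics-quantization step of the marking process described above can be illustrated with a toy example. This is a sketch under our own assumptions: a flat weighting prism and a plain scalar QIM-style quantizer stand in for the private quantizer q of Harmanci et al., whose actual construction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
q_step = 2.0                                 # quantization step (controls strength)

p = rng.normal(50.0, 5.0, (12, 16, 20))      # one DWT-DC sub-prism p_j
u = np.ones_like(p)                          # weighting prism u_j (flat here)

def g(prism):
    return prism.mean()                      # first-order statistic g()

def q(value, bit):
    # bit 0 -> even multiples of q_step/2, bit 1 -> odd multiples (QIM-style)
    offset = q_step / 2 if bit else 0.0
    return np.round((value - offset) / q_step) * q_step + offset

bit = 1
delta = q(g(p * u), bit) - g(p * u)          # content update Delta_j
marked = p + delta                           # spread Delta uniformly over the prism

# The detector's statistic now sits on the lattice selected by the bit:
print(abs(g(marked * u) - q(g(marked * u), bit)) < 1e-9)
print(abs(delta) <= q_step / 2)              # |Delta_j| is bounded by half a step
```

In the real scheme Δ_j is distributed by an optimization primitive guided by the complexity map rather than added uniformly; the lattice property of the resulting statistic is the same.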


Given a received video signal z, the detector first employs the information of the original video signal to undo operations such as histogram equalization, rotation, de-synchronization, etc. Next, using a suspect user key, the detector extracts the feature vector in the same way as the embedding process. It employs correlation-based detection to identify the existence of a watermark as follows:

γ = [(g_z − g) · (ĝ − g)] / ||ĝ − g||² ≶ T,   (3)

where g_z = {g(p̄_j · u_j), j = 1...n}, ĝ = {q(g(p_j · u_j)), j = 1...n}, g = {g(p_j · u_j), j = 1...n}, and p̄_j represents a prism extracted from z at a position that corresponds to the position of p_j within x. If γ is greater than a certain threshold T, the detector concludes that z is marked with the fingerprint generated using the suspect user key; otherwise, no fingerprint is detected.
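The normalized statistic of Eqn. (3) can be sketched as follows. The statistics are synthetic; the names g, g_hat, and g_z follow the text, while the sizes, distributions, and threshold are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
g = rng.normal(50.0, 5.0, n)                 # original statistics g(p_j . u_j)
g_hat = g + rng.choice([-1.0, 1.0], n)       # quantized statistics under one key

def gamma(g_z, g_hat, g):
    """Eqn. (3): correlate received deviations against the embedded offsets."""
    d = g_hat - g
    return (g_z - g) @ d / (d @ d)

T = 0.5
print(gamma(g_hat, g_hat, g) >= T)                    # intact marked copy: gamma = 1
print(gamma(g + rng.normal(0, 1, n), g_hat, g) >= T)  # unrelated signal: gamma ~ 0
```

The denominator normalizes γ so that an unattacked marked copy yields exactly 1, which is why the paper reports attack strength as the deviation of γ from 1.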

4 Joint Source-Fingerprint Estimation

In this paper, we propose a simple attack with the objective of performing joint source-fingerprint estimation. Based upon the observation that the targeted fingerprints are mainly located in the low-frequency band, we propose a dual-filter attack that is computationally inexpensive and efficient. The estimation attack is performed in the DWT-DC domain where the fingerprints are embedded. For each coefficient x(a, b, t) in this domain, we choose three prisms k_1, k_2 and k_3, all centered at x(a, b, t). The outer and largest of the prisms, k_1, encompasses the next smaller one, k_2 ⊂ k_1. Prism k_3 is smaller than k_1. We average the coefficients inside two 3D regions: inside k_3 and inside k_1 − k_2. Since both the smoothing and weighting functions are built to maintain, in most cases, the same sign for the fingerprint over a certain small region in x, we use:

e_3 = |k_3|^{-1} Σ_{p∈k_3} [x(p) + w(p)]   (4)

as the estimate of x̄ + w(a, b, t), where x̄ denotes the mean of the underlying source. As the targeted fingerprint is a low-frequency signal, we assume that sign(w(p)) is mostly univocal for p ∈ k_3; thus, sign(|k_3|^{-1} Σ_{p∈k_3} w(p)) represents a good estimate of sign(w(a, b, t)). Next, we use:

e_12 = |k_1 − k_2|^{-1} Σ_{p∈k_1−k_2} [x(p) + w(p)]   (5)

to obtain an alternate estimate of x̄ only. The reasoning is that the fingerprint spread in the region k_1 − k_2 has a variable sign and would average itself out in e_12. To achieve this goal, the size of k_1 − k_2 should be large enough. Also, the size of k_3 is chosen to be relatively small in order to capture the sign of w(a, b, t) and to get a stable x̄ inside k_3. Usually, we choose the size of k_3 to be (6,8,10) to (10,12,14); k_1 has a size about 4 times as large as k_3; and k_2 is comparable to k_3 or even smaller. Finally, we construct the attack as:

z = x + w_i − α c · (e_3 − e_12),   (6)

where α is an amplification factor that can be tuned up as long as z remains acceptably perceptually similar to x + w_i. In addition, we use a complexity map c, derived prior to the attack, to improve the perceptual effect of the attack and thus maximize α. The procedure for computing the complexity map is described in Section 3. Since most of the watermark is concentrated in the low-frequency band, we employ a low-pass filter on the watermarked video signal before and after the estimation attack described in Eqn. (6). A diagram of the final attack process is illustrated in Figure 3.

Fig. 3. Diagram of the estimation attack
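A one-dimensional sketch of the dual-filter estimator of Eqns. (4)-(6) follows. The complexity map c and the low-pass pre/post filters are omitted, and the window sizes, signal model, and seed are our own illustrative choices, not the prism sizes of the actual attack.

```python
import numpy as np

def box_mean(s, half):
    """Running mean over a centered window of 2*half + 1 samples."""
    k = 2 * half + 1
    return np.convolve(s, np.ones(k) / k, mode="same")

rng = np.random.default_rng(9)
n = 4096
x = rng.normal(0.0, 0.5, n)                    # host signal (zero-mean here)
w = box_mean(rng.choice([-1.0, 1.0], n), 20)   # low-frequency fingerprint
w /= np.abs(w).max()

y = x + w                                      # marked signal x + w_i

e3 = box_mean(y, 4)                            # Eqn. (4): small window k3 -> xbar + w
inner = box_mean(y, 8) * 17                    # windowed sum over k2 (17 samples)
outer = box_mean(y, 64) * 129                  # windowed sum over k1 (129 samples)
e12 = (outer - inner) / (129 - 17)             # Eqn. (5): shell k1 - k2 -> xbar only

alpha = 1.0
z = y - alpha * (e3 - e12)                     # Eqn. (6) without the complexity map c

def c_det(x, a, w):
    return (a - x) @ w / (w @ w)               # normalized fingerprint response

print(c_det(x, y, w))                          # before attack: 1.0
print(round(c_det(x, z, w), 2))                # after attack: substantially reduced
```

The small window tracks the local mean plus the slowly varying fingerprint, the shell tracks the local mean only, and their difference is therefore dominated by the fingerprint, which the attack then subtracts.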

5 Experimental Results

In this section, we demonstrate the effectiveness of the proposed estimation attack. In the experiments we choose the "Rodeo Show" video sequence, with frame size 640 × 480, as the host video sequence, and apply the video fingerprinting scheme of [17]. The embedding parameters are chosen to obtain a solid trade-off between perceptual quality and robustness. The deployed fingerprinting scheme is particularly efficient for video sequences with significant "random" motion; thus, the video sequence is selected to exhibit the best in the marking scheme. We apply the estimation attack using a prism k_3 of size 7 × 9 × 11, a large prism k_1 of size 25 × 33 × 81, and k_2 = ∅. We chose α ∈ {0.5, 1, 1.5} to adjust the attack strength for high, medium, and low perceptual fidelity of the resulting video sequence, respectively. Figure 2 illustrates the resulting perceptual quality of the attacked signal for α ∈ {1, 1.5} using the first frame of the benchmark video sequence. We show the results of the estimation attack in Figures 4 and 5. First, we use 50 different keys to create and embed distinct fingerprints into the test video sequence, resulting in 50 unique copies. Then, in each of these copies we perform fingerprint detection using the corresponding key used during fingerprint embedding. Figures 4(a), (b) and (c) represent the histogram of the detection statistic γ for α = {1.5, 1, 0.5}, respectively. The (mean, variance) pairs for these three histograms are (a): (0.407, 0.214);


Fig. 4. Histogram of γ under the estimation attack for different α when detecting with the same key used for embedding: (a) α = 1.5, (b) α = 1, (c) α = 0.5

Fig. 5. Histogram of γ under the estimation attack for different α when detecting with a key different from the one used for embedding: (a) α = 1.5, (b) α = 1, (c) α = 0.5

(b): (0.661, 0.095); (c): (0.866, 0.023). From these results, we observe that, due to the estimation attack, the mean of γ deviates significantly from γ = 1 (the expected value when there is no attack), and the deviation increases with α. On average, approximately 60% and 35% of the fingerprint correlation is removed after applying the estimation attack with α = 1.5 and α = 1, respectively. More importantly, the variance of the


γ statistic becomes relatively large; e.g., the range of γ for α = 1.5 covers [−0.5, 1.3]. Compared to additive white Gaussian noise (AWGN) of the same magnitude as our estimation attack, which will be shown later, the fingerprint detector experiences a nearly 12-fold (from σ²(γ) = 0.0175 to 0.2136) and 19-fold (from σ²(γ) = 0.0050 to 0.0951) increase in the variance of the detection statistic γ for α = 1.5 and α = 1, respectively. We use this observation to point to a significant anomaly of the particular fingerprinting scheme [17]. According to [16, 17, 18], the examined fingerprinting scheme has been tested under various attacks. It was reported that after the MCTF (Motion Compensated Temporal Filtering) attack with various filter lengths, the detection statistic γ ranges from 0.85 to 1; thus the fingerprint can be detected with high probability. Other attacks, such as rotation by 2 degrees, cropping by 10%, and MPEG-2 compression at a bit rate of 500 kbps, result in a detection statistic γ around 1, ranging within [0.6, 1.4] [16, 17]. A general estimation attack based on Wiener filtering, similar to the one in [13], was proposed and examined in [15], where the watermark could still be detected without error. Compared with these non-content-dependent attacks, the proposed attack is more effective in removing the watermark. In the second set of experiments, we examine the scenario when a fingerprint is created and embedded using a key i and detected with a different key j. This test aims at estimating the probability of a false positive under attack, a feature of crucial importance for fingerprinting systems. A solid fingerprinting scheme must exhibit a low probability of false positives in both cases: when detection is performed on x + w_i as well as on f(x + w_i). Function f() represents an arbitrary attack procedure that does not have knowledge of the user keys. According to [16], the detection statistic γ with an incorrect detection key ranges within [−0.02, 0.02].
However, from Figure 5, one can observe that the proposed estimation attack increases the variance of γ so that a non-trivial portion of the keys results in γ as high as 0.8 or even 1. Compared to additive white Gaussian noise (AWGN) of the same magnitude as our estimation attack, the fingerprint detector experiences a nearly 14-fold (from σ²(γ) = 0.0175 to 0.2418) and 20-fold (from σ²(γ) = 0.0050 to 0.1008) increase in the variance of the detection statistic γ for α = 1.5 and α = 1, respectively. Since the tail of the Gaussian error function is proportional to √N/σ(γ), in order to maintain the same level of false positives as in the case of detecting a fingerprint on the attacked x + w_i, the detector must consume 10 ∼ 20 times more samples to produce equivalent error rates. We were not able to explain analytically the unexpected increase in false positives under the estimation attack; however, we speculate that the dependency of the watermarks on the source (content-dependent watermarking) has made them prone to attack vectors which are also content-dependent. To further demonstrate the effectiveness of the proposed estimation attack, we apply an AWGN attack with the same energy as that introduced by the estimation attacks. We choose α = 1.5 as an example. In Figures 6(a) and (b), we show the histogram of γ for the cases of "same-key" and "different-key" detection, respectively. The increase in variance is far less significant than that incurred by the estimation attack. Figure 6(c) shows the visual quality of the AWGN-attacked frame, from which we can see that the distortion introduced by AWGN is more noticeable than that introduced

Fig. 6. Histogram of γ under the AWGN attack with energy equivalent to the estimation attack at α = 1.5: (a) detecting with the same key used for embedding; (b) detecting with a different key; (c) frame after the AWGN attack

Fig. 7. Detection statistic γ with respect to various keys for the attacked signal ẑ and the attacked original signal x̂: (a) α = 0.5; (b) α = 1; (c) α = 1.5

by the estimation attack. A comparison of the probability of error and visual quality between the estimation and AWGN attacks demonstrates that the proposed attack successfully captures the content-based watermark and is a far stronger attack than the "blind" AWGN attack.


6 Discussions and Countermeasure

As can be seen from the experimental results, the power of the proposed attack lies in the high probability of false positives, P_fp, that it introduces. To better understand this effect, we also examine the detection performance after applying the estimation attack directly to the original signal x and detecting with various keys. The results are shown in Figure 7, along with the detection of the attacked signal ẑ = f(x + w_i) using the corresponding key i. The estimation strengths α for Figures 7(a), (b) and (c) are chosen to be 1.5, 1 and 0.5, respectively. The results clearly show that the high false positive probability in detection stems from the fact that the attacked original signal x̂ is highly correlated with the fingerprints generated from many keys. The underlying reason is that the estimation process on the original signal estimates the low-frequency information of x. On the other hand, each fingerprint is built to be content-related and has gone through an intensive low-pass filtering process in the addressed video fingerprinting scheme [17]. As a result, the fingerprint mainly contains the low-frequency information of x and is thus highly correlated with x̂, which leads to a large false positive probability P_fp. Considering now the embedder's perspective, we try to find ways to combat this estimation attack. From Figure 7 we see that the detection statistic γ is key-dependent, i.e., for some keys the γ for the attacked original signal x̂ is high, while for others it is low. Since the embedder has the freedom to choose secret keys for embedding, he can leverage this freedom to deploy a countermeasure by using only the key set that results in low P_fp. Specifically, the embedder can first examine a large set of keys and choose those keys that have a high γ on ẑ while having a low γ on x̂. The embedder can then define two thresholds, h_1 and h_2, according to the desired P_fn and P_fp respectively, to guide the sifting process, as shown in Figure 7(b). The keys whose γ on ẑ is higher than h_1 and whose γ on x̂ is lower than h_2 are eligible for embedding. Other keys may result in high P_fp or P_fn and are discarded. In the example shown in Figure 7(b), only the 36th, 47th and 50th keys are eligible for embedding given h_1 = 0.8 and h_2 = 0.3. This countermeasure is quite straightforward but requires a significant amount of computation to select the key set. Moreover, the number of eligible keys is quite limited; e.g., only 3 out of 50 keys in Figure 7(b) satisfy the conditions on h_1 and h_2. Thus, to obtain a certain number of eligible keys, the embedder has to examine a large pool of keys. This may not be feasible for real applications such as fingerprinting a 2-hour movie. The results suggest that using a low-frequency, content-based signal as a fingerprint is vulnerable to estimation-type attacks, which should be taken into consideration in fingerprint design.
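The key-sifting countermeasure can be expressed in a few lines. The γ values below are synthetic stand-ins for detector outputs; the thresholds follow the h_1, h_2 of the text, while the distributions and key count are our own.

```python
import numpy as np

rng = np.random.default_rng(11)
n_keys = 50
gamma_z = rng.normal(0.6, 0.3, n_keys)   # gamma on the attacked marked copy z-hat
gamma_x = rng.normal(0.2, 0.3, n_keys)   # gamma on the attacked original x-hat

h1, h2 = 0.8, 0.3                        # thresholds set from desired P_fn and P_fp

eligible = [k for k in range(n_keys)
            if gamma_z[k] > h1 and gamma_x[k] < h2]
print(f"{len(eligible)} of {n_keys} keys eligible for embedding")
```

Only a fraction of candidate keys survives both conditions, which is exactly the practicality problem noted above: obtaining enough eligible keys requires screening a much larger pool.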

7 Conclusions

We proposed a simple dual-filter estimator that aims at removing low-frequency fingerprints embedded in video signals. Although we construct an instance of the attack and show its efficacy using a specific video fingerprinting algorithm, the generic form


of the attack can be applied to an arbitrary video marking scheme. In the conducted experiments, the attack procedure on average removed a substantial portion of the embedded fingerprints compared to additive white Gaussian noise. To the best of our knowledge, the attack is the first in the published literature to induce a substantial increase of false positives in a particular fingerprinting scheme as opposed to a "blind" attack.

Acknowledgment

We thank Dr. M.K. Mihcak and Dr. Y. Yacobi for the valuable discussions.

References

1. I. Cox, J. Kilian, F. Leighton, and T. Shamoon, "Secure Spread Spectrum Watermarking for Multimedia", IEEE Trans. on Image Processing, 6(12), pp. 1673–1687, 1997.
2. F.A.P. Petitcolas, R.J. Anderson, and M.G. Kuhn, "Attacks on Copyright Marking Systems", Info Hiding Workshop, pp. 218–238, 1998.
3. F. Ergun, J. Kilian, and R. Kumar, "A Note on the Limits of Collusion-Resistant Watermarks", Eurocrypt '99, 1999.
4. D. Schonberg and D. Kirovski, "Fingerprinting and Forensic Analysis of Multimedia", ACM Multimedia, pp. 788–795, 2004.
5. D. Boneh and J. Shaw, "Collusion-Secure Fingerprinting for Digital Data", IEEE Trans. on Information Theory, 44(5), pp. 1897–1905, 1998.
6. Y. Yacobi, "Improved Boneh-Shaw Content Fingerprinting", CT-RSA 2001, LNCS 2020, pp. 378–391, 2001.
7. W. Trappe, M. Wu, Z.J. Wang, and K.J.R. Liu, "Anti-collusion Fingerprinting for Multimedia", IEEE Trans. on Sig. Proc., 51(4), pp. 1069–1087, 2003.
8. Z.J. Wang, M. Wu, H. Zhao, W. Trappe, and K.J.R. Liu, "Anti-Collusion Forensics of Multimedia Fingerprinting Using Orthogonal Modulation", IEEE Trans. on Image Proc., pp. 804–821, June 2005.
9. S. He and M. Wu, "Joint Coding and Embedding Techniques for Multimedia Fingerprinting", IEEE Trans. on Info. Forensics and Security, Vol. 1, No. 2, pp. 231–247, June 2006.
10. D. Kirovski, "Collusion of Fingerprints via the Gradient Attack", IEEE International Symposium on Information Theory, 2005.
11. D. Kirovski and H.S. Malvar, "Spread Spectrum Watermarking of Audio Signals", IEEE Transactions on Signal Processing, Vol. 51, No. 4, pp. 1020–1033, 2003.
12. G. Langelaar, R. Lagendijk, and J. Biemond, "Removing Spatial Spread Spectrum Watermarks by Non-linear Filtering", Proceedings of the European Signal Processing Conference (EUSIPCO 1998), Vol. 4, pp. 2281–2284, 1998.
13. J. Su and B. Girod, "Power Spectrum Condition for L2-efficient Watermarking", IEEE Proc. of the International Conference on Image Processing (ICIP 1999), 1999.
14. S. Voloshynovskiy, S. Pereira, A. Herrigel, N. Baumgärtner, and T. Pun, "Generalized Watermarking Attack Based on Watermark Estimation and Perceptual Remodulation", SPIE Conference on Security and Watermarking of Multimedia Content II, 2000.
15. M.K. Mihcak, R. Venkatesan, and M. Kesal, "Watermarking via Optimization Algorithms for Quantizing Randomized Statistics of Image Regions", Allerton Conference on Communications, Computing and Control, 2002.
16. M. Kucukgoz, O. Harmanci, M.K. Mihcak, and R. Venkatesan, "Robust Video Watermarking via Optimization Algorithm for Quantization of Pseudo-Random Semi-Global Statistics", SPIE Conference on Security, Watermarking and Steganography, San Jose, CA, 2005.


17. O. Harmanci and M.K. Mihcak, "Complexity-Regularized Video Watermarking via Quantization of Pseudo-Random Semi-Global Linear Statistics", Proceedings of the European Signal Processing Conference (EUSIPCO), 2005.
18. O. Harmanci and M.K. Mihcak, "Motion Picture Watermarking via Quantization of Pseudo-Random Linear Statistics", Visual Communications and Image Processing Conference, 2005.
19. S.B. Kang, M. Uyttendaele, S.A.J. Winder, and R. Szeliski, "High Dynamic Range Video", ACM Trans. on Graphics, Vol. 22, Issue 3, pp. 319–325, 2003.

Statistics- and Spatiality-Based Feature Distance Measure for Error Resilient Image Authentication

Shuiming Ye 1,2, Qibin Sun 1, and Ee-Chien Chang 2

1 Institute for Infocomm Research, A*STAR, Singapore, 119613
2 School of Computing, National University of Singapore, Singapore, 117543
{Shuiming, Qibin}@i2r.a-star.edu.sg, [email protected]

Abstract. Content-based image authentication typically assesses authenticity based on a distance measure between the image to be tested and its original. Commonly employed distance measures such as the Minkowski measures (including Hamming and Euclidean distances) may not be adequate for content-based image authentication since they do not exploit statistical and spatial properties in features. This paper proposes a feature distance measure for content-based image authentication based on statistical and spatial properties of the feature differences. The proposed statistics- and spatiality-based measure (SSM) is motivated by an observation that most malicious manipulations are localized whereas acceptable manipulations result in global distortions. A statistical measure, kurtosis, is used to assess the shape of the feature difference distribution; a spatial measure, the maximum connected component size, is used to assess the degree of object concentration of the feature differences. The experimental results have confirmed that our proposed measure is better than previous measures in distinguishing malicious manipulations from acceptable ones.

Keywords: Feature Distance Measure, Image Authentication, Image Transmission, Error Concealment, Digital Watermarking, Digital Signature.

1 Introduction

With the wide availability of digital cameras and image processing software, the generation and manipulation of digital images are easy now. To protect the trustworthiness of digital images, image authentication techniques are required in many scenarios, for example, applications in health care. Image authentication, in general, differs from data authentication in cryptography. Data authentication is designed to detect a single bit change whereas image authentication aims to authenticate the content but not the specific data representation of an image [1], [2]. Therefore, image manipulations which do not change semantic meaning are often acceptable, such as contrast adjustment, histogram equalization, and compression [3], [4]. Lossy transmission is also considered acceptable since errors under a certain level in images would be tolerable

Y.Q. Shi (Eds.): Transactions on DHMS II, LNCS 4499, pp. 48–67, 2007.
© Springer-Verlag Berlin Heidelberg 2007



Fig. 1. Discernable patterns of edge feature differences caused by acceptable image manipulation and malicious modification: (a) original image; (b) tampered image; (c) feature difference of (b); (d) blurred image (by a 3×3 Gaussian filter); (e) feature difference of (d)


and acceptable [5]. Other manipulations that modify image content are classified as malicious manipulations, such as object removal or insertion. Image authentication should be robust to acceptable manipulations and sensitive to malicious ones. In order to be robust to acceptable manipulations, several content-based image authentication schemes have been proposed [6], [7], [8]. These schemes may be robust to one or several specific manipulations; however, they would classify an image damaged by transmission errors as unauthentic [9]. Furthermore, content-based image authentication typically measures authenticity in terms of the distance between a feature vector from the received image and its corresponding vector from the original image, and compares the distance with a preset threshold to make a decision [10], [11]. Commonly employed distance measures, such as the Minkowski metrics [12] (including Hamming and Euclidean distances), may not be suitable for robust image authentication. The reason is that even if these measures are the same (i.e., we cannot tell whether the image in question is authentic or not), the feature difference patterns under typical acceptable or malicious modifications may still be distinguishable (feature differences are the differences between the feature extracted from the original image and the feature extracted from the image under test). That is to say, these measures do not properly exploit the statistical or spatial properties of image features. For example, the Hamming distance measures of Fig. 1(b) and Fig. 1(d) are almost the same, but yet one could argue that Fig. 1(b) was probably distorted by malicious tampering since the feature differences concentrate on the eyes. The objective of this paper is to propose a distance measure based on statistical and spatial properties of the feature differences for content-based image authentication.
The proposed measure is derived by exploiting the discernable patterns of feature differences between the original image and the distorted image to distinguish acceptable manipulations from malicious ones. Two properties, the kurtosis of the feature difference distribution and the maximum connected component size in the feature differences, are combined to evaluate the discernable patterns. We call the proposed measure the statistics- and spatiality-based measure (SSM) since it considers both global statistical properties and spatial properties. Many acceptable manipulations, which were detected as malicious modifications by previous schemes based on Minkowski metrics, were correctly verified by the proposed scheme based on SSM. To illustrate how the proposed SSM can improve the performance of an image authentication scheme, we applied it to a semi-fragile image authentication scheme [13] to authenticate images damaged by transmission errors. The proposed error resilient scheme obtained better robustness against transmission errors in JPEG or JPEG2000 images and other acceptable manipulations than the scheme proposed in [13].

2 Proposed Statistics- and Spatiality-Based Measure (SSM) for Image Authentication

Content-based or feature-based image authentication generally verifies authenticity by comparing the distance between the feature vector extracted from the


testing image and the original with some preset thresholds [14]. The distance metric commonly used is the Minkowski metric d(X, Y) [12]:

d(X, Y) = ( Σ_{i=1}^{N} |x_i − y_i|^r )^{1/r}   (1)

where X, Y are two N-dimensional feature vectors, and r is a Minkowski factor. Note that when r is set to 2, it is the Euclidean distance; when r is 1, the Manhattan distance (or Hamming distance for binary vectors). However, the Minkowski metric does not exploit the statistical or spatial properties of image features. Therefore, an image authentication scheme based on the Minkowski metric may not be able to distinguish tampered images (e.g., with small local objects removed or modified) from images that underwent acceptable manipulations such as lossy compression. On the other hand, we found that even if the Minkowski metric distances are the same, the feature differences under typical acceptable manipulations and malicious ones are still distinguishable, especially in the case that the feature contains spatial information such as edges or block DCT coefficients. Therefore, the Minkowski metric is not a proper measure for content-based image authentication.
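A quick sketch of Eqn. (1) and its special cases (with our own toy vectors) also shows why the metric is blind to where the differences occur:

```python
import numpy as np

def minkowski(x, y, r):
    """Eqn. (1): d(X, Y) = (sum_i |x_i - y_i|^r)^(1/r)."""
    diff = np.abs(np.asarray(x, float) - np.asarray(y, float))
    return float((diff ** r).sum() ** (1.0 / r))

x = [1.0, 0.0, 1.0, 1.0]
y = [0.0, 0.0, 1.0, 0.0]
print(minkowski(x, y, 1))   # r = 1: Manhattan/Hamming distance -> 2.0
print(minkowski(x, y, 2))   # r = 2: Euclidean distance -> sqrt(2)

# Two very different difference patterns share the same distance:
clustered = [1.0, 1.0, 0.0, 0.0]   # differences concentrated (suspicious)
scattered = [1.0, 0.0, 1.0, 0.0]   # differences spread out (benign-looking)
ref = [0.0, 0.0, 0.0, 0.0]
print(minkowski(clustered, ref, 1) == minkowski(scattered, ref, 1))   # True
```

The final comparison is the core limitation: a clustered difference map and a scattered one collapse to the same number, which is what the proposed SSM is designed to avoid.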

2.1 Main Observations of Feature Differences

Many features used in content-based image authentication are composed of localized information about the image, such as edges [3], [6], block DCT coefficients [1], [10], [13], a highly compressed version of the original image [7], or block intensity histograms [11]. To facilitate discussion, let x_i be the feature value at spatial location i, and X be an N-dimensional feature vector; for example, N = W · H when using the edge feature (W and H are the width and height of the image). We define the feature difference vector δ as the difference between the feature vector X of the testing image and the feature vector Y of the original image:

δ_i = |x_i − y_i|    (2)

where δ_i is the difference of the features at spatial location i. After examining many discernable feature difference patterns from various image manipulations, we draw three observations on feature differences:

1. The feature differences caused by most acceptable operations are evenly distributed spatially, whereas the differences caused by malicious operations are locally concentrated.
2. The maximum connected component size of the feature differences caused by acceptable manipulations is usually small, whereas that caused by a malicious operation is large.
3. Even if the maximum connected component size is fairly small, the image could still have been tampered with if those small components are spatially concentrated.

52

S. Ye, Q. Sun, and E.-C. Chang

These observations are supported by our extensive experiments and by the literature cited previously [6], [9]. Image contents are typically represented by objects, and each object is usually represented by spatially clustered image pixels. Therefore, a feature representing the content of the image inherits some spatial relations. A malicious manipulation of an image usually concentrates on modifying objects in the image, changing it into one that carries a different visual meaning to observers. If the contents of an image are modified, the features around the objects may also be changed, and the affected feature points tend to be connected with each other. Therefore, the feature differences introduced by a meaningful tampering are typically spatially concentrated. On the contrary, acceptable image manipulations such as image compression, contrast adjustment, and histogram equalization introduce distortions globally into the image. The feature differences are likely to cluster around all objects in the image, and are therefore not as locally concentrated as those caused by malicious manipulations. In addition, many objects may spread out spatially in the image, so the feature differences are likely to be evenly distributed with little connectedness. The distortion introduced by transmission errors is also evenly distributed, since transmission errors are randomly introduced into the image [18]. The above observations not only demonstrate the unsuitability of the Minkowski metric for image authentication, but also provide hints on how a good distance function should work: it should exploit the statistical and spatial properties of feature differences. These observations lead us to design a new feature distance measure for content-based image authentication.

2.2 Proposed Feature Distance Measure for Image Authentication

Based on the observations discussed so far, a feature distance measure for image authentication is proposed in this section. The distance measure is based on the differences of the two feature vectors from the testing image and the original image. Two measures are used to exploit statistical and spatial properties of feature differences: the kurtosis (kurt) of the feature difference distribution and the maximum connected component size (mccs) in the feature difference map. Observation (1) motivates the use of the kurtosis measure, and observation (2) motivates the use of the mccs measure. They are combined together since either one alone is insufficient, as stated in observation (3). The proposed statistics- and spatiality-based measure (SSM) is calculated by a sigmoid membership function based on both mccs and kurt. Given two feature vectors X and Y, the proposed feature distance measure SSM(X, Y) is defined as follows:

SSM(X, Y) = 1 / (1 + e^{−α(mccs·kurt·θ^{−2} − β)})    (3)


The measure SSM(X, Y) is derived from the feature difference vector δ defined in Eq. (2). The mccs and kurt are obtained from δ; their details are given in the next few paragraphs. θ is a normalizing factor. The parameter α controls the changing speed, especially around the point mccs·kurt·θ^{−2} = β. β is the average mccs·kurt·θ^{−2} value calculated from a set of maliciously attacked images and acceptably manipulated images. In this paper, the acceptable manipulations are defined as contrast adjustment, noise addition, blurring, sharpening, compression and lossy transmission (with error concealment); the malicious tampering operations are object replacement, addition or removal. During authentication, if the measure SSM(X, Y) of an image is smaller than 0.5 (that is, mccs·kurt·θ^{−2} < β), the image is identified as authentic; otherwise it is unauthentic.

Kurtosis. Kurtosis describes the shape of a random variable's probability distribution based on the size of the distribution's tails. It is a statistical measure of the concentration of data around the mean. A high kurtosis indicates a distribution with fat tails and a low, even concentration, whereas a low kurtosis indicates a distribution with thin tails, concentrated towards the mean. It can therefore be used to distinguish the feature difference distribution of malicious manipulations from that of acceptable manipulations.

Let us partition the spatial locations of the image into neighborhoods, and let N_i be the i-th neighborhood; that is, N_i is a set of locations in the same neighborhood. For example, by dividing the image into 8×8 blocks, we obtain a total of W·H/64 neighborhoods, each containing 64 locations. Let D_i be the total feature distortion in the i-th neighborhood N_i:

D_i = Σ_{j∈N_i} δ_j    (4)

We can view D_i as a sample of a distribution D. The kurt in Eq. (3) is the kurtosis of the distribution D. It can be estimated by:

kurt(D) = ( Σ_{i=1}^{Num} (D_i − μ)^4 ) / (Num · σ^4) − 3    (5)

where Num is the total number of samples used for estimation, and μ and σ are the estimated mean and standard deviation of D, respectively.

Maximum Connected Component Size. A connected component is a set of points in which every point is connected to all the others; its size is defined as the total number of points in the set. The maximum connected component size (mccs) is calculated with morphological operators: the isolated points in the feature difference map are first removed, then broken segments are joined by morphological dilation, and finally the mccs is obtained by connected component labeling on the feature map based on the 8-connected neighborhood. Details can be found in [15].
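As an illustration, the labeling step can be sketched as follows (a pure-Python 8-connected component search on a toy difference map; the morphological cleanup of isolated points and broken segments described above is omitted):

```python
import numpy as np
from collections import deque

# Sketch of the mccs computation: 8-connected component labeling on a
# binary feature-difference map.
def max_connected_component_size(diff_map):
    h, w = diff_map.shape
    seen = np.zeros((h, w), dtype=bool)
    best = 0
    for sy in range(h):
        for sx in range(w):
            if diff_map[sy, sx] and not seen[sy, sx]:
                size, q = 0, deque([(sy, sx)])
                seen[sy, sx] = True
                while q:                       # BFS over one component
                    y, x = q.popleft()
                    size += 1
                    for dy in (-1, 0, 1):      # 8-connected neighborhood
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and diff_map[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                q.append((ny, nx))
                best = max(best, size)
    return best

diff = np.zeros((8, 8), dtype=bool)
diff[0, 0] = True                  # isolated point: component of size 1
diff[3:6, 3:6] = True              # 3x3 tampered blob: component of size 9
print(max_connected_component_size(diff))  # → 9
```

A locally concentrated tampered region produces one large component, while evenly spread differences produce many small ones.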

54

S. Ye, Q. Sun, and E.-C. Chang

Normalizing Factor. Since images may have different numbers of objects, levels of detail and dimensions, normalization is needed. Instead of traditional normalization (i.e., the ratio of the number of extracted feature points to the image dimension), we employ a new normalizing factor θ:

θ = μ / (W · H)    (6)

where W and H are the width and height of the image, respectively, and μ is the estimated mean of D, as in Eq. (5). The normalizing factor θ makes the proposed measure more suitable for natural scene images.

Fig. 2. Cases that require both mccs and kurt to work together to successfully detect malicious modifications: (a) small object tampered (kurt: large; mccs: small); (b) feature differences of (a); (c) large object tampered with global distortions (kurt: small; mccs: large); (d) feature differences of (c)


It is worth noting that the two measures mccs and kurt must be combined to handle different types of malicious tampering. Tampering usually results in three cases in terms of the values of mccs and kurt: (1) in the most general case, the tampered areas have a large maximum connected size and are locally distributed (Fig. 1(b)); here both kurt and mccs are large. (2) A small local object is modified, such as a small spot added to a face (Fig. 2(a)); here mccs is usually very small, but kurt is large. (3) The tampered areas have a large maximum connected size but are evenly distributed over the whole image (Fig. 2(c)); here mccs is usually large, but kurt is small. It is therefore necessary for SSM to combine the two measures so that all these cases of malicious modification can be detected.
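A minimal sketch of the decision pipeline of Eqs. (3)-(6) follows; the values of α and β below are illustrative placeholders, not the trained parameters of the paper, and mccs is assumed to be computed separately from the binary difference map:

```python
import numpy as np

def block_distortions(delta, block=8):
    h, w = delta.shape
    # Eq. (4): total feature distortion D_i in each (block x block) neighborhood
    return delta.reshape(h // block, block, w // block, block).sum(axis=(1, 3)).ravel()

def excess_kurtosis(D):
    mu, sigma = D.mean(), D.std()
    # Eq. (5): fourth standardized moment minus 3
    return ((D - mu) ** 4).sum() / (D.size * sigma ** 4) - 3.0

def ssm(mccs, kurt, theta, alpha=5.0, beta=1.0):
    # Eq. (3): sigmoid membership; SSM < 0.5 (i.e. mccs*kurt/theta^2 < beta)
    # means the image is deemed authentic
    return 1.0 / (1.0 + np.exp(-alpha * (mccs * kurt * theta ** -2 - beta)))

# Eq. (6): theta = mu / (W * H), computed on a toy 64x64 difference map
D = block_distortions(np.full((64, 64), 0.01))
theta = D.mean() / (64 * 64)

# with beta = 1: a small mccs*kurt product is accepted, a large one rejected
assert ssm(mccs=4, kurt=0.02, theta=1.0) < 0.5     # authentic
assert ssm(mccs=400, kurt=5.0, theta=1.0) > 0.5    # unauthentic
```

The toy thresholds only demonstrate the shape of the decision; in the paper β is estimated from a training set of attacked and acceptably manipulated images.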

3 Application of SSM to Error Resilient Image Authentication

Image transmission is affected by errors due to channel noise, fading, multi-path transmission and Doppler frequency shift [16] in wireless channels, or packet loss due to congestion in the Internet [17]. Therefore, error resilient image authentication, which is robust to both acceptable manipulations and transmission errors, is desirable. Based on the proposed feature distance measure, an error resilient image authentication scheme is proposed in this section. The proposed scheme exploits the measure in a generic semi-fragile image authentication framework [8] to distinguish images distorted by transmission errors from maliciously modified ones. The experimental results show that the proposed feature distance measure improves the performance of the previous scheme in terms of robustness and sensitivity.

3.1 Feature Extraction for Error Resilient Image Authentication

One basic requirement when selecting a feature for content-based image authentication is that the feature should be sensitive to malicious attacks on the image content. Edge-based features are a good choice because malicious tampering usually changes edges, while edges are also robust to some distortions. For instance, the results in [18] show that high edge preserving ratios can be achieved even when there are uncorrectable transmission errors. The remaining issue is therefore to make the edge feature more robust to the defined acceptable manipulations; this is the main reason why we employ the normalization of Eq. (6) to suppress those "acceptable" distortions around edges. In [19], a method based on fuzzy reasoning is proposed to classify each pixel of a gray-value image into a shaped, textured, or smooth feature point. In this paper we adopt this fuzzy reasoning based detector because of its good robustness.

3.2 Image Signing

The image signing procedure is outlined in Fig. 3. The binary edge map of the original image is extracted using the fuzzy reasoning based edge detection method [19].


Fig. 3. Signing process of the proposed error resilient image authentication scheme

Then, the edge feature is divided into 8×8 blocks, and the edge point number in each block is encoded by an error correcting code (ECC) [8]. BCH(7,4,1) is used to generate one parity check bit (PCB) for the ECC codeword (edge point number) of every 8×8 block. The signature is generated by hashing and encrypting the concatenated ECC codewords using a private key. Finally, the PCB bits are embedded into the DCT coefficients of the image. In our implementation, the PCB bits are embedded into the middle-low frequency DCT coefficients using the same quantization based watermarking as in [13]. Let the total selected DCT coefficients form a set P. Each coefficient c in P is replaced with c_w, calculated by:

c_w = Q·round(c/Q), if LSB(round(c/Q)) = w
c_w = Q·(round(c/Q) + sgn(c − Q·round(c/Q))), otherwise    (7)

where w (0 or 1) is the bit to be embedded. The function round(x) returns the nearest integer to x, sgn(x) returns the sign of x, and LSB(x) returns the least significant bit of x. Eq. (7) ensures that the LSB of the quantized coefficient equals the watermark bit.

Note that the embedding procedure should not affect the extracted feature, since the watermarking procedure introduces some distortion. In order to exclude the effect of watermarking from feature extraction, a compensation operator C_w is applied before feature extraction and watermarking:

I_c = C_w(I),  I_w = f_w(I_c)    (8)

C_w(I) = IDCT{IntQuan(d_i, 2Q, P)}    (9)

where d_i is the i-th DCT coefficient of I, IDCT is the inverse DCT transform, f_w is the watermark embedding function, and I_w is the final watermarked image. The IntQuan(c, Q, P) function is defined as:

IntQuan(c, Q, P) = c, if c ∉ P
IntQuan(c, Q, P) = Q·round(c/Q), otherwise    (10)

C_w is designed according to the watermarking algorithm: it uses 2Q to pre-quantize the DCT coefficients before feature extraction and watermarking. That


is, from Eqs. (7), (9) and (10), we can get C_w(I_w) = C_w(I), so the feature extracted from the original image I is the same as that extracted from the watermarked image I_w. This compensation operator ensures that watermarking does not affect the extracted feature.

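The quantization embedding of Eq. (7) and its extraction can be sketched as follows (the tie case sgn(0) is resolved upward here, an implementation choice the text does not specify):

```python
# Sketch of the quantization-based embedding of Eq. (7): force the LSB of
# the quantized DCT coefficient to equal the watermark bit w. Q is the
# quantization step.
def embed_bit(c, w, Q):
    q = round(c / Q)
    if q % 2 == w:                     # LSB(round(c/Q)) already equals w
        return Q * q
    sgn = 1 if c - Q * q >= 0 else -1  # move toward c's side of the cell
    return Q * (q + sgn)

def extract_bit(cw, Q):
    return round(cw / Q) % 2           # LSB of the quantized coefficient

# round-trip check on a few coefficient values, including a negative one
for c in (13.2, -7.9, 40.0):
    for w in (0, 1):
        assert extract_bit(embed_bit(c, w, Q=4), Q=4) == w
```

When the LSB already matches, the coefficient is simply quantized; otherwise it is pushed to the neighboring quantization cell nearest to its original value.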
3.3 Image Authenticity Verification

The image verification procedure can be viewed as the inverse of the image signing procedure, as shown in Fig. 4. First, error concealment is carried out if transmission errors are detected. The image feature is then extracted using the same method as in the signing procedure, and the watermarks are extracted. If there are no uncorrectable errors in the ECC codewords, authentication is based on a bit-wise comparison between the decrypted hashed feature and the hashed feature extracted from the image [8]. Otherwise, image authenticity is calculated by the SSM based on the differences between the PCB bits of the re-extracted feature and the extracted watermark. Finally, if the image is identified as unauthentic, the attacked areas are detected.

Fig. 4. Image authentication process of the proposed error resilient image authentication scheme
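The two verification branches can be sketched as follows (the hash choice, SHA-256, and the helper names are assumptions for illustration; the text only specifies a cryptographic hash and the SSM threshold of 0.5):

```python
import hashlib

# Sketch of the two verification branches: bit-wise hash comparison when
# all ECC codewords are correctable, SSM fallback otherwise.
def verify(reference_hash, codewords, uncorrectable, ssm_score=None):
    if not uncorrectable:
        # all ECC codewords correctable: compare hashes of the concatenation
        return hashlib.sha256(b"".join(codewords)).digest() == reference_hash
    # uncorrectable errors: fall back to SSM between PCB_W and PCB_F
    return ssm_score < 0.5

codewords = [b"\x05", b"\x03", b"\x07"]
ref = hashlib.sha256(b"".join(codewords)).digest()
assert verify(ref, codewords, uncorrectable=False)
assert not verify(ref, codewords, uncorrectable=True, ssm_score=0.8)
```

In the actual scheme the reference hash is recovered by decrypting the signature with the public key, a step omitted here.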

Error Concealment. Given an image to be verified, the first step is to conceal the errors if transmission errors are detected. For wavelet-based images, the edge-directed filter-based error concealment algorithm proposed in [18] is adopted; for DCT-based JPEG images, the content-based error concealment proposed in [20] is used. It is efficient and advisable to apply error concealment before image authentication, since the edge feature of the error-concealed image is much closer to the original than that of the damaged image [18], [20]. As a result, the content authenticity of the error-concealed image is higher than that of the damaged image, which is validated in our experiments on error resilient image authentication.


Image Content Authenticity. Given an image to be verified, we repeat the feature extraction described in the image signing procedure. The corresponding PCB bits (PCB_W) of all 8×8 blocks (one bit per block) are extracted from the embedded watermarks. The feature set extracted from the image is then combined with the corresponding PCBs to form ECC codewords. If all codewords are correctable, we concatenate all codewords and cryptographically hash the resulting sequence. The final authentication result is then concluded by bit-by-bit comparison between the two hashed sets. If there are uncorrectable errors in the ECC codewords, image authenticity is calculated based on the proposed distance measure. The two feature vectors in the proposed measure are PCB_W from the watermarks and the recalculated PCB bits (PCB_F) from ECC coding of the re-extracted image feature set. If the distance measure between PCB_W and PCB_F is smaller than 0.5 (SSM(PCB_W, PCB_F) < 0.5), the image is identified as authentic.

The context of application obviously determines the requirements of the watermarking scheme. Blind detection or retrieval should be preferred to informed detection whenever the availability of the original model implies a risk of misuse or theft [17]. Copyright protection thus demands blind detection (some side information can however be tolerated). Integrity and authentication also require blind detection when the integrity of the original itself cannot be trusted. Moreover, using informed detection or retrieval necessitates the development of efficient database 3D shape retrieval algorithms to compare the original with the suspect mesh [8]. Blind detection (or retrieval), however, involves many more challenges than informed detection and still leads to poor robustness results in practice. Robustness requirements are the most difficult to determine.
Integrity and authentication (as well as augmented content) watermarking schemes should resist RST transforms, lossless format conversion and vertex re-ordering, and be fragile against all other attacks. Cayre et al. [13], however, also propose cropping as an attack to which these schemes should be robust. For copyright protection applications, robustness is required against all attacks that preserve the visual perception of the shape. In practice, most papers proposing

98

P.R. Alface and B. Macq

copyright protection 3D watermarking schemes only test RST transforms, vertex re-ordering, noise addition, compression, simplification, smoothing, cropping and subdivision. It is considered that the visual shape is the content to protect. Other properties of the mesh shape can also be important to protect, such as touch perception (roughness and haptic texture properties) and functional imperceptibility. The latter concerns, for example, industrial CAD models which are virtually designed and then manufactured to be part of a complex system; attacks and watermarks should not modify the design properties of such models. In conclusion, each proposed watermarking scheme should carefully describe the target application and the subsequent requirements.

4 3D Watermarking Schemes

In this survey, we describe the most well-known and recent contributions to 3D watermarking. We classify them by the domain of embedding: spatial, transform, compression and attribute domains. This classification is further subdivided according to the targeted application.

4.1 Spatial Domain

The 3D watermarking schemes which embed data in the spatial domain may be classified into two main categories: connectivity-driven and geometry-driven watermarking schemes.

4.1.1 Connectivity-Driven Watermarking Schemes

We refer to as connectivity-driven those watermarking algorithms which make explicit use of the mesh connectivity (some authors also refer to topological features, where topology must be understood as connectivity) to embed data in the spatial domain. These schemes are typically based on a public or secret traversal of all (or a subset of) the mesh triangles. The original model is usually not needed at the detection or decoding stage; they are therefore blind schemes. For each triangle satisfying an admissibility function, slight modifications are introduced in local invariants by changing the adjacent point positions. As a consequence, these schemes are sensitive to noise addition. However, well-designed embeddings may resist some local connectivity modifications. Three main strategies (a.k.a. arrangements [35], see Fig. 3) enable re-synchronization of the embedded data even after re-triangulation or cropping:

– Global arrangement: canonical traversal of the whole connectivity graph.
– Local arrangement: canonical traversal of subsets of the connectivity graph.
– Subscript arrangement: explicit embedding of the localization of the information, i.e., both the data bit and its subscript are hidden.

Although subscript arrangements need to embed more information than local or global arrangements, they are usually more robust [11].
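As an illustration of a global arrangement, a canonical traversal can be realized as a deterministic walk over the triangle adjacency graph (a simplified sketch; actual schemes use key-dependent or geometry-dependent orderings):

```python
from collections import deque

# Sketch of a canonical (global-arrangement) traversal: a deterministic
# breadth-first walk over the triangle adjacency graph, starting from a
# canonical seed. Triangles are vertex-index triples; two triangles are
# adjacent when they share an edge.
def canonical_traversal(triangles, seed=0):
    edge_map = {}                               # edge -> incident triangles
    for t, tri in enumerate(triangles):
        for i in range(3):
            e = tuple(sorted((tri[i], tri[(i + 1) % 3])))
            edge_map.setdefault(e, []).append(t)
    order, seen, q = [], {seed}, deque([seed])
    while q:
        t = q.popleft()
        order.append(t)
        tri = triangles[t]
        for i in range(3):                      # edges visited in fixed order
            e = tuple(sorted((tri[i], tri[(i + 1) % 3])))
            for n in edge_map[e]:
                if n not in seen:
                    seen.add(n)
                    q.append(n)
    return order

# a small fan of four triangles around vertex 0
tris = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1)]
print(canonical_traversal(tris))  # → [0, 3, 1, 2]
```

Because the walk depends only on connectivity, both embedder and detector can regenerate the same triangle ordering without side information.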

From 3D Mesh Data Hiding to 3D Shape Blind and Robust Watermarking

99

Fig. 3. Embedding strategies of connectivity-driven schemes: (a) global arrangement, (b) local arrangement, (c) indexed arrangement (data courtesy of Ohbuchi et al. [35])

Among this class of watermarking schemes, Ohbuchi et al. [35] proposed four different watermarking algorithms in the first published work on 3D watermarking, respectively named Triangle Similarity Quadruple (TSQ), Tetrahedral Volume Ratio (TVR), Triangle Strip Peeling Sequence (TSPS) and Macro Density Pattern (MDP). These schemes have inspired most connectivity-driven schemes developed so far. We classify these schemes by the application they target.

4.1.1.1 Data Hiding. Based on the fact that similar triangles may be defined by two quantities which are invariant to rotation, uniform scaling and translation (RST transforms), TSQ modifies ratios between triangle edge lengths or between triangle height and base lengths. A simple traversal of the mesh triangles is proposed to compute Macro-Embedding-Primitives (MEP). A MEP is defined by a marker M, a subscript S and two data values D1 and D2 (see Fig. 4). Decoding is achieved by traversing each triangle of the mesh and identifying the MEPs thanks to the marker triangle; the subscript then enables re-arranging the encoded data D1 and D2. This scheme is invariant to RST transforms and robust to cropping thanks to the subscript arrangement. As security is not dealt with, this scheme can only be used for data hiding applications.

The invariant used by TVR is the ratio between an initial tetrahedron volume and the volume of the tetrahedron given by an edge and its two incident triangles. These ratios are slightly modified to embed the watermark and are invariant to affine transforms. Based on a local or global arrangement, TVR is a blind readable watermarking scheme. It can only be applied to 2-manifold meshes (each edge has at most two incident faces). Benedens has extended this scheme to more general meshes without topological constraints (Affine Independent Embedding, AIE [8]). These schemes are, however, no longer robust against cropping, unlike the TSQ scheme.
They can, however, hide f bits in a triangle mesh of f triangles, which is much more than TSQ. The third scheme, TSPS, encodes data in triangle strips given the orientation of the triangles. Based on a local arrangement, it presents the same robustness


Fig. 4. On the left, a Macro Embedding Primitive. For each MEP, the marker M is encoded by modifying the point coordinates of the triangle v1, v2, v3 so that the dimensionless ratios l14/l24 and h0/l12 (lij stands for the length between vertices vi and vj) are set to specified values which enable retrieving marker triangles at the decoding stage. The subscript is then similarly encoded by modifying v0 and subsequently l02/l01, h0/l12. Finally, data symbols D1 and D2 are encoded in l13/l34, h3/l14 and l45/l34, h5/l24 respectively. On the right, a Macro Density Pattern example (data courtesy of Ohbuchi et al. [35]).

properties as the TSQ scheme. The capacity is difficult to estimate, as the triangle strips generally do not traverse all the faces of the mesh. Although it is not competitive with TSQ or TVR, this scheme is the basis of the best steganographic schemes presented in the sequel. Finally, Ohbuchi's MDP is a visual watermarking method which embeds a meshed logo in the host model by changing the local density of points (see Fig. 4). The logo is invisible with most common shading algorithms [21] but becomes visible when the edges of the mesh are rendered. However, visible watermarking of 3D meshes has few applications so far.

Focusing on improving the simplicity and speed of the mesh traversal, O. Benedens has proposed another connectivity-driven scheme: the Triangle Flood Algorithm (TFA) [4]. This scheme uses connectivity and geometric information to generate a unique traversal of all the mesh triangles. Point positions are modified to embed the watermark by altering the heights of the triangles, and also to enable the regeneration of the traversal. This scheme embeds exactly f − 1 bits, where f stands for the number of triangles.

4.1.1.2 Steganography. Cayre and Macq [11] have proposed a blind substitutive scheme which encodes a bit in a triangle strip starting from the triangle with the maximal area. This triangle strip is determined by the bits of a secret key and determines the location of the encoded data in the 3D mesh. The scheme can be seen as an extension of TSPS with security properties which make it suitable for steganography: it is not possible to locate the embedded data without knowledge of the secret key. For these reasons, this spatial substitutive scheme can also be considered an extension of Quantization Index Modulation (QIM) schemes to 3D models.
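The RST invariance that TSQ and TSPS rely on is easy to verify numerically: ratios of triangle edge lengths are unchanged by rotation, uniform scaling and translation (a quick check with invented coordinates):

```python
import numpy as np

# The TSQ invariants: ratios of triangle edge lengths are unchanged by
# rotation, uniform scaling and translation (RST transforms).
def edge_ratio(a, b, c):
    # ratio of two edge lengths of triangle (a, b, c)
    return np.linalg.norm(b - a) / np.linalg.norm(c - a)

a = np.array([0.0, 0.0, 0.0])
b = np.array([1.0, 0.0, 0.0])
c = np.array([0.2, 0.9, 0.4])
r0 = edge_ratio(a, b, c)

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])               # rotation about the z axis
s, t = 3.5, np.array([2.0, -1.0, 5.0])        # uniform scale and translation
a2, b2, c2 = (s * (R @ p) + t for p in (a, b, c))

assert np.isclose(edge_ratio(a2, b2, c2), r0)  # ratio survives the RST map
```

This is exactly why TSQ embeds in such ratios: the hidden data survives any rigid motion plus uniform scaling of the model.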


Still considering steganography, Wang and Cheng [54] have improved the capacity of the preceding approach. First, they determine the initial triangle for embedding by Principal Component Analysis (PCA). Next, an efficient triangular mesh traversal method is used to generate an ordered list of triangles which will contain the hidden message. Finally, they embed three fixed bits per vertex for all vertices, relying on three independent degrees of freedom, and thereby exploit a larger capacity in 3D space. However, this capacity gain comes at the expense of the proven security features of the scheme of Cayre and Macq [11].

4.1.1.3 Authentication. Recently, Cayre et al. [13] have extended their previous scheme by using a globally optimal traversal of the mesh and an indexed embedding. The authors specify the requirements of a 3D watermarking scheme for the authentication application context and show that their scheme withstands the attacks they consider in such a context: RST transforms and cropping. For the cropping attack, the minimum watermark segment (MWS) is computed. They also propose a careful study of the capacity and security (in bits) of the embedding, the class of robustness and the probability of false alarm. Such an analysis is difficult to perform for the other connectivity-driven watermarking schemes.

In conclusion, connectivity-driven algorithms are characterized by their relative fragility and their blind decoding capabilities. The embedded watermark generally does not resist noise addition or globally imperceptible re-triangulations (with the exception of MDP). They are suitable for annotation and related applications only, with the exception of more recent works which deal with the security issue [11,54] and [13]; these have been successfully designed for steganographic and authentication purposes, respectively. Copyright or copy protection cannot be provided by this class of schemes, as they do not resist re-sampling.

4.1.2 Geometry-Driven Watermarking Schemes

This section presents the 3D watermarking schemes which embed data in the geometry. These schemes modify the point positions and/or the point (or face) normals. Point normals are estimations of the local continuous surface normal and are tied to the local shape of the mesh. On the one hand, while surface sampling determines point positions, its influence on point normals is negligible if the point density is sufficient to accurately represent the surface. On the other hand, noise addition affects point normals and curvature estimations much more than point positions. Note that some schemes need the orientation of face normals to be consistent and cannot be applied to non-orientable surfaces such as a Möbius strip. Point normals are usually estimated by a weighted sum of the adjacent face normals or adjacent point positions. This means that a modification of the connectivity may affect the neighborhood of a point and have an impact on the point normal measure. However, attacks which modify point normals generally have a visual impact on the rendering of the mesh [21] and should therefore not be dealt with by a watermarking scheme.


4.1.2.1 Data Hiding. The Vertex Flood Algorithm (VFA) [5] embeds information in point positions. Designed for public watermarking, its high capacity is its main feature. Given a point p in the mesh, all points are clustered into subsets S_k according to their distance to p; p is the barycenter of a reference triangle R whose edge lengths are the closest to a predefined edge length ratio:

S_k = { p_i ∈ V | k ≤ ‖p_i − p‖ / W < k + 1 },  0 ≤ k ≤ d_MAX / W,    (1)

where d_MAX is the maximal distance allowed from p, and W is the width of each set. Each non-empty subset is subdivided into m + 2 intervals in order to encode m bits: the distance of each point in a subset is modified so that it falls in the middle of one of the m + 2 intervals. The first and last intervals are not used for encoding, in order to prevent modifications of point distances from affecting the other subsets. Decoding does not need the original mesh and is simply achieved by reading, for each point position, the subinterval it occupies in its subset S_k. Like the scheme of Harte et al., VFA only resists RST transforms. Compared to connectivity-driven watermarking schemes, it can achieve a higher capacity, limited only by the point sampling rate and the point position quantization precision.

4.1.2.2 Authentication. Yeo et al. [51] have developed an authentication algorithm that modifies point positions so that each mesh point verifies the following equation:

K(I(p)) = W(L(p)),    (2)

where K(·) is the verification key, I(p) is an index value depending on the point coordinates, W(·) is the watermark represented by a binary matrix and L(p) gives the location in the watermark matrix. I(p) has been designed to depend on the neighborhood of point p; this feature allows the detection of cropping attacks. Compared to connectivity-driven schemes also targeting authentication, the computational cost of this method is higher and its security features have not been deeply analyzed.

4.1.2.3 Informed Copyright Protection. Informed or blind schemes dedicated to copyright protection are often referred to as robust watermarking schemes. Such schemes should resist all known manipulations and attacks which do not produce a visible distortion of the 3D mesh. Schemes that resist remeshing and re-sampling are often referred to as 3D shape watermarking schemes. The Normal Bin Encoding (NBE) scheme [4] embeds data in point normals.
Thanks to the curvature sampling properties pointed out before, this scheme resists simplifications of the mesh. Point normals are subdivided into bins. Each bin is defined by a normal n_B, named the center normal of the bin, and an angle φ_R called the bin radius. If the angle between a point normal n_i and n_B is less than φ_R, then n_i belongs to that bin. Each bin encodes one bit of information by using different features such as the mean of the bin normals, the bin mean angle difference, or the ratio of normals inside a threshold region determined by an


angle φK (with φK < φR ). Point positions are modiﬁed so that the target value is assigned to the chosen bin feature. The decoding is simply achieved by computing the bins and their features but needs the original model for preprocessing purposes. This scheme has been improved later [8] and provides interesting imperceptibility and robustness features. The main drawback is the scalability of this technique which cannot eﬃciently handle meshes with more than 105 points. Yu et al. [53] have proposed an informed robust scheme based on the histogram of the distances from the points of the surface to its center of gravity. This distance histogram is subdivided in bins and the points are iteratively displaced so that the mean or the variance of the histogram bin lies on the left or right of the bin middle to respectively encode a 0 or a 1. A scrambling of the vertices deﬁned by a secret key is also proposed to secure the embedding of the watermark. The informed detection of the watermark necessitates the registration and resampling of the original and watermarked versions of the 3D model. The robustness features of this scheme cover noising and denoising attacks, cropping and resampling. Unlike NBE, this scheme has good scalability properties. Focusing on imperceptibility criterions such as symmetry and continuity preservation, Benedens [7] has proposed a copyright protection watermarking scheme based on a sculpting approach. It uses Free Form Deformations (FFD) at distinct locations of the mesh (the so-called feature points) to embed a watermark. The basic steps performed by the embedding part of the algorithm consist in a ﬁrst selection procedure of feature points and the displacement of these points along the surface normal (inwards or outwards depending on the watermark value) by a FFD. These two operations are ruled by secret keys. 
The detector is based on the assumption that random copies of the original model have features that are independently and randomly distributed (i.e., independently and randomly pointing inwards or outwards of the surface, following the same distribution). This algorithm presents very good imperceptibility and robustness results against noise addition, smoothing, cropping and affine transforms, and relatively good robustness against re-sampling. The latter strongly depends on the detector properties and on the optimality of the registration. A fair comparison between this scheme and the scheme of Yu et al. [53] should use the same fine registration and re-sampling process; it appears that the sculpting approach provides better imperceptibility results and comparable robustness features.

4.1.2.4 Blind Copyright Protection. Unlike informed robust watermarking schemes, blind schemes cannot so far survive combined remeshing and cropping attacks. They also generally provide less robustness to geometric attacks. However, blind detection is a nice property that is usually required in a copyright protection application scenario. M. Wagner [50] has proposed a scheme which embeds data in the point normals of the mesh. These normal vectors are estimated by the Laplacian operator (a.k.a. umbrella operator) applied on the point neighborhood:

$$ n_i = \frac{1}{d_{p_i}} \sum_{p_j \in N(p_i)} (p_j - p_i) , \qquad (3) $$


P.R. Alface and B. Macq

where d_{p_i} is the number of point neighbors of p_i and N(p_i) is the neighborhood of p_i. The watermark is a continuous function f(p) defined on the unit sphere. The normal vectors n_i and the watermark function are converted into integers k_i and w_i respectively:

$$ k_i = \frac{c}{d} \, \|n_i\| , \qquad (4) $$

$$ w_i = 2^b \, f\!\left(\frac{n_i}{\|n_i\|}\right) , \qquad (5) $$

where d is the mean length of these normal vectors, c is a parameter given by a secret key, and b is the number of bits needed to encode each w_i. The embedding proceeds by replacing b bits of k_i by those of w_i, resulting in k_i'. The modified normals n_i' are then re-computed by

$$ n_i' = \frac{k_i' \, d}{c} \, \frac{n_i}{\|n_i\|} . $$

The watermarked coordinates of each point p_i are obtained by solving the following system of L + 1 linear equations:

$$ n_i' = \frac{1}{d_{p_i}} \sum_{p_j \in N(p_i)} (p_j' - p_i') . \qquad (6) $$

However, it is not possible to build a surface from the point normal information alone, and this linear equation system is indeed singular. In order to solve this issue, 20% of the points are left unwatermarked. The decoding of the watermark needs a modification of the parameter c because the mean normal length is modified from d to d': the decoder uses c' = c d'/d. In order to be robust to affine transforms, a non-Euclidean affine-invariant norm [34] is used. The watermark can be either a visual logo on the unit sphere or Gaussian white noise. The scalability and the computational cost of this scheme are a concern.

Harte et al. [25] have proposed another blind watermarking scheme which embeds a watermark in the point positions. One bit is assigned to each point: 1 if the point lies outside a bounding volume defined by its point neighborhood, and 0 otherwise. This bounding volume may be defined either by a set of bounding planes or by a bounding ellipsoid. During embedding and decoding, points are ranked with respect to their distance to their neighborhood center. This algorithm is robust against RST transforms, noise addition and smoothing. Like the scheme of Wagner [50], it cannot withstand connectivity attacks such as remeshing or re-triangulation. However, it presents a far better computational cost, since embedding only needs one traversal of the vertices and limits the computations for each point to its one-connected neighbors.

Cho et al. [18] have proposed a blind and robust extension of the scheme of Yu et al. [53]. This scheme presents the same robustness features, except for cropping and any re-sampling attack that modifies the position of the center of gravity (e.g., an unbalanced point density). The authors propose to send the position of this point to the detection side, which is not realistic: combined cropping and rotation or translation attacks can shift the relative positions of the model and of the center of gravity conveyed as side-information. This scheme is limited
to star-shaped models¹ but, considering robustness, it outperforms the schemes of Harte et al. [25] and Wagner [50]. It is, however, fragile against cropping. Similarly, Zafeiriou et al. [57] have proposed to convert the point coordinates into spherical coordinates with the center of gravity as origin. A Principal Component Analysis (PCA) is first used to align the mesh along its principal axes. Then, two different embedding functions are used to modify geometric invariants. For a given angle θ and radius r, a continuous neighborhood patch is computed by a NURBS approximation; a 0 is encoded if the center point radius is less than the mean radius of the neighborhood, and a 1 is encoded otherwise. This scheme shows approximately the same advantages and limitations as the scheme of Cho et al. [18]: it is fragile against cropping and unbalanced re-sampling. Shifts of the center of gravity and perturbations of the PCA alignment caused by sampling-density modifications [8] are also a weakness which deserves further research.

More flexible than connectivity-driven algorithms, geometry-driven algorithms enable very different capacity-robustness trade-offs. While steganography and authentication seem better handled by the former, copyright protection techniques could be provided by geometry-driven schemes. However, there is still no blind and robust watermarking scheme able to resist cropping and irregular point-density re-samplings.

¹ For each point of the surface, the segment linking this point to the center of gravity does not intersect the surface at any other point.

4.2 Transform Domain

This section is dedicated to watermarking schemes which embed information in a mesh transform domain. These transforms are extensions of classical signal-processing transforms to 3D meshes: the mesh spectral decomposition, the wavelet transform and the spherical wavelet transform.

4.2.1 Spectral Decomposition

Spectral decomposition (a.k.a. pseudo-frequency decomposition or analysis) of 3D meshes corresponds to the extension of the well-known Discrete Fourier Transform (DFT) or Discrete Cosine Transform (DCT). This extension links the spectral analysis of matrices to the spectral decomposition of signals defined on graphs [45,28]. The pseudo-frequency analysis of a 3D mesh is given by the projection of the geometry on the eigenvectors of the Laplacian operator defined on the mesh. The Laplacian is usually approximated by the umbrella operator L = D − A, where A is the adjacency matrix and D is a diagonal matrix with D_ii = valence(p_i). Projecting the canonical geometry coordinates (X, Y, Z) leads to three real-valued spectra, often noted (P, Q, R) [12]. Other Laplacian operator approximations have been successfully explored to design transforms which allow an optimal energy compaction in the pseudo-low frequencies [58,3,56]. Since this transform is based on the eigen-decomposition of an n-by-n matrix, mesh connectivity partitioning must be used for meshes of more than 10^4 points, both to speed up the computation and to avoid numerical instabilities
such as eigenvector order flipping [28,58]. Observing that partitioning induces artifacts on submesh boundaries, Wu et al. [56] have recently proposed radial basis functions (RBF) to compute the spectrum of 3D meshes with up to 10^6 points without the use of a partitioning algorithm. A better choice of coordinates than the canonical (X, Y, Z) to project on the spectral basis functions is still an open issue.

4.2.1.1 Informed Copyright Protection. The first scheme based on spectral decomposition was proposed by Ohbuchi et al. in 2002 [37]. Their approach consists in extending spread-spectrum techniques to this transform. Well-balanced seed points are interactively selected and initialize a connectivity-based front propagation which builds the partition. An additive watermark is embedded in the low pseudo-frequency coefficients of (P, Q, R) (the three spectra are watermarked in the same way). The informed decoding retrieves the partition and the correspondence between the original connectivity and the watermarked geometry through registration, re-sampling and remeshing. This scheme presents robustness against RST transforms, noise addition, smoothing and cropping.

Benedens et al. [9] have improved the preceding scheme by embedding the watermark only in the transformed local normal component of the point coordinates instead of embedding (P, Q, R). They show that this results in a better trade-off between imperceptibility and capacity, and that it improves the behavior of the decoder as well.

Cotting et al. [16] have extended the work of Ohbuchi et al. [37] to point-sampled surfaces. A neighborhood is still needed to compute the Laplacian eigenvectors and is provided by a k-nearest-neighbors algorithm. A hierarchical clustering strategy is used to partition the surface. They also show that other point attributes, such as color values, can be projected on the spectral basis functions and watermarked as well.
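The umbrella-operator spectral transform and the additive spread-spectrum embedding described in this section can be sketched together on a toy mesh. This is a minimal illustration under stated assumptions: the modulation strength alpha, the key-seeded ±1 watermark and the correlation detector are illustrative choices, not the exact procedure of [37].

```python
import numpy as np

# Toy mesh: a tetrahedron (every pair of the 4 vertices is connected).
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
A = np.ones((4, 4)) - np.eye(4)      # adjacency matrix
D = np.diag(A.sum(axis=1))           # D_ii = valence(p_i)
L = D - A                            # umbrella-operator Laplacian

# Eigenvectors of L (sorted by eigenvalue) act as the spectral basis;
# projecting the X, Y, Z coordinates on them yields the spectra (P, Q, R).
_, basis = np.linalg.eigh(L)
spectra = basis.T @ coords           # columns are P, Q, R

# Additive spread-spectrum embedding on low pseudo-frequencies
# (coefficient 0 is the DC term and is left untouched).
alpha, m = 0.01, 2
rng = np.random.default_rng(42)      # the secret key seeds the watermark
w = rng.choice([-1.0, 1.0], size=(m, 3))
marked = spectra.copy()
marked[1:m + 1] += alpha * w

coords_marked = basis @ marked       # back to the spatial domain

# Informed detection: re-project, subtract the original spectra and
# correlate the coefficient differences with the watermark sequence.
diff = (basis.T @ coords_marked - spectra)[1:m + 1]
corr = float((diff * w).sum()) / (alpha * w.size)
print(round(corr, 6))                # correlation close to 1 when the mark is present
```

For a real mesh, A would be assembled from the edge list; beyond roughly 10^4 vertices the full eigen-decomposition becomes impractical, which is what motivates the partitioning discussed above.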
The watermark is extracted through registration with the original and re-sampling. The re-sampling is based on the projection of new points on a polynomial approximation of the surface. Their algorithm presents robustness features very close to those of [37]. Furthermore, they show that the watermark withstands repeated embeddings of different watermarks. Recently, Wu and Kobbelt [56] have proposed an approximation of the Laplacian eigenfunctions by RBF functions. These functions are centered on k (with k
