Saturday, October 16, 2010

Differentially private decision trees, part 0

Today, I set out to figure out several good mechanisms for generating differentially private decision trees ("good" here being relative, since learning optimal decision trees is NP-hard in almost all settings). Despite this, shallow trees are used frequently in machine learning, so the problem seems somewhat practically motivated. Step 0 for tomorrow: figure out a good metric on trees. Step 1 for tomorrow: figure out a good measure of a tree's "usefulness" given the data it was derived from. With these two things (or sets of things) in hand, I could apply McSherry and Talwar's exponential mechanism, which outputs a differentially private structure with probability exponential in that structure's utility.
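As a reminder to myself, here is a minimal sketch of the exponential mechanism over a finite candidate set (the function and parameter names are my own; `utility` would be the tree-usefulness measure from step 1, and `sensitivity` its global sensitivity):

```python
import math
import random

def exponential_mechanism(candidates, utility, epsilon, sensitivity):
    """Sample a candidate with probability proportional to
    exp(epsilon * utility(candidate) / (2 * sensitivity))."""
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - m) for s in scores]
    return random.choices(candidates, weights=weights, k=1)[0]
```

The hard part for trees is not this sampling step but enumerating (or otherwise searching) the candidate set and bounding the utility's sensitivity.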


Also, I am left wondering what kind of regularizer I could come up with for ID3. While ID3 is a heuristic, and not the solution to some minimum-loss problem, each step does minimize the entropy of the attribute it chooses to split on, so there are some similarities. If there were some way to formalize this, there might be some nice connections with Chaudhuri and Monteleoni's work on empirical risk minimization.
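For concreteness, the per-node quantity ID3 greedily minimizes is the expected label entropy after the split; a small sketch (function names and the dict-of-rows representation are my own):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(rows, labels, attr):
    """Expected label entropy after splitting on `attr`
    (equivalently, ID3 maximizes entropy(labels) minus this)."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())
```

Whether this greedy entropy objective can be massaged into a single regularized loss that their framework applies to is exactly the open question.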
