SQL插入缺少的日期(SQL interpolate missing dates)

使用SQL Server 2012我有一个名为Allbucket的表

CustodianAccountNum symbol EndDate ManagerName MarketValue NetReturn A9G040819 wabix 12/31/2013 GMO Benchmark 34751.10987 0.004072 A9G040819 wabix 1/31/2014 GMO Benchmark 34128.88767 -0.017905 A9G040819 wabix 2/28/2014 GMO Benchmark 49969.8081 0.0202 A9G040819 wabix 3/31/2014 GMO Benchmark 50370.993 0.008028 A9G040819 wabix 4/30/2014 GMO Benchmark 50995.0584 0.012389 A9G040819 amj 12/31/2013 JPMorgan Alerian 1234.55 -0.008154 A9G040819 amj 2/28/2014 JPMorgan Alerian 14849.76 -0.018599 A9G040819 amj 3/31/2014 JPMorgan Alerian 14892.8 0.015203 A9G040819 amj 4/30/2014 JPMorgan Alerian 15513.6 0.041684

我试图将这些数据从一个系统加载到另一个系统。 但是,它要求每个给定的CustodianAccountNum,因为所有符号在它们都存在的时间段内具有相同的日期间隔。

请注意,amj缺失1/31/2014。 线索是至少有一个其他安全性,在这种情况下,wabix在同一时间段内具有该日期。 另外请注意,有时日期是例如1/15/2014的月内

我希望做一些类似自我连接和分区的事情,其中​​我为给定的CustodianAccountNum采用所有可能的不同日期,然后强制所有行在它们重叠的时间段内具有相同的周期性。 对于那些不是原始的并从该时间段内存在的另一个符号'借来的'的插值行,我想从该符号的前一行拉动LAG市场价值(如果前一行不存在,则存在0)并强制所有其他值为零。 原始数据中还有其他列,但我试图保持这个例子简单。

所以理想情况下,AMJ看起来像这样,因为wabix的日期是1/31/2014

CustodianAccountNum symbol EndDate ManagerName MarketValue NetReturn A9G040819 amj 12/31/2013 JPMorgan Alerian 1234.55 -0.008154 A9G040819 amj 1/31/2014 JPMorgan Alerian 1234.55 -0.0 A9G040819 amj 2/28/2014 JPMorgan Alerian 14849.76 -0.018599 A9G040819 amj 3/31/2014 JPMorgan Alerian 14892.8 0.015203 A9G040819 amj 4/30/2014 JPMorgan Alerian 15513.6 0.041684

缺失日期的指导原则是,如果任何其他符号具有由给定监护人帐户划定的日期。 有数千个不同的账户,但他们只需要按照给定的账户进行对账

我的关注仅限于每个帐户的符号使用期限内存在日期差距。 如果在数年前存在另一个符号名称,我不需要添加很多个月。 我只需要在给定符号的第一个到最后一个日期同步它们,所有符号在时间上重叠。

UPDATE

Gordon Linoff的回复使我接近,但并不完全。 我不得不将外部应用更改为交叉应用,或者我在所有列中获得了成千上万的空记录。

我修改了查询以显示所有需要的列,但是这个查询导致所有的市场价值都为0。基本上我想强制所有值为0的派生行(在我的例子中为1/31/2014)除了我想从之前的市场价格拉动的市场价格。 但是,对于所有非派生行,我希望在整行中使用原始值。

select ab.drank,d.EndDate,ab.BranchName,ab.EntityID,ab.CustodianAccountNum,ab.AccountID,ab.ManagerName, ab.FTAssetStyle,ab.FTAssetClass,ab.PWMSecurityID,ab.AssetClassCode,ab.AssetClass,ab.Symbol,ab.SecType, ab.Cusip,ab.Held,ab.MarketValue, 0 AS GrossFlow,0 AS GrossWeight,0 AS GrossReturn,0 AS NetFlow,0 AS NetWeight, 0 AS NetReturn,0 AS PortfolioFees,0 AS PortfolioExpenses,0 AS ManagerFees,0 AS Income from (select distinct CustodianAccountNum, enddate from Allbucket) d join (select distinct CustodianAccountNum, symbol from Allbucket) s on d.CustodianAccountNum = s.CustodianAccountNum CROSS apply (select top 1 ab.* from Allbucket ab where d.CustodianAccountNum = ab.CustodianAccountNum and d.enddate <= ab.enddate and s.symbol = ab.symbol AND ab.CustodianAccountNum = 'A9G040819' order by d.enddate desc ) ab

Using SQL Server 2012 I have a table called Allbucket

CustodianAccountNum symbol EndDate ManagerName MarketValue NetReturn A9G040819 wabix 12/31/2013 GMO Benchmark 34751.10987 0.004072 A9G040819 wabix 1/31/2014 GMO Benchmark 34128.88767 -0.017905 A9G040819 wabix 2/28/2014 GMO Benchmark 49969.8081 0.0202 A9G040819 wabix 3/31/2014 GMO Benchmark 50370.993 0.008028 A9G040819 wabix 4/30/2014 GMO Benchmark 50995.0584 0.012389 A9G040819 amj 12/31/2013 JPMorgan Alerian 1234.55 -0.008154 A9G040819 amj 2/28/2014 JPMorgan Alerian 14849.76 -0.018599 A9G040819 amj 3/31/2014 JPMorgan Alerian 14892.8 0.015203 A9G040819 amj 4/30/2014 JPMorgan Alerian 15513.6 0.041684

I am trying to load this data from one system to another. However it requires per a given CustodianAccountNum, for all symbols to have the same date intervals during periods of time where they both exist.

Note that amj is missing 1/31/2014. The clue is that at least one other security, in this case wabix has that date during the same span of time. Also note sometimes dates are intramonth such as 1/15/2014

I was hoping to do something like a self join and partition where I take all possible distinct dates for a given CustodianAccountNum then force all rows to have the same periodicity during the span of time they overlap. For the interpolated rows that were not original and 'borrowed' from another symbol existing in that span of time I would like to pull the LAG marketvalue from the previous row of that symbol (if a previous row exists if not then 0) and force all other values to zero. There are additional columns in the original data but I'm trying to keep this example simple.

So Ideally AMJ would look like this because wabix has the 1/31/2014 date

CustodianAccountNum symbol EndDate ManagerName MarketValue NetReturn A9G040819 amj 12/31/2013 JPMorgan Alerian 1234.55 -0.008154 A9G040819 amj 1/31/2014 JPMorgan Alerian 1234.55 -0.0 A9G040819 amj 2/28/2014 JPMorgan Alerian 14849.76 -0.018599 A9G040819 amj 3/31/2014 JPMorgan Alerian 14892.8 0.015203 A9G040819 amj 4/30/2014 JPMorgan Alerian 15513.6 0.041684

the guiding principle of missing dates is if any other Symbol has a date paritioned by a given custodianaccountnum. There are thousands of different accountnums but they only need to align per a given accountnum

My concern is only with date gaps during the lifetime of the Symbol per account. If another symbol name exists years before it I don't need many months of 0 added. I just need them in sync from the first to the last date of a given symbol across all Symbols that overlap in time.

UPDATE

Gordon Linoff's reply gets me close but not quite there. I had to change OUTER APPLY to CROSS apply or I was getting thousands of null records in all columns.

I've amended the query to show all the desired columns however this query is resulting in all but market value being = 0. Basically I want to force all the values to 0 for the derived row (1/31/2014 in my example) except market vlaue which I'd like to pull from the previous market value. However for all non derived rows I want to use the original values across the entire row.

select ab.drank,d.EndDate,ab.BranchName,ab.EntityID,ab.CustodianAccountNum,ab.AccountID,ab.ManagerName, ab.FTAssetStyle,ab.FTAssetClass,ab.PWMSecurityID,ab.AssetClassCode,ab.AssetClass,ab.Symbol,ab.SecType, ab.Cusip,ab.Held,ab.MarketValue, 0 AS GrossFlow,0 AS GrossWeight,0 AS GrossReturn,0 AS NetFlow,0 AS NetWeight, 0 AS NetReturn,0 AS PortfolioFees,0 AS PortfolioExpenses,0 AS ManagerFees,0 AS Income from (select distinct CustodianAccountNum, enddate from Allbucket) d join (select distinct CustodianAccountNum, symbol from Allbucket) s on d.CustodianAccountNum = s.CustodianAccountNum CROSS apply (select top 1 ab.* from Allbucket ab where d.CustodianAccountNum = ab.CustodianAccountNum and d.enddate <= ab.enddate and s.symbol = ab.symbol AND ab.CustodianAccountNum = 'A9G040819' order by d.enddate desc ) ab

最满意答案

稍微不同的方法,但仍然使用笛卡尔积和APPLY运算符(在这一个中需要OUTER APPLY)。 如果你不想要前面的值,那么只需修改COALESCE()。

SQL小提琴

MS SQL Server 2014架构设置

CREATE TABLE Allbucket ([CustodianAccountNum] varchar(9), [symbol] varchar(5), [EndDate] datetime, [ManagerName] varchar(16), [MarketValue] numeric , [NetReturn] decimal(12,6)) ; INSERT INTO Allbucket ([CustodianAccountNum], [symbol], [EndDate], [ManagerName], [MarketValue], [NetReturn]) VALUES ('A9G040819', 'wabix', '2013-12-31 00:00:00', 'GMO Benchmark', 34751.10987, 0.004072), ('A9G040819', 'wabix', '2014-01-31 00:00:00', 'GMO Benchmark', 34128.88767, -0.017905), ('A9G040819', 'wabix', '2014-02-28 00:00:00', 'GMO Benchmark', 49969.8081, 0.0202), ('A9G040819', 'wabix', '2014-03-31 00:00:00', 'GMO Benchmark', 50370.993, 0.008028), ('A9G040819', 'wabix', '2014-04-30 00:00:00', 'GMO Benchmark', 50995.0584, 0.012389), ('A9G040819', 'amj', '2013-12-31 00:00:00', 'JPMorgan Alerian', 1234.55, -0.008154), ('A9G040819', 'amj', '2014-02-28 00:00:00', 'JPMorgan Alerian', 14849.76, -0.018599), ('A9G040819', 'amj', '2014-03-31 00:00:00', 'JPMorgan Alerian', 14892.8, 0.015203), ('A9G040819', 'amj', '2014-04-30 00:00:00', 'JPMorgan Alerian', 15513.6, 0.041684) ;

查询1

SELECT s.CustodianAccountNum , s.symbol , d.enddate , COALESCE(ab.ManagerName, ap.ManagerName) AS ManagerName , COALESCE(ab.MarketValue, ap.MarketValue) AS MarketValue , COALESCE(ab.NetReturn, 0) AS NetReturn FROM ( SELECT CustodianAccountNum , symbol , MIN(enddate) symstart , MAX(enddate) symend FROM Allbucket GROUP BY CustodianAccountNum , symbol ) s JOIN ( SELECT DISTINCT cast(enddate as date) as enddate FROM Allbucket ) d ON d.enddate BETWEEN s.symstart AND s.symend LEFT JOIN Allbucket ab ON s.CustodianAccountNum = ab.CustodianAccountNum AND s.symbol = ab.symbol AND ab.enddate = d.enddate OUTER APPLY ( SELECT TOP 1 t.* FROM Allbucket t WHERE s.CustodianAccountNum = t.CustodianAccountNum AND s.symbol = t.symbol AND d.enddate <= t.enddate ORDER BY d.enddate DESC ) ap

结果

| CustodianAccountNum | symbol | enddate | ManagerName | MarketValue | NetReturn | |---------------------|--------|------------|------------------|-------------|-----------| | A9G040819 | amj | 2013-12-31 | JPMorgan Alerian | 1235 | -0.008154 | | A9G040819 | amj | 2014-01-31 | JPMorgan Alerian | 14850 | 0 | | A9G040819 | amj | 2014-02-28 | JPMorgan Alerian | 14850 | -0.018599 | | A9G040819 | amj | 2014-03-31 | JPMorgan Alerian | 14893 | 0.015203 | | A9G040819 | amj | 2014-04-30 | JPMorgan Alerian | 15514 | 0.041684 | | A9G040819 | wabix | 2013-12-31 | GMO Benchmark | 34751 | 0.004072 | | A9G040819 | wabix | 2014-01-31 | GMO Benchmark | 34129 | -0.017905 | | A9G040819 | wabix | 2014-02-28 | GMO Benchmark | 49970 | 0.0202 | | A9G040819 | wabix | 2014-03-31 | GMO Benchmark | 50371 | 0.008028 | | A9G040819 | wabix | 2014-04-30 | GMO Benchmark | 50995 | 0.012389 |

NB:你可以使用ISNULL()而不是COALESCE()

[EDITS]对NetValue上的数据类型进行了更正,并在enddate上进行了更改,但这是可选的

A slightly different approach, but still using a Cartesian produce and an APPLY operator (OUTER APPLY is needed in this one). To get 0 where you do not want the prior value carried forward, just amend the COALESCE() accordingly.

SQL Fiddle

MS SQL Server 2014 Schema Setup:

CREATE TABLE Allbucket ([CustodianAccountNum] varchar(9), [symbol] varchar(5), [EndDate] datetime, [ManagerName] varchar(16), [MarketValue] numeric , [NetReturn] decimal(12,6)) ; INSERT INTO Allbucket ([CustodianAccountNum], [symbol], [EndDate], [ManagerName], [MarketValue], [NetReturn]) VALUES ('A9G040819', 'wabix', '2013-12-31 00:00:00', 'GMO Benchmark', 34751.10987, 0.004072), ('A9G040819', 'wabix', '2014-01-31 00:00:00', 'GMO Benchmark', 34128.88767, -0.017905), ('A9G040819', 'wabix', '2014-02-28 00:00:00', 'GMO Benchmark', 49969.8081, 0.0202), ('A9G040819', 'wabix', '2014-03-31 00:00:00', 'GMO Benchmark', 50370.993, 0.008028), ('A9G040819', 'wabix', '2014-04-30 00:00:00', 'GMO Benchmark', 50995.0584, 0.012389), ('A9G040819', 'amj', '2013-12-31 00:00:00', 'JPMorgan Alerian', 1234.55, -0.008154), ('A9G040819', 'amj', '2014-02-28 00:00:00', 'JPMorgan Alerian', 14849.76, -0.018599), ('A9G040819', 'amj', '2014-03-31 00:00:00', 'JPMorgan Alerian', 14892.8, 0.015203), ('A9G040819', 'amj', '2014-04-30 00:00:00', 'JPMorgan Alerian', 15513.6, 0.041684) ;

Query 1:

SELECT s.CustodianAccountNum , s.symbol , d.enddate , COALESCE(ab.ManagerName, ap.ManagerName) AS ManagerName , COALESCE(ab.MarketValue, ap.MarketValue) AS MarketValue , COALESCE(ab.NetReturn, 0) AS NetReturn FROM ( SELECT CustodianAccountNum , symbol , MIN(enddate) symstart , MAX(enddate) symend FROM Allbucket GROUP BY CustodianAccountNum , symbol ) s JOIN ( SELECT DISTINCT cast(enddate as date) as enddate FROM Allbucket ) d ON d.enddate BETWEEN s.symstart AND s.symend LEFT JOIN Allbucket ab ON s.CustodianAccountNum = ab.CustodianAccountNum AND s.symbol = ab.symbol AND ab.enddate = d.enddate OUTER APPLY ( SELECT TOP 1 t.* FROM Allbucket t WHERE s.CustodianAccountNum = t.CustodianAccountNum AND s.symbol = t.symbol AND d.enddate <= t.enddate ORDER BY d.enddate DESC ) ap

Results:

| CustodianAccountNum | symbol | enddate | ManagerName | MarketValue | NetReturn | |---------------------|--------|------------|------------------|-------------|-----------| | A9G040819 | amj | 2013-12-31 | JPMorgan Alerian | 1235 | -0.008154 | | A9G040819 | amj | 2014-01-31 | JPMorgan Alerian | 14850 | 0 | | A9G040819 | amj | 2014-02-28 | JPMorgan Alerian | 14850 | -0.018599 | | A9G040819 | amj | 2014-03-31 | JPMorgan Alerian | 14893 | 0.015203 | | A9G040819 | amj | 2014-04-30 | JPMorgan Alerian | 15514 | 0.041684 | | A9G040819 | wabix | 2013-12-31 | GMO Benchmark | 34751 | 0.004072 | | A9G040819 | wabix | 2014-01-31 | GMO Benchmark | 34129 | -0.017905 | | A9G040819 | wabix | 2014-02-28 | GMO Benchmark | 49970 | 0.0202 | | A9G040819 | wabix | 2014-03-31 | GMO Benchmark | 50371 | 0.008028 | | A9G040819 | wabix | 2014-04-30 | GMO Benchmark | 50995 | 0.012389 |

nb: You could use ISNULL() instead of COALESCE()

[EDITS] made correction for data type on NetValue, & change on enddate to date but that is optional

更多推荐